How to use lens package to extract arbitrary subtrees from an AST

Question

I'm exploring the use of the lens package for the purposes of analyzing and transforming this AST, but I am unsure of whether it is a good fit for this task. I think it could be, but its surface area is so large and dense that I can't tell.

A representative operation I want to do is the following. Given an AST, I want to extract "footer" sections from the tree:

Traverse it looking for FooterAnnotation nodes.
Once I find such a node, accumulate all of the following nodes into a list, up until the end of a DocBlock or any other Annotation node.
Ideally at the end of this, I have not only extracted those nodes into a list, but I've also pruned them out of the original AST, so I can pass whatever is left on to the next phase.

At the moment I have this code which does part of the job. Here's the meat of it:

node :: Node -> Env
node n = case n of
  CommandAnnotation _  -> stop
  DocBlock d           -> do
    (_, acc)           <- get
    ns                 <- nodes d
    put (False, acc)   -- Make sure we reset state on exiting docblock.
    return $ acc ++ ns
  FooterAnnotation     -> start
  MappingAnnotation _  -> stop
  MappingsAnnotation   -> stop
  OptionAnnotation {}  -> stop
  PluginAnnotation {}  -> stop
  Unit u               -> nodes u
  _                    -> do
    (capture, acc)     <- get
    return $ if capture
             then acc ++ [n]
             else acc

What this is doing is traversing the AST, and using the State monad to indicate whether or not I am capturing "footer" nodes. I turn capturing on and off with those start and stop functions, which just update the state. When capturing, I accumulate every node into the list.

So, this works, but I notably am not modifying the original AST in any way, and this is where I think the lens package could come in handy, given that it provides a bunch of operators, some of which are explicitly designed to work with the State monad. However, with my limited fu, I am finding the documentation a bit inaccessible and am not sure where to start.

Additionally, I haven't been able to find any examples of using the lens library to remove elements from a structure. A traversal, for example, should "leave the same number of elements as a candidate for subsequent Traversal that it started with", so I am wondering if I need to replace "pruned" nodes with a new AST Empty node that just fills in the gap where they were. Is this right?

Cirdec · Accepted Answer

Lens style uniplate lets us break down a problem working with an entire data structure into pieces that work on only one place in the data structure at a time. We will apply an operation on a single node to every node in the AST.

Operate on a Node

The operation on a single node will extracting any footers, which we'll tell to a Writer, and return the modified node with the footers removed. From your question, I'm assuming you only want to remove footers from DocBlock; you can remove them from other nodes the same way. The other nodes will be returned unmodified.

import qualified Data.DList as DList
import Control.Monad.Trans.Writer

extractNodeFooters :: Node -> Writer (DList.DList [Node]) Node
extractNodeFooters (DocBlock nodes) = do
    let (footers, remainder) = extractFooters nodes
    tell (DList.fromList footers)
    return (DocBlock remainder)
extractNodeFooters node = return node

The difference list DList avoids quadratic performance accumulating the extracted footers.

Some boring parsing

extractFooters pull out blocks starting at footers and ending at the next annotation or the end of the list. It is written in terms of extracting blocks from lists in general. This is a parsing problem; it's curious that we need to apply it to an already parsed AST.

import Control.Applicative

isAnnotation :: Node -> Bool
isAnnotation x = case x of
    PluginAnnotation _ _   -> True
    FunctionAnnotation _   -> True
    IndentAnnotation       -> True
    DedentAnnotation       -> True
    CommandAnnotation _    -> True
    FooterAnnotation       -> True
    MappingsAnnotation     -> True
    MappingAnnotation _    -> True
    OptionAnnotation _ _ _ -> True
    HeadingAnnotation _    -> True
    SubheadingAnnotation _ -> True
    otherwise              -> False


extractBlocks :: Alternative f => (a -> Maybe (a -> Bool)) -> [a] -> (f [a], [a])
extractBlocks start = go
    where
        go     [] = (empty, [])
        go (x:xs) = maybe no_extract extract (start x)
            where
                no_extract = (extracted, x:unextracted)
                    where
                        ~(extracted, unextracted) = go xs
                extract stop = (pure (x:block) <|> extracted, unextracted)
                    where
                        ~(block, remainder) = break stop xs
                        ~(extracted, unextracted) = go remainder

extractFooters :: Alternative f => [Node] -> (f [Node], [Node])
extractFooters = extractBlocks (\x -> if (x==FooterAnnotation) then Just isAnnotation else Nothing)

Operate on every node

We're going to operate on every node of the following AST

example = Unit [
    Code "Unit Code",
    DocBlock [
        Code "DocBlock Code",
        DocBlock [
            Code "DocBlock DocBlock Code",
            FooterAnnotation,
            Code "DocBlock DocBlock FooterAnnotation Code"
        ],
        FooterAnnotation,
        Code "DocBlock FooterAnnotation Code",
        DocBlock [
            Code "DocBlock FooterAnnotation DocBlock Code",
            FooterAnnotation,
            Code "DocBlock FooterAnnotation DocBlock FooterAnnotation Code"
        ]
    ],
    FooterAnnotation,
    Code "Unit FooterAnnotation Code"]

If we applied extractNodeFooters to example it would do nothing, because extractNodeFooters only changes DocBlock nodes and example is a root Unit.

Direct descendants

The generic uniplate traversal derived for types with a Data instance applies an operation to every direct descendant of a node. It does not recursively modify the deeper descendants. If we applied uniplate extractNodeFooters to example, it should remove the footer from the outermost DocBlock, which is a direct descendant of the root Unit. It won't change any of the other DocBlocks. That's exactly what it does.

print . uniplate extractNodeFooters $ example removes only the FooterAnnotation in the DocBlock that's a descendent of Unit

Unit [
    Code "Unit Code",
    DocBlock [
        Code "DocBlock Code",
        DocBlock [
            Code "DocBlock DocBlock Code",
            FooterAnnotation,
            Code "DocBlock DocBlock Footer Annotation Code"
        ]
    ],
    FooterAnnotation,
    Code "Unit FooterAnnotation Code"
]

It logs the one annotation it removed

[

    [
        FooterAnnotation,
        Code "DocBlock FooterAnnotation Code",
        DocBlock [
            Code "DocBlock FooterAnnotation DocBlock Code",
            FooterAnnotation,
            Code "DocBlock FooterAnnotation DocBlock FooterAnnotation Code"
        ]
    ]
]

Deeper

To remove the annotations everywhere we'll have to recursively apply uniplate on each descendant node. We have two generic choices. We can apply our operation to a node before applying it to all the descendants or we can do it afterwards. These are called preorder or postorder traversals. When transforming data, we usually want the postorder traversal because all of the descendants will already be transformed whenever we deal with them.

import Control.Monad

postorder :: Monad m => ((a -> m c) -> (a -> m b)) -> (b -> m c) -> (a -> m c)
postorder t f = go
    where 
        go = t go >=> f

preorder :: Monad m => ((a -> m c) -> (b -> m c)) -> (a -> m b) -> (a -> m c)
preorder t f = go
    where 
        go = f >=> t go

Postorder

The postorder traversal will extract all of the footers from inner nodes before extracting footers from outer nodes. This means that not only will every footer be extracted, but every footer that is inside another footer will be extracted from the footer. print . postorder uniplate extractNodeFooters $ example removes every footer and logs each footer separately.

Unit [
    Code "Unit Code",
    DocBlock [
        Code "DocBlock Code",
        DocBlock [
            Code "DocBlock DocBlock Code"
        ]
    ],
    FooterAnnotation,
    Code "Unit FooterAnnotation Code"
]

None of the three logged footers contain footers.

[
    [FooterAnnotation,Code "DocBlock DocBlock FooterAnnotation Code"],
    [FooterAnnotation,Code "DocBlock FooterAnnotation DocBlock FooterAnnotation Code"],
    [
        FooterAnnotation,
        Code "DocBlock FooterAnnotation Code",
        DocBlock [
            Code "DocBlock FooterAnnotation DocBlock Code"
        ]
    ]
]

Preorder

The preorder traversal will extract all of the footers from outer nodes before extracting footers from inner nodes. This means that every footer will be extracted intact. print . preorder uniplate extractNodeFooters $ example removes every footer and logs it intact. The resulting AST is the same as for the postorder traversal; all of the footers have been removed from DocBlocks.

Unit [
    Code "Unit Code",
    DocBlock [
        Code "DocBlock Code",
        DocBlock [
            Code "DocBlock DocBlock Code"
        ]
    ],
    FooterAnnotation,
    Code "Unit FooterAnnotation Code"
]

One of the two logged footers contains another footer which wasn't extracted and logged separately.

[
    [
        FooterAnnotation,
        Code "DocBlock FooterAnnotation Code",
        DocBlock [
            Code "DocBlock FooterAnnotation DocBlock Code",
            FooterAnnotation,
            Code "DocBlock FooterAnnotation DocBlock FooterAnnotation Code"
        ]
    ],
    [FooterAnnotation, Code "DocBlock DocBlock FooterAnnotation Code"]
]

How to use lens package to extract arbitrary subtrees from an AST

Tags:

haskell

haskell-lens

Greg Hurrell

1 Answers

Operate on a Node

Some boring parsing

Operate on every node

Direct descendants

Deeper

Postorder

Preorder

Cirdec

Recent Activity

Donate For Us

How to use lens package to extract arbitrary subtrees from an AST

Tags:

haskell

haskell-lens

Greg Hurrell

1 Answers

Operate on a Node

Some boring parsing

Operate on every node

Direct descendants

Deeper

Postorder

Preorder

Cirdec

Related questions

Recent Activity

Donate For Us