Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

parsec: unexpected character parsing nested comments

Tags:

haskell

parsec

I am trying to parse nested C-like block comments

import Text.ParserCombinators.Parsec
import Control.Monad (liftM)

flat :: Monad m => m [[a]] -> m [a]
flat = liftM concat

comment :: Parser String
comment = between (string "/*") (string "*/") (try nested <|> content)
  where
    content = many (try (noneOf "*/")
                   <|> try (char '*' >> notFollowedBy (char '/') >> return '*')
                   <|> try (char '/' >> notFollowedBy (char '*') >> return '/'))
    nested  = flat $ many comment

"1234567890" parses fine, but when I try

parse comment "" "/*123/*456*/789*/"

I get

Left (line 1, column 3):
unexpected "1"
expecting "/*" or "*/"

I cannot figure out why, I have try everywhere I can think of. Please help.

like image 522
akonsu Avatar asked Dec 02 '25 00:12

akonsu


1 Answers

In an expression like a <|> b, if a can match the empty string then b will never be tried, and this is happening in try nested <|> content.

You can fix your approach by requiring at least one comment match or another character:

comment :: Parser String
comment = between (string "/*") (string "*/") ( flat $ many $ (try comment <|> liftM toString other ) )
  where
    toString x = [x]
    other = try (noneOf "*/")
            <|> try (char '*' >> notFollowedBy (char '/') >> return '*')
            <|> try (char '/' >> notFollowedBy (char '*') >> return '/')

FWIW, here is how Text.Parsec.Token does it:

https://github.com/aslatter/parsec/blob/master/Text/Parsec/Token.hs#L698-714

For your specific case the equivalent code is:

import Data.List (nub)

commentStart = "/*"
commentEnd = "*/"

multiLineComment =
    do { try (string commentStart)
       ; inComment
       }

inComment = inCommentMulti

inCommentMulti
    =   do{ try (string commentEnd) ; return () }
    <|> do{ multiLineComment                     ; inCommentMulti }
    <|> do{ skipMany1 (noneOf startEnd)          ; inCommentMulti }
    <|> do{ oneOf startEnd                       ; inCommentMulti }
    <?> "end of comment"
    where
      startEnd   = nub (commentEnd ++ commentStart)
like image 80
ErikR Avatar answered Dec 03 '25 20:12

ErikR



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!