
How to ignore arbitrary tokens using parsec?

Tags: haskell, parsec

I want to replace sed and awk with Parsec. For example, I want to extract the number from strings like "unknown structure but containing the number 42 and maybe some other stuff".

With the parser below I run into an "unexpected end of input" error. I'm looking for the equivalent of a non-greedy .*([0-9]+).*.

module Main where

import Text.Parsec

parser :: Parsec String () Int
parser = do
    _ <- many anyToken
    x <- read <$> many1 digit
    _ <- many anyToken
    return x

main :: IO ()
main = interact (show . parse parser "STDIN")
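
The error comes from the first `many anyToken`: it is greedy and consumes the entire input, so `many1 digit` is then tried at end of input, and Parsec does not backtrack into `many` to try a shorter match. A minimal sketch of one possible fix in plain Parsec follows, assuming it is enough to stop skipping at the first digit. The name `number` is made up for this illustration, and `manyTill` with `lookAhead` is only one of several ways to get non-greedy behaviour.

```haskell
import Text.Parsec

-- Hypothetical name; same type as the parser in the question.
number :: Parsec String () Int
number = do
    -- Skip one token at a time, stopping as soon as the next token is a digit.
    -- lookAhead checks for the digit without consuming it.
    _ <- manyTill anyToken (lookAhead digit)
    x <- read <$> many1 digit
    -- Ignore whatever follows the number.
    _ <- many anyToken
    return x

main :: IO ()
main = print (parse number "example"
    "unknown structure but containing the number 42 and maybe some other stuff")
-- prints: Right 42
```

If the input contains no digit at all, `manyTill` runs to the end of the input and the parse still fails, which is probably what you want in that case.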
Asked by sevo, Oct 20 '25 20:10


2 Answers

This can be easily done with my library regex-applicative. It gives you both the combinator interface and the features of regular expressions that you seem to want.

Here's a working version that's closest to your example:

{-# LANGUAGE ApplicativeDo #-}
import Text.Regex.Applicative
import Text.Regex.Applicative.Common (decimal)

parser :: RE Char Int
parser = do
    _ <- few anySym
    x <- decimal
    _ <- many anySym
    return x

main :: IO ()
main = interact (show . match parser)
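
For reference, the same regex can be sketched without ApplicativeDo using the applicative operators `*>` and `<*`. The name `numberRE` is made up for this illustration. The key piece is `few`, the non-greedy counterpart of `many`, which makes the leading skip as short as possible so that `decimal` sees the whole number.

```haskell
import Text.Regex.Applicative (RE, anySym, few, many, match)
import Text.Regex.Applicative.Common (decimal)

-- Hypothetical name: skip as little as possible, read a number, ignore the rest.
numberRE :: RE Char Int
numberRE = few anySym *> decimal <* many anySym

main :: IO ()
main = print (match numberRE "foo 42 bar")
-- prints: Just 42
```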

Here's an even shorter version, using findFirstInfix:

import Text.Regex.Applicative
import Text.Regex.Applicative.Common (decimal)

main :: IO ()
main = interact (show . fmap snd3 . findFirstInfix (decimal :: RE Char Int))
  where snd3 (_, r, _) = r
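
To make the result shape concrete: `findFirstInfix` returns a `Maybe` holding the text before the match, the matched value, and the text after it. A small sketch, with `firstNumber` as a made-up name for this illustration:

```haskell
import Text.Regex.Applicative (RE, findFirstInfix)
import Text.Regex.Applicative.Common (decimal)

-- Hypothetical helper: leftmost number together with its surrounding text.
firstNumber :: String -> Maybe (String, Int, String)
firstNumber = findFirstInfix (decimal :: RE Char Int)

main :: IO ()
main = print (firstNumber "foo 42 bar")
-- prints: Just ("foo ",42," bar")
```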

If you want to perform actual tokenization (e.g. skip 93 in foo93bar), then take a look at lexer-applicative, a tokenizer based on regex-applicative.

Answered by Roman Cheplyaka, Oct 23 '25 01:10


Replacing sed and awk with parsers is what the replace-megaparsec library is all about.

Extract numbers from unstructured strings with the sepCap parser combinator.

import Data.Void (Void)
import Replace.Megaparsec
import Text.Megaparsec
import Text.Megaparsec.Char.Lexer

parseTest (sepCap (decimal :: Parsec Void String Int))
  $ "unknown structure but containing the number 42 and maybe some other stuff"
[ Left "unknown structure but containing the number "
, Right 42
, Left " and maybe some other stuff"
]
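
If only the numbers are needed, the `Right` values can be collected from the `sepCap` result. This is a minimal sketch rather than part of the library's documented recipes; the helper name `numbers` is made up for this illustration.

```haskell
import Data.Either (rights)
import Data.Void (Void)
import Replace.Megaparsec (sepCap)
import Text.Megaparsec (Parsec, parseMaybe)
import Text.Megaparsec.Char.Lexer (decimal)

-- Hypothetical helper: every decimal number found in the input, in order.
-- sepCap yields Left for unmatched text and Right for each match;
-- rights keeps only the matches.
numbers :: String -> [Int]
numbers = maybe [] rights . parseMaybe (sepCap (decimal :: Parsec Void String Int))

main :: IO ()
main = print (numbers
    "unknown structure but containing the number 42 and maybe some other stuff")
-- prints: [42]
```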
Answered by James Brock, Oct 23 '25 00:10