Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to implement monadic parsing?

I'm stuck on understanding how Monad works. I'm writing a Parser by using Grammar S->aSb. Input is "a^n b^n for n>=0", e.g "aabb"; Return value for this Parser is a Boolean Value.

Previously by using temporary variables for this Grammar. Now I want to implement this parser by using Monad instead of temporary Variables. But i tried many times und still stuck on this problem.


object Parser {
  case class Parser[+A](parse: List[Char] => Option[(A, List[Char])]) {
    def accepts(l: List[Char]): Boolean
    // If List[Char] is empty after parsing it was successful
    = parse(l).map(_._2.isEmpty).getOrElse(false)

    // Not sure that this exactly does
    def flatMap[B](f: A => Parser[B]): Parser[B] = Parser {
      input => parse(input).flatMap { case (x, midput) => f(x).parse(midput) }
    }

    // a.map( i => i + 1) parses using Parser a and applies function i+1
    def map[B](f: A => B): Parser[B] = Parser {
      input => parse(input) match {
        case Some((t, remains)) => Some((f(t), remains))
        case _ => None
      }
    }

    // a.orElse(b) tries to use Parser a first
    // If a returns None it tries Parser b
    // If b also returns None all rules are exhausted
    def orElse[B](that: Parser[B]): Parser[Either[A, B]] = Parser {
      input => parse(input) match {
        case Some((p, remains)) => Some((Left(p), remains))
        case None => that.parse(input) match {
          case Some((p, remains)) => Some((Right(p), remains))
          case None => None
        }
      }
    }
  }

  // Not sure if this is correct
  def unit[T](x: T): Parser[T] = Parser {
    _ => Some((x, List()))
  }

  // Consumes c if possible
  def char(c: Char): Parser[Unit] = Parser {
    case x :: rest if c == x => Some(((), rest))
    case _ => None
  }

  def main(args: Array[String]): Unit = {
    val S: Parser[Int]
    = char('a').flatMap { _ =>
      S.flatMap { i =>
        char('b').map { _ =>
          i + 1
        }
      }
    }.orElse(unit(0)).map(_.merge)

    S.accepts("".toList) // true
    S.accepts("aaabbb".toList) // true
    S.accepts("aaa".toList) // false
    S.accepts("bbbaaa".toList) // false
  }
}

like image 576
Vincent Avatar asked Nov 29 '25 01:11

Vincent


1 Answers

Usually, when we say "monadic parsing", we mean making Parser a monad. We write

class Parser[+A] { ... }

A Parser[A] takes input and returns the parsed A, or maybe it fails, or maybe there will be some input left over. Let's keep it really simple: a Parser[A] takes a List[Char], and Optionally returns an A and the remaining List[Char].

case class Parser[+A](parse: List[Char] => Option[(A, List[Char])]) {
  def accepts(l: List[Char]): Boolean
    = parse(l).map(_._2.isEmpty).getOrElse(false)
  // do not bother with the List('#') stuff
}

You build up a Parser by using combinators. a.flatMap(b) is a parser that matches a followed by b

// case class Parser[+A](...) {
  def flatMap[B](f: A => Parser[B]): Parser[B] = Parser { input =>
    parse(input).flatMap { case (x, midput) => f(x).parse(midput) }
  }
// }

and Parser.unit(x) returns x without consuming any input, which is why Monad is important. You should also have map, which alters the returned value without changing what's matched. You also need a combinator for alternation. I'll leave those for you to implement.

object Parser {
    def unit[T](x: T): Parser[T] = ???
}
// case class Parser[+A](...) {
    def map[B](f: A => B): Parser[B] = ???

    // left-biased greedy: if this parser succeeds (produces Some) then
    // that parser is never tried (i.e. no backtracking)
    // replacing Option with Seq is the easiest way to get backtracking
    // but we don't need it to define S
    def orElse[B](that: Parser[B]): Parser[Either[A, B]] = ???
// }

You also want some basic Parsers to build more complicated ones from. Parser.char(x) matches the single char x and returns nothing useful.

// object Parser {
  def char(c: Char): Parser[Unit] = Parser {
    case x :: rest if c == x => Some(((), rest))
    case _ => None
  }
// }

Then you can define S in a pretty natural manner. You can even make the parser return an Int for how many as/how many bs were matched:

lazy val S: Parser[Int]
  = (for { _ <- Parser.char('a')
           i <- S
           _ <- Parser.char('b')
         } yield (i + 1)).orElse(Parser.unit(0)).map(_.merge)
// i.e
lazy val S: Parser[Int]
  = Parser.char('a').flatMap { _ =>
      S.flatMap { i =>
        Parser.char('b').map { _ =>
          i + 1
        }
      }
    }.orElse(Parser.unit(0)).map(_.merge)

S.accepts("".toList) // true
S.accepts("aaabbb".toList) // true
S.accepts("aaa".toList) // false
S.accepts("bbbaaa".toList) // false

You do not have to move the List[Char] around in the definition of S, because the combinators we've written do that for you, leaving behind only the logic of the grammar itself.

like image 180
HTNW Avatar answered Dec 01 '25 14:12

HTNW