I am parsing a small declarative language in which a scope can declare variables (each with a type); later on, as in most other languages, a variable is referred to by its name alone (without the type).
A variable declaration looks like this:
?varname
?varname1 ?varname2 - type1
?varname3 ?varname4 ?varname5 - type2
If the type is omitted, the default type should be object, as in the first case.
For that I have a specific parser which returns a list of my own domain object called LiftedTerm (you can assume it's a tuple of the variable's name and its type; in reality there is more in it, but that is irrelevant for this problem):
// Flattens all declaration groups into LiftedTerms, mapping "object" to ObjectType.
def typed_list_variables: Parser[List[LiftedTerm]] =
  typed_variables.+ ^^ { groups =>
    groups.flatten.map { case (name, typeName) =>
      LiftedTerm(name, typeName match {
        case "object" => ObjectType
        case _        => TermType(typeName)
      })
    }
  }

// One declaration group: one or more variables, optionally followed by "- type".
def typed_variables: Parser[List[(String, String)]] =
  (variable.+ ~ ("-" ~> primitive_type).?) ^^ {
    case variables ~ maybeType =>
      variables.map(_ -> maybeType.getOrElse("object"))
  }

def variable: Parser[String] = """\?[a-zA-Z][a-zA-Z0-9_-]*""".r
def primitive_type: Parser[String] = """[a-zA-Z][a-zA-Z0-9_-]*""".r
All this works perfectly fine.
Further down in the same 'scope' I have to parse the parts that refer to these variables. A variable obviously won't be declared again in full, so in the above example the places where ?varname1 is used won't include type1. However, when I parse the rest of the input I want to get a reference to the right LiftedTerm object, rather than just a string.
I have some recursive structures in place, so I don't want to do this mapping in the top-level parser. I also don't want to keep a 'global mapping' in my RegexParsers object, because most of these variables are scoped and only relevant for a small piece of the input.
Is there a way of passing contextual information to a parser? Ideally I would pass the list of LiftedTerm (or better still, a map from variable name to LiftedTerm, i.e. String -> LiftedTerm) into the recursive parser calls.
(Apologies if this is something obvious, I am still new to Scala and even newer to parser combinators).
AFAIK, Scala's combinator parser library is limited to context-free grammars. Hence, your use case is not supported out of the box.
The proper way to go would be to extend scala.util.parsing.combinator.Parsers and provide a custom Parser class which carries your context around. Then you need to define all the combinators so that they also deal with the context.
edit: As has been pointed out below, parsers have the methods into and flatMap. Therefore, when you have a parser that yields your context, you can combine it with another parser that requires that context, in monadic style.
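A minimal sketch of that monadic style, assuming the scala-parser-combinators module is on the classpath. The names ScopedParser, Scope, reference, body and scoped are made up for illustration, LiftedTerm is reduced to a name plus a type name, and the parenthesised body syntax is a hypothetical stand-in for the rest of the language:

```scala
import scala.util.parsing.combinator.RegexParsers

// Simplified stand-in for the question's LiftedTerm (assumption: name + type name only).
case class LiftedTerm(name: String, typeName: String)

object ScopedParser extends RegexParsers {
  // The context that is threaded through: variable name -> declared term.
  type Scope = Map[String, LiftedTerm]

  def variable: Parser[String] = """\?[a-zA-Z][a-zA-Z0-9_-]*""".r
  def primitiveType: Parser[String] = """[a-zA-Z][a-zA-Z0-9_-]*""".r

  // One declaration group: variables with an optional "- type" suffix.
  def typedVariables: Parser[List[LiftedTerm]] =
    (rep1(variable) ~ opt("-" ~> primitiveType)) ^^ {
      case names ~ tpe => names.map(n => LiftedTerm(n, tpe.getOrElse("object")))
    }

  // Parsing the declarations yields the context.
  def declarations: Parser[Scope] =
    rep1(typedVariables) ^^ (groups => groups.flatten.map(t => t.name -> t).toMap)

  // A context-dependent parser: a Map is a PartialFunction, so ^? fails
  // on any variable name that is not declared in the given scope.
  def reference(scope: Scope): Parser[LiftedTerm] =
    variable.^?(scope, (name: String) => s"undeclared variable $name")

  // Hypothetical body syntax: a parenthesised list of variable references.
  // Recursive rules would simply pass `scope` along in the same way.
  def body(scope: Scope): Parser[List[LiftedTerm]] =
    "(" ~> rep(reference(scope)) <~ ")"

  // into (alias >>, implemented via flatMap) feeds the parsed scope
  // into the parser for the rest of the input.
  def scoped: Parser[List[LiftedTerm]] =
    declarations.into(scope => body(scope))
}
```

With this, ScopedParser.parseAll(ScopedParser.scoped, "?a ?b - t1 ?c (?b ?c)") resolves ?b and ?c to the terms created by their declarations, and a reference to an undeclared variable makes the parse fail. Because the scope is an ordinary method parameter, nothing is stored globally in the RegexParsers object.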