I am developing a parser for a language similar to SQL and I have the problem of creating some of the rules of language, such as: expression IS NULL
and expression IN (expression1, expression2, ...)
with priority between logical and mathematical operators.
I uploaded a GitHub test project https://github.com/anpv/SpracheTest/ but this variant is not good.
I tried to use the following rules:
private static readonly Parser<AstNode> InOperator =
from expr in Parse.Ref(() => Expression)
from inKeyword in Parse.IgnoreCase("in").Token()
from values in Parse
.Ref(() => Expression)
.DelimitedBy(Comma)
.Contained(OpenParenthesis, CloseParenthesis)
select new InOperator(expr, values);
private static readonly Parser<AstNode> IsNullOperator =
from expr in Parse.Ref(() => Expression)
from isNullKeyword in Parse
.IgnoreCase("is")
.Then(_ => Parse.WhiteSpace.AtLeastOnce())
.Then(_ => Parse.IgnoreCase("null"))
select new IsNullOperator(expr);
private static readonly Parser<AstNode> Equality =
Parse
.ChainOperator(Eq, IsNullOperator.Or(InOperator).Or(Additive), MakeBinary);
which throws ParseException
in code like ScriptParser.ParseExpression("1 is null")
or ScriptParser.ParseExpression("1 in (1, 2, 3)"): "Parsing failure: Left recursion in the grammar."
.
How can I look-ahead for Expression, or do other variants exist to solve this problem?
The answer is, unfortunately, the Sprache cannot parse a left-recursive grammar. I stumbled on comments in the source code talking about how buggy support for left-recursive grammars had been removed when researching this question (which was also how I found your question) - see the source code.
In order to deal with this problem you need to reorganize how you do your parsing. If you are writing a simple expression parser, for example, this is a common problem you have to deal with. Searching the web there is lots of discussion of how to remove left-recursion from a grammar, in particular, for expressions.
In your case, I expect you'll need to do something like:
term := everything simple in an expression (like "1", "2", "3", etc.)
expression := term [ IN ( expression*) | IS NULL | "+" expression | "-" expression | etc.]
or similar - basically - you have to unwind the recursion yourself. By doing that I was able to fix my issues with expressions. I suspect any basic compiler book probably has a section on how to "normalize" a grammar.
It makes building whatever object you are returning from the parser a bit more of a pain, but in the select statement instead of doing "select new Expression(arg1, arg2)" I changed it to be a function call, and the function decides on the specific object being returned depending on what the arguments were.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With