Antlr 4: Method for switching modes in parser

Question

I'm trying to build a MVS JCL recognizer using Antlr4. The general endeavour is going reasonably well, but I am having trouble handling the MVS equivalent of *nix "here docs" (inline files). I cannot use lexer modes to flip-flop between JCL and here-doc content, so I am looking for alternatives that I might use a parser level.

IBM MVS allows the use of "instream datasets", similar to *nix here-docs.

Example:

This defines a three-line inline file, terminated by the characters "ZZ" and accessible to a referencing program using the label "ANYNAME":

//ANYNAME  DD *,SYMBOLS=(JCLONLY,FILEREF),DLM=ZZ
HEREDOC TEXT 1
HEREDOC TEXT 2
HEREDOC TEXT 3
ZZ
//NEXTFILE DD ...stuff...

ANYNAME is a handle by which a program can access the here-doc content.

DD * is mandatory and informs MVS that a here-doc follows.

SYMBOLS=(JCLONLY,FILEREF) is optional detail relating to how the here-doc is handled.

DLM=ZZ is also optional and defines the here-doc terminator (default terminator = /*).

I need to be able, at parser level, to process the //ANYNAME... line (I have that bit), then to read the here-doc content until I find the (possibly non-default) here-doc terminator. In a sense, this looks like a lexer modes opportunity- but at this point I am working within the parser and I do not have a fixed terminator to work with.

I need guidance on how to switch modes to handle my here-doc, then switch back again to continue processing my JCL.

A hugely abridged version of my grammar follows (the actual grammar, so far, is about 2,200 lines and is incomplete).

Thanks for any insights. I appreciate your help, comments and suggestions.

/* the ddstmt parser rule should be considered the main entry point. It handles (at least):

           //ANYNAME  DD *,SYMBOLS=(JCLONLY,FILEREF),DLM=ZZ
    and    //         DD *,DLM=ZZ
    and    //ANYNAME  DD *,SYMBOLS=EXECSYS
    and    //ANYNAME  DD *

  I need to be able process the above line as JCL then read the here-doc content...

                   "HEREDOC TEXT 1"
                   "HEREDOC TEXT 2"
                   "HEREDOC TEXT 3"

  as either a single token or a series of tokens, then, after reading the here-doc 
  delimiter...

                   "ZZ"

 , go back to processing regular JCL again.

*/


    /* lexer rules: */

                LINECOMMENT3        :   SLASH SLASH STAR                            ;
                DSLASH              :   SLASH SLASH                                 ;
                INSTREAMTERMINATE   :   SLASH STAR                                  ;
                SLASH               :   '/'                                         ;
                STAR                :   '*'                                         ;
                OPAREN              :   '('                                         ;
                CPAREN              :   ')'                                         ;
                COMMA               :   ','                                         ;

                KWDD                :   'DD'                                        ;
                KWDLM               :   'DLM'                                       ;
                KWSYMBOLS           :   'SYMBOLS'                                   ;
                KWDATA              :   'DATA'                                      ;

                SYMBOLSTARGET       :   'JCLONLY'|'EXECSYS'|'CNVTSYS'               ;
                EQ                  :   '='                                         ;
                APOST               :   '\''                                        ;
                fragment
                SPC                 :   ' '                                         ;
                SPCS                :   SPC+                                        ;
                NL                  :   ('
'? '
')                                ;
                UNQUOTEDTEXT        :   (APOST APOST|~[=\'\"
	,/() ])+          ;


    /* parser rules: */

                label               :   unquotedtext
                                    ;
                separator           :   SPCS
                                    ;

        /* handle crazy JCL comment rules - start */
                    partcomment         :   SPCS partcommenttext NL
                                        ;
                    partcommenttext     :   ((~NL+?)?)
                                        ;
                    linecomment         :   LINECOMMENT3 linecommenttext NL
                                        ;
                    linecommenttext     :   ((~NL+?)?)
                                        ;
                    postcommaeol        :   ( (partcomment|NL) linecomment* DSLASH SPCS )?
                                        ;
                    poststmteol         :   ( (partcomment|NL) linecomment* )?
                                        ;
        /* handle crazy JCL comment rules - end */

                ddstmt              :   DSLASH (label|) separator KWDD separator dddecl
                                    ;
                dddecl              :   ...
                                    |   ddinstreamdecl
                                    |   ...
                                    ;
                ddinstreamdecl      :   (STAR|KWDATA) poststmteol ddinstreamopts
                                    ;
                ddinstreamopts      :    ( COMMA postcommaeol ddinstreamopt poststmteol )*
                                    ;
                ddinstreamopt       :    (   ddinstreamdelim
                                         |   symbolsdecl
                                         )
                                    ;
                ddinstreamdelim     :   KWDLM EQ unquotedtext
                                    ;
                symbolsdecl         :   KWSYMBOLS EQ symbolsdef
                                    ;
                symbolsdef          :   OPAREN symbolstarget ( COMMA symbolsloggingdd )? CPAREN
                                    |   symbolstarget
                                    ;
                symbolstarget       :   SYMBOLSTARGET
                                    ;
                symbolsloggingdd    :   unquotedtext
                                    ;
                unquotedtext        :   UNQUOTEDTEXT
                                    ;

Sam Harwell · Accepted Answer

Your lexer needs to be able to tokenize the entire document prior to the beginning of the parsing operation. Any attempt to control the lexer from within the parser is a recipe for endless nightmares down the road. The following fragments of a PHP Lexer show how predicates can be used in combination with lexer modes to detect the end of a string with a user-defined delimiter. The key part is recording the start delimiter, and then checking tokens which start at the beginning of the line against it.

PHP_NOWDOC_START
    :   '<<<\'' PHP_IDENTIFIER '\'' {_input.La(1) == '
' || _input.La(1) == '
'}?
        -> pushMode(PhpNowDoc)
    ;

mode PhpNowDoc;

    PhpNowDoc_NEWLINE : NEWLINE -> type(NEWLINE);

    PHP_NOWDOC_END
        :   {_input.La(-1) == '
'}?
            PHP_IDENTIFIER ';'?
            {CheckHeredocEnd(_input.La(1), Text);}?
            -> popMode
        ;

    PHP_NOWDOC_TEXT
        :   ~[
]+
        ;

The identifier is actually recorded in a custom override of NextToken() (shown here for a C# target):

public override IToken NextToken()
{
    IToken token = base.NextToken();
    switch (token.Type)
    {
    case PHP_NOWDOC_START:
        // <<<'identifier'
        _heredocIdentifier = token.Text.Substring(3).Trim('\'');
        break;

    case PHP_NOWDOC_END:
        _heredocIdentifier = null;
        break;

    default:
        break;
    }

    return token;
}

private bool CheckHeredocEnd(int la1, string text)
{
    // identifier
    //  - or -
    // identifier;
    bool semi = text[text.Length - 1] == ';';
    string identifier = semi ? text.Substring(0, text.Length - 1) : text;
    return string.Equals(identifier, HeredocIdentifier, StringComparison.Ordinal);
}

Antlr 4: Method for switching modes in parser

Tags:

parsing

antlr

antlr4

v0rl0n

1 Answers

Sam Harwell

Recent Activity

Donate For Us

Antlr 4: Method for switching modes in parser

Tags:

parsing

antlr

antlr4

v0rl0n

1 Answers

Sam Harwell

Related questions

Recent Activity

Donate For Us