 

Tokenize source code

Tags:

tokenize

Is there a library that can tokenize source code written in different programming languages (Java, C, C++)? Ideally it could also identify parts of the code, such as where a function starts and ends and which tokens are identifiers. I do not want to parse the source code, since that can be overly complex. Moreover, the source code may not be error-free. Thanks in advance.

asked Oct 16 '25 15:10 by Muhammad Asaduzzaman



1 Answer

You can tokenize source code using a lexical analyzer (lexer, for short) such as flex (for C) or JLex (for Java). The easiest way to get lexical grammars for Java, C, and C++ may be to take them (subject to licensing terms) from an open-source compiler that is built with your favorite lexer generator. Even if you find the licensing conditions too onerous, that code should be educational to look through.
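flex and JLex generate a lexer from a list of regular expressions, one per token kind. The same idea can be sketched by hand with Python's standard `re` module. The following is a minimal sketch, not flex or JLex output, assuming a simplified C-like token set; the names `TOKEN_SPEC`, `KEYWORDS`, and `tokenize` are hypothetical, and the keyword list is deliberately incomplete. Characters the lexer does not recognize become `ERROR` tokens instead of aborting, which matters because the question says the input may not be error-free.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str   # e.g. "IDENT", "KEYWORD", "NUMBER", "OP", "COMMENT", "ERROR"
    text: str   # the exact characters matched
    line: int   # 1-based line on which the token starts


# Simplified C-like token set (an assumption, not a full C/C++/Java lexical grammar).
# Alternatives are tried in order, so comments come before operators;
# otherwise "//" would be lexed as two "/" tokens.
TOKEN_SPEC = [
    ("COMMENT", r"//[^\n]*|/\*.*?\*/"),
    ("STRING",  r'"(?:\\.|[^"\\\n])*"'),
    ("CHAR",    r"'(?:\\.|[^'\\\n])*'"),
    ("NUMBER",  r"0[xX][0-9a-fA-F]+[uUlL]*|\d+(?:\.\d*)?(?:[eE][+-]?\d+)?[fFuUlL]*"),
    ("IDENT",   r"[A-Za-z_]\w*"),
    ("OP",      r"<<=|>>=|->|\+\+|--|<<|>>|&&|\|\||::|[-+*/%&|^!<>=]="
                r"|[-+*/%&|^~!=<>?:;,.(){}\[\]#@]"),
    ("NEWLINE", r"\n"),
    ("SKIP",    r"[ \t\r]+"),
    ("ERROR",   r"."),  # anything else: report it rather than fail
]
MASTER_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.DOTALL,  # lets /* ... */ comments span lines
)

# Hypothetical, incomplete keyword subset shared by C, C++ and Java.
KEYWORDS = {
    "if", "else", "for", "while", "do", "return", "switch", "case", "break",
    "continue", "int", "char", "void", "float", "double", "long", "short",
    "struct", "class", "public", "private", "static", "const", "new",
}


def tokenize(source: str) -> Iterator[Token]:
    """Yield tokens from C-like source; never raises on malformed input."""
    line = 1
    for m in MASTER_RE.finditer(source):
        kind, text = m.lastgroup, m.group()
        if kind == "IDENT" and text in KEYWORDS:
            kind = "KEYWORD"
        if kind not in ("SKIP", "NEWLINE"):
            yield Token(kind, text, line)
        line += text.count("\n")  # multi-line comments advance the line count too
```

For example, `tokenize('x = "oops;\nint y;')` does not stop at the unterminated string: it reports the stray `"` as an `ERROR` token and keeps going with `oops`, `;`, `int`, `y`, `;`. Identifiers come out as `IDENT` tokens, which covers part of what the question asks for.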

However, you still won't be able to identify the beginning and end of a function without parsing.
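To see why, consider the best a token stream alone supports: a heuristic that counts brace depth and treats a top-level `)` followed by `{` as the start of a function body. The sketch below is hypothetical (`guess_function_spans` and `_name_before_parens` are illustrative names, not a library API) and only approximates the answer's point. It handles simple C functions, but it misses C++ definitions such as `int g() const {`, K&R-style C definitions, and every Java method (they sit inside a class body, at depth 1). It would also mislabel macro-generated code. Those failures are exactly the cases where a real parser is needed.

```python
import re
from typing import List, Optional, Tuple

# Crude tokens: comments and string/char literals are matched first so that
# braces inside them are not counted; otherwise words or single characters.
TOKEN_RE = re.compile(
    r"""//[^\n]*|/\*.*?\*/|"(?:\\.|[^"\\\n])*"|'(?:\\.|[^'\\\n])*'|\w+|\S""",
    re.DOTALL,
)


def _name_before_parens(texts: List[str], close_idx: int) -> Optional[str]:
    """Walk back from a ')' to its matching '(' and return the identifier before it."""
    depth = 0
    for j in range(close_idx, -1, -1):
        if texts[j] == ")":
            depth += 1
        elif texts[j] == "(":
            depth -= 1
            if depth == 0:
                if j > 0 and re.fullmatch(r"[A-Za-z_]\w*", texts[j - 1]):
                    return texts[j - 1]
                return None
    return None  # unbalanced parentheses: give up on this candidate


def guess_function_spans(source: str) -> List[Tuple[str, int, int]]:
    """HEURISTIC ONLY: guess (name, first_line, last_line) of top-level functions."""
    tokens = []
    for m in TOKEN_RE.finditer(source):
        text = m.group()
        if text.startswith(("//", "/*")):
            continue  # skip comments entirely
        line = source.count("\n", 0, m.start()) + 1  # quadratic, fine for a sketch
        tokens.append((text, line))
    texts = [t for t, _ in tokens]

    spans, depth, current, start_line = [], 0, None, 0
    for i, (tok, line) in enumerate(tokens):
        if tok == "{":
            if depth == 0 and i > 0 and texts[i - 1] == ")":
                current = _name_before_parens(texts, i - 1)
                start_line = line
            depth += 1
        elif tok == "}":
            depth = max(depth - 1, 0)  # tolerate unbalanced input
            if depth == 0 and current is not None:
                spans.append((current, start_line, line))
                current = None
    return spans
```

On a small C file with `add` and `log_msg` functions and a `struct` in between, this reports both functions and ignores the struct, even when a `}` appears inside a comment or a string literal. Given `int g() const { return 1; }`, however, it finds nothing.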

answered Oct 18 '25 06:10 by comingstorm



