I want to instrument my code directly by pre-processing the source files with sed/awk. I cannot use other methods such as debugger traces or the gcc option -finstrument-functions. With the latter, the addresses are rebased in a way I cannot manage, so I lose the correspondence with the symbol table. Other methods presented here (ptrace, etrace, callgraph, etc.) or here work well on a simple example, but not in my real project.
The problem is that in big open source projects the coding style of function definitions varies, not only between C and C++ files but often within the same file. The { may be at the end of the argument list or on the next line, and structures or assignments may also begin with a {, so simple function parsing produces false matches.
So the solution presented in the above links, which inserts a macro at the beginning of each function definition, does not work in general, and it is not feasible to correct thousands of lines of code (KLOC) by hand.
sed 's/^{/{ENTRY/'
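To make the failure concrete, here is a small sketch that runs this one-liner on a made-up sample. The names kr_style, allman_style and table are hypothetical and not taken from any real project. It is meant to show the rule missing a function whose brace ends the argument line, and tagging an initializer that is not a function at all.

```shell
# Hypothetical sample: three brace placements in one file.
sample=$(cat <<'EOF'
int kr_style(int a) {
    return a + 1;
}

int allman_style(int a)
{
    return a + 2;
}

static const int table[2][2] =
{
    { 1, 2 },
    { 3, 4 }
};
EOF
)

# The one-liner from the question: tag every line that starts with "{".
instrumented=$(printf '%s\n' "$sample" | sed 's/^{/{ENTRY/')
printf '%s\n' "$instrumented"

# Count how many lines received the macro.
hits=$(printf '%s\n' "$instrumented" | grep -c 'ENTRY')
echo "lines tagged: $hits"
```

Only two lines get tagged: the opening brace of allman_style and the opening brace of the table initializer. kr_style is left untouched.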
So, how can I reliably target function definitions in C/C++ code with regular expressions usable in sed or awk? Possibly by using part of the gcc preprocessor code? I am looking for something off the shelf if possible, please.
sed or awk (or any purely textual approach) are the wrong tools for reliably processing C code (and you should probably work on the preprocessed form).
You want to work on some form of the compiler's AST. Of course the internal representations inside a compiler are specific to the compiler (and perhaps even to its version).
If using a recent GCC, you could customize it using MELT (and add your passes to GCC), or with your own plugin in C++.
If using Clang/LLVM you could also customize it by adding your passes.
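Before committing to a plugin or a pass, it can help to look at what the compiler front end already knows about the code. The following is only a rough sketch, assuming clang is installed and on PATH. The demo.c file and its function names are hypothetical. It dumps the AST and keeps the FunctionDecl nodes, which cover prototypes as well as definitions, so it is a way to inspect the tree and not an instrumentation tool.

```shell
# Assumes clang is on PATH; the sample file and names are hypothetical.
if command -v clang >/dev/null 2>&1; then
    workdir=$(mktemp -d)
    cat > "$workdir/demo.c" <<'EOF'
int kr_style(int a) {
    return a + 1;
}

int allman_style(int a)
{
    return a + 2;
}
EOF
    # Dump the AST without generating code, keep function declaration nodes.
    decls=$(clang -Xclang -ast-dump -fsyntax-only "$workdir/demo.c" | grep 'FunctionDecl')
    printf '%s\n' "$decls"
    rm -rf "$workdir"
else
    decls=""
    echo "clang not found; skipping" >&2
fi
```

Both functions should show up regardless of where their brace sits, because the parser, not a regular expression, decides what a function is.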
The Coccinelle tool might also be relevant.
Any such approach requires a significant amount of work (probably weeks) since you'll need to understand in detail the internal representations of the particular compiler you are using. And C is complex enough to make that non-trivial.
You cannot do this with any tool that does not understand the specific version of C your code is written in (e.g. C++, ANSI C, or C99). As a trivial example, what does "//" mean in a "C function"? If it's inside a string, it's a literal pair of slashes; if it's outside a string, it starts a comment in C++ or C99 but not in ANSI C. What if it's inside /* ... // ... */? And if what looks like a function definition follows a "//", is that really a function?
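A small sketch of how quickly a textual rule trips over these cases. The sample text, the url variable and the fake function are made up for illustration. The first sed treats everything after "//" as a comment, and the second applies a brace-at-start-of-line rule to the result.

```shell
# Hypothetical sample: "//" inside a string and inside a block comment.
sample=$(cat <<'EOF'
const char *url = "http://example.com"; // real comment
/* block comment: // void fake(void)
{
} */
EOF
)

# Naive rule: everything from "//" to end of line is a comment.
stripped=$(printf '%s\n' "$sample" | sed 's://.*::')
printf '%s\n' "$stripped"

# Naive rule: a line starting with "{" opens a function body.
tagged=$(printf '%s\n' "$stripped" | sed 's/^{/{ENTRY/')
printf '%s\n' "$tagged"
```

The string literal is cut off after "http:", which no longer compiles, and the brace inside the block comment is tagged as if it opened a function.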
You don't say what you want to do ("pre-process the code" doesn't tell us anything), but you should look into something like what I posted at Remove multi-line comments, which uses gcc to strip comments from the code, followed by a C beautifier like "indent" or "cb" to reformat the code consistently. If you're just looking for a tool to list functions, take a look at "cscope" or "ccalls".
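As a rough illustration of the comment-stripping step, here is one commonly used gcc invocation. This is an assumption on my part and not necessarily the exact command from the linked post. It assumes gcc is on PATH, and demo.c and its contents are hypothetical.

```shell
# Assumes gcc is on PATH; the sample file is hypothetical.
if command -v gcc >/dev/null 2>&1; then
    workdir=$(mktemp -d)
    cat > "$workdir/demo.c" <<'EOF'
/* block comment with a { inside */
int add(int a, int b) { // trailing comment
    return a + b;
}
EOF
    # -fpreprocessed: treat input as already preprocessed (no macro expansion),
    #                 comments are still removed
    # -dD: keep #define directives in the output
    # -E:  stop after the preprocessing stage
    # -P:  do not emit linemarkers
    clean=$(gcc -fpreprocessed -dD -E -P "$workdir/demo.c")
    printf '%s\n' "$clean"
    rm -rf "$workdir"
else
    clean=""
    echo "gcc not found; skipping" >&2
fi
```

The comment-free output could then be piped through a beautifier such as indent so that brace placement becomes uniform before any further textual processing.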