Say there are two methods in my library:
void com.somepackage.SomeClass.someSink(String s)
and
int com.someotherpackage.SomeOtherClass.someSource(int i)
The first method is used as a data sink and the second as a data source in my code. The parameter types int and String are given only as an example and may differ in the actual situation.
I want to detect usages of these methods, in some code, that satisfy the pattern given below:
1. x is generated by the source.
2. y is generated using a series of transformations f1(f2(... fn(x))).
3. y is given to the sink.
The transformations can be any arbitrary functions as long as there is a sequence of calls from the function that generates the data for the sink to a function that takes in data from the source. The functions may take any other parameters as well and are to be treated as black boxes.
The scanning can be at the source or bytecode level. What are the tools available out there for this type of analysis?
Prefer non-IDE based tools with Java APIs.
[EDIT:] To clarify further, someSink and someSource are arbitrary method names in classes SomeClass and SomeOtherClass respectively. They may or may not be static and may take an arbitrary number of parameters (which I should be able to define). The types of the parameters are also arbitrary. The only requirement is that the tool should scan the code and output the line numbers where the pattern occurs. So the tool might work this way:
Example output:
MyClass1.java:12: value1 = com.someotherpackage.SomeOtherClass.someSource(...)
MyClass2.java:23: value2 = foo(value1, ...)
MyClass3.java:3: value3 = bar(value2)
MyClass4.java:22: com.somepackage.SomeClass.someSink(value3, ...)
Note: A function that takes no parameters but has side effects on the data also needs to be considered. (For example, a = source(); void foo(){ c = a+b }; foo(); sink(c) is a pattern that needs to be caught.)
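To make the side-effect case concrete, here is a minimal Java sketch of it. The class name, the fields, and the bodies of source and sink are hypothetical stand-ins for the real library methods; only the shape of the flow matters.

```java
/** Hypothetical stand-ins for someSource/someSink; names and bodies are illustrative only. */
final class SideEffectPattern {
    static int a, b = 1, c;      // shared state read and written by foo()
    static int lastSinkValue;    // records what reached the sink

    static int source() { return 41; }                // stand-in for someSource
    static void sink(int v) { lastSinkValue = v; }    // stand-in for someSink

    // Takes no parameters, yet moves source-derived data from a into c.
    static void foo() { c = a + b; }

    static void run() {
        a = source();
        foo();
        sink(c);   // c depends on a, so this call should be reported
    }
}
```

A tool that only follows values through call arguments would miss this flow, because the data travels through the fields a and c rather than through a parameter.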
Source code analysis tools, also known as Static Application Security Testing (SAST) Tools, can help analyze source code or compiled versions of code to help find security flaws. SAST tools can be added into your IDE. Such tools can help you detect issues during software development.
Static information flow inference analysis is a technique which automatically infers information flows based on data or control dependence. It can be utilized for the purposes of general program understanding, detection of security attacks and security vulnerabilities, and type inference for security type systems.
Data-flow analysis is a technique for gathering information about the possible set of values calculated at various points in a computer program.
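As a rough illustration of the idea, the sketch below runs a forward taint pass over a straight-line list of statements. It is a toy under stated assumptions, not how any particular tool works: ToyTaint, Stmt, and trace are hypothetical names, statements are already reduced to "target = callee(uses...)", and there are no branches, loops, fields, or aliases.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ToyTaint {
    /** One statement of the form 'target = callee(uses...)'; target is null for a bare call. */
    static final class Stmt {
        final String location, target, callee;
        final List<String> uses;
        Stmt(String location, String target, String callee, String... uses) {
            this.location = location;
            this.target = target;
            this.callee = callee;
            this.uses = Arrays.asList(uses);
        }
    }

    /**
     * Forward pass over the statements in order. Returns the locations of all
     * statements that produced or used source-derived data, up to and including
     * the first sink call with a tainted argument; empty if no such call exists.
     */
    static List<String> trace(List<Stmt> program, String source, String sink) {
        Set<String> tainted = new HashSet<>();
        List<String> path = new ArrayList<>();
        for (Stmt s : program) {
            boolean usesTainted = false;
            for (String u : s.uses) {
                usesTainted |= tainted.contains(u);
            }
            if (s.callee.equals(source) || usesTainted) {
                if (s.target != null) tainted.add(s.target);
                path.add(s.location);
                if (s.callee.equals(sink) && usesTainted) return path;
            } else if (s.target != null) {
                tainted.remove(s.target);   // variable overwritten with clean data
            }
        }
        return new ArrayList<>();           // the sink never saw tainted data
    }
}
```

Every callee is treated as a black box that propagates taint from any argument to its result, which matches the requirement in the question but over-approximates real behaviour.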
After doing some research, I found that Soot is the best suited for this kind of task. Soot is more mature than other open-source alternatives such as PQL.
So the role of the source and sink methods is simply that x originates in the source method (somewhere) and is consumed (somewhere) in the target method? How do you characterize "x", or do you simply want all x that have this property?
Assuming you have identified a specific x in the source method, do you a) insist that x be passed to the target method only by method calls [which would make the target method the last call in your chain of calls], or can one of the intermediate values be copied? b) insist that each function call has exactly one argument?
We have done something like this for large C systems. The problem was to trace an assigned variable into a use in other functions wherever they might be, including values not identical in representation but identical in intent ("abstract copy"; the string "1.0" is abstractly equivalent to the integer 1 if I use the string eventually as a number; "int_to_string" is an "abstract copy" function that converts a value in one representation to an equivalent value in another).
What we needed for this is a reaching definitions analysis for each function ("where does the value from a specific assignment go?"), and an "abstract copy" reaching analysis that determines where a reaching value is consumed by special functions tagged as "abstract copies", and where the result of that abstract copy function reaches to. Then a transitive closure of "x reaches z" and "x reaches f(x) reaches z" computes where x can go.
We did this using our DMS Software Reengineering Toolkit, which provides generic parsing and flow analysis machinery, and DMS's C Front End, which implements the specific reaching and abstract-copy-reaching computations for C. DMS has a Java Front End which computes reaching definitions; one would have to add the abstract-copy-reaching logic and reimplement the transitive closure code.
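The transitive closure step can be sketched independently of any toolkit. The following is a minimal Java illustration, not DMS code: ReachGraph, addReach, and reachableFrom are hypothetical names, and it assumes the per-function "reaches" facts (direct and abstract-copy) have already been computed and are simply fed in as edges.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Toy "reaches" graph: nodes are value names, edges are direct or abstract-copy flows. */
final class ReachGraph {
    private final Map<String, Set<String>> edges = new HashMap<>();

    /** Records that the value at 'from' reaches 'to', directly or through an abstract copy. */
    void addReach(String from, String to) {
        edges.computeIfAbsent(from, k -> new LinkedHashSet<>()).add(to);
    }

    /** Transitive closure from a single start node, computed with a worklist. */
    Set<String> reachableFrom(String start) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.add(start);
        while (!work.isEmpty()) {
            String node = work.remove();
            for (String next : edges.getOrDefault(node, Collections.<String>emptySet())) {
                if (seen.add(next)) {
                    work.add(next);   // first visit: explore its successors later
                }
            }
        }
        return seen;
    }
}
```

For the question's pattern, one would add an edge for each "x reaches f(x)" fact and then check whether the sink argument is in reachableFrom for the source's result.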