I have a huge set (20000) of boolean expressions. They consist of AND, OR and NOT operators and a large number of boolean variables A1, A2, A3 ... (about 1000). Most expression contain only 5, maybe 20 of these variables.
Given an assignment of the variables (A1 = true, A2 = false, A3 = false ...) I have to find those expressions that evaluate to false.
The same set of expressions will be evaluated for multiple (10-100) assignments
For this purpose:
How should I store the expressions on disk so I can load and parse them fast (I currently have them either as some specialized DSL or as a more or less normalized (and dead slow) relational data structure, but I can change that)
Is there a fast algorithm / data structure for evaluating such expressions that I can use?
Do implementations on the JVM exist?
You may want to look at converting your expressions into Conjunctive Normal Form and combining like terms. You then can have a two-way mapping of an expression to a set of terms, any of which evaluating to false implies that the whole expression evaluates false. For each assignment of variables, start with a set of expressions, evaluate CNF terms until one evaluates to false. If that term is false, then all expressions involving that term will also be false, so those expressions can also be removed from the set.
Whether such an approach fits your case can't be said without looking at the expressions - with 1000 variables and 20000 expressions, it might not be that they have many CNF terms in common.
Outside of Java, and for much larger numbers of expressions, DNF is possibly more useful, since its implementation on the GPU is obvious.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With