
Scanning for Virus Signatures Using Java

I need to search files for virus signatures, and I am using Java to do it. I have already programmed all the other features, such as gathering files and filtering them down to the ones that need to be searched. I just need a little help with the virus signature side.

What format should I use for the signatures (hashed string, binary, bytes)?

What method should I use to scan for them (search algorithm, etc.)?

I was thinking of turning the file into bytes and then using the Boyer–Moore string search algorithm to search for the signature bytes.

I want to load the virus signatures from a signature file and scan each file for them.

public void Search(File file) {
    if (file.exists()) {
        if (file.isDirectory()) {
            if (file.canRead()) {
                // listFiles() can return null on an I/O error, so guard against it.
                File[] listOfFiles = file.listFiles();
                if (listOfFiles != null) {
                    for (File child : listOfFiles) {
                        Search(child); // recurse into subdirectories
                    }
                }
            } else {
                cannotReadDirCount++;
            }
        } else if (file.isFile()) {
            if (file.canRead()) {
                totalFileCount++;
                // Compare in lower case so mixed-case extensions such as ".Exe" also match.
                String name = file.getName().toLowerCase();
                for (int a = 0; a < executableCriteriaList.size(); a++) {
                    if (name.endsWith(executableCriteriaList.get(a).toLowerCase())) {
                        // scanExecutableFile(file); HERE is where I need to scan the file
                        searchFiles.add(file);
                        break; // add each file only once, even if several criteria match
                    }
                }
            } else {
                cannotReadFileCount++;
            }
        }
    } else {
        cannotReadFileCount++;
    }
}

Thanks for the Help

Davinco asked May 20 '26 04:05


2 Answers

If you were scanning for just one virus signature, then a single string search algorithm like Boyer-Moore would be a good choice. (There are other fast single search algorithms too.)

But a virus scanner typically looks for many virus signatures, and the signatures are typically not just simple sequence-of-byte signatures.

If you are looking for the (technically) best algorithm, then I suggest you read the Wikipedia page on String Search Algorithms, and consider all of the alternatives that it links to. That's only a start, since there are (apparently) other search algorithms that are not listed there.

As to the best representation of the signatures, that will depend on what search algorithms you use. But since you are looking for byte patterns in code objects, a byte-based representation (byte strings or byte-based patterns / regexes) seems most appropriate.

(I don't see how hashes would actually help you with this problem ...)
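To make the byte-based representation concrete, here is a minimal sketch of loading signatures as byte arrays. It assumes a signature file with one entry per line in the form `Name=HEXBYTES` and `#` for comments; that format, and the names `SignatureParser`, `Signature`, `hexToBytes` and `parse`, are made up for illustration and are not taken from any real scanner.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: parses lines in the assumed form "Name=HEXBYTES".
class SignatureParser {

    // A named byte pattern; both fields are illustrative.
    static final class Signature {
        final String name;
        final byte[] pattern;

        Signature(String name, byte[] pattern) {
            this.name = name;
            this.pattern = pattern;
        }
    }

    // Converts a hex string such as "4D5A90" into the bytes {0x4D, 0x5A, 0x90}.
    static byte[] hexToBytes(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("Odd number of hex digits: " + hex);
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("Not a hex string: " + hex);
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    // Parses the lines of a signature file, skipping blank lines and '#' comments.
    static List<Signature> parse(List<String> lines) {
        List<Signature> result = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            int eq = trimmed.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("Expected Name=HEX: " + line);
            }
            String name = trimmed.substring(0, eq).trim();
            byte[] pattern = hexToBytes(trimmed.substring(eq + 1).trim());
            result.add(new Signature(name, pattern));
        }
        return result;
    }
}
```

The lines themselves could come from `Files.readAllLines(path)`. Keeping the patterns as `byte[]` rather than `String` avoids any character-encoding issues when comparing against file contents.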


But that assumes that you really need the best search technology that is available. It sounds like this is an assignment you are doing, and for that, your original choice of Boyer-Moore is fine. A simple approach is to read each file into memory, and then do a Boyer-Moore search for each virus signature. That won't be as fast as a commercial / professional virus scanner, but it should be good enough for your purposes.
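A rough sketch of that simple approach follows. Note that it uses Boyer-Moore-Horspool, a simplified Boyer-Moore variant that keeps only the bad-character table, because it is shorter to write down; full Boyer-Moore adds a second (good-suffix) table. The class and method names (`SimpleScanner`, `indexOf`, `isInfected`) are hypothetical, and it assumes each file is small enough to fit in memory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical scanner: one Horspool search per signature over the whole file.
class SimpleScanner {

    // Boyer-Moore-Horspool: index of the first match of pattern in data, or -1.
    static int indexOf(byte[] data, byte[] pattern) {
        int m = pattern.length;
        if (m == 0) {
            return 0;
        }
        if (m > data.length) {
            return -1;
        }

        // Bad-character table: how far the window may shift, keyed by the
        // byte of the data that is aligned with the last pattern position.
        int[] shift = new int[256];
        Arrays.fill(shift, m);
        for (int i = 0; i < m - 1; i++) {
            shift[pattern[i] & 0xFF] = m - 1 - i; // & 0xFF maps the signed byte to 0..255
        }

        int pos = 0;
        while (pos <= data.length - m) {
            // Compare the window against the pattern from right to left.
            int j = m - 1;
            while (j >= 0 && data[pos + j] == pattern[j]) {
                j--;
            }
            if (j < 0) {
                return pos;
            }
            pos += shift[data[pos + m - 1] & 0xFF];
        }
        return -1;
    }

    // Reads the whole file into memory and checks it against every signature.
    static boolean isInfected(Path file, List<byte[]> signatures) throws IOException {
        byte[] data = Files.readAllBytes(file);
        for (byte[] signature : signatures) {
            if (signature.length > 0 && indexOf(data, signature) >= 0) {
                return true;
            }
        }
        return false;
    }
}
```

In the question's code, something like `SimpleScanner.isInfected(file.toPath(), signatures)` could stand in for the commented-out `scanExecutableFile(file)` call. The cost grows with the number of signatures, since each one triggers a separate pass over the file.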

Stephen C answered May 21 '26 18:05


There are several algorithms that will help you. I suggest Aho-Corasick or Rabin-Karp, but a suffix tree may also come in handy. Rabin-Karp is the easiest to implement of those, but Aho-Corasick does not use hashes and so you don't need to take special care of collisions.
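For illustration, here is a compact sketch of Aho-Corasick over raw bytes, which finds all signatures in a single pass over the data. The class name `ByteAhoCorasick` and its method `findAll` are hypothetical. It stores a full 256-entry transition row per state (about 1 KB each), which is simple but memory-hungry for large signature sets, so treat it as a starting point under that assumption rather than a tuned implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical multi-pattern matcher: Aho-Corasick automaton over bytes.
class ByteAhoCorasick {
    private final List<int[]> next = new ArrayList<>();         // next.get(state)[byte] -> state
    private final List<List<Integer>> out = new ArrayList<>();  // pattern ids recognised at state

    ByteAhoCorasick(List<byte[]> patterns) {
        newState(); // state 0 is the root

        // Step 1: insert every non-empty pattern into a trie.
        for (int id = 0; id < patterns.size(); id++) {
            byte[] pattern = patterns.get(id);
            if (pattern.length == 0) {
                continue;
            }
            int state = 0;
            for (byte b : pattern) {
                int c = b & 0xFF;
                if (next.get(state)[c] == -1) {
                    next.get(state)[c] = newState();
                }
                state = next.get(state)[c];
            }
            out.get(state).add(id);
        }

        // Step 2: breadth-first pass that computes failure links and fills in
        // the missing transitions, turning the trie into a full automaton.
        int[] fail = new int[next.size()];
        Queue<Integer> queue = new ArrayDeque<>();
        for (int c = 0; c < 256; c++) {
            int child = next.get(0)[c];
            if (child == -1) {
                next.get(0)[c] = 0;
            } else {
                queue.add(child); // fail[child] stays 0 (the root)
            }
        }
        while (!queue.isEmpty()) {
            int state = queue.remove();
            // Patterns that end at the failure state also end here.
            out.get(state).addAll(out.get(fail[state]));
            for (int c = 0; c < 256; c++) {
                int child = next.get(state)[c];
                if (child == -1) {
                    next.get(state)[c] = next.get(fail[state])[c];
                } else {
                    fail[child] = next.get(fail[state])[c];
                    queue.add(child);
                }
            }
        }
    }

    // Returns the ids (indexes into the pattern list) of all patterns found in data.
    Set<Integer> findAll(byte[] data) {
        Set<Integer> found = new TreeSet<>();
        int state = 0;
        for (byte b : data) {
            state = next.get(state)[b & 0xFF];
            found.addAll(out.get(state));
        }
        return found;
    }

    private int newState() {
        int[] row = new int[256];
        Arrays.fill(row, -1);
        next.add(row);
        out.add(new ArrayList<>());
        return next.size() - 1;
    }
}
```

The automaton is built once from the signature list and then reused for every file, for example `new ByteAhoCorasick(signatures).findAll(fileBytes)`. Because matching is exact byte comparison, there are no hash collisions to verify, which is the advantage over Rabin-Karp mentioned above.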

Ivaylo Strandjev answered May 21 '26 16:05


