After reading about ForkJoinPool, I tried an experiment to test how fast actually ForkJoinPool is, when compared to plain recursion.
I calculated the number of files in a folder recursively, and to my surprize, plain recursion performed way better than ForkJoinPool
Here's my code.
Recursive Task
class DirectoryTask extends RecursiveTask<Long> {
private Directory directory;
@Override
protected Long compute() {
List<RecursiveTask<Long>> forks = new ArrayList<>();
List<Directory> directories = directory.getDirectories();
for (Directory directory : directories) {
DirectoryTask directoryTask = new DirectoryTask(directory);
forks.add(directoryTask);
directoryTask.fork();
}
Long count = directory.getDoumentCount();
for (RecursiveTask<Long> task : forks) {
count += task.join();
}
return count;
}
}
Plain Recursion
private static Long getFileCount(Directory directory) {
Long recursiveCount = 0L;
List<Directory> directories = directory.getDirectories();
if (null != directories) {
for (Directory d : directories) {
recursiveCount += getFileCount(d);
}
}
return recursiveCount + directory.getDoumentCount();
}
Directory Object
class Directory {
private List<Directory> directories;
private Long doumentCount = 0L;
static Directory fromFolder(File file) {
List<Directory> children = new ArrayList<>();
Long documentCount = 0L;
if (!file.isDirectory()) {
throw new IllegalArgumentException("Only directories are allowed");
}
String[] files = file.list();
if (null != files) {
for (String path : files) {
File f = new File(file.getPath() + File.separator + path);
if (f.isHidden()) {
continue;
}
if (f.isDirectory()) {
children.add(Directory.fromFolder(f));
} else {
documentCount++;
}
}
}
return new Directory(children, documentCount);
}
}
Results
Where's the mistake here?
I am just trying to understand whether there is a particular threshold, below which plain recursion is faster than a ForkJoinPool.
Nothing in life comes for free. If you had to move one beer crate from your car to your apartment - what is quicker: carrying it there manually, or first going to the shed, to get the wheelbarrow to use that to move that one crate?
Creating thread objects is a "native" operation that goes down into the underlying Operating System to acquire resources there. That can be a rather costly operation.
Meaning: just throwing "more threads" at a problem doesn't automatically speed things up. To the contrary. When your task is mainly CPU-intensive, there might be small gain from doing things in parallel. When you are doing lots of IO, then having multiple threads allows you to do "less" waiting overall; thus improving your throughput.
In other words: Fork/Join requires considerable activity before it does the real job. Using it for computations that only require a few ms is simply overkill. Thus: you would be looking to "fork/join" operations that work on larger data sets.
For further reading, you might look at parallel streams. Those are using the fork/join framework under the covers; and surprise, it is a misconception to expect arbitrary parallelStream to be "faster" than ordinary streams, too.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With