 

Java - processing documents in parallel

I have about 5 documents, and I need to do some processing on each of them. Processing here means opening the document/file, reading the data, and doing some document manipulation (editing text, etc.). For the manipulation I will probably use docx4j or Apache POI. My use case is this: I want to process these 4-5 documents in parallel, utilizing the multiple cores available on my CPU. The processing of each document is independent of the others.

What would be the best way to achieve this parallel processing in Java? I have used ExecutorService and the Thread class before, but I don't know much about newer concepts like Streams or RxJava. Can this task be achieved with parallel streams, as introduced in Java 8? Which would be better to use: Executors, Streams, the Thread class, etc.? If Streams can be used, please provide a link to a tutorial on how to do that. Thanks for your help!

Aditya Bahuguna asked Nov 24 '25 21:11



1 Answer

You can process the files in parallel with Java 8 streams using the following pattern.

List<File> files = ...
files.parallelStream().forEach(f -> process(f));

or

File[] files = dir.listFiles();
Stream.of(files).parallel().forEach(f -> process(f));

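Note: process cannot throw a checked exception in this example, because forEach takes a Consumer, whose accept method declares none. I suggest you either catch and log the exception inside process, or return a result object.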
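
As a rough illustration of the "return a result object" option, here is a minimal sketch. The names ParallelDocs, Result, safeProcess and processAll are made up for this example, and the body of process is only a placeholder that reads the file; the real docx4j or Apache POI work would go there instead.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

final class ParallelDocs {

    /** Hypothetical result object: the file, plus the error if processing failed. */
    static final class Result {
        final Path file;
        final Exception error; // null when processing succeeded

        Result(Path file, Exception error) {
            this.file = file;
            this.error = error;
        }

        boolean ok() {
            return error == null;
        }
    }

    /** Placeholder for the real work; it may throw a checked exception. */
    static void process(Path file) throws IOException {
        Files.readAllBytes(file); // stands in for open / read / edit
    }

    /** Wraps process so the lambda handed to the stream throws nothing checked. */
    static Result safeProcess(Path file) {
        try {
            process(file);
            return new Result(file, null);
        } catch (Exception e) {
            return new Result(file, e);
        }
    }

    /** Processes all files on the common fork-join pool, one Result per file. */
    static List<Result> processAll(List<Path> files) {
        return files.parallelStream()
                    .map(ParallelDocs::safeProcess)
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Path> files = new ArrayList<>();
        for (String arg : args) {
            files.add(Paths.get(arg));
        }
        for (Result r : processAll(files)) {
            System.out.println(r.file + (r.ok() ? " OK" : " FAILED: " + r.error));
        }
    }
}
```

Because the source list is ordered, the collected results come back in the same order as the input files, even though the files themselves may be processed in any order. Failures are reported per file instead of aborting the whole run.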

Peter Lawrey answered Nov 26 '25 10:11



