Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Split a spliterator into N spliterators

Tags:

java

java-8

While analyzing a colleague's Java 7 code from a few years ago, I found that he implemented a utility for traversing data, potentially in parallel. He called it Range, and it extended the Iterator interface. Some of its new methods were awkwardly familiar:

  • int size() would give the exact size of the range;
  • Range split() would separate the range into 2 pieces, preferably, although not necessarily, similar in size (modifying the current range and creating a new one);
  • Range[] split(int n) would separate the range into N sub-ranges, potentially attempting to make them as even as possible.
  • Although void remove() was there from Iterator, its subtypes were merely throwing UnsupportedOperationException.

What I said to myself was that this is definitely what we can now do with (subsized) spliterators, thus taking advantage of the stream API. However, the spliterator in Java 8 does not provide an easy approach to splitting into n parts.

With a recursive function, we can split it up to n = 2^L parts with L recursion levels, but the approach is not as trivial when n is not a power of two, not to mention that it feels unnatural to keep a utility function just for the effect.

One might also say to simply avoid messing around with spliterators and let a stream do the splitting induced by forks during the actual processing, but the ForkJoin strategy might not be very adequate for the task, and does not guarantee that it will use the number of threads that we specifically wish to dedicate to the job. There could be cases of heavy tasks being performed on a small number of elements, in fact.

The question sums up to the following: having a spliterator with at least the characteristics SIZED and SUBSIZED, how can I split it into an exact number of spliterators?

like image 602
E_net4 stands with Ukraine Avatar asked Feb 03 '26 15:02

E_net4 stands with Ukraine


1 Answers

The way to do it, which is not even that hacky and interoperates with the existing F/J support framework, is coding a spliterator wrapper which uses its own splitting policy. What you lose is the ability to leverage a natively random-access structure; you can only split by iterating over the inner splitetator's data set.

You can refer to my earlier answer where I present the code which splits into batches of predetermined size; it is quite easy to adapt it to your case with just an appropriate constructor call.

like image 65
Marko Topolnik Avatar answered Feb 06 '26 04:02

Marko Topolnik



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!