I cannot find any statement that specifies whether it would be safe to get multiple InputStreams (from multiple ZipEntry's) and process each in its own thread.
Would this be safe to attempt?
Would it be advisable?
Added: Might I get better performance this way?
No, it is not thread-safe in that sense. If you're appending to the same zip file, you'd need a lock there, or the file contents could get scrambled. If you're appending to different zip files, using separate ZipFile() objects, then you're fine.
Yes, for sure: you are creating 101 logical threads (1 main thread plus 100 others, each started by calling the thread's start() method).
The implementation of those operations is thread-safe if (and only if) all threads use the same SynchronizedInputStream object to access a given InputStream, and nothing apart from your wrapper accesses the InputStream directly.
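The SynchronizedInputStream class mentioned above is not shown here, so the following is only a minimal sketch of what such a wrapper might look like. The class body is an assumption: it simply guards every operation on the delegate with the wrapper's own monitor.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Objects;

// Hypothetical wrapper: every operation on the delegate takes this object's monitor.
final class SynchronizedInputStream extends InputStream {
    private final InputStream delegate;

    SynchronizedInputStream(InputStream delegate) {
        this.delegate = Objects.requireNonNull(delegate);
    }

    @Override
    public synchronized int read() throws IOException {
        return delegate.read();
    }

    // InputStream.read(byte[]) forwards to this method, so it is covered as well.
    @Override
    public synchronized int read(byte[] b, int off, int len) throws IOException {
        return delegate.read(b, off, len);
    }

    @Override
    public synchronized long skip(long n) throws IOException {
        return delegate.skip(n);
    }

    @Override
    public synchronized int available() throws IOException {
        return delegate.available();
    }

    @Override
    public synchronized void close() throws IOException {
        delegate.close();
    }
}
```

The lock only makes each individual call atomic. Threads sharing the wrapper still receive interleaved chunks of the stream, so this helps only when any thread may consume any chunk.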
A thread-safe routine is one that can be called concurrently from multiple threads without undesirable interactions between threads. A routine can be thread-safe for either of the following reasons: it is inherently reentrant, or it uses thread-specific data or locks on mutexes.
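As a rough illustration of those reasons in Java, here is a small sketch. The class and method names (ThreadSafeRoutines, square, nextId, label) are made up for this example.

```java
import java.util.concurrent.locks.ReentrantLock;

// Illustrative names only; none of these come from a library.
class ThreadSafeRoutines {

    // Inherently reentrant: uses only its argument and local state.
    static int square(int x) {
        return x * x;
    }

    // Mutex: the shared counter is only touched while holding the lock.
    private static final ReentrantLock LOCK = new ReentrantLock();
    private static long counter = 0;

    static long nextId() {
        LOCK.lock();
        try {
            return ++counter;
        } finally {
            LOCK.unlock();
        }
    }

    // Thread-specific data: each thread gets its own scratch buffer.
    private static final ThreadLocal<StringBuilder> SCRATCH =
            ThreadLocal.withInitial(StringBuilder::new);

    static String label(String name) {
        StringBuilder sb = SCRATCH.get();
        sb.setLength(0); // reuse this thread's buffer
        return sb.append('[').append(name).append(']').toString();
    }
}
```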
Reading should be OK. Each stream contains its own state, so you can open multiple streams that point to the same file and read from them concurrently.
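A minimal sketch of that idea, assuming a plain file on disk: each task opens its own InputStream on the same path and computes a CRC32 checksum, so no stream state is shared between threads. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.CRC32;

// Hypothetical helper: several readers, each with a private stream on the same file.
class ConcurrentFileReaders {

    // Opens a fresh stream, so the read position belongs to the caller alone.
    static long checksum(Path file) throws IOException {
        CRC32 crc = new CRC32();
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                crc.update(buf, 0, n);
            }
        }
        return crc.getValue();
    }

    static List<Long> checksumInParallel(Path file, int readers)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(readers);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int i = 0; i < readers; i++) {
                Callable<Long> task = () -> checksum(file);
                futures.add(pool.submit(task));
            }
            List<Long> results = new ArrayList<>();
            for (Future<Long> f : futures) {
                results.add(f.get()); // rethrows any reader failure
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

If the file is not modified while it is being read, every reader should report the same checksum.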
But simultaneous writing is unsafe: unsynchronized writes from several threads can interleave and corrupt the file.
ZipFile InputStreams should be threadsafe, but the ZipFile API itself is instance-synchronized (internally all the reading/writing methods, including for reading metadata, are isolated using synchronized (this)), so ZipFile instances can be accessed by only one thread at a time.
If you want multiple threads to read from the same zipfile in a scalable way, you must open one ZipFile instance per thread, with each thread reading from separate InputStreams, each one derived from a different ZipEntry. That way, the per-instance lock in the ZipFile methods does not block all but one thread from reading from the zipfile at one time. It also means that when each thread closes its ZipFile after it is done reading, it closes its own instance, not a shared instance, so you don't get an exception on the second and subsequent close.
Protip: if you really care about speed, and you need multiple threads reading from the same zipfile, you can get more performance by reading all the ZipEntry objects from the first ZipFile instance and sharing them with all threads, to avoid duplicating the work of reading the zipfile central directory for each thread separately. A ZipEntry object is not tied to a specific ZipFile instance per se; ZipEntry just records metadata that will work with any ZipFile object representing the same zipfile that the ZipEntry came from. So this is the recipe for scaling up ZipFile usage in Java:
1. Create N ZipFile instances on the file, one for each of the N worker threads.
2. Using one of the ZipFile instances, read all the ZipEntry objects, and store them in a list.
3. Pass the list of ZipEntry objects to each of the N worker threads, along with the thread's own unique ZipFile instance.
4. Have each thread open an InputStream on its own ZipFile instance, using whatever ZipEntry objects you want that thread to open (e.g. you could instruct each thread to open just one of the files). Or you could put all the ZipEntry objects into a concurrent queue or parallel stream, and all the worker threads could consume entries from the queue, if you want just one thread to open each ZipEntry, while load-leveling as well as possible.
5. When a thread has finished its work, have it close its own ZipFile instance.
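The recipe above can be sketched roughly as follows. This is an illustration under assumptions, not the answer author's code: the class name ParallelZipReader, the readAll and countBytes helpers, and the choice to record each entry's uncompressed byte count are all invented for the example, and the queue variant of the recipe is used for load-leveling.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper; assumes nThreads >= 1 and that the zip is not modified meanwhile.
class ParallelZipReader {

    // Returns entry name -> number of uncompressed bytes read for that entry.
    static Map<String, Long> readAll(Path zipPath, int nThreads)
            throws IOException, InterruptedException, ExecutionException {
        List<ZipFile> zipFiles = new ArrayList<>();
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            // One ZipFile instance per worker.
            for (int i = 0; i < nThreads; i++) {
                zipFiles.add(new ZipFile(zipPath.toFile()));
            }
            // List the entries once, from the first instance, into a shared queue.
            Queue<ZipEntry> queue = new ConcurrentLinkedQueue<>();
            for (ZipEntry entry : Collections.list(zipFiles.get(0).entries())) {
                if (!entry.isDirectory()) {
                    queue.add(entry);
                }
            }
            Map<String, Long> sizes = new ConcurrentHashMap<>();
            List<Future<Void>> futures = new ArrayList<>();
            for (ZipFile zipFile : zipFiles) {
                // Each worker pulls entries and reads them through its own ZipFile.
                Callable<Void> worker = () -> {
                    ZipEntry entry;
                    while ((entry = queue.poll()) != null) {
                        try (InputStream in = zipFile.getInputStream(entry)) {
                            sizes.put(entry.getName(), countBytes(in));
                        }
                    }
                    return null;
                };
                futures.add(pool.submit(worker));
            }
            for (Future<Void> f : futures) {
                f.get(); // rethrows any worker failure
            }
            return sizes;
        } finally {
            pool.shutdown();
            // Each instance is closed exactly once.
            for (ZipFile zipFile : zipFiles) {
                zipFile.close();
            }
        }
    }

    static long countBytes(InputStream in) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
        }
        return total;
    }
}
```

A call such as ParallelZipReader.readAll(Path.of("archive.zip"), 4) (the file name is a placeholder) would read every entry using four workers. Whether this beats a single thread depends on entry sizes and disk speed, so it is worth measuring.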