Adding a file to an existing in memory zip file with Zip4j

Question

With Zip4j, I'm trying to add a file to an existing zip file loaded in memory. I have no access to the file system. My zip file is basically a byte[], and I would like to avoid unzipping it to add my new file.

I tried several methods, but none gave me an acceptable result. The closest I got was writing my zip file into a ByteArrayOutputStream, wrapping that in a ZipOutputStream, and finally adding my new file to it:

    public byte[] addFileToArchive(byte[] originalContent, byte[] fileToAdd, String filePathInArchive) {

        ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
        try (ZipOutputStream zipOutputStream = new ZipOutputStream(outputStream)) {
            // Add original content
            outputStream.write(originalContent);

            // Add new file
            ZipParameters parameters = new ZipParameters();
            parameters.setFileNameInZip(filePathInArchive);

            zipOutputStream.putNextEntry(parameters);
            zipOutputStream.write(fileToAdd);
            zipOutputStream.closeEntry();
        }

        return outputStream.toByteArray();
    }

Unfortunately, this creates a strange zip file: a zip explorer (like Ark) shows the original files in its preview, but when I unzip the archive to verify its real content, only the new file is present.

I found a stale issue on the official GitHub repository describing the same kind of need.

Does anyone know a way to achieve that? Thanks!

EDIT: Why can't I just use File or ZipFile? Because this code can receive data from the filesystem, the S3 API, or several other data providers.

rzwitserloot · Accepted Answer

Zip files consist of:

A series of 'zipped contents'. Each 'content' is a single file. It exists on its own (this explains why .zip is a horribly inefficient format when zipping up many tiny files that share a lot of common data; zip does not offer any way to use similarity between files to improve compression). Each 'content' block contains not just the byte content of the entry, but also the metadata, including its name and its uncompressed size.
A central directory structure, which lists the contents of the zip file. You do not necessarily need this thing; it is replicating data already shown in the metadata of the content items.
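
To make the two parts concrete, here is a minimal sketch that builds a small archive with the JDK's java.util.zip (not Zip4j) and walks its bytes from offset 0. The class and method names are made up for illustration. It assumes the entries are stored uncompressed, so that each local header carries the entry's size and the walker can skip from one entry to the next; the offsets used are the fixed local file header fields from the ZIP format.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class, not part of any library.
class ZipLayoutDemo {
    static final int LOCAL_HEADER = 0x04034b50;   // "PK\3\4": starts each zipped content block
    static final int CENTRAL_HEADER = 0x02014b50; // "PK\1\2": starts each central directory record

    /** Builds an in-memory zip whose entries are STORED (uncompressed). */
    static byte[] buildStoredZip(String... names) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            for (String name : names) {
                byte[] payload = ("content of " + name).getBytes(StandardCharsets.UTF_8);
                CRC32 crc = new CRC32();
                crc.update(payload);
                ZipEntry entry = new ZipEntry(name);
                entry.setMethod(ZipEntry.STORED);
                // STORED entries need size and CRC up front; they end up in the local header.
                entry.setSize(payload.length);
                entry.setCompressedSize(payload.length);
                entry.setCrc(crc.getValue());
                zip.putNextEntry(entry);
                zip.write(payload);
                zip.closeEntry();
            }
        }
        return out.toByteArray();
    }

    /** Walks the archive from offset 0 and names each part in file order. */
    static List<String> describe(byte[] zipBytes) {
        ByteBuffer buf = ByteBuffer.wrap(zipBytes).order(ByteOrder.LITTLE_ENDIAN);
        List<String> parts = new ArrayList<>();
        int pos = 0;
        while (buf.getInt(pos) == LOCAL_HEADER) {
            int compressedSize = buf.getInt(pos + 18);
            int nameLength = buf.getShort(pos + 26) & 0xFFFF;
            int extraLength = buf.getShort(pos + 28) & 0xFFFF;
            // The entry's name lives right after the 30-byte fixed header.
            parts.add("entry " + new String(zipBytes, pos + 30, nameLength, StandardCharsets.UTF_8));
            pos += 30 + nameLength + extraLength + compressedSize;
        }
        if (buf.getInt(pos) == CENTRAL_HEADER) {
            parts.add("central directory");
        }
        return parts;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(describe(buildStoredZip("a.txt", "b.txt")));
    }
}
```

Each entry carries its own name and size, and the central directory only shows up after the last entry.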

ZIP as a format sticks the central directory structure at the end. In this streaming age that seems like a boneheaded decision, but think about how file systems work: adding data at the end of a file is possible, whereas adding it at the beginning is not (you can ask a filesystem to do it; if it supports this at all, it usually just makes a complete copy, whereas when you add to the end, all bytes already in the file that fill complete sectors are generally not touched at all).

This means there are 2 completely different ways to unzip / read a file:

Start at the start, and print/unzip each file as you see it. You might as well do this when 'streaming'. You have to go through the bytes anyway!
Start at the end, read only the central directory structure and print all of that only. When talking about files, this is vastly more efficient: You only need to ask the filesystem to read a single sector (the one containing the last segment of the file).
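
The two strategies can be sketched side by side with the JDK's java.util.zip (used here instead of Zip4j so the example has no dependencies). The class and method names are hypothetical. The from-the-end reader is deliberately simplified: it assumes the archive has no trailing comment, so the end-of-central-directory record is exactly the last 22 bytes, and it ignores zip64.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class, not part of any library.
class ZipReadStrategies {
    static final int END_OF_CENTRAL_DIR = 0x06054b50; // "PK\5\6"

    /** Builds a small in-memory zip with one text entry per name. */
    static byte[] buildZip(String... names) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            for (String name : names) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write(("content of " + name).getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        }
        return out.toByteArray();
    }

    /** Strategy 1: start at the start and visit every entry in order. */
    static List<String> namesByStreaming(byte[] zipBytes) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            for (ZipEntry e = in.getNextEntry(); e != null; e = in.getNextEntry()) {
                names.add(e.getName());
            }
        }
        return names;
    }

    /** Strategy 2: start at the end and read only the central directory. */
    static List<String> namesFromCentralDirectory(byte[] zipBytes) {
        ByteBuffer buf = ByteBuffer.wrap(zipBytes).order(ByteOrder.LITTLE_ENDIAN);
        int eocd = zipBytes.length - 22; // assumption: no archive comment
        if (buf.getInt(eocd) != END_OF_CENTRAL_DIR) {
            throw new IllegalArgumentException("no end-of-central-directory record at the expected position");
        }
        int entryCount = buf.getShort(eocd + 10) & 0xFFFF;
        int pos = buf.getInt(eocd + 16); // offset where the central directory starts
        List<String> names = new ArrayList<>();
        for (int i = 0; i < entryCount; i++) {
            int nameLength = buf.getShort(pos + 28) & 0xFFFF;
            int extraLength = buf.getShort(pos + 30) & 0xFFFF;
            int commentLength = buf.getShort(pos + 32) & 0xFFFF;
            names.add(new String(zipBytes, pos + 46, nameLength, StandardCharsets.UTF_8));
            pos += 46 + nameLength + extraLength + commentLength;
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = buildZip("a.txt", "b.txt");
        System.out.println(namesByStreaming(zip));
        System.out.println(namesFromCentralDirectory(zip));
    }
}
```

On a well-formed archive both methods report the same names; the second one never touches the entry data.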

What you've done is this:

Have one zip file containing all your old stuff.
Make a completely unrelated separate zip file containing just the one file you added.
Concatenate the bytes of these 2 things together.

This is an invalid zip file. If you read it using the first strategy, you see only the 'old' files or possibly all files, depending on how that reader is built (if it stops once it sees the central directory structure, you never see the newly added file. If it simply skips over central directory structures it will see both, but, this is somewhat bizarre; this makes sense only if the zip reader is explicitly designed to attempt to recover corrupted zip files.. because that's what you created here, a corrupted zip file).

If you read it using the second strategy, you hop to the end of the whole thing which is the central directory structure of your second zip file that contains only the one added file.

This explains your output.
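
The mismatch can be reproduced in a few lines. This is a sketch using the JDK's java.util.zip rather than Zip4j, with hypothetical class and method names; it stands in for the question's code by concatenating two independently built archives. The JDK's ZipInputStream happens to be a reader of the "stops at the first central directory" kind, and the from-the-end check assumes no archive comment.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class, not part of any library.
class ConcatenatedZipDemo {
    /** Builds a small in-memory zip with one text entry per name. */
    static byte[] buildZip(String... names) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            for (String name : names) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write(("content of " + name).getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        }
        return out.toByteArray();
    }

    /** Glues two complete archives together, which is what the question's code produces. */
    static byte[] concat(byte[] first, byte[] second) {
        byte[] joined = Arrays.copyOf(first, first.length + second.length);
        System.arraycopy(second, 0, joined, first.length, second.length);
        return joined;
    }

    /** First strategy: read entries from the start until something other than an entry appears. */
    static List<String> namesSeenFromStart(byte[] zipBytes) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            for (ZipEntry e = in.getNextEntry(); e != null; e = in.getNextEntry()) {
                names.add(e.getName());
            }
        }
        return names;
    }

    /** Second strategy: trust the entry count in the record at the very end of the bytes. */
    static int entryCountSeenFromEnd(byte[] zipBytes) {
        ByteBuffer buf = ByteBuffer.wrap(zipBytes).order(ByteOrder.LITTLE_ENDIAN);
        int eocd = zipBytes.length - 22; // assumption: no archive comment
        if (buf.getInt(eocd) != 0x06054b50) {
            throw new IllegalArgumentException("no end-of-central-directory record at the expected position");
        }
        return buf.getShort(eocd + 10) & 0xFFFF;
    }

    public static void main(String[] args) throws IOException {
        byte[] broken = concat(buildZip("old1.txt", "old2.txt"), buildZip("new.txt"));
        System.out.println(namesSeenFromStart(broken));    // only the old entries
        System.out.println(entryCountSeenFromEnd(broken)); // only the new entry is counted
    }
}
```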

In general your approach is broken - you do not want to 'stream' anything because the point of zips is that adding a file at the end is possible without rewriting. You want a zip API that explicitly does not involve InputStream - it must involve Path or File. Or, you don't mind the inefficiency of rewriting.

So, this:

    byte[] originalContent = FileUtils.readFileToByteArray(originalFile);

is a mistake already. Don't do that.

From a 'let's write a zip tool from the ground up' approach, now you know all you need to know. You even know that if you must 'stream' through it, you can still do so very efficiently; there is no need to uncompress each file, you can just copy the bytes verbatim from old to new. And if you don't have to stream through it (it's a file on disk already), you can just open it, move the 'file position' to the start of the central directory structure, write the new file (in compressed state), then write a new central directory structure. That's what e.g. the command line zip and pkzip tools would do in this situation, and it means that if you have a zip file of, say, 540GB and you add a file, that will take less than a second, whereas any tool that can only 'stream' through would take way, way longer and wouldn't work unless there's 540GB+ free disk space.

But, how do you do that without writing your own zip library?

The zip4j docs are very explicit about how to do this. Just.. Read the docs.

    ZipFile existingZip = new ZipFile(pathToExistingZip);
    existingZip.addFile(fileToAdd1); // addFiles(...) takes a List<File> when adding several files
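
If the archive really is only available as a byte[] and the inefficiency of rewriting is acceptable, the 'stream through it' route looks roughly like the sketch below. It uses the JDK's java.util.zip rather than Zip4j, and the class name and the listNames helper are hypothetical. Note that it decompresses and recompresses every existing entry instead of copying the compressed bytes verbatim, and it does not carry over entry metadata such as timestamps.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, not part of Zip4j or the JDK.
class InMemoryZipAppend {
    /** Rewrites the archive into a new byte[] with one extra entry at the end. */
    static byte[] addFileToArchive(byte[] originalContent, byte[] fileToAdd, String filePathInArchive)
            throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(originalContent));
             ZipOutputStream zip = new ZipOutputStream(out)) {
            // Copy every existing entry: read it decompressed, write it recompressed.
            for (ZipEntry old = in.getNextEntry(); old != null; old = in.getNextEntry()) {
                zip.putNextEntry(new ZipEntry(old.getName()));
                in.transferTo(zip); // reads until the end of the current entry (Java 9+)
                zip.closeEntry();
            }
            // Add the new file; this throws a ZipException if the name already exists.
            zip.putNextEntry(new ZipEntry(filePathInArchive));
            zip.write(fileToAdd);
            zip.closeEntry();
        } // closing the ZipOutputStream writes a single, consistent central directory
        return out.toByteArray();
    }

    /** Lists entry names by streaming through the archive, for checking the result. */
    static List<String> listNames(byte[] zipBytes) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            for (ZipEntry e = in.getNextEntry(); e != null; e = in.getNextEntry()) {
                names.add(e.getName());
            }
        }
        return names;
    }
}
```

Memory use is roughly the size of the old archive plus the new one, so this only suits archives that comfortably fit in memory.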

Tags: java, in-memory, zip4j

MatthieuBlm
