I have a directory with 100+ zip files, and I need to read the files inside them for some data processing, without unzipping the archives.
Is there a Ruby library that can read the contents of files in a zip archive without extracting it?
I tried rubyzip with the following code, but it raises an error:
require 'zip'
Zip::File.open('my_zip.zip') do |zip_file|
  # Handle entries one by one
  zip_file.each do |entry|
    # Extract to file/directory/symlink
    puts "Extracting #{entry.name}"
    entry.extract('here')
    # Read into memory
    content = entry.get_input_stream.read
  end
end 
It gives this error:
test.rb:12:in `block (2 levels) in <main>': undefined method `read' for Zip::NullInputStream:Module (NoMethodError)
    from .gem/ruby/gems/rubyzip-1.1.6/lib/zip/entry_set.rb:42:in `call'
    from .gem/ruby/gems/rubyzip-1.1.6/lib/zip/entry_set.rb:42:in `block in each'
    from .gem/ruby/gems/rubyzip-1.1.6/lib/zip/entry_set.rb:41:in `each'
    from .gem/ruby/gems/rubyzip-1.1.6/lib/zip/entry_set.rb:41:in `each'
    from .gem/ruby/gems/rubyzip-1.1.6/lib/zip/central_directory.rb:182:in `each'
    from test.rb:6:in `block in <main>'
    from .gem/ruby/gems/rubyzip-1.1.6/lib/zip/file.rb:99:in `open'
    from test.rb:4:in `<main>'
Zip::NullInputStream is returned when the entry is a directory rather than a file. Could that be the case?
Here's a more robust variation of the code:
#!/usr/bin/env ruby
require 'rubygems'
require 'zip'
Zip::File.open('my_zip.zip') do |zip_file|
  # Handle entries one by one
  zip_file.each do |entry|
    if entry.directory?
      puts "#{entry.name} is a folder!"
    elsif entry.symlink?
      puts "#{entry.name} is a symlink!"
    elsif entry.file?
      puts "#{entry.name} is a regular file!"
      # Read into memory
      # Assign outside the block: a variable first set inside it would not be visible to `puts` below
      content = entry.get_input_stream { |io| io.read }
      # Output
      puts content
    else
      puts "#{entry.name} is something unknown, oops!"
    end
  end
end
I came across the same issue. Checking entry.file? before calling entry.get_input_stream.read resolved it.
require 'zip'
Zip::File.open('my_zip.zip') do |zip_file|
  # Handle entries one by one
  zip_file.each do |entry|
    # Extract to file/directory/symlink
    puts "Extracting #{entry.name}"
    entry.extract('here')
    # Read into memory
    if entry.file?
      content = entry.get_input_stream.read
    end
  end
end 
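For the original use case (100+ archives, read in memory only), the same entry.file? check can be wrapped in a small helper. This is only a rough sketch: the method names read_file_entries and read_archives and the glob pattern are made up for illustration, and it assumes each archive is small enough to hold in memory. It leaves out entry.extract entirely, so nothing is written to disk.

```ruby
# Collects the contents of regular-file entries into a Hash of name => content.
# `entries` is anything enumerable whose items respond to `name`, `file?` and
# `get_input_stream` (rubyzip's Zip::Entry does). Hypothetical helper name.
def read_file_entries(entries)
  entries.each_with_object({}) do |entry, contents|
    # Directories hand back Zip::NullInputStream, and symlinks are not
    # regular files either, so skip anything that is not a file.
    next unless entry.file?

    contents[entry.name] = entry.get_input_stream { |io| io.read }
  end
end

# Reads every archive matching `pattern` without extracting anything.
# Returns a Hash of archive path => { entry name => content }.
def read_archives(pattern)
  require 'zip' # rubyzip gem; loaded here so the helper above has no dependency

  Dir.glob(pattern).each_with_object({}) do |path, result|
    Zip::File.open(path) do |zip_file|
      result[path] = read_file_entries(zip_file)
    end
  end
end
```

Usage would be something like read_archives('archives/*.zip'), where the directory name is an assumption. For very large archives it is probably better to process each entry inside the loop instead of collecting everything into one Hash.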