I'm trying to use zlib to compress some lengthy strings, some of which may contain Unicode characters. At the moment I'm doing this in Ruby, but I think this would apply in any language. Here's the super basic implementation:
require 'zlib'
example = "“hello world”" # note the unicode quotes
compressed = Zlib.deflate(example)
puts Zlib.inflate(compressed)
The issue here is that the text comes out as this:
\xE2\x80\x9Chello world\xE2\x80\x9D
...no Unicode quotes, just escaped bytes. Does anyone know of a way that Zlib can be used while retaining Unicode characters? Bonus points for an answer in Ruby : )
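Zlib.inflate returns a string tagged as ASCII-8BIT (binary). The bytes themselves come back intact; only the encoding label is lost, which is why the UTF-8 sequences for the quotes are printed as escapes. To fix it, just force the original encoding: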
require 'zlib'
input = "“hello world”" 
compressed = Zlib.deflate(input)
output = Zlib.inflate(compressed).force_encoding(input.encoding)
Or set the encoding manually:
output = Zlib.inflate(compressed).force_encoding('utf-8')
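To make the round trip explicit, here is a minimal sketch that wraps both steps. The helper names compress_text and decompress_text are made up for this example and are not part of Zlib, and it assumes the original text was UTF-8 unless you pass another encoding. The valid_encoding? check is an extra safeguard: force_encoding only relabels the bytes and never verifies them.

```ruby
require 'zlib'

# Hypothetical helpers for illustration; these names are not part of Zlib.
def compress_text(text)
  Zlib.deflate(text)
end

# Assumes the original text was UTF-8 unless another encoding is given.
def decompress_text(data, encoding = Encoding::UTF_8)
  text = Zlib.inflate(data).force_encoding(encoding)
  # force_encoding only relabels the bytes, so check they are valid.
  raise ArgumentError, "inflated bytes are not valid #{encoding}" unless text.valid_encoding?
  text
end

input = "\u201Chello world\u201D" # same string as above, quotes written as escapes

# Without the fix: same bytes, but tagged as binary.
raw = Zlib.inflate(compress_text(input))
puts raw.encoding             # binary (ASCII-8BIT)
puts raw.bytes == input.bytes # true

# With the fix: the string compares equal to the original.
restored = decompress_text(compress_text(input))
puts restored.encoding
puts restored == input
```

If the compressed data is stored or sent somewhere, the encoding name has to travel with it (or be agreed on in advance), since the deflate stream itself carries only bytes.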