I'm using the extractTo method of the PHP PharData class to examine the contents of a phar file and running into some strange results. I've reached the limits of my byte-level detective work and was hoping someone here would be able to help me sort this out.
Details follow, but generally speaking: when I extract my archive files with PharData::extractTo, the files I get out appear to be a bzip variant, but the bzip2 command doesn't like them. Is this normal phar behavior, or is it a problem with the specific archive (or possibly the PHP/OS combination I'm using)? Is there a way to get plain text files out of a phar archive, or should plain text be the default and I'm looking at weird system behavior?
Specifically, when I run the command
$phar = new Phar('n98-magerun.phar');
$phar->extractTo('/tmp/n98-magerun');
on my Intel-based Mac (OS X 10.6.8, using the built-in PHP 5.3.6), the archive is successfully extracted into the /tmp/n98-magerun folder.

The archive I'm extracting can be found here.
If I open any of the text files extracted in BBEdit, I see the correct contents.

However, if I use other tools such as Quick Look, vi, or cat, I see binary data. I noticed this when I tried to ack/grep through the contents of the files and didn't get the results I expected.

If I use the file command on the file, it reports bzip2 compressed data:
$ file MIT-LICENSE.txt 
MIT-LICENSE.txt: bzip2 compressed data, block size = 400k
Examining the file with a hex editor confirms that it starts with a BZ header.
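For anyone who wants to repeat the header check without a hex editor, here is a minimal sketch. It assumes bzip2 is installed and uses hypothetical file names (sample.txt, sample-extracted.txt) rather than the files from my archive. A bzip2 stream starts with the bytes "BZh" followed by a digit from 1 to 9 giving the block size in units of 100k, so a "block size = 400k" report from file should correspond to "BZh4".

```shell
# Hypothetical stand-in for an extracted file: compress a small text file
# at level 4 and give it a name that does not end in .bz2.
work=$(mktemp -d)
cd "$work"
printf 'hello phar\n' > sample.txt
bzip2 -4 -c sample.txt > sample-extracted.txt

# Read the first four bytes: the "BZh" magic plus the block-size digit.
magic=$(head -c 4 sample-extracted.txt)
echo "$magic"   # prints BZh4
```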

However, attempting to decompress the file with bzip2 results in the following error:
$ bzip2 -d MIT-LICENSE.txt 
bzip2: Can't guess original name for MIT-LICENSE.txt -- using MIT-LICENSE.txt.out
bzip2: Compressed file ends unexpectedly;
    perhaps it is corrupted?  *Possible* reason follows.
bzip2: No such file or directory
    Input file = MIT-LICENSE.txt, output file = MIT-LICENSE.txt.out
It is possible that the compressed file(s) have become corrupted.
You can use the -tvv option to test integrity of such files.
You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.
bzip2: Deleting output file MIT-LICENSE.txt.out, if it exists.
I can bzcat the file, although it fails in the middle of the file with this error:
bzcat: Compressed file ends unexpectedly;
    perhaps it is corrupted?  *Possible* reason follows.
bzcat: Undefined error: 0
    Input file = MIT-LICENSE.txt, output file = (stdout)
It is possible that the compressed file(s) have become corrupted.
You can use the -tvv option to test integrity of such files.
You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.
It's a bzip2 file, but to decompress it you need to use the --stdout (or -c) option (see below).
The reason you need the --stdout option is that the file does not end with a .bz2 extension, which is what bunzip2 uses to determine the name of the file to decompress to.
$ bunzip2 --stdout MIT-LICENSE.txt 2>/dev/null
Copyright (c) 2012 netz98 new media GmbH
http://www.netz98.de
Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:
The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE
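The same approach can be tried on a known-good file. This is only a sketch, assuming bzip2 and bunzip2 are installed; the file names (notes.txt and friends) are made up for illustration. It compresses a small text file under a name without a .bz2 suffix, then decompresses it with -c and redirects the output to a file of my choosing.

```shell
# Work in a scratch directory so nothing existing is overwritten.
work=$(mktemp -d)
cd "$work"
printf 'plain text\n' > notes.txt

# Compressed data stored under a name that lacks the .bz2 suffix.
bzip2 -c notes.txt > notes-compressed.txt

# -c writes the decompressed data to stdout, so the output name is ours to pick.
bunzip2 -c notes-compressed.txt > notes-restored.txt

# cmp is silent and returns 0 when the two files are identical.
cmp notes.txt notes-restored.txt && echo "round trip OK"
```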
I have no idea why bunzip2 is outputting the following to standard error:
bzip2: Compressed file ends unexpectedly;
        perhaps it is corrupted?  *Possible* reason follows.
bzip2: Success
        Input file = MIT-LICENSE.txt, output file = (stdout)
It is possible that the compressed file(s) have become corrupted.
You can use the -tvv option to test integrity of such files.
You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.
The file command reports that the file is bzip2 data with a block size of 400k:
$ file MIT-LICENSE.txt 
MIT-LICENSE.txt: bzip2 compressed data, block size = 400k
I tried adding the -4 option to bunzip2, but it still complains:
$ bunzip2 -d -4 -vvvvv -c  MIT-LICENSE.txt >/dev/null 
  MIT-LICENSE.txt: 
    [1: huff+mtf rt+rld {0x2010d4b9, 0x2010d4b9}]
    combined CRCs: stored = 0x2010d4b9, computed = 0x2010d4b9
    [1: huff+mtf 
bunzip2: Compressed file ends unexpectedly;
        perhaps it is corrupted?  *Possible* reason follows.
bunzip2: Success
        Input file = MIT-LICENSE.txt, output file = (stdout)
It is possible that the compressed file(s) have become corrupted.
You can use the -tvv option to test integrity of such files.
You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.
So my guess is that the program creating these bzip2 files is the cause of this issue.
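To see how bzip2 treats a stream that ends early, the integrity test mentioned in the error text can be run against a deliberately truncated file. This is a sketch under the assumption that bzip2 is installed; the file names are hypothetical, and cutting the file in half is just one way to damage it, not a claim about what happened to the phar contents.

```shell
# Build a valid bzip2 file, then a copy cut to half its size.
work=$(mktemp -d)
cd "$work"
printf 'some plain text to compress\n' > demo.txt
bzip2 -c demo.txt > demo-full.bin
size=$(wc -c < demo-full.bin)
head -c $((size / 2)) demo-full.bin > demo-cut.bin

# -t tests integrity without writing any output; it does not need a .bz2 suffix.
bzip2 -t demo-full.bin && echo "full file: OK"

# The truncated copy should fail the test with a non-zero exit status.
status=0
bzip2 -t demo-cut.bin 2>/dev/null || status=$?
echo "truncated file: bzip2 -t exit status $status"
```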