Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

standardizing CSV file types

Tags:

php

csv

excel

I'm using a csv parser class (http://code.google.com/p/php-csv-parser/) to parse and extract data from csv files. The problem I'm encountering is that it only works for certain csv file types. (It seems that there is a csv type for Mac, for Ms-Dos, and for Windows.)

The code works if I use a csv file which was saved on a mac (in excel) using the csv - windows option. However, if I save a file on a windows machine simply as csv, that doesn't work. (You would think that that would be the same format as saving csv-windows on a mac.) It does work from a windows machine if I save it as a csv-MSDOS file. This seems a little ridiculous.

Is there a way to standardize these three file types so that my code can read any type of csv that is uploaded?

i'm thinking it would be something like this:

$standardizedCSV = preg_replace_all('/\r[^\n]/', '\r\n', $csvContent);

I know it has something to do with how each file type handles end of lines, but I'm a little put out trying to figure out those differences. If anybody has any advice, please let me know.

Thanks.

UPDATE: This is the relevant code from the csv parser I'm using which extracts data row by row:

$c = 0;
$d = $this->settings['delimiter'];
$e = $this->settings['escape'];
$l = $this->settings['length'];

$res = fopen($this->_filename, 'r');

while ($keys = fgetcsv($res, $l, $d, $e)) {

if ($c == 0) {
   $this->headers = $keys;
} else {
   array_push($this->rows, $keys);
}

$c ++;
}

I guess I need to understand how fgetcsv handles eol's, so that I can make sure that csv files of any format are handled in the same manner.

like image 577
user1383418 Avatar asked Dec 13 '25 16:12

user1383418


2 Answers

This seems to do the trick:

    ini_set("auto_detect_line_endings", true);

The problem was with line endings, but I didn't need to create my own EOL parser. This runtime setting does it for me. See http://us.php.net/manual/en/filesystem.configuration.php#ini.auto-detect-line-endings.

like image 185
user1383418 Avatar answered Dec 16 '25 07:12

user1383418


I don't think the line endings is an issue. The thing about CSV is that it's only a "comma separated values" file and not standardized beyond that. So some systems separate the values using commas, some using semicolons (;). I'm sure there are variations that use even other value separators.

Additionally, the escape character (most often backslash \) can be different between CSV files, and some CSV files also use quotation marks around each value (").

A CSV file can use any variation between the above. For instance, I'm fairly certain that Microsoft Excel exports CSV files separating values using semicolons and without any quotation around the values.

I'm sure there are ways to auto-detect how to parse the CSV file, but the best way would be allowing the user to decide. That's what Excel does.

like image 29
Hubro Avatar answered Dec 16 '25 07:12

Hubro



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!