 

CSV::MalformedCSVError: New line must be <"\n\r">

I am trying to parse this file with Ruby's CSV library:

https://www.sec.gov/files/data/broker-dealers/company-information-about-active-broker-dealers/bd070219.txt

However, I am getting an error.

CSV.open(file_name, "r", { :col_sep => "\t", :row_sep => "\n\r" }).each do |row|
    puts row
end

CSV::MalformedCSVError: New line must be <"\n\r"> not <"\r"> in line 1.

user2012677 asked Nov 24 '25 02:11


2 Answers

The Windows row_sep is "\r\n", not "\n\r". However, this CSV is malformed: viewed in a hex editor, its lines end in "\r\r\n".

It's tab-delimited.

In addition, it does not use proper quoting: line 247 contains 600 "B" STREET STE. 2204, so you need to turn off quote characters.

quote_char: nil, col_sep: "\t", row_sep: "\r\r\n"

There's an extra tab at the end: each line ends with \t\r\r\n. You can also look at it as using a row_sep of "\r\n" with an extra trailing field that contains \r.

quote_char: nil, col_sep: "\t", row_sep: "\r\n"

Or you can view it as having a row_sep of \t\r\r\n and no extra field.

quote_char: nil, col_sep: "\t", row_sep: "\t\r\r\n"

Either way, it's a mess.
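
To compare the three interpretations side by side, here is a minimal sketch that parses a made-up two-row sample instead of the real file. SAMPLE and parse_sample are hypothetical names, the sample has fewer columns than the SEC file, and quote_char: nil assumes a reasonably recent version of the csv library.

```ruby
require "csv"

# Hypothetical sample in the file's layout: tab-separated fields, an unquoted
# double quote inside the data, and every line ending in "\t\r\r\n".
SAMPLE = "0000001904\tABRAHAM SECURITIES CORPORATION\tWA\t\r\r\n" \
         "0000002303\t600 \"B\" STREET STE. 2204\tCA\t\r\r\n"

# Parse the sample with quoting disabled and the given row separator.
def parse_sample(row_sep)
  CSV.parse(SAMPLE, quote_char: nil, col_sep: "\t", row_sep: row_sep)
end

# Show how each choice of row_sep changes the fields of the first row.
["\r\r\n", "\r\n", "\t\r\r\n"].each do |row_sep|
  first_row = parse_sample(row_sep).first
  puts "row_sep #{row_sep.inspect}: #{first_row.length} fields, last = #{first_row.last.inspect}"
end
```

For the real file, the same options could be passed to CSV.foreach(file_name, ...) in place of CSV.parse.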


I used a hex editor to look at the file as text and raw data side by side. This let me see what's truly at the end of the line.

87654321  0011 2233 4455 6677 8899 aabb ccdd eeff  0123456789abcdef                       
00000000: 3030 3030 3030 3139 3034 0941 4252 4148  0000001904.ABRAH
00000010: 414d 2053 4543 5552 4954 4945 5320 434f  AM SECURITIES CO
00000020: 5250 4f52 4154 494f 4e09 3030 3832 3934  RPORATION.008294
00000030: 3532 0933 3732 3420 3437 5448 2053 5452  52.3724 47TH STR
00000040: 4545 5420 4354 2e20 4e57 0920 0947 4947  EET CT. NW. .GIG
00000050: 2048 4152 424f 5209 5741 0939 3833 3335   HARBOR.WA.98335
00000060: 090d 0d0a 3030 3030 3030 3233 3033 0950  ....0000002303.P
          ^^^^^^^^^

Hex 09 0d 0d 0a is \t\r\r\n.

Alternatively, you can print a line with p, which reveals any invisible characters as escapes.

# p prints the string's inspect form, so control characters show up as escapes
File.open(file_name) { |f| p f.readline }

"0000001904\tABRAHAM SECURITIES CORPORATION\t00829452\t3724 47TH STREET CT. NW\t \tGIG HARBOR\tWA\t98335\t\r\r\n"
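
If no hex editor is at hand, the same check can be sketched in Ruby. The helpers line_ending and hex_bytes are hypothetical names, and the sample string is a shortened stand-in for the start of the file; for the real file you could pass in something like File.binread(file_name, 4096).

```ruby
# Return the run of tab/CR/LF characters that ends the first line of text.
def line_ending(text)
  first_line = text.each_line("\n").first.to_s
  first_line[/[\t\r\n]+\z/].to_s
end

# Format a string's bytes as space-separated two-digit hex values.
def hex_bytes(str)
  str.unpack("C*").map { |byte| format("%02x", byte) }.join(" ")
end

# Shortened stand-in for the beginning of the file (assumed, not the real data).
sample = "0000001904\tABRAHAM SECURITIES CORPORATION\t98335\t\r\r\n0000002303\tP"

ending = line_ending(sample)
p ending                # escaped form, like p f.readline
puts hex_bytes(ending)  # byte values, like the hex editor view
```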
Schwern answered Nov 25 '25 20:11


Use :row_sep => :auto instead of :row_sep => "\n\r":

# Options are passed without braces so they are treated as keyword arguments on Ruby 3 as well
CSV.open(file_name, "r", :col_sep => "\t", :row_sep => :auto).each do |row|
    puts row
end
GProst answered Nov 25 '25 20:11


