The CSV file was created correctly, but the name and address fields contain every piece of punctuation available, so when I try to import into MySQL I get parsing errors. For example, the name field can look like this: "john ""," doe". I have no control over the data I receive, so I can't stop people from entering garbage data. From the example above you can see that if you treat the outside quotes as the enclosing quotes, the record is right, but of course MySQL, Excel, LibreOffice, etc. see a whole new field. Is there a way to fix this problem? Some fields I found even have a backslash before the last enclosing quote. I'm at a loss, as I have 17 million records to import.
I have both Windows and Linux, so whatever solution you can think of, please let me know.
This may not be a usable answer, but someone needs to say it: you shouldn't have to do this. CSV is a file format with an expected data encoding. If someone is supplying you a CSV file, it should be delimited and escaped properly; otherwise it's a corrupted file and you should reject it. Make the supplier re-export the file properly from whatever data store it was exported from.
If you asked someone to send you a JPG and they sent what was a proper JPG file with every 5th byte omitted or junk bytes inserted, you wouldn't accept it and say "oh, I'll reconstruct it for you."
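If you do go the reject-and-re-export route, the check can at least be automated. Here's a minimal sketch in Python, assuming a comma-delimited file and a known column count (EXPECTED_COLUMNS is a stand-in for your real schema); any record that parses to the wrong width flags the whole file as corrupt:

```python
import csv
import sys

EXPECTED_COLUMNS = 3  # assumption: you know how many fields the feed should have

def validate_csv(path):
    """Return True only if every record parses to the expected column count."""
    with open(path, newline="", encoding="utf-8") as f:
        for line_no, row in enumerate(csv.reader(f), start=1):
            if len(row) != EXPECTED_COLUMNS:
                print(f"Rejecting file: bad record at line {line_no}: {row!r}")
                return False
    return True

if __name__ == "__main__":
    sys.exit(0 if validate_csv(sys.argv[1]) else 1)
```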
You don't say whether you have control over the creation of the CSV file. I'm assuming you do; if not, the CSV file is corrupt and cannot be recovered without human intervention, or some very clever algorithms to "guess" the correct delimiters versus the user-entered ones.
Convert user-entered tabs (assuming there are any) to spaces and then export the data using a TAB separator, as sketched below.
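A minimal sketch of that export step in Python, assuming you can re-export from the source (the hard-coded rows here stand in for whatever your data store actually returns):

```python
# Rows are hard-coded for illustration; in practice they would come
# from your data store, e.g. a database cursor.
rows = [
    ('john ","', "doe", '123 "Main"\tSt.'),   # garbage punctuation and a stray tab
    ("jane", "o'brien", "42 Elm St."),
]

with open("export.tsv", "w", encoding="utf-8") as out:
    for row in rows:
        # Convert user-entered tabs (and newlines) to spaces so the TAB
        # separator stays unambiguous, then join the fields with tabs.
        cleaned = [str(f).replace("\t", " ").replace("\n", " ") for f in row]
        out.write("\t".join(cleaned) + "\n")
```

Conveniently, MySQL's LOAD DATA INFILE defaults to tab-separated fields, so a file like this imports without needing any ENCLOSED BY clause at all.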
If the above is not possible, you need to implement an escape sequence to ensure that user-entered data is not treated as a delimiter; a sketch of one such scheme follows.
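For illustration, here is one possible escape convention in Python (the convention itself is an assumption, and you would have to apply it on both the export and the import side): a backslash escapes the delimiter and itself, so a user-entered comma can never be read as a field separator.

```python
DELIM = ","

def escape(field: str) -> str:
    """Backslash-escape the delimiter and the backslash itself."""
    return field.replace("\\", "\\\\").replace(DELIM, "\\" + DELIM)

def split_record(line: str) -> list:
    """Split on unescaped delimiters and undo the escaping."""
    fields, current, i = [], [], 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line):  # escaped char: keep it literally
            current.append(line[i + 1])
            i += 2
        elif ch == DELIM:                     # unescaped delimiter: field break
            fields.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    fields.append("".join(current))
    return fields

# Round trip with the garbage name from the question:
record = DELIM.join(escape(f) for f in ['john ","', "doe"])
print(split_record(record))  # -> ['john ","', 'doe']
```

Note that MySQL's LOAD DATA INFILE already understands backslash escaping via its FIELDS ESCAPED BY clause, so on the import side you may not need custom parsing at all.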