
mongoimport: set type for all fields when importing CSV

I have several problems importing a CSV file that has a header line with mongoimport.

Following is the case:

I have a big CSV file with the names of the fields in the first line. I know you can tell mongoimport to use this line as the field names with --headerline.

I want all field types to be strings, but mongoimport infers each type automatically from what the value looks like.

IDs such as 0001 will be turned into 1, which can have bad side effects.

Unfortunately, there is (as far as I know) no way of setting them all to string with a single option; you have to name each field and set its type with:

--columnsHaveTypes --fields "name.string(), ... "

When I did that, the next problem appeared. The headerline (with all field names) got imported as values in a separate document.

So basically, my questions are:

  • Is there a way of setting all field types to string when using the --headerline option?

  • Alternatively, is there a way to ignore the first line?

NickTheDev asked Jan 28 '26 10:01


2 Answers

I had this problem when importing a 41-million-record CSV file into MongoDB.

./mongoimport -d testdb -c testcollection --type csv --columnsHaveTypes -f "RECEIVEDDATE.date(2006-01-02 15:04:05)" --file location/test.csv

As shown above, the '-f' (or '--fields') option lets you specify the data types for the import. But when you use it on a file that contains a header line, mongoimport imports the first row as well, i.e. the header row. That either leads to a 'cannot convert to datatype' error or imports the column names as data. Unfortunately, we cannot use '--headerline' instead of '--fields' for a file whose header line contains only plain column names. Here are the solutions that I found for this problem.

1) Remove the header row and import using '--fields' as in the command above. If you are on Linux, you can use the command below to remove the first row (the header line) of the huge file. It took 2-3 minutes for me, depending on machine performance.

sed -i -e "1d" location/test.csv

2) Import the file using '--headerline'; mongoimport then stores each field with the data type it detects by default. Afterwards, open the mongo shell, switch to the database with 'use testdb', and run a JavaScript command that reads each record and converts the field to the desired data type. For a huge file this will take a long time. I found this solution on Stack Overflow.

db.testcollection.find().forEach(function (x) {
  x.RECEIVEDDATE = new Date(x.RECEIVEDDATE); // convert the imported value to a Date
  db.testcollection.save(x);
});

If you want to skip rows whose values do not fit the declared data type, use the option below, which is described in the mongoimport documentation: '--parseGrace skipRow'
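For option 1, note that running the sed command a second time would delete a real data row. A minimal sketch of a guarded version follows, assuming GNU sed on Linux and that the first header column is named RECEIVEDDATE as in the command above. The helper name strip_header is hypothetical.

```shell
#!/bin/sh
# strip_header FILE FIRST_COLUMN_NAME
# Deletes line 1 of FILE in place, but only if that line still starts
# with the expected first column name (hypothetical helper, GNU sed assumed).
strip_header() {
  first_line=$(head -n 1 "$1")
  case $first_line in
    "$2"*) sed -i -e '1d' "$1" ;;   # header still present: remove it
  esac                              # otherwise leave the file untouched
}

# Possible usage with the command from this answer (not run here):
#   strip_header location/test.csv RECEIVEDDATE
#   ./mongoimport -d testdb -c testcollection --type csv --columnsHaveTypes \
#     -f "RECEIVEDDATE.date(2006-01-02 15:04:05)" --parseGrace skipRow \
#     --file location/test.csv
```

Because the check looks at the current first line, calling the helper twice leaves the data rows intact.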

Yeshan Jayasooriya answered Feb 01 '26 19:02


https://docs.mongodb.com/manual/reference/program/mongoimport/#example-csv-import-types reads:

MongoDB 3.4 added support for specifying field types. Specify field names and types in the form <colName>.<type>(<arg>) using --fields, --fieldFile, or --headerline.

So the first line of your CSV file should contain the field names together with their types, e.g.:

name.string(), ... 

and the mongoimport parameters should be:

--columnsHaveTypes --headerline --file <filename.csv>
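If the CSV file currently has a plain header, the type suffixes can be added on the fly instead of editing the file by hand. This is only a sketch: it assumes the header names contain no quoted commas and that the file uses Unix line endings, and add_header_types is a hypothetical helper name.

```shell
#!/bin/sh
# add_header_types FILE
# Prints FILE to stdout with ".string()" appended to every name on line 1,
# e.g. "id,name" becomes "id.string(),name.string()". Data rows are unchanged.
add_header_types() {
  sed -e '1s/,/.string(),/g' -e '1s/$/.string()/' "$1"
}

# Possible usage (database, collection, and file names are placeholders):
#   add_header_types data.csv | mongoimport -d testdb -c testcollection \
#     --type csv --columnsHaveTypes --headerline
```

With every column declared as string(), a value such as 0001 should be stored as the string "0001" rather than the number 1.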

As to the question of how to remove the first line, you can use a pipe. mongoimport reads from STDIN if no --file option is passed. E.g.:

tail -n+2 <filename.csv> | mongoimport --columnsHaveTypes --fields "name.string(), ... " 
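To avoid typing every field name, the --fields value could be derived from the header line itself. A rough sketch, assuming simple header names without quoted commas; typed_fields is a hypothetical helper, and the database, collection, and file names in the usage comment are placeholders.

```shell
#!/bin/sh
# typed_fields FILE
# Reads the header line of FILE and prints it with ".string()" appended to
# each name, e.g. "id,name,city" -> "id.string(),name.string(),city.string()".
typed_fields() {
  head -n 1 "$1" | tr -d '\r' | sed -e 's/,/.string(),/g' -e 's/$/.string()/'
}

# Possible usage: skip the header with tail and pass the generated list:
#   tail -n +2 data.csv | mongoimport -d testdb -c testcollection --type csv \
#     --columnsHaveTypes --fields "$(typed_fields data.csv)"
```

The tr step drops a carriage return in case the file was saved with Windows line endings.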
Alex Blex answered Feb 01 '26 20:02


