
Node.js: difference in file size when copying csv

I have the code below, where I read from a CSV and write to another CSV. I will be transforming some data before writing to the second file, but as a test I ran the code as-is and noticed slight differences between the source and destination files, without even changing anything in them.

  for (const m of metadata) {
      tempm = m;
      fname = path;
      const pipelineAsync = promisify(pipeline);
      if (m.path) {
        await pipelineAsync(
          fs.createReadStream(m.path),
          csv.parse({ delimiter: '\t', columns: true }),
          csv.transform((input) => {
            // No-op transform for now; real transformations will go here.
            return Object.assign({}, input);
          }),
          csv.stringify({ header: true, delimiter: '\t' }),
          fs.createWriteStream(fname, { encoding: 'utf16le' })
        );
        const nstats = fs.statSync(fname);
        tempm['transformedPath'] = fname;
        tempm['transformed'] = true;
        tempm['t_size_bytes'] = nstats.size;
      }
  }

I see that, for example:

file a: the source file size is `895631` while after copying destination file size is `898545`
file b: the source file size is `51388` while after copying destination file size is `52161`
file c: the source file size is `13666` while after copying destination file size is `13587`

But when I do not use the transform, the sizes match. For example, this code produces exactly the same file sizes on both source and destination:


  for (const m of metadata) {
      tempm = m;
      fname = path;
      const pipelineAsync = promisify(pipeline);
      if (m.path) {
        await pipelineAsync(
          fs.createReadStream(m.path),
          // With the parse/transform/stringify stages commented out,
          // this is a plain byte-for-byte copy.
          /*csv.parse({ delimiter: '\t', columns: true }),
          csv.transform((input) => {
            return Object.assign({}, input);
          }),
          csv.stringify({ header: true, delimiter: '\t' }),*/
          fs.createWriteStream(fname, { encoding: 'utf16le' })
        );
        const nstats = fs.statSync(fname);
        tempm['transformedPath'] = fname;
        tempm['transformed'] = true;
        tempm['t_size_bytes'] = nstats.size;
      }
  }

Can anyone please help me identify what options I need to pass to the csv transformation so that the copy happens correctly?

I am doing this test to ensure I am not losing any data in large files.

Thanks.

Update 1: I have also checked that the encoding on both files is the same.

Update 2: I notice that the source file has CRLF while the destination file has LF. Is there a way I can keep them the same using Node.js, or is it something OS dependent?

Update 3: Looks like the issue is the EOL: the source file has CRLF while the destination/transformed file has LF. I now need to find a way to specify this in my code above so that the EOL is consistent.

opensource-developer asked Jan 24 '26 11:01
1 Answer

You need to set up your EOL config:

const { pipeline } = require('stream')
const { promisify } = require('util')
const fs = require('fs')
const csv = require('csv')
const os = require('os')


;(async function () {
  const pipelineAsync = promisify(pipeline)
  await pipelineAsync(
    fs.createReadStream('out'),
    csv.parse({ delimiter: ',', columns: true }),
    csv.transform((input) => {
      return Object.assign({}, input)
    }),
    // Here is the trick:
    csv.stringify({ eol: true, record_delimiter: os.EOL, header: true, delimiter: '\t' }),
    fs.createWriteStream('out2', { encoding: 'utf16le' })
  )
})()

You can use `\r\n` as well, or whatever newline you need.

This setup can be spotted by reading the source code.

Manuel Spigolon answered Jan 26 '26 23:01
