Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Error while loading parquet format file into Amazon Redshift using copy command and manifest file

I'm trying to load parquet file using manifest file and getting below error.

query: 124138ailed due to an internal error. File 'https://s3.amazonaws.com/sbredshift-east/data/000002_0 has an invalid version number: )

Here is my copy command

copy testtable from 's3://sbredshift-east/manifest/supplier.manifest' IAM_ROLE 'arn:aws:iam::123456789:role/MyRedshiftRole123' FORMAT AS PARQUET manifest;

here is my manifest file

**{  
   "entries":[  
      {  
         "url":"s3://sbredshift-east/data/000002_0",
         "mandatory":true,
         "meta":{  
            "content_length":1000
         }
      }
   ]
}**

I'm able to load the same file using copy command by specifying the file name.

copy testtable from 's3://sbredshift-east/data/000002_0' IAM_ROLE 'arn:aws:iam::123456789:role/MyRedshiftRole123' FORMAT AS PARQUET;

INFO: Load into table 'supplier' completed, 800000 record(s) loaded successfully. COPY

What could be wrong in my copy statement?

like image 873
SKB Avatar asked Dec 15 '25 14:12

SKB


2 Answers

This error happens when the content_length value is wrong. You have to specify the correct content_length. You could check it executing an s3 ls command.

aws s3 ls s3://sbredshift-east/data/
2019-12-27 11:15:19        539 sbredshift-east/data/000002_0

The 539 (file size) should be the same than the content_lenght value in your manifest file.

I don't know why they are using this meta value when you don't need it in the direct copy command.

¯\_(ツ)_/¯

like image 77
Franzi Avatar answered Dec 17 '25 08:12

Franzi


The only way I've gotten parquet copy to work with manifest file is to add the meta key with the content_length.

From what I can gather in my error logs, the COPY command for parquet (w/ manifest) might first be reading the files using Redshift Spectrum as an external table. If that's the case, this hidden step does require the content_step which contradicts their initial statement about COPY commands.

https://docs.amazonaws.cn/en_us/redshift/latest/dg/loading-data-files-using-manifest.html

like image 20
Newton T. Avatar answered Dec 17 '25 10:12

Newton T.



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!