
Delimiter not found error - AWS Redshift Load from s3 using Kinesis Firehose

I am using Kinesis Firehose to transfer data to Redshift via S3. I have a very simple CSV file that looks like the one below. Firehose puts it into S3, but Redshift errors out with a "Delimiter not found" error. I have read practically every post related to this error, and I made sure the delimiter is included.

File

GOOG,2017-03-16T16:00:01Z,2017-03-17 06:23:56.986397,848.78
GOOG,2017-03-16T16:00:01Z,2017-03-17 06:24:02.061263,848.78
GOOG,2017-03-16T16:00:01Z,2017-03-17 06:24:07.143044,848.78
GOOG,2017-03-16T16:00:01Z,2017-03-17 06:24:12.217930,848.78

OR

"GOOG","2017-03-17T16:00:02Z","2017-03-18 05:48:59.993260","852.12"
"GOOG","2017-03-17T16:00:02Z","2017-03-18 05:49:07.034945","852.12"
"GOOG","2017-03-17T16:00:02Z","2017-03-18 05:49:12.306484","852.12"
"GOOG","2017-03-17T16:00:02Z","2017-03-18 05:49:18.020833","852.12"
"GOOG","2017-03-17T16:00:02Z","2017-03-18 05:49:24.203464","852.12"

Redshift Table

CREATE TABLE stockvalue
( symbol                   VARCHAR(4),
  streamdate               VARCHAR(20),
  writedate                VARCHAR(26),
  stockprice               VARCHAR(6)
);
  • Error: (screenshot of the Redshift load error attached)

  • Just in case, here's what my Kinesis stream looks like: (Firehose screenshot attached)

Can someone point out what might be wrong with the file? There is a comma between every field, all columns in the destination table are VARCHAR (so there should be no data type errors), and the column lengths match exactly between the file and the Redshift table. I have tried wrapping the fields in double quotes and without.

asked Oct 16 '25 by Master of none


2 Answers

Can you post the full COPY command? It's cut off in the screenshot.

My guess is that you are missing DELIMITER ',' in your COPY command; try adding it.
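For reference, a minimal sketch of what such a COPY command might look like against the asker's stockvalue table, with the delimiter specified (the bucket, prefix, account id, and role name are placeholders, not values from the question):

COPY stockvalue
FROM 's3://<bucket-name>/<prefix>'
CREDENTIALS 'aws_iam_role=arn:aws:iam::<aws-account-id>:role/<role-name>'
DELIMITER ',';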

answered Oct 19 '25 by Jon Ekiz


I was stuck on this for hours, and Shahid's answer helped me solve it.

Text Case for Column Names is Important

Redshift will always treat your table's columns as lower-case, so when mapping JSON keys to columns, make sure the JSON keys are lower-case.
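For illustration, a matching table whose column names are all lower-case might be defined like this (the latency table name comes from the COPY statement below; the column types are just assumptions for this example):

CREATE TABLE latency
( id     VARCHAR(64),
  name   VARCHAR(64)
);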

Your JSON file will look like:

{"id": "val1", "name": "val2"}{"id": "val1", "name": "val2"}{"id": "val1", "name": "val2"}{"id": "val1", "name": "val2"}

And the COPY statement will look like

COPY latency(id,name) FROM 's3://<bucket-name>/<manifest>' CREDENTIALS 'aws_iam_role=arn:aws:iam::<aws-account-id>:role/<role-name>' MANIFEST json 'auto';

Settings within Firehose must have the column names specified (again, in lower-case). Also, add the following to Firehose COPY options:

json 'auto' TRUNCATECOLUMNS blanksasnull emptyasnull
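Put together, the COPY command that Firehose ends up issuing should look roughly like this (bucket, manifest path, account id, and role name are placeholders):

COPY latency(id,name)
FROM 's3://<bucket-name>/<manifest>'
CREDENTIALS 'aws_iam_role=arn:aws:iam::<aws-account-id>:role/<role-name>'
MANIFEST json 'auto' TRUNCATECOLUMNS blanksasnull emptyasnull;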

How to call put_records from Python:

Below is a snippet showing how to use the put_records function with Kinesis in Python.

The 'objects' argument passed into the 'put_to_stream' function is a list of dictionaries:

import json

import boto3

# assumes AWS credentials are already configured for boto3
kinesis_client = boto3.client('kinesis')
kinesis_stream_name = 'your-stream-name'  # placeholder: use your actual stream name

def put_to_stream(objects):
    # 'objects' is a list of dictionaries; each one becomes a single Kinesis record
    records = []

    for obj in objects:
        record = {
            'Data': json.dumps(obj),
            'PartitionKey': 'swat_report'
        }
        records.append(record)

    print(records)

    put_response = kinesis_client.put_records(StreamName=kinesis_stream_name, Records=records)
    return put_response

answered Oct 19 '25 by flusharcade


