I am creating a copy of a production redshift database at a development level. I know how to unload data from my production instance/cluster to s3, then copy that data into my development instance/cluster, but only if I unload all the data at once. What I would like to do instead, is to copy just 1000 or so rows from each of my tables, to cut down on space and transfer time between my redshift instances.
e.g.
UNLOAD ('SELECT * FROM myschema.mytable LIMIT 1000') TO 's3://my-bucket' CREDENTIALS etcetcetc
Is there a way to do this LIMIT with UNLOAD, or am I going to have to switch to a bulk-insert-style paradigm?
EDIT: I am programmatically unloading and copying a bunch of tables, so I don't want to hard code in any key-based limits in case we add new tables or change table structures, etc.
While "LIMIT" is not part of the actual "UNLOAD" command, the Redshift documentation on UNLOAD provides a few alternatives:
Limit Clause
The SELECT query cannot use a LIMIT clause in the outer SELECT. For example, the following UNLOAD statement will fail:
unload ('select * from venue limit 10') to 's3://mybucket/venue_pipe_' credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>';Instead, use a nested LIMIT clause. For example:
unload ('select * from venue where venueid in (select venueid from venue order by venueid desc limit 10)') to 's3://mybucket/venue_pipe_' credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>';Alternatively, you could populate a table using SELECT…INTO or CREATE TABLE AS using a LIMIT clause, then unload from that table.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With