 

Difference between --append and --incremental append in sqoop

Tags: sqoop

Is there any difference between using --append and --incremental append to insert new rows from an RDBMS into an existing dataset in HDFS? I use --append together with --where, and --incremental append together with --last-value.

Midhun Mathew Sunny asked Oct 27 '25 09:10


1 Answer

--append appends data to an existing dataset in HDFS.

--append 
--where "dpt_id >10"

selects the same rows as the options below. Both variants ONLY append to the existing dataset, so they can also append duplicates. NOTE: neither overwrites the existing data.

--incremental append
--check-column dpt_id
--last-value 10

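but NOT the same as the following options, which append the new data and also update the existing data, so no duplicates remain. NOTE: this does not overwrite the dataset; each row is either updated or appended. (Updating existing rows relies on also passing --merge-key; if --append is used instead, changed rows are added as new copies.)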

--incremental lastmodified
--check-column lastupdated
--last-value 20160802000000
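For reference, the two append variants above can be written out as full commands. This is only a minimal sketch: the JDBC URL, user name, table name departments, and target directory are hypothetical placeholders, and the SQOOP variable is just a dry-run convenience that prints each command instead of running it.

```shell
#!/bin/sh
# Dry run by default: each function prints the command it would run
# (echo drops the quotes around the --where value).
# Set SQOOP=sqoop in the environment to execute for real.
SQOOP="${SQOOP:-echo sqoop}"

# Hypothetical placeholders; replace with your own values.
CONNECT="jdbc:mysql://dbhost/corp"
DB_USER="etl_user"
TARGET_DIR="/user/hadoop/departments"

# Variant 1: plain --append; the row filter is written by hand in --where.
import_append_where() {
  $SQOOP import \
    --connect "$CONNECT" --username "$DB_USER" -P \
    --table departments --target-dir "$TARGET_DIR" \
    --append \
    --where "dpt_id > 10"
}

# Variant 2: incremental append; Sqoop builds the filter (dpt_id > 10)
# from --check-column and --last-value.
import_incremental_append() {
  $SQOOP import \
    --connect "$CONNECT" --username "$DB_USER" -P \
    --table departments --target-dir "$TARGET_DIR" \
    --incremental append \
    --check-column dpt_id \
    --last-value 10
}

import_append_where
import_incremental_append
```

Both variants pick up the same rows. The practical difference, as far as I know, is bookkeeping: an incremental import reports the value to pass as --last-value next time, while with --where you have to track that boundary yourself.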

Sqoop supports two types of incremental imports: append and lastmodified.

You can use the --incremental argument to specify the type of incremental import to perform.

append:

  • You should specify append mode when importing a table where new rows are continually being added with increasing row id values.
  • You specify the column containing the row’s id with --check-column.
  • Sqoop imports rows where the check column has a value greater than the one specified with --last-value.
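To see which rows an append-mode import would pick up, you can run the equivalent filter yourself with sqoop eval before importing. A rough sketch, assuming a hypothetical departments table, connection string, and user name; the SQOOP variable only makes this a dry run that prints the command.

```shell
#!/bin/sh
# Dry run by default; set SQOOP=sqoop to execute against the database.
SQOOP="${SQOOP:-echo sqoop}"

# Hypothetical placeholders; replace with your own values.
CONNECT="jdbc:mysql://dbhost/corp"
DB_USER="etl_user"
LAST_VALUE=10   # the value you would pass as --last-value

# Count the rows whose check column is strictly greater than LAST_VALUE,
# which is the condition append mode applies to the check column.
preview_append_rows() {
  $SQOOP eval \
    --connect "$CONNECT" --username "$DB_USER" -P \
    --query "SELECT COUNT(*) FROM departments WHERE dpt_id > $LAST_VALUE"
}

preview_append_rows
```

Note the comparison is strict: a row whose dpt_id equals the last value is not imported again.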

lastmodified:

  • An alternate table update strategy supported by Sqoop is called lastmodified mode. You should use this when rows of the source table may be updated, and each such update will set the value of a last-modified column to the current timestamp.
  • Rows where the check column holds a timestamp more recent than the timestamp specified with --last-value are imported.
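  • At the end of an incremental import, Sqoop prints the value to use as --last-value for the next import. When running a subsequent import, you should specify that value as --last-value to ensure you import only the new or updated data.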
  • This is handled automatically by creating an incremental import as a saved job, which is the preferred mechanism for performing a recurring incremental import.
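A saved job for a lastmodified import might look like the sketch below. The job name departments_sync, the connection details, and the use of dpt_id as the merge key are assumptions for illustration, and credential handling is left out. The SQOOP variable makes this a dry run that only prints the commands.

```shell
#!/bin/sh
# Dry run by default; set SQOOP=sqoop to execute for real.
SQOOP="${SQOOP:-echo sqoop}"

# Hypothetical placeholders; replace with your own values.
CONNECT="jdbc:mysql://dbhost/corp"
DB_USER="etl_user"
TARGET_DIR="/user/hadoop/departments"

# Create the saved job once. --merge-key (assumed here to be the primary
# key dpt_id) lets updated rows replace their older copies in HDFS.
create_job() {
  $SQOOP job --create departments_sync -- import \
    --connect "$CONNECT" --username "$DB_USER" \
    --table departments --target-dir "$TARGET_DIR" \
    --incremental lastmodified \
    --check-column lastupdated \
    --last-value 20160802000000 \
    --merge-key dpt_id
}

# Run it on each refresh; the job keeps track of the last value between runs.
run_job() {
  $SQOOP job --exec departments_sync
}

create_job
run_job
```

After the first run there is no need to pass --last-value by hand, since the saved job keeps track of it.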

Read more in the "Incremental Imports" section of the Sqoop documentation.

Ronak Patel answered Oct 30 '25 16:10


