
s3 - Comparing files between two buckets

I would like to compare the file contents of two S3-compatible buckets and identify files that are missing or that differ.

Should I use checksums to do this?

meitale asked Oct 25 '25 04:10


1 Answer

It appears that your requirement is to compare the contents of two Amazon S3 buckets and identify files that are missing or differ between the buckets.

To do this, you could use:

  • Object name: Comparing the object names (keys) in each listing finds missing files.
  • Object size: A different size means different contents, and the size is returned with every bucket listing, so this check is cheap. The same size does not prove the contents match.
  • ETag: For objects uploaded in a single part without SSE-KMS or SSE-C encryption, the ETag is an MD5 checksum of the object's contents. If the same file has a different ETag in each bucket, the contents are different. Objects uploaded as multipart have an ETag that is not a plain MD5 and depends on the part size, so identical files can show different ETags, and other S3-compatible services may compute ETags differently.
  • Last modified date: This is not a reliable way to identify differences, but it can be used with other metadata to decide whether to update a file. For example, if two files differ and the object in the destination bucket has a newer date than the object in the source bucket, you probably don't need to copy the file across. But if the source file was modified after the destination file, it's likely to be a candidate for re-copying.
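The name, size and ETag checks above can be sketched in Python roughly as follows. This is only an illustration, not a definitive implementation: ObjectInfo and compare_listings are hypothetical names, and the two listings are hand-written sample data. In practice each dictionary could be built from a bucket listing (for example, boto3's list_objects_v2 returns Key, Size and ETag for each object), and the ETag comparison is only meaningful if both copies were uploaded the same way.

```python
from typing import Dict, List, NamedTuple, Tuple


class ObjectInfo(NamedTuple):
    """Metadata for one object, as it would appear in a bucket listing."""
    size: int  # object size in bytes
    etag: str  # ETag string reported by the service


def compare_listings(
    source: Dict[str, ObjectInfo],
    dest: Dict[str, ObjectInfo],
) -> Tuple[List[str], List[str], List[str]]:
    """Return (missing_in_dest, missing_in_source, differing) lists of keys."""
    # Keys present on one side only are the missing files.
    missing_in_dest = sorted(source.keys() - dest.keys())
    missing_in_source = sorted(dest.keys() - source.keys())
    # For keys present on both sides, compare size first, then ETag.
    differing = sorted(
        key
        for key in source.keys() & dest.keys()
        if source[key].size != dest[key].size
        or source[key].etag != dest[key].etag
    )
    return missing_in_dest, missing_in_source, differing


# Hypothetical sample listings (key -> metadata), for illustration only.
source_listing = {
    "a.txt": ObjectInfo(10, "aaa"),
    "b.txt": ObjectInfo(20, "bbb"),
    "c.txt": ObjectInfo(30, "ccc"),
}
dest_listing = {
    "a.txt": ObjectInfo(10, "aaa"),
    "b.txt": ObjectInfo(20, "zzz"),
    "d.txt": ObjectInfo(5, "ddd"),
}

missing_in_dest, missing_in_source, differing = compare_listings(
    source_listing, dest_listing
)
print("Missing in destination:", missing_in_dest)
print("Missing in source:", missing_in_source)
print("Different contents:", differing)
```

Keeping the comparison separate from the listing step means the same function works for any S3-compatible service, as long as its listing provides a key, a size and an ETag.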

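Instead of writing all of the above logic yourself, you can use the AWS Command-Line Interface (CLI). It has an aws s3 sync command that compares the source and destination and then copies files that are modified or missing. By default it decides what has changed from the object size and last modified time rather than the ETag. Adding the --dryrun option lists what would be copied without copying anything, which is a quick way to see the differences.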

John Rotenstein answered Oct 26 '25 19:10


