 

How to get objects with given tag in given S3 bucket?

I am using @aws-sdk/client-s3 and can’t figure out how to get objects with given tag in given S3 bucket.

sunknudsen asked Jan 29 '26 17:01



2 Answers

Unfortunately, S3 does not support filtering objects by tag out of the box. There is no built-in API that returns all objects with a certain tag (see the AWS S3 ListObjects API reference). You can work around this with the following steps:

  1. List all objects in the bucket: ListObjectsV2Command

  2. For each object, fetch its tags: GetObjectTaggingCommand

  3. Filter based on the desired tag:

     if (tags.TagSet.some(tag => 
         tag.Key === targetTagKey && 
         tag.Value === targetTagValue)) {
       matchingObjects.push(obj);
     }
    

Note: This approach can be expensive if you have a large number of objects. If you often need to filter by tags, consider maintaining tag info in a database or a metadata index.
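
Putting the three steps together, a minimal sketch might look like the following. It assumes the AWS SDK for JavaScript v3 (@aws-sdk/client-s3) is installed. The helper names filterKeysByTag and s3Source are made up for illustration and are not part of the SDK. The SDK is loaded lazily inside s3Source so that the filtering logic can be exercised without AWS access.

```javascript
// Core logic: page through a listing and keep the keys whose tags match.
// listPage(token) resolves to { keys: string[], nextToken?: string }
// getTags(key)    resolves to [{ Key, Value }, ...]
async function filterKeysByTag({ listPage, getTags }, tagKey, tagValue) {
  const matches = [];
  let token;
  do {
    const page = await listPage(token);
    for (const key of page.keys) {
      // One tagging request per object: this is the expensive part.
      const tagSet = await getTags(key);
      if (tagSet.some((t) => t.Key === tagKey && t.Value === tagValue)) {
        matches.push(key);
      }
    }
    token = page.nextToken;
  } while (token);
  return matches;
}

// Hypothetical adapter that wires the two callbacks to @aws-sdk/client-s3.
async function s3Source(bucket, region) {
  const { S3Client, ListObjectsV2Command, GetObjectTaggingCommand } =
    await import("@aws-sdk/client-s3");
  const client = new S3Client({ region });
  return {
    // Step 1: list one page of objects (up to 1,000 keys per call).
    listPage: async (token) => {
      const out = await client.send(
        new ListObjectsV2Command({ Bucket: bucket, ContinuationToken: token })
      );
      return {
        keys: (out.Contents ?? []).map((o) => o.Key),
        nextToken: out.NextContinuationToken,
      };
    },
    // Step 2: fetch the tags of a single object.
    getTags: async (key) => {
      const out = await client.send(
        new GetObjectTaggingCommand({ Bucket: bucket, Key: key })
      );
      return out.TagSet ?? [];
    },
  };
}
```

Usage would be something like `const keys = await filterKeysByTag(await s3Source("<bucket>", "<region>"), targetTagKey, targetTagValue);`. The tagging requests run one after another here to keep the sketch short; for large buckets you would probably want some bounded concurrency.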

Enbiya answered Feb 01 '26 07:02



In 2024, AWS released S3 Metadata, built on S3 Tables, which lets you query an object's system-defined and custom tags, along with its properties over time. At the moment this feature is only available in three regions and, being new, comes with some other limitations. It also currently applies only to new objects, or to tags added to objects, from the time you enable S3 Metadata. However, it does work for your use case if the regions align.

You can query objects using the Athena SDK, the Athena web editor, or directly through Spark. I'll demonstrate using the SDK with Node.js and @aws-sdk/client-athena.

  1. First create an S3 Table bucket, and enable Integration with AWS analytics services - Preview. This will be required for Athena.
  2. Enable S3 Metadata on your S3 bucket from the Metadata tab. Click Create metadata configuration, select the previously created S3 Table bucket, and enter a table name. You will need these values later.
  3. Next, Athena will need permissions to query the table via your IAM entity. Within Lake Formation, under Data Permissions select Grant, and select the IAM users or roles that will be querying the S3 Table. Under LF-Tags or catalog resources select Named Data Catalog resources, and for Catalog choose the data catalog created for the S3 Table. The format will be <account_id>:s3tablescatalog/<table_bucket_name>. Next, under Databases select aws_s3_metadata, and under Tables select the table you previously created. Finally, under Table permissions select the Select permission and click Grant.
  4. Now we can query via Athena, provided you already have a workgroup set up. I will use the following Node.js code to do this. Replace <region>, <table-bucket-name> and <table-name> with your values.
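    import {
        AthenaClient,
        StartQueryExecutionCommand,
        GetQueryResultsCommand,
    } from "@aws-sdk/client-athena";

    const client = new AthenaClient({
        region: "<region>",
    });

    const params = {
        WorkGroup: "primary",
        QueryExecutionContext: {
            Database: "aws_s3_metadata",
            Catalog: "s3tablescatalog/<table-bucket-name>",
        },
        // Match objects tagged department=it, newest metadata records first.
        QueryString: `
        SELECT key, object_tags
        FROM "aws_s3_metadata"."<table-name>"
        WHERE object_tags['department'] = 'it'
        ORDER BY record_timestamp DESC
        LIMIT 5;
        `,
    };

    // Start the query; the response carries the QueryExecutionId.
    const command = new StartQueryExecutionCommand(params);
    const response = await client.send(command);

    // sleep() is a user-defined helper (not shown) that waits for the query to finish.
    await sleep(10000);

    // Fetch the results of that execution.
    const commandResult = new GetQueryResultsCommand({
        QueryExecutionId: response.QueryExecutionId,
    });
    const responseResult = await client.send(commandResult);
    console.log(responseResult.ResultSet);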

Some notes on the code:

  • The query defined in QueryString looks for objects with the tag key department and the value it. You can adjust this to your actual tags.

  • I have also defined my own sleep function to wait for the execution to complete; you can substitute your own.

  • The GetQueryResultsCommand call may also be paginated if your result set is large. Refer to the documentation for parsing the output and implementing pagination.
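
As an alternative to a fixed sleep, here is a rough sketch that polls the query state with GetQueryExecutionCommand and then follows NextToken to collect every page of results. The function names (waitForQuery, collectRows, fetchQueryRows) and the interval and timeout defaults are my own choices for illustration, not SDK features. The SDK is imported lazily inside fetchQueryRows so that the two generic helpers can be tried without AWS access.

```javascript
// One possible sleep helper.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Polls getState() until the query succeeds, fails, or the timeout elapses.
// getState() resolves to { state, reason } where state is an Athena state
// such as QUEUED, RUNNING, SUCCEEDED, FAILED or CANCELLED.
async function waitForQuery(getState, { intervalMs = 1000, timeoutMs = 60000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const { state, reason } = await getState();
    if (state === "SUCCEEDED") return;
    if (state === "FAILED" || state === "CANCELLED") {
      throw new Error(`Query ${state}: ${reason ?? "no reason given"}`);
    }
    if (Date.now() >= deadline) throw new Error("Timed out waiting for query");
    await sleep(intervalMs);
  }
}

// Follows the pagination token until no pages remain.
// getPage(token) resolves to { rows: any[], nextToken?: string }
async function collectRows(getPage) {
  const rows = [];
  let token;
  do {
    const page = await getPage(token);
    rows.push(...page.rows);
    token = page.nextToken;
  } while (token);
  return rows;
}

// Hypothetical wiring to @aws-sdk/client-athena for a started query.
async function fetchQueryRows(client, queryExecutionId) {
  const { GetQueryExecutionCommand, GetQueryResultsCommand } =
    await import("@aws-sdk/client-athena");

  await waitForQuery(async () => {
    const out = await client.send(
      new GetQueryExecutionCommand({ QueryExecutionId: queryExecutionId })
    );
    const status = out.QueryExecution?.Status;
    return { state: status?.State, reason: status?.StateChangeReason };
  });

  const rows = await collectRows(async (token) => {
    const out = await client.send(
      new GetQueryResultsCommand({ QueryExecutionId: queryExecutionId, NextToken: token })
    );
    return { rows: out.ResultSet?.Rows ?? [], nextToken: out.NextToken };
  });

  // Flatten each row to an array of strings. For SELECT queries the first
  // row typically holds the column names rather than data.
  return rows.map((row) => (row.Data ?? []).map((d) => d.VarCharValue));
}
```

With the code from step 4, this would replace the sleep and GetQueryResultsCommand lines with something like `const rows = await fetchQueryRows(client, response.QueryExecutionId);`.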

As for costs:

  • S3 Metadata is charged at $0.45 per million updates.
  • S3 Tables
    • First 50 TB is charged at $0.0265 per GB per month
    • PUT, POST, LIST requests at $0.005 per 1,000 requests per month
    • GET and all other requests at $0.0004 per 1,000 requests per month
    • Object monitoring at $0.45 per month
  • Glue Data Catalog (used for the analytics integration)
    • First million metadata object storage and requests are free.
  • Athena
    • $5.00 per TB of data scanned
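
To get a feel for how these rates combine, here is a back-of-the-envelope sketch. It uses only three of the rates quoted above (metadata updates, first-tier table storage, Athena scanning) and ignores request charges, object monitoring and Glue. The function name and the example workload are invented for illustration, and the rates may be out of date.

```javascript
// Rates in USD, copied from the list above; check the AWS pricing pages
// for current values.
const RATES = {
  metadataPerMillionUpdates: 0.45,
  tableStoragePerGbMonth: 0.0265, // first 50 TB tier
  athenaPerTbScanned: 5.0,
};

// Rough monthly estimate from three workload figures.
function estimateMonthlyCost({ metadataUpdates = 0, tableStorageGb = 0, athenaTbScanned = 0 }) {
  const metadata = (metadataUpdates / 1_000_000) * RATES.metadataPerMillionUpdates;
  const storage = tableStorageGb * RATES.tableStoragePerGbMonth;
  const athena = athenaTbScanned * RATES.athenaPerTbScanned;
  return { metadata, storage, athena, total: metadata + storage + athena };
}

// Example workload (made up): 2 million updates, 100 GB stored, 0.1 TB scanned.
const estimate = estimateMonthlyCost({
  metadataUpdates: 2_000_000,
  tableStorageGb: 100,
  athenaTbScanned: 0.1,
});
console.log(estimate.total.toFixed(2));
```

For that made-up workload the three components are about $0.90, $2.65 and $0.50, so roughly $4.05 per month before the charges this sketch leaves out.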

I would recommend reviewing the individual cost pages for more information. Hope this helped.

PeskyPotato answered Feb 01 '26 09:02



