I need make a scan with limit and a condition on DynamoDB.
The docs says:
In a response, DynamoDB returns all the matching results within the scope of the Limit value. For example, if you issue a Query or a Scan request with a Limit value of 6 and without a filter expression, DynamoDB returns the first six items in the table that match the specified key conditions in the request (or just the first six items in the case of a Scan with no filter). If you also supply a FilterExpression value, DynamoDB will return the items in the first six that also match the filter requirements (the number of results returned will be less than or equal to 6).
The code (NODEJS):
var params = {
    ExpressionAttributeNames: {"#user": "User"},
    ExpressionAttributeValues: {":user": parseInt(user.id)},
    FilterExpression: "#user = :user and attribute_not_exists(Removed)",
    Limit: 2,
    TableName: "XXXX"
};
DynamoDB.scan(params, function(err, data) {
    if (err) {
        dataToSend.message = "Unable to query. Error: " + err.message;
    } else if (data.Items.length == 0) {
        dataToSend.message = "No results were found.";
    } else {
        dataToSend.data = data.Items;
        console.log(dataToSend);
    }
});
Table XXXX definitions:
In code above, if I remove the Limit parameter, DynamoDB will return the items that match the filter requirements. So, the conditions are ok. But when I scan with Limit parameter, the result is empty.
The XXXX table, has 5 items. Only the 2 firsts have the Removed attribute. When I scan without Limit parameter, DynamoDB returns the 3 items without Removed attribute.
What i'm doing wrong?
A Scan operation in Amazon DynamoDB reads every item in a table or a secondary index. By default, a Scan operation returns all of the data attributes for every item in the table or index. You can use the ProjectionExpression parameter so that Scan only returns some of the attributes, rather than all of them.
The Scan operation returns one or more items and item attributes by accessing every item in a table or a secondary index. To have DynamoDB return fewer items, you can provide a FilterExpression operation.
DynamoDB supports up to five requests per second for each of these APIs. The result set from a Query is limited to 1 MB per call. You can use the LastEvaluatedKey from the query response to retrieve more results. The result set from a Scan is limited to 1 MB per call.
From the docs that you quoted:
If you also supply a FilterExpression value, DynamoDB will return the items in the first six that also match the filter requirements
By combining Limit and FilterExpression you have told DynamoDB to only look at the first two items in the table, and evaluate the FilterExpression against those items. Limit in DynamoDB can be confusing because it works differently from limit in a SQL expression in a RDBMS.
Also ran into this issue, i guess you will just have to scan the whole table to a max of 1 MB
Scan
The result set from a Scan is limited to 1 MB per call. You can use the LastEvaluatedKey from the scan response to retrieve more results.
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Limits.html
You might be able to get what you need by using a secondary index. Using the classic RDB example, customer - order example: you have one table for customers and one for orders. The Orders table has a Key consisting of Customer - HASH, Order - RANGE. So if you wanted to get the latest 10 orders, there would be no way to do it without a scan
But if you create a Global Secondary Index on orders of "Some Constant" -- HASH, Date RANGE, and queried against that index, they query would do what you want and only charge you for the RCUs involved with the records returned. No expensive scan needed. Note, writes will be more expensive, but in most cases, there are many more reads than writes.
Now you have your original problem if you want to get the 10 biggest orders for a day larger than $1000. The query would return the last 10 orders, and then filter out those less than $1000.
In this case, you could create a computed key of Date-OrderAmount, and queries against that index would return what you want.
It's not as simple as SQL, but you need to think about access patterns in SQL too. If if you have a lot of data, you need to create Indexes in SQL or the DB will happily to table scans on your behalf, which will impair performance and raise your costs.
Note that everything I proposed is normalized in the sense that there is only one source of truth. You are not duplicating data -- you are merely recasting views of it to get what you need from DynamoDB.
Bear in mind that the CONSTANT as a HASH s subject to the 10GB per partition limit, so you would need to design around it if you had a lot of active data. For example, depending on your expected access pattern, you could use Customer and not a constant as a HASH. Or use STreams to organize the data (or subsets) in other ways.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With