My table is:
create table transactions
transaction_id integer not null,
transaction_timestamp integer not null,
input_index smallint,
output_index smallint not null,
from_id integer,
to_id integer not null,
input_value real,
output_value real not null,
constraint unique_transactions
unique (transaction_id, from_id, to_id)
);
with the following indexes:
create index idx_transactions_from_id_block_timestamp
on transactions (from_id asc, transaction_timestamp desc);
create index idx_transactions_to_id_block_timestamp
on transactions (to_id asc, transaction_timestamp desc);
create index idx_transactions_transaction_id
on transactions (transaction_id);
create index idx_transactions_block_timestamp
on transactions (transaction_timestamp desc);
The query I ideally would want to have is-
select distinct on (transaction_id,output_index) *
from transactions
where to_id = 1000
and transaction_timestamp between 1691193600 AND 1711929600
order by transaction_timestamp desc
limit 10
Giving me the most recent 10 unique (transaction_id,output_index) pairs (don't care which from_id and input_index are selected to stay).
This straight forward way doesn't work, since postgres requires the order by to first contain the distinct on columns. ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY expressions
doing so will reorder my rows, selecting the first 10 with the highest transaction_id, which i do not want.
Is there an efficient way to do this, using the low limit number to hopefully not have to go over the millions of rows in the table?
I have attempted the following queries, all ended up taking too long since they needed to work on the whole table, without using the small limit 10.
Query 1:
WITH RankedTransactions AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY transaction_id, output_index ORDER BY transaction_timestamp DESC) AS rn
FROM transactions
WHERE to_id = 1000
and transaction_timestamp between 1691193600 AND 1711929600
)
SELECT transaction_id,
input_index,
output_index,
transaction_timestamp,
from_id,
to_id,
input_value,
output_value
FROM RankedTransactions
WHERE rn = 1
ORDER BY transaction_timestamp DESC
LIMIT 10;
Query 2:
SELECT *
FROM (
SELECT DISTINCT ON (transaction_id, output_index) *
FROM transactions
WHERE to_id = 1000
and transaction_timestamp between 1691193600 AND 1711929600
ORDER BY transaction_id, output_index DESC
) AS latest_transactions
ORDER BY transaction_timestamp DESC
LIMIT 10;
This works:
SELECT *
FROM (
SELECT DISTINCT ON (transaction_id, output_index) *
FROM transactions
WHERE to_id = 1000
AND transaction_timestamp BETWEEN 1691193600 AND 1711929600
ORDER BY transaction_id, output_index, transaction_timestamp DESC -- !!!
) AS latest_transactions
ORDER BY transaction_timestamp DESC
LIMIT 10;
I removed the distracting DESC from output_index in the sort order.
The best query (and index) depend on how many rows (and duplicates within those) to expect per basic selection (after filtering by WHERE to_id = 1000 AND transaction_timestamp BETWEEN 1691193600 AND 1711929600).
If there are not many of either, excess rows can be filtered and this query is just fine when backed by your (existing) index on (to_id, transaction_timestamp DESC).
With a large number of qualifying rows and/or a large number of duplicates on (transaction_id, output_index), things get more complicated.
Related:
If there are many qualifying rows, but not very many duplicates on (transaction_id, output_index) within that set, and while your LIMIT is very small (like LIMIT 10), this recursive query should be very fast.
I collect the (few!) selected combinations of (transaction_id, output_index) in an array to use that as filter going forward. Create a matching composite type for the purpose once:
CREATE TYPE trans_out AS (transaction_id integer, output_index smallint);
Then:
WITH RECURSIVE cte AS (
( -- parentheses required
SELECT t AS my_row, ARRAY[(transaction_id, output_index)::trans_out] AS selected
FROM transactions t
WHERE to_id = 1000
AND transaction_timestamp >= 1691193600
AND transaction_timestamp <= 1711929600
ORDER BY transaction_timestamp DESC
LIMIT 1
)
UNION ALL
SELECT t.my_row, c.selected || t.sel1
FROM cte c
CROSS JOIN LATERAL (
SELECT t AS my_row, (t.transaction_id, t.output_index)::trans_out AS sel1
FROM transactions t
WHERE t.to_id = 1000
AND t.transaction_timestamp >= 1691193600
AND t.transaction_timestamp <= (c.my_row).transaction_timestamp -- parentheses required!
AND (t.transaction_id, t.output_index)::trans_out <> ALL (selected)
ORDER BY transaction_timestamp DESC
LIMIT 1
) t
)
SELECT (my_row).* -- parentheses required!
FROM cte;
This still only needs your existing index on (to_id, transaction_timestamp DESC) (where DESC does not matter).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With