I have this below query which takes about 15-20 secs to run.
with cte0 as (
SELECT
label,
date,
CASE
WHEN
Lead(label || date || "number") OVER (PARTITION BY label || date || "number" ORDER BY "label", "date", "number", "time") IS NULL
THEN
'1'::numeric
ELSE
'0'::numeric
END As "unique"
FROM table_data
LEFT JOIN table_mapper ON
table_mapper."type" = table_data."type"
WHERE Date BETWEEN date_trunc('month', current_date - 1) and current_date - 1
)
SELECT 'MTD' as "label", round(sum("unique") / count("unique") *100,1) as "value" FROM cte0 WHERE "date" BETWEEN date_trunc('month', current_date - 1) AND current_date -1
UNION ALL
SELECT 'Week' as "label", round(sum("unique") / count("unique") *100,1) as "value" FROM cte0 WHERE "date" BETWEEN date_trunc('week', current_date - 1) AND current_date -1
UNION ALL
SELECT 'FTD' as "label", round(sum("unique") / count("unique") *100,1) as "value" FROM cte0 WHERE "date" = current_date -1
In the table table_data I have a index on date column.
CREATE INDEX ix_cli_date ON table_data USING btree (date);
\d table_data)
Table "public.table_data"
Column | Type | Modifiers
------------------+------------------------+-----------
date | date | not null
number | bigint | not null
time | time without time zone | not null
end time | time without time zone | not null
duration | integer | not null
time1 | integer | not null
time2 | integer | not null
time3 | integer | not null
time4 | integer | not null
time5 | integer | not null
time6 | integer | not null
time7 | integer | not null
type | text | not null
name | text | not null
id1 | integer | not null
id2 | integer | not null
key | integer | not null
status | text | not null
Indexes:
"ix_cli_date" btree (date)
Table Definition (\d table_mapper)
Table "public.table_mapper" Column | Type | Modifiers ------------+------+----------- type | text | not null label | text | not null label2 | text | not null label3 | text | not null label4 | text | not null label5 | text | not null
Result (cost=184342.66..230332.86 rows=3 width=64) (actual time=23377.923..25695.478 rows=3 loops=1)"
CTE cte0"
-> WindowAgg (cost=121516.06..156751.65 rows=612793 width=23) (actual time=14578.000..18985.958 rows=696157 loops=1)"
-> Sort (cost=121516.06..123048.04 rows=612793 width=23) (actual time=14577.975..17084.405 rows=696157 loops=1)"
Sort Key: (((table_mapper.label || (table_data.date)::text) || (table_data."number")::text)), table_mapper.label, table_data.date, table_data."number", table_data."time""
Sort Method: external merge Disk: 39480kB"
-> Hash Left Join (cost=11.96..37474.21 rows=612793 width=23) (actual time=1.449..3308.718 rows=696157 loops=1)"
Hash Cond: (table_data."type" = table_mapper."type")"
-> Index Scan using ix_cli_date on table_data (cost=0.02..29036.36 rows=612793 width=38) (actual time=0.141..946.648 rows=696157 loops=1)"
Index Cond: ((date >= date_trunc('month'::text, ((('now'::text)::date - 1))::timestamp with time zone)) AND (date Hash (cost=7.53..7.53 rows=353 width=25) (actual time=1.275..1.275 rows=336 loops=1)"
Buckets: 1024 Batches: 1 Memory Usage: 15kB"
-> Seq Scan on table_mapper (cost=0.00..7.53 rows=353 width=25) (actual time=0.020..0.589 rows=336 loops=1)"
-> Append (cost=27591.00..73581.21 rows=3 width=64) (actual time=23377.920..25695.467 rows=3 loops=1)"
-> Aggregate (cost=27591.00..27591.02 rows=1 width=32) (actual time=23377.917..23377.918 rows=1 loops=1)"
-> CTE Scan on cte0 (cost=0.00..27575.68 rows=3064 width=32) (actual time=14578.052..22335.236 rows=696157 loops=1)"
Filter: ((date = date_trunc('month'::text, ((('now'::text)::date - 1))::timestamp with time zone)))"
-> Aggregate (cost=27591.00..27591.02 rows=1 width=32) (actual time=1741.509..1741.510 rows=1 loops=1)"
-> CTE Scan on cte0 (cost=0.00..27575.68 rows=3064 width=32) (actual time=20.009..1522.352 rows=168261 loops=1)"
Filter: ((date = date_trunc('week'::text, ((('now'::text)::date - 1))::timestamp with time zone)))"
-> Aggregate (cost=18399.11..18399.13 rows=1 width=32) (actual time=576.029..576.030 rows=1 loops=1)"
-> CTE Scan on cte0 (cost=0.00..18383.79 rows=3064 width=32) (actual time=9.308..546.735 rows=23486 loops=1)"
Filter: (date = (('now'::text)::date - 1))"
Total runtime: 25710.506 ms"
Description :
I'm taking the unique count and repeated count from the table_data and this where LEAD helped me out where I give the value 0 for the last repeated value of a column.
Suppose I have 3 x in a column. I give 1 value to the first 2 x and the 3rd x is given 0.
Actually through a cte I'm taking the entire rows from the table table_data and doing some calculation using the lead and concatinating the strings for a defined date range where each row 1 and 0 value is defined as per the criteria.
If the lead is null it'll be counted as 1 and if it is not null then 0.
And the I return 3 rows MTD, Current Week and FTD respectively with a calculation on taking the sum() I got from the lead and the count(*) entire rows.
For MTD I have the sum and count for the current month.
For Week - It's the current week and FTD is for yesterday.
WITH cte AS (
SELECT d.thedate
, lead(m.label) OVER (PARTITION BY m.label, d.thedate, d.number
ORDER BY d.thetime) AS leader
FROM table_data d
LEFT JOIN table_mapper m USING (type)
WHERE thedate BETWEEN date_trunc('month', current_date - 1)
AND current_date - 1
)
SELECT 'MTD' AS label, round(count(leader)::numeric / count(*) * 100, 1) AS val
FROM cte
UNION ALL
SELECT 'Week', round(count(leader)::numeric / count(*) * 100, 1)
FROM cte
WHERE thedate BETWEEN date_trunc('week', current_date - 1) AND current_date - 1
UNION ALL
SELECT 'FTD', round(count(leader)::numeric / count(*) * 100, 1)
FROM cte
WHERE thedate = current_date - 1;
The CTE makes sense for big tables, so you only scan it once. For smaller tables it may be faster without ...
Using thedate instead of reserved word date (in standard SQL).
thetime, uni instead of time, unique. Etc.
Simplified the lead() call. You get a value or NULL for the leading row. That seems the be the only relevant information.
It's a pointless waste to repeat columns from the PARTITION clause in the ORDER BY clause of a window function.
Building on that, count(leader) / count(*) instead of sum(uni) / count(uni) is a bit faster. count(column) only counts non-null values, while count(*) counts all rows.
The condition for the first term of the UNION query was redundant.
More advice and links about data definition in the comments to the question.
You should have primary keys. I suggest serial or IDENTITY column as surrogate PK for table_data:
ALTER TABLE table_data ADD COLUMN table_data_id serial PRIMARY KEY;
See:
Make type the primary key of table_mapper (also needed for the following FK constraint):
ALTER TABLE table_mapper ADD CONSTRAINT table_mapper_pkey (type);
Add a foreign key constraint for type to enforce referential integrity. Something like:
ALTER TABLE table_data ADD CONSTRAINT table_data_type_fkey
FOREIGN KEY (type) REFERENCES table_mapper (type)
ON UPDATE CASCADE ON DELETE NO ACTION;
For ultimate read performance (at some cost for writes), add a multi-column index to possibly allow index-only scans for above query:
CREATE INDEX table_data_foo_idx ON table_data (thedate, number, thetime);
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With