Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Are inline queries a bad idea?

Tags:

sql

postgresql

I have a table containing the runtimes for generators on different sites, and I want to select the most recent entry for each site. Each generator is run once or twice a week.

I have a query that will do this, but I wonder if it's the best option. I can't help thinking that using WHERE x IN (SELECT ...) is lazy and not the best way to formulate the query - any query.

The table is as follows:

CREATE TABLE generator_logs (
    id integer NOT NULL,
    site_id character varying(4) NOT NULL,
    start timestamp without time zone NOT NULL,
    "end" timestamp without time zone NOT NULL,
    duration integer NOT NULL
);

And the query:

SELECT id, site_id, start, "end", duration 
FROM generator_logs
WHERE start IN (SELECT MAX(start) AS start 
                FROM generator_logs 
                GROUP BY site_id) 
ORDER BY start DESC

There isn't a huge amount of data, so I'm not worried about optimizing the query. However, I do have to do similar things on tables with 10s of millions of rows, (big tables as far as I'm concerned!) and there optimisation is more important.

So is there a better query for this, and are inline queries generally a bad idea?

like image 715
Johan Avatar asked Dec 17 '25 17:12

Johan


2 Answers

Should your query not be correlated? i.e.:

SELECT id, site_id, start, "end", duration 
FROM generator_logs g1
WHERE start = (SELECT MAX(g2.start) AS start 
               FROM generator_logs  g2
               WHERE g2.site_id = g1.site_id) 
ORDER BY start DESC

Otherwise you will potentially pick up non-latest logs whose start value happens to match the latest start for a different site.

Or alternatively:

SELECT id, site_id, start, "end", duration 
FROM generator_logs g1
WHERE (site_id, start) IN (SELECT site_id, MAX(g2.start) AS start 
                           FROM generator_logs  g2
                           GROUP BY site_id)
ORDER BY start DESC
like image 85
Tony Andrews Avatar answered Dec 20 '25 05:12

Tony Andrews


I would use joins as they perform much better then "IN" clause:

select gl.id, gl.site_id, gl.start, gl."end", gl.duration 
from 
    generator_logs gl
    inner join (
        select max(start) as start, site_id
        from generator_logs 
        group by site_id
    ) gl2
        on gl.site_id = gl2.site_id
        and gl.start = gl2.start

Also as Tony pointed out you were missing correlation in your original query

like image 26
kristof Avatar answered Dec 20 '25 05:12

kristof



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!