
How do I use the MAX function over three tables?

Tags: sql, postgresql

I have a problem with an SQL query.

It's about getting weather data for German cities. I'm using PostgreSQL and have 4 tables: staedte (the cities, with primary key loc_id), gehoert_zu (maps each city key to the key of the weather station closest to that city, stations_id), wettermessung (all the weather measurements plus the station's key) and wetterstation (the station's key and location).

Here is what the tables look like:

wetterstation
s_id[PK]   standort    lon    lat    hoehe 
----------------------------------------
10224      Bremen      53.05  8.8    4


wettermessung
stations_id[PK]    datum[PK]     max_temp_2m   ......
----------------------------------------------------
10224              2013-3-24     -0.4


staedte
loc_id[PK]    name    lat    lon
-------------------------------
15            Asch    48.4   9.8


gehoert_zu
loc_id[PK]    stations_id[PK]
-----------------------------
15            10224

What I'm trying to do is get the name of the city with (for example) the highest temperature in a specified period (a whole month, or a single day). Since the weather data is bound to a station, I actually need to get the station's ID and then just pick one of the cities that belong to this station. A possible question would be: "In which city was it hottest in June?" If, say, the highest temperature was measured at station number 10224, I want to get the city Asch as a result. What I've got so far is this:

SELECT name, MAX (max_temp_2m)
FROM wettermessung, staedte, gehoert_zu 
WHERE wettermessung.stations_id = gehoert_zu.stations_id
    AND gehoert_zu.loc_id = staedte.loc_id 
    AND wettermessung.datum BETWEEN '2012-8-1' AND '2012-12-1' 
GROUP BY name
ORDER BY MAX (max_temp_2m) DESC 
LIMIT 1

There are two problems with the results: 1) It's taking way too long. The tables are not that big (the cities table has about 70k entries), but the query needs between 1 and 7 minutes to finish (depending on the time span). 2) It ALWAYS produces the same city, and I'm pretty sure it's not the right one either.

I hope I managed to explain my problem clearly enough and I'd be happy for any kind of help. Thanks in advance ! :D

smeshko asked Jan 23 '26 02:01


1 Answer

If you want to get the maximum temperature per city, use this statement:

SELECT * FROM (
   SELECT gz.loc_id, MAX(max_temp_2m) as temperature
      FROM wettermessung as wm
      INNER JOIN gehoert_zu as gz
         ON wm.stations_id = gz.stations_id
      WHERE wm.datum BETWEEN '2012-8-1' AND '2012-12-1' 
      GROUP BY gz.loc_id) as subselect
   INNER JOIN staedte as std
      ON std.loc_id = subselect.loc_id
      ORDER BY subselect.temperature DESC

Use this statement to get the city with the highest temperature (only 1 city):

SELECT * FROM(
   SELECT name, MAX(max_temp_2m) as temp
   FROM wettermessung as wm
   INNER JOIN gehoert_zu as gz
      ON wm.stations_id = gz.stations_id
   INNER JOIN staedte as std
      ON gz.loc_id = std.loc_id
   WHERE wm.datum BETWEEN '2012-8-1' AND '2012-12-1' 
   GROUP BY name
   ORDER BY MAX(max_temp_2m) DESC 
   LIMIT 1) as subselect
ORDER BY temp desc
LIMIT 1
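
A possible alternative, following the approach described in the question (find the hottest station first, then pick one of its cities), is sketched below. It is only a sketch under some assumptions: that max_temp_2m can contain NULLs (in PostgreSQL, NULLs sort first in descending order, so a NULL row could otherwise win the LIMIT 1), and that taking the alphabetically first city of the station is an acceptable way to "just pick one". The CTE name hottest is made up for illustration.

```sql
-- Step 1: find the single hottest measurement in the period.
-- No city join is needed yet, so only wettermessung is scanned.
WITH hottest AS (
    SELECT stations_id, max_temp_2m
    FROM wettermessung
    WHERE datum BETWEEN '2012-8-1' AND '2012-12-1'
      AND max_temp_2m IS NOT NULL      -- assumption: the column may be NULL
    ORDER BY max_temp_2m DESC
    LIMIT 1
)
-- Step 2: map that one station to one of its cities.
SELECT std.name, h.max_temp_2m AS temperature
FROM hottest AS h
INNER JOIN gehoert_zu AS gz
   ON gz.stations_id = h.stations_id
INNER JOIN staedte AS std
   ON std.loc_id = gz.loc_id
ORDER BY std.name                      -- arbitrary but repeatable choice
LIMIT 1;
```

Because the city tables are joined only to the one winning row, the join work no longer grows with the number of cities per station. If several stations tie for the highest temperature, this returns a city for only one of them.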

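Also, prefer explicit joins (LEFT, RIGHT, INNER JOIN) over a comma-separated table list in the FROM clause. PostgreSQL normally plans an implicit inner join the same way as an explicit one, so this alone is unlikely to speed up the query, but keeping each join condition next to its table makes a missing condition (and the accidental cross join it causes) much easier to spot.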
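
For the slow runtime reported in the question, it may also be worth checking the indexes. The sketch below assumes that the primary keys are defined in the column order shown in the tables above and that no other indexes exist yet. Neither is confirmed by the question, and the index names are made up.

```sql
-- Hypothetical index names; inspect the existing ones first with
-- \d wettermessung and \d gehoert_zu in psql.

-- Assumed primary key (stations_id, datum) leads with stations_id, so it
-- helps little with a pure date-range filter. An index on datum may.
CREATE INDEX idx_wettermessung_datum ON wettermessung (datum);

-- Assumed primary key (loc_id, stations_id) leads with loc_id, so lookups
-- by stations_id may benefit from a separate index.
CREATE INDEX idx_gehoert_zu_stations_id ON gehoert_zu (stations_id);

-- Run the query with EXPLAIN ANALYZE to see the actual plan and timings
-- and to confirm whether the planner uses the new indexes.
EXPLAIN ANALYZE
SELECT gz.loc_id, MAX(wm.max_temp_2m) AS temperature
FROM wettermessung AS wm
INNER JOIN gehoert_zu AS gz
   ON wm.stations_id = gz.stations_id
WHERE wm.datum BETWEEN '2012-8-1' AND '2012-12-1'
GROUP BY gz.loc_id;
```

Whether the planner actually picks these indexes depends on the data distribution, so the EXPLAIN ANALYZE output is the thing to trust here.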

5im answered Jan 25 '26 20:01


