I have a problem with a SQL query.
It's about getting weather data for German cities. I'm using PostgreSQL and have 4 tables: staedte (the cities, with primary key loc_id), gehoert_zu (maps each city key to the key of the weather station closest to that city, stations_id), wettermessung (all the weather measurements, together with the station's key) and wetterstation (the station's key and location).
Here is what the tables look like:
wetterstation
s_id[PK]  standort  lon    lat  hoehe
-------------------------------------
10224     Bremen    53.05  8.8  4

wettermessung
stations_id[PK]  datum[PK]  max_temp_2m  ......
-----------------------------------------------
10224            2013-3-24  -0.4

staedte
loc_id[PK]  name  lat   lon
---------------------------
15          Asch  48.4  9.8

gehoert_zu
loc_id[PK]  stations_id[PK]
---------------------------
15          10224
What I'm trying to do is get the name of the city with (for example) the highest temperature in a given period (a whole month or a single day). Since the weather data is bound to a station, I actually need to get the station's ID and then just pick one of the cities that belong to that station. A possible question would be: "In which city was it hottest in June?" Say the highest temperature was measured at station number 10224. As a result I want to get the city Asch. What I have so far is this:
SELECT name, MAX (max_temp_2m)
FROM wettermessung, staedte, gehoert_zu
WHERE wettermessung.stations_id = gehoert_zu.stations_id
AND gehoert_zu.loc_id = staedte.loc_id
AND wettermessung.datum BETWEEN '2012-8-1' AND '2012-12-1'
GROUP BY name
ORDER BY MAX (max_temp_2m) DESC
LIMIT 1
There are two problems with the results:
1) It takes way too long. The tables are not that big (the cities table has about 70k entries), but the query needs between 1 and 7 minutes to finish, depending on the time span.
2) It ALWAYS returns the same city, and I'm pretty sure it's not the right one either.
I hope I managed to explain my problem clearly enough and I'd be happy for any kind of help. Thanks in advance ! :D
If you want to get the max temperature per city use this statement:
SELECT * FROM (
SELECT gz.loc_id, MAX(max_temp_2m) as temperature
FROM wettermessung as wm
INNER JOIN gehoert_zu as gz
ON wm.stations_id = gz.stations_id
WHERE wm.datum BETWEEN '2012-8-1' AND '2012-12-1'
GROUP BY gz.loc_id) as subselect
INNER JOIN staedte as std
ON std.loc_id = subselect.loc_id
ORDER BY subselect.temperature DESC
Use this statement to get the city with the highest temperature (only 1 city):
SELECT name, MAX(max_temp_2m) AS temp
FROM wettermessung AS wm
INNER JOIN gehoert_zu AS gz
ON wm.stations_id = gz.stations_id
INNER JOIN staedte AS std
ON gz.loc_id = std.loc_id
WHERE wm.datum BETWEEN '2012-8-1' AND '2012-12-1'
GROUP BY name
ORDER BY MAX(max_temp_2m) DESC
LIMIT 1
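The question describes a slightly different route: find the station with the highest reading first, then pick any one of its cities. A minimal sketch of that idea follows. It uses the table and column names from the question, reuses the example date range from above, and assumes that ordering by name is an acceptable way to choose among the station's cities. It is meant as an illustration, not a tuned query.

```sql
-- Sketch only: locate the single hottest measurement, then look up a city.
SELECT std.name, hottest.max_temp_2m AS temperature
FROM (
    -- One row: the hottest measurement in the period.
    SELECT wm.stations_id, wm.max_temp_2m
    FROM wettermessung AS wm
    WHERE wm.datum BETWEEN '2012-8-1' AND '2012-12-1'
      AND wm.max_temp_2m IS NOT NULL  -- with DESC, PostgreSQL sorts NULLs first
    ORDER BY wm.max_temp_2m DESC
    LIMIT 1
) AS hottest
INNER JOIN gehoert_zu AS gz
    ON gz.stations_id = hottest.stations_id
INNER JOIN staedte AS std
    ON std.loc_id = gz.loc_id
ORDER BY std.name  -- assumption: any city of the station will do, so take the first by name
LIMIT 1;
```

Because the subquery returns a single row before any join happens, only the cities of that one station are touched. If no city is mapped to the hottest station, the result is empty.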
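Prefer explicit joins (LEFT, RIGHT, INNER JOIN) over a comma-separated list of tables in the FROM clause. With explicit joins each join condition sits next to the tables it connects, so a missing condition (and the accidental cross join it causes) is much easier to spot. Note that PostgreSQL usually plans both forms the same way, so the explicit syntax by itself is unlikely to make the query faster.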
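On the running time mentioned in the question, one thing that may be worth checking is whether the date filter can use an index. The primary key of wettermessung is (stations_id, datum), so a filter on datum alone may not be able to use it efficiently. The sketch below assumes no other index on datum exists; the index name idx_wettermessung_datum is made up for illustration.

```sql
-- Show the actual plan and timings of the per-city query.
EXPLAIN ANALYZE
SELECT gz.loc_id, MAX(wm.max_temp_2m) AS temperature
FROM wettermessung AS wm
INNER JOIN gehoert_zu AS gz
    ON wm.stations_id = gz.stations_id
WHERE wm.datum BETWEEN '2012-8-1' AND '2012-12-1'
GROUP BY gz.loc_id;

-- Hypothetical index on the filter column (name chosen for this example).
CREATE INDEX idx_wettermessung_datum ON wettermessung (datum);
```

Running the EXPLAIN ANALYZE statement again after creating the index shows whether the planner actually uses it. For wide date ranges it may still prefer a sequential scan.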