Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Retrieve last known value for each column of a row

Not sure about the correct words to ask this question, so I will break it down.

I have a table as follows:

date_time | a | b | c

Last 4 rows:

15/10/2013 11:45:00 | null   | 'timtim' | 'fred'
15/10/2013 13:00:00 | 'tune' | 'reco'   | null
16/10/2013 12:00:00 | 'abc'  | null     | null
16/10/2013 13:00:00 | null   | 'died'   | null

How would I get the last record but with the value ignoring the null and instead get the value from the previous record.

In my provided example the row returned would be

16/10/2013 13:00:00 | 'abc' | 'died' | 'fred'

As you can see if the value for a column is null then it goes to the last record which has a value for that column and uses that value.

This should be possible, I just cant figure it out. So far I have only come up with:

select 
    last_value(a) over w a
from test
WINDOW w AS (
    partition by a
    ORDER BY ts asc
    range between current row and unbounded following
    );

But this only caters for a single column ...

like image 792
cghrmauritius Avatar asked Sep 18 '25 19:09

cghrmauritius


2 Answers

Order of rows

The "last row" and the sort order need to be defined unambiguously. There is no natural order in a set (or a table). I assume ORDER BY ts, where ts is the timestamp column.
Like Jorge pointed out in his comment: If ts is not UNIQUE, we need to add tiebreaker(s) to ORDER BY to make the sort order deterministic. The primary key serves nicely.

General solution with window functions

To get a result for every row:

SELECT ts
     , max(a) OVER (PARTITION BY grp_a) AS a
     , max(b) OVER (PARTITION BY grp_b) AS b
     , max(c) OVER (PARTITION BY grp_c) AS c
FROM (
   SELECT *
        , count(a) OVER (ORDER BY ts) AS grp_a
        , count(b) OVER (ORDER BY ts) AS grp_b
        , count(c) OVER (ORDER BY ts) AS grp_c
   FROM tbl
   ) sub;

How?

The aggregate function count() ignores NULL values when counting. Used as aggregate-window function, it computes the running count of a column according to the default window definition, which is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. NULL values don't increase the count, so these rows fall into the same peer group as the last non-null value.
In a second window function, the only non-null value per group is easily extracted with max() or min().

Just the last row

WITH cte AS (
   SELECT *
        , count(a) OVER w AS grp_a
        , count(b) OVER w AS grp_b
        , count(c) OVER w AS grp_c
   FROM   tbl
   WINDOW w AS (ORDER BY ts)
   ) 
SELECT ts
     , max(a) OVER (PARTITION BY grp_a) AS a
     , max(b) OVER (PARTITION BY grp_b) AS b
     , max(c) OVER (PARTITION BY grp_c) AS c
FROM   cte
ORDER  BY ts DESC
LIMIT  1;

Simple alternatives for just the last row

SELECT ts
     , COALESCE(a, (SELECT a FROM tbl WHERE a IS NOT NULL ORDER BY ts DESC LIMIT 1)) AS a
     , COALESCE(b, (SELECT b FROM tbl WHERE b IS NOT NULL ORDER BY ts DESC LIMIT 1)) AS b
     , COALESCE(c, (SELECT c FROM tbl WHERE c IS NOT NULL ORDER BY ts DESC LIMIT 1)) AS c
FROM   tbl
ORDER  BY ts DESC
LIMIT  1;

Or:

SELECT (SELECT ts FROM tbl                     ORDER BY ts DESC LIMIT 1) AS ts
     , (SELECT a  FROM tbl WHERE a IS NOT NULL ORDER BY ts DESC LIMIT 1) AS a
     , (SELECT b  FROM tbl WHERE b IS NOT NULL ORDER BY ts DESC LIMIT 1) AS b
     , (SELECT c  FROM tbl WHERE c IS NOT NULL ORDER BY ts DESC LIMIT 1) AS c

fiddle
Old sqlfiddle

Performance

While this should be decently fast, if performance is your paramount requirement, consider a plpgsql function. Start with the last row and loop descending until you have a non-null value for every column required. Along these lines:

  • GROUP BY and aggregate sequential numeric values
like image 115
Erwin Brandstetter Avatar answered Sep 20 '25 10:09

Erwin Brandstetter


Here I create an aggregation function that collects columns into arrays. Then it is just a matter of removing the NULLs and selecting the last element from each array.

Sample Data

CREATE TABLE T (
    date_time timestamp,
    a text,
    b text,
    c text
);

INSERT INTO T VALUES ('2013-10-15 11:45:00', NULL, 'timtim', 'fred'),
('2013-10-15 13:00:00', 'tune', 'reco', NULL  ),
('2013-10-16 12:00:00', 'abc', NULL, NULL     ),
('2013-10-16 13:00:00', NULL, 'died', NULL    );

Solution

CREATE AGGREGATE array_accum (anyelement)
(
    sfunc = array_append,
    stype = anyarray,
    initcond = '{}'
);

WITH latest_nonull AS (
    SELECT MAX(date_time) As MaxDateTime, 
           array_remove(array_accum(a), NULL) AS A, 
           array_remove(array_accum(b), NULL) AS B, 
           array_remove(array_accum(c), NULL) AS C
    FROM T
    ORDER BY date_time
)
SELECT MaxDateTime, A[array_upper(A, 1)], B[array_upper(B,1)], C[array_upper(C,1)]
FROM latest_nonull;

Result

     maxdatetime     |  a  |  b   |  c
---------------------+-----+------+------
 2013-10-16 13:00:00 | abc | died | fred
(1 row)
like image 37
Matthew Plourde Avatar answered Sep 20 '25 09:09

Matthew Plourde