I am currently using this method to sort letters in a string alphabetically in PostgreSQL. Are there any other efficient methods?
select string_agg(c, '') as s
from (select unnest(regexp_split_to_array('ijsAafhareDbv', '')) as c
order by c) as t;
s
--------------
ADaabefhijrsv
I created 3 functions, one using my query, another using Laurenz's query and one more: I created a Python(plpythonu) function for sorting. Finally, I created a table with 100000 rows( I did it from my Mac Laptop for now )
with each containing a random 15 character string generated using the random_string function in this Link
create table t as select random_string(15) as s FROM generate_series(1,100000);
Here are the 3 functions.
CREATE or REPLACE FUNCTION sort1(x TEXT) RETURNS TEXT AS $$
select string_agg(c, '') as s
from (select unnest(regexp_split_to_array($1, '')) as c
order by c) as t;
$$ LANGUAGE SQL IMMUTABLE;
CREATE or REPLACE FUNCTION sort2(x TEXT) RETURNS TEXT AS $$
WITH t(s) AS (VALUES ($1))
SELECT string_agg(substr(t.s, g.g, 1), ''
ORDER BY substr(t.s, g.g, 1)
)
FROM t
CROSS JOIN LATERAL generate_series(1, length(t.s)) g;
$$ LANGUAGE SQL IMMUTABLE;
create language plpythonu;
CREATE or REPLACE FUNCTION pysort(x text)
RETURNS text
AS $$
return ''.join(sorted(x))
$$ LANGUAGE plpythonu IMMUTABLE;
These are the results from EXPLAIN ANALYSE for all three.
knayak=# EXPLAIN ANALYSE select sort1(s) FROM t;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Seq Scan on t (cost=0.00..26541.00 rows=100000 width=32) (actual time=0.266..7097.740 rows=100000 loops=1)
Planning time: 0.119 ms
Execution time: 7106.871 ms
(3 rows)
knayak=# EXPLAIN ANALYSE select sort2(s) FROM t;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Seq Scan on t (cost=0.00..26541.00 rows=100000 width=32) (actual time=0.418..7012.935 rows=100000 loops=1)
Planning time: 0.270 ms
Execution time: 7021.587 ms
(3 rows)
knayak=# EXPLAIN ANALYSE select pysort(s) FROM t;
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Seq Scan on t (cost=0.00..26541.00 rows=100000 width=32) (actual time=0.060..389.729 rows=100000 loops=1)
Planning time: 0.048 ms
Execution time: 395.760 ms
(3 rows)
From this analysis, it turns out - Python sort was the fastest and no considerable differences between the first 2. Need to check the performance in real-time for huge tables in our systems though.
If you want a solution without regular expressions, you could use this:
WITH t(s) AS (VALUES ('amfjwzeils'))
SELECT string_agg(substr(t.s, g.g, 1), ''
ORDER BY substr(t.s, g.g, 1)
)
FROM t
CROSS JOIN LATERAL generate_series(1, length(t.s)) g;
string_agg
------------
aefijlmswz
(1 row)
I would benchmark which solution is faster.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With