Are Views that have fewer columns faster, do they perform better?
In this example CREATE VIEW, the SELECT has only the columns for which the project has an immediate need (two of six columns from table "d").
CREATE VIEW data_with_form_id AS (
SELECT
e.form_id, d.fld, d.val
I don't want to be shortsighted; I could easily add the meta columns:
CREATE VIEW data_with_form_id AS (
SELECT
e.form_id, d.*
But what does that do to performance of the view?
I searched around for "postgresql view performance" with a few terms for select and columns, but just searching for "postgresql" and "performance" leads to so many results that finding anything about view performance of number of columns in SELECT is needle-in-(field of many)-haystack proportions.
Table definitions + sample data:
\i tmp.sql
CREATE table eee
( form_id SERIAL NOT NULL PRIMARY KEY
, payload char (500)
);
INSERT INTO eee(payload)
SELECT 'payload_'|| gs::text
FROM generate_series(1,100) gs;
CREATE table ddd
( id SERIAL NOT NULL PRIMARY KEY
, form_id SERIAL NOT NULL REFERENCES eee(form_id)
, fld char (100)
, val char (200)
, trash char (400)
, filth char (800)
);
CREATE INDEX ON ddd(form_id);
INSERT INTO ddd(form_id, fld,val,trash,filth)
SELECT eee.form_id
, 'fld_'|| gs::text
, 'val_'|| gs::text
, 'trash_'|| gs::text
, 'filth_'|| gs::text
FROM eee
JOIN generate_series(1,10) gs ON random() < 0.3
;
VACUUM ANALYZE eee;
VACUUM ANALYZE ddd;
Two views:
CREATE VIEW v1 AS
SELECT e.form_id
, d.fld, d.val
FROM eee e
JOIN ddd d ON d.form_id = e.form_id
;
CREATE VIEW v0 AS
SELECT e.payload
, d.*
FROM eee e
JOIN ddd d ON d.form_id = e.form_id
;
Let's try them:
\echo v1 complete
EXPLAIN SELECT * FROM v1 ;
\echo v1 three fields
EXPLAIN SELECT form_id,fld, val FROM v1 ;
\echo v0 complete
EXPLAIN SELECT * FROM v0 ;
\echo v0 three fields
EXPLAIN SELECT form_id,fld, val FROM v0 ;
\echo v0 four fields
EXPLAIN SELECT form_id,fld, val,trash FROM v0 ;
Output:
DROP SCHEMA
CREATE SCHEMA
SET
CREATE TABLE
INSERT 0 100
CREATE TABLE
CREATE INDEX
INSERT 0 309
VACUUM
VACUUM
CREATE VIEW
CREATE VIEW
v1 complete
QUERY PLAN
-----------------------------------------------------------------------------------------
Hash Join (cost=5.09..71.03 rows=309 width=309)
Hash Cond: (d.form_id = e.form_id)
-> Seq Scan on ddd d (cost=0.00..65.09 rows=309 width=309)
-> Hash (cost=3.84..3.84 rows=100 width=4)
-> Index Only Scan using eee_pkey on eee e (cost=0.14..3.84 rows=100 width=4)
(5 rows)
v1 three fields
QUERY PLAN
-----------------------------------------------------------------------------------------
Hash Join (cost=5.09..71.03 rows=309 width=309)
Hash Cond: (d.form_id = e.form_id)
-> Seq Scan on ddd d (cost=0.00..65.09 rows=309 width=309)
-> Hash (cost=3.84..3.84 rows=100 width=4)
-> Index Only Scan using eee_pkey on eee e (cost=0.14..3.84 rows=100 width=4)
(5 rows)
v0 complete
QUERY PLAN
---------------------------------------------------------------------
Hash Join (cost=9.25..75.18 rows=309 width=2025)
Hash Cond: (d.form_id = e.form_id)
-> Seq Scan on ddd d (cost=0.00..65.09 rows=309 width=1521)
-> Hash (cost=8.00..8.00 rows=100 width=508)
-> Seq Scan on eee e (cost=0.00..8.00 rows=100 width=508)
(5 rows)
v0 three fields
QUERY PLAN
-----------------------------------------------------------------------------------------
Hash Join (cost=5.09..71.03 rows=309 width=309)
Hash Cond: (d.form_id = e.form_id)
-> Seq Scan on ddd d (cost=0.00..65.09 rows=309 width=309)
-> Hash (cost=3.84..3.84 rows=100 width=4)
-> Index Only Scan using eee_pkey on eee e (cost=0.14..3.84 rows=100 width=4)
(5 rows)
v0 four fields
QUERY PLAN
-----------------------------------------------------------------------------------------
Hash Join (cost=5.09..71.03 rows=309 width=713)
Hash Cond: (d.form_id = e.form_id)
-> Seq Scan on ddd d (cost=0.00..65.09 rows=309 width=713)
-> Hash (cost=3.84..3.84 rows=100 width=4)
-> Index Only Scan using eee_pkey on eee e (cost=0.14..3.84 rows=100 width=4)
(5 rows)
Now look at t the "width" columns in the plans: they differ, depending not only on the table and view definitions, but also on the final query.
This is because, in postgres, a view is a kind of macro: it is merged into the query plan **before* any optimization takes place. The optimiser than later removes the unreferenced columns from the plan, resulting on a reduced rowsize for the result.
The amount of data read from the base tables is of course the same: the physical tables have not changed their rowsizes.
Note for the non-insider: I purposedly used CHAR(xxx) columns to inflate the row sizes. varchar() columns would have been toasted. (:=put into secondary storage), and would NOT have inflated the rowsizes.
It certainly costs extra to fetch columns you don't need, although that cost is often negligible. But the columns fetched depend on the query that uses the view, not the view definition.
It is, however, not difficult to add new columns to a view.
To expand on the cost of fetching columns, fetching the 40th column is more expensive than fetching the second. Fetching an oversized column that is stored out of line in a TOAST table is particularly expensive.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With