I am trying to take a column of original prices and enter a discount % and return the closest match to a predetermined set of values. These allowable values are found in another table that is just one column of prices. I am curious to hear how ties would be handled. Please note that this is for a long list of items, so this would have to apply to an entire column. The specific syntax needed is Google Big Query.
I envision this functioning similarly to excel's VLOOKUP approximate = 1. In practice, I will apply the same solution to multiple price points in the results table (ex. origPrice, 25%off, 50%off, and 75%off etc. ), but I figured that I could copy-paste the solution multiple times.
The below example shows a 50% price reduction.
allowableDiscounts
| discountPrice |
|---|
| $51.00 |
| $48.50 |
| $40.00 |
productInfo
| Item | OrigPrice |
|---|---|
| Apple | $100.00 |
| Banana | $ 98.00 |
Desired Output
| Item | OrigPrice | exact50off | closestMatch |
|---|---|---|---|
| Apple | $100.00 | $50.00 | $51.00 |
| Banana | $ 98.00 | $44.00 | $40.00 |
I have researched solutions here and elsewhere. Most of what I found suggested sorting the allowableDiscounts table by the absolute value of the difference between exact50off and discountPrice. That worked great for one instance, but I could not figure out how to apply that to an entire list of prices.
I have workarounds both in SQL and excel that can accomplish the same task manually, but I am looking for something to match the above function so that way if the allowableDiscounts table changes, the calculations will reflect that without recoding.
SELECT
p.Item,
p.OrigPrice,
p.OrigPrice * 0.5 AS exact50off
--new code from allowableDiscounts.discountPrice
FROM
productInfo AS p
WHERE
--filters applied as needed
You may work it out with a CROSS JOIN, then compute the smallest difference and filter out the other generated records (with higher differences).
Smallest difference here is retrieved by assigning a rank to all differences in each partition <Item, OrigPrice> (with ROW_NUMBER), then all values ranked higher than 1 are discarded.
WITH cte AS (
SELECT *,
OrigPrice*0.5 AS exact50off,
ROW_NUMBER() OVER(PARTITION BY Item, OrigPrice ORDER BY ABS(discountPrice - OrigPrice*0.5)) AS rn
FROM productInfo
CROSS JOIN allowableDiscounts
)
SELECT Item,
OrigPrice,
exact50off,
discountPrice
FROM cte
WHERE rn = 1
In case the tables are large, as you stated, a cross join is not possible and a window function is the only solution.
First we generate a function nearest, which return the element (x or y) closest to a target value.
Then we define both tables, discountPrice and productInfo. Next, we union these tables as helper. The first column tmp holds the value 1, if the data is from the main table productInfo and we calculate the column exact50off. For the table discountPrice the tmp column in set to 0 and the exact50off column is filled with the entries discountPrice. We add the table discountPrice again, but for column exact75off.
We query the helper table and use:
last_value(if(tmp=0,exact50off,null) ignore nulls) over (order by exact50off),
tmp=0 : Keep only entries from the table discountPricelast_value get nearest lowest value from table discountPriceWe run the same again, but with desc to obtain the nearest highest value.
The function nearest yields the nearest values of both.
Analog this is done for exact75off
create temp function nearest(target any type,x any type, y any type) as (if(abs(target-x)>abs(target-y),y,x) );
with allowableDiscounts as (select * from unnest([51,48.5,40,23,20]) as discountPrice ),
productInfo as (select "Apple" as item, 100 as OrigPrice union all select "Banana",98 union all select "Banana cheap",88),
helper as (
select 1 as tmp, # this column holds the info from which table the rows come forme
item,OrigPrice, # all columns of the table productInfo (2)
OrigPrice/2 as exact50off, # calc 50%
OrigPrice*0.25 as exact75off, # calc 75%
from productInfo
union all # table for 50%
select 0 as tmp,
null,null, # (2) null entries, because the table productInfo has two columns (2)
discountPrice as exact50off, #possible values for 50% off
null # other calc (75%)
from allowableDiscounts
union all # table for 75%
select 0 as tmp,
null,null, # (2) null entries, because the table productInfo has two columns (2)
null, # other calc (50%)
discountPrice, #possible values for 75% off
from allowableDiscounts
)
select *,
nearest(exact50off,
last_value(if(tmp=0,exact50off,null) ignore nulls) over (order by exact50off),
last_value(if(tmp=0,exact50off,null) ignore nulls) over (order by exact50off desc)
) as closestMatch50off,
nearest(exact75off,
last_value(if(tmp=0,exact75off,null) ignore nulls) over (order by exact75off),
last_value(if(tmp=0,exact75off,null) ignore nulls) over (order by exact75off desc)
) as closestMatch75off,
from helper
qualify tmp=1
order by exact50off
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With