I want to calculate the sum by var1. Could you use two methods to do the calculation. SQL and data step with if first.var1.
data have;
input var1 var2$ var3;
datalines;
1 a 3
1 a 4
1 a 3
2 b 5
2 b 3
3 c 1
;
run;
data want;
input var1 var2 $ var3 sum_by_var1;
datalines;
1 a 3 10
1 a 4 10
1 a 3 10
2 b 5 9
2 b 3 9
3 c 1 9
;
run;
my two ways:
The code below works on this small data set, but I wonder if it will work on large data sets because it is hard to check the results.
proc sql;
create table new as
select
*
,sum(var3) as sum_by_var1
from have
group by var1
order by var1
;
run;
The code below doesn't work
data new2;
set have;
by var1;
if first.var1 then
by_var1 + var3;
run;
To fix your calculation using the data step you need to use:
Retain keyword to calculate the sum by var1,Output keyword to output only once sum by var1 is calculated; that's when reaching the last observation for var1,Fix:
data new2;
set have;
by var1;
retain sum_by_var1;
if first.var1 then do; sum_by_var1=0; end;
sum_by_var1 + var3;
if last.var1 then do; output; end;
run;
Output:
var1=1 var2=a var3=3 sum_by_var1=10
var1=2 var2=b var3=3 sum_by_var1=8
var1=3 var2=c var3=1 sum_by_var1=1
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With