Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SAS PROC SQL: why there is a different when hard coding values within IN()?

Tags:

sas

proc-sql

I have a question about the following 2 codes in SAS PROC SQL.

Code 1: (Standard Book version)

CREATE TABLE WORK.OUTPUT AS 
SELECT 
    "CLAIM" AS SOURCE,
    a.CLAIMID, 
    a.DXCODE
FROM 
    DW.CLAIMS_BAV AS a
WHERE 
    a.SITEID = '0001'
    AND a.CLAIMID IN (SELECT CLAIMID FROM WORK.INPUT)

Code 2: (The much faster way in practice)

CREATE TABLE WORK.OUTPUT AS 
SELECT 
    "CLAIM" AS SOURCE,
    a.CLAIMID, 
    a.DXCODE
FROM 
    DW.CLAIMS_BAV AS a
WHERE 
    a.SITEID = '0001'
    AND a.CLAIMID IN ('10001', '10002', '10003', ... '15000')

When I try to do it more elegantly by using subquery in #1, the run time blows up to 50 minutes +. But the same input returns within 3 minutes using Code 2. Why is that? Note, it's just as slow using INNER JOIN too (after reading this). The input is 5000+ CLAIMID, which I manually paste into the IN('...') block everyday.

PS: The CLAIMID are made up, in real life they are random.

The CLAIMID are indexed in DW.CLAIMS. I am using SAS PROC SQL to access an Oracle database. What is going on, and is there a better way? Thanks!

like image 513
George Avatar asked Dec 03 '25 18:12

George


1 Answers

I don't know that I can tell you why SAS is so slow at the first select; something's not optimized in that scenario clearly.

If I had to guess, I'd guess that SAS is deciding in the first case that it can't use pass-through SQL and so it's downloading the whole big table and then running this SAS-side, while in the second case it's passing the query up to the SQL database and only transporting the resulting rows back.

But there are several ways to work around this, anyway. Here's one: use a macro variable to do precisely the pasting you're doing!

proc sql;
select quote(strip(claimid)) into :claimlist separated by ',' 
from work.input
;

CREATE TABLE WORK.OUTPUT AS 
SELECT 
    "CLAIM" AS SOURCE,
    a.CLAIMID, 
    a.DXCODE
FROM 
    DW.CLAIMS_BAV AS a
WHERE 
    a.SITEID = '0001'
    AND a.CLAIMID IN (&claimlist.)
;
quit;

Tada, you don't have to touch this anymore, and it's identical to the copy/paste that you did.

A few extra notes given some comments:

If CLAIMID is ever less than 15, you may have space padding, so I added strip to remove those. It doesn't matter for string comparisons - except insomuch as you might run out of macro language, and I worry that some DBMS may actually care about the padding. You can leave out strip if the 15 is a constant length.

Macro variables run up to 64K in space. If you have 15 character variable plus " " two plus comma one, you have 18 characters; you have room for a bit over 3500 values. That's under 5000, unfortunately.

In this case, you can either split up the field into two macro variables (easy enough hopefully, use obs and firstobs) or you can do some other solution.

  1. Transfer the work.input dataset into the DW libname, then do the join in SQL there.
  2. Put the contents of the claimID into a file instead of into a macro variable, and then %include that file.
  3. Use call execute to execute the whole proc SQL.

Here's one example of CALL EXECUTE.

data _null_;
  set work.input end=eof;
  if _n_=1 then do;
    call execute('CREATE TABLE WORK.OUTPUT AS 
SELECT 
    "CLAIM" AS SOURCE,
    a.CLAIMID, 
    a.DXCODE
FROM 
    DW.CLAIMS_BAV AS a
WHERE 
    a.SITEID = "0001"
    AND a.CLAIMID IN (');      *the part of the SQL query before the list of IDs;
  end;
  call execute(quote(claimID) || ' ');
  if EOF then do;
    call execute('); QUIT;');  *the part of the SQL query after the list of IDs;
  end;
run;

This would be nearly identical to the %INCLUDE solution really, except there you put that stuff to a text file instead of CALL EXECUTEing it, and then you %INCLUDE that text file.

like image 128
Joe Avatar answered Dec 07 '25 15:12

Joe



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!