Unordered sets as columns / Python's frozenset object

Question

I am trying to replicate the following pattern (shown in Python) in SAS:

>>> df = pd.DataFrame({'a': list('abc'), 'b': list('cba')})
>>> df
   a  b
0  a  c
1  b  b
2  c  a
>>> df['key'] = [frozenset([x, y]) for x, y in df[['a', 'b']].values]
>>> df
   a  b     key
0  a  c  (a, c)
1  b  b     (b)
2  c  a  (a, c)
>>> df.drop_duplicates('key')
   a  b     key
0  a  c  (a, c)
1  b  b     (b)

Effectively, I have two columns, a and b. This table was produced via a self-join, and a and b represent the same canonical column. I need to drop duplicates on the combined/concatenated a and b columns without preserving the ordering of the values within those columns. I can do this very easily in Python, using the frozenset unordered set collection - see above.

I thought concatenating the columns then sorting might work, but in reality my columns are much more complex than single-letter cells, and the comment from this article

Hmm. "Tim Mott" and "Tom Mitt" will not match in name form, but will match in sorted anagram form. That could be interesting.

Made it clear that's not what I want to be doing.

Any tips on how to replicate this in SAS?

Edit: Ultimately, this is the view I'm trying to create, basically counting records by that unordered key:

>>> df.groupby('key').size()
key
(a, c)    2
(b)       1
dtype: int64

Tom · Accepted Answer

If you are just looking at two variables it is pretty simple. Just combine them with the smaller value first.

data have ;
  input (var1-var2) ($);
cards;
a  c
b  b
c  a
;

proc sql ;
create table want as
select
  case when var1=var2 then var1
       when var1<var2 then catx(',',var1,var2)
       else catx(',',var2,var1)
  end as key
  , count(*) as size
from have
group by 1
;
quit;

Result:

Obs    key    size

 1     a,c      2
 2     b        1

Unordered sets as columns / Python's frozenset object

Tags:

python

sas

blacksite

1 Answers

Tom

Recent Activity

Donate For Us

Unordered sets as columns / Python's frozenset object

Tags:

python

sas

blacksite

1 Answers

Tom

Related questions

Recent Activity

Donate For Us