 

Extracting value and creating new column out of it

Tags:

pandas

I would like to extract a certain section of a URL residing in a column of a pandas DataFrame and make it a new column. This

import re

ref = df['REFERRERURL']
ref.str.findall(r"\d\d/(.*?)(;|\?)", flags=re.IGNORECASE)

returns a Series whose elements are lists of tuples (one tuple per match, holding both capture groups). How can I take out only the first part of that tuple before the Series is created, so that I can simply turn it into a column? Sample data for REFERRERURL is:

http://wap.blah.com/xxx/id/11/someproduct_step2;jsessionid=....

In this example I am interested in creating a column that only has 'someproduct_step2' in it.
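To see what the call above actually produces: with two capture groups, `str.findall` gives each row a list of `(group1, group2)` tuples. A minimal sketch, assuming a one-row frame built from the sample URL (its `....` tail kept verbatim), shows that shape and pulls out the first group of the first match with the `.str` accessor's indexing, which works on list and tuple elements as well as strings:

```python
import re

import pandas as pd

# Sample URL from the question (the "...." tail is kept verbatim)
df = pd.DataFrame(
    {"REFERRERURL": ["http://wap.blah.com/xxx/id/11/someproduct_step2;jsessionid=...."]}
)

# Two capture groups -> each row becomes a list of (group1, group2) tuples
matches = df["REFERRERURL"].str.findall(r"\d\d/(.*?)(;|\?)", flags=re.IGNORECASE)
print(matches[0])  # [('someproduct_step2', ';')]

# .str[0] picks the first match (a tuple); the second .str[0] picks its first group
df["product"] = matches.str[0].str[0]
print(df["product"][0])  # someproduct_step2
```

Rows with no match would hold an empty list, and `.str[0]` turns those into NaN rather than raising.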

Thanks,

BBSysDyn asked Dec 05 '25 14:12


2 Answers

In [24]: import re; from pandas import DataFrame, Series

In [25]: df = DataFrame([['http://wap.blah.com/xxx/id/11/someproduct_step2;jsessionid=....']],columns=['A'])

In [26]: df['A'].str.findall("\\d\\d\\/(.*?)(;|\\?)",flags=re.IGNORECASE).apply(lambda x: Series(x[0][0],index=['first']))
Out[26]: 
               first
0  someproduct_step2
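A more direct route in later pandas versions (a different method from the `findall` + `apply` one above) is `Series.str.extract`, which returns the captured text itself instead of a list of tuples. A minimal sketch, assuming the same column `A`; the second row (`http://example.com/home`) is a hypothetical URL added only to show the no-match case. The terminator is written as the character class `[;?]` so there is a single group, and `expand=False` returns that group as a Series:

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["http://wap.blah.com/xxx/id/11/someproduct_step2;jsessionid=...."],
        ["http://example.com/home"],  # hypothetical row with no match
    ],
    columns=["A"],
)

# One capture group + expand=False -> a Series of the captured text;
# rows where the pattern is not found get NaN instead of raising
df["first"] = df["A"].str.extract(r"\d\d/(.*?)[;?]", expand=False)
print(df["first"].tolist())
```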

In 0.11.1 there is also a neat way of doing this with `replace` and `regex=True`:

In [34]: df.replace({'A': r"http:.+\d\d/(.*?)(;|\?).*$"}, {'A': r'\1'}, regex=True)
Out[34]: 
                   A
0  someproduct_step2
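One behaviour of the `replace` approach worth knowing: it rewrites matching cells in place and leaves non-matching cells untouched, rather than turning them into NaN. A self-contained sketch, assuming a hypothetical second URL (`http://example.com/home`) that lacks the `\d\d/` segment:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": [
            "http://wap.blah.com/xxx/id/11/someproduct_step2;jsessionid=....",
            "http://example.com/home",  # hypothetical row with no match
        ]
    }
)

# {column: pattern} -> {column: replacement}; with regex=True the whole URL
# is substituted by the first capture group via the \1 backreference
out = df.replace({"A": r"http:.+\d\d/(.*?)(;|\?).*$"}, {"A": r"\1"}, regex=True)
print(out["A"].tolist())
# ['someproduct_step2', 'http://example.com/home']
```

If leftover full URLs in the new column would be a problem, an extraction method that yields NaN on a miss is the safer choice.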
Jeff answered Dec 10 '25 22:12


This also worked, using `re.findall` with `apply`:

import re

def extract(x):
    res = re.findall(r"\d\d/(.*?)(;|\?)", x)
    if res:
        return res[0][0]  # first match, first capture group
    return None  # no match: leave the cell empty

session['RU_2'] = session['REFERRERURL'].apply(extract)
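One caveat with this `apply` version: `re.findall` raises `TypeError` if the column contains missing values, because NaN is a float, not a string. A minimal sketch of a guarded variant, assuming the column may hold NaN; `extract_safe` and the sample `session` frame are hypothetical, and `re.search(...).group(1)` stands in for `findall(...)[0][0]` since both return the first group of the first match:

```python
import re

import numpy as np
import pandas as pd

PATTERN = re.compile(r"\d\d/(.*?)(;|\?)")

def extract_safe(x):
    # Skip non-strings such as NaN instead of letting re raise TypeError
    if not isinstance(x, str):
        return None
    m = PATTERN.search(x)
    # group(1) is the path segment; None when the URL does not match
    return m.group(1) if m else None

session = pd.DataFrame(
    {"REFERRERURL": ["http://wap.blah.com/xxx/id/11/someproduct_step2;jsessionid=....", np.nan]}
)
session["RU_2"] = session["REFERRERURL"].apply(extract_safe)
print(session["RU_2"].tolist())
```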
BBSysDyn answered Dec 10 '25 22:12