I have a pandas dataframe with the following structure:
DF_Cell, DF_Site
C1,A
C2,A
C3,B
C4,B
C5,B
I also have a very long loop (100 million iterations) in which I process, one at a time, strings that correspond to values in the "DF_Cell" column of the DataFrame (the first iteration produces C1, the second produces C2, and so on).
Inside the loop, I would like to look up in the DataFrame the DF_Site corresponding to the cell (DF_Cell) currently being processed.
One way I could think of was to put the current cell into a one-cell DataFrame and do a left merge against it, but that is far too inefficient for data this large.
Is there a better way?
Perhaps you want to set DF_Cell as the index*:
In [10]: import pandas as pd

In [11]: df = pd.read_csv('foo.csv', index_col='DF_Cell')
# or: df.set_index('DF_Cell', inplace=True)
In [12]: df
Out[12]:
DF_Site
DF_Cell
C1 A
C2 A
C3 B
C4 B
C5 B
You can then refer to the row, or to a specific entry, using loc:
In [13]: df.loc['C1']
Out[13]:
DF_Site A
Name: C1, dtype: object
In [14]: df.loc['C1', 'DF_Site']
Out[14]: 'A'
*Since the CSV has only two columns, one of which becomes the index, you could also squeeze the remaining column into a Series for an even simpler lookup (read_csv's squeeze=True used to do this directly, but it has since been removed in favour of .squeeze('columns')).
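For example, continuing from the df above (a sketch using DataFrame.squeeze rather than the removed read_csv keyword):

In [15]: s = df.squeeze('columns')   # one remaining column -> a Series indexed by DF_Cell

In [16]: s['C1']
Out[16]: 'A'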