I have a DataFrame as follows:
a b c d
0 0.140603 0.622511 0.936006 0.384274
1 0.246792 0.961605 0.866785 0.544677
2 0.710089 0.057486 0.531215 0.243285
I want to iterate over the df with itertuples() and print the values and column names of each row. Currently I know the following method:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(3, 4), columns=['a', 'b', 'c', 'd'])
for item in df.itertuples():
    print(item)
And the output is:
Pandas(Index=0, a=0.55464273035498401, b=0.50784779485386233, c=0.55866384351761911, d=0.35969591433338755)
Pandas(Index=1, a=0.60682158587529356, b=0.37571390304543184, c=0.13566419305411737, d=0.55807909125502775)
Pandas(Index=2, a=0.73260693374584385, b=0.59246381839030349, c=0.92102184020347211, d=0.029942550647279687)
Questions:
1) I expected each iteration to return a tuple (as the function name suggests), so why is each item printed as Pandas(...)?
2) What is the best way to extract the values of columns 'a', 'b', 'c', 'd' by name as I loop through each row?
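It's a named tuple: a tuple subclass whose fields are named after the index and the columns. Pandas is simply the default name of that class.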
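A quick way to check this is to inspect one row directly. The sketch below assumes the same random 3x4 frame as in the question; the variable name first is only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(3, 4), columns=['a', 'b', 'c', 'd'])

# Take the first row yielded by itertuples()
first = next(df.itertuples())

print(isinstance(first, tuple))  # whether the row is an instance of tuple
print(type(first).__name__)      # class name, set by the `name` argument of itertuples()
print(first._fields)             # field names: 'Index' followed by the column labels
```

So the row behaves like an ordinary tuple and additionally exposes its fields by name.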
You can access the values of the named tuple either by label:
for item in df.itertuples():
    print(item.a, item.b)
or by position (position 0 is the index, so the columns start at 1):
for item in df.itertuples():
    print(item[1], item[2])
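If you want the column names together with the values without hard-coding each label, one option is the named tuple's _asdict() method. This is a minimal sketch, assuming the frame from the question; index=False is used here so that only the columns appear, and the rows list exists only to make the result easy to inspect.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(3, 4), columns=['a', 'b', 'c', 'd'])

rows = []  # collected only so the result can be inspected afterwards
for item in df.itertuples(index=False):
    # _asdict() maps each field name to its value for this row
    pairs = item._asdict()
    rows.append(pairs)
    for column, value in pairs.items():
        print(column, value)
```

Note that this relies on the column labels being usable as field names, which is the case for 'a' to 'd'.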
On Python versions before 3.7, when the DataFrame has more than 254 columns, itertuples() returns regular tuples and the only available access is by position. To still be able to access values by label in that case, restrict df to just the columns you need:
for item in df.loc[:, ['a', 'b']].itertuples():
    print(item.a, item.b)
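As an illustration of the column-restriction approach on a wide frame, here is a sketch with a hypothetical 300-column DataFrame. The names wide, col0 ... col299, and seen are made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical wide frame: 300 columns named col0 ... col299
wide = pd.DataFrame(np.random.rand(3, 300),
                    columns=[f'col{i}' for i in range(300)])

# Select only the columns needed before iterating, so each row has few fields
seen = []
for item in wide[['col0', 'col299']].itertuples():
    seen.append((item.col0, item.col299))
    print(item.Index, item.col0, item.col299)
```

Because only two columns are passed to itertuples(), each row has three fields (Index plus the two columns), so access by label works regardless of how wide the original frame is.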