df:

    name score
    A      1
    A      2
    A      3
    A      4
    A      5
    B      2
    B      4
    B      6
    B      8

I want to get a new DataFrame in the following form:

    name count mean std min 25% 50% 75% max
    A     5    3    ..  ..  ..  ..  ..  ..
    B     4    5    ..  ..  ..  ..  ..  ..

How do I extract this information from df.describe() and reformat it? Thanks
By using the + operator, you can combine (concatenate) two or more text/string columns of a pandas DataFrame. Note that when you apply the + operator to numeric columns, it performs addition instead of concatenation.
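A minimal sketch of both behaviours of +, using a small hypothetical DataFrame (the column names first, last, x and y are illustrative only, not from the question). Converting numbers with astype(str) first is one way to concatenate them as text instead of adding them.

```python
import pandas as pd

# Hypothetical example data, not from the original question.
people = pd.DataFrame({
    "first": ["Ada", "Alan"],
    "last": ["Lovelace", "Turing"],
    "x": [1, 2],
    "y": [10, 20],
})

# String columns: + concatenates element-wise.
people["full_name"] = people["first"] + " " + people["last"]

# Numeric columns: + adds element-wise.
people["total"] = people["x"] + people["y"]

# To join numbers as text, convert them to str before using +.
people["xy_text"] = people["x"].astype(str) + people["y"].astype(str)

print(people[["full_name", "total", "xy_text"]])
```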
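The describe() method returns a description of the data in the DataFrame. If the DataFrame contains numerical data, the description contains the following information for each column:
count - The number of non-empty (non-NaN) values.
mean - The average (mean) value.
std - The standard deviation.
min - The minimum value.
25%, 50%, 75% - The 25th, 50th (median) and 75th percentiles.
max - The maximum value.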
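To see these fields, and that count skips missing values, here is a small sketch on a Series with one NaN (the values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Illustrative data: one missing value among five entries.
scores = pd.Series([1.0, 2.0, np.nan, 4.0, 5.0], name="score")

summary = scores.describe()
print(summary)
# count is 4, not 5: the NaN is excluded from every statistic.
```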
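The describe() function generates descriptive statistics that summarize the central tendency, dispersion and shape of a dataset's distribution, excluding NaN values. Syntax: DataFrame.describe(percentiles=None, include=None, exclude=None). Parameters: percentiles - list-like of numbers between 0 and 1 giving the percentiles to report (default: the 25th, 50th and 75th); include, exclude - data types to include in or omit from the result (include='all' summarizes every column).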
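A short sketch of the percentiles and include parameters, rebuilding the question's df inline. Note that pandas always adds the median (50%) even when it is not in the percentiles list, and that include="all" adds count/unique/top/freq rows for the non-numeric name column.

```python
import pandas as pd

# The question's sample data.
df = pd.DataFrame({
    "name": ["A"] * 5 + ["B"] * 4,
    "score": [1, 2, 3, 4, 5, 2, 4, 6, 8],
})

# Default: numeric columns only, with the 25th/50th/75th percentiles.
print(df.describe())

# Custom percentiles; the 50% row is still included automatically.
custom = df.describe(percentiles=[0.1, 0.9])
print(custom)

# include="all" also summarizes the object column "name".
everything = df.describe(include="all")
print(everything)
```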
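Nothing beats a one-liner:

In [145]:

    print(df.groupby('name').describe().reset_index().pivot(index='name', values='score', columns='level_1'))

There is even a shorter one :)

    print(df.groupby('name').describe().unstack(1))

Both one-liners target older pandas versions (before 0.20), where groupby().describe() returned the statistics stacked in the row index rather than as columns.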
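On pandas 0.20 and later, groupby().describe() already returns one column per statistic, so no pivot or unstack is needed. A minimal sketch, assuming a recent pandas and rebuilding the question's df inline (the names summary, wide and flat are illustrative):

```python
import pandas as pd

# The question's sample data.
df = pd.DataFrame({
    "name": ["A"] * 5 + ["B"] * 4,
    "score": [1, 2, 3, 4, 5, 2, 4, 6, 8],
})

# Selecting the column before describe() gives one row per group
# and one flat column per statistic.
summary = df.groupby("name")["score"].describe()
print(summary)

# Without the column selection the columns form a MultiIndex such as
# ("score", "count"); selecting the outer "score" level flattens it.
wide = df.groupby("name").describe()
flat = wide["score"]
```

Selecting the column first is usually the simpler choice when only one column needs summarizing; the MultiIndex form is handy when several numeric columns are described at once.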
In[1]:

    import io

    import pandas as pd

    data = """
    name score
    A      1
    A      2
    A      3
    A      4
    A      5
    B      2
    B      4
    B      6
    B      8
    """

    # r'\s+' splits on any run of whitespace (raw string avoids an invalid-escape warning)
    df = pd.read_csv(io.StringIO(data), delimiter=r'\s+')
    print(df)
Out[1]:

      name  score
    0    A      1
    1    A      2
    2    A      3
    3    A      4
    4    A      5
    5    B      2
    6    B      4
    7    B      6
    8    B      8

A nice approach to this problem uses a generator expression (see footnote) to let pd.DataFrame() iterate over the results of groupby and construct the summary-stats DataFrame on the fly:
In[2]:

    df2 = pd.DataFrame(group.describe().rename(columns={'score': name}).squeeze()
                       for name, group in df.groupby('name'))

    print(df2)
Out[2]:

       count  mean       std  min  25%  50%  75%  max
    A      5     3  1.581139    1  2.0    3  4.0    5
    B      4     5  2.581989    2  3.5    5  6.5    8

Here squeeze() removes a length-1 dimension, converting each group's one-column summary-stats DataFrame into a Series.
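To make the squeeze step concrete, here is a sketch of what happens to a single group (group B, rebuilt by hand for illustration). Because the Series takes its name from the renamed column, pd.DataFrame() can use that name as the row label.

```python
import pandas as pd

# Group "B" from the question, rebuilt by hand for illustration.
group = pd.DataFrame({"name": ["B"] * 4, "score": [2, 4, 6, 8]})

stats = group.describe()                    # only "score" is numeric -> shape (8, 1)
renamed = stats.rename(columns={"score": "B"})
row = renamed.squeeze()                     # the length-1 column axis is dropped

print(type(stats).__name__, stats.shape)   # DataFrame (8, 1)
print(type(row).__name__, row.name)        # Series B
```

squeeze(axis="columns") would state the intent more explicitly; plain squeeze() works here because the describe() output always has 8 rows, so only the column axis has length 1.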
Footnote: A generator expression has the form my_function(a) for a in iterator, or, if the iterator yields two-element tuples (as groupby does), my_function(a, b) for a, b in iterator.
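Both forms from the footnote, in a small sketch (the names squares and group_sizes are illustrative). The second form unpacks the (key, sub-DataFrame) tuples that iterating over groupby produces, just as the df2 example does.

```python
import pandas as pd

# The question's sample data.
df = pd.DataFrame({
    "name": ["A"] * 5 + ["B"] * 4,
    "score": [1, 2, 3, 4, 5, 2, 4, 6, 8],
})

# Form 1: one loop variable; the generator is consumed by list().
squares = list(x * x for x in range(4))

# Form 2: groupby yields (key, group) tuples, unpacked into two names.
group_sizes = dict((name, len(group)) for name, group in df.groupby("name"))

print(squares)
print(group_sizes)
```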