I have some data in the pandas data frame as following where I converted the currency and the value earlier to the USD from CYN (Chinese Yuan)
currency port supplier_id value
0 USD CNAQG 35 118.8344
1 USD CNAQG 19 121.0082
2 USD CNAQG 49 86.9520
3 USD CNAQG 54 112.3130
4 USD CNAQG 113 113.7622
5 USD CNAQG 5 114.4868
6 USD CNAQG 55 111.5884
7 USD CNAQG 81 117.3852
8 USD CNAQG 2 111.5884
6651 USD USTPA 14 420.0000
6652 USD USTPA 56 420.0000
6653 USD USTPA 113 420.0000
6654 USD USTPA 5 500.0000
6655 USD USTPA 55 500.0000
6656 USD USTPA 193 390.0000
6657 USD USTPA 74 450.0000
6658 USD USTPA 35 420.0000
6659 USD USTPA 54 420.0000
6660 USD USTPA 231 450.0000
The df.info() prints the following,
<class 'pandas.core.frame.DataFrame'>
Int64Index: 6652 entries, 0 to 6660
Data columns (total 4 columns):
currency 6652 non-null object
port 6652 non-null object
supplier_id 6652 non-null int64
value 6652 non-null float64
dtypes: float64(1), int64(1), object(2)
memory usage: 259.8+ KB
None
The first 2 letter of the port indicates country and I have a map for that,
COUNTRIES = {
"CN": "CHINA",
"US": "USA"
}
I would like to group the data based on the country where the port is situated and the intention will be to visualize the list of values per country in a meaningful way. I appreciate any suggestion of what kind of graph would be appropriate for the job done.
I think you need pivot and then plot by plot.bar or plot:
df1 = pd.pivot(index=df['supplier_id'],
columns = df['port'].str[:2].map(COUNTRIES),
values=df['value']).fillna(0)
print (df1)
port CHINA USA
supplier_id
2 111.5884 0.0
5 114.4868 500.0
14 0.0000 420.0
19 121.0082 0.0
35 118.8344 420.0
49 86.9520 0.0
54 112.3130 420.0
55 111.5884 500.0
56 0.0000 420.0
74 0.0000 450.0
81 117.3852 0.0
113 113.7622 420.0
193 0.0000 390.0
231 0.0000 450.0
df1.plot.bar()
df1.plot()
But if error:
ValueError: Index contains duplicate entries, cannot reshape
then need pivot_table with some aggregate function mean, sum... (default function is 'mean'):
print (df)
currency port supplier_id value
0 USD CNAQG 35 118.8344
1 USD CNAQG 19 121.0082
2 USD CNAQG 49 86.9520
3 USD CNAQG 54 112.3130
4 USD CNAQG 113 113.7622
5 USD CNAQG 5 114.4868
6 USD CNAQG 55 111.5884
7 USD CNAQG 81 117.3852
8 USD CNAQG 2 111.5884
6651 USD USTPA 14 420.0000
6652 USD USTPA 56 420.0000
6653 USD USTPA 113 420.0000
6654 USD USTPA 5 500.0000
6655 USD USTPA 55 500.0000
6656 USD USTPA 193 390.0000
6657 USD USTPA 74 450.0000
6658 USD USTPA 35 420.0000
6659 USD USTPA 54 420.0000
6660 USD USTPA 231 450.0000 <-duplicates for USTPA, 231
6660 USD USTPA 231 800.0000 <-duplicates for USTPA, 231
COUNTRIES = {
"CN": "CHINA",
"US": "USA"
}
df1 = pd.pivot_table(df,
index='supplier_id',
columns = df['port'].str[:2].map(COUNTRIES),
values='value',
aggfunc=np.mean,
fill_value=0)
print (df1)
port CHINA USA
supplier_id
2 111.5884 0
5 114.4868 500
14 0.0000 420
19 121.0082 0
35 118.8344 420
49 86.9520 0
54 112.3130 420
55 111.5884 500
56 0.0000 420
74 0.0000 450
81 117.3852 0
113 113.7622 420
193 0.0000 390
231 0.0000 625 <-mean (450 + 800) /2
df1.plot.bar()
Alternative solution with groupby and mean:
df1 = df.groupby(['supplier_id', df['port'].str[:2].map(COUNTRIES)])['value']
.mean()
.unstack(fill_value=0)
print (df1)
port CHINA USA
supplier_id
2 111.5884 0.0
5 114.4868 500.0
14 0.0000 420.0
19 121.0082 0.0
35 118.8344 420.0
49 86.9520 0.0
54 112.3130 420.0
55 111.5884 500.0
56 0.0000 420.0
74 0.0000 450.0
81 117.3852 0.0
113 113.7622 420.0
193 0.0000 390.0
231 0.0000 625.0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With