Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sorting dataframe contents based on last column values

Tags:

python

pandas

I am trying to sort and display the contents of a file. The code I have does not produce the expected output. I have set ascending=True or ascending=False but it still does not work. Thank in advance.

sampledata.txt # Source file

 ADS43   11.468   02:45
982AS2S   5.657   02:45
K72KSU3  -3.398   02:45
 JJS7AS   3.238   02:45
 LO92SA   2.221   02:45
 22SA8A  -1.931   02:45
 ADS43   11.468   03:00
982AS2S  -5.657   03:00
K72KSU3   3.398   03:00
 JJS7AS  -2.238   03:00
 LO92SA   7.221   03:00
 111AS2 -10.756   03:00
 P352AS  -1.912   03:30
982AS2S -12.595   03:30
K72KSU3  -9.153   03:30
 JJS7AS  12.238   03:30
 LO92SA  17.221   03:30
 111AS2 -13.756   03:30

Current Code: #output is not properly sorted based on the last column values

data = {}
for row in open('sampledata.txt'):
    cols = row.rstrip().split()
    if cols[2] not in data:
        data[cols[2]] = {}
    data[cols[2]][cols[0]] = cols[1]
    
df = pd.DataFrame(data)
df2 = df.sort_values(by=[df.columns[-1]], ascending=False)
print (df2)

Current Output: #- Based on sampledata.txt

          02:45    03:00    03:30
LO92SA    2.221    7.221   17.221
JJS7AS    3.238   -2.238   12.238
K72KSU3  -3.398    3.398   -9.153
111AS2      NaN  -10.756  -13.756
982AS2S   5.657   -5.657  -12.595
P352AS      NaN      NaN   -1.912
ADS43    11.468   11.468      NaN
22SA8A   -1.931      NaN      NaN

Intended Output: #- sorted based on the last column from largest to smallest

          02:45    03:00    03:30
LO92SA    2.221    7.221   17.221
JJS7AS    3.238   -2.238   12.238
P352AS      NaN      NaN   -1.912
K72KSU3  -3.398    3.398   -9.153
982AS2S   5.657   -5.657  -12.595
111AS2      NaN  -10.756  -13.756
ADS43    11.468   11.468      NaN
22SA8A   -1.931      NaN      NaN
like image 254
DarCoy Avatar asked Dec 12 '25 16:12

DarCoy


2 Answers

Try:

(dfe:=df.pivot(index=0, columns=2, values=1)).sort_values(by=dfe.columns[-1], ascending=False)

Output:

2         02:45   03:00   03:30
0                              
LO92SA    2.221   7.221  17.221
JJS7AS    3.238  -2.238  12.238
P352AS      NaN     NaN  -1.912
K72KSU3  -3.398   3.398  -9.153
982AS2S   5.657  -5.657 -12.595
111AS2      NaN -10.756 -13.756
22SA8A   -1.931     NaN     NaN
ADS43    11.468  11.468     NaN
like image 106
Scott Boston Avatar answered Dec 14 '25 06:12

Scott Boston


I've executed your code near-verbatim and don't receive the error you report.

data = {}
for row in ''' ADS43   11.468   02:45
982AS2S   5.657   02:45
K72KSU3  -3.398   02:45
 JJS7AS   3.238   02:45
 LO92SA   2.221   02:45
 22SA8A  -1.931   02:45
 ADS43   11.468   03:00
982AS2S  -5.657   03:00
K72KSU3   3.398   03:00
 JJS7AS  -2.238   03:00
 LO92SA   7.221   03:00
 111AS2 -10.756   03:00
 P352AS  -1.912   03:30
982AS2S -12.595   03:30
K72KSU3  -9.153   03:30
 JJS7AS  12.238   03:30
 LO92SA  17.221   03:30
 111AS2 -13.756   03:30'''.split('\n'):
    cols = row.rstrip().split()
    if cols[2] not in data:
        data[cols[2]] = {}
    data[cols[2]][cols[0]] = cols[1]

df = pd.DataFrame(data)
df.sort_values(by=df.columns[-1], ascending=True)

That yields this (mis-sorted) data frame but the problem with the sort here is not the same problem as that which you report in your OP (as of this version), which seems merely to be the unsorted version of df. The aetiology of the issue is that the data are being sorted as strings and not as numbers.

          02:45    03:00    03:30
P352AS      NaN      NaN   -1.912
982AS2S   5.657   -5.657  -12.595
111AS2      NaN  -10.756  -13.756
K72KSU3  -3.398    3.398   -9.153
JJS7AS    3.238   -2.238   12.238
LO92SA    2.221    7.221   17.221
ADS43    11.468   11.468      NaN
22SA8A   -1.931      NaN      NaN

I think a proper way to solve this problem would be just to run the following. It loads the long form data in numeric format and then pivots (I just prefer stack and unstack) the data into position.

df0 = pd.read_csv(YOUR_FILE, sep='\s+', header=None, index_col=0)
d = df0.set_index(2, append=True)[1].unstack()  # gets rid of multi-index and names
d.columns.name = None
d.index.name = None
d.sort_values(by=d.columns[-1], ascending=False)

That yields what you seem to want:

          02:45   03:00   03:30
LO92SA    2.221   7.221  17.221
JJS7AS    3.238  -2.238  12.238
P352AS      NaN     NaN  -1.912
K72KSU3  -3.398   3.398  -9.153
982AS2S   5.657  -5.657 -12.595
111AS2      NaN -10.756 -13.756
22SA8A   -1.931     NaN     NaN
ADS43    11.468  11.468     NaN
like image 42
ifly6 Avatar answered Dec 14 '25 05:12

ifly6