I am trying to sort and display the contents of a file. The code I have does not produce the expected output. I have set ascending=True or ascending=False but it still does not work. Thank in advance.
sampledata.txt # Source file
ADS43 11.468 02:45
982AS2S 5.657 02:45
K72KSU3 -3.398 02:45
JJS7AS 3.238 02:45
LO92SA 2.221 02:45
22SA8A -1.931 02:45
ADS43 11.468 03:00
982AS2S -5.657 03:00
K72KSU3 3.398 03:00
JJS7AS -2.238 03:00
LO92SA 7.221 03:00
111AS2 -10.756 03:00
P352AS -1.912 03:30
982AS2S -12.595 03:30
K72KSU3 -9.153 03:30
JJS7AS 12.238 03:30
LO92SA 17.221 03:30
111AS2 -13.756 03:30
Current Code: #output is not properly sorted based on the last column values
data = {}
for row in open('sampledata.txt'):
cols = row.rstrip().split()
if cols[2] not in data:
data[cols[2]] = {}
data[cols[2]][cols[0]] = cols[1]
df = pd.DataFrame(data)
df2 = df.sort_values(by=[df.columns[-1]], ascending=False)
print (df2)
Current Output: #- Based on sampledata.txt
02:45 03:00 03:30
LO92SA 2.221 7.221 17.221
JJS7AS 3.238 -2.238 12.238
K72KSU3 -3.398 3.398 -9.153
111AS2 NaN -10.756 -13.756
982AS2S 5.657 -5.657 -12.595
P352AS NaN NaN -1.912
ADS43 11.468 11.468 NaN
22SA8A -1.931 NaN NaN
Intended Output: #- sorted based on the last column from largest to smallest
02:45 03:00 03:30
LO92SA 2.221 7.221 17.221
JJS7AS 3.238 -2.238 12.238
P352AS NaN NaN -1.912
K72KSU3 -3.398 3.398 -9.153
982AS2S 5.657 -5.657 -12.595
111AS2 NaN -10.756 -13.756
ADS43 11.468 11.468 NaN
22SA8A -1.931 NaN NaN
Try:
(dfe:=df.pivot(index=0, columns=2, values=1)).sort_values(by=dfe.columns[-1], ascending=False)
Output:
2 02:45 03:00 03:30
0
LO92SA 2.221 7.221 17.221
JJS7AS 3.238 -2.238 12.238
P352AS NaN NaN -1.912
K72KSU3 -3.398 3.398 -9.153
982AS2S 5.657 -5.657 -12.595
111AS2 NaN -10.756 -13.756
22SA8A -1.931 NaN NaN
ADS43 11.468 11.468 NaN
I've executed your code near-verbatim and don't receive the error you report.
data = {}
for row in ''' ADS43 11.468 02:45
982AS2S 5.657 02:45
K72KSU3 -3.398 02:45
JJS7AS 3.238 02:45
LO92SA 2.221 02:45
22SA8A -1.931 02:45
ADS43 11.468 03:00
982AS2S -5.657 03:00
K72KSU3 3.398 03:00
JJS7AS -2.238 03:00
LO92SA 7.221 03:00
111AS2 -10.756 03:00
P352AS -1.912 03:30
982AS2S -12.595 03:30
K72KSU3 -9.153 03:30
JJS7AS 12.238 03:30
LO92SA 17.221 03:30
111AS2 -13.756 03:30'''.split('\n'):
cols = row.rstrip().split()
if cols[2] not in data:
data[cols[2]] = {}
data[cols[2]][cols[0]] = cols[1]
df = pd.DataFrame(data)
df.sort_values(by=df.columns[-1], ascending=True)
That yields this (mis-sorted) data frame but the problem with the sort here is not the same problem as that which you report in your OP (as of this version), which seems merely to be the unsorted version of df. The aetiology of the issue is that the data are being sorted as strings and not as numbers.
02:45 03:00 03:30
P352AS NaN NaN -1.912
982AS2S 5.657 -5.657 -12.595
111AS2 NaN -10.756 -13.756
K72KSU3 -3.398 3.398 -9.153
JJS7AS 3.238 -2.238 12.238
LO92SA 2.221 7.221 17.221
ADS43 11.468 11.468 NaN
22SA8A -1.931 NaN NaN
I think a proper way to solve this problem would be just to run the following. It loads the long form data in numeric format and then pivots (I just prefer stack and unstack) the data into position.
df0 = pd.read_csv(YOUR_FILE, sep='\s+', header=None, index_col=0)
d = df0.set_index(2, append=True)[1].unstack() # gets rid of multi-index and names
d.columns.name = None
d.index.name = None
d.sort_values(by=d.columns[-1], ascending=False)
That yields what you seem to want:
02:45 03:00 03:30
LO92SA 2.221 7.221 17.221
JJS7AS 3.238 -2.238 12.238
P352AS NaN NaN -1.912
K72KSU3 -3.398 3.398 -9.153
982AS2S 5.657 -5.657 -12.595
111AS2 NaN -10.756 -13.756
22SA8A -1.931 NaN NaN
ADS43 11.468 11.468 NaN
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With