
Python script for segregating column by flag values and saving to two files

I have a data file containing four columns.

test.txt:

| id  | addr | value | flag |
|:----|:-----|-------|------|
| 300 | 275  | 5     | 0    |
| 300 | 766  | 15    | 1    |
| 300 | 276  | 3     | 1    |
| 300 | 248  | 6     | 1    |
| 300 | 267  | 11    | 1    |
| 508 | 205  | 12    | 0    |
| 508 | 201  | 12    | 1    |
| 301 | 32   | 3     | 0    |
| 301 | 44   | 4     | 1    |
| 301 | 32   | 2     | 0    |

I need to segregate the second column values based on the flag value of the fourth column and save those to two separate files.

Required output:

file 1:

id | addr(f=0)

300 | 275
508 | 205
301 | 32

file 2:

id | addr(f=1)

300 | 766
300 | 276
300 | 248
300 | 267
508 | 201
301 | 44

I am very new to Python, and so far I have done the following:

import sys

if len(sys.argv) < 2:
    sys.stderr.write("Usage: {0} filename\n".format(sys.argv[0]))
    sys.exit()

fn = sys.argv[1]
sys.stderr.write("reading " + fn + "...\n")

# Initialize dictionaries (or hash id)
list_id = {}

fin = open(fn,"r")
for line in fin:
    line = line.rstrip()
    f = line.split("|")
    id = f[0]
    addr = f[1]
    flag = f[3]

fin.close()

Need your suggestion to complete the program. Thanks in advance for your kind help.

A glimpse of the real data (image not included):

Rubz asked Mar 21 '26 02:03


1 Answer

This is a variant using the csv module:

from csv import reader, writer

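# newline='' is what the csv docs recommend for files handed to reader/writer (Python 3)
with open('test.txt', 'r', newline='') as file:
    rows = reader(file, delimiter='|', skipinitialspace=True)
    with open('file1.txt', 'w', newline='') as file1, open('file2.txt', 'w', newline='') as file2: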
        writer1 = writer(file1, delimiter='|')
        writer2 = writer(file2, delimiter='|')
        for row in rows:

            try:
                flag = int(row[3])
            except IndexError:
                # row has fewer than 4 elements, next row!
                print('row too short!', row)
                continue
            except ValueError:
                # if this is not an integer, next row!
                print('row[3] not an int!', row[3])
                continue

            if flag == 0:
                writer1.writerow(row[:2])  # write the first 2 entries only
            elif flag == 1:
                writer2.writerow(row[:2])
            else:
                print('flag not in (0, 1)!', flag)

For your updated input (which differs from the original), changing the reader to

rows = reader(file, delimiter=' ', skipinitialspace=True)

should work.
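
If the delimiter is not known in advance, the splitting can also be done by hand, closer to the loop in the question. The following is only a minimal sketch, assuming that no field contains spaces or pipes and that the flag is always the fourth field. The names `split_by_flag`, `write_rows` and the `unique` parameter are made up for this example. Note that the sample data contains the pair 301 / 32 with flag 0 twice, while the required output lists it once; `unique=True` drops such repeats, the default keeps them.

```python
import re


def split_by_flag(lines, unique=False):
    """Return (flag0_rows, flag1_rows) as lists of (id, addr) tuples.

    Fields may be separated by pipes, whitespace, or a mix of both.
    Assumption: no field contains a space or a pipe itself.
    """
    buckets = {"0": [], "1": []}
    for line in lines:
        # Split on runs of pipes/whitespace and drop empty leftovers.
        fields = [f for f in re.split(r"[|\s]+", line.strip()) if f]
        if len(fields) < 4:
            continue  # blank or too-short line
        bucket = buckets.get(fields[3])
        if bucket is None:
            continue  # header, separator row, or a flag other than 0/1
        pair = (fields[0], fields[1])
        if unique and pair in bucket:
            continue  # skip repeated (id, addr) pairs when asked to
        bucket.append(pair)
    return buckets["0"], buckets["1"]


def write_rows(path, header, rows):
    """Write a header line followed by one 'id | addr' line per row."""
    with open(path, "w") as out:
        out.write(header + "\n")
        for id_, addr in rows:
            out.write("{0} | {1}\n".format(id_, addr))
```

It could then be used like this (file names taken from the question):

    with open('test.txt') as fin:
        flag0, flag1 = split_by_flag(fin, unique=True)
    write_rows('file1.txt', 'id | addr(f=0)', flag0)
    write_rows('file2.txt', 'id | addr(f=1)', flag1)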

hiro protagonist answered Mar 23 '26 16:03