Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Looking up values of one csv file in another csv file and print the row

Tags:

python

pandas

csv

I have two csv files, one is a single column with a header and the other is multiple columns with headers. I want to lookup the values in the one column file and search the column with the same header in the other file. Once a match is found, I want that whole row to be printed. I know this is a vlookup function, but the csv file with multiple columns is extremely large and always crashes excel when I try to use formulas to accomplish this. So I've been trying to use python as a solution.

I'm taking this column:

age
23
43
18

Searching this table:

Name,   age,number,AA,BB,CC,DD,EE
John,   23,   1,   34,35,36,37,38
Mary,   32,   2,   33,34,35,36,37
Jacob , 43,   3,   32,33,34,35,36
Matthew,22,   4,   31,32,33,34,35
Jean,   18,   5,   30,31,32,33,34

trying to print this:

Name,   age,number,AA,BB,CC,DD,EE
John,   23,   1,   34,35,36,37,38
Jacob , 43,   3,   32,33,34,35,36
Jean,   18,   5,   30,31,32,33,34

I've been trying to use this code, but I got all mixed up and it just prints the first row as a column:

with open('/home/s/Untitled 1.csv') as f:
r=pandas.read_csv(f)
with open('/home/s/Test1.csv','r') as w:
    x=pd.read_csv(w)
    col=w['age']
    for line in w:
        for col in w:
            for row in r:
                if row in col:
                    print(line)

I basically want the script to use the first entry in the query column to search the column with the same heading in the data table and print that row, looping for the rest of the entries in the rows below.

Any advice would be appreciated!

like image 775
Pythling Avatar asked Jan 24 '26 18:01

Pythling


1 Answers

There are a lot of problems with your code which shows that you are very confused.

with open('/home/s/Untitled 1.csv') as f:
    r = pandas.read_csv(f)
with open('/home/s/Test1.csv','r') as w:
    x = pandas.read_csv(w)
    # w is not indexable
    col = w['age']
    for line in w:
        # w is not a table.
        for col in w:
            for row in r:
                if row in col:
                    print(line)

I think it would help if I break down the problem for you:

  1. Read the first file and store the ages
  2. Read the second file and store in a pandas data frame
  3. Loop through the pandas data frame to find matches of ages
  4. Print out matches

Read the first file and store the ages

This can be done by using a list instead of a data frame. You will be able to see why later on.

ages = []
with open("incsv1.csv", "r") as f:
    r = pandas.read_csv(f)
    ages = list(r["age"])

Read the second file and store it in a pandas data frame

You have already done it:

with open("incsv2.csv", "r") as f:
    x = pandas.read_csv(f)

Loop through the pandas data frame to find matches of ages and print matches

Since you know that you are only going to look through the ages column, just index to that and iterate through it:

for i, age in enumerate(x["ages"]):
    # You can't do this without a numpy int64
    if age in ages:
        print x.loc[i]

The whole program will output:

Name      John
age         23
number       1
AA          34
BB          35
CC          36
DD          37
EE          38
Name: 0, dtype: object
Name      Jacob
age           43
number         3
AA            32
BB            33
CC            34
DD            35
EE            36
Name: 2, dtype: object
Name      Jean
age         18
number       5
AA          30
BB          31
CC          32
DD          33
EE          34
Name: 4, dtype: object

A better way

Now, I know you want it to print out in a straight line, so I will show you what I think would be much better:

import pandas

ages = []
with open("incsv1.csv", "r") as f:
    r = pandas.read_csv(f)
    ages = list(r["age"])

with open("incsv2.csv", "r") as f:
    # Skip the first line
    f.readline()
    for line in f:
        if int(line.split(",")[1]) in ages:
            print line,

As you can see, you don't really need pandas in this problem. In fact, I could remove it:

ages = []
with open("incsv1.csv", "r") as f:
    # Skip the first line
    f.readline()
    for line in f:
        ages.append(int(line.strip("\n")))

with open("incsv2.csv", "r") as f:
    # Skip the first line
    f.readline()
    for line in f:
        if int(line.split(",")[1]) in ages:
            print line,
like image 114
Moon Cheesez Avatar answered Jan 26 '26 09:01

Moon Cheesez



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!