Plotting data from multiple pandas data frames in one plot

Question

I am interested in plotting a time series with data from several different pandas data frames. I know how to plot a data for a single time series and I know how to do subplots, but how would I manage to plot from several different data frames in a single plot? I have my code below. Basically what I am doing is I am scanning through a folder of json files and parsing that json file into a panda so that I can plot. When I run this code it is only plotting from one of the pandas instead of the ten pandas created. I know that 10 pandas are created because I have a print statement to ensure they are all correct.

import sys, re
import numpy as np
import smtplib
import matplotlib.pyplot as plt
from random import randint
import csv
import pylab as pl
import math
import pandas as pd
from pandas.tools.plotting import scatter_matrix
import argparse
import matplotlib.patches as mpatches
import os
import json



parser = argparse.ArgumentParser()
parser.add_argument('-file', '--f', help = 'folder where JSON files are stored')
if len(sys.argv) == 1:
    parser.print_help()
    sys.exit(1)
args = parser.parse_args()


dat = {}
i = 0

direc = args.f
directory = os.fsencode(direc)

fig1 = plt.figure()
ax1 = fig1.add_subplot(111)

for files in os.listdir(direc):
    filename = os.fsdecode(files)
    if filename.endswith(".json"):
        path = '/Users/Katie/Desktop/Work/' + args.f + "/" +filename
        with open(path, 'r') as data_file:
            data = json.load(data_file)
            for r in data["commits"]:
                dat[i] = (r["author_name"], r["num_deletions"], r["num_insertions"], r["num_lines_changed"],
                          r["num_files_changed"], r["author_date"])
                name = "df" + str(i).zfill(2)
                i = i + 1
                name = pd.DataFrame.from_dict(dat, orient='index').reset_index()
                name.columns = ["index", "author_name", "num_deletions",
                                          "num_insertions", "num_lines_changed",
                                          "num_files_changed",  "author_date"]
                del name['index']
                name['author_date'] = name['author_date'].astype(int)
                name['author_date'] =  pd.to_datetime(name['author_date'], unit='s')
                ax1.plot(name['author_date'], name['num_lines_changed'], '*',c=np.random.rand(3,))
                print(name)
                continue

    else:
        continue
plt.xticks(rotation='35')
plt.title('Number of Lines Changed vs. Author Date')
plt.show()

omdv · Accepted Answer

Quite straightforward actually. Don't let pandas confuse you. Underneath it every column is just a numpy array.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df1 = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
df2 = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))

fig1 = plt.figure()
ax1 = fig1.add_subplot(111)

ax1.plot(df1['A'])
ax1.plot(df2['B'])

enter image description here

Plotting data from multiple pandas data frames in one plot

Tags:

python

pandas

plot

K22

1 Answers

omdv

Recent Activity

Donate For Us

Plotting data from multiple pandas data frames in one plot

Tags:

python

pandas

plot

K22

1 Answers

omdv

Related questions

Recent Activity

Donate For Us