I'm currently working with Pandas and matplotlib to perform some data visualization and I want to add a line of best fit to my scatter plot.
Here is my code:
import matplotlib
import matplotlib.pyplot as plt
import pandas as panda
import numpy as np
def PCA_scatter(filename):
   matplotlib.style.use('ggplot')
   data = panda.read_csv(filename)
   data_reduced = data[['2005', '2015']]
   data_reduced.plot(kind='scatter', x='2005', y='2015')
   plt.show()
PCA_scatter('file.csv')
How do I go about this?
A line of best fit is a straight line that minimizes the distance between itself and a set of data points. In ordinary least-squares regression, that distance is the sum of the squared vertical distances from each point to the line. The line expresses the relationship between the two variables in a scatter plot. It is an output of regression analysis and can be used as a prediction tool, for example for indicators and price movements.
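A common way to define "best" is ordinary least squares, where the slope and intercept have a closed-form solution. The following is a minimal sketch of that calculation in plain Python. The function name `least_squares_line` and the sample numbers are made up for illustration and do not come from the question's data.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the common 1/n factor cancels).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative points that lie exactly on y = 2x + 1.
slope, intercept = least_squares_line([1, 2, 3, 4], [3, 5, 7, 9])
```

The function assumes at least two distinct x values; otherwise `sxx` is zero and the slope is undefined.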
In a point-and-click chart editor (the menu wording below matches the one in SPSS), you add the fit line described by the regression analysis by right-clicking the chart and selecting “Add Fit Line at Total”, which is toward the bottom of the listed options. This adds the line to the scatter plot. You can close the “Properties” box that appears after the line has been added. These steps do not apply to matplotlib, where the line is drawn in code.
You can do the whole fit and plot in one fell swoop with Seaborn.
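import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
# Read whitespace-separated data; the raw string keeps the regex intact
data_reduced = pd.read_csv('fake.txt', sep=r'\s+')
# Scatter plot plus fitted regression line; recent Seaborn needs x and y as keywords
sns.regplot(x='2005', y='2015', data=data_reduced)
plt.show()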
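If you would rather stay with matplotlib and NumPy, one possible approach is to fit a degree-1 polynomial with `np.polyfit` and draw it on the same axes as the scatter plot. This is only a sketch: the helper name `add_best_fit_line` and the demo values are hypothetical stand-ins for the '2005' and '2015' columns.

```python
import matplotlib.pyplot as plt
import numpy as np


def add_best_fit_line(ax, x, y):
    """Draw a least-squares straight line through (x, y) on the given axes."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # A degree-1 polynomial fit returns the coefficients [slope, intercept].
    slope, intercept = np.polyfit(x, y, 1)
    # Two endpoints are enough to draw a straight line.
    x_line = np.array([x.min(), x.max()])
    ax.plot(x_line, slope * x_line + intercept, color="black")
    return slope, intercept


# Made-up values standing in for the '2005' and '2015' columns.
x_demo = [1.0, 2.0, 3.0, 4.0, 5.0]
y_demo = [2.1, 3.9, 6.2, 7.8, 10.1]

fig, ax = plt.subplots()
ax.scatter(x_demo, y_demo)
slope, intercept = add_best_fit_line(ax, x_demo, y_demo)
```

With the code from the question, `data_reduced.plot(kind='scatter', x='2005', y='2015')` returns a matplotlib Axes object, so you could keep that return value as `ax`, call `add_best_fit_line(ax, data_reduced['2005'], data_reduced['2015'])`, and then call `plt.show()`. Rows with missing values would need to be dropped first, since `np.polyfit` does not skip NaN.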
