I have two sets of data between which I want to find a correlation. Although the data is quite scattered, there is obviously a relation. I currently use numpy polyfit (8th order), but the line "wiggles" (especially at the beginning and the end), which is not appropriate. Secondly, I don't think the fit is very good at the beginning of the line (the curve should be slightly steeper).
How can I get a best fit "spline" through these data points?

My current code:
import numpy as np

# fit regression line (8th-order polynomial)
regressionLineOrder = 8
regressionLine = np.polyfit(data['x'], data['y'], regressionLineOrder)
p = np.poly1d(regressionLine)  # callable polynomial: p(x) evaluates the fit
Take a look at @MatthewDrury's answer to "Why use regularisation in polynomial regression instead of lowering the degree?". It is excellent and directly relevant. The most interesting part comes at the end, where he fits the regression with a natural cubic spline in place of a regularized polynomial of degree 10. You could use scipy.interpolate.CubicSpline to accomplish something similar, with one caveat: CubicSpline interpolates, so it passes through every data point and needs strictly increasing x values. For scattered data, a smoothing spline such as scipy.interpolate.UnivariateSpline is usually closer to what you want. scipy.interpolate contains many other spline classes as well.
Here is a simple example:
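import matplotlib.pyplot as plt
import numpy as np
from scipy.interpolate import CubicSpline

# CubicSpline needs strictly increasing x and passes through every point
cs = CubicSpline(data['x'], data['y'])
x_range = np.linspace(np.min(data['x']), np.max(data['x']), 200)
plt.plot(x_range, cs(x_range), label='Cubic Spline')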
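Since the data in the question is scattered, a smoothing spline may suit it better than an interpolating one. Here is a minimal sketch using scipy.interpolate.UnivariateSpline. The synthetic x and y arrays, the noise level sigma, and the choice s = n * sigma**2 are assumptions for illustration only; with real data you would pass data['x'] and data['y'] (sorted by x) and tune s by eye or by cross-validation.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Synthetic stand-in for data['x'] / data['y'] (assumption, not the asker's data):
# a curve that rises steeply at first and then flattens, plus Gaussian noise.
rng = np.random.default_rng(0)
sigma = 0.1
x = np.sort(rng.uniform(0.0, 10.0, 200))  # x must be in increasing order
y_true = 1.0 - np.exp(-x / 2.0)
y = y_true + rng.normal(0.0, sigma, x.size)

# s is the smoothing factor: larger s gives a smoother curve with fewer knots,
# s=0 forces interpolation through every point.
# s = n * sigma**2 is only a starting guess that assumes the noise level is known.
spline = UnivariateSpline(x, y, k=3, s=x.size * sigma**2)

# Evaluate the fitted spline on a dense grid for plotting.
x_range = np.linspace(x.min(), x.max(), 300)
y_smooth = spline(x_range)
```

You can then plot it the same way as above, e.g. plt.plot(x_range, y_smooth, label='Smoothing Spline'). If the curve still wiggles, increase s; if it looks too stiff at the steep beginning, decrease it.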