I'm creating a program to compare audio files that uses an algorithm similar to the one described here: http://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf. I am plotting the times of matches between the two songs being compared and fitting a least-squares regression line to the plot. Here is an example plot for matching files: http://imgur.com/fGu7jhX&yOeMSK0. The plot is too messy, and the least-squares regression line does not produce a high correlation coefficient even though there is an obvious line in the graph. What other algorithm can I use to recognize this line?
This is an interesting question, but it's been pretty quiet. Maybe this answer will trigger some more activity.
For identifying lines with arbitrary slopes and intercepts within a collection of points, the Hough transform would be a good place to start. For your audio application, however, it looks like the slope should always be 1, so you don't need the full generality of the Hough transform.
Instead, you can think of the problem as one of clustering the differences x - y, where x and y are the vectors holding the x and y coordinates of the points.
One approach would be to compute a histogram of x - y. Points that are close to lying on the same line with slope 1 will have differences that fall in the same histogram bin, so the bin with the largest count corresponds to the largest collection of points that are approximately aligned. An issue with this approach is choosing the boundaries of the histogram bins: a bad choice could split points that belong together into neighboring bins.
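As a rough sketch of the histogram idea with NumPy (the function name largest_diag_bin and the default bin width are my own choices, and the bin width would need tuning for real data):

import numpy as np

def largest_diag_bin(x, y, bin_width=1.0):
    # Assign each difference d = x - y to a bin of the given width and
    # return the indices of the points in the most heavily populated bin.
    # Points near a common slope-1 line have nearly equal d.
    d = x - y
    bin_index = np.floor((d - d.min()) / bin_width).astype(int)
    counts = np.bincount(bin_index)
    k = counts.argmax()
    return np.where(bin_index == k)[0]

The returned indices select the points in the best-populated diagonal band, subject to the bin-boundary issue just described.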
A simple brute-force approach is to imagine a diagonal window with a given width sliding left to right across the (x, y) plane. The best candidate for a line corresponds to the position of the window that contains the most points. This is similar to a histogram of x - y, but instead of a collection of disjoint bins there are overlapping bins, one per point: all the bins have the same width, and each point determines the left edge of one bin.
The function count_diag_groups in the code below does that computation. For each point, it finds the points that lie in the diagonal window whose left edge is at that point. The best candidate for a line is the window containing the most points. Running the script produces a two-panel figure: the top panel is a scatter plot of the data, and the bottom panel is the same scatter plot with the best candidate points highlighted.
A nice feature of this method is that there is only one parameter, the window width. A not-so-nice feature is that its time complexity is O(n**2), where n is the number of points. There are surely algorithms with better time complexity that could do something similar; the article that you link to discusses this, and a sketch of one possibility follows the script below. To judge the quality of an alternative, however, you will need a more concrete specification of how "good" or robust the line identification must be.
import numpy as np
import matplotlib.pyplot as plt
def count_diag_groups(x, y, width):
    """
    Returns a list of arrays. The length of the list is the same
    as the length of x. The k-th array holds the indices into x
    (and y) of a set of points that are in a "diagonal" window with
    the given width whose left edge includes the point (x[k], y[k]).
    """
    d = x - y
    result = []
    for i in range(d.size):
        delta = d - d[i]
        neighbors = np.where((delta >= 0) & (delta <= width))[0]
        result.append(neighbors)
    return result
def generate_demo_data():
    # Generate some data.
    np.random.seed(123)
    xmin = 0
    xmax = 100
    ymin = 0
    ymax = 25
    # Uniformly distributed background points.
    nrnd = 175
    xrnd = xmin + (xmax - xmin)*np.random.rand(nrnd)
    yrnd = ymin + (ymax - ymin)*np.random.rand(nrnd)
    # Points that lie approximately on a line with slope 1.
    n = 25
    xx = xmin + 0.1*(xmax - xmin) + ymax*np.random.rand(n)
    yy = (xx - xx.min()) + 0.2*np.random.randn(n)
    x = np.concatenate((xrnd, xx))
    y = np.concatenate((yrnd, yy))
    return x, y
def plot_result(x, y, width, selection):
    xmin = x.min()
    xmax = x.max()
    ymin = y.min()
    ymax = y.max()
    xsel = x[selection]
    ysel = y[selection]
    # Plot...
    plt.figure(1)
    plt.clf()
    ax = plt.subplot(2, 1, 1)
    plt.plot(x, y, 'o', mfc='b', mec='b', alpha=0.5)
    plt.xlim(xmin - 1, xmax + 1)
    plt.ylim(ymin - 1, ymax + 1)
    plt.subplot(2, 1, 2, sharex=ax, sharey=ax)
    plt.plot(x, y, 'o', mfc='b', mec='b', alpha=0.5)
    plt.plot(xsel, ysel, 'o', mfc='w', mec='w')
    plt.plot(xsel, ysel, 'o', mfc='r', mec='r', alpha=0.65)
    # Draw the edges of the selected diagonal window. Its left edge
    # corresponds to the smallest difference x - y among the selected points.
    d_left = (xsel - ysel).min()
    xi = np.array([xmin, xmax])
    yi1 = xi - d_left
    yi2 = yi1 - width
    plt.plot(xi, yi1, 'r-', alpha=0.25)
    plt.plot(xi, yi2, 'r-', alpha=0.25)
    plt.xlim(xmin - 1, xmax + 1)
    plt.ylim(ymin - 1, ymax + 1)
    plt.show()
if __name__ == "__main__":
x, y = generate_demo_data()
# Find a selection of points that are close to being aligned
# with a slope of 1.
width = 0.75
r = count_diag_groups(x, y, width)
# Find the largest group.
sz = np.array(list(len(f) for f in r))
imax = sz.argmax()
# k holds the indices of the selected points.
selection = r[imax]
plot_result(x, y, width, selection)
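As mentioned above, something faster than the O(n**2) loop is possible. Here is a rough sketch of one O(n log n) alternative (my own variant, not taken from the linked paper, and reusing the numpy import from the script): sort the differences once, then for each candidate left edge use a binary search to count how many differences fall inside the window. Unlike count_diag_groups, it returns only the best group rather than all of them.

def count_diag_groups_sorted(x, y, width):
    # O(n log n) variant: sort d = x - y, then for each candidate left
    # edge ds[i] count, by binary search, the differences that lie in
    # [ds[i], ds[i] + width].
    d = x - y
    order = d.argsort()
    ds = d[order]
    right = np.searchsorted(ds, ds + width, side='right')
    counts = right - np.arange(ds.size)
    ibest = counts.argmax()
    # Map the sorted positions of the best window back to indices into x and y.
    return order[ibest:right[ibest]]

With the demo data, selection = count_diag_groups_sorted(x, y, 0.75) should pick out the same group of points as the brute-force version (up to ties).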
This looks like an excellent example of a task for random sample consensus (RANSAC). The Wikipedia article even uses your problem as an example!
The rough outline is something like this:
1. Randomly pick a minimal subset of the points (two points for a general line; here, with the slope fixed at 1, a single point is enough).
2. Fit the model to that subset (here, the intercept of a slope-1 line).
3. Count how many of the remaining points lie within some tolerance of that model; these are the inliers.
4. Repeat for a fixed number of iterations and keep the model with the most inliers, optionally refitting it to all of its inliers at the end.
Check the Wikipedia article for more information.
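To make that concrete for this problem, here is a minimal sketch, assuming the slope is fixed at 1 so a single randomly chosen point determines a candidate line and the only model parameter is the intercept; the threshold, iteration count, and function name are my own choices and would need tuning:

import numpy as np

def ransac_slope1(x, y, threshold=0.5, n_iter=500, seed=None):
    # RANSAC restricted to lines y = x + b: pick one point at random, use
    # its intercept b = y - x as the candidate model, count the inliers
    # whose intercepts lie within `threshold` of it, and keep the best model.
    rng = np.random.default_rng(seed)
    b_all = y - x
    best_inliers = np.array([], dtype=int)
    for _ in range(n_iter):
        b = b_all[rng.integers(b_all.size)]
        inliers = np.where(np.abs(b_all - b) <= threshold)[0]
        if inliers.size > best_inliers.size:
            best_inliers = inliers
    # With the slope fixed at 1, the least-squares intercept over the
    # inliers is simply the mean of their y - x values.
    return b_all[best_inliers].mean(), best_inliers

The returned intercept and inlier indices play the same roles as the fitted line and the selection in the answer above.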