BACKGROUND
My goal is to create a JavaScript-based web app to analyse and display frequency information in audio sources, both in-page sources (<audio> tag) and signals streamed from the client's microphone.  I am well on my way :)
As a keen saxophonist, one of my goals is to compare the information inherent in the tone of different saxophonists and instruments by examining the distribution of upper partials in relation to a fundamental pitch. In short, I want to derive a representation of why different instrumentalists and instrument brands sound different even when playing the same pitch. Additionally I want to compare the tuning and frequency distribution of various 'alternative fingerings' against traditional or standard fingerings by the same player/instrument.
Accessing and displaying frequency information is a fairly trivial matter using the Web Audio AnalyserNode (created with AudioContext.createAnalyser()), which I am using in conjunction with the HTML5 Canvas element to create a frequency map or 'winamp-style bargraph' similar to the one in 'Visualizations with Web Audio API' on MDN.
PROBLEM
In order to achieve my goal I need to identify some particular information in the audio source. Specifically, I need the frequency in Hertz of the fundamental tone, for direct comparison between instrumentalists/instruments. I also need the frequency range of the source, to identify the frequency spectrum of the sounds I'm interested in. That information is to be found in the variable fData below...
// example...
var APP = function() {
    // ...select source and initialise etc..
    var aCTX = new AudioContext(),
        ANAL = aCTX.createAnalyser(),
        rANF = requestAnimationFrame,
        ucID = null;
    ANAL.fftSize = 2048;
    function audioSourceStream(stream) {
        var source = aCTX.createMediaStreamSource(stream);
        source.connect(ANAL);
        var fData = new Uint8Array(ANAL.frequencyBinCount);
        (function updateCanvas() {
            ANAL.getByteFrequencyData(fData);
            // using 'fData' to paint HTML5 Canvas
            ucID = rANF(updateCanvas);
        }());
    }
};
ISSUES
While I can easily represent fData as a bar or line graph etc. via the <canvas> API, such that the fundamental and upper partials of a sound source are clearly visible, so far I have not been able to determine:
- the frequency range represented by fData (min-max Hz)
- the frequency represented by each element of fData (Hz)
Without this I cannot begin to identify the dominant frequency of the source (in order to compare variations in tuning against traditional musical pitch names). Nor can I highlight or exclude regions of the represented spectrum (zooming in or out etc.) for more detailed examination.
My intention is to prominently display the dominant frequency by pitch (note name) and frequency (Hz) and to display the frequency of any individual bar in the graph on-mouseover. N.B. I already have a data object in which all the frequencies (Hz) of the chromatic pitches between C0-B8 are stored.
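The note-name display described above can be sketched as follows. This is a minimal sketch that assumes twelve-tone equal temperament with A4 = 440 Hz. That reference is an assumption: the C0-B8 table mentioned above may use a different one, and could replace the calculation entirely. The name nearestNote and its return shape are hypothetical. The function returns the nearest note name and how many cents the input is sharp (+) or flat (-) of it. That cents figure is what you would compare between an alternative fingering and a standard one.

```javascript
// Note names in MIDI order (index 0 = C).
const NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"];

// Hypothetical helper: nearest equal-tempered note to `hz` and the deviation in cents.
// Assumes 12-TET with A4 = `a4` Hz (440 by default).
function nearestNote(hz, a4 = 440) {
  const semitonesFromA4 = 12 * Math.log2(hz / a4); // fractional semitones above/below A4
  const nearest = Math.round(semitonesFromA4);
  const cents = 100 * (semitonesFromA4 - nearest); // + = sharp, - = flat
  const midi = 69 + nearest; // MIDI numbering: A4 = 69, C4 = 60
  const name = NOTE_NAMES[((midi % 12) + 12) % 12];
  const octave = Math.floor(midi / 12) - 1; // so that C0 = MIDI 12
  return { name: name + octave, cents };
}

console.log(nearestNote(440)); // { name: 'A4', cents: 0 }
console.log(nearestNote(450)); // A4, about 39 cents sharp
```

On its own, the frequency passed in is only as precise as the method that produced it. An FFT bin gives a range rather than an exact pitch.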
Despite reading the AnalyserNode specification several times, and virtually every page on this site and MDN about this subject, I still have no firm idea how to accomplish this portion of my task.
Basically, how does one go about turning the values in the Uint8Array fData into a representation of the amplitude at each frequency (in Hertz) that the fData array elements reflect?
Any advice, suggestions, or encouragement would be greatly appreciated.
BP
The AnalyserNode interface represents a node able to provide real-time frequency and time-domain analysis information. It is an AudioNode that passes the audio stream unchanged from the input to the output, but allows you to take the generated data, process it, and create audio visualizations. An AnalyserNode has exactly one input and one output.
In other words, in JavaScript we create nodes in a directed graph that describes how the audio data flows from sources to sinks, and our new AudioContext() is that graph. (There is still no equivalent API for video; to process video on the web, we have to use hacky invisible <canvas> elements.)
So first, understand that the output of an FFT will give you an array of relative strength in frequency RANGES, not precise frequencies.
These ranges are spread out in the spectrum [0,Nyquist frequency]. The Nyquist frequency is one-half of the sample rate. So if your AudioContext.sampleRate is 48000 (Hertz), your frequency bins will range across [0,24000] (also in Hz).
If you are using the default value of 2048 for fftSize in your AnalyserNode, then frequencyBinCount will be 1024 (it's always half the FFT size). This means each frequency bin covers 24000 / 1024 ≈ 23.4 Hz, so the bins will look something like this (off-the-cuff; rounding errors may occur here):
fData[0] is the strength of frequencies from 0 to 23.4Hz.
fData[1] is the strength of frequencies from 23.4Hz to 46.8Hz.
fData[2] is the strength of frequencies from 46.8Hz to 70.2Hz.
fData[3] is the strength of frequencies from 70.2Hz to 93.6Hz.
...
fData[511] is the strength of frequencies from 11976.6Hz to 12000Hz.
fData[512] is the strength of frequencies from 12000Hz to 12023.4Hz.
...
fData[1023] is the strength of frequencies from 23976.6Hz to 24000Hz.
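The mapping in the list above can be sketched as a few small helpers. This is a minimal sketch: the function names are hypothetical, and the arithmetic follows the "ranges" description above, where each bin is sampleRate / fftSize Hz wide and frequencyBinCount = fftSize / 2. Many DSP references instead treat k × sampleRate / fftSize as the *centre* of bin k rather than its lower edge. Either way, the spacing between bins is sampleRate / fftSize. In the question's app, you would pass aCTX.sampleRate and ANAL.fftSize rather than hard-coded numbers.

```javascript
// Width in Hz of one analyser bin: (sampleRate / 2) / (fftSize / 2).
function binWidthHz(sampleRate, fftSize) {
  return sampleRate / fftSize;
}

// Frequency range [low, high) in Hz that bin k covers, per the list above.
function binRangeHz(k, sampleRate, fftSize) {
  const w = binWidthHz(sampleRate, fftSize);
  return [k * w, (k + 1) * w];
}

// Index of the bin whose range contains `hz`, clamped to 0..fftSize/2 - 1.
// Useful for a mouseover lookup, or for choosing where to slice fData when zooming.
function frequencyToBin(hz, sampleRate, fftSize) {
  const k = Math.floor(hz / binWidthHz(sampleRate, fftSize));
  return Math.min(Math.max(k, 0), fftSize / 2 - 1);
}

// Index of the strongest bin. Skips bin 0 (DC and the lowest range) by default,
// which is an arbitrary choice for this sketch.
function dominantBin(fData, startBin = 1) {
  let best = startBin;
  for (let k = startBin + 1; k < fData.length; k++) {
    if (fData[k] > fData[best]) best = k;
  }
  return best;
}

// Usage with the numbers from the text (48 kHz, fftSize 2048).
console.log(binWidthHz(48000, 2048));          // 23.4375
console.log(binRangeHz(512, 48000, 2048));     // [12000, 12023.4375]
console.log(frequencyToBin(440, 48000, 2048)); // 18
```

In the browser, you would call something like binRangeHz(dominantBin(fData), aCTX.sampleRate, ANAL.fftSize). To zoom, you could use fData.subarray(frequencyToBin(lowHz, ...), frequencyToBin(highHz, ...) + 1). Keep in mind that the loudest bin is not necessarily the fundamental: on many instruments an upper partial can be stronger than the fundamental.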
Make sense so far?
The next comment that usually comes up is "Wait a second - this is less precise, musically speaking, in the bass registers (where 23.4 Hz can cover a whole OCTAVE) than the treble registers (where there are hundreds of Hz between notes)." To that I say: Yes, yes it is. That's just how FFTs work. In the upper registers, it's easier to see tuning differences.
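The following sketch puts numbers on this. It compares the width of one bin (48 kHz, fftSize 2048, as above) with the gap to the next semitone at a few arbitrarily chosen pitches. It also expresses that bin width in cents, using 1200 × log2 of the frequency ratio. Twelve-tone equal temperament and the helper names are assumptions for illustration.

```javascript
// Frequency one equal-tempered semitone above f.
function semitoneAboveHz(f) {
  return f * Math.pow(2, 1 / 12);
}

// Interval between two frequencies in cents (100 cents = 1 equal-tempered semitone).
function cents(fLow, fHigh) {
  return 1200 * Math.log2(fHigh / fLow);
}

const binWidth = 48000 / 2048; // 23.4375 Hz, as in the example above

for (const f of [55, 220, 880, 3520]) {
  const semitoneGap = semitoneAboveHz(f) - f;
  console.log(
    `${f} Hz: next semitone is ${semitoneGap.toFixed(1)} Hz away; ` +
      `one bin spans ${cents(f, f + binWidth).toFixed(0)} cents here`
  );
}
```

At the low end, a single bin spans several semitones. Near the top, it spans only a fraction of one. That is the asymmetry described above.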
The NEXT next comment is usually "wow, I need a MASSIVE fftSize to be precise in the bass registers." Usually, the answer is "no, you probably shouldn't do it that way" - at some point, auto-correlation is more efficient than FFTs, and it's a lot more precise.
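The auto-correlation suggestion can be sketched as follows. This is a minimal, deliberately naive sketch: it costs roughly buffer length × number of lags per call and has no windowing or interpolation, so it is not the efficient version alluded to above. The buffer is assumed to come from AnalyserNode.getFloatTimeDomainData(). The helper name autocorrelatePitch, the 40-2000 Hz search range and the 0.9 threshold are hypothetical choices. The sketch picks the *shortest* strong period rather than the single highest peak. It does this to avoid locking onto a multiple of the period (an octave error).

```javascript
// Estimate the fundamental (Hz) of a time-domain buffer by autocorrelation.
// Returns null for silence or when no clear period is found in the search range.
function autocorrelatePitch(buf, sampleRate, minHz = 40, maxHz = 2000) {
  const n = buf.length;
  const minLag = Math.max(1, Math.floor(sampleRate / maxHz)); // shortest period searched
  const maxLag = Math.min(Math.ceil(sampleRate / minHz), n - 2); // longest period searched
  if (maxLag <= minLag) return null; // buffer too short for this range

  // r[lag] = mean of buf[i] * buf[i + lag] over the overlapping samples
  const r = new Float64Array(maxLag + 2);
  for (let lag = minLag - 1; lag <= maxLag + 1; lag++) {
    let sum = 0;
    for (let i = 0; i + lag < n; i++) sum += buf[i] * buf[i + lag];
    r[lag] = sum / (n - lag);
  }

  let best = 0;
  for (let lag = minLag; lag <= maxLag; lag++) best = Math.max(best, r[lag]);
  if (best <= 0) return null;

  // First local peak that is close to the strongest one = the period in samples.
  for (let lag = minLag; lag <= maxLag; lag++) {
    if (r[lag] >= 0.9 * best && r[lag] >= r[lag - 1] && r[lag] >= r[lag + 1]) {
      return sampleRate / lag;
    }
  }
  return null;
}

// In the question's app (browser only), something like:
//   var tData = new Float32Array(ANAL.fftSize);
//   ANAL.getFloatTimeDomainData(tData);
//   var hz = autocorrelatePitch(tData, aCTX.sampleRate);
// Here a synthetic 220 Hz sine stands in for microphone data.
const sr = 48000;
const testBuf = new Float32Array(2048);
for (let i = 0; i < testBuf.length; i++) testBuf[i] = Math.sin((2 * Math.PI * 220 * i) / sr);
console.log(autocorrelatePitch(testBuf, sr)); // roughly 220
```

The period is found to the nearest whole sample, so the estimate gets coarser as pitch rises. The buffer must also hold at least one full period of the lowest note you care about. With fftSize 2048 at 48 kHz, that covers down to the 40 Hz assumed here.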
Hope this helps point you in the right direction, add a comment if there's a followup.