For my employ, I have been charged with the task of developing a new data acquisition system for the electrophyisiology set-up in the lab. We are an auditory neuroscience lab, thus, we have a need to present sound stimuli and measure the brain response. An important feature for the new system was to be able to calibrate it to present the sounds more accurately than the old system. The following in an overview of my journey into digital signal processing. I honestly didn’t even know what DSP was when I started, and thus had a gross under-estimate of how long it would take me to complete this quest. However old and tired I may be now, I have arrived at my destination, and am, at least, a little wiser.
An inherent characteristic of audio speakers is that all frequencies that are played through the speakers do not come out at equal intensities for the same power delivered to the speakers. What the shape of this frequency vs. intensity curve looks like, will depend on the type of speakers being employed. Our lab deals with high frequency stimuli, and thus we use tweeters, which are meant for use in higher frequency output. This is opposed to mid-range or woofers – which are what you have encountered when that guy rolls up at a stop light and is emitting bass that detaches your organs from their fascia. Often accompanied with an impeccable taste in music.
Ultrasonic frequencies(>20kHz) are greatly attenuated, as compared to human hearing range (20Hz-20kHz). Since we are studing the audiory system of organisms that can hear in this range (e.g. mice, bats), it is important to know how this speaker frequency roll-off may be affecting the sound stimuli we use for experiments, and, if possible, compensate for it. If we generate a series of pure tones (single frequency sounds) at the same amplitude, and play them out of a speaker, the resultant intensity as measured through a decibel meter may look like this:
This means that if we have a stimulus with many frequency components (e.g. animal vocalizations), parts of the stimulus will be attenuated more than others. The implication is that we could lose some frequency components of the sound, and not others, to below a hearing threshold. We want to do our best to recreate a stimulus as close to a native (non-recording) sound as possible.
Currenly in the lab, such a series of tones is generated, and manually written down, with quill and parchment, for each tone using the dB meter reading. This is later taken into account when doing data analysis.
From this starting point, I went to automate the process. I wrote a program that would loop through a series of tones, and record their amplitude through a microphone. I used a dB meter to manually measure the dB SPL of a single reference frequency and set this as 0 attenuation.
I used the peak voltage (RMS) of the time domain signal recieved from the microphone to determine the intensity. The dB attenuation was calculated relative to the amplitude of the reference frequency. In series of our output signal, is a signal attenuator, which will reduce the signal intensity by a specified dB amount. So, I can then run the same series of tones, only using the results from the roll-off curve to attenuate each tone according to the inverse of the result for that frequency. This does work for pure tones.
What about the frequencies we did not explicitly measure with a tone? What about stimuli with multiple frequency components?
A partial solution to this, is to use as small of a frequency step in the tone series as we have patience for, and interpolate for all the others. This works, but still if we are presenting only pure tones.
What I needed to do was to break down the stimuli into their frequency components and compensate in the frequency domain.
That mathmatical operation used to convert a signal from a series of time samples to a list of frequency coefficients of its component sinusoids, is the Discrete Fourier Transform (DFT). The DFT, however has O(N2) complexity ☹. Luckily, some clever people came up with the Fast Fourier Transform (FFT), which is an alogrithm that can effectively compute the DFT in O(NLogN) time.
I can discover a frequency attenuation curve by generating a stimulus which contains energy at all frequencies with constant power spectral density, e.g. white noise or a frequency sweep (aka chirp). By comparing the FFT of the output signal and its recorded response, I can get the attenuation curve of the speaker with much better frequency resolution, and in much less time. Win.
The following function calculates a dB frequency attenuation vector for a given desired output signal, and it’s recorded response:
This gets me to where I was before, albeit with a better resolution of attenuation curve. However, now we have figured out that we can use Fouier analysis to get at the composite frequencies in signals. So, we can use the attenuation vector to adjust the relative frequency intensities in the output signal by the appropriate amount, for it to come out of the speaker with the intended frequency spectrum. Explicitly, that is, convert the output signal into the frequency domain, multiply it by the attenuation vector, and convert back into the time domain.
Here is a simple function that will do this:
This little function has a few short falls, however.
As a matter of practicality, I want to restrict the calibration to a range for which I care about, 5 - 100kHz. This way we do not try to boost frequencies which are out of range of our speakers altogether, or end up with a stimulus that has a prohibitively large amplitude as it tries to adjust high frequencies for which the intensity roll-off is very high. So, we should only apply the frequency adjustment for the frequency range of interest.
Smoothing my attenuation vector before I multiply it against my output signal
reduces noise we captured during the calibration recording, and also has the
benefit of smoothing out the edges of our frequency application range. You can
find example code for the
smooth function in the scipy
cookbook. I experimented with
values and window types to find what got me the best results.
I also want to create a more general solution, so I want to be able to output signals with different samplerates using the same attenuation vector. To do this, I take as input the calibration frequencies and the output signal samplerate, and interpolate to get the calibration attenuation vector into the correct frequency steps.
Thus, the previous function becomes:
Now to bring it all together.
To test this, I can do some before and after calibration signals and compare their responses. I think a chirp is most interesting to look at. I also just like to say “chirp”. Chirp. To keep it simple, I have used the same chirp signal for calibration recording and the adjustment test, though this is not necessary.
Definite improvement. The extra energy visible in the last plot is due to harmonics and aliasing. My sampling rate is more than adequate for the stimuli which I need to present, so it’s not something to worry about.
The biggest downside to this method, is that it is computationally expensive. My application needs to prepare several hundred unqiue stimuli before presenting them. In the middle of an experiment, time lost is data lost. Of course, when I am rapidly changing and testing a piece of code I use a small example set of data. When I went to show this to a lab member (“Hey look what I can do!”) they arranged a set of data more like what they would use in an actual experiment. The look on the scientist’s face as the program takes 5 minutes to calibrate a few hundred short stimuli told me “unimpressed”. So I had to do something different, which lead me further into the depths of DSP…
To be continued.
Wow, if you made this far, you deserve a cookie! The delicious kind, not the I’m -watching-you kind.
Note: A resource that I found useful for understanding the concepts here (and the next section especially) was Steven W. Smith’s free online book The Scientist and Engineer’s Guide to Digital Signal Processing. Also, Thanks to Kinga for helping me figure out the math.