INTRODUCING SPECTROGRAMS

A spectrum is always computed over a finite time interval, which may be as long as the whole signal or only a short part of it. An individual spectrum therefore provides no information about how the frequency composition changes during the interval over which it is computed. If the spectrum is computed over a very short interval (e.g. a few milliseconds), it shows an "instantaneous" frequency pattern, but it still tells us nothing about how the waveform evolves in time. To see how the frequency composition changes over time we need a sound spectrogram, which is a representation of many spectra computed on consecutive or overlapping segments of the signal. A spectrogram thus shows how the frequency structure of a sound evolves in time. Generally a spectrogram is plotted with frequency on the vertical axis and time on the horizontal axis; the amplitude of a given frequency at a given time is represented by a grayscale value between white (amplitude = 0) and black (maximum amplitude).

Fig. 5a is the spectrum of one hoot uttered by a Long-eared Owl (Asio otus). This spectrum is computed over 250 milliseconds of waveform, that is, the entire duration of the call. From the spectrum in Fig. 5a we see that the dominant frequency (the frequency associated with the maximum amplitude) is 375 Hz and that there are no other peak frequencies in this call.
The spectrogram of the same call, shown in Fig. 5b, adds some useful information: at the beginning the frequency (we could also speak of the tone or pitch of the bird's voice) rises from 250 Hz to the peak value (the darker zone around 375 Hz), and at the end it falls back to 250 Hz. We can also estimate the time taken by this rise and fall. Clearly, a spectrogram gives more complete information about the structure of the sound under study.
To better explain how a spectrogram works, we can imagine splitting the signal into successive short time intervals, or frames, which may overlap each other in time, and generating a series of spectra that approximate the instantaneous spectrum of the signal at successive moments (Bibliography 3). Fig. 6 illustrates this concept: it is a spectrogram of the same owl call as in Fig. 5b. The two spectrograms differ in their degree of overlap (usually expressed as a percentage of the frame length): in Fig. 5b the overlap is substantial, while in Fig. 6 it is zero.
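The framing idea can be sketched in a few lines of Python. This is only an illustrative sketch, not the method used to produce the figures: the function names (frame_starts, magnitude_spectrum, spectrogram), the 8000 Hz sampling rate, the 64-sample frame and the synthetic 375 Hz test tone are all assumptions chosen for the example. It uses a slow, dependency-free DFT and applies no window function, which real analysis software normally would.

```python
import cmath
import math


def frame_starts(n_samples, frame_len, overlap):
    """Start index of each frame; overlap is a fraction of frame_len in [0, 1)."""
    hop = max(1, int(round(frame_len * (1.0 - overlap))))
    return list(range(0, n_samples - frame_len + 1, hop))


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (slow, but needs no libraries)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, x in enumerate(frame):
            acc += x * cmath.exp(-2j * cmath.pi * k * t / n)
        mags.append(abs(acc))
    return mags


def spectrogram(signal, frame_len, overlap):
    """One magnitude spectrum per frame: rows are time, columns are frequency bins."""
    starts = frame_starts(len(signal), frame_len, overlap)
    return [magnitude_spectrum(signal[s:s + frame_len]) for s in starts]


# Hypothetical test signal: a steady 375 Hz tone, 512 samples at 8000 Hz.
fs = 8000
signal = [math.sin(2 * math.pi * 375 * t / fs) for t in range(512)]

spec_no_overlap = spectrogram(signal, frame_len=64, overlap=0.0)
spec_half_overlap = spectrogram(signal, frame_len=64, overlap=0.5)

# Dominant frequency of the first frame: the bin with the largest magnitude.
first = spec_no_overlap[0]
peak_bin = max(range(len(first)), key=lambda k: first[k])
peak_hz = peak_bin * fs / 64

print(len(spec_no_overlap), "frames without overlap")
print(len(spec_half_overlap), "frames with 50% overlap")
print("dominant frequency of first frame:", peak_hz, "Hz")
```

With the same frame length, a larger overlap yields more spectra along the time axis, so the plot looks smoother, but each spectrum still covers the same stretch of signal.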

The precision of the spectrogram of a waveform depends on the vertical axis resolution (frequency resolution) and the horizontal axis resolution (time resolution). Ideally one would have high resolution on both axes, but this is impossible: frequency resolution and time resolution cannot be varied independently. The FFT length determines both: a short FFT (here short means computed on fewer sample points, e.g. 128) yields a spectrogram with fine time resolution but poor frequency resolution. Conversely, a longer FFT (e.g. 2048 sample points) gives better frequency resolution but poorer time resolution, because each spectrum spans a longer stretch of the signal. This trade-off is a consequence of the uncertainty principle, which states that a spectrum (and hence a spectrogram) cannot have both fine frequency resolution and fine time resolution. The inverse relationship between FFT length and the spacing of frequency values is given by this formula: frequency resolution = (sampling frequency) / (FFT size).
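As a worked example of the formula, the sketch below compares the two FFT sizes mentioned above. The sampling frequency of 22050 Hz and the helper names are assumptions made for illustration only; the frame duration (FFT size divided by sampling frequency) is used here as a simple stand-in for time resolution.

```python
def frequency_resolution_hz(sampling_rate_hz, fft_size):
    """Spacing between adjacent frequency bins: sampling frequency / FFT size."""
    return sampling_rate_hz / fft_size


def frame_duration_ms(sampling_rate_hz, fft_size):
    """Length of signal covered by one FFT frame, in milliseconds."""
    return 1000.0 * fft_size / sampling_rate_hz


fs = 22050  # assumed sampling frequency in Hz, not taken from the text

for fft_size in (128, 2048):
    df = frequency_resolution_hz(fs, fft_size)
    dt = frame_duration_ms(fs, fft_size)
    print(f"FFT size {fft_size}: {df:.2f} Hz per bin, {dt:.2f} ms per frame")
```

Under these assumptions the 128-point FFT covers only a few milliseconds per frame but separates frequencies coarsely, while the 2048-point FFT does the opposite. The product of bin spacing (in Hz) and frame duration (in seconds) is always 1, which is the trade-off in numerical form.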
What is the best analysis resolution to choose? It depends on what kind of information the spectrogram is meant to show. The characteristics of the signal to be analyzed are also crucial: for example, rapid changes in frequency call for a shorter frame length, whereas a more precise frequency representation calls for a longer one.
However, if the features you are interested in are distinguishable in the waveform (e.g. the beginning or end of a call, or some other rapid change in amplitude), you will achieve the best precision by making TIME measurements on the waveform rather than on the spectrogram (Bibliography 3). Thus it is very important that a spectrogram is plotted TOGETHER WITH ITS CORRESPONDING WAVEFORM. A further reason to combine these two representations of an acoustic signal is that, by looking at the waveform, one can easily judge the quality of the recording (here quality means signal-to-noise ratio).
Fig. 7 below illustrates this point: the two recorded calls are from a Scops Owl (Otus scops). Waveform A indicates poor recording quality (the sound source was far from the microphone); the corresponding spectrogram A shows no frequency above the dominant frequency of 1600 Hz. Notice also the substantial background noise: the gray band between 0 and 750 Hz. Waveform B indicates optimal recording quality: the signal-to-noise ratio is very high, so only a thin gray line is visible just above 0 Hz in spectrogram B, which also shows some high-frequency components reaching 3500 Hz. It is also clear that the call duration can be measured most precisely in waveform B. If these spectrograms were plotted without their corresponding waveforms, they could suggest misleading conclusions.
