Copyright © Sidney Wood. All rights reserved. Beginner's guide to Praat. Praat makes spectrograms by analysing the spectrum of the speech waveform at regular time intervals, or time steps, along the speech signal. The method in the Sound editor is the Fourier transform. Spectrograms based on linear prediction are only available from the Objects window. The spectrograms are small here to make them fit this web page.
A male and a female speaker are used to illustrate the creation of spectrograms in the Sound editor. The second is a Swedish adult female speaker saying ett forskningsprojekt "a research project". This section mainly describes how to make spectrograms in the Sound editor, but there is also a brief account of FFT spectrograms in the Objects window.
First, load your signal into the Objects window as a Sound object, select it, and click Edit.
The Sound editor opens. There is a long and a short way to display the spectrogram. The short way is to open the View menu and select Show analyses.
Note also the setting Longest analysis (default 5 seconds). There is one more step, to select the analysis settings you want for your spectrogram. Open the Spectrum menu and select Spectrogram settings. Note that there are two setting items, Spectrogram settings and Advanced spectrogram settings. The default settings will give you a spectrogram that will look something like this for the male and female examples.
There are one or two things you can do to optimise the appearance of the spectrogram, so that you can extract the maximum of information from it when you come to study it in detail: the resolution or definition (detail, sharpness) of the spectrogram image, and the exclusion of background noise so that you only see speech detail.
The Spectrogram settings will be remembered across Praat sessions. All these settings have standard values ("factory settings"), which appear when you click Standards.
To see how the window length influences the bandwidth, first create a sine wave with Create Sound from formula. The spectrogram will show a horizontal black line. You can now vary the window length in the spectrogram settings and see how the thickness of the line varies. The line gets thinner if you raise the window length. Apparently, if the analysis window comprises more periods of the wave, the spectrogram can tell us the frequency of the wave with greater precision.
To see this more precisely, create a sum of two sine waves with slightly different frequencies. In the editor, you will see a single thick band if the analysis window is short (5 ms), and two separate bands if the analysis window is long (30 ms).
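The effect of window length on frequency resolution can be sketched outside Praat with a few lines of Python. This is only an illustration under assumed values: the sample rate (8000 Hz) and the two tone frequencies (1000 and 1200 Hz) are hypothetical choices for this sketch, and a Hann window stands in for whatever window shape the editor actually uses. The function compares the spectral magnitude halfway between the two tones with the magnitude at one of the tones.

```python
import cmath
import math

SAMPLE_RATE = 8000.0       # Hz (assumed for this sketch)
F1, F2 = 1000.0, 1200.0    # Hz (assumed values, chosen only for illustration)


def windowed_magnitude(freq_hz, window_s):
    """Magnitude at freq_hz of a Hann-windowed slice of the two-tone signal.

    The slice is centred on t = 0, where the two cosines are in phase.
    """
    n = round(window_s * SAMPLE_RATE)
    total = 0j
    for i in range(n):
        t = (i + 0.5 - n / 2) / SAMPLE_RATE   # time relative to the window centre
        hann = 0.5 - 0.5 * math.cos(2 * math.pi * (i + 0.5) / n)
        sample = math.cos(2 * math.pi * F1 * t) + math.cos(2 * math.pi * F2 * t)
        total += hann * sample * cmath.exp(-2j * math.pi * freq_hz * t)
    return abs(total)


def midpoint_ratio(window_s):
    """Magnitude halfway between the tones, relative to the magnitude at F1.

    Around 1 or above means one merged band; near 0 means two separate bands.
    """
    midpoint = (F1 + F2) / 2
    return windowed_magnitude(midpoint, window_s) / windowed_magnitude(F1, window_s)


short_ratio = midpoint_ratio(0.005)   # 5 ms analysis window
long_ratio = midpoint_ratio(0.030)    # 30 ms analysis window
print(f"5 ms window:  midpoint/peak = {short_ratio:.2f}")
print(f"30 ms window: midpoint/peak = {long_ratio:.2f}")
```

With the short window the region between the tones is as strong as the tones themselves, which is what a single thick band looks like; with the long window the region between them is nearly empty.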
Apparently, the frequency resolution gets better with longer analysis windows. So why don't we always use long analysis windows? The answer is that their time resolution is poor. To see this, create a sound that consists of two sine waves and two short clicks. The formula is 0. If you view this sound, you can see that the two clicks will overlap in time if the analysis window is long, and that the sine waves overlap in frequency if the analysis window is short. Apparently, there is a trade-off between time resolution and frequency resolution.
One cannot know both the time and the frequency with great precision. The Spectrum menu also contains Advanced spectrogram settings.
Welcome to the Monthly Mystery Spectrogram webzone. These pages are Rob Hagiwara's professional web-space.
For personal musings, please see Rob's blog. Commentary about stuff that I plan to change, or would like some input on, is interspersed throughout this version of the page, in this goofy text. Depending on your browser, I think this is rendered in colo(u)r.
General stuff changing throughout. Or you can just read this summary, but bear in mind there's going to be a lot left out, especially in the 'why' realm.
Then as usual learn by doing!
The goal of this page is to provide just enough basic information for the novice to begin, perhaps with some guidance, the process of decoding the monthly mystery spectrogram. This page is not intended to be the last word in spectrographic analysis in general, nor even the last word on spectrogram reading. However, reasoning your way through a mystery spectrogram is very instructive, especially in relating acoustic events with presumed articulatory ones.
That is, in relating physical sounds with speech production. If you're reading this, I assume you are familiar with basic articulatory phonetics, phonetic transcription, the International Phonetic Alphabet, and the surface phonology of 'general' North American English i. I try to keep in mind that I have an international audience, but there are some details I just take to be 'given' for English.
Someday if we do spectrograms of other languages, we'll have to adjust. I really recommend that beginners find someone to discuss spectrographic issues with.
If you're doing spectrograms as part of a class, form a study group. If you're a 'civilian', form a club. Or something. I'm toying with the idea of starting a Yahoo group or something for us to do some discussions as 'community'. Strong opinions anyone? Unfortunately, I don't have time to answer in detail every e-mail I receive about specific spectrograms or sounds or features, but if you have a general question or suggestions, please feel free to contact me.
These fonts are in my opinion the best available freeware fonts for IPA-ing in Unicode for the web. Please see my list of currently supported fonts for justification and links to download these fonts. A sound spectrogram or sonogram is a visual representation of an acoustic signal. To oversimplify things a fair amount, a Fast Fourier transform is applied to an electronically recorded sound.
This analysis essentially separates the frequencies and amplitudes of its component simplex waves. A long window resolves frequency at the expense of time; the result is a narrow band spectrogram, which reveals individual harmonics (component frequencies) but smears together adjacent 'moments'.
If a short analysis window is used, adjacent harmonics are smeared together, but with better time resolution. The result is a wide band spectrogram in which individual pitch periods appear as vertical lines (or striations), with formant structure.
A spectrogram is a visual representation of the spectrum of frequencies of a signal as it varies with time.
When applied to an audio signal, spectrograms are sometimes called sonographs, voiceprints, or voicegrams. When the data is represented in a 3D plot they may be called waterfalls.
Spectrograms are used extensively in the fields of music, sonar, radar, speech processing, seismology, and others. Spectrograms of audio can be used to identify spoken words phonetically, and to analyse the various calls of animals.
A spectrogram can be generated by an optical spectrometer, a bank of band-pass filters, by Fourier transform or by a wavelet transform (in which case it is also known as a scaleogram or scalogram). A spectrogram is usually depicted as a heat map. A common format is a graph with two geometric dimensions: one axis represents time, and the other axis represents frequency; a third dimension indicating the amplitude of a particular frequency at a particular time is represented by the intensity or color of each point in the image.
There are many variations of format: sometimes the vertical and horizontal axes are switched, so time runs up and down; sometimes as a waterfall plot where the amplitude is represented by height of a 3D surface instead of color or intensity.
The frequency and amplitude axes can be either linear or logarithmic, depending on what the graph is being used for. Audio would usually be represented with a logarithmic amplitude axis (probably in decibels, or dB), and frequency would be linear to emphasize harmonic relationships, or logarithmic to emphasize musical, tonal relationships.
Figure captions: Spectrogram of a recording of a violin playing; note the harmonics occurring at whole-number multiples of the fundamental frequency. Spectrogram of dolphin vocalizations; chirps, clicks and harmonizing are visible as inverted Vs, vertical lines and horizontal striations respectively. Spectrogram of an FM signal, in which the signal frequency is modulated sinusoidally. Spectrogram of great tit song. Spectrogram of the soundscape ecology of Mount Rainier National Park, with the sounds of different creatures and aircraft highlighted.
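The logarithmic amplitude axis mentioned above can be made concrete with the usual decibel conversion for amplitudes, 20 times the base-10 logarithm of the ratio to a reference level. The sketch below assumes a reference amplitude of 1.0 (full scale); the function name and the sample values are illustrative and do not come from any particular program.

```python
import math


def amplitude_to_db(amplitude, reference=1.0):
    """Convert a linear amplitude to decibels relative to `reference`."""
    if amplitude <= 0:
        return float("-inf")   # silence has no finite level on a log scale
    return 20.0 * math.log10(amplitude / reference)


# Each tenfold drop in amplitude lowers the level by the same number of dB,
# which is why quiet detail stays visible on a dB-scaled spectrogram.
for amplitude in (1.0, 0.5, 0.1, 0.01):
    print(f"amplitude {amplitude:>5} -> {amplitude_to_db(amplitude):7.2f} dB")
```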
Spectrograms of light may be created directly using an optical spectrometer over time. Spectrograms may be created from a time-domain signal in one of two ways: approximated as a filterbank that results from a series of band-pass filters (this was the only way before the advent of modern digital signal processing), or calculated from the time signal using the Fourier transform.
These two methods actually form two different time-frequency representations, but are equivalent under some conditions. The bandpass filters method usually uses analog processing to divide the input signal into frequency bands; the magnitude of each filter's output controls a transducer that records the spectrogram as an image on paper.
Creating a spectrogram using the FFT is a digital process. Digitally sampled data, in the time domain, is broken up into chunks, which usually overlap, and Fourier transformed to calculate the magnitude of the frequency spectrum for each chunk.
Each chunk then corresponds to a vertical line in the image: a measurement of magnitude versus frequency for a specific moment in time (the midpoint of the chunk). These spectrums or time plots are then "laid side by side" to form the image or a three-dimensional surface, or slightly overlapped in various ways. Because only the magnitude of each chunk's spectrum is kept, it appears that a spectrogram contains no information about the exact, or even approximate, phase of the signal that it represents.
For this reason, it is not possible to reverse the process and generate a copy of the original signal from a spectrogram, though in situations where the exact initial phase is unimportant it may be possible to generate a useful approximation of the original signal. The Pattern Playback was an early speech synthesizer, designed at Haskins Laboratories in the late s, that converted pictures of the acoustic patterns of speech spectrograms back into sound.
In fact, there is some phase information in the spectrogram, but it appears in another form, as time delay (or group delay), which is the dual of the instantaneous frequency.
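The digital procedure described above (overlapping chunks, one Fourier transform per chunk, magnitude only) can be sketched in plain Python. This is a minimal illustration, not any particular program's implementation: the function names, the Hann taper, the chunk length of 64 samples, the hop of 32 samples and the 1000 Hz test tone at 8000 Hz are all assumptions made for the example, and a naive DFT is used in place of a real FFT for readability.

```python
import cmath
import math


def hann_window(length):
    """Periodic Hann window, used to taper each chunk before the transform."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / length) for i in range(length)]


def magnitude_spectrum(chunk):
    """Magnitudes of the non-negative-frequency DFT bins of one chunk (naive DFT)."""
    n = len(chunk)
    spectrum = []
    for k in range(n // 2 + 1):
        total = sum(chunk[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        spectrum.append(abs(total))   # abs() keeps the magnitude and drops the phase
    return spectrum


def spectrogram(samples, chunk_len, hop):
    """Return one magnitude spectrum (one image column) per overlapping chunk."""
    window = hann_window(chunk_len)
    columns = []
    for start in range(0, len(samples) - chunk_len + 1, hop):
        chunk = [samples[start + i] * window[i] for i in range(chunk_len)]
        columns.append(magnitude_spectrum(chunk))
    return columns


# Hypothetical test signal: a 1000 Hz tone sampled at 8000 Hz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * i / sample_rate) for i in range(256)]
columns = spectrogram(tone, chunk_len=64, hop=32)   # chunks overlap by half
peak_bins = [column.index(max(column)) for column in columns]
print(len(columns), "columns; strongest bin in each column:", peak_bins)
```

Each list in `columns` corresponds to one vertical line of the image. Because only `abs()` of each bin is stored, the original waveform cannot be rebuilt exactly from this output, which is the phase loss discussed above.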
The size and shape of the analysis window can be varied. A smaller (shorter) window will produce more accurate results in timing, at the expense of precision of frequency representation. A larger (longer) window will provide a more precise frequency representation, at the expense of precision in timing representation.
In diphthongs, you can see the formants change frequency as the tongue body moves through the mouth. You can't always tell reliably which formant you're looking at -- F1, F2, F3, etc. But the existence of formants is usually obvious enough that you can at least be sure you're looking at a vowel. There are some especially common difficulties in identifying formants. In [i], F2 and F3 also often appear merged together in a single wide band.
Fricatives are easy. The turbulent airstream of fricatives creates a chaotic mix of random frequencies, each lasting for a very brief time. The result sounds much like static noise, and on a spectrogram it looks like the kind of static noise you might see on a TV screen.
While each momentary burst of energy occurs at a random frequency, there are tendencies in which frequencies the random bursts cluster around. Voiced fricatives show aspects of both regular vocal fold vibrations and a randomly turbulent airstream. On a spectrogram, it looks a little like a cross between a fricative and a vowel. It will have a lot of random noise that looks like static, but through the static you can usually see the faint bands of the voiceless vowel's formants.
The medial phase of a voiceless plosive is complete silence. On a spectrogram, this will appear as a white blank. The quiet vocal fold vibrations in a voiced plosive will sometimes appear as a faint band along the bottom of the spectrogram at the frequency of f0.
But very often you won't see anything there, either because the voicing got lost in the background noise or because the recording or computer equipment cut off frequencies that low. To tell the difference between plosives, listeners rely on the release burst and on formant transitions. On a spectrogram, the release burst looks like a very, very thin fricative.
The formant transitions (if you can see them) look like the formants have been distorted away from the frequencies they have during most of the vowel. Aspiration will look like a period of [h] between the blank gap and the vowel -- specifically, a voiceless version of the following vowel. Recall that the tongue body is in position for the following vowel and that aspiration is just a delay in the onset of voicing. NB: Aspiration is not the same as the release burst.
The period of aspiration (which only some voiceless plosives have) is much longer than the very short release burst (which all released plosives have). Nasals and [l] usually look like quite faint vowels, without a lot of amplitude in the higher frequencies.
You can still see some things that look like formants. But the acoustic properties of tubes with branches and side-chambers are much more complicated, with anti-formants as well as formants, so the formant bands will appear in different positions and usually be fainter.
Which nasal or lateral it is usually isn't something you can figure out looking at just a spectrogram. Right at the end of the vowel, you can see F2 and F3 start to approach one another in a formant transition pattern (often called the "velar pinch") that usually marks the onset phase of a velar consonant.
The assignment is to use the Praat computer program to measure F1 and F2 for three tokens of several English vowels.
You should hand in: Download Praat from Paul Boersma's website at www. Follow the instructions there. Record three words for each of the following vowels. Start the Praat program. You'll see three windows: a title window that disappears right away, the main "objects" window (where most of the work gets done), and a "picture" window for drawing fancy printable diagrams. You don't need the picture window, so close it. Make sure you have a microphone hooked up to your computer.
From the main window, open the "New" menu, then choose "Record mono sound". This will open up a new "SoundRecorder" window. Click the "Record" button when you're ready to start speaking. While you're speaking, you should see some green bouncing up and down in the vertical white "Meter" stripe (otherwise your computer isn't getting any sound from your microphone).
Speak the list of words. Use a natural speech rate and style -- not too fast, not too slow, not artificially careful. Don't worry if you make a mistake -- just repeat the word and you can skip over the error when you're doing the measurements.
When you've finished speaking, click the "Stop" button. Click "Play" and listen to the results to make sure they're OK.
Then click the "Save to list" button near the bottom. This sends the newly recorded recording off to the list of objects in Praat's main window where you should now see "Sound sound" in the list.
Name the file after yourself e. Click the "Edit" button in the main Praat window. This will open a new sound-editor window -- you won't be editing the sound, but the window shows you a lot of stuff conveniently in one place. The scrollbar at the bottom lets you move the part of the spectrogram that is visible forward and backward in time. Clicking on one of the three grey horizontal bars near the bottom will play the sound -- the bottom bar plays the entire sound, the middle bar just the part of the sound whose spectrogram is in the window right now, and the top bar various subparts of the visible sound.
The waveform view can be switched to a Spectrogram view by clicking on the track name or the black triangle in the Track Control Panel which opens up the Track Dropdown Menu where the spectrogram view can be selected. To demonstrate how the various settings affect the appearance of an audio track in spectrogram view, we will start with this artificially constructed test track.
It consists of 10 segments of a sine wave tone at Hz, each 2 seconds long. The level of each segment in dB is indicated by the labels below the audio track. As you can clearly see, the minimum and maximum frequency settings determine the minimum and maximum frequencies displayed, as indicated in the track vertical scale. Gain can be said to increase the "brightness" of the display. It does this by amplifying the signal by the indicated amount.
With the default setting of 20 dB, any frequency band whose level after the 20 dB of amplification is greater than 0 dB will be displayed as white. Similarly, the "lower" level bands will also "get brighter". There are six color bands in spectrogram view: white, red, magenta, dark blue, light blue and gray.
The Range setting determines the spacing between colors.
There is an inherent trade-off between frequency resolution and time resolution. The image below shows the spectrogram view of a pure tone with two clicks very close together. With a smaller window size we can see the two clicks. Changing to a larger Window Size results in better frequency resolution (the white band is narrower). However, the time resolution is worse: the two clicks have been smeared together into one.
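The smearing of two nearby clicks can be reproduced with a small Python sketch. Everything here is an assumption made for illustration: the signal is silence with two single-sample clicks 40 samples apart, the window sizes are 16 and 128 samples, a Hann window is used, and "resolved" is defined as the frame energy falling below 1% of its maximum somewhere between the clicks. None of these values come from the page above.

```python
import math

# Hypothetical signal: silence with two single-sample clicks 40 samples apart.
CLICK_POSITIONS = (100, 140)
signal = [0.0] * 300
for position in CLICK_POSITIONS:
    signal[position] = 1.0


def frame_energy(centre, window_len):
    """Energy of the Hann-windowed frame of `signal` centred on sample `centre`."""
    start = centre - window_len // 2
    energy = 0.0
    for i in range(window_len):
        index = start + i
        if 0 <= index < len(signal):
            weight = 0.5 - 0.5 * math.cos(2 * math.pi * (i + 0.5) / window_len)
            energy += (weight * signal[index]) ** 2
    return energy


def clicks_resolved(window_len):
    """True if the energy drops to near zero somewhere between the two clicks."""
    first, second = CLICK_POSITIONS
    energies = [frame_energy(c, window_len) for c in range(first, second + 1)]
    return min(energies) < 0.01 * max(energies)


for window_len in (16, 128):
    verdict = "two separate clicks" if clicks_resolved(window_len) else "one smeared event"
    print(f"window of {window_len} samples: {verdict}")
```

A window shorter than the gap between the clicks can sit entirely in the silence between them, so the display shows a gap. A window longer than the gap always contains at least one click, so the two merge.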
The image below shows the spectrogram view of a musical note with many overtones. With a smaller window size the overtones are not clear.
When choosing which window size to use, the general rules are: if you need good time resolution (for example, to find clicks), use a smaller window size; if you need good frequency resolution (for example, to find an annoying tone), use a larger window size. You can zoom in on the vertical frequency axis by left-clicking in the Vertical Scale and using the magnifiers when these are enabled in Tracks Behaviors Preferences.
Alternatively, you can right-click in the Vertical Scale to bring up a dropdown context menu which has commands for vertical zooming. Changing to the Blackman-Harris Window Type gets rid of much of the spectral leakage at the expense of lower frequency resolution note that the red band near the 2. Changing to a rectangular window causes the track to be redrawn a little faster at the expense of very bad spectral leakage. However, the frequency resolution is better the red band near the 2.
There is no "right" window type.
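The difference in spectral leakage between a rectangular and a Blackman-Harris window can be measured with a short Python sketch. The setup is assumed for illustration only: a window of 128 samples, a tone placed halfway between two analysis bins (bin 10.5), and leakage measured at bin 30, well away from the tone. The Blackman-Harris window is built from the standard 4-term coefficients; whether a given editor uses exactly this variant is not confirmed here.

```python
import cmath
import math

N = 128           # window size in samples (assumed)
TONE_BIN = 10.5   # tone halfway between two analysis bins, where leakage is worst
FAR_BIN = 30      # a bin well away from the tone


def rectangular(n):
    """Rectangular window: every sample weighted equally."""
    return [1.0] * n


def blackman_harris(n):
    """4-term Blackman-Harris window (standard published coefficients)."""
    a0, a1, a2, a3 = 0.35875, 0.48829, 0.14128, 0.01168
    return [
        a0
        - a1 * math.cos(2 * math.pi * i / n)
        + a2 * math.cos(4 * math.pi * i / n)
        - a3 * math.cos(6 * math.pi * i / n)
        for i in range(n)
    ]


def bin_magnitude(samples, k):
    """Magnitude of DFT bin k, computed directly from the definition."""
    n = len(samples)
    return abs(sum(samples[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n)))


def leakage_db(window):
    """Level at FAR_BIN relative to the strongest bin, in dB."""
    tone = [math.sin(2 * math.pi * TONE_BIN * i / N) for i in range(N)]
    windowed = [t * w for t, w in zip(tone, window)]
    peak = max(bin_magnitude(windowed, k) for k in range(N // 2 + 1))
    return 20 * math.log10(bin_magnitude(windowed, FAR_BIN) / peak)


rect_leak = leakage_db(rectangular(N))
bh_leak = leakage_db(blackman_harris(N))
print(f"rectangular:     {rect_leak:6.1f} dB at bin {FAR_BIN}")
print(f"Blackman-Harris: {bh_leak:6.1f} dB at bin {FAR_BIN}")
```

The sketch only measures leakage far from the tone. The other side of the trade-off, the wider main lobe of the Blackman-Harris window and hence its lower frequency resolution, is not measured here.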