Latest revision as of 06:47, 14 November 2011
SupplementalSpeech_prelecture
Due to a kiwi server failure, my pre-lecture notes are not as substantial as I would have liked. See my post-lecture notes for a more detailed description.
* The server failed??? When?? Zach do you know anything about this? --Mboutin 19:45, 3 November 2009 (UTC)
* It was from around 2pm till about 5:30pm Tuesday. When I tried to preview my page that I had started writing, it said something like "Server not available." --Pclay
* We will look into this. Thanks for the detailed info Peter! --Mboutin 13:33, 4 November 2009 (UTC)
Notes for speech lecture:
Structure: -> Basic speech stuff (pipes, fricatives) -> Voiced vs. Unvoiced
1) average power
2) zero-crossing rate
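The two features above can be sketched in a few lines. This is only an illustration: the sampling rate, the frame length, the two thresholds, and the helper name `looks_voiced` are assumptions made for the example and do not come from the lecture. A strong low-frequency sinusoid stands in for a voiced frame and weak noise stands in for an unvoiced one.

```python
import math
import random

def average_power(frame):
    """Mean of the squared samples in the frame."""
    return sum(s * s for s in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)

def looks_voiced(frame, power_threshold=0.05, zcr_threshold=0.25):
    """Hypothetical rule: voiced if power is high and zero-crossing rate is low.

    Both thresholds are assumed values picked for this toy example.
    """
    return (average_power(frame) > power_threshold
            and zero_crossing_rate(frame) < zcr_threshold)

fs = 8000   # assumed sampling rate in Hz
n = 240     # assumed frame length (30 ms at 8 kHz)

# Stand-in for a voiced frame: a strong 120 Hz sinusoid.
voiced_frame = [0.8 * math.sin(2 * math.pi * 120 * k / fs) for k in range(n)]

# Stand-in for an unvoiced frame: weak uniform noise (fixed seed for repeatability).
rng = random.Random(0)
unvoiced_frame = [0.1 * rng.uniform(-1.0, 1.0) for _ in range(n)]

for name, frame in (("voiced", voiced_frame), ("unvoiced", unvoiced_frame)):
    print(name, average_power(frame), zero_crossing_rate(frame), looks_voiced(frame))
```

Real speech would need thresholds tuned to the recording level, so the numbers here should be read as placeholders.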
-> x(t) -> v(t) => s(t) = conv( x(t), v(t) )
   x(t): periodic pulse train
   v(t): filter
   s(t): phoneme
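As a rough discrete-time sketch of s = conv(x, v): the pulse period, the sampling rate, and the decaying-sinusoid impulse response below are assumptions chosen for illustration, not a vocal tract model from the lecture. The impulse response is made shorter than one pulse period so that each period of the output is simply a copy of v.

```python
import math

def convolve(x, v):
    """Direct linear convolution of two finite sequences."""
    s = [0.0] * (len(x) + len(v) - 1)
    for i, xi in enumerate(x):
        if xi == 0.0:
            continue  # skip zero samples of the pulse train
        for j, vj in enumerate(v):
            s[i + j] += xi * vj
    return s

def pulse_train(length, period):
    """Unit pulses every `period` samples, zeros elsewhere."""
    return [1.0 if k % period == 0 else 0.0 for k in range(length)]

def damped_resonance(length, freq_hz, decay, fs):
    """Hypothetical filter impulse response: an exponentially decaying sinusoid."""
    return [
        math.exp(-decay * k / fs) * math.sin(2 * math.pi * freq_hz * k / fs)
        for k in range(length)
    ]

fs = 8000                                    # assumed sampling rate in Hz
x = pulse_train(400, 80)                     # 100 Hz pulse train (period of 80 samples)
v = damped_resonance(60, 500.0, 400.0, fs)   # assumed single 500 Hz resonance
s = convolve(x, v)                           # s = conv(x, v)
```

Because v is shorter than the pulse period here, the copies of v do not overlap; with a longer impulse response the tails would add together.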
-> Model vocal tract as a series of tubes
- Going through tube delays the signal (show function)
- between tubes (show function)
+ This model leads to a transfer function -> Transfer function V(d)
Since the vocal tract is a cavity that resonates, it amplifies certain frequencies.
X(f) = sum_k( a_k * delta(f - k*f_a) )
The amplified frequencies, which are the local maxima of |S(f)|, are called formants.
- Generally, the vocal tract transfer function is an all-pole filter, where a real pole or a complex pole pair corresponds to a resonance.
- Also, if you are given a z-model, F = theta / (2*pi*T), where theta is the pole angle and T is the sampling period (same thing as wT = theta, with w = 2*pi*F).
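A small sketch of the relation F = theta / (2*pi*T): the pole radii, the pole angles, and the 8 kHz sampling rate are made-up values for illustration, and the function names are hypothetical. The second helper evaluates the magnitude of an all-pole transfer function on the unit circle, which should be larger near the pole angles than between them.

```python
import cmath
import math

def pole_angle_to_hz(pole, T):
    """F = theta / (2*pi*T), with theta the angle of the pole in the z-plane."""
    theta = abs(cmath.phase(pole))
    return theta / (2 * math.pi * T)

def all_pole_magnitude(poles, freq_hz, T):
    """|V(e^jw)| for V(z) = 1 / prod(1 - p * z^-1), evaluated at w = 2*pi*F*T."""
    z = cmath.exp(1j * 2 * math.pi * freq_hz * T)
    denom = 1.0 + 0j
    for p in poles:
        denom *= (1 - p / z)
    return 1.0 / abs(denom)

T = 1.0 / 8000.0   # assumed sampling period (8 kHz sampling rate)

# Assumed upper-half-plane poles placed at 500 Hz and 1500 Hz.
upper = [
    cmath.rect(0.97, 2 * math.pi * 500.0 * T),
    cmath.rect(0.95, 2 * math.pi * 1500.0 * T),
]
poles = upper + [p.conjugate() for p in upper]   # complex pole pairs

formants = [pole_angle_to_hz(p, T) for p in upper]
gains = {f: all_pole_magnitude(poles, f, T) for f in (500.0, 1000.0, 1500.0)}
print(formants, gains)
```

The peak of the magnitude response sits close to, but not necessarily exactly at, the pole angle; the closer the pole is to the unit circle, the sharper the resonance.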
- Zeros (anti-resonances) of the transfer function occur when there is no measurable output (e.g. nasals and fricatives)
  - Nasals => output from the mouth is zero
  - Fricatives/stop consonants => blockage behind source is infinite (forcing air through constriction)
-> Spectrograms
- Shows frequency content vs. time
- Use a short-time DTFT to obtain useful info about an utterance
  X_m(e^jw) = sum_n( x(n) w(n-m) e^(-jwn) )
- wideband uses window length = one period
  - high time resolution, low frequency resolution
  - striations due to energy variation
- narrowband captures several periods
  - high frequency resolution, low time resolution
  - striations correspond to peaks in frequency spectrum.
The formants correspond to the dark bands.
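The short-time DTFT formula and the wideband/narrowband trade-off can be tried on a toy signal. The sketch below assumes a Hamming window, an 8 kHz sampling rate, and a bare pulse train with a period of 80 samples as the input; none of these choices come from the lecture. The wideband window spans one period and the narrowband window spans four.

```python
import cmath
import math

def hamming(L):
    """Hamming window of length L (an assumed window choice)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * l / (L - 1)) for l in range(L)]

def short_time_dtft(x, window, m, omega):
    """X_m(e^jw) = sum_n x(n) w(n-m) e^(-jwn), with w(n-m) nonzero for m <= n < m+L."""
    total = 0j
    for l, w_l in enumerate(window):
        n = m + l
        if 0 <= n < len(x):
            total += x[n] * w_l * cmath.exp(-1j * omega * n)
    return total

fs = 8000      # assumed sampling rate in Hz
period = 80    # pulse period in samples (100 Hz)
x = [1.0 if n % period == 0 else 0.0 for n in range(800)]

def hz(f):
    """Convert a frequency in Hz to radians per sample."""
    return 2 * math.pi * f / fs

wide = hamming(period)         # wideband: window length = one period
narrow = hamming(4 * period)   # narrowband: window spans several periods

# Magnitude at 500 Hz (a harmonic of 100 Hz) as the window slides in time.
shifts = range(0, period, 10)
wide_over_time = [abs(short_time_dtft(x, wide, m, hz(500))) for m in shifts]
narrow_over_time = [abs(short_time_dtft(x, narrow, m, hz(500))) for m in shifts]

# Magnitude on a harmonic (500 Hz) versus between harmonics (550 Hz).
narrow_on = abs(short_time_dtft(x, narrow, 0, hz(500)))
narrow_off = abs(short_time_dtft(x, narrow, 0, hz(550)))
wide_on = abs(short_time_dtft(x, wide, 40, hz(500)))
wide_off = abs(short_time_dtft(x, wide, 40, hz(550)))
```

In this toy case the wideband magnitude rises and falls as the single pulse moves through the window (variation along time), while the narrowband magnitude stays nearly constant in time but separates the harmonic at 500 Hz from the gap at 550 Hz (variation along frequency).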
-> How to read a spectrogram by Rob Hagiwara
http://home.cc.umanitoba.ca/~robh/howto.html