Latest revision as of 06:47, 14 November 2011

SupplementalSpeech_prelecture

Due to a kiwi server failure, my pre-lecture notes are not as substantial as I would have liked. See my post-lecture notes for a more detailed description.

   * The server failed??? When?? Zach do you know anything about this? --Mboutin 19:45, 3 November 2009 (UTC)
   * It was from around 2pm till about 5:30pm Tuesday. When I tried to preview my page that I had started writing, it said something like "Server not available." --Pclay
   * We will look into this. Thanks for the detailed info Peter! --Mboutin 13:33, 4 November 2009 (UTC)

Notes for speech lecture:

Structure: -> Basic speech stuff (pipes, fricatives) -> Voiced vs. Unvoiced

1) average power 2) zero-crossing rate
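A minimal sketch of how the two measures listed above might separate voiced from unvoiced frames. The function names, the thresholds, and the test signals are all assumptions made for illustration; the notes do not specify a decision rule.

```python
import math
import random


def average_power(frame):
    """Mean of the squared samples in one frame."""
    return sum(s * s for s in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def is_voiced(frame, power_threshold=0.05, zcr_threshold=0.25):
    """Hypothetical rule: voiced = high power AND low zero-crossing rate.

    Both thresholds are illustrative guesses, not values from the lecture.
    """
    return (average_power(frame) > power_threshold
            and zero_crossing_rate(frame) < zcr_threshold)


fs = 8000   # assumed sampling rate in Hz
n = 240     # 30 ms frame at 8 kHz

# Stand-in for a voiced frame: a strong low-frequency periodic signal.
voiced_like = [math.sin(2 * math.pi * 120 * k / fs) for k in range(n)]

# Stand-in for an unvoiced frame: weak noise with many sign changes.
rng = random.Random(0)
unvoiced_like = [0.1 * rng.uniform(-1, 1) for _ in range(n)]

print(is_voiced(voiced_like), is_voiced(unvoiced_like))
```

Voiced speech tends to have high power and few zero crossings, while unvoiced speech is noise-like with low power and many zero crossings, which is why the two measures are used together here.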

-> x(t) -> v(t) => s(t) = conv( x(t), v(t) )

   x(t): periodic pulse train
   v(t): filter
   s(t): phoneme
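The source-filter relation s = conv(x, v) can be sketched in discrete time as follows. The pulse period, the length of the filter, and the decaying-cosine impulse response are hypothetical choices made only so the example runs; they are not taken from the lecture.

```python
import math


def pulse_train(length, period):
    """Periodic pulse train: 1.0 every `period` samples, 0.0 elsewhere."""
    return [1.0 if n % period == 0 else 0.0 for n in range(length)]


def convolve(x, v):
    """Direct-form linear convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(v) - 1)
    for i, xi in enumerate(x):
        if xi == 0.0:
            continue  # skip the zeros between pulses
        for j, vj in enumerate(v):
            y[i + j] += xi * vj
    return y


# Assumed filter impulse response: a single decaying resonance.
v = [0.9 ** n * math.cos(0.3 * math.pi * n) for n in range(20)]

x = pulse_train(100, 40)   # excitation: pulses at n = 0, 40, 80
s = convolve(x, v)         # output: one copy of v starting at every pulse
```

Because the assumed pulse period (40) is longer than the filter response (20), the copies of v do not overlap, so each pitch period of s is simply v followed by zeros.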

-> Model vocal tract as a series of tubes

- Going through tube delays the signal (show function)
- between tubes (show function)

+ This model leads to a transfer function -> Transfer function V(d)

Since the vocal tract is a cavity that resonates, it amplifies certain frequencies.

X(f) = sum_k( a_k * delta(f - k*f_a) )

These frequencies, which are the local maxima of |S(f)|, are called formants.

- Generally, the vocal tract transfer function is an all-pole filter
  where a real pole or a complex pole pair corresponds to a resonance.
- Also, if you are given a z-model, F = theta / (2*pi*T) where theta is the
  pole angle and T is the sampling period (same thing as wT = theta).
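As a worked instance of F = theta / (2*pi*T), the sketch below converts the angle of a pole to a frequency in Hz. The helper name, the 8 kHz sampling rate, and the pole location are assumptions chosen for illustration.

```python
import cmath
import math


def pole_to_formant_hz(pole, sampling_period):
    """Apply F = theta / (2*pi*T), with theta the angle of the pole.

    abs() is used so that both poles of a conjugate pair map to the
    same positive frequency.
    """
    theta = abs(cmath.phase(pole))
    return theta / (2 * math.pi * sampling_period)


T = 1 / 8000                                 # assumed sampling period (8 kHz)
pole = 0.95 * cmath.exp(1j * math.pi / 8)    # hypothetical pole at angle pi/8

formant = pole_to_formant_hz(pole, T)
print(round(formant, 3))
```

With theta = pi/8 and T = 1/8000 s, the formula gives F = (pi/8) * 8000 / (2*pi) = 500 Hz. The pole radius (0.95 here) does not enter the formula; it only controls how sharp the resonance is.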

- Zeros (anti-resonances) of the transfer function occur when there is no
  measurable output (e.g. nasals and fricatives)
- Nasals => output from the mouth is zero
  Fricatives/stop consonants => blockage behind source is infinite (forcing air
  through constriction)

-> Spectrograms

 - Shows frequency content vs. time
 - Use a short-time DTFT to obtain useful info about an utterance
   X_m(e^jw) = sum_n( x(n)w(n-m)e^(-jwn) )
 - wideband uses window length = one period
   - high time resolution, low frequency resolution
   - striations due to energy variation
 - narrowband captures several periods
   - high frequency resolution, low time resolution
   - striations correspond to peaks in frequency spectrum.
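The short-time DTFT formula above can be evaluated directly, one frequency at a time. This is only a sketch: the Hamming window, the window lengths (16 and 128 samples), the 32 frequency bins, and the test signal are all assumptions, and a practical spectrogram would use an FFT instead of this direct sum.

```python
import cmath
import math


def hamming(length):
    """Hamming window of the given length (assumed window shape)."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (length - 1))
            for k in range(length)]


def short_time_dtft(x, window, m, omega):
    """X_m(e^{jw}) = sum_n x(n) w(n - m) e^{-jwn}.

    The window is taken to be nonzero only for 0 <= n - m < len(window).
    """
    total = 0j
    for k, wk in enumerate(window):
        n = m + k
        if 0 <= n < len(x):
            total += x[n] * wk * cmath.exp(-1j * omega * n)
    return total


def spectrogram_column(x, window, m, num_bins):
    """Magnitudes at omega = pi*b/num_bins, b = 0..num_bins-1, for shift m."""
    return [abs(short_time_dtft(x, window, m, math.pi * b / num_bins))
            for b in range(num_bins)]


# Test signal: a single sinusoid at omega = pi/4, which is bin 8 of 32.
x = [math.cos(0.25 * math.pi * n) for n in range(256)]

wide = spectrogram_column(x, hamming(16), 0, 32)     # short window: wideband
narrow = spectrogram_column(x, hamming(128), 0, 32)  # long window: narrowband

print(narrow.index(max(narrow)))
```

The long window concentrates the sinusoid's energy in a narrow peak around bin 8, while the short window smears it across neighbouring bins. That is the frequency-resolution side of the trade-off listed above; the short window in exchange localizes events better in time.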

The formants correspond to the dark bands.

-> How to read a spectrogram by Rob Hagiwara

 http://home.cc.umanitoba.ca/~robh/howto.html

Back to Student summary speech
