[[Category:speech]]
[[Category:ECE438]]
[[Category:Digital Signal Processing]]
 
=SupplementalSpeech_prelecture=
 
  

Latest revision as of 06:47, 14 November 2011



Due to a Kiwi server failure, my pre-lecture notes are not as substantial as I would have liked. See my post-lecture notes for a more detailed description.

   * The server failed??? When?? Zach do you know anything about this? --Mboutin 19:45, 3 November 2009 (UTC)
   * It was from around 2pm till about 5:30pm Tuesday. When I tried to preview my page that I had started writing, it said something like "Server not available." --Pclay
   * We will look into this. Thanks for the detailed info Peter! --Mboutin 13:33, 4 November 2009 (UTC) 


Notes for speech lecture:

Structure:
-> Basic speech stuff (pipes, fricatives)
-> Voiced vs. unvoiced, distinguished using:
   1) average power
   2) zero-crossing rate
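A minimal sketch of the two voiced/unvoiced features, assuming frame-based analysis with NumPy. The function name, frame length, hop size, and the synthetic test signals are illustrative choices, not from the lecture. Voiced frames tend to have high average power and a low zero-crossing rate; noise-like unvoiced frames tend to show the opposite. No decision thresholds are given in these notes, so none are assumed here.

```python
import numpy as np

def short_time_features(x, frame_len, hop):
    """Per-frame average power and zero-crossing rate of the signal x.

    Frames start every `hop` samples; a trailing partial frame is dropped.
    """
    x = np.asarray(x, dtype=float)
    n_frames = max(0, 1 + (len(x) - frame_len) // hop)
    power = np.empty(n_frames)
    zcr = np.empty(n_frames)
    for i in range(n_frames):
        frame = x[i * hop : i * hop + frame_len]
        # Average power: mean of the squared samples in the frame.
        power[i] = np.mean(frame ** 2)
        # Zero-crossing rate: fraction of adjacent sample pairs whose signs differ.
        signs = np.sign(frame)
        zcr[i] = np.mean(signs[1:] != signs[:-1])
    return power, zcr

# Usage with synthetic stand-ins (assumed, not real speech):
fs = 8000
t = np.arange(fs) / fs
voiced = np.sin(2 * np.pi * 100 * t)                           # strong, slowly varying
unvoiced = 0.1 * np.random.default_rng(0).standard_normal(fs)  # weak, noise-like
p_v, z_v = short_time_features(voiced, frame_len=256, hop=128)
p_u, z_u = short_time_features(unvoiced, frame_len=256, hop=128)
print(p_v.mean() > p_u.mean(), z_v.mean() < z_u.mean())       # True True
```
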

-> Source-filter model: x(t) -> v(t) => s(t) = conv( x(t), v(t) )
   x(t): periodic pulse train (source)
   v(t): filter
   s(t): phoneme (output)
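A discrete-time sketch of the source-filter picture above. It assumes an 8 kHz sampling rate, a 100 Hz pitch, and a made-up single-resonance impulse response standing in for v(t); all of these values are illustrative, not from the lecture. The output is simply the convolution of the pulse train with v.

```python
import numpy as np

fs = 8000                     # sampling rate in Hz (assumed)
f0 = 100                      # pitch of the pulse train in Hz (assumed)
period = fs // f0             # samples between glottal pulses
num_samples = fs // 10        # 0.1 s of signal

# Source x[n]: periodic impulse train, one pulse every `period` samples.
x = np.zeros(num_samples)
x[::period] = 1.0

# Filter v[n]: hypothetical impulse response of one damped resonance near 500 Hz.
r, fc = 0.98, 500.0
m = np.arange(200)
v = r ** m * np.cos(2 * np.pi * fc / fs * m)

# Output s[n] = (x * v)[n]: each pulse launches a shifted copy of v[n];
# the result is truncated to the length of the source.
s = np.convolve(x, v)[:num_samples]
```

Because the pulses are 80 samples apart, s[n] equals v[n] for the first 80 samples; from sample 80 on, the shifted copies overlap and add.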

-> Model vocal tract as a series of tubes

- Going through a tube delays the signal (show function)
- Between tubes, part of the signal is reflected and part is transmitted (show function)

+ This model leads to a transfer function V(f)
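As the simplest special case of the tube model, a single lossless tube of uniform cross-section, closed at the glottis and open at the lips, has |V(f)| = 1/|cos(2*pi*f*L/c)|. Its poles give resonances F_n = (2n - 1) c / (4L). The sketch below assumes c = 350 m/s and L = 17.5 cm, typical textbook values that do not come from these notes. A series of several tubes would shift these resonances away from the uniform-tube values.

```python
import numpy as np

SPEED_OF_SOUND = 350.0    # m/s, assumed value for warm, moist air

def uniform_tube_formants(length_m, count=3, c=SPEED_OF_SOUND):
    """Resonances (Hz) of a lossless uniform tube closed at one end (glottis)
    and open at the other (lips): F_n = (2n - 1) * c / (4 * L)."""
    n = np.arange(1, count + 1)
    return (2 * n - 1) * c / (4 * length_m)

def uniform_tube_response(f, length_m, c=SPEED_OF_SOUND):
    """Magnitude |V(f)| = 1 / |cos(2*pi*f*L/c)| of the same idealized tube.
    Because the tube is lossless, it grows without bound near the resonances."""
    return 1.0 / np.abs(np.cos(2 * np.pi * np.asarray(f, dtype=float) * length_m / c))

formants = uniform_tube_formants(0.175)   # approximately 500, 1500, 2500 Hz
```
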

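Since the vocal tract is a cavity that resonates, it amplifies certain frequencies. For a voiced sound the source is a pulse train with fundamental frequency f_a, so its spectrum is a set of lines, X(f) = sum_k a_k * delta(f - k*f_a), and therefore S(f) = X(f) V(f) = sum_k a_k V(k*f_a) delta(f - k*f_a).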

These frequencies, which are the local maxima of the envelope of |S(f)|, are called formants.
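A numerical check of the line-spectrum idea. It assumes fs = 8 kHz, f_a = 100 Hz, and a hypothetical second-order resonance at 500 Hz standing in for V(f); none of these values come from the lecture. The FFT of the pulse train is nonzero only at multiples of f_a, and multiplying by V(f) samples the envelope at those harmonics, so the strongest harmonic sits at the resonance.

```python
import numpy as np

fs, f_a = 8000, 100                 # sampling rate and pitch in Hz (assumed)
N = fs                              # 1 s of signal -> 1 Hz FFT bin spacing
x = np.zeros(N)
x[:: fs // f_a] = 1.0               # periodic pulse train

X = np.fft.rfft(x)
freqs = np.fft.rfftfreq(N, d=1 / fs)
on_line = np.abs(X) > 1e-6          # bins where X(f) is (numerically) nonzero
lines = freqs[on_line]              # 0, 100, 200, ..., 4000 Hz

def V_mag(f, f_res=500.0, damping=0.1):
    """Hypothetical |V(f)|: one damped second-order resonance at f_res."""
    u = f / f_res
    return 1.0 / np.sqrt((1 - u ** 2) ** 2 + (damping * u) ** 2)

# |S(f)| at the harmonics: the line heights |X(k f_a)| scaled by |V(k f_a)|.
S_lines = np.abs(X[on_line]) * V_mag(lines)
peak_hz = lines[np.argmax(S_lines)]  # harmonic nearest the resonance
```

With these numbers peak_hz comes out at 500 Hz. If f_res is moved to fall between two harmonics, no single line lands on the resonance, which is why formants are read from the envelope rather than from individual harmonics.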

- Generally, the vocal tract transfer function is modeled as an all-pole filter,
  where a real pole or a complex-conjugate pole pair corresponds to a resonance.
- Also, if you are given a z-domain model with a pole at angle theta, the
  resonance frequency is F = theta / (2*pi*T), where T is the sampling period
  (same thing as wT = theta, with w = 2*pi*F).
- Zeros (anti-resonances) of the transfer function occur where there is no
  measurable output (e.g., nasals and fricatives):
- Nasals => output from the mouth is zero
  Fricatives/stop consonants => the blockage behind the source is infinite (air
  is forced through a constriction)
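A sketch of the pole-angle rule F = theta / (2*pi*T). It assumes an all-pole V(z) = 1/A(z) built from one complex-conjugate pole pair per resonance. The formant frequencies, pole radii, and sampling rate are made-up values. NumPy's `np.poly` and `np.roots` convert between poles and polynomial coefficients.

```python
import numpy as np

def all_pole_denominator(formants_hz, radii, fs):
    """Coefficients of A(z) for V(z) = 1 / A(z), with one conjugate pole pair
    r * exp(+/- j*theta) per resonance, where theta = 2*pi*F*T."""
    T = 1.0 / fs
    poles = []
    for F, r in zip(formants_hz, radii):
        theta = 2 * np.pi * F * T
        poles += [r * np.exp(1j * theta), r * np.exp(-1j * theta)]
    # Conjugate pairs make the polynomial real; drop the ~0 imaginary parts.
    return np.real(np.poly(poles))

def formants_from_denominator(a, fs):
    """Recover resonance frequencies from A(z) via F = theta / (2*pi*T)."""
    T = 1.0 / fs
    poles = np.roots(a)
    theta = np.angle(poles[np.imag(poles) > 0])   # keep one pole of each pair
    return np.sort(theta / (2 * np.pi * T))

fs = 8000                                         # assumed sampling rate (Hz)
a = all_pole_denominator([500, 1500, 2500], [0.97, 0.95, 0.93], fs)
F = formants_from_denominator(a, fs)              # approximately 500, 1500, 2500 Hz
```
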
  

-> Spectrograms

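 - Displays frequency content vs. time
 - Use a short-time DTFT to obtain useful information about an utterance:
   X_m(e^jw) = sum_n x(n) w(n-m) e^(-jwn), where w(n-m) is the analysis window shifted to sample m
 - Wideband: window length of about one pitch period
   - high time resolution, low frequency resolution
   - vertical striations due to energy variation within each pitch period
 - Narrowband: window spans several pitch periods
   - high frequency resolution, low time resolution
   - horizontal striations correspond to the pitch harmonics (peaks in the frequency spectrum)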
The formants correspond to the dark bands.
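A sketch of the short-time DTFT magnitude. It assumes a Hamming window, a 40-sample hop, an FFT size equal to the window length, and a 100 Hz pulse train at 8 kHz as the test signal; none of these choices are specified in the lecture. Since |e^(-jwm)| = 1, indexing each frame from 0 gives the same magnitude as the sum over x(n) w(n-m) e^(-jwn). The 80-sample window (one 10 ms pitch period) is the wideband case, and the 400-sample window is the narrowband case.

```python
import numpy as np

def stft_magnitude(x, win_len, hop, nfft=None):
    """|X_m(e^jw)| for frames starting every `hop` samples, evaluated at
    nfft//2 + 1 frequencies from 0 to fs/2. Uses a Hamming window w(n)."""
    x = np.asarray(x, dtype=float)
    nfft = nfft or win_len
    w = np.hamming(win_len)
    starts = range(0, len(x) - win_len + 1, hop)
    frames = np.array([x[m:m + win_len] * w for m in starts])
    # Indexing each frame from 0 only multiplies X_m by e^(-jwm); |.| is unchanged.
    return np.abs(np.fft.rfft(frames, n=nfft, axis=1))   # shape: (frames, nfft//2 + 1)

fs = 8000
x = np.zeros(fs)
x[::80] = 1.0                                      # 100 Hz pulse train (assumed test signal)
wide = stft_magnitude(x, win_len=80, hop=40)       # about one pitch period
narrow = stft_magnitude(x, win_len=400, hop=40)    # five pitch periods
```

In `wide`, each frame holds a single pulse, so its spectrum is flat and its level depends on where the pulse falls in the window. Frames therefore alternate between loud and quiet, which produces the vertical striations. In `narrow`, the 20 Hz bin spacing resolves the 100 Hz harmonics as separate peaks, which produces the horizontal striations.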

-> How to read a spectrogram by Rob Hagiwara

 http://home.cc.umanitoba.ca/~robh/howto.html


Back to Student summary speech
