Homework 10, ECE438, Fall 2014, Prof. Boutin
Question 1
Consider a model of the vocal tract consisting of three tubes of equal length l connected to a first tube that is infinitely thin. Assume that l=Tc where c is the speed of sound and T is the period at which you sample the airflow throughout the model.
a) Obtain the transfer function of this model of the vocal tract. (You may use the matrix equations for the tube junction/time delay obtained in class without justification.)
$ r_0=\frac{A_1-A_0}{A_1+A_0}=1 $
$ r_1=\frac{A_2-A_1}{A_2+A_1} $
$ r_2=\frac{A_3-A_2}{A_3+A_2} $
$ \left( \begin{array}{c} Ad\left(z \right) \\ Cd\left(z \right) \end{array} \right)= \frac{1}{2}\left( \begin{array} {cc} 1&-1 \\ -1&1 \end{array} \right) z\left( \begin{array} {cc} 1&0 \\ 0&z^{-2} \end{array} \right) \frac{1}{1+r_1} \left( \begin{array}{cc} 1&-r_1 \\ -r_1&1 \end{array} \right) z\left( \begin{array} {cc} 1&0 \\ 0&z^{-2} \end{array} \right) \frac{1}{1+r_2} \left( \begin{array}{cc} 1&-r_2 \\ -r_2&1 \end{array} \right) z\left( \begin{array} {cc} 1&0 \\ 0&z^{-2} \end{array} \right) \frac{1}{2}\left( \begin{array} {cc} 1&-1 \\ -1&1 \end{array} \right) \left( \begin{array}{c} Bd\left(z \right) \\ 0 \end{array} \right) $
$ \left( \begin{array}{c} Ad\left(z \right) \\ Cd\left(z \right) \end{array} \right)= \frac{z^{3}}{4\left(1+r_1\right)\left(1+r_2\right)} \left(\begin{array} {cc} X&Y \\ Y&X \end{array} \right) \left( \begin{array}{c} Bd\left(z \right) \\ 0 \end{array} \right) $
where $ X=\frac{r_2 \left( \frac{r_1}{z^{2}}+1 \right) + \frac{r_1+\frac{1}{z^{2}}}{z^{2}}}{z^{2}} + \frac{r_1}{z^{2}} + \frac{r_2\left(r_1+\frac{1}{z^{2}} \right)}{z^{2}} + 1 $
$ Y=\frac{r_2 \left( \frac{r_1}{z^{2}}+1 \right) + \frac{r_1+\frac{1}{z^{2}}}{z^{2}}}{z^{2}} - \frac{r_1}{z^{2}} - \frac{r_2\left(r_1+\frac{1}{z^{2}} \right)}{z^{2}} - 1 $
$ H\left( z \right) = \frac{Bd\left( z \right)}{Ad \left( z \right)}=\frac{4\left(1+r_1\right)\left(1+r_2\right)}{z^{3}X} $
Simplifying X, we get
$ H\left( z \right) = \frac{4z^{3}\left( 1+r_1 \right) \left( 1+r_2 \right)}{z^{6}+r_1r_2z^{4}+\left(r_1+r_2 \right)z^{4} + r_1r_2z^{2} + \left(r_1+r_2 \right)z^{2}+1} $
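As a sanity check on the algebra, the matrix product can be evaluated numerically at an arbitrary test point and compared with the simplified expression for H(z). The sketch below is only an illustration: the function names, the test point z and the values of r_1 and r_2 are arbitrary choices, not part of the assignment.

```python
def matmul(p, q):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def junction(r):
    """Junction matrix 1/(1+r) * [[1, -r], [-r, 1]]."""
    s = 1.0 / (1.0 + r)
    return [[s, -r * s], [-r * s, s]]


def chain_ratio(z, r1, r2):
    """Ad(z)/Bd(z): top-left entry of the junction/delay matrix product."""
    delay = [[z, 0], [0, z ** -1]]  # z * diag(1, z^-2)
    m = junction(1.0)               # r_0 = 1 (infinitely thin first tube)
    for r in (r1, r2, 1.0):         # final junction matrix equals the r = 1 case
        m = matmul(matmul(m, delay), junction(r))
    return m[0][0]


def closed_form_h(z, r1, r2):
    """Simplified H(z) with the denominator terms grouped by power of z."""
    c = r1 + r2 + r1 * r2
    return 4 * z ** 3 * (1 + r1) * (1 + r2) / (z ** 6 + c * z ** 4 + c * z ** 2 + 1)


# Arbitrary test values (assumptions for illustration only).
z_test, r1_test, r2_test = 0.7 + 0.4j, 0.5, -0.3
difference = abs(1 / chain_ratio(z_test, r1_test, r2_test)
                 - closed_form_h(z_test, r1_test, r2_test))
print(difference)
```

A difference at the level of floating-point rounding indicates that the two forms agree at the chosen point; it is a spot check, not a proof.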
b) How many formants could one create with such a model? Explain.
The denominator is a polynomial of degree 6, so setting it equal to zero gives 6 roots, which are the poles of H(z). Because the coefficients are real, the roots form 3 pairs of the form a ± bi (b = 0 corresponds to a pair of real roots). Each complex conjugate pair can produce one formant, so the model can create up to 3 formants.
c) Explain how one would control the location of the formants with such a model.
The locations of the three formants are determined by the 6 roots. The roots vary with the coefficients of the polynomial in part (b), and those coefficients are functions of $ r_1 $ and $ r_2 $. This means that we can control the locations of the formants by changing the ratios of the cross-sectional areas of adjacent tubes.
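To make the dependence on $ r_1 $ and $ r_2 $ concrete, here is a small sketch that computes the poles of the denominator from part (a) and converts their angles to frequencies. It relies on an observation that is ours rather than part of the original solution: writing u = z^2 and c = r_1 + r_2 + r_1 r_2, the denominator becomes u^3 + c u^2 + c u + 1 = (u + 1)(u^2 + (c - 1)u + 1). The sampling rate of 8000 Hz and the reflection coefficients in the usage lines are hypothetical values chosen only for illustration.

```python
import cmath
import math


def tube_poles(r1, r2):
    """Roots of z^6 + c z^4 + c z^2 + 1 with c = r1 + r2 + r1*r2."""
    c = r1 + r2 + r1 * r2
    # In u = z^2 the polynomial factors as (u + 1)(u^2 + (c - 1) u + 1).
    disc = cmath.sqrt((c - 1) ** 2 - 4)
    u_roots = [-1, (-(c - 1) + disc) / 2, (-(c - 1) - disc) / 2]
    poles = []
    for u in u_roots:
        s = cmath.sqrt(u)
        poles.extend([s, -s])  # both square roots of u are poles in z
    return poles


def formant_frequencies(r1, r2, fs):
    """Frequencies (Hz) of the poles in the upper half-plane, sorted."""
    angles = sorted(cmath.phase(p) for p in tube_poles(r1, r2) if p.imag > 1e-12)
    return [a * fs / (2 * math.pi) for a in angles]


# Hypothetical values: fs and the reflection coefficients are assumptions.
fs = 8000.0
print(formant_frequencies(0.5, -0.3, fs))
print(formant_frequencies(-0.2, 0.6, fs))
```

Under this factorization, one pole pair always sits at z = ±j (a quarter of the sampling rate), and only the other two pairs move as the area ratios change. All poles lie on the unit circle because this idealized model has no losses.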
Question 2
Why do the poles of the transfer function of the vocal tract always come in complex conjugate pairs? Explain.
Since real systems have transfer functions with real coefficients, the non-real poles of the vocal tract must come in complex conjugate pairs. If we write the transfer function as H(z)=P(z)/Q(z), where P(z) and Q(z) are polynomials, then the poles of the transfer function are the zeros of the polynomial Q(z). But Q(z) has real coefficients (since the system can be written as a difference equation with real coefficients), and the non-real zeros of a polynomial with real coefficients always come in complex conjugate pairs: if $ Q(z_0)=0 $, then $ Q(\bar{z}_0)=\overline{Q(z_0)}=0 $.
Question 3
We have seen that the transfer function of the vocal tract for voiced phonemes has poles (which create the formants).
a) What does this imply regarding the difference equation representing the system (in discrete-time)?
This implies that the difference equation must have the form
$ y[n]=\sum_{i=0}^{N-1} b_i x[n-i] -\sum_{k=1}^{M} a_k y[n-k] $
where M is the number of poles and M>0, so the output depends on its own past values.
b) Could the vocal tract be modeled using an FIR filter? Explain.
No, it must be an IIR filter as it must have poles. As explained in (a), the difference equation describing the system involves values of the output y[n] at previous times.
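The difference between the two filter types can be illustrated with a direct implementation of the difference equation from part (a). This is a minimal sketch: the function name and the coefficient values are made up for illustration, and zero initial conditions are assumed.

```python
def difference_equation(b, a, x):
    """y[n] = sum_i b[i] x[n-i] - sum_k a[k-1] y[n-k], zero initial conditions.

    b holds b_0..b_{N-1}; a holds a_1..a_M (an empty list means no feedback).
    """
    y = []
    for n in range(len(x)):
        acc = sum(b[i] * x[n - i] for i in range(len(b)) if n - i >= 0)
        acc -= sum(a[k - 1] * y[n - k] for k in range(1, len(a) + 1) if n - k >= 0)
        y.append(acc)
    return y


impulse = [1.0] + [0.0] * 9

# FIR: no feedback terms, so the impulse response ends after len(b) samples.
fir_response = difference_equation([1.0, 0.5], [], impulse)

# IIR: y[n] = x[n] + 0.9 y[n-1], one pole at z = 0.9, so h[n] = 0.9**n never ends.
iir_response = difference_equation([1.0], [-0.9], impulse)

print(fir_response)
print(iir_response)
```

The feedback term is what creates a pole away from z = 0. Without it, the response to a single impulse is just a finite copy of the b coefficients.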
Question 4
Warning: do not confuse the period of the sampling with the period of the pulse train produced by the vocal tract (1/pitch). Use different variables!
A person is pronouncing a phoneme. The pitch of the person's voice is 250Hz. The phoneme has two formants: a large one at 500 Hz, a weak one at 1.25 kHz.
You are given a digital recording of that phoneme. The sampling rate for the recording is 5kHz.
a) From the information given, can you tell the gender of the person?
Pitch period = 1/pitch = 1/(250 Hz) = 4 ms.
Males usually have a pitch period of about 8 ms and females about 4 ms.
So it is likely a female voice.
b) How does the gender of the person influence the location of the local maxima of the magnitude of the frequency response of the vocal tract?
The gender will not influence the location of the local maxima. It only affects the pitch frequency.
c) Sketch the graph of the magnitude of the CT Fourier transform of the phoneme. (Put three dots "..." in the inaudible region of the spectrum.) How does it compare to the graph of the magnitude of the DT Fourier transform of the digital recording of the phoneme?
d) Sketch the approximate location of the poles of the transfer function H(z) corresponding to the vocal tract of that person when he/she is pronouncing the phoneme.
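For parts (c) and (d), the relation ω = 2πf/fs maps each analog frequency to a digital frequency and therefore to a pole angle. The sketch below applies it to the numbers given in the question. The pole radii are hypothetical: the question only says one formant is large and the other weak, which suggests a pole pair close to the unit circle for the first and a pair farther inside for the second.

```python
import cmath
import math


def digital_frequency(f_hz, fs_hz):
    """Map an analog frequency in Hz to omega = 2*pi*f/fs in radians per sample."""
    return 2 * math.pi * f_hz / fs_hz


fs = 5000.0      # sampling rate of the recording (from the question)
pitch = 250.0    # pitch of the voice (from the question)
formants = {"strong": 500.0, "weak": 1250.0}

samples_per_pitch_period = fs / pitch             # pitch period in samples
harmonic_spacing = digital_frequency(pitch, fs)   # spacing of spectral lines in omega

pole_angles = {name: digital_frequency(f, fs) for name, f in formants.items()}

# Hypothetical radii: closer to 1 for the strong formant, smaller for the weak one.
radii = {"strong": 0.98, "weak": 0.8}
poles = [cmath.rect(radii[name], sign * pole_angles[name])
         for name in formants for sign in (+1, -1)]

for name, angle in pole_angles.items():
    print(name, "formant: angle =", angle / math.pi, "* pi")
print(poles)
```

With these numbers the strong formant maps to an angle of π/5 and the weak one to π/2, and the harmonics of the pitch are spaced π/10 apart in ω.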