Moon Landing Speech Analysis
by ECE student Yiran Gu
This project analyzes the words Neil Armstrong spoke as he first stepped onto the moon. The line is usually heard as 'That's one small step for man, one giant leap for mankind', and people have long wondered whether he actually said an 'a' between 'for' and 'man'.
We need the original audio track to work with. This is the link where I downloaded the mp3 file.
https://archive.org/details/TheMoonLanding
The audio is 63 seconds long, so I trimmed it in an audio editor to keep only the 'for [a] man' part and saved the result as for[a]man.mp3.
Afterwards, I used the audioread command to read the mp3 file into MATLAB. This command returns two variables, x and fs: x holds the audio samples and fs is the sampling frequency of the file. If you play x directly with sound(x), you won't hear the audio correctly, because sound then assumes its default sampling rate of 8192 Hz. Use sound(x,fs) instead of sound(x).
Then I plotted the signal against time:
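The effect of leaving out fs can be estimated with a little arithmetic. The sketch below is only an illustration: the 44100 Hz rate and the 1.5 second clip length are assumptions, not values read from for[a]man.mp3, and the zero vector merely stands in for the samples that audioread would return.

```matlab
% Illustration only: fs and the clip length below are assumed values.
fs = 44100;                            % assumed sampling frequency of the file
x  = zeros(round(1.5*fs), 1);          % stand-in for the samples from audioread

trueDuration  = length(x)/fs;          % seconds when played with sound(x,fs)
defaultFs     = 8192;                  % rate sound(x) uses when fs is omitted
wrongDuration = length(x)/defaultFs;   % seconds when played with sound(x)
slowdown      = wrongDuration/trueDuration;  % how much slower (and lower) it sounds

% sound(x, fs);                        % correct playback of a real clip
```

Under these assumptions the clip would play more than five times too slowly without fs, which is why the speech is unrecognizable.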
From the graph, one can clearly see the voiced segments of at least two phonemes. Next, let's take the DTFT of the signal to see what it looks like in the frequency domain. Here's the result.
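The code for this step is not part of the listing at the end, so here is a minimal sketch of how such a spectrum could be computed with fft. It uses a synthetic 500 Hz tone at an assumed 8000 Hz sampling rate instead of the real recording, so the names and values are illustrative only.

```matlab
% Sketch on a synthetic tone; fs and the 500 Hz tone are assumptions.
fs = 8000;                      % assumed sampling frequency
t  = (0:fs-1)'/fs;              % one second of sample times
x  = sin(2*pi*500*t);           % stand-in signal: a 500 Hz sine

N   = length(x);
X   = fft(x);                   % DFT samples of the whole signal
f   = (0:N-1)*fs/N;             % frequency of each DFT bin in Hz
mag = abs(X(1:N/2));            % magnitude of the non-negative frequencies

[~, idx] = max(mag);            % strongest bin
peakHz   = f(idx);              % its frequency in Hz

% plot(f(1:N/2), mag); xlabel('frequency (Hz)'); ylabel('|X|');
```

For the real clip, x and fs would come from audioread. The spectrum shows which frequencies are present but not when they occur.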
At this point I realized that the DTFT of the whole signal cannot tell me much, because I want to know whether a formant exists in a particular time period. A spectrogram fits that need best. To see the frequency variation clearly, I chose a narrowband spectrogram and used the numbers from the Lab 9 manual to generate it. As expected, two formants can be seen clearly in the spectrogram.
To be more precise, I also created a wideband spectrogram to see what the signal looks like with better resolution in the time domain. With the values from the lab manual, the resulting spectrogram shows a dark red area between two others. I tried different values to look for a better resolution, but unfortunately the time-frequency trade-off means that no single setting looks good in both the frequency and the time domain. However, every spectrogram shows a formant in between. Its location may vary from person to person, because everyone has his or her own pronunciation.
In conclusion, there is a high possibility that Neil Armstrong did say the 'a' between 'for' and 'man'.
The following is the MATLAB code I used to do the analysis and generate the graphs.
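The idea behind a spectrogram can be sketched in a few lines: slide a window along the signal and take the DFT of each windowed piece. The example below is a simplified stand-in for the Specgm function listed at the end, not a copy of it. The test signal (500 Hz followed by 1500 Hz), the sampling rate and the window settings are all assumptions chosen for illustration, and the Hamming window is written out from its formula so that no toolbox is needed.

```matlab
% Short-time DFT sketch on a synthetic signal; all values are assumptions.
fs = 8000;                                   % assumed sampling frequency
t  = (0:fs/2-1)'/fs;                         % half a second of sample times
x  = [sin(2*pi*500*t); sin(2*pi*1500*t)];    % 500 Hz, then 1500 Hz

L = 256; hop = 128; N = 512;                 % window length, hop size, DFT length
w = 0.54 - 0.46*cos(2*pi*(0:L-1)'/(L-1));    % Hamming window from its formula

K = floor((length(x) - L)/hop) + 1;          % number of complete windows
A = zeros(N/2, K);                           % one column per window
for k = 1:K
    seg    = x((k-1)*hop + (1:L));           % k-th segment of the signal
    Xk     = fft(seg .* w, N);               % windowed, zero-padded DFT
    A(:,k) = abs(Xk(1:N/2));                 % keep non-negative frequencies
end

f = (0:N/2-1)'*fs/N;                         % frequency of each bin in Hz
[~, iFirst] = max(A(:,1));                   % strongest bin in the first window
[~, iLast]  = max(A(:,K));                   % strongest bin in the last window
peakFirst = f(iFirst);
peakLast  = f(iLast);
```

The strongest frequency moves from the first window to the last, which is exactly the time-localized information that the DTFT of the whole signal hides.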
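The trade-off can be put into rough numbers for the three window lengths used in the code listing (320, 500 and 60 samples). In this sketch the 44100 Hz sampling rate is an assumption, and fs/L is only a rule-of-thumb estimate of the frequency spacing a window of L samples can separate; a Hamming window smears somewhat more than that.

```matlab
% Rough time/frequency resolution for each window length; fs is assumed.
fs = 44100;                      % assumed sampling frequency
L  = [320 500 60];               % window lengths used in the listing

winDur_ms  = 1000*L/fs;          % time span covered by one window, in ms
freqRes_Hz = fs./L;              % rule-of-thumb frequency resolution, in Hz

% The product of the two is constant: improving one worsens the other.
product = winDur_ms .* freqRes_Hz;
```

A longer window (L=500) gives finer frequency detail but blurs events in time, while a short window (L=60) does the opposite, so no single choice is sharp in both domains.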
close all, clear, clc
[x,fs] = audioread('for[a]man.mp3');
x = x(:,1);                 % keep the first channel in case the file is stereo
l = length(x);              % number of samples in the signal
t = (0:l-1)/fs;             % time of each sample in seconds
figure(1);
plot(t,x)
xlabel('time (s)')
ylabel('x')
title('original signal')
% longer windows: finer frequency resolution
L=320; overlap=60; N=512;
X=Specgm(x,L,overlap,N,fs);
L=500; overlap=80; N=512;
X=Specgm(x,L,overlap,N,fs);
% shorter window: finer time resolution
L=60; overlap=20; N=512;
X=Specgm(x,L,overlap,N,fs);
Specgm
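function [A] = Specgm(x,L,overlap,N,fs)
% Gu & Wood
% Specgm  Plot a spectrogram of x and return the DFT of each window in A.
%   L is the window length, overlap the number of samples shared by
%   consecutive windows, N the DFT length and fs the sampling frequency.
l = L-overlap;                    % hop size (window length without overlap)
k = floor((length(x)-L)/l)+1;     % number of complete windows that fit in x
A = zeros(N/2,k);                 % one column of N/2 DFT bins per window
for i = 1:k                       % take the DFTwin of each window
    X = DFTwin(x,L,l*(i-1),N);    % window starts at offset l*(i-1)
    A(:,i) = X(1:N/2);            % keep the non-negative frequencies
end
tf = ((0:k-1)*l + L/2)/fs;        % center time of each window in seconds
f = (0:N/2-1)*fs/N;               % frequency of each DFT bin in Hz
% plot the graph
figure;
imagesc(tf,f,20*log10(abs(A)))    % magnitude in dB
axis xy
ylim([0 4000])                    % show 0 to 4000 Hz
colormap(jet)
xlabel('time (s)')
ylabel('frequency (Hz)')
end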
DFTwin
function [X] = DFTwin(x,L,m,N)
% Gu & Wood
% DFTwin  N-point DFT of an L-sample Hamming-windowed segment of x.
%   The segment covers samples m+1 to m+L, where m is the window offset.
x = x(:);                         % force a column vector
h = hamming(L);                   % create a Hamming window (column vector)
X = fft(x((m+1):(m+L)).*h,N);     % window the segment, then take the Fourier transform
end