Difference between revisions of "Digital Speech Generation" - Rhea

Revision as of 16:25, 25 October 2009

Digital Speech Generation

Page Under Construction

Background

Digital Speech Generation, as the name suggests, is the process of making a computer "speak" a sequence of letters/words in a meaningful way.
For English, this is especially hard, because it is a very un-mathematical language.
By "un-mathematical" I mean that the "a" in "apple" is not pronounced the same way as the "a" in "hate". In other words, the same letter can sound different in different words.
Consonants are fairly consistent in this respect, but vowels are not, by a long shot.
Hindi, for example, is a little more mathematical. The "आ" in "आप" (a respectful "you") is pronounced the same as the "आ" in "आरंभ" ("start"), and this is true for all uses of आ.
If English were the same way, all we would have to do is assign a sound to every letter and then play the sounds in order.
Unfortunately, it's not that simple.
What is done instead is to use another alphabet, the phonetic alphabet, which assigns a sound to every symbol in it, so that this set of symbols can generate any sound in human language (or can it?). http://www.langsci.ucl.ac.uk/ipa/pulmonic.html

Motivation

The above technique sounds great, right? The question is, how easy is it to implement?
The following experiment explores whether this can indeed be done.
Unfortunately, the IPA (the International Phonetic Association) does not give you these phonetic elements in a neat little zip file, nicely documented and ready to use.
What it gives you is a folder called American English, containing recordings of a bunch of words, along with a $30 Handbook that explains what each phoneme means.
However, if we ignore our desire to conform to standards for a minute, we can easily extract from these words the syllables containing the phonemes we need.
I named my own phonemes and broke down my test phrase, "ECE 438", into a sequence of them. So please don't take the names of these phonemes as standard IPA notation. For example, when I say ee, I mean the ee in eel. The phonetic code for that is $$ i: $$
Also, I am required to add this copyright information. For the record, I am allowed to use it for educational purposes. Copyright:phonetic_sounds
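
The idea of treating a phrase as a sequence of named phonemes can be sketched with a small lookup table. This is only an illustration under stated assumptions: the struct name bank, the helper tone, the 100-sample clip length and the tone frequencies are all hypothetical placeholders, and short sine bursts stand in for real recorded clips. Only the phoneme names and their order come from the experiment on this page.

```matlab
% Hypothetical placeholder bank: every named phoneme maps to a column vector.
% Sine bursts stand in for real clips; frequencies and lengths are arbitrary.
fs   = 15000;                               % playback rate used on this page
n    = 100;                                 % placeholder clip length (samples)
tone = @(f) sin(2*pi*f*(0:n-1)'/fs);        % n-sample sine burst at f Hz

bank = struct();
bank.ee    = tone(300);
bank.ss    = tone(4000);
bank.ff    = tone(3500);
bank.oh    = tone(450);
bank.thin  = tone(3000);
bank.tee   = tone(2500);
bank.ay    = tone(400);
bank.pause = zeros(n, 1);                   % silence

% "ECE 438" written as a sequence of phoneme names (order taken from the page).
phrase = {'ee','ss','ee','pause','ee','pause','pause','ff','oh', ...
          'pause','thin','tee','ee','pause','ay','tee'};

% Look up each name in the bank and stack the clips end to end.
clips = cellfun(@(name) bank.(name), phrase, 'UniformOutput', false);
total = vertcat(clips{:});
```

With a table like this, saying a different phrase only means writing a different list of names, provided every name has a clip in the bank.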

Experiment: Can MATLAB say ECE 438?

The answer to the above question is yes, it can.
As I stated above, the IPA gives you a bunch of recorded words like "sky", "tie", etc. in its zip file.
What I did was load these into MATLAB, cut out the part of each word containing the syllable I needed, and then concatenate these syllables to form the speech I wanted.

The following code will clarify this method:


% %                           COPYRIGHT DHRUV LAMBA                     % %

% % % % % %     BIG WORDS TO CLEAVE BITS FROM             % % % % % % % % %
% Each file holds one spoken word; only the sample data is kept here.
% Note: wavread was removed from later MATLAB releases, where audioread
% is its replacement.

bead  = wavread('Phonetic_sounds\American-English\Vowels\01-bead.wav');
sigh  = wavread('Phonetic_sounds\American-English\Consonants\12-sigh.wav');
fie   = wavread('Phonetic_sounds\American-English\Consonants\04-fie.wav');
bode  = wavread('Phonetic_sounds\American-English\Vowels\07-bode.wav');
thigh = wavread('Phonetic_sounds\American-English\Consonants\10-thigh.wav');
kite  = wavread('Phonetic_sounds\American-English\Consonants\16-kite.wav');
bade  = wavread('Phonetic_sounds\American-English\Vowels\03-bayed.wav');



% % % % % % % % % %           WORD BITS               % % % % % % % % % % %
% The index ranges below were chosen by ear to isolate one sound per word.
% Note: the variable "pause" shadows MATLAB's built-in pause function
% while it exists in the workspace.

pause = zeros(1000,1);                    % 1000 samples of silence
ee    = bead(3000:5000);    clear bead;   % middle of "bead"
ss    = sigh(1:4000);       clear sigh;   % start of "sigh"
ff    = fie(1:3500);        clear fie;    % start of "fie"
oh    = bode(4000:7000);    clear bode;   % middle of "bode"
thin  = thigh(1:4000);      clear thigh;  % start of "thigh"
tee   = kite(6000:10000);   clear kite;   % end of "kite"
ay    = bade(3000:7000);    clear bade;   % middle of "bayed"

% Order: E C | E | four | thirty | eight, with pauses between the groups.
fs    = 15000;                            % playback and output rate in Hz
total = [ee;ss;ee;pause;ee;pause;pause;ff;oh;pause;thin;tee;ee;pause;ay;tee];
sound(total,fs);
wavwrite(total,fs,'ece438.wav')

As you can see above, I take a word, say "bead", and cut out the part that contains the "ee".
I did this mostly by educated guessing. For example, in bead, the initial b is short and the final d is short too, so cutting out the middle part of the signal should give us our ee.
Of course, the cut points had to be tweaked until it sounded good.
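
The cut points on this page were found by guessing and listening. As a possible starting point for that guessing (not the method used here), one could look at short-time energy: split the signal into short frames, sum the squared samples in each frame, and keep the stretch where the energy is high. The sketch below runs on a synthetic quiet-loud-quiet signal rather than a real recording; the segment lengths, the 100-sample frame and the 10% threshold are all assumptions chosen for illustration.

```matlab
% Synthetic stand-in for a word: quiet onset, loud middle, quiet ending.
fs = 15000;
n1 = 3000; n2 = 2000; n3 = 2500;            % hypothetical segment lengths
t   = (0:n1+n2+n3-1)' / fs;
env = [0.02*ones(n1,1); ones(n2,1); 0.02*ones(n3,1)];
x   = env .* sin(2*pi*300*t);               % 300 Hz tone shaped by env

% Short-time energy: one value per non-overlapping frame.
frameLen = 100;                             % assumed frame size in samples
nFrames  = floor(numel(x) / frameLen);
frames   = reshape(x(1:nFrames*frameLen), frameLen, nFrames);
energy   = sum(frames.^2, 1);

% Keep everything between the first and last frame above the threshold.
loud     = find(energy > 0.1 * max(energy)); % assumed 10% threshold
startIdx = (loud(1) - 1) * frameLen + 1;
stopIdx  = loud(end) * frameLen;
segment  = x(startIdx:stopIdx);
```

Real speech is messier than this: a sound like "s" carries plenty of energy of its own, so the result is at best a first guess that still needs tweaking by ear.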
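If you try this yourself, you may notice that the IPA recordings sound a bit different from the result here. That's because I set the playback sampling rate to 15 kHz, which I found gave a nice, smooth voice. Playing samples back at a rate other than the one they were recorded at changes both the speed and the pitch.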
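
As a quick check on what the 15 kHz setting means, the length of the output can be worked out from the index ranges in the code above without loading any audio. The struct names len and reps are hypothetical; the ranges, the repeat counts and the sampling rate are taken from the listing.

```matlab
fs = 15000;                      % rate passed to sound() and wavwrite()

% Clip lengths implied by the index ranges (both ends are inclusive).
len.ee    = numel(3000:5000);
len.ss    = numel(1:4000);
len.ff    = numel(1:3500);
len.oh    = numel(4000:7000);
len.thin  = numel(1:4000);
len.tee   = numel(6000:10000);
len.ay    = numel(3000:7000);
len.pause = 1000;

% How many times each clip appears in the concatenated vector.
reps = struct('ee',4, 'ss',1, 'ff',1, 'oh',1, ...
              'thin',1, 'tee',2, 'ay',1, 'pause',5);

names    = fieldnames(len);
nSamples = 0;
for k = 1:numel(names)
    nSamples = nSamples + len.(names{k}) * reps.(names{k});
end
duration = nSamples / fs;        % seconds of audio at the chosen rate
fprintf('%d samples, %.2f s at %d Hz\n', nSamples, duration, fs);
```

This prints "39508 samples, 2.63 s at 15000 Hz". The same samples played at a higher rate would take proportionally less time and sound higher in pitch.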
In conclusion, I broke my test phrase, ECE 438, into ee;ss;ee;pause;ee;pause;pause;ff;oh;pause;thin;tee;ee;pause;ay;tee and concatenated these pieces to form the complete phrase.
In phonetic notation, this is $i: s i: f o r \theta r$

Resources

While the page gets worked on, here are some really cool web pages about speech and phonetics that I came across in my research:
The UPenn course slides are particularly good: http://www.ling.upenn.edu/courses/Fall_2008/ling520/week1/week1.pdf You can change the week number at the end of the hyperlink to access the other slides.
This page is good for hearing what different parts of the phonetic alphabet sound like: http://web.uvic.ca/ling/resources/ipa/charts/IPAlab/IPAlab.htm
Also, on the UPenn webpage http://www.ling.upenn.edu/courses/Fall_2008/ling520/ try the links to the labs; they have some interesting stuff.
--Dlamba 20:00, 23 October 2009 (UTC)

