Latest revision as of 06:19, 21 March 2013

Audio Signal Generating and Processing Project

Student project for ECE438

Introduction:

Listen to this piece of music.

Media:Audio_Signal_Generating_and_Processing_Project_final_verison.wav

Just so-so, right? But it was generated entirely in software, using MATLAB.

- Abstract -

This project aims to analyze the sounds of different musical instruments and to create artificial instrument sounds that can play a piece of music.

- Procedure -

A recording of a limited number of keys on a piano keyboard was used. The original sample is here:

Media:Orginal_sound_sample.wav

After the first, frustrating method, I decided to upsample and downsample the recorded keys by the appropriate factors, so that each resampled recording lands on the desired key.

In modern equal temperament, each octave is divided into 12 equally spaced intervals (half steps), and going up one octave doubles the frequency. Adjacent notes are therefore separated by a frequency ratio of

2^{ \frac{1}{12}} = 1.05946309

This raises a problem: upsampling and downsampling work only with integer factors, so one cannot resample by 1.05946309 directly.

However, consider the rational number

{ \frac{18}{17}} = 1.05882353

which is relatively close to 1.05946309.

A closer fraction is

{ \frac{107}{101}} = 1.059405941

but it improves the accuracy only modestly, while, as the resampling steps below show, its larger factors greatly increase the amount of computation. So I chose

{ \frac{18}{17}} = 1.05882353

as the approximate factor.
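The choice between the two fractions can be checked numerically. The following is a minimal Python sketch (the project itself used MATLAB; the helper name `cents_off` is made up for this illustration). It finds the closest fraction to 2^(1/12) with a small denominator and reports how far each candidate is from a true half step, measured in cents (hundredths of a half step).

```python
from fractions import Fraction
import math

SEMITONE = 2 ** (1 / 12)  # equal-tempered half-step ratio, about 1.05946309


def cents_off(ratio):
    """Deviation of `ratio` from one true half step, in cents (100 cents = one half step)."""
    return 1200 * math.log2(float(ratio)) - 100


# Closest fraction to the half-step ratio with a denominator of at most 20.
best_small = Fraction(SEMITONE).limit_denominator(20)

for candidate in (best_small, Fraction(107, 101)):
    print(candidate, float(candidate), round(cents_off(candidate), 3))
```

Under these assumptions 18/17 comes out roughly 1 cent flat per half step and 107/101 roughly 0.1 cent flat, at the cost of resampling factors about six times larger.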

Using this fraction, apply the following:

If a note a half step above the original is desired, upsample by 17 and then downsample by 18. Call this "move up".

In this case, the signal is preserved, but at a lower sampling frequency. If it is played back at the original sampling frequency, the note a half step above is heard.

If a note a half step below the original is desired, upsample by 18 and then downsample by 17. Call this "move down".

In this case, the signal is preserved, but at a higher sampling frequency. If it is played back at the original sampling frequency, the note a half step below is heard.
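The two operations can be sketched in Python as follows. This is only an illustration under stated assumptions, not the project's MATLAB code: the function names are hypothetical, the 8000 Hz rate and the 440 Hz test tone are arbitrary, and plain linear interpolation stands in for the low-pass interpolation filter that a proper resampler would use.

```python
import math


def resample_rational(x, up, down):
    """Upsample `x` by `up` (linear interpolation), then keep every `down`-th sample."""
    n_up = (len(x) - 1) * up + 1
    upsampled = []
    for i in range(n_up):
        k, r = divmod(i, up)
        if r == 0:
            upsampled.append(x[k])  # original sample
        else:
            frac = r / up  # position between x[k] and x[k + 1]
            upsampled.append((1 - frac) * x[k] + frac * x[k + 1])
    return upsampled[::down]


def move_up(x):
    """Half step up: 17/18 as many samples, so playback at the same rate is higher."""
    return resample_rational(x, 17, 18)


def move_down(x):
    """Half step down: 18/17 as many samples, so playback at the same rate is lower."""
    return resample_rational(x, 18, 17)


# One second of a 440 Hz test tone at an assumed 8000 Hz sampling rate.
fs = 8000
tone = [math.sin(2 * math.pi * 440 * n / fs) for n in range(fs)]
higher = move_up(tone)
lower = move_down(tone)
print(len(tone), len(higher), len(lower))
```

The same number of cycles is packed into fewer samples by `move_up` and into more samples by `move_down`, which is why playing the result at the original rate shifts the pitch (and also changes the duration).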

For each octave (from a lower C to the next higher C):

take the lower C and "move up" one step at a time, recursively, to get a full chromatic scale with exactly the timbre of the lower C; call it map1;

take the higher C and "move down" one step at a time, recursively, to get a full chromatic scale with exactly the timbre of the higher C; call it map2.

If we take the higher part of the scale from map1 and the lower part from map2, the timbre changes abruptly at the junction, which makes the sound very unnatural.

You can hear it here: Media:Audio_Signal_Generating_and_Processing_Project_Timber_before.wav

Instead, apply the following method:

Each note is a weighted sum of its map1 and map2 versions, with each weight proportional to how close the note is to that map's end point (the C it was generated from).

For example, the notes C#, F, and A are constructed by

{C^\#} = { \frac{11}{12}}*map_1(C^\#) + { \frac{1}{12}}*map_2(C^\#)

F = { \frac{7}{12}}*map_1(F) + { \frac{5}{12}}*map_2(F)

A = { \frac{3}{12}}*map_1(A) + { \frac{9}{12}}*map_2(A)

This takes care of the timbre difference. Minor details are still not perfect, but a better original recording might improve them; the source signal is very poorly recorded.
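The three formulas follow one pattern: for the note k half steps above the lower C, map1 gets the weight (12 - k)/12 and map2 gets k/12. A minimal Python sketch of that cross-fade is shown below. The function names are hypothetical, and truncating to the shorter of the two signals is an assumption of this sketch, not something stated in the project.

```python
def blend_weights(k):
    """Weights (w1, w2) for the note k half steps above the lower C, with 0 <= k <= 12."""
    return (12 - k) / 12, k / 12


def blend(note_from_map1, note_from_map2, k):
    """Weighted sum of the two versions of the same note, sample by sample."""
    w1, w2 = blend_weights(k)
    # Assumption: the two versions may differ in length, so use the common part.
    n = min(len(note_from_map1), len(note_from_map2))
    return [w1 * note_from_map1[i] + w2 * note_from_map2[i] for i in range(n)]


# C# is 1 half step above C, F is 5, and A is 9.
for name, k in (("C#", 1), ("F", 5), ("A", 9)):
    print(name, blend_weights(k))
```

With these weights the lower C itself (k = 0) comes purely from map1 and the higher C (k = 12) purely from map2, so the timbre changes gradually across the octave instead of jumping at one junction.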

Error analysis:

The ratio I picked is 1.05882353, versus the exact factor 1.05946309.

Error factor is

\frac{1.05946309}{1.05882353} = 1.00060403

Since this error accumulates, and I generate up to 12 notes from one real note, take

1.00060403^{12} = 1.00727253

as the maximum error factor.

Expressed in half steps, this difference is

\log_{1.05946309}(1.00727253) = 0.12544891

that is, about 1/8 of a step.
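The arithmetic above can be reproduced with a few lines of Python. This is just a check of the numbers, with variable names chosen for this sketch; the conversion to cents uses the standard definition of 100 cents per half step.

```python
import math

exact = 2 ** (1 / 12)  # true half-step ratio
approx = 18 / 17       # ratio actually used for resampling

error_per_step = exact / approx                   # error factor of one "move"
max_error = error_per_step ** 12                  # error after 12 accumulated moves
error_in_steps = math.log(max_error, exact)       # same error, in half steps
error_in_cents = 100 * error_in_steps             # 100 cents per half step

print(round(error_per_step, 8), round(max_error, 8))
print(round(error_in_steps, 4), round(error_in_cents, 2))
```

The worst case applies to a note reached by 12 consecutive moves from its source C; notes closer to the source accumulate proportionally less error.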

In modern music, small differences in pitch are measured in "cents"; each half step contains 100 cents.

In this case, the error is within 13 cents. For pure tones, the smallest pitch difference the human ear can distinguish is about 6 to 7 cents.

On string instruments, people can distinguish about 12 to 20 cents.

These figures need to be verified, and in my opinion they represent the best human performance rather than a typical listener. In an experiment with my music teacher, I could only distinguish about 1/3 of a step on a string instrument (about 35 cents), and even my music teacher could only distinguish about 1/4 of a step (about 25 cents).

On the other hand, a poorly tuned piano can easily be off by 20 cents.

So I claim that this approach is acceptable in terms of pitch accuracy.

Hence, we now have a full piano keyboard.

The data were saved as a matrix in a .mat file.

A script was written that uses a specific pattern, given as a pitch matrix and a rhythm matrix, to select the corresponding columns of the keyboard matrix.

It then combines the notes of different pitches and durations into a song, as you heard at the beginning.
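The assembly step might look roughly like the Python sketch below. The project's actual MATLAB script and matrix layout are not shown on this page, so everything here is an assumption for illustration: the keyboard is a list of note signals (one per column), `pitches` holds column indices, `durations` holds note lengths in samples, and notes shorter than requested are padded with silence.

```python
def assemble_song(keyboard, pitches, durations):
    """Concatenate keyboard notes according to parallel pitch and duration lists."""
    song = []
    for key, length in zip(pitches, durations):
        note = keyboard[key][:length]               # cut the note to its duration
        note = note + [0.0] * (length - len(note))  # pad with silence if too short
        song.extend(note)
    return song


# Toy keyboard with two very short "notes", just to show the indexing.
toy_keyboard = [[1.0, 2.0, 3.0], [4.0, 5.0]]
print(assemble_song(toy_keyboard, pitches=[0, 1, 0], durations=[2, 3, 1]))
```

Simple concatenation plays one note at a time; playing chords would additionally require summing notes that overlap in time.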

I also tried another method, generating the signal directly by inspecting a musical instrument's FFT, but it did not produce good results. Documentation can be found here:

Audio_Signal_Generating_and_Processing_Project,_Previous_method

Back to 2011 Fall ECE 438 Boutin

@@ Line 2: / Line 2: @@
 = Audio Signal Generating and Processing Project  =
+Student project for [[ECE438]]
+----
+Introduction:
+:Listen to this piece of music.
+::[[Media:Audio_Signal_Generating_and_Processing_Project_final_verison.wav‎]]
+:Just soso, right? but this is generated by computer software by MATLAB.
+----
 - '''Abstract''' -
-This project is intent to analysis different musical instrument's sound, and try to create artificial musical instrument sounds to play a piece.
+:This project is intent to analysis different musical instrument's sound, and try to create artificial musical instrument sounds to play a piece.
-First, by looking at the Fourier domain, one can and measure the amplitude of each harmonics. The intention is trying to produce similar amplitude harmonic cosine functions, and mix all the waveform together to construct a simulated instrument voice.
+----
 - '''Procedure''' -
-:Approach 1:
+: A record of limited number of keys on a piano keyboard was used. The original sample is here.
+::[[Media:Orginal_sound_sample.wav‎]]
-::First a couple of sound files are inspected. Particularly this
-::[http://www.daimi.au.dk/~jones/dsp/sounds/singlenote/Piano.ff.F3.wav single note piano sound] was used. Take the ::FFT in MATLAB, the frequency domain of the graph looks like this.
-<br>
-::[[Image:Fft piano.png|500px]]
-::Calculating fundamental frequency:
-::<math> F_{fundamental} = \frac{K|_{FFT(Data)'s 1_{st}Peak}*SampleRate}{ Number of DataPoints}</math>
-::<br> Then one can record all the amplitude of different harmonics. Also notice that the sound amplitude is decreasing as time goes by. A decreasing exponential envelope is require for the signal to sound more like a real instrumental voice. the amplitude of the waveform of the soundtrack is 4 at very beginning, decreased to 0.5 after 1s and goes to 0.5 after 3 seconds. So an envelope function <span class="texhtml">''e''<sup>( − 2.07944154 * ''t'')</sup> = ''e''<sup>( − 2.07944154 / ''s''''a''''m''''p''''l''''e''''r''''a''''t''''e'''''<b> * ''n'')</b></sup></span>
-::Next thing need to be done, is design a pattern that plays multiple notes at the same time. The first guess is just sum up all the harmonics and get the result. But actually this doesn't work. The sound of a minor three interval sounds like this.
-::Note_A:This is a cosine function with a frequency = 440Hz.
-::Namely Note_A = cos(440*2*pi*t) = cos(440*2*pi*n/Sample_Rate);
-::[[Media:Audio_Signal_Generating_and_Processing_Project_FILES_Note_A.wav]]
-::Note_C:This is a cosine function with a frequency = 523.251131Hz.
-::Namely Note_A = cos(523.251131*2*pi*t) = cos(523.251131*2*pi*n/Sample_Rate);
-::[[Media:Audio_Signal_Generating_and_Processing_Project_FILES_Note_C.wav]]
-::Distorted m3 interval: this is the sum up of the Note_A and Note_C directly.
-::Which sounded distorted.
-::[[Media:Audio_Signal_Generating_and_Processing_Project_FILES_failed_m3_interval.wav]]
-<br>
-::After some research reading online materials about mixing audios, several algorithms are tried, but a clear mix sound is still not founded. An article online mentioned that some how MATLAB doesn't allow a sound vector's amplitude to go above. As long as an coefficient less then one is multiplied to each terms, the sum of the waveform does construct a nice sound of mixed audio.
-::Here is a mixed C chord, consisting C,G,c,e1 four notes.
-::[[Media:Audio_Signal_Generating_and_Processing_Project_FILES_C_chord.wav]]
-::As a testing, I wrote an script that plays the first two lines of Parable's piece &lt;Canon&gt;.
-::[[Media:Audio_Signal_Generating_and_Processing_Project_FILES_canon_pure_frequency_with_chord.wav]]
-<br>
-<br>
-::An analysis is performed that a simulated piano voice is produced, according to the chart above. But it doesn't sound as expected. It is distorted and doesn't sound like a real piano. A better harmonic level is needed to adjust the sound.
-::Here is the file that contains all different harmonics.
-::Note that, the file contains only first harmonic, is exactly the same as the file produced by pure cosine wave.
-::As more harmonics added in, the sound become richer and more emotion, but not quite as the direction that is desired. Again, a better harmonic level is needed. [[Media:20111031_Simulated_Piano_sound.wav]]
+: After the first frustrating method, I decided to up/down sample the keys by right order, then place them in right key.
-::As inspecting the contribution to non harmonic frequencies, i.e. the part on FFT plot at low frequency but not at peak looks like noise, I found that they do have significant contribution to the sound. The following method is used: Take the fft of the data, and mask off all terms that magnitude less then some level. In this case, I choose 40. So all the frequency component that less than 40 is deleted, left pure harmonic frequencies. And this sound do sound artificial and not as detailed as the original.
+: According to modern music theory of interval, each intervals are equally spaced, each octave is equally spaced in to 12 intervals. A octave higher means twice the frequency. So, each interval is spaced by frequency ration of  <math> 2^{ \frac{1}{12}} = 1.05946309</math>
+:But here comes a problem for up/down sample, it can only up/down sample by a integer factor. One can't upsampling by 1.05946309.
+::However, inspect the rational number <math> { \frac{18}{17}} = 1.05882353</math> , that is relatively close to 1.05946309.
+::Next closer fraction is <math> { \frac{107}{101}} =  1.059405941</math>, But this fraction doesn't change too much accuracy, but as we can see below, it increase the computation steps rapidly. So I choose  <math> { \frac{18}{17}} = 1.05882353</math>  as the approximate factor.
-::So, this indicated that '''by only using integer multiples of harmonics, one can hardly simulate a very nature piano sound.'''  Hence some other approach is need.
+:Next, use this fraction, apply the following:
+:: if a note half-step above the original is desired, then upsample by 17, then down sample by 18, call this as "move up"
+:::In this case, the signal is preserved, but at a lower sampling frequency. If play at the original frequency, then the note half-step above is played.
+:: if a note half-step below the original is desired, then upsample by 18, then down sample by 17, call this as "move down"
+:::In this case, the signal is preserved, but at a higher sampling frequency. If play at the original frequency, then the note half-step below is played.
-<br>
+:For each interval(from lower C to higher C),
+:take the lower C, "move up" by step recursively, then get a map of full chromatic scale, define map1, with the exact timber of the lower C;
+:take the higher C, "move down" by step recursively, then get a map of full chromatic scale, with the exact timber of the higher C;
+:if we pick higher part of the scale as map1, lower part map2, then at the junction, the timber suddenly changed, makes the sound very unnatural.
+:: You can hear it in here [[Media:Audio_Signal_Generating_and_Processing_Project_Timber_before.wav]]
-:Approach 2:
+:Instead, apply the following method:
+::a given note is contribute by both map1 and map2, and proportional to the end point.
+::For example, the note C# is constructed by
+:: <math> {C^\#} =  { \frac{11}{12}}*map_1(C^\#)  + { \frac{1}{12}}*map_2(C^\#)</math>
+:: <math> F =  { \frac{7}{12}}*map_1(F) + { \frac{5}{12}}*map_2(F) </math>
+:: <math> A =  { \frac{3}{12}}*map_1(A) + { \frac{9}{12}}*map_2(A) </math>
+::This take cares of the timbre difference. Minor detail is still not perfect, but maybe just change the original signal can improve it. It is a very poor recorded signal.
-::The next reasonable idea is to modify the existing sound track to produce different pitches and length notes, then use these notes to construct a piece.
+----
-:: First Observation:
+:Error analysis:
-:: By just playing the wav file in different sample rate can give a different pitch and duration note.
+::The ratio I pick is 1.05882353 versus the accurate factor = 1.05946309;
-:: As the sound sample rate goes up, pitch goes up and duration decreases.
+::Error factor is <math>\frac{1.05946309}{1.05882353} = 1.00060403</math>
-:: For the region close to the original note, this is a valid approach. The note sounds nature and not distorted.
+::Since this error accumulates, and I am generating 12 notes with one real notes, take <math>1.00060403^{12} = 1.00727253</math> as the maximum error factor.
-:: But this region only covers about up and down an octave. If go beyond this region, The sound became funny.
+::This difference is <math>log_{1.05946309}(1.00727253) = 0.12544891</math>, about 1/8 of a step;
+::In modern music, pitch was divided in to the term "cents" to measure smaller difference in pitch. Each step contains 100 cents.
+::In this case, the error is within 13 cents. For pure frequency, the smallest pitch difference human ears can distinguish is about 6~7 cents.
+::In string musical instrument, human can distinguish about 12~20 cents.
+::These data need to be verify, but on my opinion, that data is the best record of all human being. I have a experiment with my music teacher, I myself can only distinguish about 1/3 of a step in string instrument(about 35 cents), and even my music teacher can only distinguish about 1/4 of a step(about 25 cents)
+::On the other hand, a not well toned piano can easily go off 20 cents.
+::So I will claim that, this approach is acceptable in pitch level.
+----
-:: The next thing I will try is to upsample or down sample the existing sound track, to produce a different pitch but same duration note.
+:Hence, we have a full piano keyboard by now.
-:: Two approach to Upsample:
+:The data was saved in a matrix into a .mat file.
-::: 1. zero padding then LPF
+:A script was wrote, that use a special pattern of pitch and rhythm matrix to call the corresponding column of the keyboard matrix.
-::: 2. Fit in a significantly large number of samples, try to simulate an analog signal, then sample it at different frequency to produce different pitch.
+:Then combines the different duration and pitch notes in to a song, as you heard at the beginning.
-:: For such a lot of notes on piano, multiple note sound files are need to simulate different notes.
-<br>
-All Matlab files can be download here. [[Media:Audio_Signal_Generating_and_Processing_Project_FILES_MATLAB_FILES.zip]]
+::There's another method I tried, which is to generate signal directly by inspecting a musical instrument's FFT, but this method doesn't turn up good result. Documentation can be found here:
-<br>
+[[Audio_Signal_Generating_and_Processing_Project%2C_Previous_method]]
 [[2011 Fall ECE 438 Boutin|Back to 2011 Fall ECE 438 Boutin]]
 [[Category:2011_Fall_ECE_438_Boutin]]
+[[Category:bonus point project]]
+[[Category:ECE438]]

Difference between revisions of "Audio Signal Generating and Processing Project" - Rhea
