<br>
<center><font size="4">Generation of N-dimensional normally distributed random numbers from two categories with different priors</font>

A [https://www.projectrhea.org/learning/slectures.php slecture] by Jonghoon Jin

Partly based on the [[2014 Spring ECE 662 Boutin|ECE662 Spring 2014 lecture]] material of [[User:Mboutin|Prof. Mireille Boutin]].
</center>
----
<font size="5">Introduction</font>
In most pattern recognition, decision, and classification problems, the concept of training has become popular with the advances in machine learning and in computing power that can afford expensive computational costs. Such trainable systems hold the records on many open problems, and their capability is built on the massive volume of information in a dataset. Because data matter so much, people try not only to obtain a large quantity of data but also to ensure its quality. Real-world data often contain either high variance or high bias, which makes it difficult to estimate the true density, and using such data does not necessarily result in good performance. In this circumstance, a simple assumption about the class distribution helps us synthesize data so that we can train models on a consistent dataset.

Among many distributions, the normal distribution is the one most frequently used in the literature. This tutorial explains how to generate binary data classes from normal distributions with prior probabilities, from the perspective of numerical experiments.
It consists of the following sections:
#Prior selection: how to set priors?
#Normal distribution: how does it work? Which method is more efficient?
#*Central limit theorem
#*Inverse transform sampling method
#*Box-Muller transform
#*Ziggurat algorithm
#Conclusion
<br> <font size="5">Prior Selection</font>
The prior probability indicates which class is more likely to be observed among all possible classes. For binary data classes with normal distributions, it is important to draw samples based on the priors before investigating how to generate the distributions. This is because, in practice, the number of samples per class carries implicit information about the prior probability, and the empirical prior, computed simply as the ratio of class 1 to class 2 samples in the binary case, generally differs from the true prior used to determine class selection.
To be specific, the prior probability affects the generation of samples for binary classification in the following way. First, draw a sample from the uniform distribution on <span class="texhtml">(0,1)</span>, and let the class priors be <math>Prob(\omega_1)</math> and <math>Prob(\omega_2)</math>, where the condition <math>Prob(\omega_1) + Prob(\omega_2) = 1</math> holds. If the random sample falls in the range <math>[0, Prob(\omega_1)]</math>, label it as class 1, then continue to generate a normal random number based on the class 1 statistics <span class="texhtml">(μ<sub>1</sub>,σ<sub>1</sub>)</span>. Otherwise, the sample falls in <math>(Prob(\omega_1), 1]</math> and should be labeled as class 2; then move on to the normal random number generation step with the class 2 statistics in the same way as for class 1. In summary,
<center><math>X_i = X_i^{1} \sim N(\mu_1, \sigma_1) \quad X_i \in [0, Prob(\omega_1)]</math><br> <math>X_i = X_i^{2} \sim N(\mu_2, \sigma_2) \quad \text{otherwise}</math></center>
where the superscript of <span class="texhtml">''X''<sub>''i''</sub></span> indicates the label of the class.
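The prior-selection step above can be sketched in Python. This is a minimal illustration, not part of the original slecture: the function name <code>sample_with_prior</code> is hypothetical, and Python's built-in <code>random.gauss</code> stands in for the normal random number generators developed in the next section.

```python
import random

def sample_with_prior(prob_w1, stats1, stats2):
    """Draw one labeled sample: choose the class by comparing a
    uniform draw against the prior Prob(w1), then sample from that
    class's normal distribution. stats1/stats2 are (mu, sigma)."""
    u = random.random()           # u ~ uniform(0, 1)
    if u <= prob_w1:              # u in [0, Prob(w1)]  -> class 1
        mu, sigma = stats1
        return 1, random.gauss(mu, sigma)
    else:                         # u in (Prob(w1), 1] -> class 2
        mu, sigma = stats2
        return 2, random.gauss(mu, sigma)

random.seed(0)
data = [sample_with_prior(0.7, (0.0, 1.0), (3.0, 0.5)) for _ in range(10000)]
frac_class1 = sum(1 for label, _ in data if label == 1) / len(data)
# The empirical prior frac_class1 should be close to the true prior 0.7,
# but will not match it exactly for any finite sample size.
```

Note how the empirical class ratio only approximates the true prior, which is exactly the distinction drawn in the paragraph above.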
<br> <font size="5">Normal Distribution Generation</font>
Once the class of the sample is decided, it is time to generate the actual random number that follows the class statistics determined in the prior selection step. The normal distribution is frequently used in many statistical models, and being able to draw samples from it lies at the heart of many probabilistic learning algorithms. The probability density function of the normal distribution is defined as
<center><math>y = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left\{-\frac{\left(x-\mu\right)^2}{2\sigma^2}\right\}</math></center>
where <span class="texhtml">μ</span> is the distribution mean and <span class="texhtml">σ</span> is the standard deviation.
It is easy to generate a uniform distribution, but generating a normal distribution usually requires additional steps rather than a single-step solution. The central limit theorem, the inverse transform sampling method, the Box-Muller transform, and the Ziggurat algorithm are all ways to generate such a distribution given a source of uniform random numbers. Since each method has its pros and cons, the optimal choice among them may differ under different conditions.
<br> <font size="4">Central Limit Theorem</font>
The central limit theorem states that when a sufficiently large number of samples is drawn from independent random variables, the arithmetic mean of the samples approaches a normal distribution, commonly known as a bell-shaped distribution. This implies that sampling from independent and identically distributed uniform distributions will result in an approximately normal distribution when the number of samples involved is large enough. This is shown in the following:
<center><math>\lim_{n \to \infty}\frac{X_1 + \dots + X_n}{n} \approx \mu</math><br> <math>\lim_{n \to \infty}\frac{X_1^2 + \dots + X_n^2}{n} \approx \sigma^2 + \mu^2</math></center>
where <math>\left\{X_1, \dots, X_n\right\}</math> is a random sample of size <span class="texhtml">''n''</span>, and <span class="texhtml">μ</span>, <span class="texhtml">σ</span> represent the mean and standard deviation of the normal distribution. However, in practice, approaching the normal distribution to high accuracy using only the central limit theorem requires an impractically large number of samples. Therefore, using this method to generate a normal distribution is expensive and not practical.
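The theorem can be checked with a short sketch, again hypothetical and not from the original slecture: the helper <code>clt_normal</code> standardizes the sum of ''n'' uniform draws, using the fact that a sum of ''n'' U(0, 1) variables has mean ''n''/2 and variance ''n''/12.

```python
import random

def clt_normal(n=12):
    """Approximate one N(0, 1) draw via the central limit theorem:
    standardize the sum of n uniform(0, 1) draws, whose exact mean
    is n/2 and exact variance is n/12 (for n = 12 the raw variance
    is already 1)."""
    s = sum(random.random() for _ in range(n))
    return (s - n / 2.0) / (n / 12.0) ** 0.5

random.seed(1)
samples = [clt_normal() for _ in range(100000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
# mean should be near 0 and var near 1, but the tails of this
# approximation are poor: a sum of 12 uniforms can never exceed 6.
```

The hard cutoff at ±6 standard deviations illustrates why this method is a poor choice when accurate tail behavior matters.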
<br> <font size="4">Inverse Transform Sampling Method</font>
The inverse transform sampling method is a fundamental sampling method for random numbers and can generate any type of probabilistic distribution given a source of uniform random numbers. It involves computing the quantile function of a distribution, which is the inverse of its cumulative distribution function. Basically, it generates a random number <span class="texhtml">''u''</span> from the uniform distribution on the interval <span class="texhtml">[0,1]</span>. Then, using the equations of the desired distribution, it computes the value <span class="texhtml">''x''</span> that satisfies <span class="texhtml">''F''(''x'') = ''u''</span>. Finally, the number <span class="texhtml">''x''</span> is taken as a random sample drawn from the density whose cumulative distribution is <span class="texhtml">''F''</span>.
Random numbers that follow a normal distribution can be obtained by the following procedure. It starts from the Poisson distribution described below.
<center><math>p(k) = P(X = k) = e^{-\lambda} \frac{\lambda^k}{k!}, \quad k \ge 0 </math></center>
Here <span class="texhtml">λ</span> is the given mean and <span class="texhtml">''k''</span> is the number of events of a Poisson process. If we can simulate <span class="texhtml">''X'' = ''N''(1)</span>, where <math>\left\{N(t) : t \ge 0 \right\}</math> is a Poisson process, the Poisson random number <span class="texhtml">''X''</span> can be obtained using the formula
<center><math> X = N(1) - 1 = \min \left\{n \ge 1 \mid t_n > 1 \right\} - 1</math></center>
where <math>t_n = X_1 + \dots + X_n</math>. The relation above comes from the exponential inter-arrival times of the Poisson process, which can be generated from uniforms as <math>X_i = - \frac{1}{\lambda} \ln{u_i}</math>. Using this relation together with the additivity of logarithms makes it easy to generate a Poisson random number by consecutively adding the <span class="texhtml">''X''<sub>''i''</sub></span> (equivalently, multiplying the uniform random numbers <span class="texhtml">''u''<sub>''i''</sub></span>). The details are
<center><math>X = \min\left\{ n \ge 1 : \sum_{i=1}^{n} \ln(u_i) < -\lambda \right\} - 1 </math><br> <math>\qquad = \min\left\{ n \ge 1 : \ln\left(\prod_{i=1}^{n} u_i\right) < -\lambda \right\} - 1 </math><br> <math>\qquad = \min\left\{ n \ge 1 : \prod_{i=1}^{n} u_i < e^{-\lambda} \right\} - 1</math></center>
where the product over new draws <span class="texhtml">''u''<sub>''i''</sub></span> from <span class="texhtml">uniform(0,1)</span> is repeated until <math>\prod_{i=1}^{n} u_i < e^{-\lambda}</math>. Note that for large <span class="texhtml">λ</span>, the Poisson distribution is approximately normal with mean <span class="texhtml">λ</span> and variance <span class="texhtml">λ</span> by the central limit theorem. As a result, a normal distribution can be generated once we obtain a Poisson distribution from uniform samples using the inverse transform sampling method.
This method computes the quantile function of the distribution, that is, it inverts the cumulative distribution function, which is why it is called the ''inverse'' method. For a discrete distribution, summation over the individual probabilities yields the desired distribution. For continuous cases, integration over the probability density function is needed in the same way as in the discrete domain, but an analytical solution is impossible to obtain for most distributions. This shortcoming makes the method computationally inefficient in the continuous domain, and an alternative such as the Box-Muller transform can be used.
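The product-of-uniforms recipe derived above can be sketched as follows. The helper name <code>poisson_via_uniform_products</code> is made up for illustration; the loop multiplies fresh uniform draws until the product drops below <math>e^{-\lambda}</math>, and for a large mean the resulting Poisson samples are approximately normal with mean and variance λ.

```python
import math
import random

def poisson_via_uniform_products(lam):
    """Generate one Poisson(lam) random number by multiplying
    uniform(0, 1) draws until the running product falls below
    e^{-lam}, exactly as in the derivation above."""
    threshold = math.exp(-lam)
    n, prod = 0, 1.0
    while prod >= threshold:
        prod *= random.random()
        n += 1
    return n - 1

random.seed(2)
lam = 30.0
samples = [poisson_via_uniform_products(lam) for _ in range(20000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
# By the CLT, for large lam the samples are approximately
# normal with mean ~ lam and variance ~ lam.
```

Each sample costs about λ uniform draws, which is why this route to a normal distribution is considered computationally inefficient compared with the methods below.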
<br> <font size="4">Box-Muller Transform</font>
The Box-Muller transform generates a normal distribution given a source of uniform random numbers. The key idea of this method is the relation between Cartesian and polar coordinates. The polar form picks two samples from the interval [-1, +1] and maps them to two independent, normally distributed samples without using sine or cosine functions. The relation between the two coordinate systems is characterized by a few equations:
<center><math>r^2 = x_1^2 + x_2^2, \qquad \tan\theta = \frac{x_2}{x_1}</math><br> <math>x_1 = r \cos\theta, \qquad x_2 = r \sin\theta</math></center>
where <math>\theta \in [0, 2\pi]</math>, and we assume <math>|r| \le 1</math> since we are going to use <span class="texhtml">''r'' ~ uniform(0,1)</span> as a basic building block. This region covers the area contained in the unit circle in polar coordinates.
Box-Muller sampling is based on the joint distribution of two independent random variables <span class="texhtml">''x''<sub>1</sub> ~ ''N''(0,1)</span>, <span class="texhtml">''x''<sub>2</sub> ~ ''N''(0,1)</span> in Cartesian coordinates. If we convert these distributions to polar coordinates, the joint distribution <span class="texhtml">''p''(''x''<sub>1</sub>,''x''<sub>2</sub>)</span> becomes
<center><math>p(x_1, x_2) = p(x_1)\,p(x_2)</math><br> <math>= \frac{1}{2\pi}\exp\left\{-\frac{x_1^2}{2}\right\}\exp\left\{-\frac{x_2^2}{2}\right\}</math><br> <math>= \frac{1}{2\pi}\exp\left\{-\frac{x_1^2 + x_2^2}{2}\right\}</math><br> <math>= \frac{1}{2\pi}\exp\left\{-\frac{r^2}{2}\right\}</math></center>
From the last line of the equation, we can see that the distribution is exponential with respect to <span class="texhtml">''r''<sup>2</sup></span>. Precisely, the distributions in polar coordinates are:
<center><math>r^2 \sim \text{exponential}\left(\frac{1}{2}\right), \qquad \theta \sim \text{uniform}\left(0, 2\pi\right)</math></center>
Since the angle <span class="texhtml">θ</span> already follows a uniform distribution, it can simply be drawn from a uniform distribution. However, additional work is needed to generate the exponential distribution. The connection between the uniform and exponential distributions can be formulated as follows:
<center><math>\text{exponential}\left(\lambda\right) = \frac{-\log\left(\text{uniform}(0, 1)\right)}{\lambda}</math></center>
Then, combining with the equation in terms of <span class="texhtml">''r''<sup>2</sup></span> and solving for <span class="texhtml">''r''</span> gives
<center><math>r \sim \sqrt{-2 \log\left(\text{uniform}(0, 1)\right)}</math></center>
Now both variables <span class="texhtml">''r''</span> and <span class="texhtml">θ</span> can be expressed in terms of uniform distributions. By reversing the procedure above, we can generate a normal distribution out of uniform distributions. This way of generating normal random numbers is known as the Box-Muller transform with the polar method, which makes the Box-Muller transform more efficient by avoiding direct evaluation of the cosine and sine functions.
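A minimal sketch of the polar (Marsaglia) form of the method, with a hypothetical function name: it draws a point uniformly inside the unit disk by rejection, then maps it to two independent N(0, 1) samples without any sine or cosine call.

```python
import math
import random

def polar_box_muller():
    """Polar form of the Box-Muller transform: reject points outside
    the unit disk; for accepted points, s = r^2 is uniform(0, 1), so
    sqrt(-2 ln s / s) * (x1, x2) gives two independent N(0, 1) draws
    with no trigonometric function calls."""
    while True:
        x1 = 2.0 * random.random() - 1.0   # uniform on [-1, 1)
        x2 = 2.0 * random.random() - 1.0
        s = x1 * x1 + x2 * x2              # s plays the role of r^2
        if 0.0 < s < 1.0:                  # accept only inside the unit circle
            factor = math.sqrt(-2.0 * math.log(s) / s)
            return x1 * factor, x2 * factor

random.seed(3)
samples = []
for _ in range(50000):
    z1, z2 = polar_box_muller()
    samples.extend([z1, z2])
mean = sum(samples) / len(samples)
var = sum((z - mean) ** 2 for z in samples) / len(samples)
```

The rejection loop accepts a fraction π/4 ≈ 0.785 of candidate points, and the ratios x1/√s, x2/√s implicitly supply cos θ and sin θ, which is exactly how the trigonometric evaluations are avoided.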
<br> <font size="4">Ziggurat Algorithm</font>
The Ziggurat method is the fastest algorithm for generating normal random numbers. It is more efficient than the Box-Muller method, but it is a more complicated algorithm. Due to its efficiency and speed, it has been implemented in various libraries such as the GNU Scientific Library and MATLAB.
This method partitions a target distribution into small blocks so that the whole distribution can be described as a union of blocks:
<center><math>Z = \bigcup_{i=0}^C B_i</math></center>
where each <span class="texhtml">''B''<sub>''i''</sub></span> is a rectangle that extends horizontally from <span class="texhtml">''x'' = 0</span> to <span class="texhtml">''x'' = ''x''<sub>''i''</sub></span> and vertically from <span class="texhtml">''f''(''x''<sub>''i''</sub>)</span> to <span class="texhtml">''f''(''x''<sub>''i'' + 1</sub>)</span>, as shown in the following equation.
<center><math>B_i = \left\{(x, y) \mid 0 < x < x_i, \; f(x_i) < y < f(x_{i+1})\right\}</math></center>
Here the <span class="texhtml">''x''<sub>''i''</sub></span> are <span class="texhtml">''x''</span>-values that increase monotonically from the center to the tail of the given normal distribution.
The algorithm first chooses a random block <span class="texhtml">''i''</span> among all possible <span class="texhtml">''B''<sub>''i''</sub></span>. Then it draws a random number <span class="texhtml">''u''<sub>0</sub></span> from uniform <span class="texhtml">(0,1)</span> and calculates <span class="texhtml">''z''</span> by multiplying the block's right edge <span class="texhtml">''x''<sub>''i''</sub></span> by <span class="texhtml">''u''<sub>0</sub></span>. If the randomly generated <span class="texhtml">''z''</span> lies inside the rectangular part of the block, <span class="texhtml">''z''</span> is returned as a normal random number. Otherwise, a similar test is carried out on the <span class="texhtml">''y''</span>-axis to check whether the sample belongs to the wedge. Since understanding the principle of this algorithm requires background knowledge, the details are out of scope for this slecture; please refer to the paper titled "The Ziggurat Method for Generating Random Variables".
The critical values such as the edges of the rectangles (<span class="texhtml">''x''<sub>''i''</sub></span> and <span class="texhtml">''f''(''x''<sub>''i''</sub>)</span>) used in the Ziggurat method are precomputed and stored in memory, which improves efficiency in terms of computational cost. Moreover, while the Box-Muller method computes an exponential (or logarithm) every time, the Ziggurat algorithm evaluates an exponential function only when samples fall in the wedge. For these reasons, the Ziggurat algorithm is the fastest generation method, but it is relatively difficult to implement, so it is best used when large quantities of random numbers are needed.
<br> <font size="5">Conversion from normalized distribution</font>
Any normal distribution with mean <span class="texhtml">μ<sub>0</sub></span> and standard deviation <span class="texhtml">σ<sub>0</sub></span> can easily be converted to another normal distribution with a custom mean <span class="texhtml">μ<sub>1</sub></span> and standard deviation <span class="texhtml">σ<sub>1</sub></span>. For convenience and to save resources, random numbers are generated from the standard normal distribution <span class="texhtml">''N''(0,1)</span> by a single unified source, and these numbers are then converted to follow a custom normal distribution simply by using the following equations.
<center><math>Z = \frac{X - \mu_0}{\sigma_0} \quad \text{where } Z \sim N(0, 1) \text{ and } X \sim N(\mu_0, \sigma_0)</math><br> <math>X = \sigma_1 Z + \mu_1 \quad \text{where } Z \sim N(0, 1) \text{ and } X \sim N(\mu_1, \sigma_1)</math></center>
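The rescaling step is a one-liner in practice. In this hedged sketch, <code>random.gauss(0.0, 1.0)</code> stands in for any of the N(0, 1) generators above, and the target statistics are arbitrary example values.

```python
import random

random.seed(4)
# Draw from a single normalized source Z ~ N(0, 1), then rescale with
# X = sigma_1 * Z + mu_1 so the samples follow N(mu_1, sigma_1).
mu_1, sigma_1 = 5.0, 2.0
z = [random.gauss(0.0, 1.0) for _ in range(50000)]
x = [sigma_1 * zi + mu_1 for zi in z]
mean = sum(x) / len(x)
std = (sum((xi - mean) ** 2 for xi in x) / len(x)) ** 0.5
# mean should be near mu_1 = 5.0 and std near sigma_1 = 2.0.
```

This is why a single N(0, 1) generator suffices for both classes: each class only needs its own (μ, σ) applied at the end.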
<br> <font size="5">Extension to N-dimensional data</font>
Until now, we have studied how to generate normally distributed random numbers in 1-dimensional space. The same method can be extended to N-dimensional space to create multi-dimensional random vectors whose underlying distribution is normal. Compared to the previous case, where the mean and standard deviation were scalar values, an N-dimensional random vector can be obtained by repeating the single random number generation process N times and concatenating the random numbers into a column vector. This is possible because all basis directions are orthogonal to each other, so generating one random component of the vector does not affect the generation of the other components. Note that prior selection is still used to choose either class vector 1 or class vector 2. Once the class is determined by a prior drawn from the uniform distribution, the random number generation process is repeated N times to fill out the random vector, where each component follows the class vector statistics:
<center><math>
M^1 = \begin{bmatrix} \mu_1^1 \\ \vdots \\ \mu_N^1 \end{bmatrix}, \quad
S^1 = \begin{bmatrix} \sigma_1^1 \\ \vdots \\ \sigma_N^1 \end{bmatrix}, \quad
M^2 = \begin{bmatrix} \mu_1^2 \\ \vdots \\ \mu_N^2 \end{bmatrix}, \quad
S^2 = \begin{bmatrix} \sigma_1^2 \\ \vdots \\ \sigma_N^2 \end{bmatrix}
</math></center>
where the superscript denotes the class label and the subscript indicates the element index. <span class="texhtml">''M''</span> and <span class="texhtml">''S''</span> denote a mean vector and a standard deviation vector, respectively.
Of course, this method is no longer applicable when two or more basis directions in the multi-dimensional space are correlated. The scope of this work covers the general case where all bases of the search space are orthogonal.
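Combining the prior selection with the component-wise generation gives the complete N-dimensional sampler. This sketch is illustrative only: the function name <code>sample_vector</code> is hypothetical, <code>random.gauss</code> again stands in for the generators discussed earlier, and the class statistics are example values.

```python
import random

def sample_vector(prob_w1, M1, S1, M2, S2):
    """Draw one N-dimensional labeled sample: choose the class by its
    prior, then fill each component independently from that class's
    per-component mean/std vectors. Valid only when the components
    (basis directions) are uncorrelated."""
    if random.random() <= prob_w1:
        return 1, [random.gauss(m, s) for m, s in zip(M1, S1)]
    return 2, [random.gauss(m, s) for m, s in zip(M2, S2)]

random.seed(5)
M1, S1 = [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]   # class 1 statistics
M2, S2 = [3.0, 3.0, 3.0], [0.5, 0.5, 0.5]   # class 2 statistics
data = [sample_vector(0.6, M1, S1, M2, S2) for _ in range(20000)]
class1 = [v for label, v in data if label == 1]
frac1 = len(class1) / len(data)
mean_dim0 = sum(v[0] for v in class1) / len(class1)
# frac1 should approximate the prior 0.6, and mean_dim0 the class 1
# mean 0.0 in the first component.
```

For correlated components one would instead need the full covariance matrix (e.g. via a Cholesky factor), which, as noted above, is outside the scope of this slecture.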
<br> <font size="5">Conclusion</font>
Now we understand how to generate N-dimensional normally distributed random numbers from two categories with different priors using samples taken from uniform distributions. The generation of normally distributed random numbers, combined with the prior selection step discussed earlier, provides statistically consistent datasets, which will improve the accuracy of trainable systems. While class selection based on the prior probabilities is relatively simple, the generation of normally distributed random numbers involves many considerations because of the trade-off between computational cost and accuracy. The four methods covered in this work have their own pros and cons, but in practice efficiency and implementation complexity are the most important concerns when deciding on the optimal method for a given task.
− | <br> | + | <br> <font size="5">References</font> |
− | <font size="5">References</font> | + | |
− | # Dong-U Lee, Wayne Luk, John Villasenor and Peter Cheung, ``A Hardware Gaussian Noise Generator for Channel Code Evaluation | + | #Dong-U Lee, Wayne Luk, John Villasenor and Peter Cheung, ``A Hardware Gaussian Noise Generator for Channel Code Evaluation |
− | # Raul Toral and Amitabha Chakrabarti, ``Generation of Gaussian distributed random numbers by using a numerical inversion method | + | #Raul Toral and Amitabha Chakrabarti, ``Generation of Gaussian distributed random numbers by using a numerical inversion method |
− | # Hassan Edrees, Brian Cheung, McCullen Sandora, David Nummey and Deian Stefan ``Hardware-Optimized Ziggurat Algorithm for High-Speed Gaussian Random Number Generators | + | #Hassan Edrees, Brian Cheung, McCullen Sandora, David Nummey and Deian Stefan ``Hardware-Optimized Ziggurat Algorithm for High-Speed Gaussian Random Number Generators |
− | # Central Limit Theorem, wikipedia, http://en.wikipedia.org/wiki/Central_limit_theorem | + | #Central Limit Theorem, wikipedia, http://en.wikipedia.org/wiki/Central_limit_theorem |
− | # Inverse Transform Sampling, wikipedia, http://en.wikipedia.org/wiki/Inverse_transform_sampling | + | #Inverse Transform Sampling, wikipedia, http://en.wikipedia.org/wiki/Inverse_transform_sampling |
− | # Box-Muller Transform, wikipedia, http://en.wikipedia.org/wiki/Box_Muller_transform | + | #Box-Muller Transform, wikipedia, http://en.wikipedia.org/wiki/Box_Muller_transform |
− | # Box-Muller Transformation, mathworld, http://mathworld.wolfram.com/Box-MullerTransformation.html | + | #Box-Muller Transformation, mathworld, http://mathworld.wolfram.com/Box-MullerTransformation.html |
− | # Emiliano A. Valdez, Lecture note, http://www.math.uconn.edu/~valdez/math3634s09/Math3634-Weeks4to5.pdf | + | #Emiliano A. Valdez, Lecture note, http://www.math.uconn.edu/~valdez/math3634s09/Math3634-Weeks4to5.pdf |
− | # Karl Sigman, Lecture note, http://www.columbia.edu/~ks20/4404-Sigman/4404-Notes-ITM.pdf | + | #Karl Sigman, Lecture note, http://www.columbia.edu/~ks20/4404-Sigman/4404-Notes-ITM.pdf |
− | # Henrik Schoioler, Lecture note, http://www.control.auc.dk/~henrik/undervisning/DES/lec03.pdf | + | #Henrik Schoioler, Lecture note, http://www.control.auc.dk/~henrik/undervisning/DES/lec03.pdf |
− | # http://theclevermachine.wordpress.com/2012/09/11/sampling-from-the-normal-distribution-using-the-box-muller-transform/ | + | #http://theclevermachine.wordpress.com/2012/09/11/sampling-from-the-normal-distribution-using-the-box-muller-transform/ |
− | + | <br> <br> | |
− | <br> | + | |
---- | ---- | ||
− | == [[Generation of N-dimensional normally distributed random numbers from two categories with different priors review|Questions and comments]] == | + | == [[Generation of N-dimensional normally distributed random numbers from two categories with different priors review|Questions and comments]] == |
If you have any questions, comments, etc. please post them on [[Generation of N-dimensional normally distributed random numbers from two categories with different priors review|this page]]. | If you have any questions, comments, etc. please post them on [[Generation of N-dimensional normally distributed random numbers from two categories with different priors review|this page]]. |
Revision as of 16:47, 30 April 2014
Generation of N-dimensional normally distributed random numbers from two categories with different priors
A slecture by Jonghoon Jin
Partly based on the ECE662 Spring 2014 lecture material of Prof. Mireille Boutin.
Introduction
In most pattern recognition, decision, and classification problems, the concept of training has become popular with the advance of machine learning and of computing power that can afford expensive computational costs. Such trainable systems hold records on many open problems, and their capability is built on the massive volume of information in a dataset. Because data matters so much, people try not only to obtain a large quantity of data but also to ensure its quality. Real-world data often contains either high variance or high bias, which makes the true density difficult to estimate, so using such data does not necessarily yield good performance. In this circumstance, a simple assumption about the class distribution lets us synthesize data and train models on a consistent dataset. Among the many candidate distributions, the normal distribution is the one most frequently used in the literature. This tutorial explains how to generate binary data classes from normal distributions with prior probabilities, from the perspective of numerical experiments.
It consists of the following sections:
- Prior selection : How to set priors?
- Normal distribution : How does it work? Which method is more efficient?
- Central limit theorem
- Inverse transform sampling method
- Box-Muller transform
- Ziggurat Algorithm
- Conclusion
Prior Selection
The prior probability indicates which class is more likely to be observed among all possible classes. For binary data classes with normal distributions, it is important to draw samples based on the priors before investigating how to generate the distributions. In practice, the number of samples per class carries implicit information about the prior probability, and the empirical prior, computed simply as the ratio of class 1 to class 2 samples in the binary case, generally differs from the true prior used for class selection.
To be specific, the prior probability affects the generation of samples for binary classification as follows. First, draw a sample u from the uniform distribution on (0,1), and let the class priors be Prob(ω1) and Prob(ω2), where Prob(ω1) + Prob(ω2) = 1. If u falls in the range [0, Prob(ω1)], label the sample as class 1, then continue by generating a normal random number based on the class 1 statistics (μ1, σ1). Otherwise, u belongs to (Prob(ω1), 1], so the sample is labeled as class 2, and a normal random number is generated with the class 2 statistics in the same way. In summary,

$ X_i = X_i^{1} \sim N(\mu_1, \sigma_1) \quad \text{if } u \le Prob(\omega_1) $

$ X_i = X_i^{2} \sim N(\mu_2, \sigma_2) \quad \text{otherwise} $
where the superscript of Xi indicates the label of the class.
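The two-step procedure above (draw u, compare it with the prior, then sample from the selected class's normal distribution) can be sketched in Python. The function name and the concrete priors and class statistics below are illustrative assumptions, not part of the original text.

```python
import random

random.seed(0)  # for reproducibility

def sample_with_prior(prob_w1, stats1, stats2):
    """Pick a class by its prior, then draw from that class's normal."""
    mu1, sigma1 = stats1
    mu2, sigma2 = stats2
    u = random.random()                  # u ~ uniform(0, 1)
    if u <= prob_w1:                     # u in [0, Prob(w1)] -> class 1
        return 1, random.gauss(mu1, sigma1)
    return 2, random.gauss(mu2, sigma2)  # otherwise -> class 2

# Hypothetical setup: Prob(w1) = 0.7, class 1 ~ N(0, 1), class 2 ~ N(5, 2)
samples = [sample_with_prior(0.7, (0.0, 1.0), (5.0, 2.0)) for _ in range(100000)]
labels = [c for c, _ in samples]
empirical_prior = labels.count(1) / len(labels)  # close to, but not exactly, 0.7
```

Note how the empirical prior only approximates the true prior of 0.7, as discussed above.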
Normal Distribution Generation
Once the class of a sample is decided, it is time to generate the actual random number that follows the class statistics determined in the prior selection step. The normal distribution is frequently used in statistical models, and being able to draw samples from it lies at the heart of many probabilistic learning algorithms. Its probability density function is defined as

$ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{ -\frac{(x - \mu)^2}{2\sigma^2} \right\} $

where μ is the distribution mean and σ is the standard deviation.
It is easy to generate a uniform distribution, but generating a normal distribution usually requires additional steps rather than a single-step solution. The central limit theorem, the inverse transform sampling method, the Box-Muller transform, and the Ziggurat algorithm are examples of ways to generate such a distribution given a source of uniform random numbers. Since each method has its pros and cons, the optimal choice among these algorithms may differ under different conditions.
Central Limit Theorem
The central limit theorem states that when a sufficiently large number of samples is drawn from independent, identically distributed random variables, the distribution of their arithmetic mean approaches a normal distribution, the well-known bell-shaped curve. This implies that averaging samples from identical and independent uniform distributions yields a quantity that is approximately normally distributed once the number of samples involved is large enough. This is shown in the following

$ \sqrt{n}\left( \frac{X_1 + \dots + X_n}{n} - \mu \right) \xrightarrow{d} N(0, \sigma^2) \quad \text{as } n \to \infty $

where $ \left\{X_1, \dots, X_n\right\} $ is a random sample of size n, and μ, σ represent its mean and standard deviation. However, in practice, approaching a normal distribution to high accuracy using only the central limit theorem requires an impractically large number of samples. Therefore, using this method to generate a normal distribution is expensive and not plausible.
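As an illustration (not part of the original text), a classic textbook shortcut sums 12 uniforms: the sum of 12 U(0,1) draws has mean 6 and variance 12 · (1/12) = 1, so subtracting 6 gives an approximate N(0,1) sample.

```python
import random

random.seed(0)

def clt_normal():
    """Approximate N(0,1): the sum of 12 U(0,1) draws has mean 6, variance 1."""
    return sum(random.random() for _ in range(12)) - 6.0

samples = [clt_normal() for _ in range(100000)]
mean = sum(samples) / len(samples)                          # close to 0
var = sum((s - mean) ** 2 for s in samples) / len(samples)  # close to 1
# caveat: values beyond +/-6 can never occur, so the tails are badly approximated
```

The bounded support illustrates why the text calls this approach crude: no matter how many samples are averaged per draw, the tails of the true normal distribution are cut off.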
Inverse Transform Sampling Method
The inverse transform sampling method is a fundamental sampling technique and can generate any probability distribution given a source of uniform random numbers. It involves computing the quantile function, which is the inverse of the cumulative distribution function of the target distribution. Basically, it generates a random number u from the uniform distribution on the interval [0,1]; then, using the equations for the desired distribution, it computes the value x that satisfies F(x) = u. The number x thereby drawn from the density with CDF F is the random sample.
Random numbers that follow a normal distribution can be obtained by the following procedure, which starts from the Poisson distribution:

$ P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!} $
λ and k are deterministic, since λ is the given mean and k indexes the points of the Poisson process. If we can simulate X = N(1), where $ \left\{N(t) : t \ge 0 \right\} $ is a Poisson process with rate λ, a Poisson random number X can be obtained by using the formula

$ X = \min\left\{ n \ge 1 : t_n > 1 \right\} - 1 $

where $ t_n = X_1 + \dots + X_n $ is the sum of the exponential interarrival times. The relation above comes from the exponential term e<sup>−λ</sup> in the Poisson distribution: each interarrival time can be generated by inverse transform as $ X_i = - \frac{1}{\lambda} \ln{u_i} $. Using this relation together with the additivity of logarithms generates a Poisson random number by consecutively adding the Xi (equivalently, multiplying uniform random numbers ui). In detail,

$ X = \min\left\{ n \ge 1 : \sum_{i=1}^{n} -\frac{1}{\lambda}\ln u_i > 1 \right\} - 1 $

$ \qquad = \min\left\{ n \ge 1 : \ln\left(\prod_{i=1}^{n} u_i\right) < -\lambda \right\} - 1 $

$ \qquad = \min\left\{ n \ge 1 : \prod_{i=1}^{n} u_i < e^{-\lambda} \right\} - 1 $

where the product with a new ui drawn from uniform(0,1) is repeated until $ \prod_{i=1}^{n} u_i < e^{-\lambda} $. Note that for large λ, the Poisson distribution is approximately normal with mean λ and variance λ by the central limit theorem. As a result, a normal distribution can be generated once we obtain Poisson samples from uniform random numbers via inverse transform sampling.
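The product form of the derivation translates directly into code. The following is a minimal sketch (the function name is an assumption); with a large mean, the resulting Poisson samples are approximately normal with matching mean and variance.

```python
import math
import random

random.seed(1)

def poisson_by_products(lam):
    """Poisson(lam) variate: multiply uniforms until the running product
    drops below e^{-lam}; the count of factors minus one is the sample."""
    threshold = math.exp(-lam)
    n, prod = 0, 1.0
    while prod >= threshold:
        prod *= random.random()
        n += 1
    return n - 1

lam = 30.0  # large enough that Poisson(lam) is roughly N(lam, lam)
samples = [poisson_by_products(lam) for _ in range(100000)]
mean = sum(samples) / len(samples)                          # close to lam
var = sum((s - mean) ** 2 for s in samples) / len(samples)  # close to lam
```

Each draw needs about λ+1 uniform numbers, which hints at the computational cost discussed next.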
This method involves computing the quantile function of the distribution, that is, inverting its cumulative distribution function, which is why it is called the ``inverse method''. For a discrete distribution, a summation over the individual probabilities drawn from the uniform source yields the desired distribution. For continuous distributions, however, the analogous integration over the probability density function rarely admits an analytical inverse. This shortcoming makes the method computationally inefficient in the continuous domain, and an alternative such as the Box-Muller transform can be used.
Box-Muller Transform
This method generates a normal distribution given a source of uniform random numbers. Its key idea is to exploit the relation between Cartesian and polar coordinates. (The polar form of the method picks two samples from the interval [-1, +1] and maps them to two independent normally distributed samples without using sine or cosine functions.) The relation between the two coordinate systems is characterized by the equations
$ x_1 = r \cos\theta, \qquad x_2 = r \sin\theta $
where $ \theta \in [0, 2\pi] $ and we assume $ |r| \le 1 $, since we are going to use r ~ uniform(0,1) as the basic building block. This region covers the area contained in the unit circle in polar coordinates.
Box-Muller sampling is based on the joint distribution of two independent random variables x1 ~ N(0,1), x2 ~ N(0,1) in Cartesian coordinates. Converting to polar coordinates, the joint distribution p(x1,x2) becomes
p(x1,x2) = p(x1)p(x2)
$ = \frac{1}{2\pi}\exp\left\{-\frac{x_1^2}{2}\right\}\exp\left\{-\frac{x_2^2}{2}\right\} $
$ = \frac{1}{2\pi}\exp\left\{-\frac{x_1^2 + x_2^2}{2}\right\} $
$ = \frac{1}{2\pi}\exp\left\{-\frac{r^2}{2}\right\} $
From the last line of the equation, we can see that the joint distribution follows an exponential distribution with respect to r². Precisely, the distributions in polar coordinates are

$ r^2 \sim \text{Exp}\left(\tfrac{1}{2}\right), \qquad \theta \sim \text{Uniform}(0, 2\pi) $

Since the angle θ already follows a uniform distribution, it can simply be drawn as θ = 2πu2 with u2 ~ uniform(0,1). However, additional work is needed to generate the exponential distribution. The connection between uniform and exponential random numbers follows from inverse transform sampling:

$ r^2 = -2 \ln u_1, \qquad u_1 \sim \text{Uniform}(0, 1) $

Then, combining this with the equation in terms of r² and solving for r gives

$ r = \sqrt{-2 \ln u_1} $

Now both r and θ are expressed in terms of uniform random numbers, and back-tracking through the above procedure generates a normal distribution out of uniform distributions:

$ x_1 = \sqrt{-2 \ln u_1} \cos(2\pi u_2), \qquad x_2 = \sqrt{-2 \ln u_1} \sin(2\pi u_2) $

A refinement of this idea, the Box-Muller transformation with the polar method, is more efficient because it avoids direct calculation of the cosine and sine functions.
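The basic (trigonometric) form of the transform can be sketched in a few lines of Python; the function name is an assumption, and `1 - random()` is used so the logarithm never sees zero.

```python
import math
import random

random.seed(2)

def box_muller():
    """Map two uniforms to two independent N(0,1) samples."""
    u1 = 1.0 - random.random()           # in (0, 1], avoids log(0)
    u2 = random.random()
    r = math.sqrt(-2.0 * math.log(u1))   # r = sqrt(-2 ln u1)
    theta = 2.0 * math.pi * u2           # theta ~ uniform(0, 2*pi)
    return r * math.cos(theta), r * math.sin(theta)

pairs = [box_muller() for _ in range(100000)]
flat = [v for pair in pairs for v in pair]
mean = sum(flat) / len(flat)                          # close to 0
var = sum((v - mean) ** 2 for v in flat) / len(flat)  # close to 1
```

The polar (Marsaglia) variant replaces the cos/sin calls with a rejection step on points inside the unit circle, trading a few discarded uniforms for cheaper arithmetic.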
Ziggurat Algorithm
The Ziggurat method is among the fastest algorithms for generating normal random numbers. It is more efficient than the Box-Muller method but is a more complicated algorithm. Owing to its efficiency and speed, it has been implemented in various libraries such as the GNU Scientific Library and MATLAB.
This method partitions the target distribution into small blocks so that the whole distribution can be described as a union of blocks,

$ f \approx \bigcup_{i} B_i $

where each Bi is a rectangle whose width extends horizontally from x = 0 to x = xi and whose height extends vertically from f(xi) to f(xi+1):

$ B_i = \left\{ (x, y) : 0 \le x \le x_i, \; f(x_{i+1}) \le y \le f(x_i) \right\} $
Here the xi are x-values that increase monotonically from the center toward the tail of the given normal distribution.
The algorithm first chooses a random block i among all the Bi. Then it draws a random number u0 from uniform(0,1) and computes z by multiplying the block edge xi by u0. If the generated z falls inside the rectangular part of a block, z is returned as a normal random number. Otherwise, a similar test is performed along the y-axis to check whether the sample lies in the wedge under the curve. Since understanding the principle of this algorithm requires background knowledge, the details are out of scope for this slecture; please refer to the paper titled ``The Ziggurat Method for Generating Random Variables''.
The critical values used in the Ziggurat method, such as the edges of the rectangles (xi and f(xi)), are stored in memory, which improves computational efficiency. Moreover, it does not need to evaluate an exponential function on every draw, whereas the Box-Muller method computes an exponential (or logarithm) every time; the Ziggurat algorithm evaluates an exponential only when a sample lands in the wedge. For these reasons, the Ziggurat algorithm is very fast but relatively difficult to implement, and it is best used when large quantities of random numbers are needed.
Conversion from normalized distribution
Any normal distribution with mean μ0 and standard deviation σ0 can easily be converted to another normal distribution with a custom mean μ1 and standard deviation σ1. For convenience and to save resources, random numbers are typically generated from the standard normal distribution N(0,1) by a single unified source and then converted to follow a custom normal distribution using the following equation.
$ X = \sigma_i Z + \mu_i \quad \text{where } Z \sim N(0, 1) \text{ and } X \sim N(\mu_i, \sigma_i) $
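The scaling identity above is a one-liner in practice; the sketch below uses the standard-normal generator from Python's `random` module as the "unified source", and the function name and example statistics are assumptions.

```python
import random

random.seed(3)

def to_custom_normal(mu, sigma, n):
    """Shift and scale standard-normal draws: X = sigma * Z + mu."""
    return [sigma * random.gauss(0.0, 1.0) + mu for _ in range(n)]

xs = to_custom_normal(10.0, 2.0, 100000)
mean = sum(xs) / len(xs)                                   # close to 10
std = (sum((x - mean) ** 2 for x in xs) / len(xs)) ** 0.5  # close to 2
```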
Extension to N-dimensional data
Until now, we have studied how to generate normally distributed random numbers in 1-dimensional space. The same method can be extended to N-dimensional space to create multi-dimensional random vectors whose underlying distribution is normal. Compared to the previous case, where the mean and standard deviation were scalar values, an N-dimensional random vector is obtained by repeating the single-number generation process N times and concatenating the results into a column vector. This is possible because all basis vectors are orthogonal to each other, so generating one component of the vector does not affect the generation of the others. Note that prior selection is still used to choose between class 1 and class 2: once the class is determined from its prior drawn from the uniform distribution, the random number generation process is repeated N times to fill out the random vector, each element following the class statistics

$ X_j^{c} \sim N(M_j^{c}, S_j^{c}), \qquad j = 1, \dots, N $

where the superscript denotes the class label and the subscript indicates the element index; M and S denote the mean matrix and the standard deviation matrix, respectively.
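Putting the pieces together, the whole pipeline (prior selection followed by N independent per-component draws) can be sketched as follows; the function name, dimensionality, and the concrete M and S matrices are illustrative assumptions.

```python
import random

random.seed(4)

def generate_vector(prob_w1, M, S):
    """Pick a class by its prior, then fill an N-dimensional vector
    component-by-component from that class's per-element statistics."""
    c = 0 if random.random() <= prob_w1 else 1   # 0 -> class 1, 1 -> class 2
    x = [random.gauss(M[c][j], S[c][j]) for j in range(len(M[c]))]
    return c, x

# Hypothetical 3-dimensional, two-class setup
M = [[0.0, 0.0, 0.0], [3.0, 3.0, 3.0]]   # per-class mean matrix
S = [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]   # per-class standard deviation matrix
data = [generate_vector(0.6, M, S) for _ in range(50000)]
frac_class1 = sum(1 for c, _ in data if c == 0) / len(data)  # close to 0.6
```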
Of course, this method is no longer applicable when two or more basis directions in the given multi-dimensional space are correlated. The scope of this work covers the case where all basis vectors in the search space are orthogonal.
Conclusion
Now we understand how to generate N-dimensional normally distributed random numbers from two categories with different priors, using samples taken from uniform distributions. The generation of normally distributed random numbers, combined with the prior selection step discussed earlier, provides statistically consistent datasets, which will improve the accuracy of trainable systems. While class selection based on the prior probabilities is relatively simple, generating normally distributed random numbers raises many issues because of the trade-off between computational cost and accuracy. The four methods covered in this work each have their own pros and cons, but in practice, efficiency and complexity are the most important concerns when deciding on an optimal method for a given task.
References
- Dong-U Lee, Wayne Luk, John Villasenor and Peter Cheung, "A Hardware Gaussian Noise Generator for Channel Code Evaluation"
- Raul Toral and Amitabha Chakrabarti, "Generation of Gaussian distributed random numbers by using a numerical inversion method"
- Hassan Edrees, Brian Cheung, McCullen Sandora, David Nummey and Deian Stefan, "Hardware-Optimized Ziggurat Algorithm for High-Speed Gaussian Random Number Generators"
- Central Limit Theorem, Wikipedia, http://en.wikipedia.org/wiki/Central_limit_theorem
- Inverse Transform Sampling, Wikipedia, http://en.wikipedia.org/wiki/Inverse_transform_sampling
- Box-Muller Transform, Wikipedia, http://en.wikipedia.org/wiki/Box_Muller_transform
- Box-Muller Transformation, MathWorld, http://mathworld.wolfram.com/Box-MullerTransformation.html
- Emiliano A. Valdez, Lecture note, http://www.math.uconn.edu/~valdez/math3634s09/Math3634-Weeks4to5.pdf
- Karl Sigman, Lecture note, http://www.columbia.edu/~ks20/4404-Sigman/4404-Notes-ITM.pdf
- Henrik Schoioler, Lecture note, http://www.control.auc.dk/~henrik/undervisning/DES/lec03.pdf
- http://theclevermachine.wordpress.com/2012/09/11/sampling-from-the-normal-distribution-using-the-box-muller-transform/
Questions and comments
If you have any questions, comments, etc. please post them on this page.