Questions and Comments for: Curse of Dimensionality
A slecture by Haonan Yu
Please leave me comment below if you have any questions, if you notice any errors or if you would like to discuss a topic further.
Questions and Comments
Review by Soonam Lee:
- This slecture mainly talks about curse of dimensionality using easy metaphor. Assume that our goal is finding the needle in N-D haystack. If the haystack lies in 1-D, simple search followed by one axis or basis is enough. However, as number of dimension increases, the number of basis increases and it produces exponentially expensive search problem. The author explains this concept with various pictures. Moreover, this slecture describes sparsity of samples that is from increasing dimension. With the Gaussian reconstruction using different number of samples, the author conveys the idea efficiently. Lastly, three ways are introduced to break this curse of dimensionality such as feature extraction method, dimensionality reduction techniques, and kernel method. Firstly, feature extraction method is the one way that selects and extracts meaningful features from given redundant information. On the other hands, dimensionality reduction technique gave us the way to project from the high dimensional space to low dimensional space. In this case, how to choose the project axes are very important tasks. PCA uses the largest variance axes as a projection axes whereas LDA decides these axes based on the best separation across class. Apart from these two methods, kernel methods maps data to much higher dimensions but the data should be explained with high dimension.
- Overall, the contents are very simple and intuitive. More specifically, the metaphor which describes the curse of dimensionality is easy to understand. The author uses several pictures and it helps even beginner to understand this concept. Also, the sparsity caused by increasing dimension is also clear because the slecture explains the concept with pictures. Lastly, dimensional reduction technique is also easily understood.
- I have some suggestions to make this slecture more abundant. First, the author uses the word overfitting which is the lack of ability to reliably estimate and generalize. However, overfitting is not coming from lack of training data. Perhaps, underfitting is the appropriate word that have bad estimation performance due to sparse samples. The definition of overfitting is such that it fits very well for training data, but it doesn't fit well for testing data. It means since it captures unnecessary details such as noise from training data, it did not fit well in testing data. Second, since PCA and LDA are not difficult idea, he can put another pictures to explain them easier. The last figures can help understanding dimensionality reduction technique roughly, but did not provide information about PCA or LDA itself. Since we have several nice slecture to explain this, I will link them as below.
Answer to the comments: First, thanks for the reviews.
It is overfitting instead of underfitting. Underfitting is because that the model is too simple to capture the trend of the data. For example, you try to fit points sampled from a curve with a straight line. Underfitting cannot be solved by adding into training samples because in the end you will still get a straight line. In this case, overfitting is the correct term because the quantity of the data usually does not match the complexity of the model (in high dimensionality). However, this overfitting will be solved by adding into more training points. About PCA, I do have a link in the original slecture. I also have a link on LDA (to wikipedia).
Back to Curse of Dimensionality