
Revision as of 23:41, 13 November 2016


Introduction to Optical Character Recognition (OCR)

1. Introduction

  • Optical character recognition (OCR) is a tool for detecting information in natural images and converting it into machine-coded text, such as words, symbols and numbers. It remains an active research area, and new algorithms are published regularly. Recognizing the characters in an image is useful in many areas: automatic license plate recognition, book and document scanning, assistive technology for blind and visually impaired users, zip-code recognition in post offices, and much more.
  • On this page, I introduce a basic and simple method for converting typed letters or numbers into machine-coded text.

2. Algorithm overview

  • OCR is a simple example of machine learning, because it involves both a training and a testing process. Both training and testing require pre-processing and feature extraction. For the training data, however, we know the true classification, so we can build a statistical model of each labeled class. By comparing the same extracted features, we can then try to assign the testing data to the corresponding class.

3. Algorithm assumptions

  • The proposed algorithm is simple and easy to learn. The purpose of this project is to welcome talents like you to get involved with the recognition world.
  • We assume the input image has a clean background. "Clean" here means that the contrast between the background and the characters is high enough for the characters to be detected. The algorithm works best, and has the highest success rate, when the background is white. Colored letters are intended to be supported. However, if the returned results look wrong, try converting the image to grayscale first; binarizing a grayscale image gives a relatively robust threshold for separating the characters from the background.
  • We assume all the characters are aligned horizontally, although they may be arranged in several lines. The technique used in the project to segment each character requires this arrangement.
  • We assume all the characters are machine printed or similar, because the training data set contains only one particular letter style. Even so, some machine-printed test images may still return undesired results.
  • We assume every character in the test data is at least 42 * 24 pixels in size. If the characters are smaller than that, you may want to use the function 'imdilate' to increase their thickness before running the program.
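
The thickening step in the last assumption can be illustrated without MATLAB. The following is a minimal sketch in plain Python, not the project's code, of binary dilation with a 3 x 3 square neighbourhood. The function name `dilate3x3`, the list-of-rows image representation and the choice of a 3 x 3 square are all assumptions made for illustration; 'imdilate' itself accepts other structuring elements.

```python
def dilate3x3(bw):
    """Binary dilation of a 0/1 image (list of rows) with a 3x3 square."""
    rows, cols = len(bw), len(bw[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # A pixel becomes 1 if any pixel in its 3x3 neighbourhood is 1.
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols and bw[rr][cc]:
                        out[r][c] = 1
    return out


# A one-pixel-wide vertical stroke (illustrative data).
stroke = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
thick = dilate3x3(stroke)
```

In this toy input the stroke grows from one pixel to three pixels wide, which is the kind of thickening that may help thin or small characters survive the later steps.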

4. Main steps of the algorithm

1. Segmentation

  • This is the pre-processing part. The preliminary step is to convert the image into a binary image with 'im2bw'. In the binary image used here, white (background) pixels are 0 and black (text) pixels are 1. Note that 'im2bw' by itself maps bright pixels to 1 and dark pixels to 0, so its output has to be inverted to obtain this convention.
  • After that, crop the image to fit the text. We find the minimum and maximum row and column indices that contain a text pixel, and use them to extract the cropped image.
  • Then extract the characters line by line. We sum each row of the image and find the first row whose sum is zero; this gives the row index at which to cut. In this way we separate the first line of text from the rest when there are several lines. Repeating the procedure separates all the lines that contain text.
  • Finally, we trim each character in each line using the function 'bwlabel'. 'L = bwlabel(BW)' returns the label matrix L, which contains labels for the 8-connected objects found in BW. The label matrix L is the same size as BW. The code is from: http://www.mathworks.com/matlabcentral/fileexchange/8031-image-segmentation---extraction.
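
The four segmentation steps above can be sketched roughly as follows. This is plain Python standing in for the MATLAB functions, not the code from the linked File Exchange entry. The helper names (`binarize`, `crop_to_text`, `split_lines`, `label_components`), the fixed threshold of 128 and the tiny test image are all assumptions for illustration. The labeling routine uses 8-connectivity like 'bwlabel', but it scans row by row, so its label numbering may differ from MATLAB's.

```python
def binarize(gray, threshold=128):
    """Text (dark) pixels -> 1, background (bright) pixels -> 0."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def crop_to_text(bw):
    """Crop to the smallest rectangle containing every text pixel."""
    rows = [r for r, row in enumerate(bw) if any(row)]
    cols = [c for c in range(len(bw[0])) if any(row[c] for row in bw)]
    if not rows:
        return []
    return [row[cols[0]:cols[-1] + 1] for row in bw[rows[0]:rows[-1] + 1]]


def split_lines(bw):
    """Split into text lines wherever a row sums to zero."""
    lines, current = [], []
    for row in bw:
        if sum(row) > 0:
            current.append(row)
        elif current:
            lines.append(current)
            current = []
    if current:
        lines.append(current)
    return lines


def label_components(bw):
    """Label 8-connected components; returns (label matrix, count)."""
    rows, cols = len(bw), len(bw[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if bw[r][c] and not labels[r][c]:
                count += 1
                labels[r][c] = count
                stack = [(r, c)]
                while stack:  # flood fill over the 8 neighbours
                    y, x = stack.pop()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < rows and 0 <= nx < cols
                                    and bw[ny][nx] and not labels[ny][nx]):
                                labels[ny][nx] = count
                                stack.append((ny, nx))
    return labels, count


W, K = 255, 0  # white background, black text (illustrative grayscale values)
gray = [
    [W, W, W, W, W, W, W],
    [W, K, W, K, K, W, W],
    [W, K, W, K, K, W, W],
    [W, W, W, W, W, W, W],
    [W, K, K, K, W, W, W],
    [W, W, W, W, W, W, W],
]
bw = crop_to_text(binarize(gray))
lines = split_lines(bw)
first_labels, first_count = label_components(lines[0])
```

For this toy image the crop removes the white border, the blank row splits the text into two lines, and the first line contains two separate components, one per "character". Each component can then be cut out by taking the bounding box of its label.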

2. Classification

  • According to Tou and Gonzalez, “The principal function of a pattern recognition system is to yield decisions concerning the class membership of the patterns with which it is confronted.” For this project, the goal is to compare the image of each trimmed character with the images in the training data. Because each character image is a matrix of numerical pixel values, we can compute the two-dimensional correlation between two images. The formula below gives the correlation coefficient r between an image A and an image B of the same size M × N, where $ \bar{A} $ and $ \bar{B} $ are the mean pixel values of A and B. A higher value means a better match.

$ r = \frac{ \sum_{m=1}^{M} \sum_{n=1}^{N} \left( A_{mn}-\bar{A} \right) \left( B_{mn}-\bar{B} \right) } {\sqrt{ \left( \sum_{m=1}^{M} \sum_{n=1}^{N} \left( A_{mn}-\bar{A} \right)^2 \right) \left( \sum_{m=1}^{M} \sum_{n=1}^{N} \left( B_{mn}-\bar{B} \right)^2 \right)}} $

  • We then find the maximum correlation value; the corresponding letter or number is the recognized symbol for each character.
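
As an illustration of the classification step, here is a minimal sketch in plain Python of the correlation coefficient and the maximum-correlation decision. It is not the project's code. The names `corr2` and `classify`, the 3 x 3 templates and the choice to return 0 for a constant image are assumptions. Both images must have the same size, so in practice a trimmed character would first have to be resized to the template dimensions.

```python
import math


def corr2(a, b):
    """2-D correlation coefficient r between equal-sized images a and b."""
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("images must have the same size")
    mean_a = sum(flat_a) / len(flat_a)
    mean_b = sum(flat_b) / len(flat_b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(flat_a, flat_b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in flat_a)
                    * sum((y - mean_b) ** 2 for y in flat_b))
    # A constant image has zero variance; treat it as "no match" (assumption).
    return num / den if den else 0.0


def classify(char_img, templates):
    """Return the template label with the highest correlation."""
    return max(templates, key=lambda label: corr2(char_img, templates[label]))


# Hypothetical 3x3 templates; real training images would be much larger.
templates = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
    "T": [[1, 1, 1], [0, 1, 0], [0, 1, 0]],
}
noisy_l = [[1, 0, 0], [1, 0, 0], [1, 1, 0]]  # an "L" missing one pixel
best = classify(noisy_l, templates)
```

With these toy templates the damaged "L" correlates at r = 0.8 with the "L" template and negatively with the other two, so `best` is "L". This shows why a modest amount of noise does not necessarily change the decision.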

5. References


