Project Report On

LICENSE PLATE RECOGNITION SYSTEM

PROJECT GROUP MEMBERS
A. Nishanth
J. Vishwesh
Nachiket Vasant Vaidya
Naveen Sukumar Tappeta
R. Anand

UNDER THE GUIDANCE OF
PROF. S. R. SATHE

DEPARTMENT OF COMPUTER SCIENCE
VISVESVARAYA NATIONAL INSTITUTE OF TECHNOLOGY, NAGPUR
2008-2009

CERTIFICATE

This is to certify that A. Nishanth, J. Vishwesh, Nachiket Vasant Vaidya, Naveen Sukumar Tappeta and R. Anand have successfully completed the project titled
LICENSE PLATE RECOGNITION SYSTEM during the academic year 2008-2009. This dissertation is submitted in partial fulfillment of the requirements for the degree of BACHELOR OF TECHNOLOGY (COMPUTER SCIENCE AND ENGINEERING), V.N.I.T., Nagpur.

DR. S. R. SATHE (PROJECT GUIDE)
DR. O. G. KAKDE (HEAD OF THE DEPARTMENT)

ACKNOWLEDGEMENT

This project would not have taken its shape, nor would it have been successfully completed, without the support and encouragement of our project guide, Prof. S. R. Sathe, who presented us with an invaluable opportunity to work on this project.
We take this opportunity to express our sincere gratitude to him; we are deeply indebted to him. We also wish to express our heartfelt gratitude to the Department of Computer Science and its Head, Dr. O. G. Kakde, for granting us full freedom in the utilization of all the facilities in the department. This project has been a very good experience for all of us and helped us to work together as a team. We are sure that the knowledge and experience gathered through it will stand us in good stead in the future.
ABSTRACT

In this thesis a Number Plate Recognition System for Indian license plates is described. The system comprises four modules: extraction of regions of interest (ROIs) containing a car, extraction of license plate candidates from these ROIs, segmentation of the characters from the best candidate, and finally Optical Character Recognition (OCR) on the segmented characters.
The results are fed to a grammar checking module, after which the license plate number is obtained. The algorithm used to generate the ROIs is a weighted histogram method; the license plate extraction uses vertical edge detection and image morphology; the character segmentation is done using a simple connected component analysis along with heuristics; and finally the OCR is implemented using the novel Hierarchical Temporal Memory (HTM) framework.
This thesis proposes a solution to three problems: localizing license plates in images with complicated backgrounds, improving the robustness of morphology and edge based approaches, which are very sensitive to noisy edges, and recognizing characters of varied size and rotation in the presence of heavy noise. The system has two components: one that runs on Matlab 7.6 and performs the image processing, and one on Python 2.5.2 that runs the Numenta Platform for Intelligent Computing (NuPIC) for the OCR.
The combination of algorithms that we have used proves very effective and has been applied successfully over a test database.

INTRODUCTION

License plate recognition is a mass surveillance method that uses optical character recognition on images to read the license plates of vehicles. There are three key components to any NPR system: license plate extraction, character segmentation, and optical character recognition. Analyses of the performance of a number of techniques used in Number Plate Recognition are discussed below.

LICENSE PLATE LOCALIZATION TECHNIQUES
Binary Image Processing Techniques Using Edge and Morphology Based Approaches These techniques are sensitive to noisy edges; however, hybrid techniques in this area, coupled with prior system information such as distance constraints from the car, boost reported system accuracies to as high as 99.6%. In this thesis, a method is proposed to boost the accuracy of these image processing techniques on images with complicated backgrounds by selecting regions of interest. Image Transformations A Gabor filter based method is used for texture analysis and license plate detection.
These methods are computationally expensive and too slow for large images. In the method that uses the Hough transform (HT), edges in the input image are detected first; then the HT is applied to detect the LP regions. The HT requires too much computation when applied to a binary image with a great number of pixels. Methods Based On Color and Templates The solutions based on color currently available do not provide a high degree of accuracy in natural scenery, since color is not stable when the lighting conditions change.
In addition, as these methods are color based, they are country specific. Methods based on templates have little effect on Indian license plates due to rampant non-standardization. The hierarchy introduced in the License Plate Recognition system developed in this thesis, viz. image, then candidates containing cars, then candidates containing license plates, enhances the edge and morphology based techniques by drastically reducing the number of false number plate candidates, and so increases the accuracy of plate localization.
OPTICAL CHARACTER RECOGNITION TECHNIQUES Pattern Matching This is suitable for single font, non-rotated, fixed size characters only. It is reported that 90% of central processing unit time was consumed by the computation of the cross-correlation measures between the various templates and the relative sub-image. Hidden Markov Models (HMMs) The disadvantage was the complex procedure of preprocessing and parameterization. It gives a result of 95.7%, and places a restriction on the effective distance of the plate recognition system.
Hybrid Approach This uses statistical and structural recognition methods and achieves robustness and high recognition performance; its success rate is 95.41%. Neural Networks Multilayered Feed Forward Neural Networks Approach The network has to be trained for many cycles to attain good performance, the numbers of hidden layers and of their respective neurons can only be determined by trial and error, and this approach cannot handle noisy input. Self-Organized Neural Networks Based On Kohonen's Feature Maps
The feature maps are implemented to handle noisy, deformed, broken, or incomplete characters acquired from license plates that were bent and/or tilted with respect to the camera. The method focuses on accuracy at the cost of increased complexity and execution time. On analyzing these earlier techniques for Optical Character Recognition (OCR), it may be concluded that many methods, such as HMMs and self-organized neural networks based on Kohonen's feature maps, though very robust, have high computational costs.
We propose a new technique for OCR on Numenta’s HTM framework. It achieves a high level of rotation and scale invariance in recognition and the hierarchical structure has an added advantage of memory efficiency while making invariant representations of characters. ORGANIZATION The thesis is organized as follows: The First chapter gives an introduction to the architecture of a general License Plate Recognition System and lists the components of the software system that has been developed for this thesis. It also lists the assumptions and the system parameters.
The second chapter provides an explanation of the image processing and segmentation techniques that have been used. The third chapter illustrates the process of extracting the regions of interest (ROI) containing the car and localizing the number plate from the ROI, and describes the segmentation techniques used to get the individual characters. The fourth chapter illustrates the concepts of Hierarchical Temporal Memory (HTM). The fifth chapter describes the HTM learning algorithms. The sixth chapter deals with the HTM based Optical Character Recognition (OCR) and grammar check modules implemented for the system.
The last chapter lists the results obtained by applying the system to various images, the conclusion and future work.

CHAPTER-I
1. SYSTEM ARCHITECTURE

A Number Plate Recognition system comprises software (image processing and character recognition) and hardware components (custom license plate capture cameras) as shown in the figure. Images from the acquisition device are processed by the software components and the results are logged, or they can be used to trigger actions such as opening a gate to a restricted area.
Figure 1.1 The system proposed in this thesis has been designed specifically for deployment at sites such as: site access control, car parks, freight and logistics companies, toll booths, airports, hotels, industrial estates, contract car parking, banks and stadiums. In this thesis we have focused on the implementation of the software component of a License Plate Recognition System, and the algorithms proposed allow the system to be used with a variety of low and high quality acquisition devices, subject to the stated constraints. 1.1 SOFTWARE ARCHITECTURE The system has two components, namely the Matlab component and the Python component. The Matlab component retrieves images from the image database. First, the ROIs containing cars are extracted by the Car Candidate Generation module. Second, the License Plate Localization and Segmentation module localizes the license plate within each ROI, segments the characters and passes them to the Python component through the Inter Process Communication module.
The Python component applies the HTM framework based OCR to the characters segmented earlier and passes the results to a Grammar Check module. The Grammar Check module returns a single result to the Matlab component via the Inter Process Communication module. The results are finally displayed and can be further processed. The software architecture of the system thus forms the following pipeline: Image Database -> Matlab Component (Car Candidate Generation; License Plate Localization and Character Segmentation) -> Inter Process Communication -> Python Component (HTM based OCR; Grammar Checking) -> Inter Process Communication -> Display Results. Figure 1.2

1.1.1 Assumptions made by the System
* The system is designed to give its best results when applied to number plates that follow the rules stated below: "On June 1, 2005, the Government of India introduced High Security Registration (HSR) number plates, which are tamper proof. All new motorized road vehicles that came onto the market after that date need to adhere to the new plates, while existing vehicles were given two years to comply.
Features incorporated include a patented chromium hologram on the number plate; laser numbering containing the alpha-numeric identification of both the testing agency and the manufacturer; and a retro-reflective film bearing the verification inscription "India" at a 45-degree inclination. The characters are embossed on the plate for better visibility. The letters "IND" are printed in a light shade of blue on the observer's left side under the hologram." Figure 1.3
* Even though the system works for multiple cars, we focus on retrieving the number plate of the most prominent car.
* The grammar used is based only on Indian license plates, but can be extended to other countries.
1.1.2 Parameters taken by the System
* Image resolution used: 1200 x 1600
* The images are acquired from a distance of 3 to 6 meters and a height of 2 meters
* The input to the system is an RGB image
* The output is a string containing the registration number

CHAPTER-II
2. IMAGE PROCESSING AND SEGMENTATION TECHNIQUES

2.1 IMAGE ENHANCEMENT Image enhancement is processing that produces an image that is better in quality than the raw image.
The following sections explain the various spatial filters that are used in the system. Spatial filtering involves convolution of an image with a mask; the filter masks are called convolution masks or kernels. The response R of an m x n mask at any point (x, y) in an image is given by R = Σ_{i=1}^{mn} w_i z_i, where the w_i are the mask coefficients, the z_i are the values of the image gray levels corresponding to those coefficients, and mn is the total number of coefficients in the mask. 2.1.1 Average Filter R = (1/121) Σ_{i=1}^{121} z_i Figure 2.1 Figure 2.2
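As a small sketch, the mask response R = Σ w_i z_i can be computed directly; a 3 x 3 average filter is used here instead of the 11 x 11 mask of the system for brevity:

```python
import numpy as np

def filter_response(window, mask):
    """Response R = sum over the neighborhood of w_i * z_i."""
    return float(np.sum(window * mask))

# a 3 x 3 average filter: every weight is 1/9
avg_mask = np.ones((3, 3)) / 9.0
window = np.array([[10, 10, 10],
                   [10, 100, 10],
                   [10, 10, 10]], dtype=float)
print(filter_response(window, avg_mask))  # approximately 20.0
```

Note how the single bright pixel (100) is pulled down toward its neighbors, which is exactly the smoothing effect described above.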
The equation above shows an 11 x 11 smoothing filter, which computes the average of the gray levels of the pixels in the 11 x 11 neighborhood defined by the mask. 2.1.2 Median Filter It is necessary to perform a high degree of noise reduction in an image before performing higher-level processing steps such as edge detection. The median filter is a non-linear digital filtering technique used to remove noise from images or other signals. It examines a sample of the input and decides whether it is representative of the signal, using a window consisting of an odd number of samples.
The values in the window are sorted into numerical order; the median value, the sample in the center of the window, is selected as the output. The oldest sample is then discarded, a new sample acquired, and the calculation repeated. Figure 2.3 For example, suppose that a 3 x 3 neighborhood has values (10, 20, 20, 20, 15, 20, 20, 25, 100). These values are sorted as (10, 15, 20, 20, 20, 20, 20, 25, 100), which gives a median of 20. Thus, the principal function of median filters is to force points with distinct gray levels to be more like their neighbors.
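The worked 3 x 3 example above can be reproduced with a minimal (unoptimized) median filter sketch:

```python
import numpy as np

def median_filter_3x3(img):
    """Replace each interior pixel with the median of its 3 x 3 neighborhood."""
    out = img.copy()
    for i in range(1, img.shape[0] - 1):
        for j in range(1, img.shape[1] - 1):
            out[i, j] = np.median(img[i - 1:i + 2, j - 1:j + 2])
    return out

# the neighborhood from the example: the noisy 100 does not survive
patch = np.array([[10, 20, 20],
                  [20, 15, 20],
                  [20, 25, 100]])
print(median_filter_3x3(patch)[1, 1])  # 20
```

Unlike the average filter, the outlier value 100 has no influence at all on the output, which is why the median is preferred before edge detection.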
In fact, isolated clusters of pixels that are light or dark with respect to their neighbors, and whose area is less than n²/2 (one half the filter area), are eliminated by an n x n median filter. In this case "eliminated" means forced to the median intensity of the neighbors. Larger clusters are affected considerably less. Figure 2.4 2.2 IMAGE SEGMENTATION In computer vision, segmentation refers to the process of partitioning a digital image into multiple segments (sets of pixels, also known as superpixels).
The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain visual characteristics. 2.2.1 IMAGE MORPHOLOGY AND DILATION Mathematical morphology (MM) is a theory and technique for the analysis and processing of geometrical structures, based on set theory, lattice theory, topology, and random functions.
MM is most commonly applied to digital images, but it can be employed as well on graphs, surface meshes, solids, and many other spatial structures. The dilation operation is fundamental to morphological processing. With A and B as sets in Z², the dilation of A by B, denoted A ⊕ B, is defined as A ⊕ B = { z | (B̂)_z ∩ A ≠ ∅ }. This equation is based on obtaining the reflection B̂ of B about its origin and shifting this reflection by z. The dilation of A by B is then the set of all displacements z such that B̂ and A overlap by at least one element. Set B is commonly referred to as the structuring element of the dilation. Figure 2.5 2.2.2 EDGE DETECTION The Sobel operator calculates the gradient of the image intensity at each point, giving the direction of the largest possible increase from light to dark and the rate of change in that direction. The result shows how "abruptly" or "smoothly" the image changes at that point, and therefore how likely it is that that part of the image represents an edge, as well as how that edge is likely to be oriented. The gradient of a two-variable function (the image intensity function) is, at each image point, a 2D vector with components given by the derivatives in the horizontal and vertical directions.
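A minimal sketch of binary dilation under this definition; for the symmetric structuring elements used here, the reflection of B coincides with B itself:

```python
import numpy as np

def dilate(A, B):
    """Binary dilation A ⊕ B: out[y, x] is set whenever the structuring
    element B, centred at (y, x), overlaps at least one foreground pixel
    of A (B is assumed symmetric, so no explicit reflection is needed)."""
    H, W = A.shape
    h, w = B.shape
    cy, cx = h // 2, w // 2
    out = np.zeros_like(A)
    for y in range(H):
        for x in range(W):
            for by in range(h):
                for bx in range(w):
                    ny, nx = y + by - cy, x + bx - cx
                    if B[by, bx] and 0 <= ny < H and 0 <= nx < W and A[ny, nx]:
                        out[y, x] = 1
    return out

# a single foreground pixel dilated by a 1 x 3 horizontal element grows sideways
A = np.zeros((3, 5), dtype=int)
A[1, 2] = 1
D = dilate(A, np.ones((1, 3), dtype=int))
print(D[1])  # [0 1 1 1 0]
```

A wide rectangular element like this is what lets dilation merge the closely spaced vertical edges of license plate characters into a single blob.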
At each image point, the gradient vector points in the direction of largest possible intensity increase, and the length of the gradient vector corresponds to the rate of change in that direction. The result of the Sobel operator at an image point in a region of constant image intensity is a zero vector, and at a point on an edge it is a vector that points across the edge, from darker to brighter values. Mathematical Representation: Mathematically, the operator uses two 3 x 3 kernels which are convolved with the original image to calculate approximations of the derivatives, one for horizontal changes and one for vertical.
If we define A as the source image, and Gx and Gy as two images which at each point contain the horizontal and vertical derivative approximations, the computations are as follows:

Gy = [ +1 +2 +1 ; 0 0 0 ; -1 -2 -1 ] * A and Gx = [ +1 0 -1 ; +2 0 -2 ; +1 0 -1 ] * A

where * here denotes the 2-dimensional convolution operation. The x-coordinate is here defined as increasing in the "right" direction, and the y-coordinate as increasing in the "down" direction. At each point in the image, the resulting gradient approximations can be combined to give the gradient magnitude, using G = sqrt(Gx² + Gy²).
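The kernel computation can be sketched at a single pixel; the elementwise product-and-sum below is correlation, which yields the same magnitude as the convolution in the text (the flipped kernels differ only in sign):

```python
import math
import numpy as np

# the two Sobel kernels as written above
Ky = np.array([[ 1,  2,  1],
               [ 0,  0,  0],
               [-1, -2, -1]], dtype=float)
Kx = np.array([[ 1, 0, -1],
               [ 2, 0, -2],
               [ 1, 0, -1]], dtype=float)

def sobel_at(A, y, x):
    """Gradient approximations and magnitude at one interior pixel."""
    win = A[y - 1:y + 2, x - 1:x + 2]
    gx = float(np.sum(Kx * win))
    gy = float(np.sum(Ky * win))
    return gx, gy, math.hypot(gx, gy)

# a vertical edge: dark on the left, bright on the right
A = np.array([[0, 0, 10, 10],
              [0, 0, 10, 10],
              [0, 0, 10, 10]], dtype=float)
gx, gy, mag = sobel_at(A, 1, 2)
print(gy, mag)  # 0.0 40.0
```

The vertical component gy is zero on a purely vertical edge, which is why thresholding |Gx| alone is enough for the vertical edge detection used in plate localization.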
Using this information, we can also calculate the gradient's direction: Θ = arctan(Gy/Gx), where, for example, Θ is 0 for a vertical edge which is darker on the left side. Figure 2.6 Figure 2.7 2.2.3 EXTRACTION OF BLOB PROPERTIES USING MOMENTS Image moments are certain particular weighted averages (moments) of the image pixels' intensities, or functions of those moments, usually chosen to have some attractive property or interpretation. For a 2-D continuous function f(x, y), the moment (sometimes called the "raw moment") of order (p + q) is defined as

M_pq = ∫∫ x^p y^q f(x, y) dx dy for p, q = 0, 1, 2, …

Adapting this to a scalar (greytone) image with pixel intensities I(x, y), the raw image moments M_ij are calculated by

M_ij = Σ_x Σ_y x^i y^j I(x, y)

Area: the area (for binary images) or the sum of grey levels (for greytone images) is M_00. Centroid: the centroid is (x̄, ȳ) = (M_10/M_00, M_01/M_00). Orientation and Length of Major and Minor Axes: Information about image orientation can be derived by first using the second order central moments to construct a covariance matrix:

μ′_20 = μ_20/μ_00 = M_20/M_00 - x̄²
μ′_02 = μ_02/μ_00 = M_02/M_00 - ȳ²
μ′_11 = μ_11/μ_00 = M_11/M_00 - x̄ȳ

The covariance matrix of the image I(x, y) is then cov[I(x, y)] = [ μ′_20 μ′_11 ; μ′_11 μ′_02 ].
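The raw moment, centroid and central moment formulas above can be sketched directly; the tiny 1 x 3 blob used here is purely illustrative:

```python
import numpy as np

def raw_moment(I, i, j):
    """M_ij = sum over pixels of x^i * y^j * I(x, y)."""
    ys, xs = np.mgrid[0:I.shape[0], 0:I.shape[1]]
    return float(np.sum((xs ** i) * (ys ** j) * I))

def central_moments(I):
    """Area, centroid and the normalized second order central moments
    that fill the covariance matrix above."""
    m00 = raw_moment(I, 0, 0)                    # area for a binary image
    xbar = raw_moment(I, 1, 0) / m00             # centroid x
    ybar = raw_moment(I, 0, 1) / m00             # centroid y
    mu20 = raw_moment(I, 2, 0) / m00 - xbar ** 2
    mu02 = raw_moment(I, 0, 2) / m00 - ybar ** 2
    mu11 = raw_moment(I, 1, 1) / m00 - xbar * ybar
    return m00, (xbar, ybar), (mu20, mu02, mu11)

# a horizontal 1 x 3 bar of ones: area 3, centroid (1, 0)
I = np.array([[1, 1, 1]], dtype=float)
area, centroid, (mu20, mu02, mu11) = central_moments(I)
print(area, centroid)  # 3.0 (1.0, 0.0)
```

For this horizontal bar μ′_20 dominates μ′_02, so the covariance matrix already indicates an elongated, horizontally oriented blob, the kind of shape a plate candidate should have.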
The eigenvectors of this matrix correspond to the major and minor axes of the image intensity, so the orientation can be extracted from the angle of the eigenvector associated with the largest eigenvalue. It can be shown that this angle Θ is given by

Θ = (1/2) arctan( 2μ′_11 / (μ′_20 - μ′_02) )

The eigenvalues of the covariance matrix can easily be shown to be

λ_i = (μ′_20 + μ′_02)/2 ± sqrt( 4μ′_11² + (μ′_20 - μ′_02)² )/2

and are proportional to the squared lengths of the eigenvector axes. The relative difference in magnitude of the eigenvalues is thus an indication of the eccentricity of the image, or how elongated it is. The eccentricity is sqrt(1 - λ_2/λ_1). Bounding Box: the extreme (min and max) x and y values in the blob are recorded and a set of bounding box co-ordinates, namely (x_min, y_min) and (x_max, y_max), is generated. The bounding box is the smallest box that completely encloses the blob.

CHAPTER-III
3. LOCALIZATION AND SEGMENTATION

3.1 INTRODUCTION To localize the license plate, three classes of techniques are made use of, viz. morphology-based techniques, edge-based techniques and histogram-based techniques. Then character segmentation is achieved using connected component analysis and by applying heuristics.
The functioning of this part of the system can be further divided into three sub-sections: the first deals with car candidate generation, the second with license plate localization and the third with character segmentation. 3.2 EXTRACTING THE CAR-CANDIDATE REGION Car candidate generation is done to focus on the car area in the given image and to remove false candidates such as name-boards and noisy edges. The following steps are used to extract the car candidates: 1. Figure 3.1 Figure 3.2 The RGB image is converted to a grayscale image using I = 0.2989 * R + 0.5870 * G + 0.1140 * B. 2. Figure 3.3 An averaging filter is applied to the grayscale image by convolving it with an 11 x 11 mask. 3. Figure 3.4 The grayscale image is subtracted from the averaged image to yield an edge image. The effect of computing (averaged image - grayscale image) is as follows: values in the grayscale image higher than the average value (in the 11 x 11 neighborhood) are eliminated while calculating the difference, as negative values are converted to 0. Values close to the average value are eliminated while thresholding to convert this image to a black and white image.
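Steps 1-3 can be sketched as follows; a 3 x 3 mean is used in place of the 11 x 11 mask for brevity, and the function names are illustrative:

```python
import numpy as np

def rgb_to_gray(rgb):
    """Step 1: the standard luminance weights (as used by Matlab's rgb2gray)."""
    return 0.2989 * rgb[..., 0] + 0.5870 * rgb[..., 1] + 0.1140 * rgb[..., 2]

def dark_edge_image(gray, k=3):
    """Steps 2-3, with a k x k mean instead of 11 x 11 for brevity:
    subtract the grayscale image from its local average, so only pixels
    significantly darker than their neighborhood keep a large value."""
    pad = k // 2
    padded = np.pad(gray, pad, mode='edge')
    avg = np.zeros_like(gray, dtype=float)
    for dy in range(k):
        for dx in range(k):
            avg += padded[dy:dy + gray.shape[0], dx:dx + gray.shape[1]]
    avg /= k * k
    return np.clip(avg - gray, 0, None)  # negative differences become 0

# a single dark pixel on a flat background survives the subtraction
gray = np.full((5, 5), 100.0)
gray[2, 2] = 10.0
diff = dark_edge_image(gray)
print(diff[2, 2])  # 80.0
```

Flat regions produce a difference of 0 everywhere, so only dark details (such as plate characters on a light background) remain before thresholding.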
Thus only pixels with values significantly lower than the average in their 11 x 11 neighborhood are retained. Low intensity parts of an edge in the grayscale image have a high value in the averaged image, and hence retain a good magnitude in the difference image. 4. Figure 3.5 Otsu's threshold is used to convert the edge image into a black and white image. 5. Figure 3.6 The black and white image is labeled, so that each connected component has a number/label associated with it. The areas of these blobs are computed. 6. Figure 3.7 A weighted vertical histogram is computed for the labeled image.
For every column in the labeled image, we extract the unique labels; the sum of the areas corresponding to these unique labels is the value of the histogram for that column. 7. Figure 3.8 Figure 3.9 The peaks having magnitude above a threshold (V) and having a distance of VThresh between them are clustered to form vertical strips containing potential car candidates. Depending on the threshold value, false car candidates are eliminated. 8. Figure 3.10 A weighted horizontal histogram is computed on the vertical strips obtained after eliminating the false candidates. 9. Figure 3.12
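The weighted vertical histogram of steps 6-7 can be sketched directly from a labeled image and a table of blob areas:

```python
import numpy as np

def weighted_vertical_histogram(labels, areas):
    """For each column, sum the blob areas of the unique labels present
    in that column (label 0 is background)."""
    hist = np.zeros(labels.shape[1])
    for col in range(labels.shape[1]):
        uniq = np.unique(labels[:, col])
        hist[col] = sum(areas[l] for l in uniq if l != 0)
    return hist

# two blobs: label 1 (area 2) in columns 0-1, label 2 (area 3) in columns 1-3
labels = np.array([[1, 1, 0, 0],
                   [0, 2, 2, 2]])
areas = {1: 2, 2: 3}
print(weighted_vertical_histogram(labels, areas))  # [2. 5. 3. 3.]
```

Columns where large blobs overlap produce peaks, which is why clustering the columns above the threshold V yields vertical strips likely to contain a car.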
Figure 3.11 The peaks having magnitude above a threshold (H) and having a distance of HThresh between them are clustered to form rectangular strips, assuming a threshold of one third of the average in a strip, to obtain the car candidates. 3.3 LICENSE PLATE LOCALIZATION Once a car candidate is obtained, we localize the number plate in the following manner: 1. The grayscale image corresponding to the potential car candidate region is extracted. 2. To remove salt and pepper noise, a median filter with a 3 x 3 mask is applied.
The median filter is useful for reducing speckle noise and salt and pepper noise, and its edge-preserving nature makes it useful where edge blurring is undesirable. 3. Vertical edge detection is performed on the image using the Sobel operator. This yields a grayscale image, which is then thresholded based on an RMS estimate of the noise. 4. The image is dilated using a rectangular structuring element of size [2 15]. Figure 3.13 5. Figure 3.14 The areas and bounding boxes of the connected components are then computed by first labeling the image and then extracting the blob properties. 6. Figure 3.15
Area and aspect ratio heuristics are applied to the connected components in the dilated image to get Candidate License Plate (CLP) regions, and the corresponding regions are extracted from the edge image. 3.4 CHARACTER SEGMENTATION The number plate candidates thus obtained are subjected to character segmentation by the following steps: connected component analysis is performed on the processed images to obtain the bounding box of each character, giving the bounding rectangle of each object and the number of object pixels in each rectangle.
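An area and aspect ratio heuristic of the kind used to pick CLP regions can be sketched as follows; the numeric thresholds here are illustrative assumptions, not the tuned values used by the system:

```python
def plausible_plate(bbox, min_aspect=2.0, max_aspect=6.0, min_area=500):
    """Area and aspect ratio heuristic for candidate license plate blobs.
    bbox = (xmin, ymin, xmax, ymax); the thresholds are illustrative."""
    w = bbox[2] - bbox[0]
    h = bbox[3] - bbox[1]
    if h == 0:
        return False
    aspect = w / h
    return min_aspect <= aspect <= max_aspect and w * h >= min_area

print(plausible_plate((0, 0, 150, 40)))  # True: wide and plate-shaped
print(plausible_plate((0, 0, 40, 150)))  # False: tall, not plate-shaped
```

Because plates are wide, low rectangles, even these two simple checks discard most of the dilated blobs before the more expensive character heuristics run.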
The following heuristics are applied to eliminate fake license plate candidates: * First, the height heuristic: the height of the bounding box of an object in the candidate license plate region must be at least 0.4 times the minor axis length of the CLP it is part of for the object to be a character. * Second, the width heuristic: the width of the bounding box of an object in the candidate license plate region must be less than 0.125 times the major axis length of the CLP it is part of for the object to be a character. * Any CLP retrieved from the dilated image should have at least four objects satisfying the above two points to qualify as a final candidate. * Lastly, the equation of a line passing through the center of the plate is calculated using the centroid and the orientation of the plate candidate. The centroids of all the connected components in the plate should be at less than a minimum perpendicular distance (MinDist) from this line. This heuristic further eliminates fake candidates. The other segmented regions are eliminated, as they do not qualify as characters and may just be noise.
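The height and width heuristics above can be sketched as a single predicate; the 0.4 and 0.125 factors are taken from the text, while the example dimensions are illustrative:

```python
def is_character(box_h, box_w, minor_axis_len, major_axis_len):
    """Height and width heuristics: a blob can be a character only if it
    is tall enough relative to the plate's minor axis (>= 0.4x) and
    narrow enough relative to its major axis (< 0.125x)."""
    return box_h >= 0.4 * minor_axis_len and box_w < 0.125 * major_axis_len

# a plate candidate with minor axis 40 px and major axis 200 px
print(is_character(20, 18, 40, 200))  # True
print(is_character(10, 18, 40, 200))  # False: too short
print(is_character(20, 30, 40, 200))  # False: too wide
```

Requiring at least four blobs to pass this predicate then mirrors the "at least four such objects" rule for accepting a CLP.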
Figure 3.16 Thus the segmented characters are obtained.

CHAPTER-IV
4. HIERARCHICAL TEMPORAL MEMORY (HTM)

4.1 WHAT IS HTM? Hierarchical Temporal Memory (HTM) is a technology that replicates the structural and algorithmic properties of the human brain behind capabilities such as visual pattern recognition, understanding spoken language, and recognizing and manipulating objects by touch. HTMs are not programmed and do not execute different algorithms for different problems. Instead, HTMs "learn" how to solve problems.
HTMs are trained by exposing them to sensory data, and the capability of an HTM is determined largely by what it has learnt. HTMs are organized as a tree-shaped hierarchy of nodes, where each node implements a common learning and memory function. HTM memory is hierarchical in both space and time, to capture and model the structure of the world. HTMs perform the following four basic functions regardless of the particular problem they are applied to: * Discover causes in the world * Infer causes of novel input * Make predictions * Direct behavior 4.1.1 Discover causes in the world Figure 4.1 The left box in Figure 4.1 represents a world the HTM is to learn about. The world consists of objects and their relationships. The objects in the world are physical, such as cars, people, and buildings. The right box in Figure 4.1 represents an HTM. It interfaces to its world through one or more senses, shown in the middle of the figure. The senses sample some attribute of the world such as light or touch, though the senses used by an HTM do not need to be the same senses humans have.
Typically the senses don’t directly detect the objects in the world. Senses typically present an array of data to the HTM, where each element in the array is a measurement of some small attribute of the world. From an HTM’s perspective, there are two essential characteristics of sensory data. First, the sensory data must measure something that is directly or indirectly impacted by the causes in the world. Second, the sensory data must change and flow continuously through time, while the causes underlying the sensory data remain relatively stable.
The temporal aspect of sensory data can come from movements or changes of the objects in the world or it can come from movement of the sensory system itself through the world. The HTM’s output is manifest as a set of probabilities for each of the learned causes. This moment-to-moment distribution of possible causes is called a “belief”. If an HTM knows about ten causes in the world, it will have ten variables representing those causes. The value of these variables – its belief – is what the HTM believes is happening in its world at that instant.
Typical HTMs will know about many causes, and, as you will see, HTMs actually learn a hierarchy of causes. 4.1.2 Infer causes of novel input Once an HTM network has been trained with the set of training data, the network is ready for inference: it gives as output a probability vector, and the cause with the highest probability is taken as the category to which the object belongs. 4.1.3 Make predictions HTMs consist of a hierarchy of memory nodes where each node learns causes and forms beliefs. Part of the learning algorithm performed by each node is to store likely sequences of patterns.
By combining memory of likely sequences with current input, each node has the ability to make predictions of what is likely to happen next. An entire HTM, being a collection of nodes, also makes predictions. Just as an HTM can infer the causes of novel input, it also can make predictions about novel events. Predicting the future of novel events is the essence of creativity and planning. Leaving the details of how this works for later, we can state now what prediction can be used for. There are several uses for prediction in an HTM, including priming, imagination and planning, and generating behavior. Priming
When an HTM predicts what is likely to happen next, the prediction can act as what is called a “prior probability”, meaning it biases the system to infer the predicted causes. For example, if an HTM were processing text or spoken language, it would automatically predict what sounds, words, and ideas are likely to occur next. This prediction helps the system understand noisy or missing data. Imagination and Planning HTMs automatically predict and anticipate what is likely to happen next. Instead of using these predictions for priming, an HTM’s predictions can be fed back into the HTM as a substitute for sensory data.
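The priming idea above can be illustrated as ordinary Bayesian weighting: the prediction acts as a prior that is multiplied into the bottom-up evidence. The character categories below are an illustrative example, not part of the HTM implementation:

```python
import numpy as np

def infer_with_prior(likelihood, prior):
    """Combine bottom-up evidence with a top-down prediction (prior),
    as in HTM priming: posterior is proportional to likelihood * prior."""
    post = np.asarray(likelihood, dtype=float) * np.asarray(prior, dtype=float)
    return post / post.sum()

# ambiguous evidence between 'O' and '0', but the grammar predicts a letter
likelihood = [0.5, 0.5]   # P(input | 'O'), P(input | '0')
prior      = [0.8, 0.2]   # prediction from context
print(infer_with_prior(likelihood, prior))
```

With evenly split evidence, the prediction alone decides the outcome, which is exactly how priming helps with noisy or missing data.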
This process is what humans do when they think. Thinking, imagining, planning the future, and silently rehearsing in our heads are all the same, achieved by following a series of predictions. HTMs can do this as well. Imagining the future can be valuable in many applications; for example, a car may be equipped with an HTM to monitor nearby traffic and drive accordingly. 4.1.4 Direct behavior An HTM that has learned the causes in its world, and how those causes behave over time, has in essence created a model of its world. Now suppose an HTM is attached to a system which physically interacts with the world.
What is important is that the system can move its sensors through its world and/or manipulate objects in its world. In such a system, the HTM can learn to generate complex goal-oriented behavior. As the HTM discovers the causes in its world, it learns to represent its built-in behaviors, just as it learns to represent the behaviors of objects in the outside world. From the HTM's perspective, the system it is connected to is just another object in the world. Through an associative memory mechanism, the HTM-based representations of the built-in behaviors are paired with the mechanisms creating the built-in behaviors themselves. 4.2 SIGNIFICANCE OF THE CONCEPT OF HIERARCHY The following reasons explain the motivation for a hierarchical structure: Shared representations lead to generalization and storage efficiency Many methods that have been proposed for pattern recognition are unable to scale to large problems. Often these methods fail because the amount of memory required, and the amount of time required to train, grows exponentially as the problem space gets large, making it impractical to build large systems. HTMs can require a lot of training and large amounts of memory, but they do not suffer exponential problems of scale.
The hierarchy in HTMs is the key to why they can scale. Causes in lower levels of the hierarchy are shared among higher-level causes, significantly reducing the amount of time and memory required to learn new causes, and providing the HTM a means to generalize previously learned behaviors to new and novel causes. The hierarchy of HTM matches the spatial and temporal hierarchy of the real world One of the reasons that HTMs are efficient in discovering causes and performing inference is that the structure of the world is hierarchical.
HTMs exploit this structure by first looking for nearby correlations in sensory data. As you ascend the hierarchy, the HTM continues this process, but now it is looking for correlations of nearby causes from the first level, then correlations of nearby causes from the second level, etc. HTMs do not just exploit the hierarchical spatial structure of the world. They take advantage of the hierarchical temporal structure of the world as well. Nodes at the bottom of an HTM find temporal correlations among patterns that occur relatively close together in both space and time: “pattern B immediately follows pattern A”.
Because each node converts a sequence of spatial patterns into a constant value, the next level in the hierarchy looks for sequences of sequences. The world is hierarchical in a temporal sense, not just spatially. The design of an HTM’s hierarchy should reflect the likely correlations in its world. Belief propagation ensures all nodes quickly reach the best mutually compatible beliefs A connected graph where each node in the graph represents a belief or set of beliefs is commonly referred to as a Bayesian network. Thus, HTMs are similar to Bayesian networks.
In a Bayesian network, beliefs at one node can modify the beliefs at another node if the two nodes are connected via a conditional probability table (CPT). A CPT is a matrix of numbers in which the columns correspond to the individual beliefs of one node and the rows correspond to the individual beliefs of the other node. Multiplying a vector representing the belief in a source node by the CPT results in a vector in the dimension and “language” of the beliefs in the destination node. Belief Propagation (BP) is a mathematical technique that is used with Bayesian networks.
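The CPT multiplication described above can be sketched as follows. This is a minimal illustration with made-up numbers, not code from the project; the belief values and node sizes are hypothetical.

```python
import numpy as np

# Beliefs over 3 causes in a source node (a probability distribution).
source_belief = np.array([0.7, 0.2, 0.1])

# CPT mapping the source node's 3 beliefs to a destination node's
# 2 beliefs: columns index source beliefs, rows index destination beliefs.
cpt = np.array([
    [0.9, 0.3, 0.5],   # P(dest belief 0 | each source belief)
    [0.1, 0.7, 0.5],   # P(dest belief 1 | each source belief)
])

# Multiplying the source belief vector by the CPT yields a vector in
# the dimension and "language" of the destination node's beliefs.
dest_belief = cpt @ source_belief
dest_belief /= dest_belief.sum()   # renormalize to a distribution
print(dest_belief)
```

In a full network, messages like `dest_belief` would be passed up and down between connected nodes until all beliefs become mutually consistent.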
If the network of nodes follows certain rules, such as not containing any loops, BP can be used to force the entire network to quickly settle on a set of beliefs that are mutually consistent. The sensory data imposes a set of beliefs at the lowest level in an HTM hierarchy, and by the time the beliefs propagate to the highest level, each node in the system represents a belief that is mutually consistent with all the other nodes. The highest level nodes show what highest level causes are most consistent with the inputs at the lowest levels. There are several advantages to doing inference this way.
One is that ambiguity gets resolved as beliefs ascend the hierarchy. Another advantage of hierarchical BP is that it is possible to make large systems that settle rapidly. Whenever the state of the network changes, whether due to sensory changes or internal prediction, the network quickly settles on the set of beliefs that are most mutually consistent. In human terms, what occupies our thoughts is sometimes driven by our senses and sometimes by our internal predictions.
Hierarchical representation affords a mechanism for attention
The hierarchy in an HTM provides a mechanism for covert attention. “Covert” attention is when you mentally attend to a limited portion of your sensory input. Humans can attend to a part of a visual scene (Figure 4.2): we can limit our perceptual experience to a variable-size area in the center of our visual field, and we can even attend to objects that are off the center of our visual field. We can similarly attend to tactile input from one hand, the other hand, or our tongue. Figure 4.2 illustrates the basic mechanism by which HTMs implement covert attention. Each node in the hierarchy sends beliefs to other nodes higher in the hierarchy. These connections are illustrated by small arrows.
By providing a means to switch these pathways on and off, we can achieve the effect of limiting what the HTM perceives. The figure does not show the switching mechanism but highlights the active connections and nodes. The belief at the top of the hierarchy will represent the causes in a limited part of the input space.
CHAPTER-V
5. HTM ALGORITHMS
5.1 NETWORK STRUCTURE
The HTM network is a hierarchical network in which each level contains a certain number of nodes. The bottom level contains sensor nodes and the top level contains a category node.
The sensor nodes take input patterns from the outside world; the middle-level nodes take their input from the nodes in the layer below and produce outputs. The top-level node is used to recognize objects. Figure 5.1 shows an HTM network organized in a 3-level hierarchy that interfaces with the outside world (the input image). The input image has a size of 32 × 32 pixels. The nodes at the lowest level (Level 1) of the network receive bit patterns from the input image. Each node receives its input from a 4×4 pixel part of the input image.
The nodes at the lowest level are arranged in an 8×8 grid so that the whole 32×32 image is covered by these nodes without overlap. A node at a higher level receives its input from several nodes at the lower level: Level 2 nodes receive their input from the outputs of 4 Level 1 nodes. The top-level node recognizes which category the object belongs to.
5.2 STRUCTURE OF A NODE
The input to a node is a temporal sequence of patterns. The goal of the node is to form pools or groups of these patterns such that patterns belonging to the same group are likely to be variations of the same thing in the visual world.
One source of variation in the visual world is the relative motion between objects and the viewer. Another source of variation is random noise. The node forms invariant representations of the patterns by grouping together visual patterns that correspond to variations of the same original pattern, and it then produces outputs by determining which group the current input belongs to. The node contains two elements to form the invariant representations of the input patterns:
* Spatial Pooler
* Temporal Pooler
Figure 5.2
The input to the node is connected to the input of the spatial pooler.
The output of the spatial pooler is the input to the temporal pooler, and the output of the temporal pooler is the output of the node.
5.3 SPATIAL POOLER
Figure 5.3
The spatial pooler pools patterns according to their pixel-wise similarity, which is useful for removing noise from a pattern. It learns a mapping from a potentially infinite number of input patterns to a finite number of quantization centers, and its output is expressed in terms of these quantization centers. In Figure 5.3, the set of patterns c1, c2, c3, c4, c5 is presented to the spatial pooler. The spatial pooler has two stages of operation:
1. During the learning stage it quantizes the input patterns and memorizes the quantization centers.
2. Once these quantization centers are learned, it produces outputs in terms of these quantization centers. This is the inference stage.
5.3.1 Spatial Pooler Learning
The spatial pooler algorithm takes a maximum-distance parameter D, which corresponds to the minimum separation at which a pattern is considered different from the existing quantization centers. For every input pattern, check whether there is a quantization center within Euclidean distance D of this pattern. If yes, do nothing.
If no, add the new pattern to the list of quantization centers. Using this algorithm, the spatial pooler stores a small subset of its input patterns as quantization centers. At the beginning of learning, the node rapidly adds new quantization centers. As more quantization centers are added, the amount of space that is not mapped to an existing quantization center decreases. Therefore, over time, the number of new quantization centers that a spatial pooler forms within a unit of time decreases. The quantization process can be stopped when the number of new quantization centers per time period falls below some threshold value.
5.3.2 Spatial Pooler Inference
After the spatial pooler has completed learning, it can infer test patterns. For every input pattern it calculates the Euclidean distance D(i) to each quantization center i in the spatial pooler, and from it the probability P(i) as
P(i) = exp(−D(i)² / σ²)
where the parameter σ describes the noise level in the system. The input pattern is then recognized as the quantization center with the highest probability value.
5.4 TEMPORAL POOLER
The temporal pooler pools together patterns using their temporal proximity.
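Stepping back to the spatial pooler for a moment, its learning and inference rules (sections 5.3.1 and 5.3.2) can be sketched in code. This is a minimal illustration under our own naming and toy 2-D patterns, not the project's actual implementation:

```python
import math

class SpatialPooler:
    """Minimal sketch: quantization-center learning and inference."""

    def __init__(self, max_distance, sigma):
        self.D = max_distance      # min separation between centers
        self.sigma = sigma         # assumed noise level of the system
        self.centers = []          # learned quantization centers

    @staticmethod
    def _dist(a, b):
        # Euclidean distance between two patterns.
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def learn(self, pattern):
        # Add the pattern as a new center only if no existing center
        # lies within Euclidean distance D of it.
        if all(self._dist(pattern, c) > self.D for c in self.centers):
            self.centers.append(list(pattern))

    def infer(self, pattern):
        # P(i) = exp(-D(i)^2 / sigma^2) for each center i.
        return [math.exp(-self._dist(pattern, c) ** 2 / self.sigma ** 2)
                for c in self.centers]

sp = SpatialPooler(max_distance=0.5, sigma=1.0)
for p in ([0, 0], [0.1, 0], [1, 1], [0.9, 1.1]):
    sp.learn(p)        # only [0, 0] and [1, 1] become centers
probs = sp.infer([0.05, 0.0])
best = probs.index(max(probs))   # recognized as center 0
```

The noisy variants `[0.1, 0]` and `[0.9, 1.1]` fall within distance D of an existing center, so they are absorbed rather than stored, which is exactly the noise-removal behavior described above.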
If pattern A is frequently followed by pattern B, this pooler places them in the same group. It can thus pool together patterns that are very dissimilar from a pixel-similarity perspective. Figure 5.4 It takes its input from the output of the spatial pooler (the quantization centers) and produces groups as its output. In Figure 5.4, the temporal pooler takes as input the set of quantization centers c1, c2, c3, c4, c5 and outputs the groups g1(c1, c3, c4) and g2(c2, c5), since the patterns c1, c3, c4 are variations of the same moving object.
5.4.1 Temporal Pooler Learning
The temporal pooler takes the quantization centers as input and constructs a time-adjacency matrix T from them. A row of the matrix corresponds to the input pattern active at time t−1 and a column to the input pattern active at time t, so the value T(i,j) records the number of times that pattern i was followed by pattern j.
Steps for constructing the adjacency matrix: Initially, all elements of T are set to zero. Let x(t−1) and x(t) be the inputs to the node (and spatial pooler) at times t−1 and t, and let y(t−1) and y(t) be the corresponding outputs of the spatial pooler. The temporal pooler calculates c(t) = arg max y(t) and c(t−1) = arg max y(t−1). At time t, the node updates its time-adjacency matrix by incrementing T(c(t−1), c(t)).
After the time-adjacency matrix has been formed, it must be partitioned into groups; this partitioning needs to be done only once, after learning is complete. The goal is to partition the set of quantization centers into temporally coherent subgroups such that all the quantization centers within a group are highly likely to follow each other, while quantization centers in different groups are unlikely to follow each other.
5.4.2 Grouping Algorithm
The greedy grouping algorithm repeats the following process until every quantization point belongs to a group:
1. Find the most-connected quantization point that is not yet part of a group. This is simply the point whose corresponding row in the time-adjacency matrix has the greatest sum.
2. Pick the Ntop points most strongly connected to this point, where Ntop is a parameter (topNeighbors). The pooler adds these points to the current group; only points that are not already part of a group are considered.
3. Repeat step 2 for each of the newly added points. With well-behaved groups, this process terminates automatically, because all of point X’s Ntop closest neighbors are likely to have point X among their own Ntop closest neighbors.
4. If the group does not close off automatically by the time it reaches a certain size (maxGroupSize), the process is terminated and no new points are added.
5. Add the resulting set of quantization points as a new group, and return to step 1 until all quantization points have been grouped.
The following example illustrates this with the time-adjacency matrix visualized as a graph, where thicker links represent higher values in the matrix. The role of the pooler is to segment this graph so that quantization points belonging to the same cause are grouped together, assuming that points with stronger links are more likely to represent the same cause than those with weaker links or no links. Here topNeighbors (Ntop) is set to two. Figure 5.5
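Before walking through the example, the greedy grouping procedure above can be sketched in code. This is a simplified sketch using a small symmetric toy matrix of our own, not the graph of Figure 5.5; names such as `greedy_group` are hypothetical:

```python
import numpy as np

def greedy_group(T, n_top=2, max_group_size=10):
    """Partition quantization points into temporal groups: seed each
    group with the most-connected ungrouped point, then repeatedly
    pull in each member's n_top strongest neighbors."""
    n = T.shape[0]
    ungrouped = set(range(n))
    groups = []
    while ungrouped:
        # Step 1: seed with the ungrouped point of greatest row sum.
        seed = max(ungrouped, key=lambda i: T[i].sum())
        group, frontier = [seed], [seed]
        ungrouped.discard(seed)
        # Steps 2-4: expand via each frontier point's top neighbors.
        while frontier and len(group) < max_group_size:
            point = frontier.pop(0)
            neighbors = np.argsort(T[point])[::-1][:n_top]
            for nb in neighbors:
                if nb in ungrouped and T[point, nb] > 0:
                    ungrouped.discard(nb)
                    group.append(int(nb))
                    frontier.append(int(nb))
        groups.append(sorted(group))   # step 5: emit the group
    return groups

# Toy matrix: points {0,1,2} frequently follow one another, as do {3,4}.
T = np.array([
    [0, 5, 4, 0, 0],
    [5, 0, 3, 0, 0],
    [4, 3, 0, 0, 0],
    [0, 0, 0, 0, 6],
    [0, 0, 0, 6, 0],
])
groups = greedy_group(T)
print(groups)
```

On this toy matrix the procedure yields two groups, {0, 1, 2} and {3, 4}, mirroring on a small scale what the example below does on the graph of Figure 5.5.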
First iteration of the temporal grouping algorithm: a group is formed using the most-connected quantization point, 6, as a seed. The pooler finds the two quantization points most strongly connected to point 6, points 13 and 7, and adds them to the group. The pooler then examines the top neighbors of point 13, which happen to be points 6 and 7, but they have already been pulled into the group. It does the same for point 7, finding quantization centers 6 and 13. Since no unexamined neighbors are left, the pooler stops and group 0 is complete, with three points.
These points are removed from the graph. Carrying out the second and third iterations of the grouping algorithm similarly gives the following groups:
Group – 1: 6 13 7
Group – 2: 10 9 12 4 0 11 5
Group – 3: 2 3 8 1 14
5.4.3 Temporal Pooler Inference
Once the time-adjacency matrix has been used to partition the set of quantization centers into temporal groups, the temporal pooler can start producing outputs. Let the number of temporal groups be Ng. The output of the temporal pooler is then a vector of size Ng.
In general, the output can be treated as a probability distribution over the space of temporal groups when the input pattern does not exactly match any quantization center in the spatial pooler. In the simplest scheme, the spatial pooler finds the index of the currently active quantization center by taking the argmax of its current input; the temporal pooler then finds the index of the temporal group to which this quantization center belongs. The output of the temporal pooler is all zeros except for a one at the position corresponding to the index of that temporal group.
5.5 STRUCTURE OF A FULLY LEARNED NODE
Once the spatial pooler and the temporal pooler within a node have finished their learning processes, the node can be considered fully learned. A fully learned node stores the following information:
* A set of quantization centers numbered 1 to Nc in its spatial pooler.
* A time-adjacency matrix of size Nc × Nc storing the first-order temporal relations between the quantization centers. This matrix is stored in the temporal pooler.
* A set of temporal groups numbered 1 to Ng. Each temporal group can be thought of as a subset of the set of spatial patterns.
These temporal groups are stored in the temporal pooler. The following example illustrates how object recognition is done using a fully learned node. Figure 5.6 shows a fully-learned node in inference mode for 3 time steps. The fully learned node has 12 quantization centers within its spatial pooler and 4 temporal groups within its temporal pooler. The quantization centers are shown as 4×4 pixel patches, and the temporal groups are shown in terms of their quantization centers. The input to the node is a 4×4 pixel patch.
The output of the spatial pooler is shown as a vector of length 12, with a filled square indicating a 1 and an empty square indicating a 0. The output of the temporal pooler is shown as a vector of length 4.
5.6 STEPS OF LEARNING AND INFERENCE
We train all the nodes in the HTM network using a level-by-level strategy. First, only the nodes at the bottom level of the hierarchy are trained. During this time, the nodes at the higher levels are disabled; they do nothing. While the nodes at the bottom level are in their learning phase, they do not send any outputs to the higher levels.
Once all the nodes at Level 1 have finished their learning process, the nodes at Level 2 start learning: the nodes at the bottom level are switched from learning mode to inference mode, and the nodes at the second level are enabled and put into learning mode. This process is then repeated up the hierarchy until all nodes in the network are fully trained. Finally, after learning is done, the network can start producing outputs for the test images.
CHAPTER-VI
6. OPTICAL CHARACTER RECOGNITION USING HTM
Steps of Development:
6.1 HTM NETWORK CREATION
A 3-level HTM network is created. The 32 × 32 pixel input to the network is taken via an Image Sensor. Level 1 contains 8 × 8 nodes, and the input to each node is a 4 × 4 pixel patch from the Image Sensor. Level 2 contains 4 × 4 nodes, and each node gets its input from a 2 × 2 block of Level 1 nodes. Level 3 of the network contains a single node; this is the category node, which is used to recognize an object and assign it to a category.
Creating the Set of Training Cases
Figure 6.1
The training cases comprise sets of images with different deformations: rotations, location variations, and font-size variations.
Train the HTM network with the training images.
Recognize the objects with the test cases.
6.2 GRAMMAR CHECK MODULE
In order to nullify the effect of ambiguity in recognizing similar characters with HTM, such as the pairs (O, 0), (I, 1), (Z, 2), we have introduced a grammar check, described as follows. The input is a vector of probabilities over characters for each character image.
Grammar Format: AA 11 BB 2222
Here AA is the two-letter state code, 11 is the two-digit district code, 2222 is the unique license plate number, and BB are optional letters used once the 9999 numbers are exhausted. For each position, we consider the highest-probability character; if that character is not allowed to occur in that position, we take the character with the next highest probability.
6.3 RESULTS
We have carried out various test cases to recognize the characters. The following results represent some of the cases. Figure 6.2
CHAPTER-VII
7. CONCLUSION
By the previously discussed methods, we have in this project successfully built a system that outperforms conventional methods for license plate recognition. Our system achieves location, size, and rotation invariance and tolerates noise to a high extent, while simultaneously achieving high accuracy. The grammar check module plays a major role by removing ambiguity between similar characters. The system that we have developed is very useful in the field of Intelligent Traffic Systems (ITS).
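As a concrete illustration of the grammar check just mentioned, the per-position filtering of section 6.2 can be sketched as follows. This is a simplified sketch rather than the project's code; it assumes a fixed 10-character AA 11 BB 2222 layout, and the names used are our own:

```python
# Positional grammar for the AA 11 BB 2222 plate format:
# positions 0-1 letters, 2-3 digits, 4-5 letters, 6-9 digits.
LETTERS = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
DIGITS = set("0123456789")
ALLOWED = [LETTERS, LETTERS, DIGITS, DIGITS,
           LETTERS, LETTERS, DIGITS, DIGITS, DIGITS, DIGITS]

def grammar_check(char_probs):
    """char_probs: one dict per plate position mapping each candidate
    character to its OCR probability. For every position, pick the
    highest-probability character that the grammar allows there."""
    plate = []
    for pos, probs in enumerate(char_probs):
        ranked = sorted(probs, key=probs.get, reverse=True)
        best = next(c for c in ranked if c in ALLOWED[pos])
        plate.append(best)
    return "".join(plate)

# The OCR confuses I/1 at position 3 (a digit slot); the grammar
# forces the digit even though "I" scored higher.
probs = [{"M": 0.9}, {"H": 0.8}, {"3": 0.9},
         {"I": 0.6, "1": 0.4}, {"A": 0.9}, {"H": 0.9},
         {"8": 0.9}, {"3": 0.9}, {"8": 0.9}, {"2": 0.9}]
print(grammar_check(probs))
```

On this toy input the check recovers MH31AH8382, resolving the I/1 ambiguity exactly as described above.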
The character recognition module using HTM will also be useful in developing major applications such as handwriting recognition, automatic bill reading, and card reading.
7.1 SNAPSHOT OF THE GUI
Figure 7.1
7.2 STATISTICS
This section gives some statistics of the results produced by executing the different modules of the system.
Time Taken for Extraction of Candidate Regions
Process| Time Taken (s)|
RGB to Grayscale| 0.5310|
Image Averaging| 0.5000|
Edge Computation| 0.1880|
Otsu's Threshold Computation| 0.0620|
Image to Black and White| 0.0630|
Labeling the Image| 0.3120|
Computing Edge Areas| 1.1250|
Computing the Vertical Histogram| 0.7820|
Computing Vertical Threshold| 0|
Computing Candidate Vertical Strips| 0|
Computing Horizontal Histogram on all Candidates| 0.9220|
Computing the Final Candidates + Horizontal Histogram Threshold| 0.0150|
TOTAL:| 4.5|
Table 7.1
Average Localization & Segmentation Time on Candidates
Process| Time Taken (s)|
Median Filter| 0.0417|
Vertical Edge Detection| 0.2763|
Image Dilation| 0.0150|
Labeling the Image| 0.0160|
Computing the Properties of the Blobs in the Region| 0.1927|
Connected Component Analysis| 0.0103|
TOTAL:| 0.5520|
Table 7.2
HTM Training Results
Description| Time Taken| Iterations|
Time of Training for Level-1| 2 Min 19 Sec| 90,000|
Time of Training for Level-2| 14 Min 20 Sec| 3,60,000|
Time of Training for Level-3| 25 Hrs 38 Min 17 Sec| 7,92,879|
Table 7.3
OCR Testing Results
Description| Value|
No. of Images (Same Font)| 7200|
Recognition Time| 20 Min 36 Sec|
Accuracy| 76.6%|
Accuracy After Grammar Check| 94.3%|
Table 7.4
7.3 SAMPLE RESULTS
Input Image:
Figure 7.2
Localized Number Plate Candidates: Figure 7.3
Segmented Characters: Figure 7.4
After character recognition, the grammar check removes the first set of candidates and retains MH 31 AH 8382.
7.4 FUTURE WORK
What we intend to do next is to apply this technique to different possible instances and to improve on it. These are some of the future prospects and improvements:
* Employ the same technique for videos, by considering a video to be a set of frames.
* Multiple car plate detection.
* Increase accuracy for distorted as well as distant number plates.
* Increase the character recognition speed by parallel-processing techniques.
BIBLIOGRAPHY
[1] Christos-Nikolaos E. Anagnostopoulos, Ioannis E. Anagnostopoulos, Ioannis D. Psoroulas, Vassili Loumos, and Eleftherios Kayafas, “License Plate Recognition From Still Images and Video Sequences: A Survey”, IEEE Transactions on Intelligent Transportation Systems, Vol. 9, No. 3, September 2008.
[2] B. Hongliang and L. Changping, “A hybrid license plate extraction method based on edge statistics and morphology”, 17th International Conference on Pattern Recognition (ICPR’04).
[3] Hamid Mahini, Shohreh Kasaei, Faezeh Dorri, and Fatemeh Dorri, “An Efficient Features-Based License Plate Localization Method”, IEEE, 2006.
[4] William K. Pratt, “Digital Image Processing”, 2nd edition.
[5] Rafael C. Gonzalez and Richard E. Woods, “Digital Image Processing”, 4th edition.
[6] H. Bai and C. Liu, “A hybrid license plate extraction method based on edge statistics and morphology”, Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004), vol. 2, pp. 831-834, 2004.
[7] Dileep George, “How The Brain Might Work: A Hierarchical And Temporal Model For Learning And Recognition”, thesis, numenta.com.
[8] Jeff Hawkins and Dileep George, “Hierarchical Temporal Memory: Concepts, Theory, and Terminology”, Numenta Inc., numenta.com.
[9] Ching-Tang Hsieh, Yu-Shan Juan, and Kuo-Ming Hung, “Multiple License Plate Detection for Complex Background”, Proceedings of the 19th International Conference on Advanced Information Networking and Applications (AINA’05), 1550-445X/05, IEEE, 2005.
[10] Dileep George and Bobby Jaros, “The HTM Learning Algorithms”, Numenta Inc.
[11] Shyang-Lih Chang, Li-Shien Chen, Yun-Chung Chung, and Sei-Wan Chen, IEEE Transactions on Intelligent Transportation Systems, Vol. 5, No. 1, March 2004.
[12] Halina Kwasnicka and Bartosz Wawrzyniak, “License plate localization and recognition in camera pictures”, AI-METH 2002 - Artificial Intelligence Methods.
[13] B. Enyedi, L. Konyha, K. Fazekas, and J. Turan, “License Plate Localization and Storage Method”, Systems, Signals and Image Processing, 2007, and 6th EURASIP Conference focused on Speech and Image Processing, Multimedia Communications and Services: 14th International Workshop.