
UDC 004.932:621.396

T.M. VLASOV, V.G. KALMYKOV

ALGORITHM AND PROGRAM FOR RECOGNIZING IMAGE OUTLINES AS A SEQUENCE OF DIGITAL LINE INTERVALS

Abstract: This paper considers an algorithm for recognizing digital straight line segments in the contours of binary images, together with a software implementation of the algorithm. Using this algorithm in image processing results in a more natural and more economical description of images in comparison with known methods. The algorithm and its software implementation can also be used to describe contours when processing grayscale and color images.

Key words: image, contour, digital straight segments, algorithm, program.


1. Introduction

Structural analysis of image contours as sequences of line segments and arcs of curves is one of the main tasks in image processing for the purpose of interpreting them in artificial intelligence systems.

In most cases, an image can be considered as a part of a plane divided into regions with constant or varying parameters, for example optical density, color, or texture, which determine the background and the objects of the image. An integral property of each of these regions is its boundary, that is, its contour: a simply connected sequence consisting of line segments and arcs of curves. When processing a bitmap, the outlines of objects are usually extracted. However, contours of objects represented as collections of individual boundary pixels are not well suited for further processing, since they do not sufficiently express their geometric essence.

Recognition of image contours as sequences of straight line segments can be considered one of the main tasks in raster image processing. Solving the problem of representing a contour as a sequence of straight line segments yields a description of the image in a compact form that is natural for human perception, invariant with respect to affine transformations, and convenient, in particular, for processing by neural networks. Straight line segments are the main contour elements. An arc of a curve is also often replaced by a polyline inscribed in it, both in the foundations of mathematical analysis and in a large number of practical applications.

Known methods and algorithms, in particular those proposed in earlier work, allow one to obtain only an approximate solution, which is not acceptable for all applications.

In this paper, we consider the recognition of the contour of a binary image as a sequence of segments of digital lines without loss of information.

2. The contour as a sequence of segments of digital lines

In this section, the structural analysis of image contours as a sequence of segments of digital straight lines is considered; these segments are the initial data for segmenting the contour into arcs of digital curves and segments of digital straight lines.

We will restrict ourselves to considering binary images, the objects of which are completely determined by their bounding contours. Arcs of digital curves, as well as segments of digital straight lines, are formed by discretizing images containing contours formed by segments of straight lines and arcs of curves.

The characteristic features of line segments and arcs of curves are lost in this transformation. Examining the sampled image at sufficient magnification, it is often difficult to recognize individual line segments and arcs of curves in the sequence of vertical and horizontal line segments. Additional difficulties arise during processing because contour lines - mathematical lines without thickness - are displayed on the monitor screen as connected sequences of pixels, that is, as visible lines having a thickness.

To eliminate the problems arising from this, we will consider the image obtained from the original as a result of sampling as a two-dimensional cell complex. In this case, pixels are the two-dimensional elements of this cell complex. In addition to pixels, there are cracks and points. Cracks are the sides of the pixels and are one-dimensional elements. Points are the endpoints of the cracks and the corner points of the pixels; they are zero-dimensional elements. Thus, in the case under consideration, the contour of an object is a connected closed sequence of contour cracks bordering between the pixels of the object and those of the background. A contour can be described as a sequence of the integer coordinates of the points bounding the contour cracks. As has been shown in the literature, representing the image plane as a cell complex offers many benefits. In particular, the border of a region becomes a thin curve with zero area.

Fig. 1 shows an example of the initial contour of an object formed by an arc of a curve and a straight line segment, as well as its digital equivalent as a sequence of cracks. The numbered points are those belonging to cracks of different directions. As in previous works, by an L-element we mean a connected sequence of cracks of the same direction, emerging from a certain point and ending with a crack of the same or perpendicular direction. Fig. 1 shows one of the possible partitions of the contour into L-elements, which are formed by the cracks between the points: (0-2), (2-4), (4-6), (6-8), (8-9), (9-11), (11-13), (13-15), (15-17), (17-19), (20-21), (21-23), (23-25), (25-27), (27-0). Each L-element is characterized by the following parameters: the direction g relative to its initial point (it is assumed that g = 0 for the upward direction, 1 - to the right, 2 - down, 3 - to the left); l - the number of cracks in the direction g (l = 1, 2, ...); the direction q of the last crack relative to the direction g of the previous cracks (q = -1 if the last crack is directed to the left relative to the direction g, +1 - to the right, 0 - coincides with the direction g). The number of cracks l will conditionally be called the "length" of the L-element. For the L-element (0-2): g = 0, l = 3, q = +1. For the L-element (27-0): g = 3, l = 1, q = 0.
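
For concreteness, the following minimal Python sketch (an illustration of the definition above, not the authors' code; it assumes the contour is given as a list of crack directions encoded as 0 = up, 1 = right, 2 = down, 3 = left) groups a crack chain into L-elements with parameters (g, l, q):

def l_elements(cracks):
    # cracks: directions of successive contour cracks, 0 = up, 1 = right, 2 = down, 3 = left
    elems = []
    i = 0
    while i < len(cracks):
        g = cracks[i]                      # main direction of the L-element
        l = 1                              # number of cracks in direction g
        while i + l < len(cracks) and cracks[i + l] == g:
            l += 1
        i += l
        q = 0                              # 0: no terminating perpendicular crack (end of chain)
        if i < len(cracks):
            turn = (cracks[i] - g) % 4
            if turn == 1:                  # terminating crack turns to the right of g
                q, i = 1, i + 1
            elif turn == 3:                # terminating crack turns to the left of g
                q, i = -1, i + 1
        elems.append((g, l, q))
    return elems

# Example: three upward cracks followed by one crack to the right -> one L-element (0, 3, +1)
print(l_elements([0, 0, 0, 1]))            # [(0, 3, 1)]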

The method of selecting segments of digital straight lines in a contour uses the following property of the sequence of L-elements that form a segment. This sequence includes L-elements with the same values of g and q, whose lengths take the values l and l+1. The alternation of L-elements of lengths l and l+1 is determined by the continued fraction obtained by dividing the integers n = Δx = |x1 - x2| and m = Δy = |y1 - y2|, where (x1, y1), (x2, y2) are the coordinates of the initial and end points of the line segment:

n / m = l + 1 / (k1 + 1 / (k2 + 1 / (k3 + ... + 1 / kr)))          (1)

Let us assume for definiteness that n > m. As follows from formula (1), l - the integer part of dividing n by m - corresponds in a digital straight line segment to the number of l consecutive cracks of the same direction. Together with the adjoining perpendicular crack, they form an L-element of length l. k1 consecutive L-elements of length l and one L-element of length l+1 (or k1 consecutive L-elements of length l+1 and one L-element of length l) form a K1-element of "length" k1 (by analogy with the "length" of an L-element). An L-element that differs in length by 1 from the consecutive L-elements will be called the modified L-element of this K1-element. Similarly, k2 consecutive K1-elements of "length" k1 and one K1-element of "length" k1+1 (or k2 consecutive K1-elements of "length" k1+1 and one K1-element of "length" k1) form a K2-element of "length" k2, and so on for K3-elements, K4-elements, ..., Kr-elements, until the members of the continued fraction are exhausted. A K1-element (in general, a Kr-1-element) that differs in length by 1 from the successive K1-elements (Kr-1-elements) will be called the modified K1-element (Kr-1-element) of the given K2-element (Kr-element). Thus, every digital straight line segment corresponds to a continued fraction whose elements determine the structure of this segment.
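
As an illustration of formula (1), the sketch below (Python; the function names are mine) computes the continued fraction of n/m with the Euclidean algorithm and generates the run lengths of a digitized segment from (0, 0) to (n, m) under one common digitization convention; the runs take only the two values l and l + 1, as described above:

def continued_fraction(n, m):
    # terms [l, k1, k2, ...] of n/m = l + 1/(k1 + 1/(k2 + ...)), formula (1)
    terms = []
    while m:
        terms.append(n // m)
        n, m = m, n % m
    return terms

def run_lengths(n, m):
    # lengths of the runs of same-direction cracks of a digitized segment from (0, 0) to (n, m), n > m > 0
    return [(i + 1) * n // m - i * n // m for i in range(m)]

print(continued_fraction(17, 5))   # [3, 2, 2]
print(run_lengths(17, 5))          # [3, 3, 4, 3, 4] - only the two values 3 and 4 occur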

In the contour in Fig. 1 the following segments of digital straight lines can be selected: 0-3, 3-9, 9-10, 10-17, 17-0.

3. Selection of segments of a digital line in a contour

When processing the contours of images, in particular, binary images, in the sequence of cracks forming the contour, it is necessary to select the parts of the sequence that form the line segments. This problem can be considered as the problem of determining the elements of the continued fraction by the sequence of L-elements of the contour. This task is inverse to the problem of determining the structure of a straight line segment from the sequence of terms of the continued fraction, obtained as the ratio of the differences between the coordinates of the beginning and end of the segment.

The method of selecting segments of a digital straight line consists in the sequential execution of the following actions.

1. Isolation of a sequence of L-elements in the sequence of cracks. This action corresponds to the determination of the integer part l of the continued fraction (1).

2. Isolation of a sequence of Kr -elements with r = 1 in a sequence of L -elements, and one of the L -elements of each K1 -element must contain 1 crack more or less than the others. This action corresponds to the definition of the k1 -th element of the continued fraction (1). After its execution, the value of r should be increased by 1.

3. Isolation of the sequence of Kr-elements in the sequence of Kr-1-elements, where one of the Kr-1-elements of each Kr-element must contain one Kr-2-element more or fewer than the others. This action corresponds to the determination of the kr-th element of the continued fraction (1). After its execution, the value of r must be increased by 1.

4. Step 3 is repeated as long as the successive Kr-elements can still be combined into a Kr+1-element.

5. Determine the boundary points between two adjacent L-elements that are not included in the same Kr-element. These points are the endpoints of the digital straight line segments that form the contour.

Consider an algorithm for separating straight line segments in a sequence of L-elements.

Let {Ls(ls, gs, qs)}, s = 0, 1, ..., S, be a sequence of L-elements forming a contour; xs, ys are the coordinates of the beginning of the s-th L-element; {xj, yj}, j = 0, 1, ..., J, J ≤ S, is the set of breakpoints of the contour. Breakpoints define the endpoints of the straight line segments that form the contour. Finding the breakpoints means determining the line segments that form the contour.

Each segment under consideration is characterized by a Kr-element, as well as by a continued fraction. At the initial moment of recognition of a straight line segment, the elements of the corresponding continued fraction are equal to 0. The segment can be considered recognized if the parameters of the Kr-element are determined, including its order r and the values of the elements of the corresponding continued fraction.

1. Initial conditions.

The sequences {Ls(ls, gs, qs)} and (xs, ys) are given.

It is necessary to find the coordinates of the breakpoints {xj, yj}.

k0p := 0, k1p := 0, k2p := 0, ..., krp := 0 are the working values of the elements of the continued fraction.

Take the point s = 0 as the starting point of the first segment; j := 0; i := 0.

2. Take the first L-element in the sequence as the beginning of the first straight line segment. Its initial point is xs, ys. Its length l = l0 is also the value of the first element of the continued fraction.

s := s + 1; k1p := k1p + 1.

3. Check whether the next L-element, together with the previous ones, continues the K0-element.

3.1. If ((qs == qs-1) && (gs == gs-1) && (ls == ls-1)), then continuation of the K0-element: k0p := k0p + 1; s := s + 1; and continuation of the straight line segment. Go to step 3.

3.2. If ((qs != qs-1) || (gs != gs-1) || (|ls - ls-1| > 1)), then the end of the straight line segment. Go to step 5.

3.3. If ((qs == qs-1) && (gs == gs-1) && ((ls + 1 == ls-1) || (ls - 1 == ls-1))), then the K0-element is completed; i := i + 1.

4. Check the continuation/completion of the Ki-element.

4.1. If (ki == 0), then ki := kip; kip := 0; k(i+1)p := k(i+1)p + 1; s := s + 1; the beginning of the Ki+1-element.

Go to step 3.

4.2. If ((ki != 0) && (ki == kip)), then k(i+1)p := k(i+1)p + 1; s := s + 1; continuation of the Ki+1-element. Go to step 3.

4.3. If ((ki != 0) && ((ki + 1 == kip) || (ki - 1 == kip))), then i := i + 1; completion of the Ki-element.

Go to step 4.

4.4. If ((ki != 0) && (|ki - kip| > 1)), then the end of the straight line segment; go to step 5.

5. End of the segment.

xj := xs; yj := ys; k1p := 0, k2p := 0, ..., kip := 0; k1 := 0, k2 := 0, ..., ki := 0.

If (s < S), then s := s + 1; go to step 2.

Otherwise, the end of the sequence of L -elements. The end of the algorithm.
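
For comparison, digital straightness of a piece of contour can also be tested directly: a crack chain over two perpendicular directions is a digital straight segment exactly when its chain code is balanced (in any two factors of equal length, the counts of either letter differ by at most 1). The brute-force Python sketch below is my own illustration, not the K-element procedure above, and its greedy breakpoints need not coincide with those the algorithm finds:

def is_digitally_straight(chain):
    # chain: word over two symbols, e.g. "RRURRU", assumed to encode perpendicular crack directions;
    # a word is a factor of some digital straight line iff it is balanced
    if len(set(chain)) > 2:
        return False
    if len(chain) < 2:
        return True
    s = chain[0]
    n = len(chain)
    for length in range(1, n):
        counts = [chain[i:i + length].count(s) for i in range(n - length + 1)]
        if max(counts) - min(counts) > 1:
            return False
    return True

def split_into_segments(chain):
    # greedy left-to-right splitting into maximal digitally straight pieces
    segments, start = [], 0
    for end in range(1, len(chain) + 1):
        if not is_digitally_straight(chain[start:end]):
            segments.append(chain[start:end - 1])
            start = end - 1
    segments.append(chain[start:])
    return segments

print(split_into_segments("RRURRURRRRRRU"))   # the long run of R's forces a break into two segments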

In fact, the proposed algorithm finds the elements of the continued fraction for each Kt-element obtained and checks whether the continued fraction of the newly constructed Kt-element is consistent with the one already constructed.

4. The program for the selection of segments of a digital line

As can be seen from the description of the algorithm, it contains a significant number of conditional jumps, whose use contradicts the recommendations of structured programming because of the difficulties that arise when debugging programs. In addition, the number of Kt parameters cannot be determined in advance, since the variable t is not bounded in advance; bounding the value of t would mean limiting the size of the image in advance. For these reasons, a software implementation, and especially the debugging, of the proposed algorithm by trivial means is significantly difficult. The difficulties of developing and debugging the software implementation can be reduced by using modern object-oriented programming tools.

The proposed algorithm is implemented in the form of the LINESEGM program, which is part of a laboratory software package for image processing in the Visual C++ environment.

As the initial information, the LINESEGM program uses the sequences of L -elements constructed for each of the contours of the processed image.

The result of the program is a connected sequence of digital straight line segments built for each contour, represented by the coordinates of the end points of the segments.

As can be seen from the algorithm, the operations of constructing Kt-elements from Kt-1-elements are the same for all values of t. Note that the initial value is t = 0, and in the course of the algorithm's operation it is increased by 1 each time. The special class CKForLn includes methods corresponding to the operations of the algorithm. While the program that implements the algorithm is running, each time t increases by 1 a new object is created containing functions that perform the operations of the algorithm for that value of t.

Taking into account that at the zero level K0-elements are formed not from K-elements but from L-elements, a special modification of the CKForLn class, the Cmini class, was created to implement the algorithm at the zero level.

The principle of operation of the program is that for each value of t an object of the CKForLn class of level t is created, containing functions that determine the parameters of the Kt-element. The initial parameters of the Kt-element are the parameters of the already completed Kt-1-element, which were determined by the CKForLn object of level t-1.

Objects of the CKForLn class are created as the need arises, namely when a K-element of the next level has to be built, and are accumulated in a special dynamic array. A zero-level object is created immediately at the start of the program.

Creating objects in a dynamic array as t increases makes it possible not to impose restrictions on the image size; the limits on image size are determined only by the resources of the computer being used.
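
The design principle described here - per-level objects created on demand and accumulated in a dynamic array, so that the nesting depth t is never bounded in advance - can be sketched as follows (an illustrative Python analogue, not the actual CKForLn/Cmini code):

class Level:
    # stands in for Cmini (t == 0) or CKForLn (t > 0)
    def __init__(self, t):
        self.t = t
        self.k = 0          # the k_t element of the continued fraction found so far

class Recognizer:
    def __init__(self):
        self.levels = []    # dynamic array of level objects

    def level(self, t):
        # create missing levels lazily; no a-priori bound on t (and hence on image size)
        while len(self.levels) <= t:
            self.levels.append(Level(len(self.levels)))
        return self.levels[t]

r = Recognizer()
r.level(0)                  # the zero-level object exists from the start
r.level(3).k = 2            # levels 1..3 are created on first demand
print(len(r.levels))        # 4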

When describing the operation of the program, the concept of a completed Kt-element will be used. Each completed Kt-element contains kt Kt-1-elements and one modified Kt-1-element, which contains kt-1 ± 1 Kt-2-elements, as opposed to an incomplete Kt-element, which does not contain a modified Kt-1-element.

The CKForLn class includes the following methods.

1. Method DK() (define K-element) - determines a K-element.

To determine a Kt-element means to determine the number of Kt-1-elements that form the given Kt-element.

2. Method VK() (verify K-element) - checks the identity of the K-element under consideration with the K-element of the same level previously determined by the DK() method.

3. Method DE() (define the end of K-element) - determines the end of the K-element, that is, when determining a Kt-element, finds its modified Kt-1-element. The DE() function of level t-1 is called by the DK() function of level t.

4. Method VE() (verify the end of K-element) - checks the identity of the end of the K-element under consideration with the modified K-element previously determined by the DE() method.

The Cmini class includes the same methods, which differ from the methods of the CKForLn class in that the methods of the Cmini class work with L -elements and determine or check K0 -elements.

Cmini class methods

Methods of the Cmini class use as initial data the sequences of L -elements built for each of the contours of the processed image, starting from the current number of the L -element at the moment of calling the method function.

The DK() function of the Cmini class compares the parameters of each next L-element with the parameters of the initial L-element as long as they match. When the parameters do not match, the DK() function checks whether the K0-element is completed and finishes its work. A completed K0-element is one ending with a modified L-element whose length differs from the other L-elements of the K0-element by 1 (operation 3.1 for the beginning of a segment, t = 0).

The VK() function checks whether the parameters of the next k0 L-elements coincide with the parameters of the L-elements of the K0-element previously determined by the DK() function of the same level. If the parameters of the current K0-element coincide with those previously determined, the VK() function sets a flag indicating continuation of the segment and finishes its work (operation 3.1 for the continuation of a segment, t > 0). Otherwise, the VK() function sets a flag indicating the end of the segment and finishes its work.

The DE() function compares the parameters of the current K0-element with the parameters of the K0-element previously determined by the DK() function to establish whether the current K0-element is a modified one. With the other parameters equal, the number k0 of L-elements in the modified K0-element must differ by 1 from that of the K0-element previously determined by the DK() function (operations 3.2, 3.3 for determining the completion of the initial K0-element of a segment, t = 0). The result - the parameters of the modified K0-element - is used by the VE() method of the Cmini class.

The VE() function compares the parameters of the current K0-element with the parameters of the modified K0-element previously determined by the DE() function to establish whether they coincide (operations 3.2, 3.3 for the continuation of a segment, t > 0). The result - a flag of coincidence or non-coincidence - is used by the VK() method of the CKForLn class.

CKForLn class methods

The methods use as input data the parameters of the K-elements constructed at the level below; that is, to determine the parameters of a Kt-element, the parameters of already constructed Kt-1-elements are used.

The DK() function of level t of the CKForLn class, when determining the parameters of a Kt-element, calls the VK() function of level t-1 of the CKForLn class, which checks whether the already determined Kt-1-element is followed by a Kt-1-element with the same parameters. If so, the call to the VK() function is repeated. The number of repetitions is counted, that is, the parameter kt is determined.

Otherwise, the DK() function of level t calls the DE() function of level t-1 to determine the modified Kt-1-element and finishes its work. On completion, the DK() function of level t of the CKForLn class determines the parameters and sets the flags of a completed or incomplete Kt-element (operations 4.1, 4.2 at the current maximum value of t).

The VK() function of level t of the CKForLn class checks whether the parameters of the next kt Kt-elements coincide with the parameters of the Kt-element previously determined by the DK() function of the same level. If the parameters of the current Kt-element coincide with those of the Kt-element previously determined by the DK() function of the same level, the VK() function sets a flag indicating continuation of the segment and exits. Otherwise, the VK() function sets a flag indicating the end of the segment and finishes its work (operations 4.1, 4.2 with the current value of t less than the maximum).

The DE() function of level t of the CKForLn class, when determining the parameters of a Kt-element, compares the parameters of the current Kt-element with the parameters of the Kt-element previously determined by the DK() function to establish whether the current Kt-element is a modified one. With the other parameters equal, their kt-1 values must differ by 1. If this condition is met, the DE() function sets the flag of a modified Kt-element and exits (operations 4.3, 4.4 at the current maximum value of t).

The VE() function of level t of the CKForLn class compares the parameters of the current Kt-element with the parameters of the modified Kt-element previously identified by the DE() function to determine whether their parameter values coincide. If the parameter values of the current Kt-element coincide with those previously determined by the DE() function of the same level, the VE() function sets a flag of coincidence of the parameter values and finishes its work (operations 4.3, 4.4 at the current value of t less than the maximum).

The timing diagram (Fig. 2) illustrates the work of the LINESEGM program on the example of line segment recognition. The lower part of the figure shows a part of a digital line consisting of L-elements of the same main and auxiliary directions and different lengths.

At step 0, an object of the Cmini class is created, which determines the parameters of the K0-element.

At step 10, the determination of the parameters of the K0-element is completed and object 1 of the CKForLn class is created, which uses the functions of the previously created object to determine the parameters of the K1-element. At step 19, the determination of the parameters of the K1-element is completed and object 2 of the CKForLn class is created, which uses the functions of the previously created objects to determine the parameters of the K2-element. At step 49, the determination of the parameters of the K2-element is completed and object 3 of the CKForLn class is created, which uses the functions of the previously created objects to determine the parameters of the K3-element. At step 79 the segment termination condition is satisfied. The detailed operation of the program is described in the appendix.


Fig. 3 shows an example of the program's operation. The contour of the binary image of a curly arrow is divided by squares into straight line segments. The result of the program - a sequence of straight line segments - was used to extract arcs of digital curves. The large squares show the boundaries of the arcs of the digital curves.

The operation of the program has been tested on a significant number (more than 2000) of examples and is used in research on the structural analysis of halftone images.

5. Operation of the program for recognizing line segments

Fig. 3. An example of the work of the program for structural analysis of a contour - segmentation of the contour into segments of digital straight lines

Let us consider the operation of the LINESEGM program using the example of Fig. 4. The lower part of the figure shows a part of a digital line consisting of L-elements of the same main and auxiliary directions and of different lengths. In the section 0-6, two L-elements form an incomplete K0-element. Obviously, L-element 3-6 of length 3 completes the straight line segment, since L-element 6-7 of length 1 cannot be its continuation. Thus, L-element 6-7 is the beginning of the next digital straight line segment.

The work of the program in determining the next straight line segment begins with the DK() function of the zero level, which determines the completed K0-element 6-10, consisting of L-elements of lengths 1, 1, 2; k0 = 2. This K0-element is the initial one for the K1-element. The program creates an object of the first level and transfers control to the DK() function of this object. The DK() function of level 1 calls the VK() function of level 0. The VK() function compares the parameters of the L-elements of K0-element 6-10 with the subsequent L-elements and confirms the presence of K0-element 10-14, identical to K0-element 6-10.

Continuing its work, the VK() function detects that the following L-elements do not form the same K0-element, terminates its work and transfers control to the DK() function of level 1. The DK() function of level 1 calls the DE() function of level 0. The latter, starting with L-element 6-7, determines the presence of a modified K0-element 14-19, consisting of L-elements of lengths 1, 1, 1, 2; k0 = 3, finishes its work and transfers control to the DK() function of level 1. This function determines the presence of a completed K1-element 6-19, consisting of two K0-elements 1, 1, 2 (k1 = 2) and one modified K0-element 1, 1, 1, 2 (k0 = 3). The program creates an object of the second level and transfers control to the DK() function of this object. The DK() function of level 2 calls the VK() function of level 1, which, in turn, calls the VK() function of level 0. The VK() function compares the parameters of the L-elements of K0-element 6-10 with the subsequent L-elements and confirms the presence of K0-elements 19-23, 23-27, identical to K0-element 6-10, that is, of the same number of such K0-elements as are contained in K1-element 6-19. Then the VK() function of level 0 returns control, with the segment continuation flag, to the VK() function of level 1. That function calls the VE() function of level 0, which determines the presence of a modified K0-element 27-32, identical to K0-element 14-19. Thus, K1-element 19-32 is determined, identical to K1-element 6-19.

Next, the VK() function of level 1 does not find another K1-element identical to K1-element 6-19, because the VE() function of level 0, starting with L-element 40-41, does not find a modified K0-element identical to K0-element 14-19, and returns control to the DK() function of level 2. The DK() function of level 2 calls the DE() function of level 1, which determines the presence of a modified K1-element 32-49, consisting of K0-elements 32-36, 36-40, 40-44, 44-49.

Next, K2-element 6-49 is determined, an object of level 3 is created, and the modified K2-element 49-79 is determined. These two K2-elements form K3-element 6-79. This completes the construction of the segment, since the next L-elements 79-81 and 81-83 do not form a K0-element identical to K0-element 6-10, and the VK() function of level 0 does not set the segment continuation flag. In the sequence of L-elements, the digital straight line segment 6-79 has been identified. The program starts determining the next segment, beginning with L-element 80-82.

6. Conclusions

1. A new algorithm for selecting straight line segments in image contours and a non-trivial software implementation of the algorithm are proposed, which make it possible to obtain an exact solution to the problem of recognizing image contours as sequences of straight line segments.

2. The software implementation of the algorithm for selecting straight line segments in image contours is made using modern means of object-oriented programming, which made it possible not to impose explicit restrictions on the size of the processed image while making maximum use of the resources of the computer used.

3. Based on the proposed algorithm and its software implementation, a theoretical solution has been obtained and experiments have been carried out on recognizing arcs of digital curves and on segmenting the contours of binary images into segments of digital straight lines and arcs of digital curves.

BIBLIOGRAPHY

1. Kovalevsky V.A. Applications of Digital Straight Segments to Economical Image Encoding // Proceedings of the 7th International Workshop, DGCI'97, Montpellier, France, December 3-5, 1997. - P. 51-62.

2. Kalmykov V.G. Structural method for describing and recognizing segments of digital straight lines in contours of binary images // Shtuchnyi Intelekt (Artificial Intelligence). - 2002. - No. 4. - P. 450-457.

3. Kalmykov V.G., Vishnevsky V.V. Analysis of the contours of objects in binary images // Mathematical Machines and Systems. - 1997. - No. 2. - P. 68-71.

4. Kalmykov V.G. Arc of a digital curve: definition and application // Signal and Image Processing and Pattern Recognition: Proceedings of the Seventh All-Ukrainian International Conference. - Kyiv, October 11-15, 2004.

Formulation of the problem is determined by the purpose and possibilities of its implementation.

Goal. Develop a program for classifying rectangular parts into good and defective ones.

The possibilities for implementing the task are determined by the capabilities of the computer. The computer is capable of processing numerical information in an algorithmic sequence. To make use of the computer's capabilities, it is necessary to model the problem being solved.

Modeling using a computer implies a transition from a real object (world) to a coded description of its properties using data and operations on them. Such a transition, as a rule, is carried out in several stages:

Abstraction - selection of the most essential features of the object from the point of view of the task.

It is necessary to conduct an analysis that makes it possible to move from the object of modeling to the subject of modeling, discarding everything unnecessary in accordance with the goal of the task.

How is a rectangle different from other quadrilaterals?

  • Equality of opposite sides.
  • Parallelism of opposite sides.
  • Equality of the diagonals.
  • All corners are straight.

What is the minimum of signs needed to unambiguously solve the problem?

  • Equality of 2 opposite sides + equality of diagonals.
  • Parallelism of 2 opposite sides + equality of diagonals.
  • Three corners are straight.

So, thanks to abstraction, we obtained a verbal information model. But it is still incomprehensible to the computer, which understands a mathematical model presented in the form of an algorithm and implemented in software.

Methodology for implementing the task.

A drawing of a good part (a rectangle) or a defective part (an arbitrary quadrilateral) is made from segments (the LINE command) in the AutoCAD graphics system and exported to a DXF file. The kntrs.lsp program reads the line segment data (vertex coordinates) from the DXF file and writes it to the text file vrtks.txt in circular vertex traversal order.

The text file vrtks.txt can be created manually by taking the coordinates of the vertices directly from the drawing.

The developed program rct.lsp should read data from the vrtks.txt file, analyze it and output a record of the part's compliance with the requirements (rectangle or not) to the result.txt file.

Formalization of features

Equality of the lengths of segments (sides or diagonals): the length of each segment is determined as the hypotenuse of a right triangle (by the Pythagorean theorem) from the differences of the coordinates of its endpoints:

(setq DX12 (abs (- X1 X2)))                          ; difference of the X coordinates
(setq DY12 (abs (- Y1 Y2)))                          ; difference of the Y coordinates
(setq DA1 (sqrt (+ (* DX12 DX12) (* DY12 DY12))))    ; segment length by the Pythagorean theorem

Parallelism of lines: K2 = K1, where K is the slope of a straight line, K = (Y2 - Y1) / (X2 - X1).

How can the concept of a "right angle" be formalized? Within this task, the presence of a right angle can be determined by the condition of perpendicularity of the segments: K2 = -1 / K1, where K is the slope of the straight line, K = (Y - Y0) / (X - X0).
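
For illustration, the formalized features can be combined into a check such as the sketch below. It is written in Python rather than AutoLISP and is not the rct.lsp program; it uses dot products instead of slope ratios to avoid division by zero for vertical sides, and a tolerance eps, since drawing coordinates are rarely exact:

import math

def is_rectangle(pts, eps=1e-6):
    # pts: four (x, y) vertices in circular traversal order, as in vrtks.txt
    if len(pts) != 4:
        return False

    def vec(a, b):
        return (b[0] - a[0], b[1] - a[1])

    def dist(a, b):
        return math.hypot(b[0] - a[0], b[1] - a[1])

    # Feature 1: opposite sides are equal
    sides_equal = (abs(dist(pts[0], pts[1]) - dist(pts[2], pts[3])) < eps and
                   abs(dist(pts[1], pts[2]) - dist(pts[3], pts[0])) < eps)
    # Feature 2: diagonals are equal
    diags_equal = abs(dist(pts[0], pts[2]) - dist(pts[1], pts[3])) < eps
    # Feature 3: three right angles (dot product of adjacent sides is zero)
    right_angles = all(
        abs(vec(pts[i], pts[(i + 1) % 4])[0] * vec(pts[(i + 1) % 4], pts[(i + 2) % 4])[0] +
            vec(pts[i], pts[(i + 1) % 4])[1] * vec(pts[(i + 1) % 4], pts[(i + 2) % 4])[1]) < eps
        for i in range(3))

    # Either minimal feature set from the list above is sufficient
    return (sides_equal and diags_equal) or right_angles

print(is_rectangle([(0, 0), (4, 0), (4, 2), (0, 2)]))      # True
print(is_rectangle([(0, 0), (4, 0), (5, 2), (0, 2)]))      # False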

Displaying the model on the computer

Any information in the computer is ultimately represented by binary numbers (codes) in the in-machine model. Previously, this coding was done by the programmer; nowadays, most programs are written in algorithmic languages.

I have long wanted to write a general article containing the very basics of image recognition: a guide to the basic methods, telling when to apply them, what tasks they solve, what can be hacked together in an evening, and what is better not to attempt without a team of 20 people.

I have been writing articles on optical recognition for a long time, so a couple of times a month various people write to me with questions on the topic. Sometimes you get the feeling that we live in different worlds. On the one hand, you understand that the person is most likely a professional in a related field, but knows very little about optical recognition methods. The most frustrating thing is that he tries to apply a method from a neighboring area of knowledge, which is logical but does not quite work in image recognition, does not understand this, and is offended if you start explaining from the very basics. And considering that explaining from the basics takes a lot of time, which is often unavailable, it all becomes even sadder.

This article is conceived so that a person who has never dealt with image recognition methods could, within 10-15 minutes, create in his head a certain basic picture of the world corresponding to the topic, and understand in which direction he should dig. Many of the techniques described here are applicable to radar and audio processing.
I'll start with a couple of principles that we always begin by explaining to a potential customer or to a person who wants to start doing optical recognition:

  • When solving a problem, always go from the simplest. It is much easier to hang an orange label on a person than to follow a person, highlighting him with cascades. It is much easier to take a camera with a higher resolution than to develop a super-resolution algorithm.
  • A strict formulation of the problem in optical recognition methods is orders of magnitude more important than in system programming problems: one extra word in the technical specification can add 50% of the work.
  • There are no universal solutions to recognition problems. You cannot create an algorithm that will simply “recognize any inscription”. A sign on the street and a sheet of text are fundamentally different objects. Probably, you can make a general algorithm (here is a good example from Google), but it will require a lot of work of a large team and consist of dozens of different subroutines.
  • OpenCV is a bible that has many methods, and with which you can solve 50% of the volume of almost any problem, but OpenCV is only a small part of what you can actually do. In one study, the conclusions were written: "The problem is not solved by OpenCV methods, therefore, it is unsolvable." Try to avoid this, not be lazy and soberly evaluate the current task every time from scratch, without using OpenCV templates.
It is very difficult to give universal advice, or to describe how to create some structure around which you can build a solution to arbitrary computer vision problems. The purpose of this article is to structure what can be used. I will try to split the existing methods into three groups. The first group is preliminary filtering and image preparation. The second group is logical processing of filtering results. The third group is decision-making algorithms based on the logical processing. The boundaries between the groups are very arbitrary. To solve a problem it is far from always necessary to apply methods from all groups; sometimes two, and sometimes even one, are enough.

The list of methods given here is incomplete. I suggest adding in the comments important methods that I have not described, with two or three accompanying words for each.

Part 1. Filtration

In this group, I have placed methods that allow you to highlight areas of interest in images without analyzing them. Most of these methods apply some kind of uniform transformation to all pixels in the image. At the filtration level, the image is not analyzed, but the points that pass the filtration can be considered as areas with special characteristics.
Threshold binarization, selection of histogram area
The simplest transformation is the threshold binarization of the image. For an RGB image and a grayscale image, the threshold is the color value. There are ideal tasks in which such a transformation is sufficient. Let's say you want to automatically select objects on a white sheet of paper:




The choice of the threshold at which binarization occurs largely determines the result. In this case, the image was binarized by the average color. Usually binarization is done with an algorithm that adaptively selects the threshold; such an algorithm can choose the expectation or the mode, or the largest peak of the histogram.

Binarization can give very interesting results when working with histograms, including when we consider an image not in RGB but in HSV; for example, to segment the colors of interest. This principle can be used to build both a tag detector and a human skin detector.
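
A minimal OpenCV sketch of both ideas - global Otsu thresholding of a grayscale image and selecting a color range in HSV; the file names and the HSV range are placeholders to be tuned for real data:

import cv2

img = cv2.imread("input.png")                       # placeholder file name
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Threshold binarization with an automatically chosen (Otsu) threshold
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Color segmentation in HSV, e.g. a rough orange range for a tag detector
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv, (5, 100, 100), (25, 255, 255))   # illustrative range, not a universal one

cv2.imwrite("binary.png", binary)
cv2.imwrite("mask.png", mask)
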
Classical filtering: Fourier, LPF, HPF
The classical filtering methods from radar and signal processing can be successfully applied in a variety of pattern recognition tasks. The traditional method in radar, which is almost never used on images in its pure form, is the Fourier transform (more specifically, the FFT). One of the few exceptions in which the one-dimensional Fourier transform is used is image compression. For image analysis a one-dimensional transform is usually not enough; one needs to use the much more resource-intensive two-dimensional transform.

Few people actually compute it; usually it is much faster and easier to convolve the region of interest with a ready-made filter tuned to high (HPF) or low (LPF) frequencies. This method, of course, does not allow spectrum analysis, but a specific video processing task usually requires a result rather than an analysis.


The simplest examples are filters that emphasize low frequencies (a Gaussian filter) and high frequencies (a Gabor filter).
For each point of the image, a window is selected and multiplied by a filter of the same size. The result of this convolution is the new value of the point. When a low-pass filter and a high-pass filter are applied, images of the following kind are obtained:
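
A short OpenCV sketch of such window-by-window convolution, with a Gaussian kernel as the LPF and a simple kernel that subtracts the local average as the HPF (the kernel sizes are arbitrary choices):

import cv2
import numpy as np

gray = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)   # placeholder file name

# Low-pass: convolution with a Gaussian kernel
lpf = cv2.GaussianBlur(gray, (9, 9), 2.0)

# High-pass: convolve with a kernel that subtracts the local average
hpf_kernel = np.array([[-1, -1, -1],
                       [-1,  8, -1],
                       [-1, -1, -1]], dtype=np.float32)
hpf = cv2.filter2D(gray, cv2.CV_32F, hpf_kernel)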



Wavelets
But what if we use some arbitrary characteristic function for convolution with the signal? Then it will be called a "wavelet transform". This definition of wavelets is not correct, but traditionally in many teams wavelet analysis has come to mean the search for an arbitrary pattern in an image using convolution with a model of this pattern. There is a set of classical functions used in wavelet analysis, including the Haar wavelet, the Morlet wavelet, the Mexican hat wavelet, and others. The Haar primitives, about which several of my previous articles were written, belong to such functions for two-dimensional space.


Above are four examples of classic wavelets: the 3D Haar wavelet, the 2D Meyer wavelet, the Mexican hat wavelet, and the Daubechies wavelet. A good example of using the extended interpretation of wavelets is the problem of finding a glint in the eye, for which the glint itself is the wavelet:

Classical wavelets are usually used to compress images, or to classify them (described below).
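
If the classical use of wavelets (decomposition for compression or classification) is what is needed, the PyWavelets package provides it directly; a minimal sketch with the Haar wavelet, using random data in place of a real grayscale image:

import numpy as np
import pywt

image = np.random.rand(256, 256)            # stand-in for a real grayscale image

# One level of a 2-D Haar wavelet decomposition:
# cA - approximation (low frequencies), cH/cV/cD - horizontal/vertical/diagonal details
cA, (cH, cV, cD) = pywt.dwt2(image, "haar")

# Crude "compression": drop the detail coefficients and reconstruct
restored = pywt.idwt2((cA, (None, None, None)), "haar")
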
Correlation
After such a free interpretation of wavelets on my part, it is worth mentioning the actual correlation underlying them. It is an indispensable tool for filtering images. The classic application is correlating a video stream to find shifts or optical flow. The simplest shift detector is, in a sense, a difference correlator. Where the images do not correlate, there was movement.
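
A sketch of the correlation idea in OpenCV: normalized cross-correlation of a template over an image, and phase correlation to estimate the shift between two frames (file names are placeholders):

import cv2
import numpy as np

frame1 = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
frame2 = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
template = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

# Normalized cross-correlation of a template over the image
scores = cv2.matchTemplate(frame1, template, cv2.TM_CCORR_NORMED)
_, best_score, _, best_loc = cv2.minMaxLoc(scores)

# Global shift between two frames via phase correlation
(dx, dy), response = cv2.phaseCorrelate(frame1, frame2)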

Filtering functions
An interesting class of filters is function filtering. These are purely mathematical filters that allow you to detect a simple mathematical function in the image (a line, a parabola, a circle). An accumulator image is built in which, for each point of the original image, the set of functions generating it is drawn. The most classic transform is the Hough transform for lines. In this transform, for each point (x; y), the set of points (a; b) for which the equality y = ax + b holds is drawn. We get beautiful pictures:


(a plus to the first person who finds the catch in this picture and in this definition and explains it, and another plus to the first who says what is shown here)
The Hough transform allows you to find any parameterizable functions, for example a circle. There is a modified transform that allows you to search for any shape. Mathematicians are terribly fond of this transform. But when processing images it, unfortunately, does not always work: very slow speed, very high sensitivity to the quality of binarization. Even in ideal situations I preferred to get by with other methods.
The analogue of the Hough transform for straight lines is the Radon transform. It is calculated through the FFT, which gives a performance benefit in a situation where there are a lot of points. In addition, it can be applied to a non-binarized image.
Filtering outlines
A separate class of filters is border and contour filtering. Contours are very useful when we want to move from working with an image to working with the objects in that image. When an object is complex enough but well defined, often the only way to work with it is to extract its contours. There are a number of algorithms solving the problem of contour filtering:

Most often, the Canny detector is used, which works well and whose implementation is available in OpenCV (Sobel is also there, but it handles contours worse).
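
A minimal Canny example in OpenCV (the two hysteresis thresholds are illustrative and usually need tuning per task):

import cv2

gray = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(gray, 50, 150)           # lower/upper hysteresis thresholds
cv2.imwrite("edges.png", edges)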



Other filters
Above are the filters whose modifications help to solve 80-90% of problems. Besides them, there are rarer filters used in local tasks. There are dozens of such filters and I will not list them all. Interesting ones are iterative filters (for example, the active appearance model), as well as the ridgelet and curvelet transforms, which are a blend of classical wavelet filtering and analysis in the Radon transform domain. The beamlet transform works beautifully on the border between the wavelet transform and logical analysis, allowing you to highlight contours:

But these transformations are very specific and are tailored for rare tasks.

Part 2. Logical processing of filtering results

Filtering provides a set of data suitable for processing. But often you can't just take and use this data without processing it. This section will contain several classical methods allowing you to go from the image to the properties of objects, or to the objects themselves.
Morphology
In my opinion, the transition from filtering to logic is provided by the methods of mathematical morphology. In fact, these are the simplest operations of growing and eroding binary images. These methods allow you to remove noise from a binary image by increasing or decreasing the existing elements. There are contouring algorithms based on mathematical morphology, but usually some kind of hybrid algorithms or combinations are used.
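
The basic dilation/erosion operations and their combinations are available directly in OpenCV; a small sketch of cleaning a binary mask (the kernel size is an arbitrary choice):

import cv2
import numpy as np

binary = cv2.imread("binary.png", cv2.IMREAD_GRAYSCALE)
kernel = np.ones((3, 3), np.uint8)

opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)    # erode then dilate: removes small specks
closed = cv2.morphologyEx(opened, cv2.MORPH_CLOSE, kernel)   # dilate then erode: fills small holes
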
Contour analysis
Algorithms for obtaining boundaries have already been mentioned in the section on filtering. The resulting boundaries are quite simply converted into contours. For the Canny algorithm this happens automatically; for other algorithms additional binarization is required. A contour can be obtained for a binary image, for example, by the beetle (border-following) algorithm.
The contour is a unique characteristic of an object. This often makes it possible to identify an object by its contour. There is a powerful mathematical apparatus that allows you to do this; it is called contour analysis.

To be honest, I have never managed to apply contour analysis in real problems. Too ideal conditions are required. Either there is no border, or there is too much noise. But, if you need to recognize something in ideal conditions, then contour analysis is a great option. Works very quickly, beautiful mathematics and clear logic.
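
A small OpenCV sketch of going from a binarized image to contours and comparing a contour with a reference shape; cv2.matchShapes compares Hu moments, so lower scores mean more similar shapes. It illustrates only the idea, not the full contour-analysis apparatus:

import cv2

binary = cv2.imread("binary.png", cv2.IMREAD_GRAYSCALE)
reference = cv2.imread("reference.png", cv2.IMREAD_GRAYSCALE)

# OpenCV 4.x returns (contours, hierarchy)
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
ref_contours, _ = cv2.findContours(reference, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

# Compare every detected contour with the first reference contour
for c in contours:
    score = cv2.matchShapes(c, ref_contours[0], cv2.CONTOURS_MATCH_I1, 0.0)
    print(score)
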
Special points
Singular points are unique characteristics of an object that allow you to associate an object with itself or with similar object classes. There are several dozen ways to highlight such points. Some methods highlight special points in adjacent frames, some after a long period of time and when changing lighting, some allow you to find special points that remain so even when the object is rotated. Let's start with methods that allow us to find singular points that are not so stable, but are quickly calculated, and then we go in increasing complexity:
First class. Singular points that are stable over seconds. Such points are used to track an object between adjacent video frames or to merge images from neighboring cameras. These include local image maxima, image corners (the best of the detectors is perhaps the Harris detector), points at which maxima of dispersion are reached, certain gradients, etc.
Second class. Special points that are stable under changes in lighting and small movements of the object. Such points are primarily used for training and subsequent classification of object types. For example, a pedestrian classifier or a face classifier is the product of a system built around such points. Some of the previously mentioned wavelets can serve as the basis for such points: for example, Haar primitives, glint search, search for other specific functions. These include points found by the histogram of oriented gradients (HOG) method.
Third class. Stable points. I know of only two methods that give complete stability, and of their modifications. These are SURF and SIFT. They allow you to find special points even when the image is rotated. The calculation of such points takes longer than for the other methods, but within reasonably limited time. Unfortunately, these methods are patented. (Although in Russia algorithms cannot be patented, so use them for the domestic market.)
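
A short OpenCV sketch of detecting and matching stable keypoints: SIFT (included in the main OpenCV build in recent versions, after the patent expired) plus ORB as a free, fast alternative; the ratio-test threshold 0.75 is a common rule of thumb, not a constant from this article:

import cv2

img1 = cv2.imread("view1.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view2.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

matcher = cv2.BFMatcher(cv2.NORM_L2)
matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]   # Lowe's ratio test

# ORB is a patent-free alternative
orb = cv2.ORB_create()
kp, des = orb.detectAndCompute(img1, None)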

Part 3. Training

The third part of the story will be devoted to methods that do not work directly with the image but allow decisions to be made. These are mainly various methods of machine learning and decision making. Recently Yandex posted a course on this topic on Habr; it is a very good selection, and here it is in text version. For a serious study of the topic I strongly recommend looking at those lectures. Here I will try to outline several basic methods used in pattern recognition.
In 80% of situations, the essence of learning in the recognition problem is as follows:
There is a training set containing several classes of objects. Let it be the presence or absence of a person in a photo. For each image there is a set of features extracted by some feature detector, be it Haar, HOG, SURF, or some wavelet. The learning algorithm must build a model from which it will be able to analyze a new image and decide which of the objects is present in it.
How is it done? Each of the training images is a point in feature space. Its coordinates are the weights of each of the features in the image. Let our features be: "presence of eyes", "presence of a nose", "presence of two hands", "presence of ears", etc. A human corresponds to one point in this space, a monkey to another, a horse to a third. The classifier is trained on a sample of examples. But not all photographs showed hands, in others there are no eyes, and in a third the monkey got a human nose because of a detector error. The trained human classifier automatically partitions the feature space so as to say: if the first feature lies in the range 0.5..., then the object is a human. In essence, the purpose of the classifier is to draw in the feature space the regions characteristic of the objects being classified. This is how the successive approximation to the answer looks for one of the classifiers (AdaBoost) in two-dimensional space:


There are many classifiers. Each of them works better in some task of their own. The task of selecting a classifier for a specific task is in many ways an art. Here are some beautiful pictures on the topic.
Simple case, one-dimensional separation
Let us analyze, by example, the simplest case of classification, when the feature space is one-dimensional and we need to separate two classes. The situation occurs more often than one might think: for example, when you need to distinguish two signals or compare a pattern with a sample. Suppose we have a training sample. We obtain a picture where the X-axis is the measure of similarity and the Y-axis is the number of events with that measure. When the sought object is similar, the left Gaussian is obtained; when it is not, the right one. The value X = 0.4 divides the samples so that the probability of making a wrong decision is minimized. The search for such a separator is the classification problem.
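
A numeric sketch of this one-dimensional case: given similarity scores for the two classes, sweep the threshold and keep the one with the fewest misclassifications (synthetic Gaussian data stand in for a real training sample):

import numpy as np

rng = np.random.default_rng(0)
negatives = rng.normal(0.3, 0.05, 1000)      # "not similar" class
positives = rng.normal(0.6, 0.05, 1000)      # "similar" class

thresholds = np.linspace(0.0, 1.0, 1001)
errors = [np.sum(positives < t) + np.sum(negatives >= t) for t in thresholds]
best = thresholds[int(np.argmin(errors))]
print(best)                                   # close to the midpoint between the two modes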


A little remark. The criterion that minimizes the error will not always be optimal. The next graph is a graph of a real iris recognition system. For such a system, the criterion is chosen such as to minimize the probability of a false admission of an unauthorized person to the object. This probability is called "error of the first kind", "false alarm probability", "false positive". In the English-language literature "False Access Rate".
AdaBoost is one of the most common classifiers. For example, the Haar cascade is built on it. It is usually used when binary classification is needed, but nothing prevents training it for a larger number of classes.
SVM is one of the most powerful classifiers, with many implementations. On the learning tasks I encountered it worked similarly to AdaBoost. It is considered fast enough, but its training is more difficult than AdaBoost's and requires choosing the right kernel.
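
For completeness, both classifiers are a few lines in scikit-learn; a toy sketch on synthetic two-class data (the dataset and parameters are placeholders):

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for clf in (AdaBoostClassifier(), SVC(kernel="rbf")):
    clf.fit(X_train, y_train)
    print(type(clf).__name__, clf.score(X_test, y_test))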

There are also neural networks and regression methods. But to briefly classify them and show how they differ, an article much longer than this one would be needed.
________________________________________________
I hope I managed to give a quick overview of the methods in use without diving into the mathematics and the details. Maybe it will help someone. Although, of course, the article is incomplete and there is not a word about working with stereo images, or about least squares with a Kalman filter, or about the adaptive Bayesian approach.
If you like the article, I will try to write a second part with a selection of examples of how existing image recognition tasks are solved.

And finally

What to read?
1) At one time I really liked the book "Digital Image Processing" by B. Jähne, which is written simply and clearly, yet presents almost all the mathematics. Good for getting acquainted with existing methods.
2) The classics of the genre are R. Gonzalez, R. Woods "Digital Image Processing". For some reason it was more difficult for me than the first one. Much less math, but more methods and pictures.
3) "Image processing and analysis in machine vision tasks" - written on the basis of a course taught at one of the departments of PhysTech. There are a lot of methods and their detailed descriptions. But in my opinion, the book has two big drawbacks: the book is strongly focused on the software package that comes with it, in the book too often a description of a simple method turns into mathematical jungle, from which it is difficult to make a structural diagram of the method. But the authors made a convenient site where almost all the content is presented - wiki.technicalvision.ru
4) For some reason it seems to me that a good book that structures and links together the picture of the world that arises when studying image recognition and machine learning is "On Intelligence" by Jeff Hawkins. There are no direct methods there, but there is a lot to think about regarding where the direct methods of image processing come from. When you read it carefully, you realize that you have already seen the methods of the human brain, but in video processing tasks.
