In the first small part of our project we did not begin, as one usually would, by reading papers. Instead we tried to come up with our own ideas of how face detection could work. We deliberately avoided the literature at this stage so that we would not be influenced by other people's ideas or by standard approaches.
One thing we looked at was the color composition of a given picture. By calculating average colors from different areas of the picture, we found that it is possible to decide whether the picture shows a face (portrait) or not. This method requires a color picture. It is also very important that the picture, or the picture area under investigation, shows not only the face but also its surroundings, i.e. some background, so that there is a difference between the face area (red area) and the non-face area (green area).
By choosing color values from random pixels in the face area (red area in the image), calculating mean values and standard deviations, and comparing these results, one can decide whether the investigated area has a face in the middle or not.
Many attempts to find the right test and the right parameters led to the result that the face area must contain a lot of red compared to the mean value of all colors (red, green and blue). For good results, red must dominate by more than 55 percent relative to this mean value.
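The sampling and the red-dominance test described above can be sketched in a few lines of Python. This is only an illustration and not the code of our program: the function names, the box format and the synthetic test image are assumptions, and the 55 percent rule is read here as "mean red is more than 1.55 times the mean of all three channels", which is one possible interpretation.

```python
import random
from statistics import mean, pstdev

def sample_pixels(image, box, n_samples, rng):
    """Draw random (r, g, b) pixels from box = (top, left, bottom, right)."""
    top, left, bottom, right = box
    return [image[rng.randrange(top, bottom)][rng.randrange(left, right)]
            for _ in range(n_samples)]

def channel_stats(pixels):
    """Return (mean, standard deviation) for the red, green and blue channel."""
    return [(mean(channel), pstdev(channel)) for channel in zip(*pixels)]

def red_dominates(pixels, ratio=1.55):
    """Assumed reading of the rule: mean red > ratio * mean of all channels."""
    (r_mean, _), (g_mean, _), (b_mean, _) = channel_stats(pixels)
    overall = (r_mean + g_mean + b_mean) / 3
    return r_mean > ratio * overall

# Synthetic 20x20 picture: grey background with a reddish 8x8 patch in the middle.
rng = random.Random(0)
image = [[(120, 120, 120)] * 20 for _ in range(20)]
for y in range(6, 14):
    for x in range(6, 14):
        image[y][x] = (220, 60, 60)

center = sample_pixels(image, (6, 6, 14, 14), 50, rng)
print(channel_stats(center))
print(red_dominates(center))
```

The ratio is kept as a parameter because, as the following paragraph explains, the right value depends on the faces being investigated.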
It is very important to mention that this holds only if the investigated faces belong to light-skinned people from Central Europe. For other skin colors and other regions of the world, other parameter values must be chosen. A neural network could help a lot with finding the right values, but we did not implement one in our program.
We improved this method by also looking at the color values of random pixels from the whole picture. Knowing which difference should occur between the values of the whole image and those of the middle area when the picture shows a face gives a second decision criterion.
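A comparison between the middle area and the whole picture might look like the following sketch. It is not the condition used in our program: the red-share measure, the margin of 0.05 and all function names are hypothetical and serve only to illustrate the idea.

```python
import random

def red_share(pixels):
    """Fraction of the summed channel means that belongs to red."""
    r, g, b = (sum(channel) / len(pixels) for channel in zip(*pixels))
    total = r + g + b
    return r / total if total else 0.0

def sample(image, box, n, rng):
    """Draw n random pixels from box = (top, left, bottom, right)."""
    top, left, bottom, right = box
    return [image[rng.randrange(top, bottom)][rng.randrange(left, right)]
            for _ in range(n)]

def center_redder_than_whole(image, center_box, n=200, margin=0.05, rng=None):
    """Hypothetical criterion: the middle area must be clearly redder
    than a random sample taken from the whole picture."""
    if rng is None:
        rng = random.Random()
    height, width = len(image), len(image[0])
    center = sample(image, center_box, n, rng)
    whole = sample(image, (0, 0, height, width), n, rng)
    return red_share(center) - red_share(whole) > margin

# Synthetic 20x20 picture: grey background with a reddish 8x8 patch in the middle.
image = [[(120, 120, 120)] * 20 for _ in range(20)]
for y in range(6, 14):
    for x in range(6, 14):
        image[y][x] = (220, 60, 60)

print(center_redder_than_whole(image, (6, 6, 14, 14), rng=random.Random(1)))
```

A picture without any background around the face would give nearly the same red share for both samples, so the test would fail.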
With this improved method, the decision whether the picture shows a face is again based on the fact that red must dominate over the other colors, and additionally over the red in the outer region, or rather in the picture as a whole. The exact condition can be found in our program, which is available for download.
Another possibility that came to our minds was to detect contrast in a picture in order to find the outline of a face.
All in all we had some success in this first part, but we were not satisfied. We therefore began to study the papers and ideas on face detection and face recognition that the scientific community had produced so far.
Face Detection Methods
The most commonly used method to detect a face in a picture seems to be pattern detection, for example to locate the eyes. Since the face could be anywhere in the picture, with an unknown size or even a certain tilt, the number of possibilities that have to be searched is overwhelming. This is an even bigger problem than glasses, beards or fashion that partly covers the face. Reducing the amount of information is therefore the first step.
As in many fields of technology, the techniques used in our own body can serve as a role model. The cells in the human eye form a tight network that preprocesses the information for the brain. An interesting behavior can be observed in the rod cells, which are responsible for light/dark recognition and for vision in poorly lit environments. The neural cells belonging to one rod cell fire only when the neighbouring cells receive different signals. In equally lit areas of the cell array, the rod cells don't seem to give any signal at all. Combining this information, one gets a very detailed picture of the edges of objects while ignoring the areas in between. This reduces the information drastically, since objects can be characterized by their edges.
Many operators exist that copy this technique. The first to mention is the Sobel operator, which computes an approximation of the image gradient and gives strong signals in regions where the pixel values change. Since one usually wants to know where the edges are, not how large the gradient is, one can define a threshold. Every pixel whose gradient signal is higher than the threshold is set to 1, everything lower to 0.
So far we have gone from a full image to a binary edge image, so how can we detect a face in it? There are several methods of comparing two-dimensional binary point sets with each other. Most papers dealing with face detection use the Hausdorff distance between point sets.
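The edge-extraction step described above (Sobel gradient followed by a threshold) can be sketched in Python with NumPy as follows. This is a minimal illustration and not the code of our program: the helper names, the edge-replicating border handling and the threshold value are assumptions.

```python
import numpy as np

# Standard 3x3 Sobel kernels for the horizontal and vertical gradient.
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def filter3x3(image, kernel):
    """Correlate a 2-D grayscale image with a 3x3 kernel.
    The border is handled by replicating the edge pixels (an assumption)."""
    padded = np.pad(image.astype(float), 1, mode="edge")
    height, width = image.shape
    out = np.zeros((height, width))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + height, dx:dx + width]
    return out

def sobel_edges(image, threshold):
    """Return a binary edge image: 1 where the gradient magnitude
    exceeds the threshold, 0 everywhere else."""
    gx = filter3x3(image, SOBEL_X)
    gy = filter3x3(image, SOBEL_Y)
    magnitude = np.hypot(gx, gy)
    return (magnitude > threshold).astype(np.uint8)

# Synthetic 8x8 picture: dark left half, bright right half.
image = np.zeros((8, 8))
image[:, 4:] = 100
edges = sobel_edges(image, threshold=200)
print(edges)
```

In this example only the two pixel columns on either side of the brightness step are marked as edges. The uniformly dark and uniformly bright areas give no signal at all.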
Since the Hausdorff distance is not implemented in MATLAB, one has to find another, efficient way. If one used the Hausdorff distance, one would have to calculate it for every possible position of the mask on the image, not to mention the possibility of scaling the mask up and down or tilting it to find the face. A very efficient method of finding a certain pattern in a set of data is cross-correlation. Here the position with the best correlation is found by convolving the whole signal with the desired signal (cross-correlation is equivalent to convolution with a mirrored template). Since the cross-correlation of two two-dimensional data sets can be computed with the two-dimensional fast Fourier transform, this method delivers a solution very quickly, without moving the mask from pixel to pixel. The only remaining problem is the mask. There are several discussions about which one to use. Most detection methods use either convolution with a grayscale image of averaged human faces (a kind of standard face), or a point-set distance (like the Hausdorff distance) with a binary mask. In our program we use a binary mask developed by a genetic algorithm on binary edge images, but compare mask and image by convolution. The results are satisfying: about 70-80% of the faces in pictures can be located.
One problem arises: when there are many objects in the background, the cross-correlation often peaks there, because the higher density of edge points produces a higher convolution value. Pictures should therefore not contain bookshelves or anything of that kind.
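How a pattern is located by cross-correlation via the two-dimensional FFT can be sketched in Python with NumPy. This is an illustration only and not our MATLAB program: the function names and the small ring-shaped mask are hypothetical stand-ins, and the correlation computed here is circular, i.e. it wraps around the image borders.

```python
import numpy as np

def cross_correlate_fft(image, mask):
    """Circular cross-correlation of a 2-D image with a smaller mask,
    computed in the frequency domain."""
    height, width = image.shape
    padded_mask = np.zeros((height, width))
    padded_mask[:mask.shape[0], :mask.shape[1]] = mask
    # Correlation theorem: multiply one spectrum by the complex
    # conjugate of the other, then transform back.
    spectrum = np.fft.fft2(image) * np.conj(np.fft.fft2(padded_mask))
    return np.real(np.fft.ifft2(spectrum))

def best_match(image, mask):
    """Return (row, col) of the top-left mask position with the highest score."""
    corr = cross_correlate_fft(image, mask)
    row, col = np.unravel_index(np.argmax(corr), corr.shape)
    return int(row), int(col)

# Hypothetical binary mask (a small ring) standing in for a face mask.
mask = np.array([[0, 1, 1, 1, 0],
                 [1, 0, 0, 0, 1],
                 [1, 0, 0, 0, 1],
                 [1, 0, 0, 0, 1],
                 [0, 1, 1, 1, 0]], dtype=float)

# Synthetic 32x32 binary edge image containing the pattern at row 10, column 17.
image = np.zeros((32, 32))
image[10:15, 17:22] = mask

print(best_match(image, mask))
```

All mask positions are scored in a single pass, but only for one mask size and orientation. Searching over scale or tilt would still require repeating the correlation with a resized or rotated mask.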