This work was presented at the SPIE conference in Orlando in April 1996. Part of it was carried out by the HiØ students Christer Jahren, Stig Jørgensen and Kåre Østerud as a project assignment in the Soft Computing programme, 1994/95.
Eye Identification for
Face Recognition with
Neural Networks
Åge Eide¹, Christer Jahren¹, Stig Jørgensen¹,
Thomas Lindblad², Clark S. Lindsey² and Kåre Østerud¹
1) Ostfold College, Halden
2) Royal Institute of Technology, Stockholm
Abstract: The problem of facial recognition from
grey-scale video images is approached using a two-stage neural network
implemented in software. The first net finds the eyes of a person and
the second neural network uses an image of the area around the eyes to
identify the person. In a second approach the first network is
implemented in hardware using the IBM ZISC036 RBF-chip to increase
processing speed. Other hardware implementations are also discussed,
including preprocessing using wavelet (packet) transforms.
1. Introduction
The identification of individuals using face recognition represents a
challenging task with many applications [1-3] in everyday life as
well as in high security applications. Since the human face varies
in appearance over both short and long time scales, the
inherent "slack in operation" of neural networks, together with their
redundancy and ability to generalize, makes them well suited to
implementing such a recognition system. If the system is implemented
in dedicated hardware, rather than in software on a von Neumann
computer, one will also benefit from the massively parallel neural
network and obtain a real time system.
Recently, Rowley et al. [4] presented a neural network-based system in
which a small window of an image is examined to find and identify a
face. The present work starts from a slightly different point, but is
otherwise similar in approach. Below we present a system
using a two-stage neural network. The first neural network starts with
an image as input. The image is assumed to contain a face, and the task
of this net is to localize the eyes. A second neural network then uses
the information from the windows on the eyes to recognize the
person.
The total problem of identification is thus based on the assumptions:
(i) that a person faces a TV camera, (ii) the eyes are localized by
the first neural network and (iii) identified by the second network to
belong to a known individual. We feel that this approach is typical
for admission control, etc.
2. A "straightforward neural" network approach
As mentioned above, we need two neural networks: one to localize the
eyes in the image obtained from the TV-camera and a second net to make
the identification. In a first approach, a 256-level greyscale image of
360 x 280 pixels was obtained using a Panasonic NVR30 video camera
coupled to a frame grabber card (Screen Machine II) in a 100 MHz 486
personal computer. However, images were also produced using an HP
flatbed scanner and photos.
2.1 Finding the eyes
The first stage neural network adopted a conventional feedforward
network with one hidden layer. The network had 117 inputs and 15
hidden nodes and one output node. It was trained with the
backpropagation algorithm, using a sigmoid or tanh transfer
function. One reason for this choice is the potential implementation
using the ETANN chip from Intel [5]. Two ETANNs can implement a
128-64-64 network, i.e. a system with 128 inputs and 64 outputs (and
64 hidden neurons). The chip is analog and will thus require DAC
preprocessing. However, such implementations have been made previously
[6] and found to work satisfactorily. Other commercially available
chips are the ZISC radial basis function chip with 36 nodes [7], and
the ALS430 from American Neuralogix [8]. All these chips are available
with pertinent software to run on IBM personal computers, and we have
previous experience of implementing these systems [6, 9-13].

Figure 1. Example of facial image with scanning rectangles of
13 x 9 pixels over the eyes.
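To make the 117-15-1 architecture concrete, the forward pass of such a network can be sketched as follows. This is a minimal illustration only, not the code used in this work: the function names, the random initial weights and the scaling of grey levels to the range -1 to 1 are assumptions, and a trained network would use weights obtained by backpropagation.

```python
import math
import random

N_INPUTS, N_HIDDEN = 117, 15  # 13 x 9 window, 15 hidden nodes, one output


def init_network(seed=0):
    """Return small random weights (assumed; a trained net would load learned ones)."""
    rng = random.Random(seed)
    # Each weight row holds one weight per input plus a trailing bias term.
    hidden = [[rng.uniform(-0.1, 0.1) for _ in range(N_INPUTS + 1)]
              for _ in range(N_HIDDEN)]
    output = [rng.uniform(-0.1, 0.1) for _ in range(N_HIDDEN + 1)]
    return hidden, output


def forward(window_vector, network):
    """Return the single tanh output for a 117-element grey-level vector."""
    hidden_w, output_w = network
    if len(window_vector) != N_INPUTS:
        raise ValueError("expected 117 inputs")
    # Assumed preprocessing: map grey levels 0..255 to roughly -1..1.
    x = [v / 127.5 - 1.0 for v in window_vector]
    # zip stops at the shorter sequence, so the trailing bias is added separately.
    h = [math.tanh(w[-1] + sum(wi * xi for wi, xi in zip(w, x)))
         for w in hidden_w]
    return math.tanh(output_w[-1] + sum(wi * hi for wi, hi in zip(output_w, h)))
```

With tanh as transfer function the output lies between -1 and +1, so a value near +1 would indicate an eye under the window and a value near -1 would indicate background.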
The network used 373 images for training and 141 independent images
for testing. The windows with 13 x 9 pixels (hence 117 inputs) are shown
in fig. 1. Examples of input vectors are shown in fig. 2. Typically,
vectors from windows outside the face are significantly smoother than
vectors from the eye region. Provided that the window is not too big,
the eye-region vectors turn out to have gross properties that are very
similar. A relatively small window (13 x 9 pixels, cf. fig. 1) was
used to scan for the eyes. The reason is that the eyes of different
persons are most similar in the centre of the eyes, and this first
stage needs only to locate the eyes. The network turned out to work
very well. In a test of the redundancy of the network, we used an
image taken at half the usual distance to the
camera. Although the network was trained to find the eyes in an image
recorded at a given distance, there was no difficulty in finding them
in this extreme close-up. We also successfully used images recorded
using scanners to test the neural network.
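The scanning procedure described above can be sketched as follows. This is an illustration under stated assumptions rather than the actual implementation: the function names are hypothetical, the image is taken to be a list of rows of grey levels, and the `roughness` function is only a stand-in score based on the observation that eye-region vectors are less smooth than background vectors. In the real system the score would be the output of the trained first-stage network.

```python
def extract_window(image, top, left, height=9, width=13):
    """Flatten a height x width window into one vector (117 values by default)."""
    return [image[r][c]
            for r in range(top, top + height)
            for c in range(left, left + width)]


def roughness(vector):
    """Stand-in score (assumption): total absolute change between neighbours."""
    return sum(abs(b - a) for a, b in zip(vector, vector[1:]))


def scan_for_eye(image, score, height=9, width=13):
    """Slide the window over every position; return (best_score, top, left)."""
    rows, cols = len(image), len(image[0])
    best = None
    for top in range(rows - height + 1):
        for left in range(cols - width + 1):
            s = score(extract_window(image, top, left, height, width))
            if best is None or s > best[0]:
                best = (s, top, left)
    return best
```

Because every window position is evaluated, the cost grows with the image area, which is consistent with the scan being the slow step on a sequential computer.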
2.2 Identifying the person
While the first stage neural network is to find the common properties
of eyes, the second one should find the differences. This neural
network is of the same architecture but has 441 input nodes, 30 hidden
nodes and has as many output nodes as individuals to be
recognized. The larger number of inputs is due to the larger window
used here. A square window of 21 x 21 pixels was used rather than the
rectangular 13 x 9 pixels. Once again the available hardware had some
influence on the choice of the size. The Intel ETANN multi chip board
(MCB) can hold a maximum of eight
chips and thus we can have 512 inputs. Similarly, the IBM ZISC is
available on an ISA-card with 16 chips with 36 nodes each, i.e. again
more than 500 (16 x 36 = 576).

Figure 2. Input vectors from a window on a person's eye
(top) and from another part of her face (bottom). The
window has 13 x 9 pixels and the data is plotted linearly.

Figure 3. Two input vectors from the same person to the
neural network trained for identification (top), and three
input vectors of the eyes of three different persons
(bottom). The vertical scale is in the 256 shades of
grey (displaced 200 units for clarity).
The input to this network is shown in fig. 3. Here we show two input
vectors from the same person (top) as well as the inputs from three
different persons (bottom). It is quite clear that the eyes are most
similar in the centre. This is the reason why we use a smaller window
(13 x 9) for the network trained to find the eyes and a larger one
(21 x 21) for identification.
The neural network was tested using 141 images of 25 persons. The
result is shown in fig. 4, where we have plotted the output value of
the node representing the desired output minus the second largest node
output. The desired value is +1 for the node corresponding to the
right person and -1 for all other nodes. This applies to both
eyes. Hence the largest difference between any two outputs will be
4. In this way the vertical axis of fig. 4 will be a measure of how
"close" the second largest output is (values between
0 and 4) or, in misclassified cases, it will show a negative
value. As can be seen from the plot in fig. 4, approximately 5% of the
images were wrongly identified.

Figure 4. Results of testing the neural network using 141 images. The
outputs are plotted in descending order.
Negative values thus represent misidentified cases.
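The quantity plotted in fig. 4 can be sketched as below. This reflects one reading of the description, stated here as an assumption: the outputs for the left and right eye are added node by node, so that an ideal result gives +2 for the right person and -2 for everyone else, and the margin is the combined output of the desired node minus the largest competing combined output. The function name is hypothetical.

```python
def identification_margin(left_eye_outputs, right_eye_outputs, true_index):
    """Combined output of the desired node minus the best competing node."""
    # Assumption: the two eyes are combined by adding the node outputs.
    combined = [l + r for l, r in zip(left_eye_outputs, right_eye_outputs)]
    rivals = [v for i, v in enumerate(combined) if i != true_index]
    return combined[true_index] - max(rivals)
```

Under this reading a perfect identification gives a margin of 4, a value near 0 means another person scored almost as high, and a negative value means another person scored higher, i.e. a misidentification.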
The drawback with the suggested system is that when run on a Pentium
personal computer it is too slow, and when implemented in hardware it
is too large and/or expensive. Clearly, one would like to reduce the
number of inputs by using some kind of feature extracting
preprocessor.
A PCMCIA card, developed by Neuroptics Consulting [14] for GIAT [15]
holds three ZISC036, suggesting that the total number of nodes should
be kept below one hundred. In such a case it would be possible to run
a real-time system on a laptop computer. The CCD camera is easily
interfaced to this bus. Below we will discuss some methods to reduce
the number of inputs while, at the same time, increasing the flexibility
and the redundancy of the total system.
3. More "advanced" approaches
Although the above system can be implemented using the Intel ETANN
and/or IBM ZISC036 chips and thus perform in real time, there are
still some inherent limitations, besides size and cost. One of them
has been touched upon in connection with the task of finding the eyes,
and has to do with the positioning of the face (distance, tilting,
etc). For this purpose one may choose another input to the
aforementioned neural network. With the hardware in mind, one may
choose to use the two projections of the window together with a
histogram of the greyscale. In fact, the choice will depend on "how
intelligent" the CCD camera is, i.e. how much preprocessing is done
before sending the data to the neural network. In the suggested case
we may have 9 + 13 + 10 = 32 inputs, i.e. the 256 greyscales are
binned in ten bins. Finding the eyes is very time consuming on von
Neumann computers, and this is where one would benefit from a hardware
implementation. Such implementations have
been made by Neuroptics Consulting [14] on a PCMCIA card with three
ZISC036 chips, developed for GIAT Industries [15]. We have used this system to
test the tracking of a person's eye in a short video sequence. The
scanning window was 16 x 16 pixels and histogramming was performed using
64 bins. This test shows that simplicity as well as speed is gained in
this manner.

Figure 5. The implementation of three IBM ZISC036
RBF chips on a PCMCIA card (refs. [14, 15]).

Figure 6. A neural network circuit with a wavelet
preprocessor will reduce the number of inputs to
the latter, while enhancing features.
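The reduced input vector of 9 + 13 + 10 = 32 values can be sketched as follows. This is an illustrative sketch, not the code run on the ZISC card: the function name is hypothetical, the window is assumed to be a list of 9 rows of 13 grey levels, and the projections are taken to be plain row and column sums.

```python
def projection_histogram_features(window, n_bins=10, levels=256):
    """Return row sums, column sums and a grey-level histogram as one vector."""
    row_sums = [sum(row) for row in window]          # one value per row
    col_sums = [sum(col) for col in zip(*window)]    # one value per column
    hist = [0] * n_bins
    for row in window:
        for v in row:
            # Integer binning: levels 0..255 fall into bins 0..n_bins-1.
            hist[v * n_bins // levels] += 1
    return row_sums + col_sums + hist
```

For a 9 x 13 window and ten bins this gives 32 inputs instead of 117. The same function applied to a 16 x 16 window with `n_bins=64` would give 16 + 16 + 64 = 96 values.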
Another approach to reducing the number of inputs is to use a wavelet
(or wavelet packet) transform preprocessor (cf. e.g. ref. [17]). The WTP
chip [16] from Aware Inc. is, to our knowledge, the only commercial
hardware available at this time (Sept. 1995). This processor computes both the
forward and inverse wavelet transform (any 2-, 4- or 6-coefficient
system) on one-dimensional data streams. The circuit will then be used
to calculate the transform, and the largest coefficients are retained
and fed to the neural network doing the identification.
However, some exploratory calculations [18] using both wavelet and
wavelet packet transforms show that one does not get as sparse input
data as one would expect. In fact, approximately the largest 50% of the
wavelet coefficients need to be retained in order to yield a fair
reconstruction of the input vector (cf. fig. 7). Still, this is quite
useful in view of the available neural network chips.
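The idea of retaining only the largest coefficients can be sketched as below. Note the substitution: for brevity this sketch uses the orthonormal Haar wavelet (the 2-coefficient system), not the Daubechies 6 wavelet packet transform used for fig. 7, and it requires an input length that is a power of two, so a 117-element vector would first have to be padded (for example to 128, which is an assumption). The function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(signal):
    """Full orthonormal Haar decomposition; length must be a power of two."""
    coeffs = list(signal)
    n = len(coeffs)
    while n > 1:
        half = n // 2
        avg = [(coeffs[2 * i] + coeffs[2 * i + 1]) / SQRT2 for i in range(half)]
        diff = [(coeffs[2 * i] - coeffs[2 * i + 1]) / SQRT2 for i in range(half)]
        coeffs[:n] = avg + diff   # averages first, details after them
        n = half                  # recurse on the averages only
    return coeffs


def haar_inverse(coeffs):
    """Invert haar_forward."""
    out = list(coeffs)
    n = 1
    while n < len(out):
        merged = []
        for a, d in zip(out[:n], out[n:2 * n]):
            merged += [(a + d) / SQRT2, (a - d) / SQRT2]
        out[:2 * n] = merged
        n *= 2
    return out


def keep_largest(coeffs, fraction=0.5):
    """Zero all but the largest-magnitude fraction of the coefficients."""
    k = max(1, int(len(coeffs) * fraction))
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    keep = set(order[:k])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]
```

Applying `haar_inverse(keep_largest(haar_forward(x), 0.5))` gives an approximation of `x` built from half of the coefficients, which is the kind of comparison shown in fig. 7.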
4. Summary and Conclusions
The present paper has demonstrated that a two-stage neural network can
be used to find the eyes of a person and to identify him/her. Such a
system could be run on a von Neumann computer, although the procedure
would be irritatingly long. A significant reduction of the processing
time can be obtained by implementing the neural network(s) in
hardware. This may require systems with many inputs, and in order to
reduce this number, hardware preprocessing using wavelet (packet)
transforms may be considered.

Figure 7. Wavelet packet transform using Daubechies 6.
The input vector of 117 elements is shown (top) together
with the distribution of the WTP coefficients and the
reconstructed vector using 50% of the coefficients.
The method of finding the region of interest (the eye) and identifying
it will be applicable to many areas (diagnostics, physics experiments,
etc.; ref. [19]) and very useful when implemented in hardware. The
wavelet preprocessing turns out to be even more efficient when
identifying impulses in telecommunication noise [20].
5. Acknowledgements
The present work was performed with support from the Swedish
Engineering Research Council (TFR). We would also like to acknowledge
discussions with IBM, Essonnes, France, GIAT Industries, Toulouse and
Neuroptics Consulting, Montpellier. In particular we acknowledge
discussions on programming the ZISC with G. Paillet. We also
appreciate the collaboration of the students of Ostfold College for
participating in this experiment.
References
[1] R. Linggard, D.J. Myers and C. Nightingale, Neural Networks for Vision, Speech and Natural Language, Chapman & Hall (1992)
[2] I. Solheim, T. Payne and R. Castain, The potential in using backpropagation neural networks for facial verification systems, WINN-AIND, Auburn, AL, USA (1992)
[3] I. Solheim, Neural Networks for Facial Recognition, Thesis, University of Tromsö (1991)
[4] H.A. Rowley, S. Baluja and T. Kanade, Human Face Detection in Visual Scenes, CMU preprint CMU-CS-95-158, Pittsburgh, PA, July 1995
[5] Intel 80170NX ETANN, Data Sheet, Intel Corp., Santa Clara, CA
[6] B. Denby, Th. Lindblad, C.S. Lindsey, G. Szekely, J. Molnar, Å. Eide, S.R. Amendolia and A. Spaziani, Investigation of a VLSI neural network chip as part of a secondary vertex trigger, Nucl. Instr. & Meth. A335 (1993) 296-304
[7] ZISC036 Databook, IBM, Essonnes, France
[8] ALS420, Data Sheet, American NeuraLogix, Lake Mary, FL
[9] J. Molnar, G. Szekely, Th. Lindblad, C.S. Lindsey, B. Denby, S.R. Amendolia and Å. Eide, A Flexible VME-module with a 68070 Computer for Embedded Applications of the 80170NX Analog Neural Network Chip, ICFA Instrumentation Bulletin, No. 10, Dec. 1993
[10] C.S. Lindsey, Th. Lindblad, J.R. Vollaro, G. Székely and J. Molnar, Performance of a Cascadeable Neural Network VME-module with Intel 80170NX Chips, Nucl. Instr. & Meth. A351 (1994) 466-471
[11] Th. Lindblad, C.S. Lindsey, M. Minerskjöld, G. Sekniaidze, G. Székely and Å. Eide, Implementing the New Zero Instruction Set Computer (ZISC036) from IBM for a Higgs Search, Nucl. Instr. & Meth. A357 (1995) 217
[12] M. Minerskjöld, Th. Lindblad and C.S. Lindsey, A Versatile VME Digital Neural Network for Control and Pattern Recognition Applications, Proc. of EANN'95: International Conference on Engineering Applications of Neural Networks, Aug. 21-23, 1995, Otaniemi, Finland
[13] Th. Lindblad, C.S. Lindsey and Å. Eide, The IBM Zero Instruction Set Computer ZISC036, A Hardware Implemented Radial Basis Function Neural Network, to appear in CRC Industrial Electronics Handbook, in press
[14] Guy Paillet, Neuroptics Consulting, Montpellier, France
[15] G. Laurens, GIAT Industries, Toulouse, France
[16] Wavelet Transform Processor Chip, User's Guide, PIN 2250, Aware, Inc., Cambridge, MA
[17] I. Daubechies, Ten Lectures on Wavelets, Society for Industrial and Applied Mathematics Press, vol. 61, CBMS-NSF, Philadelphia, 1992
[18] F. Majid, R.R. Coifman and M.V. Wickerhauser, The Xwpl System Reference Manual, Yale Univ.
[19] Th. Lindblad et al., to be published
[20] Th. Lindblad et al., to be published
Copyright: 1996, Høgskolen i
Østfold. Last Update: 28.06.97,
Thomas Malt.