Vision Science & Facial Recognition

Facial Recognition, Ethical 2D & Vision Science

“Super Recognisers” are untrained observers who can naturally identify individuals with an uncanny accuracy that exceeds the capabilities of even the best A.I. Facial Recognition Systems. These exceptionally rare people can remember unique facial identifiers from any photographic method. They can even identify specific individuals who are disguised, when there is a long time interval between images, or when the photography is of very poor quality.

However, the most common artefact of photographic ID is also the worst-case scenario: trained observers with normal facial perception can look carefully at a recent ID photograph, then directly at the real person, and yet fail to make a correct match.
This can lead to false-negative rejections that deny entry to legitimate passport or photopass holders. Worse still, it can lead to a failure to apprehend a suspect, or to false-positive identifications and the risk of false imprisonment.

Even in well-controlled conditions, trained observers in a large real-world study produced false negatives that denied genuine debit card purchases intended to be validated only by a recent photo ID. This is the main reason why signatures and other forms of I.D. verification still have priority over portrait photography in cash or legal transactions.*

Yet photographically identical twins can be easily identified as unique individuals in reality, even when they attempt to disguise themselves as their twin and only respond to their twin’s name. The Richard Kemp trial * (linked above) was the first large study to demonstrate that ID photography distorts unique information that is instantly available to us when we identify any individual with direct vision. This is true even if they are now much older, have a different body size, hair, tattoos or spectacles, have had reconstructive surgery or have been disfigured by accidents.

Our historic misunderstandings of how the Human Visual System interprets portraiture and photography are the cause of these failures.
It is fortunate that the key facial features computers use to identify individual faces lie on an almost flat plane, so the compression of life-like three dimensions into a 2D image often does not alter the core geometric relationships facial recognition systems rely on. These features also change slowly over time. However, many other facial features are highly variable both photographically and in reality: neck width, nose length, nose-to-ear distances, the angle at which the ears stick out and the apparent size or shape of the head can all appear very different in reality from their 2D reproductions, or even from one lens type to another. Healthy and natural skin tones, eye colour, eye whiteness, eye bags, teeth colour, fine lines and even apparent age can be recorded with shifts that are large enough to make identification more difficult. And bodyweight, skin tone and texture, scarring, eye whites, eye colour, hair style, hair colour and apparent age can all change in a short time, making even very recent photo IDs almost useless. These technical flaws in ID photography can have a range of surprising consequences, such as:

Police mugshots disguising the identity of suspects.
Male passports being incorrectly accepted when presented by females.
Afro-Caribbean female passports being mistakenly accepted for Caucasian male travellers when presented in error to experienced passport control officers.

Rather like the surprisingly poor scientific evidence that fingerprints are unique, it may be that authorities worldwide do not want it widely known that ID photography can be so problematic. What was abundantly clear from this research was that the flattening and fattening effect of the telephoto lenses used in 2D portraiture can produce strong distortions and ageing effects. During the mandatory debriefing process for the experiments reported here, some of the photographic models struggled to recognise themselves in their own portraits. They often commented that they looked very old in these images, saying the portraits taken at the longest camera-to-subject distances made them look exactly like their parents. The poor colour fidelity found in most imaging systems, coupled with flawed brightness and contrast reproduction, can make young people look much older in conventional ID images. And the flattening effect can change facial topography and relative proportions so much that it is little wonder false-negative rejections are far more common than false positives.
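The geometric part of this flattening effect can be illustrated with a simple pinhole camera model, in which the drawn size of a feature is inversely proportional to its distance from the camera. The sketch below is only an illustration under that assumption and is not the method used in the experiments reported here. The function name and the 0.10 m nose-to-ear depth offset are hypothetical values chosen for the example.

```python
def relative_magnification(camera_distance_m, depth_offset_m=0.10):
    """Scale of a near feature (e.g. the nose tip) relative to a feature
    lying depth_offset_m further from the camera (e.g. the ears).

    Assumes an idealised pinhole camera, where image size is proportional
    to 1 / distance. The 0.10 m default offset is an illustrative guess,
    not a measured value.
    """
    if camera_distance_m <= 0:
        raise ValueError("camera distance must be positive")
    return (camera_distance_m + depth_offset_m) / camera_distance_m


if __name__ == "__main__":
    # Compare a close-up, a typical portrait distance and a telephoto distance.
    for distance in (0.5, 1.5, 5.0):
        ratio = relative_magnification(distance)
        print(f"{distance} m: near feature drawn {100 * (ratio - 1):.1f}% larger")
```

Under these assumptions the ratio falls from about 1.2 at 0.5 m to about 1.02 at 5 m, so at long camera-to-subject distances the near and far parts of the face are drawn at almost the same scale and the face appears flatter and wider.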

A particular problem relates to the way lenses, film and digital sensors handle contrast. The Human Visual System compresses contrast ranges, particularly when we perceive scenes with stereoscopic vision and the target has details in both sunlit areas and deep shadows. The result is that we can see fine details in brightly lit highlights and yet still recover exceptional detail from adjacent shadows. This is beyond the capability of almost any 2D camera system to resolve. Even high dynamic range photography cannot see what the eye can see. This is because 2D photography always expands the contrast ranges, bleaching out highlights and making shadows appear far darker than they appear to bystanders in reality.

Ethics & 2D     

In professional photography, great care is taken to avoid overexposed (blown-out) highlights and shadow areas so under-illuminated that they convey little or no detail. This level of control is not possible in conventional photography and CCTV imagery, and yet it is a critical issue for recovering detail from darker skins. Post-compressing the lighting range to make it more like human vision can still leave small zones of high contrast that can alter facial features by accentuating the flattening and fattening effects of telephoto lenses. In addition to accurate size, shape and distance information, the Human Visual System uses stereoscopic disparity for binocular colour summation. This is where the two eyes assign tonality and recover the finest colour details through the use of two lenses. This is currently impossible using any form of 2D imaging.

AI-based Facial Recognition Systems (FRS) are deeply flawed in this respect too. While they can perform well on lighter skins and male images, they are often shockingly poor at recovering detail from darker skin tones, assigning them a natural skin colour or even identifying gender. This problem is innate to all 2D imaging systems, as they were not designed for FRS and always work from a very unnatural data set. The result can be a racial bias so worrying that the use of FRS by the San Francisco police has recently been banned:
https://www.ft.com/content/7e244488-76a7-11e9-be7d-6d846537acab
And the whole issue of incomplete and “noisy” FRS data is under detailed ethical scrutiny by the influential Partnership on AI, a major IT industry research group based in the USA.

In portraiture, any failure to reproduce natural scaling, body size and life-like tonality can make wrinkles, eye-bags, fine lines and sub-dermal structures appear more prominent than we see with direct vision. In addition to their effects on immediate recognition, unnatural accentuation of facial features and dimensions can make younger people seem older than they really are and adversely affect their attractiveness. Photographers can intervene using sophisticated lighting and powerful post-production editing software to correct many of these distortions. But CCTV and AI Facial Recognition Systems cannot. And it remains the case that almost all photographs of people are uncorrected for body image distortion (BID) and so cannot be a wholly reliable or veridical method of record. How long can such an unethical situation be allowed to continue once professionals become aware of the true extent of BID?

Vision Science

The many flaws in 2D imaging revealed by the experiments reported here were present in the very first images of people. Yet these same cameras and lenses also recorded perspectives, landscapes, objects and tonal ranges that had a strong correlation to our perceived reality. You did not have to be a vision scientist to be impressed by their obvious naturalness. Everybody was. From 1840, when people saw their first photographs, they were overwhelmed by these uncanny evocations of human vision. Photographs looked “right” in ways painting by eye and rendering by hand could not. People forgave their harsh and grainy monochromatic images (incapable of capturing red/green spectra), poor resolution and long exposure times. Resolution and exposure times soon improved, making it possible to take photographs of actual living and breathing people. Everyone wanted to have their portrait taken as photography became the wonder of the age. Artists realised the goal of vivid naturalness had been stolen from them by photomechanical imaging, so they were finally free to paint more of what they felt and less of what they saw. And if the subjects of the new-fangled photography felt hard done by when the resulting portraits seemed unflattering, fattening and ageing, they were told “But Sir/Madame: The camera never lies!”

Vision scientists were among the first people to embrace the astonishing potential of photography. Those who felt 2D photography was somehow lacking in verisimilitude soon combined the Wheatstone stereoscope (1838) with the earliest cameras to produce true stereoscopic photography in the late 1840s. Its clearly enhanced ecological validity over 2D seemed to suggest the beginnings of a new age of naturalness. For a while, it seemed that stereography might even become the default form of both vernacular and scientific photography. The rate of progress in photography seemed to be so fast that it would soon deliver full-colour stereoscopic images that would be astonishingly life-like compared to 2D b/w photography. But the pace of developments slowed dramatically and consumer-level 2D colour photography was still uncommon 100 years later.

Mass-produced 2D monochrome imaging became the dominant form, taking the world by storm towards the end of the 19th century. Even today, naturalistic colour photography is still not possible outside of a few vision laboratories. Most people are still unaware that tri-colour photography has always been spectrally distorting (leading to errors such as blue dresses occasionally appearing to be gold), and that this can often accompany other hidden and misunderstood distortions. Aberrations like these can individually and collectively undermine photography’s core naturalness, truthfulness and utility.

It is surprising but true that few peer-reviewed studies support the assumption that unedited 2D photography can be a truly natural and entirely veridical method of record. Vision scientists and experimental psychologists often conduct experiments using photographs as a substitute for real targets such as faces, landscapes and inanimate objects. However, they do this without evaluating whether the Human Visual System processes these 2D representations in precisely the same way as it processes a real-life target with motion, full colour and stereoscopic depth. So when studying face, form or object perception, it should always be asked whether these 2D still images, with their reduced tonal range, have significant flaws (or deliver differing percepts) compared to direct viewing of the real living target.

Really Virtual Reality

A concern throughout the feasibility studies was that the stereoscopic methods being used could distort perception: what if what was being measured were simply optical artefacts of the feasibility study photography rather than a real-world perceptual condition?
The choice of convergence orthostereoscopic image capture and projection was demonstrated to Dr Nick Lodge, director of research and development at the UK Government’s Independent Television Commission, and Professor Brian Rogers, the world-renowned stereoscopic vision scientist and 3D imaging expert. They agreed it allowed us to exclude the possibility that the size reductions seen in the feasibility studies were due to unforeseen side-effects, such as binocular micropsia or macropsia. In side-by-side comparisons with the original models, no such effects were found. The virtual twin demonstration reported earlier showed that convergence orthostereography produced uncannily accurate reproductions of real people.

The only poor data sets came when more conventional parallel lens 3D was evaluated in the first feasibility studies. Small increases and decreases in size were seen when the projectors were slightly converged or diverged away from the ideal alignment. But these conditions were uncomfortable and induced ghosting, both of which were completely eradicated when the convergence was realigned at the point of focus (for the original cameras) and at the gaze point of the test subjects. When realigned to the most natural condition, all virtual targets had exactly the same size and shape as their real “twins” and could be viewed for unlimited periods without discomfort. The size estimation data from the parallel 3D demonstrations were highly variable, and yet became highly consistent (with a very strong dose-response curve) when the more natural convergent lens 3D method was used.
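As a rough illustration of what converging on the gaze point involves geometrically, the angle between two optical axes aimed at a common point on the midline follows from simple trigonometry. The sketch below is not the calibration procedure used in the feasibility studies. The function name and the 0.063 m baseline (an assumed, typical interpupillary distance) are hypothetical values used only for the example.

```python
import math


def convergence_angle_deg(baseline_m, target_distance_m):
    """Total angle, in degrees, between two optical axes separated by
    baseline_m when both are aimed at a point target_distance_m away
    on the midline between them.

    A parallel lens arrangement corresponds to an angle of 0 degrees,
    i.e. convergence at infinity.
    """
    if baseline_m <= 0 or target_distance_m <= 0:
        raise ValueError("baseline and target distance must be positive")
    return math.degrees(2 * math.atan(baseline_m / (2 * target_distance_m)))


if __name__ == "__main__":
    assumed_baseline_m = 0.063  # assumption: typical adult interpupillary distance
    for distance in (0.5, 2.0, 10.0):
        angle = convergence_angle_deg(assumed_baseline_m, distance)
        print(f"target at {distance} m: convergence angle {angle:.2f} degrees")
```

With these assumed values the angle for a target 2 m away is a little under 2 degrees, which gives a sense of how small a misalignment of converged or diverged projectors could be while still departing from the natural viewing geometry.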

Many perception experiments also use 2D line drawings or silhouettes instead of the naturalistic stimuli used in the BID experiments. But they often do this without control studies against real targets to show that they are equally effective substitutes. It is entirely possible that the Human Visual System processes graphics or simplified 2D images more like a hieroglyph than a real target. If so, generalising outcomes to real-world perception from experiments featuring artificial 2D stimuli could be problematic. However, all of the stereoscopic experiments reported here used orthostereoscopic 3D full-colour stimuli that had been calibrated for naturalness against a real-life target. They looked “real” in a way that no 2D photograph had looked before. It seems clear that many more experiments should be conducted with life-like 3D imagery to discover whether known experimental paradigms have different outcomes when the quality, naturalness and ecological validity of the imaging are substantially better.

For more on Facial Recognition and Vision Science methodologies, contact the author at bernieharper@gmail.com
