Facial recognition, how do you know it's the same person?

Hello. My curiosity led me to test an API that claims to do face detection and facial recognition.

Exploring a little more with my ESP32-CAM, I took three pictures of my face, keeping the same position, same lighting, etc.

The API responded in JSON:

{"results":[{"status":{"code":"ok","message":"Success"},"name":"esp32-cam.jpg","md5":"9d69b87ca4f638f685f85cd70f4feccf","entities":[{"kind":"objects","name":"face-detector","objects":[{"box":[0.26976995218709976,0.12491494323036217,0.5152465488728709,0.6869953984971613],"entities":[{"kind":"classes","name":"face","classes":{"face":0.9388834238052368}},{"kind":"namedpoints","name":"face-landmarks","namedpoints":{"left-eye":[0.37281060695648194,0.3989172077178955],"right-eye":[0.5729453086853027,0.3699918079376221],"nose-tip":[0.451428747177124,0.5087626254558564],"mouth-left-corner":[0.435301628112793,0.6622309684753418],"mouth-right-corner":[0.5814676094055176,0.6382978916168214]}}]}]}]}]}

{"results":[{"status":{"code":"ok","message":"Success"},"name":"esp32-cam.jpg","md5":"b22d28c4ec09ea54b05e689564b2f130","entities":[{"kind":"objects","name":"face-detector","objects":[{"box":[0.21273358143976484,0.19003698955126802,0.5770508424915641,0.7694011233220854],"entities":[{"kind":"classes","name":"face","classes":{"face":0.9846720099449158}},{"kind":"namedpoints","name":"face-landmarks","namedpoints":{"left-eye":[0.3557454586029053,0.4791991996765136],"right-eye":[0.5900873565673829,0.47020770072937007],"nose-tip":[0.4579240131378174,0.6371343088150024],"mouth-left-corner":[0.3957681941986084,0.7933987998962402],"mouth-right-corner":[0.5702394294738771,0.7867423629760741]}}]}]}]}]}

{"results":[{"status":{"code":"ok","message":"Success"},"name":"esp32-cam.jpg","md5":"0095a1904e0e979483c53d69ed40b3e9","entities":[{"kind":"objects","name":"face-detector","objects":[{"box":[0.2333206442123697,0.13479227484942086,0.5248686450469403,0.6998248600625872],"entities":[{"kind":"classes","name":"face","classes":{"face":0.9846428632736206}},{"kind":"namedpoints","name":"face-landmarks","namedpoints":{"left-eye":[0.3545965671539307,0.4072417259216308],"right-eye":[0.5614657402038574,0.3929729652404785],"nose-tip":[0.43846559524536133,0.5404448652267456],"mouth-left-corner":[0.3985099267959595,0.681445655822754],"mouth-right-corner":[0.549906005859375,0.6702262496948242]}}]}]}]}]}

But in all fields the data is different. How do you know it's the same person?

I try to answer this question by imagining that, in the code (on the Arduino, for example), I would store the fields of values for the nose tip, the left and right mouth corners, etc., average them, and build a String to compare against the next readings.

Is this how it's done?

Because I noticed very different values in the three API responses. Unless the variation is minimal, and it is that minimal variation that differentiates one person from another.

Anyway, does anyone have a piece of code that I can test here, to turn on the ESP32-CAM's flash LED only if the face is mine?

Thanks

The ESP32 does not have enough power to differentiate between faces.

If you want to recognize particular faces, a tensor can be made of each face and a TensorFlow ML model can be used to tell different faces apart. A Raspberry Pi or a BeagleBone-AI64 can do the job.

Wikipedia has a whole page about it.

A quick search gave me this.

https://maker.pro/arduino/projects/how-to-build-an-esp32-based-facial-recognition-system

That's not what the topic is about. But thank you.


Thank you.

They are numeric values. Why would you convert them to Strings? I'd treat each reply as an N-dimensional vector. Then, determine the difference (error) between subsequent result vectors in a "Mean Squared Error" sense. Then, compare that error against a threshold to determine if there's a match.

That looks like a "face detection" result rather than a "face recognition" result. It's telling you where in the image it found a face and where it thinks the points of interest are. That is probably insufficient to recognize a face.

The example ESP32-CAM sketch doesn't enable face recognition on a basic ESP32. See the comment in the CameraWebServer example sketch:

// Face Recognition takes upward from 15 seconds per frame on chips other than ESP32S3
// Makes no sense to have it enabled for them

You should upgrade to an ESP32S3-based board.


Oops, this one is for Johnwasser:

But isn't that what this API claims to do?

We send it a photo and it even labels person #1, person #2, etc.?

I tested it with three pictures of my own face. But I didn't quite understand how to create a comparator code.

Or did I confuse myself about this API's functionality?

Now I get it. Embeddings. It's their algorithm:

Query parameter: embeddings

The embeddings query parameter allows a client to enable/disable embeddings calculation. If a client passes True value then the service will perform a calculation of embeddings for each face detected in an image. Otherwise, if a client passes False value then embeddings will not be calculated.

Embeddings calculation is disabled by default.

Note: If you want to skip face detection and just calculate embeddings for the whole image, use the following combination of flags: detection=False&embeddings=True.

And their system works on images with more than one face, where it just differentiates face 1, face 2, face 3.

But... would it be very difficult to create your own filter that allows you to differentiate face 1 from face 2?

I think it would be enough to do some calculations with that response data and create an X value for each face, as if it were a standard average.

But of course I don't even know how to start doing that.

The 'embeddings' seem to represent the face as a point in a 512-dimensional 'face space'. My guess is that you would find the 'distance' between two points to see how close the two points are in face space.

For each of the 512 values in the two 'embeddings' vectors, subtract A from B and square it. Average the 512 squares and then take the square root.

Who dares to try ?

Although you don't need to take the final square root as long as you compare the Mean Squared Error against the proper threshold.

Indeed.

Let's have faith.

Someone will show up who will help and post the snippet of the filter code using those 5 results that the API delivers, so that we can continue with the tests and see if it is even possible to differentiate faces with moderate precision using the ESP32-CAM + API.

I left a photo of my face for testing:

https://www.linkpicture.com/view.php?img=LPic63f27e2e77f1c569955444

  // Sum the squared differences between the two face vectors.
  double errorSum = 0;
  for (uint8_t i = 0; i < 5; i++) {
    double error = faceVector1[i] - faceVector2[i];
    errorSum += error * error;
  }
  if (errorSum < matchThreshold) {
    Serial.println("Faces Match");
  }

I know it's easier to find a pair of dwarf twins than to get you to post code. thanks++;

But what would faceVector1[i] and faceVector2[i] be? And matchThreshold?

faceVector1 & faceVector2 would be populated from the API's reply message. You'd have to determine matchThreshold heuristically, based on your tolerance for false positives versus false negatives.

You have to turn on the calculation of 'embeddings', which are 512-element vectors. You then have to compare the 512 elements for each face against the 512 elements for every other face to determine the distance.

Hint: Add "embeddings=True" to the end of the query URL.