How to Perform Object Detection Directly on ESP32-CAM?

Hi everyone,

I am working on a project where I want to stream video and perform object detection directly on the ESP32-CAM without relying on an external PC. I understand that the ESP32-CAM has limited hardware resources, so I'm looking for the best way to achieve this.

Here are my questions:

  1. Are there any lightweight object detection models (e.g., TinyML models) that can run directly on the ESP32-CAM?
  2. What tools or frameworks should I use to deploy and run object detection models on the ESP32-CAM? I've read about TensorFlow Lite for Microcontrollers, but I'm unsure if it supports this setup.
  3. Is it possible to run pre-trained models like TinyYOLO on the ESP32-CAM, or do I need to train a custom model optimized for microcontrollers?
  4. Any examples, libraries, or guides to help me set up this pipeline?

I’m particularly interested in detecting simple objects like “person” or “car.” Any advice on achieving this directly on the ESP32-CAM would be greatly appreciated.

Thanks in advance for your help!

Based on your research, does your ESP32-CAM have enough memory to store one uncompressed image so it can be analyzed?

There are ESP32-S3 dev boards that have a camera connector, an SD card holder, and 8 MB of PSRAM.

The ESP32-CAM has limited memory: 520 KB of SRAM and, on some models, an additional 4 MB of external PSRAM. It can hold a small uncompressed image such as 320x240 (QVGA), which needs around 230 KB at 3 bytes per pixel (320 x 240 x 3 = 230,400 bytes), but larger resolutions (e.g., 640x480, about 920 KB at the same depth) will exceed the SRAM capacity unless PSRAM is enabled.
For analysis, most ESP32-CAM setups rely on compressed JPEG images to save memory, decompressing them only if necessary for processing.
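
The arithmetic behind these figures can be checked with a short sketch. It is Python meant to run on a PC, not on the ESP32-CAM. The function name `frame_bytes` and the pixel-format table are illustrative assumptions, and the 520 KB value is the SRAM total quoted above, not the amount a program can actually allocate.

```python
# Bytes per pixel for common uncompressed pixel formats (illustrative table).
BYTES_PER_PIXEL = {"GRAYSCALE": 1, "RGB565": 2, "RGB888": 3}

# Total on-chip SRAM quoted above; the usable heap is smaller in practice.
SRAM_BYTES = 520 * 1024


def frame_bytes(width, height, pixel_format):
    """Return the size in bytes of one uncompressed frame."""
    return width * height * BYTES_PER_PIXEL[pixel_format]


for name, (w, h) in {"QVGA": (320, 240), "VGA": (640, 480)}.items():
    for fmt in BYTES_PER_PIXEL:
        size = frame_bytes(w, h, fmt)
        verdict = "fits in" if size < SRAM_BYTES else "exceeds"
        print(f"{name} {fmt}: {size} bytes, {verdict} the 520 KB SRAM total")
```

QVGA at 3 bytes per pixel comes to 230,400 bytes, and VGA at 2 or 3 bytes per pixel is larger than the whole SRAM. A frame that nominally fits may still fail to allocate, because the rest of the program also needs that memory.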

Right, and your necessary processing is object detection. So your ESP will have to decompress the image to examine each pixel.

I can add a microSD card to store pre-trained models or lightweight neural network files. Object detection requires computational power and RAM, though, and the ESP32-CAM can only handle very lightweight models.

I just thought of using a cloud service for object detection: the ESP32-CAM would hand the work to a cloud-based platform, since the board itself cannot handle these tasks due to its hardware limitations. Does anyone know how to do this?
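
One possible shape for the receiving end, as a minimal sketch rather than a finished service: the board would send each JPEG frame in an HTTP POST, and a small server would answer with JSON. This uses only Python's standard library. The names `detect_objects`, `is_jpeg`, `DetectHandler`, and `serve`, the port number, and the JSON layout are all hypothetical, and `detect_objects` is a placeholder where a real detector would go.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def detect_objects(jpeg_bytes):
    """Placeholder: a real server would decode the JPEG and run a model here."""
    return []  # no detector is wired in, so nothing is ever reported


def is_jpeg(data):
    """JPEG data begins with the bytes FF D8 FF."""
    return data[:3] == b"\xff\xd8\xff"


class DetectHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes the client announced.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        if not is_jpeg(body):
            self._reply(400, {"error": "expected a JPEG body"})
            return
        self._reply(200, {"bytes": len(body), "detections": detect_objects(body)})

    def _reply(self, status, payload):
        data = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port=8000):
    """Block forever, answering POSTed frames on the given port (assumed 8000)."""
    HTTPServer(("0.0.0.0", port), DetectHandler).serve_forever()
```

Calling `serve()` on a PC or cloud machine starts the listener. The trade-off is that the ESP32-CAM only has to capture and upload compressed frames, but detection then depends on the network connection and its latency.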

You might get some ideas from studying the example ESP32-CAM code that does face detection.

The first step would be to check if you can actually stream video on the ESP32-CAM.

If you haven't done that yet, the results might be disappointing. Depending, of course, on how you define "stream video".

Take a look at https://dronebotworkshop.com/esp32-object-detect/
