[TensorFlow Lite] Error with micro_allocator and SingleArenaBufferAllocator when using SDRAM

I'm trying to run a crowd-counting model on the Portenta Vision Shield with the latest version of the TensorFlow Lite repository, but the Portenta H7 doesn't have enough internal memory for the model and the interpreter, so I'm trying to use the SDRAM, which adds 8 MB of external memory.

The code I'm using is a variation of the person_detection example that comes with the library. Here's an abridged version (the loop function is omitted, since the error occurs in setup()):

#include <TensorFlowLite.h>

#include "detection_responder.h"
#include "image_provider.h"
#include "main_functions.h"
#include "model_settings.h"
#include "cc_model_data.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_log.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/micro/system_setup.h"
#include "tensorflow/lite/schema/schema_generated.h"

#include "camera.h"

#include <SDRAM.h>

// Globals, used for compatibility with Arduino-style sketches.
namespace {
  const tflite::Model* model = nullptr;
  tflite::MicroInterpreter* interpreter = nullptr;
  TfLiteTensor* input = nullptr;
  
  // In order to use optimized tensorflow lite kernels, a signed int8_t quantized
  // model is preferred over the legacy unsigned model format. This means that
  // throughout this project, input images must be converted from unsigned to
  // signed format. The easiest and quickest way to convert from unsigned to
  // signed 8-bit integers is to subtract 128 from the unsigned value to get a
  // signed value.
  
  // An area of memory to use for input, output, and intermediate arrays.
  constexpr int kTensorArenaSize = 136 * 1024;
  alignas(8) uint8_t *tensor_arena = (uint8_t *)SDRAM.malloc(kTensorArenaSize);

  constexpr int img_width = 320;
  constexpr int img_height = 240;

  FrameBuffer fbImage(img_width, img_height, 2);
}  // namespace

void setup() {

  Serial.begin(115200);
  while (!Serial);

  tflite::InitializeTarget();

  // Map the model into a usable data structure. This doesn't involve any
  // copying or parsing, it's a very lightweight operation.
  model = tflite::GetModel(cc_model_data);
  if (model->version() != TFLITE_SCHEMA_VERSION) {
    Serial.print("Model provided is schema version ");
    Serial.print(model->version());
    Serial.print(", not equal to supported version ");
    Serial.println(TFLITE_SCHEMA_VERSION);
    return;
  }

  static tflite::MicroMutableOpResolver<6> micro_op_resolver;
  micro_op_resolver.AddDepthwiseConv2D();
  micro_op_resolver.AddConv2D();
  micro_op_resolver.AddSoftmax();
  micro_op_resolver.AddRelu();
  micro_op_resolver.AddAdd();
  micro_op_resolver.AddPad();

  // Build an interpreter to run the model with.
  // HERE'S THE ERROR  < < < < < < < <
  static tflite::MicroInterpreter static_interpreter(
      model, micro_op_resolver, tensor_arena, kTensorArenaSize);

  interpreter = &static_interpreter;

  // Allocate memory from the tensor_arena for the model's tensors.
  TfLiteStatus allocate_status = interpreter->AllocateTensors();
  if (allocate_status != kTfLiteOk) {
    Serial.println("AllocateTensors() failed");
    return;
  }

  // Get information about the memory area to use for the model's input.
  input = interpreter->input(0);

  if ((input->dims->size != 4) || (input->dims->data[0] != 1) ||
      (input->dims->data[1] != kNumRows) ||
      (input->dims->data[2] != kNumCols) ||
      (input->dims->data[3] != kNumChannels) || (input->type != kTfLiteInt8)) {
    Serial.println("Bad input tensor parameters in model");
    return;
  }
}


The error happens when trying to initialize static tflite::MicroInterpreter static_interpreter(model, micro_op_resolver, tensor_arena, kTensorArenaSize):

MicroInterpreter::MicroInterpreter(const Model* model,
                                   const MicroOpResolver& op_resolver,
                                   uint8_t* tensor_arena,
                                   size_t tensor_arena_size,
                                   MicroResourceVariables* resource_variables,
                                   MicroProfilerInterface* profiler)
    : model_(model),
      op_resolver_(op_resolver),
      allocator_(*MicroAllocator::Create(tensor_arena, tensor_arena_size)),
      graph_(&context_, model, &allocator_, resource_variables),
      tensors_allocated_(false),
      initialization_status_(kTfLiteError),
      input_tensors_(nullptr),
      output_tensors_(nullptr),
      micro_context_(&allocator_, model_, &graph_) {
  Init(profiler);
}


Specifically, when calling MicroAllocator::Create(tensor_arena, tensor_arena_size):

MicroAllocator* MicroAllocator::Create(uint8_t* tensor_arena,
                                       size_t arena_size) {

  uint8_t* aligned_arena =
      AlignPointerUp(tensor_arena, MicroArenaBufferAlignment());

  size_t aligned_arena_size = tensor_arena + arena_size - aligned_arena;

//Here's the ERROR < < < < <
  SingleArenaBufferAllocator* memory_allocator =
      SingleArenaBufferAllocator::Create(aligned_arena, aligned_arena_size);

  // By default create GreedyMemoryPlanner.
  // If a different MemoryPlanner is needed, use the other api.
  uint8_t* memory_planner_buffer = memory_allocator->AllocatePersistentBuffer(
      sizeof(GreedyMemoryPlanner), alignof(GreedyMemoryPlanner));

  GreedyMemoryPlanner* memory_planner =
      new (memory_planner_buffer) GreedyMemoryPlanner();

  return Create(memory_allocator, memory_planner);
}


The error ultimately comes from SingleArenaBufferAllocator::Create(aligned_arena, aligned_arena_size):

/* static */
SingleArenaBufferAllocator* SingleArenaBufferAllocator::Create(
    uint8_t* buffer_head, size_t buffer_size) {

  TFLITE_DCHECK(buffer_head != nullptr);

  SingleArenaBufferAllocator tmp =
      SingleArenaBufferAllocator(buffer_head, buffer_size);

  // Allocate enough bytes from the buffer to create a
  // SingleArenaBufferAllocator. The new instance will use the current adjusted
  // tail buffer from the tmp allocator instance.
  uint8_t* allocator_buffer = tmp.AllocatePersistentBuffer(
      sizeof(SingleArenaBufferAllocator), alignof(SingleArenaBufferAllocator));

  // ERROR
  // Use the default copy constructor to populate internal states.
  return new (allocator_buffer) SingleArenaBufferAllocator(tmp);
}


Here, the error comes from the last line, return new (allocator_buffer) SingleArenaBufferAllocator(tmp) (I printed debug messages and that's as far as the program got). If I'm not mistaken, that line copy-constructs tmp into allocator_buffer and returns that pointer.

I've tried:

  • Assigning less memory in SDRAM
  • Assigning more memory in SDRAM
  • Including the SDRAM.h module
  • Assigning more memory in uint8_t* allocator_buffer = tmp.AllocatePersistentBuffer(sizeof(SingleArenaBufferAllocator), alignof(SingleArenaBufferAllocator));

The only thing that fixes the problem is using internal memory instead of the SDRAM.
I'm currently trying to print the various assigned pointers to check whether they actually land inside valid memory.

Thanks.

OK, for some reason the line that initializes the SDRAM was gone, but that wasn't the only issue.
I also had to leave raw memory for the tensor arena by starting the SDRAM heap after that region, instead of using SDRAM.malloc():

namespace {
...
uint8_t* tensor_arena = (uint8_t*)SDRAM_START_ADDRESS;
...
} //namespace

void setup() {
...
SDRAM.begin(SDRAM_START_ADDRESS + 4 * 1024 * 1024);
...
}

How did you get the Portenta working with TensorFlow Lite? I have an open issue on their GitHub here

I used the same repo that is mentioned in that thread (GitHub - tensorflow/tflite-micro-arduino-examples), by placing it in the libraries folder of the Arduino IDE (/opt/arduino.1.8.19/libraries, in my case) and including the required files in any given sketch, e.g.:

#include <TensorFlowLite.h>
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_log.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/micro/system_setup.h"
#include "tensorflow/lite/schema/schema_generated.h"

The only problem I had was with the peripherals file (src/peripherals/peripherals.h), which you also mention in the thread: TensorFlowLite.h requires that file to work, but the way it's written, it fails unless you're using the Arduino Nano 33 BLE Sense.

I tried two workarounds for this:

  1. Modifying the files in the peripherals folder to support the Portenta (e.g. you can add a custom #elif defined(ARDUINO_PORTENTA_H7_M7) block for the Portenta H7's M7 core in peripherals.h), but it can get tricky if you actually need to define peripheral devices, which fortunately wasn't my case.

  2. Or simply remove every file in the peripherals folder (in my case I only kept the utility.h file, though I'm not entirely sure it's necessary) and comment out the line #include "peripherals/peripherals.h" in src/TensorFlowLite.h, so you don't depend on those files.


I like it. I will try to do the same. Might take a few days.


We have started a new port of tflite-micro to Arduino. Have a look at the ArduTFLite and Chirale_TensorFlowLite libraries. Both work on the Portenta.
