[TensorFlow Lite] Error with micro_allocator and SingleArenaBufferAllocator when using SDRAM

I'm trying to run a crowd-counting model on the Portenta Vision Shield with the latest version of the TensorFlow Lite repository, but the Portenta H7 doesn't have enough internal memory for the model and the interpreter, so I'm trying to use the SDRAM, which adds 8 MB of external memory.

The code I'm using is a variation of the person_detection example that comes with the library. Here's an abridged version (the loop function is omitted, since the error occurs in setup()):

#include <TensorFlowLite.h>

#include "detection_responder.h"
#include "image_provider.h"
#include "main_functions.h"
#include "model_settings.h"
#include "cc_model_data.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_log.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/micro/system_setup.h"
#include "tensorflow/lite/schema/schema_generated.h"

#include "camera.h"

#include <SDRAM.h>

// Globals, used for compatibility with Arduino-style sketches.
namespace {
  const tflite::Model* model = nullptr;
  tflite::MicroInterpreter* interpreter = nullptr;
  TfLiteTensor* input = nullptr;
  
  // In order to use optimized tensorflow lite kernels, a signed int8_t quantized
  // model is preferred over the legacy unsigned model format. This means that
  // throughout this project, input images must be converted from unsigned to
  // signed format. The easiest and quickest way to convert from unsigned to
  // signed 8-bit integers is to subtract 128 from the unsigned value to get a
  // signed value.
  
  // An area of memory to use for input, output, and intermediate arrays.
  constexpr int kTensorArenaSize = 136 * 1024;
  alignas(8) uint8_t *tensor_arena = (uint8_t *)SDRAM.malloc(kTensorArenaSize);

  constexpr int img_width = 320;
  constexpr int img_height = 240;

  FrameBuffer fbImage(img_width, img_height, 2);
}  // namespace

void setup() {

  Serial.begin(115200);
  while (!Serial);

  tflite::InitializeTarget();

  // Map the model into a usable data structure. This doesn't involve any
  // copying or parsing, it's a very lightweight operation.
  model = tflite::GetModel(cc_model_data);
  if (model->version() != TFLITE_SCHEMA_VERSION) {
    Serial.print("Model provided is schema version ");
    Serial.print(model->version());
    Serial.print(", not equal to supported version ");
    Serial.println(TFLITE_SCHEMA_VERSION);
    return;
  }

  static tflite::MicroMutableOpResolver<6> micro_op_resolver;
  micro_op_resolver.AddDepthwiseConv2D();
  micro_op_resolver.AddConv2D();
  micro_op_resolver.AddSoftmax();
  micro_op_resolver.AddRelu();
  micro_op_resolver.AddAdd();
  micro_op_resolver.AddPad();

  // Build an interpreter to run the model with.
  // HERE'S THE ERROR  < < < < < < < <
  static tflite::MicroInterpreter static_interpreter(
      model, micro_op_resolver, tensor_arena, kTensorArenaSize);

  interpreter = &static_interpreter;

  // Allocate memory from the tensor_arena for the model's tensors.
  TfLiteStatus allocate_status = interpreter->AllocateTensors();
  if (allocate_status != kTfLiteOk) {
    Serial.println("AllocateTensors() failed");
    return;
  }

  // Get information about the memory area to use for the model's input.
  input = interpreter->input(0);

  if ((input->dims->size != 4) || (input->dims->data[0] != 1) ||
      (input->dims->data[1] != kNumRows) ||
      (input->dims->data[2] != kNumCols) ||
      (input->dims->data[3] != kNumChannels) || (input->type != kTfLiteInt8)) {
    Serial.println("Bad input tensor parameters in model");
    return;
  }
}


The error happens when trying to initialize static tflite::MicroInterpreter static_interpreter(model, micro_op_resolver, tensor_arena, kTensorArenaSize):

MicroInterpreter::MicroInterpreter(const Model* model,
                                   const MicroOpResolver& op_resolver,
                                   uint8_t* tensor_arena,
                                   size_t tensor_arena_size,
                                   MicroResourceVariables* resource_variables,
                                   MicroProfilerInterface* profiler)
    : model_(model),
      op_resolver_(op_resolver),
      allocator_(*MicroAllocator::Create(tensor_arena, tensor_arena_size)),
      graph_(&context_, model, &allocator_, resource_variables),
      tensors_allocated_(false),
      initialization_status_(kTfLiteError),
      input_tensors_(nullptr),
      output_tensors_(nullptr),
      micro_context_(&allocator_, model_, &graph_) {
  Init(profiler);
}


Specifically, when calling MicroAllocator::Create(tensor_arena, tensor_arena_size):

MicroAllocator* MicroAllocator::Create(uint8_t* tensor_arena,
                                       size_t arena_size) {

  uint8_t* aligned_arena =
      AlignPointerUp(tensor_arena, MicroArenaBufferAlignment());

  size_t aligned_arena_size = tensor_arena + arena_size - aligned_arena;

//Here's the ERROR < < < < <
  SingleArenaBufferAllocator* memory_allocator =
      SingleArenaBufferAllocator::Create(aligned_arena, aligned_arena_size);

  // By default create GreedyMemoryPlanner.
  // If a different MemoryPlanner is needed, use the other api.
  uint8_t* memory_planner_buffer = memory_allocator->AllocatePersistentBuffer(
      sizeof(GreedyMemoryPlanner), alignof(GreedyMemoryPlanner));

  GreedyMemoryPlanner* memory_planner =
      new (memory_planner_buffer) GreedyMemoryPlanner();

  return Create(memory_allocator, memory_planner);
}


The error ultimately comes from SingleArenaBufferAllocator::Create(aligned_arena, aligned_arena_size):

/* static */
SingleArenaBufferAllocator* SingleArenaBufferAllocator::Create(
    uint8_t* buffer_head, size_t buffer_size) {

  TFLITE_DCHECK(buffer_head != nullptr);

  SingleArenaBufferAllocator tmp =
      SingleArenaBufferAllocator(buffer_head, buffer_size);

  // Allocate enough bytes from the buffer to create a
  // SingleArenaBufferAllocator. The new instance will use the current adjusted
  // tail buffer from the tmp allocator instance.
  uint8_t* allocator_buffer = tmp.AllocatePersistentBuffer(
      sizeof(SingleArenaBufferAllocator), alignof(SingleArenaBufferAllocator));

  // ERROR
  // Use the default copy constructor to populate internal states.
  return new (allocator_buffer) SingleArenaBufferAllocator(tmp);
}


Here, the error comes from the last line, return new (allocator_buffer) SingleArenaBufferAllocator(tmp) (I printed debug messages and that's as far as the program got). If I'm not mistaken, that line copy-constructs tmp into allocator_buffer and returns that pointer.

I've tried:

  • Assigning less memory in SDRAM
  • Assigning more memory in SDRAM
  • Including the SDRAM.h module
  • Assigning more memory in uint8_t* allocator_buffer = tmp.AllocatePersistentBuffer(sizeof(SingleArenaBufferAllocator), alignof(SingleArenaBufferAllocator));

The only thing that fixes the problem is using internal memory instead of the SDRAM.
I'm currently trying to print the various assigned pointers to check whether they actually land inside valid memory.

Thanks.

OK, for some reason the line that initializes the SDRAM was gone, but that wasn't the only issue.
I also had to leave raw memory for the tensor arena by starting the SDRAM heap after that region, instead of using SDRAM.malloc():

namespace {
...
uint8_t* tensor_arena = (uint8_t*)SDRAM_START_ADDRESS;
...
} //namespace

void setup() {
...
SDRAM.begin(SDRAM_START_ADDRESS + 4 * 1024 * 1024);
...
}

How did you get the Portenta working with TensorFlow Lite? I have an open issue on their GitHub here

I used the same repo that is mentioned in that thread (GitHub - tensorflow/tflite-micro-arduino-examples), by placing it in the libraries folder of the Arduino IDE (/opt/arduino.1.8.19/libraries, in my case) and including the required files in any given sketch, e.g.:

#include <TensorFlowLite.h>
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_log.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/micro/system_setup.h"
#include "tensorflow/lite/schema/schema_generated.h"

The only problem I had was with the peripherals file (src/peripherals/peripherals.h), which you also mention in the thread: TensorFlowLite.h requires that file to work, but the way it's written, it fails unless you're using the Arduino Nano 33 BLE Sense.

I tried two workarounds for this:

  1. Modifying the files in the peripherals folder to support the Portenta (e.g. you can add a custom #elif defined(ARDUINO_PORTENTA_H7_M7) block for the Portenta H7's M7 core in peripherals.h), but it can get tricky if you actually need to define peripheral devices, which fortunately wasn't my case.

  2. Or simply remove every file in the peripherals folder (in my case I only kept the utility.h file, though I'm not entirely sure it's necessary) and comment out the line #include "peripherals/peripherals.h" in src/TensorFlowLite.h, so you don't depend on those files.


I like it. I will try to do the same. Might take a few days.


We have started a new port of tflite-micro to Arduino. Have a look at the ArduTFLite and Chirale_TensorFlowLite libraries. Both work on the Portenta.
