Struct array, linked list, or hash Map

Hi All,

I'm working on a piece of code and hope you guys could give some help.

In the code, I will need to maintain a large array of struct, which contains several integers and chars, for real-time processing.
Basically, every time I receive some data with a unique ID, I will need to search if the ID is present or not in my array and do corresponding data processing, and update the array by either adding new entries of the array or updating the existing entries. Every now and then, I need to update the array list and remove outdated entries.

I've been originally thinking about implementing this using struct array, which means if I remove one entry, I need to do stuff myself, such as shift all the following entries. BTW, the struct looks something like the following:

struct MyStruct{
int unique_id;
int content_A[13];
int content_B[13];
byte content_C;
};

I was told a linked list or hash map might be a more straightforward solution for this, which I didn't have any experience work with in the past few years. I'd appreciate it if you guys could give some suggestions.

Thanks a lot and stay healthy,
Yang

soleilsword:
Hi All,

I'm working on a piece of code and hope you guys could give some help.

In the code, I will need to maintain a large array of struct, which contains several integers and chars, for real-time processing.
Basically, every time I receive some data with a unique ID, I will need to search if the ID is present or not in my array and do corresponding data processing, and update the array by either adding new entries of the array or updating the existing entries. Every now and then, I need to update the array list and remove outdated entries.

I've been originally thinking about implementing this using struct array, which means if I remove one entry, I need to do stuff myself, such as shift all the following entries. BTW, the struct looks something like the following:

struct MyStruct{
int unique_id;
int content_A[13];
int content_B[13];
byte content_C;
};

I was told a linked list or hash map might be a more straightforward solution for this, which I didn't have any experience work with in the past few years. I'd appreciate it if you guys could give some suggestions.

Thanks a lot and stay healthy,
Yang

In order to give any suggestions, you need to tell us about the id. Is it numeric, alphabetic, or a combination? How many unique ids do you expect? What is the lowest and highest id you can have. Will the ids have significant leading zeros when entered for checking?

Also, you do not need to do anything to remove an id. Just mark it as not available. Perhaps a bit change.

How are you going to enter the initial list of ids? Will you do this each time the power is turned on? Where do the ids come from? Are you intending to keep the ids in memory, rather than an SD card?

Paul

Paul_KD7HB:
In order to give any suggestions, you need to tell us about the id. Is it numeric, alphabetic, or a combination? How many unique ids do you expect? What is the lowest and highest id you can have. Will the ids have significant leading zeros when entered for checking?

Also, you do not need to do anything to remove an id. Just mark it as not available. Perhaps a bit change.

How are you going to enter the initial list of ids? Will you do this each time the power is turned on? Where do the ids come from? Are you intending to keep the ids in memory, rather than an SD card?

Paul

Hi Paul,

thanks for your reply and insightful questions. I knew I was going to miss some info and you hit right to the point. Let me try to answer your questions.

  1. The ID is a code which contains four hex number, e.g. 0x32FB, meaning I could have a max of 65535 different unique IDs.
  2. Each ID will associate with quite a bit of information as I tried to illustrate with my struct. Therefore I was trying to remove an entry/id to save memory.
  3. I'm receving data continuously from UART. I have a function to parse the data from UART and save the data to the struct/map/whatever I'm going to create. This is going to be a real-time implementation and so there will be ZERO entries at the beginning of runtime.
  4. If possible I would like to keep the IDs and all the associated data in memory, so I don't have to implement SD card.

You will certainly not be able to support the worst case in Arduino memory which means you need some idea of the average case (which may not be supportable either). If you exceed the number of supportable IDs you will need some way of handling that gracefully and informing some one (via LED or other means) that a critical error has occurred. What conditions would cause you to remove IDs from the list?

SD writes would be pretty slow. You could implement a cache but that will add an extra level of complexity determining which entries to keep in cache and when to write through to the SD.

Which Arduino do you plan on using?

Get yourself an FRAM chip, access it via SPI.transfer() commands.
SRAM speed to read/write, unlimited writability (10 trillion writes per address), can be very big in size

https://www.digikey.com/products/en/integrated-circuits-ics/memory/774?k=fram+memory&k=&pkeyword=fram+memory&sv=0&pv1989=0&pv961=349252&pv2043=406106&sf=1&FV=-8%7C774&quantity=&ColumnSort=0&page=1&stock=1&pageSize=25

(forum stuck some extra characters at start and end of the link, just delete them)

ToddL1962:
You will certainly not be able to support the worst case in Arduino memory which means you need some idea of the average case (which may not be supportable either). If you exceed the number of supportable IDs you will need some way of handling that gracefully and informing some one (via LED or other means) that a critical error has occurred. What conditions would cause you to remove IDs from the list?

SD writes would be pretty slow. You could implement a cache but that will add an extra level of complexity determining which entries to keep in cache and when to write through to the SD.

Which Arduino do you plan on using?

Hi Todd,

This is a revision of a code I had previously to address some new changes. I'm using Arduino Zero and not intend to move to a very different board (unless of a good reason) due to the existing code I need to maintain.

In this previous code, I had maintained a lookup table of equivalent to 400 entries taking into account all the associated data. I could not go beyond 400 as then I will hit the limit of the board.

In this previous code, I limit myself to 400 IDs but now I have to deal with 65535 IDs so I could not use the same code anymore, and I thought of creating a dynamic lookup table which changes over time but having a similar size of the table to not exceed the limit of the board.

In my system, I will receive incoming data from UART and I will do data filtering which tells me whether the IDs (and associated data) should be removed from the table or not. ANd in fact, this actually happens very frequently. In each hour, I will only receive a few IDs, which is unlikely to appear again in the next hour, meaning I will in principle never have more than a few hundred IDs each hour. BUt over time, all the IDs will show up and if I don't remove some of the IDs I will hit the problem considering my limitation of memory.

CrossRoads:
Get yourself an FRAM chip, access it via SPI.transfer() commands.
SRAM speed to read/write, unlimited writability (10 trillion writes per address), can be very big in size

https://www.digikey.com/products/en/integrated-circuits-ics/memory/774?k=fram+memory&k=&pkeyword=fram+memory&sv=0&pv1989=0&pv961=349252&pv2043=406106&sf=1&FV=-8%7C774&quantity=&ColumnSort=0&page=1&stock=1&pageSize=25

(forum stuck some extra characters at start and end of the link, just delete them)

The arduino zero board with 32 KB allows me to have 400 of these entries (IDs + associated information). If I want to have 65535 entries, I will need 5242.8 KB. I am not familiar with FRAM but I wonder if that would be feasible.

A classic exercise in an undergrad programming class is to build such a structure using a sorted binary tree sorted by unique_id. Each element would look like:

struct MyStruct{
  uint16_t unique_id;
  int content_A[13];
  int content_B[13];
  byte content_C;
  MyStruct *leftBranch;
  MyStruct *rightBranch;
};

Elements with a unique_id less than that of the current element would link to the left while those with a unique_id great than that of the current element would link to the right.

There are well-known algorithms for traversing such a tree as well as inserted and deleting elements. Said element are created and deleted dynamically.

How such a scheme would hold up with the memory available in a SAMD might be an issue. It's probably helpful that you'll always be malloc(ing) and free(ing) the same size memory chunk every time (or if you prefer, new(ing) and delete(ing)).

Anyway, that's what I'd try.

gfvalvo:
A classic exercise in an undergrad programming class is to build such a structure using a sorted binary tree sorted by unique_id. Each element would look like:

struct MyStruct{

uint16_t unique_id;
  int content_A[13];
  int content_B[13];
  byte content_C;
  MyStruct *leftBranch;
  MyStruct *rightBranch;
};



Elements with a unique_id less than that of the current element would link to the left while those with a unique_id great than that of the current element would link to the right.

There are well-known algorithms for traversing such a tree as well as inserted and deleting elements. Said element are created and deleted dynamically.

How such a scheme would hold up with the memory available in a SAMD might be an issue. It's probably helpful that you'll always be malloc(ing) and free(ing) the same size memory chunk every time (or if you prefer, new(ing) and delete(ing)).

Anyway, that's what I'd try.

thanks for the good suggestions, this sounds very reasonable to me. To clarify I'm not a cs for undergrad and I did miss the 'data structure' course.

My asked this question to a few friends today and they were pointing me to hash map, linked list, and dictionary. I guess I did not explain to them with details and now that I look at the definitions of these data structure, it seems they would not to achieve my goals as in my structure, I have multiple contents while e.g. in hash map, it seems the key could only be mapped to a sole content.

The binary tree is a form of linked list - but it has two destination links. I think it fits your problem description quite well. I did take undergrad CS and we coded these - in fact it's not that difficult. I think the small table size means you don't need a hash.

aarg:
The binary tree is a form of linked list - but it has two destination links. I think it fits your problem description quite well. I did take undergrad CS and we coded these - in fact it's not that difficult. I think the small table size means you don't need a hash.

The problem with normal binary (search) trees is that they become unbalanced. Especially if you insert the id's in order. Lookup in an unbalanced binary tree is similar to lookup in a linear linked list (O(n)), and you need twice the memory for pointers.

The solution is to balance the tree, for example by using a red-black tree. The with thode is that insertion/deletion are more costly, but you do get O(log n) lookup complexity.

Personally, I'd try using a simple fixed-size array first. Just linear lookup, indicating "empty" elements with an id of -1, for example.
If that's not fast enough, you can try creating sorted bins like in a hash table or use a linked list or BST instead, but it might not even be necessary.

Pieter

Good point. I don't think OP ever provided specs for lookup or insert / delete performance. I didn't even see a particular processor / board mentioned. Given that, maybe it's best to start simple -- just a sorted linked list and live with the O(n) lookup performance.

If that's not good enough, maybe try throwing better hardware at it. That's cheap.

PieterP:
The problem with normal binary (search) trees is that they become unbalanced. Especially if you insert the id's in order. Lookup in an unbalanced binary tree is similar to lookup in a linear linked list (O(n)), and you need twice the memory for pointers.

The solution is to balance the tree, for example by using a red-black tree. The with thode is that insertion/deletion are more costly, but you do get O(log n) lookup complexity.

Personally, I'd try using a simple fixed-size array first. Just linear lookup, indicating "empty" elements with an id of -1, for example.
If that's not fast enough, you can try creating sorted bins like in a hash table or use a linked list or BST instead, but it might not even be necessary.

Pieter

Hi Pieter,
this sounds like an easy solution. I was originally thinking adding one item to each struct to indicate there is valid ID and data or not, but your solution sounds better!