Arduino friendly organized data format? [xml? json? etc?]

Hi everyone. I am currently looking to test some stuff with JSON data structures, but I’m wondering if an Arduino (Due) is really made for working with JSON. I see that there is a JSON library that lets you read JSON files, but the memory limitation cripples the amount of data you can search through, and I’m talking about an average-size King novel’s worth of JSON data.

This makes me wonder if JSON is really what you want for working with copious amounts of data organized into key => value pairs. There are memory limits on how large a JSON file you can load from the SD card. Are there any other data formats that are practically made for Arduino to work with? Something that can be read from / written to the SD card and has no visible size limit other than the processing speed of the Arduino and the space on the SD card? I understand that it all has to be loaded into memory. I’d like to see if perhaps I’m unaware of some better data format that Arduinos can handle in large amounts.

What is this data? What are you doing with it? How much is there - a megabyte? Why Arduino rather than a Pi?

wildbill:
What is this data? What are you doing with it? How much is there - a megabyte? Why Arduino rather than a Pi?

It's key-value pairs: mainly strings, sentences, some variables, but there's a lot of data. It's an interactive book, the size of a novel, kind of like Zork. I can't tell the final size yet, so perhaps an average ebook, maybe twice that to accommodate all the key names. No Pi because I don't want to go through Linux to make this thing. I'd prefer bare metal. I'm looking at STM32F4 boards as they have some serious meat on them.

If I stick with a Due, I'll most likely have to split the data into multiple JSON files.

Please specify your user interface. Interactive touch screens and displays of reasonable resolution and update rate require suitable hardware and software libraries. Libraries may include word processing for text attributes and page layout, and these are most probably not available for Arduinos.

The use of libraries requires almost the same code, regardless of the platform.

A smartphone would be a better base for an ebook device.

Are you trying to build an engine that you can have run different games depending on the data?

If so, I suspect that you can probably divide your game into different areas (levels?) and just load the data for the one you're currently using. Split it into different (smaller) files.

If there is a single game, you might look at storing the static game data in progmem. Or you can use the same divide and conquer approach suggested above.

TobiasRipper:
It's key-value pairs: mainly strings, sentences, some variables, but there's a lot of data. It's an interactive book, the size of a novel, kind of like Zork. I can't tell the final size yet, so perhaps an average ebook, maybe twice that to accommodate all the key names. No Pi because I don't want to go through Linux to make this thing. I'd prefer bare metal. I'm looking at STM32F4 boards as they have some serious meat on them.

If I stick with a Due, I'll most likely have to split the data into multiple JSON files.

You could try a Teensy 4.0.
https://www.pjrc.com/store/teensy40.html

I question the need to hold the whole file at once --- if you already know most or all of the key words.

I have parse & lex on AVR's to where an Uno can hold in flash 2000 key words average length 7 chars and still have room for meaningful code. With more flash the dictionary can get larger, multiple AVR's can also be used to increase on that total. And the Due has 512K flash, enough to hold the max 32767 keys at over 10 chars on average.
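To put rough numbers on those figures, here is a back-of-the-envelope sketch. The layout is an assumption, not the actual parse&lex format: each word is stored with a terminating NUL plus one pointer per word (2 bytes on an AVR, 4 bytes on the Due). `dictionaryBytes` and the constant names are made up for illustration.

```cpp
#include <cstdint>

// Rough flash footprint of a keyword dictionary: each word stored with a
// terminating NUL plus one table entry (pointer) per word. Assumed layout.
constexpr std::uint32_t dictionaryBytes(std::uint32_t wordCount,
                                        std::uint32_t avgWordLength,
                                        std::uint32_t bytesPerTableEntry) {
    return wordCount * (avgWordLength + 1u + bytesPerTableEntry);
}

// Uno-class AVR: 2000 words, 7 chars average, 2-byte pointers.
constexpr std::uint32_t kUnoDictionary = dictionaryBytes(2000, 7, 2);    // 20000 bytes
// Due: 32767 words, 10 chars average, 4-byte pointers.
constexpr std::uint32_t kDueDictionary = dictionaryBytes(32767, 10, 4);  // 491505 bytes

// Both budgets fit, although the Due case leaves little room for code.
static_assert(kUnoDictionary < 32u * 1024u, "fits in 32 KB of flash");
static_assert(kDueDictionary < 512u * 1024u, "fits in 512 KB of flash");
```

Under these assumptions the Uno dictionary takes about 20 KB of its 32 KB, and the full 32767-word Due dictionary takes about 480 KB of its 512 KB, so a tighter table (for example offsets instead of pointers) would be worth considering at that size.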

So if you can hold a dictionary in flash and stream the text file through the process you can get keyword numbers and numbers out at the same speed. Any new words... you have one key# that signifies a non-matched string is the next data, as long as those words are few the output won't bloat. Fact: 80% of all published English only uses 20% of English words.
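A minimal sketch of the "one key# means a non-matched string follows" idea, written as plain C++ so it can be tried on a PC first. The token layout is an assumption: word numbers start at 1, token 0 is the literal marker, and a literal is stored as one length byte plus the raw characters. `wordNumber`, `encode` and `kLiteralToken` are hypothetical names.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Assumed layout: word numbers start at 1, and 0 is reserved as the
// "literal follows" marker for words that are not in the dictionary.
constexpr std::uint16_t kLiteralToken = 0;

// Returns the 1-based number of `word` in the sorted dictionary, or 0 if absent.
std::uint16_t wordNumber(const std::vector<std::string>& sortedDict,
                         const std::string& word) {
    auto it = std::lower_bound(sortedDict.begin(), sortedDict.end(), word);
    if (it == sortedDict.end() || *it != word) return kLiteralToken;
    return static_cast<std::uint16_t>(it - sortedDict.begin() + 1);
}

// Encodes words as little-endian 16-bit tokens. An unknown word becomes the
// literal marker, one length byte, then the raw characters.
std::vector<std::uint8_t> encode(const std::vector<std::string>& sortedDict,
                                 const std::vector<std::string>& words) {
    std::vector<std::uint8_t> out;
    for (const std::string& w : words) {
        const std::uint16_t token = wordNumber(sortedDict, w);
        out.push_back(static_cast<std::uint8_t>(token & 0xFF));
        out.push_back(static_cast<std::uint8_t>(token >> 8));
        if (token == kLiteralToken) {
            out.push_back(static_cast<std::uint8_t>(w.size()));  // assumes < 256 chars
            out.insert(out.end(), w.begin(), w.end());
        }
    }
    return out;
}
```

With the dictionary {"lamp", "open", "the"}, encoding "open the grue" gives two 2-byte tokens and one 7-byte literal. As long as unmatched words are rare, the literals add little.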

The current parse&lex is not beginner easy to use but I have just proved a way to clean it up will work and a new easier version will be out (thanks JML and Hazard's Mind for clues and examples) hopefully this year (last version was Jan 2019, first was May 2016) but even on your own the Due/Teensys are so fast you don't need to go to the lengths I did for AVR use.

Perhaps count how many different words your text file has or may have to see what you need. Do not count numeric values as words, evaluate them and have key#'s for each different kind of number (float, integer-byte or word or long) that should follow.

Yeah, I have looked into tokenizing books before. When I started doing parse&lex it was for NC/CNC G-codes and last time I got paid to do it it was cash register accounting codes. Books need WAYYYY more keys!

wildbill:
Are you trying to build an engine that you can have run different games depending on the data?

If so, I suspect that you can probably divide your game into different areas (levels?) and just load the data for the one you're currently using. Split it into different (smaller) files.

If there is a single game, you might look at storing the static game data in progmem. Or you can use the same divide and conquer approach suggested above.

I'm not building a console for playing or emulating games. I'm building an interactive handheld device that is specifically made to run one specific text based game - my game. I'm currently in a stage where I have to pick a most appropriate data structure format for storing and organizing the game (game like Zork). This data format will dictate everything else.

DrDiettrich:
Please specify your user interface. Interactive touch screens and displays of reasonable resolution and update rate require suitable hardware and software libraries. Libraries may include word processing for text attributes and page layout, and these are most probably not available for Arduinos.

The use of libraries requires almost the same code, regardless of the platform.

A smartphone would be a better base for an ebook device.

I'm using a custom-fabricated QWERTY keyboard and an LCD display for IO. The data is stored on an SD card.
This is not an e-book, it's a text-based game that has as much data in its files, if not more. I don't want to use smartphones because I'm trying to build an experience that can't be matched by a smartphone. In addition, there's a sense of exploration for me in using bare metal, and if I simply opt to use a smartphone, this will quickly become one boring-ass project.

You could try a Teensy 4.0.
https://www.pjrc.com/store/teensy40.html

I question the need to hold the whole file at once --- if you already know most or all of the key words.

Far too few IO for that. I'm multiplexing a QWERTY keyboard and a parallel port LCD display, and a Due barely provides the IO I need. This here is really just a file format question. I'm wondering if there is a data format that is not just plain text and, like JSON, allows you to pull up information using its key-value pairs. The reason I think JSON is that I already know how to work with it and how easy it is to organize data. My game has inventory, items, storages, dialogue options and an open world setting. JSON seems fitting, but I'm just wondering if there are better options. I'll look into parse & lex.

TobiasRipper:
I'm not building a console for playing or emulating games. I'm building an interactive handheld device that is specifically made to run one specific text based game - my game. I'm currently in a stage where I have to pick a most appropriate data structure format for storing and organizing the game (game like Zork). This data format will dictate everything else.

If the only consumer of the data is your program then you are free to use any data format that is convenient. Don't feel that you must use some existing standard. Parsing XML is slow and I don't think JSON is designed with performance in mind either.

If you can organise the data in fixed length records with the different elements at fixed locations within each record that will make access much faster - there will be no need to parse the data to find the location of an item. It may also be appropriate to maintain an index for some of the data.

IMHO this sounds like a task that would be infinitely easier using Python on a Raspberry PI. I have a Python project that uses the SQLite database and includes a full-text search facility. It works very well on my laptop, I don't have an RPi so I don't know how it might perform on that.

...R

Robin2:
If the only consumer of the data is your program then you are free to use any data format that is convenient. Don't feel that you must use some existing standard. Parsing XML is slow and I don't think JSON is designed with performance in mind either.

If you can organise the data in fixed length records with the different elements at fixed locations within each record that will make access much faster - there will be no need to parse the data to find the location of an item. It may also be appropriate to maintain an index for some of the data.

IMHO this sounds like a task that would be infinitely easier using Python on a Raspberry PI. I have a Python project that uses the SQLite database and includes a full-text search facility. It works very well on my laptop, I don't have an RPi so I don't know how it might perform on that.

...R

Oh don't get me wrong, I also think that an rpi would make this a breeze... but linux... I don't want to go through linux and an rpi won't teach me squat about MCUs... just going to teach me about... linux....

I can see that you feel that certain things have to be so I’ll say once that you hold your project to restrictions that you do not need.

Custom keyboard? Did you know that Arduino boards can use PS/2 keyboard and mouse almost directly? You can get a set at Walmart for about $15.

I parse & lex text at high speed --on the fly, no pauses. Just so you know, it’s not what you seem to be used to and fun part is that it can be used to turn a text book into a much smaller token book. I figure that if you have a dictionary with all the game words in it, the game text could be tokenized to 16-bit word numbers that correspond to dictionary words. All text input gets tokenized and all output prints the matching words.

I’ve done this kind of thing since I was writing programs to generate NC and CNC tapes in 1980-85 though the last project got killed as a result of the Boesky SNL scandal, a real shame as the important parts were working.

key word <<-------->> number

You can use a supported protocol but it’s going to take more computer to support that.

As far as pins, PS/2 uses a few and a serial interface LCD needs 2. If you want to get used to MCU’s then maybe start using pin multipliers like shift registers or port expanders rather than “go bigger” which is more PC-think.

So long and good luck!

GoForSmoke:
I can see that you feel that certain things have to be so I’ll say once that you hold your project to restrictions that you do not need.

Custom keyboard? Did you know that Arduino boards can use PS/2 keyboard and mouse almost directly? You can get a set at Walmart for about $15.

So the main thing to keep in mind is that this isn’t one of those hacky throwaway experiment projects that gets a week of attention and is then “shelved” on the side of a cluttered desk to die a slow and uneventful death. I mean, for all I know it could be, buuut I’m optimistic, heh. If it were, I’d probably go for the cheapest, least-hassle option, such as the PS/2 option you’ve mentioned. In that case you’d be absolutely right. Lowest common denominator.

However as you can see here:
https://www.instagram.com/p/B427TS2jbhP/

https://www.instagram.com/p/B43UFREnctH/

https://www.instagram.com/p/B46IPDVH4cV/

This is not one of those projects. I’ve designed, fabricated and assembled a simple custom multiplexed QWERTY keyboard. It works like a charm and thus I have reached the point where I can start coding this text based game. This is the time where I gotta identify and decide on the most suitable data format of handling and arranging my game content since it’s a crucial decision which will dictate the entire basis for the project.

Mainly speaking, Arduino’s support for PS/2 is great, but it’s also not what I need. So in that regard, these aren’t restrictions, these are project design requirements.

Oh, and price-wise, the total for the PCB fabrication and components for 5 of these came to around $25, so even value-wise (excluding assembly time) it’s cheaper. But I did learn about Arduino’s PS/2 support, so I’ve learned something today and I just might even implement PS/2 support too. :)

GoForSmoke:
As far as pins, PS/2 uses a few and a serial interface LCD needs 2. If you want to get used to MCU’s then maybe start using pin multipliers like shift registers or port expanders rather than “go bigger” which is more PC-think.

There are more reasons why I went with a Due: it has several serial ports, which I’m going to be using, and it has a nice core, as I’m looking to write to a display, network and Serial, play several audio files, and edit the game data at once. The LCD I’m using is a parallel port one, but I might just switch to an SPI one once they come in. I’m multiplexing an almost full QWERTY keypad AND I have quite a few other bells and whistles that I have yet to hang on this thing. So trying to make do with the limited pin count of a Teensy is more of a pain in the rear. If the Due doesn’t work out, I’ll probably switch to an STM32F4 board instead, with a whopping 177 pins, 6 UARTs, 2 SPIs and running at 168 MHz.

GoForSmoke:
I parse & lex text at high speed --on the fly, no pauses. Just so you know, it’s not what you seem to be used to and fun part is that it can be used to turn a text book into a much smaller token book. I figure that if you have a dictionary with all the game words in it, the game text could be tokenized to 16-bit word numbers that correspond to dictionary words. All text input gets tokenized and all output prints the matching words.

I’ve done this kind of thing since I was writing programs to generate NC and CNC tapes in 1980-85 though the last project got killed as a result of the Boesky SNL scandal, a real shame as the important parts were working.

key word <<-------->> number

You can use a supported protocol but it’s going to take more computer to support that.

So long and good luck!

I understand the basics of parsing, but in a different scope (web parsing), so I’m having a difficult time visualizing the application of your method. Do you have any relevant resources you could share on the subject? As for its suitability for this project: are you familiar with a game called Zork? That’s essentially what I’m working on here. My own game that works like that. There are game items, inventory, dialogues, open world paths that a player can take, containers with items that the player can find and take, etc.

So this isn’t just a question of how to read the copious data but also how to store it. You are (as far as I can tell) only telling me how to read the data using parsing. What I’m looking for is a data format for storage. Unless I’m misunderstanding and you actually mean that parse & lex is something I can use in conjunction with any type of data storage?

If you specify the text to store, or the various attributes or choices, more precise advice can be given. Until then you can use a data base, CSV, key=value pairs, XML or whatsoever.

The game sounds like it should run on a PC of some kind and the peripherals sound like they could run on an MCU connected to that PC.

You want to use system protocols to do system jobs on an MCU, once again good luck. Have fun even, it will keep you busy.

I have my own good-day-eater project making for ATmega1284 a Forth that behaves closer to standard. It should run on other AVR's and port to ARMs but my target is the 1284P in a handheld device.

I tended to write space games in my spare time but did some maze games back before 2000. Mostly what I was really doing was trying out ideas, some even got use in software I was paid to write. The games... what I poked at in between contracts didn't pay to keep going, not when published games were getting less and less lame all the time.

TobiasRipper:
So this isn't just a question of how to read the copious data but also how to store it. You are (as far as I can tell) only telling me how to read the data using parsing. What I'm looking for is a data format for storage. Unless I'm misunderstanding and you actually mean that parse & lex is something I can use in conjunction with any type of data storage?

I use a word list with a sorted array of pointers into it. Word 1 is at pointer[0].
There is a match function that is run for each char read (it does search/match work before the next char arrives) from text input, the match can be finished when the last char of the list word is matched or wait for a delimiter by the sketch using the function --- the word number is known before the next char arrives with enough cycles left over to do something more with it than just store for later which is what I mean by --on the fly parse&lex--.
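For anyone trying to picture a per-character matcher, here is one possible sketch. It is not the actual parse&lex code. It keeps a shrinking range of candidates in a sorted word list and waits for a delimiter before reporting a match. `StreamMatcher`, `matchWord` and the sample words are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Sample dictionary; must be sorted so that words sharing a prefix are adjacent.
static const char* const kWords[] = {"lamp", "look", "north", "open", "take"};
constexpr std::size_t kWordCount = sizeof kWords / sizeof kWords[0];

// Keeps the range [lo, hi) of dictionary words that still match the
// characters seen so far, so the work is spread across incoming characters.
struct StreamMatcher {
    std::size_t lo = 0;
    std::size_t hi = kWordCount;
    std::size_t pos = 0;  // number of characters consumed for the current word

    // Feed one non-NUL character of the current word; narrows the range.
    void feed(char c) {
        const unsigned char ch = static_cast<unsigned char>(c);
        while (lo < hi && static_cast<unsigned char>(kWords[lo][pos]) < ch) ++lo;
        std::size_t end = lo;
        while (end < hi && static_cast<unsigned char>(kWords[end][pos]) == ch) ++end;
        hi = end;
        ++pos;
    }

    // Call at a delimiter. Returns the 1-based word number, or 0 for no match,
    // and resets the matcher for the next word.
    std::uint16_t finish() {
        std::uint16_t number = 0;
        if (lo < hi && kWords[lo][pos] == '\0') {
            number = static_cast<std::uint16_t>(lo + 1);
        }
        lo = 0;
        hi = kWordCount;
        pos = 0;
        return number;
    }
};

// Convenience wrapper: feeds a whole word, then treats its end as the delimiter.
std::uint16_t matchWord(StreamMatcher& m, const char* text) {
    for (; *text != '\0'; ++text) m.feed(*text);
    return m.finish();
}
```

In a sketch you would call `feed` for each character as it arrives from the keyboard or file and `finish` on a space or newline. The linear scans here are fine for a small list. A large dictionary would want a binary search on each character.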

Parse is where you cut text into segments (usually at delimiters). Lex is where the segments get matched to known patterns. My little utility does both at the same time with no need to stop or pause input. ==Big Caveat== to my little utility is that getting a big dictionary into flash takes a LOT of source code.

So text segments can be changed into numbers and numbers can be changed to text (word X is at pointer[X-1]) back and forth. You could keep your dialogs, etc, compressed in flash as arrays of numbers.
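The number-to-text direction can be sketched in a few lines. This is only an illustration with a made-up six-word list and dialog line. `kWordList`, `kDialog` and `expand` are hypothetical names, and on an AVR the tables would additionally need PROGMEM handling.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Sample sorted word list; word X is stored at kWordList[X - 1].
static const char* const kWordList[] = {"a", "brass", "here", "is", "lantern", "there"};
constexpr std::size_t kWordListSize = sizeof kWordList / sizeof kWordList[0];

// A line of dialog stored as word numbers instead of characters:
// "there is a brass lantern here".
static const std::uint16_t kDialog[] = {6, 4, 1, 2, 5, 3};
constexpr std::size_t kDialogLength = sizeof kDialog / sizeof kDialog[0];

// Expands word numbers back into text, separating words with single spaces.
// Numbers outside the dictionary are skipped.
std::string expand(const std::uint16_t* tokens, std::size_t count) {
    std::string text;
    for (std::size_t i = 0; i < count; ++i) {
        const std::uint16_t number = tokens[i];
        if (number == 0 || number > kWordListSize) continue;
        if (!text.empty()) text += ' ';
        text += kWordList[number - 1];
    }
    return text;
}
```

Here the six tokens take 12 bytes against 29 characters of plain text. Punctuation and capitalization would need their own tokens or rules, which this sketch ignores.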

These protocols you want have overhead of their own, were any designed to run on small environment devices?

TobiasRipper:
just going to teach me about... linux....

Sounds to me like a good thing to learn.

an rpi won't teach me squat about MCUs...

I don't intend to sound unkind but it seems to me the first thing to learn is the difference between a project that is suited to an MCU and project that makes more sense on a PC. To my mind this is a PC project (in which group I include the RPi).

And if you don't want to spend a huge amount of time with Linux then develop the project in Python on a Windows PC and when it works just copy it to the RPi.

...R

An average novel is supposed to be about 1 megabyte of text. That's too much for the on-chip memory of most Arduinos (even the Due). But text by its nature is slow, so in theory you’d be fine with an external memory (SPI flash or a microSD card). But you’d probably need to look into data formats designed for a file system rather than things like JSON or XML, which I’d describe as “in memory” formats.

There used to be a lot of discussion of file/disk structures in “data structures” books, back when RAM was tinier...

See also Playing Zork On The Arduino | Hackaday

That's why I suggested using a dictionary to tokenize the book. Both would take less than half the space.

When 64K was a lot of RAM my 8-bit work machine had 32K for system, program and data. I got a lot of practice at paring things down to what's needed rather than all I could want. Even when PC's got bigger my ways went far to kill featuritis in projects.

GoForSmoke:
That's why I suggested using a dictionary to tokenize the book. Both would take less than half the space.

When 64K was a lot of RAM my 8-bit work machine had 32K for system, program and data. I got a lot of practice at paring things down to what's needed rather than all I could want. Even when PC's got bigger my ways went far to kill featuritis in projects.

Considering the memory limitations, I'm starting to think that this is indeed what needs to be done. But just out of curiosity: can we not expand the memory? The Due has 2 SPI buses, so can we not have an SPI memory extension of, like, 5 megabytes?

Like for example something like this:

16 megabytes of memory. More than enough to load audio and hefty JSON / XML / Text files for parsing and work from there? Any disadvantages to this? it's 2.7 bucks. (Although I've just confused the SPI flash with SRAM. As someone has pointed out, it's like comparing hard drives and ram cards.) When the task is to play multiple audio files AND read / write to JSON, it kinda makes sense to just load everything into ram and work with it from there instead of working in chunks?

EDIT: actually I can put multiple SPI devices on the bus and just have to change the CS (Chip Select) pin to select which device I want to talk to. I like SPI even more already!

Can we not expand the memory? The Due has 2 SPI busses so can we not have an spi memory extension for like, 5 megabytes?

The SAM3X does not have the capability of treating SPI chips as "memory", so you'd be back to dealing with it much like a file system...
Some of the Adafruit "CircuitPython" boards already incorporate a significant external SPI memory, usually 2MB, but some boards have 8MB (Hallowing, PyPortal, NeoTrellis, Grand Central). The SAMD51 boards have QSPI flash that is at least theoretically mappable as memory (but I don't think that their current code does that :( )

TobiasRipper:
But just out of curiosity. Can we not expand the memory? The Due has 2 SPI busses so can we not have an spi memory extension for like, 5 megabytes?

Isn't that like cutting up and extending a Golf GTI to make a people carrier when it would be much easier just to buy a Renault Espace?

...R