Can someone share a small dataset of memory dumps from Arduino microcontrollers? I need to do forensics on an Arduino and present a short demo. I have extracted the flash, EEPROM and fuses and decompiled the flash, and now I want to train a model that can detect things like "this Arduino is a keylogger".
Hi @2002coxerea2002. I doubt that anyone has such a thing on hand. You should just produce it yourself.
You can automate the process using the official command line tool provided by Arduino: Arduino CLI. Just assemble a collection of Arduino sketches from the Internet and then write a script that will cause Arduino CLI to compile or upload each in turn, then gather the data from that process for training your model.
For flash alone, there is no need to upload the binary to the Arduino board and then extract it again. You can simply generate the binary on your PC and then decompile that binary. This means you can even generate data for boards you don't have on hand.
You can use arduino-cli compile --export-binaries for this purpose:
https://arduino.github.io/arduino-cli/latest/commands/arduino-cli_compile/
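A minimal sketch of that automation in Python, assuming arduino-cli is on your PATH, the `arduino:avr` core is installed, and your sketches live in a hypothetical `sketches/` folder with one sketch per subfolder (arduino-cli expects the `.ino` file to have the same name as its folder). It only uses the documented `--fqbn` and `--export-binaries` flags. The exact subfolder that `--export-binaries` writes to can vary between arduino-cli versions, so the helper searches `build/` recursively for `.hex` files instead of hard-coding a path.

```python
import shutil
import subprocess
from pathlib import Path

# Assumption: building for an Uno; change the FQBN for other boards.
DEFAULT_FQBN = "arduino:avr:uno"


def compile_command(sketch_dir: Path, fqbn: str = DEFAULT_FQBN) -> list[str]:
    """arduino-cli invocation that compiles one sketch and exports
    the built binaries into the sketch folder."""
    return ["arduino-cli", "compile", "--fqbn", fqbn,
            "--export-binaries", str(sketch_dir)]


def find_exported_hex(sketch_dir: Path) -> list[Path]:
    """Collect the .hex files exported under <sketch>/build/.
    The subfolder layout is an assumption, so search recursively."""
    return sorted(sketch_dir.glob("build/**/*.hex"))


def compile_all(sketches_root: Path, fqbn: str = DEFAULT_FQBN) -> dict[str, bool]:
    """Compile every sketch folder under sketches_root and record
    whether each build succeeded (exit code 0)."""
    results: dict[str, bool] = {}
    for sketch_dir in sorted(p for p in sketches_root.iterdir() if p.is_dir()):
        proc = subprocess.run(compile_command(sketch_dir, fqbn),
                              capture_output=True, text=True)
        results[sketch_dir.name] = proc.returncode == 0
    return results


if __name__ == "__main__":
    root = Path("sketches")  # hypothetical input folder
    if shutil.which("arduino-cli") is None or not root.is_dir():
        print("arduino-cli or the sketches/ folder was not found")
    else:
        for name, ok in compile_all(root).items():
            print(name, "ok" if ok else "FAILED", find_exported_hex(root / name))
```

Failed builds are worth keeping in the log: many sketches collected from the Internet depend on libraries you would first have to install with `arduino-cli lib install`.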
I don't know how to do it. For example, I should use the SRAM (because it contains everything that is active at runtime), but I can't extract it without special equipment such as an Atmel-ICE. So if I build my own dataset, how many samples should it contain? (100-200?)
If anyone wants to discuss this subject, please use the dedicated topic created by @2002coxerea2002:
As many as it takes to get the quality of results you need from the model.
I recommend you just dive in and see what happens. Start with a small collection of sketches and just the decompiled binary data alone. That will be relatively fast and easy to accomplish and will give you a good idea of the feasibility of your project and what challenges you might encounter in generating a larger data set and training the model.
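As a hedged starting point for "just the binary data alone", the sketch below parses the Intel HEX files that an AVR build produces and turns them into a 256-bin byte-frequency vector. This is an assumption about a reasonable first feature, not a recommended method: it works on the raw bytes rather than on the decompiled output, and a real detector would likely need richer features. The function names are hypothetical.

```python
from collections import Counter


def parse_intel_hex(text: str) -> dict[int, int]:
    """Return {address: byte} for the data records of an Intel HEX file."""
    memory: dict[int, int] = {}
    base = 0  # upper address bits set by record types 02/04
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        if not line.startswith(":"):
            raise ValueError(f"line {line_no}: missing ':'")
        raw = bytes.fromhex(line[1:])
        # Every record: length, address (2 bytes), type, data..., checksum
        if len(raw) < 5 or len(raw) != raw[0] + 5:
            raise ValueError(f"line {line_no}: bad record length")
        if sum(raw) & 0xFF != 0:
            raise ValueError(f"line {line_no}: bad checksum")
        length, addr, rtype = raw[0], (raw[1] << 8) | raw[2], raw[3]
        data = raw[4:4 + length]
        if rtype == 0x00:    # data record
            for i, b in enumerate(data):
                memory[base + addr + i] = b
        elif rtype == 0x01:  # end of file
            break
        elif rtype == 0x02:  # extended segment address
            base = ((data[0] << 8) | data[1]) << 4
        elif rtype == 0x04:  # extended linear address
            base = ((data[0] << 8) | data[1]) << 16
    return memory


def byte_histogram(memory: dict[int, int]) -> list[float]:
    """Normalized frequency of each byte value 0..255 in the image."""
    counts = Counter(memory.values())
    total = len(memory) or 1
    return [counts.get(b, 0) / total for b in range(256)]
```

For example, `parse_intel_hex(":03000000010203F7\n:00000001FF\n")` yields three bytes at addresses 0 to 2, and `byte_histogram` gives each of the values 1, 2 and 3 a weight of 1/3.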
I think for tomorrow's presentation I should do a code vulnerability check. I am still waiting for some boards, and until they arrive I think I can do that. I was thinking of using Big-Vul so I can find the vulnerabilities in the C/C++ code. After that I will simulate a bunch of scenarios and build my own dataset.
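When building your own dataset from simulated scenarios, one simple way to attach labels is to let the folder structure carry them. The sketch below assumes a hypothetical layout `dataset/<label>/.../<sample>.hex` (for example `dataset/keylogger/` and `dataset/benign/`) and writes a CSV manifest of `(path, label)` pairs that a training script could read. The layout, labels and function names are illustrative assumptions only.

```python
import csv
from pathlib import Path


def build_manifest(dataset_root: Path) -> list[tuple[str, str]]:
    """(sample_path, label) pairs; the label is the name of the
    top-level folder each .hex file sits under."""
    rows: list[tuple[str, str]] = []
    for label_dir in sorted(p for p in dataset_root.iterdir() if p.is_dir()):
        for sample in sorted(label_dir.rglob("*.hex")):
            rows.append((str(sample), label_dir.name))
    return rows


def write_manifest(rows: list[tuple[str, str]], out_csv: Path) -> None:
    """Save the manifest with a header row."""
    with out_csv.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["path", "label"])
        writer.writerows(rows)
```

Keeping the labels in the directory names makes it easy to add a new scenario later: create a folder and drop its compiled samples into it.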
Note what I said here:
You can compile for any board, even if you don't have the actual hardware on hand.
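To illustrate compiling for boards you don't own, this sketch builds one `arduino-cli compile` command per (sketch, board) pair. The FQBN list is an assumption (each board's platform must be installed, e.g. with `arduino-cli core install arduino:avr`), and it uses the `--output-dir` flag so each board's build artifacts land in a separate, hypothetical `out/<board>/<sketch>` folder instead of overwriting each other.

```python
from itertools import product
from pathlib import Path

# Hypothetical selection of AVR boards from the arduino:avr platform.
FQBNS = ["arduino:avr:uno", "arduino:avr:nano", "arduino:avr:mega"]


def build_matrix(sketch_dirs: list[Path], fqbns: list[str],
                 out_root: Path = Path("out")) -> list[list[str]]:
    """One compile command per (sketch, board) pair, each with its
    own output directory derived from the FQBN and sketch name."""
    commands = []
    for sketch, fqbn in product(sketch_dirs, fqbns):
        out_dir = out_root / fqbn.replace(":", "_") / sketch.name
        commands.append(["arduino-cli", "compile", "--fqbn", fqbn,
                         "--output-dir", str(out_dir), str(sketch)])
    return commands
```

Each command can then be passed to `subprocess.run`, so N sketches and M boards give N x M training samples without any hardware attached.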