Option to include sketch in flash?

Grumpy_Mike:

So, what do you think?

Well it would make the downloading process much longer and it would limit the program size, just so someone who forgot what code was downloaded into the system could get it back.
Not much of a gain for the pain.

As I said in my original post: "... offer a per-sketch option ...". I am not suggesting this be forced on all the users. I would personally use it all the time because my sketches tend to be simple sequencing programs which are not particularly large, but I certainly understand that it isn't for everybody.

Cheers,

  • Dean

The idea has usefulness that is somewhat inversely proportional to how likely it is to work.
It works really well for small sketches like BLINK, but who cares? As your sketch gets bigger and more complicated, it would be more useful, but it's less likely to work :frowning:

Saving a copy of the source doesn't get rid of the need for reasonable discipline in version control. If you have 16k of source with no other markings (like version numbers), and are trying to figure out what it does compared to other source for the same sketch that is 6 months newer (or older), you're probably ... in trouble. In fact, carefully marking up the source code (with an "edit history") is what I did back before I'd heard of version control software, and it's still a good idea.

I wonder if some intermediate level of information stored in flash would be nearly as useful? Sketch name and date, perhaps (+ username + computername + ip address? Gotta make those arduino chips traceable in case someone uses them for evil purposes.)
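
A minimal sketch of the "sketch name and date" idea, assuming the standard predefined macros `__FILE__`, `__DATE__` and `__TIME__` are acceptable as the stamp. The names `BUILD_INFO` and `buildInfoLooksComplete` are hypothetical and only there for illustration.

```cpp
#include <cstring>

// Hypothetical build stamp: source file name plus compile date and time.
// __FILE__, __DATE__ and __TIME__ are standard predefined macros, so the
// compiler fills the string in on every build.
const char BUILD_INFO[] =
    "<BUILD file=" __FILE__ " date=" __DATE__ " time=" __TIME__ ">";

// Returns true if the stamp carries all three fields.
bool buildInfoLooksComplete() {
    return std::strstr(BUILD_INFO, "file=") != nullptr &&
           std::strstr(BUILD_INFO, "date=") != nullptr &&
           std::strstr(BUILD_INFO, "time=") != nullptr;
}
```

On an AVR the string would also occupy RAM unless it is placed in program memory, and an unreferenced array may be dropped by the compiler or linker, so treat this as a starting point only.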

westfw:
The idea has usefulness that is somewhat inversely proportional to how likely it is to work.

It works really well for small sketches like BLINK, but who cares? As your sketch gets bigger and more complicated, it would be more useful, but it's less likely to work :frowning:

It would work for much more complicated sketches than just Blink. It either works or it doesn't, and you can discover that at the time you compile your sketch so you can adjust your development style accordingly. I think some folks would use this all the time when possible, and some folks would never use it.

westfw:
Saving a copy of the source doesn't get rid of the need for reasonable discipline in version control.

Sure it does. Obviously if there are intermediate versions that I care about, then it is up to me to archive them responsibly - this is true of any content I create. But with Arduino, most of the time I only care about the code that is on the board right now. In fact, I can't think of a single time that I've needed an arbitrary intermediate version of a sketch, but I can think of dozens of times that I've had to go back and make sure I picked up development from the right version of a sketch.

To provide a more concrete example: I make haunted house animatronics, and I use lots of Arduinos to run servos, pneumatics, lights, etc. The sketches are pretty straightforward sequencing (wait for a PIR sensor to fire, cycle through a fixed sequence and then reset). Each prop is hand made and slightly unique. Also, each year, the exact needs of the haunted house may be slightly different (might want to tweak the timing).

So, when I am setting up a prop that needs tweaking, I have to count on my notes to make sure I'm not loading the wrong firmware. I'd much rather plug in my laptop (or any random laptop for that matter), extract/tweak/upload - done. No chance that I picked the wrong firmware, and I really don't care about the timings that the haunted house 3 years ago used.

I agree that from a computer science purist perspective it is a sloppy approach, but in this real-world situation it would absolutely save me time and reduce mistakes.

westfw:
If you have 16k of source with no other markings (like version numbers), and are trying to figure out what it does compared to other source for the same sketch that is 6 months newer (or older), you're probably ... in trouble.

I'm not suggesting that. I am simply suggesting putting the current source with the executable. It isn't intended to help you compare versions, or determine history. It is just intended to keep the source and the product together in one place.

Version control is always a good idea. But having a copy of the source on the device that you are 100% sure matches the code on the device eliminates all variables and mitigates bad version control practices, hard drive failures, etc. It also works if you give the project to a friend and they want to continue hacking on it.

westfw:
I wonder if some intermediate level of information stored in flash would be nearly as useful? Sketch name and date, perhaps (+ username + computername + ip address? Gotta make those arduino chips traceable in case someone uses them for evil purposes.)

That would be fantastic, and I'd support getting that info into the flash 100% of the time. I'd also love to see the IDE include integrated version control, even if it is just something trivial and linear. But all that aside, I still don't see why putting the source code into flash would ever be a bad thing if it fits.

Cheers,

  • Dean

I'd hate to encourage people to make their source code smaller by, say, omitting comments. (Those were the "bad old days"!)

This suggestion is more in the spirit of making the development process simpler, as opposed to improving the developer.

So what we need is some extra code that runs under the upload button of the IDE (or in the make file) that:

  • adds a unique identifier to the code: uint8_t UUID[] = genUUID(); (see RFC 4122, http://www.ietf.org/rfc/rfc4122.txt, and the Wikipedia article "Universally unique identifier")
    16 bytes in binary / 32 bytes in ascii format, maybe add some tags around:
    "<UUID=12345678912345678912345678900000>" makes 39 bytes
    "BBBBBBBBBBBBBBBB" makes 22 bytes (B = binary encoded)
    the UUID can easily be extracted from hexdump

  • appends the UUID to some text file that contains:
      • date, time
      • source name, location
      • more ...
  • or better, writes a copy of the source to your personal cloud ....\myCloud\ArduinoSketches$UUID$.pde

This approach would cost only 22 bytes of flash memory and gives the means to recapture the sources of all sketches made. The size of the sketch is not relevant, because a reference to the sketch is stored rather than the sketch itself. The reciprocal usefulness (thanks westfw for this magnificent term) is replaced by a constant usefulness :wink:

Besides the UUID, info like filename, date and time, purpose, etc. could be included as ASCII, but that costs substantially more bytes (overkill => don't).
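
The ASCII tag variant described above could be sketched on the host side as follows. This is only an illustration under assumptions: `makeUuidTag` and `findUuidInDump` are hypothetical names, the UUID bytes are supplied by the caller rather than generated, and the dump is assumed to be a raw binary image already read back from the chip.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Formats 16 raw UUID bytes as the 39-character ASCII tag "<UUID=...>".
std::string makeUuidTag(const uint8_t (&uuid)[16]) {
    static const char kHex[] = "0123456789abcdef";
    std::string tag = "<UUID=";
    for (uint8_t b : uuid) {
        tag += kHex[b >> 4];
        tag += kHex[b & 0x0F];
    }
    tag += '>';
    return tag;
}

// Scans a raw flash dump for the tag and returns the 32 hex digits,
// or an empty string if no complete tag is present.
std::string findUuidInDump(const std::vector<uint8_t>& image) {
    static const std::string kPrefix = "<UUID=";
    const std::size_t kTagLen = kPrefix.size() + 32 + 1;  // prefix + digits + '>'
    for (std::size_t i = 0; i + kTagLen <= image.size(); ++i) {
        bool match = true;
        for (std::size_t j = 0; j < kPrefix.size() && match; ++j)
            match = (image[i + j] == static_cast<uint8_t>(kPrefix[j]));
        if (match && image[i + kTagLen - 1] == '>') {
            std::string digits;
            for (std::size_t j = kPrefix.size(); j + 1 < kTagLen; ++j)
                digits += static_cast<char>(image[i + j]);
            return digits;
        }
    }
    return "";
}
```

The extracted digits would then serve as the lookup key into whatever text file or archive holds the date, source name and location.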

Other discussion:
// should the IDE get something like javadoc ==> ArduinoDoc?
// Javadoc is a tool for generating API documentation in HTML format from doc comments in source code.

my 2 cents
Rob

robtillaart:
... or better writes a copy of the source to your personal cloud ....\myCloud\ArduinoSketches$UUID$.pde

That would be a great feature, and it could be used to provide a full revision history as well. That would probably be a more widely useful feature than what I am asking for. I can see a few disadvantages, although I admit they are minor:

  1. Cloud pollution. Thousands of sketch versions building up over the years, and you don't know which 20 you actually need to keep.
  2. Handoff. When somebody else wants to tweak one of my creations, they may not have access to my cloud
  3. No internet. I've had to set up a few haunted houses in areas with no wifi (parking garages), though that is a shrinking problem these days.

I'm going to stick to my guns and say I still want an option to include the sketch in the flash. Here's why:

I am a believer in simplifying things as much as practical. If the source is in the flash, then a stranger can pick up an object I created 10 years prior and start hacking it. There is a reason that scripting languages are so popular: the source IS the executable. Scripting is computationally less efficient than running compiled code, and yet it is extremely popular. This is at least in part because you don't have to worry about losing the source and you can always tweak a script. Saving the sketch in the flash gives you the simple development cycle of a scripted language with the computational efficiency of a compiled language.

I checked and only one of my projects is too big to fit both the executable and the sketch in 32K, and that project was complex enough that I had done proper source control on it anyway. Dozens of my other sketches would easily fit. So, all that unused flash is busy storing nothing but 0xff's. I'd rather put it to use doing something potentially helpful than let it sit there unused.

Again I ask, where is the harm in an option to include the source if it fits?

Cheers,

  • Dean

Again I ask, where is the harm in an option to include the source if it fits?

There is no harm as long as it is an option which is default off.

The harm can be that

  • I want to have access to the source of the sketch, but don't want to publish the source to my customer (intellectual property).
  • Larger sketches do not fit anymore (mentioned before),
    especially well-documented sketches => I have several sketches that are 25K+ in size
  • one needs to store the libraries used too if you want to be able to debug it in detail ... or at least their version numbers
  • IDE version, compiler version etc. need to be stored too (that is not so many bytes)
  • ....

Maybe even more, but if I must be able to recreate the PC configuration I needed to compile and debug this app, one needs to go this far.
(In the medical IT world, companies even keep working Windows 3.1 compiler configurations because customers still have them in the field. That is the only way to give 100% service.)
In theory - and I think quite far in practice - my proposal can realize this. It can store the compile environment, including version numbers of the IDE, avrtools and OS name/version (although it doesn't need to back those up).

OK, for hobby purposes it doesn't need to be so "extreme", but just to give you food for thought :wink:

Rob

robtillaart:

Again I ask, where is the harm in an option to include the source if it fits?

There is no harm as long as it is an option which is default off.

I have never suggested any other approach. Strictly an opt-in behavior.

robtillaart:
The harm can be that

  • I want to have access to the source of the sketch, but don't want to publish the source to my customer (intellectual property).

Of course. Opt-in makes this a non-issue.

robtillaart:

  • Larger sketches do not fit anymore (mentioned before),
    especially well-documented sketches => I have several sketches that are 25K+ in size

If it doesn't fit, then it won't get written to flash and there can be no harm.
In every one of my posts, I've indicated this should be an option, and it should only happen if it can fit. I don't see how I could be more clear on these points.
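
The "only if it fits" rule amounts to simple arithmetic at upload time. A minimal sketch, assuming the upload tool knows the flash size, the bootloader size, the compiled size and the source size; `sourceFitsInFlash` is a hypothetical name and the 2048-byte bootloader used in the usage note is an assumption, not a figure from this thread.

```cpp
#include <cstdint>

// Returns true if the source text fits in the flash left over after the
// bootloader and the compiled sketch. All sizes are in bytes.
bool sourceFitsInFlash(uint32_t flashSize, uint32_t bootloaderSize,
                       uint32_t binarySize, uint32_t sourceSize) {
    if (bootloaderSize > flashSize) return false;
    const uint32_t usable = flashSize - bootloaderSize;
    if (binarySize > usable) return false;  // the sketch itself does not fit
    return sourceSize <= usable - binarySize;
}
```

For example, `sourceFitsInFlash(32768, 2048, 1908, 400)` is true, while a 20000-byte binary with 25600 bytes of source is rejected and the upload would proceed without the source.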

robtillaart:

  • one needs to store the libraries used too if you want to be able to debug it in detail ... or at least their version numbers
  • IDE version, compiler version etc. need to be stored too (that is not so many bytes)
  • ....

Are these things in the sketch file? That is pretty much all that conventional source control schemes handle unless you go out of your way to manage the additional info somehow. The sketch file is all I've ever bothered to archive, and that has always been sufficient for my needs.

robtillaart:
Maybe even more, but if I must be able to recreate the PC configuration I needed to compile and debug this app, one needs to go this far.
(In the medical IT world, companies even keep working Windows 3.1 compiler configurations because customers still have them in the field. That is the only way to give 100% service.)
In theory - and I think quite far in practice - my proposal can realize this. It can store the compile environment, including version numbers of the IDE, avrtools and OS name/version (although it doesn't need to back those up).

OK, for hobby purposes it doesn't need to be so "extreme", but just to give you food for thought :wink:

These are issues that even large corporations wrestle with: How to reliably recreate the exact bits that were shipped for any arbitrary release. It is not a problem that I'm attempting to solve, nor is it something that most hobbyists care deeply about. I think just storing the source file is sufficient for the vast majority of uses, and would be an improvement over what is stored now (which is nothing but the executable).

To be clear: I am a fan of the cloud solution - it has some very nice features and could be the basis for some pretty awesome new capabilities. (Arduino development social network?). I am also a fan of optionally storing the sketch file when uploading if it will fit in the leftover space. I don't think these two capabilities need to be mutually exclusive.

Cheers,

  • Dean

Hi Dean,

Don't get me wrong, I like the concept of storing sketches in the device, it is an inspiring problem and got me thinking. It has several advantages (especially finding the right code as you pointed out) no doubt about that, but it is not a final solution.

This problem is called deployment management. [very recognizable]

As a developer I need a solution that I can trust; it should work every time I want it to. Storing sketches in the Arduino is not always possible (due to size) and therefore I cannot rely on it.

So if one wants to spend energy in solving deployment management for Arduino, we should think of a way that is:

  • transparent for the programmer - (don't do things that can be automated --> KISS)
  • configurable (switch on/off etc)
  • works for all deployments
  • robust, reliable
  • and so on.

Storing a sketch in an Arduino does not always work, as it fails on a crucial point, imho: SIZE. That doesn't mean it has no value; on the contrary, it can be very useful as you pointed out. I am just stating it isn't reliable enough. Storing a reference to the source (etc.) takes 16 bytes (UUID), which is 0.05% of the available memory and independent of sketch size. And yes, there will be applications that don't even have these 16 bytes free. A real final solution should even work for this case. That means that the reference to the code should be stored in the Arduino but at the same time can't be stored in the Arduino. This is a typical TRIZ contradiction.

Solving that contradiction => the binary code itself is the reference (mmm 32K keys, no good...)

Making 32K keys more practical: after uploading a sketch, AVRdude reads the complete memory back and computes a SHA1 hash to be used as a reference to the source code. So if an Arduino comes back through the mail, one can read its memory back, compute the SHA1 hash, and one has the reference to the sources.

That said, this reference will not be the only way to access the sources, full text indexing of all your sketches is very well possible these days, so such things need to be in the final solution too.

The complexity to realize SketchWithin, SHA1 and UUID is comparable. The differences between the SHA1 and UUID versions are

  • SHA1 will generate a new code for every source iteration, where with the UUID this is optional
  • UUID uses (at least) 16 bytes, SHA1 uses 0 bytes of Arduino memory
  • SHA1 will detect image tampering, UUID will not (key lost??)
  • UUID will probably be faster than SHA1.

My final choice would be using UUID, and the SHA1 at release moments. In the cases where I need the last 16 bytes, I should really consider a new, larger platform.

EPILOGUE:
In short, storing a sketch in the Arduino is useful in many cases, as you pointed out. However, it won't solve the "what version of code have I deployed" problem in all cases. The SHA1 and UUID solutions will perform better, especially for large sketches. My choice would be using UUID all the time, and the SHA1 at release moments. In the cases where I need the last 16 bytes, I should consider a new platform :slight_smile:

Again thanks for this inspiration,

Rob

Just use Git and make regular backups to external media of your choice, as everybody should.

It's so easy these days. 16GB USB sticks (40MB/s) cost almost nothing, external 2.5'' disks are almost free considering the storage space they offer. Unless we're talking about backing up a video collection.

If the AVR chips had 'quasi unlimited' storage, I wouldn't oppose as much though. But I would never rely on the source code being stored on the chip as well. Murphy's Law would get you anyway, trust me on that. It's much better to keep your valuable code in good condition and safe somewhere else.

It would be nice if the Arduino IDE had some sort of SCC functionality built in. I know I'd appreciate that.

Not meaning to get religious on the matter, and I freely admit that my familiarity with the Arduino toolchain is superficial, but unless there's a timestamp in the binary, or more debugging data than I'd expect (for a microcontroller), it ought to be possible to have different source files compile to the same binary, hence the SHA1 computed solely against the binary couldn't uniquely identify a set of sources. I'd expect that changes in comments wouldn't affect the binary, nor would changes in variable names. Those are a couple of examples I can think of; perhaps there are others. In either of those cases, you could certainly correctly state that they are equivalent sources, and perhaps therefore the discrepancies don't matter.

madworm:
Just use Git and make regular backups to external media of your choice, as everybody should.

It's so easy these days. 16GB USB sticks (40MB/s) cost almost nothing, external 2.5'' disks are almost free considering the storage space they offer. Unless we're talking about backing up a video collection.

If the AVR chips had 'quasi unlimited' storage, I wouldn't oppose as much though. But I would never rely on the source code being stored on the chip as well. Murphy's Law would get you anyway, trust me on that. It's much better to keep your valuable code in good condition and safe somewhere else.

I do use source control, and I do back up to external media, and I even make periodic off-site backups. I still have to keep careful notes to ensure I've got the right version of a project for a given board, and I still get it wrong sometimes because I forgot to update my notes when I did a quick last-minute rev on a sketch in the field. Losing source code is not the problem I'm trying to solve; I'm trying to come up with a foolproof way of associating a specific source version with the physical object it goes with.

I sometimes velcro a USB stick to larger things like test equipment to keep track of relevant source/drivers/scripts/manuals/notes. This works extremely well, except when somebody borrows it and loses the USB stick (in which case I make a new USB stick from my backups). I was just proposing a built-in version of this for Arduino.

Cheers,

  • Dean

robtillaart:
Don't get me wrong, I like the concept of storing sketches in the device, it is an inspiring problem and got me thinking. It has several advantages (especially finding the right code as you pointed out) no doubt about that, but it is not a final solution.

This problem is called deployment management. [very recognizable]

I've also heard it variously called Release Control, Release Engineering, Product Engineering, and other various things over the years. I have yet to find a company that has completely nailed it.

robtillaart:
As a developer I need a solution that I can trust; it should work every time I want it to. Storing sketches in the Arduino is not always possible (due to size) and therefore I cannot rely on it.

I find this an interesting view, because the '328 (or any processor) has a finite set of resources that your program could eventually outgrow. There are always "grey area" resources like circular logs in RAM or extra debug ports that you jettison along the way as resources get tight. Source-in-flash is just another such thing that you could use until the program outgrows it. You then decide that (1) it is OK to stop using this feature, or (2) it is valuable to your workflow and you find a bigger chip.

Embedded developers are always making tradeoffs about how to deploy the resources of the platform. Would you give up on circular log buffers or debugging ports because some designs don't have enough resources to afford them? Of course not - they are tools that you deploy when appropriate. Source-in-flash is just another tool to be deployed when appropriate. That doesn't make it unreliable, it just makes it another decision in the tradeoff calculations.

I do get your point that just storing a reference to the source is orders of magnitude less expensive than storing the source, drastically altering the tradeoff decision.

robtillaart:
So if one wants to spend energy in solving deployment management for Arduino, we should think of a way that is:

  • transparent for the programmer - (don't do things that can be automated --> KISS)
  • configurable (switch on/off etc)
  • works for all deployments
  • robust, reliable
  • and so on.

Storing a sketch in an Arduino does not always work, as it fails on a crucial point, imho: SIZE. That doesn't mean it has no value; on the contrary, it can be very useful as you pointed out. I am just stating it isn't reliable enough. Storing a reference to the source (etc.) takes 16 bytes (UUID), which is 0.05% of the available memory and independent of sketch size. And yes, there will be applications that don't even have these 16 bytes free. A real final solution should even work for this case. That means that the reference to the code should be stored in the Arduino but at the same time can't be stored in the Arduino. This is a typical TRIZ contradiction.

Solving that contradiction => the binary code itself is the reference (mmm 32K keys, no good...)

Making 32K keys more practical: after uploading a sketch, AVRdude reads the complete memory back and computes a SHA1 hash to be used as a reference to the source code. So if an Arduino comes back through the mail, one can read its memory back, compute the SHA1 hash, and one has the reference to the sources.

That said, this reference will not be the only way to access the sources, full text indexing of all your sketches is very well possible these days, so such things need to be in the final solution too.

The complexity to realize SketchWithin, SHA1 and UUID is comparable. The differences between the SHA1 and UUID versions are

  • SHA1 will generate a new code for every source iteration, where with the UUID this is optional
  • UUID uses (at least) 16 bytes, SHA1 uses 0 bytes of Arduino memory
  • SHA1 will detect image tampering, UUID will not (key lost??)
  • UUID will probably be faster than SHA1.

My final choice would be using UUID, and the SHA1 at release moments. In the cases where I need the last 16 bytes, I should really consider a new, larger platform.

EPILOGUE:
In short, storing a sketch in the Arduino is useful in many cases, as you pointed out. However, it won't solve the "what version of code have I deployed" problem in all cases. The SHA1 and UUID solutions will perform better, especially for large sketches. My choice would be using UUID all the time, and the SHA1 at release moments. In the cases where I need the last 16 bytes, I should consider a new platform :slight_smile:

Again thanks for this inspiration,

This reasoning all seems sound to me. I also agree that if you are down to the last 16 bytes and can't afford space for the UUID, you are probably ready to look for another platform. I think the utility of your solution would be high enough that I'd go find a way to shrink the image by 16 bytes rather than forgo the UUID.

As for the SHA1, I agree with tastewar that if it is a hash of just the executable image, then it won't capture things like updated comments, formatting changes, or even logic changes that compile down to the same opcode stream. If it is always used in conjunction with (and includes in its hash) the UUID, then it would cover trivial source changes. Also, I think you would need to know the range of memory over which to calculate the hash - you wouldn't want to do the full 32K since that may include garbage from prior uploads, and even runtime flash data.
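
The point about hashing only a known range can be sketched on the host side. This is an illustration under assumptions: the image is a raw binary dump already read back from the chip, the sketch size is taken from the build output, and `sha1OfSketchRegion` is a hypothetical name. The hash itself is a plain SHA-1 as specified in FIPS 180-4.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

static uint32_t rotl32(uint32_t x, int n) { return (x << n) | (x >> (32 - n)); }

// SHA-1 over the first `sketchSize` bytes of `image`, returned as lowercase hex.
// Bytes beyond `sketchSize` (stale pages, runtime data) are ignored.
std::string sha1OfSketchRegion(const std::vector<uint8_t>& image, std::size_t sketchSize) {
    if (sketchSize > image.size()) sketchSize = image.size();

    // Copy the region and pad it: 0x80, zeros, then the 64-bit big-endian bit length.
    std::vector<uint8_t> msg(image.begin(),
                             image.begin() + static_cast<std::ptrdiff_t>(sketchSize));
    msg.push_back(0x80);
    while (msg.size() % 64 != 56) msg.push_back(0x00);
    const uint64_t bitLen = static_cast<uint64_t>(sketchSize) * 8;
    for (int shift = 56; shift >= 0; shift -= 8)
        msg.push_back(static_cast<uint8_t>(bitLen >> shift));

    uint32_t h[5] = {0x67452301u, 0xEFCDAB89u, 0x98BADCFEu, 0x10325476u, 0xC3D2E1F0u};
    for (std::size_t off = 0; off < msg.size(); off += 64) {
        uint32_t w[80];
        for (int i = 0; i < 16; ++i) {
            const std::size_t p = off + 4 * static_cast<std::size_t>(i);
            w[i] = (uint32_t(msg[p]) << 24) | (uint32_t(msg[p + 1]) << 16) |
                   (uint32_t(msg[p + 2]) << 8) | uint32_t(msg[p + 3]);
        }
        for (int i = 16; i < 80; ++i)
            w[i] = rotl32(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);

        uint32_t a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];
        for (int i = 0; i < 80; ++i) {
            uint32_t f, k;
            if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999u; }
            else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1u; }
            else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDCu; }
            else             { f = b ^ c ^ d;                   k = 0xCA62C1D6u; }
            const uint32_t t = rotl32(a, 5) + f + e + k + w[i];
            e = d; d = c; c = rotl32(b, 30); b = a; a = t;
        }
        h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
    }

    static const char kHex[] = "0123456789abcdef";
    std::string out;
    for (uint32_t word : h)
        for (int shift = 28; shift >= 0; shift -= 4)
            out += kHex[(word >> shift) & 0xF];
    return out;
}
```

Because only the first `sketchSize` bytes are hashed, leftovers elsewhere in the 32K dump do not change the result, which is what makes the hash usable as a lookup key.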

Cheers,

  • Dean

On the KISS theme, maybe go for the 80/20 rule for functionality vs. resource requirements. Just run a 32-bit CRC (or maybe a hash code) on the source file and store the result as a long in the flash (even simpler, in EEPROM). That would give the basic hook for a method to verify that the flash code was created from a specific source file, no?
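
A minimal host-side sketch of the CRC idea, assuming the common CRC-32 variant (reflected polynomial 0xEDB88320, as used by zlib and PNG) and that the source file has already been read into a string. `sourceCrc32` is a hypothetical name.

```cpp
#include <cstdint>
#include <string>

// Bitwise CRC-32 of the source text (reflected polynomial 0xEDB88320).
// Slower than a table-driven version, but short and easy to check.
uint32_t sourceCrc32(const std::string& source) {
    uint32_t crc = 0xFFFFFFFFu;
    for (unsigned char byte : source) {
        crc ^= byte;
        for (int bit = 0; bit < 8; ++bit) {
            // Shift right; XOR in the polynomial when the dropped bit was 1.
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return crc ^ 0xFFFFFFFFu;
}
```

The four-byte result is what would be written into the flash or EEPROM; to verify later, recompute the CRC of a candidate source file and compare it with the stored value.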

Lefty

The way I solve it for my real projects (not my tinker projects) is to have a version string in the code containing the name and version number (approx. 25 bytes). This string is printed at startup, so as long as the device can restart it will report its name and version. If it can't be restarted, it is time to upgrade anyway. This method is not foolproof, as I don't update the version number with every increment ...

If time permits, I'll check today whether I can get a UUID signature from a hexdump, and whether the compiler optimizes it away.

If it's possible with AVR executable images, a digital signature would solve some of the problems. As an added bonus, if the developer keeps the signing key a secret, the provenance of an image can be determined.

OK, took some time today, ran a few tests with essentially some variations on the following code.

volatile char UUID[]  = "<UUID=da51a9f0-9a49-11e0-aa80-0800200c9a66>";

char version[] = "UUID_TEST 0.04";

void setup()
{
  Serial.begin(115200);
  Serial.println(version);
  UUID[0] = UUID[0];
}

void loop(){}

Then I retrieved the whole 32K image with avrdude [command prompt, Windows 7]:

cd C:\Program Files (x86)\arduino-0021\hardware\tools\avr\bin
avrdude -C "C:\Program Files (x86)\arduino-0022\hardware\tools\avr\etc\avrdude.conf" -v -v -v -v -p atmega328p -c stk500 -U flash:r:"C:/arduino.bin":r -P\\.\COM5 -b57600

viewing the binary easily reveals the UUID. See picture attached.

Some notes:
The UUID array must be volatile and the assignment UUID[0] = UUID[0] is needed as well; otherwise the compiler optimizes it away.

size without UUID string: 1908 bytes
size with UUID string: 1960 bytes

so this proof of concept implementation took 52 bytes.

  • Remove the tag structure <UUID=...> (-7) => 45 bytes
  • Remove - signs in the UUID (-4) => 41 bytes
  • Make the UUID binary (-17) => 24 bytes.

proof is in the pudding test:

volatile uint8_t UUID[] = { 0x65, 0x65, 0x65, 0x65, 0x65, 0x65, 0x65, 0x65, 
                          0x65, 0x65, 0x65, 0x65, 0x65, 0x65, 0x65, 0x65 };  // character e 16 times
                          
char version[] = "UUID_TEST 0.06";

void setup()
{
  Serial.begin(115200);
  Serial.println(version);
  UUID[0] = UUID[0];
}

void loop(){}

size without UUID uint8_t array : 1908 bytes
size with UUID uint8_t array: 1932 bytes
so UUID signature now takes 24 bytes.

OK tinkered enough :wink:
Rob

RAM is in short supply. You MUST figure out how to put the UUID in flash only (and not have it garbage collected.)

first try

#include <avr/pgmspace.h>

//volatile char UUID[]  = "<UUID=da51a9f0-9a49-11e0-aa80-0800200c9a66>";
volatile uint8_t UUID[] PROGMEM = 
            { 0x65, 0x65, 0x65, 0x65, 0x65, 0x65, 0x65, 0x65, 
              0x65, 0x65, 0x65, 0x65, 0x65, 0x65, 0x65, 0x65 };
                          
char version[] = "UUID_TEST 0.08";

void setup()
{
  Serial.begin(115200);
  Serial.println(version);
  uint8_t x = UUID[0];  // keeps UUID referenced; note that PROGMEM data is normally read with pgm_read_byte(&UUID[0])
}

void loop(){}

code size: 1928, so 4 bytes less, but ==> 16 bytes in PROGMEM and 4 in RAM; the data still needs to be referenced from RAM

gotta think deeper ....:wink:

2.5 hours and many many webpages later, dived into assembly to find out how to declare an array in assembly in flash.

char version[] = "UUID_TEST 0.10";

void setup()
{
  asm volatile(
  ".cseg"   // use code segment
  "uuid:  .byte 101,102,101,102,101,102,101,102,101,102,101,102,101,102,101,102"
  );
  Serial.begin(115200);
  Serial.println(version);
}

void loop(){}

code size: 1924 ; so the 16 bytes of the UUID in flash; no reference needed

filling the "uuid array" can also be done in hex

"uuid: .byte 0x65, 0x65, 0x65, 0x65, 0x65, 0x65, 0x65, 0x65,0x65, 0x65, 0x65, 0x65, 0x65, 0x65, 0x65, 0x65"

or a bit shorter

"uuid: .word 0x6565, 0x6565, 0x6565, 0x6565, 0x6565, 0x6565, 0x6565, 0x6565"

So in the end storing a signature in the flash part of the code is only a few lines.

Thanks westfw for pushing me to the limits, I learned a few bits...
Rob