1M baud Optiboot for mighty1284p

retrolefty:
Looks like some kind of Satanic number sequence to me, be afraid, very afraid.

XD

john1993:
speaking of which im surprised to hear ... that serial is faster than isp.

The difference is polling. With ISP, the target has to be polled for completion at the SPI bitrate #. With a bootloader the machine instruction that initiates programming simply completes.

flash page programming time is the real bottleneck.

Only after the programming circuitry is saturated.

# Or the programmer has to wait twice the erase-program time listed in the datasheet. Some AVR processors do not support polling for completion.

Not only that but some of the code inside avrdude is pretty dumb.
The AVRdragon is very slow in some cases because of avrdude.
In some cases avrdude poorly layered an internal byte API on top of the USB block interface.
As a result, it sometimes fetches a 512 byte block over the USB for every single byte.
So the same 512 bytes of memory ends up getting fetched 512 times to read all of it.
barf....
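To illustrate the layering problem described above, here is a minimal sketch. It is not avrdude's actual code: `usb_read_block`, `read_byte_naive`, and `read_byte_cached` are hypothetical names, and the "USB transfer" is simulated with a memcpy plus a counter. It contrasts a byte API that fetches a whole block per byte with one that remembers the last block fetched.

```c
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 512u

static uint8_t  target_mem[4 * BLOCK_SIZE]; /* pretend target memory */
static unsigned usb_transfers;              /* counts simulated block reads */

/* Hypothetical stand-in for one 512-byte USB block transfer. */
static void usb_read_block(uint32_t block, uint8_t *buf)
{
    memcpy(buf, &target_mem[block * BLOCK_SIZE], BLOCK_SIZE);
    usb_transfers++;
}

/* Naive byte API: one full block transfer for every byte requested. */
static uint8_t read_byte_naive(uint32_t addr)
{
    uint8_t buf[BLOCK_SIZE];
    usb_read_block(addr / BLOCK_SIZE, buf);
    return buf[addr % BLOCK_SIZE];
}

/* Cached byte API: keep the most recently fetched block around. */
static uint8_t  cache[BLOCK_SIZE];
static uint32_t cached_block;
static int      cache_valid;

static uint8_t read_byte_cached(uint32_t addr)
{
    uint32_t block = addr / BLOCK_SIZE;
    if (!cache_valid || block != cached_block) {
        usb_read_block(block, cache);  /* only on a cache miss */
        cached_block = block;
        cache_valid = 1;
    }
    return cache[addr % BLOCK_SIZE];
}
```

Reading 512 consecutive bytes through the naive version costs 512 simulated transfers; the cached version costs one.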

--- bill

retrolefty:
Looks like some kind of Satanic number sequence to me, be afraid, very afraid.

Now don't anyone go leaking this to Dan Brown. If anyone makes a million on "The Arduino Code" it ought to be us ]:slight_smile:

bperrybap:
Not only that but some of the code inside avrdude is pretty dumb. The AVRdragon is very slow in some cases because of avrdude.

usbasp too. and much worse under windows compared to linux. my statement about little benefit beyond 57k and none over 115k referred to my preferred platform of real mode dos using parallel port with custom turbo c code. or one avr hosting images for another being flashed. BAD os, no tv for you tonight. speaking of satanic, i realize im preaching necronomicon to the choir here but i see little point to 1mbaud and many drawbacks.

and i dont see why one would ignore spec and wait 2x longer for erase,

Hmmmm, I've been looking at the optiboot.c source code, mainly because I'm interested in the possibility of doing wireless uploading, and I'm kind of surprised that you guys are able to get 1-Mbaud comms at all. Namely, serial Rx is done using polled I/O and not interrupt-driven I/O, as far as I can tell, so I'm surprised the bootloader can even keep up with the incoming datastream, and not lose characters at those speeds, considering how the main-loop is written with so many if...elses that need filtering.

If you look at the source, you'll see that Rx comms is done using the getch() function, which is defined as follows at the end of the file [unneeded bits removed]. Also, ignore the LED_DATA_FLASH bits [they just slow it down a bit more].

uint8_t getch(void) {
  uint8_t ch;

#ifdef LED_DATA_FLASH
#ifdef __AVR_ATmega8__
  LED_PORT ^= _BV(LED);
#else
  LED_PIN |= _BV(LED);
#endif
#endif

  while(!(UCSR0A & _BV(RXC0)))
    ;
  if (!(UCSR0A & _BV(FE0))) {
      /*
       * A Framing Error indicates (probably) that something is talking
       * to us at the wrong bit rate.  Assume that this is because it
       * expects to be talking to the application, and DON'T reset the
       * watchdog.  This should cause the bootloader to abort and run
       * the application "soon", if it keeps happening.  (Note that we
       * don't care that an invalid char is returned...)
       */
    watchdogReset();
  }
  ch = UDR0;

#ifdef LED_DATA_FLASH
#ifdef __AVR_ATmega8__
  LED_PORT ^= _BV(LED);
#else
  LED_PIN |= _BV(LED);
#endif
#endif

  return ch;
}

oric_dan:
I'm kind of surprised that you guys are able to get 1-Mbaud comms at all.

I did not have any problems at 2 M either. The performance gain was marginal so I stuck with 1 M.

Namely, serial Rx is done using polled I/O and not interrupt-driven I/O, as far as I can tell, so I'm surprised the bootloader can even keep up with the incoming datastream, and not lose characters at those speeds, considering how the main-loop is written with so many if...elses that need filtering.

At 1 Mbps a byte (10 bits on the wire, counting start and stop) takes 10 microseconds, which is 160 cycles at 16 MHz. Seems reasonable to me.
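The arithmetic behind that figure can be sketched as a small helper. The function name is made up for illustration, and it assumes 8N1 framing (10 bits per byte) and takes the CPU clock as a parameter; the 16 MHz value is inferred from the 160-cycle figure, not stated in the post.

```c
#include <stdint.h>

/* CPU cycles available between consecutive received bytes.
 * Assumes 10 bits on the wire per byte (8N1: start + 8 data + stop). */
static uint32_t cycles_per_byte(uint32_t f_cpu_hz, uint32_t baud)
{
    const uint32_t bits_per_byte = 10;
    /* 64-bit intermediate so f_cpu_hz * 10 cannot overflow. */
    return (uint32_t)(((uint64_t)f_cpu_hz * bits_per_byte) / baud);
}
```

Under those assumptions `cycles_per_byte(16000000, 1000000)` gives 160, and at 2 M the budget halves to 80 cycles per byte.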

john1993:
and i dont see why one would ignore spec and wait 2x longer for erase,

I'm not the one to ask. I'm just the messenger reporting what I've seen.

My guess is that the person did not want to wade through many datasheets to ensure the delay was long enough for every possible target. (Better to have it work slowly than not at all.)

john1993:
i would like to see a no-led version for 1284 like posted in the other thread but 57k.

Give this one a try. Build command for Windows...

omake LED_START_FLASHES=0 BAUD_RATE=57600 atmega1284

optiboot_atmega1284p_57k.hex (1.29 KB)

optiboot_atmega1284p_57k.lst (18.3 KB)

thank you. thank you. thank you. i will test this and report back asap.

ps especially grateful for the lst file which makes my job trimming in asm to add features a cinch. have i remembered to THANK YOU?

oric_dan:
I'm kind of surprised that you guys are able to get 1-Mbaud comms at all.

unless im mistaken stk500 protocol required pretty solid handshaking (wait for sp?) so its not likely to stream continuous. i think that was one of the reasons tests from the freaks fellow showed little benefit to high rates beyond 57k and none over 115k. law of diminishing returns..
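A rough way to reason about the diminishing returns is a per-page time model for a request/response upload. This is only a sketch: the function name and all parameters are illustrative assumptions, not measurements, and it ignores any overlap between receiving data and writing flash.

```c
/* Rough time in seconds to upload one flash page when every page is a
 * separate request/response exchange. All inputs are assumptions:
 *   page_bytes    - bytes sent per page
 *   baud          - serial bit rate
 *   turnaround_s  - host/USB round-trip latency per exchange
 *   flash_write_s - time the target spends erasing and writing the page */
static double page_time_s(double page_bytes, double baud,
                          double turnaround_s, double flash_write_s)
{
    double wire_s = page_bytes * 10.0 / baud; /* 10 bits per byte, 8N1 */
    return wire_s + turnaround_s + flash_write_s;
}
```

Only the `wire_s` term shrinks as the baud rate goes up. The turnaround and flash-write terms are a fixed floor per page, so how much a faster rate helps depends on how large that floor is on a given setup.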

but i am surprised 1m or even 115k works for so many people due to hardware issues like cable distortions, clock offsets, etc.. when it works for a particular setup it works, but then somebody else tries and... well...

Is anyone in a position to test the "overlapping flash write with serial comm" feature of optiboot? This is claimed to improve speed, but I have my doubts that there is a significant effect, and it's on the list of "things to remove if I need space."

westfw:
Is anyone in a position to test the "overlapping flash write with serial comm" feature of optiboot?

I am. All I need are the bootloaders you would like to test.

Has optiboot_atmega1284p-slow.hex and optiboot_atmega1284p-fast.hex
each compiled with: make atmega1284 LED_START_FLASHES=0 BAUD_RATE=1000000

Thanks!

I'm out of awake time today. :sleeping: I'll try to knock out some numbers while I eat breakfast.

Take your time. Consider how long any change would take to boil down to arduino production code, anyway.
(and it's not actually interesting on a 1284, since that has plenty of code space...)

westfw:
Take your time.

I can't. If I delay doing it my attention will drift to something else and I'll never get back to it.

Consider how long any change would take to boil down to arduino production code, anyway.

No doubt!

(and it's not actually interesting on a 1284, since that has plenty of code space...)

If you have time, please make a pair for the m328 processor. I would like to run the same test on an Uno.

Results are in the first few rows...

The differences...
Small image: 0.00%
Medium image: -0.02%
Large image: -0.19%

...are essentially irrelevant. For the large image the "improvement" is 0.07 seconds.

If there is anything you would like me to change (e.g. the avrdude version) just let me know.

If you have time, please make a pair for the m328 processor. I would like to run the same test on an Uno.

Done. 115.2k and 1M versions:

More results...

Some observations...

• The processor-to-PC speed is significantly better with the m16u2 than the FTDI. That was my observation the last time I ran tests like these. The best serial speed I observed previously was with the Pololu Programmer (it includes a USB-to-serial converter). I did not test with the Pololu this time.

• At 1 M there is no measurable difference between "fast" and "slow". That was also true for the m1284 processor.

• At 115 k there is a significant difference between "fast" and "slow".

• There is a very significant difference between 1 M and 115 k.

A 1 M slow no-LED bootloader would be a good path forward. The upload times are significantly reduced, there is no performance benefit from the fast bootloader, the slow bootloader frees a bit of space, and the no-LED frees a bit of space.

If there is anything else you would like tested or you want any details about the test just let me know.

There are several parameters in the FTDI chip that can be tuned.
Things like the latency timer can start to be very critical when pushing up on the speeds
particularly in a request/response environment like this.

There are also similar parameters in the drivers.
The settings for "normal" situations often hurt in environments that want/need
low latency fast turn around times.

Also how the application (avrdude) does its reads/writes or the mode it puts
the tty port in can start to make a difference.

While measuring things can provide a performance snapshot,
I'm never a fan of just measuring things when looking for tweaks to performance
as it really doesn't tell enough of the picture.
The best way would be to start actually profiling the entire system (avrdude, USB bus, optiboot)
to see where the bottlenecks are, because they can often be in unlikely places.

--- bill