STM32, Maple and Maple mini port to IDE 1.5.x


Has anyone actually compiled the bootloader ?

I may need to do this for the zet board, as I may want to move the DISC pin connection


I'm taking a look at whether it's possible to avoid having to specify bootloader or non-bootloader (maple) in the device selection

It appears to be possible to define a symbol that is passed into the linker script .ld file

However, as the linker is not called separately (it's called by gcc), the syntax has proved to be a bit convoluted, and it's taken me about half an hour to get to the point where the linker is possibly being passed a symbol (definition) by gcc, which is in turn being passed it from platform.txt, which is in turn getting it from boards.txt


I've turned on --verbose for the linker (well, it's more complicated than that, but I'll keep this explanation simple)

But the linker spits out so much text that the IDE debug window throws most of it away

So it looks like I'll need to manually run the commands on the command line and try to redirect stdout to a file

Anyway, it does look vaguely promising at the moment in terms of simplifying the menu options for users


Just thought I'd post this..

I ran the linker with --verbose and I'm seeing some errors where it's attempting to open files it can't find

Note the lines ending in "failed". I know that linking does succeed, but these failures may explain some other issues we have from time to time.

attempt to open libgcc.a failed
attempt to open C:\Users\rclark\Documents\Arduino\hardware\Arduino_STM32\STM32F1\variants\generic_stm32f103rxx/ld\libgcc.a failed
attempt to open C:\Users\rclark\AppData\Local\Temp\build2314974176069918292.tmp\libgcc.a failed
attempt to open c:/users/rclark/appdata/roaming/arduino15/packages/arduino/tools/arm-none-eabi-gcc/4.8.3-2014q1/bin/../lib/gcc/arm-none-eabi/4.8.3/armv7-m\libgcc.a succeeded
attempt to open libc.a failed
attempt to open C:\Users\rclark\Documents\Arduino\hardware\Arduino_STM32\STM32F1\variants\generic_stm32f103rxx/ld\libc.a failed
attempt to open C:\Users\rclark\AppData\Local\Temp\build2314974176069918292.tmp\libc.a failed
attempt to open c:/users/rclark/appdata/roaming/arduino15/packages/arduino/tools/arm-none-eabi-gcc/4.8.3-2014q1/bin/../lib/gcc/arm-none-eabi/4.8.3/armv7-m\libc.a failed
attempt to open c:/users/rclark/appdata/roaming/arduino15/packages/arduino/tools/arm-none-eabi-gcc/4.8.3-2014q1/bin/../lib/gcc/arm-none-eabi/4.8.3/../../../../arm-none-eabi/lib/armv7-m\libc.a succeeded
attempt to open libm.a failed
attempt to open C:\Users\rclark\Documents\Arduino\hardware\Arduino_STM32\STM32F1\variants\generic_stm32f103rxx/ld\libm.a failed
attempt to open C:\Users\rclark\AppData\Local\Temp\build2314974176069918292.tmp\libm.a failed
attempt to open c:/users/rclark/appdata/roaming/arduino15/packages/arduino/tools/arm-none-eabi-gcc/4.8.3-2014q1/bin/../lib/gcc/arm-none-eabi/4.8.3/armv7-m\libm.a failed
attempt to open c:/users/rclark/appdata/roaming/arduino15/packages/arduino/tools/arm-none-eabi-gcc/4.8.3-2014q1/bin/../lib/gcc/arm-none-eabi/4.8.3/../../../../arm-none-eabi/lib/armv7-m\libm.a succeeded
attempt to open libnosys.a failed
attempt to open C:\Users\rclark\Documents\Arduino\hardware\Arduino_STM32\STM32F1\variants\generic_stm32f103rxx/ld\libnosys.a failed
attempt to open C:\Users\rclark\AppData\Local\Temp\build2314974176069918292.tmp\libnosys.a failed
attempt to open c:/users/rclark/appdata/roaming/arduino15/packages/arduino/tools/arm-none-eabi-gcc/4.8.3-2014q1/bin/../lib/gcc/arm-none-eabi/4.8.3/armv7-m\libnosys.a failed
attempt to open c:/users/rclark/appdata/roaming/arduino15/packages/arduino/tools/arm-none-eabi-gcc/4.8.3-2014q1/bin/../lib/gcc/arm-none-eabi/4.8.3/../../../../arm-none-eabi/lib/armv7-m\libnosys.a succeeded


Scrub that

It looks like it eventually finds the files on the third attempt in each case.

Still not ideal. You'd imagine you could pass in the correct path to stop it needing to do this



Although I managed to pass a symbol (define) into the linker script, I've now found that symbols are not supported in MEMORY section definitions in linker scripts

There seem to be some hacky workarounds that may work in one version of LD, but they don't seem to be a stable way to do this.

But thinking about it, there may be another way: concatenating the processor type and another flag / variable in the boards.txt file to define the file name of the linker script.



e.g. stm32f103re_bootloader.ld and stm32f103re_no_bootloader.ld

made of "stm32f103re" and "_bootloader" or "_no_bootloader"

I will see if that works, and let you know.

(I just thought I'd post in case anyone else decides to look at this in the future and thinks they can use symbols in the linker script for MEMORY definitions)


Well, I thought I'd cracked it by splitting the name of linker file into 2 pieces so that it would be possible to specify different linker files based on the settings from more than one menu.


I've now realised that there is a fundamental problem with the IDE's size checking code that I can't see a way around

The problem is that if the upload method is "Maple dfu", the amount of Flash that is available for the sketch is reduced by 0x5000 and the amount of Ram is reduced by 3k. (Note: I don't know why there is 3k less Ram, because I wouldn't have thought that the bootloader could be running at the same time as the main sketch, but perhaps it is still running off interrupts etc. I've no idea.)

Anyway, assuming that about 20k of Flash and 3k of Ram are taken up by the bootloader, the values that are passed to the IDE need to reflect the selected device (i.e. its Flash and Ram size) less 20k of Flash and 3k of Ram when Maple DFU is selected for upload

But I can't see any way to do this.

e.g. for a F103RE with bootloader the values appear to be

It looks like the IDE just reads in those values as strings, and then converts them to numbers.

I tried seeing if the IDE could read in a string and evaluate the number e.g. 492 * 1000 but the IDE gives an error if you try to do that.
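Since the IDE only accepts literal numbers here, the adjusted sizes have to be worked out by hand and pasted in. A minimal sketch of that arithmetic, assuming an STM32F103RE with 512 KB of Flash and 64 KB of Ram (datasheet figures, not quoted in this thread) and the 0x5000 / 3k reservations mentioned above. All names are made up for illustration and are not part of the IDE or libmaple:

```cpp
#include <cstdint>

// Illustrative type only; not part of the IDE or of libmaple.
struct MemoryBudget {
    std::uint32_t flashBytes;
    std::uint32_t ramBytes;
};

// Assumed device totals for an STM32F103RE (512 KB Flash, 64 KB Ram).
constexpr MemoryBudget kF103RE{512u * 1024u, 64u * 1024u};

// Reservations quoted in the thread for the "Maple dfu" upload method.
constexpr std::uint32_t kBootloaderFlash = 0x5000u;     // 20 KB
constexpr std::uint32_t kBootloaderRam   = 3u * 1024u;  // 3 KB

// Space left for the sketch, with or without the bootloader present.
constexpr MemoryBudget sketchBudget(MemoryBudget device, bool hasBootloader) {
    return hasBootloader
        ? MemoryBudget{device.flashBytes - kBootloaderFlash,
                       device.ramBytes - kBootloaderRam}
        : device;
}

// 512 KB - 20 KB = 492 KB, i.e. 492 * 1024 bytes.
static_assert(sketchBudget(kF103RE, true).flashBytes == 492u * 1024u,
              "bootloader build leaves 492 KB of Flash");
```

Under those assumptions, the numbers to paste in for the bootloader case would be sketchBudget(kF103RE, true).flashBytes and .ramBytes, and the unmodified device totals otherwise.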

Hence at the moment, if we want the size calculation to be meaningful, the only option seems to be to tie whether the device has a bootloader or not to the device type selection. This is what Alexey (@hiddenpilot) did ages ago, and what I have needed to do in the F103R series board definition that I've started to create.

PS. I guess we could move stuff to the programmers menu, but it won't resolve this issue AFAIK, and may be even worse than what we have now, unless params from the programmers menu can be fed back into the build process, e.g. as build flags. I suspect this is not possible, as it doesn't normally make any sense for the selection of programmer to have any impact on the build of the bin file ???

@Dan I didn't get a chance. I was going to do it with my SD card transfers, and then started having problems with the SD card and forgot about it. But I do remember that I tested reducing the speed of the SPI port to see when I would start losing performance, and I believe I was still getting almost 2 MB/s at 18 MHz SPI speed, which is close to the theoretical maximum, so in DMA mode I don't think there is any big pause.
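For reference, the ceiling mentioned here is just the SPI clock divided by 8 bits per byte. A minimal sketch, assuming back-to-back frames with no idle clocks between them (the function name is made up for illustration):

```cpp
// Upper bound on SPI payload rate in bytes per second: one bit is shifted
// out per clock, and gaps between frames are assumed to be zero.
constexpr double spiMaxBytesPerSecond(double clockHz) {
    return clockHz / 8.0;
}

// At 18 MHz the bound is 2.25 million bytes per second.
static_assert(spiMaxBytesPerSecond(18e6) == 2.25e6, "18 MHz / 8");
```

So "almost 2 MB/s" at 18 MHz is within roughly 10% of what the bus could carry at all.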

@Everyone Now I have been doing tests with 16 bit SPI transfer, and I am very satisfied. These are my findings:

SPI1 can change between 8bit and 16bit on the fly with little penalty. I wasn't doing it in the most optimal way (call to function that calls to function to set it, 1 x16bit transfer, call to function that calls to another function to set back to 8bit, this for every 16bit write) and still the penalty was not too big and almost offset by the increase.

Then I decided to test having 16 bit by default, and setting to 8bit and back to 16bit for each 8bit transfer, as 1-byte transfers are less likely to occur when driving the ILI9163. The performance advantage is huge, about 20% faster.

Non-DMA 8bit SPI:
Benchmark                  Time (microseconds)
Screen fill                missing
Text                       12570
Lines                      102366
Horiz/Vert Lines           17690
Rectangles (outline)       14426
Rectangles (filled)        318796
Circles (filled)           52487
Circles (outline)          43167
Triangles (outline)        43270
Triangles (filled)         108377
Rounded rects (outline)    29813
Rounded rects (filled)     349670
Done!

Non-DMA 16bit SPI:
Benchmark                  Time (microseconds)
Screen fill                145994
Text                       10217
Lines                      84726
Horiz/Vert Lines           12876
Rectangles (outline)       10586
Rectangles (filled)        230131
Circles (filled)           40143
Circles (outline)          35721
Triangles (outline)        35820
Triangles (filled)         79841
Rounded rects (outline)    23562
Rounded rects (filled)     253727
Done!
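To put a number on "about 20%", here is a small sketch that computes the fraction of time saved for three of the non-DMA rows above. The helper name is made up; the figures are copied from the two runs:

```cpp
// Fraction of run time saved when a benchmark drops from `before` to
// `after`. Both arguments are in microseconds; the result is in 0..1.
constexpr double timeSaved(double before, double after) {
    return (before - after) / before;
}

// Non-DMA figures from the thread: 8bit SPI first, 16bit SPI second.
constexpr double kTextSaved       = timeSaved(12570.0, 10217.0);
constexpr double kLinesSaved      = timeSaved(102366.0, 84726.0);
constexpr double kFilledRectSaved = timeSaved(318796.0, 230131.0);

// Sanity check: every row sampled here lands between 15% and 30%.
static_assert(kTextSaved > 0.15 && kTextSaved < 0.30, "text");
static_assert(kLinesSaved > 0.15 && kLinesSaved < 0.30, "lines");
static_assert(kFilledRectSaved > 0.15 && kFilledRectSaved < 0.30, "filled rects");
```

That works out to roughly 19% for text, 17% for lines and 28% for filled rectangles, so "about 20%" is a fair summary of these rows.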

I decided to write a new dmaSend function to set up 16bit DMA to SPI1 and test again. As you can see, in fills there isn't much advantage, as filling a block with DMA was keeping the port busy all the time anyway, but it helps with all the other things that involve many commands to address pixels, like lines, circles, etc.:

DMA 8 bit:
Benchmark                  Time (microseconds)
Lines                      55227
Horiz/Vert Lines           5267
Rectangles (outline)       4451
Rectangles (filled)        65831
Circles (filled)           35149
Circles (outline)          29088
Triangles (outline)        18358
Triangles (filled)         44238
Rounded rects (outline)    24167
Rounded rects (filled)     85590
Done!

DMA+16bit SPI:
Benchmark                  Time (microseconds)
Screen fill                149422
Text                       11887
Lines                      42941
Horiz/Vert Lines           5190
Rectangles (outline)       3903
Rectangles (filled)        65708
Circles (filled)           31063
Circles (outline)          22840
Triangles (outline)        15125
Triangles (filled)         41822
Rounded rects (outline)    19493
Rounded rects (filled)     84103
Done!


Was setting the SPI to 16 bit faster than just filling a buffer where you'd pre-split each 16 bit value into two 8 bit values (bytes)?

I guess that byte order was an issue in that case ??

It would only work for some displays, depending on whether they wanted the high byte first or second (endianness) ?

@Roger for the 9163 I had to invert the byte order of the color in the byte buffer, and the same for the 9341, when using 8bit DMA.

In 16bit SPI mode, I do not have to swap anything around. I can write the "color" half word straight to the buffer, and what is better, for drawing shapes I don't even need to fill any buffer, as I can set the DMA in circular mode and send the same 16 bits making up the pixel color over and over. I am keeping the buffer, as it would be needed for bitmaps in DMA mode, but I can write the color to the 1st position of the buffer and send it over and over. But even if it had to be swapped around, the performance increase seems to outweigh the byte swapping penalty.
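The byte swap needed in 8bit mode can be sketched as follows. This is plain C++ with no SPI or DMA calls, assuming a 16-bit color that the display wants high byte first and a little-endian core. The function names are illustrative and not taken from any of the libraries discussed here:

```cpp
#include <cstddef>
#include <cstdint>

// Fill an 8-bit transfer buffer with `pixels` copies of one 16-bit colour,
// high byte first, which is the wire order described for the 9163 and 9341.
// `buf` must hold at least 2 * pixels bytes.
inline void fillHighByteFirst(std::uint8_t* buf, std::size_t pixels,
                              std::uint16_t color) {
    for (std::size_t i = 0; i < pixels; ++i) {
        buf[2 * i]     = static_cast<std::uint8_t>(color >> 8);
        buf[2 * i + 1] = static_cast<std::uint8_t>(color & 0xFF);
    }
}

// Equivalent swap for a buffer that is built from 16-bit values on a
// little-endian core but then shifted out one byte at a time.
constexpr std::uint16_t swapBytes(std::uint16_t v) {
    return static_cast<std::uint16_t>((v << 8) | (v >> 8));
}

static_assert(swapBytes(0xF800) == 0x00F8, "high and low bytes exchanged");
```

With 16bit frames the peripheral shifts out each half word as one unit, so neither helper is needed, which matches the observation above.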

These are the times when using 16bit SPI and circular DMA mode (I suspect I'm doing something wrong in the screen fill as I get the same times as non-DMA mode):
Benchmark                  Time (microseconds)
Screen fill                149420
Text                       11752
Lines                      42773
Horiz/Vert Lines           4037
Rectangles (outline)       3722
Rectangles (filled)        65417
Circles (filled)           28601
Circles (outline)          22526
Triangles (outline)        14833
Triangles (filled)         34940
Rounded rects (outline)    18761
Rounded rects (filled)     77148
Done!

Regarding the 3k with the maple dfu, I was thinking that the USB serial port routine would use some of that, as it is monitoring the serial port to restart the board on the right sequence (1EAF...). Wouldn't that need some memory to run besides what you use for the sketch? 3k seems rather high, but that's the only thing I could think of.


OK about SPI 16 bit mode, but strange it's slow

This could be a hardware bug in the STM32. STM released a document that lists a load of hardware bugs in the uP, and has work-arounds for most of them. But I can’t recall what the doc is called. Perhaps Errata or something like that

Re: 3K for bootloader

AFAIK the bootloader does not handle the USB serial comms; that's handled by the sketch, i.e. every sketch has it built in.

I guess there is one way to test. I could build a bootloader version but change the linker settings so that it gets built to upload to 0x08000000, i.e. the base of flash, instead of 0x08005000

Then perhaps the resulting sketch would have USB Serial … Umm worth a test.

It's been discussed before, but I don’t understand why the bootloader didn't contain both DFU and Serial code for USB. There would need to be some fancy footwork (function tables) so that the bootloader could communicate with the sketch for the serial stuff, but considering how complex libmaple already is, I can’t see it would have been beyond Marti’s (leaflabs) capabilities to code it

I may try having a go at running a maple where I set the ram start to the base of ram rather than the 3k offset and see if it runs (again fairly easy to change the linker settings to test this)

Roger: Maybe keep in mind, if you are compiling a new bootloader, to give this a try:

@victor: I got the sdfat library partially working (you set the clock divider to 128, I set it back to 2 or 4), but I think it's more unstable than the Arduino one. I don't know why; maybe because of the length of my jumper cables? I have to build a solid vero board with fixed connections and some additional electrolytic and 100 nF caps for the voltage lines. Meanwhile my 4GB SD card isn't recognized anymore (with both libraries, but on the PC it works...). SD cards are a misery...



That bootloader looks interesting, I'll download and see if I can compile it

Re: My other tests..

The Serial USB is totally contained in the sketch

I had to modify this in boards.cpp

//  #define USER_ADDR_ROM 0x08005000

    #define USER_ADDR_ROM 0x08000000

which it uses in nvic_init() which I think is the vector table stuff, and I defined -DBOOTLOADER_MAPLE in boards.txt for a non-maple board

But I now have a STM32F103ZE board that is enumerating as USB Serial (Maple) and also flashing its test LED and writing stuff to the new serial com port COM4

Which proves that the USB Serial stuff is fully self contained in the sketch

I have yet to try changing the linker MEMORY section for a maple mini to see if I can recover the 3K (I can on the non maple board, but it doesn't have the bootloader in flash)

I'll let you know (perhaps later) whether we can free up 3K on Maple mini


That other bootloader looks interesting but has not been updated for years

It looks like there are very recent changes / fixes in the leaflabs repo

I just need to work out how to compile

Possibly under MinGW on my Windows machine. I will just need to copy the compiler into the MinGW-accessible home folder or perhaps its bin folder

Edit. I tried to change the ram offset, and initially it looked like it had worked, but now my Maple mini won't seem to run the sketches at all. It is not enumerating as a serial USB device, though I can upload via perpetual bootloader

I'm not sure if I've somehow corrupted the bootloader, so I will have to try to upload the bootloader again via USB serial

Actually, I've managed to cause 2 Maple minis to stop working over the last 3 months, but perhaps they just need the bootloader to be reinstated

Quick thinking/question: Is that "3k of ram" in the bootloader the "reserve" for the function "upload to RAM"?

Compiling bootloader: In my github link there are many interesting "readme" files for how to compile.

forum link for compiling bootloaders:

Hi Matthias

I'm not sure what the RAM is used for.

I may try again, because I really can't see why it needs it, because the main sketch runs independently of the bootloader.

BTW. Does anyone know where the source for the "Maple Mini" bootloader is ?

All I can find is the source for the full Maple board, and looking at the schematic diagram, the GPIO pin they use for USB disconnection is different between the two boards. The Maple Mini seems to use PB9, but the Maple Rev 3 is on PC12, and it looks like the button is on a different pin, and I suspect the LED is probably also on a different pin

But I can't see any build configs that would allow the LeafLabs or the necromant bootloader to be built for Maple mini

Necromant's repo has separate board folders, so it would be possible to make a board folder for the mini, but it doesn't seem to exist at the moment.

I think I'll post an issue to both repos, but I suspect that neither will reply :-(

"Branch" is your friend :)

and a little step-by-step manual how to compile bootloader (osx only)


Thanks Matthias

I'm not sure that's really what branches are for, but never mind ;-)


I can see that they have done loads of fixes on the Master, but it doesn't look like any have been merged into the maple mini branch

I guess I could just try locally merging Master into the mini branch !

BTW. I still don't think branching is the best way to do this, as really it's just a config issue and should probably be done in the makefile
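As a rough illustration of the "config, not branch" idea, the board differences could live in one table selected by a compile-time flag passed from the makefile with -D. Everything below is hypothetical: the type names and the BOARD_MAPLE_MINI macro are invented, and only the USB disconnect pins (PC12 on Maple Rev 3, PB9 on Maple Mini, as read from the schematics above) come from this thread. The button and LED pins are left out because they have not been confirmed:

```cpp
// Hypothetical per-board description; not taken from either bootloader repo.
struct GpioPin {
    char port;      // 'A', 'B', 'C', ...
    unsigned pin;   // 0..15
};

struct BootloaderBoard {
    const char* name;
    GpioPin usbDisconnect;  // pin that drives the USB DISC circuit
};

// Pin assignments as read from the schematics mentioned in the thread.
constexpr BootloaderBoard kMapleRev3{"maple", {'C', 12}};
constexpr BootloaderBoard kMapleMini{"maple_mini", {'B', 9}};

// BOARD_MAPLE_MINI is an invented macro that a makefile could pass as
// -DBOARD_MAPLE_MINI; without it the full Maple settings are used.
#if defined(BOARD_MAPLE_MINI)
constexpr BootloaderBoard kBoard = kMapleMini;
#else
constexpr BootloaderBoard kBoard = kMapleRev3;
#endif
```

The rest of the bootloader would then refer to kBoard.usbDisconnect instead of hard-coded pin numbers, so one source tree could build both images.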

You are welcome Roger. I didn't know this before I was on the blogspot page, and I thought "I'm mad?!?!? I was on the leaflabs github page 1000s of times before!"

the haunting ghost of leaflabs is everywhere, even on github :)


Just tried to git merge the master into the maple mini branch and it looks like it's very out of date.

6 files need to be manually merged

I think it may be easier just to go back to when the original branch for maple mini was created and work out what the differences are. I doubt it's very much apart from which pins are used

Anyway, it's getting too late to start this now.

I'll look again tomorrow

I compared all the files a little bit. As far as I can see, the whole difference (pins, ...) between "maple" and "maple mini" is confined to a few files. Maybe compile a bootloader from the master branch with the mini config.h? And yes, all the mini bootloader files are really out of date, without several bugfixes... I hope there is another source, and that the compiled *.bin for the mini is from a newer build date...

Edit: I've put the original mini files: hardware.h / hardware.c / config.h (I hope I haven't forgotten a file) into the newer maple bootloader source and compiled it without error, but I've no FTDI at work to try out the new bootloader... maybe at home.

On OSX compiling is very easy with homebrew:

Step 1 - Install Homebrew
Step 2 - Install GCC Arm Toolchain

brew tap PX4/homebrew-px4
brew update
brew install gcc-arm-none-eabi-48

Now you can just use the "make" command inside the terminal (and the right folder ;) )

Just to clarify on 16bit vs 8bit SPI:

-For DMA transfers, performance is exactly the same no matter whether it is set to 16bit or 8bit. It will just transfer as fast as the SPI port can shift out. The times above for DMA+16bit SPI mean that 16bit transfers were done all the time, including all the pixel addressing, which involves a few individual SPI.write calls with a pair of bytes for each coordinate. The Rectangles time is almost the same when using 8bit or 16bit, because for each rectangle you only write the coordinates once without DMA, then fill a few hundred bytes with DMA, so that shows that the performance of 8bit vs 16bit DMA is the same.

-For normal SPI transfers, using 16bit speeds things up considerably. I guess the advantage comes from paying the overhead of reading registers, while loops, pulling lines up and down, etc. only once every 2 bytes. That is why in anything with an angle, which requires a lot of pixel addressing, the performance is much better.

Pixel addressing consists of 1x 8bit command (which can be sent as 16bit with the command in the lower 8 bits), and then 4x 16bit coordinates. That is where the speed-up comes from, as those get written in 4 writes rather than 8.
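The write count can be spelled out with a tiny helper. This is a sketch that only counts data-register writes under the simplified "1 command + 4 coordinates" model described above; the function name is made up:

```cpp
// Number of SPI data-register writes needed to send `halfWords` 16-bit
// values when each write carries `frameBits` bits (8 or 16).
constexpr unsigned writesFor(unsigned halfWords, unsigned frameBits) {
    return halfWords * (16u / frameBits);
}

// One command write plus the four 16-bit coordinates per addressed pixel.
constexpr unsigned writesPerAddress(unsigned frameBits) {
    return 1u + writesFor(4u, frameBits);
}

static_assert(writesFor(4u, 8u) == 8u, "coordinates as bytes");
static_assert(writesFor(4u, 16u) == 4u, "coordinates as half words");
```

Counting the command as well, that is 9 writes per addressed pixel in 8bit mode against 5 in 16bit mode, so the per-write overhead is paid a little over half as often.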