Bit-band Cortex M3 - fast bit modifications in RAM explained

Hi, I've made a video explaining the bit-band feature of Cortex-M3 and Cortex-M4 processors. I also compare the compiled assembler output for the AVR and ARM architectures. I hope someone will find this interesting. I also have other interesting content on my YouTube channel. Let me know what you think.

Interesting, I was just reading about this today as I pored over Cortex-M7 documents. Cortex-M7/M7F do not support bit-banding.

I thought that perhaps the Arduino core internal libraries might use that technique for the DUE or ZERO for reading/writing I/O. But considering the issues regarding interrupts occurring during the bit-band process (race conditions) and the fact that GPIO might lie above the region where bit-banding is permitted, I imagine the coders decided to play it safe.

Thank you very much for sharing.

ODwyerPW:
But considering the issues regarding interrupts occurring during the bit-band process (race conditions)

Bitband operations are atomic, not only at the instruction level, but also at the bus access level. That means they're even atomic with respect to DMA.

Mostly bit banding is not used for gpio because the gpio peripherals have their own equivalent features that are at least theoretically more powerful and easier to use.

Expanding on the previous comments, the GPIO ports on ARM chips usually have "set bit" and "clear bit" registers that operate similarly to bit-banding, but can (for example) set or clear multiple bits at one time.
I've always sorta wanted to write a bit-banding based implementation of digitalWrite/digitalRead; it's not immediately obvious whether it could be "better" than a GPIO based implementation in some way. (although frankly, the current Due implementation of digitalWrite/Read is not-at-all limited by the raw IO speed, but rather by other inefficiencies. (sigh.))

(although since CM0 and CM0+ do not generally support bit-banding (and apparently not the M7 either?), the problem is less interesting.)

(as far as I can guess, bit-banding was introduced in CM3 specifically to compete with the fast bit-access instructions available on many 8bit CPUs. But it's never really caught on.)

hubmartin:
Hi, I've made a video explaining the bit-band feature of Cortex-M3 and Cortex-M4 processors. I also compare the compiled assembler output for the AVR and ARM architectures. I hope someone will find this interesting. I also have other interesting content on my YouTube channel. Let me know what you think.

https://www.youtube.com/watch?v=h78DyF1NOio

Your article is very helpful and appreciated; however, can you supply a link to the code only?
I know it is in the video, but I have not figured out how to copy just the code text, sorry.

I am looking into implementing SPI on PIO and it was suggested that bit banging may work.

Thanks
Jim

  • I'm about to set up a buffer in the bit-band region. How can I tell if another part of the code uses the address space?

  • Is it safe to assume that if the addresses are set to 0x0 during runtime, they are not used?

  • Are all register addresses set to zero on reset and then set up as needed?

with.JPG
without.JPG

images from
Yiu, Joseph. The Definitive Guide to ARM® Cortex®-M3 and Cortex®-M4 Processors.

great book with details on most subjects

  • I'm about to set up a buffer in the bit-band region. How can I tell if another part of the code uses the address space?

Huh? It doesn't matter...

  • Is it safe to assume that if the addresses are set to 0x0 during runtime, they are not used?

Probably not.

  • Are all register addresses set to zero on reset and then set up as needed?

Definitely not; register initial contents will be as described in the datasheet, and NOT only zero. There's a section in the datasheet (called "Register Mapping") for each peripheral showing the post-reset state. For example, the RTT "mode register" resets to 0x8000.
I'm not sure what this has to do with bit-banding, though.

You seem to misunderstand how bit-banding actually works. You don't get a special piece of memory that must be accessed via bit-banding; rather, you get bit-wise access to pieces of NORMAL memory that can still be accessed via the normal address space as well. So you need to allocate it via normal methods, and just ACCESS it via the bit-band alias pointers. The bit-band region typically includes ALL of the on-chip memory, so this is pretty easy:

/*
 * Allocate some chunks of memory, to be accessed as bits...
 */
uint32_t someBits[256/32];  // 256 bits.
uint32_t *lotsOfBits = malloc(320*240/8);  // screen-sized array of bits.

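/*
 * create pointers into the Bit band alias region for each chunk of memory.
 * each 32bit word in the alias region maps to a single bit, so the byte
 * offset from the start of SRAM (0x20000000) is multiplied by 32.
 */
volatile uint32_t *someBits_bitp = (volatile uint32_t *) (0x22000000 + (((uint32_t)someBits) - 0x20000000) * 32);
volatile uint32_t *lotsOfBits_bitp = (volatile uint32_t *) (0x22000000 + (((uint32_t)lotsOfBits) - 0x20000000) * 32);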

/*
 * Access single bits
 */

aBit = someBits_bitp[12];   // get bit 12.
someBits_bitp[200] = 1;     // set bit 200.
lotsOfBits_bitp[y*240+x] = 1;  // set a bit on our "screen"

(this is NOT tested code, BTW...)
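The address arithmetic can be checked on a PC before touching the hardware. The following is a minimal sketch, not tested on a board; the helper name bb_alias and the macro names are made up for illustration. It assumes the standard Cortex-M3/M4 layout: a 1 MB SRAM bit-band region at 0x20000000 and a 1 MB peripheral region at 0x40000000, each with its alias 0x02000000 above it.

```c
#include <stdint.h>

#define BB_SRAM         0x20000000u /* start of the SRAM bit-band region */
#define BB_PERIPH       0x40000000u /* start of the peripheral bit-band region */
#define BB_REGION_SIZE  0x00100000u /* each bit-band region is 1 MB long */
#define BB_ALIAS_OFFSET 0x02000000u /* alias starts this far above its region */

/* Hypothetical helper: address of the alias word for bit `bit` (0..7) of the
 * byte at `addr`. `region_base` is BB_SRAM or BB_PERIPH. Returns 0 if the
 * byte lies outside the bit-band region or the bit number is invalid. */
static uint32_t bb_alias(uint32_t region_base, uint32_t addr, uint32_t bit)
{
    if (addr < region_base || addr - region_base >= BB_REGION_SIZE || bit > 7u)
        return 0u;
    /* One 32-bit alias word per bit: 8 bits * 4 bytes = 32 alias bytes per byte. */
    return region_base + BB_ALIAS_OFFSET + (addr - region_base) * 32u + bit * 4u;
}
```

On the target, the returned value would be cast to a volatile uint32_t pointer and written with 0 or 1. A return value of 0 means the variable sits outside the region and cannot be bit-banded. The 1 MB region holds 8 Mbit, which is why each alias occupies 32 MB.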

Maybe I should have said: is there any way to tell if any part of the Arduino core uses the bit-banding memory addresses?

You have got at least 2 possibilities to obtain an atomic access to a variable.

One of them is the use of atomic assembler instructions LDREX/STREX designed for this purpose. If you are not familiar with inline assembler, use packaged atomic functions, e.g. here:

https://github.com/commaai/openpilot/blob/master/board/inc/core_cmInstr.h

The other one is to declare bit-field variables, e.g.:

struct foo_Type {
  uint8_t flip1: 1; // 1 bit
  uint8_t flip2: 4; // 4 bits, total 5 bits
  uint8_t pad1: 3;   // 3 bits, total 8 bits
  uint8_t pad2;     // 1 byte, total 2 bytes
  uint16_t pad3;    // 2 bytes, total 4 bytes
  uint32_t pad4;   // 4 bytes, total 8 bytes
};

foo_Type foo;

void setup() {
  
  Serial.begin(250000);
  
  while (true) {

    foo.flip1++;
    Serial.print(foo.flip1);
    Serial.print("  ");
    foo.flip2++;
    Serial.println(foo.flip2);
    delay(2000);
  }
}

void loop() {
}

Maybe I should have said: is there any way to tell if any part of the Arduino core uses the bit-banding memory addresses?

I'm pretty sure that none of the arduino core uses bit banding. It's ... relatively incompatible with C.

I still don't understand why it matters. Bit-banding gives you convenient and atomic access to individual bits, but it's no faster or better than the access you already had to bytes, and ARM libraries are historically and consistently "not good" at being space-efficient (after all, even a tiny ARM chip has LOTS of RAM compared to the bit-addressable memory area of an AVR or 8051, and I always figured it was a feature designed to appease users of such "classic" micros more than something that was truly useful...)

cheers, interesting link

could you explain this syntax for me, in the ""

uint8_t flip1":" 1;

AFAICT Bitfields are implemented with word or byte-level mask and rotate instructions. They give you the ability to declare structure fields smaller than a character width.
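To make the mask-and-shift idea concrete, here is a small sketch in plain C. The struct mirrors the flip1/flip2 fields from the earlier post, but uses unsigned int as the base type, which is the portable choice in C. The helpers field_write and field_read are hypothetical names and show roughly what the compiler has to do for each access; the exact instructions it emits will differ.

```c
#include <stdint.h>

/* The ": N" after a member name sets that member's width in bits. */
struct flags {
    unsigned int flip1 : 1;  /* holds 0..1  */
    unsigned int flip2 : 4;  /* holds 0..15 */
};

/* Hypothetical hand-written equivalent of "field = value" for a field that
 * is `width` bits wide (width < 32) and starts at bit `shift` of `word`. */
static uint32_t field_write(uint32_t word, unsigned shift, unsigned width,
                            uint32_t value)
{
    uint32_t mask = ((1u << width) - 1u) << shift;
    return (word & ~mask) | ((value << shift) & mask);
}

/* Hypothetical hand-written equivalent of reading the same field. */
static uint32_t field_read(uint32_t word, unsigned shift, unsigned width)
{
    return (word >> shift) & ((1u << width) - 1u);
}
```

Each update is a read-modify-write of the containing word, so an interrupt can land between the read and the write. A bit-field on its own therefore does not make an update atomic.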

Coming back to the subject of your post: you know whether one of your interrupts could change a variable while the main loop() is using that same variable to decide which branch to follow.

You can prevent an interrupt from modifying this variable between two instructions as follows:

Use NVIC_SetPriority() at the beginning of your sketch to give the handler a low preemption priority (preemption_priority), then enclose the "atomic" code with:

__set_BASEPRI( preemption_priority << (8 - __NVIC_PRIO_BITS)); //only IRQ with higher preemption priority than preemption_priority are permitted.
// Your code
__set_BASEPRI(0); // remove the BASEPRI masking
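The shift in that call is there because the implemented priority bits sit at the top of the 8-bit BASEPRI field. Here is a small sketch of the arithmetic, assuming 4 implemented priority bits; on a real build the count comes from the device header as __NVIC_PRIO_BITS. PRIO_BITS and basepri_value are names invented for this example.

```c
#include <stdint.h>

/* Assumption: 4 implemented priority bits. On a real build, use the
 * __NVIC_PRIO_BITS value from the device header instead. */
#define PRIO_BITS 4u

/* Value to write to BASEPRI so that interrupts whose priority number is
 * greater than or equal to `preemption_priority` are masked. Priority 0
 * cannot be masked this way: writing 0 to BASEPRI turns the masking off. */
static uint32_t basepri_value(uint32_t preemption_priority)
{
    return (preemption_priority << (8u - PRIO_BITS)) & 0xFFu;
}
```

With these assumptions, priority 5 gives 0x50. On the target it is usually safer to read the old value with __get_BASEPRI() first and restore it afterwards, rather than writing 0, so that protected sections can nest.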

(I don't think that there are any C compilers that use bit-banding, and it's probably just as well. The idea of being able to have individual bits as variables, even to the point of having pointers to bits, is interesting. But not at all widespread (and "only" addresses 8 megabits, which is nothing compared to what C compilers try to support today.))

cheers for the replies guys.

this was how i was looking at setting it up straight out of an atmel note

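#define BITBAND_SRAM_REF 0x20000000
#define BITBAND_SRAM_BASE 0x22000000
#define BITBAND_SRAM(a,b) ((BITBAND_SRAM_BASE + ((a)-BITBAND_SRAM_REF)*32 + ((b)*4))) // Convert SRAM address

#define MAILBOX 0x20004000 // example SRAM address (an assumption; use the address of your own variable)
// Mailbox bit 0
#define MBX_B0 *((volatile unsigned int *)(BITBAND_SRAM(MAILBOX,0)))
// Mailbox bit 7
#define MBX_B7 *((volatile unsigned int *)(BITBAND_SRAM(MAILBOX,7)))
unsigned int temp = 0;
MBX_B0 = 1; // Word write
temp = MBX_B7; // Word read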

if you can get the job done in one atomic move, surely that has to be better than increasing interrupt latency

I appreciate the efforts of everyone in dealing with bit-banding. I am looking at the datasheet of the Atmel SAM3X SoC: specifically Table 10-6 & Table 10-7 (pages 67-68) and the explanation of bit-banding below them. I have to admit to being a little confused, because I was given to understand that the point of bit-banding is to allow atomic RMW, but am I correct in thinking that only one bit at a time is atomic? So many registers have multiple bit fields that one would like to RMW, but what if it's only doing 1 bit at a time?

I note that synchronization primitives exist and are in the HINTS class of instruction so when you mention other methods, would this be one of them? Sorry if it's me being slow but I'm dreadful at coding anything except assembly language. I am just a little disappointed because I had presumed that ANY arbitrary 32-bit mask could be chosen so that atomic RMWs would do everything.

It just looks very limited.

I would just like to thank westfw for his valuable assembly-language project for the Arduino Uno M0. I've got a 32kbit/sec ACELP & 64kbit/sec mono, fixed-point decoders working on the machine (JUST!) and I've learned a lot about optimizing code for Thumb.

Clubcard:
many registers have multiple bit fields that it appears one would like to RMW but if it's only doing 1 bit at a time?

No, e.g. you can set or clear as many bits as you want in PIO_ODSR (as long as the pins exist in the MCU and are broken out):

const uint32_t piob_mask = (1 << 27) | (1 << 26) | (1 << 25);

void setup() {

  PMC->PMC_PCER0 = PMC_PCER0_PID12;           // PIOB power ON (write-only register, so no |=)
  PIOB->PIO_PER = piob_mask;                   // The pins are driven by the GPIO
  PIOB->PIO_OER = piob_mask;                   // Enable output
  PIOB->PIO_OWER = piob_mask;                  // Allow direct writes to PIO_ODSR for these pins

}

void loop() {

  PIOB->PIO_ODSR ^= piob_mask;             // Toggle masked pins (read-modify-write)
  delay(1000);

}

Regarding atomic access to variables, you can either use the exclusive-access assembler instructions (LDREX, LDREXB, LDREXH and STREX, STREXB, STREXH) or use these functions:

https://developer.arm.com/docs/dui0552/latest/the-cortex-m3-processor/memory-model/programming-hints-for-the-synchronization-primitives
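For a variable in RAM, an atomic update of an arbitrary multi-bit mask can also be written without inline assembler. Below is a minimal sketch using C11 <stdatomic.h> in a plain C file. It assumes your toolchain supports C11 atomics; GCC for Cortex-M3/M4 typically compiles these calls into an LDREX/STREX retry loop, but check the generated code for your build. The names status_word, status_set and status_clear are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared status word, touched by both the main loop and an ISR. */
static _Atomic uint32_t status_word;

/* Atomically set every bit in `mask`; returns the value before the update. */
static uint32_t status_set(uint32_t mask)
{
    return atomic_fetch_or(&status_word, mask);
}

/* Atomically clear every bit in `mask`; returns the value before the update. */
static uint32_t status_clear(uint32_t mask)
{
    return atomic_fetch_and(&status_word, ~mask);
}
```

Unlike bit-banding, this handles any number of bits in one operation, at the cost of a short retry loop if an interrupt intervenes. It is meant for ordinary variables; whether exclusive accesses work on peripheral registers depends on the chip.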

You might be interested in the book "The Definitive Guide to ARM Cortex-M3 and Cortex-M4 Processors" by Joseph Yiu. This book contains lots of examples written in assembler.

am I correct in thinking that only one bit at a time is atomic?

Yes, I'm pretty sure that ARM Cortex bit-banding is one-bit-at-a-time.

It just looks very limited.

Yes, but there are similar single-bit-only features on 8bit processors (8051 has a section of bit-addressable memory, and AVR has SBI and CBI (which people are constantly reminding us are SO MUCH FASTER THAN DIGITALWRITE()), plus SBRC, SBRS, SBIC, SBIS instructions). AFAICT, these features are mostly useful for conserving RAM and program space, which was a lot more important back when a "big" chip had 8k of ROM and 128 bytes of RAM, rather than a "small" chip having 16k of flash and 2k of RAM. (Theoretically, anyway. I observe that it's pretty common to use up that extra memory at a furious rate, when it's assumed to be available.)

Interestingly, NXP (Freescale) has a very similar "Bit Manipulation Engine" on some of their chips (Kinetis E02 series, for example) that does permit manipulation of multi-bit fields.

I would just like to thank westfw for his valuable assembly-language project for the Arduino Uno M0.

Thanks! Um, I wrote M0 assembler? I only recall writing SAMD10 assembler (which might have been useful.)
This one? Minimal-ARM/Atmel/samd10asmBlink/samd10asmBlink/main.S at master · WestfW/Minimal-ARM · GitHub
(If so, it sounds like you've come a long way since then!)