UART communication between multiple stm32f103c8t6 using DMA

Hello,

I am using an STM32F103C8T6 with the official core and the HID bootloader. I am working in the Visual Studio Code IDE with the Microsoft Arduino extension.

Is it possible to run UART communication over DMA without disturbing the loop() function that the CPU is executing?

If so, how can I achieve it? Any examples would be really helpful. Are there any limitations in using DMA for UART communication? A detailed explanation would also be appreciated.

Okay, let me be clearer. I have implemented a library for a custom-made PCB based on the stm32f103c8t6 with lots of functionality, as follows.

The board has 6 general-purpose outputs (using internal GPIO pins) with all the necessary electronics to drive up to 24 volts. It also has 4 industrial analogue inputs (using an external ADC via I2C1, the ADS1115 IC https://www.ti.com/lit/ds/symlink/ads1113.pdf) with all the necessary electronics to read voltages up to 10 volts.

Then, the board has 6 PWM channels using the BTN8962TA https://www.infineon.com/dgdl/Infineon- ... 2d247a7bf5 to drive motors with a configurable PWM frequency and duty cycle. For each of the 6 PWMs, the schematic provides two external ADC pins (one for I sense and one for V sense, using the ADS1115 IC via I2C1), one external GPIO pin for the inhibit pin of the PWM (using the MCP23017 IC http://ww1.microchip.com/downloads/en/D ... 01952C.pdf via I2C1), and an internal timer pin connected to the PWM directly from the microcontroller.

Then there are 5 thermocouple pins using MCP9600 IC http://ww1.microchip.com/downloads/en/D ... 02417A.pdf via I2C1. The board also has 3.3 to 5 V I2C2 pins for the Sensirion SCD30, Senseair K30, and Senseair K33 sensor modules. It supports NeoPixel LEDs, and SPI2 is available for an SD card and a touch screen so that the board can work as an HMI. On top of all that, it also exposes I2C2, USART1, USART2, and LIN.

The entire program, with all of this functionality, fits in the MCU with 7 kilobytes of space still free. :slight_smile:

One pass of the entire loop takes approximately 1200 milliseconds.

So, to reduce the loop time, I thought of using DMA for UART communication. [ From my understanding, using DMA frees up the main CPU, so I think using DMA is a bit like multitasking. (That is, the CPU does the work it needs to do without being disturbed by UART communication, because the UART traffic is handled by the DMA controller, with Channel 4 and Channel 5 for USART1, and Channel 6 and Channel 7 for USART2.) So, I think that UART communication over DMA and the other functionality on the CPU can take place simultaneously. ] With this, I hope that some amount of time will be shaved off my entire loop.

I also did some research on my own to find relevant or working examples for my purpose in Arduino with the official STM32 core, but I was not able to find any.

So, let me know if I have misunderstood DMA. I hope I understood it correctly. If so, please guide me on how I can achieve this.

I hope to get your suggestions, opinions, guidance, directions, and references to achieve my task.

Thanks all in advance.

DMA does free up the CPU to do other tasks in your main loop. Without DMA, when you receive data on the UART, an interrupt is normally triggered that lets the CPU copy the data from the UART receive register to a place in SRAM. This takes up some CPU cycles, as the CPU is involved in copying the data from the UART register to SRAM. With DMA, the DMA controller is responsible for copying the data directly to SRAM, so the CPU can do other stuff.

Initializing DMA on the STM32 is very complicated and varies by family (M0+, M3, M4, etc.). At a minimum, you need to set up the DMA channel (and, on families such as the F4, the DMA stream) for your peripheral, and configure the peripheral itself to use DMA transfers. I have this DMA code for USART6 on the STM32F4. There are going to be similarities, but the M4 is certainly more complicated than the M0. It will take a lot of reading and real debugging to get it to work with your MCU version.

// initialize the DMA controllers
void periphInit_DMA(void)
{
	// enable DMA1 and DMA2 RCC clock
	RCC->AHB1ENR |= RCC_AHB1ENR_DMA2EN | RCC_AHB1ENR_DMA1EN;

	// enable DMA2 stream 6 interrupts
    NVIC_SetPriority(DMA2_Stream6_IRQn, 6);
    NVIC_EnableIRQ(DMA2_Stream6_IRQn);

	return;
}

// initialize the UART6 hardware
void periphInit_UART6(void)
{
	// enable UART6 clock
	RCC->APB2ENR |= RCC_APB2ENR_USART6EN;

	// set alternate GPIO functions for UART6 TX and RX
	GPIOG->MODER |= GPIO_MODER_MODER14_1 | GPIO_MODER_MODER9_1;
	GPIOG->AFR[1] |= 8<<GPIO_AFRH_AFSEL14_Pos | 8<<GPIO_AFRH_AFSEL9_Pos;

	// USART6 is on APB2 with Fclk of 90 MHz [115200 bps]
	USART6->BRR = 48<<4 | 13<<0;

	// configure UART6
	USART6->CR3 = 0x00
				| 0<<USART_CR3_ONEBIT_Pos							// sample three bits, default
				| 0<<USART_CR3_CTSIE_Pos | 0<<USART_CR3_CTSE_Pos	// disable CTS signal
				| 0<<USART_CR3_RTSE_Pos								// disable RTS signal
				| 0<<USART_CR3_DMAT_Pos								// no DMA transmit
				| 0<<USART_CR3_DMAR_Pos								// no DMA receive
				| 0<<USART_CR3_SCEN_Pos								// disable SmartCard mode
				| 0<<USART_CR3_NACK_Pos								// NOT APPLICABLE
				| 0<<USART_CR3_HDSEL_Pos							// NOT APPLICABLE
				| 0<<USART_CR3_IRLP_Pos								// IrDA LP disabled
				| 0<<USART_CR3_IREN_Pos								// IrDA mode disabled
				| 0<<USART_CR3_EIE_Pos;								// no error interrupts

	USART6->CR2 = 0x00
				| 0<<USART_CR2_LINEN_Pos							// LIN mode disabled
				| 0<<USART_CR2_STOP_Pos								// ONE stop bit
				| 0<<USART_CR2_CLKEN_Pos							// asynchronous serial
				| 0<<USART_CR2_CPOL_Pos | 0<<USART_CR2_CPHA_Pos		// not applicable in async mode
				| 0<<USART_CR2_LBCL_Pos								// NOT APPLICABLE
				| 0<<USART_CR2_LBDIE_Pos | 0<<USART_CR2_LBDL_Pos	// NOT APPLICABLE
				| 0<<USART_CR2_ADD_Pos;								// NOT APPLICABLE

	USART6->CR1 = 0x00
				| 0<<USART_CR1_OVER8_Pos							// 16 bit oversampling
				| 0<<USART_CR1_M_Pos								// one START bit, eight DATA bits
				| 0<<USART_CR1_WAKE_Pos								// IDLE line wakeup mode
				| 0<<USART_CR1_PCE_Pos								// disable PARITY control
				| 0<<USART_CR1_PS_Pos								// even parity, not applicable
				| 0<<USART_CR1_PEIE_Pos								// parity error interrupt disabled
				| 0<<USART_CR1_TXEIE_Pos							// transmit empty interrupt disabled
				| 1<<USART_CR1_RXNEIE_Pos							// receiver not empty interrupt enabled
				| 0<<USART_CR1_TCIE_Pos								// transmit complete interrupt disabled
				| 0<<USART_CR1_IDLEIE_Pos							// no IDLE interrupt
				| 0<<USART_CR1_RWU_Pos | 0<<USART_CR1_SBK_Pos		// not applicable
				| 1<<USART_CR1_RE_Pos								// receiver enabled
				| 1<<USART_CR1_TE_Pos								// transmitter enabled
				| 1<<USART_CR1_UE_Pos;								// UART enabled

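    // the STM32F4 implements 4 priority bits, so valid values are 0-15 (16 would wrap to 0, the highest)
    NVIC_SetPriority(USART6_IRQn, 15);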
    NVIC_EnableIRQ(USART6_IRQn);

	return;
}

// the actual DMA write function
void DMAUSART6_writeData(uint8_t * SRCDATA, uint32_t LENGTH)
{
	/***************
	Setup DMA2
	Stream 6
	Channel 5
	USART6 TX
	***************/

	// disable DMA2 stream 6 first
	DMA2_Stream6->CR = 0;

	// wait for ongoing DMA2 stream to finish
	while(DMA2_Stream6->CR & DMA_SxCR_EN_Msk);

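	// clear any stale stream 6 event flags (reported in HISR, cleared through HIFCR);
	// they must be cleared before the stream is enabled again
	DMA2->HIFCR = DMA_HIFCR_CTCIF6 | DMA_HIFCR_CHTIF6 | DMA_HIFCR_CTEIF6
				| DMA_HIFCR_CDMEIF6 | DMA_HIFCR_CFEIF6;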

	// set USART6 Data Register as DMA2 peripheral destination
	DMA2_Stream6->PAR = USART6_BASE + 0x0004UL;

	// set source memory pointer (the register takes a plain 32-bit address)
	DMA2_Stream6->M0AR = (uint32_t) SRCDATA;

	// set number of items (in bytes) to transfer
	DMA2_Stream6->NDTR = LENGTH;

	// configure DMA channel for USART6 TX
	DMA2_Stream6->CR = 0x00
			| 5<<DMA_SxCR_CHSEL_Pos													// channel 5 select for USART6 TX
			| 0<<DMA_SxCR_MBURST_Pos												// single memory transfer
			| 0<<DMA_SxCR_PBURST_Pos												// single peripheral transfer
			| 0<<DMA_SxCR_CT_Pos | 0<<DMA_SxCR_DBM_Pos								// target memory 0, disable double buffer mode
			| DMA_SxCR_PL_0															// medium priority level
			| 0<<DMA_SxCR_PINCOS_Pos												// offset = PSIZE
			| 0<<DMA_SxCR_PSIZE_Pos | 0<<DMA_SxCR_PINC_Pos							// PSIZE = 08bit, no peripheral increment
			| 0<<DMA_SxCR_MSIZE_Pos | 1<<DMA_SxCR_MINC_Pos							// MSIZE = 08bit, increment memory
			| 0<<DMA_SxCR_CIRC_Pos													// disable circular mode
			| 1<<DMA_SxCR_DIR_Pos													// memory to peripheral direction
			| 0<<DMA_SxCR_PFCTRL_Pos												// DMA is the flow controller
			| 1<<DMA_SxCR_TCIE_Pos | 0<<DMA_SxCR_HTIE_Pos							// transfer complete interrupt enabled
			| 0<<DMA_SxCR_TEIE_Pos | 0<<DMA_SxCR_DMEIE_Pos							// no error interrupts
			| 0<<DMA_SxCR_EN_Pos;													// disable DMA channel initially

	// disable DMA FIFO, no FIFO error interrupts, use full FIFO (4 words), use direct mode
	DMA2_Stream6->FCR = 0<<DMA_SxFCR_FEIE_Pos | 0<<DMA_SxFCR_DMDIS_Pos | DMA_SxFCR_FTH;


	/***************
	Setup USART6 TX
	for DMA transfer
	***************/

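	// clear the TC flag: write 0 to TC only, since a read-modify-write could
	// also clear another rc_w0 flag (e.g. RXNE) that gets set in between
	USART6->SR = ~USART_SR_TC;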

	// enable USART6 DMA transmitter
	USART6->CR3 |= USART_CR3_DMAT;

	// start DMA transfer
	DMA2_Stream6->CR |= DMA_SxCR_EN;


	/************************
	DMA does the rest
	from this point forward
	until DMA transfer is
	complete and interrupt
	is triggered
	************************/

	return;
}

The way it works is, you provide the SRAM pointer to the data that needs to be printed, then let the DMA do everything to send it out of USART6. So technically, you can print out 65000 chars and the only CPU cycles involved are those spent setting up the DMA. The DMA just informs the CPU (via interrupt) when the DMA transfer is finished.
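One practical limit behind the "65000 chars" figure: the transfer counter (NDTR on the F4, CNDTR on the F1) is 16 bits wide, so a single transfer carries at most 65535 items. A minimal sketch of how a longer buffer could be split is below. The helper names `dma_next_chunk` and `dma_transfers_needed` are made up for illustration and are not part of any ST library.

```c
#include <stddef.h>
#include <stdint.h>

/* Largest item count one transfer can hold: the counter register is 16 bits. */
#define DMA_MAX_ITEMS 65535u

/* Hypothetical helper: how many bytes to hand to the DMA for the next
 * transfer, given how many bytes are still waiting to be sent. */
uint16_t dma_next_chunk(size_t remaining)
{
    return (remaining > DMA_MAX_ITEMS) ? (uint16_t)DMA_MAX_ITEMS
                                       : (uint16_t)remaining;
}

/* Hypothetical helper: number of separate transfers needed for `total` bytes. */
size_t dma_transfers_needed(size_t total)
{
    return (total + DMA_MAX_ITEMS - 1u) / DMA_MAX_ITEMS;
}
```

In a real driver, the transfer-complete interrupt would be the natural place to start the next chunk.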

DMA for the UARTs won’t speed up your loop() if the functions are still limited (blocked) by needing to output a certain amount of data in a certain time.
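To put a number on that: the time on the wire is fixed by the baud rate, with or without DMA. Here is a rough sketch of the arithmetic, assuming 8-N-1 framing (10 bits per byte). The function name `uart_tx_time_us` is just an illustration.

```c
#include <stdint.h>

/* Microseconds needed to shift out `nbytes` at `baud` with 8-N-1 framing
 * (1 start + 8 data + 1 stop = 10 bits per byte). Rounds up.
 * `baud` must be non-zero. */
uint32_t uart_tx_time_us(uint32_t nbytes, uint32_t baud)
{
    uint64_t bits = (uint64_t)nbytes * 10u;
    return (uint32_t)((bits * 1000000u + baud - 1u) / baud);
}
```

For example, 100 bytes at 9600 baud take about 104 ms to leave the pin. DMA lets the CPU do something else during that time, but code that waits for the message to finish still waits just as long.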

It would certainly be a waste of effort to implement DMA UARTs without confirming that that is where your program is spending its time. (I suspect delay()s in your sensor reading code.)
And implementing a non-blocking, non-DMA UART interface is probably easier and just as effective.
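A minimal sketch of how the time per section could be measured, assuming the timestamps come from something like micros() taken before and after each block of loop(). The type `section_timer_t` and the function `section_timer_record` are hypothetical names.

```c
#include <stdint.h>

typedef struct {
    const char *name;       /* label for the section being measured */
    uint32_t    worst_us;   /* longest duration seen so far */
} section_timer_t;

/* Record one measurement. start_us and end_us would be micros() readings
 * taken just before and just after the section of interest. */
void section_timer_record(section_timer_t *t, uint32_t start_us, uint32_t end_us)
{
    uint32_t elapsed = end_us - start_us;   /* unsigned math survives counter wraparound */
    if (elapsed > t->worst_us) {
        t->worst_us = elapsed;
    }
}
```

Printing `worst_us` for each section every few seconds should show quickly whether the UART or the sensor code dominates the 1200 ms.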

Thank you so much for the clear explanation. Now, I have a clear idea of what DMA is and how DMA can be used.

Could you please share your whole example implementing DMA for my reference? I will try to implement it in my MCU with your example.

But using DMA will save at least a few milliseconds, and the functions will not be blocked.

(From my understanding of DMA)

I'll have to sanitize my example first since it's part of a big application with lots of peripherals just like yours.

Another way to do non-blocking UART is to use the DRE (data register empty) interrupt, which on the STM32 is the TXE flag. It triggers when the UART data register is empty, and you use the ISR to write the next byte to the TX register. It does take CPU cycles compared to pure DMA, but it allows the CPU to do other stuff while the data is shifted out of the UART. I do agree that 1200 ms for a main loop is quite long, and you can probably narrow it down with proper state machine code.

I have the same idea. But using DMA will also save a few milliseconds (I hope). Also, I believe that if I make use of DMA now, I will get real hands-on experience with DMA on microcontrollers, because I have never worked with DMA before.
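A minimal sketch of the data structure behind that interrupt-driven approach is below. It is not tied to any particular core or HAL. `tx_ring_t`, `tx_ring_write`, and `tx_ring_pop` are hypothetical names, and the register access is left to the caller so the logic can be tried on a PC first.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TX_BUF_SIZE 64u   /* must be a power of two; holds TX_BUF_SIZE - 1 bytes */

typedef struct {
    volatile uint8_t  data[TX_BUF_SIZE];
    volatile uint16_t head;   /* advanced by the main loop */
    volatile uint16_t tail;   /* advanced by the ISR */
} tx_ring_t;

/* Main-loop side: queue bytes without waiting; returns how many were accepted. */
size_t tx_ring_write(tx_ring_t *rb, const uint8_t *src, size_t len)
{
    size_t n = 0;
    while (n < len) {
        uint16_t next = (uint16_t)((rb->head + 1u) & (TX_BUF_SIZE - 1u));
        if (next == rb->tail) {
            break;                      /* buffer full: caller retries later */
        }
        rb->data[rb->head] = src[n++];
        rb->head = next;
    }
    return n;
}

/* ISR side: call when the data register is empty. Returns true and stores
 * one byte in *out if there is data to send; returns false when the buffer
 * is empty (real code would then turn the TXE interrupt off). */
bool tx_ring_pop(tx_ring_t *rb, uint8_t *out)
{
    if (rb->tail == rb->head) {
        return false;
    }
    *out = rb->data[rb->tail];
    rb->tail = (uint16_t)((rb->tail + 1u) & (TX_BUF_SIZE - 1u));
    return true;
}
```

On the target, the ISR would write the popped byte to the USART data register, and the write function would re-enable the TXE interrupt after queuing data. Both of those steps are assumptions about how you would wire it up, not code from a working project.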

That's why I asked you! Is it possible to sanitize your code and share it with me? This will be really helpful for me to implement DMA in my MCU with your example code as a reference.

I did find an old Discovery board with an STM32F1XX which closely matches what you have. I'll see what I can do to get UART DMA running there. I can't promise a timeline, as I have a full-time job just like everyone else.

Thank you so much. It will be really helpful. Sure, will be happy to receive your guidance for my success. :star_struck:

For example:

"there are 5 thermocouple pins using MCP9600"

The MCP9600 has a conversion time of up to 320 ms. Unless you've done something special to make reading the chip asynchronous, there's your 1200 ms right there.

Got this to work with an STM32F100RB, which is very similar to the STM32F103XX. USART3 has the same pinout on both MCUs, so it's an easy migration. The only thing to change, I think, is the baud rate register, since the peripheral clock for your MCU is probably different.
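A rough sketch of what "asynchronous" could look like: a small state machine called once per loop() pass that never waits. All names here (`sensor_task_t`, `sensor_poll`, and so on) are hypothetical, and the actual I2C trigger and read calls are left as comments because they depend on your library.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { SENSOR_IDLE, SENSOR_CONVERTING } sensor_state_t;
typedef enum { POLL_STARTED, POLL_BUSY, POLL_DONE, POLL_TIMEOUT } poll_result_t;

typedef struct {
    sensor_state_t state;
    uint32_t       started_ms;   /* when the current conversion was started */
    uint32_t       timeout_ms;   /* give up after this long */
} sensor_task_t;

/* Call once per loop() pass. `now_ms` would come from millis() and
 * `data_ready` from the sensor's status register. The function only
 * inspects state and returns; it never blocks. */
poll_result_t sensor_poll(sensor_task_t *t, uint32_t now_ms, bool data_ready)
{
    switch (t->state) {
    case SENSOR_IDLE:
        t->started_ms = now_ms;          /* real code: trigger a conversion here */
        t->state = SENSOR_CONVERTING;
        return POLL_STARTED;
    case SENSOR_CONVERTING:
        if (data_ready) {
            t->state = SENSOR_IDLE;      /* real code: read the result here */
            return POLL_DONE;
        }
        if ((uint32_t)(now_ms - t->started_ms) >= t->timeout_ms) {
            t->state = SENSOR_IDLE;      /* sensor never answered: start over */
            return POLL_TIMEOUT;
        }
        return POLL_BUSY;
    }
    return POLL_BUSY;
}
```

With one such task per thermocouple, the five conversions overlap instead of adding up, and loop() spends almost no time on a sensor that is still busy.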

static void print_UART3(char * data)
{
	for(uint8_t count=0; data[count]!=0; count++)
	{
		USART3->DR = data[count];
		while((USART3->SR & USART_SR_TXE) == 0);
	}

	return;
}

int main(void)
{
  /**********************
  Core libraries should
  take care of this part
  **********************/
  HAL_Init();
  SystemClock_Config();

  /**********************
  initialize the USART3
  pin RCC clock
  PB10 - TX
  PB11 - RX
  **********************/
  periphInit_GPIO();

  /**********************
  setup USART3
  9600-8-n-1
  **********************/
  periphInit_UART3();

  /**********************
  setup DMA1
  USART3_TX is on
  DMA1 channel 2
  **********************/
  periphInit_DMA();

  // test print without DMA
  print_UART3("this is a test message\r\n\r\n");

  HAL_Delay(1000);

  // sample message
  uint8_t data_to_print[] = "this is a very loooooong message to print to the USART3 DMA test 1234567890 ABCDEFGHIJKLMNOPQRSTUVWXYZ\r\n\r\n";

  // test print with DMA (sizeof - 1 so the string's terminating NUL is not sent)
  DMAUSART3_writeData(data_to_print, sizeof(data_to_print) - 1);

  // you can't print on USART3 until the DMA transaction is finished
  HAL_Delay(5000);

  print_UART3("Did it work?\r\n");


  while (1)
  {
  }
}

Here are the functions to setup DMA on USART3

void periphInit_GPIO(void)
{
	/**********************
	Enable GPIO ports
	**********************/
	RCC->APB2ENR |= RCC_APB2ENR_IOPBEN_Msk | RCC_APB2ENR_IOPDEN_Msk;

	return;
}

void periphInit_UART3(void)
{
	// enable UART3 clock
	RCC->APB1ENR |= RCC_APB1ENR_USART3EN;

	// configure PB10 (TX) as alternate-function push-pull output, 2 MHz,
	// and PB11 (RX) as floating input, which is what the F1 expects for USART RX
	GPIOB->CRH &= ~(GPIO_CRH_CNF11_Msk | GPIO_CRH_MODE11_Msk | GPIO_CRH_CNF10_Msk | GPIO_CRH_MODE10_Msk);
	GPIOB->CRH |= (GPIO_CRH_CNF11_0 | GPIO_CRH_CNF10_1 | GPIO_CRH_MODE10_1);

	// USART3 is on APB1 with Fclk of 24 MHz
	// set baud rate to 9600 bps (USARTDIV = 156.25, so BRR = 24000000 / 9600 = 2500)
	USART3->BRR = 156<<4 | 4<<0;

	// configure UART3
	USART3->CR3 = 0x00
				| 0<<USART_CR3_ONEBIT_Pos							// sample three bits, default
				| 0<<USART_CR3_CTSIE_Pos | 0<<USART_CR3_CTSE_Pos	// disable CTS signal
				| 0<<USART_CR3_RTSE_Pos								// disable RTS signal
				| 0<<USART_CR3_DMAT_Pos								// no DMA transmit
				| 0<<USART_CR3_DMAR_Pos								// no DMA receive
				| 0<<USART_CR3_SCEN_Pos								// disable SmartCard mode
				| 0<<USART_CR3_NACK_Pos								// NOT APPLICABLE
				| 0<<USART_CR3_HDSEL_Pos							// NOT APPLICABLE
				| 0<<USART_CR3_IRLP_Pos								// IrDA LP disabled
				| 0<<USART_CR3_IREN_Pos								// IrDA mode disabled
				| 0<<USART_CR3_EIE_Pos;								// no error interrupts

	USART3->CR2 = 0x00
				| 0<<USART_CR2_LINEN_Pos							// LIN mode disabled
				| 0<<USART_CR2_STOP_Pos								// ONE stop bit
				| 0<<USART_CR2_CLKEN_Pos							// asynchronous serial
				| 0<<USART_CR2_CPOL_Pos | 0<<USART_CR2_CPHA_Pos		// not applicable in async mode
				| 0<<USART_CR2_LBCL_Pos								// NOT APPLICABLE
				| 0<<USART_CR2_LBDIE_Pos | 0<<USART_CR2_LBDL_Pos	// NOT APPLICABLE
				| 0<<USART_CR2_ADD_Pos;								// NOT APPLICABLE

	USART3->CR1 = 0x00
				| 0<<USART_CR1_OVER8_Pos							// 16 bit oversampling
				| 0<<USART_CR1_M_Pos								// one START bit, eight DATA bits
				| 0<<USART_CR1_WAKE_Pos								// IDLE line wakeup mode
				| 0<<USART_CR1_PCE_Pos								// disable PARITY control
				| 0<<USART_CR1_PS_Pos								// even parity, not applicable
				| 0<<USART_CR1_PEIE_Pos								// parity error interrupt disabled
				| 0<<USART_CR1_TXEIE_Pos							// transmit empty interrupt disabled
				| 0<<USART_CR1_RXNEIE_Pos							// receiver not empty interrupt disabled
				| 0<<USART_CR1_TCIE_Pos								// transmit complete interrupt disabled
				| 0<<USART_CR1_IDLEIE_Pos							// no IDLE interrupt
				| 0<<USART_CR1_RWU_Pos | 0<<USART_CR1_SBK_Pos		// not applicable
				| 1<<USART_CR1_RE_Pos								// receiver enabled
				| 1<<USART_CR1_TE_Pos								// transmitter enabled
				| 1<<USART_CR1_UE_Pos;								// UART enabled

	/*
    NVIC_SetPriority(USART3_IRQn, 6);
    NVIC_EnableIRQ(USART3_IRQn);
    */

	return;
}

void periphInit_DMA(void)
{
	// enable DMA1 RCC clock
	RCC->AHBENR |= RCC_AHBENR_DMA1EN;

	// enable DMA1 channel 2 interrupts
	/*
    NVIC_SetPriority(DMA1_Channel2_IRQn, 9);
    NVIC_EnableIRQ(DMA1_Channel2_IRQn);
    */

	return;
}

void DMAUSART3_writeData(uint8_t * SRCDATA, uint32_t LENGTH)
{
	/***************
	Setup DMA1
	Channel 2
	USART3 TX
	***************/

	// clear DMA1 channel 2 first
	DMA1_Channel2->CCR = 0;

	// wait for ongoing DMA1 stream to finish
	while(DMA1_Channel2->CCR & DMA_CCR_EN_Msk);

	// clear DMA1 flag registers
	DMA1->IFCR = DMA_IFCR_CTEIF2_Msk | DMA_IFCR_CHTIF2_Msk | DMA_IFCR_CTCIF2_Msk | DMA_IFCR_CGIF2_Msk;

	// set USART3 Data Register as DMA1 peripheral destination
	DMA1_Channel2->CPAR = USART3_BASE + 0x0004UL;

	// set source memory pointer (the register takes a plain 32-bit address)
	DMA1_Channel2->CMAR = (uint32_t) SRCDATA;

	// set number of items (in bytes) to transfer
	DMA1_Channel2->CNDTR = LENGTH;

	// configure DMA channel for USART3 TX
	DMA1_Channel2->CCR = 0x00
			| 0<<DMA_CCR_MEM2MEM_Pos
			| 0<<DMA_CCR_PL_Pos														// LOW priority level
			| 0<<DMA_CCR_PSIZE_Pos | 0<<DMA_CCR_PINC_Pos							// PSIZE = 08bit, no peripheral increment
			| 0<<DMA_CCR_MSIZE_Pos | 1<<DMA_CCR_MINC_Pos							// MSIZE = 08bit, increment memory
			| 0<<DMA_CCR_CIRC_Pos													// disable circular mode
			| 1<<DMA_CCR_DIR_Pos													// memory to peripheral direction
			| 0<<DMA_CCR_TCIE_Pos | 0<<DMA_CCR_HTIE_Pos								// disable interrupts
			| 0<<DMA_CCR_TEIE_Pos													// no error interrupts
			| 0<<DMA_CCR_EN_Pos;													// disable DMA channel initially


	/***************
	Setup USART3 TX
	for DMA transfer
	***************/

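	// clear the TC flag: write 0 to TC only, since a read-modify-write could
	// also clear another rc_w0 flag (e.g. RXNE) that gets set in between
	USART3->SR = ~USART_SR_TC;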

	// enable USART3 DMA transmitter
	USART3->CR3 |= USART_CR3_DMAT;

	// start DMA transfer
	DMA1_Channel2->CCR |= DMA_CCR_EN;


	/************************
	DMA does the rest
	from this point forward
	until DMA transfer is
	complete (TCIE is off
	here, so poll TCIF2 in
	DMA1->ISR to detect it)
	************************/

	return;
}
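Since the baud rate register is the part most likely to need changing when moving this to another MCU, here is a small sketch of the calculation for 16x oversampling. The function name `usart_brr_16x` is made up. The peripheral clock value is something you have to look up for your own clock setup (on an F103 running at 72 MHz, APB1 is commonly 36 MHz, but that is an assumption about your configuration).

```c
#include <assert.h>
#include <stdint.h>

/* BRR value for 16x oversampling. The register holds the divider as 12.4
 * fixed point (mantissa in bits 15:4, fraction in bits 3:0), which works
 * out to pclk / baud rounded to the nearest integer. */
uint32_t usart_brr_16x(uint32_t pclk_hz, uint32_t baud)
{
    assert(baud != 0u);
    return (pclk_hz + baud / 2u) / baud;
}
```

For the 24 MHz / 9600 bps case used in periphInit_UART3 this gives 2500, the same value as `156<<4 | 4`. With a 36 MHz peripheral clock it would give 3750 for the same baud rate.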

Thanks, I will use this as my reference. If I have any doubts, I will come back. Thank you so much.

There is a status register in the MCP9600, so I read new values only if new data is available. Otherwise, the loop skips the function. So, basically, there is no delay from the MCP9600.