Trouble With MCP_CAN Not Sending Fast Enough

Hello everyone! Hope you are all doing well. This is going to be a long post, so buckle up.

My friend and I are working on a CAN bridge of sorts (I don't want to get into it too in-depth as we plan on selling it later, but I will post more code if it's necessary). We are using ESP32-S3s and MCP2515 breakout boards (with 20 MHz crystals). We are using a custom PCB, and everything works (kinda: it misses messages, more on that below).

We are using coryjfowler's MCP_CAN library, and it has been working well so far. We set up interrupts to read the CAN messages, and we can read two full 250 kbps busses (at 100% bus load each). We have PCAN boxes so that we can simulate CAN bus traffic (as well as read messages that we send onto the CAN bus).

The problem we are running into is sending messages back onto another bus. We send the messages to another ESP32-S3, it sends the messages into a sending queue (all the messages make it in the queue), and then to the SendMsgBuf function. There, it takes so long to send the message that it misses messages in the queue to send (the queue overflows). We could increase the queue size, but that's just delaying the inevitable. The MCP2515 breakout board has its own SPI bus, as well as an empty CAN bus (on the other end is the PCAN box where we can read the messages). We tested resistance across the lines and got 60 ohms like we should. However, we're not getting all the messages we should be sending.

We are using both cores, too. The receive interrupts are on core one, and the Transmit and Receive_Buffer (where it sends the CAN message) are on core zero. We originally had the Receive_Buffer send the message to a queue, with the can_send function (on core one, we also tried it on core zero to no effect) taking the message from that queue and sending it on the CAN bus, but it was still too slow. This is going to be a two-way communication device, by the way, so both of the ESP32-S3's need both the transmit and receive functions.

To get a sense of where we are now, see the rates below. We sent messages from the PCAN box to simulate traffic and calculated the rate as messages_received/messages_sent (both counted on the PCAN boxes).

At 100% bus load on one bus, we received 89% of messages.
At 75% bus load on one bus, we received 95% of messages.
At 75% bus load on two busses, we received 65% of messages.

We are aiming for a 95% receive rate for two busses each at 100% load. If we take out the SendMsgBuf (and add a memcpy so that it has something to do), that function is fast enough. That leads us to believe that the SendMsgBuf function is slow. We're not sure where, though. Are the TX buffers on the MCP2515 full? Is it stuck waiting for something on the CAN bus? Can it only send at a certain speed and we're exceeding that? Is there a faster function we can use? Can we put the sending function inside an interrupt like we did for the receive function (we thought about using them, but we couldn't figure out how to trigger it when there's a new message to send)? Would adding another thread (on the same core) of the Receive_Buffer (doing the same thing) increase speed? Some code snippets are below.

Struct definition and struct variable names

//Struct message for queues and sending/receiving messages
typedef struct struct_message {
  int8_t can_chan = 0;  //what channel the CAN message is on
  unsigned long can_id;  //the CAN ID of the message
  byte can_data[8];  //the CAN data for the CAN message
  byte extended;  //a true/false if the ID is extended or not (mainly needed for sending/receiving a CAN message)
  unsigned char length;
} struct_message;

// Create the structs for can 1 & 2 
struct_message can1_incoming_message;  //in terms of the CAN bus
struct_message can2_incoming_message;  //in terms of the CAN bus
char can1_char[sizeof(can1_incoming_message)];
char can2_char[sizeof(can2_incoming_message)];

CAN receive ISR (this is mimicked for CAN 2)

// ISR for CAN 1 receiving and packaging to common queue
void IRAM_ATTR can1_can()
{
  if(!digitalRead(CAN1_INT_PIN))  // If CAN1_INT pin is low, read receive buffer
  {
     
    CAN1.readMsgBuf(&can1_incoming_message.can_id, &can1_incoming_message.extended, &can1_incoming_message.length, can1_incoming_message.can_data);  //read CAN message and store in struct
    can1_incoming_message.can_chan = 1;  //set the channel that the CAN message came from

    memcpy(can1_char, &can1_incoming_message, sizeof(can1_incoming_message));  //convert struct to char array for transferring to other board

    xQueueSend(can12_spisend_queue, &can1_char, (TickType_t)0);  //send the message to the spi send queue for CAN 1&2
    counter++;  //troubleshooting - making sure we receive all messages
     
  }
}

The receive buffer function

void rx_buffer(void *pvParameters) {
  //Create structs for the outgoing messages, and RX char 
  struct_message empty_message;
  char rx_messages[ARRAY_ELEMENTS][sizeof(empty_message)];
  struct_message can1_outgoing_message;
  struct_message can2_outgoing_message;

  while(1) {
    
    if(uxQueueMessagesWaiting(rx_buffer) > 0) {  //check to see if any messages have been received
      xQueueReceive(rx_buffer, &(rx_messages), (TickType_t)0);  //read a message from the receive queue

      for(int i = 0; i < ARRAY_ELEMENTS; i++) {  //cycle through the incoming message
        switch ((uint8_t)rx_messages[i][0])  //read the equivalent of the CAN channel to send to the proper queue
        {
        case 1: //If for CAN 1, memcpy to struct from the array, and then send to CAN 1 MCP board
            //xQueueSend(can1_send_queue, rx_messages[i], (TickType_t)0);  //send the incoming data to the SPI send queue

            //portDISABLE_INTERRUPTS();
            memcpy(&can1_outgoing_message, rx_messages[i], sizeof(can1_outgoing_message));
            CAN1.sendMsgBuf(can1_outgoing_message.can_id, can1_outgoing_message.extended, can1_outgoing_message.length, can1_outgoing_message.can_data);  //send the CAN message
            //portENABLE_INTERRUPTS();
            can1_rxcount ++;

            break;
        case 2: //If for CAN 2, memcpy to struct from the array, and then send to CAN 2 MCP board
            //xQueueSend(can2_send_queue, rx_messages[i], (TickType_t)0);  //send the incoming data to the CAN send queue

            //portDISABLE_INTERRUPTS();
            memcpy(&can2_outgoing_message, rx_messages[i], sizeof(can2_outgoing_message));
            CAN2.sendMsgBuf(can2_outgoing_message.can_id, can2_outgoing_message.extended, can2_outgoing_message.length, can2_outgoing_message.can_data);  //send the CAN message
            //portENABLE_INTERRUPTS();
            can2_rxcount ++;
            break;
        
        default:
        //should only run if there is an empty message
            break;
        }
      }
    }

  }

}

As you can see in this function, we have a lot commented out. We tried many different things, including disabling and then re-enabling interrupts, to no effect. The interrupts are operating on a completely different core. It is, however, the same SPI bus, which is why we originally had the CAN_send function on the same core. We cut that out and are now just sending from the receive_buffer function. If it stays here, we will uncomment the interrupt enable/disable commands so that there are no issues there. However, for right now, we are not receiving anything when we are sending (we are trying to get one way communication down first).

If you have any questions, feel free to ask and I'll do my best to answer them. Thank you in advance!

That's a disadvantage. Focus on facts, technical terms etc. Telling about failed attempts etc doesn't tell anything useful.

Maybe some other day......


Why are you using the non-interrupt safe xQueueSend inside an interrupt?
Why are you ignoring the diagnostic return codes from functions?
Why is there no attempt to measure the duration of anything?

I was unaware that it was not interrupt safe, so it has been corrected! Thank you for the catch!

I'm ignoring them because I don't believe that they're going to tell me anything useful. They return that the message has been sent to the queue successfully, and that the CAN message was sent successfully, but that doesn't really help solve the problem that I'm running into.
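For what it's worth, a per-code tally is one cheap way to see what the send path is doing over thousands of messages. The sendMsg() listing further down this thread returns CAN_OK, CAN_GETTXBFTIMEOUT (no transmit buffer freed up in time), or CAN_SENDMSGTIMEOUT (the frame did not leave in time), so counting each would separate "always OK but slow" from "timing out". This is only a sketch: `SendTally` and the `k...` names are made up, the numeric values are placeholders rather than the library's real constants, and it assumes sendMsgBuf() passes sendMsg()'s result through unchanged.

```cpp
#include <cassert>
#include <cstdint>

// Placeholder values for illustration only. In a real sketch, compare against
// the library's own CAN_OK, CAN_GETTXBFTIMEOUT and CAN_SENDMSGTIMEOUT instead.
enum SendResult : uint8_t {
    kOk = 0,
    kGetTxBufTimeout = 1,
    kSendMsgTimeout = 2
};

// Counts how often each outcome was returned by the send call.
struct SendTally {
    uint32_t ok = 0;
    uint32_t no_free_buffer = 0;    // all TX buffers stayed busy until timeout
    uint32_t not_sent_in_time = 0;  // frame was loaded but never left in time
    uint32_t other = 0;             // anything unexpected

    void record(uint8_t code) {
        switch (code) {
            case kOk:              ++ok; break;
            case kGetTxBufTimeout: ++no_free_buffer; break;
            case kSendMsgTimeout:  ++not_sent_in_time; break;
            default:               ++other; break;
        }
    }

    uint32_t total() const {
        return ok + no_free_buffer + not_sent_in_time + other;
    }
};
```

If every call lands in `ok` and messages are still being dropped, that points at the time spent inside a successful call rather than at a failure path.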

That was the next step in our troubleshooting process. We recorded millis() immediately before and after the sendMsgBuf call, and we found that sendMsgBuf takes around one millisecond to send a message. That's way too slow for our use case, because to completely fill a CAN bus at 250 kbps, you need to send a message roughly every 0.5 milliseconds (we calculated that by dividing the total number of bits a CAN message uses, about 128 if I remember correctly, by 250,000). Our calculation is confirmed when using PCAN: sending two messages every millisecond results in 100% bus load.
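The arithmetic above can be written as a small helper. The 128-bit frame length is the approximate figure recalled in the post, not an exact count (bit stuffing and interframe spacing change it), and the function names are just illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Time one frame occupies the bus, in microseconds, ignoring bit stuffing.
inline double frame_period_us(uint32_t frame_bits, uint32_t bitrate_bps) {
    assert(bitrate_bps > 0);
    return 1e6 * static_cast<double>(frame_bits) / static_cast<double>(bitrate_bps);
}

// Frames per second needed to keep the bus fully loaded.
inline double frames_per_second(uint32_t frame_bits, uint32_t bitrate_bps) {
    assert(frame_bits > 0);
    return static_cast<double>(bitrate_bps) / static_cast<double>(frame_bits);
}
```

With 128 bits at 250,000 bit/s this gives about 512 microseconds per frame, or a little under 2,000 frames per second, which lines up with the "two messages every millisecond" observation on the PCAN side.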

When we're receiving messages, we did the same test and found that the time never changed (in milliseconds) from the beginning of the readMsgBuf function to the end. That means that it can read messages in under a millisecond, which is what we need. The readMsgBuf function is fast enough, so we need to get that speed to the sendMsgBuf function as well.
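One caveat on measuring this way: millis() only resolves whole milliseconds, so a reading that never changes bounds the call at under 1 ms but cannot show whether it fits a 0.5 ms budget. Below is a minimal sketch of microsecond bookkeeping. `elapsed_us` and `DurationStats` are hypothetical names, and the timestamps are assumed to come from a free-running 32-bit microsecond counter such as Arduino's micros().

```cpp
#include <cassert>
#include <cstdint>

// Elapsed time between two readings of a free-running 32-bit microsecond
// counter. Unsigned subtraction stays correct across a single wraparound.
inline uint32_t elapsed_us(uint32_t start_us, uint32_t end_us) {
    return end_us - start_us;
}

// Running count, maximum and total of measured durations.
struct DurationStats {
    uint32_t count = 0;
    uint32_t max_us = 0;
    uint64_t total_us = 0;

    void add(uint32_t us) {
        ++count;
        total_us += us;
        if (us > max_us) max_us = us;
    }

    uint32_t mean_us() const {
        assert(count > 0);
        return static_cast<uint32_t>(total_us / count);
    }
};
```

In the sketch itself this would wrap the call under test, along the lines of `uint32_t t0 = micros(); CAN1.sendMsgBuf(...); stats.add(elapsed_us(t0, micros()));`, with the statistics printed outside the hot path.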

edited to note who made the quotes

Something else to note: while looking through the documentation of the MCP_CAN library again, I noticed that the library had never been tested with 20 MHz crystals, though it supports up to that clock speed. The author said that the math was correct, but that it had never been tested, so maybe the receive function math was correct, but the send function was a little bit off? I'm not sure how that's calculated, but another test we're going to try is to put in 16 MHz crystals and see if that helps/solves the problem. I will update when we receive the crystals and test it.

Why choose the MCP2515 instead of using the ESP32 CAN controller peripheral built into the chip?

We're not using it because each ESP32S3 needs to handle two separate CAN channels, and the ESP32S3 only has one CAN interface. Since we have four total CAN busses, that would be four ESP32S3's, which would double the amount of boards we have now (and be doubly expensive) and would complicate the communication between the boards. Even if we did one bus on the ESP32S3 CAN interface and the other bus on the MCP2515 chip, the second bus on the MCP2515 chip would be too slow.

Hello! I figured out the problem! I'll post the solution here for anyone else that has the same problem.

The crystals got delivered and we put them in the MCP2515 breakout boards, and there was no change. So, I started digging into the MCP_CAN library. I went through the various send functions, and finally found the main sending function, called sendMsg() starting on line 1104 in mcp_can.cpp. I copied the function below:

/*********************************************************************************************************
** Function name:           sendMsg
** Descriptions:            Send message
*********************************************************************************************************/
INT8U MCP_CAN::sendMsg()
{
    INT8U res, res1, txbuf_n;
    uint32_t uiTimeOut, temp;

    temp = micros();
    // 24 * 4 microseconds typical
    do {
        res = mcp2515_getNextFreeTXBuf(&txbuf_n);                       /* info = addr.                 */
        uiTimeOut = micros() - temp;
    } while (res == MCP_ALLTXBUSY && (uiTimeOut < TIMEOUTVALUE));

    if(uiTimeOut >= TIMEOUTVALUE) 
    {   
        return CAN_GETTXBFTIMEOUT;                                      /* get tx buff time out         */
    }
    uiTimeOut = 0;
    mcp2515_write_canMsg( txbuf_n);
    mcp2515_modifyRegister( txbuf_n-1 , MCP_TXB_TXREQ_M, MCP_TXB_TXREQ_M );
    
    temp = micros();
    do
    {       
        res1 = mcp2515_readRegister(txbuf_n-1);                         /* read send buff ctrl reg 	*/
        res1 = res1 & 0x08;
        uiTimeOut = micros() - temp;
    } while (res1 && (uiTimeOut < TIMEOUTVALUE));   
    
    if(uiTimeOut >= TIMEOUTVALUE)                                       /* send msg timeout             */	
        return CAN_SENDMSGTIMEOUT;
    
    return CAN_OK;
}

A quick explanation: in the sendMsg() function, it gets an available TX buffer, sends the CAN message to the buffer, then makes sure that the message sends.

It is important to note that the MCP2515 chip has three transmit buffers (their control registers are at addresses 0x30, 0x40, and 0x50). This will be important later.

After looking through this function, other functions, and the MCP2515 datasheet, I found something peculiar. The last part of that function explanation, where the function makes sure that the message sends, isn't really necessary, and I'll explain why. I'll copy that section of the function below so that you can follow along (starting at line 1128 and going through 1137).

    temp = micros();
    do
    {       
        res1 = mcp2515_readRegister(txbuf_n-1);                         /* read send buff ctrl reg 	*/
        res1 = res1 & 0x08;
        uiTimeOut = micros() - temp;
    } while (res1 && (uiTimeOut < TIMEOUTVALUE));   
    
    if(uiTimeOut >= TIMEOUTVALUE)                                       /* send msg timeout             */	
        return CAN_SENDMSGTIMEOUT;

This first resets the time counter temp to the current time (for timeout purposes). Then it enters the do/while loop and sets res1 to the value of the transmit buffer's control register. The -1 is there because txbuf_n holds the address where the buffer's message registers start (0x31, 0x41, or 0x51), while the control register sits one address lower (0x30, 0x40, or 0x50). It then ANDs that value with 0x08, which is 0b1000 in binary. That part essentially says "I don't care about any of the bits other than bit three" (counting from 0, which is where the 1 is in the binary number). If you look at the datasheet for the MCP2515, you'll find that bit 3 is the Message Transmit Request bit (TXREQ). It is 1 while the buffer is pending transmission and 0 once the buffer is clear.

Essentially, the loop polls that register to check whether the message is still in the buffer. After the mask, res1 is nonzero (0x08) if the message is still there and 0 if it is not. The while condition keeps the loop running as long as res1 is nonzero (message still in that buffer) and the function has not timed out.
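The masking step can be tried in isolation. This is only an illustration with made-up helper names (`masked_txreq`, `tx_pending`). The one fact it relies on is the one cited from the datasheet above, that TXREQ is bit 3 of the control register. Note that the masked value is 0x08 rather than 1, which is why the library's loop simply tests for nonzero.

```cpp
#include <cassert>
#include <cstdint>

// TXREQ is bit 3 of the transmit buffer control register.
constexpr uint8_t kTxreqMask = 0x08;
static_assert(kTxreqMask == (1u << 3), "TXREQ mask must select bit 3");

// Keeps only the TXREQ bit, discarding every other bit in the register value.
inline uint8_t masked_txreq(uint8_t txbctrl) {
    return static_cast<uint8_t>(txbctrl & kTxreqMask);
}

// True while the buffer still holds a message waiting to be transmitted.
inline bool tx_pending(uint8_t txbctrl) {
    return masked_txreq(txbctrl) != 0;
}
```

For example, a register value of 0x0B (TXREQ set plus both low bits) masks down to 0x08 and counts as pending, while 0x03 masks down to 0 and counts as free.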

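Because sendMsg() does not return until the buffer it just loaded has finished transmitting, there is never more than one message pending at a time. In essence, the library, as it stands, is only using one transmit buffer. I commented that entire block of code out (from lines 1128 to 1137), and now we can completely fill two CAN busses at 100% bus load with the MCP2515 breakout board!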

And in case you're wondering, if there are no free transmit buffers (aka you filled them too fast - what this block of code was trying to prevent), the block of code that finds a free transmit buffer will keep looping until it finds a free buffer or times out. So, there isn't really a big downside to commenting this part out (that I've found). The only thing I can think of is that if your physical CAN bus isn't set up right and the MCP2515 can't send the message, you'll have no way of knowing that it didn't send the message except that your transmit buffers are full and you didn't receive a message on the other side.
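A rough way to see why removing the wait helps is a steady-state model of the time per message. This is an idealized sketch, not a measurement: it ignores polling overhead, arbitration, and other traffic on the bus, and the function names are made up. `load_us` stands for the time to write one message into a transmit buffer over SPI, and `frame_us` for the time the frame spends on the wire.

```cpp
#include <cassert>
#include <cstdint>

// With the wait in place, loading the buffer and transmitting the frame happen
// back to back, so each message costs the sum of both.
inline uint32_t blocking_period_us(uint32_t load_us, uint32_t frame_us) {
    return load_us + frame_us;
}

// Without the wait, the next buffer can be loaded while the previous frame is
// still on the wire, so the slower of the two steps sets the pace.
inline uint32_t pipelined_period_us(uint32_t load_us, uint32_t frame_us) {
    return load_us > frame_us ? load_us : frame_us;
}

// Highest bus load (in whole percent) reachable at a given time per message.
inline uint32_t bus_load_percent(uint32_t period_us, uint32_t frame_us) {
    assert(period_us > 0 && period_us >= frame_us);
    return (100u * frame_us) / period_us;
}
```

Plugging in a 512 microsecond frame and a placeholder 100 microsecond load time (an invented number, purely for illustration), the blocking version tops out at 83% bus load, while the pipelined version can reach 100%.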

I recommend this fix if you're trying to fully load down a high speed CAN bus, but I do not recommend it if you're only sending a message every couple of milliseconds (or less often), as the check can aid in troubleshooting. If you're curious, you need to send a CAN message about every 0.5 milliseconds on a 250 kbps bus in order to load it to 100%.

Here is the link to the MCP2515 datasheet. The part about the transmit buffer registers is on page 18 (also seen in the attached screenshot).

For those interested, here is the entire sendMsg() function with the correct part commented out:

/*********************************************************************************************************
** Function name:           sendMsg
** Descriptions:            Send message
*********************************************************************************************************/
INT8U MCP_CAN::sendMsg()
{
    INT8U res, res1, txbuf_n;
    uint32_t uiTimeOut, temp;

    temp = micros();
    // 24 * 4 microseconds typical
    do {
        res = mcp2515_getNextFreeTXBuf(&txbuf_n);                       /* info = addr.                 */
        uiTimeOut = micros() - temp;
    } while (res == MCP_ALLTXBUSY && (uiTimeOut < TIMEOUTVALUE));

    if(uiTimeOut >= TIMEOUTVALUE) 
    {   
        return CAN_GETTXBFTIMEOUT;                                      /* get tx buff time out         */
    }
    uiTimeOut = 0;
    mcp2515_write_canMsg( txbuf_n);
    mcp2515_modifyRegister( txbuf_n-1 , MCP_TXB_TXREQ_M, MCP_TXB_TXREQ_M );
    
    // temp = micros();
    // do
    // {       
    //     res1 = mcp2515_readRegister(txbuf_n-1);                         /* read send buff ctrl reg 	*/
    //     res1 = res1 & 0x08;
    //     uiTimeOut = micros() - temp;
    // } while (res1 && (uiTimeOut < TIMEOUTVALUE));   
    
    // if(uiTimeOut >= TIMEOUTVALUE)                                       /* send msg timeout             */	
    //     return CAN_SENDMSGTIMEOUT;
    
    return CAN_OK;
}

I hope this helps people in the future!

edited to add the full sendMsg() function with the corrections
