MCP_CAN library and Arduino program speed

Hey guys, quick question here that someone may be able to answer. There is a library out there (at least a couple, actually) for the MCP2515 and MCP2551 chips called MCP_CAN. There are two versions, but I was only able to get the Seeed one to work easily: https://github.com/Seeed-Studio/CAN_BUS_Shield

While using this library with two MCP2515+2551 chip pairs, in a simple pass-through filter application, it takes almost a full millisecond to process and retransmit a single message. Here's an example of what the code would look like:

#include <SPI.h>
#include "mcp_can.h"

const byte CS_ONE = 4;
const byte CS_TWO = 5;

MCP_CAN CAN_ONE(CS_ONE); //4 is the CS pin for CAN_ONE
MCP_CAN CAN_TWO(CS_TWO); //5 is the CS pin for CAN_TWO

void setup()
{
  //SPI.begin(); //omitted because CAN.begin() already initializes SPI as part of the CAN startup

  startCan(CS_ONE, CAN_ONE);
  startCan(CS_TWO, CAN_TWO);
}

void loop()
{
  checkCAN(CAN_ONE, CAN_TWO);
  checkCAN(CAN_TWO, CAN_ONE);
}

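// Forwards one pending message from CAN_IN to CAN_OUT; returns 1 if a message was forwarded.
// The controllers are passed by reference so each call uses the original objects, not copies.
inline byte checkCAN(MCP_CAN &CAN_IN, MCP_CAN &CAN_OUT)
{
  INT32U id = 0;
  unsigned char dlc = 0;
  unsigned char data[8];

  // readMsgBufID() returns CAN_OK (0) when a message was read
  if (CAN_IN.readMsgBufID(&id, &dlc, data) == CAN_OK)
  {
    CAN_OUT.sendMsgBuf(id, EXTFLAG, dlc, data);
    return 1;
  }
  else
    return 0;
}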

Let me know if this sounds like something that would normally take 1ms to process per loop and/or if there's something that could be done to improve it. Ideally I'd need this to process about 3000 messages per second to fit the filter application I have in mind (versus the 1000 it manages currently). It seems like it should be faster than it is, but maybe it's just a limitation of the processor speed.
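For a rough sense of the numbers, the per-message budget can be compared with the time a frame occupies the bus. The sketch below is only back-of-the-envelope arithmetic: the bit rate is not stated above, so 500 kbit/s is an assumption, the helper names are made up for illustration, and stuff bits are ignored, which makes the frame time a lower bound.

```cpp
#include <stdint.h>

// Bits in one CAN 2.0B extended data frame, ignoring stuff bits:
// SOF(1) + arbitration(32) + control(6) + data(8 per byte) + CRC(15)
// + CRC delimiter(1) + ACK(2) + EOF(7) + intermission(3).
constexpr uint32_t extendedFrameBits(uint8_t dataBytes) {
    return 1 + 32 + 6 + 8u * dataBytes + 15 + 1 + 2 + 7 + 3;
}

// Time one frame occupies the bus, in whole microseconds (rounded down).
// bitrate is in bits per second; 500000 is an assumed value, not from the post.
constexpr uint32_t frameTimeUs(uint8_t dataBytes, uint32_t bitrate) {
    return static_cast<uint32_t>(
        (static_cast<uint64_t>(extendedFrameBits(dataBytes)) * 1000000ULL) / bitrate);
}

// Time available per message at a given message rate, in whole microseconds.
constexpr uint32_t perMessageBudgetUs(uint32_t messagesPerSecond) {
    return 1000000UL / messagesPerSecond;
}
```

With these assumptions, an 8-byte extended frame is 131 bits, or about 262 us at 500 kbit/s, while 3000 messages per second leaves about 333 us per message. If the two timestamps are taken at the end of the incoming and outgoing frames, part of the measured 1ms is the outgoing frame's own time on the wire rather than processing time.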

MCP_CAN CAN_ONE(CS_ONE); //4 is the CS pin for CAN_ONE
MCP_CAN CAN_TWO(CS_TWO); //5 is the CS pin for CAN_TWO

Do those comments add any value?

How are you measuring the time one iteration of loop() takes?

The comments are mostly there to help others who might modify the program understand what some of the constants and functions do. I removed a lot of them from the example.

As for measuring the speed, we have a 2-channel CAN interface and its software connected to the input and the output, and we get a timestamp for each message on both sides. They are consistently about 1ms apart.
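The external timestamps include time on the bus, so it may also help to time the forwarding call inside the sketch. Below is a minimal sketch of a helper for that. DurationStats is a hypothetical name, not part of the MCP_CAN library, and it assumes readings come from a free-running 32-bit microsecond counter such as Arduino's micros().

```cpp
#include <stdint.h>

// Accumulates min / max / average of measured durations in microseconds.
// Hypothetical helper for illustration; not part of the MCP_CAN library.
struct DurationStats {
    uint32_t count = 0;
    uint32_t minUs = UINT32_MAX;
    uint32_t maxUs = 0;
    uint64_t totalUs = 0;

    // startUs and endUs are two readings of the same 32-bit microsecond
    // counter. Unsigned subtraction stays correct across one rollover.
    void record(uint32_t startUs, uint32_t endUs) {
        const uint32_t elapsed = endUs - startUs;
        if (elapsed < minUs) minUs = elapsed;
        if (elapsed > maxUs) maxUs = elapsed;
        totalUs += elapsed;
        ++count;
    }

    // Mean duration in whole microseconds, or 0 if nothing was recorded.
    uint32_t averageUs() const {
        return count ? static_cast<uint32_t>(totalUs / count) : 0;
    }
};

// Possible use inside loop() (assumes the checkCAN() from the first post):
//   uint32_t t0 = micros();
//   if (checkCAN(CAN_ONE, CAN_TWO)) stats.record(t0, micros());
```

Recording only the calls that actually forwarded a message keeps the idle polls from dragging the average down. Comparing the average against the 1ms seen externally would show how much of the delay is spent in the library's SPI transfers and how much is time on the wire.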