faster micros()?

I do not recommend inlining micros(), because I coded it for robustness rather than speed. Inlining it will not gain you much.

Instead, if you can guarantee that the interval you are measuring is always strictly less than a millisecond, you can read the SysTick counter directly. The counter counts down from 83999 to 0, one step per clock tick, so it wraps around once every millisecond. You use it a bit like micros(), except that you have to handle the wraparound explicitly, and because it counts down, the subtraction is the other way around (start minus end). You could divide the result by 84 to get microseconds, but if you are after a specific delay, it would be better to multiply the delay you require by 84 to get it in clock ticks and compare against that.
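As a rough sketch of the arithmetic described above, the wraparound handling and the microseconds-to-ticks conversion could be pulled into two small helpers. The names elapsedTicks, microsToTicks, TICKS_PER_MS and TICKS_PER_US are made up for illustration and are not part of the Arduino core; the constants assume the 84000-tick period mentioned above.

```cpp
#include <cstdint>

// Assumed constants: the counter runs 83999..0, i.e. 84000 ticks per millisecond.
constexpr uint32_t TICKS_PER_MS = 84000;
constexpr uint32_t TICKS_PER_US = 84;

// Hypothetical helper: ticks elapsed between two readings of a down-counter.
// Only valid if less than one full period (1 ms) passed between the readings.
uint32_t elapsedTicks(uint32_t start, uint32_t now) {
  if (start >= now) {
    return start - now;                 // no wraparound: counter simply counted down
  }
  return start + TICKS_PER_MS - now;    // counter passed 0 and reloaded once
}

// Hypothetical helper: convert a required delay in microseconds to clock ticks.
uint32_t microsToTicks(uint32_t us) {
  return us * TICKS_PER_US;
}
```

With helpers like these, a busy-wait of 50 microseconds would look something like: uint32_t start = SysTick->VAL; while (elapsedTicks(start, SysTick->VAL) < microsToTicks(50)) {} and it would still only be valid for waits shorter than a millisecond.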

Here is an example:

// SysTick example by stimmer

void setup() {
  Serial.begin(115200);
}

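void loop() {
  int r=random(200);            // random delay, 0 to 199 microseconds
  int v=SysTick->VAL;           // counter value before the delay
  delayMicroseconds(r);
  v=v-SysTick->VAL;             // counter counts down, so elapsed = start - end
  if(v<0)v+=84000;              // counter wrapped past 0: add one full period
  Serial.print("delayMicroseconds(");
  Serial.print(r);
  Serial.print(") took ");
  Serial.print(v);
  Serial.print(" clock ticks, which when divided by 84 equals ");
  Serial.println(v/84);
  delay(500);
}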

Output:

delayMicroseconds(169) took 14205 clock ticks, which when divided by 84 equals 169
delayMicroseconds(132) took 11097 clock ticks, which when divided by 84 equals 132
delayMicroseconds(117) took 9837 clock ticks, which when divided by 84 equals 117
delayMicroseconds(47) took 3957 clock ticks, which when divided by 84 equals 47
delayMicroseconds(172) took 14457 clock ticks, which when divided by 84 equals 172
delayMicroseconds(68) took 5721 clock ticks, which when divided by 84 equals 68
delayMicroseconds(180) took 15129 clock ticks, which when divided by 84 equals 180
delayMicroseconds(23) took 1941 clock ticks, which when divided by 84 equals 23