Here is a speed comparison
sx1 = (int)(cos(angle1) * radius1 + centerx1);
sx2 = (int)(cos(angle2) * radius2 + centerx2);
sx3 = (int)(cos(angle3) * radius3 + centerx3);
sx4 = (int)(cos(angle4) * radius4 + centerx4);
y1 = (int)(sin(angle1) * radius1 + centery1);
y2 = (int)(sin(angle2) * radius2 + centery2);
y3 = (int)(sin(angle3) * radius3 + centery3);
y4 = (int)(sin(angle4) * radius4 + centery4);
Not exactly the sort of code that is likely to run fast on an 8-bit CPU...
OTOH, the program used isn't well-optimized for the SAMD51 either (it should use sinf() and cosf() to make better use of the single-precision floating-point hardware!), and the code contains a LIMIT on the FPS that it will generate, so the M4 may well be even faster than it looks in that video...