What is the fastest way to read/write GPIOs on SAMD21 boards?

I got the assembler listing with the "arm-none-eabi-objdump" utility, which lives in the same directory as the compiler. (In my case: /Applications/Arduino-1.8.13.app/Contents/Java/portable/packages/arduino/tools/arm-none-eabi-gcc/7-2017q4/bin/arm-none-eabi-objdump -SC /tmp/Arduino1.8.13Build/*.elf | less)

How about reading from a pin? Are they still the same?

The same logic applies, except that the read code also converted the 32-bit read into a boolean, which you don't need to do in many cases.
If you're just reading a single bit, the REG and struct forms should give the same results. But a function that toggles a clock bit and reads a data bit would likely see the same improvement from using the struct, because all of its accesses can share one base register.
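To make the clock-and-data case concrete, here is a rough sketch of a bit-banged shift-in that goes through a single base pointer. The pin numbers, the function name shift_in_byte and the port_group_t struct are all made up for illustration; the struct only mimics the register order of the real PortGroup so the code can be compiled and tried on a PC. On the board you would presumably pass &PORT->Group[0] and write port->OUTSET.reg, port->IN.reg and so on.

```c
#include <stdint.h>

/* Hypothetical stand-in for the CMSIS PortGroup layout (assumption:
 * same register order, so OUTSET sits at byte offset 24 and IN at 32). */
typedef struct {
    volatile uint32_t DIR, DIRCLR, DIRSET, DIRTGL;
    volatile uint32_t OUT, OUTCLR, OUTSET, OUTTGL;
    volatile uint32_t IN;
} port_group_t;

#define CLK_PIN  21u   /* illustrative pin choice, not from the thread */
#define DATA_PIN 20u   /* illustrative pin choice, not from the thread */

/* Clock in 8 bits, MSB first. Every access goes through 'port', so the
 * compiler can keep one base address in a register and use small offsets. */
static inline uint8_t shift_in_byte(port_group_t *port)
{
    const uint32_t clk_mask  = 1u << CLK_PIN;
    const uint32_t data_mask = 1u << DATA_PIN;
    uint8_t value = 0;

    for (int i = 0; i < 8; i++) {
        port->OUTSET = clk_mask;        /* clock high */
        value <<= 1;
        if (port->IN & data_mask)       /* test the masked bit directly; */
            value |= 1u;                /* no shift down to 0/1 needed   */
        port->OUTCLR = clk_mask;        /* clock low */
    }
    return value;
}
```

Whether this actually beats the REG form on your build is something to confirm in the objdump listing; the sketch only shows the shape of the code.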

One of the reasons we're seeing different code is that the compiler "knows" that a function may use certain registers without saving them, and it looks like it will make full use of that fact if it can. But if the code is inlined, it can optimize the register usage better.

For fun, consider that the SAMD21 allows byte writes in addition to 32-bit writes of the entire port. Theoretically, you can write any bit in the port using an 8-bit constant, which would be quicker to load. Something like:

  PORT->Group[0].OUTSET.reg = (1 << 21);
    20fc:   2280       movs    r2, #32           ;construct 8bit constant 1<<(21-(2*8))
    20fe:   4b02       ldr     r3, [pc, #8]      ; get address of PORT
    2100:   769a       strb    r2, [r3, #24+2]   ; byte store to 3rd byte of OUTSET

The SAMD21 .h files don't have any definitions for such 8-bit access, so the C code to make it happen would be pretty gross.
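For what it's worth, one way the "gross" C might look is a pointer cast, sketched below. The helper name set_bit_bytewise is invented here, and the code assumes a little-endian target (which the SAMD21's Cortex-M0+ is) and that the port really accepts byte writes as described above. I haven't checked what the compiler generates for it.

```c
#include <stdint.h>

/* Set one bit of a 32-bit "write 1 to set" register with a single byte
 * store. Assumes little-endian byte order, so bit N lives in byte N/8. */
static inline void set_bit_bytewise(volatile uint32_t *reg, unsigned bit)
{
    volatile uint8_t *bytes = (volatile uint8_t *)reg;
    bytes[bit / 8u] = (uint8_t)(1u << (bit % 8u));
}
```

On the board the call would presumably be set_bit_bytewise(&PORT->Group[0].OUTSET.reg, 21); with a constant bit number and inlining, the hope is that it collapses to the movs/strb pair above, but check the listing before relying on it.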