So what I read was correct, then: Arduino is adding some abysmal overhead.
Moving on, I found this documentation, which has examples for pretty much every GPIO-related situation.
The documentation mentioned above does not use PORT_IOBUS; is the performance about the same?