Using the wiring functions is definitely slower, but also definitely easier, than manipulating the raw hardware registers.
Suppose you wanted to drive a pin high. Using the wiring (Arduino) function, you would write:
digitalWrite(4, HIGH);
where "4" is the number of the pin you want high. Pretty easy. digitalWrite() probably takes about 2 microseconds (that's a rough guess). If you have an Arduino board, the pins are labeled with the numbers you use with the wiring functions.
If you wanted to make a pin high by directly manipulating the registers, you would write something like:
PORTB |= (1<<5);
Here you have to worry about which port the pin is part of, in this case port B, and which bit it is within that port (bit 5). Which port and bit correspond to a given Arduino pin number depends on the board. If you're using the chip with the datasheet or with a non-Arduino board, the pins are probably labeled with the letter/number info you need. The bitwise OR operation means you're setting only bit 5 without changing the other bits, and you have to get the expression correct, or else it can do something very different (like set several pins at once).
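To make the "set only one bit" point concrete, here is a small sketch you can try on a PC. It uses an ordinary variable, fake_port, as a stand-in for the real register, and the helper names (set_bit, clear_bit, set_bit5_wrong) are made up for illustration; on a real AVR you would operate on PORTB itself.

```c
#include <stdint.h>

/* Stand-in for a hardware port register (assumption: lets the bit
   logic run on a PC; on the chip this would be PORTB). */
uint8_t fake_port = 0x00;

/* Set one bit, leaving the other seven unchanged. */
void set_bit(volatile uint8_t *reg, uint8_t bit)
{
    *reg |= (uint8_t)(1u << bit);
}

/* Clear one bit, leaving the other seven unchanged. */
void clear_bit(volatile uint8_t *reg, uint8_t bit)
{
    *reg &= (uint8_t)~(1u << bit);
}

/* A typical slip: OR-ing with the bit NUMBER instead of the bit MASK.
   5 is binary 00000101, so this sets bits 0 and 2, not bit 5. */
void set_bit5_wrong(volatile uint8_t *reg)
{
    *reg |= 5;
}
```

Starting from 0x81, set_bit(&fake_port, 5) gives 0xA1 and clear_bit(&fake_port, 5) restores 0x81, so the neighboring bits are untouched. The "wrong" version quietly changes two other pins instead, which is exactly the kind of mistake that is easy to make and slow to find.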
You might think this would take 3 cycles (read, OR, write), which is 0.1875 us at 16 MHz, but when only a single bit is changed the compiler substitutes an instruction that does it in only 2 cycles, which is 0.125 us. That is a LOT faster than calling digitalWrite(4, HIGH). Even 0.1875 us is a lot faster!
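The timing figures above are just cycle counts divided by the clock rate. A minimal sketch of that arithmetic, assuming a 16 MHz clock (that is what 3 cycles = 0.1875 us implies); CPU_MHZ and cycles_to_us are names invented for this example:

```c
/* Assumed clock: 16 MHz, i.e. 16 cycles per microsecond. */
#define CPU_MHZ 16.0

/* Convert a number of CPU cycles to microseconds. */
double cycles_to_us(unsigned cycles)
{
    return cycles / CPU_MHZ;
}
```

With these assumptions, cycles_to_us(3) is 0.1875 and cycles_to_us(2) is 0.125. If digitalWrite() really takes around 2 us (again, a rough guess), that is roughly 16 times the 2-cycle version.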
Internally, digitalWrite() ultimately does this same OR operation, although there the compiler can't substitute the shorter instruction, because the port and bit aren't known until run time. It also does 2 table lookups to figure out which port and which bit to OR, some other checks, and a conditional branch (OR if you say HIGH, AND if you say LOW), as well as the call and return, which push and pop a 16-bit return address to and from the stack. That all takes time. It's many instructions where only one was really necessary.
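Here is a simplified model of the steps just described. It is not the actual Arduino source, only a sketch of the same shape: two table lookups, a sanity check, and a branch between OR and AND. The tables, the pin mapping, and the names (sim_ports, pin_to_port, pin_to_mask, model_digital_write) are all made up for illustration, and the ports are plain variables so it runs on a PC.

```c
#include <stdint.h>

#define LOW  0
#define HIGH 1

/* Simulated port registers (stand-ins for ports B, C, D). */
volatile uint8_t sim_ports[3];

/* Hypothetical lookup tables: which port and which bit mask each pin
   number uses. This mapping is invented for the example. */
static const uint8_t pin_to_port[] = { 2, 2, 2, 2, 0, 0 };
static const uint8_t pin_to_mask[] = { 0x01, 0x02, 0x04, 0x08, 0x20, 0x40 };

void model_digital_write(uint8_t pin, uint8_t value)
{
    if (pin >= sizeof pin_to_port)
        return;                               /* sanity check */

    uint8_t port = pin_to_port[pin];          /* table lookup 1 */
    uint8_t mask = pin_to_mask[pin];          /* table lookup 2 */

    if (value == LOW)
        sim_ports[port] &= (uint8_t)~mask;    /* AND clears the bit */
    else
        sim_ports[port] |= mask;              /* OR sets the bit */
}
```

Because port and mask are only known when the function runs, the compiler has to emit a full read, modify, write sequence here, on top of the lookups, the branch, and the call and return.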
If you're blinking an LED, or even turning a motor on or off, the difference between 2us and 0.125us isn't going to matter. In fact, it doesn't really matter for a lot of things. But in some cases, like "bit bashing" to communicate with another chip, you might do thousands of operations to send or receive something meaningful and all those extra 2us times can add up.
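To see how the small per-write cost adds up, here is a rough sketch of bit-banging a byte out on a data line and a clock line, counting pin writes along the way. The wiring (data on bit 0, clock on bit 1), the simulated register sim_port, and the function names are assumptions for illustration, and the estimate counts only the pin writes, ignoring loop overhead.

```c
#include <stdint.h>

#define DATA_BIT  0   /* hypothetical wiring: data line on bit 0 */
#define CLOCK_BIT 1   /* hypothetical wiring: clock line on bit 1 */

volatile uint8_t sim_port;    /* stand-in for a real port register */
unsigned long pin_writes;     /* number of pin changes performed */

/* Shift one byte out, most significant bit first. */
void shift_out_byte(uint8_t value)
{
    for (uint8_t i = 0; i < 8; i++) {
        if (value & 0x80)
            sim_port |= (uint8_t)(1u << DATA_BIT);    /* data high */
        else
            sim_port &= (uint8_t)~(1u << DATA_BIT);   /* data low */
        sim_port |= (uint8_t)(1u << CLOCK_BIT);       /* clock high */
        sim_port &= (uint8_t)~(1u << CLOCK_BIT);      /* clock low */
        pin_writes += 3;
        value <<= 1;
    }
}

/* Time spent purely on pin writes, in microseconds. */
double pin_write_time_us(unsigned long writes, double us_per_write)
{
    return writes * us_per_write;
}
```

Each byte costs 24 pin writes in this scheme, so 1000 bytes is 24,000 writes. At the guessed 2 us per digitalWrite() that is about 48 ms spent on pin writes alone, versus about 3 ms at 0.125 us per direct register write.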
If you are very good at writing code like PORTB |= (1<<n) then go for it. But if you sometimes make a mistake, the amount of time it takes to troubleshoot is a LOT more than 2us, so going with a less error-prone way can save YOU a lot of time, even if the processor has to do quite a bit more work.
But on the scale of microseconds, it is a lot slower.