Code Correction

Please use the code tags to properly format your code.

So basically you are saying you need ~1 microsecond per count. Let's see what I would expect the while loop to translate to:

reading pin
and operation
compare operation
branch operation
increment two byte integer
jump operation
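
To put rough numbers on the list above, here is a small host-side sketch (plain C, not AVR code) that adds up the cycles. The mapping of each step to a specific instruction (IN, ANDI, CPI, BRNE, ADIW, RJMP) is my assumption about what the compiler emits, and the 16 MHz clock in the usage note is an assumption too; the disassembly is the real authority.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed instruction per step; cycle counts as listed in the
   AVR instruction set summary. */
enum {
    CYCLES_IN   = 1, /* reading pin */
    CYCLES_ANDI = 1, /* and operation */
    CYCLES_CPI  = 1, /* compare operation */
    CYCLES_BRNE = 1, /* branch operation (not taken while looping) */
    CYCLES_ADIW = 2, /* increment two byte integer */
    CYCLES_RJMP = 2  /* jump operation */
};

/* Total cycles for one pass through the assumed loop body. */
static uint32_t loop_cycles(void) {
    return CYCLES_IN + CYCLES_ANDI + CYCLES_CPI +
           CYCLES_BRNE + CYCLES_ADIW + CYCLES_RJMP;
}

/* Convert a cycle count to nanoseconds for a given CPU clock in Hz. */
static double ns_per_iteration(uint32_t cycles, uint32_t f_cpu_hz) {
    assert(f_cpu_hz > 0);
    return (double)cycles * 1e9 / (double)f_cpu_hz;
}
```

Under these assumptions loop_cycles() gives 8 cycles, and ns_per_iteration(8, 16000000) gives 500 ns per count. If the disassembly shows a different instruction sequence, adjust the table accordingly.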

--> I would not expect much more speed from this code. I suggest using avr-objdump (for example avr-objdump -d) to disassemble the .elf file, then using the datasheet to count the cycles for each instruction. Once you understand how many cycles this code takes, you can start to tune it.

I would expect that

while ((PIND & B00001000) == B00001000)

would be better implemented as

while (PIND & B00001000)
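
The shorter condition works because the mask has only one bit set, so "equals the mask" and "is non-zero" mean the same thing. A minimal host-side sketch to convince yourself (the function names and PIN3_MASK are made up for illustration, and port_value stands in for a value read from PIND):

```c
#include <assert.h>
#include <stdint.h>

#define PIN3_MASK 0x08u /* same bit pattern as B00001000 */

/* Original form: mask, then compare against the mask. */
static int pin3_high_compare(uint8_t port_value) {
    return (port_value & PIN3_MASK) == PIN3_MASK;
}

/* Shorter form: mask, then test for non-zero. */
static int pin3_high_truthy(uint8_t port_value) {
    return (port_value & PIN3_MASK) != 0;
}

/* Returns 1 if both forms agree for every possible port value. */
static int conditions_agree(void) {
    /* The equivalence relies on the mask being a single bit. */
    assert((PIN3_MASK & (PIN3_MASK - 1u)) == 0);
    for (unsigned v = 0; v <= 0xFFu; v++) {
        if (pin3_high_compare((uint8_t)v) != pin3_high_truthy((uint8_t)v)) {
            return 0;
        }
    }
    return 1;
}
```

Note that an optimizing compiler may well generate the same code for both forms, so check the disassembly before expecting a gain from this change alone.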

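Also, I would suggest using 1-byte integers for the counter if the count fits into 0..255, since the AVR is an 8-bit CPU and a 1-byte increment is cheaper than a 2-byte one.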
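
A rough sketch of what a 1-byte counter could look like, written so it runs on a PC. Everything here is hypothetical scaffolding: fake_read_port() stands in for reading PIND, and the saturation check is my addition to keep the counter from wrapping around at 255. That check costs extra cycles on the real chip, so drop it if you know the pulse is short enough.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PIN3_MASK 0x08u /* same bit pattern as B00001000 */

/* Hypothetical stand-in for PIND: reports pin 3 high for the next
   fake_high_reads reads, then low. */
static uint16_t fake_high_reads = 0;

static uint8_t fake_read_port(void) {
    if (fake_high_reads > 0) {
        fake_high_reads--;
        return PIN3_MASK;
    }
    return 0;
}

/* Count loop passes while pin 3 reads high, using a 1-byte counter
   that stops at 255 instead of wrapping back to 0. */
static uint8_t count_while_high(uint8_t (*read_port)(void)) {
    assert(read_port != NULL);
    uint8_t count = 0;
    while ((read_port() & PIN3_MASK) && count < UINT8_MAX) {
        count++;
    }
    return count;
}
```

On the Arduino itself you would read PIND directly in the loop condition instead of going through a function pointer, which is only there so the sketch can be tried without hardware.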