Declaring an asm block as volatile means the compiler won't try to optimise it away. Anything inside the asm volatile block will be in the final compiled output.
With the "brne 1b", you will note there is also the label "1:". These go together: "1b" says jump backwards to the nearest label 1. It doesn't have to be 1; the GNU assembler accepts any non-negative integer as a local label, so multiple digits are fine. There are also cases where the label is forwards (i.e. the label comes after the statement), in which case you write "1f". Here is a convoluted example of why the b and f are used:
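asm volatile (
"tst r1 \n\t" /* set the zero flag from r1 before the first test */
"1: \n\t"
"breq 1f \n\t" /* zero: leave the loop, forwards to the second 1 */
"dec r1 \n\t"
"rjmp 1b \n\t" /* backwards to the first 1 */
"1: \n\t"
"nop \n\t"
);
That code would perform a 'while' behaviour, rather than a 'do-while'. The tst sets the zero flag from r1 first. Then, if the number was initially zero, breq jumps over the dec and rjmp instructions to the label 1 which appears forward of (below) this line. Otherwise it decrements r1 and then jumps to the label 1 that appears behind (above) the rjmp, where the flag left by dec is tested again. As there are two 1's, the b or f suffix is how you identify which to pick. (r1 is only here for illustration; avr-gcc keeps r1 as its zero register, so real code should pass in an operand instead.)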
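To see why the distinction matters, here is a rough C rendering of the two loop shapes that runs on a PC rather than on the AVR. The function names are made up for illustration and uint8_t stands in for an 8-bit register, so treat it as a sketch rather than a replacement for the asm:

```c
#include <assert.h>
#include <stdint.h>

/* 'do-while' shape: decrement first, then branch back while not zero. */
unsigned do_while_iterations(uint8_t n)
{
    unsigned iterations = 0;
    do {
        n--;                /* like dec: 0 wraps round to 255 */
        iterations++;
    } while (n != 0);       /* like a branch back to label 1 */
    return iterations;
}

/* 'while' shape: test first, and skip the body if already zero. */
unsigned while_iterations(uint8_t n)
{
    unsigned iterations = 0;
    while (n != 0) {        /* like the branch forwards out of the loop */
        n--;
        iterations++;
    }
    return iterations;
}

/* The two shapes only disagree when the count starts at zero. */
void check_zero_start(void)
{
    assert(while_iterations(0) == 0);
    assert(do_while_iterations(0) == 256);
}
```

Starting from zero, the do-while shape wraps round and runs 256 times, while the while shape skips the body entirely. For any non-zero start they run the same number of times.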
You couldn't write just 'brne' with nothing else, as you haven't specified where to branch to. Instead of adding labels, though, you can be explicit:
asm volatile (
"dec %[counter] \n\t"
"brne .-4 ;Note the '.' \n\t"
: [counter] "+r" (counter) /* counter is assumed to be a uint8_t C variable */
);
This says branch 4 bytes backwards. Why 4 bytes? The branch is relative to the end of the brne instruction, and we need to get back to the start of the dec instruction. Each one is 2 bytes, so a jump totalling -4 is required. To go forwards you would write "brne .+16" or whatever.
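Now you could instead give the jump as a number of words (2 bytes each) by writing "brne -2", which should compile to the same as above. Notice that in this second form there is no "."! It is worth confirming in the disassembly, though, as I'm less sure that every assembler version treats a bare number this way.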
This is useful for very short loops, but it is a pain if you suddenly decide to add an instruction in the loop as you have to recalculate all of the jumps manually.
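If you want to check the arithmetic, the byte offset can be turned into the actual opcode by hand. The sketch below runs on a PC; encode_brne is a made-up helper name, and the bit layout (1111 01kk kkkk k001, with k a signed 7-bit word count) is taken from the BRNE entry in the AVR instruction set manual:

```c
#include <assert.h>
#include <stdint.h>

/* Encode "brne .<byte_offset>", where the offset is counted from the
   end of the brne instruction.
   BRNE is 1111 01kk kkkk k001, with k a signed 7-bit count of words. */
uint16_t encode_brne(int byte_offset)
{
    assert(byte_offset % 2 == 0);     /* instructions are word aligned */
    int k = byte_offset / 2;          /* bytes -> 16-bit words */
    assert(k >= -64 && k <= 63);      /* range of the 7-bit signed field */
    return (uint16_t)(0xF401u | (((unsigned)k & 0x7Fu) << 3));
}
```

For example, encode_brne(-4) gives 0xF7F1, which is what you should see in the disassembly for the dec/brne pair above, and encode_brne(16) gives 0xF441.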
As for the tab characters, they are mostly there for when you look at the generated assembly (the compiler's output before it is assembled). For the most part they make no real difference; I just do it because that's how I've seen it done! It works fine without the tab, but the newline is a must!
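Also, you have to wrap each line in its own "", as otherwise it won't work: a string literal can't run across a line break in the source, and the compiler joins the adjacent strings into one.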
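A quick way to convince yourself of what the compiler does with the separate "" pieces is to try it with ordinary strings on a PC. The names below are invented for the example, and r16 is just a placeholder register:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Two adjacent literals, one per source line, as in an asm block. */
const char *asm_text_split =
    "dec r16 \n\t"
    "brne 1b \n\t";

/* The same text written as a single literal. */
const char *asm_text_joined = "dec r16 \n\tbrne 1b \n\t";

/* Returns 1 when the compiler produced identical strings. */
int literals_match(void)
{
    return strcmp(asm_text_split, asm_text_joined) == 0;
}

/* Counts newline characters, i.e. how many lines the assembler sees. */
int count_asm_lines(const char *text)
{
    assert(text != NULL);
    int lines = 0;
    for (; *text != '\0'; text++) {
        if (*text == '\n') {
            lines++;
        }
    }
    return lines;
}
```

The split and joined forms compare equal, and count_asm_lines reports 2 for either. Drop the "\n" and the assembler would be handed both instructions on a single line, which is why the newline matters and the tab doesn't.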
Oh, and this is a very useful reference from Atmel: AVR Instruction Set. It basically lists the full instruction set of the AVR CPU including small examples.
And then this one has some useful inline assembly hints: Inline Assembler Cookbook
Also, a correction to the example in my previous post. Really, "I" shouldn't be used for passing the [loop] value into the inline assembler. Technically "I" represents a 6-bit positive constant (0 to 63), whereas "M" is an 8-bit constant (0 to 255), so "M" is the one to use, though in practice it seems to make little difference.
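As a rough illustration of the two ranges, here is a small PC-side check. The helper names are hypothetical and this is not how the compiler itself decides; it only mirrors the ranges described above:

```c
#include <assert.h>

/* "I": 6-bit positive constant, 0 to 63. */
int fits_constraint_I(long value) { return value >= 0 && value <= 63; }

/* "M": 8-bit constant, 0 to 255. */
int fits_constraint_M(long value) { return value >= 0 && value <= 255; }

/* Hypothetical helper: pick the narrower of the two letters that fits. */
char pick_constraint(long value)
{
    assert(fits_constraint_M(value));  /* neither letter covers larger values */
    return fits_constraint_I(value) ? 'I' : 'M';
}
```

So a loop count of 63 still fits "I", but 64 and up needs "M", and anything past 255 needs a different approach altogether.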
Also, if you happen to want to pass a 16-bit constant, you can use a method which I hadn't seen documented for AVR; I found it by mistake.
asm volatile (
"ldi r24, lo8(%0) \n"
"ldi r25, hi8(%0) \n"
...
:
: "i" (65535)
: "r24", "r25" /* tell the compiler these registers get overwritten */
);
Notice in this case the use of the lower-case "i", which is GCC's generic immediate-integer constraint and so accepts the full 16-bit value, and then the use of hi8() and lo8() to get the upper and lower 8-bit chunks of the constant.
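If it helps to see what hi8() and lo8() work out to, the split can be mimicked in plain C on a PC. The function names here are stand-ins I've made up, not the assembler's own operators:

```c
#include <assert.h>
#include <stdint.h>

/* Host-side stand-ins for the assembler's lo8() and hi8() operators. */
uint8_t lo8_of(uint16_t value) { return (uint8_t)(value & 0xFFu); }
uint8_t hi8_of(uint16_t value) { return (uint8_t)(value >> 8); }

/* What the r25:r24 pair holds after the two ldi instructions. */
uint16_t join_r25_r24(uint8_t r25, uint8_t r24)
{
    uint16_t joined = (uint16_t)(((unsigned)r25 << 8) | r24);
    assert(hi8_of(joined) == r25 && lo8_of(joined) == r24);
    return joined;
}
```

For 65535 both halves come out as 0xFF, which hides what is going on; a value like 0x1234 shows it better, with 0x34 going into r24 and 0x12 into r25.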