Addressing data from assembler

Hello,

[Sorry, I could not post this in "Syntax..." on the forum, don't know why...]

I am optimizing my code with inline assembler, but I have some questions.

For example, I have to access the elements of a two-dimensional array one by one and refer to them as operands that get mapped to registers:

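__asm__ __volatile__
(
    "mov r24,%[LEDS]" "\n\t"             /* copy the operand into r24 */
    : [LEDS] "+r" (leds[0][0].value)     /* read-write register operand */
    :
    : "r24"                              /* r24 is overwritten, so declare it as clobbered */
);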

How do I address a whole array instead of having to assign one register per operand?

The problem is that if I use too many operands, the compiler generates a lot of code to load the data into registers (before my code)
and a lot of code to store the modified registers back to memory (after my code).

So I thought I could do the addressing myself by storing the address of the array in, for instance, the Z register,
and then walking through memory to read/write the required values. But how do I do that?

Also, the compiler generates a lot of push instructions before my actual asm code (without any pops?).
Does anybody know why, and how to suppress this? I lose a lot of cycles for nothing, as the whole code runs in the "void loop(){...}" section.

Example:

void loop()
{

     400:	4f 92       	push	r4
     402:	5f 92       	push	r5
     404:	6f 92       	push	r6
     406:	7f 92       	push	r7
     408:	8f 92       	push	r8
     40a:	9f 92       	push	r9
     40c:	af 92       	push	r10
     40e:	bf 92       	push	r11
     410:	cf 92       	push	r12
     412:	df 92       	push	r13
     414:	ef 92       	push	r14
     416:	ff 92       	push	r15
     418:	0f 93       	push	r16
     41a:	1f 93       	push	r17
     41c:	df 93       	push	r29
     41e:	cf 93       	push	r28
     420:	0f 92       	push	r0
     422:	cd b7       	in	r28, 0x3d	; 61
     424:	de b7       	in	r29, 0x3e	; 62
     426:	c0 90 04 04 	lds	r12, 0x0404
     42a:	8c 2c       	mov	r8, r12
     42c:	99 24       	eor	r9, r9
     42e:	70 91 05 04 	lds	r23, 0x0405
     432:	87 2f       	mov	r24, r23
     434:	90 e0       	ldi	r25, 0x00	; 0
     436:	f0 90 36 01 	lds	r15, 0x0136
     43a:	2c 01       	movw	r4, r24
     43c:	44 0c       	add	r4, r4
     43e:	55 1c       	adc	r5, r5
     440:	44 0c       	add	r4, r4
     442:	55 1c       	adc	r5, r5
     444:	48 0e       	add	r4, r24
     446:	59 1e       	adc	r5, r25
     448:	88 e7       	ldi	r24, 0x78	; 120
     44a:	90 e0       	ldi	r25, 0x00	; 0
     44c:	9c 01       	movw	r18, r24
     44e:	82 9e       	mul	r8, r18
     450:	c0 01       	movw	r24, r0
     452:	83 9e       	mul	r8, r19
     454:	90 0d       	add	r25, r0
     456:	92 9e       	mul	r9, r18
     458:	90 0d       	add	r25, r0
     45a:	11 24       	eor	r1, r1
     45c:	48 0e       	add	r4, r24
     45e:	59 1e       	adc	r5, r25
     460:	f2 01       	movw	r30, r4
     462:	ec 5c       	subi	r30, 0xCC	; 204
     464:	fe 4f       	sbci	r31, 0xFE	; 254
     466:	43 81       	ldd	r20, Z+3	; 0x03
     468:	54 81       	ldd	r21, Z+4	; 0x04
     46a:	91 81       	ldd	r25, Z+1	; 0x01
     46c:	30 91 06 04 	lds	r19, 0x0406
     470:	3a 01       	movw	r6, r20
     472:	b9 2e       	mov	r11, r25
     474:	a3 2e       	mov	r10, r19
     476:	27 2f       	mov	r18, r23
     478:	26 95       	lsr	r18
     47a:	26 95       	lsr	r18
     47c:	26 95       	lsr	r18

}

Please help.

Thanks.

Anybody?

I am optimizing my code with inline assembler

Why?

The compiler optimizes your code too. What makes you think you can do a better job, Rumpelstiltskin?

rompelstilchen:
... I could not post this in "Syntax..." on the forum, don't know why...

Because that part of the forum is read-only? Just a guess.

To have you ask why.

I know, but if you don't try... right?

By the way, I ended up with code that is 10 times faster.

Seeing the code in asm also helps you understand what could be optimized.

I need a fast refresh, and with C code the LED panel flickers.
The Arduino is not that fast.

rompelstilchen:

Nick Gammon:
The compiler optimizes your code too. What makes you think you can do a better job, Rumpelstiltskin?

I know, but if you don't try... right?

By the way, I ended up with code that is 10 times faster.

Seeing the code in asm also helps you understand what could be optimized.

Excellent answer. Looking at the generated code has always helped me to make more efficient programs. While the optimizations can be incredible, the compiler does not always generate the most efficient code.

I probably know less about AVR assembler than any other assembler in the world, but I'll try to help. The tail-end registers can be paired (R26:R27, R28:R29 and R30:R31). These are treated as 16-bit pointer registers (X, Y and Z) that can point into SRAM. Maybe by setting one of these as a base pointer and using offsets, you can accomplish what you want. Like I said, I have written no AVR assembler in the past, but I've written tons of ARM7, PIC and mainframe assembler, along with scads of other micros.
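To make the base-pointer idea concrete, here is a minimal sketch, not tested on real hardware. It passes the array's address to the inline asm in a pointer register (avr-gcc constraint "e", which lets the compiler pick X, Y or Z) and walks memory with post-increment, so only one operand is needed for the whole array. The function name add_to_bytes, the one-byte element type and the "add a constant" operation are assumptions made for illustration; the original leds array holds structs whose layout is not shown in this thread. Note that only Y and Z support the displacement form (ldd/std with an offset); X only offers post-increment and pre-decrement. On non-AVR targets the sketch falls back to plain C so it can still be compiled and checked.

```c
#include <stdint.h>

/* Add `delta` to `count` consecutive bytes starting at `p`.
 * Hypothetical helper: name, element size and operation are assumptions. */
static void add_to_bytes(uint8_t *p, uint8_t count, uint8_t delta)
{
    if (count == 0) {
        return;                 /* the dec/brne loop below needs count >= 1 */
    }
#if defined(__AVR__)
    uint8_t tmp;
    __asm__ __volatile__(
        "1:"                        "\n\t"
        "ld   %[tmp], %a[ptr]"      "\n\t"  /* load one byte through the pointer register */
        "add  %[tmp], %[delta]"     "\n\t"  /* modify it */
        "st   %a[ptr]+, %[tmp]"     "\n\t"  /* store it back, then advance the pointer */
        "dec  %[cnt]"               "\n\t"
        "brne 1b"                   "\n\t"
        : [ptr] "+e" (p),           /* "e": any pointer register pair (X, Y or Z) */
          [cnt] "+r" (count),
          [tmp] "=&r" (tmp)         /* scratch register chosen by the compiler */
        : [delta] "r" (delta)
        : "memory"                  /* the asm reads and writes memory behind p */
    );
#else
    /* Portable fallback with the same behaviour, for building off-target. */
    while (count--) {
        *p = (uint8_t)(*p + delta);
        p++;
    }
#endif
}
```

Because a two-dimensional array is contiguous in memory, a call such as add_to_bytes(&frame[0][0], ROWS * COLS, 10) on a hypothetical uint8_t frame[ROWS][COLS] would cover every element, as long as the total fits in the 8-bit counter. For multi-byte structs the pointer would have to be advanced by the struct size instead.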

Generating your own pro/epilogue code: Improving the Interrupt Service Routine | µC eXperiment

rompelstilchen:
I need a fast refresh, and with C code the LED panel flickers.
The Arduino is not that fast.

It's fast enough to generate VGA signals which don't flicker:

I suggest you post your C code rather than trying to convert it all to assembler. By all means look at the generated assembler code; that's what I did. Then work out which lines of C code are generating more assembler code than you thought.
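As one illustration of that kind of C-level change: the listing in the first post appears to spend a number of instructions (the mul and add/adc sequences) turning two array indices into an address. A minimal sketch of a common alternative is below: resolve the row once, then walk it with a pointer. Everything here is an assumption, since the real C code was not posted: the led_t layout, the ROWS and COLS values and the function names are hypothetical. The compiler may already perform this transformation by itself, so whether it helps has to be checked in the disassembly.

```c
#include <stdint.h>

#define ROWS 4      /* hypothetical panel height */
#define COLS 24     /* hypothetical panel width */

/* Hypothetical element layout; the thread only shows a `.value` member. */
typedef struct {
    uint8_t value;
    uint8_t other[4];
} led_t;

static led_t leds[ROWS][COLS];

/* Version 1: index with both subscripts on every access. */
static uint16_t sum_row_indexed(uint8_t row)
{
    uint16_t sum = 0;
    for (uint8_t col = 0; col < COLS; col++) {
        sum += leds[row][col].value;
    }
    return sum;
}

/* Version 2: compute the row address once, then step a pointer. */
static uint16_t sum_row_pointer(uint8_t row)
{
    const led_t *p = leds[row];         /* address arithmetic happens once here */
    const led_t *end = p + COLS;
    uint16_t sum = 0;
    while (p != end) {
        sum += p->value;
        p++;                            /* advance by sizeof(led_t) */
    }
    return sum;
}
```

Both functions return the same result; comparing their generated code in the disassembly shows whether the pointer form actually saves anything for a given compiler version and optimization level.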

To do that, find the .elf file from your compile (turn on verbose compiling) and type this at a command window:

avr-objdump -S -z filename.elf