ASM: Circular or Wrap around buffer example

I know assembly is not a big thing in this forum and maybe not even for this category, but for those that might be interested, I've cooked up an algorithm which will be an integral part of the TWI (I2C) driver that I'm designing in ASM.

The scope of this routine is to provide a convenient means of reading/writing characters to a 255 byte buffer, just by passing a pointer to the array in "X" and the character in R16. Optionally, it can serve as a means of establishing a pointer in "X" to the beginning or end of the buffer, depending upon whether you are reading or writing.

The buffer needs to be declared in this format although alignment on any boundary is not necessary.

; =============================================================================================

		.byte  256			; 256 byte wrap around/circular buffer
  BIdx:		.byte  1			; End Index
		.byte  1			; Start Index
		.byte  2			; Pointer to nested routine for subroutine.
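
For readers more at home in C, the declaration above can be pictured as a packed record. This is only a rough sketch: the type and field names are hypothetical (they do not appear in the listing), and the nested-routine pointer is modelled as a plain 16-bit value to match the 2 bytes reserved for it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C mirror of the SRAM layout; names are illustrative only. */
typedef struct {
    uint8_t  data[256];   /* 256 byte wrap around/circular buffer      */
    uint8_t  end_idx;     /* BIdx: end index                           */
    uint8_t  start_idx;   /* start index                               */
    uint16_t handler;     /* 2-byte pointer to optional nested routine */
} ring_layout_t;
```

With this picture, BIdx sits exactly 256 bytes past the start of the data, which is the relationship the routine leans on when it steps XH back one page.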

I've tested this thoroughly and it benchmarks at approx. 2.45 µs/iteration; obviously, if a nested procedure is declared, that would have to be taken into account as well.

; =============================================================================================
; Read/Write a Circular/Wrap Around 256 byte buffer, with optional nested function or 
; subroutine.

;	ENTER:	   X = Pointer to buffer's indices
;		TEMP = Byte to be written to buffer
;		  R0 : Bit 0 = 1 to write to buffer, read otherwise
;		       Bit 1 = 1 to execute optional function/subroutine

;	LEAVE:	TEMP = Value returned from next position in buffer

;		 R16 (TEMP)
;		 R09
;		 R10			Altered
;		 R11

;	FLAGS:	If error R10 = 0FFH and carry flag is set, otherwise clear
; ---------------------------------------------------------------------------------------------
  RW_Buffer:	push	ZL
		push	ZH
		push	XL
		push	XL
		bst	R0, 0			; Simplifies condition @ RW_Post

	; Caller has passed pointer to indices, so by default we'll calculate how many bytes
	; are left to be read or, if bit 0 of R0 is set, the number of bytes that can be written.

		ld	R10, X+			; Offset to end of data
		ld	R11, X+			; Offset to beginning of data
		sub	R10, R11		; R10 = bytes to be read
		mov	R9, R10			; Keep count; simplifies the write offset later
		sbrc	R0, 0
		com	R10			; R10 = bytes that can be written.
		brne	RW_Proc

	; At this point R10 = 0, so we know, depending upon bit 0 of R0, that the buffer is
	; either full or empty

 		dec	R10			; Set FF as return code buffer full/empty
		sec				; Set carry (error)
		pop	XL			; Waste value on stack
		rjmp	RW_Error

	; In the event Bit 1 of R0 is on (optional procedure), Z needs to be initialized
	; for ICALL instruction.

      RW_Proc:	ld	ZL, X+
		ld	ZH, X

	; X needs to point to proper position in buffer depending upon reading or writing.
	; Space needs to be allocated a whole page (256 bytes) before Buffer_Indices.

		dec	XH			; Bump back one page
		subi	XL, 2			; Re-align to beginning
		add	XL, R11			; Offset where next byte is to be read from
		sbrc	R0, 0			; Are we writing. Bit 0 = 1
		add	XL, R9			; Bump ahead to next position to write to.
		sbrs	R0, 1					
		rjmp	RW_Post			; Bounce if not executing nested routine

	; R10 = bytes that can be read, or bytes that can be written to and X is the pointer.
	; It will be callee's responsibility to return error codes and registers according
	; to algorithm's scope. 

		icall				; Call nested routine through Z
		pop	XL			; Waste parameter on stack
		rjmp	RW_Error

     RW_Post:	brts	RW_Write		; Was "T" bit set in preamble

        ; Read byte from buffer, point "X" to appropriate index

                ld	TEMP, X
                pop	XL
                inc	XL
                rjmp	RW_Finished

        ; Write byte to buffer, point "X" to appropriate index

    RW_Write:	st	X, TEMP
                pop	XL

        ; Update appropriate index

 RW_Finished:	inc	XH			; Bump ahead one page
                ld	R9, X			; Read index
                inc	R9			; Bump it by 1
                st	X, R9			; Write it back
                clc				; Assure carry is cleared

        ; Post or Error: clean up stack and return to caller

    RW_Error:	pop	XL
		pop	ZH
                pop	ZL
		ret
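
A rough C model of the index arithmetic may help anyone following along without an AVR reference to hand. It is a sketch under assumptions, not a translation: the names `ring_t`, `ring_write` and `ring_read` are hypothetical, the nested-routine call is left out, the exact byte offsets inside the page are not reproduced, and the carry/R10 error return is modelled as a plain -1.

```c
#include <stdint.h>

/* Hypothetical names; this models the index logic only. */
typedef struct {
    uint8_t data[256];
    uint8_t end_idx;     /* advanced after each write */
    uint8_t start_idx;   /* advanced after each read  */
} ring_t;

/* Returns 0 on success, -1 when full (the listing sets carry and R10 = 0FFH). */
static int ring_write(ring_t *rb, uint8_t byte)
{
    uint8_t count = (uint8_t)(rb->end_idx - rb->start_idx); /* sub R10, R11 */
    uint8_t room  = (uint8_t)~count;                         /* com R10 = 255 - count */
    if (room == 0)
        return -1;                                           /* buffer full */
    rb->data[(uint8_t)(rb->start_idx + count)] = byte;       /* start + count, wraps at 256 */
    rb->end_idx++;                                           /* 8-bit index wraps by itself */
    return 0;
}

/* Returns 0 on success, -1 when empty. */
static int ring_read(ring_t *rb, uint8_t *out)
{
    uint8_t count = (uint8_t)(rb->end_idx - rb->start_idx);
    if (count == 0)
        return -1;                                           /* buffer empty */
    *out = rb->data[rb->start_idx];
    rb->start_idx++;
    return 0;
}
```

Because both indices are single bytes, the subtraction already yields the fill level modulo 256, and complementing it gives the free space. One consequence of this model is that at most 255 bytes are ever held at once, since equal indices have to mean "empty".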

The need for this snippet was prompted by thinking about what I needed in order to send data packets to a 1602 LCD via TWI using interrupts.

Why so gratuitously incompatible with the avr-gcc function call definition? (In particular, using R9/R10/R11 without saving them!)

Why so gratuitously incompatible with the avr-gcc function call definition?

Initially, the scope and purpose is to learn the AVR architecture and the methodologies oriented around its instruction set. As I'm a mere 10 days into this, I didn't realize AVR even had a calling convention. That being said, even in IA32 I march to the beat of my own drum, and that may very well be the case here.

I do appreciate your input, as it is just another piece of the plethora of information that I've been gleaning in the past few days.

Ah. The info is here: Frequently Asked Questions