chunklady:
which is exactly what I am doing. I would rather do the entire thing in asm than decompile my binaries all the time to count cycles. Plus, it's fun.
What I often do, when the occasion arises, is disassemble the code the compiler generates from the C source (for example with avr-objdump -S). That's easy to do, and you can count cycles as well.
(edit) Re-reading your post, I see you are doing exactly that.
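If it's any help, here is my table of instruction cycle counts for the ATmega328, as a Lua table. Entries with several values depend on the outcome: a conditional branch takes 1 cycle if not taken and 2 if taken, and a skip instruction (CPSE, SBIC, SBIS, SBRC, SBRS) takes 1 cycle if it does not skip, 2 if it skips a one-word instruction and 3 if it skips a two-word instruction.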
cycles = {
-- ARITHMETIC AND LOGIC INSTRUCTIONS
ADC = "1",
ADD = "1",
ADIW = "2",
AND = "1",
ANDI = "1",
CBR = "1",
CLR = "1",
COM = "1",
DEC = "1",
EOR = "1",
FMUL = "2",
FMULS = "2",
FMULSU = "2",
INC = "1",
MUL = "2",
MULS = "2",
MULSU = "2",
NEG = "1",
OR = "1",
ORI = "1",
SBC = "1",
SBCI = "1",
SBIW = "2",
SBR = "1",
SER = "1",
SUB = "1",
SUBI = "1",
TST = "1",
-- BRANCH INSTRUCTIONS
BRBC = "1/2",
BRBS = "1/2",
BRCC = "1/2",
BRCS = "1/2",
BREQ = "1/2",
BRGE = "1/2",
BRHC = "1/2",
BRHS = "1/2",
BRID = "1/2",
BRIE = "1/2",
BRLO = "1/2",
BRLT = "1/2",
BRMI = "1/2",
BRNE = "1/2",
BRPL = "1/2",
BRSH = "1/2",
BRTC = "1/2",
BRTS = "1/2",
BRVC = "1/2",
BRVS = "1/2",
CALL = "4",
CP = "1",
CPC = "1",
CPI = "1",
CPSE = "1/2/3",
ICALL = "3",
IJMP = "2",
JMP = "3",
RCALL = "3",
RET = "4",
RETI = "4",
RJMP = "2",
SBIC = "1/2/3",
SBIS = "1/2/3",
SBRC = "1/2/3",
SBRS = "1/2/3",
-- BIT AND BIT-TEST INSTRUCTIONS
ASR = "1",
BCLR = "1",
BLD = "1",
BSET = "1",
BST = "1",
CBI = "2",
CLC = "1",
CLH = "1",
CLI = "1",
CLN = "1",
CLS = "1",
CLT = "1",
CLV = "1",
CLZ = "1",
LSL = "1",
LSR = "1",
ROL = "1",
ROR = "1",
SBI = "2",
SEC = "1",
SEH = "1",
SEI = "1",
SEN = "1",
SES = "1",
SET = "1",
SEV = "1",
SEZ = "1",
SWAP = "1",
-- DATA TRANSFER INSTRUCTIONS
IN = "1",
LD = "2",
LDD = "2",
LDI = "1",
LDS = "2",
LPM = "3",
MOV = "1",
MOVW = "1",
OUT = "1",
POP = "2",
PUSH = "2",
ST = "2",
STD = "2",
STS = "2",
-- MCU CONTROL INSTRUCTIONS
NOP = "1",
SLEEP = "1",
WDR = "1",
} -- end of cycles table
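Because some entries are strings like "1/2" or "1/2/3", you can't simply add them up. Here is a minimal sketch, assuming Lua 5.1 or later, of one way to turn them into best-case and worst-case totals for a straight run of instructions. The names cycle_range and total_range are hypothetical, and only an excerpt of the table is repeated so the snippet runs on its own.

```lua
-- Excerpt of the cycles table above (same string format); in practice
-- you would use the full table.
local cycles = {
  LDI = "1",
  STS = "2",
  BRNE = "1/2",
  SBIC = "1/2/3",
  RET = "4",
}

-- Turn an entry such as "1", "1/2" or "1/2/3" into its smallest and
-- largest cycle count. Returns nil for mnemonics not in the table.
local function cycle_range(mnemonic)
  local entry = cycles[string.upper(mnemonic)]
  if not entry then
    return nil
  end
  local lo, hi
  for n in string.gmatch(entry, "%d+") do
    local v = tonumber(n)
    if lo == nil or v < lo then lo = v end
    if hi == nil or v > hi then hi = v end
  end
  return lo, hi
end

-- Best-case and worst-case totals for a list of mnemonics.
-- Straight-line code only: loops and the path actually taken are ignored.
local function total_range(mnemonics)
  local min_total, max_total = 0, 0
  for _, m in ipairs(mnemonics) do
    local lo, hi = cycle_range(m)
    assert(lo, "no cycle count for " .. m)
    min_total = min_total + lo
    max_total = max_total + hi
  end
  return min_total, max_total
end

print(total_range({ "ldi", "sts", "brne", "ret" }))  -- best case 8, worst case 9
```

The worst case simply assumes every branch is taken and every skip jumps a two-word instruction, so it is an upper bound rather than an exact figure.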
A bit of scanning of the disassembly, a regular expression or two, and you can output the cycle count next to each line. For example, for the snippet posted above:
be: 84 e3 ldi r24, 0x34 ; 52 (1)
c0: 92 e1 ldi r25, 0x12 ; 18 (1)
c2: 90 93 85 00 sts 0x0085, r25 (2)
c6: 80 93 84 00 sts 0x0084, r24 (2)
Cycles are in brackets, so this snippet takes 1 + 1 + 2 + 2 = 6 cycles.
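The scanning code itself isn't shown, so the following is only a sketch of one way it could be done in Lua, assuming avr-objdump -d style lines (address, colon, raw opcode bytes, mnemonic, operands). mnemonic_of and annotate are hypothetical names, and Lua patterns stand in for regular expressions.

```lua
-- Excerpt of the cycles table (same format as above); use the full
-- table in practice.
local cycles = {
  LDI = "1",
  STS = "2",
  RET = "4",
}

-- Return the mnemonic of one disassembly line such as
--   "  be: 84 e3 ldi r24, 0x34 ; 52"
-- or nil if the line does not start with "address:" (for example a
-- "000000be <main>:" label).
local function mnemonic_of(line)
  local rest = string.match(line, "^%s*%x+:%s*(.*)$")
  if not rest then
    return nil
  end
  -- Skip the raw opcode bytes (two hex digits each); the first
  -- other token is the mnemonic.
  for token in string.gmatch(rest, "%S+") do
    if not string.match(token, "^%x%x$") then
      return string.lower(token)
    end
  end
  return nil
end

-- Append " (n)" to every line whose mnemonic is in the table.
-- Blank lines are dropped by the line iterator.
local function annotate(listing)
  local out = {}
  for line in string.gmatch(listing, "[^\n]+") do
    local m = mnemonic_of(line)
    local count = m and cycles[string.upper(m)]
    if count then
      out[#out + 1] = line .. " (" .. count .. ")"
    else
      out[#out + 1] = line
    end
  end
  return table.concat(out, "\n")
end

print(annotate([[
  be: 84 e3 ldi r24, 0x34 ; 52
  c2: 90 93 85 00 sts 0x0085, r25
  c8: 08 95 ret]]))
```

Skipping two-digit hex tokens works because no AVR mnemonic is itself a two-character hex string; lines that don't match, such as labels, pass through unchanged.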