AVR opcodes analyzed

In this first section I give an overview of opcodes and mnemonics for the Atmel AVR (8 bit family) of microcontrollers. For a later project, I need to know how the instruction set is built up and encoded in executables.

To do this, I entered the mnemonics and the binary patterns that identify each of them in a text file and then re-sorted the file by opcode. This yielded a list with NOP at the top and all other instructions following it. The list shows that several mnemonics share one and the same opcode.

The list with mnemonics and opcodes

On the internet I found the document "0856E-AVR-11/05 'Atmel 8 bit AVR instruction set'" as a PDF file. From it, I made the following list:

adc		: 0 0 0 1 1 1
add		: 0 0 0 0 1 1
adiw		: 1 0 0 1 0 1 1 0
and		: 0 0 1 0 0 0
andi		: 0 1 1 1
asr		: 1 0 0 1 0 1 0           0 1 0 1
bclr		: 1 0 0 1 0 1 0 0 1       1 0 0 0
bld		: 1 1 1 1 1 0 0           0
brbc		: 1 1 1 1 0 1
brbs		: 1 1 1 1 0 0
brcc		: 1 1 1 1 0 1               0 0 0
brcs		: 1 1 1 1 0 0               0 0 0
break		: 1 0 0 1 0 1 0 1 1 0 0 1 1 0 0 0
breq		: 1 1 1 1 0 0               0 0 1
brge		: 1 1 1 1 0 1               1 0 0
brhc		: 1 1 1 1 0 1               1 0 1
brhs		: 1 1 1 1 0 0               1 0 1
brid		: 1 1 1 1 0 1               1 1 1
brie		: 1 1 1 1 0 0               1 1 1
brlo		: 1 1 1 1 0 0               0 0 0
brlt		: 1 1 1 1 0 0               1 0 0
brmi		: 1 1 1 1 0 0               0 1 0
brne		: 1 1 1 1 0 1               0 0 1
brpl		: 1 1 1 1 0 1               0 1 0
brsh		: 1 1 1 1 0 1               0 0 0
brtc		: 1 1 1 1 0 1               1 1 0
brts		: 1 1 1 1 0 0               1 1 0
brvc		: 1 1 1 1 0 1               0 1 1
brvs		: 1 1 1 1 0 0               0 1 1
bset		: 1 0 0 1 0 1 0 0 0       1 0 0 0
bst		: 1 1 1 1 1 0 1           0
call		: 1 0 0 1 0 1 0           1 1 1
cbi		: 1 0 0 1 1 0 0 0
cbr		: 0 1 1 1
clc		: 1 0 0 1 0 1 0 0 1 0 0 0 1 0 0 0
clh		: 1 0 0 1 0 1 0 0 1 1 0 1 1 0 0 0
cli		: 1 0 0 1 0 1 0 0 1 1 1 1 1 0 0 0
cln		: 1 0 0 1 0 1 0 0 1 0 1 0 1 0 0 0
clr		: 0 0 1 0 0 1
cls		: 1 0 0 1 0 1 0 0 1 1 0 0 1 0 0 0
clt		: 1 0 0 1 0 1 0 0 1 1 1 0 1 0 0 0
clv		: 1 0 0 1 0 1 0 0 1 0 1 1 1 0 0 0
clz		: 1 0 0 1 0 1 0 0 1 0 0 1 1 0 0 0
com		: 1 0 0 1 0 1 0           0 0 0 0
cp		: 0 0 0 1 0 1
cpc		: 0 0 0 0 0 1
cpi		: 0 0 1 1
cpse		: 0 0 0 1 0 0
dec		: 1 0 0 1 0 1 0           1 0 1 0
eicall		: 1 0 0 1 0 1 0 1 0 0 0 1 1 0 0 1
eijmp		: 1 0 0 1 0 1 0 0 0 0 0 1 1 0 0 1
elpm R0		: 1 0 0 1 0 1 0 1 1 1 0 1 1 0 0 0
elpm Rd, Z	: 1 0 0 1 0 0 0           0 1 1 0
elpm Rd, Z+	: 1 0 0 1 0 0 0           0 1 1 1
eor  	 	: 0 0 1 0 0 1
fmul		: 0 0 0 0 0 0 1 1 0       1
fmuls		: 0 0 0 0 0 0 1 1 1       0
fmulsu		: 0 0 0 0 0 0 1 1 1       1
icall		: 1 0 0 1 0 1 0 1 0 0 0 0 1 0 0 1
ijmp		: 1 0 0 1 0 1 0 0 0 0 0 0 1 0 0 1
in		: 1 0 1 1 0
inc		: 1 0 0 1 0 1 0           0 0 1 1
jmp		: 1 0 0 1 0 1 0           1 1 0
ld Rd, X	: 1 0 0 1 0 0 0           1 1 0 0
ld Rd, X+	: 1 0 0 1 0 0 0           1 1 0 1
ld Rd, -X	: 1 0 0 1 0 0 0           1 1 1 0
ld Rd, Y	: 1 0 0 0 0 0 0           1 0 0 0
ld Rd, Y+	: 1 0 0 1 0 0 0           1 0 0 1
ld Rd, -Y	: 1 0 0 1 0 0 0           1 0 1 0
ldd Rd, Y+q	: 1 0   0     0           1
ld Rd, Z	: 1 0 0 0 0 0 0           0 0 0 0
ld Rd, Z+	: 1 0 0 1 0 0 0           0 0 0 1
ld Rd, -Z	: 1 0 0 1 0 0 0           0 0 1 0
ldd Rd, Z+q	: 1 0   0     0           0
ldi 		: 1 1 1 0
lds		: 1 0 0 1 0 0 0           0 0 0 0
lpm R0, Z	: 1 0 0 1 0 1 0 1 1 1 0 0 1 0 0 0
lpm Rd, Z	: 1 0 0 1 0 0 0           0 1 0 0
lpm Rd, Z+	: 1 0 0 1 0 0 0           0 1 0 1
lsl 		: 0 0 0 0 1 1
lsr		: 1 0 0 1 0 1 0           0 1 1 0
mov		: 0 0 1 0 1 1
movw		: 0 0 0 0 0 0 0 1
mul		: 1 0 0 1 1 1
muls		: 0 0 0 0 0 0 1 0
mulsu		: 0 0 0 0 0 0 1 1 0       0
neg		: 1 0 0 1 0 1 0           0 0 0 1
nop		: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
or		: 0 0 1 0 1 0
ori		: 0 1 1 0
out		: 1 0 1 1 1
pop		: 1 0 0 1 0 0 0           1 1 1 1
push		: 1 0 0 1 0 0 1           1 1 1 1
rcall		: 1 1 0 1
ret		: 1 0 0 1 0 1 0 1 0 0 0 0 1 0 0 0
reti		: 1 0 0 1 0 1 0 1 0 0 0 1 1 0 0 0
rjmp		: 1 1 0 0
rol		: 0 0 0 1 1 1
ror		: 1 0 0 1 0 1 0           0 1 1 1
sbc		: 0 0 0 0 1 0
sbci		: 0 1 0 0
sbi		: 1 0 0 1 1 0 1 0
sbic		: 1 0 0 1 1 0 0 1
sbis		: 1 0 0 1 1 0 1 1
sbiw		: 1 0 0 1 0 1 1 1
sbr		: 0 1 1 0
sbrc		: 1 1 1 1 1 1 0           0
sbrs		: 1 1 1 1 1 1 1           0
sec		: 1 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0
seh		: 1 0 0 1 0 1 0 0 0 1 0 1 1 0 0 0
sei		: 1 0 0 1 0 1 0 0 0 1 1 1 1 0 0 0
sen		: 1 0 0 1 0 1 0 0 0 0 1 0 1 0 0 0
ser		: 1 1 1 0 1 1 1 1         1 1 1 1
ses		: 1 0 0 1 0 1 0 0 0 1 0 0 1 0 0 0
set		: 1 0 0 1 0 1 0 0 0 1 1 0 1 0 0 0
sev		: 1 0 0 1 0 1 0 0 0 0 1 1 1 0 0 0
sez		: 1 0 0 1 0 1 0 0 0 0 0 1 1 0 0 0
sleep		: 1 0 0 1 0 1 0 1 1 0 0 0 1 0 0 0
spm		: 1 0 0 1 0 1 0 1 1 1 1 0 1 0 0 0
st X, R		: 1 0 0 1 0 0 1           1 1 0 0
st X+, R	: 1 0 0 1 0 0 1           1 1 0 1
st -X, R	: 1 0 0 1 0 0 1           1 1 1 0
st Y, R		: 1 0 0 0 0 0 1           1 0 0 0
st Y+, R	: 1 0 0 1 0 0 1           1 0 0 1
st -Y, R	: 1 0 0 1 0 0 1           1 0 1 0
std Y+q, R	: 1 0   0     1           1
st Z, R	 	: 1 0 0 0 0 0 1           0 0 0 0
st Z+, R	: 1 0 0 1 0 0 1           0 0 0 1
st -Z, R	: 1 0 0 1 0 0 1           0 0 1 0
std Z+q, R	: 1 0   0     1           0
sts 	 	: 1 0 0 1 0 0 1           0 0 0 0
sub		: 0 0 0 1 1 0
subi		: 0 1 0 1
swap		: 1 0 0 1 0 1 0           0 0 1 0
tst		: 0 0 1 0 0 0
wdr		: 1 0 0 1 0 1 0 1 1 0 1 0 1 0 0 0
    
The 'holes' in the list represent the places where register addresses, flag numbers and other operands (constants, offsets) must be inserted. Trailing holes are simply left off. I could have entered the 'rrrrr' and 'ddddd' from the Atmel documentation, but that would have rendered the file practically unreadable.
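A pattern with holes can be handled in a program as a mask plus a value. The following is only a minimal sketch, assuming each pattern is written out as a 16 character string in which '.' marks a hole; the function names and that notation are my own illustration, not something from the Atmel document.

```python
def pattern_to_mask_value(pattern):
    """Turn a 16 character pattern such as '000011..........' into (mask, value).

    '0' and '1' are fixed bits, '.' is a hole (an operand bit).
    """
    if len(pattern) != 16:
        raise ValueError("an AVR opcode word has 16 bits")
    mask = 0
    value = 0
    for ch in pattern:
        mask <<= 1
        value <<= 1
        if ch in "01":
            mask |= 1          # this bit is fixed by the instruction
            value |= int(ch)   # and this is the value it must have
        elif ch != ".":
            raise ValueError("unexpected character in pattern: %r" % ch)
    return mask, value


def matches(word, pattern):
    """Return True if the 16 bit word fits the pattern."""
    mask, value = pattern_to_mask_value(pattern)
    return (word & mask) == value


ADD = "000011.........."   # 'add' from the list, holes written as dots
NOP = "0000000000000000"

print(matches(0x0C00, ADD))   # True: 0x0C00 is 'add r0, r0'
print(matches(0x0000, ADD))   # False: that word is a nop
```

A word matches an instruction when all fixed bits agree; the bits under the holes are then read out as operands.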

Mnemonics sorted by opcode

I re-sorted the file with the Unix 'sort' filter as follows:

     sort opcodes --key=2 -t ':' >numcodes
    
which resulted in the following list:
      
nop		: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
movw		: 0 0 0 0 0 0 0 1
muls		: 0 0 0 0 0 0 1 0
mulsu		: 0 0 0 0 0 0 1 1 0       0
fmul		: 0 0 0 0 0 0 1 1 0       1
fmuls		: 0 0 0 0 0 0 1 1 1       0
fmulsu		: 0 0 0 0 0 0 1 1 1       1
cpc		: 0 0 0 0 0 1
sbc		: 0 0 0 0 1 0
add		: 0 0 0 0 1 1
lsl 		: 0 0 0 0 1 1
cpse		: 0 0 0 1 0 0
cp		: 0 0 0 1 0 1
sub		: 0 0 0 1 1 0
adc		: 0 0 0 1 1 1
rol		: 0 0 0 1 1 1
and		: 0 0 1 0 0 0
tst		: 0 0 1 0 0 0
clr		: 0 0 1 0 0 1
eor  	 	: 0 0 1 0 0 1
or		: 0 0 1 0 1 0
mov		: 0 0 1 0 1 1
cpi		: 0 0 1 1
sbci		: 0 1 0 0
subi		: 0 1 0 1
ori		: 0 1 1 0
sbr		: 0 1 1 0
andi		: 0 1 1 1
cbr		: 0 1 1 1
ldd Rd, Z+q	: 1 0   0     0           0
ldd Rd, Y+q	: 1 0   0     0           1
std Z+q, R	: 1 0   0     1           0
std Y+q, R	: 1 0   0     1           1
ld Rd, Z	: 1 0 0 0 0 0 0           0 0 0 0
ld Rd, Y	: 1 0 0 0 0 0 0           1 0 0 0
st Z, R	 	: 1 0 0 0 0 0 1           0 0 0 0
st Y, R		: 1 0 0 0 0 0 1           1 0 0 0
lds		: 1 0 0 1 0 0 0           0 0 0 0
ld Rd, Z+	: 1 0 0 1 0 0 0           0 0 0 1
ld Rd, -Z	: 1 0 0 1 0 0 0           0 0 1 0
lpm Rd, Z	: 1 0 0 1 0 0 0           0 1 0 0
lpm Rd, Z+	: 1 0 0 1 0 0 0           0 1 0 1
elpm Rd, Z	: 1 0 0 1 0 0 0           0 1 1 0
elpm Rd, Z+	: 1 0 0 1 0 0 0           0 1 1 1
ld Rd, Y+	: 1 0 0 1 0 0 0           1 0 0 1
ld Rd, -Y	: 1 0 0 1 0 0 0           1 0 1 0
ld Rd, X	: 1 0 0 1 0 0 0           1 1 0 0
ld Rd, X+	: 1 0 0 1 0 0 0           1 1 0 1
ld Rd, -X	: 1 0 0 1 0 0 0           1 1 1 0
pop		: 1 0 0 1 0 0 0           1 1 1 1
sts 	 	: 1 0 0 1 0 0 1           0 0 0 0
st Z+, R	: 1 0 0 1 0 0 1           0 0 0 1
st -Z, R	: 1 0 0 1 0 0 1           0 0 1 0
st Y+, R	: 1 0 0 1 0 0 1           1 0 0 1
st -Y, R	: 1 0 0 1 0 0 1           1 0 1 0
st X, R		: 1 0 0 1 0 0 1           1 1 0 0
st X+, R	: 1 0 0 1 0 0 1           1 1 0 1
st -X, R	: 1 0 0 1 0 0 1           1 1 1 0
push		: 1 0 0 1 0 0 1           1 1 1 1
com		: 1 0 0 1 0 1 0           0 0 0 0
neg		: 1 0 0 1 0 1 0           0 0 0 1
swap		: 1 0 0 1 0 1 0           0 0 1 0
inc		: 1 0 0 1 0 1 0           0 0 1 1
asr		: 1 0 0 1 0 1 0           0 1 0 1
lsr		: 1 0 0 1 0 1 0           0 1 1 0
ror		: 1 0 0 1 0 1 0           0 1 1 1
dec		: 1 0 0 1 0 1 0           1 0 1 0
jmp		: 1 0 0 1 0 1 0           1 1 0
call		: 1 0 0 1 0 1 0           1 1 1
bset		: 1 0 0 1 0 1 0 0 0       1 0 0 0
sec		: 1 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0
ijmp		: 1 0 0 1 0 1 0 0 0 0 0 0 1 0 0 1
sez		: 1 0 0 1 0 1 0 0 0 0 0 1 1 0 0 0
eijmp		: 1 0 0 1 0 1 0 0 0 0 0 1 1 0 0 1
sen		: 1 0 0 1 0 1 0 0 0 0 1 0 1 0 0 0
sev		: 1 0 0 1 0 1 0 0 0 0 1 1 1 0 0 0
ses		: 1 0 0 1 0 1 0 0 0 1 0 0 1 0 0 0
seh		: 1 0 0 1 0 1 0 0 0 1 0 1 1 0 0 0
set		: 1 0 0 1 0 1 0 0 0 1 1 0 1 0 0 0
sei		: 1 0 0 1 0 1 0 0 0 1 1 1 1 0 0 0
bclr		: 1 0 0 1 0 1 0 0 1       1 0 0 0
clc		: 1 0 0 1 0 1 0 0 1 0 0 0 1 0 0 0
clz		: 1 0 0 1 0 1 0 0 1 0 0 1 1 0 0 0
cln		: 1 0 0 1 0 1 0 0 1 0 1 0 1 0 0 0
clv		: 1 0 0 1 0 1 0 0 1 0 1 1 1 0 0 0
cls		: 1 0 0 1 0 1 0 0 1 1 0 0 1 0 0 0
clh		: 1 0 0 1 0 1 0 0 1 1 0 1 1 0 0 0
clt		: 1 0 0 1 0 1 0 0 1 1 1 0 1 0 0 0
cli		: 1 0 0 1 0 1 0 0 1 1 1 1 1 0 0 0
ret		: 1 0 0 1 0 1 0 1 0 0 0 0 1 0 0 0
icall		: 1 0 0 1 0 1 0 1 0 0 0 0 1 0 0 1
reti		: 1 0 0 1 0 1 0 1 0 0 0 1 1 0 0 0
eicall		: 1 0 0 1 0 1 0 1 0 0 0 1 1 0 0 1
sleep		: 1 0 0 1 0 1 0 1 1 0 0 0 1 0 0 0
break		: 1 0 0 1 0 1 0 1 1 0 0 1 1 0 0 0
wdr		: 1 0 0 1 0 1 0 1 1 0 1 0 1 0 0 0
lpm R0, Z	: 1 0 0 1 0 1 0 1 1 1 0 0 1 0 0 0
elpm R0		: 1 0 0 1 0 1 0 1 1 1 0 1 1 0 0 0
spm		: 1 0 0 1 0 1 0 1 1 1 1 0 1 0 0 0
adiw		: 1 0 0 1 0 1 1 0
sbiw		: 1 0 0 1 0 1 1 1
cbi		: 1 0 0 1 1 0 0 0
sbic		: 1 0 0 1 1 0 0 1
sbi		: 1 0 0 1 1 0 1 0
sbis		: 1 0 0 1 1 0 1 1
mul		: 1 0 0 1 1 1
in		: 1 0 1 1 0
out		: 1 0 1 1 1
rjmp		: 1 1 0 0
rcall		: 1 1 0 1
ldi 		: 1 1 1 0
ser		: 1 1 1 0 1 1 1 1         1 1 1 1
brbs		: 1 1 1 1 0 0
brcs		: 1 1 1 1 0 0               0 0 0
brlo		: 1 1 1 1 0 0               0 0 0
breq		: 1 1 1 1 0 0               0 0 1
brmi		: 1 1 1 1 0 0               0 1 0
brvs		: 1 1 1 1 0 0               0 1 1
brlt		: 1 1 1 1 0 0               1 0 0
brhs		: 1 1 1 1 0 0               1 0 1
brts		: 1 1 1 1 0 0               1 1 0
brie		: 1 1 1 1 0 0               1 1 1
brbc		: 1 1 1 1 0 1
brcc		: 1 1 1 1 0 1               0 0 0
brsh		: 1 1 1 1 0 1               0 0 0
brne		: 1 1 1 1 0 1               0 0 1
brpl		: 1 1 1 1 0 1               0 1 0
brvc		: 1 1 1 1 0 1               0 1 1
brge		: 1 1 1 1 0 1               1 0 0
brhc		: 1 1 1 1 0 1               1 0 1
brtc		: 1 1 1 1 0 1               1 1 0
brid		: 1 1 1 1 0 1               1 1 1
bld		: 1 1 1 1 1 0 0           0
bst		: 1 1 1 1 1 0 1           0
sbrc		: 1 1 1 1 1 1 0           0
sbrs		: 1 1 1 1 1 1 1           0
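
Mnemonics with exactly the same pattern end up next to each other in the sorted list, so they can also be picked out mechanically. A minimal sketch, assuming lines in the 'mnemonic : bits' layout used above; the function name and the five sample lines are just an illustration. It only finds literal duplicates, not the special cases where a hole has been filled in.

```python
from collections import defaultdict


def find_shared_patterns(lines):
    """Group mnemonics by bit pattern and keep the patterns used more than once."""
    groups = defaultdict(list)
    for line in lines:
        name, sep, bits = line.partition(":")
        if not sep:
            continue                      # not a 'mnemonic : bits' line
        # Strip only the outer whitespace: the inner spaces are the holes.
        groups[bits.strip()].append(name.strip())
    return {bits: names for bits, names in groups.items() if len(names) > 1}


sample = [
    "add\t\t: 0 0 0 0 1 1",
    "lsl \t\t: 0 0 0 0 1 1",
    "cp\t\t: 0 0 0 1 0 1",
    "clr\t\t: 0 0 1 0 0 1",
    "eor  \t \t: 0 0 1 0 0 1",
]

for bits, names in find_shared_patterns(sample).items():
    print(bits, "->", ", ".join(names))
```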
    

Observations

Look at this:

      
    clr     : 0 0 1 0 0 1
    eor     : 0 0 1 0 0 1
   
It seems that the CLR and EOR instructions have identical opcodes. When we go check the datasheet, we see that CLR uses a 5 bit register address, but there are 10 bits to be filled in. The EOR on the other hand, needs two groups of 5 bits:
      
    eor     : 0 0 1 0 0 1 r d d d d d r r r r
   
in which 'rrrrr' and 'ddddd' are 5 bit addresses. So if you use identical numbers for 'rrrrr' and 'ddddd', you get
    eor   Rx, Rx
   
which boils down to
    clr   Rx
   
Apparently you just need to know this: without this knowledge, you cannot fill in the 10 operand bits of the CLR instruction.
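
To make the bit shuffling concrete, here is a minimal sketch that builds the EOR opcode from the pattern '0010 01rd dddd rrrr' and derives CLR from it. The function names are hypothetical helpers of mine, not part of any assembler.

```python
def encode_eor(rd, rr):
    """Build the 16 bit opcode for 'eor Rd, Rr' (pattern 0010 01rd dddd rrrr)."""
    if not (0 <= rd <= 31 and 0 <= rr <= 31):
        raise ValueError("register numbers must be 0..31")
    return (0x2400                  # fixed bits 0010 01.. .... ....
            | ((rr & 0x10) << 5)    # top bit of rr goes to bit 9
            | (rd << 4)             # ddddd goes to bits 8..4
            | (rr & 0x0F))          # low four bits of rr go to bits 3..0


def encode_clr(rd):
    """'clr Rd' is nothing more than 'eor Rd, Rd'."""
    return encode_eor(rd, rd)


print("%04X" % encode_clr(16))   # prints 2700
print("%04X" % encode_clr(1))    # prints 2411
```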

Another strange example:

    add    : 0 0 0 0 1 1
    lsl    : 0 0 0 0 1 1
   
It turns out that LSL (Logical Shift Left) is encoded as an 'ADD Rx, Rx' instruction. That is a bit of a disappointment, since shifting is usually considered more efficient than adding, although on the AVR an ADD takes a single clock cycle, so nothing is lost. So take care when building the opcode for LSL: it needs the same 5 bit address twice, once in the 'ddddd' field and once in the (scrambled) 'rrrrr' field.

    adc    : 0 0 0 1 1 1
    rol    : 0 0 0 1 1 1
   
Yet another trick. ROL shifts the register one position to the left and then copies the old Carry flag into the LSB. If you think about it carefully, that is identical to
    ADC   Rx, Rx
   
Of course with the same addressing trick. Neat.

    and    : 0 0 1 0 0 0
    tst    : 0 0 1 0 0 0
   
Anding a register with itself is identical to performing a TST. Yet another trick of the Atmel engineers. Just to spoil us. I wonder how many more monkeys they have up their sleeves.
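
These three equivalences are easy to check by brute force. The sketch below models only the 8 bit result and the carry out (the other flags follow automatically, since the opcodes are identical); the helper names are my own.

```python
def add(a, b, carry_in=0):
    """8 bit add with carry in; returns (result, carry_out)."""
    total = a + b + carry_in
    return total & 0xFF, total >> 8


def lsl(a):
    """Logical shift left; bit 7 drops into the carry."""
    return (a << 1) & 0xFF, a >> 7


def rol(a, carry_in):
    """Rotate left through carry; the old carry enters at bit 0."""
    return ((a << 1) | carry_in) & 0xFF, a >> 7


def check_aliases():
    """Compare the aliases for every 8 bit value and both carry states."""
    for a in range(256):
        if add(a, a) != lsl(a):            # lsl Rx == add Rx, Rx
            return False
        for c in (0, 1):
            if add(a, a, c) != rol(a, c):  # rol Rx == adc Rx, Rx
                return False
        if (a & a) != a:                   # tst Rx == and Rx, Rx (Rx unchanged)
            return False
    return True


print(check_aliases())   # True
```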

Two more examples: 'OR Immediate' versus 'Set Bits in Register' and 'AND Immediate' versus 'Clear Bits in Register':

    ori    : 0 1 1 0
    sbr    : 0 1 1 0
    andi   : 0 1 1 1
    cbr    : 0 1 1 1
   
The seasoned programmers among us will get another 'Aha!' experience. Of course an ORI is the same as an SBR, since you use the OR instruction (or logic gate) to force bits to '1'. And when you want to clear bits, you AND them with a complemented '1' (aka a '0'). Hence the relation between ANDI and CBR: the mask is complemented first, so 'CBR Rd, K' is assembled as 'ANDI Rd, 0xFF - K'.
From this, I must conclude that SBR and CBR are effectively standard macros of the AVR assembler. You cannot find out which one was intended by the programmer just by looking at the opcode. You can guess, but no more than that.
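
A minimal sketch of how an assembler could treat SBR and CBR as macros on top of ORI and ANDI, using the pattern 'oooo KKKK dddd KKKK' (only r16..r31 can be addressed with the 4 bit 'dddd' field). The function names are hypothetical; the complementing of the CBR mask is the one step that is easy to miss.

```python
def encode_immediate(opcode_nibble, rd, k):
    """Build an opcode of the form oooo KKKK dddd KKKK."""
    if not 16 <= rd <= 31:
        raise ValueError("immediate instructions only work on r16..r31")
    if not 0 <= k <= 0xFF:
        raise ValueError("the constant must fit in 8 bits")
    return ((opcode_nibble << 12)
            | ((k & 0xF0) << 4)     # high nibble of K goes to bits 11..8
            | ((rd - 16) << 4)      # dddd goes to bits 7..4
            | (k & 0x0F))           # low nibble of K goes to bits 3..0


def encode_ori(rd, k):
    return encode_immediate(0b0110, rd, k)


def encode_andi(rd, k):
    return encode_immediate(0b0111, rd, k)


def encode_sbr(rd, k):
    """Set the bits in mask k: plain ORI."""
    return encode_ori(rd, k)


def encode_cbr(rd, k):
    """Clear the bits in mask k: ANDI with the complemented mask."""
    return encode_andi(rd, 0xFF - k)


print("%04X" % encode_sbr(17, 0x81))   # prints 6811
print("%04X" % encode_cbr(16, 0x0F))   # prints 7F00, i.e. andi r16, 0xF0
```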

Here's a more peculiar one:

    bset   : 1 0 0 1 0 1 0 0 0       1 0 0 0
    sec    : 1 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0
   
The bset (Bit SET in status register) instruction can set bits in the flags register. It has a three bit hole in the third nibble from the left. The 8 bits of the flags register are addressed as follows:
    I    1 1 1
    T    1 1 0
    H    1 0 1
    S    1 0 0
    V    0 1 1
    N    0 1 0
    Z    0 0 1
    C    0 0 0
   
so if we insert the code for the Carry flag in the Bset mnemonic we get the opcode for the 'sec' (SEt Carry flag) instruction.

Here's more of the same. You can figure out how it's done:

    bclr   : 1 0 0 1 0 1 0 0 1       1 0 0 0
    clc    : 1 0 0 1 0 1 0 0 1 0 0 0 1 0 0 0
    clz    : 1 0 0 1 0 1 0 0 1 0 0 1 1 0 0 0
    cln    : 1 0 0 1 0 1 0 0 1 0 1 0 1 0 0 0
    clv    : 1 0 0 1 0 1 0 0 1 0 1 1 1 0 0 0
    cls    : 1 0 0 1 0 1 0 0 1 1 0 0 1 0 0 0
    clh    : 1 0 0 1 0 1 0 0 1 1 0 1 1 0 0 0
    clt    : 1 0 0 1 0 1 0 0 1 1 1 0 1 0 0 0
    cli    : 1 0 0 1 0 1 0 0 1 1 1 1 1 0 0 0
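
The whole family of 'se?' and 'cl?' mnemonics can be generated from BSET and BCLR by filling in the three bit flag number. A minimal sketch, with helper names of my own choosing:

```python
# Flags of the status register, listed from bit 0 (C) up to bit 7 (I).
FLAGS = "CZNVSHTI"


def encode_bset(s):
    """bset s: 1001 0100 0sss 1000"""
    if not 0 <= s <= 7:
        raise ValueError("flag number must be 0..7")
    return 0x9408 | (s << 4)


def encode_bclr(s):
    """bclr s: 1001 0100 1sss 1000"""
    if not 0 <= s <= 7:
        raise ValueError("flag number must be 0..7")
    return 0x9488 | (s << 4)


# Map every opcode to its short mnemonic: sec, sez, ..., sei, clc, ..., cli.
aliases = {}
for s, flag in enumerate(FLAGS):
    aliases[encode_bset(s)] = "se" + flag.lower()
    aliases[encode_bclr(s)] = "cl" + flag.lower()

print(aliases[0x9408])   # sec
print(aliases[0x94F8])   # cli
```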
   

Something similar with the 'brbs' instruction (BRanch if Bit in flags register is Set).

    brbs   : 1 1 1 1 0 0
    brcs   : 1 1 1 1 0 0               0 0 0
    brlo   : 1 1 1 1 0 0               0 0 0
    breq   : 1 1 1 1 0 0               0 0 1
    brmi   : 1 1 1 1 0 0               0 1 0
    brvs   : 1 1 1 1 0 0               0 1 1
    brlt   : 1 1 1 1 0 0               1 0 0
    brhs   : 1 1 1 1 0 0               1 0 1
    brts   : 1 1 1 1 0 0               1 1 0
    brie   : 1 1 1 1 0 0               1 1 1
   
As you can see, BRCS (BRanch if Carry flag Set) is identical to BRLO (BRanch if LOwer). That is logical, since for unsigned numbers 'A lower than B' will always lead to a set carry flag after the operation 'A - B'.

The next one is almost identical to the BRBS case. BRBC is short for 'BRanch if Bit in flags register is Cleared':

    brbc   : 1 1 1 1 0 1
    brcc   : 1 1 1 1 0 1               0 0 0
    brsh   : 1 1 1 1 0 1               0 0 0
   
Here we have the brcc (BRanch on Carryflag Cleared) versus the brsh (BRanch if Same or Higher).
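
The conditional branches can be built the same way. The sketch below assumes the layout '1111 0xkk kkkk ksss' from the Atmel document, where x selects BRBS (0) or BRBC (1), 'sss' is the flag number and 'kkkkkkk' is a signed 7 bit offset in words. The function names are hypothetical.

```python
def encode_branch(flag_bit, k, branch_if_set):
    """Build brbs (branch_if_set=True) or brbc (False) for a flag and offset."""
    if not 0 <= flag_bit <= 7:
        raise ValueError("flag number must be 0..7")
    if not -64 <= k <= 63:
        raise ValueError("offset must fit in 7 signed bits")
    base = 0xF000 if branch_if_set else 0xF400
    return base | ((k & 0x7F) << 3) | flag_bit   # k in two's complement


def encode_brcs(k):
    return encode_branch(0, k, True)    # carry flag set


def encode_brlo(k):
    return encode_branch(0, k, True)    # same flag, same test


def encode_brne(k):
    return encode_branch(1, k, False)   # zero flag cleared


print(encode_brcs(5) == encode_brlo(5))   # True
print("%04X" % encode_brne(-2))           # prints F7F1
```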

Another example:

    ldd    Rd, Z + q    : 1 0 q 0 q q 0           0 q q q
    ld     Rd, Z        : 1 0 0 0 0 0 0           0 0 0 0
   
I marked the bits of the displacement 'q' with the letter 'q' in the ldd instruction. The ld instruction is clearly just a special case of ldd, with a displacement of zero.
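
A minimal sketch of this special case, assuming the pattern '10q0 qq0d dddd 0qqq' for 'ldd Rd, Z+q' with a 6 bit displacement; the function names are mine.

```python
def encode_ldd_z(rd, q):
    """Build the opcode for 'ldd Rd, Z+q' (pattern 10q0 qq0d dddd 0qqq)."""
    if not 0 <= rd <= 31:
        raise ValueError("register number must be 0..31")
    if not 0 <= q <= 63:
        raise ValueError("displacement must be 0..63")
    return (0x8000
            | ((q & 0x20) << 8)    # q bit 5 goes to bit 13
            | ((q & 0x18) << 7)    # q bits 4..3 go to bits 11..10
            | (rd << 4)            # ddddd goes to bits 8..4
            | (q & 0x07))          # q bits 2..0 go to bits 2..0


def encode_ld_z(rd):
    """'ld Rd, Z' is 'ldd Rd, Z+0'."""
    return encode_ldd_z(rd, 0)


print("%04X" % encode_ld_z(24))       # prints 8180
print("%04X" % encode_ldd_z(24, 1))   # prints 8181
```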

And one last one, that slipped my inspection in the first place:

    ldi    : 1 1 1 0
    ser    : 1 1 1 0 1 1 1 1         1 1 1 1
   
ldi is short for LoaD Immediate and ser is SEt all bits in Register. So ser loads the value FF into the register. See something funny here? ser is a special case of the ldi instruction: it is ldi with the FF built in...
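
In code, the relation looks like this. It is a minimal sketch based on the pattern '1110 KKKK dddd KKKK', which can only address r16..r31; the function names are hypothetical.

```python
def encode_ldi(rd, k):
    """Build the opcode for 'ldi Rd, K' (pattern 1110 KKKK dddd KKKK)."""
    if not 16 <= rd <= 31:
        raise ValueError("ldi only works on r16..r31")
    if not 0 <= k <= 0xFF:
        raise ValueError("the constant must fit in 8 bits")
    return (0xE000
            | ((k & 0xF0) << 4)    # high nibble of K goes to bits 11..8
            | ((rd - 16) << 4)     # dddd goes to bits 7..4
            | (k & 0x0F))          # low nibble of K goes to bits 3..0


def encode_ser(rd):
    """'ser Rd' is 'ldi Rd, 0xFF'."""
    return encode_ldi(rd, 0xFF)


print("%04X" % encode_ser(16))   # prints EF0F
```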

Conclusions

There are quite a few duplicated instructions in the AVR instruction set. It can be nice to have two ways of writing the same opcode, but it may also be confusing: without a breakdown of the instruction set, it is far from obvious that two (sometimes rather) different mnemonics do exactly the same thing.