AVR opcodes analyzed
In this first section I give an overview of the opcodes and mnemonics of the Atmel AVR family of 8-bit microcontrollers. For a later project, I need to know how the instruction set is built up and how it is encoded in executables.
To do this, I entered the mnemonics and the binary patterns that identify each of them in a text file and then re-sorted the file by opcode. This yielded a list with NOP at the top and all other instructions following it. The list shows that several mnemonics share one and the same opcode.
The list with mnemonics and opcodes
On the internet I found the document "0856E-AVR-11/05 'Atmel 8 bit AVR instruction set'" as a PDF file. From it, I made the following list:
adc : 0 0 0 1 1 1
add : 0 0 0 0 1 1
adiw : 1 0 0 1 0 1 1 0
and : 0 0 1 0 0 0
andi : 0 1 1 1
asr : 1 0 0 1 0 1 0 0 1 0 1
bclr : 1 0 0 1 0 1 0 0 1 1 0 0 0
bld : 1 1 1 1 1 0 0 0
brbc : 1 1 1 1 0 1
brbs : 1 1 1 1 0 0
brcc : 1 1 1 1 0 1 0 0 0
brcs : 1 1 1 1 0 0 0 0 0
break : 1 0 0 1 0 1 0 1 1 0 0 1 1 0 0 0
breq : 1 1 1 1 0 0 0 0 1
brge : 1 1 1 1 0 1 1 0 0
brhc : 1 1 1 1 0 1 1 0 1
brhs : 1 1 1 1 0 0 1 0 1
brid : 1 1 1 1 0 1 1 1 1
brie : 1 1 1 1 0 0 1 1 1
brlo : 1 1 1 1 0 0 0 0 0
brlt : 1 1 1 1 0 0 1 0 0
brmi : 1 1 1 1 0 0 0 1 0
brne : 1 1 1 1 0 1 0 0 1
brpl : 1 1 1 1 0 1 0 1 0
brsh : 1 1 1 1 0 1 0 0 0
brtc : 1 1 1 1 0 1 1 1 0
brts : 1 1 1 1 0 0 1 1 0
brvc : 1 1 1 1 0 1 0 1 1
brvs : 1 1 1 1 0 0 0 1 1
bset : 1 0 0 1 0 1 0 0 0 1 0 0 0
bst : 1 1 1 1 1 0 1 0
call : 1 0 0 1 0 1 0 1 1 1
cbi : 1 0 0 1 1 0 0 0
cbr : 0 1 1 1
clc : 1 0 0 1 0 1 0 0 1 0 0 0 1 0 0 0
clh : 1 0 0 1 0 1 0 0 1 1 0 1 1 0 0 0
cli : 1 0 0 1 0 1 0 0 1 1 1 1 1 0 0 0
cln : 1 0 0 1 0 1 0 0 1 0 1 0 1 0 0 0
clr : 0 0 1 0 0 1
cls : 1 0 0 1 0 1 0 0 1 1 0 0 1 0 0 0
clt : 1 0 0 1 0 1 0 0 1 1 1 0 1 0 0 0
clv : 1 0 0 1 0 1 0 0 1 0 1 1 1 0 0 0
clz : 1 0 0 1 0 1 0 0 1 0 0 1 1 0 0 0
com : 1 0 0 1 0 1 0 0 0 0 0
cp : 0 0 0 1 0 1
cpc : 0 0 0 0 0 1
cpi : 0 0 1 1
cpse : 0 0 0 1 0 0
dec : 1 0 0 1 0 1 0 1 0 1 0
eicall : 1 0 0 1 0 1 0 1 0 0 0 1 1 0 0 1
eijmp : 1 0 0 1 0 1 0 0 0 0 0 1 1 0 0 1
elpm R0 : 1 0 0 1 0 1 0 1 1 1 0 1 1 0 0 0
elpm Rd, Z : 1 0 0 1 0 0 0 0 1 1 0
elpm Rd, Z+ : 1 0 0 1 0 0 0 0 1 1 1
eor : 0 0 1 0 0 1
fmul : 0 0 0 0 0 0 1 1 0 1
fmuls : 0 0 0 0 0 0 1 1 1 0
fmulsu : 0 0 0 0 0 0 1 1 1 1
icall : 1 0 0 1 0 1 0 1 0 0 0 0 1 0 0 1
ijmp : 1 0 0 1 0 1 0 0 0 0 0 0 1 0 0 1
in : 1 0 1 1 0
inc : 1 0 0 1 0 1 0 0 0 1 1
jmp : 1 0 0 1 0 1 0 1 1 0
ld Rd, X : 1 0 0 1 0 0 0 1 1 0 0
ld Rd, X+ : 1 0 0 1 0 0 0 1 1 0 1
ld Rd, -X : 1 0 0 1 0 0 0 1 1 1 0
ld Rd, Y : 1 0 0 0 0 0 0 1 0 0 0
ld Rd, Y+ : 1 0 0 1 0 0 0 1 0 0 1
ld Rd, -Y : 1 0 0 1 0 0 0 1 0 1 0
ldd Rd, Y+q : 1 0 0 0 1
ld Rd, Z : 1 0 0 0 0 0 0 0 0 0 0
ld Rd, Z+ : 1 0 0 1 0 0 0 0 0 0 1
ld Rd, -Z : 1 0 0 1 0 0 0 0 0 1 0
ldd Rd, Z+q : 1 0 0 0 0
ldi : 1 1 1 0
lds : 1 0 0 1 0 0 0 0 0 0 0
lpm R0, Z : 1 0 0 1 0 1 0 1 1 1 0 0 1 0 0 0
lpm Rd, Z : 1 0 0 1 0 0 0 0 1 0 0
lpm Rd, Z+ : 1 0 0 1 0 0 0 0 1 0 1
lsl : 0 0 0 0 1 1
lsr : 1 0 0 1 0 1 0 0 1 1 0
mov : 0 0 1 0 1 1
movw : 0 0 0 0 0 0 0 1
mul : 1 0 0 1 1 1
muls : 0 0 0 0 0 0 1 0
mulsu : 0 0 0 0 0 0 1 1 0 0
neg : 1 0 0 1 0 1 0 0 0 0 1
nop : 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
or : 0 0 1 0 1 0
ori : 0 1 1 0
out : 1 0 1 1 1
pop : 1 0 0 1 0 0 0 1 1 1 1
push : 1 0 0 1 0 0 1 1 1 1 1
rcall : 1 1 0 1
ret : 1 0 0 1 0 1 0 1 0 0 0 0 1 0 0 0
reti : 1 0 0 1 0 1 0 1 0 0 0 1 1 0 0 0
rjmp : 1 1 0 0
rol : 0 0 0 1 1 1
ror : 1 0 0 1 0 1 0 0 1 1 1
sbc : 0 0 0 0 1 0
sbci : 0 1 0 0
sbi : 1 0 0 1 1 0 1 0
sbic : 1 0 0 1 1 0 0 1
sbis : 1 0 0 1 1 0 1 1
sbiw : 1 0 0 1 0 1 1 1
sbr : 0 1 1 0
sbrc : 1 1 1 1 1 1 0 0
sbrs : 1 1 1 1 1 1 1 0
sec : 1 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0
seh : 1 0 0 1 0 1 0 0 0 1 0 1 1 0 0 0
sei : 1 0 0 1 0 1 0 0 0 1 1 1 1 0 0 0
sen : 1 0 0 1 0 1 0 0 0 0 1 0 1 0 0 0
ser : 1 1 1 0 1 1 1 1 1 1 1 1
ses : 1 0 0 1 0 1 0 0 0 1 0 0 1 0 0 0
set : 1 0 0 1 0 1 0 0 0 1 1 0 1 0 0 0
sev : 1 0 0 1 0 1 0 0 0 0 1 1 1 0 0 0
sez : 1 0 0 1 0 1 0 0 0 0 0 1 1 0 0 0
sleep : 1 0 0 1 0 1 0 1 1 0 0 0 1 0 0 0
spm : 1 0 0 1 0 1 0 1 1 1 1 0 1 0 0 0
st X, R : 1 0 0 1 0 0 1 1 1 0 0
st X+, R : 1 0 0 1 0 0 1 1 1 0 1
st -X, R : 1 0 0 1 0 0 1 1 1 1 0
st Y, R : 1 0 0 0 0 0 1 1 0 0 0
st Y+, R : 1 0 0 1 0 0 1 1 0 0 1
st -Y, R : 1 0 0 1 0 0 1 1 0 1 0
std Y+q, R : 1 0 0 1 1
st Z, R : 1 0 0 0 0 0 1 0 0 0 0
st Z+, R : 1 0 0 1 0 0 1 0 0 0 1
st -Z, R : 1 0 0 1 0 0 1 0 0 1 0
std Z+q, R : 1 0 0 1 0
sts : 1 0 0 1 0 0 1 0 0 0 0
sub : 0 0 0 1 1 0
subi : 0 1 0 1
swap : 1 0 0 1 0 1 0 0 0 1 0
tst : 0 0 1 0 0 0
wdr : 1 0 0 1 0 1 0 1 1 0 1 0 1 0 0 0
The patterns list only the fixed bits of each opcode. The 'holes', the places where the addresses of registers
and flags must be inserted, are left out, which is why most patterns are shorter than 16 bits. I could have
entered the 'rrrrr' and 'ddddd' from the Atmel documentation, but that would have rendered the file
practically unreadable.
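One way to keep the holes without hurting readability is to write them as placeholder letters and derive a bit mask from the pattern. The Python sketch below is not part of the original text file: compile_pattern and matches are hypothetical helper names, and the asr pattern is written out with the 'ddddd' from the Atmel documentation.

```python
def compile_pattern(pattern):
    """Turn a 16-character pattern into a (mask, value) pair.

    '0' and '1' are fixed bits; any other character marks a hole.
    """
    if len(pattern) != 16:
        raise ValueError("an AVR opcode word has 16 bits")
    mask = value = 0
    for ch in pattern:
        mask <<= 1
        value <<= 1
        if ch in "01":
            mask |= 1          # this bit is fixed
            value |= int(ch)   # and this is what it must be
    return mask, value


def matches(word, pattern):
    """True if the 16-bit word fits the pattern, whatever is in the holes."""
    mask, value = compile_pattern(pattern)
    return (word & mask) == value


ASR = "1001010ddddd0101"            # asr Rd, holes marked with 'd'
print(matches(0x9505, ASR))         # 0x9505 is asr r16
print(matches(0x9506, ASR))         # 0x9506 ends in 0110, which is lsr
```

The mask marks the fixed bits and the value holds what they must be, so a disassembler only has to AND and compare.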
Mnemonics sorted by opcode
I re-sorted the file with the Unix 'sort' filter as follows:
sort opcodes --key=2 -t ':' >numcodes
which resulted in the following list:
nop : 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
movw : 0 0 0 0 0 0 0 1
muls : 0 0 0 0 0 0 1 0
mulsu : 0 0 0 0 0 0 1 1 0 0
fmul : 0 0 0 0 0 0 1 1 0 1
fmuls : 0 0 0 0 0 0 1 1 1 0
fmulsu : 0 0 0 0 0 0 1 1 1 1
cpc : 0 0 0 0 0 1
sbc : 0 0 0 0 1 0
add : 0 0 0 0 1 1
lsl : 0 0 0 0 1 1
cpse : 0 0 0 1 0 0
cp : 0 0 0 1 0 1
sub : 0 0 0 1 1 0
adc : 0 0 0 1 1 1
rol : 0 0 0 1 1 1
and : 0 0 1 0 0 0
tst : 0 0 1 0 0 0
clr : 0 0 1 0 0 1
eor : 0 0 1 0 0 1
or : 0 0 1 0 1 0
mov : 0 0 1 0 1 1
cpi : 0 0 1 1
sbci : 0 1 0 0
subi : 0 1 0 1
ori : 0 1 1 0
sbr : 0 1 1 0
andi : 0 1 1 1
cbr : 0 1 1 1
ldd Rd, Z+q : 1 0 0 0 0
ldd Rd, Y+q : 1 0 0 0 1
std Z+q, R : 1 0 0 1 0
std Y+q, R : 1 0 0 1 1
ld Rd, Z : 1 0 0 0 0 0 0 0 0 0 0
ld Rd, Y : 1 0 0 0 0 0 0 1 0 0 0
st Z, R : 1 0 0 0 0 0 1 0 0 0 0
st Y, R : 1 0 0 0 0 0 1 1 0 0 0
lds : 1 0 0 1 0 0 0 0 0 0 0
ld Rd, Z+ : 1 0 0 1 0 0 0 0 0 0 1
ld Rd, -Z : 1 0 0 1 0 0 0 0 0 1 0
lpm Rd, Z : 1 0 0 1 0 0 0 0 1 0 0
lpm Rd, Z+ : 1 0 0 1 0 0 0 0 1 0 1
elpm Rd, Z : 1 0 0 1 0 0 0 0 1 1 0
elpm Rd, Z+ : 1 0 0 1 0 0 0 0 1 1 1
ld Rd, Y+ : 1 0 0 1 0 0 0 1 0 0 1
ld Rd, -Y : 1 0 0 1 0 0 0 1 0 1 0
ld Rd, X : 1 0 0 1 0 0 0 1 1 0 0
ld Rd, X+ : 1 0 0 1 0 0 0 1 1 0 1
ld Rd, -X : 1 0 0 1 0 0 0 1 1 1 0
pop : 1 0 0 1 0 0 0 1 1 1 1
sts : 1 0 0 1 0 0 1 0 0 0 0
st Z+, R : 1 0 0 1 0 0 1 0 0 0 1
st -Z, R : 1 0 0 1 0 0 1 0 0 1 0
st Y+, R : 1 0 0 1 0 0 1 1 0 0 1
st -Y, R : 1 0 0 1 0 0 1 1 0 1 0
st X, R : 1 0 0 1 0 0 1 1 1 0 0
st X+, R : 1 0 0 1 0 0 1 1 1 0 1
st -X, R : 1 0 0 1 0 0 1 1 1 1 0
push : 1 0 0 1 0 0 1 1 1 1 1
com : 1 0 0 1 0 1 0 0 0 0 0
neg : 1 0 0 1 0 1 0 0 0 0 1
swap : 1 0 0 1 0 1 0 0 0 1 0
inc : 1 0 0 1 0 1 0 0 0 1 1
asr : 1 0 0 1 0 1 0 0 1 0 1
lsr : 1 0 0 1 0 1 0 0 1 1 0
ror : 1 0 0 1 0 1 0 0 1 1 1
dec : 1 0 0 1 0 1 0 1 0 1 0
jmp : 1 0 0 1 0 1 0 1 1 0
call : 1 0 0 1 0 1 0 1 1 1
bset : 1 0 0 1 0 1 0 0 0 1 0 0 0
sec : 1 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0
ijmp : 1 0 0 1 0 1 0 0 0 0 0 0 1 0 0 1
sez : 1 0 0 1 0 1 0 0 0 0 0 1 1 0 0 0
eijmp : 1 0 0 1 0 1 0 0 0 0 0 1 1 0 0 1
sen : 1 0 0 1 0 1 0 0 0 0 1 0 1 0 0 0
sev : 1 0 0 1 0 1 0 0 0 0 1 1 1 0 0 0
ses : 1 0 0 1 0 1 0 0 0 1 0 0 1 0 0 0
seh : 1 0 0 1 0 1 0 0 0 1 0 1 1 0 0 0
set : 1 0 0 1 0 1 0 0 0 1 1 0 1 0 0 0
sei : 1 0 0 1 0 1 0 0 0 1 1 1 1 0 0 0
bclr : 1 0 0 1 0 1 0 0 1 1 0 0 0
clc : 1 0 0 1 0 1 0 0 1 0 0 0 1 0 0 0
clz : 1 0 0 1 0 1 0 0 1 0 0 1 1 0 0 0
cln : 1 0 0 1 0 1 0 0 1 0 1 0 1 0 0 0
clv : 1 0 0 1 0 1 0 0 1 0 1 1 1 0 0 0
cls : 1 0 0 1 0 1 0 0 1 1 0 0 1 0 0 0
clh : 1 0 0 1 0 1 0 0 1 1 0 1 1 0 0 0
clt : 1 0 0 1 0 1 0 0 1 1 1 0 1 0 0 0
cli : 1 0 0 1 0 1 0 0 1 1 1 1 1 0 0 0
ret : 1 0 0 1 0 1 0 1 0 0 0 0 1 0 0 0
icall : 1 0 0 1 0 1 0 1 0 0 0 0 1 0 0 1
reti : 1 0 0 1 0 1 0 1 0 0 0 1 1 0 0 0
eicall : 1 0 0 1 0 1 0 1 0 0 0 1 1 0 0 1
sleep : 1 0 0 1 0 1 0 1 1 0 0 0 1 0 0 0
break : 1 0 0 1 0 1 0 1 1 0 0 1 1 0 0 0
wdr : 1 0 0 1 0 1 0 1 1 0 1 0 1 0 0 0
lpm R0, Z : 1 0 0 1 0 1 0 1 1 1 0 0 1 0 0 0
elpm R0 : 1 0 0 1 0 1 0 1 1 1 0 1 1 0 0 0
spm : 1 0 0 1 0 1 0 1 1 1 1 0 1 0 0 0
adiw : 1 0 0 1 0 1 1 0
sbiw : 1 0 0 1 0 1 1 1
cbi : 1 0 0 1 1 0 0 0
sbic : 1 0 0 1 1 0 0 1
sbi : 1 0 0 1 1 0 1 0
sbis : 1 0 0 1 1 0 1 1
mul : 1 0 0 1 1 1
in : 1 0 1 1 0
out : 1 0 1 1 1
rjmp : 1 1 0 0
rcall : 1 1 0 1
ldi : 1 1 1 0
ser : 1 1 1 0 1 1 1 1 1 1 1 1
brbs : 1 1 1 1 0 0
brcs : 1 1 1 1 0 0 0 0 0
brlo : 1 1 1 1 0 0 0 0 0
breq : 1 1 1 1 0 0 0 0 1
brmi : 1 1 1 1 0 0 0 1 0
brvs : 1 1 1 1 0 0 0 1 1
brlt : 1 1 1 1 0 0 1 0 0
brhs : 1 1 1 1 0 0 1 0 1
brts : 1 1 1 1 0 0 1 1 0
brie : 1 1 1 1 0 0 1 1 1
brbc : 1 1 1 1 0 1
brcc : 1 1 1 1 0 1 0 0 0
brsh : 1 1 1 1 0 1 0 0 0
brne : 1 1 1 1 0 1 0 0 1
brpl : 1 1 1 1 0 1 0 1 0
brvc : 1 1 1 1 0 1 0 1 1
brge : 1 1 1 1 0 1 1 0 0
brhc : 1 1 1 1 0 1 1 0 1
brtc : 1 1 1 1 0 1 1 1 0
brid : 1 1 1 1 0 1 1 1 1
bld : 1 1 1 1 1 0 0 0
bst : 1 1 1 1 1 0 1 0
sbrc : 1 1 1 1 1 1 0 0
sbrs : 1 1 1 1 1 1 1 0
Observations
Look at this:
clr : 0 0 1 0 0 1
eor : 0 0 1 0 0 1
It seems that the CLR and EOR instructions have identical opcodes. When we check the datasheet, we see that
CLR takes a single 5-bit register address, yet there are 10 bits to be filled in. EOR, on the other hand,
needs two groups of 5 bits:
eor : 0 0 1 0 0 1 r d d d d d r r r r
in which 'rrrrr' and 'ddddd' are 5-bit addresses. So if you use identical numbers for 'rrrrr' and 'ddddd', you
get
eor Rx, Rx
which boils down to
clr Rx
Apparently you just need to know this: the 10-bit register field of CLR is the same 5-bit address entered
twice, in the scrambled order of the EOR pattern.
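The scrambled field is easier to see in code. This is a minimal Python sketch, assuming the 'r d d d d d r r r r' layout shown above; encode_eor and encode_clr are hypothetical names chosen for illustration.

```python
def encode_eor(d, r):
    """Build the 16-bit word for 'eor Rd, Rr': 0010 01rd dddd rrrr."""
    if not (0 <= d <= 31 and 0 <= r <= 31):
        raise ValueError("register numbers run from 0 to 31")
    return (0x2400                  # fixed bits 0010 01.. .... ....
            | ((r & 0x10) << 5)     # top bit of r goes to bit 9
            | (d << 4)              # all five bits of d go to bits 8..4
            | (r & 0x0F))           # low four bits of r go to bits 3..0


def encode_clr(d):
    """'clr Rd' is 'eor Rd, Rd': the same address fills both fields."""
    return encode_eor(d, d)


print(hex(encode_clr(16)))          # clr r16
```

For r16 this gives 0x2700, the same word as eor r16, r16.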
Another strange example:
add : 0 0 0 0 1 1
lsl : 0 0 0 0 1 1
It turns out that LSL (Logical Shift Left) is encoded as an 'ADD Rx, Rx' instruction. That is a bit of
a disappointment, since shifting is usually considered more efficient than adding, although on the AVR both
take a single clock cycle. So take care when building the opcode for LSL, since it needs the same 5-bit
address twice, in scrambled order.
adc : 0 0 0 1 1 1
rol : 0 0 0 1 1 1
Yet another trick. ROL shifts the number left and then copies the Carry flag into the LSB, which, if
you think about it carefully, is identical to
ADC Rx, Rx
Of course with the same addressing trick. Neat.
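That LSL and ROL really behave like ADD Rx, Rx and ADC Rx, Rx can be checked by brute force over all 8-bit values. The Python sketch below models only the result byte and the Carry flag; the other status flags are left out, and the function names are made up for this check.

```python
def add8(a, b, carry_in=0):
    """8-bit add with carry in; returns (result, carry_out)."""
    total = a + b + carry_in
    return total & 0xFF, total >> 8


def lsl8(a):
    """Shift left, bit 7 falls into the carry, bit 0 becomes 0."""
    return (a << 1) & 0xFF, a >> 7


def rol8(a, carry_in):
    """Rotate left through carry: old carry enters at bit 0."""
    return ((a << 1) | carry_in) & 0xFF, a >> 7


for a in range(256):
    assert add8(a, a) == lsl8(a)                 # add Rx, Rx == lsl Rx
    for c in (0, 1):
        assert add8(a, a, c) == rol8(a, c)       # adc Rx, Rx == rol Rx

print("lsl/rol agree with add/adc for all 256 values")
```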
and : 0 0 1 0 0 0
tst : 0 0 1 0 0 0
ANDing a register with itself leaves its value unchanged and only updates the flags, which is exactly what
TST does. Yet another trick of the Atmel engineers. Just to spoil us. I wonder how many more monkeys they
have up their sleeves.
Two more examples: 'OR Immediate' versus 'Set Bits in Register' and 'AND Immediate' versus 'Clear Bits in Register':
ori : 0 1 1 0
sbr : 0 1 1 0
andi : 0 1 1 1
cbr : 0 1 1 1
The seasoned programmers among us will have another 'Aha!' experience. Of course an ORI is the same as an SBR,
since you use the OR instruction (or logic gate) to force bits to '1'. And when you want to clear bits, you
AND them with a '0'. Hence the relation between ANDI and CBR. One difference remains: CBR takes the bits to
clear as its operand, so the assembler complements the mask, and 'cbr Rd, K' is encoded as 'andi Rd, 0xFF - K'.
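Both pairs can be sketched at the opcode level. The Python sketch below assumes the immediate layout 'xxxx KKKK dddd KKKK' from the Atmel documentation (registers r16..r31 only) and that cbr complements its mask before it is encoded as andi; all function names are hypothetical.

```python
def encode_imm(base, d, k):
    """Encode 'xxxx KKKK dddd KKKK' for r16..r31 and an 8-bit constant K."""
    if not 16 <= d <= 31:
        raise ValueError("immediate instructions only reach r16..r31")
    if not 0 <= k <= 0xFF:
        raise ValueError("K must fit in 8 bits")
    return base | ((k & 0xF0) << 4) | ((d - 16) << 4) | (k & 0x0F)


def encode_ori(d, k):
    return encode_imm(0x6000, d, k)


def encode_andi(d, k):
    return encode_imm(0x7000, d, k)


def encode_sbr(d, k):
    """Set bits: identical to ori with the same mask."""
    return encode_ori(d, k)


def encode_cbr(d, k):
    """Clear bits: andi with the complemented mask."""
    return encode_andi(d, 0xFF - k)


print(hex(encode_sbr(16, 0x30)), hex(encode_ori(16, 0x30)))
print(hex(encode_cbr(16, 0x0F)), hex(encode_andi(16, 0xF0)))
```

So sbr and ori take the same operand, while cbr r16, 0x0F and andi r16, 0x0F produce different words.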
Here's a more peculiar one:
bset : 1 0 0 1 0 1 0 0 0 1 0 0 0
sec : 1 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0
The bset (Bit SET in status register) instruction can set bits in the flags register. It has a three-bit hole
in the third nibble from the left. The 8 bits of the flags register are addressed as follows:
I 1 1 1
T 1 1 0
H 1 0 1
S 1 0 0
V 0 1 1
N 0 1 0
Z 0 0 1
C 0 0 0
So if we insert the code for the Carry flag into the hole of the bset opcode, we get the opcode for the 'sec'
(SEt Carry flag) instruction.
Here's more of the same. You can figure out how it's done:
bclr : 1 0 0 1 0 1 0 0 1 1 0 0 0
clc : 1 0 0 1 0 1 0 0 1 0 0 0 1 0 0 0
clz : 1 0 0 1 0 1 0 0 1 0 0 1 1 0 0 0
cln : 1 0 0 1 0 1 0 0 1 0 1 0 1 0 0 0
clv : 1 0 0 1 0 1 0 0 1 0 1 1 1 0 0 0
cls : 1 0 0 1 0 1 0 0 1 1 0 0 1 0 0 0
clh : 1 0 0 1 0 1 0 0 1 1 0 1 1 0 0 0
clt : 1 0 0 1 0 1 0 0 1 1 1 0 1 0 0 0
cli : 1 0 0 1 0 1 0 0 1 1 1 1 1 0 0 0
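For readers who would rather check than puzzle: the flag table and the two patterns above can be combined in a few lines. This Python sketch assumes the three-bit hole sits in bits 6..4 of the word, as the bset and bclr patterns suggest; encode_bset and encode_bclr are hypothetical names.

```python
# Flag numbers as listed in the table above.
FLAG_BITS = {"C": 0, "Z": 1, "N": 2, "V": 3, "S": 4, "H": 5, "T": 6, "I": 7}


def encode_bset(s):
    """bset s: 1001 0100 0sss 1000"""
    return 0x9408 | (s << 4)


def encode_bclr(s):
    """bclr s: 1001 0100 1sss 1000"""
    return 0x9488 | (s << 4)


for flag, s in FLAG_BITS.items():
    set_name = "se" + flag.lower()      # sec, sez, sen, ...
    clr_name = "cl" + flag.lower()      # clc, clz, cln, ...
    print(f"{set_name}: {encode_bset(s):016b}   {clr_name}: {encode_bclr(s):016b}")
```

Each printed word matches the fixed 16-bit pattern of the corresponding se* or cl* mnemonic in the list.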
Something similar with the 'brbs' instruction (BRanch if Bit in flags register is Set).
brbs : 1 1 1 1 0 0
brcs : 1 1 1 1 0 0 0 0 0
brlo : 1 1 1 1 0 0 0 0 0
breq : 1 1 1 1 0 0 0 0 1
brmi : 1 1 1 1 0 0 0 1 0
brvs : 1 1 1 1 0 0 0 1 1
brlt : 1 1 1 1 0 0 1 0 0
brhs : 1 1 1 1 0 0 1 0 1
brts : 1 1 1 1 0 0 1 1 0
brie : 1 1 1 1 0 0 1 1 1
As you can see, BRCS (BRanch if Carry Set) is identical to BRLO (BRanch if LOwer). Which is logical, since
for unsigned numbers 'A lower than B' always leads to a set carry flag after the operation 'A - B'.
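The carry claim can be verified exhaustively. The Python sketch below computes the Carry flag of an 8-bit subtraction from the top bits of the operands and the result, following the boolean formula given for SUB and CP in the Atmel documentation, and compares it with an unsigned 'lower than' test; sub8_carry is a hypothetical name.

```python
def sub8_carry(rd, rr):
    """Carry flag after 'rd - rr' on 8-bit values.

    C = (not Rd7 and Rr7) or (Rr7 and R7) or (R7 and not Rd7)
    """
    result = (rd - rr) & 0xFF
    rd7, rr7, r7 = rd >> 7, rr >> 7, result >> 7
    not_rd7 = 1 - rd7
    return (not_rd7 & rr7) | (rr7 & r7) | (r7 & not_rd7)


# Collect every pair where the carry disagrees with unsigned a < b.
mismatches = [(a, b)
              for a in range(256)
              for b in range(256)
              if (sub8_carry(a, b) == 1) != (a < b)]
print(len(mismatches))
```

The check treats both operands as unsigned; for signed comparisons the AVR offers brlt and brge instead.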
The next one is almost identical to the BRBS case. BRBC is short for 'BRanch if Bit in flags register is Cleared':
brbc : 1 1 1 1 0 1
brcc : 1 1 1 1 0 1 0 0 0
brsh : 1 1 1 1 0 1 0 0 0
Here we have the brcc (BRanch on Carryflag Cleared) versus the brsh (BRanch if Same or Higher).
Another example:
ldd Rd, Z+q : 1 0 q 0 q q 0 0 q q q
ld Rd, Z : 1 0 0 0 0 0 0 0 0 0 0
I marked the bits of the displacement 'q' with the letter 'q' in the ldd pattern; the register hole is left
out as before. Clearly, 'ld Rd, Z' is just the special case of ldd with a displacement of zero.
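How the six displacement bits are spread over the word is easier to follow in code. This Python sketch assumes the full layout '10q0 qq0d dddd 0qqq' for ldd Rd, Z+q from the Atmel documentation; encode_ldd_z is a hypothetical name.

```python
def encode_ldd_z(d, q):
    """Build the word for 'ldd Rd, Z+q': 10q0 qq0d dddd 0qqq."""
    if not 0 <= d <= 31:
        raise ValueError("register numbers run from 0 to 31")
    if not 0 <= q <= 63:
        raise ValueError("the displacement has 6 bits")
    return (0x8000
            | ((q & 0x20) << 8)     # q bit 5     -> bit 13
            | ((q & 0x18) << 7)     # q bits 4..3 -> bits 11..10
            | (d << 4)              # register    -> bits 8..4
            | (q & 0x07))           # q bits 2..0 -> bits 2..0


print(hex(encode_ldd_z(16, 0)))     # same word as ld r16, Z
print(hex(encode_ldd_z(24, 1)))     # ldd r24, Z+1
```

With q = 0 every displacement bit is zero and only the fixed bits of 'ld Rd, Z' remain.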
And one last one, which escaped my notice at first:
ldi : 1 1 1 0
ser : 1 1 1 0 1 1 1 1 1 1 1 1
ldi is short for LoaD Immediate and ser is SEt all bits in Register. So, ser loads the value FF into the
register. See something funny here? The ser is a special case of the ldi instruction. Ser is ldi with the FF
built-in...
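The same check for ser, as a minimal Python sketch. It assumes the ldi layout '1110 KKKK dddd KKKK' from the Atmel documentation, in which 'dddd' selects one of the registers r16..r31; encode_ldi and encode_ser are hypothetical names.

```python
def encode_ldi(d, k):
    """Build the word for 'ldi Rd, K': 1110 KKKK dddd KKKK (r16..r31)."""
    if not 16 <= d <= 31:
        raise ValueError("ldi only reaches r16..r31")
    if not 0 <= k <= 0xFF:
        raise ValueError("K must fit in 8 bits")
    return 0xE000 | ((k & 0xF0) << 4) | ((d - 16) << 4) | (k & 0x0F)


def encode_ser(d):
    """'ser Rd' is 'ldi Rd, 0xFF': both K nibbles are all ones."""
    return encode_ldi(d, 0xFF)


print(f"{encode_ser(16):016b}")     # 1110 1111 0000 1111
```

The two K nibbles account for the eight trailing ones in the ser pattern, and the four-bit register hole sits between them.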
Conclusions
There are quite a few duplicated instructions in the AVR instruction set. It can be nice to have two ways of writing the same opcode, but it may also be confusing: without a breakdown of the instruction set, you would not expect two (sometimes rather) different mnemonics to do exactly the same thing.