In this article I describe the assembly operations that I’ve encountered myself and wasn’t too lazy to write an explanation for. However, I won’t be paying much attention to operations that I consider straightforward, like ADD. I’m going to put a flag on each operation indicating the corresponding arch: arm or x86 (I’m just learning ARM myself for iOS analysis).
Most instructions have the following anatomy: instruction <destination operand>, <source operand>. Some operations look like this: instruction <source operand>, when the <destination operand> is always the same (default) register. An example is MUL: when MULing, you always multiply eax by some value.
Moving Data
MOV
x86
The god of assembly operations. Its main purpose is pretty obvious: to move stuff from one place to another (like the Tower of Hanoi). Let’s suppose that initially we have EAX = 0x89, EBX = 0x11 and ESI = 0x4037C0.
Instruction | Description |
---|---|
mov eax, ebx | Copy data from EBX into EAX. Now EAX = EBX = 0x11. |
mov ebx, 0x4037C4 | Put the value 0x4037C4 into EBX. Now EAX = 0x11 and EBX = 0x4037C4. |
mov eax, [ebx] | Copy data from the address stored in EBX into EAX. Since EBX = 0x4037C4, the CPU goes to address 0x4037C4, reads the data stored there (say, 0x77) and puts it into EAX. Now EBX = 0x4037C4 and EAX = 0x77. |
mov eax, [0x4037C4] | Since EBX = 0x4037C4, this operation is equivalent to the previous one. |
mov ebx, [esi+eax*4] | First, the CPU calculates the address: ESI + EAX*4 = 0x4037C0 + 0x77*4 = 0x4037C0 + 0x1DC = 0x40399C. Then the CPU goes to 0x40399C and copies the value at this address into EBX. Say we have 0x33 there. So EBX is now 0x33. |
So, to conclude, you can MOV a value directly, a value at an address, a value in another register, an address itself (which is technically also a value) or a value at an address computed by an expression (the last example). Whenever there is an address “in assembly’s mind”, you’ll see square brackets []. Whenever it is a value, there are no brackets. It’s something that in the higher levels of abstraction is usually called a reference type ([address]), when a reference is copied, and a value type (value), when the value is copied. In case of reference types, whenever you change the data through one reference, the change is visible through every other one. For example, consider mov eax, 0x4037C4 and mov ebx, 0x4037C4. If we mov [eax], 0x42, then [ebx] is also 0x42 now, since both registers point to the same memory address (ebx itself still holds 0x4037C4).
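To make the value/reference distinction concrete, here is a minimal Python sketch. It is a toy model, not an emulator: the `regs` and `mem` dictionaries are hypothetical stand-ins for the register file and memory, and the initial memory contents are made up.

```python
# Toy model (hypothetical names): registers and memory as plain dictionaries.
regs = {"eax": 0, "ebx": 0}
mem = {0x4037C4: 0x77}           # made-up initial contents

regs["eax"] = 0x4037C4           # mov eax, 0x4037C4
regs["ebx"] = 0x4037C4           # mov ebx, 0x4037C4
mem[regs["eax"]] = 0x42          # mov [eax], 0x42

ebx_value = regs["ebx"]          # the register still holds the address
ebx_target = mem[regs["ebx"]]    # what a later mov ecx, [ebx] would load

print(hex(ebx_value), hex(ebx_target))
```

Writing through one register changes what the other register points at, but not the register itself.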
LEA
x86
For smarties, it’s called “load effective address”. Usually used for arrays and complex address calculations. Let’s assume that initially we have EAX = 0x89, EBX = 0x11 and ESI = 0x4037C0.
Instruction | Description |
---|---|
lea ebx, [eax*5 + 5] | eax*5 + 5 is equivalent to 5*(eax+1) (ordinary mathematical manipulation). (0x89 + 1) * 5 = 0x8A * 5 = 0x2B2. Without lea it would require 4 operations instead: inc eax (0x89 + 1); mov ecx, 5; mul ecx (0x8A * 5); mov ebx, eax. For the mul operation see below in the Arithmetic section. |
MOV vs LEA
Let’s compare these two:
MOV | LEA |
---|---|
mov eax,[ebx+8] | lea eax,[ebx+8] |
The first instruction (the one with mov) does the following: “Add 8 to the value in ebx, go to this address and store the value found there in eax”. So, it calculates the address and loads the value at this address into eax.
The second instruction (the one with lea) does the following: “Add 8 to the value in ebx and store the result in eax”. So, it calculates the address and puts the address itself into eax.
To conclude, lea stores an address (reference types) and mov usually stores values (value types). However, note that mov can move addresses as well, since an address is also just an integer, i.e. a value.
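The same comparison as a small Python sketch. The helper names (`effective_address`, `lea`, `mov_load`) and the memory contents are assumptions made for illustration; only the address arithmetic mirrors what the CPU does.

```python
def effective_address(base, index=0, scale=1, disp=0):
    """Address computed by a [base + index*scale + disp] operand (32-bit wraparound)."""
    return (base + index * scale + disp) & 0xFFFFFFFF

def lea(base, index=0, scale=1, disp=0):
    # lea keeps the computed address itself
    return effective_address(base, index, scale, disp)

def mov_load(mem, base, index=0, scale=1, disp=0):
    # mov with brackets dereferences the computed address
    return mem[effective_address(base, index, scale, disp)]

mem = {0x4037C8: 0x1234}                 # made-up memory contents
ebx = 0x4037C0
print(hex(lea(ebx, disp=8)))             # the address ebx+8
print(hex(mov_load(mem, ebx, disp=8)))   # the value stored at ebx+8
```

Both helpers share the address calculation; the only difference is whether memory is touched afterwards.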
MOVSXD
x86
Example: movsxd rsi, [rbp+8h]
Copies the contents of the <src_operand> to the <dst_operand> and sign-extends the value. With a 64-bit destination register, as in the example, a 32-bit source is sign-extended to 64 bits. The size of the converted value depends on the operand-size attribute. In 64-bit mode, the instruction’s default operation size is 32 bits.
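Sign extension is easy to get wrong when reading a disassembly, so here is a minimal Python sketch of the 32-to-64-bit case. The function name `movsxd` is just a label for this model; it works on plain integers, not on real registers.

```python
def movsxd(src32):
    """Sign-extend a 32-bit value to 64 bits (model of a 64-bit movsxd)."""
    src32 &= 0xFFFFFFFF
    if src32 & 0x80000000:                  # sign bit set: fill the upper half with ones
        return src32 | 0xFFFFFFFF00000000
    return src32                            # sign bit clear: upper half stays zero

print(hex(movsxd(0xFFFFFFFE)))   # -2 as a 32-bit value stays -2 as a 64-bit value
print(hex(movsxd(0x7FFFFFFF)))   # positive values are unchanged
```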
Arithmetic
ADD
x86
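add eax, 5 - adds the source value (a register, a value at an address or an immediate value) to the destination (a register or a value at an address) and stores the result in the destination.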
SUB
x86
Affected flags: CF, ZF
CF = 1 if the <destination operand> is less than the <source operand> (compared as unsigned values), i.e. after subtraction there would be a negative number.
ZF = 1 if <destination operand> = <source operand> and the result is zero.
Let’s assume that EAX = 0x99 and EBX = 0x2.
instruction | description |
---|---|
sub eax, 0x99 | Now eax is 0, therefore ZF = 1. |
sub ebx, 0x10 | This results in a negative number, therefore CF = 1. |
SBB
x86
Affected by flags: CF
Almost the same as SUB but a little tricky: it also subtracts the CF flag. If CF = 0, then sbb eax, 0x10 is equivalent to sub eax, 0x10 (which is eax - 0x10). If CF = 1, then it means: eax = eax - 0x10 - 1.
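Both SUB and SBB can be sketched with one small Python function. This is a model under assumptions: operands are treated as unsigned 32-bit integers, only CF and ZF are computed, and `sub32` is a made-up name.

```python
MASK32 = 0xFFFFFFFF

def sub32(dst, src, cf_in=0):
    """Return (result, CF, ZF) for dst - src - cf_in on unsigned 32-bit values.

    cf_in=0 models sub; passing the current carry flag models sbb.
    """
    full = dst - src - cf_in
    result = full & MASK32
    cf = 1 if full < 0 else 0        # a borrow was needed
    zf = 1 if result == 0 else 0
    return result, cf, zf

print(sub32(0x99, 0x99))             # equal operands: result 0, ZF = 1
print(sub32(0x2, 0x10))              # dst < src: wraps around, CF = 1
print(sub32(0x20, 0x10, cf_in=1))    # sbb with CF = 1 subtracts one more
```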
MUL and DIV
x86
Affected flags: CF = OF = 0 if the high-order bits of the product are 0.
Both of these instructions operate on a predefined register. For example, mul ecx is actually “mul eax, ecx”, i.e. eax * ecx. The result is stored in register AX, DX:AX, or EDX:EAX (depending on the operand size). The high-order bits of the product are in AH, DX, or EDX, respectively. Likewise, div ecx divides EDX:EAX by ecx and leaves the quotient in EAX and the remainder in EDX.
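A minimal Python sketch of the 32-bit MUL, assuming plain unsigned integers in place of registers (`mul32` is a hypothetical helper name). It shows how the 64-bit product is split between EDX and EAX.

```python
def mul32(eax, src):
    """Unsigned 32x32 -> 64-bit multiply, returned as (edx, eax, cf_of)."""
    product = (eax & 0xFFFFFFFF) * (src & 0xFFFFFFFF)
    low = product & 0xFFFFFFFF       # goes to EAX
    high = product >> 32             # goes to EDX
    cf_of = 0 if high == 0 else 1    # CF = OF = 0 only when the upper half is zero
    return high, low, cf_of

print(mul32(0x10, 0x10))             # small product: EDX = 0, flags clear
print(mul32(0xFFFFFFFF, 2))          # product spills into EDX, flags set
```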
IMUL and IDIV
x86
Affected flags: CF, OF
Same as MUL and DIV but operate on signed values. CF = OF = 1 if significant bits are carried into the upper half of the result. The cdq instruction is usually used before IDIV. It converts a doubleword to a quadword, quote:
The CDQ instruction copies the sign (bit 31) of the value in the EAX register into every bit position in the EDX register.
Forms:
- Like MUL and DIV, when only the <src operand> is used.
- imul edi, esi, when both <dst operand> and <src operand> are used (edi = edi * esi).
- imul edi, esi, 0x10, when beside the <dst operand> there are two <src operand>s; the second one has to be an immediate value. The operation is as follows: esi * 0x10 = edi.
In the two- and three-operand forms the <dst operand> is always a register. <src operand> can be a register, an address or (in the three-operand form) a value. When a value is used, it is sign-extended to the length of the destination operand format.
NB: The length of the product is calculated to twice the length of the operands. With the one-operand form, the product is stored exactly in the destination. With the two- and three-operand forms, however, the result is truncated to the length of the destination before it is stored in the destination register. This is why the CF or OF flag should be tested to ensure that no significant bits are lost.
If this instruction is used for unsigned operations (which works because the lower half of the product is the same regardless of whether the operands are signed or unsigned), the CF and OF flags cannot be used to determine if the upper half of the result is non-zero.
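The truncation note can be sketched in Python. This models only the two-operand 32-bit form; `to_signed32` and `imul32` are made-up helper names, and the single returned flag stands for CF = OF.

```python
def to_signed32(x):
    """Interpret the low 32 bits of x as a signed integer."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x

def imul32(dst, src):
    """Two-operand imul: returns (truncated 32-bit result, CF=OF)."""
    full = to_signed32(dst) * to_signed32(src)
    result = full & 0xFFFFFFFF
    # The flags are set when the truncated result no longer equals the full product.
    overflow = 0 if to_signed32(result) == full else 1
    return result, overflow

print(imul32(0xFFFFFFFF, 2))    # signed: -1 * 2 = -2 fits, flags clear
print(imul32(0x40000000, 4))    # 2^30 * 4 does not fit, flags set
```

The first call also illustrates the unsigned caveat: read as unsigned numbers, 0xFFFFFFFF * 2 has a non-zero upper half, yet the flags stay clear.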
Shifts
ROR and ROL
x86
Affected flags: CF
, OF
Rotate the bits of an integer n times to the left or right. When you see such an instruction, very often it is an indication of encryption. To better understand both I need an example. Let’s take an 8-bit binary value. The initial state is:
0 1 0 0 1 1 1 0
See the 1 at the beginning of this number, second bit from the left (let’s call him Matt). We will then locate him after ROR and ROL.
Let’s now ROR (rotate bits right) by 1. Every bit is moved to the right by one position. Our first state (for future reference):
0 0 1 0 0 1 1 1
Where the hell is Matt now? Now, this bit is the third bit from the left.
Let’s now go back to the initial value and ROL it (rotate bits left) by 1. Every bit is moved to the left by one position. Our second state:
1 0 0 1 1 1 0 0
Where the hell is Matt now? This bit is the first bit from the left, so he’s become the most significant bit in this number.
Let’s now ROL the last number by 1 again. Every bit is again moved to the left. But Matt has nowhere to move! He’s falling nowhere…
Where the hell is Matt now? He seemed to have drowned, but he made it through the swamp and emerged… But now… Matt used to be the most significant bit in the second state, but now he’s just the least significant bit. As you can see, he’s the first from the end. This is the third state:
0 0 1 1 1 0 0 1
Let’s make him worthy again and give him his newly acquired and recently lost regalia. Let’s ROR him by 1 again and get back to the second state (unfortunately he’ll have to dive into the swamp again):
1 0 0 1 1 1 0 0
When moving from the second to the third state Matt has been in a swamp, or in a wormhole if you prefer a space metaphor. Let me introduce our wormhole: the CF flag. The spirit of Matt was printed on this flag. In other, less eloquent words, when falling from the edge into the swamp, his value (1) was copied into CF. The same goes for any other bit that would “fall”. For example, let’s get back to the third (and the most unfortunate for Matt) state (0 0 1 1 1 0 0 1). Matt’s spirit is still there, therefore CF is still 1. Let’s ROL this number by 1 once again: Martha (who’s now the most significant, i.e. the first bit of the number) falls into the swamp, gets copied into CF and emerges at the end as the least significant bit, making Matt now the second least significant bit, i.e. the second bit from the end (which is not that bad now). Now we have the fourth state:
0 1 1 1 0 0 1 0
and CF = 0, now bearing Martha’s spirit.
The processor restricts the count to a number between 0 and 31 by masking all the bits in the count operand except the 5 least-significant bits.
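Matt’s journey can be replayed with a short Python sketch. It is a simplified model: 8-bit values only, rotation by exactly one bit per call, and `rol8`/`ror8` are made-up helper names.

```python
def rol8(value):
    """Rotate an 8-bit value left by 1; returns (result, CF)."""
    msb = (value >> 7) & 1
    result = ((value << 1) & 0xFF) | msb   # the bit that fell off re-enters on the right
    return result, msb                     # CF is a copy of the bit that wrapped around

def ror8(value):
    """Rotate an 8-bit value right by 1; returns (result, CF)."""
    lsb = value & 1
    result = (value >> 1) | (lsb << 7)     # the bit that fell off re-enters on the left
    return result, lsb

state, cf = rol8(0b01001110)               # initial value -> second state
print(format(state, "08b"), cf)
state, cf = rol8(state)                    # second -> third state, Matt lands in CF
print(format(state, "08b"), cf)
state, cf = rol8(state)                    # third -> fourth state, Martha lands in CF
print(format(state, "08b"), cf)
```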
RCR and RCL
x86
Affected flags: CF
, OF
It’s pretty much the same, with just one small difference. The CF flag is now taken into account, it’s not just a wormhole anymore. Let’s consider the third state from the previous examples:
0 0 1 1 1 0 0 1
Let CF be 1 now (maybe it was set by some preceding operation like ROL).
If we now RCL, the fourth state will be as follows:
CF = Martha = 0.
0 1 1 1 0 0 1 1
The value that was in CF (1) is now at the end of our number, and the number’s most significant bit is now in CF. Everyone else has just shifted to the left by 1 bit. It’s as if we were operating not on an 8-bit value, but on a 9-bit value:
MAIN value | CF |
---|---|
0 1 1 1 0 0 1 1 | 0 |
which results in something like this: 0 1 1 1 0 0 1 1 0.
Let’s now RCR back to the third state. CF = 0 is now moved into its place (the most significant bit), giving 0 0 1 1 1 0 0 1, and since the bit falling from the cliff was a 1, CF = 1. Everyone else has just shifted to the right.
Another flag whose behaviour is quite peculiar is OF. It is only defined when we rotate by 1. When we rotate by 2 or more, its value is undefined. After the CPU has performed the rotate, it calculates OF like this. For left rotates (RCL and ROL), OF = CF XOR the most-significant bit. For right rotates, OF = most-significant bit XOR second most-significant bit. For the example above with RCL, when we entered the fourth state:
CF = 0 and the number itself is 0 1 1 1 0 0 1 1.
OF = 0 XOR 0 = 0
For the RCR operation leading us back to the third state: 0 0 1 1 1 0 0 1. Never mind CF since it’s not included in the calculations. The two most significant bits after rotation are 0 and 0 (the first two digits). OF = 0 XOR 0 = again 0.
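The rotate-through-carry steps and the OF rule can be checked with a Python sketch. As before this is a simplified model: 8-bit values, one-bit rotates only, and `rcl8`/`rcr8` are hypothetical helper names.

```python
def rcl8(value, cf):
    """Rotate left through carry by 1; returns (result, CF, OF)."""
    new_cf = (value >> 7) & 1
    result = ((value << 1) & 0xFF) | cf          # old CF enters on the right
    of = new_cf ^ ((result >> 7) & 1)            # CF XOR most-significant bit
    return result, new_cf, of

def rcr8(value, cf):
    """Rotate right through carry by 1; returns (result, CF, OF)."""
    new_cf = value & 1
    result = (value >> 1) | (cf << 7)            # old CF enters on the left
    of = ((result >> 7) & 1) ^ ((result >> 6) & 1)   # XOR of the two top bits
    return result, new_cf, of

fourth, cf, of = rcl8(0b00111001, 1)             # third state with CF = 1
print(format(fourth, "08b"), cf, of)
third, cf, of = rcr8(fourth, cf)                 # and back again
print(format(third, "08b"), cf, of)
```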
SHL and SHR
x86
Affected flags: CF
Shifts bits by the value specified in the second operand to the left or to the right. The last bit dropped off is written to CF “before death”. Example:
1 0 1 0 1 1 0 1
Let’s SHR the above number: 0 1 0 1 0 1 1 0. The dropped bit was 1, so CF = 1.
Let’s now SHR once again: 0 0 1 0 1 0 1 1. The dropped bit was 0, so CF = 0.
The main rule here: for each SHR add a 0 at the beginning and remove one digit from the end. The same goes for SHL: for each SHL add a 0 to the end and remove one digit from the beginning.
The above number is 8 bits long, so we can pop 8 bits by shifting in one direction (SHR only, for example). When the last digit is popped off, its value is written to CF; in the example above it is the leading 1 of the original number, hence CF = 1 at that point.
Useful tip
SHL and SHR can be used as an optimized multiplication and division by 2^n. Here is an example:
SHL | equivalent |
---|---|
shl eax, 1 | eax * 2 |
shl eax, 2 | eax * 4 |
shl eax, 3 | eax * 8 |
To read more about this and how this really works, read here.
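A minimal Python sketch of one-bit shifts on an 8-bit value, with `shl8` and `shr8` as made-up helper names. It replays the example above and shows the multiply/divide-by-two effect for unsigned values.

```python
def shr8(value):
    """Shift an 8-bit value right by 1; returns (result, CF)."""
    return value >> 1, value & 1                 # the dropped low bit goes to CF

def shl8(value):
    """Shift an 8-bit value left by 1; returns (result, CF)."""
    return (value << 1) & 0xFF, (value >> 7) & 1   # the dropped high bit goes to CF

value, cf = 0b10101101, 0
for _ in range(8):                               # shift every bit out
    value, cf = shr8(value)
print(value, cf)                                 # only CF remembers the last dropped bit

print(shl8(3)[0], shr8(12)[0])                   # 3 * 2 and 12 // 2
```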
ROL/ROR vs SHL/SHR
What’s the difference between the two (well, even four)? When I inspected my old notes, I got a little confused because I’d totally forgotten that. That’s why I’ve included this section for the future, should my memory fail me once again.
The difference between the two is pretty much the same as the difference between “rolling” and “shifting”. Say, we have a combination padlock for a suitcase and set our passcode to 1234.
Then we shuffle it and have 5432. How to open it then? We rotate each dial until we get to our passcode digits: the dial with 5 is rotated 4 times to get 1, the dial with 4 is rotated 2 times to get 2, etc. No one would expect that when we rotate a dial on the lock, a digit disappears after reaching the end. But that would be the case if the operation in the padlock’s intestines was shifting. And that’s exactly what happens to the bits when shifting.
So, in ROR/ROL instructions no bits are lost, all of the bits of the original number are preserved. They are just rolling like those numbers in the lock. But with SHL/SHR instructions the bits are dumped into a wormhole and never seen again. If we shift long enough, we turn any number into a bunch of zeroes until the only footprint left is the CF flag, which will hold the last shifted and dropped off bit. But even this would be overwritten with 0 should you shift one last time…
Comparisons
TEST
x86
Affected flags: ZF
A beautiful instruction in that it’s so simple and lightweight. It does the same as AND but the operands are not changed. The result is in ZF (either 0 or 1).
AND | test |
---|---|
and eax, eax | test eax, eax |
Interesting! 😮 The above operations set the flags identically, but the second one doesn’t write a result back to the register. It’s usually used to test whether the value is 0.
CMP
x86
Affected flags: CF
, ZF
This one is like SUB. It’s almost the same as sub eax, edx, for example. However, this instruction, just like the previous one, doesn’t change its operands:
 | ZF | CF |
---|---|---|
dst = src | 1 | 0 |
dst < src | 0 | 1 |
dst > src | 0 | 0 |
When dst = src, dst - src = 0, therefore we set ZF (zero flag) to 1. When dst < src, dst - src = negative number, therefore CF (carry flag) = 1. When everything is primitive (dst > src), dst - src = positive number, hence both flags are cleared to 0.
When some of the flags were changed during some previously performed operation, are they reset to the states above according to the values in dst and src? For example, if ZF = 1 before our CMP dst, src where dst < src, will ZF be set to 0 after this instruction is executed? Yes: CMP recomputes these flags on every execution regardless of their previous values, so ZF becomes 0 here.
Buffers
REP
x86
Affected flags: ZF
This class of instructions is composed of different kinds of loops. It uses RSI or ESI as the source (ESI means “source index”) and EDI or RDI as the destination (EDI means “destination index”). ECX (counter) is used as a … surprise-surprise… a counter. There are several types of REP instruction:
instruction | description |
---|---|
rep | Repeat until ECX = 0 |
repe or repz | Repeat until ECX = 0 or ZF = 0 |
repne or repnz | Repeat until ECX = 0 or ZF = 1 |
The REP family is never seen alone. It’s always followed by some operation. Why? Because basically it’s a repetition. You can’t repeat nothing. These repetitions are performed on buffers (strings, for example). There are 4 possible operations seen with rep:
instruction | description | C++ analog |
---|---|---|
repe cmpsb | Compare two buffers | memcmp |
rep stosb | Set all bytes to some value in AL | memset |
rep movsb | Copy a buffer | memcpy |
repne scasb | Search for a byte | |
repe cmpsb. To better illustrate, I’ve written the below pseudocode:
function bool compare(){
    edi = ['d', 's', 't', '1'];
    esi = ['s', 'r', 'c', '1'];
    // start at the last valid index so that ecx never points past the buffer
    for (ecx = len(edi) - 1; ecx >= 0; ecx--){
        if (edi[ecx] != esi[ecx]) return false;
    }
    return true;
}
ECX is set to the buffer’s length, ESI is a pointer to the first buffer, EDI is the pointer to the second. The loop runs until ECX = 0 or the bytes compared are different. The above loop will run 2 times and return false when ecx = 2, since edi[2] = 't' and esi[2] = 'c', which means that the buffers are different and there is no need to run the loop further. Note that the pseudocode walks the buffers from the end for brevity; the real instruction starts at the first byte and advances ESI and EDI (when the direction flag is clear) while ECX counts down.
rep stosb. Destination buffer - EDI, source - AL. ECX is a counter.
function buffer[] init(){
    al = 'a';
    edi = [0, 0, 0]; // a 3-byte buffer to fill
    for (ecx = len(edi) - 1; ecx >= 0; ecx--){
        edi[ecx] = al;
    }
    return edi;
}
The above loop will run 3 times. Upon function return edi = ['a', 'a', 'a']. It is very often seen after xor eax, eax, since xoring something with itself fills it with 0, i.e. zeroes out the value. And we need to make sure there is no garbage lurking in EAX before setting it to the desired value (in our example, 'a') to be later used to fill edi. Just to remind, al is the lowest byte of the EAX register.
rep movsb. ESI - source buffer, EDI - destination buffer, ECX - counter.
function void copy(){
    esi = ['s', 'r', 'c'];
    edi = ['d', 's', 't'];
    // start at the last valid index so that ecx never points past the buffer
    for (ecx = len(edi) - 1; ecx >= 0; ecx--) {
        edi[ecx] = esi[ecx];
    }
    return;
}
The above loop will run 3 times. At the end, edi = esi = ['s', 'r', 'c'].
repne scasb. EDI - buffer address, AL - byte to search. ECX - counter.
function bool search(){
    edi = ['d', 's', 't'];
    al = 'd';
    // len(edi) = 3, so ecx starts at index 2
    for (ecx = len(edi) - 1; ecx >= 0; ecx--) {
        // this will return true on the 3rd iteration, when ecx = 0
        if (edi[ecx] == al) return true;
    }
    return false;
}
The above loop will run 3 times, and on the 3rd run it’ll return true, because edi[0] = 'd'.
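The pseudocode above walks the buffers backwards for brevity. Here is a Python sketch that is closer to what the CPU does, under stated assumptions: the direction flag is clear (buffers are walked forwards), buffers are `bytes`/`bytearray` objects instead of pointers, ZF on entry is assumed rather than inherited, and all function names are made up for this model.

```python
def repe_cmpsb(src, dst, ecx):
    """Compare up to ecx bytes; returns (remaining ecx, ZF)."""
    i, zf = 0, 1                 # assumed ZF on entry
    while ecx != 0:
        zf = 1 if src[i] == dst[i] else 0
        i += 1
        ecx -= 1
        if zf == 0:              # repe stops at the first mismatch
            break
    return ecx, zf

def rep_stosb(dst, al, ecx):
    """Fill the first ecx bytes of dst (a bytearray) with al."""
    for i in range(ecx):
        dst[i] = al

def rep_movsb(dst, src, ecx):
    """Copy ecx bytes from src into dst (a bytearray)."""
    for i in range(ecx):
        dst[i] = src[i]

def repne_scasb(buf, al, ecx):
    """Scan up to ecx bytes for al; returns (remaining ecx, ZF)."""
    i, zf = 0, 0                 # assumed ZF on entry
    while ecx != 0:
        zf = 1 if buf[i] == al else 0
        i += 1
        ecx -= 1
        if zf == 1:              # repne stops at the first match
            break
    return ecx, zf

filled = bytearray(3)
rep_stosb(filled, ord("a"), 3)
copied = bytearray(b"dst")
rep_movsb(copied, b"src", 3)
print(repe_cmpsb(b"src1", b"dst1", 4), filled, copied)
print(repne_scasb(b"dst", ord("d"), 3))
```

After repe cmpsb or repne scasb, the remaining ECX together with ZF tells the caller where the loop stopped and why.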
Jumps
JMP and friends
x86
Affected by flags: ZF, OF, SF, CF
In general, these instructions have this skeleton: jmp location.
instruction | description | note |
---|---|---|
jmp | unconditional jump, meaning “Jump, Forrest, jump!” no matter what | |
jz | Jump if ZF = 1 (the result of previous instruction was 0) | |
jnz | opposite to jz. Jump if ZF = 0 (the result of previous instruction was not 0) | |
je | Jump if the result of preceding cmp op1, op2 was 0 (the operands were equal) | |
jne | opposite to je. Jump if the result of preceding cmp op1, op2 was not 0 (the operands were not equal) | |
jg, ja | Jump if the result of preceding cmp op1, op2 was a positive integer (op1 > op2, is greater). ja for unsigned comparison. | |
jge, jae | like jg or ja combined with je. Jump if the result of preceding cmp op1, op2 was a positive integer or 0 (op1 >= op2, is greater or equal). jae for unsigned comparison. | |
jl, jb | opposite to jg, ja. Jump if the result of preceding cmp op1, op2 was a negative integer (op1 < op2, is less). jb for unsigned comparison. | |
jle, jbe | like jl or jb combined with je. Jump if the result of preceding cmp op1, op2 was a negative integer or 0 (op1 <= op2, is less or equal). jbe for unsigned comparison. | |
jo | jump if the result of the previous instruction set OF to 1 | |
js | jump if the result of the previous instruction set SF to 1 | |
jecxz | jump if ecx = 0 | |
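Under the hood every conditional jump just inspects flags. The Python sketch below encodes the flag conditions for the mnemonics from the table; the function name `taken` and the keyword-argument interface are assumptions of this model, while the conditions themselves are the standard x86 ones.

```python
def taken(mnemonic, zf=0, cf=0, sf=0, of=0):
    """Return True if the given conditional jump would be taken for these flags."""
    conditions = {
        "jz": zf == 1, "je": zf == 1,
        "jnz": zf == 0, "jne": zf == 0,
        "jg": zf == 0 and sf == of,      # signed greater
        "jge": sf == of,                 # signed greater or equal
        "jl": sf != of,                  # signed less
        "jle": zf == 1 or sf != of,      # signed less or equal
        "ja": cf == 0 and zf == 0,       # unsigned above
        "jae": cf == 0,                  # unsigned above or equal
        "jb": cf == 1,                   # unsigned below
        "jbe": cf == 1 or zf == 1,       # unsigned below or equal
        "jo": of == 1,
        "js": sf == 1,
    }
    return conditions[mnemonic]

# After cmp 2, 0x10 (unsigned dst < src) we have CF = 1 and ZF = 0:
print(taken("jb", cf=1), taken("ja", cf=1), taken("jbe", cf=1))
```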
Getting familiar with jumps. Below is the table of examples. Try to quickly determine the location of the jump. Answers are listed right below the table, so spoiler alert!
Logical
AND
x86
Interesting! 😮 Can be used to clear some bits with a mask. For example, if you have 1100 1011 and you need to zero all bits out, all you need to do is to and it with 0000 0000. To determine whether an integer is even or not, mask it with 0000 0001. Even numbers have 0 at the end, and odd ones have 1. anding an even number with 0000 0001 will result in 0, and anding an odd one in 1. Also, you can make a number less by 2, 4, 8 etc. by applying a corresponding mask (this works only when the bit being cleared was set):
mask | effect |
---|---|
1101 | subtract 2 |
0111 | subtract 8 |
1011 | subtract 4 |
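The masking tricks are quick to verify in Python. This is a minimal sketch on 4-bit and 8-bit example values; `is_even` and `apply_mask` are made-up helper names.

```python
def is_even(n):
    """An integer is even when its lowest bit is 0."""
    return (n & 0b0001) == 0

def apply_mask(value, mask):
    """and value, mask: keeps only the bits that are set in the mask."""
    return value & mask

print(is_even(6), is_even(7))
print(bin(apply_mask(0b1111, 0b1101)))   # bit 1 was set, so the value drops by 2
print(bin(apply_mask(0b1001, 0b1101)))   # bit 1 was already 0, nothing is subtracted
print(apply_mask(0b11001011, 0b00000000))
```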
OR
Set to 1 if either of the bits is 1. Repeat for each bit of the first operand and the second operand. Writes to the destination operand.
Interesting! 😮 Can be used to set all bits to 1 with 1111. For example, we have 1110 or 1111 = 1111. Basically, any value ored with 1111 is 1111.
NOT
XOR
Exclusive OR. 1 if the first operand’s bit is not equal to the second’s.
Interesting! 😮 A quick way to set eax to 0. The opcode of xor eax, eax is 33 C0 (2 bytes), while mov eax, 0 has the opcode b8 00 00 00 00, which is 5 bytes (costly).
Also, an interesting observation to investigate further: if I xor any 4-bit value with 1111, I get an operation equal to subtracting that value from 1111 (15 in decimal, unsigned):
1010 xor 1111 is 0101 (5 in decimal)
1011 xor 1111 is 0100 (4d)
1100 xor 1111 is 0011 (3d)
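The observation holds for every 4-bit value, which a short Python check confirms. The helper name `xor_mask4` is an assumption of this sketch; the reason it works is that xoring with all ones flips every bit, and flipping every bit of a 4-bit number gives 15 minus that number.

```python
def xor_mask4(value):
    """XOR a 4-bit value with 1111, i.e. flip all four bits."""
    return (value & 0b1111) ^ 0b1111

# Compare against plain subtraction from 15 for all sixteen 4-bit values.
all_match = all(xor_mask4(v) == 15 - v for v in range(16))
zeroed = 0x12345678 ^ 0x12345678        # what xor eax, eax does to any value
print(all_match, zeroed)
```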
Unsorted
PUSH/POP
PUSHA/PUSHAD/POPA/POPAD
Save all general-purpose registers on the stack (PUSHA/PUSHAD) and restore them in reverse order (POPA/POPAD).
Interesting! 😮 Often seen in shellcodes and custom packers. Compilers rarely use these instructions.
NOP
Do nothing. Used for padding and controlling the time of program execution.
Interesting! 😮 Often seen in shellcodes and when attempting a buffer overflow.