Operations - Analyst


Created: 27.09.2020

In this article I describe the assembly instructions that I’ve encountered myself and wasn’t too lazy to write an explanation for. However, I won’t pay much attention to instructions that I consider straightforward, like ADD. Each instruction is tagged with its architecture: ARM or x86 (I’m just learning ARM myself for iOS analysis).

Most instructions have the following anatomy: instruction <destination operand>, <source operand>. Some look like this: instruction <source operand>, where the destination is implicit, i.e. always the same register. An example is MUL: when MULing, you always multiply EAX by some value.

Moving Data

MOV

x86

The god of assembly operations. Its main purpose is pretty obvious: to move (strictly speaking, copy) stuff from one place to another (like the Tower of Hanoi). Let’s suppose that initially we have EAX = 0x89, EBX = 0x11 and ESI = 0x4037C0, and that each row below continues from the state left by the previous one.

Instruction	Description
`mov eax, ebx`	Copy the value of EBX into EAX. Now EAX = EBX = 0x11.
`mov ebx, 0x4037C4`	Put the value 0x4037C4 itself into EBX. Now EAX = 0x11 and EBX = 0x4037C4.
`mov eax, [ebx]`	Copy the data stored at the address held in EBX into EAX. EBX = 0x4037C4, so the CPU goes to address 0x4037C4, reads the dword (4 bytes) stored there, say 0x77, and puts it into EAX. Now EBX = 0x4037C4 and EAX = 0x77.
`mov eax, [0x4037C4]`	Since EBX = 0x4037C4, this is equivalent to the previous instruction, with the address written directly.
`mov ebx, [esi+eax*4]`	First, the CPU calculates the address: ESI + EAX*4 = 0x4037C0 + 0x77*4 = 0x4037C0 + 0x1DC = 0x40399C. Then it goes to 0x40399C and copies the value stored at this address into EBX. Say we have 0x33 there. So EBX is now 0x33.
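The address arithmetic in the last row can be checked with a small C sketch. It models the general form [base + index*scale + disp] with 32-bit wraparound. The function name is made up for illustration, and the scale is limited to 1, 2, 4 or 8, as on x86.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of a memory operand [base + index*scale + disp].
   uint32_t arithmetic wraps modulo 2^32, like 32-bit address calculation. */
static uint32_t effective_address(uint32_t base, uint32_t index,
                                  uint32_t scale, int32_t disp) {
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale + (uint32_t)disp;  /* negative disp wraps */
}
```

With ESI = 0x4037C0 and EAX = 0x77, `effective_address(0x4037C0, 0x77, 4, 0)` gives 0x40399C.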

So, to conclude, you can MOV an immediate value, a value in another register, a value stored at an address, an address itself (which is technically also just a value), or a value stored at an address computed by an expression (the last example). Whenever assembly means “the memory at this address”, you’ll see square brackets []. When it means the value itself, there are no brackets. This is similar to what higher-level languages call pointers or reference types ([address]), where a reference is copied, versus value types (value), where the value itself is copied. With references, a change made through one of them is visible through all of them. For example, consider mov eax, 0x4037C4 and mov ebx, 0x4037C4. If we now do mov dword ptr [eax], 0x42, then [ebx] also holds 0x42, since both registers point to the same memory address (EBX itself still holds 0x4037C4).
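In C terms, EAX and EBX in that example behave like two pointers to the same object. Here is a minimal sketch. A local variable stands in for the dword at 0x4037C4; this is an assumption, since in a real program it would be some location in the process’s memory.

```c
#include <assert.h>
#include <stdint.h>

/* Two "registers" holding the same address see the same memory. */
static uint32_t aliasing_demo(void) {
    uint32_t cell = 0;        /* stands in for the dword at 0x4037C4 */
    uint32_t *eax = &cell;    /* mov eax, 0x4037C4 */
    uint32_t *ebx = &cell;    /* mov ebx, 0x4037C4 */
    *eax = 0x42;              /* mov dword ptr [eax], 0x42 */
    assert(eax == ebx);       /* the registers themselves are unchanged */
    return *ebx;              /* reading [ebx] now gives 0x42 */
}
```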

LEA

x86

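For smarties, it’s called “load effective address”. It computes an address but doesn’t access memory. That’s why it’s usually used for arrays, complex address calculations and, quite often, as a cheap way to do arithmetic. Let’s assume that initially we have EAX = 0x89, EBX = 0x11 and ESI = 0x4037C0.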

Instruction	Description
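`lea ebx, [eax*5 + 5]`	`eax*5 + 5` is equivalent to `5*(eax+1)` (ordinary algebra): `(0x89 + 1)*5 = 0x8A*5 = 0x2B2`. x86 scale factors can only be 1, 2, 4 or 8, so assemblers such as NASM encode `eax*5` as `eax + eax*4`. Without `lea` it would take 4 instructions instead: `inc eax` (0x89 + 1); `mov ecx, 5`; `mul ecx` (0x8A * 5); `mov ebx, eax`. These would also overwrite EAX, ECX, EDX and the flags, all of which `lea` leaves alone. For the `mul` instruction, see the Arithmetic section below.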
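Here is a quick C sketch that checks both sequences leave the same value in EBX for the initial EAX = 0x89. The function names are hypothetical, and each C statement mirrors one instruction.

```c
#include <assert.h>
#include <stdint.h>

/* lea ebx, [eax*5 + 5], encoded as [eax + eax*4 + 5]:
   pure arithmetic, no memory access, flags untouched. */
static uint32_t lea_version(uint32_t eax) {
    return eax + eax * 4 + 5;
}

/* inc eax; mov ecx, 5; mul ecx; mov ebx, eax */
static uint32_t mul_version(uint32_t eax) {
    eax = eax + 1;                          /* inc eax */
    uint32_t ecx = 5;                       /* mov ecx, 5 */
    uint64_t product = (uint64_t)eax * ecx; /* mul ecx: EDX:EAX = EAX * ECX */
    eax = (uint32_t)product;                /* low half in EAX (EDX gets the high half) */
    return eax;                             /* mov ebx, eax */
}
```

Both return 0x2B2 for 0x89. The `lea` version just gets there in one instruction, without clobbering other registers or flags.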

MOV vs LEA


Let’s compare these two:

MOV	LEA
`mov eax,[ebx+8]`	`lea eax,[ebx+8]`

The first instruction (the one with mov) does the following: “Add 8 to the value in EBX, go to the resulting address and store the value found there in EAX”. So, it calculates the address and loads the value at this address into EAX.

The second instruction (the one with lea) does the following: “Add 8 to the value in EBX and store the result in EAX”. So, it calculates the address and puts the address itself into EAX, without touching memory.

To conclude, lea stores an address (think reference types), and mov with brackets stores the value found at that address (think value types). However, note that mov can move addresses as well, since an address is also just an integer, i.e. a value.
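To make the difference concrete, here is a C sketch with a tiny made-up memory region. The BASE address and its contents are assumptions for illustration. LEA only does the address arithmetic; MOV does the same arithmetic and then reads memory.

```c
#include <assert.h>
#include <stdint.h>

/* A made-up 16-byte memory region: mem[i] is the dword at BASE + 4*i. */
#define BASE 0x4037C0u
static const uint32_t mem[4] = {0x11, 0x22, 0x33, 0x44};

static uint32_t load_dword(uint32_t addr) {
    assert(addr >= BASE && addr - BASE < sizeof mem && (addr - BASE) % 4 == 0);
    return mem[(addr - BASE) / 4];
}

/* lea eax, [ebx+8]: only the address arithmetic, no memory access. */
static uint32_t lea_ebx_plus_8(uint32_t ebx) { return ebx + 8; }

/* mov eax, [ebx+8]: the same arithmetic, then a read from that address. */
static uint32_t mov_ebx_plus_8(uint32_t ebx) { return load_dword(ebx + 8); }
```

With EBX = 0x4037C0, the `lea` version yields the address 0x4037C8, while the `mov` version yields 0x33, the dword stored there.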

MOVSXD

x86

Example: movsxd rsi, [rbp+8h]

Copies the 32-bit (doubleword) <src_operand> into the 64-bit <dst_operand> and sign-extends it. In other words, the upper 32 bits are filled with copies of the source’s sign bit (bit 31). This form only exists in 64-bit mode. Because the default operand size in 64-bit mode is 32 bits, producing the 64-bit result requires the REX.W prefix.
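Here is a C sketch of what MOVSXD does to the value (the function names are illustrative). It also contrasts it with a plain 32-bit MOV into a register such as ESI. In 64-bit mode, that MOV zero-extends into the full RSI instead.

```c
#include <assert.h>
#include <stdint.h>

/* movsxd r64, r/m32: copy bit 31 of the source into bits 32..63. */
static uint64_t movsxd(uint32_t src) {
    uint64_t r = src;                  /* zero-extended copy */
    if (src & 0x80000000u)             /* negative as a signed dword? */
        r |= 0xFFFFFFFF00000000u;      /* fill the upper half with 1s */
    return r;
}

/* For comparison: writing a 32-bit register (e.g. mov esi, ...) zeroes
   the upper half of the 64-bit register. */
static uint64_t mov32_zero_extend(uint32_t src) {
    return src;
}
```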

Arithmetic

ADD

x86

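add eax, 5 adds the source operand (an immediate value, a register or a memory location) to the destination operand (a register or a memory location) and stores the sum in the destination. The two operands can’t both be memory locations.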

SUB

x86

Affected flags: CF, ZF (SUB also sets OF, SF, AF and PF; only CF and ZF are discussed here)

CF = 1 if the <destination operand> is less than the <source operand> when both are treated as unsigned numbers. In that case the subtraction needs a borrow, and the result, which would be negative, wraps around. ZF = 1 if the result is zero, i.e. <destination operand> = <source operand>.

Let’s assume that EAX = 0x99 and EBX = 0x2.

instruction	description
`sub eax, 0x99`	Now `eax` is `0`, therefore `ZF` = `1`.
`sub ebx, 0x10`	0x2 - 0x10 would be negative, so the result wraps around to 0xFFFFFFF2 and `CF` = `1`.
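Here is a C sketch of the two flags discussed here, reproducing the table. Only CF and ZF are modelled, and the struct and function names are made up.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t result;
    int cf;   /* borrow: dst < src as unsigned numbers */
    int zf;   /* result == 0 */
} sub_flags;

/* Emulate sub dst, src for 32-bit operands. */
static sub_flags sub32(uint32_t dst, uint32_t src) {
    sub_flags f;
    f.result = dst - src;    /* unsigned subtraction wraps modulo 2^32 */
    f.cf = dst < src;
    f.zf = f.result == 0;
    return f;
}
```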

SBB

x86

Affected by flags: CF

Almost the same as SUB, but a little trickier: it also subtracts the CF flag (subtract with borrow), so sbb dst, src computes dst = dst - src - CF. If CF = 0, then sbb eax, 0x10 is equivalent to sub eax, 0x10 (eax = eax - 0x10). If CF = 1, it computes eax = eax - 0x10 - 1.
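A typical place to meet SBB is multi-word arithmetic. 32-bit code subtracts 64-bit numbers with SUB on the low halves followed by SBB on the high halves, so the borrow from the low half is carried over. Here is a sketch of that pattern in C, with the register halves as plain variables (the function name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit a - b using only 32-bit operations:
     sub eax, ebx   ; low halves, CF = borrow
     sbb edx, ecx   ; high halves minus the borrow */
static uint64_t sub64_via_sbb(uint64_t a, uint64_t b) {
    uint32_t lo_a = (uint32_t)a, hi_a = (uint32_t)(a >> 32);
    uint32_t lo_b = (uint32_t)b, hi_b = (uint32_t)(b >> 32);

    uint32_t lo = lo_a - lo_b;          /* sub */
    uint32_t cf = lo_a < lo_b;          /* borrow out of the low half */
    uint32_t hi = hi_a - hi_b - cf;     /* sbb: dst - src - CF */

    return ((uint64_t)hi << 32) | lo;
}
```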

MUL and DIV

x86

Affected flags:

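MUL: CF = OF = 0 if the high-order half of the product (AH, DX or EDX) is 0, and CF = OF = 1 otherwise. After DIV, the flags are undefined.

Both of these instructions operate on implicit registers. For example, mul ecx effectively computes EDX:EAX = EAX * ECX. The result is stored in AX, DX:AX, or EDX:EAX (for 8-, 16- and 32-bit operands, respectively). The high-order bits of the product are in AH, DX, or EDX, respectively. DIV works the other way round: div ecx divides EDX:EAX by ECX, then puts the quotient in EAX and the remainder in EDX. For an 8-bit operand, AX is divided, with the quotient in AL and the remainder in AH. Dividing by zero, or getting a quotient too big for the destination, raises a divide error (#DE).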
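Here is a C sketch of the implicit registers involved, with EDX:EAX modelled as two uint32_t variables. The names are illustrative, and the divide error is only checked with an assert, not emulated.

```c
#include <assert.h>
#include <stdint.h>

/* mul ecx: EDX:EAX = EAX * ECX (unsigned); CF = OF = (EDX != 0). */
static void mul32(uint32_t *eax, uint32_t *edx, uint32_t ecx, int *cf_of) {
    uint64_t product = (uint64_t)*eax * ecx;
    *eax = (uint32_t)product;           /* low half */
    *edx = (uint32_t)(product >> 32);   /* high half */
    *cf_of = (*edx != 0);
}

/* div ecx: EAX = EDX:EAX / ECX, EDX = EDX:EAX % ECX (unsigned). */
static void div32(uint32_t *eax, uint32_t *edx, uint32_t ecx) {
    uint64_t dividend = ((uint64_t)*edx << 32) | *eax;
    assert(ecx != 0 && dividend / ecx <= UINT32_MAX);  /* else #DE on the CPU */
    *eax = (uint32_t)(dividend / ecx);
    *edx = (uint32_t)(dividend % ecx);
}
```

For example, multiplying EAX = 0x80000000 by 4 leaves EAX = 0 and EDX = 2, with CF = OF = 1. Dividing that EDX:EAX pair by 4 gives EAX = 0x80000000 back, with a remainder of 0.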

IMUL and IDIV

x86

Affected flags: CF, OF

Same as MUL and DIV, but they operate on signed values. For IMUL, CF = OF = 1 if significant bits are carried into the upper half of the result. The cdq instruction is usually used before IDIV. It converts a doubleword to a quadword. Quote:

The CDQ instruction copies the sign (bit 31) of the value in the EAX register into every bit position in the EDX register.
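Why CDQ matters: IDIV divides the whole EDX:EAX pair, so leftover garbage in EDX would change the dividend. Here is a C sketch of both steps, assuming the usual cdq; idiv ecx sequence (the function names are made up):

```c
#include <assert.h>
#include <stdint.h>

/* cdq: copy bit 31 of EAX into every bit of EDX, so EDX:EAX holds
   the same signed value as EAX, just 64 bits wide. */
static uint32_t cdq(uint32_t eax) {
    return (eax & 0x80000000u) ? 0xFFFFFFFFu : 0u;
}

/* cdq; idiv ecx: EAX = quotient, EDX = remainder (signed).
   IDIV truncates toward zero and the remainder takes the dividend's
   sign, which is what C99's / and % do as well. */
static void idiv32(int32_t eax, int32_t ecx, int32_t *quot, int32_t *rem) {
    int64_t edx_eax = eax;   /* what cdq builds: sign-extended EAX */
    assert(ecx != 0 && !(eax == INT32_MIN && ecx == -1));  /* else #DE */
    *quot = (int32_t)(edx_eax / ecx);
    *rem  = (int32_t)(edx_eax % ecx);
}
```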

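Forms (of IMUL; IDIV only has the one-operand form):

Like MUL and DIV, when only the <src operand> is given: imul ecx computes EDX:EAX = EAX * ECX.
imul edi, esi, when both <dst operand> and <src operand> are given: edi = edi * esi.
imul edi, esi, 10, when besides the <dst operand> there are two source operands, the second of which must be an immediate value: edi = esi * 10.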

In the two- and three-operand forms, <dst operand> is always a general-purpose register. <src operand> can be a register, a memory location or an immediate value. When an immediate value is used, it is sign-extended to the length of the destination operand format.

NB ❗ The product is calculated at twice the length of the operands. With the one-operand form, the whole product is stored in the destination (EDX:EAX, for example). With the two- and three-operand forms, however, the result is truncated to the length of the destination register before it is stored. This is why the CF or OF flag should be tested to make sure that no significant bits were lost.

The two- and three-operand forms can also be used for unsigned operands, since the lower half of the product is the same regardless of whether the operands are signed or unsigned. In that case, however, the CF and OF flags cannot be used to determine whether the upper half of the result is non-zero.
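Here is a C sketch of the two-operand form with the overflow check described above. Only CF/OF are modelled, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* imul r32, r/m32 (two-operand form): the full product is 64 bits,
   only the low 32 bits are kept; CF = OF = 1 when the truncated value,
   read as signed, differs from the full product. */
static int32_t imul2(int32_t dst, int32_t src, int *cf_of) {
    int64_t full = (int64_t)dst * src;
    uint32_t low = (uint32_t)full;                    /* truncated bits */
    int64_t low_signed = (low & 0x80000000u)
                       ? (int64_t)low - 0x100000000LL /* sign-extend */
                       : (int64_t)low;
    *cf_of = (low_signed != full);
    return (int32_t)low_signed;
}
```

For example, `imul2(0x10000, 0x10000, &flag)` returns 0 with the flag set, because 2^32 doesn’t fit in 32 bits. `imul2(-1, 2, &flag)` returns the bit pattern 0xFFFFFFFE, the same low half as the unsigned product 0xFFFFFFFF * 2. The flag stays 0, even though that unsigned product overflowed.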

Shifts

ROR and ROL

x86

Affected flags: CF, OF

Rotate the bits of an integer n times to the left or to the right. When you see such an instruction, it’s very often a sign of encryption (or hashing) code. To better understand both, I need an example. Let’s take an 8-bit binary value. The initial state is:

0 1 0 0 1 1 1 0

See the 1 at the beginning of this number, second bit from the left (let’s call him Matt). We will then locate him after ROR and ROL.

Let’s now ROR (rotate bits right) by 1. Every bit is moved to the right by one position, and the rightmost bit wraps around to the leftmost position. Our first state (for future reference):

0 0 1 0 0 1 1 1

Where the hell is Matt now? Now, this bit is the third bit from the left.

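Now let’s go back to the initial value (0 1 0 0 1 1 1 0) and ROL it (rotate bits left) by 1. Every bit is moved to the left by one position, and the leftmost bit wraps around to the rightmost position. Our second state: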

1 0 0 1 1 1 0 0

Where the hell is Matt now? This bit is the first bit from the left, so, he’s become the most significant bit in this number 👑 .

Let’s now ROL the last number by 1 again. Every bit is again moved to the left. But Matt has nowhere to move! He’s falling off the edge…

Where the hell is Matt now? He seemed to have drowned, but he made it through the swamp and emerged on the other side… But now… Matt used to be the most significant bit 👑 in the second state, and now he’s just a 💩, the least significant bit. As you can see, he’s the first bit from the end. This is the third state:

0 0 1 1 1 0 0 1

Let’s make him worthy again and give him back his recently lost regalia. Let’s ROR him by 1 and get back to the second state (unfortunately, he’ll have to dive into the swamp again):

1 0 0 1 1 1 0 0

When moving from the second to the third state, Matt has been in a swamp, or in a wormhole 🐛 if you prefer a space metaphor. Let me introduce our wormhole: the CF flag. The spirit of Matt was imprinted on this flag. In other, less eloquent words, when he fell from the edge into the swamp, his value (1) was copied into CF, as would happen to any other bit that “falls”.

For example, let’s get back to the third (and the most unfortunate for Matt) state (0 0 1 1 1 0 0 1). Matt’s spirit is still there, therefore CF is still 1. Let’s ROL this number by 1 once again. Martha (who’s now the most significant, i.e. the first bit of the number) falls into the swamp, gets copied into CF and emerges at the end as the least significant bit 💩. This makes Matt the second least significant bit, i.e. the second bit from the end (which is not that bad). Now we have the fourth state:

0 1 1 1 0 0 1 0

and CF = 0, now bearing Martha’s spirit.

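The processor restricts the count to a number between 0 and 31 by masking all the bits in the count operand except the 5 least significant bits (with a 64-bit operand, the range is 0 to 63 and 6 bits are kept). For an 8-bit ROL/ROR, the rotation actually performed is that masked count modulo 8.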
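The states above can be reproduced with a C sketch of 8-bit ROL/ROR that follows the count masking just described. The function names are made up. CF is returned through a pointer and left untouched when the masked count is 0, as on the CPU.

```c
#include <assert.h>
#include <stdint.h>

/* 8-bit rotates: count masked to 5 bits, rotation = masked count mod 8,
   CF = the bit that wrapped around (only if the masked count is non-zero). */
static uint8_t rol8(uint8_t x, unsigned count, int *cf) {
    unsigned masked = count & 0x1F;
    unsigned n = masked % 8;
    uint8_t r = (uint8_t)((x << n) | (x >> ((8 - n) % 8)));
    if (masked != 0)
        *cf = r & 1;              /* the old top bit is now the lowest bit */
    return r;
}

static uint8_t ror8(uint8_t x, unsigned count, int *cf) {
    unsigned masked = count & 0x1F;
    unsigned n = masked % 8;
    uint8_t r = (uint8_t)((x >> n) | (x << ((8 - n) % 8)));
    if (masked != 0)
        *cf = r >> 7;             /* the old lowest bit is now the top bit */
    return r;
}
```

Starting from 0x4E (0 1 0 0 1 1 1 0), `ror8` gives 0x27 (the first state) and `rol8` gives 0x9C (the second). Calling `rol8` on 0x9C gives 0x39 with CF = 1 (the third, Matt in CF). One more `rol8` gives 0x72 with CF = 0 (the fourth, Martha in CF).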

RCR and RCL

x86

Affected flags: CF, OF

It’s pretty much the same, with just one small difference: the CF flag is now taken into account, so it’s not just a wormhole 🐛 anymore. Let’s consider the third state from the previous examples:

0 0 1 1 1 0 0 1

Let CF be 1 now (maybe it was set by some preceding operation like ROL).

If we now RCL, the fourth state will be as follows:

CF = Martha = 0.

0 1 1 1 0 0 1 1

The value that was in CF (1) is now at the end of our number, and the old most significant bit (Martha) is now in CF. Everyone else has just shifted to the left by 1 bit. It’s as if we were operating not on an 8-bit value, but on a 9-bit value:

MAIN value	CF
`0 1 1 1 0 0 1 1`	`0`

which results in something like that: 0 1 1 1 0 0 1 1 0.

Let’s now RCR back to the third state. CF = 0, and it moves into its place as the most significant bit: 0 0 1 1 1 0 0 1. Since it was Matt (1) who fell from the cliff at the other end, CF = 1. Everyone else has just shifted to the right.

Another flag whose behaviour is quite peculiar is OF. It is only defined for rotates by 1. When we rotate by 2 or more, its value is undefined (and a rotate by 0 doesn’t affect any flags). After the CPU has performed a 1-bit rotate, it calculates OF like this. For left rotates (RCL and ROL), OF = CF XOR the most significant bit of the result. For right rotates (RCR and ROR), OF = the most significant bit XOR the second most significant bit of the result. For the example above with RCL, when we entered the fourth state:

CF = 0 and the number itself is 0 1 1 1 0 0 1 1.

OF = 0 XOR 0 = 0

For the RCR leading us back to the third state (0 0 1 1 1 0 0 1), never mind CF, since it’s not included in the calculation. The two most significant bits after the rotation are 0 and 0 (the first two digits), so OF = 0 XOR 0 = 0 again.
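Here is a C sketch of the 1-bit forms, including the OF rules above. It handles rotates by 1 only, since OF is undefined for larger counts, and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint8_t value;
    int cf;
    int of;
} rc_result;

/* rcl r/m8, 1: a 9-bit rotation through CF. */
static rc_result rcl8_1(uint8_t x, int cf) {
    rc_result r;
    r.value = (uint8_t)((x << 1) | (cf & 1));         /* old CF enters at bit 0 */
    r.cf = x >> 7;                                    /* old MSB leaves into CF */
    r.of = r.cf ^ (r.value >> 7);                     /* CF XOR new MSB */
    return r;
}

/* rcr r/m8, 1: the mirror image. */
static rc_result rcr8_1(uint8_t x, int cf) {
    rc_result r;
    r.value = (uint8_t)((x >> 1) | ((cf & 1) << 7));  /* old CF enters at bit 7 */
    r.cf = x & 1;                                     /* old LSB leaves into CF */
    r.of = (r.value >> 7) ^ ((r.value >> 6) & 1);     /* top two bits XORed */
    return r;
}
```

`rcl8_1(0x39, 1)` gives 0x73 with CF = 0 and OF = 0, which is the fourth state above. `rcr8_1(0x73, 0)` takes us back to 0x39 with CF = 1.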

SHL and SHR

x86

Affected flags: CF (SF, ZF and PF are also set from the result; OF is defined only for 1-bit shifts)

Shifts the bits to the left or to the right by the number given in the second operand. The last bit dropped off is written to CF “before death ☠️”. Example:

1 0 1 0 1 1 0 1

Let’s SHR the above number: 0 1 0 1 0 1 1 0 .

Let’s now SHR once again: 0 0 1 0 1 0 1 1 .

The main rule here: for each SHR add a 0 at the beginning and remove one digit from the end. The same is for SHL: for each SHL add 0 to the end and remove one digit from the beginning.

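The number above is 8 bits long, so we can pop all 8 bits by shifting in one direction only (with SHR, for example). Each time a digit is popped off, its value is written to CF. In the example above, the first SHR dropped a 1 (CF = 1) and the second dropped a 0 (CF = 0). Note that for a single shift by a count greater than or equal to the operand size, CF is undefined.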
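Here is a C sketch of SHR on an 8-bit value that reproduces the example. The name `shr8` is made up. The count is limited to 1 to 7, because a single shift by the full operand size or more leaves CF undefined.

```c
#include <assert.h>
#include <stdint.h>

/* shr r/m8, n for 1 <= n <= 7: CF gets the last bit shifted out. */
static uint8_t shr8(uint8_t x, unsigned n, int *cf) {
    assert(n >= 1 && n <= 7);
    *cf = (x >> (n - 1)) & 1;   /* the last bit to fall off */
    return (uint8_t)(x >> n);   /* zeros come in from the left */
}
```

`shr8(0xAD, 1, &cf)` (1 0 1 0 1 1 0 1) returns 0x56 with CF = 1. Shifting that once more returns 0x2B with CF = 0. Eight single shifts leave 0, with CF holding the original top bit.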

Useful tip

SHL can be used as an optimized multiplication by 2^n, and SHR as an optimized unsigned division by 2^n (rounding down). Here is an example:

instruction	equivalent
`shl eax, 1`	eax * 2
`shl eax, 2`	eax * 4
`shl eax, 3`	eax * 8
`shr eax, 1`	eax / 2 (unsigned)
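Here is the same idea in C, masking the count to 5 bits as the CPU does for 32-bit operands (a sketch; `shl32`/`shr32` are illustrative names). It also shows the side effects the table hides. Bits shifted out of the top are simply lost, SHR rounds down, and a count of 32 is masked to 0, so shl eax, 32 leaves EAX unchanged.

```c
#include <assert.h>
#include <stdint.h>

/* shl eax, n == eax * 2^n (mod 2^32); shr eax, n == eax / 2^n (unsigned).
   The count is masked to 5 bits, as for 32-bit operands on x86. */
static uint32_t shl32(uint32_t eax, unsigned n) { return eax << (n & 31); }
static uint32_t shr32(uint32_t eax, unsigned n) { return eax >> (n & 31); }
```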

To read more about this and how this really works, read here.

ROL/ROR vs SHL/SHR

What’s the difference between the two (well, even four)? When I inspected my old notes, I got a little confused because I’d totally forgotten it. That’s why I’ve included this section for the future, should my memory fail me once again.

The difference between the two is pretty much the same as the difference between “rolling” and “shifting”. Say we have a combination padlock for a suitcase and set our passcode to 1234.


Then we scramble it and get 5432. How do we open it? We rotate each dial until we get to our passcode digits. The dial showing 5 is turned 4 steps down to get 1, the dial showing 4 is turned 2 steps down to get 2, and so on. No one would expect a digit on a dial to disappear after it passes the end. But that would be the case if the mechanism inside the padlock were shifting instead of rotating. And that’s exactly what happens to the shifted-out bits when shifting.

So, with ROR/ROL no bits are lost: all of the bits of the original number are preserved. They just roll around like the digits on the lock 🔒. But with SHL/SHR the bits are dumped into a 🐛 wormhole and never seen again. If we shift long enough, we turn any number into a bunch of zeroes. The only footprint left is the CF flag, which holds the last bit that was shifted out. But even this would be overwritten with 0 should you shift one last time…

Comparisons

TEST

x86

Affected flags: ZF (TEST also sets SF and PF, and clears CF and OF)

A beautiful instruction in that it’s so simple and lightweight. It does the same as AND, but the operands are not changed: the result is thrown away, and only the flags are set. ZF = 1 if the result of the AND is zero, and ZF = 0 otherwise.

AND	TEST
`and eax, eax`	`test eax, eax`

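Interesting! 😮 The above instructions set the flags in the same way, but the second one doesn’t write its result back to EAX, which is why compilers prefer it. It’s usually used to check whether a value is 0 (test eax, eax followed by jz).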
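Here is a C sketch of what TEST computes. Only ZF is modelled, and the name is made up. The usual idioms are test eax, eax (is EAX zero?) and test eax, 1 (is EAX even?).

```c
#include <assert.h>
#include <stdint.h>

/* test a, b: compute a & b, set ZF from it, throw the result away. */
static int test_zf(uint32_t a, uint32_t b) {
    return (a & b) == 0;
}
```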

CMP

x86

Affected flags: CF, ZF (CMP also sets OF, SF, AF and PF, just like SUB)

This one is like SUB: cmp eax, edx computes eax - edx and sets the flags exactly as sub eax, edx would. Just like the previous instruction, however, it doesn’t change the operands, because the result is discarded:

	ZF	CF
dst = src	`1`	`0`
dst < src	`0`	`1`
dst > src	`0`	`0`

When dst = src, dst - src = 0, therefore ZF (zero flag) is set to 1. When dst < src (as unsigned numbers), dst - src would be negative, therefore CF (carry flag) = 1. When dst > src, dst - src is a positive number, so both ZF and CF are cleared to 0.

❓ When some of the flags were changed during a previously performed operation, are they reset to the states above according to the values in dst and src? For example, if ZF = 1 before our CMP dst, src where dst < src, will ZF be 0 after this instruction is executed? Yes: CMP always writes CF, ZF (and OF, SF, AF, PF) from its own result, so whatever these flags held before is overwritten.

Buffers

REP

x86

Affected by flags: ZF (REPE and REPNE check it after each iteration)

This class of instructions implements different kinds of loops. It uses RSI or ESI as the source (ESI means “source index”) and RDI or EDI as the destination (EDI means “destination index”). ECX (counter) is used as… surprise-surprise… a counter. There are several types of REP prefix:

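instruction	description
`rep`	repeat while ECX is not 0
`repe` or `repz`	repeat while ECX is not 0 and ZF = 1 (the compared bytes are equal)
`repne` or `repnz`	repeat while ECX is not 0 and ZF = 0 (the compared bytes are different)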

The REP family is never seen alone. It’s always followed by some string instruction. Why? Because it’s basically a repetition, and you can’t repeat nothing. These repetitions are performed on buffers (strings, for example). These are the 4 combinations you’ll see most often:

instruction	description	C/C++ analog
`repe cmpsb`	Compare two buffers	`memcmp`
`rep stosb`	Set all bytes to the value in `AL`	`memset`
`rep movsb`	Copy a buffer	`memcpy`
`repne scasb`	Search for a byte	`memchr`
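The prefixes boil down to two stop conditions: ECX reaching 0 and, for REPE/REPNE, the ZF left by each comparison. Here is a C sketch of the two comparing forms, assuming the direction flag is clear so the pointers move forward. The function names are made up, and both return the final ECX.

```c
#include <assert.h>
#include <stdint.h>

/* repe cmpsb: compare [ESI] with [EDI], advance both, ECX--;
   stop when ECX hits 0 or a pair differs (ZF = 0). */
static uint32_t repe_cmpsb(const uint8_t *esi, const uint8_t *edi,
                           uint32_t ecx, int *zf) {
    while (ecx != 0) {
        *zf = (*esi++ == *edi++);  /* cmpsb sets ZF from [ESI] - [EDI] */
        ecx--;
        if (!*zf) break;           /* REPE/REPZ continues only while ZF = 1 */
    }
    return ecx;
}

/* repne scasb: compare AL with [EDI], advance EDI, ECX--;
   stop when ECX hits 0 or a match is found (ZF = 1). */
static uint32_t repne_scasb(uint8_t al, const uint8_t *edi,
                            uint32_t ecx, int *zf) {
    while (ecx != 0) {
        *zf = (al == *edi++);      /* scasb sets ZF from AL - [EDI] */
        ecx--;
        if (*zf) break;            /* REPNE/REPNZ continues only while ZF = 0 */
    }
    return ecx;
}
```

Note that repne scasb can end with ECX = 0 in two cases: when the byte is found in the very last position, and when it isn’t found at all. That’s why code that follows it usually checks ZF rather than ECX alone.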

repe cmpsb. To illustrate it better, I’ve written the code below. Like the CPU with the direction flag cleared, it walks both buffers forward:

#include <stdbool.h>

bool compare(void) {
  const char edi[] = {'d', 's', 't', '1'};
  const char esi[] = {'d', 's', 'c', '1'};

  // ECX starts at the buffer length and counts down
  for (unsigned ecx = sizeof edi, i = 0; ecx != 0; ecx--, i++) {
    if (edi[i] != esi[i]) return false;  // ZF = 0 stops repe
  }
  return true;
}

ECX is set to the buffers’ length, ESI is a pointer to the first buffer and EDI a pointer to the second. The loop runs until ECX = 0 or the compared bytes differ. The above loop will run 3 times and return false on the third comparison (index 2), since edi[2] = 't' and esi[2] = 'c'. This means that the buffers are different, and there is no need to run the loop further.

rep stosb. Destination buffer: EDI, source: AL. ECX is the counter.

void init(char edi[3]) {
  const char al = 'a';

  // rep stosb: store AL at [EDI], advance EDI, decrement ECX
  for (unsigned ecx = 3, i = 0; ecx != 0; ecx--, i++) {
    edi[i] = al;
  }
}

The above loop will run 3 times. After the call, edi = {'a', 'a', 'a'}. rep stosb is very often seen right after xor eax, eax. XORing a register with itself always gives 0, so this is the usual way to zero a register, and the buffer then gets filled with zeros (the memset(buf, 0, n) idiom). Just to remind you, AL is the lowest byte of the EAX register, so zeroing EAX zeroes AL too.

rep movsb. ESI: source buffer, EDI: destination buffer, ECX: counter.

void copy(void) {
  const char esi[] = {'s', 'r', 'c'};
  char edi[] = {'d', 's', 't'};

  // rep movsb: copy [ESI] to [EDI], advance both, decrement ECX
  for (unsigned ecx = sizeof edi, i = 0; ecx != 0; ecx--, i++) {
    edi[i] = esi[i];
  }
}

The above loop will run 3 times. At the end, edi = esi = {'s', 'r', 'c'}.

repne scasb. EDI: buffer address, AL: the byte to search for, ECX: counter.

#include <stdbool.h>

bool search(void) {
  const char edi[] = {'d', 's', 't'};
  const char al = 't';

  // repne scasb: compare AL with [EDI], advance EDI, decrement ECX
  for (unsigned ecx = sizeof edi, i = 0; ecx != 0; ecx--, i++) {
    if (edi[i] == al) return true;  // ZF = 1 stops repne (here: 3rd iteration, i = 2)
  }
  return false;
}

The above loop will run 3 times, and on the 3rd iteration it’ll return true, because edi[2] = 't'.

Jumps

JMP and friends

x86

Affected by flags: ZF, CF, SF, OF (each conditional jump checks some of them)

In general, these instructions have this skeleton: jcc location, where jcc is one of the mnemonics below.

instruction	description	note
`jmp`	unconditional jump, meaning “Jump, Forrest, jump!” no matter what
`jz`	Jump if `ZF` = 1 (the result of the previous instruction was `0`)
`jnz`	opposite to `jz`. Jump if `ZF` = 0 (the result of the previous instruction was not `0`)
`je`	Jump if the result of the preceding `cmp op1, op2` was 0 (the operands were equal)	same instruction as `jz`
`jne`	opposite to `je`. Jump if the result of the preceding `cmp op1, op2` was not 0 (the operands were not equal)	same instruction as `jnz`
`jg`, `ja`	Jump if the preceding `cmp op1, op2` found op1 > op2 (is greater). `ja` is for unsigned comparison.	`jg`: ZF = 0 and SF = OF; `ja`: CF = 0 and ZF = 0
`jge`, `jae`	like `jg` or `ja` combined with `je`. Jump if op1 >= op2 (is greater or equal). `jae` is for unsigned comparison.	`jge`: SF = OF; `jae`: CF = 0
`jl`, `jb`	opposite to `jge`, `jae`. Jump if op1 < op2 (is less). `jb` is for unsigned comparison.	`jl`: SF != OF; `jb`: CF = 1
`jle`, `jbe`	like `jl` or `jb` combined with `je`. Jump if op1 <= op2 (is less or equal). `jbe` is for unsigned comparison.	`jle`: ZF = 1 or SF != OF; `jbe`: CF = 1 or ZF = 1
`jo`	jump if the result of the previous instruction set `OF` to `1`
`js`	jump if the result of the previous instruction set `SF` to `1`
`jecxz`	jump if `ecx` = 0
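Here is a C sketch of how a few of these jumps decide, based on the flags left by cmp op1, op2. The flag formulas are the standard ones for 32-bit subtraction, and the struct and function names are made up. It shows why the signed and unsigned variants can disagree: 0xFFFFFFFF is 4294967295 unsigned, but -1 signed.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int cf, zf, sf, of; } flags;

/* Flags produced by cmp dst, src (i.e. by dst - src) on 32-bit operands. */
static flags cmp32(uint32_t dst, uint32_t src) {
    uint32_t res = dst - src;
    flags f;
    f.cf = dst < src;                              /* unsigned borrow */
    f.zf = res == 0;
    f.sf = (int)(res >> 31);                       /* sign bit of the result */
    f.of = (int)(((dst ^ src) & (dst ^ res)) >> 31); /* signed overflow */
    return f;
}

/* Conditions checked by some of the jumps in the table. */
static int ja(flags f) { return !f.cf && !f.zf; }          /* unsigned > */
static int jb(flags f) { return f.cf; }                    /* unsigned < */
static int jg(flags f) { return !f.zf && f.sf == f.of; }   /* signed > */
static int jl(flags f) { return f.sf != f.of; }            /* signed < */
```

After cmp with 0xFFFFFFFF and 1, `ja` is taken but `jg` is not, and `jl` is taken.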

Getting familiar with jumps. Below is the table of examples. Try to quickly determine the location of the jump. Answers are listed right below the table, so spoiler alert! ❗

Logical

AND

x86

Interesting! 😮 Can be used to clear some bits with a mask. For example, if you have 1100 1011 and you need to zero all the bits out, all you need to do is AND it with 0000 0000. To determine whether an integer is even or odd, mask it with 0000 0001. Even numbers have 0 as the last bit and odd numbers have 1, so ANDing an even number with 0000 0001 results in 0 and ANDing an odd one results in 1. You can also make a number smaller by 2, 4, 8 etc. by clearing the corresponding bit with a mask, provided that bit was set (otherwise the number doesn’t change). For a 4-bit number:

mask	effect
`1101`	subtract 2
`1011`	subtract 4
`0111`	subtract 8
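Here are the masking tricks above as a C sketch (the function names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* and x, 1 keeps only the lowest bit: 0 for even, 1 for odd. */
static int is_odd(uint32_t x) { return (x & 1u) != 0; }

/* and x, ~(1 << k) clears bit k; that subtracts 2^k only if the bit was set. */
static uint32_t clear_bit(uint32_t x, unsigned k) {
    assert(k < 32);
    return x & ~(1u << k);
}
```

`clear_bit(0xF, 1)` gives 0xD (15 - 2). `clear_bit(0x9, 1)` leaves 0x9 unchanged, because bit 1 of 1001 was already 0.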

OR

Each bit of the result is 1 if either of the corresponding bits of the two operands is 1. The result is written to the destination operand.

Interesting! 😮 Can be used to set all bits to 1 with 1111. For example, 1110 OR 1111 = 1111. Basically, any value ORed with 1111 is 1111.

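NOT

Flips every bit of the operand (one’s complement): 0 becomes 1 and 1 becomes 0. It doesn’t affect any flags.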

XOR

Exclusive OR: a bit of the result is 1 if the corresponding bits of the two operands differ.

Interesting! 😮 A quick way to set EAX to 0. xor eax, eax is encoded as 33 C0 (2 bytes), while mov eax, 0 is encoded as B8 00 00 00 00, which is 5 bytes (costly 💴).

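Also, an interesting observation: if I XOR any 4-bit value with 1111, I get the same result as subtracting it from 1111 (15, unsigned):

1010 xor 1111 is 0101 (15 - 10 = 5 in decimal)

1011 xor 1111 is 0100 (15 - 11 = 4d)

1100 xor 1111 is 0011 (15 - 12 = 3d)

The reason: XOR with 1111 flips every bit (just like NOT), and a 4-bit number plus its flipped version is always 1111, since no bit position ever produces a carry. So x xor 1111 = 1111 - x.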
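Here is a C sketch that checks both XOR observations. The names are made up, and the 4-bit restriction matches the examples.

```c
#include <assert.h>
#include <stdint.h>

/* XOR with all ones flips every bit (like NOT). For a 4-bit x,
   x + (x ^ 0xF) == 0xF with no carries, so x ^ 0xF == 15 - x. */
static uint8_t xor_all_ones4(uint8_t x) {
    assert(x <= 0xF);
    return (uint8_t)(x ^ 0xF);
}

/* xor eax, eax: every bit is XORed with itself, so the result is 0. */
static uint32_t xor_self(uint32_t eax) { return eax ^ eax; }
```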

Unsorted

PUSH/POP

PUSHA/PUSHAD/POPA/POPAD

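Push all general-purpose registers onto the stack (PUSHA for the 16-bit registers, PUSHAD for the 32-bit ones) and pop them back (POPA/POPAD), e.g. to save the registers’ state and restore it later. These instructions don’t exist in 64-bit mode.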

Interesting! 😮 Often seen in shellcodes and custom packers. Compilers rarely use these instructions.

NOP

Does nothing (the one-byte form is 0x90). Used for padding, e.g. to align code, and for short delays in program execution.

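Interesting! 😮 Often seen in shellcodes and buffer-overflow exploits. There, a long run of NOPs (a “NOP sled”) lets execution slide into the payload even when the exact jump address isn’t known.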