# Operations

Created: 27.09.2020

In this article I describe the assembly operations that I’ve encountered myself and wasn’t too lazy to write an explanation for. I won’t pay much attention to operations that I consider straightforward, like `ADD`. Each operation gets a flag indicating its architecture: `arm` or `x86` (I’m just learning ARM myself for iOS analysis).

Most instructions have the following anatomy: `instruction <destination operand>, <source operand>`. Some look like `instruction <source operand>`, because the `<destination operand>` is always the same implicit register. An example is `MUL`: it always multiplies `eax` by some value.

## Moving Data

### MOV

`x86`

The god of assembly operations. Its main purpose is pretty obvious: to move stuff from one place to another (like the Tower of Hanoi). Let’s suppose that initially we have EAX = 0x89, EBX = 0x11 and ESI = 0x4037C0.

| Instruction | Description |
| --- | --- |
| `mov eax, ebx` | Copy data from EBX into EAX. Now EAX = EBX = 0x11. |
| `mov ebx, 0x4037C4` | Put the value 0x4037C4 into EBX. Now EAX = 0x11 and EBX = 0x4037C4. |
| `mov eax, [ebx]` | Copy the data at the address stored in EBX into EAX. Since EBX = 0x4037C4, the CPU goes to address 0x4037C4, reads the data there (say, 0x77) and puts it into EAX. Now EBX = 0x4037C4 and EAX = 0x77. |
| `mov eax, [0x4037C4]` | Since EBX = 0x4037C4, this operation is equivalent to the previous one. |
| `mov ebx, [esi+eax*4]` | First, the CPU calculates the address: ESI + EAX*4 = 0x4037C0 + 0x77*4 = 0x4037C0 + 0x1DC = 0x40399C. Then it goes to 0x40399C and copies the value at this address into EBX. Say we have 0x33 there, so EBX is now 0x33. |

So, to conclude, you can `MOV` an immediate value, a value at an address, a value in another register, an address itself (which is technically also a value) or a value at an address computed from an expression (the last example). Whenever there is an address “in assembly’s mind”, you’ll see square brackets `[]`. Whenever it is a plain value, there are no brackets. In higher levels of abstraction this is usually called a reference type (`[address]`), where a reference is copied, versus a value type (value), where the value itself is copied. With reference types, a change made through one reference is visible through every other one. For example, consider `mov eax, 0x4037C4` and `mov ebx, 0x4037C4`. If we then `mov dword ptr [eax], 0x42`, reading `[ebx]` also gives `0x42`, since both registers point to the same memory address (`ebx` itself still holds `0x4037C4`).
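The walk-through above can be replayed with a minimal Python sketch. It is a toy model, not an emulator: registers and memory are plain dicts, the starting register values are the ones given at the top of this section, and the memory contents (`0x77`, `0x33`) are the "say, ..." assumptions from the examples.

```python
# Toy model, not an emulator: registers and memory are plain dicts.
regs = {"eax": 0x89, "ebx": 0x11, "esi": 0x4037C0}
mem = {0x4037C4: 0x77, 0x40399C: 0x33}    # assumed memory contents

regs["eax"] = regs["ebx"]                  # mov eax, ebx
regs["ebx"] = 0x4037C4                     # mov ebx, 0x4037C4
regs["eax"] = mem[regs["ebx"]]             # mov eax, [ebx]
addr = regs["esi"] + regs["eax"] * 4       # effective address of [esi+eax*4]
regs["ebx"] = mem[addr]                    # mov ebx, [esi+eax*4]
loaded = regs["ebx"]
print(hex(addr), hex(regs["eax"]), hex(loaded))

# Two registers holding the same address see the same memory cell.
regs["eax"] = regs["ebx"] = 0x4037C4       # mov eax, 0x4037C4 / mov ebx, 0x4037C4
mem[regs["eax"]] = 0x42                    # mov dword ptr [eax], 0x42
print(hex(regs["ebx"]), hex(mem[regs["ebx"]]))
```

The last line shows the point about brackets: `ebx` still holds the address, while `[ebx]` now reads the new value.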

### LEA

`x86`

For smarties, it’s called “load effective address”. Usually used for arrays and complex address calculations. Let’s assume that initially we have EAX = 0x89, EBX = 0x11 and ESI = 0x4037C0.

| Instruction | Description |
| --- | --- |
| `lea ebx, [eax*5 + 5]` | `eax*5 + 5` is equivalent to `5*(eax+1)` (ordinary mathematical manipulation): `(0x89 + 1) * 5 = 0x8A * 5 = 0x2B2`. Without `lea` it would require 4 operations instead: `inc eax` (0x89 + 1); `mov ecx, 5`; `mul ecx` (0x8A * 5); `mov ebx, eax`. For the `mul` operation see the Arithmetic section below. |

### MOV vs LEA

Let’s compare these two:

| MOV | LEA |
| --- | --- |
| `mov eax,[ebx+8]` | `lea eax,[ebx+8]` |

The first instruction (the one with `mov`) does the following: “Add 8 to the value in `ebx`, go to this address and store the value found there in `eax`”. So, it calculates the address and gets the value at this address to store in `eax`.

The second instruction (the one with `lea`) does the following: “Add 8 to the value in `ebx` and store the result in `eax`”. So, it calculates the address and puts the address itself into `eax`, without touching memory.

To conclude, `lea` stores addresses (reference types) and `mov` usually stores values (value types). However, note that `mov` can move addresses as well, since an address is also just an integer, i.e. a value.
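The difference can be sketched in a few lines of Python. This is only an illustration: the `memory` dict, its contents and the helper names `lea` and `mov_load` are made up for the example.

```python
MASK32 = 0xFFFFFFFF
memory = {0x1008: 0xDEADBEEF}   # assumed: one dword stored at address 0x1008

def lea(address: int) -> int:
    """lea: return the computed address itself, no memory access."""
    return address & MASK32

def mov_load(address: int) -> int:
    """mov reg, [address]: compute the address, then read memory there."""
    return memory[address & MASK32]

ebx = 0x1000
print(hex(lea(ebx + 8)))        # lea eax, [ebx+8]
print(hex(mov_load(ebx + 8)))   # mov eax, [ebx+8]
print(hex(lea(0x89 * 5 + 5)))   # lea ebx, [eax*5 + 5] with eax = 0x89
```

The last call shows `lea` used as a plain calculator: nothing at the resulting "address" is ever read.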

### MOVSXD

`x86`

Example: `movsxd rsi, [rbp+8h]`

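Copies the contents of the `<src_operand>` to the `<dst_operand>` and sign-extends the value. With a 64-bit destination, as in the example above, a 32-bit source is sign-extended to 64 bits: bit 31 of the source is copied into bits 32 to 63 of `rsi`. The size of the converted value depends on the operand-size attribute. In 64-bit mode the instruction’s default operation size is 32 bits, and the `REX.W` prefix promotes it to 64 bits.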
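Sign extension from 32 to 64 bits can be sketched in Python as follows. The function name `movsxd` is just a label for this model; it works on plain integers and returns the 64-bit register contents as an unsigned number.

```python
def movsxd(value32: int) -> int:
    """Sign-extend a 32-bit value to 64 bits (returned as an unsigned 64-bit int)."""
    value32 &= 0xFFFFFFFF
    if value32 & 0x80000000:               # bit 31 set: the number is negative
        value32 |= 0xFFFFFFFF00000000      # copy the sign into bits 32..63
    return value32

print(hex(movsxd(0x0000002A)))   # positive: upper half stays zero
print(hex(movsxd(0xFFFFFFFE)))   # -2 as a dword becomes -2 as a qword
```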

## Arithmetic

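### ADD

`x86`

`add eax, 5` adds the source to the destination and stores the sum in the destination. The destination can be a register or a memory location; the source can be a register, a memory location or an immediate value.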

### SUB

`x86`

Affected flags: `CF`, `ZF`

`CF` = `1` if the `<destination operand>` is less than the `<source operand>` when both are treated as unsigned, i.e. the subtraction needs a borrow and the result wraps around. `ZF` = `1` if `<destination operand>` = `<source operand>` and the result is zero.

Let’s assume that `EAX` = `0x99` and `EBX` = `0x2`.

| Instruction | Description |
| --- | --- |
| `sub eax, 0x99` | Now `eax` is `0`, therefore `ZF` = `1`. |
| `sub ebx, 0x10` | `0x2 - 0x10` is negative, so the result wraps around to `0xFFFFFFF2` and `CF` = `1`. |
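A minimal Python model of a 32-bit `sub` makes the two flags easy to check. The helper name `sub32` is hypothetical, and only `CF` and `ZF` are modelled.

```python
def sub32(dst: int, src: int):
    """Return (result, CF, ZF) for a 32-bit `sub dst, src`."""
    dst &= 0xFFFFFFFF
    src &= 0xFFFFFFFF
    result = (dst - src) & 0xFFFFFFFF      # wrap around like a 32-bit register
    cf = 1 if dst < src else 0             # unsigned borrow
    zf = 1 if result == 0 else 0
    return result, cf, zf

print(sub32(0x99, 0x99))                   # sub eax, 0x99 with eax = 0x99
print(sub32(0x2, 0x10))                    # sub ebx, 0x10 with ebx = 0x2
```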

### SBB

`x86`

Affected by flags: `CF`

Almost the same as `SUB`, but a little tricky: it takes the `CF` flag into account. If `CF` = 0, then `sbb eax, 0x10` is equivalent to `sub eax, 0x10` (which is `eax - 0x10`). If `CF` = 1, then it means `eax = eax - 0x10 - 1`.
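A typical reason for the extra `- 1` is subtracting numbers that are wider than one register: `sub` handles the low halves and `sbb` carries the borrow into the high halves. Below is a small Python sketch of that idea; the helper names and the register pairing in the comments are assumptions for the illustration.

```python
MASK32 = 0xFFFFFFFF

def sub32(dst: int, src: int, borrow_in: int = 0):
    """32-bit sub (borrow_in=0) or sbb (borrow_in=CF). Returns (result, CF)."""
    full = (dst & MASK32) - (src & MASK32) - borrow_in
    return full & MASK32, 1 if full < 0 else 0

def sub64(a: int, b: int) -> int:
    """Subtract two 64-bit numbers using 32-bit halves."""
    lo, cf = sub32(a & MASK32, b & MASK32)           # sub eax, ecx
    hi, _ = sub32(a >> 32, b >> 32, borrow_in=cf)    # sbb edx, ebx
    return (hi << 32) | lo

print(sub32(0x20, 0x10, borrow_in=1))   # sbb eax, 0x10 with CF = 1
print(hex(sub64(0x500000010, 0x100000020)))
```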

### MUL and DIV

`x86`

Affected flags:

`CF` = `OF` = `0` if the high-order bits of the product are 0; otherwise both are set to `1`. `DIV` leaves the flags undefined.

Both of these instructions operate on a predefined register. For example, `mul ecx` is effectively `mul eax, ecx`, i.e. `eax` * `ecx`. The result is stored in `AX`, `DX:AX` or `EDX:EAX` (depending on the operand size), with the high-order bits of the product in `AH`, `DX` or `EDX` respectively. `DIV` goes the other way: `div ecx` divides `EDX:EAX` by `ecx`, leaving the quotient in `EAX` and the remainder in `EDX`.
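Here is a rough Python model of the 32-bit forms, to show where the halves of the result end up. The names `mul32` and `div32` are hypothetical; the model ignores the fault a real CPU raises when the quotient does not fit into `EAX` or the divisor is zero.

```python
def mul32(eax: int, src: int):
    """Model of `mul src` (32-bit): returns (edx, eax, CF/OF)."""
    product = (eax & 0xFFFFFFFF) * (src & 0xFFFFFFFF)
    lo = product & 0xFFFFFFFF          # goes to EAX
    hi = product >> 32                 # goes to EDX
    return hi, lo, 1 if hi != 0 else 0

def div32(edx: int, eax: int, src: int):
    """Model of `div src` (32-bit): EDX:EAX / src -> (quotient, remainder)."""
    dividend = (edx << 32) | eax
    return dividend // src, dividend % src

print(mul32(0x10, 0x10))               # small product: EDX = 0, CF = OF = 0
print(mul32(0x80000000, 4))            # product spills into EDX, CF = OF = 1
print(div32(0, 100, 7))
```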

### IMUL and IDIV

`x86`

Affected flags: `CF`, `OF`

Same as `MUL` and `DIV`, but they operate on signed values. `CF` = `OF` = `1` if significant bits are carried into the upper half of the result. The `cdq` instruction is usually used before `IDIV`. It converts a doubleword to a quadword, quote:

> The CDQ instruction copies the sign (bit 31) of the value in the EAX register into every bit position in the EDX register.

Forms:

1. Like `MUL` and `DIV`, when only the `<src operand>` is given.
2. `imul edi, esi`, when both `<dst operand>` and `<src operand>` are given: `edi` = `edi` * `esi`.
3. `imul edi, esi, 5`, when besides the `<dst operand>` there are two `<src operand>`s, the second of which must be an immediate value: `edi` = `esi` * `5`.

In the two- and three-operand forms the `<dst operand>` is always a register. The first `<src operand>` can be a register or a memory location; in the two-operand form it can also be an immediate value. When an immediate value is used, it is sign-extended to the length of the destination operand format.

NB ❗ The length of the product is calculated to twice the length of the operands. With the one-operand form, the product is stored exactly in the destination. With the two- and three-operand forms, however, the result is truncated to the length of the destination before it is stored in the destination register. This is why the `CF` or `OF` flag should be tested to ensure that no significant bits are lost.

The two- and three-operand forms can also be used for unsigned operands, since the lower half of the product is the same regardless of whether the operands are signed or unsigned. In that case, however, the `CF` and `OF` flags cannot be used to determine whether the upper half of the result is non-zero.
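The truncation and the flag behaviour can be checked with a small Python sketch. It models only the two-operand 32-bit form; `imul32`, `to_signed32` and `cdq` are names chosen for this example.

```python
def to_signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a signed number."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x

def imul32(dst: int, src: int):
    """Two-operand imul: returns (truncated 32-bit result, CF/OF)."""
    full = to_signed32(dst) * to_signed32(src)
    truncated = full & 0xFFFFFFFF
    overflow = 1 if to_signed32(truncated) != full else 0
    return truncated, overflow

def cdq(eax: int) -> int:
    """Value of EDX after cdq: every bit is a copy of bit 31 of EAX."""
    return 0xFFFFFFFF if eax & 0x80000000 else 0

print(imul32(7, 6))                  # fits: no flags
print(imul32(0x40000000, 4))         # significant bits lost: CF = OF = 1
print(imul32(0xFFFFFFFF, 2))         # -1 * 2 = -2 fits as signed: CF = OF = 0
```

The last call is the unsigned caveat in action: read as unsigned, `0xFFFFFFFF * 2` does overflow 32 bits, the low half `0xFFFFFFFE` is still correct, but the flags say nothing about it.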

## Shifts

### ROR and ROL

`x86`

Affected flags: `CF` , `OF`

Rotate the bits of an integer n times to the left or to the right. When you see such an instruction, it is very often an indication of encryption. To better understand both, I need an example. Let’s take an 8-bit binary value. The initial state is:

`0 1 0 0 1 1 1 0 `

See the `1` at the beginning of this number, second bit from the left (let’s call him Matt). We will then locate him after `ROR` and `ROL`.

Let’s now `ROR` (rotate bits right) by 1. Every bit is moved to the right by one position. Our first state (for future reference):

`0 0 1 0 0 1 1 1`

Where the hell is Matt now? Now, this bit is the third bit from the left.

Let’s now go back to the initial value and `ROL` it (rotate bits left) by 1. Every bit is moved to the left by one position. Our second state:

`1 0 0 1 1 1 0 0`

Where the hell is Matt now? This bit is the first bit from the left, so, he’s become the most significant bit in this number 👑 .

Let’s now `ROL` the last number by 1 again. Every bit is again moved to the left. But Matt has nowhere to move! He’s falling off the edge… Where the hell is Matt now? He seemed to have drowned, but he made it through the swamp and emerged on the other side. Matt used to be the most significant bit 👑 in the second state, but now he’s just a 💩, the least significant bit. As you can see, he’s the first from the end. This is the third state:

`0 0 1 1 1 0 0 1`

Let’s make him worthy again and give him back his newly acquired and recently lost regalia. Let’s `ROR` him by 1 again and get back to the second state (unfortunately he’ll have to dive into the swamp again):

`1 0 0 1 1 1 0 0`

When moving from the second to the third state Matt has been in a swamp, or in a wormhole 🐛 if you prefer a space metaphor. Let me introduce our wormhole: the `CF` flag. The spirit of Matt was printed on this flag. In other, less eloquent words, when falling from the edge into the swamp, his value (`1`) was copied into `CF`. So is any other bit that “falls”. For example, let’s get back to the third (and the most unfortunate for Matt) state (`0 0 1 1 1 0 0 1`). Matt’s spirit is still there, therefore `CF` is still `1`. Let’s `ROL` this number by 1 once again: Martha (who’s now the most significant, i.e. the first bit of the number) falls into the swamp, gets copied into `CF` and emerges at the end as the least significant bit 💩, making Matt the second least significant bit, i.e. the second bit from the end (which is not that bad now). Now we have the fourth state:

`0 1 1 1 0 0 1 0`

and `CF` = `0`, now bearing Martha’s spirit.

The processor restricts the count to a number between 0 and 31 by masking all the bits in the count operand except the 5 least-significant bits.
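Matt’s journey can be replayed with a short Python sketch. The helpers `rol8` and `ror8` are a simplified model of 8-bit rotates for counts from 1 to 7 (with a count of 0 a real CPU leaves `CF` untouched, which this model does not reproduce).

```python
def rol8(value: int, count: int = 1):
    """Rotate an 8-bit value left. Returns (result, CF)."""
    count %= 8
    result = ((value << count) | (value >> (8 - count))) & 0xFF
    return result, result & 1        # CF = the bit that wrapped around into the LSB

def ror8(value: int, count: int = 1):
    """Rotate an 8-bit value right. Returns (result, CF)."""
    count %= 8
    result = ((value >> count) | (value << (8 - count))) & 0xFF
    return result, result >> 7       # CF = the bit that wrapped around into the MSB

initial = 0b01001110
first, _ = ror8(initial)             # 0 0 1 0 0 1 1 1
second, _ = rol8(initial)            # 1 0 0 1 1 1 0 0
third, cf3 = rol8(second)            # 0 0 1 1 1 0 0 1, Matt lands in CF
fourth, cf4 = rol8(third)            # 0 1 1 1 0 0 1 0, Martha lands in CF
for state in (first, second, third, fourth):
    print(format(state, "08b"))
print(cf3, cf4)
```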

### RCR and RCL

`x86`

Affected flags: `CF` , `OF`

It’s pretty much the same, with just one small difference. `CF` flag is now taken into account, it’s not just a wormhole 🐛 anymore. Let’s consider the third state from the previous examples:

`0 0 1 1 1 0 0 1`

Let `CF` be `1` now (maybe it was set by some preceding operation like `ROL`).

If we now `RCL`, the fourth state will be as follows:

`CF` = Martha = 0.

`0 1 1 1 0 0 1 1`

The value that was in `CF` (`1`) is now at the end of our number, and the number’s former most significant bit is now in `CF`. Everyone else has just shifted to the left by 1 bit. It’s as if we were operating not on an 8-bit value but on a 9-bit value, with `CF` sitting in front of the most significant bit:

| | CF | Main value |
| --- | --- | --- |
| before `RCL` | `1` | `0 0 1 1 1 0 0 1` |
| after `RCL` | `0` | `0 1 1 1 0 0 1 1` |

Let’s now `RCR` back to the third state. `CF` = `0` moves into the most significant bit, giving `0 0 1 1 1 0 0 1`, and since the bit falling off the cliff on the right is a `1`, `CF` = `1` again. Everyone else has just shifted to the right.

Another flag whose behaviour is quite peculiar is `OF`. It is only defined for rotates by 1; for rotates by 2 or more its value is undefined. After the CPU has performed the rotate, it calculates `OF` like this. For left rotates (`RCL` and `ROL`), `OF` = `CF XOR the most significant bit of the result`. For right rotates (`RCR` and `ROR`), `OF` = `the most significant bit XOR the second most significant bit` of the result. For the example above with `RCL`, when we entered the fourth state:

`CF` = `0` and the number itself is `0 1 1 1 0 0 1 1`.

`OF` = `0 XOR 0` = `0`

For the `RCR` operation leading us back to the third state, `0 0 1 1 1 0 0 1`: never mind `CF`, since it’s not included in the calculation. The two most significant bits after the rotation are `0` and `0` (the first two digits), so `OF` = `0 XOR 0` = again `0`.
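The 9-bit view and the `OF` rule for single-bit rotates can be put into a short Python sketch. `rcl8` and `rcr8` are hypothetical helper names, and only rotates by 1 are modelled.

```python
def rcl8(value: int, cf: int):
    """Rotate left through carry by 1. Returns (result, CF, OF)."""
    new_cf = value >> 7                       # old MSB falls into CF
    result = ((value << 1) | cf) & 0xFF       # old CF enters as the LSB
    of = new_cf ^ (result >> 7)               # CF XOR MSB of the result
    return result, new_cf, of

def rcr8(value: int, cf: int):
    """Rotate right through carry by 1. Returns (result, CF, OF)."""
    new_cf = value & 1                        # old LSB falls into CF
    result = (value >> 1) | (cf << 7)         # old CF enters as the MSB
    of = (result >> 7) ^ ((result >> 6) & 1)  # XOR of the two top bits
    return result, new_cf, of

fourth = rcl8(0b00111001, cf=1)
back = rcr8(fourth[0], cf=fourth[1])
print(format(fourth[0], "08b"), fourth[1], fourth[2])
print(format(back[0], "08b"), back[1], back[2])
```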

### SHL and SHR

`x86`

Affected flags: `CF`

Shifts the bits to the left or to the right by the count given in the second operand. The last bit dropped off is written to `CF` “before death” ☠️. Example:

`1 0 1 0 1 1 0 1`

Let’s `SHR` the above number by 1: `0 1 0 1 0 1 1 0`. The dropped bit was `1`, so `CF` = `1`.

Let’s now `SHR` once again: `0 0 1 0 1 0 1 1`. This time the dropped bit was `0`, so `CF` = `0`.

The main rule here: for each `SHR` add a `0` at the beginning and remove one digit from the end. The same goes for `SHL`: for each `SHL` add a `0` to the end and remove one digit from the beginning.

The above number is 8 bits long, so we can pop all 8 bits by shifting in one direction (`SHR` only, for example). When the last digit is popped off, its value is written to `CF`. In the example above the last one to go is the leading `1`, hence `CF` = `1` at the end.
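Here is a minimal Python model of 8-bit shifts that reproduces the steps above. The helper names are made up, and the model assumes a count between 1 and 8.

```python
def shr8(value: int, count: int = 1):
    """Logical shift right of an 8-bit value. Returns (result, CF)."""
    cf = (value >> (count - 1)) & 1      # the last bit shifted out
    return (value >> count) & 0xFF, cf

def shl8(value: int, count: int = 1):
    """Shift left of an 8-bit value. Returns (result, CF)."""
    cf = (value >> (8 - count)) & 1      # the last bit shifted out
    return (value << count) & 0xFF, cf

once = shr8(0b10101101)
twice = shr8(once[0])
print(format(once[0], "08b"), once[1])
print(format(twice[0], "08b"), twice[1])
print(shr8(0b10101101, 8))               # everything popped, CF keeps the last bit
```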

#### Useful tip

`SHL` can be used as an optimized multiplication by `2^n`, and `SHR` as an optimized (unsigned) division by `2^n`. Here is an example:

| SHL | Equivalent |
| --- | --- |
| `shl eax, 1` | eax * 2 |
| `shl eax, 2` | eax * 4 |
| `shl eax, 3` | eax * 8 |

### ROL/ROR vs SHL/SHR

What’s the difference between the two (well, even four)? When I inspected my old notes, I got a little confused because I’d totally forgotten that. That’s why I’ve included this section for the future, should my memory fail me once again.

The difference between the two is pretty much the same as the difference between “rolling” and “shifting”. Say we have a combination padlock for a suitcase and set our passcode to `1234`. Then we scramble it and get `5432`. How do we open it then? We rotate each dial until we get to our passcode digits: the dial with `5` is rotated 4 steps back to get `1`, the dial with `4` is rotated 2 steps back to get `2`, and so on. No one would expect a digit on the dial to disappear after reaching the end. But that would be the case if the operation in the padlock’s intestines was shifting, and that is exactly what happens to bits when shifting. So, with the `ROR/ROL` instructions no bits are lost; all of the bits of the original number are preserved. They are just rolling like those numbers in the lock 🔒. But with the `SHL/SHR` instructions the bits are dumped into a 🐛 wormhole and never seen again. If we shift long enough, we turn any number into a bunch of zeroes, until the only footprint left is the `CF` flag, which holds the last bit that was shifted out. But even this would be overwritten with `0` should you shift one last time…

## Comparisons

### TEST

`x86`

Affected flags: `ZF`

A beautiful instruction in that it’s so simple and lightweight. It does the same as `AND`, but the operands are not changed. The result is in `ZF` (either `0` or `1`).

| AND | TEST |
| --- | --- |
| `and eax, eax` | `test eax, eax` |

Interesting! 😮 The above operations set the flags identically, but the second never writes a result back to a register. It’s usually used to test whether a value is `0`.

### CMP

`x86`

Affected flags: `CF`, `ZF`

This one is like `SUB`. `cmp eax, edx` performs the same subtraction as `sub eax, edx`, for example. However, just like the previous instruction, it doesn’t change its operands and only sets the flags:

| | ZF | CF |
| --- | --- | --- |
| dst = src | `1` | `0` |
| dst < src | `0` | `1` |
| dst > src | `0` | `0` |

When dst = src, dst - src = 0, therefore `ZF` (zero flag) is set to `1`. When dst < src, dst - src is a negative number, therefore `CF` (carry flag) = `1`. In the plain case (dst > src), dst - src is a positive number, hence both flags are cleared to `0`. Note that `CF` reflects an unsigned comparison.

❓ When some of the flags were changed during some previously performed operation, are they reset to the states above according to the values in dst and src? For example, if `ZF` = `1` before our `CMP dst, src` where dst < src, will `ZF` be set to `0` after this instruction is executed? Yes: `CMP` recomputes these flags from its own result every time, so whatever was in them before is overwritten.
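How `CMP` treats flags left over from earlier instructions can be illustrated with a tiny Python model. The `flags` dict and the `cmp32` helper are assumptions made for this sketch; only `ZF` and `CF` are tracked.

```python
MASK32 = 0xFFFFFFFF
flags = {"ZF": 1, "CF": 0}      # stale state left by some earlier instruction

def cmp32(dst: int, src: int) -> None:
    """Model of `cmp dst, src`: both flags are recomputed from scratch."""
    flags["ZF"] = int((dst & MASK32) == (src & MASK32))
    flags["CF"] = int((dst & MASK32) < (src & MASK32))

cmp32(2, 5)                     # dst < src
print(flags)                    # the old ZF = 1 is gone
```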

## Buffers

### REP

`x86`

Affected flags: `ZF`

This class of instructions is comprised of different kinds of loops. It uses RSI or ESI as the source (ESI means “source index”) and RDI or EDI as the destination (EDI means “destination index”). ECX (counter) is used as… surprise, surprise… a counter. There are several types of `REP` instruction:

| Instruction | Description |
| --- | --- |
| `rep` | Repeat until `ECX` = 0 |
| `repe` or `repz` | Repeat while `ECX` ≠ 0 and `ZF` = 1 |
| `repne` or `repnz` | Repeat while `ECX` ≠ 0 and `ZF` = 0 |

The `REP` family is never seen alone. It’s always followed by some operation. Why? Because basically it’s a repetition, and you can’t repeat nothing. These repetitions are performed on buffers (strings, for example). There are 4 common operations seen with `rep`:

| Instruction | Description | C++ analog |
| --- | --- | --- |
| `repe cmpsb` | Compare two buffers | `memcmp` |
| `rep stosb` | Set all bytes to the value in `AL` | `memset` |
| `rep movsb` | Copy a buffer | `memcpy` |
| `repne scasb` | Search for a byte | `memchr` |

`repe cmpsb`. To better illustrate, I’ve written the sketch below:

```c
#include <stdbool.h>
#include <stddef.h>

// Models `repe cmpsb`: compare byte by byte while ECX != 0 and the bytes match.
bool compare(void) {
    const char esi[] = {'s', 'r', 'c', '1'};  // first buffer
    const char edi[] = {'d', 's', 't', '1'};  // second buffer
    size_t i = 0;                             // both pointers advance (DF = 0)

    for (size_t ecx = sizeof edi; ecx != 0; ecx--, i++) {
        if (esi[i] != edi[i]) return false;   // ZF = 0 ends the repetition
    }
    return true;                              // ECX reached 0, all bytes matched
}
```

`ECX` is set to the buffer’s length, `ESI` is a pointer to the first buffer, `EDI` is the pointer to the second. The loop runs until `ECX` = 0 or the compared bytes differ. The loop above stops on its very first iteration: `esi[0] = 's'` and `edi[0] = 'd'` differ, which means that the buffers are different and there is no need to run the loop further.

`rep stosb`. Destination buffer - `EDI`, source - `AL`. `ECX` is a counter.

```c
#include <stddef.h>

// Models `rep stosb`: store AL into ECX consecutive bytes starting at EDI.
void init(char *edi, size_t ecx) {
    char al = 'a';                 // byte to store
    for (; ecx != 0; ecx--) {
        *edi++ = al;               // write one byte, advance EDI (DF = 0)
    }
}
```

With a 3-byte buffer and `ECX` = 3, the loop runs 3 times and the buffer ends up as `['a', 'a', 'a']`. This is very often seen after `xor eax, eax`, since `xor`ing something with itself fills it with `0`, i.e. zeroes it out: either because the buffer itself is to be filled with zeroes, or to make sure there is no garbage lurking in `EAX` before setting `AL` to the desired value (in our example, `'a'`). Just to remind, `AL` is the lowest byte of the `EAX` register.

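`rep movsb`. `ESI` - source buffer, `EDI` - destination buffer, `ECX` - counter.

```c
#include <stddef.h>

// Models `rep movsb`: copy ECX bytes from [ESI] to [EDI].
void copy(void) {
    char esi[] = {'s', 'r', 'c'};  // source buffer
    char edi[] = {'d', 's', 't'};  // destination buffer
    size_t i = 0;

    for (size_t ecx = sizeof edi; ecx != 0; ecx--, i++) {
        edi[i] = esi[i];           // copy one byte, advance both pointers
    }
    // edi now holds {'s', 'r', 'c'}
}
```

The above loop will run 3 times. At the end, `edi = esi = ['s', 'r', 'c']`.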

`repne scasb`. `EDI` - buffer address, `AL` - byte to search. `ECX` - counter.

```c
#include <stdbool.h>
#include <stddef.h>

// Models `repne scasb`: scan [EDI] for the byte in AL while ECX != 0.
bool search(void) {
    const char edi[] = {'d', 's', 't'};
    const char al = 't';               // byte to look for
    size_t i = 0;

    // sizeof edi = 3
    for (size_t ecx = sizeof edi; ecx != 0; ecx--, i++) {
        // returns true on the 3rd iteration, when ecx = 1
        if (edi[i] == al) return true;
    }
    return false;
}
```

The above loop will run 3 times, and on the 3rd run it’ll return `true`, because `edi[2]` = `'t'`.

## Jumps

### JMP and friends

`x86`

Affected by flags: `ZF`, `CF`, `SF`, `OF`

In general, these instructions have this skeleton: `jmp location`.

| Instruction | Description | Note |
| --- | --- | --- |
| `jmp` | Unconditional jump, meaning “Jump, Forrest, jump!” | no matter what |
| `jz` | Jump if `ZF` = 1 | the result of the previous instruction was `0` |
| `jnz` | Opposite to `jz`. Jump if `ZF` = 0 | the result of the previous instruction was not `0` |
| `je` | Jump if the result of the preceding `cmp op1, op2` was 0 | the operands were equal |
| `jne` | Opposite to `je`. Jump if the result of the preceding `cmp op1, op2` was not 0 | the operands were not equal |
| `jg`, `ja` | Jump if the result of the preceding `cmp op1, op2` was a positive integer (op1 > op2, is greater) | `jg` for signed, `ja` for unsigned comparison |
| `jge`, `jae` | Like `jg` or `ja` combined with `je`. Jump if the result of the preceding `cmp op1, op2` was a positive integer or 0 (op1 >= op2, is greater or equal) | `jae` for unsigned comparison |
| `jl`, `jb` | Opposite to `jge`, `jae`. Jump if the result of the preceding `cmp op1, op2` was a negative integer (op1 < op2, is less) | `jb` for unsigned comparison |
| `jle`, `jbe` | Like `jl` or `jb` combined with `je`. Jump if the result of the preceding `cmp op1, op2` was a negative integer or 0 (op1 <= op2, is less or equal) | `jbe` for unsigned comparison |
| `jo` | Jump if the previous instruction set `OF` to `1` | |
| `js` | Jump if the previous instruction set `SF` to `1` | |
| `jecxz` | Jump if `ecx` = 0 | |
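Under the hood each conditional jump just checks a combination of flags. The Python sketch below models the flags after a 32-bit `cmp op1, op2` and the standard flag conditions of the jumps from the table; the helper names (`flags_after_cmp`, `CONDITIONS`, `taken`) are made up for this illustration.

```python
def flags_after_cmp(op1: int, op2: int, bits: int = 32) -> dict:
    """Flags produced by `cmp op1, op2` (a subtraction whose result is discarded)."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a, b = op1 & mask, op2 & mask
    r = (a - b) & mask
    return {
        "ZF": int(r == 0),
        "CF": int(a < b),                                  # unsigned borrow
        "SF": int(bool(r & sign)),                         # sign bit of the result
        "OF": int(bool((a ^ b) & (a ^ r) & sign)),         # signed overflow
    }

CONDITIONS = {
    "je":  lambda f: f["ZF"] == 1,
    "jne": lambda f: f["ZF"] == 0,
    "ja":  lambda f: f["CF"] == 0 and f["ZF"] == 0,
    "jae": lambda f: f["CF"] == 0,
    "jb":  lambda f: f["CF"] == 1,
    "jbe": lambda f: f["CF"] == 1 or f["ZF"] == 1,
    "jg":  lambda f: f["ZF"] == 0 and f["SF"] == f["OF"],
    "jge": lambda f: f["SF"] == f["OF"],
    "jl":  lambda f: f["SF"] != f["OF"],
    "jle": lambda f: f["ZF"] == 1 or f["SF"] != f["OF"],
}

def taken(jump: str, op1: int, op2: int) -> bool:
    """Would `jump` be taken right after `cmp op1, op2`?"""
    return CONDITIONS[jump](flags_after_cmp(op1, op2))

# 0xFFFFFFFF is 4294967295 when unsigned, but -1 when signed.
print(taken("ja", 0xFFFFFFFF, 1), taken("jg", 0xFFFFFFFF, 1))
```

The last line is the reason the signed and unsigned variants exist: the same bit pattern compares differently depending on which jump reads the flags.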

Getting familiar with jumps. Below is the table of examples. Try to quickly determine the location of the jump. Answers are listed right below the table, so spoiler alert! ❗

## Logical

### AND

`x86`

Interesting! 😮 Can be used to clear some bits with a mask. For example, if you have `1100 1011` and you need to zero all bits out, all you need to do is to `and` it with `0000 0000`. To determine whether an integer is even or not, mask it with `0000 0001`. Even numbers have `0` at the end, and odd ones have `1`, so `and`ing an even number with `0000 0001` results in `0` and `and`ing an odd one results in `1`. Also, you can make a number smaller by 2, 4, 8 etc. by clearing the corresponding bit with a mask (this only works when that bit is set in the number):

| Mask | Effect |
| --- | --- |
| `1101` | subtract 2 |
| `0111` | subtract 8 |
| `1011` | subtract 4 |
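These mask tricks are easy to try out in Python, whose `&` operator behaves like `and` on a per-bit level. The helper `is_even` is just a name for this example.

```python
def is_even(n: int) -> bool:
    """Mask with 0000 0001: the lowest bit is 0 for even numbers."""
    return (n & 0b0000_0001) == 0

print(is_even(10), is_even(7))
print(0b1111 & 0b1101, 0b1111 & 0b1011, 0b1111 & 0b0111)  # bit set: 15 -> 13, 11, 7
print(0b1001 & 0b1101)                                    # bit 1 already clear: 9 stays 9
print(0b1100_1011 & 0b0000_0000)                          # everything zeroed out
```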

### OR

Set to `1` if either of the bits is `1`. Repeat for each bit of the first operand and the second operand. Writes to the destination operand.

Interesting! 😮 Can be used to set all bits to `1` with `1111`. For example, we have `1110 or 1111` = `1111`. Basically, any value `or`ed with `1111` is `1111`.

### XOR

Exclusive OR. 1 if the first operand’s bit is not equal to the second’s.

Interesting! 😮 A quick way to set `eax` to `0`. The opcode of `xor eax, eax` is `33 C0` (2 bytes), while the opcode of `mov eax, 0` is `B8 00 00 00 00`, which is 5 bytes (costly 💴).

Also, an interesting observation: if I mask any 4-bit value with `1111`, the result is equal to an unsigned subtraction of that value from `1111`, i.e. `15 - value`. This is because `xor` with all ones flips every bit, which is the same as `NOT`:

`1010 xor 1111` is `0101` (15 - 10 = 5 in decimal)

`1011 xor 1111` is `0100` (15 - 11 = 4)

`1100 xor 1111` is `0011` (15 - 12 = 3)
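Both `xor` observations can be checked in a couple of lines of Python, assuming 4-bit values for the mask example:

```python
for x in (0b1010, 0b1011, 0b1100):
    flipped = x ^ 0b1111
    # xor with all ones == bitwise NOT (within 4 bits) == 15 - x
    print(format(flipped, "04b"), flipped == 0b1111 - x, flipped == (~x & 0b1111))

eax = 0xDEADBEEF
print(eax ^ eax)   # xor eax, eax always gives 0
```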

`1100 xor 1111` is `0011` (3d)