Any FAT partition has two main parts: a system area and a data area. The system area contains the FAT boot record (every file system has a boot record), the 1st FAT and the 2nd FAT. FAT12 and FAT16 also keep the Root directory in the system area. The data area holds the Root directory in the case of FAT32, plus file and subdirectory data in clusters.
Versions
FAT12
For the cluster addressing there were 12 bits available, which is 2^12 clusters at most (4096 clusters).
FAT16
For the cluster addressing there were 16 bits available, which is 2^16 clusters at most (65536 clusters).
FAT32
For the cluster addressing there were 28 bits available (not 32), which is 2^28 clusters at most (268 435 456 clusters). The upper 4 bits of each 32-bit entry were reserved. Unlike in FAT12 and FAT16, the Root directory is in the data area, giving more space for data.
ExFAT
Uses all 32 bits for cluster addressing. Thus, the maximum number of clusters is 2^32, which gives us 4 294 967 296 clusters to address. First supported by Windows Embedded CE 6.0 (2006). Max volume size: 64 zettabytes ❓. File size limit: 16 exabytes ❓. Cross-platform. Used for large external media.
FAT Structure
Volume Boot Record
⚠️ Not the MBR!
Located at sector 0 of the volume (not the physical sector 0). Starts with a 3-byte jump instruction at offset 0x0 (relative to the VBR). Contains information about the volume (offset relative to the VBR start - size - name - description). Relevant information below:
- 0x03 - 8 - OEM ID. Most likely MSDOS5.0 for Win2000 and above.
- 0x0B - 2 - Bytes per sector (512 usually)
- 0x0D - 1 - Sectors per cluster
- 0x0E - 2 - Reserved sectors
- 0x10 - 1 - Number of FATs (2, one of them is for backup purposes)
- 0x1C - 4 - Hidden sectors (preceding the volume)
- 0x20 - 4 - Total sectors (size of the volume)
- 0x16 - 2 - Sectors per FAT (FAT12 and 16)
- 0x24 - 4 - Sectors per FAT (FAT32)
- 0x2C - 4 - Starting cluster of the root dir (2nd usually)
- 0x32 - 2 - Back-up boot sector location (6th usually)
- 0x43 - 4 - Volume serial number. In the case of a thumb drive, this serial number can be used to track the device across PCs and other systems.
- 0x47 - 11 - Volume name/label 🏷 (not user defined, “NO NAME” usually)
- 0x52 - 8 - FS type
Here is the full information available:
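The relevant fields listed above can also be read with a few lines of Python. This is a minimal sketch, assuming a FAT32 boot record and the offsets from the list; `parse_fat32_vbr` and the synthetic `sample` sector are hypothetical names made up for the illustration, not part of any tool.

```python
import struct

def parse_fat32_vbr(vbr: bytes) -> dict:
    """Pull the fields listed above out of a raw FAT32 volume boot record."""
    if len(vbr) < 0x5A:
        raise ValueError("need at least the first 90 bytes of the boot record")

    def u16(offset: int) -> int:
        return struct.unpack_from("<H", vbr, offset)[0]  # little-endian 16-bit

    def u32(offset: int) -> int:
        return struct.unpack_from("<I", vbr, offset)[0]  # little-endian 32-bit

    return {
        "oem_id": vbr[0x03:0x0B].decode("ascii", errors="replace"),
        "bytes_per_sector": u16(0x0B),
        "sectors_per_cluster": vbr[0x0D],
        "reserved_sectors": u16(0x0E),
        "number_of_fats": vbr[0x10],
        "hidden_sectors": u32(0x1C),
        "sectors_per_fat": u32(0x24),
        "root_cluster": u32(0x2C),
        "backup_boot_sector": u16(0x32),
        "volume_serial": u32(0x43),
        "volume_label": vbr[0x47:0x52].decode("ascii", errors="replace"),
        "fs_type": vbr[0x52:0x5A].decode("ascii", errors="replace"),
    }

# Synthetic boot record for the demo (not a dump of a real drive).
sample = bytearray(512)
sample[0x03:0x0B] = b"MSDOS5.0"
struct.pack_into("<H", sample, 0x0B, 512)
sample[0x0D] = 8
struct.pack_into("<H", sample, 0x0E, 4110)
sample[0x10] = 2
struct.pack_into("<I", sample, 0x24, 2041)
struct.pack_into("<I", sample, 0x2C, 2)
struct.pack_into("<H", sample, 0x32, 6)

fields = parse_fat32_vbr(bytes(sample))
for name, value in fields.items():
    print(f"{name}: {value!r}")
```

With a real image, the 512 bytes would come from reading the first sector of the volume instead of the synthetic buffer.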
Root Directory
The name speaks for itself. It’s the highest node in the dir structure of this volume and consists of 32-byte dir entries. It lists the files and dirs in the root directory. The FS stops reading these entries when it sees an entry starting with 0x00. So, data written after that point won’t be seen by the OS, and this is one of the ways to hide data. Types of root dir entries:
- volume name (user created)
- short file name (8 uppercase letters + “.” + 3 letters for extension; the dot itself is not stored)
  - 8 bytes for the file name (uppercase). When the name was shortened from a long file name, it has a ~ at the -2 offset, followed by a digit (for example, ASDJAS~1).
  - 3 bytes for extension
  - 1 byte for attributes: hidden (0x02), read-only (0x01), system (0x04), volume label (0x08), directory (0x10) and archive (0x20). These attributes can be combined. The flags occupy just one byte, and when more than one flag is set, their values are combined (like the access flags on Unix systems). See the attributes on the picture below. Only the bit for volume is set, which, given its position, has the value 0x08.
  - 1 reserved byte: 0x00 for the long file name (not 8.3 compliant) and 0x10 for short file name (8.3 compliant) [5]. In my case, however, it was always 0x00.
  - 1 byte for created time in units of 10 milliseconds
  - 4 bytes for created date and time. See how the data is converted in the Timestamps section below.
  - 2 bytes for Last accessed date (⚠️ no time!)
  - 2 bytes for the pointer to the first cluster of the file/directory (high word). If the file is somewhere close to the disk start, it will be equal to 00 00.
  - 4 bytes for modified date and time
  - 2 bytes for the pointer to the first cluster of the file/directory (low word).
  - 4 bytes for the file size in bytes. ⚠️ It’s always 00 00 00 00 for directories!
- long file name. Can consist of several entries (32 bytes each). Each entry holds 13 characters, so if the file name is longer than that, there will be more than one long entry. These are called a set. The last one contains the last characters + extension.
  - 1 byte for a status byte. When the entry is the last in the set (1 set for each file), its first nibble is 4. Otherwise, the byte simply indicates the entry’s number in the set.
  - 10 bytes for the file name (Unicode chars), i.e. for 5 characters of the name since one Unicode character occupies 2 bytes. If the file name is not long enough, unused bytes are filled with 0xFF.
  - 1 byte that’s always 0x0F and indicates a long file name.
  - 1 byte reserved. In case it’s 0x00 - it’s a long file name. If it’s 0x10 - short file name. So, along with the previous byte it can be used to determine that it’s a long name entry [5]. In my case, however, it was always 0x00.
  - 1 byte for error correction (checksum)
  - 12 bytes for the next part of the file name (6 Unicode characters). If the file name is not long enough, unused bytes are filled with 0xFF.
  - 2 bytes of zeroes
  - 4 bytes for the next part of the file name (2 Unicode characters). If the file name is not long enough, unused bytes are filled with 0xFF.
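The short file name layout described above adds up to exactly 32 bytes, which makes it easy to unpack in one go. Below is a rough sketch; `parse_short_entry`, `ENTRY_FORMAT` and the sample entry are hypothetical names and data for the illustration. It also shows how the attribute flags are combined and how the high and low words form one cluster number.

```python
import struct

ATTRIBUTE_FLAGS = {
    0x01: "read-only", 0x02: "hidden", 0x04: "system",
    0x08: "volume label", 0x10: "directory", 0x20: "archive",
}
# name, ext, attributes, reserved, created 10 ms, created time, created date,
# accessed date, cluster high word, modified time, modified date,
# cluster low word, size: 32 bytes, little-endian, no padding.
ENTRY_FORMAT = "<8s3sBBBHHHHHHHI"

def parse_short_entry(entry: bytes) -> dict:
    if len(entry) != 32:
        raise ValueError("a directory entry is exactly 32 bytes")
    (name, ext, attr, _reserved, _ctime_10ms, _ctime, _cdate, _adate,
     cluster_high, _mtime, _mdate, cluster_low, size) = struct.unpack(ENTRY_FORMAT, entry)
    return {
        "name": name.decode("ascii", errors="replace").rstrip(),
        "extension": ext.decode("ascii", errors="replace").rstrip(),
        # Every flag whose bit is set in the attribute byte.
        "attributes": [label for bit, label in ATTRIBUTE_FLAGS.items() if attr & bit],
        # The high word counts in steps of 65536.
        "first_cluster": (cluster_high << 16) | cluster_low,
        "size": size,
    }

# Made-up directory entry: folder, high word 1, low word 5, size 0.
sample = struct.pack(ENTRY_FORMAT, b"FOLDER1 ", b"   ", 0x10,
                     0, 0, 0, 0, 0, 0x0001, 0, 0, 0x0005, 0)
entry = parse_short_entry(sample)
print(entry)
```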
See below the example of a long entry set for a file with the name asdjasdlkjasldkjsalkdaskljdjaljdajd.txt. The lowest part is the short file name. Then there are three long file name entries. The first byte in the yellow area is 0x01, meaning it’s the first entry in the set. The next one (green) has the first byte (sequence byte) set to 0x02, indicating it’s the second entry in the set. The last one (on the top, colored in red) has the status byte 0x43. The first nibble (4) indicates that this is the last long entry in the set. The second nibble (3) is the sequence number.
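To make the set logic concrete, here is a small sketch that splits the same file name into long entries and reads them back. It is an illustration under assumptions: `build_lfn_entry` is a hypothetical demo helper (it leaves the checksum byte at zero and only handles characters that fit into 2 bytes), and `parse_lfn_entry` treats bit 0x40 of the status byte as the "last entry" flag and the rest as the sequence number.

```python
def parse_lfn_entry(entry: bytes) -> dict:
    """Decode one 32-byte long file name entry."""
    if len(entry) != 32 or entry[11] != 0x0F:
        raise ValueError("not a long file name entry")
    status = entry[0]
    raw_name = entry[1:11] + entry[14:26] + entry[28:32]  # 5 + 6 + 2 characters
    text = raw_name.decode("utf-16-le", errors="replace")
    # Cut at the 0x0000 terminator and drop the 0xFFFF padding, if any.
    text = text.split("\x00", 1)[0].rstrip("\uffff")
    return {"is_last": bool(status & 0x40), "number": status & 0x3F, "chars": text}

def build_lfn_entry(status: int, chunk: str) -> bytes:
    """Demo helper: pack up to 13 characters into one entry (checksum left at 0)."""
    units = chunk.encode("utf-16-le")
    if len(chunk) < 13:
        units += b"\x00\x00"                  # terminator
        units += b"\xff" * (26 - len(units))  # padding
    return (bytes([status]) + units[0:10] + b"\x0f\x00\x00"
            + units[10:22] + b"\x00\x00" + units[22:26])

name = "asdjasdlkjasldkjsalkdaskljdjaljdajd.txt"
chunks = [name[i:i + 13] for i in range(0, len(name), 13)]
entries = [
    build_lfn_entry((i + 1) | (0x40 if i == len(chunks) - 1 else 0), chunk)
    for i, chunk in enumerate(chunks)
]

parsed = [parse_lfn_entry(e) for e in entries]
rebuilt = "".join(p["chars"] for p in sorted(parsed, key=lambda p: p["number"]))
print([hex(e[0]) for e in entries], rebuilt)
```

The name is 39 characters long, so it fills exactly three entries of 13 characters, and the last status byte comes out as 0x43, matching the example.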
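If the entry is a folder, its size will be 00 00 00 00 (the last 4 bytes). In its short file name entry, find the first cluster, go to this cluster and you’ll see another list of files: those that are in this folder. Find the entry for the file you are looking for.
A note on the high word of the first cluster: it counts in steps of 65536 (2^16). A high word of 1 means the cluster number starts at 65536, a high word of 2 means 131072 and so on, so the full cluster number is high word * 65536 + low word.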
❓ How to get there?
From the FAT32 Boot Record get the following information:
- bytes per sector (2 bytes at offset 0x0B) - marked with orange
- sectors per cluster (1 byte at offset 0x0D) - marked with green
- number of reserved sectors (2 bytes at offset 0x0E) - marked with dark blue
- number of FATs (1 byte at offset 0x10) - marked with coral
- sectors per FAT (4 bytes at offset 0x24) - marked with blue
All this information is needed to get the offset to the start of the data area (in sectors!). Since the root directory is the first thing in the data area (FAT32 only), this will give us what we are looking for. The root directory usually starts at cluster 2, the first cluster of the data area, but better check the value at offset 0x2C (keep in mind the endianness). Then, we need to calculate the number of bytes to the data area/root directory.
int root = (number_of_FATs * sectors_per_fat) + reserved_sectors
Here is an example:
The values on the right pane represent raw data in hex, and on the left - its human-readable interpretation by Active@Disk Editor using a FAT32 Boot Record template. We get the value at 0x0B, which is 0x200 (when converted from little-endian) and which equals 512 in decimal (marked in red). These are bytes per sector.
We then go to offset 0x0D, which holds 0x8, i.e. 8 in decimal (marked in green). That’s sectors per cluster. Reading at offset 0x0E, I get the number of reserved sectors (4110 in decimal). The number of FATs is 2 (standard for FAT32) at offset 0x10, and the number of sectors per FAT is 2041 in decimal.
Let’s use the data in the formula: (2 * 2041) + 4110 = 8192. This is the starting sector of the data area, and usually the root directory starts there too. Now, to get the number of bytes to this root directory, we need to multiply 8192 by bytes per sector (512) = 4194304 in decimal (0x400000 in hex). This offset is counted from the start of the VBR (⚠️ not from the GUID/MBR header). Our VBR starts at 0x00010000 in hex or 65536 in decimal. Adding 4194304 to 65536 (or 0x400000 to 0x10000) gives the offset of the root directory from the start of the disk. This value is 4259840 (0x410000). Let’s go there in a hex editor or, if you are using Active@Disk Editor, with the Go to offset button on the top pane. Voila!
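The same arithmetic as a short Python sketch, using the numbers from the example. The function name `data_area_offset` and its `vbr_offset` parameter are made up for the illustration; `vbr_offset` is the position of the VBR on the disk, in bytes.

```python
def data_area_offset(reserved_sectors: int, number_of_fats: int,
                     sectors_per_fat: int, bytes_per_sector: int,
                     vbr_offset: int = 0) -> int:
    """Byte offset of the data area (the FAT32 root directory usually starts there)."""
    first_data_sector = reserved_sectors + number_of_fats * sectors_per_fat
    return vbr_offset + first_data_sector * bytes_per_sector

# Values read from the boot record in the example above.
relative = data_area_offset(4110, 2, 2041, 512)                      # from the VBR start
absolute = data_area_offset(4110, 2, 2041, 512, vbr_offset=0x10000)  # from the disk start
print(relative, hex(relative))  # 4194304 0x400000
print(absolute, hex(absolute))  # 4259840 0x410000
```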
You may ask, what’s the use of this information (knowing exactly how to find the offset of the root directory). There might be cases when we need to perform manual file recovery using a hex editor (remember, Active@Disk Editor is for Windows and Linux only) or with limited tools. Also, I find it useful to understand what the tool does, since tools fail too sometimes, as well as humans. We need to verify each other from time to time.
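❓ One question that remained for me here: why the hell is the root cluster 0x2? The second cluster is 2 * (8 * 512) = 8192 bytes away from the VBR start and there is no root dir there. However, when clicking on the field in Active@Disk Editor, you’ll get to the root dir. The explanation is that clusters are counted from the start of the data area, not from the VBR, and the numbering starts at 2 (the first two FAT entries are taken by the media descriptor and the FAT type). So cluster 2 is simply the first cluster of the data area.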
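Cluster numbers can be turned into byte offsets with one line of arithmetic. The sketch below assumes that cluster numbering starts at 2 at the beginning of the data area, which is why cluster 2 lands exactly on the root directory in the example; `cluster_to_offset` is a hypothetical helper name.

```python
def cluster_to_offset(cluster: int, data_area_start: int,
                      sectors_per_cluster: int, bytes_per_sector: int) -> int:
    """Byte offset of a cluster, given the byte offset of the data area."""
    if cluster < 2:
        raise ValueError("data clusters are numbered from 2")
    cluster_size = sectors_per_cluster * bytes_per_sector
    return data_area_start + (cluster - 2) * cluster_size

# Data area at 0x410000 (see the calculation above), 8 sectors of 512 bytes per cluster.
for cluster in (2, 3, 13):
    print(cluster, hex(cluster_to_offset(cluster, 0x410000, 8, 512)))
```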
Directories (not Root)
Entries for directories have 00 00 00 00 as file size, 0x10 as attributes value and 20 20 20 as extension. They point to some location where files are listed. So, basically, a folder is just a pointer.
Root directory is the first in the tree, but not the only one. Each directory on the drive will have its own “table of contents” and its structure is a little different. It also consists of file name entries, each - 32 bytes long for FAT32.
The first 32 bytes start with a 0x2E byte (. in ASCII). The rest of the information has the same structure as an ordinary SFN (short file name) entry and, in this case, describes this directory itself.
The next 32 bytes start with a double 0x2E byte, i.e. .. in ASCII. The first cluster number would be 00 00 if the parent is the root directory. The rest of the information has the same structure as an ordinary SFN (short file name) entry and, in this case, describes the parent directory.
If you use a Terminal, console or PowerShell and cd somewhere from time to time, these two entries are no mystery to you. For those who don’t: . denotes the current directory, .. the parent one. For example, you have the following folder structure:
- root
  - folder1
    - folder4 < current directory
  - folder2
    - folder5
    - folder6
  - folder3
If we opened folder4, the value of . (current directory) is the address of folder4. While remaining in this folder, .. (parent directory) for folder4 is folder1. In turn, .. (parent directory) for folder1 is root.
File Allocation Table (FAT)
Keeps track of clusters in use and free ones. There are FAT1 and FAT2 (the same); FAT2 is for backup. Both are located in the system area. The FAT is also a singly-linked list: each entry points to the next cluster of a file (when a file occupies more than one cluster). 0x00000000 - the cluster is free, 0xffffff0f - end of file, anything else - pointer to the next cluster.
The first four bytes in the FAT32 (two bytes in FAT12 and FAT16) are the media descriptor (0xF8FFFFFF, usually indicates a fixed disk). The next four bytes are for the FAT type (0xFFFFFFFF in case of FAT32). Then each four bytes tell us about one cluster, ordered sequentially, i.e. CL-2 (contains the root directory in FAT32), then CL-3, then CL-4 etc. An entry can have three possible values: 0x00000000 if the cluster is free, 0xffffff0f if the cluster is the last in the chain (end of file, EOF marker), or a pointer to the next cluster (convert from little-endian and then to decimal to find it). The values here are written in the byte order in which they appear on disk.
Using Active@Disk Editor, press the Navigate > Primary FAT32 > FAT1 button on the top pane. Here is the example from my flash drive:
Discarding the media descriptor and FAT version (the first 8 bytes), in the picture above we can see several clusters, most of them containing the EOF marker (0xFFFFFF0F). That means most of the files occupy one cluster only. Clusters 2-12 are marked with green and orange rectangles for better readability. Cluster 13 is marked in red and its value is 0x014A (330) after converting from little-endian. That means that cluster 13 is not the last one in the chain and the next cluster is cluster number 330.
Now, this is what a contiguous file would look like in the FAT (a file that occupies more than 1 cluster, with the clusters following one another on disk):
Let’s read each 4 bytes starting from the entry with the value 0x70 00 00 00. This is the first cluster in the chain, and it points to the next cluster (cluster 0x70, i.e. 112 in decimal). Cluster 112 (0x70), in turn, points to the next cluster in the chain (0x71, i.e. 113) and so on and so forth. As you may notice, such chains are easy to spot.
To make it easier to read the FAT32 table (which has 4 bytes for each entry), you may change the view preferences in Active@Disk Editor (File > Preferences > Disk Editor > Bytes per line > 4).
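Following a chain by hand gets tedious quickly, so here is a rough sketch of the same walk in Python. The assumptions: each FAT32 entry is 4 bytes in little-endian order, only the lower 28 bits are used, and any value of 0x0FFFFFF8 or above marks the end of a chain. The names `read_fat32_entries` and `follow_chain` and the tiny sample table are invented for the illustration.

```python
import struct

END_OF_CHAIN = 0x0FFFFFF8  # this value or anything above it ends a chain

def read_fat32_entries(fat: bytes) -> list:
    """Split raw FAT bytes into 32-bit entries, keeping the lower 28 bits."""
    count = len(fat) // 4
    return [value & 0x0FFFFFFF for value in struct.unpack(f"<{count}I", fat[:count * 4])]

def follow_chain(entries: list, first_cluster: int) -> list:
    """Return the clusters of a file, starting from its first cluster."""
    chain, seen = [], set()
    cluster = first_cluster
    while 2 <= cluster < len(entries) and cluster not in seen:  # 'seen' guards against loops
        chain.append(cluster)
        seen.add(cluster)
        next_cluster = entries[cluster]
        if next_cluster >= END_OF_CHAIN or next_cluster == 0:   # EOF marker or free cluster
            break
        cluster = next_cluster
    return chain

# Made-up table: two reserved entries, cluster 2 (root dir, one cluster),
# then a contiguous file in clusters 3 -> 4 -> 5.
fat = bytes.fromhex(
    "F8FFFF0F" "FFFFFFFF" "FFFFFF0F"
    "04000000" "05000000" "FFFFFF0F"
)
entries = read_fat32_entries(fat)
print(follow_chain(entries, 2))  # [2]
print(follow_chain(entries, 3))  # [3, 4, 5]
```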
Timestamps
Local times, not UTC! For Last Accessed we only have date, no time!
Four bytes for date and time created (the first two bytes for time and the second two bytes for date). The time bytes are first converted from the little-endian notation (two bytes flipped). In the example below, for the yellow short entry the created date and time are 0x55 0x6E 0x31 0x53. Bytes 0x55 0x6E are for the time and 0x31 0x53 are for the date. Let’s take the time bytes 0x55 0x6E. Flip them to convert from the little-endian notation: 0x6E 0x55. Convert each nibble to a binary value: b0110, b1110, b0101, b0101. Now, write them in a row and separate with the following template in mind: 5 bits - 6 bits - 5 bits: b01101 b110010 b10101. The first five bits are for hours, the next 6 bits are for minutes and the last ones are for seconds, stored in 2-second units. Then, each value is converted into a decimal separately to get us 13 hrs, 50 mins and 21 * 2 = 42 seconds in the end (the picture above only shows hours and minutes). For the date value the template is 7-4-5 (years since 1980, month, day) but the process is the same.
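The bit slicing above can be checked with a few shifts and masks. This is a minimal sketch with hypothetical function names; it relies on two details of the FAT format: the seconds field counts 2-second steps (so a raw 21 means 42 seconds), and the year field counts from 1980.

```python
def decode_fat_time(raw: bytes) -> tuple:
    """Two on-disk bytes -> (hours, minutes, seconds)."""
    value = int.from_bytes(raw, "little")  # flips the two bytes
    hours = value >> 11                    # top 5 bits
    minutes = (value >> 5) & 0x3F          # middle 6 bits
    seconds = (value & 0x1F) * 2           # low 5 bits, in 2-second units
    return hours, minutes, seconds

def decode_fat_date(raw: bytes) -> tuple:
    """Two on-disk bytes -> (year, month, day)."""
    value = int.from_bytes(raw, "little")
    year = 1980 + (value >> 9)             # top 7 bits, years since 1980
    month = (value >> 5) & 0x0F            # middle 4 bits
    day = value & 0x1F                     # low 5 bits
    return year, month, day

created = bytes([0x55, 0x6E, 0x31, 0x53])  # the example bytes from the text
print(decode_fat_time(created[0:2]))       # (13, 50, 42)
print(decode_fat_date(created[2:4]))       # (2021, 9, 17)
```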
FAT File Creation And Deletion
Creating
Steps that are taken when a file is created:
- A directory entry is created and written to the parent directory
- Data is written to the first available cluster
- Entries in the FAT1 and FAT2 are made for all the clusters used
Deleting
Steps that are taken when a file is deleted:
- The first character of the directory entry is changed to 0xE5.
- The clusters in the FAT1 and FAT2 are filled with zeroes.
- The data area remains unchanged (⚠️ data is still out there!).
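Since deletion only changes the first byte of the entry, deleted files are easy to list. Here is a rough sketch that scans a block of 32-byte directory entries for the 0xE5 marker; `find_deleted_entries`, `make_entry` and the sample data are hypothetical and only serve the illustration. The first character of the name is lost, so it is shown as "?".

```python
def find_deleted_entries(directory: bytes) -> list:
    """Return (offset, name) for every deleted short entry in a directory block."""
    deleted = []
    for offset in range(0, len(directory) - 31, 32):
        entry = directory[offset:offset + 32]
        if entry[0] == 0x00:                       # end of the listing
            break
        if entry[0] != 0xE5 or entry[11] == 0x0F:  # live entry, or a long file name entry
            continue
        base = entry[1:8].decode("ascii", errors="replace").rstrip()
        ext = entry[8:11].decode("ascii", errors="replace").rstrip()
        deleted.append((offset, "?" + base + ("." + ext if ext else "")))
    return deleted

def make_entry(name11: bytes) -> bytes:
    """Demo helper: 11 name bytes + archive attribute + 20 zero bytes."""
    return name11 + bytes([0x20]) + bytes(20)

directory = make_entry(b"FILE1   TXT") + make_entry(b"\xe5ILE2   TXT") + bytes(32)
print(find_deleted_entries(directory))  # [(32, '?ILE2.TXT')]
```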
exFAT Structure
Also consists of System (Boot sector, backup boot sector, FAT1) and Data areas. exFAT doesn’t have FAT2.
Boot Record
Located at sector 0 of the volume. It contains the information about the volume (as usual).
⚠️ Offsets are relative to the start of the volume.
In general, it looks very close to FAT32. Something new: bytes per sector shift (between 9 and 12) and sectors per cluster shift (0-25). The value in each field is the power to which we need to raise 2 to get the result. For example, if the field bytes per sector shift is set to 9, we raise 2 to the power of 9 (2^9 = 512, which means each sector is 512 bytes long). The same math is applied to the sectors per cluster shift field.
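Raising 2 to a power is a left shift, so both fields can be decoded in one small function. In this sketch the second shift field is taken to be the sectors-per-cluster exponent, and `exfat_geometry` is a made-up name.

```python
def exfat_geometry(bytes_per_sector_shift: int, sectors_per_cluster_shift: int) -> tuple:
    """Return (bytes per sector, sectors per cluster, bytes per cluster)."""
    bytes_per_sector = 1 << bytes_per_sector_shift        # 2 ** shift
    sectors_per_cluster = 1 << sectors_per_cluster_shift  # 2 ** shift
    return bytes_per_sector, sectors_per_cluster, bytes_per_sector * sectors_per_cluster

print(exfat_geometry(9, 3))   # (512, 8, 4096)
print(exfat_geometry(12, 0))  # (4096, 1, 4096)
```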
FAT1
32-bit entries. Media descriptor is the same: 0xFF FF FF F8
. Only tracks file fragmentation and doesn’t track file allocation (Bitmap is used for that instead)!
Root Directory
Types of directory entries:
- volume label (critical primary), 0x83. User-created name for the volume. ⚠️ Must be there!
- file directory (critical primary). Tracks attributes and MAC times (UTC), ⚠️ but doesn’t point to the parent directory (no .. entry like in FAT32). 0x85 - in use, 0x05 - free. ⚠️ All files will have this entry!
- stream extension directory (critical secondary). Size and start of the file. ⚠️ Size of the filename is here (in characters)! Starts with 0xC0 if in use, 0x40 if not. Not FAT chain - if set, it’s not a fragmented file. Also, there are two interesting values: Valid data length (init size) and Data length. If, say, a file was being downloaded, the FS allocates a certain amount of space. But if that download was interrupted, the file won’t occupy all the space, thus these two values will be different.
- file name (critical secondary). Unicode for file name. 0xC1 if in use, 0x41 if not. Up to 15 Unicode chars. There might be more than one (like long file names in FAT, there can be several entries in case it’s a long name).
- system files (critical primary)
  - Bitmap, 0x81. Usually starts at cluster 2.
  - Upcase, 0x82. Usually starts at cluster 4.
The file directory, stream extension and file name entries are the ones that make up a directory set. Below is the breakdown of a directory set. The first byte (0x85 in this case) indicates that the file is currently in use. It would be 0x05 if the file were deleted.
The next byte is the secondary count, which indicates how many other directory entries we have in this file directory entry set (❓). The next two bytes are for error checking and then the next two, 0x3A 00, are attributes. Then we have MAC times. We have a last accessed time, which we didn’t have in FAT32.
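All the in-use/free pairs above (0x85/0x05, 0xC0/0x40, 0xC1/0x41) differ only in the top bit of the first byte. The sketch below decodes that byte on this assumption; `describe_entry_type` and the `ENTRY_TYPES` table are illustrative names built from the values listed in this section.

```python
ENTRY_TYPES = {
    0x01: "allocation bitmap",
    0x02: "upcase table",
    0x03: "volume label",
    0x05: "file directory",
    0x40: "stream extension",
    0x41: "file name",
}

def describe_entry_type(first_byte: int) -> tuple:
    """Return (entry kind, in-use flag) for the first byte of an exFAT directory entry."""
    in_use = bool(first_byte & 0x80)  # top bit set = entry is in use
    kind = ENTRY_TYPES.get(first_byte & 0x7F, "unknown")
    return kind, in_use

for value in (0x85, 0x05, 0xC0, 0x41, 0x81):
    print(hex(value), describe_entry_type(value))
```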
Additional Files
exFAT has two additional files that FAT32 does not. They probably come from NTFS: Bitmap and Upcase (a table of Unicode chars, used to convert characters for searching).
⚠️ Both files have an entry in the root directory, but don’t have a filename.
Timestamps
UTC! The timezone offset is in 15-minute increments (see a breakdown below). So, we have created, modified and last accessed date/times. Each of these timestamps has a corresponding UTC offset byte in the file directory entry. It makes sense that these are always the same. In my case the byte is 0xF4. The top bit only says that the offset is valid; the remaining 7 bits (0x74) are a signed number of 15-minute steps, here -12, which makes -180 minutes, i.e. UTC-3. For comparison, London would be 0x80 (UTC) or 0x84 (UTC+1). Either way, the byte tells us which timezone the MAC times belong to.
⚠️ We have a last accessed time, which we didn’t have in FAT32.
Below is the breakdown of how to convert a UTC byte into a human-readable value.
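The conversion can be sketched in Python as follows. It assumes the layout described in Microsoft's exFAT specification: the top bit marks the offset as valid, and the lower 7 bits are a signed (two's complement) number of 15-minute steps. The function names are made up for the example.

```python
def decode_utc_offset(raw: int):
    """Return the offset from UTC in minutes, or None if the byte is not marked valid."""
    if not raw & 0x80:        # top bit clear: no valid offset recorded
        return None
    steps = raw & 0x7F        # lower 7 bits
    if steps >= 0x40:         # negative number in 7-bit two's complement
        steps -= 0x80
    return steps * 15

def format_utc_offset(minutes: int) -> str:
    sign = "+" if minutes >= 0 else "-"
    hours, mins = divmod(abs(minutes), 60)
    return f"UTC{sign}{hours:02d}:{mins:02d}"

for raw in (0x80, 0x84, 0xF4, 0xF8):
    print(hex(raw), format_utc_offset(decode_utc_offset(raw)))
```

With these rules 0x80 is UTC itself, 0x84 is one hour ahead, 0xF8 is two hours behind and 0xF4 is three hours behind.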
exFAT File Creation and Deletion
Creation
- directory set created
- bitmap for allocated clusters set to 1
- FAT updated (if fragmented)
- data written to the allocated clusters
🧪 Are the timestamps updated for a deleted file in the directory entry? Check for exFAT, FAT32 and NTFS as well.
Deletion
- the first byte of each entry in the directory entry set is changed to show the file is not in use (05 for a file directory entry, 40 for stream extension and 41 for file name entries).
- bitmap entries for this file’s clusters are set to 0
- FAT may or may not be zeroed out
- file/dir contents remain there until (and if) they are overwritten
⚠️ If the parent folder is deleted, the child entries remain unchanged.