
Interview Prep

Created: 28.07.2022

This is about … .

Resources

  1. https://www.coursera.org/professional-certificates/google-cloud-security?action=enroll#courses
  2. https://www.educative.io/courses/grokking-coding-interview-patterns-python/qZnwmQO8ADp
  3. https://github.com/gracenolan/Notes/blob/master/interview-study-notes-for-security-engineering.md

Temp Priority

Coding

Strings

I am mostly using Python, so I need to understand how Python implements strings and what the time and space complexity of string operations is. I know that strings are immutable.

Keep in mind that string.replace() creates a new string, as strings are immutable in Python. So, if you need to make several replacements in a string, consider collecting the pieces in a mutable structure such as a list (joined once with ''.join()) or an io.StringIO buffer to avoid creating many intermediate strings. Python has no StringBuilder class; io.StringIO is the closest equivalent.
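A minimal sketch of the two buffer-based approaches, assuming a character-by-character replacement. The helper names replace_with_list and replace_with_stringio are made up for illustration:

```python
import io

def replace_with_list(s, old_char, new_text):
    # Collect the pieces in a list and join them once at the end.
    parts = []
    for char in s:
        parts.append(new_text if char == old_char else char)
    return "".join(parts)

def replace_with_stringio(s, old_char, new_text):
    # io.StringIO is an in-memory text buffer that supports appending.
    buffer = io.StringIO()
    for char in s:
        buffer.write(new_text if char == old_char else char)
    return buffer.getvalue()

print(replace_with_list("a b c", " ", "%20"))
print(replace_with_stringio("a b c", " ", "%20"))
```

For a single replacement the built-in str.replace() is still the simplest option; the buffer only pays off when many edits are chained.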

💡 A very interesting approach is to use bitwise operations to store a set of boolean values with just two bitwise operations: bit_vector & (1 << char_code) to check a bit and bit_vector |= (1 << char_code) to set it. I have moved the detailed explanation to the reverse section (reverse -> basics -> assembly).

Unique strings

I have come up with several solutions, and additionally, I’ve worked through one more which is a significant improvement time and space-wise:

# The solution using a hashtable is the most straightforward and simple. Hashtables are the best option when it comes to efficiently searching for duplicates in terms of time complexity. However, hashtables require additional space, which might be an issue in some cases. In this particular situation, it's not a concern because we only have 256 characters to consider, which essentially amounts to O(1) space complexity.
def check_if_all_char_unique_1(s):
    register = {}
    if len(s) <= 0: return False
    for char in s: 
        if char in register: return False
        else: register[char] = 1
    return True
# Time complexity: O(n). We move through the string once (n) and for each character the check and the insert take constant time on average.
# Space complexity: O(n) in general, because we keep an additional structure with the values. For a string of ASCII characters it is O(1), constant, since the number of distinct characters is bounded. For unbounded values, say, numbers, it becomes O(n).

# The same solution but using collections.Counter
from collections import Counter

def check_if_all_char_unique_counter(s):
    if len(s) <= 0: return False
    char_count = Counter(s) # populate the hashtable, the same as the loop in the previous code snippet.
    return all(count == 1 for count in char_count.values()) # returns True if all values (`count`) are equal to 1.
# Time complexity: O(n + m). We iterate through the whole string, which is O(n), and then through the m distinct keys of the dictionary on the last line, which is O(m). Since m <= n this simplifies to O(n).
# Space complexity: O(n)

def check_if_all_char_unique_2(s):
    characters = list(s)
    characters.sort()
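    for char_index in range(len(characters) - 1): # compare every character with its right neighbour, including the last pair
        if characters[char_index] == characters[char_index + 1]: return False
    return True

# Time complexity: O(n log n). Building the list and the final scan are O(n) each, sorting is O(n log n).
# Space complexity: O(n)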

def check_if_all_char_unique_3(s):
    for index in range(len(s) - 1):
        if s[index] in s[index+1:]: return False
    return True

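# Time complexity: O(n^2). For each character we scan the rest of the string.
# Space complexity: O(n), because the slice s[index+1:] creates a copy of the remaining characters.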


# In this solution we still need to iterate over the whole string, that's why it's O(n), but the <<, & and |= operations are performed in constant time, so they can be neglected. The space complexity is O(1) since we do not use any additional data structures to keep track of our results. The bitmap is of constant length and its size is so small that it can be neglected as well.
# Assumes the input contains only lowercase letters a-z: a character below 'a' would make the shift negative and raise ValueError.
def check_if_all_char_unique_4(s):
    bitmap = 0

    for char in s:
        mask = 1 << (ord(char) - ord('a')) # one bit per letter: 'a' -> bit 0, 'b' -> bit 1, ...
        if bitmap & mask: return False # the bit is already set, so we have seen this character
        bitmap |= mask

    return True

# Time complexity: O(n)
# Space complexity: O(1)
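For comparison, the hashtable idea can be written in one line with a set. This is only a sketch, and the function name check_if_all_char_unique_set is made up for illustration. Unlike the versions above, it treats an empty string as having all unique characters:

```python
def check_if_all_char_unique_set(s):
    # A set drops duplicates, so the lengths differ if any character repeats.
    return len(set(s)) == len(s)

print(check_if_all_char_unique_set("abcdef"))
print(check_if_all_char_unique_set("abcdea"))
```

Time and space are both O(n), the same as the dictionary version.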

Permutations

from collections import Counter

# Time Complexity: O(2n + m)
# Space Complexity: O(n)
def check_if_perm_hash_table(s1, s2):
    if len(s1) != len(s2): return False
    if len(s1) == 0: return False # both lengths are equal here; the old `len(s1) or ...` test was true for every non-empty string

    characters = Counter(s1)

    for char in s2: 
        if char in characters: characters[char] -= 1
        else: return False
    return all(counter == 0 for counter in characters.values())


# O(2n * log n + n), i.e. O(n log n). Getting the length of a string is O(1) because it's basically an array that stores its own length. It is immutable by design (so strings can be hashed and shared safely), not because of how it is stored.
# O(2n), i.e. O(n): sorted() builds a new list of characters for each string, regardless of the alphabet.
def check_if_perm_sort(s1, s2):

    if len(s1) != len(s2): return False

    s_1 = sorted(s1)
    s_2 = sorted(s2)
    
    for i in range(len(s_1)):
        if s_1[i] != s_2[i]: return False
    return True

Replace (URLify)

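# TC: O(n) in the typical CPython case, but repeated += on a string may copy the whole string each time, so the worst case is O(n^2)
# SC: O(n)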
def urlfy(s):
    result = ""

    for char in s:
        if char == " ": result += "%20"
        else: result += char
    return result

# This is an improvement only if we are given an array in the first place since 
# strings in Python are immutable.
# s_list must already contain enough trailing padding for the extra characters
# (two per space), otherwise the writes below raise IndexError.
def urlfy_list(s_list, trueLength):
    l = list(s_list)
    space_count = 0

    for i in range(trueLength):
        if l[i] == ' ': space_count += 1 
    
    index = trueLength + space_count * 2 # the first pointer starts from the end of the final string, including room for every "%20"
    new_length = index
    for i in range(trueLength - 1, -1, -1): # the second pointer starts from the end of THE ACTUAL string and goes down to index 0
        if l[i] == ' ': 
            l[index - 1] = '0'
            l[index - 2] = '2'
            l[index - 3] = '%'
            index = index - 3
        else: 
            l[index - 1] = l[i]
            index -= 1
    
    return ''.join(l[:new_length]) # drop any padding that was not needed

Compressor

def compress(s):
    if len(s) <= 2: return s
    p_1 = 0
    result = list()
    while p_1 < len(s):
        p_2 = p_1 + 1
        # move p_2 to the end of the current run; the bounds check prevents an IndexError on the last run
        while p_2 < len(s) and s[p_1] == s[p_2]: 
            p_2 += 1
        result.append(s[p_1])
        result.append(str(p_2 - p_1))
        p_1 = p_2 # the next run starts where this one ended, so the last character is never skipped
    
    compressed = ''.join(result)
    if len(compressed) >= len(s): return s # compare string lengths, not the number of list items
    return compressed
    

print(compress("asdasdasd"))

I need to use a list instead of a string, because repeated string concatenation might increase the time complexity of the function, which is going to be noticeable for larger strings.
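The same run-length idea can be sketched with itertools.groupby, which groups consecutive equal characters. The function name compress_groupby is made up for illustration, and the pieces are still collected in a list and joined once:

```python
from itertools import groupby

def compress_groupby(s):
    parts = []
    # groupby yields (character, iterator over the consecutive run of that character).
    for char, run in groupby(s):
        parts.append(char)
        parts.append(str(sum(1 for _ in run)))
    compressed = "".join(parts)
    # Keep the original if compression does not make it shorter.
    return compressed if len(compressed) < len(s) else s

print(compress_groupby("aabcccccaaa"))  # a2b1c5a3
```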

Permutations Palindrome

from collections import Counter

def check_if_perm_hash_table(s1, s2):
    if len(s1) != len(s2): return False
    if len(s1) == 0: return False # both lengths are equal here; the old `len(s1) or ...` test was true for every non-empty string

    characters = Counter(s1)

    for char in s2: 
        if char in characters: characters[char] -= 1
        else: return False
    return all(counter == 0 for counter in characters.values())
# Time Complexity: O(2n + m)
# Space Complexity: O(n)

def check_if_perm_sort(s1, s2):
    if len(s1) != len(s2): return False

    s_1 = sorted(s1)
    s_2 = sorted(s2)
    
    for i in range(len(s_1)):
        if s_1[i] != s_2[i]: return False
    return True
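# O(2(n * log n) + n), i.e. O(n log n). Getting the length of a string is O(1) because it's basically an array that stores its own length. It is immutable by design (so strings can be hashed and shared safely), not because of how it is stored.
# O(2n), i.e. O(n): sorted() builds a new list of characters for each string.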



print(check_if_perm_sort("asdff", "fdsaa"))
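The heading suggests the classic "is this string a permutation of a palindrome?" problem, while the code above checks whether two strings are permutations of each other. A minimal sketch of the palindrome variant follows. It assumes that spaces and letter case are ignored, and the function name is_palindrome_permutation is made up for illustration:

```python
from collections import Counter

def is_palindrome_permutation(s):
    # Ignore spaces and case (an assumption; adjust if every character matters).
    counts = Counter(char.lower() for char in s if char != " ")
    odd_counts = sum(1 for count in counts.values() if count % 2 == 1)
    # A palindrome has at most one character with an odd count (the middle one).
    return odd_counts <= 1

print(is_palindrome_permutation("Tact Coa"))  # True ("taco cat")
```

Time complexity is O(n) and space is O(m) for m distinct characters.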

Similar Strings

def is_similar(s1, s2):
    if abs(len(s1) - len(s2)) > 1: 
        return False
    if len(s1) - len(s2) == 1:
        return insert_one(s2, s1)
    if len(s2) - len(s1) == 1:
        return insert_one(s1, s2)
    if len(s1) - len(s2) == 0:
        return replace_one(s2, s1)

def replace_one(s1, s2):
    strike = False
    for i in range(len(s1)):
        if s1[i] != s2[i]:
            if strike == True: return False
            strike = True
    return True

def insert_one(s1, s2):
    s1_p, s2_p, strike = 0, 0, False
    while s1_p < len(s1) and s2_p < len(s2):
        if s1[s1_p] == s2[s2_p]: 
            s1_p += 1
            s2_p += 1
            continue
        else:
            if strike: return False
            strike = True
            s2_p += 1
    return True
        
print(is_similar("abbba", "abbbb"))

Bitwise

Bitwise magic article in the reverse section explains all the magic in detail. Here is just the code.

Extra character

# TC: O(n). There are up to three passes: one XOR loop per string and one search through the longer string.
# SC: O(1). No additional space is required other than the result variable, which is always of the same size regardless of the str1 or str2 length.
def extra_character_index(str1, str2):
    result = 0
    
    if len(str1) - len(str2) != 1:
        if len(str2) - len(str1) != 1: return -1

    for i in range(0, len(str1)):
        result ^= ord(str1[i])
    for i in range(0, len(str2)):
        result ^= ord(str2[i])
    
    character = chr(result) # everything that appears in both strings cancels out, leaving the extra character

    if len(str1) > len(str2): 
        for i in range(0, len(str1)):
            if str1[i] == character: 
                return i
    else:
        for i in range(0, len(str2)):
            if str2[i] == character:
                return i
            
    return -1


print(extra_character_index("hello", "helloq"))

Base-10 complement

from math import log2, floor

# O(log n) due to the bin() and int() operations
def find_bitwise_complement_mine(num):
    if isinstance(num, int): 
        list_of_num = list(bin(num))[2:]
        for i in range(0, len(list_of_num)):
            if list_of_num[i] == '1': list_of_num[i] = '0'
            else: list_of_num[i] = '1'
        
        return int(''.join(list_of_num), 2)
    else:
        return -1

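# 🔥 TC: O(1)
# 🔥 SC: O(1)
def find_bitwise_complement(num):
    if isinstance(num, int) and num >= 0: 
        if num == 0: return 1 # log2(0) is undefined; the complement of the single bit 0 is 1
        bits_needed = floor(log2(num)) + 1 # for very large ints, num.bit_length() avoids float rounding
        xor_mask = pow(2, bits_needed) - 1
        return num ^ xor_mask
    else:
        return -1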
    

print(find_bitwise_complement(42))
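A variant that avoids floating point entirely can be sketched with int.bit_length(). The function name find_bitwise_complement_bit_length is made up for illustration, and returning 1 for an input of 0 is an assumption about the expected answer:

```python
def find_bitwise_complement_bit_length(num):
    if not isinstance(num, int) or num < 0:
        return -1
    if num == 0:
        return 1  # assumption: the complement of the single bit 0 is 1
    # A mask of all ones that is exactly as wide as num's binary representation.
    mask = (1 << num.bit_length()) - 1
    return num ^ mask

print(find_bitwise_complement_bit_length(42))  # 21 (101010 -> 010101)
```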

Flip image

# Swapping can be done without the use of a tmp variable. However, whether to use a tmp var or an XOR swap doesn't really matter: it doesn't add to the overall complexity, but the XOR swap can make the code less readable. The XOR swap is sometimes said to be more efficient on the hardware level, but that is not guaranteed on modern CPUs.
# TC: O(n^2) since we need to iterate over the whole matrix which is n x n in size. 
# SC: O(1) since we are not using any extra space other than row_length and mid which are of fixed size
def flip_and_invert_image(image):
    row_length = len(image)
    mid = (row_length + 1) // 2
    for row in image:
        for column in range(mid):
            if column != row_length - column - 1:
                row[column] = row[column] ^ 1
                row[row_length - column - 1] = row[row_length - column - 1] ^ 1                
                row[column] ^= row[row_length - column - 1]
                row[row_length - column - 1] ^= row[column]
                row[column] ^= row[row_length - column - 1]
            else: 
                row[column] = row[column] ^ 1
            
    return image

# print(flip_and_invert_image([[1, 1, 0], [1, 0, 1], [0, 0, 0]]))

# TC: O(n^2) since we need to iterate over the whole matrix which is n x n in size. 
# SC: O(1) since we are not using any extra space other than row_length, temp and mid which are of fixed size
# This solution looks more readable.
def flip_and_invert_image(image):
    row_count = len(image)
    mid = (row_count+1)//2
    # iterate over each row of the image
    for row in image:
        # iterate over first half of each row
        for i in range(mid):
            # flip and invert the row
            temp = row[i] ^ 1
            row[i] = row[len(row) - 1 - i] ^ 1
            row[len(row) - 1 - i] = temp
    # return the flipped and inverted image
    return image

Strings, permutation

There are other solutions to this problem, but when reading about XOR I realised that the permutation problem could benefit from the XOR operation. However, there is a problem here. As I have demonstrated in my bitwise magic article earlier (https://bakerst221b.com/docs/reverse/basics/bitwise/), something XORed with itself results in 0, and something XORed with 0 results in itself. In the code below I XOR all the characters of both strings together and check if the result is 0. Even though we check the string lengths, unless we can be sure that the strings differ in at most one character, the following input returns True when it should not: aa and ff. a XOR a XOR f XOR f results in 0, although the characters of the two strings are obviously different. What I’ve decided to do is quite simple.

After I XOR all the characters of the first string, I check if the result is 0. If it is, I know that all the characters there come in pairs. Then I XOR in all the characters from the second string and check the result. If both are 0, there is a chance I have this case, and I fall back to the previous, less efficient check_if_perm_hash_table algorithm. That effectively makes my code a lot more complex, and the overall time complexity becomes O(n) at best (if I don’t invoke the check_if_perm_hash_table function) and O(3n + m) in the worst case (when I do). The space complexity is O(1) if I don’t hit that corner case versus O(n) if I do. Note that paired characters are not the only way to get 0: ab and dg also cancel out (a XOR b = 3 and d XOR g = 3), so a zero result is necessary but not sufficient, and only a non-zero result is conclusive. Not the most elegant solution, but maybe I will come up with something better in the future.

from collections import Counter

def check_if_perm_hash_table(s1, s2):
    if len(s1) != len(s2): return False
    if len(s1) == 0: return False # both lengths are equal here; the old `len(s1) or ...` test was true for every non-empty string

    characters = Counter(s1)

    for char in s2: 
        if char in characters: characters[char] -= 1
        else: return False
    return all(counter == 0 for counter in characters.values())
# Time Complexity: O(2n + m)
# Space Complexity: O(n)

def check_if_palindrome_perm(s1, s2): 
    if len(s1) != len(s2): return False # O(1)
    result = 0
    for char in s1: result ^= ord(char) # O(n)
    for char in s2: result ^= ord(char) # O(n)

    # A non-zero XOR proves that the strings differ, so we can reject early.
    if result != 0: return False # O(1)
    # A zero XOR is necessary but not sufficient: "aa"/"ff" and "ab"/"dg" also cancel out to 0,
    # so the result has to be confirmed with the hash table check.
    return check_if_perm_hash_table(s1, s2)
# TC: O(n) when rejected early and O(3n + m) otherwise
# SC: O(1) when rejected early and O(n) otherwise
# ❗️ XOR alone won't work for inputs such as "aa", "ff", since both cancel each other out resulting in 0.

Web Scraper

Write a script to scrape information from a website.

The networker.py helper function:

import asyncio
from functools import wraps
import aiohttp

semaphore = asyncio.Semaphore(12)

# Process some of the possible errors 
def process_status_code(status):
    if status == 429:
        print("Too many requests. Trying again...")
    elif status == 404:
        print("Resource not found")
    elif status == 403:
        print("Access forbidden: something is wrong with the session")

# probably unnecessary for the webscraper
async def process_response_type_async(response):
    if "json" in response.content_type: return await response.json()
    else: return await response.text()


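MAX_RETRIES = 3 # give up after this many failed attempts instead of retrying forever

def handle_errors(func):
    @wraps(func)
    async def wrapper(*args, **kwargs):
        for attempt in range(MAX_RETRIES):
            try:
                response, data = await func(*args, **kwargs)
                process_status_code(response.status)
                if response.status == 429:
                    await asyncio.sleep(2 ** attempt) # back off before trying again
                    continue
                return response, data
            except Exception as e:
                print(f"Oops! Something went wrong: \n{e}") 
        raise RuntimeError(f"Giving up after {MAX_RETRIES} attempts")
    return wrapper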

@handle_errors
async def send_get_request_async(endpoint_url, headers):  
    async with semaphore: 
        async with aiohttp.ClientSession(headers=headers) as session:
            response = await session.get(endpoint_url)
            data = await process_response_type_async(response)
            return response, data

@handle_errors
async def send_post_request_async(endpoint_url, headers, query, json = False):  
    async with semaphore: 
        async with aiohttp.ClientSession(headers=headers) as session:
            if json: response = await session.post(url=endpoint_url, json=query)
            else: response = await session.post(url=endpoint_url, data=query)
            data = await process_response_type_async(response)
            return response, data

The scraper.py:

from bs4 import BeautifulSoup
import networker as net
import asyncio


async def get_headers(url, headers):
    response, data = await net.send_get_request_async(url, headers = headers)
    soup = BeautifulSoup(data, 'html.parser')
    headings = soup.find_all(['h1', 'h2', 'h3'])
    for heading in headings:
        print(heading.text)

async def scrape_all_urls():
    tasks = []
    urls = ["https://bakerst221b.com", "https://bakerst221b.com/docs/toolkit/"]
    for url in urls:
        tasks.append(asyncio.create_task(get_headers(url, {})))
    await asyncio.gather(*tasks)


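# asyncio.run() creates and closes the event loop for us. Calling get_event_loop() without a running loop is deprecated.
# The module-level Semaphore in networker.py needs Python 3.10+ for this to work.
asyncio.run(scrape_all_urls())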

Port Scanner

import socket
from contextlib import closing


open_ports = []
def scan_ports_sync(ports):
    for port in ports:
        with closing(socket.socket(socket.AF_INET, socket.SOCK_STREAM)) as s:
            s.settimeout(1)
            result = s.connect_ex(("127.0.0.1", port)) # returns 0 when the connection succeeds
            if result == 0: open_ports.append(port)


scan_ports_sync(range(1, 1025)) # the well-known ports; adjust the range as needed
print(open_ports)
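The challenge list below also mentions detecting port scans. One simple heuristic, sketched here, flags a source that touches many distinct ports within a short time window. The event format (timestamp in seconds, source IP, destination port), the function name detect_port_scans and both default thresholds are assumptions made for illustration:

```python
from collections import defaultdict

def detect_port_scans(events, window_seconds=60, port_threshold=10):
    """events: iterable of (timestamp_seconds, source_ip, destination_port)."""
    by_source = defaultdict(list)
    for timestamp, source_ip, port in events:
        by_source[source_ip].append((timestamp, port))

    suspects = set()
    for source_ip, hits in by_source.items():
        hits.sort()
        start = 0
        ports_in_window = defaultdict(int)
        for timestamp, port in hits:
            ports_in_window[port] += 1
            # Shrink the window from the left until it spans at most window_seconds.
            while timestamp - hits[start][0] > window_seconds:
                old_port = hits[start][1]
                ports_in_window[old_port] -= 1
                if ports_in_window[old_port] == 0:
                    del ports_in_window[old_port]
                start += 1
            if len(ports_in_window) >= port_threshold:
                suspects.add(source_ip)
                break
    return suspects

fast_scan = [(i, "10.0.0.5", 1000 + i) for i in range(12)]
normal_client = [(i * 100, "10.0.0.9", 443) for i in range(12)]
print(detect_port_scans(fast_scan + normal_client))
```

A slow scan that spreads its probes over a longer period stays under this threshold, which is why real detectors usually combine several signals.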

Security Themed Coding Challenges

These security engineering challenges focus on text parsing and manipulation, basic data structures, and simple logic flows. Give the challenges a go. There is no need to finish them to completion, because all practice helps.

  • Cyphers / encryption algorithms
    • Implement a cypher which converts text to emoji or something.
    • Be able to implement basic cyphers.
  • Collect logs (of any kind) and write a parser which pulls out specific details:
    • domains - pattern = r'\b((?:(?:[a-zA-Z0-9]+[.])+[a-zA-Z]{2,}))+\b'
    • executable names, pattern = r'\b[a-zA-Z0-9_-]+[.](?:exe|elf|dll)\b'
    • timestamps
      • ^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d{6})?Z$
  • Web scrapers. Write a script to scrape information from a website.
  • Port scanners.
    • Write a port scanner
    • Detect port scanning
  • Botnets
  • Password bruteforcer
    • Generate credentials and store successful logins.
  • Scrape metadata from PDFs
    • Write a mini forensics tool to collect identifying information from PDF metadata.
  • Recover deleted items
    • Most software will keep deleted items for ~30 days for recovery. Find out where these are stored.
    • Write a script to pull these items from local databases.
  • Malware signatures
    • A program that looks for malware signatures in binaries and code samples.
    • Look at Yara rules for examples.

Put your work-in-progress scripts on GitHub and link to them on your resume/CV. Resist the urge to make your scripts perfect or complete before doing this.
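As a starting point for the log parser challenge, here is a minimal sketch that pulls domains, executable names and timestamps out of a single line. The log line format, the function name parse_log_line and the exact patterns are assumptions made for illustration. Real logs will need patterns tuned to their format:

```python
import re

DOMAIN_RE = re.compile(r"\b(?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}\b")
EXECUTABLE_RE = re.compile(r"\b[\w-]+\.(?:exe|elf|dll)\b", re.IGNORECASE)
TIMESTAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")

def parse_log_line(line):
    executables = EXECUTABLE_RE.findall(line)
    # File names such as "evil.exe" also look like domains, so filter them out.
    domains = [d for d in DOMAIN_RE.findall(line) if d not in executables]
    return {
        "timestamps": TIMESTAMP_RE.findall(line),
        "executables": executables,
        "domains": domains,
    }

sample = "2022-07-28T10:15:30Z host1 process=evil.exe connected to updates.badguy.com"
print(parse_log_line(sample))
```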

Theory

Networking

  • OSI Model
    • Application; layer 7 (and basically layers 5 & 6) (includes API, HTTP, etc).
    • Transport; layer 4 (TCP/UDP).
    • Network; layer 3 (Routing).
    • Datalink; layer 2 (Error checking and frame synchronisation).
    • Physical; layer 1 (Bits over fibre).
  • Firewalls
    • Rules to prevent incoming and outgoing connections.
  • NAT
    • Useful to understand IPv4 vs IPv6.
  • DNS https://2018.offzone.moscow/report/dns-rebinding-in-2k18/
    • (53)
    • Requests to DNS are usually UDP; the client retries over TCP when the response is truncated (TC flag set), and zone transfers use TCP. Look up in cache happens first. DNS exfiltration. Using raw IP addresses means no DNS logs, but there are HTTP logs. DNS sinkholes.
    • In a reverse DNS lookup, the PTR name might be 2.152.80.208.in-addr.arpa, which maps to 208.80.152.2. DNS names are resolved from the rightmost label to the left, which is why the IP address is written backwards in the PTR name.
  • DNS exfiltration
    • Sending data as subdomains.
    • 26856485f6476a567567c6576e678.badguy.com
    • Doesn’t show up in http logs.
  • DNS configs
    • Start of Authority (SOA).
    • IP addresses (A and AAAA).
    • SMTP mail exchangers (MX).
    • Name servers (NS).
    • Pointers for reverse DNS lookups (PTR).
    • Domain name aliases (CNAME).
  • ARP
    • Pair MAC address with IP Address for IP connections.
  • DHCP
    • UDP (67 - Server, 68 - Client)
    • Dynamic address allocation (allocated by router).
    • DHCPDISCOVER -> DHCPOFFER -> DHCPREQUEST -> DHCPACK
  • Multiplex
    • Timeshare, statistical share, just useful to know it exists.
  • Traceroute
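    • Usually uses UDP, but might also use ICMP Echo Request or TCP SYN. Relies on the TTL (hop limit in IPv6): probes are sent with TTL 1, 2, 3 and so on, and each router that decrements the TTL to 0 answers with ICMP Time Exceeded.
    • The default initial TTL for ordinary packets is 128 on Windows and 64 on *nix. The destination answers with ICMP Echo Reply (ICMP probes) or ICMP Port Unreachable (UDP probes).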
  • Nmap
    • Network scanning tool.
  • Intercepts (PitM - Person in the middle)
    • Understand PKI (public key infrastructure) in relation to this.
  • ❗️ VPN
    • Hide traffic from ISP but expose traffic to VPN provider.
  • Tor
    • Traffic is obvious on a network.
      • TOR-related IP
      • TOR-related DNS
      • Short sessions, numerous connections
      • Tor directory protocol (often on port 9030), Tor relay traffic (ORPort, often 9001 or 443), the local SOCKS proxy (port 9050) and the control port (port 9051). Detection of these specific protocols can indicate potential Tor usage.
      • TOR-certs
    • How do organised crime investigators find people on tor networks.
  • Proxy
    • Why 7 proxies won’t help you.
  • BGP
    • Border Gateway Protocol.
    • Holds the internet together.
  • Network traffic tools
    • Wireshark
    • Tcpdump
    • Burp suite
  • HTTP/S
    • (80, 443)
  • SSL/TLS
    • (443)
    • Super important to learn this, includes learning about handshakes, encryption, signing, certificate authorities, trust systems. A good primer on all these concepts and algorithms is made available by the Dutch cybersecurity center.
    • POODLE, BEAST, CRIME, BREACH, HEARTBLEED.
      • POODLE (Padding Oracle On Downgraded Legacy Encryption): POODLE is a vulnerability that affects the SSL 3.0 protocol, which is used for secure communication between a client (such as a web browser) and a server. It allows an attacker to exploit the protocol’s weakness in handling padding bytes to decrypt secure communications. By exploiting this vulnerability, an attacker can potentially gain access to sensitive information, such as login credentials or session cookies.
      • BEAST (Browser Exploit Against SSL/TLS): BEAST is a vulnerability that targets the encryption protocols SSL/TLS. It takes advantage of predictable initialization vectors in CBC-mode block ciphers in SSL 3.0 and TLS 1.0. By injecting chosen plaintext and observing the encrypted traffic, an attacker can recover parts of the secure traffic, such as session cookies.
      • CRIME (Compression Ratio Info-leak Made Easy): CRIME is a vulnerability that affects the compression algorithm used in SSL/TLS protocols. It allows an attacker to exploit the information leaked by the compression ratio of compressed data to recover sensitive information, such as session cookies. By injecting guesses into a request and observing the size of the compressed data, an attacker can infer the contents of the encrypted communication.
      • BREACH (Browser Reconnaissance and Exfiltration via Adaptive Compression of Hypertext): BREACH is a vulnerability that targets HTTP compression, specifically the use of the gzip compression algorithm. It allows an attacker to exploit the HTTP response size and the presence of user-input in the response to recover sensitive information, such as authentication tokens or CSRF tokens. By injecting specially crafted requests and analyzing the compressed responses, an attacker can gradually deduce the secrets used in the communication.
      • HEARTBLEED: HEARTBLEED is a critical vulnerability that affected the OpenSSL library, which is widely used to secure internet communications. It allowed an attacker to exploit a flaw in the OpenSSL heartbeat extension to retrieve sensitive information from the memory of a vulnerable server. This could include private keys, usernames, passwords, and other data transmitted over the encrypted connection.
  • TCP/UDP
    • Web traffic, chat, voip, traceroute.
    • TCP will throttle back if packets are lost but UDP doesn’t.
    • Streaming can slow network TCP connections sharing the same network.
  • ICMP
    • Ping and traceroute.
  • Mail
    • SMTP (25, 587, 465)
    • IMAP (143, 993)
    • POP3 (110, 995)
  • SSH
    • (22)
    • Handshake uses a key exchange (e.g. Diffie-Hellman) to agree on a symmetric session key; asymmetric host and user keys are used for authentication.
  • Telnet
    • (23, 992)
    • Allows remote communication with hosts.
  • ARP
    • Who is 0.0.0.0? Tell 0.0.0.1.
    • Linking IP address to MAC, Looks at cache first.
  • DHCP
    • (67, 68) (546, 547)
    • Dynamic (leases IP address, not persistent).
    • Automatic (leases IP address and remembers MAC and IP pairing in a table).
    • Manual (static IP set by administrator).
  • IRC
    • Understand use by hackers (botnets).
      • Command-and-Control (C&C) Channel: IRC provides a communication channel for botmasters (hackers) to remotely control their network of compromised computers, known as bots. The IRC server acts as a central point where botmasters can send commands to the bots and receive data or execute malicious operations.
      • Botnet Recruitment: Hackers use various techniques to infect computers and turn them into bots. Once a computer is compromised, it can be programmed to connect to a specific IRC server and join a specific channel, effectively becoming part of the botnet. The botnet can grow in size as more computers get infected and join the IRC network.
      • Command Distribution: The botmaster uses the IRC channel to distribute commands to the bots. These commands can include launching DDoS attacks, spreading malware, stealing data, or performing other malicious activities. The botnet receives commands from the botmaster and carries out the instructions, allowing the hacker to control a large number of compromised computers remotely.
      • Data Exfiltration: The botnet can be used to collect and exfiltrate sensitive data from the compromised computers. The IRC channel serves as a conduit for transmitting the stolen information back to the botmaster. This can include login credentials, financial data, personal information, or any other data that the hacker is interested in.
      • Updates and Maintenance: IRC can also be used for updating and maintaining the botnet. The botmaster can push software updates or patches to the bots, change the botnet’s behavior, or reconfigure its settings via the IRC channel.
  • FTP/SFTP
    • (21, 22)
  • RPC
    • Predefined set of tasks that remote clients can execute.
    • Used inside orgs.
  • Service ports
    • 0 - 1023: Reserved for common services - sudo required.
    • 1024 - 49151: Registered ports used for IANA-registered services.
    • 49152 - 65535: Dynamic ports that can be used for anything.
  • HTTP Header
    • | Verb | Path | HTTP version |
    • Domain
    • Accept
    • Accept-language
    • Accept-charset
    • Accept-encoding(compression type)
    • Connection- close or keep-alive
    • Referer (the header name is spelled this way)
    • Return address
    • Expected Size?
  • HTTP Response Header
    • HTTP version
    • Status Codes:
      • 1xx: Informational Response
      • 2xx: Successful
      • 3xx: Redirection
      • 4xx: Client Error
      • 5xx: Server Error
    • Type of data in response
    • Type of encoding
    • Language
    • Charset
  • UDP Header
    • Source port
    • Destination port
    • Length
    • Checksum
  • Broadcast domains and collision domains.
    • Collision domain - area in which all devices hear or see collisions occurring. Devices connected through hubs and repeaters share one collision domain, while each switch port is its own collision domain.
    • Broadcast domain - area in which all devices hear or see broadcast traffic (hubs and switches do not break up broadcast domains; routers do).
  • Root stores (root certs)
  • CAM table overflow
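The reverse DNS note in the list above (208.80.152.2 becoming 2.152.80.208.in-addr.arpa) can be reproduced with the standard ipaddress module. This is a small sketch, and the helper names are made up for illustration:

```python
import ipaddress

def reverse_dns_name(ip_string):
    # reverse_pointer builds the in-addr.arpa (IPv4) or ip6.arpa (IPv6) name.
    return ipaddress.ip_address(ip_string).reverse_pointer

def reverse_dns_name_manual(ipv4_string):
    # The same thing by hand for IPv4: reverse the octets and append the suffix.
    octets = ipv4_string.split(".")
    return ".".join(reversed(octets)) + ".in-addr.arpa"

print(reverse_dns_name("208.80.152.2"))         # 2.152.80.208.in-addr.arpa
print(reverse_dns_name_manual("208.80.152.2"))  # 2.152.80.208.in-addr.arpa
```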

Web Application

  • Same origin policy
  • ❗️ CORS
  • HSTS
  • Cert transparency, Can verify certificates against public logs
  • HTTP Public Key Pinning (HPKP). Deprecated by Google Chrome (like SSL pinning in iOS and Android, I suppose)
  • Cookies, httponly - cannot be accessed by javascript.
  • CSRF, Cross-Site Request Forgery
  • XSS
    • Reflected XSS.
    • Persistent XSS.
    • DOM based /client-side XSS.
    • <img src=""> will often load content from other websites, making a cross-origin HTTP request.
  • SQLi
  • Person-in-the-browser (flash / java applets) (malware).
  • Validation / sanitisation of webforms.
  • POST
  • GET. Queries, Visible from URL.
  • Directory traversal
    • Find directories on the server you’re not meant to be able to see.
    • There are tools that do this.
  • APIs
    • Think about what information they return.
    • And what can be sent.
  • BeEF hook. Get info about Chrome extensions.
  • User agents. Is this a legitimate browser? Or a botnet?
  • Browser extension take-overs
    • Miners, cred stealers, adware.
  • Local file inclusion
  • Remote file inclusion (not as common these days)
  • SSRF
  • Web vuln scanners.
  • SQLmap.
  • Malicious redirects.
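The same origin policy at the top of this list compares the scheme, host and port of two URLs. A minimal sketch of that comparison follows. The function names origin and is_same_origin are made up for illustration, and real browsers apply extra rules that are not covered here:

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

def origin(url):
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # Fall back to the scheme's default port when the URL does not name one.
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return scheme, host, port

def is_same_origin(url_a, url_b):
    return origin(url_a) == origin(url_b)

print(is_same_origin("https://example.com/a", "https://example.com:443/b"))  # True
print(is_same_origin("https://example.com", "https://api.example.com"))      # False
```

A request between two URLs for which this returns False is cross-origin, which is where CORS headers come into play.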

Infrastructure (Prod / Cloud) Virtualisation

  • Hypervisors. https://2018.offzone.moscow/report/wake-up-neo/
  • Hyperjacking.
  • ❗️Containers, VMs, clusters.
    • Escaping techniques.
      • Kernel bugs
      • Container misconfigurations
      • FS attacks
      • via processes
      • over the network
  • Network connections from VMs / containers.
  • Lateral movement and privilege escalation techniques.
    • Cloud Service Accounts can be used for lateral movement and privilege escalation in Cloud environments.
    • ❗️GCPloit tool for Google Cloud Projects.
  • Site isolation.
  • Side-channel attacks.
    • Spectre, Meltdown.
  • ❗️ BeyondCorp. BeyondCorp is a security model and approach developed by Google. Zero trust networking (ZTN).
    • Device trust
    • IAM, strong auth, RBAC
    • Context
    • Monitor for abnormalities
  • Log4j vuln - https://www.linkedin.com/pulse/poc-log4j-exploits-shell-access-mitigation-lasantha-nanayakkara/

OS Implementation and Systems

  • Privilege escalation techniques, and prevention.
    • vulnerabilities - patching
    • misconfigurations - least priv, proper config
    • password cracking - strong auth
    • social engineering - employee training
    • Examples of priv escalation
  • Buffer Overflows.
  • Directory traversal (prevention).
    • whitelisting
    • use absolute paths
    • limit permissions
    • access controls
    • WAF
  • Remote Code Execution / getting shells.
  • Local databases
    • Some messaging apps use sqlite for storing messages.
    • Useful for digital forensics, especially on phones.
  • Windows
    • ❗️Windows registry and group policy.
    • Active Directory (AD).
      • Bloodhound tool.
      • ❗️Kerberos authentication with AD.
    • Windows SMB.
    • Samba (with SMB).
    • Buffer Overflows.
    • ROP. - https://www.youtube.com/watch?v=muhqy8tm2nc
  • *nix
    • SELinux. SELinux works by enforcing security policies that define which processes can access which resources and with what privileges. These policies are based on a set of rules that specify the types of objects (such as files, directories, and network ports) and the types of actions (such as read, write, and execute) that are allowed or denied for each process on the system.
    • Kernel, userspace, permissions.
    • MAC vs DAC.
    • /proc
    • /tmp - code can be saved here and executed.
    • /etc/shadow
    • LDAP - Lightweight Directory Access Protocol. Lets users have one password for many services. This is similar to Active Directory in Windows.
  • MacOS
    • Gotofail error (SSL). Vulnerability in Apple OS - The error was caused by a duplicated “goto fail;” statement in the code that verified the server's signature during the SSL/TLS handshake, which made the code always jump past the final check and accept the connection even when the signature was not valid. Presumably Sandworm Team 🇷🇺 took advantage of it.
    • MacSweeper.
    • Research Mac vulnerabilities.
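
The directory traversal preventions listed above (absolute paths, limiting what can be reached) can be sketched roughly like this. It is a minimal illustration, not a complete defence: `safe_join` and the directory layout are made-up names for this example, and the check assumes all served files live under one base directory.

```python
import os


def safe_join(base_dir, user_path):
    """Resolve user_path under base_dir and reject anything that escapes it."""
    base = os.path.realpath(base_dir)
    # realpath collapses ".." segments, follows symlinks and returns an absolute path.
    candidate = os.path.realpath(os.path.join(base, user_path))
    # commonpath equals base only if candidate is base itself or lies below it.
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError(f"path escapes base directory: {user_path!r}")
    return candidate
```

An input such as `../../etc/passwd` or an absolute path like `/etc/passwd` resolves outside the base directory and raises `ValueError`, while `reports/q1.txt` resolves normally. File permissions and access controls are still needed on top of this.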

Mitigations

  • Patching
  • Data Execution Prevention
  • Address space layout randomisation
    • To make it harder for buffer overruns to execute privileged instructions at known addresses in memory.
  • Principle of least privilege
    • Eg running Internet Explorer with the Administrator SID disabled in the process token. Reduces the ability of buffer overrun exploits to run as elevated user.
  • Code signing
    • Requiring kernel mode code to be digitally signed.
  • Compiler security features
    • Use of compilers that trap buffer overruns.
  • Encryption
    • Of software and/or firmware components.
  • Mandatory Access Controls
    • (MACs)
    • Access Control Lists (ACLs)
    • Operating systems with Mandatory Access Controls - eg. SELinux.
  • “Insecure by exception”
    • When to allow people to do certain things for their job, and how to improve everything else. Don’t try to “fix” security, just improve it by 99%.
  • Do not blame the user
    • Security is about protecting people, we should build technology that people can trust, not constantly blame users.
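
As a small illustration of checking one of these mitigations, here is a sketch that reads the Linux ASLR setting. It assumes a Linux host exposing `/proc/sys/kernel/randomize_va_space`; the helper names are made up for this example, and the descriptions are my own short summaries of the three kernel values.

```python
from pathlib import Path

# Values of the Linux sysctl kernel.randomize_va_space.
ASLR_MODES = {
    "0": "disabled",
    "1": "conservative (stack, mmap base and VDSO randomised)",
    "2": "full (heap randomised as well)",
}


def describe_aslr(raw_value):
    """Translate the raw sysctl value into a readable description."""
    return ASLR_MODES.get(raw_value.strip(), "unknown")


def read_aslr_setting(path="/proc/sys/kernel/randomize_va_space"):
    """Read the setting from procfs; fall back gracefully on other systems."""
    try:
        return describe_aslr(Path(path).read_text())
    except OSError:
        return "unavailable (not Linux, or /proc is not readable)"
```

Calling `read_aslr_setting()` on a typical modern Linux distribution should report the full mode, but that depends on how the host is configured.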

Cryptography, Authentication, Identity

  • Encryption vs Encoding vs Hashing vs Obfuscation vs Signing
    • Be able to explain the differences between these things.
    • Various attack models (e.g. chosen-plaintext attack).
  • Encryption standards + implementations
    • RSA (asymmetrical).
    • AES (symmetrical).
    • ECC (namely ed25519) (asymmetric).
    • Chacha/Salsa (symmetric).
  • Asymmetric vs symmetric
    • Asymmetric is slow, but good for establishing a trusted connection.
    • Symmetric has a shared key and is faster. Protocols often use asymmetric to transfer symmetric key.
    • Perfect forward secrecy - eg Signal uses this.
  • Cyphers
  • Integrity and authenticity primitives
  • Entropy
    • PRNG (pseudo random number generators).
    • Entropy buffer draining.
    • Methods of filling entropy buffer.
  • Authentication
    • Certificates
      • What info do certs contain, how are they signed?
      • Look at DigiNotar.
    • Trusted Platform Module
      • (TPM)
      • Trusted storage for certs and auth data locally on device/host.
    • OAuth
      • Bearer tokens: these can be stolen and reused, just like cookies.
    • Auth Cookies
      • Client side.
    • Sessions
      • Server side.
    • Auth systems
      • SAMLv2.
      • OpenID.
      • Kerberos.
        • Golden & silver tickets.
        • Mimikatz.
        • Pass-the-hash.
    • Biometrics
      • Can’t rotate unlike passwords.
    • Password management
      • Rotating passwords (and why this is bad).
      • Different password lockers.
    • U2F / FIDO
      • Eg. Yubikeys.
      • Helps prevent successful phishing of credentials.
    • Compare and contrast multi-factor auth methods.
  • Identity
    • Access Control Lists (ACLs)
      • Control which authenticated users can access which resources.
    • Service accounts vs User accounts
      • Robot accounts or Service accounts are used for automation.
      • Service accounts should have heavily restricted privileges.
      • Understanding how Service accounts are used by attackers is important for understanding Cloud security.
        • Stolen Credentials: Attackers may attempt to steal or compromise the credentials associated with service accounts. This can be done through techniques like phishing, social engineering, or exploiting vulnerabilities in the cloud infrastructure. Once they have the credentials, attackers can impersonate the service account and perform unauthorized actions.
        • Misconfigured Permissions: If the permissions and access controls assigned to service accounts are misconfigured, attackers can take advantage of excessive privileges. They can abuse these accounts to access sensitive data, manipulate resources, or perform unauthorized actions within the cloud environment.
        • API Abuse: Cloud service providers offer APIs that allow programmatic access to cloud resources. Attackers can abuse service accounts to interact with these APIs, bypassing security controls and performing malicious actions such as data exfiltration, resource deletion, or unauthorized configuration changes.
        • Lateral Movement: Once an attacker gains access to a service account, they can use it as a launching point for lateral movement within the cloud environment. They may pivot to other systems or services, escalate privileges, and further compromise additional resources.
        • Service Account Impersonation: Attackers may attempt to impersonate legitimate service accounts to evade detection and gain unauthorized access to sensitive resources. By imitating a trusted service account, they can bypass security measures and perform malicious activities undetected.
    • Impersonation
      • Exported account keys.
      • ActAs, JWT (JSON Web Token) in Cloud.
    • Federated identity
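
To make the "Encoding vs Hashing vs Signing" distinction above concrete, here is a minimal sketch using only the standard library. The message and the `verify` helper are made up for this example. HMAC is a symmetric stand-in for signing: real digital signatures (RSA, Ed25519) use a private/public key pair, and those, like encryption, need a third-party library, so they are left out here.

```python
import base64
import hashlib
import hmac
import secrets

message = b"transfer 100 to alice"

# Encoding: reversible by anyone, no key involved, provides no secrecy.
encoded = base64.b64encode(message)
decoded = base64.b64decode(encoded)

# Hashing: fixed-size one-way digest. Anyone can recompute it, so on its own
# it detects accidental changes but does not prove who produced the message.
digest = hashlib.sha256(message).hexdigest()

# Keyed MAC: only holders of the shared key can produce a valid tag,
# which gives integrity and authenticity between those parties.
key = secrets.token_bytes(32)
tag = hmac.new(key, message, hashlib.sha256).digest()


def verify(key, message, tag):
    """Recompute the tag and compare in constant time."""
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)
```

`verify(key, message, tag)` succeeds, while a tampered message or a different key fails. Note also that `secrets` is used for the key rather than `random`, which is a PRNG not meant for security purposes.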

Malware & Reversing

  • Interesting malware
    • Conficker. A worm exploiting an RCE vulnerability in Windows. It spread fast, scanning for weaknesses such as weak passwords and jumping to and from removable media. https://github.com/itaymigdal/malware-analysis-writeups/blob/main/Conficker/Conficker.md. Uses random domain name generation.
    • Morris worm.
    • Zeus malware. Zeus operated by capturing keystrokes, intercepting web traffic, and monitoring the user’s activities, particularly when they accessed banking websites. It would stealthily collect login credentials, account numbers, credit card details, and other sensitive information, sending it back to the command-and-control servers operated by the cybercriminals.
    • Stuxnet. State-sponsored, targeted nuclear centrifuges to sabotage the Iranian nuclear program.
    • WannaCry. 💸 ransomware + worm. Exploited the EternalBlue vulnerability in Windows. The attack was widely believed to have originated from North Korea, specifically the Lazarus Group, a cybercriminal organization associated with the country. However, definitive attribution remains challenging in the realm of cyber-attacks. The Shadow Brokers leaked the EternalBlue exploit from the NSA.
    • Petya (targeted 🇺🇦), 💸 also known as NotPetya or GoldenEye. Another ransomware. Once executed, it would encrypt the master boot record (MBR) of the infected system, rendering it unbootable and preventing access to critical files and data. NotPetya exploited the same EternalBlue vulnerability as WannaCry, leveraging a tool allegedly developed by the U.S. National Security Agency (NSA) and leaked by the Shadow Brokers hacking group.
    • CookieMiner. Targeted macOS users, specifically those who were active in the cryptocurrency space. It employed a combination of keylogging, clipboard monitoring, and cookie theft to obtain users’ login credentials, wallet information, and other valuable data. By gaining access to this information, the malware aimed to gain control over users’ cryptocurrency accounts and assets. The name is a play on words combining “cookie” (theft) and “cryptocurrency mining”: it also set up a hidden cryptocurrency mining operation on infected systems.
    • Sunburst. SolarWinds supply chain attack. The attackers had managed to inject a backdoor into SolarWinds’ software updates, allowing them to gain unauthorized access to the systems of SolarWinds’ customers. 🇷🇺?
  • Malware features
    • Various methods of getting remote code execution.
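    • Domain-flux. By constantly shifting between many different (often algorithmically generated) domain names, the malware can maintain communication with its command-and-control servers while evading detection and domain blocklists.
    • Fast-Flux. Here the underlying DNS records (IP addresses) associated with a single domain are rapidly changed, so the same domain keeps resolving to different hosts.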
    • Covert C2 channels.
    • Evasion techniques (e.g. anti-sandbox).
    • Process hollowing.
    • Mutexes.
    • Multi-vector and polymorphic attacks.
    • RAT (remote access trojan) features: keylogging, screen capture, file transfer, remote shell access, webcam and microphone control, and other capabilities that allow an attacker to remotely manipulate and gather information from the compromised system. RATs are often used for espionage, data theft, or as a foothold for launching further attacks.
  • Decompiling/ reversing
    • Obfuscation of code, unique strings (you can use for identifying code).
    • IDA Pro, Ghidra.
  • Static / dynamic analysis
    • Describe the differences.
    • VirusTotal.
    • Reverse.it.
    • Hybrid Analysis.
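
The note above about unique strings being useful for identifying code is easy to try in static analysis. Below is a rough sketch of what the `strings` utility does, limited to printable ASCII. The function name and the sample bytes are invented for this example.

```python
import re


def extract_strings(data, min_len=4):
    """Return runs of printable ASCII characters at least min_len long."""
    # \x20-\x7e covers space through tilde, i.e. printable ASCII.
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [match.decode("ascii") for match in re.findall(pattern, data)]


# Hypothetical binary fragment: a header, a message and a URL between raw bytes.
sample = b"\x00\x01MZ\x90\x00This program cannot be run\x00ab\x00http://example.test/update\xff\xfe"
found = extract_strings(sample)
```

Short runs such as `MZ` and `ab` fall below the minimum length and are dropped. Strings recovered this way (URLs, mutex names, error messages) are typical candidates for signatures, although obfuscated or packed samples will hide most of them.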

Exploits

  • Low-level
    • use-after-free
    • buffer overflow
  • Three ways to attack - Social, Physical, Network
    • Social
      • Ask the person for access, phishing.
      • Cognitive biases - look at how these are exploited.
        • Authority bias: People tend to trust and comply with figures of authority. Cyber attackers may impersonate authoritative figures, such as IT administrators, security professionals, or company executives, to deceive individuals into divulging sensitive information or performing actions that compromise security.
        • Confirmation bias: Individuals have a tendency to seek information that confirms their existing beliefs or assumptions. Attackers can exploit this bias by crafting phishing emails or social engineering tactics that align with the recipient’s preconceived notions, making them more likely to fall for the deception.
        • Urgency bias: Humans have a natural inclination to prioritize immediate concerns over long-term risks. Attackers exploit this bias by creating a sense of urgency, such as threatening the immediate closure of an account or the loss of important data, to prompt hasty actions without proper consideration of potential security risks.
        • Social proof bias: People often look to others for guidance in uncertain situations. Attackers can exploit this bias by creating a sense of social validation or urgency through fake testimonials, reviews, or user interactions to manipulate individuals into engaging in risky behavior or disclosing sensitive information.
        • Scarcity bias: Individuals tend to place greater value on scarce resources. Attackers may exploit this bias by creating a sense of limited availability, exclusive offers, or time-limited opportunities to manipulate individuals into taking actions that compromise their security, such as clicking on malicious links or providing personal information.
        • Anchoring bias: People tend to rely heavily on the first piece of information they receive when making subsequent judgments or decisions. Attackers can exploit this bias by presenting a misleading or manipulated initial piece of information, which then influences the victim’s perception and subsequent actions.
        • Recency bias: Individuals often give more weight to recent events or information when making decisions. Attackers can leverage this bias by timing their attacks to coincide with significant events, news, or trends to exploit heightened emotions or increased attention, increasing the likelihood of successful manipulation.
      • Spear phishing.
      • Water holing. A Watering Hole attack is a sophisticated and targeted cyberattack strategy that focuses on compromising specific groups or individuals by infecting websites they frequently visit. It involves identifying websites that are likely to be visited by the intended targets and then injecting malicious code into those websites.
      • Baiting (dropping CDs or USB drives and hoping people use them).
      • Tailgating.
    • Physical
      • Get hard drive access, will it be encrypted?
      • Boot from linux.
      • Brute force password.
      • Keyloggers.
      • Frequency jamming (bluetooth/wifi).
      • Covert listening devices.
      • Hidden cameras.
      • Disk encryption.
      • Trusted Platform Module. About Apple’s Secure Enclave and Keychain: https://bakerst221b.com/docs/dfir/investigation/artefacts/accounts/apple-auth/. KMS in the Cloud.
      • Spying via unintentional radio or electrical signals, sounds, and vibrations (TEMPEST - NSA).
    • Network
      • Nmap.
      • Find CVEs for any services running.
      • Interception attacks.
      • Getting unsecured info over the network.
  • Exploit Kits and drive-by download attacks
  • Remote Control
    • Remote code execution (RCE) and privilege.
    • Bind shell (opens port and waits for attacker).
    • Reverse shell (connects to port on attacker’s C2 server).
  • Spoofing
    • Email spoofing.
    • IP address spoofing.
    • MAC spoofing.
    • Biometric spoofing.
    • ARP spoofing.
  • Tools
    • Metasploit.
    • ExploitDB.
    • Shodan - Google but for devices/servers connected to the internet.
    • Google the version number of anything to look for exploits.
    • Hak5 tools.
  • Recent vulnerabilities, attacks, 0-days
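
For the ARP spoofing item in the list above, the defender's view can be sketched as a consistency check over observed IP-to-MAC pairs. This is only a heuristic under my own assumptions: the function name and the input format (pairs taken from an ARP table or capture) are made up for this example, and multi-homed hosts or proxy ARP will produce false positives.

```python
from collections import defaultdict


def find_arp_conflicts(entries):
    """entries: iterable of (ip, mac) observations. Returns suspicious mappings."""
    macs_by_ip = defaultdict(set)
    ips_by_mac = defaultdict(set)
    for ip, mac in entries:
        mac = mac.lower()  # normalise case so the same MAC is not counted twice
        macs_by_ip[ip].add(mac)
        ips_by_mac[mac].add(ip)
    return {
        # One IP answering with several MACs: someone may be impersonating it.
        "ip_with_many_macs": {ip: sorted(m) for ip, m in macs_by_ip.items() if len(m) > 1},
        # One MAC claiming several IPs: typical of a host poisoning ARP caches.
        "mac_with_many_ips": {mac: sorted(i) for mac, i in ips_by_mac.items() if len(i) > 1},
    }
```

If the gateway address suddenly shows up with a second MAC that also belongs to another host on the LAN, both dictionaries will contain an entry pointing at the same MAC.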

Attack Structure

Practice describing security concepts in the context of an attack. These categories are a rough guide on attack structure for a targeted attack. Non-targeted attacks tend to be a bit more “all-in-one”.

  • Reconnaissance
    • OSINT, Google dorking, Shodan.
  • Resource development
    • Get infrastructure (via compromise or otherwise).
    • Build malware.
    • Compromise accounts.
  • Initial access
    • Phishing.
    • Hardware placements.
    • Supply chain compromise.
    • Exploit public-facing apps.
  • Execution
    • Shells & interpreters (powershell, python, javascript, etc.).
    • Scheduled tasks, Windows Management Instrumentation (WMI).
  • Persistence
    • Additional accounts/creds.
    • Start-up/log-on/boot scripts, modify launch agents, DLL side-loading, Webshells.
    • Scheduled tasks.
  • Privilege escalation
    • Sudo, token/key theft, IAM/group policy modification.
    • Many persistence exploits are PrivEsc methods too.
  • Defense evasion
    • Disable detection software & logging.
    • Revert VM/Cloud instances.
    • Process hollowing/injection, bootkits.
  • Credential access
    • Brute force, access password managers, keylogging.
    • /etc/passwd & /etc/shadow.
    • Windows DCSync, Kerberos Golden & Silver tickets.
    • Clear-text creds in files/pastebin, etc.
  • Discovery
    • Network scanning.
    • Find accounts by listing policies.
    • Find remote systems, software and system info, VM/sandbox.
  • Lateral movement
    • SSH/RDP/SMB.
    • Compromise shared content, internal spear phishing.
    • Pass the hash/ticket, tokens, cookies.
  • Collection
    • Database dumps.
    • Audio/video/screen capture, keylogging.
    • Internal documentation, network shared drives, internal traffic interception.
  • Exfiltration
    • Removable media/USB, Bluetooth exfil.
    • C2 channels, DNS exfil, web services like code repos & Cloud backup storage.
    • Scheduled transfers.
  • Command and control
    • Web service (dead drop resolvers, one-way/bi-directional traffic), encrypted channels.
    • Removable media.
    • Steganography, encoded commands.
  • Impact
    • Deleted accounts or data, encrypt data (like ransomware).
    • Defacement.
    • Denial of service, shutdown/reboot systems.

Threat Modeling

  • Threat Matrix
  • Trust Boundaries
  • Security Controls
  • STRIDE framework
    • Spoofing
    • Tampering
    • Repudiation
    • Information disclosure
    • Denial of service
    • Elevation of privilege
  • MITRE ATT&CK framework
  • Excellent talk on “Defense Against the Dark Arts” by Lilly Ryan (contains many Harry Potter spoilers)

Detection

  • IDS
    • Intrusion Detection System (signature based (eg. snort) or behaviour based).
    • Snort/Suricata/YARA rule writing
    • Host-based Intrusion Detection System (eg. OSSEC)
  • SIEM
    • Security Information and Event Management.
  • IOC
    • Indicator of compromise (often shared amongst orgs/groups).
    • Specific details (e.g. IP addresses, hashes, domains)
  • Things that create signals
    • Honeypots, snort.
  • Things that triage signals
    • SIEM, eg splunk.
  • Things that will alert a human
    • Automatic triage of collated logs, machine learning.
    • Notifications and analyst fatigue.
    • Systems that make it easy to decide whether an alert is an actual hack or not.
  • Signatures
    • Host-based signatures
      • Eg changes to the registry, files created or modified.
      • Strings found in malware samples appearing in binaries installed on hosts (/Antivirus).
    • Network signatures
      • Eg checking DNS records for attempts to contact C2 (command and control) servers.
  • Anomaly / Behaviour based detection
    • IDS learns a model of “normal” behaviour, then can detect things that deviate too far from normal - eg unusual URLs being accessed, user-specific login times / usual work hours, normal files accessed.
    • Can also look for things that a hacker might specifically do (eg, HISTFILE commands, accessing /proc).
    • If someone is inside the network and an action could be suspicious, increase log verbosity for that user.
  • Firewall rules
    • Brute force (trying to log in with a lot of failures).
    • Detecting port scanning (could look for TCP SYN packets that are never followed by the final ACK of the handshake, i.e. half-open connections).
    • Antivirus software notifications.
    • Large amounts of upload traffic.
  • Honey pots
    • Canary tokens.
    • Dummy internal service / web server, can check traffic, see what attacker tries.
  • Things to know about attackers
    • Slow attacks are harder to detect.
    • Attacker can spoof packets that look like other types of attacks, deliberately create a lot of noise.
    • Attacker can spoof IP address sending packets, but can check TTL of packets and TTL of reverse lookup to find spoofed addresses.
    • Correlating IPs with physical location (is difficult and inaccurate often).
  • Logs to look at
    • DNS queries to suspicious domains.
    • HTTP headers could contain wonky information.
    • Metadata of files (eg. author of file) (more forensics?).
    • Traffic volume.
    • Traffic patterns.
    • Execution logs.
  • Detection related tools
    • Splunk.
    • Arcsight.
    • Qradar.
    • Darktrace.
    • Tcpdump.
    • Wireshark.
    • Zeek.
  • A curated list of awesome threat detection resources
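
The brute-force item above (many failed logins) can be sketched as a sliding-window counter per source. The thresholds, function name and event format are assumptions for this illustration. A real rule would live in the SIEM or IDS rather than in a script.

```python
from collections import defaultdict, deque


def detect_bruteforce(events, max_failures=5, window_seconds=60):
    """events: (timestamp_seconds, source_ip, success) tuples sorted by time.

    Returns the set of sources with more than max_failures failed logins
    inside any window of window_seconds.
    """
    recent_failures = defaultdict(deque)
    flagged = set()
    for ts, src, success in events:
        if success:
            continue
        window = recent_failures[src]
        window.append(ts)
        # Drop failures that have fallen out of the sliding window.
        while window and ts - window[0] > window_seconds:
            window.popleft()
        if len(window) > max_failures:
            flagged.add(src)
    return flagged
```

This also shows why slow attacks are harder to detect, as noted above: a source that spaces its attempts further apart than the window never accumulates enough failures to be flagged.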

Digital Forensics

  • Evidence volatility (network vs memory vs disk)
  • Network forensics
    • DNS logs / passive DNS. Passive DNS, in the context of network forensics, refers to the collection and analysis of historical DNS (Domain Name System) data without actively querying the DNS infrastructure in real-time. It involves capturing and storing DNS resolution information from network traffic or DNS server logs for later analysis.
    • Netflow. NetFlow, in the context of network forensics, is a network protocol developed by Cisco Systems that provides visibility into network traffic and facilitates the analysis of network behavior. It collects and records network traffic flow information, such as source and destination IP addresses, ports, protocols, timestamps, and byte counts, among other data. NetFlow works by sampling and exporting network flow data from routers, switches, or other network devices. When enabled, these devices generate NetFlow records that capture information about individual network flows. A network flow represents a unidirectional stream of packets with common characteristics, such as the same source and destination IP addresses and ports. NetFlow records are then collected and stored by a NetFlow collector for subsequent analysis.
    • Sampling rate. In the context of network forensics, the sampling rate refers to the frequency at which network traffic is captured and analyzed. It determines the number of packets or flows that are examined and recorded for further analysis.
  • Disk forensics
    • Disk imaging
    • Filesystems (NTFS / ext2/3/4 / APFS)
    • Logs (Windows event logs, Unix system logs, application logs)
    • Data recovery (carving)
    • Tools
      • plaso / log2timeline
        • log2timeline: This is the primary tool that performs the timeline generation process. It takes input data sources, such as log files, Windows Event Logs, web browser artifacts, and more, and processes them to extract relevant events and their associated timestamps. The tool supports a wide range of data sources and can process them in a parallel and efficient manner.
        • plaso-parser: Plaso includes a collection of parsers responsible for extracting events from specific artifacts or log file formats. These parsers are used by log2timeline to process the input data sources. Plaso provides a large number of built-in parsers for various operating systems, applications, and file formats, including Windows, macOS, Linux, web browsers, system logs, and more.
        • psort: Once the timeline is generated, psort (Plaso’s sorting and filtering tool) can be used to sort, filter, and analyze the events based on specific criteria. It allows investigators to narrow down their focus and extract relevant information from the timeline. psort supports advanced filtering options, time range selection, and output formatting.
        • pinfo: pinfo is a tool provided by Plaso for inspecting a Plaso storage file. It reports metadata about the processing run, such as when and how the extraction was performed, which parsers were used, and how many events were extracted.
      • FTK Imager
      • EnCase
  • Memory forensics
    • Memory acquisition (footprint - part, smear - all, hiberfiles)
    • Virtual vs physical memory
    • Life of an executable
    • Memory structures
    • Kernel space vs user space
    • Tools
      • Volatility
      • Google Rapid Response (GRR) / Rekall
      • WinDbg
  • Mobile forensics
    • Jailbreaking devices, implications
    • Differences between mobile and computer forensics
    • Android vs. iPhone
  • Anti forensics
    • How does malware try to hide?
    • Timestomping
      • Modifying the file system’s metadata: Attackers can directly modify the timestamps stored in the file system’s metadata, such as the Master File Table (MFT) in NTFS file systems on Windows.
      • Manipulating system calls: By intercepting or hooking into system calls responsible for retrieving or updating file timestamps, attackers can modify the data returned to the requesting process.
      • Using specialized tools: There are specialized tools available that automate timestomping techniques and make it easier for attackers to manipulate timestamps.
  • Chain of custody
    • Handover notes
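
The NetFlow and sampling-rate notes in the network forensics part above can be sketched as follows: packets are grouped into unidirectional flows keyed by source/destination address, port and protocol, optionally keeping only every N-th packet. This is a simplified model under my own assumptions. The `Packet` type and `aggregate_flows` are invented for the example, and real exporters also handle flow timeouts and many more fields.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    ts: float
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str
    size: int


def aggregate_flows(packets, sample_every=1):
    """Group packets into unidirectional flows; keep 1 in sample_every packets."""
    flows = {}
    for index, pkt in enumerate(packets):
        if index % sample_every != 0:
            continue  # packet skipped by sampling
        key = (pkt.src_ip, pkt.src_port, pkt.dst_ip, pkt.dst_port, pkt.proto)
        flow = flows.setdefault(
            key, {"first_seen": pkt.ts, "last_seen": pkt.ts, "packets": 0, "bytes": 0}
        )
        flow["last_seen"] = pkt.ts
        flow["packets"] += 1
        flow["bytes"] += pkt.size
    return flows
```

Because flows are unidirectional, a request and its reply end up as two separate records. With `sample_every` greater than 1 the counters undercount the real traffic, which is the trade-off behind the sampling rate.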

Incident Management

References
