🗒 Incident Response Methodology


Preparation

  • 📝 Compile a list of all your assets (servers, networks, applications, etc.).
  • 📈 Rank them by level of importance.
  • 👁 Monitor their traffic patterns and create baselines.
  • 🫂 Create a communication plan (who to contact, how, and when, based on each incident type). Prepare incident handler communications and facilities: contacts, on-call info, reporting mechanism, issue tracking, smartphones, encryption software, war room, secure storage.
  • 🚧 Determine which security events should be investigated, and at what thresholds.
  • 📖 Create an incident response plan for each type of incident.
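A baseline and an investigation threshold, as described in the list above, can be sketched as follows. This is only a minimal illustration: the metric (requests per minute), the helper names `build_baseline` and `is_anomalous`, and the three-standard-deviation threshold are assumptions chosen for the example, not a recommended standard.

```python
from statistics import mean, stdev

def build_baseline(samples):
    """Summarise historical measurements as (mean, standard deviation)."""
    return mean(samples), stdev(samples)

def is_anomalous(value, baseline, k=3.0):
    """Flag a value that deviates from the baseline mean by more than k standard deviations."""
    avg, sd = baseline
    return abs(value - avg) > k * sd

# Hypothetical history: requests per minute seen on one server.
history = [100, 104, 98, 101, 97, 103, 99, 102]
baseline = build_baseline(history)

print(is_anomalous(105, baseline))  # False: within normal variation
print(is_anomalous(400, baseline))  # True: worth investigating
```

In practice the threshold would be tuned per asset, with stricter values for the assets ranked as most important.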

Below are some questions to help assess your readiness:

  • ❓ Are all members aware of the security policies?
  • ❓ Do all members of the IR team know whom to contact?
  • ❓ Do all of them have access to journals and toolkit?
  • ❓ Have they all participated in IR drills?

Creating a playbook

Detection and Analysis (SANS Identification)

Many systems can be used to monitor the network and hosts for malicious activity: IDS, IPS, firewalls, AV, SIEM (Security Information and Event Management, i.e. SIM + SEM), IR frameworks, threat intelligence, etc. However, an incident might also be reported by an employee (for example, after receiving a suspicious email) or by a third party (for example, exfiltrated data seen on the dark net). Until it is confirmed to be real or the result of malicious intent, it’s called an event.

  1. Discover. Discovery and classification (Where is the sensitive data?), Entitlements reporting (Who can access), Vulnerability Assessment (How to secure it?).
  2. Harden. Reconfigure, mask, encrypt (How to protect sensitive data?)
  3. Monitor and Protect. Activity monitoring, blocking, quarantine, dynamic data masking.
  4. Repeat.

Precursors ❓ - signs that an incident may occur in the future. For example, logs show that a vulnerability scanner was used, or a vulnerability was found in software that the company uses.

Indicators 🚨 - signs that an incident has already occurred or is occurring now: malware detected by an antivirus program, changed configurations, multiple failed logins, etc.

There are several techniques that help confirm an incident:

  • Frequency analysis. Something that is outstandingly rare across systems might be malicious.
  • Normal vs Evil. You know what’s considered normal for a given system or user, so anything unusual is a primary suspect for further analysis.
  • IoC. Indicators of compromise, such as known malware hashes or file extensions used by known cryptor malware.
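Two of the techniques above, frequency analysis and IoC matching, can be sketched in a few lines. This is a rough illustration under stated assumptions: the input shape (pairs of host and process name), the helper names `rare_items` and `matches_ioc`, and the sample data are all hypothetical.

```python
import hashlib

def rare_items(observations, max_hosts=1):
    """Frequency analysis: return items seen on at most max_hosts distinct hosts.

    observations is an iterable of (host, item) pairs.
    """
    hosts_per_item = {}
    for host, item in observations:
        hosts_per_item.setdefault(item, set()).add(host)
    return sorted(item for item, hosts in hosts_per_item.items()
                  if len(hosts) <= max_hosts)

def matches_ioc(data, known_bad_hashes):
    """IoC check: is the SHA-256 of the data in a set of known-bad hashes?"""
    return hashlib.sha256(data).hexdigest() in known_bad_hashes

# Hypothetical process inventory collected from three hosts.
observations = [
    ("pc1", "svchost.exe"), ("pc2", "svchost.exe"), ("pc3", "svchost.exe"),
    ("pc2", "svch0st.exe"),
]
print(rare_items(observations))  # ['svch0st.exe']

# Stand-in for a threat-intelligence feed of file hashes.
known_bad = {hashlib.sha256(b"malicious payload").hexdigest()}
print(matches_ioc(b"malicious payload", known_bad))  # True
```

A rare item is only a lead, not proof: it still has to be compared against what is normal for that host.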

For AWS - https://docs.aws.amazon.com/whitepapers/latest/aws-security-incident-response-guide/visibility-and-alerting.html

After the incident is confirmed, you initiate an investigation.

📝 I found it quite hard to separate incident response and digital forensics articles from one another. For now I see incident response as the more general process, which may require a more thorough examination (digital forensics), but not necessarily. As I see it, we have an event and start the IR process to determine whether it’s malicious, identify compromised hosts, and map the artifacts to the MITRE ATT&CK framework. Then, to fully contain and remediate the incident, we call for DF: analysing malware, analysing the registry on Windows, recovering data from unallocated space, etc., in order to reconstruct the whole picture in detail. So, roughly speaking, IR is about sketching and DF is about details.

Common Malicious Activity, Detection and Countermeasures

Mapping. Attackers need to collect information about the target system first: ping, nmap. Countermeasures:

  • record traffic
  • look for suspicious activity
  • use a host scanner and keep track of all the PCs within your network. A new PC should trigger an alert.

Packet Sniffing. Next, the attacker may perform (or try to perform) traffic sniffing. It is possible in the following cases: broadcast media, a NIC in promiscuous mode, and unencrypted traffic. Countermeasures:

  • periodically check whether the NICs of the hosts are in promiscuous mode
  • use switched Ethernet (1 host per segment of broadcast media).
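The periodic promiscuous-mode check can be sketched for Linux hosts as below. It assumes the interface flags are exposed as a hexadecimal string in `/sys/class/net/<iface>/flags` and uses the `IFF_PROMISC` bit (0x100) from the Linux headers. The helper names are hypothetical, and the flag may not reflect every way a NIC can be put into promiscuous mode, so treat a clean result as a hint rather than proof.

```python
from pathlib import Path

IFF_PROMISC = 0x100  # promiscuous-mode bit in the Linux interface flags

def is_promiscuous(flags_text):
    """Parse a sysfs flags value such as '0x1103' and test the promiscuous bit."""
    return bool(int(flags_text.strip(), 16) & IFF_PROMISC)

def promiscuous_interfaces(sysfs_root="/sys/class/net"):
    """Return the names of interfaces whose flags have IFF_PROMISC set."""
    root = Path(sysfs_root)
    if not root.is_dir():
        return []  # not a Linux host, or sysfs is not mounted
    found = []
    for iface in sorted(root.iterdir()):
        flags_file = iface / "flags"
        if flags_file.is_file() and is_promiscuous(flags_file.read_text()):
            found.append(iface.name)
    return found

print(promiscuous_interfaces())
```

Run on a schedule, a non-empty result for a host that is not a known sensor is a reason to investigate.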

IP-spoofing. An attacker can generate “raw” IP packets, putting any value as the source IP. The receiver can’t tell that the source is spoofed. Countermeasures:

DoS. Countermeasures:

  • filter out flooded packets (e.g. SYN) before they reach the host
  • trace back to the source of the floods

🎣 Phishing. Spear phishing - a targeted attack on specific users or groups of users. Whaling - spear phishing aimed at senior executives. Strategies:

  • They claim that they’ve seen numerous failed login attempts or other suspicious activity.
  • They claim that there is some problem with your account or payment method.
  • They ask you to confirm some personal information.
  • They include a fake invoice.
  • They ask you to click on a link to make a payment.
  • They say that you are eligible for a government refund.
  • They offer a coupon for free stuff.

How to detect?

  • Suspicious sender’s address.
  • Generic signature and greetings.
  • Spoofed hyperlinks and websites.
  • Spelling and layout.
  • Suspicious attachment.
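One of the signs above, spoofed hyperlinks, lends itself to a simple automated check: flag links whose visible text looks like one domain while the `href` points to another. The sketch below uses only the Python standard library. The class and function names and the sample HTML are hypothetical, and the heuristic is deliberately naive (it ignores redirects, URL shorteners and look-alike characters).

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a> element."""
    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

def host_of(text):
    """Return the hostname if text looks like a URL or bare domain, else None."""
    if not text or any(ch.isspace() for ch in text):
        return None
    candidate = text if "://" in text else "//" + text
    try:
        host = urlparse(candidate).hostname
    except ValueError:
        return None
    return host.lower() if host and "." in host else None

def spoofed_links(html):
    """Return (visible text, href) pairs where the two point to different hosts."""
    parser = LinkCollector()
    parser.feed(html)
    suspicious = []
    for href, text in parser.links:
        shown, actual = host_of(text), host_of(href)
        if shown and actual and shown != actual:
            suspicious.append((text, href))
    return suspicious

SAMPLE = ('<p><a href="http://evil.example.net/login">www.mybank.example.com</a> '
          '<a href="https://docs.example.org/">Read more</a></p>')
print(spoofed_links(SAMPLE))
```

Only the first link in the sample is reported, because its visible text names a different host than its target.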

PoS. Most PoS malware is equipped with a backdoor and C&C features. The cardholder data is decrypted in RAM, which is why this type of malware sniffs the RAM (RAM scraping). Countermeasures:

  • Monitor the network for changes
  • Use good encryption
  • Limit the hosts that communicate with the PoS
  • Use chip-card enabled PoS terminals
  • Employee screening and training

Injections. Payload examples by target:

  • MongoDB. $where is interpreted as JS, which allows a DoS: $where="d=new Date; do {c=new Date;} while (c-d<10000);"
  • XPath. 'or 1=1 or '1'='1
  • LDAP. *)(cn=))(|cn=*
  • SQL. or 1=1--

The best practice is whitelisting the allowed and needed characters.

  • Preventing OS injection
    • Don’t use OS commands
    • Sanitise input
    • Accept identifiers (say, IDs) and look up the actual values
    • Least privilege
    • Do not run commands through shell interpreters
    • Use explicit paths
  • Preventing SQLi
    • Prepared statements
    • Sanitise input
    • Do not expose errors containing SQL statements, etc.
    • Limit permissions
    • Stored procedures
    • ORM libs (Hibernate)
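Two of the items above, prepared statements and avoiding shell interpreters, can be illustrated with the Python standard library. This is a minimal sketch: the `users` table, the function names and the payloads are made up for the example, and `sqlite3` stands in for whatever database is actually in use.

```python
import sqlite3
import subprocess
import sys

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

def find_user_unsafe(name):
    # Vulnerable: the input is concatenated into the SQL text.
    query = "SELECT name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()

def find_user(name):
    # Parameterised: the driver passes the value as data, never as SQL.
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()

def echo_argument(user_input):
    # No shell involved: the input is a single argv entry for an explicit
    # executable path, so ';' or '&&' in it are never interpreted.
    child_code = "import sys; print(sys.argv[1])"
    result = subprocess.run([sys.executable, "-c", child_code, user_input],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()

payload = "' OR 1=1--"
print(find_user_unsafe(payload))  # every row is returned
print(find_user(payload))         # [] because no user has that literal name
print(echo_argument("hello; echo injected"))
```

The same payload that dumps the table through string concatenation is treated as an ordinary, non-matching name once a placeholder is used.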

Common AWS issues. The following list is the top ten misconfigurations identified in Amazon EC2 instances and workloads from our data set:

  1. Unconfigured EC2 instance single-point-of-failure and/or auto scaling issue
  2. S3 logging not enabled
  3. S3 object versioning is not enabled
  4. User not configured to use MFA
  5. User access key not configured with rotation
  6. IAM policies are attached directly to user
  7. Dangerous user privileged access to S3
  8. ELB security group allows insecure access to ports or protocols
  9. IAM access keys unused for 90 days
  10. Dangerous user privileged access to RDS
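Some of the items above (4 and 9) can be checked offline from an IAM credential report, which AWS provides as a CSV file. The sketch below assumes the report columns `user`, `password_enabled`, `mfa_active`, `access_key_1_active` and `access_key_1_last_used_date`, with ISO 8601 timestamps and `N/A` for a key that was never used. The function name and the sample rows are hypothetical, and only the first access key is examined to keep the example short.

```python
import csv
import io
from datetime import datetime, timedelta, timezone

def audit_credential_report(report_csv, now, max_idle_days=90):
    """Return (user, finding) pairs for missing MFA and idle access keys."""
    findings = []
    for row in csv.DictReader(io.StringIO(report_csv)):
        user = row["user"]
        # Item 4: a user who can sign in with a password but has no MFA.
        if row.get("password_enabled") == "true" and row.get("mfa_active") != "true":
            findings.append((user, "console user without MFA"))
        # Item 9: an active access key that has not been used recently.
        if row.get("access_key_1_active") == "true":
            last_used = row.get("access_key_1_last_used_date") or "N/A"
            if last_used == "N/A":
                findings.append((user, "active access key never used"))
            elif now - datetime.fromisoformat(last_used) > timedelta(days=max_idle_days):
                findings.append((user, f"access key unused for more than {max_idle_days} days"))
    return findings

# Made-up sample rows in the assumed report layout.
SAMPLE_REPORT = """\
user,password_enabled,mfa_active,access_key_1_active,access_key_1_last_used_date
alice,true,true,true,2024-05-20T10:00:00+00:00
bob,true,false,true,2024-01-02T08:30:00+00:00
ci-bot,false,false,true,N/A
"""

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
for user, finding in audit_credential_report(SAMPLE_REPORT, now):
    print(f"{user}: {finding}")
```

For the sample rows this reports two findings for `bob` and one for `ci-bot`, and none for `alice`.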

Spillage

If it’s AWS - open up a case with AWS Business Support. Report spillage findings and response actions, DoD 5220.

Documentation

This is the stage when you start taking notes on the incident (or event): current status of the incident, summary, indicators, other related incidents, actions taken by all parties, chain of custody*, impact assessment, contacts, list of evidence, comments from incident handlers (IH), and next steps to be taken (e.g. rebuild the host or upgrade).

How to categorize the incident? Simply put, how bad is it?

  • Functional Impact Categories
    • None - nothing’s changed.
    • Low - the business can still provide its critical services but the efficiency is lost.
    • Medium - the organization has lost the ability to provide its critical services to some users.
    • High - the organization is no longer able to provide its critical services.
  • Information Impact Categories
    • None - no info was exfiltrated or compromised.
    • Privacy breach - sensitive PII was accessed or exfiltrated.
    • Proprietary breach - unclassified proprietary info, such as protected critical infrastructure information (PCII), was accessed or exfiltrated.
    • Integrity loss - sensitive or proprietary info was changed or deleted.
  • Recoverability Effort Categories, TTR - time to recovery
    • Regular - TTR is predictable with existing resources.
    • Supplemented - TTR is predictable with additional resources.
    • Extended - TTR is unpredictable.
    • Not recoverable - recovery is not possible; launch the investigation.
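To keep categorization consistent across tickets, the three scales above can be encoded as enumerations. This is just one possible sketch: the class names, the `ImpactAssessment` record and its `needs_investigation` helper are hypothetical, and the values simply mirror the categories listed above.

```python
from dataclasses import dataclass
from enum import Enum

class FunctionalImpact(Enum):
    NONE = "none"
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"

class InformationImpact(Enum):
    NONE = "none"
    PRIVACY_BREACH = "privacy breach"
    PROPRIETARY_BREACH = "proprietary breach"
    INTEGRITY_LOSS = "integrity loss"

class Recoverability(Enum):
    REGULAR = "regular"
    SUPPLEMENTED = "supplemented"
    EXTENDED = "extended"
    NOT_RECOVERABLE = "not recoverable"

@dataclass
class ImpactAssessment:
    functional: FunctionalImpact
    information: InformationImpact
    recoverability: Recoverability

    def needs_investigation(self):
        """Per the list above, an unrecoverable incident triggers an investigation."""
        return self.recoverability is Recoverability.NOT_RECOVERABLE

    def summary(self):
        return (f"functional={self.functional.value}, "
                f"information={self.information.value}, "
                f"recoverability={self.recoverability.value}")

assessment = ImpactAssessment(FunctionalImpact.MEDIUM,
                              InformationImpact.PRIVACY_BREACH,
                              Recoverability.SUPPLEMENTED)
print(assessment.summary())
print(assessment.needs_investigation())  # False
```

Using fixed values instead of free text makes it easier to filter and compare incidents later.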

Protected Data

If data covered by the GDPR was leaked, the breach must be reported within 72 hours of becoming aware of it. HIPAA has other time limits.
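The 72-hour window is easy to lose track of during an incident, so it can help to compute the deadline as soon as the breach is recorded. A minimal sketch, assuming the clock starts when the organization becomes aware of the breach and that timestamps are stored in UTC (function names and dates are hypothetical):

```python
from datetime import datetime, timedelta, timezone

GDPR_NOTIFICATION_WINDOW = timedelta(hours=72)

def notification_deadline(became_aware_at):
    """Latest time by which the breach notification is due."""
    return became_aware_at + GDPR_NOTIFICATION_WINDOW

def hours_remaining(became_aware_at, now):
    """Hours left until the deadline (negative once it has passed)."""
    return (notification_deadline(became_aware_at) - now).total_seconds() / 3600

aware = datetime(2024, 3, 1, 9, 30, tzinfo=timezone.utc)
now = datetime(2024, 3, 2, 9, 30, tzinfo=timezone.utc)

print(notification_deadline(aware).isoformat())  # 2024-03-04T09:30:00+00:00
print(hours_remaining(aware, now))               # 48.0
```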

Containment. Eradication. Recovery.

Containment 📦 aims to stop the bleeding 🩸. It’s a very important stage. While during Identification (aka detection and analysis) we confirm that something is going wrong, this stage is used to see the full picture: how the breach happened, what damage has already been done, which systems are compromised, etc. Quoting SANS:

The goal of containment is to limit damage from the current security incident and prevent any further damage. Several steps are necessary to completely mitigate the incident, while also preventing the destruction of evidence that may be needed for prosecution.

So, digital forensics can occur anywhere from Analysis (aka Identification) to the Lessons Learned stage. It may not occur, however. For example, a DoS attack may not need this facility.

Choosing a containment strategy:

  • ❓ Is there potential for damage or theft of resources?
  • ❓ Do we need to preserve the evidence for court or for ourselves, i.e. make an image of the compromised system?
  • ❓ Do we need the service to be available during the investigation?
  • ❓ How much time do we have and need?
  • ❓ What resources are available to implement this strategy?
  • ❓ What’s the effectiveness of the strategy (partial or full containment)?

Below is a reference checklist to make sure the responder has done their job. These questions can be used in a ticket or IR timesheet, mentioned above in the Documentation section.

  • ❓ Who attacked you and why?
  • ❓ When and how did it happen?
  • ❓ Did this happen because of poor security policy and processes?
  • ❓ How widespread is the incident?
  • ❓ What steps are being taken to determine what happened and prevent it in future?
  • ❓ What’s the impact?
  • ❓ Was any personally identifiable information leaked?
  • ❓ What is the estimated cost of this incident?

The SANS containment process involves:

Short-term containment: limiting damage before the incident gets worse, usually by isolating network segments, taking down hacked production servers and routing to failover.

System backup: taking a forensic image of the affected system(s) with tools such as Forensic Tool Kit (FTK) or EnCase, and only then wiping and reimaging the systems. This will preserve evidence from the attack that can be used in court, and also for further investigation of the incident and lessons learned.

Long-term containment: applying temporary fixes to make it possible to bring production systems back up. The primary focus is removing accounts or backdoors left by attackers on the systems, and addressing the root cause, for example, fixing a broken authentication mechanism or patching a vulnerability that led to the attack.

Eradication aims to remove the threat. For example, get rid of the malware and return configs to their normal state. It’s a common mistake of many companies to jump right to this stage without proper prior investigation and containment. In this case, the attacker might still be inside, or the vulnerability they used is still unpatched, or the user they social engineered is still not properly informed about this event, etc. It can turn into a “whack-a-mole” game, which is no fun when a company’s reputation and finances 💵 are concerned.

  • ❓ Can the problem be isolated? Are all affected systems isolated?
  • ❓ Have the forensic copies been created? For each piece of evidence, keep the info (or chain of custody, even better):
    • Identifying information (location, serial, model, hostname, MAC and IP)
    • Name, title, phone of each individual who collected or handled the evidence
    • Time and date (+ time zone) of each occurrence of evidence handling
    • Location where the evidence was stored
  • ❓ Can this system be reimaged and patched?
  • ❓ Was all malware removed and the system hardened?
  • ❓ What tools are you going to use to test, monitor and verify that the systems being restored are no longer compromised by the same cause?
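The chain-of-custody fields from the checklist above map naturally onto a small record structure. The sketch below is one possible layout, not a standard format: the class names, field names and sample values are hypothetical. Storing a SHA-256 digest of each forensic image is an addition not mentioned in the checklist, included on the assumption that you want to show later that the copy was not altered.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class CustodyEvent:
    handler_name: str
    handler_title: str
    handler_phone: str
    action: str            # e.g. "collected", "transferred", "analysed"
    storage_location: str  # where the evidence was stored afterwards
    timestamp: datetime    # must be timezone-aware

@dataclass
class EvidenceItem:
    location: str
    serial: str
    model: str
    hostname: str
    mac: str
    ip: str
    sha256: str
    events: list = field(default_factory=list)

    def record(self, event):
        """Append a handling event, rejecting timestamps without a time zone."""
        if event.timestamp.tzinfo is None:
            raise ValueError("timestamp must include a time zone")
        self.events.append(event)

def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a (possibly large) image file in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Hypothetical item; the digest here is a stand-in for sha256_of(image_path).
item = EvidenceItem(location="Server room 2, rack 14", serial="SN-0001",
                    model="ExampleServer 1U", hostname="web01",
                    mac="00:00:5e:00:53:01", ip="192.0.2.10",
                    sha256=hashlib.sha256(b"example image contents").hexdigest())
item.record(CustodyEvent("A. Handler", "IR analyst", "+00 000 0000", "collected",
                         "Evidence safe 1",
                         datetime(2024, 3, 1, 10, 0, tzinfo=timezone.utc)))
print(len(item.events))  # 1
```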

Recovery aims to get the system operational if it went down or simply back to business as usual if it didn’t.

Post-Incident Activity (SANS Lessons Learned)

Learn from your experience so you can better respond to future security events. Adjust your playbooks accordingly.

Cloud IR Specifics

If you share any resources with other companies that do not care about security as much, you might be in trouble. For example, several companies may host their web applications on one website.

In some respects, cloud IR is easier: some tools and methods are available ONLY for the cloud. IR simulation on AWS:

https://docs.aws.amazon.com/whitepapers/latest/aws-security-incident-response-guide/security-incident-response-simulations.html

Automation - https://docs.aws.amazon.com/whitepapers/latest/aws-security-incident-response-guide/automation-1.html

References

[1] IBM course

[2] Mac IR

[3] NIST vs SANS IR frameworks

[4] Documentation is to Incident Response as an Air Tank is to Scuba Diving

[5] Incident Response SANS: The 6 Steps in Depth

[6] IR in the Cloud, SANS