The Road to Benchmarked MITRE ATT&CK Alignment: Defense Success Metrics

by Tim Wainwright | Aug 19, 2022

TL;DR

You can describe the progress of your cybersecurity program in a single, threat-driven metric: the Defense Success Metric.  This metric is born from prioritized MITRE ATT&CK alignment and can be benchmarked with your peers.


Prelude: NIST CSF and MITRE ATT&CK

NIST CSF is a high-level framework that aims to describe an entire security program, including governance and preventative, detective, and restorative controls. It does not provide detailed guidance for assessing the ability to defend against advanced threat actors. Teams perform self-assessments and often ask outside parties to conduct independent ones using this same lens. Professional judgment, as in any field, varies.

NIST CSF: Widely adopted for its simplicity (and there are 108 controls within these five areas)


MITRE ATT&CK is a highly detailed framework that describes threat actor attack techniques. Most organizations who conduct Purple Teams create their test plans based on focus areas they choose within ATT&CK. Teams interpret ATT&CK and create their test cases. Professional judgement, experience and other threat intelligence comes into play when designing a specific test case. And once again, internal and independent perspectives are welcome and vary.

MITRE ATT&CK (excerpt): A library attempting to describe the ever-expanding universe of more specific things that threat actors do.


The Challenges with MITRE ATT&CK

MITRE ATT&CK:

  • Changes often and is hard to keep up with. ATT&CK is updated twice per year, which is fast for a framework. NIST CSF, in contrast, has only been updated once since its inception in 2014 (the update was, in a word, boring). If a team is observing ATT&CK’s entirety, it’s a Sisyphean task. And if you’re tracking, you’ve just realized…
  • It’s not pre-prioritized for you. There are some techniques that apply to everyone and many more that do not, and they are not distinguishable without considerable knowledge & experience.
  • Security industry vendors have co-opted it. And this means they’re telling teams that “our solution meets 100% of MITRE ATT&CK.” Honestly, you should know better than to entertain that. There can be a version of the truth where a detection platform says they have at least one analytic for each ATT&CK technique. But take the example of Account Discovery – is there just a single way of doing that? Remember, this is an expanding universe. Another version of the truth is a SIEM that says it can intake any log, and therefore can detect any technique, and will even show you a map of how it is poised to do that. That view in your SIEM is not validated. In security, we need to inspect what we expect. In other words, don’t trust anything you don’t test. And on that topic…
  • Does not provide detailed self-assessment methods, aka test cases. There is a reason: it’s hard to say one way is the right way. For example, there are many parameters in a brute force password attack. How many password attempts constitute a brute force? One team may select 500 and another chooses 5. 500 is easier to detect, but is it really what threat actors do? Which threat actors? The ones targeting healthcare or banks?
  • Does not give you a way to document your team’s test cases and results. But I do give you a way: start with the free VECTR.io. Security Risk Advisors publishes it on GitHub.
  • Does not give you a way to benchmark with your peers. And for as hopeless as that sounds given these challenges, please hang in there for just a little more suspense, because we need to talk about metrics for a moment.
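The brute force example above can be made concrete with a minimal sketch of how the same ATT&CK technique is parameterized into very different test cases. The field names and thresholds here are hypothetical, purely for illustration, not a real test plan format:

```python
# Hypothetical test case definitions for the same technique, T1110 (Brute
# Force). Field names and thresholds are illustrative assumptions only.
brute_force_noisy = {
    "technique": "T1110",        # MITRE ATT&CK: Brute Force
    "attempts": 500,             # loud; trips most SIEM threshold rules
    "interval_seconds": 1,
    "target_accounts": ["svc_backup"],
}

brute_force_low_and_slow = {
    "technique": "T1110",
    "attempts": 5,               # low-and-slow; may sit below alert thresholds
    "interval_seconds": 3600,
    "target_accounts": ["svc_backup"],
}

# Same technique on paper; very different detection outcomes in practice.
assert brute_force_noisy["technique"] == brute_force_low_and_slow["technique"]
```

This is exactly why a shared index matters: when peers fix the parameters of each test case in advance, results become comparable across organizations.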


Your Metrics Don’t Tie Back to Threat Actors

A lift & update from my 2019 blog on this topic, with a little expansion. Your metrics aren’t close enough to threat actors. You have no really good way to measure your progress on MITRE ATT&CK, let alone benchmark with your industry and peers. What you generally have are these:

  • Hygiene Metrics: Systems missing patches, systems with high-risk vulnerabilities, systems missing AV, systems that are business-critical, and 35 ways to describe the intersection of time and those four conditions. I’m not saying this isn’t important, but I am saying it doesn’t fully describe how susceptible to attack you are, or how well you are defending these systems. Keep your hygiene metrics, the best ones only, and read on.
  • Compliance Metrics: Can you disable a user account in 24 hours? Do you have an accurate inventory of all your assets? Do you document your changes and releases? What kind of fire suppression system is in your data center? Wait, you have a data center? Are you doing the things your auditor told you to do? Does your auditor have actual experience hacking or defending a network? Unlikely. Do whatever you need to do to avoid fines, but those same things won’t avoid threat actors.
  • Hyperbole Metrics: These are the huge numbers that mean nothing. Our organization defended against 3 billion attacks last quarter! Santé! Actually, the NGFW and email gateway just dropped them. Those “attacks” were untargeted, not a serious threat, and not applicable to systems inside the firewall. Hyperbole metrics can also give a false sense of a team’s actual work efforts. A team spends a LOT of time preventing and responding to a much smaller volume of more significant events. But it’s hard to draw attention to 30 when there is a 3,000,000,000 in the same dashboard. We painted ourselves into a corner with hyperbole metrics, because now the Board thinks they actually mean something and they expect to see them. If we can transition to something more meaningful, we can phase out Hyperbole metrics.
  • SecOps Metrics: It makes sense to trend types of incidents, tickets worked & closed, etc. It helps us adjust resources and tuning priorities. But while I’m at it, can we stop being obsessed with DWELL TIME? You can’t consistently capture it unless you’re consistently getting compromised! I sincerely hope that’s not a monthly, quarterly, or even annual thing for you.


Metrics that Tie Back to Threat Actors

In 2019 I first wrote about the concept of Defense Success Metrics (DSM). Their raison d’être is to fill the “so what” gap in hygiene and hyperbole metrics, and to help teams establish their measurement of a Threat-Driven security program, as opposed to compliance, politics, or any other thing less important than protecting the organization against threat actors.


Defense Success Metrics: WHAT – WHEN – HOW?

WHAT?

  • A purposeful, prioritized way to align to MITRE ATT&CK.
  • A way to describe your security program’s readiness to protect, detect and respond to threat actors.
  • A way to obtain a meaningful benchmark in your industry or smaller circle of trust.

WHEN?

  • Now.
  • You (maybe): But we’re not mature enough.
  • Me (always): You won’t be mature until you take your first snapshot and understand where your controls can be proven to block, detect and create actionable alerts for your SOC.

HOW?

  • Purple Teams. Call it something else if you must (adversary simulation, etc.).
  • Collaborative, open-book work with red, blue, and GRC all weighing in on test case outcomes (GRC should redefine its scope to be a part of these types of exercises – more on that another time).
  • Limited automation. Sorry vendors, this is not a single button click.
  • The core of the DSM is a single value, a percentage. Specifically, the percentage of test cases “passed”.
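That arithmetic can be sketched in a few lines. The outcome labels and the definition of “passed” below are assumptions for illustration, not VECTR’s actual schema:

```python
# Minimal sketch of computing a Defense Success Metric from purple team
# test case outcomes. Outcome labels and the "passed" definition are
# illustrative assumptions, not VECTR's actual data model.

def defense_success_metric(outcomes):
    """Percentage of test cases 'passed' (here: blocked or alerted)."""
    passed = sum(1 for o in outcomes if o in ("Blocked", "Alerted"))
    return round(100 * passed / len(outcomes), 1)

results = ["Blocked", "Alerted", "Logged", "NotLogged", "Blocked"]
print(defense_success_metric(results))  # → 60.0
```

The deliberate simplicity is the point: one percentage, derived from a fixed set of agreed-upon test cases, is something a Board can track quarter over quarter.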


Metrics that Tie Back to Threat Actors, How?

Start with a common denominator. In order to prioritize alignment to MITRE ATT&CK and specific threat actor techniques, we need to agree on which techniques matter most. We’ve tackled this and have a solution to share with you. The solution is an Index of MITRE ATT&CK: a representative sample of ATT&CK that does not try to emulate too much, and certainly not all of it. It’s an Index just like the S&P, the Dow Jones, etc. It changes over time with threats but never tries to describe every threat at once.

Starting in early 2020, we drew together friends from organizations you would know in the financial services industry, specifically friends who care about threat intel, hunting, red, purple and blue. This group agreed on the top five threat actors targeting their sector. We went to MITRE ATT&CK and filtered on the techniques that ATT&CK attributed to those threat actors. Thankfully we didn’t stop there, or I’d have nothing but a chart for you. The next essential step was to create specific, repeatable test procedures for each of those techniques, down to the command line level where needed. We did that, incorporated the group’s feedback, and released v1. We created a shared purple team / threat emulation plan that would be consistently performed and repeatable across organizations. The original Financial Services Index was 56 test cases. Sound like the basis for a hyperbole metric? Absolutely not. The current version of the Financial Services Threat Index is 70 test cases and covers 10 threat actor groups. Not too big, not too small. It can be emulated in 2.5 days once you know what you’re doing. The Defense Success Metric is successful outcomes / 70.

You can download the Financial Services Threat Index here.


Other Industry Indexes

We rinsed and repeated with our friends in the health industry to create… the Health Threat Simulation Index. The Health Index is based on 7 threat actor groups and 52 test cases. Note the bell curve for Health is narrower, indicating lower variance among organizations, but this is likely to change over time.

You can download the Health Threat Index here.

Security Risk Advisors also maintains an industry-neutral set of test cases called Purple Team Essentials; the current benchmark is 65%. It always comprises our top 50 test cases, which are suitable for any organization. We require that our 24×7 CyberSOC clients go through these exercises so that we develop joint ownership of, and priority for, fixing visibility gaps.


Defense Success Metric Over Time

A snapshot is useful, but the powerful story of Defense Success Metrics is historical trending. We’ve found that quarterly purple teams propel this measurement in most organizations. Executing a compact, purposeful test plan is the fast part – remediation takes more time.

Historical trending in VECTR.io. We overlay SRA’s benchmarks each quarter.



Connecting Purple Teams to APT Resilience

VECTR™’s users love the dynamic MITRE ATT&CK heatmap that can also represent the team’s Defense Success Metric. Organizations making the most use of VECTR™ continue to add their own test cases and this becomes a library / repository to document broader-purpose security testing.

MITRE heatmap in VECTR™ filtered to demonstrate coverage against APT 29


A powerful feature is the ability to filter test cases by any threat group described in ATT&CK. This facilitates answering senior management questions like, “I just read about APT29, what have we done to prepare?”
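A rough sketch of that kind of filtering, with hypothetical test case records and made-up group attributions (in practice, attribution comes from ATT&CK’s group pages):

```python
# Sketch: filter a purple team test case library by threat group to answer
# "I just read about APT29, what have we done to prepare?"
# Records and group attributions are made up for illustration.
test_cases = [
    {"name": "Kerberoasting", "groups": ["APT29", "FIN7"], "passed": True},
    {"name": "Pass the Hash", "groups": ["APT29"],         "passed": False},
    {"name": "Web Shell",     "groups": ["FIN7"],          "passed": True},
]

apt29 = [tc for tc in test_cases if "APT29" in tc["groups"]]
coverage = round(100 * sum(tc["passed"] for tc in apt29) / len(apt29))
print(f"{len(apt29)} APT29-relevant test cases, {coverage}% passed")
```

Because each test case carries its group attribution, the same library answers the question for any group senior management asks about, without running new tests.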


Further Storytelling with Defense Success Metrics

Defense Success Metrics are not limited to a single value. We can unpack more related metrics to show specific progress to technical management. These successive heat maps show how, from 2021 to 2022, the team reduced the organization’s threat actor risk by focusing on the leftmost tactics.

Left areas are the earliest opportunities to detect and stop threat actors. They are the most critical to improve first and tend to be the more dependable detection stages. In this illustration the organization increased its defense success for:

  • Execution, where the adversary tries to run malicious code, by 7%
  • Lateral Movement, where the adversary is trying to move undetected through the environment, by 16%
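The same test case outcomes that produce the single DSM value can be grouped by tactic to derive per-tactic deltas like these. A minimal sketch with hypothetical records:

```python
# Sketch: derive per-tactic defense success rates and year-over-year deltas
# from test case outcomes. Records are hypothetical (tactic, passed) pairs.
from collections import defaultdict

def per_tactic_success(records):
    """Map each tactic to its percentage of passed test cases."""
    totals = defaultdict(lambda: [0, 0])  # tactic -> [passed, total]
    for tactic, passed in records:
        totals[tactic][0] += passed
        totals[tactic][1] += 1
    return {t: round(100 * p / n) for t, (p, n) in totals.items()}

y2021 = [("Execution", 1), ("Execution", 0),
         ("Lateral Movement", 0), ("Lateral Movement", 0)]
y2022 = [("Execution", 1), ("Execution", 1),
         ("Lateral Movement", 1), ("Lateral Movement", 0)]

s21, s22 = per_tactic_success(y2021), per_tactic_success(y2022)
delta = {t: s22[t] - s21[t] for t in s21}
print(delta)  # → {'Execution': 50, 'Lateral Movement': 50}
```

Tactic-level trending is what lets you show that improvement is concentrated where it matters: at the earliest, leftmost stages of the kill chain.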

MITRE ATT&CK heatmap in VECTR™ demonstrating improved coverage from 2021-2022.


Getting Started

  1. Get VECTR™. It’s free, sits nicely in AWS (on-prem if you must), and has SSO and role-based access support. Join the VECTR™ Community and Discord channel for tips & tricks.
  2. Download an Index and try it. If you don’t have the skills or want someone independent, call SRA or another great firm. If you’re not in Financial Services (retail, asset, fintech, insurance) or Health (providers, devices, pharma / life sciences), I’m open to helping you create another industry Index. We expect to create a Retail & Hospitality Index in Q4.
  3. Start or adapt your Purple Teaming with metrics in mind. Capture your first Defense Success Metric. Maybe wait till you’ve done it twice before you go big with it – you’ll tell a better story of how you’ve already improved and have a roadmap for what’s ahead. You just might also develop some quantitative support for your next budget ask.
Tim Wainwright
CEO

Tim has been a speaker at RSA, Gartner, FS-ISAC, H-ISAC and (ISC)2 National Congress. Tim helped found Security Risk Advisors in 2010. Tim advises CISO Offices on modernizing cybersecurity strategy to improve governance, communication, team culture and growth, and detection and response capabilities. Tim is a thought leader in the areas of purple teams, attack simulation, and metrics that describe quantified “defense success.” Tim has a background in penetration testing, security assessment, and frameworks.