Purple Teams and Defense Success Metrics

by Tim Wainwright | Feb 14, 2019

Purple Teams and Defense Success Metrics through VECTR.io

This article covers how a Purple Team process done correctly can:

  • Be documented and organized using the free VECTR.io platform (https://vectr.io) and align to MITRE ATT&CK
  • Generate quantitative defense success metrics that are more meaningful than existing hygiene and hyperbole metrics
  • Change your teams’ attitude and align Red and Blue towards the same mission: protecting the organization by discovering and plugging detection and prevention gaps, and sharing the credit

I’ve spent a lot of time over the past two years in dialogue with Cybersecurity leadership on the topic of Purple Teams.  I have also helped oversee SRA’s development of the free Purple Teams management tool VECTR (vectr.io).  In my discussions, it’s understood that Red + Blue = Purple and that means drills, generating worklists of tuning & engineering needs, and coloring in squares from the MITRE ATT&CK framework.  But if these ends are the exclusive goals, Purple Teams miss their potential to drive continuous improvement, collaboration, skills growth, and significant executive-level metrics.

In my first article on this topic, I’ll begin with my definition of Purple Teams, set some ground rules and then focus on what I think is the most often missed benefit: meaningful metrics.  Here goes: Purple Teams is an open-book-exam process that prioritizes and shows quantifiable improvements in defenses over time.  I’ve carefully chosen every noun and verb here.

Organizational pre-requisite:  Attitude. 

In order to benefit from an open-book-exam approach, it needs to be a no-blame game.  Before you get started, Red and Blue Teams must confess several difficult-to-utter things to one another.  As someone with a Red Team background, I’ll start:

  1. I’m sorry I wrote 30-page reports for how to own the network, and I never included enough guidance on better detection rules
  2. I only used the tricks that I needed so I could win. There were a lot of other Tactics, Techniques and Procedures (TTPs) I didn’t judge you were ready for, or worse, I subconsciously saved them for next time
  3. I never wrote down my TTPs with enough detail for you to effectively reproduce them after I left
  4. I thought my most important job was to find things no one had seen in the environment and bring them to light (and in doing so I often looked cool). Instead, I should have been focused on teaching you everything I knew every time we worked together
  5. With a different attitude, I could have better protected the organization we both serve

And now, what we need the Blue Team to say:

  1. I bought tools (or my boss did) because we liked the salespeople and their marketing. I didn’t fully vet if we were investing in the right stuff.  TBH I didn’t fully know how to vet them, or I ran out of time
  2. Related, I didn’t fight hard enough to decommission old tools that were wasting space and budget
  3. I didn’t fully deploy our tools. It was hard and everyone was against us
  4. I resented the Red team because the job seemed more interesting, but I felt stuck in this role because Blue was the area that needed me most, or I wasn’t seen for some reason as having the right credentials to be Red
  5. With a different attitude, I could have better protected the organization we both serve

A collaborative, mission-oriented and no-blame mentality is manifest in the open-book-exam approach.  You will miss this if you don’t operate side-by-side and treat it like family puzzle night.  You sit together, you teach and remind each other you are taking the time to do this because it will protect the organization you both serve.

A process that prioritizes.

What are your security engineers doing right now?  Are they deploying a new security tool?  Troubleshooting an existing one?  Sitting in meeting after meeting?  I see a lot of talented ones doing mainly these three things.  I believe the reason is they don’t have a process which feeds an agreed, prioritized list of detection and prevention use cases to work on.  This should be the primary, daily work of the security engineer: fixing known detection and prevention gaps in the environment.

Purple Teams identifies these specific gaps, and when it’s done with Blue (including security engineering) and Red side-by-side, they have open, informed and collaborative dialogue which forms agreement about what the priority gaps are.  They can make a top ten fix-it list and have joint ownership of the issues.

Excerpt from performing Purple Team:  The Red Operator establishes a C2 channel from the target server to AWS.  Red and Blue Operators watch and blink as nothing shows up in the SIEM dashboard.  Blue digs into the data lake, with Red over their shoulder and supplying the timestamp and source address info.  Blue finds a low severity alert in the data lake that was never destined for the SIEM, like a sea turtle crossing an airfield to get to the ocean.  Red thinks, the old me would say, well Blue, you have what you need from me so good luck I’m outta here.  Instead, Red says I did some research on this and I have a good idea of the IOCs you need.  Let me add them to your detection blueprint for this attack pattern.  Blue says, great, I’ll go talk to server ops to get this log source forwarded to SIEM.  Then I’ll configure the blueprint in the SIEM and set it to fire at a Critical level.  Can you help me re-test it Friday?  Red and Blue kiss long and deeply because SURPRISE THEY’RE MARRIED NOW!  After doing Purple Teams the right way for a year they got together on a whole new level.

And now, metrics (the quantifiable improvement over time).

I’ve been encouraged to talk to more organizations recently who have understood the open-book-exam and prioritization that comes from doing Purple Teams correctly.  What remains overlooked is showing quantifiable improvements in defenses over time.  It starts with thinking about the limitations of your current metrics program:

  • Your current metrics do not describe your defense success or its inverse, your residual attack risk. These are the half-full and half-empty ways to describe this ultimate, single metric.  Choose the one that your organization’s leadership will respond to.  Some will want to show their attack risk decreasing over time, others will want to describe the increasing completion of their defense success.
  • You have too many metrics because a consulting firm wrote a big exhaustive set, and no one questioned them. These traditional metrics are not informing your senior leadership on what matters about your security program: “how are we continually simulating and fortifying against attack patterns that can damage our business?”

These are the labels I use for the first two types of traditional metrics, and then the new one I believe we all need to adopt:

  1. Hygiene Metrics: Systems missing patches, systems with high-risk vulnerabilities, systems missing AV, systems that are business-critical, and 35 ways to describe the intersection of time and those four conditions. I’m not saying this isn’t important, but I am saying it doesn’t fully describe how susceptible to attack you are, or how well you are defending these systems.  Keep your hygiene metrics, the best ones only, and read on.
  2. Hyperbole Metrics: You blocked 3 billion attacks last quarter! Champagne Toast!  No, you didn’t, your NGFW and email gateway did, and a lot of those “attacks” were unintelligent, not a serious threat and not applicable to your systems inside the firewall.  Your hyperbole metrics also give no real sense of your team’s actual work efforts.  Your team spent a LOT of time preventing and responding to a much smaller volume of more significant events.  But it’s hard to draw attention to 30 when you have 3,000,000,000 in the same dashboard.  We painted ourselves into a corner with hyperbole metrics, because now the Board thinks they actually mean something and they expect to see them.  Over time, if we can transition to something more meaningful, we can phase out Hyperbole metrics.
  3. Defense Success Metrics (new!): Purple Teams creates the opportunity to establish quantifiable metrics about how well your defense capabilities are working at preventing and detecting consequential attack patterns.  This is accomplished by intentionally bringing attack patterns into your scope and saying, “these we intend to protect against”.  The Defense Success Metric can now be based on that denominator of attack patterns, and as I’ll illustrate, is a foundation that can continue to grow.

This concept rests on the idea that any red team attack pattern (or “operation”) can be documented; it’s not magic.  Any attack pattern can be described in one or more commands executed at a terminal.  The attack pattern also has other important context like its origin and destination.  Our industry now has generally-accepted databases of such things, even if they are not fully documented down to the command level (yet).  MITRE ATT&CK is the best known, with Atomic Red Team as a more practical implementation of some of it.
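To make "documented down to the command level" concrete, here is a minimal sketch of what one written-down attack pattern could look like as a record.  Everything in it is an assumption for illustration: the `TestCase` class, its field names and the sample values are hypothetical and are not VECTR's actual data model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TestCase:
    """Hypothetical record for one documented attack pattern (not VECTR's schema)."""
    name: str
    attack_technique: str          # a MITRE ATT&CK technique ID
    commands: List[str]            # what the Red operator types at a terminal
    origin: str                    # where the attack is launched from
    destination: str               # what the attack targets
    outcome: Optional[str] = None  # filled in during the exercise

    def is_reproducible(self) -> bool:
        # Someone else can re-run the test only if it records at least
        # one command plus an origin and a destination.
        return bool(self.commands and self.origin and self.destination)


# Illustrative values only.
example = TestCase(
    name="Local account discovery",
    attack_technique="T1087",
    commands=["net user"],
    origin="compromised workstation",
    destination="same workstation",
)
print(example.is_reproducible())
```

The point of the sketch is the Red Team confession from earlier: if the commands, origin and destination are written down, Blue can reproduce the test after Red has left.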

You can get started with Defense Success Metrics when you choose and bring your first 25 attack pattern test cases into a Purple Team exercise.  Which to choose?  Make a hypothesis like “We have our doubts that we are good at privileged account abuse detection”.  Then proceed to understand, design and execute the test cases in your environment with Red and Blue working to score each one as a detection success or failure, with no blame allowed.  What you’ll witness is the foundation for a sustainable quantitative metric, and quite possibly (per my earlier musing) two souls finding and completing each other.

You brought 25 test cases into your environment and let’s say you “passed” just under half of them at 12.  You now have a Defense Success Metric of 48%.  Don’t think of it as bleak.  The first time you drew any metric that you had positive assumptions about, it was probably disappointing.  Don’t miss that one of the best features of this approach is your Red and Blue operators know exactly what was missed and they can work together to prioritize the top ten fix-it list.  They can have joint ownership to write the fixes, retest them, and claim the mutual success of improving the Defense Success Metric along with its real significance of protecting the organization against attack patterns that were proven dangerous.
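The arithmetic behind that 48% is simple enough to sketch.  The function name `defense_success_metric` is my own label for this example, not something VECTR exposes; the 12 and 25 are the numbers from the paragraph above.

```python
def defense_success_metric(passed: int, total: int) -> float:
    """Percentage of in-scope test cases that were detected or prevented."""
    if total <= 0:
        raise ValueError("at least one test case must be in scope")
    if not 0 <= passed <= total:
        raise ValueError("passed must be between 0 and total")
    return 100.0 * passed / total


metric = defense_success_metric(passed=12, total=25)
print(f"Defense Success Metric: {metric:.0f}%")      # half-full view
print(f"Residual attack risk:   {100 - metric:.0f}%")  # half-empty view
```

The second line printed is the same number viewed from the other side: the residual attack risk is whatever share of the in-scope test cases you did not pass.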

Next month you add another 10 test cases on a different topic.  You score, remediate, retest.  You represent your metric as a % because the test cases are going to grow as you take on new kill chain verticals, develop insider threat program test cases and simulate malware.  Threat Modelling and Threat Intelligence functions can start to set the objectives and state the hypotheses to grow the test case set.
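Here is a rough sketch of why a percentage works as the denominator grows.  The `Snapshot` class and `trend` function are hypothetical names for this example.  The first month uses the 12-of-25 figures from above; the second month's pass count is invented purely for illustration, since the real number depends on how your retests go.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Snapshot:
    label: str
    total: int   # test cases in scope so far (the denominator)
    passed: int  # test cases currently detected or prevented


def trend(snapshots: List[Snapshot]) -> List[str]:
    """Format each snapshot as a point on the Defense Success trendline."""
    return [
        f"{s.label}: {s.passed}/{s.total} = {100 * s.passed / s.total:.0f}%"
        for s in snapshots
    ]


history = [
    Snapshot("Month 1", total=25, passed=12),  # figures from the article
    Snapshot("Month 2", total=35, passed=21),  # HYPOTHETICAL: 10 cases added, some fixes retested
]
for line in trend(history):
    print(line)
```

Because both the numerator and the denominator move, the raw counts are hard to compare month to month, but the percentage stays comparable as you add test cases.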

You may want to start reporting this metric only after you’ve recorded it for 12, maybe 18 months.  I suggest it as a trendline and not a point-in-time number.  This is a positive and repeatable metric.  I will admit it is not your easiest or cheapest to collect but it could become your most important one.

Historical Trending Defense Success Metric in VECTR.io.  We witness a steady effort of fixing gaps and reducing the overall exploitability in the environment.  We can slide the timeline to show short or long term efforts.


Historical Trending Pie Chart of 3 major Purple Teams results in VECTR.io.  We also see the number of detections & blocks attributed to specific tool categories in the environment.  We see what is working for us and what needs to be questioned (I’m looking at you, WAF!)

How do I do it?

You can do all this with SRA’s free VECTR.io tool (https://vectr.io).  We built it for ourselves, but we believe it helps solve a large industry problem and we don’t believe good security practices are competitive intel.  We’ve imported the MITRE ATT&CK and Atomic Red Team content (https://github.com/redcanaryco/atomic-red-team) (thanks all, let’s keep sharing)!  There’s lots of curated Security Risk Advisors (SRA) content in there too that is practical and based on countless pen tests and red teams.

Sample Coverage Mapping & Scoring of MITRE ATT&CK in VECTR.io.  Several test cases can comprise a single MITRE ATT&CK pattern (and in some cases, there should be lots).


Operator-Level view in VECTR.io.  We see a technical attack plotted along a realistic kill chain, status of each test case and timelines.  It’s not all pie charts, this can be your security test case repo.  The results of test cases can be applied to other campaigns (malware, hunting, etc.)

Get started!

Send me an email if I can help.


Tim Wainwright
CEO

Tim has been a speaker at RSA, Gartner, FS-ISAC, H-ISAC and (ISC)2 National Congress. Tim helped found Security Risk Advisors in 2010 and oversees service delivery at SRA’s largest clients.

Tim advises CISO Offices on modernizing cybersecurity strategy to improve governance, communication, team culture and growth, detection and response capabilities. Tim is a thought leader in the area of purple teams and attack simulation and metrics to describe quantified “defense success.”

Tim has a background in security assessment, frameworks and policies.