Know Thyself (or, Better SOC Metrics)

Understanding security operations capabilities and performance

Summary

Infosec operations teams, and their organizations, need to understand their capabilities (what they can do) and performance (what they have done) to justify investment and maintain team satisfaction. We provide 8 examples of operations metrics that address both.

Capabilities are the tools, data, and expertise used to perform assigned cybersecurity functions. Capabilities metrics should show coverage (which risks are addressed) and availability (over what assets and when). Performance covers real-world security outcomes, value, and cost. Performance metrics show impact (actual results) and efficiency (their cost).

Introduction

Cybersecurity operations teams, and the organizations they serve, need to understand their infosec capabilities and performance to justify investment and maintain team satisfaction. Stakeholders want to see a return on current and future security investment, and security professionals want to see that their work makes a difference. Clear metrics help meet both needs.

Capabilities are the tools, data, and expertise used to perform assigned cybersecurity functions (for security operations centers, typically a subset of Identify and Protect, and most of Detect and Respond). Capability metrics describe which security risks the organization can address, and where. We recommend two families of capabilities metrics:

  • Coverage metrics showing which risks your capabilities address, how, and where
  • Availability metrics showing where and when those capabilities are present (which assets, what schedule or uptime)

Performance covers real-world security achievements: the outcomes you produce and the cost to achieve them, when capabilities meet circumstance. We recommend two families of performance metrics:

  • Impact metrics showing actual results of how capabilities address risks in practice, including the value of those outcomes
  • Efficiency metrics showing the cost to produce the impact, usually in terms of time, effort, and/or money

We provide 8 specific examples across these families to kickstart an effective security operations metrics program:

  1. Adversary Behavior Coverage - capabilities: coverage
  2. Controls Framework Coverage - capabilities: coverage
  3. Availability by Asset - capabilities: availability
  4. False Negatives - performance: impact
  5. True Positives - performance: impact
  6. Incident Time - performance: impact
  7. False Positives - performance: efficiency
  8. Return on Security Investment (ROSI) - performance: efficiency

Capabilities

What can we do? Which security risks can we address, how, and where?

Capabilities metrics describe what the organization can do: the possible. Contrast this with performance metrics that describe what the organization has done: the actual.

Coverage

Coverage metrics describe which risks your capabilities address, how, and where. When setting goals to improve coverage, consider the whole picture: covering every attacker behavior or control is not beneficial if you have poor availability or performance. Beware of diminishing returns.

Metric 1: Adversary Behavior Coverage

Key Question: Which adversary techniques can we protect against, detect, and respond to?

Process

  1. Start with a list of possible adversary behaviors. MITRE ATT&CK is the de facto standard and uses the terms “techniques” and “sub-techniques” to refer to things attackers do (their methods).

  2. Remove those that don’t apply to your platforms.

  3. Prioritize based on risk: which are most likely, most dangerous, or both? Start with your own investigations (true positives, false negatives), public lists1, or techniques used by the most threats2. We built this from ATT&CK v8.2 as a head start:

     #  | Used by the Most Malware                        | Used by the Most Threat Groups
     1  | Ingress Tool Transfer (T1105)                   | Spearphishing Attachment (T1566.001)
     2  | System Information Discovery (T1082)            | Malicious File (T1204.002)
     3  | Web Protocols (T1071.001)                       | Obfuscated Files or Information (T1027)
     4  | Windows Command Shell (T1059.003)               | PowerShell (T1059.001)
     5  | Obfuscated Files or Information (T1027)         | Windows Command Shell (T1059.003)
     6  | File and Directory Discovery (T1083)            | Ingress Tool Transfer (T1105)
     7  | Process Discovery (T1057)                       | Registry Run Keys / Startup Folder (T1547.001)
     8  | File Deletion (T1070.004)                       | Web Protocols (T1071.001)
     9  | System Network Configuration Discovery (T1016)  | File Deletion (T1070.004)
    10  | Registry Run Keys / Startup Folder (T1547.001)  | Scheduled Task (T1053.005)
  4. Emulate the highest-priority techniques throughout your environment, either manually (e.g., penetration test, purple team) or with automation (e.g., Atomic Red Team, AttackIQ). The quality of the metrics depends heavily on the quality of this emulation.

  5. Identify attacker techniques from real-world incidents3 (see False Negatives and True Positives, below)

  6. Track which techniques were prevented, detected, or neither. Note that unobserved (not yet tested or seen) is not the same as undetected or unprevented.

  7. Calculate your coverage metrics for your high-priority subset and overall.

  8. Repeat periodically, particularly upon changes in attacker behaviors or your capabilities.

Display and Interpretation

Basic Display: Ratios (percentages) of high priority and overall techniques prevented and detected, e.g., in a 2x2 grid. Show “current as of” timestamp.
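
For example, here is a minimal sketch of the ratio calculation behind that 2x2 grid, assuming you track one record per technique with a priority flag and the observed prevention/detection results (the record fields and values are illustrative, not prescriptive):

```python
from dataclasses import dataclass

@dataclass
class TechniqueResult:
    technique_id: str    # e.g., "T1105"
    high_priority: bool  # from your risk-based prioritization (step 3)
    prevented: bool      # observed via emulation or real incidents (steps 4-6)
    detected: bool

def coverage(results, prioritized_only=False):
    """Return (% prevented, % detected) over the chosen technique set."""
    subset = [r for r in results if r.high_priority] if prioritized_only else list(results)
    if not subset:
        return 0.0, 0.0
    prevented = 100 * sum(r.prevented for r in subset) / len(subset)
    detected = 100 * sum(r.detected for r in subset) / len(subset)
    return prevented, detected

# Toy data for the 2x2 grid: high-priority vs. overall, prevented vs. detected
results = [
    TechniqueResult("T1105", high_priority=True, prevented=True, detected=True),
    TechniqueResult("T1082", high_priority=True, prevented=False, detected=True),
    TechniqueResult("T1027", high_priority=False, prevented=False, detected=False),
]
print("High priority:", coverage(results, prioritized_only=True))
print("Overall:      ", coverage(results))
```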

Interpretation: All else equal, higher is better, first in high-priority then overall.4

Supplemental Ideas:

Metric 2: Controls Framework Coverage

Key Question: Which controls do we have in place?

Controls framework coverage is both a means to mitigate risk and an end in itself (if non-compliance carries direct consequences, e.g., NERC CIP fines). It may be easier to track than adversary behavior coverage, especially for smaller teams, and can build bridges to the compliance/GRC team as well.

Knowing if you can detect ransomware encryption (T1486) is important, as is knowing if your backups are healthy (CP-9).

Process

  1. Start with a list of recommended or mandated controls. The NIST Cybersecurity Framework (CSF) and NIST 800-53 are reputable, popular, free options that integrate well with one another. The CIS Controls are also great.
  2. Customize and prioritize the list based on risk. Controls come at a cost and not every organization needs every control. Reputable frameworks have instructions on how to do this (e.g., CSF profiles, CIS implementation groups).
  3. Verify the presence of these controls throughout your environment, with technical confirmation when applicable (e.g., using Nessus compliance checks or similar). The quality of the metric depends heavily on the quality of this verification. Don’t ignore non-technical controls! They highlight real risks that operators need to understand (untrained employees, outdated response plans, etc.)
  4. Track which controls are implemented, and to what extent. You might select a threshold level of availability to count a control as “implemented,” use a bucketed scale like “unplanned, planned, partial, full” or capture the raw percentage.
  5. Calculate your coverage metrics for your prioritized, risk-informed subset of controls.
  6. Repeat periodically, particularly upon changes in your framework, risk, or capabilities.

Display and Interpretation

Basic Display: Ratios or percentages of prioritized, risk-informed controls at each threshold of implementation (e.g., unplanned, planned, partial, full). Show “current as of” timestamp.
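
As an illustration, a minimal sketch of this roll-up, assuming a simple (hypothetical) mapping from control ID to its implementation level on the bucketed scale from step 4:

```python
from collections import Counter

# Hypothetical controls inventory: control ID -> implementation level,
# using the bucketed scale from step 4.
controls = {
    "CP-9": "full",       # system backup
    "AC-2": "partial",    # account management
    "AT-2": "planned",    # security awareness training
    "IR-8": "unplanned",  # incident response plan
}

levels = ["unplanned", "planned", "partial", "full"]
counts = Counter(controls.values())
total = len(controls)

for level in levels:
    n = counts.get(level, 0)
    print(f"{level:>9}: {n}/{total} ({100 * n / total:.0f}%)")
```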

Interpretation: All else equal, higher is better, both for percentage of controls and level of implementation.

Supplemental Ideas:

  • Show a per-control coverage “heat map”
  • Show coverage based on varying availability of controls.

Availability

Availability metrics describe where and when capabilities are present (i.e., which assets, what schedule or uptime). When setting goals to improve availability, consider the whole picture: deploying to every asset is not beneficial if you have poor coverage or performance.

Availability metrics are also well-suited to creating alerts (“page someone if these tools go down on these systems”): they tie directly to risk, and can themselves indicate attacker activity.

Metric 3: Availability by Asset

Key Questions: Where are my capabilities deployed? Which assets are secured with which tools?

This metric complements controls framework coverage: that one focuses on controls (implemented on some subset of assets), whereas this one focuses on assets (covered by some subset of controls). You can usually build both from the same data.

Process

  1. Build an accurate asset inventory. This may involve active and passive discovery, querying cloud and container platforms, scrubbing CI/CD pipelines, and more. This is hard, but necessary: it’s tough to secure assets you don’t know you have.
  2. Prioritize your inventory based on risk: determine which assets are business critical, contain sensitive data, etc. See our other posts on this topic for more ideas on how.
  3. In order of priority, verify which capabilities are available for each asset. Use technical verification whenever possible (actually check if the EDR agent is running, AV signatures are updating, patches are current, etc.). Some capabilities, like network boundary devices, may cover whole subsets of assets.
  4. Calculate your availability metrics, perhaps grouped by priority, location, etc.
  5. Repeat periodically, particularly upon changes to your asset inventory or capabilities. Ideally maintain in as close to real time as possible.

Display and Interpretation

Basic Display: Ratios or percentages of high priority and overall assets with each capability deployed. Show “current as of” timestamp.
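
A minimal sketch of the availability calculation, assuming a verified (hypothetical) inventory that records each asset's risk-based priority (step 2) and the capabilities technically confirmed on it (step 3):

```python
# Hypothetical asset inventory; names, priorities, and capabilities are illustrative.
assets = [
    {"name": "db-prod-01",  "priority": "high", "capabilities": {"edr", "av", "backup"}},
    {"name": "web-prod-02", "priority": "high", "capabilities": {"edr", "av"}},
    {"name": "dev-vm-17",   "priority": "low",  "capabilities": {"av"}},
]

def availability(assets, capability, priority=None):
    """Percentage of (optionally priority-filtered) assets with the capability."""
    subset = [a for a in assets if priority is None or a["priority"] == priority]
    if not subset:
        return 0.0
    return 100 * sum(capability in a["capabilities"] for a in subset) / len(subset)

for cap in ("edr", "av", "backup"):
    print(f"{cap}: {availability(assets, cap, priority='high'):.0f}% of high-priority assets, "
          f"{availability(assets, cap):.0f}% overall")
```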

Interpretation: All else equal, higher is better, first for high-priority assets, then overall.

Supplemental Ideas:

  • Availability map(s) by logical or physical location
  • Incorporate temporal availability (downtime) by capability (e.g., shift schedules, maintenance, crashes, outages)
  • Integrate attacker behavior or control coverage by asset - basically, the implications of (non-)availability. For example, augment “asset A does not have EDR,” with “not covered against attacker techniques B and C” and/or “now non-compliant with controls E and F.”
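
A minimal sketch of that last idea, assuming you maintain (hypothetical) mappings from each capability to the attacker techniques and controls it supports:

```python
# Hypothetical mappings; in practice these come from your detection engineering
# documentation and controls framework crosswalks.
capability_to_techniques = {
    "edr": {"T1059.001", "T1105"},  # PowerShell, Ingress Tool Transfer
    "firewall": {"T1071.001"},      # Web Protocols
}
capability_to_controls = {
    "edr": {"SI-3"},                # malicious code protection
    "firewall": {"SC-7"},           # boundary protection
}

def implications(missing_capabilities):
    """Translate missing capabilities on an asset into uncovered techniques/controls."""
    techniques, controls = set(), set()
    for cap in missing_capabilities:
        techniques |= capability_to_techniques.get(cap, set())
        controls |= capability_to_controls.get(cap, set())
    return techniques, controls

uncovered_techniques, uncovered_controls = implications({"edr"})
print("Asset A lacks EDR; not covered against:", sorted(uncovered_techniques))
print("Asset A lacks EDR; now non-compliant with:", sorted(uncovered_controls))
```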

Performance

What have we done? What are our security outcomes, and what did it cost to achieve them?

Performance metrics describe what the organization has done: the actual. Contrast this with capabilities metrics that describe what the organization can do: the possible.

Impact

Impact metrics describe actual results of how capabilities address risks in practice, including the value of those outcomes. When setting goals to improve impact, recognize perfection is impossible: clear-eyed appreciation of costs and benefits is the best way to maintain relevance and credibility.

Metric 4: False Negatives

Key Questions: What did we miss? What did we not prevent or detect?

This metric can also illustrate gaps in coverage.

Process

  1. Review (a.k.a., “hot wash,” “post mortem”) each incident. Include red team engagements. Better response documentation and investigation lead to better results.

  2. Identify attacker behavior, by technique, that was not prevented and/or detected.

  3. Count false negatives for each capability that could reasonably prevent or detect each instance of a behavior.

    For example, if data was exfiltrated over HTTP in three sessions, through a firewall and from a system with EDR, with no blocks or alerts, you’d count 12 total false negatives: three sessions × two capabilities (firewall, EDR) × two functions (protection, detection).
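
A minimal sketch of that counting logic, assuming you record each missed behavior with its instance count and the capabilities and functions that could reasonably have caught it (the structure is illustrative):

```python
# Each record: one missed behavior, how many instances (e.g., sessions), and the
# capabilities and functions that could reasonably have caught it.
missed = [
    {"technique": "T1071.001",  # Web Protocols (illustrative mapping for HTTP exfiltration)
     "instances": 3,
     "capabilities": ["firewall", "edr"],
     "functions": ["protection", "detection"]},
]

false_negatives = sum(
    m["instances"] * len(m["capabilities"]) * len(m["functions"]) for m in missed
)
print(false_negatives)  # 3 sessions x 2 capabilities x 2 functions = 12
```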

Display and Interpretation

Basic Display: Total number of false negatives for chosen window(s); trends over time (e.g., histogram bucketed by week or month). Show “current as of” timestamp.

Interpretation: All else equal, lower is better.

False negatives may inversely correlate with false positives, perhaps the fundamental tension in security operations.

Supplemental Ideas:

  • Summary statistics and their trends
  • Cost analysis for false negatives
  • Counts and trends by capability (e.g., by security device or platform)
  • Counts and trends by attacker technique
  • Track repeat offenders: the same techniques appearing in multiple reviews over time

Metric 5: True Positives

Key Questions: What risks did we prevent or mitigate? What incidents did we detect and remediate? What costs did we avoid?

Process

  1. Determine or estimate the cost (staff hours, business loss, direct expenditures) spent responding to each incident, including initial triage. Common costs include business interruption due to attacker behavior or containment, 3rd-party fees (e.g., service providers or consultants), tool fees (e.g., per-use or quota licenses), etc.
  2. Count incidents and categorize by severity/cost
  3. Count detection and protection events related to incidents, and categorize by attacker technique (many tools provide this mapping out of the box)
  4. Count protection events not related to incidents, and categorize by attacker technique

Display and Interpretation

Basic Display: Counts and costs by severity; trends over time (e.g., histogram bucketed by week or month). Show “current as of” timestamp.
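
A minimal sketch of the counts-and-costs roll-up, assuming (hypothetical) incident records that carry a severity and an estimated total cost from steps 1-2:

```python
from collections import defaultdict

# Hypothetical incident records; IDs, severities, and costs are illustrative.
incidents = [
    {"id": "IR-101", "severity": "high",   "cost_usd": 42_000},
    {"id": "IR-102", "severity": "medium", "cost_usd": 6_500},
    {"id": "IR-103", "severity": "medium", "cost_usd": 4_200},
]

by_severity = defaultdict(lambda: {"count": 0, "cost_usd": 0})
for inc in incidents:
    by_severity[inc["severity"]]["count"] += 1
    by_severity[inc["severity"]]["cost_usd"] += inc["cost_usd"]

for severity, agg in by_severity.items():
    print(f"{severity}: {agg['count']} incidents, ${agg['cost_usd']:,} total")
```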

Interpretation: All else equal, lower cost per incident is better, but quantity is complicated. A few thoughts:

  • All else equal, increasing true positives indicate increasing risk (threat, vulnerability, or both). This should inform your risk strategies (i.e., avoidance, transference, mitigation).
  • Ignorance is not bliss: true positive detections are a chance to mitigate impact with an effective response, and equip you to create better protective controls. These are wins.
  • However, more TP detections are not strictly “better”; it’s complicated.5 If you build good protections against techniques you detect, you’d expect the number of TP detections to go down as TP protections go up … but those too might go down if attackers simply stop trying that technique (now that it’s ineffective).

Supplemental Ideas:

  • Estimate cost savings by comparing to cost of similar past incidents (caution! this is difficult to do accurately - see notes on ROSI, below)
  • Counts and trends by location
  • Counts and trends by capability (e.g., by security device or platform)
  • Counts and trends by attacker behavior
  • Track repeat offenders: the same techniques appearing in multiple reviews over time

Metric 6: Incident Time

Key Questions: How long are incidents in progress before we detect and remediate them?

Process

  1. For each incident, calculate time to detect (t_declared - t_root_cause), time to remediate (t_last_remediation_task - t_declared), and time to recover (t_last_recovery_task - t_declared). A sketch of these calculations follows this list.
  2. Store per-case values to support history and trending.
  3. Aggregate incident time values, including summary values (average, median, min, max). Average is most common: “mean time to x” metrics are very popular.
  4. Update trends for chosen window(s).
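
Here is a minimal sketch of the calculations in steps 1 and 3, assuming (hypothetical) per-incident timestamps for root cause, declaration, last remediation task, and last recovery task:

```python
from datetime import datetime
from statistics import mean, median

# Hypothetical per-incident timestamps (step 1); field names are illustrative.
incidents = [
    {"root_cause": datetime(2021, 3, 1, 8, 0),
     "declared": datetime(2021, 3, 1, 14, 30),
     "last_remediation": datetime(2021, 3, 2, 10, 0),
     "last_recovery": datetime(2021, 3, 3, 9, 0)},
    {"root_cause": datetime(2021, 3, 10, 22, 0),
     "declared": datetime(2021, 3, 11, 1, 0),
     "last_remediation": datetime(2021, 3, 11, 6, 0),
     "last_recovery": datetime(2021, 3, 11, 12, 0)},
]

def hours(delta):
    return delta.total_seconds() / 3600

detect = [hours(i["declared"] - i["root_cause"]) for i in incidents]
remediate = [hours(i["last_remediation"] - i["declared"]) for i in incidents]
recover = [hours(i["last_recovery"] - i["declared"]) for i in incidents]

print(f"Mean time to detect:    {mean(detect):.1f} h (median {median(detect):.1f} h)")
print(f"Mean time to remediate: {mean(remediate):.1f} h")
print(f"Mean time to recover:   {mean(recover):.1f} h")
```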

Display and Interpretation

Basic Display: Trends over time (e.g., rolling averages) to detect, remediate, and recover. Show “current as of” timestamp.

Interpretation: All else equal, lower is better.

There is data indicating faster detection corresponds to direct monetary savings. Time is literally money.

Supplemental Ideas:

  • Trends by incident type
  • Trends by asset criticality

Efficiency

Efficiency metrics describe the cost to produce the impact, usually in terms of time, effort, and money. This is distinct from the value produced by faster detection or response, and in fact is offset by that value.

Metric 7: False Positives

Key Questions: How many alerts were not associated with real incidents? What did we waste (time, effort, money, storage, etc.) on non-attacker activity?

Process

  1. Identify all alerts not related to an incident.
  2. Determine or estimate the cost (staff hours, business loss, direct expenditures) spent responding to these alerts, including initial triage. Common costs include business interruption due to preemptive containment, 3rd-party fees (e.g., service providers or consultants), tool fees (e.g., per-use or quota licenses), etc.

Display and Interpretation

Basic Display: Total number and cost of false positives for chosen window(s); trends over time (e.g., histogram bucketed by week or month). Show “current as of” timestamp.

Interpretation: All else equal, lower is better.

As noted above, false positives may inversely correlate with false negatives, but it’s not always that simple. For example, high false positives may introduce detrimental dynamics like alert fatigue, which can increase false negatives (positive correlation).

Supplemental Ideas:

  • Summary statistics and their trends
  • Counts and trends by capability (e.g., by security device or platform)
  • Counts and trends by attacker behavior.
  • Between true and false positives, calculate events per analyst hour (EPAH) to see if your analysts are overwhelmed.
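
A minimal sketch of that last idea, with illustrative numbers (the alert volume and staffing below are assumptions, not benchmarks):

```python
# Illustrative weekly figures: total alerts worked (true + false positives)
# and total staffed analyst hours in the same window.
alerts_worked = 1_250   # true positives + false positives triaged this week
analyst_hours = 5 * 40  # five analysts, one 40-hour week each

epah = alerts_worked / analyst_hours
print(f"Events per analyst hour: {epah:.1f}")
# Sustained high EPAH suggests analysts are overwhelmed and alert fatigue is likely.
```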

Metric 8: Return on Security Investment (ROSI)

Key Questions: How do our security costs compare to our security benefits? What do we get for what we spend?

This is very important to boards, business leaders, and CISOs, but is challenging to calculate without assumptions and estimates. Even if the analysis requires effort, and even if the results aren’t 100% precise, there are important benefits to thinking about security in a business context.

Process

Disclaimer: this approach is simplified and opinionated. There are other popular ways to think about this6, 7, 8 with different tradeoffs - use what works for you!

  1. Determine the cost of each incident, considering business losses, staff cost, direct expenditures, etc. (as for the True Positives metric).
  2. For each capability whose return you are calculating, determine the:
    1. Cost of incidents when the capability was present: c_with
    2. Time the capability was available: t_with
    3. Cost of incidents when the capability was not present (e.g., prior to acquisition, on assets where the capability was not available), but otherwise not much was different:9 c_without
    4. Time the capability was not available, but otherwise not much was different: t_without
    5. Cost per unit time for the capability itself (price, maintenance, user effort): c_cap
    6. Return per unit time: r_per_time = (c_without ÷ t_without) - ((c_with ÷ t_with) + c_cap)
    7. Total return: r_total = r_per_time × t_with

To see how the units work, take the following notional example of a new tool that costs $5,000 per year, assessed after its first 90 days, assuming nothing else in the security configuration changed:

  • c_with is $25,000; t_with is 90 days
  • c_without is $500,000; t_without is 1,095 days (3 years)
  • c_cap is $13.70/day ($5,000/year ÷ 365 days/year)
  • r_per_time is ($500,000 ÷ 1,095 days) - (($25,000 ÷ 90 days) + $13.70/day) = $456.62/day - $291.48/day ≈ $165.14/day
  • r_total is $165.14/day × 90 days ≈ $14,863

Real calculations get quite a bit more gnarly, but 1) they can be built from existing data, and 2) they don’t require massive assumptions.
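
To make the arithmetic concrete, here is a minimal sketch of the return calculation, using the notation from step 2 and the notional numbers above (the function and parameter names are ours, not a standard):

```python
def rosi(cost_with, time_with, cost_without, time_without, cap_cost_per_time):
    """Return (return per unit time, total return) for one capability.

    Units must be consistent: costs in one currency, times in one unit (days below).
    Mirrors the simplified formula in step 2.
    """
    return_per_time = (cost_without / time_without) - (
        (cost_with / time_with) + cap_cost_per_time
    )
    return return_per_time, return_per_time * time_with

# The notional example above: a $5,000/year tool, assessed after its first 90 days.
per_day, total = rosi(
    cost_with=25_000, time_with=90,
    cost_without=500_000, time_without=1095,
    cap_cost_per_time=5_000 / 365,
)
print(f"Return per day: ${per_day:,.2f}")  # about $165/day
print(f"Total return:   ${total:,.2f}")    # about $14,863
```

In practice the inputs would come from the incident costs you already track for True Positives and the availability data behind Availability by Asset.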

Display and Interpretation

Basic Display: Return per unit time (r_per_time) and total return (r_total), by capability. Show “current as of” timestamp.

Interpretation: All else equal, higher return is better.

Supplemental Ideas:

  • Calculate returns for similar incidents in your organization, where “similarity” is based on attacker behaviors in each incident. This could help you find situations where capabilities affect different types of incidents differently.

References

We’re grateful for collaborative, open products like MITRE ATT&CK and the NIST Cybersecurity Framework (CSF), with their attendant resources and communities - they’re doing amazing things to increase consistency, collaboration, and impact across infosec. Many of these ideas are tractable because of these products.

Thanks to Daniel Miessler for his information security metrics primer, both for its content and its clean presentation - particularly his introductory section on what makes a good metric. Products like Palo Alto Networks' “SOC Metrics That Matter”, Siemplify’s “No-Nonsense Guide to Security Operations Metrics”, and LogRhythm’s “7 Metrics to Measure the Effectiveness of Your Security Operations” make us confident we’re not completely off base.

Conclusion

This topic is vast and this list is not exhaustive, but we hope this helps you build prioritized, useful metrics for your security program. You’ll surely find tweaks, customizations, and deep-dives that are helpful to your specific environment. Also, there are important general-purpose measures of organizational health, customer satisfaction, etc., that were not included but can be very useful to security operations teams.

Give them a try, let us know what you think! And if you’d like help with security metrics planning/implementation or anything else related to improving your information security, please contact us.


  1. Red Canary’s Threat Detection Report is great. ↩︎

  2. MITRE ATT&CK tracks which groups, malware, and tools use each technique. ↩︎

  3. We like this inclusive NIST definition, which covers not only actual compromises but also potential (read: attempted) compromises and violations of security policies/procedures. ↩︎

  4. Always interpret metrics in the context of risk, priorities, and other metrics. Use critical thinking and beware metrics abuse! ↩︎

  5. There’s a doctoral thesis, or at least a conference talk, on building a predator-prey/resource-exhaustion model of cyber attackers and defenders … maybe someday. ↩︎

  6. Your organization may have a risk quantification process, or even a team. ↩︎

  7. This post from Netwrix provides a method for calculating ROSI similar to what’s taught in many security policy courses, and links to classic posts like this SANS paper from 2003 that became a de facto standard. ↩︎

  8. You could imagine comparing to “ambient risk” numbers calculated from resources like the IBM Cost of Data Breach report, Verizon Data Breach Investigations Report (DBIR), Sophos’s “State of Ransomware,” FBI’s IC3 data, etc. ↩︎

  9. This is a simplified way of saying that c_without and t_without should control for differences other than the absence of the capability, if possible. Measuring across rollout batches, for example, keeps the comparison close in time and otherwise consistent in configuration. ↩︎