Tuesday, December 09, 2008
Security Operations Metrics: Introduction - An overview of my thoughts on Security Operations and Incident Response processes and metrics that are easily achievable in most organizations.

I’ve been thinking for a long time about how to integrate processes and measurements, and how to interpret the data from those measurements, as they relate to Security Operations and Incident Response. This is just the introduction to what I’m sure will be many discussions on the topic. I deliberately leave out many subjects, mainly because I want to focus on the operational processes and measurements that can help increase the efficiency and effectiveness of your Monitoring and Response teams.

Disclaimer: What this post is not: Overall Risk Values, an Executive/Management Dashboard, Security to Business requirement alignment, Vulnerability Management, a way to reach FISMA or PCI compliance, or 100% complete. It is intended to provoke thought around a workable set of internal metrics to help organizations grow and refine their security operations and incident response capabilities, especially the Detection and Response aspects of those teams.

Why are metrics important: I see it on a daily basis, so I’d be willing to bet that most people would agree with me that a certain percentage of their organization’s systems are currently compromised. I’ll go further and say that many don’t have the time to address those machines, and since the compromises aren’t affecting business in a meaningful manner, the current risk level is still treated as acceptable. If we changed that thought process around a bit and found a way to measure the % of data manipulated on those systems or across the enterprise, I’d wager that any number above 0 would introduce a level of concern that makes the risk unacceptable. So which should we measure? Both if possible, but how likely is it that we can measure both? You must figure out the best way to reliably gather and process the information in order to tell the story.
For example, if I have no way to tell how much data has been manipulated, perhaps I need to start looking at that in a more holistic fashion across the enterprise; it seems to be a natural progression. I need better Collection, and perhaps better analytical techniques, to address this problem set’s requirements.

So, with Security Operations and Incident Response in mind, let’s take a quick peek at the integration of processes, the categories of metrics, what we might be able to measure, and what value we can gain from some of those measurements. In future blog posts and workshops we’ll facilitate much more specific discussion of these areas.

Collection: There are some obvious measurements here that you can and should take from day 1 in your Security Operations Center.

2. Bad Metric Made Better: How effective is my SIM in reducing events I have to look at? This is the first metric I get asked to look at in nearly all cases. And while there is some value to it, I’m not sure it is the first piece of information I’d want to know in judging the stressfulness of my SOC or IRT.

a. The idea is this:

b. Generally speaking, the lower the number at the bottom, the happier the customer is with the “tuning” of the system. I agree to a certain extent that this is functional, but only when you add other elements to it, like the context of the problem set. In this metric it is critical to understand that the number of correlated events is irrelevant. Many systems generate correlation events that feed a number of more advanced correlation scenarios and in fact add to the overall efficiency of the system. The number of “events of interest” is the key measurement of what your analysts have to look at on a daily basis. More to the point, the number of “Events of Interest” per problem set is the measurement that matters.
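To make the “Events of Interest per problem set” idea a little more concrete, here is a minimal sketch of how that count might be produced from a day of SIM output. Everything in it is an assumption for illustration: the problem-set names, the sample events, and the helper functions `eoi_per_problem_set` and `eoi_ratio` are hypothetical, and a real SIM would need its own export or query to supply the data.

```python
from collections import Counter

# Hypothetical one-day sample: (problem_set, is_event_of_interest) for each
# event that reached the SIM. The problem-set names are made up.
events = [
    ("malware", True),
    ("malware", False),
    ("malware", True),
    ("policy_violation", False),
    ("policy_violation", True),
    ("unauthorized_access", False),
]


def eoi_per_problem_set(events):
    """Count events of interest (EOI) for each problem set."""
    return Counter(problem_set for problem_set, is_eoi in events if is_eoi)


def eoi_ratio(events):
    """Fraction of all collected events that analysts actually have to look at."""
    if not events:
        return 0.0
    return sum(1 for _, is_eoi in events if is_eoi) / len(events)


counts = eoi_per_problem_set(events)
ratio = eoi_ratio(events)
print(dict(counts))
print(f"{ratio:.0%} of collected events are EOI")
```

The overall ratio is the “how much did my SIM reduce” number people usually ask for first; the per-problem-set counts are the ones that show where analyst time is actually going.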
Only once we understand the “EOI” in terms of problem set can we begin the process of tuning the system (reducing false positives, fixing event source log levels, tuning correlation timing and parameters, etc.) to make identification more efficient.

Analysis: 1. The number of “events of interest” evaluated and the number of Incidents escalated per problem set is an easy one.

Escalation:

Response/Remediation: This at least provides us goals to measure against, and highlights very quickly where problems exist in the remainder of the model. For example, if I didn’t have adequate information to conduct the analysis, I know I need better visibility (in terms of log levels, event sources, etc.). If I can’t conduct live system forensics in a reasonable time period, we can justify the need for more robust forensic support in our organization. Measuring these attributes may also point out failures in the overall Incident Response Plan (incorrect POC or asset information), political hurdles, etc. This is the easiest way to show increasing competency in your organization, as the overall timeframe shrinks from months or years down to hours and days.

Quick Tips:

Tip #2: Figure out what matters in your organization/industry: The concept of measuring the % of systems compromised across your organization seems to be gaining some level of support (I agree it is useful). It is easy, reproducible and has obvious value. You can quantify the known compromises/infections versus the overall number of systems fairly easily and then ask the question: What percentage is acceptable to our company? You’ll find that the number varies based on who you ask (0%, 1%, 10%+), the type of system, etc.

Summary: