Incident Detection and Investigation – How Math Helps But Is Not Enough

I love math. I am even going to own up to having been a "mathlete" and looking forward to the annual UVM Math Contest in high school. I pursued a degree in engineering, so I can now more accurately say that I love applied mathematics, which has a very different goal than pure mathematics. Taking advanced developments in pure mathematics and applying them to various industries in a meaningful manner often takes years or decades. In this post, I want to provide the necessary context for math to add a great deal of value to security operations, but also explain the limitations and issues that will arise when it is relied upon too heavily.


A primer on mathematics-related buzzwords in the security industry

There are always new buzzwords used to describe security solutions with the hope that they will grab your attention, but often the specific detail of what’s being offered is lost or missing. Let’s start with my least favorite buzzphrase:


  • Big Data Analytics – This term is widely used today, but is imprecise and means different things to different people. It is intended to mean that a system can process and analyze data at a speed and scale that would have been impossible a decade ago. But that too is vague. Given the amount of data generated by security devices today, the scale of continually growing networks, and the speed with which attackers move, being able to crunch enormous amounts of data is a valuable capability for your security vendors to have, but that capability tells you very little about the value derived from it. If someone tries to sell you their product because it uses Cassandra or MongoDB or another of the dozens of NoSQL database technologies in combination with Hadoop or another map/reduce technology, your eyes should glaze over, because what matters more is how these technologies are being used. Dig deeper and ask "so your platform can process X terabytes in Y seconds, but how does that specifically help me improve the security of my organization?"


Next, let me explain a few of the more specific, but still oversold math-related (and data science) buzzwords:

  • Machine Learning is all about defining algorithms flexible and adaptive enough to learn from historical data and adjust to the changes in a given dataset over time. Some people prefer to call it pattern recognition because it uses clusters of like data and advanced statistical comparisons to predict what would happen if the monitored group continued to behave much as it has previously. The main benefit of this field to security is the possibility of distinguishing the signal from the noise when sifting through tons of data, whether using clustering, prediction models, or something else.
  • Baselining is a part of machine learning that is actually quite simple to explain. Given a significant sample of historical data, you can establish various baselines that show a normal level of any given activity or measurement. The value of baselining comes from detecting when a measured value deviates significantly from the established historical baseline. A simple example is credit card purchases: suppose a credit card user is found to spend between $600 and $800 per week. This is the baseline for credit card spending for this person.
  • Anomaly Detection refers to the area of machine learning that identifies the events or other measurements in a dataset which are significantly different from an established pattern. These detected events are called "outliers", like the Malcolm Gladwell book. Finding anomalous behavior on your network does not inherently mean you have found risky activity, just that these events differ from the vast majority of historically seen events in the organization’s baseline. To extend the example above: if the credit card user spends $650 one week and $700 the next, that’s in line with previous spending patterns. Spending $575 or $830 is technically outside the established baseline, but not much cause for concern. An anomaly would be finding that the same user spent over $4,000 in a week. That is an uncharacteristic amount to spend, and the purchases that week should probably be reviewed, but it doesn’t immediately mean fraud was committed.
  • Artificial Intelligence is not exactly a mathematics term, but it sometimes gets used as a buzzphrase by security vendors as a synonym for "machine learning". Most science fiction movies focus on the potentially negative consequences of creating artificial intelligence, but the goal is to create machines that can learn, reason, and solve problems the way the awesome brains of animals can today. "Deep Blue" and "Watson" showed this field’s progress for chess and quiz shows, respectively, but those technologies were being applied to games with set rules and still needed significant teams to manage them. If someone uses this phrase to describe their solution, run, because all other security vendors would be out of business if this advanced research could be consistently applied to motivated attackers that play by no such set of rules when trying to steal from you.
  • Peer Group Analysis is actually as simple as choosing very similar actors (peers) that do, or are expected to, act in very similar manners, then using these groups of peers to identify when one outlier begins to exhibit behavior significantly different from its peers. Peer groups can be similar companies, similar assets, individuals with similar job titles, people with historically similar browsing patterns, or pretty much any cluster of entities with a commonality. The power of peer groups is to compare new behavior against the new behavior of similar actors rather than expecting the historical activity of a single actor to continue in perpetuity.
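To make the baselining and anomaly detection definitions concrete, here is a minimal sketch of the credit card example above. The weekly history, the function name `classify_week`, and the three-standard-deviation cutoff are all assumptions chosen for illustration; real products use far richer models than this.

```python
from statistics import mean, stdev

# Hypothetical weekly spend history (USD), all inside the $600-$800 baseline.
HISTORY = [600, 650, 700, 720, 680, 800, 640, 760]

def classify_week(amount, history, z_threshold=3.0):
    """Label one week's spend relative to the historical baseline."""
    low, high = min(history), max(history)
    if low <= amount <= high:
        return "in baseline"
    # Outside the observed range: measure how far, in standard deviations.
    z = abs(amount - mean(history)) / stdev(history)
    return "anomaly" if z >= z_threshold else "minor deviation"

for amount in (650, 575, 830, 4000):
    print(amount, classify_week(amount, HISTORY))
```

With this made-up history, $575 and $830 come back as minor deviations and $4,000 as an anomaly. Note that the label says nothing about fraud; it only says the week looks unlike the previous ones.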


The next time someone starts bombarding you with these terms, make sure they can explain why they are using them and what results you are going to see.


Mathematics will trigger new alerts, but you could just trade one kind of noise for another

The major benefit that user behavior analytics promises to security teams today is the ability to stop relying on the rules and heuristics primarily used for detection in their IPS, SIEM, and other tools. Great! Less work for the security team to maintain and research the latest attack, right? It depends. The time that you currently spend writing and editing rules in your monitoring solutions could very well be taken over by training the analytics, adjusting thresholds, tweaking the meaning of "high risk" versus "low risk", and any number of modifications that are not technically rule setting.


If you move from rules and heuristics to automated anomaly detection and machine learning, there is no question that you are going to see outliers and risky behaviors that you previously did not. Your rules were most likely aimed at identifying patterns that your team somehow knows indicate malicious activity, and anomaly detection tools should not be restricted by the knowledge of your team. However, not involving the knowledge of your team means that a great many of the outliers identified will be legitimate to your organization, so instead of having to sift through thousands of false positives that broke a yes/no rule, you will have thousands of false positives on a risk scale from low to high. I have three examples of the kind of false positives that can occur because human beings are not broadly predictable:


  1. Rare events – Certain events occur in our lives that cause significant changes in behavior, and I don’t mean having children. When someone changes roles in your organization, they are most likely going to immediately look strange in comparison to their established peer group. Similarly, if your IT staff stays late to patch servers every time a major vulnerability (with graphics and a buzz-name!) is released, some of the most critical administrators and systems in the organization are now straying from any established baselines.
  2. Periodic events – Someone taking vacation is unlikely to skew your alerting because the algorithms should be tuned to account for a week without activity, but what about annual audits for security, IT, accounting, etc.? What about the ongoing change in messaging systems and collaboration tools that constantly leads to data moving through different servers?
  3. Rare actors – There are always going to be individuals with no meaningful peer, whether it is a server that is accessed by nearly every user in the organization (without their knowledge), like an IIS server, or a user who does truly unique, cutting-edge research, like basically everyone on the Rapid7 Research team. Mathematics has not reached the point where it can determine enough meaningful patterns to predict the behavior of some portion of the organization that you need to monitor.
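The role-change case from item 1 is easy to reproduce in a toy peer group comparison. The sketch below is only an illustration: the user names, the metric (distinct servers accessed in a week), and the leave-one-out z-score test are all assumptions, not a description of any vendor's algorithm.

```python
from statistics import mean, stdev

def peer_outliers(activity, z_threshold=3.0):
    """Return actors whose value is far from the rest of their peer group.

    Each actor is compared against everyone else in the group
    (leave-one-out), so the group needs at least three members.
    """
    flagged = []
    for name, value in activity.items():
        others = [v for n, v in activity.items() if n != name]
        center, spread = mean(others), stdev(others)
        if spread == 0:
            is_outlier = value != center
        else:
            is_outlier = abs(value - center) / spread >= z_threshold
        if is_outlier:
            flagged.append(name)
    return flagged

# Hypothetical finance peer group: distinct servers accessed this week.
# "erin" has just moved into an IT role but is still grouped with finance.
finance = {"alice": 4, "bob": 5, "carol": 6, "dave": 5, "erin": 31}
print(peer_outliers(finance))
```

The math correctly reports that erin no longer looks like her peers. What it cannot report is that HR approved the move last week, which is exactly the knowledge your team has and the algorithm does not.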


Aside from dealing with a change in the noise, there is the very real risk that by relying too heavily on canned analytics to detect attacks, you can easily leave yourself open to manipulation. If I believe that your organization is using "big data analytics" as most are, I can pre-emptively start to poison the baseline for what is considered normal by triggering events on your network that set off alerts, but appear to be false positives upon closer investigation. Then, having forced this activity into some form of baseline, it can be used as an attack vector. This is the challenge that scientists always run into when observing humans: anyone who knows they are being observed can choose to act differently than they otherwise would, and you won’t know.
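Baseline poisoning can be sketched in a few lines. The model here is deliberately naive and entirely hypothetical: a sliding window of recent measurements, an alert when a value exceeds the mean plus three standard deviations, and an analyst who dismisses every probe as a false positive so that it is absorbed into the window.

```python
from statistics import mean, stdev

def threshold(window, k=3.0):
    """Alert threshold: mean plus k sample standard deviations."""
    return mean(window) + k * stdev(window)

def is_alert(value, window, k=3.0):
    return value > threshold(window, k)

def poison(history, target, bump=1.1, max_rounds=1000):
    """Feed dismissed probes into the window until `target` stops alerting."""
    window = list(history)
    rounds = 0
    while is_alert(target, window) and rounds < max_rounds:
        probe = threshold(window) * bump  # just loud enough to alert
        window = window[1:] + [probe]     # dismissed, then learned as "normal"
        rounds += 1
    return rounds, window

# Hypothetical weekly measurements for one user or system.
HISTORY = [600, 650, 700, 720, 680, 800, 640, 760]
print(is_alert(4000, HISTORY))            # alerts before any poisoning
rounds, poisoned = poison(HISTORY, 4000)
print(rounds, is_alert(4000, poisoned))   # no longer alerts afterwards
```

Every probe in this sketch does trigger an alert. The weakness is in what happens after the alert is closed, which is why blindly "learning normal" from dismissed events is dangerous.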


A final note on anomalies is that a great deal of them are going to be stupid behavior. That’s right, I guarantee that negligence is a much more common cause of risky activity in your organization than malice, but an unsupervised machine learning approach will not know the difference.


InsightIDR blends mathematics with knowledge of attacker behavior

This post is not meant to say that applied mathematics has no place in incident detection or investigation. On the contrary, the Rapid7 Data Science team is continuously researching data samples for meaningful patterns to use in both areas. We just believe that you need to apply the science behind these buzzwords appropriately. I would summarize our approach around this in three ways:

  • A blend of techniques: At times, simple alerts are necessary because the activity should either never occur in an organization or occurs so rarely that the security team wants to hear about it – the best example of this is providing someone with domain administrator privileges. Incident response teams always want to know when a new king of the network has been crowned. Some events cannot be assumed good when a solution is baselining or "learning normal", so there should be an extremely easy way for the security team to indicate which activities are permitted to take place in that specific organization.
  • Add domain expertise: Adding security domain knowledge is not unique to Rapid7, but thanks to our research, penetration test, and Metasploit teams, the breadth and depth of our familiarity with the tools attackers use and their stealth techniques is unmatched in the market. We continually use this in our analyses of large datasets to find new indicators of compromise, visualizations, and kinds of data that we will add to InsightIDR. Plus, if we cannot get the new data from your SIEM or existing data source, we will build tools like our endpoint monitor or no-maintenance honeypots to go out there and get the data.
  • Use outliers differently: Almost every user behavior analytics product in the market is using its algorithms to produce an enormous list of events sorted by each one’s risk score. We believe in alerting infrequently, so that you can trust it is something worth investigating. Outliers? Anomalies? We are going to expose them and help you to explore the massive amount of data to hopefully discover unwanted activity, but the specific outliers have to pass our own tests for significance and noise level before we will turn them into alerts. Additionally, we will help you look through the data in the context of an investigation because it can often add clarity to traditional "search and compare" methods that your teams are likely using in your SIEM.
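As a rough illustration of how these three ideas can fit together, here is a small triage sketch. It is not how InsightIDR is implemented; the event names, the `risk_score` field, the allowlist format, and the 0.9 cutoff are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical event kinds that should always reach a human.
ALWAYS_ALERT = {"domain_admin_granted"}

@dataclass
class Event:
    kind: str
    actor: str
    risk_score: float = 0.0  # hypothetical 0..1 score from an anomaly model

def triage(event, permitted, alert_threshold=0.9):
    """Decide whether an event is ignored, alerted on, or left to explore."""
    if (event.kind, event.actor) in permitted:
        return "ignore"   # the security team marked this as allowed here
    if event.kind in ALWAYS_ALERT:
        return "alert"    # simple rule: never assumed to be normal
    if event.risk_score >= alert_threshold:
        return "alert"    # outlier that passed the significance bar
    return "explore"      # kept for investigation, but no alert

permitted = {("domain_admin_granted", "it-onboarding")}
print(triage(Event("domain_admin_granted", "bob"), permitted))
print(triage(Event("domain_admin_granted", "it-onboarding"), permitted))
print(triage(Event("new_asset_login", "carol", 0.95), permitted))
print(triage(Event("new_asset_login", "dave", 0.40), permitted))
```

The design choice worth noticing is the last branch: low-scoring outliers are not thrown away, they are simply kept out of the alert queue so that the alerts you do receive stay trustworthy.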


So if you want to drop mathematics into your network, flip a switch, and let its artificial intelligence magically save you from the bad guys, we are not the solution for you. Sadly, though, no solution out there is going to fulfill your desire any time soon.


If you want to learn more about the way InsightIDR does what I described here, please check out our on-demand demo. We think you will appreciate our approach.

Source: Jive SBS Syndication Feed @ February 22, 2017 at 05:35PM