Göteborg, 8 Nov, 2011
Welcome to SIEM - part 3
So you’ve made it through the first couple of blog posts, where we talked about the two parts of a SIEM system, and you now have a basic understanding of what a SIEM is and how it works. Now that you’ve made it this far, it’s time to look at by far the hardest part of working with a SIEM: the human element. Yes, it’s humans who sign the checks to pay for the SIEM, it’s a human who sets up the processes and procedures for how the SIEM is used, and it’s a human who has to actually put in the time and effort to learn and use the SIEM. This human element is the one variable that can’t be programmed or scheduled, and it’s the reason a large majority of SIEM deployments “fail”.
I wish I could say SIEM vendors pitch their products as anything but easy, but if you’ve ever sat through a sales pitch you know that’s just not the case. Many SIEM vendors still pitch their product as a “SOC-in-a-box”, a kind of super-duper IDS device where you simply drop it in, walk away and let the magic happen. They don’t bring up the fact that you’ll need 1 to 3 new full-time employees (FTEs) to run, use and maintain the system. They may also forget to mention that those people need to be expert security analysts who already know how to use the SIEM if you want to get any value out of it. From my personal experience working at a SIEM vendor for 5 years, it takes an end user about 6-12 months to come up to speed on a SIEM before they can truly start to use it effectively, and that’s if they are a fairly good security analyst to begin with and actually know what to ask it to look for. I’m sure you are already aware that good security analysts don’t come cheap, and one with experience on your SIEM platform is rarer than some endangered animals.
Not exactly a “SOC-in-a-box”, is it?
Lucky for you (but unlucky for your wallet), the vendors are well aware of this and are happy to provide you access to their professional services team. For $2,000 a day they would be more than happy to make the reports that came “out of the box” actually work. They will be happy to train your employees on how to use their product, and would be happy to help maintain it when it has problems (and it will have problems). Now I can’t say all the big vendors are like this, but sadly a large majority depend on their professional services team for a big chunk of their income. Then again, this strategy isn’t that strange: how many other IT security products require that same model? If you installed a firewall, most likely you had professional services install and configure it if you didn’t have the know-how in house. If you installed a token-based VPN solution, you probably had professional services involved. The difference between those point solutions and a SIEM is that once your firewall is up and running, that’s it. A SIEM is something you need to work with and maintain every day; you don’t just configure it and walk away. That’s why I often compare a SIEM product to an ERP system: it requires a lot of “care and feeding”.
Say you wanted to skip as much of that work as you could, so you wrote the check for your vendor’s professional services team to come out and set up your SIEM, and now it’s running. You are getting logs and the reports they set up seem to be working. Now what? Well, as long as nothing changes in your IT environment and you’re happy with what you have, all you really need is someone to do the daily system maintenance and analyze the reports. This is probably the simplest scenario possible, but I’m going to assume that if you paid six figures for your SIEM and a lot more on professional services, you expect your investment to do a bit more than that.
This is where things in the SIEM world start to fall apart. Most companies went in expecting that this software would somehow replace employees, not add more, let alone well-trained ones. Again going back to my experience, companies try to do more with less and assign their overworked security analyst or a system admin to work with their new SIEM product. The fact is that, maybe like you, they were driven towards SIEM by some compliance requirement; they knew they needed it, yet they knew very little about what a SIEM is or how it works. And now they have one or two people with a few days of training who are expected to do some pretty complicated tasks, all while still handling their existing responsibilities. It doesn’t take a crystal ball to see what’s going to happen: you’ve got a one-way ticket to fail town.
So, going back to the first paragraph, this is why a lot of SIEM deployments fail and customers are unhappy with their results. What’s even crazier is that I’ve seen companies repeat this behavior over and over again. They buy SIEM “A”, use it for a year, don’t get the results they expect, and throw it out. Then they buy SIEM “B”, use it for a year, again don’t see results and throw it out. Now they are looking at SIEM “C”. I wish I could say this is rare, but it’s not. I’d wager that up to a quarter of all the sales opportunities we looked into involved companies thinking about replacing their current SIEM because they were unhappy with it. When we started asking questions, it became clear why their SIEM had failed, and most of the time it had nothing to do with the SIEM they chose to buy: they simply didn’t dedicate enough resources to make the project work. The sales teams from the various vendors aren’t going to tell you this, so companies keep throwing money at products hoping to solve their problems.
Now that you know the pitfalls, you’re probably asking what you really need to get the most out of a SIEM. That will come in the final part of this series, where we’ll look at how to use your SIEM, what kind of processes you need wrapped around it, and what resources you need to dedicate to it to be effective.
Göteborg, 1 Nov, 2011
Welcome to SIEM - part 2
If you recall from our last blog post, we started unraveling the mystery that is SIEM. We first tackled the basics of the SIM side, and in this post we’ll take on SIM’s brainier brother, SEM.
So what is it about SEM that makes it so smart? Well, if you remember, SEM stands for “security event management”, which means that rather than just looking at piles of data, it attempts to pick out the “events” contained within that data and correlate those events with other events. Sometimes this can be as simple as showing a user logged into two systems at the same time, or it can be as complex as showing a connection that your firewall allowed, which then sets off a signature in your IDS/IPS, followed by a connection to a Windows box, an attempted authentication against your AD server and then an outbound FTP session to China. For these correlations to be of much use to a SOC, they obviously need to be made in “real time”, meaning as soon as the events happen, or, as it really works, as close to real time as possible.
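To make the simplest correlation above a little more concrete, here is a minimal Python sketch of a “same user on two systems at once” rule. It assumes login events arrive as (timestamp, user, host) records and treats “at the same time” as “within 60 seconds of each other”; the `LoginEvent` type, the `concurrent_logins` function and the 60-second window are all hypothetical, not taken from any SIEM product.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class LoginEvent:
    timestamp: float  # seconds since some epoch (hypothetical event format)
    user: str
    host: str


def concurrent_logins(
    events: List[LoginEvent], window: float = 60.0
) -> List[Tuple[str, str, str]]:
    """Return (user, previous_host, new_host) whenever a user logs into a
    different host within `window` seconds of their previous login.

    The 60-second default is an assumed stand-in for "at the same time".
    """
    last_seen: Dict[str, LoginEvent] = {}  # most recent login per user
    alerts: List[Tuple[str, str, str]] = []
    for ev in sorted(events, key=lambda e: e.timestamp):
        prev = last_seen.get(ev.user)
        if (
            prev is not None
            and prev.host != ev.host
            and ev.timestamp - prev.timestamp <= window
        ):
            alerts.append((ev.user, prev.host, ev.host))
        last_seen[ev.user] = ev
    return alerts
```

With alice logging into web01 at 0 s and db02 at 30 s, and bob logging into web01 at 40 s and db02 at 500 s, only alice is flagged; bob’s second login falls outside the assumed window.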
So how does our SEM make this magic happen? It may sound trivial at first to do this kind of correlation, but remember our last blog post about the sheer volume and different types of data going into your system? To make an analogy, it can be like sticking a fire hose in your mouth and turning it on (do not attempt at home). For this reason, devices that work as a SEM often require vast amounts of RAM and fast processors. The SEM application works its magic by creating state machines, sometimes millions of them, all in memory, so it can try to digest that fire hose and keep track of the billions of events it’s seeing as fast as possible (we want it in real time, right?).
I want to make sure we stay together, and I know that unless you took a theory of computation class in school or just happen to be an uber nerd, you might not know what a state machine is or how it works. Let’s take a second to explain (or if you want to use Google, search for “finite-state machine”, or, for the more general theory, “Turing machine”). To pick a trivial case, say we want to know when someone has 5 failed logins within a span of 2 minutes. Think of our state machine as a chain with 5 links in it and a timer attached. When our state machine sees 1 failed login from user X, it moves to the first link in our chain and starts the 2-minute timer. Say 20 seconds later we get another failed login by user X; the state machine moves to the second link in our chain. This continues until one of two possible outcomes happens. In the first, we reach the last link in our chain, meaning we saw 5 failed logins before our 2-minute timer expired, and our SEM takes some kind of action, like showing a flashing red alert on a screen or sending someone an email. In the second, the timer reaches the 2-minute mark before we get to the 5th link in our chain, at which point the state machine expires and disappears without anything happening.
Basic state machine
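Here is the 5-failed-logins-in-2-minutes chain written out as a minimal Python sketch. It keeps one small state machine per user in an in-memory dictionary, which is the same idea, at toy scale, as a SEM juggling millions of them. The class and method names are hypothetical, and instead of real timers it checks the 2-minute window lazily whenever a new event for that user arrives, with an `expire()` sweep to throw away machines whose timer has run out.

```python
from dataclasses import dataclass
from typing import Dict

THRESHOLD = 5           # failed logins needed to fire (links in the chain)
WINDOW_SECONDS = 120.0  # the 2-minute timer


@dataclass
class ChainState:
    started_at: float  # time of the first failed login, when the timer started
    links: int         # how many links of the chain have been reached


class FailedLoginDetector:
    """Hypothetical per-user state machines for 'N failures within T seconds'."""

    def __init__(self, threshold: int = THRESHOLD, window: float = WINDOW_SECONDS):
        self.threshold = threshold
        self.window = window
        # One state machine per user, all kept in memory.
        self.machines: Dict[str, ChainState] = {}

    def on_failed_login(self, user: str, now: float) -> bool:
        """Feed one failed-login event; return True if it completes the chain."""
        state = self.machines.get(user)
        if state is not None and now - state.started_at > self.window:
            # The timer already ran out: the old machine expires silently.
            del self.machines[user]
            state = None
        if state is None:
            # First link: start a new machine and its timer.
            state = ChainState(started_at=now, links=0)
            self.machines[user] = state
        state.links += 1
        if state.links >= self.threshold:
            # Last link reached: fire the alert, then reset this user's machine.
            del self.machines[user]
            return True
        return False

    def expire(self, now: float) -> None:
        """Drop machines whose timer has run out so memory doesn't grow forever."""
        stale = [u for u, s in self.machines.items() if now - s.started_at > self.window]
        for u in stale:
            del self.machines[u]
```

Feeding it failures for one user at 0, 20, 45, 70 and 100 seconds fires on the fifth event. Five failures at 0, 50, 100, 150 and 200 seconds never fire, because the chain’s timer runs out at the 120-second mark and a fresh chain starts at 150. That restart mirrors the chain described above; a sliding-window counter would be a reasonable alternative if you also want to catch bursts that straddle an expired chain.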
So now that you’re an expert on state machines, you understand the basics of how most real-time SEM systems work under the hood. Again, this may look like a trivial task to the human eye, but at SIEM scale it suddenly becomes quite complex. You need a well-implemented real-time system that can process billions of events and keep track of millions of states. It’s not a trivial thing.
Next we come to non-real-time reporting, and this is where SIM and SEM overlap. The SIM side holds the data, and the SEM side uses an interface to the data store to run queries, be it via SQL, regex, or some hybrid query language, that look for correlations over time. These types of reports are used to show exceptions, do trending and summarize historical data. The basic idea for these kinds of reports is to show either things that shouldn’t happen or everything that did happen over some period of time. Let’s take a PCI report as an example: one of the requirements states that you need to know who logs into your devices and that you review it daily. So you set up a report, which probably runs sometime during the night, and when you come in the next day the report is waiting for you. This is obviously not real time. Trending I think we are all familiar with, and we all know the execs love it. Being able to give your CISO a graph showing that the number of serious security incidents is going down might not get you an instant raise, but it will demonstrate the value and competency of your security department. And lastly, we all need those inglorious summary reports, which show aggregate data sets over time. How many times this month has that one troubled user had his account locked because he can’t seem to remember his password? How many times have you had to block some IP address in North Korea because of those annoying port scans? You get the idea.
Data correlation is an essential part of what a SIEM provides
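As a rough illustration of where SIM and SEM overlap, here is a minimal Python sketch that uses the standard library’s `sqlite3` module as a stand-in for the SIM data store and runs two of the reports described above as plain SQL: the nightly “who logged into what” review and a monthly account-lockout summary. The `events` table, its columns, the function names and the sample rows are all invented for the example; a real SIEM has its own schema and query language.

```python
import sqlite3
from typing import List, Tuple


def build_demo_store() -> sqlite3.Connection:
    """In-memory stand-in for the SIM data store (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (ts TEXT, username TEXT, host TEXT, event_type TEXT)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?, ?, ?)",
        [
            ("2011-11-07T08:02:11", "alice", "fw01", "login"),
            ("2011-11-07T09:15:40", "bob", "db02", "login"),
            ("2011-11-07T17:45:03", "alice", "db02", "login"),
            ("2011-11-07T10:00:00", "carol", "web01", "lockout"),
            ("2011-11-03T11:30:00", "carol", "web01", "lockout"),
            ("2011-11-06T23:59:59", "bob", "fw01", "login"),
        ],
    )
    return conn


def daily_login_report(conn: sqlite3.Connection, day: str) -> List[Tuple[str, str, int]]:
    """Exception-style report: who logged into which device on `day` (YYYY-MM-DD)."""
    return conn.execute(
        """
        SELECT username, host, COUNT(*) AS logins
        FROM events
        WHERE event_type = 'login' AND substr(ts, 1, 10) = ?
        GROUP BY username, host
        ORDER BY username, host
        """,
        (day,),
    ).fetchall()


def monthly_lockout_summary(conn: sqlite3.Connection, month: str) -> List[Tuple[str, int]]:
    """Summary report: how many times each account was locked out in `month` (YYYY-MM)."""
    return conn.execute(
        """
        SELECT username, COUNT(*) AS lockouts
        FROM events
        WHERE event_type = 'lockout' AND substr(ts, 1, 7) = ?
        GROUP BY username
        ORDER BY lockouts DESC
        """,
        (month,),
    ).fetchall()
```

On the demo store, `daily_login_report(conn, "2011-11-07")` returns one row per user and device for that day (bob’s login late on the 6th is left out), and `monthly_lockout_summary(conn, "2011-11")` returns `[("carol", 2)]`. A trending report is the same kind of query, grouped by day or week instead of by user.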
So SEM encompasses the overall analytics, the “brains” of your SIEM, and it puts the data you’ve collected to work. So we’re done, right? That’s all there is to SIEM? Think again. Just because you have a SIEM and this powerful tool at your fingertips, we still haven’t addressed how to actually make it work. How do we get those features to actually produce the graph or report you look at over your morning coffee? That is what we’ll cover in part 3 of this series.