
Get One Step Closer to GDPR Compliance


Good morning, good afternoon, good evening, whatever the case may be. I appreciate you taking the time to join us. Today we’re going to discuss how Graylog can help you with GDPR compliance. We’re going to start with a little background on GDPR, why it should matter to you, and what the context is here. Just one second, I will get that going. Alright. (PAUSES). Here we go; it helps if you resume the share.

Okay. So, you should see the title screen. We’re going to go straight into: why does GDPR exist? Why do we have GDPR in the first place? The short answer is that the old regulations were just way too old. They were designed or implemented in 1995, which was long before the internet became the primary marketplace for businesses. At this point, consumers are demanding higher standards and more security for their personal data. The entire idea of privacy did not even include this type of data when those regulations were created. And so, it was just time to update them.

So, why should companies care? What is it about these regulations that is going to affect companies, and what’s different from what was there before? The biggest difference is the way to look at data. Before GDPR, prevailing wisdom said that big data was an asset. The more you had, the better off you were. Now, you can argue that big data is, or can be, a liability. Broadly speaking, the less customer information an organization stores, the lower its risk of running afoul of GDPR. What you don’t have, you can’t be held responsible for.

So, what changed? Mostly, like I said, it is the focus on personal data. And (AUDIO CUTOUT) data. You don’t normally think that log data contains a lot that’s personal. Maybe a username occasionally, but most of the time we haven’t thought of that as personal data. The GDPR has changed the definition of what constitutes personal information. So under GDPR, access logs, error logs, and security audit logs are now all considered to contain personal information. Companies have to protect IP address data and cookie data as they would personal identifiers, just like credit card or national ID numbers. It’s now considered the same type of data as other personal information.

So, there are three key or core areas for GDPR readiness that you need to be aware of. The first of those is transparency. Companies need to provide clear communications on the personal data that they use, and there’s a mandatory breach disclosure. So, for instance, companies have to inform subjects of what data processing is going to be done, and the processing has to match the description. You basically have to say what you’re going to do and do what you say.

Data Protection Impact Assessments are also required. You need to understand the risk involved with this data in the case of a breach. You need to know what would happen, so you should tabletop or run actual simulations of a breach and make sure you understand what the impact to your customers is going to be.

Second is compliance. Companies should be particularly mindful of the principle of privacy by design and by default. Again, we mentioned the data protection impact assessment; that’s another thing that needs to be high in management’s mind: knowing what the impact is before it happens. Then there is documenting your data use. You need to be able to clearly describe what should happen, so that you can then go back and prove that what did happen matches what should happen.

You need to have data portability. And the new thing that consumers have been granted is the right to be forgotten; that’s the one we hear so much about. That’s erasure upon request. “Anything you have on me, please get rid of it” is the nature of that request. You need to build your data structures so that that’s possible.

And you need to expect that there are going to be more regulatory or statutory authorities that want to look at this and review this data. So you need to have it in a form that’s reviewable by third parties.

The last core area is accountability. There are now fines associated with this that are, at the very least, non-trivial. For technical measures, the financial penalties can be up to 10 million euros or 2% of global annual turnover. (AUDIO CUTOUT). And 2% of global annual gross revenue is a huge amount of money. For the non-technical key provisions, financial penalties can be up to 20 million euros or 4% of global annual turnover, whichever is higher.
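To make the "whichever is higher" arithmetic concrete, here is a minimal Python sketch of the fine ceilings just described. The function name and the example turnover figures are made up for illustration; the sketch assumes the "whichever is higher" rule applies to both tiers, and it only computes the upper bound, not what a regulator would actually impose.

```python
def max_fine_eur(annual_turnover_eur: int, upper_tier: bool) -> float:
    """Ceiling on a fine: the higher of a fixed cap or a percentage of turnover."""
    # Tiers as described in the talk: 10M EUR / 2% and 20M EUR / 4%.
    fixed_cap, percent = (20_000_000, 4) if upper_tier else (10_000_000, 2)
    return max(fixed_cap, annual_turnover_eur * percent / 100)


# Hypothetical companies: one with 100 million EUR turnover, one with 2 billion.
small = max_fine_eur(100_000_000, upper_tier=False)    # fixed cap dominates
large = max_fine_eur(2_000_000_000, upper_tier=True)   # percentage dominates
print(small, large)
```

For a small company the fixed cap is the binding number; for a large one the percentage of turnover takes over.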

And then last, and probably more importantly, is the suspension of the right or the ability to process data (INAUDIBLE). There aren’t many companies today that can get by without processing data for more than a few minutes before it starts to impact their business. Most could not stay in business without that capability. So, there are some real teeth here; you could be put out of business for doing this wrong. And so, it’s important that companies make sure they pay attention to these core areas.

There are six principles these are broken down into. The first is be fair; the second is have a reason; the third is minimize data. Be fair boils down to: say what you’re going to do, then do what you said you were going to do, and be able to demonstrate that; document each of those steps. The second one is: if you don’t know why you’re collecting personal data, stop! If you don’t know what that data is going to be used for, stop collecting it; it’s a liability. And then minimize that data. Only keep the data you have to have to satisfy the reason you came up with in the previous step. So, those are very important. If you don’t know why you’re doing it and you keep more than you need, you’ve created liabilities for the organization.

Stay updated. The data has to be accurate. If you’re going to be able to delete it, it needs to be accurate in the first place. If you can’t demonstrate that you’re keeping the data accurately, then you will have a difficult time proving that you have handled it appropriately. Keep it only as long as necessary. You know, we minimize the data, but we also don’t want to keep it any longer than we have to. So, if your retention policy says we keep it for 180 days, after 180 days it’s a liability; it’s something that can be used against you. And then process it appropriately. To make sure that it is cared for appropriately, it’s got to be processed in a way that will protect that data and retain its integrity.

So, there are three roles of people that are affected by this. The first role is the data controller, and that role defines how to process the personal data and why you want to process that personal data. The controller also takes responsibility for third-party processors, making sure that they comply with GDPR. Under GDPR, if a third-party processor is out of compliance, then the primary organization is also considered non-compliant. So, if you have a business partner who fails GDPR compliance, you fail. And that’s something new as well.

The data processor is the second role. Data processors could be outsourcing firms or internal teams that maintain or process the personal data records in any way. So, anybody who touches that data is a data processor. Both the company and the processing partner are accountable for breaches, just like they are in the data controller example.

The last is the Data Protection Officer, and there are fewer of these because it doesn’t apply to many customers. GDPR dictates that companies have to have a DPO only if they’re a public authority or handle large amounts of data or data gathered from EU citizens. So, people outside of the EU, unless they have a huge business presence in the EU, are probably not going to have a DPO. But that’s a new role that’s been created as well.

So, to the meat and potatoes here: how can Graylog help you with GDPR compliance? What can we do to make this easier for you?

The first thing we’re going to do is show you how we can help you adhere to the data planning requirements. You have to meet several GDPR requirements related to how your organization handles personal data. And data flow planning is key because it’s the thing you do up front. It’s got to be well planned in order to be repeatable and in order to meet GDPR requirements for transparency.

So, let us start with streams. You should now be able to see the Graylog interface. I’m not going to do this like a traditional demo where we go through step by step and show every feature; I’m just going to show the ones that are relevant to GDPR and the subject matter at hand.

So, streams are a key part of Graylog. The concept of a stream underlies many of Graylog’s most important features; I’ll show some of those. The idea is that they are tags that route traffic. So, you can route traffic into a pipeline processor. Pipeline processors are used for manipulating data. Streams have their own set of rules, which are set here with these stream rules. In this case, the field source must match AWS CloudTrail. That means everything that matches this is now going to be put into this stream. You can have multiple streams, and these streams can then be handled separately.

So, the first thing you can do is parse that data, modify that data, or enrich that data. You do that via something called pipelines. Pipelines are sets of stages that allow you to make modifications to data. In this case, you see it’s got a pipeline connection set up to a particular pipeline. You can connect it to different pipelines, so different rules can apply, or the same rules can apply to different streams. From there, you’ve got different stages, up to 99 stages, to let you perform modifications to that data. In this case, it’s an example of (INAUDIBLE) data that’s embedded in (INAUDIBLE); we’re isolating the (INAUDIBLE), parsing that out, and then cleaning that up so that we’re not retaining any data that we don’t need. And so you can see that with these pipelines, you can quickly get in and get rid of data that you don’t need. For example, if there’s personal information in there that is not required for your processing, then delete it here with a pipeline and avoid having to keep it. Because you can’t be out of compliance for data you never collected in the first place.
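The stream-then-pipeline flow can be pictured with a small Python model. This is a conceptual sketch only, not Graylog’s rule language or API: the stream name, the `source` value, the field names, and every function here are assumptions made up for illustration.

```python
# Assumed names of fields we decided we do not need for processing.
PERSONAL_FIELDS = {"user_email", "session_cookie"}


def cloudtrail_rule(msg):
    """Hypothetical stream rule: the 'source' field must match this value."""
    return msg.get("source") == "aws-cloudtrail"


def drop_personal_fields(msg):
    """Stage that removes fields so they are never stored."""
    return {k: v for k, v in msg.items() if k not in PERSONAL_FIELDS}


def tag_processed(msg):
    """Later stage that marks the message as cleaned."""
    return {**msg, "processed": "true"}


STREAMS = {"AWS CloudTrail": cloudtrail_rule}
# Each pipeline is a list of (stage number, function); lower stages run first.
PIPELINES = {"AWS CloudTrail": [(1, tag_processed), (0, drop_personal_fields)]}


def route_and_process(msg):
    """Return the processed message for every stream whose rule matches."""
    results = {}
    for stream, rule in STREAMS.items():
        if rule(msg):
            out = msg
            for _, stage in sorted(PIPELINES.get(stream, []), key=lambda s: s[0]):
                out = stage(out)
            results[stream] = out
    return results


example = {"source": "aws-cloudtrail", "action": "ListBuckets",
           "user_email": "someone@example.com"}
print(route_and_process(example))
```

The point of the model is the ordering: a message is first matched to a stream, and only then do that stream’s stages get a chance to strip what should not be kept.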

The second thing that streams can do (PAUSES) is route traffic to an Index Set. Index Sets are separate containers for data, and that data can be treated differently. So, with Index Sets in this case (PAUSES), you can set the number of index shards, and you can set replication, which is nice. You can set rotation strategies, by size or by time; you can also set the rotation period, and you can decide what gets done to the data. Does it get archived? Does it get deleted? In most cases, you’re probably going to archive or delete it. With streams, pipelines, and Index Sets, you can now route the data that does contain personal information into one Index Set and apply a retention setting to that, and you can take data that does not contain personal information and route it into a separate Index Set, so that you can then treat that data however is appropriate. If that data needs to be kept longer or shorter, then you’ve got the ability to set that via Index Sets.
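As a rough illustration of keeping personal and non-personal data under different retention settings, here is a Python sketch. The class, the two example Index Sets, and their rotation and retention numbers are all assumptions chosen for the example; they are not Graylog defaults or its configuration format.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class IndexSet:
    name: str
    rotation_days: int  # start a new index every N days (time-based rotation)
    max_indices: int    # how many rotated indices to keep
    on_expiry: str      # what happens to the oldest index: "delete" or "archive"

    def retention_days(self) -> int:
        return self.rotation_days * self.max_indices


# Hypothetical settings: personal data is kept briefly and deleted,
# everything else is kept longer and archived.
PERSONAL = IndexSet("personal-data", rotation_days=1, max_indices=30, on_expiry="delete")
GENERAL = IndexSet("general", rotation_days=7, max_indices=52, on_expiry="archive")


def choose_index_set(contains_personal_data: bool) -> IndexSet:
    """Route a stream's messages to the Index Set matching its sensitivity."""
    return PERSONAL if contains_personal_data else GENERAL


def is_expired(index_set: IndexSet, index_created: date, today: date) -> bool:
    """True once an index is older than the set's retention window."""
    return today - index_created > timedelta(days=index_set.retention_days())


created = date(2024, 1, 1)
today = date(2024, 2, 15)
for index_set in (PERSONAL, GENERAL):
    print(index_set.name, is_expired(index_set, created, today), index_set.on_expiry)
```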

So that retention policy will also help you comply with the data retention requirements of GDPR, meaning you’ve got a defined process for storing that data, for keeping that data separate from non-personal information, and for getting rid of that information as you’ve intended, and you can demonstrate that it’s been done by showing the logs.

So, data protection by design is another principle that we’re going to help you with in Graylog. That means that you treat your data as a valuable asset and you have to apply role-based access control to that data. You need to be able to control who sees that data and how they get to it: what they can see and what they can’t. We do that via users and roles. In this case, you’ve got administrator or reader roles. I’ve created one for the Test AWS Reader. Take a look at that. We control that access by giving access to streams and to dashboards. So, we’ve been looking at the CloudTrail streams; I can give you read access to that, or I can give you edit access to that, or I can give you neither. With the dashboards, we’ve got AWS Networks, so you can set read access to that. And you can then control what users can and can’t see. Anything they’ve been given access to they can see; anything they haven’t been given access to doesn’t even show up, so they wouldn’t even be aware that that data was behind any of this.

This role is very useful, but in many cases, companies have got an investment in a centralized directory structure. Most often that’s LDAP or Active Directory. So, in addition to these static roles that we’ve created here, you can also map these to groups within Active Directory. Then you’ve got a centralized place to control authorization and group membership in your Active Directory structure, which is probably where most of your authentication and authorization is built today, and it can be extended into Graylog so that it works with your existing structures.
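The access model described here (roles grant read or edit on specific streams and dashboards, and directory groups map onto roles) can be sketched in a few lines of Python. Everything in this block is hypothetical: the group DN, the data structures, and the function names are illustrative and are not Graylog’s internal representation.

```python
# Permission levels, ordered so that "edit" implies "read".
LEVELS = {"read": 1, "edit": 2}

# Role -> {(resource kind, resource name): granted level}
ROLE_GRANTS = {
    "Test AWS Reader": {
        ("stream", "AWS CloudTrail"): "read",
        ("dashboard", "AWS Networks"): "read",
    },
}

# Hypothetical mapping from a directory group to a role.
GROUP_TO_ROLE = {
    "cn=aws-auditors,ou=groups,dc=example,dc=com": "Test AWS Reader",
}


def roles_for(groups):
    """Resolve a user's directory groups to roles."""
    return {GROUP_TO_ROLE[g] for g in groups if g in GROUP_TO_ROLE}


def is_allowed(groups, kind, name, action):
    """True if any of the user's roles grants at least the requested level."""
    for role in roles_for(groups):
        granted = ROLE_GRANTS.get(role, {}).get((kind, name))
        if granted is not None and LEVELS[granted] >= LEVELS[action]:
            return True
    return False


def visible(groups, kind):
    """Resources of one kind the user can see; anything else does not show up."""
    return sorted({name
                   for role in roles_for(groups)
                   for (k, name) in ROLE_GRANTS.get(role, {})
                   if k == kind})


auditor = ["cn=aws-auditors,ou=groups,dc=example,dc=com"]
print(is_allowed(auditor, "stream", "AWS CloudTrail", "read"))
print(is_allowed(auditor, "stream", "AWS CloudTrail", "edit"))
print(visible(auditor, "dashboard"))
```

Group membership stays in the directory; only the group-to-role mapping lives on the logging side.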

Logging and auditing is another part of GDPR, and so that’s an obvious fit for a log management solution. That’s one of the things we are particularly good at. The great thing about these regulations is that they’re mandating that you do the things that you really know you ought to have been doing all along. One of those is monitoring data: monitoring the inflow of log data, looking for security issues, trying to find breaches, and making sure that you know what’s going on with the data in your environment. These are some dashboards that Graylog uses, and monitoring data, whether security or operations data, is a core competency for Graylog. These are just a few of the visualizations you can do. One of the things you’ll be looking for, of course, is graphical representations of data, so that you can aggregate it and keep an eye on it, and see when things rise to the top or, in other cases, fall to the bottom. The global activity might show you continuity. You can demonstrate that your logs have been continuously collected over the period of time in question. And then you can drill down if you need to and do searches.

So, let’s go into the search interface. We’ll set our time range; this gives you an easy and intuitive way to find things. For example, we will look at the last 30 minutes. We can just click on that and say, okay, this is everything. We can then scroll through if we want to take a look at what data is in there. In this case, it’s a DNS request. Here’s a Windows event. Notice we’ve parsed all that out. So, here’s a username. We can pivot from here if we want to start building queries that are based on something specific. As we click on that, the query changes to username administrator, and now it shows me everything that’s had the administrator username involved. We can then go in and do some more pivots: say, show me all the source addresses that have tried to log in as administrator. That could be very useful, especially if it’s not an internal box.

So, this gives me the ability to dig in further and see what’s going on with the logs. If I want to, I could build this into another dashboard like we were looking at before. Straight from here, I can build a new visualization that I can use to monitor things in the future. Here, I’m not really looking for anything in particular; I’m just looking at data in general and trying to see anything that might jump out as an anomaly or anything that might indicate a problem. Sometimes, though, I’m going to have saved searches for things that I do regularly. Failed inbound login attempts would be one. So, last 5 minutes, or maybe I’ll say last day; oh (MUMBLES). And you’ll notice this is a query that I’ve done before that I can then call on to do these searches very quickly. This is something that I do a lot and I don’t want to have to type it every time. All I’ve said here is: show me all failed Windows login events that are not coming from a 172 source address, which happens to be my RFC 1918 internal network.
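The logic of that saved search (failed logins whose source is outside the internal range) can be expressed with Python’s standard `ipaddress` module. This is a sketch of the filtering idea, not Graylog query syntax; the event field names are assumptions. Note that only 172.16.0.0/12 is reserved by RFC 1918, so matching on the full block is stricter than matching any address that begins with 172.

```python
import ipaddress

# The RFC 1918 private block that covers the "172" internal network.
INTERNAL_NET = ipaddress.ip_network("172.16.0.0/12")


def external_failed_logins(events):
    """Keep failed login events whose source address is outside the internal net."""
    return [
        e for e in events
        if e["outcome"] == "failure"
        and ipaddress.ip_address(e["src_ip"]) not in INTERNAL_NET
    ]


# Made-up events; the external addresses come from documentation ranges.
events = [
    {"user": "administrator", "outcome": "failure", "src_ip": "172.16.5.4"},
    {"user": "administrator", "outcome": "failure", "src_ip": "203.0.113.9"},
    {"user": "guest", "outcome": "success", "src_ip": "198.51.100.7"},
    {"user": "root", "outcome": "failure", "src_ip": "172.32.0.1"},
]
for event in external_failed_logins(events):
    print(event["user"], event["src_ip"])
```

The last event shows why the exact block matters: 172.32.0.1 starts with 172 but is a public address, so it is reported.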

‘Kay. So (PAUSES), here we go. All of this is source; sorry. (MUMBLES) Source address; here we go. So, this is over the last two days. This is not just administrator logins. From here, if I wanted to ask what usernames these people are using, I can click again and see all the different ones (CHUCKLES), although administrator is a big one.

So, that’s when I’ve got data where I’m not entirely sure what I’m looking for. I’m not responding to an alert or to a request; I’m just looking through the data and doing general reviews. I can use dashboards for that, and I can use saved searches for that. Everyone will build up their own set of queries that are useful in their own environment.

Sometimes, though, the customer is going to want to exercise their right to be forgotten. To do that, you’re going to need to be able to search for a specific IP in many cases. Maybe a username; we’ve been looking at usernames, but it could be an IP as well. (PAUSES).

Here we go. So, I’ve got an IP address. Now I need to go back and find out everywhere that data exists, because if I’m going to delete it, I need to be able to find it. So, if they said my IP was, you know, 2-1-292,187,125, then I might have to look through a lot more than the last two days, but just for the sake of the demonstration, we’ll stick to that.

And now, I might say, okay, that’s the IP address; are there any usernames associated with that? I don’t see username as one of my field names, so I don’t have to worry about that. Source address should always be the same. Destination addresses: where did they go? What did they touch? In this case, everything went to the same system. What systems have logs that I’m going to have to delete? I can click on this and see it’s my flow logs and my Windows logs. So, now I know which systems I need to worry about, and I know what they’ve touched. It gives me the ability to quickly get to all of the data that’s relevant, to find it, and then, if necessary, delete it, or at least know where that data is.
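The pivoting done in the demo (start from an IP, then ask which log sources and destinations are involved) amounts to a filter followed by a few aggregations. A minimal Python sketch, with assumed field names (`src_ip`, `dst_ip`, `log_source`) and documentation-range addresses rather than any real data:

```python
from collections import Counter


def locate_subject_data(messages, ip):
    """Summarize where a data subject's IP appears, ahead of an erasure request."""
    hits = [m for m in messages if ip in (m.get("src_ip"), m.get("dst_ip"))]
    return {
        "count": len(hits),
        # Which systems hold logs that may need to be deleted.
        "by_log_source": Counter(m["log_source"] for m in hits),
        # Which systems the address talked to.
        "destinations": sorted({m["dst_ip"] for m in hits if "dst_ip" in m}),
        # Whether any username is tied to the address.
        "usernames": sorted({m["username"] for m in hits if "username" in m}),
    }


messages = [
    {"log_source": "flow-logs", "src_ip": "198.51.100.7", "dst_ip": "192.0.2.10"},
    {"log_source": "windows", "src_ip": "198.51.100.7", "dst_ip": "192.0.2.10"},
    {"log_source": "flow-logs", "src_ip": "203.0.113.5", "dst_ip": "192.0.2.10"},
]
print(locate_subject_data(messages, "198.51.100.7"))
```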

‘Kay. So, that helps you with maintenance of the data subject rights. The last thing I wanted to show you is something that’s going to help you with the maintenance of the data itself, to show that it has been collected and handled appropriately. One of the things you have to do is provide some resiliency, because log data comes in 24/7. You need to be able to demonstrate that data is collected in such a way that, should there be a spike, or should something happen to your system, the data continues to be collected; you can’t drop messages. If you lose anything, then that’s a gap where you can’t demonstrate that you’re being compliant.

So, one of the things we do to make Graylog more resilient is we’ve created something called a Journal. The Journal writes the data to disk, in a raw format, as soon as it comes in, so that should there ever be any power interruption or any network interruption, that data doesn’t disappear. If the power cycles, then it’s already been written to disk and you’re not losing anything that’s come in.
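The general technique behind a journal like this is to persist each raw message before doing anything else with it, so a restart can replay what was received. The Python sketch below shows that idea in its simplest form. It is not how Graylog’s Journal is implemented; the class, file name, and line-per-message format are assumptions for illustration.

```python
import json
import os
import tempfile


class Journal:
    """Append-only file: every raw message hits the disk before processing."""

    def __init__(self, path):
        self.path = path

    def append(self, raw_message: str) -> None:
        with open(self.path, "a", encoding="utf-8") as f:
            # One JSON-encoded string per line, so embedded newlines are safe.
            f.write(json.dumps(raw_message) + "\n")
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the bytes to disk

    def replay(self):
        """Read back everything that was journaled, oldest first."""
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]


# Simulate a restart: write with one instance, recover with another.
path = os.path.join(tempfile.mkdtemp(), "journal.log")
Journal(path).append("<34>Oct 11 22:14:15 host su: failed login")
Journal(path).append("multi\nline message")
print(Journal(path).replay())
```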

In addition to the Journal, we also have Buffers. The Buffers help protect Elasticsearch, which is our storage engine. It’s the backend of the Graylog system. So, our input Buffers, processed by (INAUDIBLE) output Buffers, provide a way for Elasticsearch to have some flow control. We control the insertion into Elasticsearch such that it doesn’t ever get overwhelmed and doesn’t drop events, and should Elasticsearch fill up, it gives you a place for that data to go while you do whatever maintenance is necessary to free that space. So, we’re going to provide some resiliency that way. We also offer clustering for both the Graylog and the Elasticsearch components so that you can have redundancy and resiliency. For high availability, we showed the Index Sets earlier, where you can set replication. So, if you’ve got two servers, you can set a copy of the data on each Elasticsearch server, so that if something happens to the first, a complete copy exists on the second. And that can be extended laterally out to, you know, an infinite number. So, you can have 10 or 12 different Elasticsearch servers, with shards or indexes replicated across multiple servers, to provide even more redundancy. (MUMBLES) simple example.
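The flow-control idea (hold messages in a buffer and only hand them to storage as fast as storage accepts them) can be sketched as follows. Both classes are hypothetical stand-ins written for this example; `FakeStore` is not an Elasticsearch client, and the capacities are arbitrary.

```python
from collections import deque


class FakeStore:
    """Stand-in for a storage backend that refuses writes when it is full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.docs = []

    def try_write(self, doc):
        if len(self.docs) >= self.capacity:
            return False
        self.docs.append(doc)
        return True


class OutputBuffer:
    """Holds messages until the store accepts them; nothing is dropped."""

    def __init__(self, batch_size):
        self.pending = deque()
        self.batch_size = batch_size

    def add(self, msg):
        self.pending.append(msg)

    def flush(self, store):
        """Write up to one batch in order; stop as soon as the store refuses."""
        written = 0
        while self.pending and written < self.batch_size:
            if not store.try_write(self.pending[0]):
                break  # leave the message buffered and retry later
            self.pending.popleft()
            written += 1
        return written


store = FakeStore(capacity=2)
buffer = OutputBuffer(batch_size=10)
for i in range(5):
    buffer.add(f"msg-{i}")

print(buffer.flush(store), len(buffer.pending))  # store fills up part way
store.capacity = 10                              # maintenance frees space
print(buffer.flush(store), len(buffer.pending))  # the rest drains in order
```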

So, that is... wow, we went through it very, very quickly! I guess archives would be the last thing we need to look at. Archiving allows you to retain data in a more cost-effective manner, not in Elasticsearch, but out to a flat file. With archives, you’ve got control over when the archives get made, and you have control over which data. Let’s take a look. (MUMBLES: Configuration; here you go). You can actually choose which streams get archived. So, if you’ve got data that has personal information, like we talked about in an earlier example, and you don’t want to retain that data, but you do want to retain all the other data, you can just choose not to archive those streams. Once the index is archived, all of the streams that were not included are simply dumped. So, it gives you the ability to keep the data you need and get rid of the data you don’t. Or conversely, if you do archive the personal data, you can then use these flat files; they’re (INAUDIBLE) flat files that are on commodity storage. You can actually search through those in the archive, should you need to go back and find data in the case of a customer who has a complaint or wants something deleted. So, you can look through your archives without having to restore them first, then restore the ones you need, perform whatever searches or modifications you need, and return that to the archives.

So, that is the quick tour. We’ve had everybody muted up to now. There may be some questions in the Q&A section. Does anybody have any questions? (PAUSES).



Alyssa, would you like to open the mic, perhaps, and see if anybody has any questions they’d like to ask?

Alyssa Fox

Sure. I don’t see any in the Q&A panel, but I will unmute everybody and see what we can do there, if anybody has a question.

