With Data Comes Responsibility

August 02, 2020


Cloud computing is the future. Cloud services slash development time and enable novel possibilities. At the same time, they expose us to new risks. Here are my thoughts on how to mitigate them.

This post is part of Data Governance From an Engineering Perspective, a series of posts about Data Governance and Metadata.

Introduction
1. Data Governance From an Engineering Perspective
2. The Alter Ego of Data
3. Tools in the Data Management Zoo
4. With Data Comes Responsibility (this post)
Physical system
Data models
Business processes & Compliance

Between 2015 and 2018, I led a data engineering team at a financial services company. We were the first team in the company to use Azure, and we built a data science environment on it.

Leading the first cloud implementation project put us under the microscope. We spent months discussing and configuring security, networking and governance.

Who gets fired in case of a data breach?

“Valdas, who gets fired in case of a data breach?” - my lead engineer asked me out of the blue.

“Has anything happened?!” - some words raise your cortisol (stress hormone) level and heart rate, and “data breach” is one of them.

“No. I am curious. We build the data pipelines. We configure the network and the firewall. There is no one else with Azure experience to review it.”

“Well… There is a security department… But we are the ones building everything.” - I mumbled.

It was obvious to both of us that we would be the first to be interrogated in case of a data leak. Leading an autonomous data team with a mandate to choose technology was no longer fun. It got me thinking:

Are we protected from all the possible attacks?
How do we ensure we do not make stupid mistakes?
What is the security department’s responsibility?
How are security responsibilities divided between us and the cloud provider?
How do we bridge the security knowledge gap in the team?

It’s good to learn from your mistakes. It’s better to learn from other people’s mistakes - Warren Buffett

In 2017, Equifax, an American credit reporting agency, announced a data breach that exposed the personal information of 147 million people.

What did the hackers find?

First and last names
Social Security numbers
Birth dates
Addresses
Driver’s license numbers

The hackers looked for exposed assets. A public-facing web server without the latest patch was the perfect victim. The attackers got into internal Equifax servers by exploiting a security vulnerability in Apache Struts.

An unpatched vulnerability is one of the methods attackers use to get into internal networks. Security specialists call such a method an attack vector.

Table 1 Equifax attack surface matrix - step 1

Access to the internal network does not yet mean access to data. The next attack vector used against Equifax was compromised employee credentials: finding a server with usernames and passwords was a breeze.

Table 2 Equifax attack surface matrix - step 2 & 3

Misfortunes come in pairs - an old Polish proverb

In fact, the attack was a combination of moves, each targeting specific devices and applications. The set of all possible attack points is called the attack surface, and a matrix like the tables above is one way to represent it.

Access to the internal network and weak credentials opened up Equifax’s databases. Posing as authorized users, the attackers took the following steps:

Performed 9000 scans of the databases
Extracted information into small temporary archives
Downloaded data from the Equifax servers
Removed the temporary archives once completed

Unpatched servers, weak passwords and loose network controls led to the loss of protected data. In other words, they caused a data breach.

At Equifax, the attackers exploited 5 attack vectors.

“I hear you, man! I am going to fix these 5 loopholes and my servers will be bulletproof!” - I hear someone shouting.

Unfortunately, the list of possible attack vectors is much longer, and hackers keep discovering new ones. Also, each company has a unique technology landscape, a different combination of hardware and software, just as the combination of your wallpaper and desktop icons is unique to you.

The expanded table above includes more attack vectors. How does it compare to your IT landscape?

Saying “I am sorry” is not enough

There are also fines and settlements, which depend on the impact of the breach and on what leaked.

Equifax has to pay up to $700 million as part of a settlement with federal authorities over the data breach.

It was an expensive mistake.

To date, it is the biggest data breach penalty involving the US Federal Trade Commission (FTC).

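In Europe, there is the General Data Protection Regulation (GDPR).

GDPR sets forth fines of up to 10 million euros or, in the case of an undertaking, up to 2% of its total worldwide annual turnover of the preceding financial year, whichever is higher. For the most serious infringements, the limits double to 20 million euros or 4%.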
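To make the ceiling concrete, here is a minimal sketch of the arithmetic, assuming hypothetical turnover figures. It computes only the maximum allowed for an undertaking under the two GDPR tiers (Article 83(4) and 83(5)); actual fines are set case by case and are usually much lower.

```python
from decimal import Decimal

# GDPR Article 83 ceilings for an undertaking:
# (fixed cap in euros, share of total worldwide annual turnover).
# The applicable ceiling is whichever of the two amounts is higher.
TIERS = {
    "lower": (Decimal("10000000"), Decimal("0.02")),  # Art. 83(4)
    "upper": (Decimal("20000000"), Decimal("0.04")),  # Art. 83(5), most serious infringements
}


def gdpr_fine_ceiling(annual_turnover_eur: Decimal, tier: str = "lower") -> Decimal:
    """Return the maximum possible fine in euros for an undertaking."""
    fixed_cap, turnover_share = TIERS[tier]
    return max(fixed_cap, annual_turnover_eur * turnover_share)
```

For example, a hypothetical company with 2 billion euros in turnover faces a lower-tier ceiling of 40 million euros, because 2% of its turnover exceeds the fixed 10 million. For a company with 100 million euros in turnover, the fixed 10 million is the higher amount.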

The biggest penalty under GDPR to date is a fine of 50 million euros imposed on Google. The company did not clearly explain how it processed and used personal data for ad targeting.

British Airways’ website redirected users to a site controlled by the attackers, who stole the personal data of more than 500,000 customers. The result? Ongoing proceedings and a possible fine of 200 million euros.

Marriott exposed 339 million guest records. The possible fine? 110 million euros.

Both British Airways and Marriott operate in industries hit hardest by COVID-19. Hence, the UK Information Commissioner’s Office (ICO) has delayed its final decisions.

Should everyone in IT worry about security?


Presumably, you work with a data warehouse or a data lake. Often, it runs on servers in a strict security zone. In other words, you can’t simply open Google Search or Stack Overflow there, because there is no internet access. Similarly, external users can’t access the servers.

I have bad news for you:

The Equifax breach shows a multistep approach. Attackers can get into the network through seemingly unrelated systems.
A data warehouse or data lake stores the most important enterprise data. What is not obvious at first glance is that it connects to many other systems to get that data, which also makes it a central place to find the credentials for all of them.

You should be especially careful with systems storing sensitive customer data. Under the GDPR, sensitive data (the “special categories of personal data”) includes:

data revealing racial or ethnic origin
political opinions
religious or philosophical beliefs
trade union membership
genetic or biometric data
data concerning health
data concerning a natural person’s sex life or sexual orientation.
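Since this series is about metadata, one practical way to act on this list is to tag columns in a data catalog with the category they belong to and flag the special categories automatically. A minimal sketch, assuming a hypothetical catalog where each column carries a set of string tags; the tag names and the `Column` structure are invented for illustration, not taken from any particular catalog tool.

```python
from dataclasses import dataclass, field

# GDPR special categories, expressed as hypothetical catalog tag names.
SPECIAL_CATEGORIES = frozenset({
    "racial_or_ethnic_origin",
    "political_opinions",
    "religious_or_philosophical_beliefs",
    "trade_union_membership",
    "genetic_data",
    "biometric_data",
    "health_data",
    "sex_life_or_sexual_orientation",
})


@dataclass
class Column:
    """A column entry in a hypothetical metadata catalog."""
    table: str
    name: str
    tags: set[str] = field(default_factory=set)


def special_category_columns(columns: list[Column]) -> list[str]:
    """Return 'table.column' for every column tagged with a special category."""
    return [
        f"{c.table}.{c.name}"
        for c in columns
        if c.tags & SPECIAL_CATEGORIES  # any overlap means the column needs extra care
    ]
```

The flagged columns are the ones that deserve stricter access control, encryption and auditing. The approach is only as good as the tagging, so someone still has to classify new columns as they arrive.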

Cloud to the rescue!?


One of the most popular cloud storage services is Amazon Web Services (AWS) S3, a general-purpose store for data, files and movies. News stories about exposed S3 buckets appear regularly.

Noam Rotem and Ran Locar published one of the latest leak reports, with S3 in the leading role.

They identified a database containing highly sensitive files from several British consulting firms.

What did the white-hat hackers find?

Full names
Addresses
Phone numbers
Email addresses
Dates of birth
National Insurance numbers
Immigration and Visa statuses
Nationalities
Salary details
And more

It is just the tip of the iceberg.

In this case, the files were stored in an AWS S3 bucket. It is important to note that open, publicly viewable S3 buckets are not a flaw in AWS; they are usually the result of a misconfiguration by the bucket owner.


Remember:

Amazon provides detailed instructions to help users secure S3 buckets.
A customer applies the instructions to keep the data secure.
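Applying those instructions includes checking who your bucket policy lets in. A minimal sketch of one such owner-side check: scanning an S3 bucket policy (a JSON document) for `Allow` statements whose principal is everyone (`"*"`). It assumes the policy text has already been fetched, for example with boto3’s `get_bucket_policy`. It only inspects the `Principal` field and ignores conditions, ACLs and account-level settings, so treat it as a first filter rather than a complete audit.

```python
import json


def _is_public_principal(principal) -> bool:
    """True if a policy Principal grants access to anyone ("*")."""
    if principal == "*":
        return True
    if isinstance(principal, dict):
        aws = principal.get("AWS")
        values = aws if isinstance(aws, list) else [aws]
        return "*" in values
    return False


def public_statements(policy_json: str) -> list[dict]:
    """Return the Allow statements in a bucket policy whose principal is everyone."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a policy may hold a single statement object
        statements = [statements]
    return [
        s for s in statements
        if s.get("Effect") == "Allow" and _is_public_principal(s.get("Principal"))
    ]
```

If the check finds anything, the more robust fix is to switch on S3 Block Public Access for the bucket or the whole account (in boto3, `put_public_access_block` on the S3 client) instead of relying on every policy being written correctly.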

How are security responsibilities divided between you and a cloud provider?

Azure, AWS and GCP all follow what they call “the shared responsibility model”. I am going to use Microsoft’s version to explain it.

As you move to Azure, some responsibilities transfer to Microsoft. The areas of responsibility between you and Microsoft depend on the deployment type.


Regardless of the deployment type, the following responsibilities are always retained by you:

Data
Your devices
Access and account management
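The split can be written down as a lookup. A deliberately simplified sketch, assuming only two groups of layers: the three this post lists as always yours, and the physical layers (hosts, network, datacenter), which move to the cloud provider as soon as you leave on-premises. Microsoft’s full model has more layers in between (operating system, network controls, applications, identity infrastructure) whose ownership shifts with the deployment type; this sketch rejects them rather than guess.

```python
from enum import Enum


class Deployment(Enum):
    ON_PREM = "on-premises"
    IAAS = "IaaS"
    PAAS = "PaaS"
    SAAS = "SaaS"


# Always the customer's job, whatever the deployment type.
ALWAYS_CUSTOMER = {"data", "devices", "access and account management"}
# Physical layers transfer to the cloud provider in any cloud deployment.
PHYSICAL = {"physical hosts", "physical network", "physical datacenter"}


def owner(layer: str, deployment: Deployment) -> str:
    """Return who is responsible for a layer in this simplified model."""
    if layer in ALWAYS_CUSTOMER:
        return "customer"
    if layer in PHYSICAL:
        return "customer" if deployment is Deployment.ON_PREM else "cloud provider"
    raise ValueError(f"layer {layer!r} is not covered by this simplified model")
```

Note what the first rule means in practice: even with SaaS, where the provider runs almost everything, `owner("data", Deployment.SAAS)` is still `"customer"`.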

Help needed! Where are the security experts?

By now, you know more about possible cyber-attacks. You also know that cloud providers do not protect you from everything. Who else can help you avoid data breaches? The security department?

Unless you are building security solutions, security teams usually do not take part in development. Instead, they focus on:

Keeping production systems secure
Education and guidance
Solution reviews
Last-minute saves through live system monitoring

“Information Security is always coming up with a million reasons why anything we do will create a security hole that alien space-hackers will exploit to pillage our entire organization and steal all our code, intellectual property, credit card numbers, and pictures of our loved ones.” - The Phoenix Project by Gene Kim

Development (the builders) wants to deploy solutions into production. Security and operations see new releases and updates as potential enemies. They are the gatekeepers.


One of my favorite IT books is The Phoenix Project by Gene Kim. It tells a story of a fictional company and their struggles with an important IT project.

The lessons I learned from “The Phoenix Project”:

Teams own the product they develop
Integrate security into your daily work
Strive for trust between development, operations and security

The solution? Development teams need someone in a facilitator role between development, operations and security: someone who understands the new system, its potential threats, and its infrastructure and networking requirements.

You need a DevOps engineer in your team!

What’s next? My three recommendations

Cloud computing is the future. Cloud services slash development time and enable novel possibilities. At the same time, they expose us to new risks.

Cloud providers integrate advanced security mechanisms to keep you safe. Some of them work by default; others need extra effort. In fact, enabling data encryption, patching your servers or preventing DDoS attacks has never been easier.

Don’t be lazy: take care of your IT systems’ security

First, understand security threats and be able to mitigate them. Do not rely blindly on a cloud provider or the security department.

Every team should have at least one person who understands firewalls, encryption, networking, etc.


Second, Minimum Viable Products (MVPs) are rarely the best-designed pieces of software. MVPs are small in functionality, but they often run in production environments.


In another blog post, I shared a standard process to run a Big Data prototype.

Remember, running an MVP is not an excuse to overlook your security best practices!


Third, understand potential threats and make sure you configure:

Firewall
Encryption at rest
Encryption in transit
Authorization
Authentication
Password and key management
Patching and updates
Azure configuration
Networking
Cross-site scripting
Deployment
Hundreds of other nitty-gritty details

Hopefully you don’t forget anything. That would be expensive… (see the Equifax story above).

To make sure I don’t forget tiny configuration details, I always follow my security checklist.
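A checklist like this can also live next to the code, for example as a gate in a release review. A minimal sketch, assuming a hypothetical process where each configuration area from the list above is marked as done for a given release; the function names are invented for illustration, and the “hundreds of other nitty-gritty details” would still need their own entries.

```python
# The configuration areas listed above, used as a hypothetical release checklist.
CHECKLIST = (
    "Firewall",
    "Encryption at rest",
    "Encryption in transit",
    "Authorization",
    "Authentication",
    "Password and key management",
    "Patching and updates",
    "Azure configuration",
    "Networking",
    "Cross-site scripting",
    "Deployment",
)


def missing_items(done: set[str]) -> list[str]:
    """Return the checklist items not yet marked as done, in checklist order."""
    return [item for item in CHECKLIST if item not in done]


def ready_for_production(done: set[str]) -> bool:
    """A release passes only when nothing on the checklist is missing."""
    return not missing_items(done)
```

Keeping the list in a fixed order means the report of missing items reads the same way every time, which makes it easier to review in a pull request or a pipeline log.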

Finally, who gets fired after a breach?

One question raised at the beginning of this post remains unanswered: who gets fired after a breach?

In 2017, McAfee, an American global computer security software company, conducted a survey among IT security leaders. They asked the same question.


What is obvious is that whenever “sh*t hits the fan”, it affects more than business and technology leaders. Surprise, surprise! Engineers are responsible for their implementations too.