Here Is a Way to Prevent a Data Breach


Between 2015 and 2018, I led a data engineering team at a financial services company. We were the first team in the company to use Azure, where we built a data science environment.

Leading the first cloud implementation project put us under the microscope. We spent months discussing and configuring security, networking and governance.

Who gets fired in case of a data breach?

“Valdas, who gets fired in case of a data breach?” my lead engineer asked me out of the blue.

“Has anything happened?!” Some words raise your cortisol (stress hormone) level and heart rate, and “data breach” is one of them.

“No, I am just curious. We build the data pipelines. We configure the network and the firewall. There is no one else with Azure experience to review it.”

“Well… there is a security department… but we are the ones building everything,” I mumbled.

It was obvious to both of us that we would be the first to be questioned in case of a data leak. Leading an autonomous data team with a mandate to choose the technology was no longer fun. It got me thinking:

  • Are we protected from all possible attacks?
  • How do we make sure we do not make stupid mistakes?
  • What is the security department’s responsibility?
  • How are security responsibilities divided between us and the cloud provider?
  • How do we bridge the security knowledge gap in the team?

Photo by Adi Goldstein on Unsplash

It’s good to learn from your mistakes. It’s better to learn from other people’s mistakes. (Warren Buffett)

In 2017, Equifax, an American credit reporting agency, announced a data breach that exposed the personal information of 147 million people.

What did the hackers find?

  • First and last names
  • Social Security numbers
  • Birth dates
  • Addresses
  • Driver’s license numbers

The hackers looked for exposed assets. A public-facing web server without the latest patch was the perfect victim: the attackers got into internal Equifax servers by exploiting a known Apache Struts vulnerability (CVE-2017-5638).

An unpatched vulnerability is one of the methods attackers use to get into internal networks. Security specialists call such a method an attack vector.

Table 1: Equifax attack surface matrix, step 1
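Checking patch levels is easy to automate. A minimal sketch in Python, assuming purely numeric dotted version strings and a hypothetical inventory of deployed applications; the affected ranges are my reading of Apache Struts security bulletin S2-045 (CVE-2017-5638), so verify them against the bulletin before relying on them.

```python
# Affected ranges for Apache Struts bulletin S2-045 (CVE-2017-5638), as
# (first affected, first fixed) pairs per release line. Verify before use.
S2_045_RANGES = [
    ("2.3.5", "2.3.32"),
    ("2.5", "2.5.10.1"),
]


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '2.5.10.1' into (2, 5, 10, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def is_vulnerable(version: str, ranges=S2_045_RANGES) -> bool:
    """True if the version falls inside any [first affected, first fixed) range."""
    v = parse_version(version)
    return any(parse_version(first) <= v < parse_version(fixed) for first, fixed in ranges)


# Hypothetical inventory of deployed Struts versions, e.g. collected from build manifests.
inventory = {"portal-a": "2.3.31", "portal-b": "2.5.10.1"}
unpatched = sorted(name for name, ver in inventory.items() if is_vulnerable(ver))
print(unpatched)  # ['portal-a']
```

Tuple comparison treats `(2, 5, 10)` as smaller than `(2, 5, 10, 1)`, which is what the range check needs. Version strings with suffixes such as `-SNAPSHOT` would need a proper version parser.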

Access to the internal network does not yet mean access to data. The next attack vector used against Equifax was compromised employee credentials: finding a server with usernames and passwords was a breeze.

Table 2: Equifax attack surface matrix, steps 2 and 3
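The defensive side of this step is to look for plaintext credentials on your own servers before an attacker does. A minimal sketch in Python; the regular expressions and the list of config file suffixes are illustrative assumptions, far from complete, and a dedicated secret scanner is the better choice in production.

```python
import re
from pathlib import Path

# Illustrative patterns only; real secret scanners ship much larger, tuned rule sets.
CREDENTIAL_PATTERNS = [
    # key=value or key: value lines whose key looks like a password or secret
    re.compile(r"(?i)\b(password|passwd|pwd|secret|api[_-]?key)\b\s*[:=]\s*\S+"),
    # the shape of an AWS access key ID
    re.compile(r"AKIA[0-9A-Z]{16}"),
]

CONFIG_SUFFIXES = (".conf", ".cfg", ".ini", ".properties", ".yaml", ".yml")


def find_plaintext_credentials(text: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs that look like hardcoded credentials."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in CREDENTIAL_PATTERNS):
            hits.append((lineno, line.strip()))
    return hits


def scan_directory(root: str, suffixes=CONFIG_SUFFIXES):
    """Yield (path, line number, line) for suspicious lines in config-like files under root."""
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(errors="ignore")
            for lineno, line in find_plaintext_credentials(text):
                yield str(path), lineno, line


sample = "host=db01\ndb.password=hunter2\nuser=reporting\n"
print(find_plaintext_credentials(sample))  # [(2, 'db.password=hunter2')]
```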

Misfortunes come in pairs - an old Polish proverb

In fact, the attack was a combination of techniques targeting specific devices and applications. The set of all possible attack points is called the attack surface, and a matrix is one way to represent it.

Access to the internal network and weak credentials opened up Equifax’s databases. Posing as authorized users, the attackers took the following steps:

  • Performed 9000 scans of the databases
  • Extracted information into small temporary archives
  • Downloaded data from the Equifax servers
  • Removed the temporary archives once completed

Table 3: Equifax attack surface matrix, steps 4 and 5
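Every one of these steps ran under legitimate credentials, so it would not trip a simple failed-login alarm. Unusual query volume is one signal that live monitoring can watch for. A minimal sketch in Python of a sliding-window count of queries per account; the event format, window, threshold and account names are hypothetical, and real tooling would baseline each account rather than use one fixed limit.

```python
from collections import defaultdict, deque


def flag_heavy_queriers(events, window_seconds: int, max_queries: int) -> set[str]:
    """Return accounts that ran more than max_queries within any window_seconds span.

    events: iterable of (account, unix_timestamp) pairs, e.g. parsed from database audit logs.
    Assumes window_seconds > 0.
    """
    recent = defaultdict(deque)  # account -> timestamps inside its current window
    flagged = set()
    for account, ts in sorted(events, key=lambda event: event[1]):
        window = recent[account]
        window.append(ts)
        # Drop timestamps that fell out of the sliding window (ts - window_seconds, ts].
        while window[0] <= ts - window_seconds:
            window.popleft()
        if len(window) > max_queries:
            flagged.add(account)
    return flagged


# Hypothetical audit log: an analyst with two queries, a service account with 100 in 100 seconds.
events = [("analyst", 0), ("analyst", 600)] + [("svc_reporting", t) for t in range(100)]
print(flag_heavy_queriers(events, window_seconds=60, max_queries=50))  # {'svc_reporting'}
```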

Unpatched servers, weak passwords and a loosely controlled network led to the loss of protected data. In other words, they caused a data breach.

At Equifax, the attackers exploited five attack vectors.

“I hear you, man! I am going to fix these five loopholes, and my servers will be bulletproof!” I can hear someone shouting.

Unfortunately, the list of all possible attack vectors is much longer, and hackers keep discovering new issues. Also, each company has a unique technology landscape: its own combination of hardware and software, just like the combination of your wallpaper and desktop icons is unique to you.

Table 4: Attack surface matrix example

The expanded table above includes more attack vectors. How does it compare to your IT landscape?
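Keeping the matrix as data rather than as a slide makes it easier to review and update. A minimal sketch in Python; the class and the asset and vector names are hypothetical and only loosely follow the Equifax steps, they are not the contents of the tables above.

```python
from dataclasses import dataclass, field


@dataclass
class AttackSurfaceMatrix:
    """Rows are assets, columns are attack vectors; a marked cell means 'exposed'."""

    assets: list[str]
    vectors: list[str]
    exposed: set[tuple[str, str]] = field(default_factory=set)

    def mark(self, asset: str, vector: str) -> None:
        """Record that an asset is exposed to an attack vector."""
        if asset not in self.assets or vector not in self.vectors:
            raise ValueError(f"unknown cell: {asset!r} x {vector!r}")
        self.exposed.add((asset, vector))

    def vectors_for(self, asset: str) -> list[str]:
        """Attack vectors an asset is exposed to, in column order."""
        return [v for v in self.vectors if (asset, v) in self.exposed]

    def unmarked_assets(self) -> list[str]:
        """Assets with no marked cell: either safe or not yet reviewed."""
        return [a for a in self.assets if not self.vectors_for(a)]


# Hypothetical example loosely following the Equifax steps.
matrix = AttackSurfaceMatrix(
    assets=["Public web server", "Internal file server", "Customer database", "HR laptop"],
    vectors=["Unpatched software", "Plaintext credentials", "Unrestricted network access"],
)
matrix.mark("Public web server", "Unpatched software")
matrix.mark("Internal file server", "Plaintext credentials")
matrix.mark("Customer database", "Unrestricted network access")
print(matrix.vectors_for("Public web server"))  # ['Unpatched software']
print(matrix.unmarked_assets())  # ['HR laptop']
```

An asset with no marked cell is worth a second look: it may be safe, or nobody has reviewed it yet.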

Saying “I am sorry” is not enough

In fact, there are fines and settlements, depending on the impact of the breach and the kind of data leaked.

Equifax agreed to pay up to $700 million as part of a settlement with US federal and state authorities over the breach.

That is an expensive mistake.

At the time, it was the largest data breach settlement in US history.

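In Europe, there is the General Data Protection Regulation (GDPR).

GDPR sets forth fines of up to 10 million euros or, in the case of an undertaking, up to 2% of its total worldwide annual turnover, whichever is higher. For the most serious infringements, the limits rise to 20 million euros or 4%.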
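For a large company, the percentage usually matters more than the fixed amount, because Article 83 of the GDPR caps a fine at whichever is higher. A minimal worked sketch in Python of that cap (the function name and the turnover figures are made up for illustration):

```python
def gdpr_fine_cap(annual_turnover_eur: int, severe: bool = False) -> float:
    """Maximum administrative fine for an undertaking under GDPR Article 83.

    Article 83(4): EUR 10 million or 2% of total worldwide annual turnover.
    Article 83(5) (severe=True): EUR 20 million or 4%.
    In both cases the higher of the two amounts is the cap.
    """
    fixed_cap, percent = (20_000_000, 4) if severe else (10_000_000, 2)
    return max(float(fixed_cap), annual_turnover_eur * percent / 100)


print(gdpr_fine_cap(100_000_000))    # 10000000.0  (2% would be only 2 million)
print(gdpr_fine_cap(5_000_000_000))  # 100000000.0 (2% beats the fixed 10 million)
```

The turnover is that of the preceding financial year. These are upper limits: the fine actually imposed depends on factors such as the nature, gravity and duration of the infringement.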

At the time of writing, the biggest penalty under GDPR is a fine of 50 million euros imposed on Google by the French regulator CNIL. The company did not clearly explain how it processed and used personal data for ad targeting.

British Airways’ website diverted users’ traffic to a hacker website, which let the attackers steal the personal data of more than 500,000 customers. The result? Ongoing proceedings and a possible fine of 200 million euros.

Marriott exposed 339 million guest records. The fine? 110 million euros.

Both British Airways and Marriott operate in industries hit hardest by COVID-19. Hence, the UK Information Commissioner’s Office (ICO), which is handling both cases, has delayed its final decisions.

Should everyone in IT worry about security?


Is the data warehouse secure? (Photo by National Cancer Institute on Unsplash)

Presumably, you work with a data warehouse or a data lake. Often, it runs on servers in a strict security zone. In other words, you can’t simply open up Google search or Stack Overflow there, because there is no internet access. Similarly, external users can’t access the server.

I have bad news for you, though:

  1. The Equifax breach shows a multistep approach. The attackers might get into the network through seemingly unrelated systems.
  2. A data warehouse or data lake stores the most important enterprise data. What is not obvious at first glance is that it connects to other systems to get that data, which also makes it a central place to find the credentials for all of them.

You should be especially careful with systems that store sensitive customer data. Under the GDPR, sensitive data (“special categories of personal data”) includes:

  • racial or ethnic origin
  • political opinions
  • religious or philosophical beliefs
  • trade union membership
  • genetic or biometric data
  • data concerning health
  • data concerning a natural person’s sex life or sexual orientation.

Cloud to the rescue!?

Cloud is the best (Photo by nappy from Pexels)

One of the most popular cloud storage services is Amazon Web Services (AWS) S3, a general-purpose store for data, files and movies. News stories about exposed AWS S3 buckets appear regularly.

Noam Rotem and Ran Locar published one of the latest leak reports, with S3 as the main character.

They identified a database containing highly sensitive files from several British consulting firms.

What did the white hat hackers find?

  • Full names
  • Addresses
  • Phone numbers
  • Email addresses
  • Dates of birth
  • National Insurance numbers
  • Immigration and Visa statuses
  • Nationalities
  • Salary details
  • And more

It is just the tip of the iceberg.

In this case, the files were stored in an AWS S3 bucket. It is important to note that open, publicly viewable S3 buckets are not a flaw of AWS. They are usually the result of an error by the bucket’s owner.

AWS S3 security


  • Amazon provides detailed instructions to help users secure S3 buckets.
  • A customer applies the instructions to keep the data secure.
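Applying those instructions can be scripted, so the secure setting does not depend on someone remembering a console checkbox. A minimal sketch in Python with boto3, assuming boto3 is installed and AWS credentials are configured; it switches on all four S3 Block Public Access settings and default server-side encryption with S3-managed keys (SSE-S3) for one bucket. The function names are my own.

```python
def public_access_block_config() -> dict:
    """All four S3 Block Public Access settings switched on."""
    return {
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    }


def default_encryption_config() -> dict:
    """Default server-side encryption with S3-managed keys (SSE-S3)."""
    return {"Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]}


def lock_down_bucket(s3_client, bucket: str) -> None:
    """Block public access and enforce default encryption on one bucket.

    s3_client is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    s3_client.put_public_access_block(
        Bucket=bucket,
        PublicAccessBlockConfiguration=public_access_block_config(),
    )
    s3_client.put_bucket_encryption(
        Bucket=bucket,
        ServerSideEncryptionConfiguration=default_encryption_config(),
    )
```

Call it as `lock_down_bucket(boto3.client("s3"), "my-example-bucket")`, where the bucket name is a placeholder. Passing the client in keeps the helper easy to test with a fake client. AWS has since made Block Public Access and SSE-S3 the default for new buckets, but an explicit script still documents the intent and covers older buckets.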

How are security responsibilities divided between you and a cloud provider?

Azure, AWS and GCP each have what they call a “shared responsibility model”. I am going to use Microsoft’s version to explain it.

As you move to Azure, some responsibilities transfer to Microsoft. The areas of responsibility between you and Microsoft depend on the deployment type.

Azure Shared responsibility model

Regardless of the deployment type, the following responsibilities are always retained by you:

  • Data
  • Your devices
  • Access and account management
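The model is essentially a lookup table. A minimal sketch in Python encoding my reading of Microsoft’s published shared responsibility table; the area names are paraphrased and the assignments should be checked against the current Microsoft documentation before you rely on them.

```python
# Who is responsible for each area, per deployment type (my reading of Microsoft's table).
C, M, S = "Customer", "Microsoft", "Shared"
DEPLOYMENTS = ("On-prem", "IaaS", "PaaS", "SaaS")

RESPONSIBILITY = {
    "Information and data": (C, C, C, C),
    "Devices (mobile and PCs)": (C, C, C, C),
    "Accounts and identities": (C, C, C, C),
    "Identity and directory infrastructure": (C, C, S, S),
    "Applications": (C, C, S, M),
    "Network controls": (C, C, S, M),
    "Operating system": (C, C, M, M),
    "Physical hosts": (C, M, M, M),
    "Physical network": (C, M, M, M),
    "Physical datacenter": (C, M, M, M),
}


def owner(area: str, deployment: str) -> str:
    """Who owns an area for a given deployment type."""
    return RESPONSIBILITY[area][DEPLOYMENTS.index(deployment)]


def always_yours() -> list[str]:
    """Areas that stay with the customer whatever the deployment type."""
    return [area for area, owners in RESPONSIBILITY.items() if all(o == C for o in owners)]


print(owner("Operating system", "PaaS"))  # Microsoft
print(always_yours())
```

`always_yours()` returns the same three areas as the list above: data, devices, and accounts and identities.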

Help needed! Where are the security experts?

“Security is not my responsibility” (Photo by John Amachaab on Unsplash)

By now, you know more about possible cyber-attacks, and you know that cloud providers do not protect you from everything. Who else can help you avoid data breaches? The security department?

Unless you build security solutions, the security teams do not participate in development. Instead, they focus on:

  1. Keeping production systems secure
  2. Education and guidance
  3. Solution reviews
  4. Last-minute saves through live system monitoring

“Information Security is always coming up with a million reasons why anything we do will create a security hole that alien space-hackers will exploit to pillage our entire organization and steal all our code, intellectual property, credit card numbers, and pictures of our loved ones.” - The Phoenix Project by Gene Kim

Development (the builders) wants to deploy solutions into production. Security and operations see new releases and updates as potential enemies: they are the gatekeepers.

Engineers vs. security

One of my favorite IT books is The Phoenix Project by Gene Kim. It tells a story of a fictional company and their struggles with an important IT project.

The lessons I learned from “The Phoenix Project”:

  • Teams own the product they develop
  • Integrate security into your daily work
  • Strive for trust between development, operations and security

Solution? Development teams need a facilitator role between development, operations and security. Someone who understands the new system, potential threats, infrastructure & networking requirements.

You need a DevOps engineer in your team!

What’s next? My three recommendations

Cloud computing is the future. Cloud services slash development time and enable new possibilities, but at the same time they expose you to new risks.

The cloud providers integrate advanced security mechanisms to keep you safe. Some of them work by default; others need extra effort. In fact, enabling data encryption, patching your servers or preventing DDoS attacks has never been easier.

Don’t be lazy, and take care of your IT systems security

First, understand security threats and be able to mitigate them. Do not rely blindly on a cloud provider or the security department.

Every team should have at least one person who understands firewalls, encryption, networking, etc.

Ask not what others can do to keep you secure, ask what you can do to prevent data breaches

Second, Minimum Viable Products (MVPs) are not the best-designed pieces of software. MVPs have little functionality, but they often run in production environments.

Temporary my ass

In another blog post, I shared a standard process to run a Big Data prototype.

Remember, running an MVP is not an excuse to overlook your security best practices!

Keep your Minimum Viable Products (MVPs) on a short leash (Photo by Robert Eklund on Unsplash)

Third, understand potential threats and make sure you configure:

  • Firewall
  • Encryption at rest
  • Encryption in transit
  • Authorization
  • Authentication
  • Password and key management
  • Patching and updates
  • Azure configuration
  • Networking
  • Cross-site scripting
  • Deployment
  • Hundreds of other nitty-gritty details

Hopefully you don’t forget anything. That would be expensive (see the Equifax story above).
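A checklist only helps if nothing silently drops off it. A minimal sketch in Python, assuming you record a yes/no answer per item: anything unchecked or never answered counts as not done, so the same function could gate a release in a CI pipeline. The item names come from the list above; the function name and the example status are made up.

```python
SECURITY_CHECKLIST = [
    "Firewall",
    "Encryption at rest",
    "Encryption in transit",
    "Authorization",
    "Authentication",
    "Password and key management",
    "Patching and updates",
    "Azure configuration",
    "Networking",
    "Cross-site scripting",
    "Deployment",
]


def missing_items(status: dict[str, bool], checklist: list[str] = SECURITY_CHECKLIST) -> list[str]:
    """Items that are unchecked or were never reviewed at all, in checklist order."""
    return [item for item in checklist if not status.get(item, False)]


# Made-up review: everything ticked except encryption in transit, and XSS never reviewed.
status = {item: True for item in SECURITY_CHECKLIST}
status["Encryption in transit"] = False
del status["Cross-site scripting"]
print(missing_items(status))  # ['Encryption in transit', 'Cross-site scripting']
```

Treating a missing answer the same as a "no" is the important design choice: a forgotten item then shows up instead of passing silently.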

To make sure I don’t forget tiny configuration details, I always follow my security checklist:

Get your free copy of Data Professional Security Checklist

Finally, who gets fired after a breach?

One question raised at the beginning of this post is still unanswered: who gets fired after a breach?

In 2017, McAfee, an American global computer security software company, surveyed IT security leaders and asked the same question:

Who gets fired after a data breach? IT security leaders survey by McAfee, 2017

One thing is obvious: whenever “sh*t hits the fan”, it affects more than business and technology leaders. Surprise, surprise! Engineers are responsible for their implementations too.

Download my security checklist and don’t risk your reputation with crappy implementation.


I'm Valdas Maksimavicius. I write about data, cloud technologies and personal development. You can find more about me here.