This post is part of Data Governance From an Engineering Perspective, a series of posts about Data Governance and Metadata.
Between 2015 and 2018, I led a data engineering team at a financial services company. We were the first team in the company to use Azure, and we built a data science environment on it.
Leading the first cloud implementation project put us under the microscope. We spent months discussing and configuring security, networking and governance.
“Valdas, who gets fired in case of a data breach?” my lead engineer asked me out of the blue.
“Has anything happened?!” Some words raise your cortisol (stress hormone) level and heart rate, and “data breach” is one of them.
“No, I am just curious. We build the data pipelines. We configure the network and the firewall. There is no one else with Azure experience to review it.”
“Well… there is a security department… but we are the ones building everything,” I mumbled.
It was obvious to both of us that we would be the first to be interrogated in case of a data leak. Leading an autonomous data team with a mandate to choose technology was no longer fun. It got me thinking.
In 2017, Equifax, an American credit reporting agency, announced a data breach that exposed the personal information of 147 million people.
What did the hackers find?
The hackers looked for exposed assets. A public-facing web server without the latest patch was the perfect victim. The attackers reached internal Equifax servers by exploiting a known Apache Struts vulnerability.
See, an unpatched vulnerability is one of the methods attackers use to get into internal networks. Security specialists call such a method an attack vector.
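To make “unpatched” concrete: the Apache Struts flaw used against Equifax is tracked as CVE-2017-5638, which NVD describes as affecting Struts 2.3.x before 2.3.32 and 2.5.x before 2.5.10.1. The sketch below checks a version string against those ranges. It assumes plain numeric versions (no suffixes such as `-SNAPSHOT`), and the function names are illustrative; in practice a vulnerability scanner and a patching routine do this job, not a hand-written check.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn "2.3.31" into (2, 3, 31) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def vulnerable_to_cve_2017_5638(version: str) -> bool:
    """True if a Struts version falls in the ranges NVD lists for CVE-2017-5638."""
    v = parse_version(version)
    # 2.3.x before 2.3.32
    if (2, 3) <= v < (2, 3, 32):
        return True
    # 2.5.x before 2.5.10.1
    if (2, 5) <= v < (2, 5, 10, 1):
        return True
    return False


for installed in ["2.3.31", "2.3.32", "2.5.10", "2.5.10.1"]:
    status = "PATCH NOW" if vulnerable_to_cve_2017_5638(installed) else "ok"
    print(installed, status)
```

Tuples compare element by element, so (2, 5, 10) sorts before (2, 5, 10, 1), which is exactly what the 2.5.10.1 boundary needs.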
Table 1 Equifax attack surface matrix - step 1
Access to the internal network does not yet mean access to data. The next attack vector used against Equifax was compromised employee credentials. Finding a server with usernames and passwords was a breeze.
Table 2 Equifax attack surface matrix - step 2 & 3
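Credentials lying around in plain text are easy to find, for attackers and defenders alike. Below is a minimal sketch of a scan a team could run over its own configuration files before deploying. The key names in the pattern are a hypothetical starter list, and the sample config is made up; dedicated secret scanners cover far more cases.

```python
import re
from typing import List, Tuple

# Hypothetical starter list of key names that often precede a plaintext secret,
# followed by "=" or ":" and a value. Extend it for your own code base.
SECRET_PATTERN = re.compile(
    r"(?i)(password|passwd|pwd|secret|api[_-]?key)\s*[:=]\s*\S+"
)


def find_plaintext_secrets(text: str) -> List[Tuple[int, str]]:
    """Return (line number, matched key name) for lines that look like stored secrets."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = SECRET_PATTERN.search(line)
        if match:
            findings.append((lineno, match.group(1).lower()))
    return findings


config = """db_host = 10.0.0.5
db_user = admin
db_password = Winter2017
api_key: abc123
"""
print(find_plaintext_secrets(config))
```

A match means the secret belongs in a secret store (for example, a key vault), not in the file.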
In fact, the attack was a chain of steps targeting specific devices and applications. The set of all possible attack points is called the attack surface, and a matrix like this one is one way to represent it.
Access to the internal network and weak credentials opened up Equifax’s databases. Posing as authorized users, the attackers took the following steps:
Unpatched servers, weak passwords and loose network controls led to the loss of protected data. In other words, they caused a data breach.
At Equifax, the attackers exploited 5 attack vectors.
“I hear you, man! I am going to fix these 5 loopholes, and my servers will be bulletproof!” I hear someone shouting.
Unfortunately, the list of possible attack vectors is much longer, and hackers keep discovering new issues. Also, each company has a unique technology landscape: its own combination of hardware and software, just as the combination of your wallpaper and desktop icons is unique to you.
Table 4 Attack surface matrix example
The expanded table above includes more attack vectors. How does it compare to your IT landscape?
Data breaches also bring fines and settlements, depending on the impact of the breach and the data that leaked.
Equifax has to pay up to $700 million in fines as part of a settlement with federal authorities over a data breach.
See, it is an expensive mistake.
To date, it is the biggest data breach settlement involving the Federal Trade Commission (USA).
In Europe, there is the General Data Protection Regulation (GDPR).
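GDPR sets forth fines of up to 10 million euros or, in the case of an undertaking, up to 2% of its total worldwide annual turnover, whichever is higher. For more serious infringements, the limits rise to 20 million euros or 4%.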
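The fine limit itself is simple arithmetic. Under Article 83 of the GDPR, the cap for an undertaking is a fixed amount or a percentage of its total worldwide annual turnover of the preceding financial year, whichever is higher: 10 million euros or 2% for the lower tier, 20 million euros or 4% for the more serious infringements. A small sketch of that calculation follows. The function name is illustrative, and the result is only the upper limit, not the fine a regulator would actually impose.

```python
def gdpr_max_fine_eur(annual_turnover_eur: int, higher_tier: bool = False) -> int:
    """Upper limit of a GDPR fine for an undertaking (Article 83(4) and 83(5)).

    The cap is a fixed amount or a share of total worldwide annual turnover
    of the preceding financial year, whichever is higher.
    """
    fixed_cap, percent = (20_000_000, 4) if higher_tier else (10_000_000, 2)
    turnover_cap = annual_turnover_eur * percent // 100  # whole euros
    return max(fixed_cap, turnover_cap)


print(gdpr_max_fine_eur(100_000_000))                      # fixed cap wins
print(gdpr_max_fine_eur(1_000_000_000))                    # 2% of turnover wins
print(gdpr_max_fine_eur(1_000_000_000, higher_tier=True))  # 4% of turnover
```

For a company with 1 billion euros in turnover, the lower-tier cap is 20 million euros, because 2% of the turnover exceeds the fixed 10 million.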
The biggest penalty under GDPR to date is a fine of 50 million euros imposed on Google. The company did not clearly explain how it processed and used personal data for ad targeting.
British Airways’ website diverted users’ traffic to a fraudulent site, and the hackers stole the personal data of more than 500,000 customers. The result? Ongoing proceedings and a possible fine of 200 million euros.
Marriott exposed 339 million guest records. The proposed fine? 110 million euros.
Both British Airways and Marriott operate in industries hit hardest by COVID-19. Hence, the regulator has delayed its final decision.
Presumably, you work with a data warehouse or a data lake. Often, it runs on servers in a strict security zone. In other words, you can’t simply open Google Search or Stack Overflow there, because there is no internet access. Similarly, external users can’t reach the servers.
I have bad news for you too:
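You should be especially careful with systems that store sensitive customer data. Under the GDPR, sensitive data (“special categories of personal data”) is data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs or trade union membership, as well as genetic data, biometric data used to identify a person, data concerning health, and data concerning a person’s sex life or sexual orientation.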
One of the most popular cloud storage services is Amazon Web Services (AWS) S3, a general-purpose object store for data, files and videos. News stories about exposed AWS S3 buckets appear regularly.
Noam Rotem and Ran Locar published one of the latest leak reports, with S3 in the starring role.
They identified a database containing highly sensitive files from several British consulting firms.
What did the white hat hackers find?
It is just the tip of the iceberg.
In this case, the files were stored in an AWS S3 bucket. It is important to note that open, publicly viewable S3 buckets are not a flaw of AWS; they are usually the result of a misconfiguration by the bucket owner.
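Because the fix sits with the bucket owner, a quick configuration check pays off. S3 offers a Public Access Block with four settings: `BlockPublicAcls`, `IgnorePublicAcls`, `BlockPublicPolicy` and `RestrictPublicBuckets`. The sketch below assumes you have already fetched that configuration as a dictionary (for example, from the `PublicAccessBlockConfiguration` key of the boto3 `get_public_access_block` response) and reports which settings are switched off. The function name and the sample configuration are hypothetical.

```python
from typing import Dict, List

# The four settings of an S3 Public Access Block configuration.
PUBLIC_ACCESS_BLOCK_FLAGS = (
    "BlockPublicAcls",
    "IgnorePublicAcls",
    "BlockPublicPolicy",
    "RestrictPublicBuckets",
)


def disabled_public_access_blocks(config: Dict[str, bool]) -> List[str]:
    """Return the Public Access Block settings that are off or missing."""
    return [flag for flag in PUBLIC_ACCESS_BLOCK_FLAGS if not config.get(flag, False)]


# Example configuration, shaped like boto3's PublicAccessBlockConfiguration.
bucket_config = {
    "BlockPublicAcls": True,
    "IgnorePublicAcls": True,
    "BlockPublicPolicy": False,
}
print(disabled_public_access_blocks(bucket_config))
```

With boto3, `boto3.client("s3").get_public_access_block(Bucket=name)` returns this configuration; when no block is configured at all, the call raises an error, which should itself count as a finding.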
Azure, AWS and GCP all follow what they call “the shared responsibility model”. I am going to use Microsoft’s version to explain it.
As you move to Azure, some responsibilities transfer to Microsoft. The areas of responsibility between you and Microsoft depend on the deployment type.
Regardless of the deployment type, the following responsibilities are always retained by you: your data, endpoints, accounts and access management.
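The split can be written down as a lookup. The sketch below is a deliberately simplified, hypothetical encoding of Microsoft’s model that covers only a few rows: data, endpoints, accounts and access management stay with the customer in every deployment type; physical hosts, network and datacenter move to the provider in the cloud; the operating system is yours in IaaS and the provider’s in PaaS and SaaS. Every other row returns “varies”, so check the provider’s documentation for those.

```python
from enum import Enum


class Deployment(Enum):
    ON_PREMISES = "on-premises"
    IAAS = "IaaS"
    PAAS = "PaaS"
    SAAS = "SaaS"


# Simplified rows only; see Microsoft's documentation for the full model.
ALWAYS_CUSTOMER = {"data", "endpoints", "accounts", "access management"}
PHYSICAL = {"physical hosts", "physical network", "physical datacenter"}


def responsible_party(responsibility: str, deployment: Deployment) -> str:
    """Return "customer", "provider" or "varies" for one row of the model."""
    if deployment is Deployment.ON_PREMISES or responsibility in ALWAYS_CUSTOMER:
        return "customer"  # on-premises you own everything; some rows you always own
    if responsibility in PHYSICAL:
        return "provider"  # in any cloud deployment the provider runs the hardware
    if responsibility == "operating system":
        return "customer" if deployment is Deployment.IAAS else "provider"
    return "varies"  # shared or service-specific rows are not modeled here


for deployment in Deployment:
    print(deployment.value, responsible_party("operating system", deployment))
```

Note that no cloud deployment type ever moves “data” off the customer’s side.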
By now, you know more about possible cyber-attacks. Also, the cloud providers do not protect you from everything. Who else can help you to avoid data breaches? The security department?
Unless you build security solutions, the security teams do not participate in development. Instead, they focus on:
“Information Security is always coming up with a million reasons why anything we do will create a security hole that alien space-hackers will exploit to pillage our entire organization and steal all our code, intellectual property, credit card numbers, and pictures of our loved ones.” - The Phoenix Project by Gene Kim
Development (the builders) wants to deploy solutions to production. Security and operations see new releases and updates as potential enemies. They are gatekeepers.
One of my favorite IT books is The Phoenix Project by Gene Kim. It tells the story of a fictional company and its struggles with an important IT project.
The lessons I learned from “The Phoenix Project”:
The solution? Development teams need someone in a facilitator role between development, operations and security: someone who understands the new system, its potential threats, and its infrastructure and networking requirements.
You need a DevOps engineer in your team!
Cloud computing is the future. Cloud services slash development time and enable novel possibilities, but at the same time they expose you to new risks.
Cloud providers integrate advanced security mechanisms to keep you safe. Some of them work by default; others need extra effort. In fact, enabling data encryption, patching your servers or mitigating DDoS attacks has never been easier.
Don’t be lazy, take care of your IT systems’ security
First, understand security threats and be able to mitigate them. Do not rely blindly on a cloud provider or the security department.
Every team should have at least one person who understands firewalls, encryption, networking, etc.
Second, Minimum Viable Products (MVPs) are not the best-designed pieces of software. MVPs are small and limited in functionality, but they often run in production environments.
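As an example of the firewall knowledge meant here, the sketch below flags inbound rules that expose sensitive ports to addresses outside the RFC 1918 private ranges, such as `0.0.0.0/0` (the whole internet). The rule format and the port list are hypothetical; in Azure, the equivalent data would come from your network security group rules. It handles IPv4 only.

```python
import ipaddress
from dataclasses import dataclass
from typing import List

# RFC 1918 private IPv4 ranges.
PRIVATE_RANGES = [
    ipaddress.IPv4Network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]

# Hypothetical list: SSH, RDP and common database ports.
SENSITIVE_PORTS = {22, 1433, 3306, 3389, 5432}


@dataclass
class InboundRule:
    name: str
    source_cidr: str  # e.g. "10.1.0.0/16" or "0.0.0.0/0" (anywhere)
    port: int


def is_internal(cidr: str) -> bool:
    """True if the whole source range lies inside a private RFC 1918 range."""
    network = ipaddress.IPv4Network(cidr)
    return any(network.subnet_of(private) for private in PRIVATE_RANGES)


def risky_rules(rules: List[InboundRule]) -> List[str]:
    """Names of rules that open a sensitive port to non-private addresses."""
    return [
        rule.name
        for rule in rules
        if rule.port in SENSITIVE_PORTS and not is_internal(rule.source_cidr)
    ]


rules = [
    InboundRule("ssh-from-anywhere", "0.0.0.0/0", 22),
    InboundRule("sql-from-vnet", "10.1.0.0/16", 1433),
    InboundRule("https-public", "0.0.0.0/0", 443),
]
print(risky_rules(rules))
```

The check uses `subnet_of` against explicit private ranges rather than `is_private`, because a network as wide as `0.0.0.0/0` can be misreported by the latter.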
In another blog post, I shared a standard process to run a Big Data prototype.
Remember, running an MVP is not an excuse to overlook your security best practices!
Third, understand potential threats and make sure you configure:
Hopefully, you don’t forget anything. That would be expensive… (see the Equifax story above).
To ensure I don’t forget about tiny configuration details, I always follow my security checklist:
One question raised at the beginning of this post is still unanswered: who gets fired after a breach?
In 2017, McAfee, an American global computer security software company, surveyed IT security leaders. They asked the same question:
What is obvious is that whenever “sh*t hits the fan”, it affects not only business and technology leaders. Surprise, surprise! Engineers are responsible for their implementations too.
Hi! I am Valdas Maksimavičius. I specialize in data analytics and cloud computing and have ten years of experience. I have been using Azure Cloud components since 2014.
For the last five years, I have been leading Data Engineering teams using the latest Azure Data and AI services. I worked on Data Lake and Data Science platform implementations for various sectors in the Nordics. Check out my personal blog.
I plan to release other posts in the future. If you like the topics, sign up to get notified about new posts.
Any feedback, opinions and suggestions are highly welcome!