Data Governance From an Engineering Perspective



This article is the first in a wider series about Data Governance and Metadata. In these posts, I write about what I've learned on data platforms, how I think about it, and how I use that knowledge. I plan to release other posts in the future.

A short glimpse at the series:

  1. Introduction
    1. Data Governance From an Engineering Perspective (this post)
    2. The Alter Ego of Data
    3. Tools in the Data Management Zoo
    4. With Data Comes Responsibility
  2. Physical system
  3. Data models
  4. Business processes & Compliance

What makes the series unique and worth reading?

I set myself three objectives:

1. No marketing fluff

The majority of freely available content about data governance is vendor-specific. Understandably, Informatica, Collibra, Alation, and other vendors seek to create more demand and praise their own features.

As a result, IMHO, many metadata management and data governance aspects are exaggerated and made more complex than they should be.

2. Focus on practical aspects

Secondly, research and advisory companies, like Gartner or McKinsey, publish many data governance articles too.

The issue there is that they are too high-level and disconnected from what the technology can actually do.

In this series, I want to focus primarily on explaining the concepts by using a specific library or tool.

3. Don't be boring

Topics like privacy, governance, or security are very formal and get boring fast. I hope you don't mind a funny picture or meme.

How to approach a wide topic like data governance?

I use the Venn diagram below as a starting point. It was inspired by Willem Koenders' LinkedIn post.

The physical systems area covers data engineering and technical aspects.

The data models area describes conceptual and data modelling techniques that bring the most value to the business.

Last but not least, the business processes & compliance area focuses on the organization, its processes, and privacy and industry regulations.


Table of contents:

I am planning to release a series of articles about these topics. The mind maps above will evolve, and new ideas will appear.

Nonetheless, here is a high-level agenda.

  1. Introduction
    1. Data Governance From an Engineering Perspective (this post)
    2. The Alter Ego of Data
    3. Tools in the Data Management Zoo
    4. With Data Comes Responsibility

  2. Physical system
    1. Databases and storages
    2. Integration & synchronization
    3. Security
    4. Infrastructure
    5. Automation & orchestration
    6. Consumption
    7. DevOps

  3. Data models
    1. Data modelling techniques
    2. Data catalog
    3. Business glossary
    4. Data exploration
    5. Master data management
    6. Data quality
    7. Visualizations
    8. ML models
    9. Compliance
    10. Roles and responsibilities

  4. Business processes & Compliance
    1. Business processes
    2. Industry specific regulations
    3. Budget, ROI
    4. Privacy regulations
    5. Business and technology architecture
    6. Roadmap
    7. Data governance models
    8. Organization structure

Read next: The Alter Ego of Data


About the author

Hi! I am Valdas Maksimavičius. I specialize in data analytics and cloud computing and have ten years of experience. I have been using Azure Cloud components since 2014.

For the last five years, I have been leading Data Engineering teams using the latest Azure Data and AI services. I worked on Data Lake and Data Science platform implementations for various sectors in the Nordics. Check out my personal blog.


I plan to release other posts in the future. If you like the topics, sign up to get notified about new posts.

Any feedback, opinions and suggestions are highly welcome!
