Data Platform School

Data Platform Mastermind

By Valdas Maksimavicius
Published in Data Governance
May 17, 2021

I invite all of you to peer-to-peer mastermind sessions about Azure Data Platform. Instead of only me sharing various insights about data platforms, we can learn from each other.

Sign up to get details

A few rules:

  1. Please keep the discussions vendor-neutral.
  2. Focus on pragmatic examples - we are engineers, data scientists, architects, managers.
  3. No slides. I expect a free-flowing discussion, though it’s totally OK if you just want to listen in.
  4. No recording. I value your privacy and it’s all about learning from each other.

Data Platform Mastermind #3 - Lakehouse on Azure Data Platform

When: Thursday, June 10 (14:00 UTC)
Duration: 60 mins

The Lakehouse term, initially coined by Databricks, seems to be growing in popularity. I keep coming back to the Lakehouse paper from CIDR over and over again.

The first key idea we propose for implementing a Lakehouse is to have the system store data in a low-cost object store (e.g., Amazon S3) using a standard file format such as Apache Parquet, but implement a transactional metadata layer on top of the object store that defines which objects are part of a table version.

This allows the system to implement management features such as ACID transactions or versioning within the metadata layer, while keeping the bulk of the data in the low-cost object store and allowing clients to directly read objects from this store using a standard file format.
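To make the idea from the paper a bit more tangible, here is a minimal sketch in Python of a metadata layer that records which objects belong to each table version. Everything in it is an assumption made for illustration: `ObjectStore` is an in-memory stand-in rather than a real storage SDK, and `TableLog`, the `_log/` key layout and the file names are hypothetical. It is not how any particular product implements its transaction log.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ObjectStore:
    """In-memory stand-in for a low-cost object store (hypothetical, not a real SDK)."""
    objects: dict = field(default_factory=dict)

    def put(self, key, data):
        self.objects[key] = data

    def get(self, key):
        return self.objects[key]


class TableLog:
    """Toy metadata layer: every commit writes one JSON log entry that lists
    the data objects making up that version of the table."""

    def __init__(self, store, table):
        self.store = store
        self.table = table

    def _log_key(self, version):
        # Hypothetical key layout: <table>/_log/<zero-padded version>.json
        return f"{self.table}/_log/{version:08d}.json"

    def latest_version(self):
        # Walk forward until a log entry is missing; -1 means "no commits yet".
        version = -1
        while self._log_key(version + 1) in self.store.objects:
            version += 1
        return version

    def files(self, version=None):
        # Return the data objects that are part of the given table version.
        if version is None:
            version = self.latest_version()
        if version < 0:
            return []
        return json.loads(self.store.get(self._log_key(version)))

    def commit(self, add=(), remove=(), expected_version=None):
        # Optimistic check: refuse to commit on top of a version we did not read.
        # A real system would need an atomic put-if-absent here; this toy
        # version is NOT safe for concurrent writers.
        current = self.latest_version()
        if expected_version is not None and expected_version != current:
            raise RuntimeError(
                f"conflict: expected version {expected_version}, found {current}"
            )
        files = (set(self.files(current)) - set(remove)) | set(add)
        self.store.put(
            self._log_key(current + 1), json.dumps(sorted(files)).encode()
        )
        return current + 1
```

Committing `add=["sales/part-0.parquet"]` and then replacing that file with `sales/part-1.parquet` produces versions 0 and 1. Because old log entries are never rewritten, `files(0)` still returns the original file list, which is the versioning behaviour the quote describes, while the data files themselves stay in the object store where any client can read them directly.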

Let’s meet and talk: how do you approach Lakehouse implementation on Azure Data Platform? Let’s share our experiences and findings!

Past meetups

Data Platform Mastermind #2 - Data Ops

Here are a few points I noted down for myself during the meeting:

Recommended read and further investigation:

Data Platform Mastermind #1 - Data Governance - how to get started?

Data governance is a very broad subject, with many layers of abstraction, many interpretations, and countless rabbit holes…

On the other hand, the number of messages and requests to start with this topic shows the need to look the devil in the eye ;)

Here are a few points I noted down for myself during the meeting:

  • Data Governance is a very loaded term with a lot of baggage. Metadata management is a better concept to start with.
  • Data Governance is like agile methodology in its early days. To achieve success, there has to be buy-in at the top of the organization, but also data practitioners’ commitment.
  • Until people see the benefits of governance, it will be perceived as a burden that no one wants to carry. Instead, focus on showcasing the value you get with proper governance in place.
  • Documenting measures & data is as important as the data itself.
  • Even the best data catalog product is useless if users don’t contribute. Get your community excited about it. Make the process of documenting data “sexy”. Enable crowdsourcing, and reward users for contributions with gamification.
  • It’s better to build momentum and spark interest in the data catalog and the value of metadata before buying an expensive COTS offering. A wiki page with data definitions and terms is a great place to start.
  • Data engineers can’t be left alone to take care of all governance nuances. They need support and close collaboration from the business, data stewards, and data consumers.

Further reading:

  • “Data Management at Scale” by Piethein Strengholt
  • “Non-Invasive Data Governance” by Robert S. Seiner
  • “Data Management Body of Knowledge” DAMA-DMBOK

Valdas Maksimavicius

IT Architect | Microsoft Data Platform MVP
