Intro
Although the term “data product” is not new, there is a great deal of misconception about what it is and why it matters. Many view a data product merely as a table in a Data Warehouse or a set of files on storage somewhere, but in reality the Data Product is much more, and it holds the key to a successful Data Platform implementation.
A Data Product Is
The Data Product is the combination of two very important components:
- The data, stored as files in the “Gold” zone of the data platform. Typically this is Delta Parquet, either in a storage container or in a more dedicated Lakehouse.
- A contract between the Data Platform team and the owner of the Data Product. The contract describes exactly what the Data Product is, in both technical and non-technical terms.
A Data Product is NOT:
- Data in any shape or form outside of the “Gold” zone.
- Data that does not have a business owner who is requesting it.
- Data living inside the Data Platform in other zones for staging or similar purposes.
- Data that the Data Platform team imported or created because someone might need it at some point, and so on.
The Importance of a Proper Data Product
So isn’t this just semantics, and why does it actually matter? When establishing a Data Platform, there are many competing agendas in play: the team needs to empower the business to be data-driven, it needs to ensure proper data governance, and much more. In fact, this list tends to spiral out of control, which leads to overload and scope creep for the Data Platform.
See this post for why Data Platforms fail:
Without a clear and well-defined concept of a Data Product, the Data Platform team will not be able to deliver the real value of the Data Platform, which is business value. Instead, it will often revert to building technical artifacts that it believes will make it more efficient, but that, more often than not, are ditched when management realizes that this version of the platform also failed.
With a clear and well-defined notion of a Data Product, however, the Data Platform team knows exactly what to deliver and to whom. It knows the requirements for that specific data, and it can tell when the data has been delivered according to spec.
The Contract
A contract in the context of data platforms is a comprehensive document that outlines all the relevant details pertaining to the data product that the data platform team is slated to build. This includes essential technical aspects such as file structure, data model, metadata fields, classification, and more.
The file structure and data model are pivotal elements that determine the structure and organization of the data, thereby impacting the ease of data retrieval and analysis. Metadata fields provide context to the data, making it more understandable and usable. Classification, on the other hand, aids in organizing data into various categories for improved accessibility and data governance.
The contract goes beyond these functional requirements and encapsulates non-functional requirements as well, such as update frequency, business owner, and more. Update frequency is crucial because it defines how often the data product must be refreshed to ensure the availability of the most recent and relevant data. The business owner is a key stakeholder who drives the data product’s direction and ensures its alignment with business objectives.
The contract also records the classification of the data product.
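To make the contents of a contract more concrete, here is a minimal Python sketch that models a contract as a data structure and checks a delivered file against two of its terms: the data model (schema) and the update frequency. The class name `DataContract`, its fields and the example values are hypothetical illustrations, not a standard contract format, and the sketch assumes the observed schema and last-update timestamp have already been read from the Delta table by some other tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class DataContract:
    """Hypothetical data contract; the field names are illustrative only."""

    # Technical terms
    product_name: str
    gold_path: str               # location of the files in the "Gold" zone
    file_format: str             # e.g. "delta"
    schema: dict[str, str]       # data model: column name -> logical type
    classification: str          # e.g. "internal" or "confidential"
    # Non-technical terms
    business_owner: str
    update_frequency: timedelta  # maximum allowed age of the data
    metadata: dict[str, str] = field(default_factory=dict)

    def schema_violations(self, observed: dict[str, str]) -> list[str]:
        """List every difference between an observed schema and the contracted one."""
        problems = []
        for column, expected in self.schema.items():
            actual = observed.get(column)
            if actual is None:
                problems.append(f"missing column: {column}")
            elif actual != expected:
                problems.append(
                    f"type mismatch on {column}: expected {expected}, got {actual}"
                )
        # Columns the contract never asked for are also a violation.
        for column in sorted(observed.keys() - self.schema.keys()):
            problems.append(f"uncontracted column: {column}")
        return problems

    def is_fresh(self, last_updated: datetime, now: datetime | None = None) -> bool:
        """True if the last refresh lies within the contracted update frequency."""
        now = now or datetime.now(timezone.utc)
        return now - last_updated <= self.update_frequency


if __name__ == "__main__":
    orders = DataContract(
        product_name="sales_orders",
        gold_path="gold/sales/sales_orders",
        file_format="delta",
        schema={"order_id": "string", "order_date": "date", "amount": "decimal(18,2)"},
        classification="internal",
        business_owner="Head of Sales",
        update_frequency=timedelta(days=1),
    )
    print(orders.schema_violations({"order_id": "string", "order_date": "string"}))
    # ['type mismatch on order_date: expected date, got string', 'missing column: amount']
```

Treating extra, uncontracted columns as violations is a deliberate choice in this sketch: it mirrors the idea that the contract defines exactly what is delivered, no more and no less.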
Moreover, the contract establishes accountability and sets clear expectations. It provides a blueprint of the final product, minimizing the possibility of misunderstanding or miscommunication. It delineates the responsibilities and roles of each party involved, ensuring that the data platform team knows exactly what is expected of them.
In addition, the contract also serves as a reference document that can be consulted during the development process. It paves the way for regular and effective communication between the business and the data platform team, fostering a collaborative environment.
The Sanctity of “The Contract” and its Enforced Responsibility Boundaries
Adherence to the contract in the context of a data platform is not merely a guideline; it should be treated with near-religious fervor. The contract serves as a beacon, guiding the data platform team towards delivering real value to the business.
The contract is a meticulously crafted document that encapsulates the business’s data needs. Therefore, it serves as the basis for what the data platform team should deliver. Straying from the contract or adding data not explicitly requested in the contract can lead to a divergence from the business’s actual needs. This could result in wasted resources, misaligned goals, and ultimately, a data platform that does not drive value for the business.
By strictly adhering to the contract, the data platform team ensures they are focused on delivering the exact requirements, thereby maximizing the value of their efforts. It helps the team to remain aligned with the business’s goals and objectives. This unwavering focus on the contract also helps to avoid scope creep, which often leads to unnecessary complications, delays, and cost overruns.
The contract also enforces clear responsibility boundaries. It delineates the roles and tasks of each party involved, preventing confusion and overlaps in responsibilities. By clearly defining who is responsible for what, the contract creates an environment of accountability. It ensures that each party is aware of their responsibilities and can be held accountable for their part of the project.
This heightened sense of responsibility and accountability further enhances the value delivered by the data platform team. It ensures that the team remains dedicated to fulfilling their tasks as per the contract, thereby ensuring that the end product is exactly what the business needs.
In essence, the contract is not just a document; it is a commitment, a promise that the data platform team makes to the business. By revering the contract and strictly adhering to it, the data platform team can ensure they are delivering valuable, needed outcomes. Therefore, the data platform team should not add any data to the platform unless it is explicitly requested in a contract. This disciplined approach will help in building a data platform that truly serves the needs of the business and drives real value.
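The rule that nothing enters the platform without a contract can be enforced mechanically rather than by discipline alone. Below is a minimal Python sketch of such a gate, assuming a hypothetical in-memory registry; `GoldZoneRegistry`, `ContractRef` and `UncontractedDataError` are illustrative names, not part of any real tool, and a real implementation would live in the deployment pipeline and write the actual Delta files.

```python
from __future__ import annotations

from dataclasses import dataclass


class UncontractedDataError(Exception):
    """Raised when data is about to be published without a contract."""


@dataclass(frozen=True)
class ContractRef:
    """Hypothetical minimal reference to a signed contract."""

    product_name: str
    business_owner: str


class GoldZoneRegistry:
    """Hypothetical gatekeeper for the Gold zone: only contracted products get in."""

    def __init__(self) -> None:
        self._contracts: dict[str, ContractRef] = {}
        self._published: list[str] = []

    def register_contract(self, contract: ContractRef) -> None:
        # No business owner means nobody is requesting the data.
        if not contract.business_owner:
            raise ValueError("a contract needs a business owner")
        self._contracts[contract.product_name] = contract

    def publish(self, product_name: str) -> None:
        if product_name not in self._contracts:
            raise UncontractedDataError(
                f"{product_name!r} has no contract; it does not belong in the Gold zone"
            )
        # A real pipeline would write the files here; this sketch only records it.
        self._published.append(product_name)

    @property
    def published(self) -> list[str]:
        return list(self._published)


if __name__ == "__main__":
    registry = GoldZoneRegistry()
    registry.register_contract(ContractRef("sales_orders", "Head of Sales"))
    registry.publish("sales_orders")
    try:
        registry.publish("might_be_useful_someday")
    except UncontractedDataError as err:
        print(err)
```

The design choice here is to fail loudly at publish time: data that "someone might need" is rejected before it ever lands in the “Gold” zone, instead of being cleaned up later.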
Delivering Data Products as Files
When it comes to delivering data products, the most effective and efficient method is to deliver them as files in a data lake or a lakehouse, as opposed to tables in a data warehouse or as a part of any other compute platform. This approach has a multitude of benefits that enhance the clarity, efficiency, and effectiveness of the data product delivery.
Files are the lowest common denominator in data structures. They are simple, well-defined, and easy to describe in a contract. This makes files a much more straightforward and effective medium for defining data products. When data products are defined at the file level, they become more tangible and easier to comprehend, thereby reducing chances of misinterpretation or misunderstanding.
Another advantage of delivering data products as files in a data lake or lakehouse is that it provides a clear and well-defined product. This approach ensures that the data product is delivered in its purest form, without any influence or modifications from the compute platform. This results in a more accurate and reliable data product that truly represents the business’s needs as defined in the contract.
Furthermore, requirements for compute platforms change frequently. If the compute platform forms a part of the data product, these changes can make the data product delivery unclear and fragile. On the other hand, when data products are delivered as files in a data lake or lakehouse, they remain unaffected by changes in the compute platform. This ensures a stable, reliable, and consistent data product delivery.
In addition, separating the data product from the compute platform allows for greater flexibility. The data can be accessed, processed, and analyzed using different compute platforms as the business’s needs change. This ensures that the data product remains relevant and valuable, even as the business’s requirements evolve.
Balancing Robustness and Enforcement with Ad Hoc Data Exploration
In the realm of data platforms, finding the equilibrium between maintaining rigorous data enforcement and allowing for ad hoc data exploration is pivotal. This balance ensures the sustainability of the data platform and its capacity to continually deliver value to the business.
The contract and data product need to be robust, with strict enforcement ensuring that the data meets the business’s needs and adheres to established guidelines. This rigidity serves as the backbone of the data platform, providing consistency, reliability, and trustworthiness. This is the organization-sanctioned data that forms the foundation for all data-driven decisions and strategies.
However, it’s equally crucial to maintain a certain level of flexibility to facilitate ad hoc data exploration. This exploration enables teams to experiment with data, derive new insights, and build prototypes that could potentially deliver significant future value.
Building prototypes on top of the existing data products allows for innovation and discovery. These prototypes can open up new avenues for the business, revealing previously unseen patterns, trends, or insights. However, it’s essential to remember that these prototypes are experimental and, as such, do not yet have the same level of reliability or governance as the original data products.
As these prototypes mature and their business importance becomes evident, they should not remain in the realm of ad hoc exploration. Instead, they need to be formalized as new data products, complete with their own contracts and business owners. This is a crucial step in ensuring that the prototype is integrated into a proper support and governance structure.
Formalizing the prototype as a new data product ensures that it is subject to the same rigorous enforcement and governance as the original data products. It guarantees that the prototype is reliable, consistent, and can deliver sustained value to the business.
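The promotion step can be sketched as a simple state transition: a prototype becomes a data product only once it has both a business owner and a contract. This is a minimal Python sketch under that assumption; `Dataset`, `Status` and `promote` are hypothetical names, and a real workflow would also register the new contract and move the data into the “Gold” zone.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PROTOTYPE = "prototype"        # ad hoc exploration, no guarantees
    DATA_PRODUCT = "data_product"  # contracted, owned, governed


@dataclass
class Dataset:
    """Hypothetical record of a dataset and its governance status."""

    name: str
    status: Status = Status.PROTOTYPE
    business_owner: str | None = None
    contract_id: str | None = None


def promote(dataset: Dataset, business_owner: str, contract_id: str) -> Dataset:
    """Formalize a prototype as a data product (hypothetical workflow).

    Promotion is only allowed once both a business owner and a contract exist,
    mirroring the rule that every data product has an owner and a contract.
    The original prototype record is left unchanged.
    """
    if dataset.status is Status.DATA_PRODUCT:
        raise ValueError(f"{dataset.name} is already a data product")
    if not business_owner or not contract_id:
        raise ValueError("promotion requires a business owner and a contract")
    return Dataset(dataset.name, Status.DATA_PRODUCT, business_owner, contract_id)


if __name__ == "__main__":
    prototype = Dataset("churn_scores")
    product = promote(prototype, "Head of Customer Success", "contract-001")
    print(product.status.value)  # data_product
```

Returning a new record rather than mutating the prototype keeps the experimental and the governed versions clearly separated in this sketch.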
In conclusion, while maintaining robustness and strict data enforcement is vital, fostering an environment that encourages ad hoc data exploration is equally important. This balance allows businesses to benefit from the reliability of established data products while simultaneously encouraging innovation and discovery. However, once a prototype proves its value, it is essential to incorporate it into the formal data structure, thereby ensuring its longevity and continued value delivery.
Conclusion: Embracing the Data Product Paradigm
The concept of a Data Product is pivotal for the success of any Data Platform. It transcends mere data storage, embodying a comprehensive contract that defines both technical and business aspects. This contract ensures clarity, accountability, and alignment with business objectives. By delivering data as files within the “Gold” zone and adhering strictly to the contract, Data Platform teams can avoid scope creep and deliver tangible business value. Balancing robust enforcement with the flexibility of ad hoc data exploration allows for innovation while maintaining data integrity. Ultimately, a disciplined approach to Data Products is the cornerstone of a Data Platform that not only meets business needs but drives them forward.