Data products have exploded in popularity over the last few years. As an industry, we are where the automobile industry was around the turn of the 20th century. We are slowly transitioning from building hand-crafted, exclusive products for Big Tech customers to widespread commoditization. Soon, efficiency, maintenance, standards, and assembly lines are going to be instrumental, just as they were when the Ford Model T revolutionized automobiles.
After delivering 100+ products across multiple domains for a wide variety of advanced analytics use cases, we’ve gained a lot of understanding about the nature of data products and the unique challenges that come with building and supporting them. We’ve been in the trenches with mid-market, fast-moving companies – conceptualizing, building, iterating, delivering, and supporting ML data products across their entire multi-year lifecycle.
In this piece, we want to share our insights into why a lifecycle-based approach to the development and delivery of data products is imperative if we are to push through the challenges that are currently holding us back from building the dynamic products of the future.
What are Data Products?
Before we talk about building better data products, let us take a moment to define exactly what “data products” mean in this context.
Data products come in all shapes and sizes – BI dashboards, recommendation engines, forecasting tools, and many more. But in our view, all data products should have a few fundamental characteristics:
- They have a defined scope that is easily understandable. They have specific structures around pricing and deliverables that outline what you’re getting, and how much you will need to pay for it.
- They come with certain guarantees. For example, when you plug a motor into the socket, it just works. You’re also made aware that the motor will run well for a certain period, after which it will need to be serviced. You also have guarantees around safety and security when you’re using a well-made product.
- They have an aesthetic, user-friendly interface that doesn’t make them intimidating to use.
- They’re malleable and can be packaged/ layered in multiple ways based on economic factors and organizational objectives.
Why We Need To Adopt A Lifecycle-Based Approach
The exponential scaling of the number and complexity of use cases we are observing across every business domain reinforces the need for a different type of data product. As a community, we need to create consensus around the boundaries, flows, design patterns, and standards that we employ while building these products.
With a developmental paradigm shift that emphasizes lifecycle management, we can have a broader, more holistic viewpoint that allows us to account for the massive amount of uncertainty and the continually iterative nature of building data products. Let’s look at some key benefits of adopting such an approach.
- Economics: As things stand currently, most data products ignore lifecycle costs leading to a complete mismatch in the expected TCO (Total Cost of Ownership). Within this new approach, budgeting can account for iterative development cycles leading to a more realistic view of the cost and potential ROI of a data product. Developers should spend a lot more time building clarity about the exact experience they want the end user to have to avoid costly delays during development.
- Trust: Data products are meant to drive decision-making and business outcomes. As a result, being able to trust the solutions offered by the product is absolutely imperative. Within a lifecycle-based approach, credibility and trust are built into the DNA of the product throughout its development. At every step of the way, credibility is prioritized over functionality, ensuring that the level of trust can be maintained no matter how complex the final product becomes.
- Speed: We need to change our notion of scalability within this space from scaling data to scaling data products themselves. To effectively serve the sheer number of use cases modern enterprises have, speed and rapid prototyping need to be a major focus. We need to be able to build primitive, trustworthy versions of data products faster and put them in the hands of end users to enable the feedback loop that leads to building the final solution.
- Aesthetics: We must reduce the cognitive load for end users by building products that have intuitive user interfaces that can be easily understood and consumed. We need products that are designed for humans, emphasizing usability over complexity. By building a better user experience, we will be facilitating the widespread adoption of data products.
With that in mind, let us take a look at the typical lifecycle of a data product.
What is the Data Product Lifecycle?
Individual implementations might vary depending on organizational hierarchies and economic factors, but in general, this is how we believe the data product lifecycle needs to be managed. There are various stages in the development and operation of the data product:
- Incubation: One of the interesting aspects of data product development is that no one knows what the final product will look like at the beginning. Most end-users’ experience of using data products is a dashboard and exportable spreadsheets that can be viewed in Excel.
What we have observed is that clients have a starting point and a mental image of the ultimate experience that they would like to have with the data product in terms of improving their productivity and decision-making. They don’t have much clarity about the form they would like the solution to take, but their minds really open up when they’re able to see a a tangible, usable version of a data product. - Minimum Viable Product (MVP): To open up the spigot of opportunity for adoption, you need to have an initial version of a product for the end user. This will need a rapid prototype that is functional, reliable, and can be quickly test-driven.
Since building data products is an inherently agile and iterative process, the nature and features within this prototype will constantly change according to customer feedback and real-world testing. You may need to build several prototypes of a product before settling on one that can be iterated upon. - Building Credibility: Managing the credibility of the data product, however complex or simple, is an integral part of the development process. Whatever output the product delivers – numbers, charts, recommendations – needs to be brought up to a standard of reliability. Since these products inform real-world business decisions, you need to ensure that no adverse outcomes occur due to the product’s insights.
- Staying Agile: The real world is dynamic, and the requirements of end users will naturally change over time. The product has to keep up to ensure that the deliverables remain relevant and applicable to current conditions.
When building data products, you want to focus on utilizing and reutilizing data sets, features, and composable interfaces to solve new use cases. The assetization of data sets not only helps build consistency and trust but also saves the time required to build new products. So, every successive data product will inherit certain characteristics and features from its predecessor to deliver better performance with lesser resources. - Winding Down: In the due course of time, when it comes time to retire a data product, there needs to be a set of procedures that can perform careful disengagement at the process, data, and infrastructure levels.
In summation, data products have a much higher degree of uncertainty than traditional software development. As a result, the tools, processes, and the people building them must be able to handle uncertainty.
A lifecycle-based approach to data product development helps us sidestep many of the problems that could arise within an inherently volatile process. With fewer roadblocks, we can build leaner, more efficient data products that are reliable, agile, and scalable.
It is a fascinating time to be a part of the data products space. Stay tuned to the Scribble Data blog for more insights into the wild and wonderful world of data.