Let’s say you work at a modern data-driven company and you want to find a way to enhance one of your processes, like partner management. That makes sense: you have limited resources to invest in partner development, yet it ranks high on your growth goals for the year. The first step would be to benchmark your partners and find out which ones to invest in.
You think about adopting an ML approach that would model partner performance and find the optimal allocation of resources. But you soon realize that this approach isn’t going to save you time either. You’ll have to wait for the right data to be collected, and the model will probably have a short shelf life in an evolving business environment. There’s also an opportunity cost: your data science team has limited bandwidth, and its time may be better spent on use cases that have a deeper impact.
Interestingly, this isn’t a one-off case. You’ve probably seen this time and again at your own organization. Most organizations today are generating tons of data that multiple teams and individuals depend on. However, actually being able to use this data can be incredibly time-consuming.
If only data science teams would listen to these end users and build ML models that solve for their use cases. That would make their lives so much easier, right? Not really. There are multiple challenges along the way that lead to these use cases being put on the backburner:
- It is unclear whether solving the problem with an ML model is feasible at all
- The cost and complexity of building an ML production system mean that the effort can easily stall at the experiment stage rather than become a productionizable solution
- Developing and deploying an ML model simply takes too long
- Building and operating ML models is expensive: it requires skilled engineers at every step, along with data scientists who are accountable if the models’ predictions go awry or dip below acceptable accuracy levels
Basically, the outcome just doesn’t justify the effort or the cost at this point in time. And if that isn’t enough, there’s a constant struggle due to lack of clarity around which models to develop, and the scarcity of available infrastructure or data science talent.
As a result, you’re left with two options: either wait a lifetime before your desired model is deployed, or just let it go!
Sub-ML: a simpler and nimbler path to ML
Maybe building use cases doesn’t have to be all that complicated, and there is a different route you can try: one where you experiment with use cases and, after incremental updates, decide which ones go on to ML or to production. That is what we call the Sub-ML use case approach.
What started off as an experimental path to ML is now an emerging space in itself. Gone are the days when you had to worry about shuffling your data science resources, already crunched for bandwidth, to work on use cases that catered to all the functions within your organization. The Sub-ML use case approach, which is all about incremental scoping, provides a path to solutions.
How do Sub-ML and incremental scoping work?
Now, let’s go back to the same example of partner management that we discussed initially. We could simply forgo the use of complex ML models and follow a Sub-ML approach instead, where the goal would not be to try and build an ideal model. Instead, we would follow a series of incremental steps to discover the problem, value, and approach simultaneously.
So in the case of partner management, we wouldn’t necessarily focus on building a comprehensive end solution, but on providing one incremental input to the partner manager, such as outlier partners worth focusing on. That input would work with whatever data is easily available, address the problems that have immediate value, and be productionized using an agile platform.
The key is that it goes through a mini-iteration of the end-to-end ML development process within hours to days, and that the output is productionized, i.e., available every day and maintained. Once the data product starts being used, you get active feedback on next steps, including problems with the outliers detected, recommendations, and decision tracking.
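To make the "outlier partners" input concrete, here is a minimal sketch of what such a lightweight step could look like. It is an illustration only, not a prescribed method: the function name, the partner names, and the choice of quarterly revenue as the metric are all hypothetical, and the modified z-score (based on the median and the median absolute deviation) with a cutoff of 3.5 is just one commonly used way to flag outliers.

```python
from statistics import median


def flag_outlier_partners(metric_by_partner, threshold=3.5):
    """Return {partner: score} for partners far from the median.

    Uses the modified z-score: 0.6745 * (x - median) / MAD.
    The metric and the threshold are illustrative assumptions.
    """
    values = list(metric_by_partner.values())
    med = median(values)
    # Median absolute deviation: a spread measure that is not
    # distorted by the outliers we are trying to find.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half the partners share the median value, so the
        # score is undefined; flag nothing rather than divide by zero.
        return {}
    scores = {
        name: 0.6745 * (value - med) / mad
        for name, value in metric_by_partner.items()
    }
    return {
        name: round(score, 2)
        for name, score in scores.items()
        if abs(score) > threshold
    }


# Hypothetical quarterly revenue per partner (made-up numbers).
quarterly_revenue = {
    "acme": 100,
    "globex": 110,
    "initech": 95,
    "umbrella": 105,
    "hooli": 400,
}

for partner, score in flag_outlier_partners(quarterly_revenue).items():
    print(f"{partner}: modified z-score {score}")
```

A step like this needs no training data or model infrastructure, which is what lets it run daily on whatever data is already available. The feedback from partner managers on which flagged partners were actually useful then informs the next increment.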
This approach has a number of advantages including incremental commitment from the organization, not having to wait for the model to see value, active cooperation from stakeholders, easier skill planning and allocation, and most importantly a model that is fit for purpose.
But isn’t Sub-ML just a fancy new term for BI or analytics?
Sub-ML is a lot more than just BI, and here’s how:
- Sub-ML involves lightweight models that don’t take hours to give you an output
- Sub-ML use cases are productionized from day 1. They’re not ad hoc.
- Sub-ML use cases are ‘living’: they are not one-offs. They are maintained and evolve continuously
- Sub-ML requires feature engineering (data transformations) beyond the kind of ETL that BI systems need
- Sub-ML use cases solve for similar complexity as your ML models without necessarily requiring a data scientist
- Sub-ML goes beyond BI, and its use cases, once deployed, provide a lot more clarity around ML problems and their solutions
We’re already seeing our customers deploy Sub-ML use cases in a tenth of the time taken to deploy a single ML model. And this applies to a wide spectrum of customers, from enterprise to mid-market and SMBs. We’ve also seen that a lot of the skills concentrated in the data space are, in fact, Sub-ML skills.
Sub-ML is the future of data, and it’s already here!