Why aren’t you getting value out of Data Science? Reason #2 is below.

You didn’t get deployment right.

Generally, depending on the use case, a model can be deployed:

·        As Business Insights, delivered verbally or in writing

·        In a Batch Process to deliver a score in a table that is used in some way

·        In Real Time during an interaction or event

In this article, we will look at problems with deployment within this context.

Deployment Type #1: Business Insights delivered to leaders within an organization. 

Business insights may be delivered verbally or in writing. They may be used to:

·        Answer a question.

·        Tell leadership why there is a problem.

·        Explain a model that will score in batch or real time.

Note that while statistical and/or machine learning algorithms can be used to make predictions, the value that is delivered to the business is generally a better understanding of the factors that influence the business, not the prediction. These insights by themselves don’t change actions or activities, but they should influence how decision-makers drive their business. 

Example business insights:

·        The top reasons that specific stores had lower than expected revenues.

·        The top drivers that indicate that a customer is going to churn.

·        The biggest factors that need to be investigated to stop fraud.

·        The impact of incremental marketing spend on revenue.

Insights from these models often point to operational challenges that need to be addressed by a change in process or policy. Unfortunately, these insights are often not communicated well, or the findings are dubious, resulting in inaction.

If the model isn’t explained so that business owners understand it, was the model deployed? The data scientists may think so, but the business owners are left scratching their heads.

Deployment Type #2: Batch Processes that proactively score on a periodic basis.

Scores can be assigned to customers, prospects, businesses, or other entities on a scheduled basis based on prior known behavior. These scores can be used to make decisions about the best future action to take, whether at the individual entity level or at some aggregate level. 

While batch scoring runs on a periodic basis, the scores can be retrieved later: as part of a report, list, or ad-hoc request, or in response to a triggered event. For example, an e-retailer may pre-score customers who are good candidates for certain types of products. The resulting product recommendation is not delivered until the customer logs into the website.
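The e-retailer example can be sketched as a lookup against scores that a batch job has already written. This is a minimal illustration, not a reference implementation: the `batch_scores` table, the customer IDs, the `recommend_on_login` function, and the 0.5 threshold are all hypothetical.

```python
from datetime import date

# Hypothetical output of a nightly batch job:
# customer_id -> (recommended product, score, date the score was computed)
batch_scores = {
    "cust_001": ("running shoes", 0.82, date(2024, 1, 15)),
    "cust_002": ("rain jacket", 0.41, date(2024, 1, 15)),
}


def recommend_on_login(customer_id, min_score=0.5):
    """Triggered by a login event: return the pre-scored product, or None.

    No model runs here. The function only recalls a score that the
    batch process computed earlier.
    """
    record = batch_scores.get(customer_id)
    if record is None:
        return None  # customer was not in the last batch run
    product, score, _scored_on = record
    return product if score >= min_score else None
```

Calling `recommend_on_login("cust_001")` returns the stored recommendation, while a customer with a low score or no score gets nothing. The stored date is kept alongside the score so that a consumer could also reject scores that are too old.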

Models deployed as a batch process may not be effective because:

·        They are outdated. This can occur when the behavior the model was designed for has already stopped (for example, the fraud scheme has changed) or when the model was developed using data from an unrepresentative time period, such as a holiday or storm.

·        They are not timely enough. In this case, the event was missed. For example, the customer has already churned because the scoring (or the action based on the score) is not frequent enough.

·        There is no plan for how to use them. A model that predicts that a customer is a good cross-sell candidate for an offer also needs an accompanying action, such as contacting the customer.

·        The model predicts something that will occur regardless, and no action is necessary. In these cases, the business problem was likely poorly framed, and the work was an academic exercise. Models like these are often used to offer discounts to customers who would have shopped anyway, resulting in lower overall revenue for the business.
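One common way to check whether a model is only predicting what would have happened anyway is to compare customers who received the action against a holdout group that did not. The sketch below assumes such a holdout exists and that outcomes are recorded as 1 (purchased) or 0 (did not); the function name and the sample data are hypothetical.

```python
def incremental_lift(treated, control):
    """Purchase rate of customers who got the offer minus the holdout's rate.

    Both arguments are lists of 0/1 outcomes. A result near zero suggests
    the offer went to people who would have shopped regardless.
    """
    if not treated or not control:
        raise ValueError("both groups need at least one customer")
    treated_rate = sum(treated) / len(treated)
    control_rate = sum(control) / len(control)
    return treated_rate - control_rate


# Hypothetical outcomes: the offer made no difference in this campaign.
no_effect = incremental_lift([1, 1, 1, 0], [1, 1, 1, 0])

# Hypothetical outcomes: the offer raised the purchase rate.
some_effect = incremental_lift([1, 1, 0, 0], [1, 0, 0, 0])
```

A real evaluation would also need groups large enough for the difference to be statistically meaningful, and would weigh the lift against the cost of the discount.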

Deployment Type #3: Real-Time Processes that score at the point of interaction. 

Here, a score is generated during some interaction based on behaviors that have just occurred. Real-time scoring is extremely valuable if the entity is unknown until the current interaction or if the entity needs to be continuously monitored to provide an interaction at just the right time.

Deploying a model in real time requires developing the model using data that is available at the time of interaction. In the era of big data, data availability has become less of a concern than in years past. Effective real-time scoring often requires pre-aggregating the relevant attributes (from the current event stream and from historical behaviors), along with intelligent algorithm selection (or good coding), so that scoring occurs quickly enough.
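The pre-aggregation idea can be illustrated with a small sketch: historical attributes are computed offline and stored per customer, so that at interaction time the scorer only merges them with the current event and evaluates a cheap formula. Everything named here is an assumption made for illustration, including the feature names, the coefficients, and the choice of a logistic model.

```python
import math

# Hypothetical historical attributes, pre-aggregated offline per customer.
historical = {
    "cust_001": {"orders_90d": 4, "avg_basket": 52.0},
}

# Defaults for an entity that is unknown until the current interaction.
UNKNOWN = {"orders_90d": 0, "avg_basket": 0.0}

# Hypothetical coefficients of an already-trained logistic model.
WEIGHTS = {"orders_90d": 0.3, "avg_basket": 0.01, "items_in_cart": 0.2}
INTERCEPT = -2.0


def score_event(customer_id, event):
    """Score one live event using a lookup plus a handful of multiplications."""
    features = dict(historical.get(customer_id, UNKNOWN))
    # Attribute taken from the current event stream.
    features["items_in_cart"] = event["items_in_cart"]
    z = INTERCEPT + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))  # logistic function, result in (0, 1)
```

Because the expensive history scan happens ahead of time, the work done during the interaction is a dictionary lookup and a short sum, which is the kind of design that keeps scoring fast enough to go unnoticed.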

In short, real-time scoring is often ineffective if:

·        The data needed for scoring is not available quickly enough for scoring to happen without a noticeable delay in the interaction.

·        The solution is over-engineered: batch scoring with an event trigger would have worked just as well and could have been implemented much more quickly.

On Effective Deployment.

Regardless of the means of deployment, the best models can fail if deployment is an afterthought. How a model will be deployed needs to be considered before the modeling project commences. Additionally, metrics must be put into place to track the performance of the model so that once it is deployed, leadership will be able to monetize the value.

Stay tuned for my next edition, where I will highlight Reason #3.

James Zahoudanis

Principal & Senior Consultant at CMG Consulting LLC

4y

Laura, Excellent article. Timely as well given the creation of data driven solutioning is a key initiative with these industry enterprises. 

Svetlana Levitan, PhD

Principal Algorithms and Machine Learning Scientist at Walgreens Boots Alliance

4y

And how do you deploy the model for scoring? Open standards like PMML or PFA can help to ease this task for traditional ML, while ONNX seems to be a de-facto standard for deep learning models. 
