Adding Machine Learning to a Software Product

Mustafa Kıraç
Published in iyzico.engineering
11 min read · Jun 3, 2018


Team effort optimized for speed. Source: link

AI is the new electricity, and ML sits at the top of the tech hype cycle. When I speak to engineering teams at different companies, I notice that many engineers are hurriedly brainstorming machine learning products, choosing the coolest data science platform, hiring data scientists, and/or purchasing training sessions to transform their existing engineers into data scientists. On the other hand, I also observe that much of this hurry is driven by the FOMO of engineering executives. Management consultancy teams, technology vendors, and sector reports are pressuring them to catch up with the current trend as soon as possible.

A simple Google search indicates what ML topics others are interested in.

When technology hype is the driver behind forming ML capabilities, most organizations can now quickly put together the required development platform, data science team, and connectivity to existing corporate data. Once all these ingredients are in place, IT executives believe the data (engineering or science) team will start building amazing new products. Unfortunately, many data projects do not result in successful products, even with a good mix of talented people and the right tools. In 2015, Gartner declared that a majority of (big) data projects “fail to go beyond piloting and experimentation, and will be abandoned”. Following this report, Gartner analyst Nick Heudecker revised the estimate upward, putting the failure rate close to 85%.

If you have started experimenting with data-driven products, beware: they will most likely die.

Product development is the art of optimizing time, effort, and budget while satisfying customer needs. Data products are no different from other software products: it is the user who determines the fate of a product. Moreover, different product development approaches suit different teams and business problems. For instance, a waterfall strategy may be the best approach for re-engineering a large legacy application, whereas an agile approach might be more suitable for experimental startup products. Regardless of whether you pick an agile or waterfall method, mixing machine learning into the product should be delayed in the development plan as much as possible. The ML component sits at the heart of a data product, and whenever a product feature is added or updated, the ML component may need to be revised or even revamped completely. Since maintaining the ML component usually requires more effort than maintaining other software components, I would even argue that ML should be avoided completely in the early stages of new product development. Let me briefly explain how ML components differ from other software components.

1. ML is hard to get right

I love how this article compares ML to common software development. Concisely, ML is hard not because it requires a background in mathematics and theoretical computer science to understand the underlying mechanisms. ML is difficult to accomplish because of its experimental iteration cycle, which resembles software debugging. You need to continuously measure and monitor how your ML components are behaving. Once your ML models become less accurate and robust as a cascading consequence of changes in your product, customer experience, or external user conditions, you need to start debugging the model by iterating over a multi-dimensional grid of feature pipelines, algorithm configurations, and data-model fit. Since the potential evaluation tasks in ML are far more numerous than in debugging other software components, ML team scalability will also become the bottleneck in your time plan. This is why many experts are discussing (e.g., in link1 or link2) whether an agile approach is suitable for ML or data science.
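To make the combinatorics concrete, that debugging grid can be sketched in a few lines of Python. Everything here is illustrative: the pipeline names, algorithms, and the toy `evaluate` function are stand-ins for real (and expensive) train/validate runs.

```python
from itertools import product

# Hypothetical search space: each axis multiplies the number of
# experiments the team has to run and evaluate.
feature_pipelines = ["raw", "scaled", "scaled+interactions"]
algorithms = ["logistic_regression", "gradient_boosting"]
hyperparams = [{"reg": 0.1}, {"reg": 1.0}, {"reg": 10.0}]

def evaluate(pipeline, algorithm, params):
    """Placeholder for a full train/validate cycle; in practice this
    is the expensive step that makes ML debugging slow."""
    # Toy deterministic-within-a-run score so the sketch runs end to end.
    return hash((pipeline, algorithm, params["reg"])) % 100 / 100

# The experimental iteration cycle: every change to the product may
# force re-running the entire grid.
results = {
    (p, a, h["reg"]): evaluate(p, a, h)
    for p, a, h in product(feature_pipelines, algorithms, hyperparams)
}
best = max(results, key=results.get)
print(f"{len(results)} experiments, best configuration: {best}")
```

Even this tiny 3×2×3 space already means 18 evaluations; real feature pipelines and hyperparameter grids grow this number into the thousands.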

The following illustration shows that large data volumes and complex algorithms must cooperate to obtain better ML performance. I have added data (x-axis) and development (y-axis) scalability concerns to Andrew Ng's original illustration. Although it is now possible to scale linearly with increasing data volume using Hadoop, Spark, MPP databases, and so on, building more complex ML algorithms requires much greater effort for ever smaller improvements in the outcome.

Illustration of more data vs. more complex algorithms in Andrew Ng’s “Machine Learning Yearning” book.

Because of the difficulty of building and maintaining ML products, significant advances in ML technologies usually come from academic institutions with good funding, or from technology behemoths that can collect and manipulate large amounts of data and afford to pay hundreds of machine learning experts. Despite the availability of many SaaS solutions and open-source frameworks that reduce the complexity of ML, the entry barrier is still very high from a product development perspective, especially when you are a startup with limited time and resources.

2. ML is hard to put into production

When we divide software products into components or layers, we usually provide a way for these layers to communicate with each other so that they can be developed by different teams, or by the same full-stack team in different tasks or sprints. The ML component of a product can be integrated via an API. On the other hand, even a simple ML component brings many auxiliary components along with it before it can be put into production. ML requires training and scoring phases, real-time or batch data pipelines, and monitoring and measurement of performance metrics such as accuracy, latency, throughput, memory footprint, and so on. When you compare the amount of code in the ML core with the auxiliary data and performance management components, you will realize that you had to spend significant effort on the auxiliary parts. The following figure is taken from Google's paper “Hidden Technical Debt in Machine Learning Systems”, which explains this very issue of creating an ML software habitat.
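As a tiny illustration of one such auxiliary component, here is a hypothetical wrapper that adds latency and prediction-distribution monitoring around any scoring callable. The class and metric names are my own invention for this sketch; a production system would ship these metrics to a real monitoring stack rather than keep them in memory.

```python
import time
from collections import deque

class MonitoredModel:
    """Wrap any callable model to record some of the auxiliary metrics
    mentioned above: latency and a rolling window of predictions."""

    def __init__(self, model, window=1000):
        self.model = model
        self.latencies = deque(maxlen=window)
        self.predictions = deque(maxlen=window)

    def predict(self, features):
        start = time.perf_counter()
        result = self.model(features)
        self.latencies.append(time.perf_counter() - start)
        self.predictions.append(result)
        return result

    def stats(self):
        n = len(self.latencies)
        return {
            "calls": n,
            "avg_latency_s": sum(self.latencies) / n if n else 0.0,
            "positive_rate": sum(self.predictions) / n if n else 0.0,
        }

# Usage with a dummy scoring function standing in for the real model.
scorer = MonitoredModel(lambda x: int(sum(x) > 1.0))
for row in ([0.2, 0.3], [0.9, 0.4], [0.8, 0.9]):
    scorer.predict(row)
print(scorer.stats())
```

Notice that the monitoring scaffolding is already several times larger than the one-line "model" it wraps, which is the point the Google paper makes at scale.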

ML habitat illustration

3. ML can become a black box, and its decisions are hard to explain

Complicated ML models may contain thousands of rules, millions of weights, or ensembles of multiple heterogeneous algorithms. As a result, it is not always possible to understand how the model's decision is formed. When ML models are employed to assist call center agents, risk analysts, or campaign marketers, human experts want to know the reasons behind the model's decisions. If the model's actual purpose is to assist human experts in making decisions rather than to decide autonomously on their behalf, the structure of the ML components needs to be modified significantly.

As British science fiction writer Arthur C. Clarke put it, “any sufficiently advanced technology is indistinguishable from magic”. When your ML model is too complicated yet amazingly good at personalization, it may creep people out, and your customers may start questioning what private data your application has access to. The human brain evolved to relax only when it feels safe and senses that things are under control. Thus, your product may need additional mechanisms to gain user trust and to explain why the ML model made a given decision. For instance, image recognition or natural language processing models easily pick up the biases of the people who prepared the training data sets or contributed to the content (e.g., racist or sexist bias).

4. The ML component cannot stay constant while the product evolves

You may have heard of the Monty Hall problem, which is usually shown as an example of conditional probability: the likelihood of an outcome depends on prior choices.

Monty Hall problem, Wikipedia
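If the result still feels counterintuitive, a short Monte Carlo simulation (my own toy sketch, not part of the original problem statement) confirms that switching doubles the win rate:

```python
import random

def monty_hall(switch, trials=100_000, seed=42):
    """Simulate the Monty Hall game and return the win rate."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        car = rng.randrange(3)   # door hiding the car
        pick = rng.randrange(3)  # contestant's first choice
        # The host opens a goat door that is neither the pick nor the car.
        opened = next(d for d in range(3) if d != pick and d != car)
        if switch:
            # Switch to the one remaining unopened door.
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == car)
    return wins / trials

print(f"stay:   {monty_hall(switch=False):.3f}")  # ≈ 1/3
print(f"switch: {monty_hall(switch=True):.3f}")   # ≈ 2/3
```

The outcome depends entirely on the prior choice: the host's reveal is conditioned on what you picked first, which is exactly the conditional structure an ML model must also capture.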

Whether you call it conditional probability or the butterfly effect, the phenomenon you are attempting to model with ML is linked to several external factors. In fact, such connections between events and entities make the universe tick. Nature's quantum entanglement article explains that the entanglement between particles at the edges of the universe keeps the universe as one; otherwise it would separate into isolated sub-verses. As a result, any change in any factor you have not captured as a feature will change the behavior and performance of your ML model. When you are making rapid experiments and iterations on your product, you will find it hard to keep your ML model robust.

At iyzico, there are multiple ways to define a fraudulent payment transaction (stolen card, denied delivery, brute force attack, and so on), so we prepare training data sets according to the given fraud descriptions. The model we train against stolen card transactions is unable to distinguish denied shipment deliveries, but it can accurately identify stolen cards that have not yet been recognized as stolen and reported by the real owner to the issuing bank. We have a clear business understanding of the need and a machine learning product matching the need. On the other hand, the business need may evolve over time and require a very different approach, invalidating the existing model.

In another case, Netflix ran a million-dollar competition to improve the algorithm that predicts star ratings of movies. Over the years, Netflix gained enormous experience in consumer-oriented experimentation. Although predicting movie ratings was initially one route to personalization, the Netflix “consumer science” team learned that accuracy, diversity, freshness, and awareness all matter for personalization together. Hence, the original problem and the ML model formulation changed into a more complicated ranking problem, and the original ML models are no longer valid for the current Netflix business.

5. Developing an ML product with technophiles is hard

As the final reason why product managers need to approach ML carefully, I would like to add the tendencies of engineers and ML experts to the list. Data professionals tend to become artisans who value and are motivated by tools (Spark, TensorFlow, Kafka), hardware (GPUs, CUDA, FPGAs), architectures (real-time, in-memory, serverless), MOOCs, hypes (drones, 3D printers, blockchain), methodologies (Scrum, clean code, TDD), and so on. Thus, I frequently observe development teams forcing product managers and infrastructure teams to incorporate their favorite “thing” into the product. Otherwise, the development team quickly loses interest and motivation in developing the product, regardless of consumer/user needs.

The most suitable time to start the ML component

ML product development is a 3-player competition in which the product manager needs to negotiate a perfect balance of customer satisfaction, employee happiness, and a high-quality ML process.

3-player game of building an ML product.

When a company measures the success of a product by customer happiness, the underlying technologies and methodologies should be maintained in a way that enables rapid prototyping, fast iteration, and quick delivery. Caring only for customer satisfaction and moving forward while validating customer needs may require pushing ML development into later iterations of the product. Sometimes developing a quick-and-dirty but working product with an expiration date is the best approach, although it may make engineers unhappy because they cannot play with their favorite thing, cannot “self-improve” by learning cutting-edge technologies, or would rather have built a solid, big product that stays alive forever, like a monument of success.

As a data scientist, my focus is on the development of ML components rather than the overall product. In my experience, “the Turk” approach is the best-working product development strategy. The Turk was a chess-playing automaton with a small person hidden inside, providing the intelligence as “magic”. The strategy here is to mock up the ML component while developing the rest of the product.

the Turk explained. Source: article

An ML product usually has many other components: a user and admin UI, a database backend, logging and monitoring mechanisms, and so on. Even if the main product itself has strong ML requirements such as forecasting or predictive analysis, it is still possible to mock up the ML models while independently building the product. In the first iteration, we can start with a baseline built on simple statistics, e.g., recommending the most common item, forecasting with the last rolling-window average, or returning the mode, so that the initial iteration of the ML component is both simple enough to improve iteratively and serves as a baseline to track improvement against.
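A baseline like the ones just mentioned can be only a few lines of code. Here is an illustrative Python sketch; the class and function names are my own and the data is a toy example, but it shows how a mocked "most common item" recommender and a rolling-window forecast can stand in for real ML while the rest of the product is built.

```python
from collections import Counter

class MostCommonItemRecommender:
    """Baseline 'ML' component: always recommend the most popular items."""

    def fit(self, interactions):
        # interactions: list of (user_id, item_id) pairs
        counts = Counter(item for _, item in interactions)
        self.ranking = [item for item, _ in counts.most_common()]
        return self

    def recommend(self, user_id, k=3):
        # Same answer for every user -- trivially simple, but it gives
        # the real model a baseline to beat later.
        return self.ranking[:k]

def rolling_window_forecast(series, window=3):
    """Baseline forecast: average of the last `window` observations."""
    return sum(series[-window:]) / min(window, len(series))

model = MostCommonItemRecommender().fit(
    [("u1", "a"), ("u2", "a"), ("u2", "b"), ("u3", "c"), ("u3", "a")]
)
print(model.recommend("u1"))  # 'a' comes first (most popular item)
print(rolling_window_forecast([10, 12, 11, 13]))  # (12 + 11 + 13) / 3 = 12.0
```

Because both baselines expose the same interface a real model would (`fit`/`recommend`, or a forecast function), swapping in an actual ML model later does not disturb the rest of the product.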

As an old Google article about best practices for ML engineering describes, it is sometimes best to ship your product without any ML at all. Your product's first iteration should be minimally viable, and ML is one of the most complicated features you can add. Moreover, without an ML component, the reasons your product may be failing reduce to a shorter list: unmet customer needs or software errors. Once the ML component is added, the number of things that can go wrong increases drastically. Biased personalization, misdirected forecasting, or failing fraud controls may cause a catastrophic loss of users or revenue.

The following illustration is often used to explain that MVP iterations are better when you have to keep your customers happy at all stages of product development. Whatever project management methodology is employed, I believe the best stage to build ML into the product is the last one, when the final form of the product is clearly visible out of the mist. Call it Monica Rogati's “AI hierarchy of needs” or Gartner's “analytics capabilities path”: product development has certain checkpoints for the ML component, such as having the right data (the ability to collect, query, and analyze data), a smooth product experience (black box vs. descriptive), and good team harmony (openness to quick iteration and rapid experimentation vs. technology or methodology enthusiasm).

When to add the ML component.

I like the growth story of Airbnb, which explains that “in order to scale, you have to first do things that don't scale at all.” This is also true for ML products. After creating the right consumer experience, constructing the data and feature flows, and designing useful product KPIs, it is then time to move forward and use ML to automate the parts of your operation that require manual effort and do not scale.

For instance, let’s assume you want to build a chatbot (borrowing the example from the first figure in this article, the most common Google search about ML) in order to automate your customer contact center. In this case, your actual product needs a messaging interface and integration with your internal CRM application. Your chatbot can initially be a human-powered chat application where human agents communicate with the customer directly. Once the UX and customer needs are validated and you have hit the scalability limits of human agents, you can start improving the agent interfaces by adding rule-based message templates. You can detect customer message categories using simple bag-of-words approaches and rank the agent templates accordingly. When you also hit the scalability limits of agent productivity, it may be time to partially automate the messaging interface itself: first with rules and heuristics to respond to the customer, and finally with NLP and AI to build the intelligent final form of your product.
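The bag-of-words template-ranking step can start out this simple. The categories, keywords, and templates below are invented for illustration; a real system would learn them from labeled conversations rather than hand-code them.

```python
# Hypothetical keyword sets per support category.
CATEGORY_KEYWORDS = {
    "refund": {"refund", "money", "back", "charge"},
    "delivery": {"shipment", "delivery", "package", "late"},
    "account": {"login", "password", "account", "locked"},
}

# Hypothetical reply templates an agent can pick from.
TEMPLATES = {
    "refund": "We are sorry -- let me check the status of your refund.",
    "delivery": "Let me look up your shipment for you.",
    "account": "I can help you regain access to your account.",
}

def rank_templates(message):
    """Score each category by keyword overlap with the message and
    return reply templates, best match first."""
    words = set(message.lower().split())
    scores = {cat: len(words & kws) for cat, kws in CATEGORY_KEYWORDS.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [TEMPLATES[cat] for cat in ranked]

suggestions = rank_templates("My package delivery is very late")
print(suggestions[0])  # the delivery template ranks first
```

This is the "rank the agent templates" stage of the roadmap: the agent still writes the reply, but the ranked suggestions raise productivity, and the same interface can later be backed by a learned NLP model.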

Summing Up

In this article, I wanted to let product managers know that ML is not an ingredient that can simply be picked off the shelf. Validating and experimenting with the market, perfecting the user experience, and building a solid product are much more important than having a perfect ML model that will become garbage as the product evolves rapidly. Product managers should also be aware of the issues around ML maintenance, team tendencies, and the stages of perfecting the ML component.

Please let me know what you think and what you experienced differently.
