12 Symptoms of Hidden Technical Debt in Your ML Project

Introduction

About a year ago I stumbled upon a paper called “Machine Learning: The High-Interest Credit Card of Technical Debt”, written by brilliant engineers from Google in 2014 (it also has a newer version with practically identical content, “Hidden Technical Debt in Machine Learning Systems”, published in 2015). I found the ideas there very practical. Every time I come back to this paper, it offers me a fresh perspective on my current projects. So I decided to rethink the paper as a list of symptoms that can be used to assess the current or planned level of technical debt in an ML project and create an action plan suitable for the corresponding situation.

I added some comments in italics and proposed several solutions (💡) which were not considered in the paper.

What is technical debt?

Technical debt is a metaphor for the extra work your team will have to do in the future because you chose a quick solution right here, right now.

These guys will have a hard time when they decide to hang the TV on another wall

Technical debt is not a curse, and it is absolutely normal to increase it in the early stages of a new project to deliver results faster. Yet it is very useful to have guidelines that help identify where technical debt can emerge, so you can make informed decisions.

Symptoms of ML Project Technical Debt

❓ Your model relies on many data sources

Problems

❗ If one of the features changes its distribution, the prediction behavior might change drastically (the CACE principle: Changing Anything Changes Everything).

Possible solutions

βœ… Isolate models based on different sources and serve ensembles. In some cases this solution may have bad scalability and add cost on maintaining separate models. Case from my practice: my team used a combination of a linear model and a boosting algorithm to make predictions for out-of-sample objects.

βœ… Gain a deep understanding of your data. For example, you may build several models on various slices of your data and inspect metrics you receive. This is an excellent advice in any situation: the better you understand your data and your model - the less surprises you are going to face.

βœ… Add a regularization punishing for diverging from the prior model’s predictions. This increases chances for your model to achieve the same local minimum it has converged to on the previous run.

↩️ Return to the list of symptoms

❓ Your model affects its own input data

Problems

❗ This may make it difficult to analyse system performance and predict the system’s behavior before it is released. In the worst case, the feedback loop is hidden.

Possible solutions

βœ… Isolate certain parts of data from the influence of your model.

βœ… Identify hidden input loops and get rid of them. In general, it requires an understanding of origins of the data you use.

↩️ Return to the list of symptoms

❓ Your prediction service has undeclared consumers (aka visibility debt)

Problems

❗ Any change to your model will probably break these silently dependent systems.

❗ It may create a hidden feedback loop if an undeclared consumer produces input data for your model.

Possible solutions

βœ… Use automated feature management tool to annotate data sources and build dependency trees. I believe authors were describing feature store concept before it became widespread.

βœ…πŸ’‘ Make your service private so any consumer within your organization would have to inform you about intentions to use your model output.

βœ…πŸ’‘ Support an old version of your prediction service for some time and make announcements long before any changes.

↩️ Return to the list of symptoms

❓ Your model relies on an unstable data source (e.g. another model)

Problems

❗ Changes in the input data source may cause unexpected behavior of your model.

Possible solutions

βœ… Create a versioned copy of an unstable input data and use it until the updated version is fully stabilized.

βœ… Add more data to teach the first ML-model dealing with your use-case.

βœ…πŸ’‘Use input features from the first model to train your own model.

↩️ Return to the list of symptoms

❓ You are using features with slim-to-none performance impact

Problems

❗ The more features you have, the higher the risk that one of them changes and corrupts your model’s performance.

Possible solutions

βœ… Regularly evaluate the effect of removing individual features from a model.

βœ… Develop cultural awareness about the lasting benefit of underutilized dependency cleanup.

↩️ Return to the list of symptoms

❓ You have a lot of “glue code” because of a specific ML package

Problems

❗ It may turn into a tax on innovation: switching to another machine learning package becomes very expensive.

Possible solutions

βœ… Re-implement algorithms from a general-purpose package to satisfy your specific needs. This may look costly, but sometimes it is the easiest solutions in terms of understanding, testing and maintaining your code. For example my team implemented a common interface for all data transformers and rewrites a code from general-purpose packages like sklearn to suit this interface.

↩️ Return to the list of symptoms

❓ Data preparation stages turned into pipeline jungles

Problems

❗ Complicated pipelines are difficult to test and maintain.

Possible solutions

βœ… Do not separate researchers and engineers, they should work together and probably be one person.

βœ…πŸ’‘ Use data engineering/MLOps tools which have become popular nowdays. My team is using Airflow and DVC in almost every project which helps us easily manage our data pipelines.

↩️ Return to the list of symptoms

❓ You are mixing dead experimental code with working code

Problems

❗ Unused code paths increase system complexity, causing a whole range of negative effects, from maintenance difficulties to unexpected system behavior.

Possible solutions

βœ… Build a healthy ML system which isolates experimental code well. E.g., DVC encourages a usage of separate branches for separate experiments. By doing so, you would nip this problem in the bud.

↩️ Return to the list of symptoms

❓ Your configuration files are very complex

Problems

❗ Errors in configuration files are a common source of costly mistakes because configs are usually not tested properly and are treated lightly by engineers.

Possible solutions

βœ… Validate data passed via configs using assertions, e.g. pydantic may help with that.

βœ… Carefully review changes in configuration files.

↩️ Return to the list of symptoms

❓ You have chosen a threshold for your model manually

Problems

❗ This threshold may become invalid if the model is retrained on new data.

Possible solutions

βœ… Let your ML system to learn a threshold on holdout data.

↩️ Return to the list of symptoms

❓ Your model relies on non-causal correlations

Problems

❗ Non-causal correlations may occur randomly or temporarily, so it is extremely risky to rely on them.

Possible solutions

βœ… Avoid using illogically correlated features. Lucky for us, nowadays, a field of causal and explainable ML is developing rapidly.

βœ…πŸ’‘Check that introduced features affect the results in a way you can explain, e.g. you may use a combination of domain knowledge and Shapley Values for that.

↩️ Return to the list of symptoms

❓ Your ML system’s monitoring and testing require improvements

Problems

❗ Unit tests and end-to-end tests are unable to uncover changes in the external world that affect your model behavior.

Possible solutions

βœ… Monitor prediction bias (aka concept drift).

βœ… Add sanity checks especially for the systems allowed to perform actions in the real world.

βœ…πŸ’‘ Monitor other useful metrics. Here are parameters my team usually monitors in every project: input data distribution, prediction distribution, overall metrics, metrics on some slices of data, feature importance, assertions for edge-cases (e.g. if all features are zero we expect prediction to be zero).

↩️ Return to the list of symptoms

Conclusion

I hope you found this article helpful! Do not let technical debt cut down the innovation rate of your ML projects!

If you notice an error or are just in the mood to say hello, please contact me via LinkedIn, Telegram, or email. Every message counts; your feedback really motivates me to create new content! Also, you are very welcome to subscribe to my Telegram channel: @FuriousAI.

Kirill Vasin
DS Teamlead

Building real-life applications with machine learning