By Ryan Cox, Senior Director, Co-Head of Artificial Intelligence, Synechron
Have you ever invited your friends to your home and made them a delicious meal? Perhaps they’ve lauded your cooking skills and said, “this is the best dish I have ever eaten – why don’t you open a restaurant?” But … should you, really?
You’d be a fool to open a restaurant just because you think you’re the best cook in town. In your own kitchen, you only want to make the tastiest meal, but when you’re running a restaurant, you have a multitude of other considerations: Making a profit; keeping your customers happy; meeting all the hygiene regulations; handling customer complaints; managing your suppliers; and coping when one of your waiters calls in sick or your chef resigns. The list is endless.
Adoption begins at home
If you’re adopting artificial intelligence (AI) or machine learning (ML) in your business, you’ll likely begin in the business equivalent of your home kitchen: The laptops of your data scientists and AI engineers. These are the people who will tell you that a particular model works and can help your business.
But moving your model from your home kitchen into the restaurant is a complicated journey, for one simple reason: While the model takes center stage, it’s only a small part of your entire system, which has many moving parts. All the gears need to be in place and in sync. Moreover, your system is dynamic. Components in AI and machine learning systems are always evolving, and as a business you need to preserve the integrity of the entire operation while individual components evolve or are replaced.
Your AI checklist
There are a few important checklist items for any AI system in production. Top of this list is usually data. If your system depends on the ‘latest data’ to work, you’ll need a mechanism to detect data drift, which occurs when the data the model sees in production no longer resembles the data it was trained on, so its predictions start to falter over time. And if your data comes from a third-party vendor, you’ll need to detect changes to its format or schema. These validations should run automatically, on a daily basis, as part of the pipeline.
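To make the two data checks more concrete, here is a minimal sketch of what a daily pipeline step might run. It assumes records arrive as Python dictionaries and that you keep a sample of a numeric feature from training time. The schema, the column names, and the 0.2 alert threshold are all hypothetical; drift is measured here with the Population Stability Index (PSI), one common choice among several, not the only way to do it.

```python
import bisect
import math
from typing import Mapping, Sequence

# Hypothetical vendor schema: column name -> expected Python type.
EXPECTED_SCHEMA: dict[str, type] = {"customer_id": str, "balance": float, "age": int}

# A commonly quoted rule of thumb for PSI; tune it for your own data.
DRIFT_THRESHOLD = 0.2


def schema_problems(record: Mapping[str, object], schema: Mapping[str, type]) -> list[str]:
    """List missing, mistyped, and unexpected columns in one incoming record."""
    problems = []
    for column, expected_type in schema.items():
        if column not in record:
            problems.append(f"missing column: {column}")
        elif not isinstance(record[column], expected_type):
            got = type(record[column]).__name__
            problems.append(f"{column}: expected {expected_type.__name__}, got {got}")
    for column in sorted(set(record) - set(schema)):
        problems.append(f"unexpected column: {column}")
    return problems


def _bin_shares(values: Sequence[float], edges: list[float], eps: float = 1e-6) -> list[float]:
    """Share of values per bin; out-of-range values go to the outermost bins."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        idx = bisect.bisect_right(edges, v) - 1
        counts[min(max(idx, 0), len(counts) - 1)] += 1
    # eps keeps empty bins from producing log(0) in the PSI sum.
    return [max(c / len(values), eps) for c in counts]


def population_stability_index(
    reference: Sequence[float], current: Sequence[float], n_bins: int = 10
) -> float:
    """PSI = sum((actual - expected) * ln(actual / expected)) over equal-width reference bins."""
    lo, hi = min(reference), max(reference)
    if lo == hi:
        raise ValueError("reference data has no spread to bin")
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins + 1)]
    expected = _bin_shares(reference, edges)
    actual = _bin_shares(current, edges)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))
```

In use, a scheduled job would call `schema_problems` on each vendor record and `population_stability_index(training_sample, todays_sample)` on each monitored feature, raising an alert when the PSI exceeds `DRIFT_THRESHOLD`. Identical distributions give a PSI of zero, and the score grows as today’s data moves away from the training data.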
Equally important is telemetry: in simple terms, the automated collection of measurements and other data from remote or inaccessible points, and their transmission to receiving equipment for monitoring. Every production system should include it, because you need visibility into the resources consumed by every part of your system, not only the AI models.
Monitoring CPU, memory, and network bandwidth is standard practice. For an AI system, you should also track GPU resources, input size, and usage intensity (how often and how heavily the model is called). These measurements tell you whether your setup matches demand and whether you’ve got your scalability planning right. Additionally, if you have a way to measure your model’s accuracy, keep a record of it so you can see whether it’s degrading over time.
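As an illustration of the AI-specific metrics above, the following minimal sketch keeps two hypothetical in-memory records: a sliding window of requests with their input sizes (a proxy for usage intensity), and a log of accuracy measurements that flags a sustained drop below the accuracy measured at release. GPU figures would come from your hardware vendor’s monitoring tools and are left out; the class names, the 60-second window, and the 0.05 tolerance are assumptions for illustration only.

```python
from collections import deque
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class UsageWindow:
    """Sliding-window usage telemetry for one model endpoint (hypothetical, in-memory)."""
    window_seconds: float = 60.0
    events: deque = field(default_factory=deque)  # (timestamp, input_size) pairs, oldest first

    def record(self, timestamp: float, input_size: int) -> None:
        self.events.append((timestamp, input_size))

    def summary(self, now: float) -> dict:
        # Discard events that are older than the window.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            self.events.popleft()
        sizes = [size for _, size in self.events]
        return {
            "requests": len(sizes),  # usage intensity within the window
            "mean_input_size": mean(sizes) if sizes else 0,
            "max_input_size": max(sizes, default=0),
        }


@dataclass
class AccuracyLog:
    """History of measured accuracy that flags a sustained drop below the release baseline."""
    baseline: float          # accuracy measured when this model version went live
    tolerance: float = 0.05  # hypothetical: drop allowed before raising an alert
    recent: int = 7          # number of latest measurements to average
    history: list = field(default_factory=list)

    def record(self, accuracy: float) -> None:
        self.history.append(accuracy)

    def is_degrading(self) -> bool:
        if len(self.history) < self.recent:
            return False  # not enough measurements yet to judge a trend
        return mean(self.history[-self.recent:]) < self.baseline - self.tolerance
```

Averaging the latest few measurements, rather than alerting on a single bad day, is a design choice that trades a slower reaction for fewer false alarms; how many measurements to average depends on how noisy your accuracy figures are.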
Software receives version upgrades, and so should your models. If you’ve fine-tuned your model (because of data drift, for example), have you numbered each version? Do you have a repository to manage the versions? If you’ve changed your model design, have you documented the model parameters in the repository? Can you easily switch the model your users are served, or roll back to a previous version? These checkpoints give you not only the flexibility a production system needs, but also the integrity to validate your results.
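The questions above map onto a handful of operations: register a numbered version, record its parameters, promote a version to serve users, and roll back. The sketch below is a hypothetical in-memory registry showing those operations; in practice a dedicated model registry or experiment-tracking tool would play this role, and the class names and artifact paths here are invented for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ModelVersion:
    version: int
    artifact_uri: str  # hypothetical location of the saved model
    params: dict       # design and hyperparameters documented with this version
    note: str = ""     # why this version exists, e.g. "fine-tuned after data drift"
    created: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())


class ModelRegistry:
    """Minimal in-memory registry: numbered versions, documented params, switch and roll back."""

    def __init__(self) -> None:
        self._versions: dict[int, ModelVersion] = {}
        self._live_history: list[int] = []  # versions that have served users, in order

    def register(self, artifact_uri: str, params: dict, note: str = "") -> ModelVersion:
        version = len(self._versions) + 1  # sequential version numbers
        mv = ModelVersion(version, artifact_uri, dict(params), note)
        self._versions[version] = mv
        return mv

    def promote(self, version: int) -> None:
        """Switch users to any registered version."""
        if version not in self._versions:
            raise KeyError(f"unknown model version {version}")
        self._live_history.append(version)

    @property
    def live(self) -> ModelVersion:
        if not self._live_history:
            raise LookupError("no version has been promoted yet")
        return self._versions[self._live_history[-1]]

    def rollback(self) -> ModelVersion:
        """Return users to the previously live version."""
        if len(self._live_history) < 2:
            raise LookupError("no earlier version to roll back to")
        self._live_history.pop()
        return self.live
```

Keeping the promotion history separate from the list of registered versions means a rollback restores exactly what users were served before, even if newer versions were registered but never went live.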
Finally, in a production system your engineers are not the end users. If urgent issues arise, are communication channels on standby? Your business may depend on it! User reports can also be vague (“the system is not responding!”), so someone needs to be on hand to establish the details. You may also need to explain why the system isn’t behaving as users expect. Lines of communication need to stay open at all times, and you’ll need a system to track open issues and keep your users happy.
Your AI plan is vitally important
Cooking at home is a hobby. The key difference between an amateur and a professional is not skill or knowledge but a detailed plan that ensures every part works together effectively and efficiently. The checklist above is not exhaustive, but it’s a good guide to what you need to consider when moving an AI model from development into production.