Inference for the Real World: What to Do with Your ML Model Once You’ve Trained It

Martin Isaksson
6 min read · Apr 17, 2021

In our blog post The Power of Abstraction: How ML Frameworks and PerceptiLabs Provide a More Human-Readable Language for Describing ML Models, we discussed what it means to do machine learning (ML) modeling and the language that ML frameworks provide for describing ML models. The end result of the ML modeling process is a file containing the model’s trained weights, ready for inference (i.e., ready to be used for making real-world predictions, classifications, etc.).

But once you have this file, how do you actually use it for inference? In this blog we’ll do a quick recap of the ML model export process, dive into what to do next with your trained model file, and explore why knowledge of this process is valuable for both technical and non-technical members across your organization.

A Recap of Exporting

Exporting is the part of the ML modeling process where you save your trained model to a specially formatted file. During export, you also have the option to optimize (reduce) the size of the resulting file so that it can be transferred over the Internet more easily while consuming less space and memory on the target device.

Two common optimization methods are:

  • Compression: prunes the model by removing less-significant weights that have little effect on predictions.
  • Quantization: reduces the number of bits used to store weight values (e.g., from 32-bit floating-point values to 8-bit integers).

Note that both methods can potentially reduce the model’s accuracy, so some experimentation is recommended.
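To make the quantization idea concrete, here is a minimal sketch using TensorFlow’s post-training quantization via the TFLiteConverter. This is a general TensorFlow technique, not necessarily what PerceptiLabs runs under the hood when you pick an optimization option, and the model path is an assumption based on the export layout shown later in this post:

    import tensorflow as tf

    # Post-training quantization of an exported SavedModel (general TensorFlow
    # approach; PerceptiLabs' own export option may work differently).
    converter = tf.lite.TFLiteConverter.from_saved_model("/mymodel/1")  # assumed path
    converter.optimizations = [tf.lite.Optimize.DEFAULT]  # quantize weights to shrink the file
    tflite_model = converter.convert()

    # Write the smaller, quantized model to disk.
    with open("model_quantized.tflite", "wb") as f:
        f.write(tflite_model)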

With PerceptiLabs, exporting is simply a matter of selecting the File > Export menu, specifying your target location, selecting an optimization option, and clicking Export:

Figure 1 — PerceptiLabs export popup with the option to compress or quantize the model. Image source: PerceptiLabs.

Through this simple interface, pretty much any member of your team, from developers to non-technical users, can easily export a trained model with PerceptiLabs.

The export process will generate a number of files in a directory structure that looks similar to the following:

    /mymodel/1/
        assets/
        variables/
            variables.data-00000-of-00001
            variables.index
        saved_model.pb
        checkpoint
        model.ckpt-1.data-00000-of-00002
        model.ckpt-1.data-00001-of-00002
        model.ckpt-1.index

Of these files, the only one you need to concern yourself with is the .pb file (in this example, saved_model.pb). The .pb file is the final, trained model that is ready to use for inference. And in case the technical readers out there were wondering, this file follows the Protocol Buffers (protobuf) format.
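If you want a quick sanity check of what the exported model expects as input and produces as output, a minimal sketch in Python might look like the following. The directory path and the "serving_default" signature name are assumptions based on the listing above:

    import tensorflow as tf

    # Load the exported SavedModel and inspect its default serving signature.
    # Path and signature name are assumptions; yours may differ.
    loaded = tf.saved_model.load("/mymodel/1")
    signature = loaded.signatures["serving_default"]

    print(signature.structured_input_signature)  # names and shapes of the inputs
    print(signature.structured_outputs)          # names and shapes of the outputs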

Let’s now look at what you need to do with this file.

Using the Trained Model

The next step is to transfer the model file to the device (e.g., a cloud server or edge device) where it will be presented with real-world data for inference. However, that is only half of the equation. Once the model is where it needs to be (i.e., on the target device), you also need some sort of code or application to load the file, feed data to it, and make its results available for consumption. TensorFlow provides two options for doing this.

The first option is to use TensorFlow Serving’s tensorflow_model_server command-line tool to host (serve) the model on a given port and handle REST requests that invoke predictions on the model. REST requests are essentially special URLs that applications can call to send and receive data. This approach is useful for allowing remote (e.g., distributed) applications to use the same model for inference.
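As a rough sketch of what this looks like from the client side, the snippet below posts a prediction request to a locally served model. The model name, port, and input shape are illustrative assumptions, and it presumes the server was started with something like tensorflow_model_server --rest_api_port=8501 --model_name=mymodel --model_base_path=/mymodel:

    import json
    import requests  # third-party HTTP client

    # Send one sample to TensorFlow Serving's REST predict endpoint.
    # Model name, port, and input shape are illustrative assumptions.
    url = "http://localhost:8501/v1/models/mymodel:predict"
    payload = {"instances": [[0.1, 0.2, 0.3, 0.4]]}  # shape depends on your model

    response = requests.post(url, data=json.dumps(payload))
    response.raise_for_status()
    print(response.json()["predictions"])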

The second option is to programmatically load and interact with the model directly within your application using TensorFlow’s APIs. For example, our recent End-to-End Workflow video showed how to export a model from PerceptiLabs and use it in a Streamlit app. Alternatively, you can use other APIs such as those used in our Coral tutorial.
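A minimal sketch of this second option, assuming the exported model is a Keras-compatible SavedModel and using a made-up input shape, could look like this (if the export is not Keras-based, tf.saved_model.load can be used instead):

    import numpy as np
    import tensorflow as tf

    # Load the exported model directly in the application and run one prediction.
    # The path and the 28x28 grayscale input shape are illustrative assumptions.
    model = tf.keras.models.load_model("/mymodel/1")
    sample = np.random.rand(1, 28, 28, 1).astype("float32")  # one fake image
    prediction = model.predict(sample)
    print(prediction)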

Either way, that’s all there is to it!

But Who Cares?

Learning about the export process, how a trained model is used, and even how ML models are created can be beneficial to everyone in an organization, including non-developer and non-ML stakeholders.

IT personnel, for example, may be responsible for setting up the cloud servers that host a model, so they’ll likely want to know how big the model is, how much processing power it is estimated to require, where to get the model file, and how best to secure it.

On the business side, business analysts may be responsible for using the model, or the application that runs it, to make decisions. They may even be tasked with determining where ML models fit into their organization’s objectives and what contributions those models need to make.

Marketing and sales departments can also benefit from this insider knowledge. For example, they may define the requirements for using past data to predict market or sales trends, or set a target level of model accuracy. They may even work directly with data scientists on this, or use the applications that run the models for inference.

Also keep in mind that training, exporting, and using a model is often an iterative, ongoing process. This can be due to factors such as model drift and decay, changes in business requirements, or a need for continual refinement. Regardless of the reason, the ongoing nature of building and refining models has led to a formal methodology known as MLOps, which can involve all of the stakeholders above.

Catering to the needs of these different stakeholders is where PerceptiLabs really shines. For starters, our visual workflow allows users to drag and drop model components without having to write code. Even a non-technical, non-ML practitioner can look at a PerceptiLabs model and see that there is some sort of flow between an input and an output. And with each model component providing visualizations, users can gain insight into how the model works and how each part of it transforms the data.

Furthermore, PerceptiLabs enhances ML modeling by separating the model-editing process from the model-training process. This makes modeling faster than in TensorFlow, which normally requires you to run the whole model on the whole training set (and all from code). With PerceptiLabs, only the first training sample is used during modeling, and only the affected parts of the model are re-run on that sample as the model is changed. This both saves time and reduces the computing power required just to create the model. Then, during training, any user can view statistics and other visualizations to see how the model is performing and decide whether training should be stopped early to further tweak the model.

In the end, PerceptiLabs is all about getting you to an exported model faster, and one that you can trust because you had insight into how it was built.

Conclusion

The goal of ML modeling is to produce a trained ML model in a file that can be deployed for inference in the real world. An API or serving application will then load this file, feed data to it, and return its output. From there, an application can then use the output to make decisions.

As we’ve seen, PerceptiLabs not only makes the modeling process easier and more accessible for a variety of users, it also reduces TensorFlow’s export process to a simple export menu. It’s then just a matter of transferring the resulting file to the target device and creating an application to use or serve that model.

More generally, PerceptiLabs’ visually-oriented workflow and features are now making it possible for more members of your organization to participate in the design, training, and exporting of your ML models.

For more information, see the Exporting page in the PerceptiLabs documentation.



Martin Isaksson is Co-Founder and CEO of PerceptiLabs, a startup focused on making machine learning easy.