Deploying your machine learning project as a service

By: Andres Solis Montero, Statistics Canada

The first step in the Software Development Life Cycle (SDLC) of a machine learning project is to define the problem statement and objectives. Then, gather, analyze and process data. The process continues with multiple, sometimes seemingly endless, iterations of modelling, training, hyperparameter tuning, testing and evaluation. These steps are essential to building a successful model and consume most of the development time and focus. But what happens next? Software packaging and deployment? Most of the time, the final goal is delivering a product to clients, making code available to other teams or users for collaboration, or simply sharing the work and results with the world.

Deployment should not simply be the last step in the development cycle. Incrementally adopting good software engineering practices and open-source tools can improve development skills as well as an organization's ability to deliver applications and services faster. This approach helps build a product from the ground up that is easily shareable and deployable without a significant impact on modelling and development time.

A project template following the practices and tools mentioned here is available for public use. This template can be your initial step when developing future machine learning projects; feel free to fork the project and extend its functionalities. Another interesting feature of this project structure is the separation of business logic from deployment, allowing it to follow Government of Canada API guidelines for delivering secure web services through HTTPS without the need to transform your code. This article assumes Python as the programming language; however, the methodologies and solutions presented here could also be implemented in any other programming language.

Version Control

The first relevant practice to use when deploying a machine learning project as a service is version control. The use of version control for analytics projects was covered in a previous article that also highlighted its importance and value. This article presents a simple yet scalable project structure that can be used within your version control system for any machine learning project.

  • LICENSE [License details]
  • README.md [Quick Usage Documentation]
  • CONTRIBUTING.md
  • SECURITY.md
  • docs [Documentation]
    • Makefile
    • conf.py
    • index.rst
    • make.bat
    • markdown [Manual Documentation]
      • model.md
      • problem_statement.md
      • relevant.md
  • src [Source Code]
    • mlapi
      • Dockerfile [Containerization]
      • requirements.txt
      • notebook.ipynb [Prototyping Notebook]
      • ml [ML modules]
        • classifier.py
        • model.joblib
      • main.py [REST API]

This structure reflects the production-ready code in the main branch. Other branches will mirror the same folder structure but serve different development phases, such as developing different versions, testing, new features and experiments. The goal of the main branch is that it is always release-ready, meaning that you can deploy from it at any time. Additionally, you can have multiple branches off the main branch that address production or development issues.

Git workflows and usage are an extensive topic and out of the scope of this article. Visit the public documentation for more detailed Git usage guidelines.

Documentation

The second practice to take note of is documentation. Documenting the code is an important step to ensure your machine learning project is understandable and ready for deployment. Writing documentation can be daunting if you are trying to pull it together at the end of a project. With a few reasonable practices and tools, the work can be more manageable.

A well-documented project should target multiple audiences, from developers and maintainers to users, clients and stakeholders. Developers and maintainers are mainly interested in implementation details and exposed Application Programming Interfaces (APIs). Users, clients and stakeholders want to know how to use the solution, what the data sources and Extract, Transform and Load (ETL) pipelines are, and how to understand the experiments and results.

Good project documentation is built as the project progresses, from the beginning, not just when the project is finished. Open-source tools such as Sphinx can automatically generate documentation from docstring comments. Documenting the code as you go through the development life cycle of your project is an exercise that should be encouraged and that your team should follow. Following a standard docstring format when writing code helps build comprehensive documentation of the code. Docstrings are a great way to generate API documentation as you write code by showcasing your models, parameters, functions and modules. The following docstring example demonstrates the mlapi.main.train function.

async def train(gradient_boosting: bool = False) -> bool:
    """
    FastAPI POST route '/train' endpoint to train our model

    Args:
        gradient_boosting: bool
            A boolean flag to switch between a DTreeClassifier or GradientBoostClassifier

    Returns:
        bool:
            A boolean value identifying if training was successful.
    """
    data = clf.dataset()
    return clf.train(data['X'], data['y'], gradient_boosting)

Integrating Sphinx with triggers in the version control system lets us parse the project structure at each commit, look for existing docstrings and generate the documentation automatically. In our sample project, the .gitlab.yaml configuration file integrates commits to the main branch with Sphinx to automatically generate our code's API documentation, as shown below.

async mlapi.main.train(gradient_boosting: bool = False) → bool
FastAPI POST route '/train' endpoint to train our model
Parameters: gradient_boosting – bool
A boolean flag to switch between a DTreeClassifier or GradientBoostClassifier
Returns: A boolean value identifying if training was successful.
Return type: bool

On the other hand, users, clients and stakeholders benefit from high-level project descriptions such as modelling details, objectives, input data sources, ETL pipelines, experiments and results. We complement the code documentation by manually creating files under the docs/markdown/ folder. Sphinx supports both reStructuredText (.rst) and Markdown (.md), making generation of HTML and PDF documentation simple. Our project leverages both .rst and .md file formats, stored under the docs/ folder and listed in the index.rst file.
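As a rough illustration, a minimal docs/conf.py for this kind of setup might look like the sketch below. The extension choices (autodoc, napoleon, myst_parser), paths and theme are assumptions for illustration; the template's actual configuration may differ.

# docs/conf.py (minimal sketch; extension names, paths and theme are assumptions)
import os
import sys

# Make the source package importable so autodoc can find the modules.
sys.path.insert(0, os.path.abspath('../src/mlapi'))

project = 'MLAPI Template'

extensions = [
    'sphinx.ext.autodoc',   # generate API documentation from docstrings
    'sphinx.ext.napoleon',  # parse Google/NumPy style docstrings
    'myst_parser',          # allow Markdown (.md) sources alongside .rst
]

source_suffix = {'.rst': 'restructuredtext', '.md': 'markdown'}
html_theme = 'alabaster'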

Pushing code to the main branch will trigger automatic documentation generation by inspecting all code docstrings under the source folder. During the same process, the Markdown files listed in the index are linked into the final documentation website. It is also important to provide a top-level README.md file containing a quick usage guide with relevant links and a LICENSE file disclosing the usage terms for clients and users.

REST APIs

The third practice to keep in mind for deployment of ML projects is the use of REST APIs. The Government of Canada has put an emphasis on the use of APIs as a means of deployment: a client-server web service following a Representational State Transfer (REST) architectural style.

FastAPI is a modern, high-performance, web framework for building REST APIs. This increasingly popular open-source tool leverages Python type hints to automatically convert Python objects to JSON representations and vice-versa.

Let us talk a bit about the model implementation in our project before exposing its API as a REST web service. Without loss of generality, we selected a simple supervised classification model. This article is not about model training, so we will keep it simple for explanation purposes.

In the linked project, we selected the Iris data set to train a classification model. The data set contains four features (i.e., sepal length and width, and petal length and width). These features are used to classify each observation among three classes: Setosa, Versicolour, and Virginica.

We train our model with two simple classifiers, DecisionTreeClassifier and GradientBoostingClassifier, and use them to make future predictions. Our IrisClassifier model description and implementation can be found under src/mlapi/ml/classifier.py and contains five methods (i.e., train, download, load, save, and predict).
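The full implementation lives in the template under src/mlapi/ml/classifier.py; the following is only a condensed sketch of what such a class might look like. The class name and the role of each method follow the article, while the internals (the scikit-learn estimators, joblib persistence and a dataset() helper matching the main.py snippet shown later) are assumptions and may differ from the template.

# Condensed sketch of src/mlapi/ml/classifier.py; internals are assumptions.
import joblib
from sklearn import datasets
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier


class IrisClassifier:
    def __init__(self, model=None):
        self.model = model or DecisionTreeClassifier()

    @staticmethod
    def dataset():
        """Load the Iris data set as a feature matrix X and labels y."""
        iris = datasets.load_iris()
        return {'X': iris.data, 'y': iris.target}

    def train(self, X, y, gradient_boosting=False):
        """Fit either a decision tree or a gradient boosting classifier."""
        self.model = GradientBoostingClassifier() if gradient_boosting else DecisionTreeClassifier()
        self.model.fit(X, y)
        self.save()
        return True

    def predict(self, data):
        """Return predicted classes and per-class probabilities."""
        return {
            'prediction': self.model.predict(data).tolist(),
            'probability': self.model.predict_proba(data).tolist(),
        }

    def save(self, path='ml/model.joblib'):
        joblib.dump(self.model, path)

    @classmethod
    def load(cls, path='ml/model.joblib'):
        """Return a classifier backed by the pre-trained model on disk."""
        return cls(joblib.load(path))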

Now, let us see how we can share our model as a web service. First, we create a FastAPI app instance and a classifier instance inside the FastAPI application. The entry point is in the src/mlapi/main.py file.

app = FastAPI(title="MLAPI Template", description="API for ml model", version="1.0")
"""FastAPI app instance"""

clf = IrisClassifier.load()
"""Classifier instance"""

The IrisClassifier.load() method will return an already pre-trained classifier.

Next, we specify our public HTTP routes to connect our web service to the classifier API.

@app.post("/train")
async def train(gradient_boosting: bool = False):
    """ Docstring """
    data = clf.dataset()
    clf.train(data['X'], data['y'], gradient_boosting)
    return True

The POST HTTP route @app.post('/train') accepts a Boolean flag to toggle between our two previously mentioned classifier options. For each request to /train, our web service will re-train the classifier using the Iris data set and the gradient_boosting flag and update the classifier (i.e., clf) instance.

Next, we define the route that will take our prediction requests; it will be a post method to /predict.

@app.post("/predict",response_model=IrisPredictionResponse)
async def predict(iris: IrisPredictionInput) :
    """ Docstring """
    return clf.predict(iris.data)

This method takes an IrisPredictionInput so it can ensure that the request data format is correct, and it returns an IrisPredictionResponse with the probabilities for each category. An IrisPredictionInput contains a data member with a list of observations, each holding the four features seen in our Iris data set. FastAPI inspects Python type hints to convert the JSON post payload to the valid Python objects we declared in the same main.py file.

class IrisPredictionInput(BaseModel):
    """ Docstring """
    data: List[conlist(float, min_items=4, max_items=4)]

class IrisPredictionResponse(BaseModel):
    """ Docstring """
    prediction: List[int]
    probability: List[Any]

Finally, let us run our web service:

src/mlapi$ uvicorn main:app --reload --host 0.0.0.0 --port 8888
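With the service running, a client can call the endpoints directly. The following is a small sketch using the requests library; the hostname, port and example feature values are placeholders.

import requests

# One observation with the four Iris features (placeholder values):
# sepal length, sepal width, petal length, petal width.
payload = {"data": [[5.1, 3.5, 1.4, 0.2]]}

response = requests.post("http://127.0.0.1:8888/predict", json=payload)
print(response.json())  # e.g., {"prediction": [...], "probability": [...]}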

Open http://127.0.0.1:8888/docs in your web browser. Since we diligently followed best practices, FastAPI was able to automatically create a Swagger web app for documenting and testing our API. While this demonstrates how easy it is to use these development practices, it is only a small sample application. Finally, your organization's certificate and private key can be passed to uvicorn during deployment, providing a secure HTTPS layer of communication for your API. There is no need to change or modify your code to make it secure: uvicorn will integrate Transport Layer Security (TLS) just by being told where to find the certificate. Our project structure separates the business logic in your code from deployment concerns, making TLS deployment easy.

src/mlapi$ uvicorn main:app --host 0.0.0.0 --port 8888 --ssl-keyfile=./key.pem --ssl-certfile=./cert.pem

If your organization has robust TLS infrastructure in place through alternate systems, it can be wrapped around the container to make the process even easier. There are numerous ways to implement TLS.

Containerization

The fourth practice to implement during the deployment of your machine learning project is containerization. Containerization is a form of operating system virtualization where applications run in isolated user spaces. A container is essentially a fully packaged computing environment that contains everything an application needs to run (e.g., code and all its dependencies). The container is abstracted from the host OS, allowing it to run the same code in any infrastructure without needing code refactoring (i.e., any OS, VM, or Cloud).

The advantage of developing our machine learning projects with a container is that we control all of our software dependencies and the environment, making sure the project can be shared and run as initially intended. What does this mean? We create a Docker image description file defining our dependencies and the process to run. Aside from the proposed folder structure, this does not affect our model or implementation; it simply captures all of our code dependencies.

There are three basic requirements in our template for building the custom Docker image description (i.e., Dockerfile) used to run our model as a service. First, Docker images allow inheritance, meaning that we can build on images that already include most of the libraries and dependencies our project needs. For example, we could choose to extend our Dockerfile from an image that includes scikit-learn, PyTorch, TensorFlow, Keras, or Caffe. Second, we keep track of any Python package dependencies used in our project inside the requirements.txt file. Finally, we specify our container's command entry point to execute our main app.

Dockerfile

FROM tiangolo/uvicorn-gunicorn-fastapi:python3.7

WORKDIR /tmp
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt

COPY . /app
WORKDIR /app
CMD ["uvicorn", "main:app", "--reload", "--host", "0.0.0.0", "--port", "8888"]

The requirements.txt file contains a single Python package name per line, each specifying a necessary Python dependency of our project.

requirements.txt

fastapi
uvicorn
pydantic
starlette
python-multipart
requests
scikit-learn
joblib

Now we can turn the definition file into a Docker image and run the container pointing to our FastAPI service:

src/mlapi$ docker build -t mlapi .
src/mlapi$ docker run -d -p 8888:8888 --name ml-mlapi mlapi

Modularization

The fifth and final practice highlighted in this article is modularization. Modularization is a technique used to divide a software system into multiple discrete and independent modules capable of carrying out tasks independently. These modules are considered basic constructs for the application.

If we want to develop code that is readable and maintainable, we must use some modular design. It is essential to separate our code into reusable building blocks and to be able to execute the whole solution by putting them together. Although module boundaries are decided case by case and are project dependent, machine learning projects have well-defined blocks such as data ETLs, pipelines, analysis, training, testing, results and report generation. Separating this logic into different code modules makes our Python code readable and maintainable while keeping production costs low and speeding up our development cycle. Code that is not modular takes more time to move to production and is prone to errors and misconfigurations; it becomes a burden to review code multiple times before deployment.

Jupyter notebooks are one of the most common tools used when prototyping machine learning applications. They allow us to execute cells of code and document them in the same place. Unfortunately, they are not suitable for deploying a project; we need to translate their code into Python modules. We can think of notebook cells as the building blocks of our prototype. Once tested, one or more code cells can be wrapped into a function or packed into a Python module under the src/mlapi/ml folder. Then, we can import them from our notebooks and continue prototyping, as shown below.
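For example, once a prototype cell has been promoted to a module, the notebook can simply import it and continue where it left off. This is only a sketch; the import path follows the proposed folder layout, and the method names come from the classifier sketch above rather than from the template itself.

# Inside src/mlapi/notebook.ipynb: import the promoted code instead of keeping it in cells.
from ml.classifier import IrisClassifier

clf = IrisClassifier()
data = clf.dataset()
clf.train(data['X'], data['y'], gradient_boosting=True)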

While prototyping our models, Jupyter notebooks should be saved under the src/mlapi/ folder, next to the REST API main.py entry point. This ensures that our prototyping and production code maintain the same absolute module path imports. Project documentation can follow the same workflow as the code: Jupyter Markdown cells containing meaningful information about the application should be moved to docs/markdown/<document>.md files, extending our project documentation. Remember to add the reference to our Sphinx docs/index.rst file. These documentation pages can still be referenced from our prototyping notebook by linking to their final publishing location.

Another good modularization practice is limiting the number of hard-coded values in our application by creating configuration files that reference these values or by making them function arguments. Use the pydantic BaseModel class (the same base class FastAPI relies on) and Python data structures such as Enum, NamedTuple and dataclasses to specify arguments to our procedures and API calls. It is also good to make our model parameters and hyperparameters configurable rather than hard-coded, allowing different configurations to be set every time we train or run our model.
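As one possible sketch, training options could be grouped into a pydantic model and passed around instead of being hard-coded; the field names and default values here are illustrative assumptions, not taken from the template.

from pydantic import BaseModel


class TrainingConfig(BaseModel):
    """Illustrative training configuration; fields and defaults are assumptions."""
    gradient_boosting: bool = False
    test_size: float = 0.2
    random_state: int = 42


# Load defaults, or override them from a configuration file or request payload.
config = TrainingConfig()
# clf.train(data['X'], data['y'], config.gradient_boosting)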

In machine learning projects, training our model is highly dependent on the problem, the input data and its format. Because of the multiple training iterations that our models go through, it is good to package the training code behind a reusable API. For example, instead of building code that only processes our local copies of input files, we could accept a single URL pointing to a compressed file containing the whole data set in a particular structure. Other data sets following the same structure could then be incorporated into our training using the same code. Before creating our own data set packaging structure, it is best to look for public data sets relevant to our problem and reuse their input format where possible. Standardizing our data sets is another positive way of creating modular machine learning code.
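A sketch of that idea is shown below: a helper that downloads a compressed data set from a URL and extracts it into a working directory before training. The function name, arguments and archive layout are assumptions for illustration.

import io
import zipfile
from pathlib import Path

import requests


def fetch_dataset(url: str, dest: str = "data") -> Path:
    """Download a zipped data set from `url` and extract it under `dest`.

    Any data set packaged with the same internal structure can be plugged
    into the training code unchanged.
    """
    response = requests.get(url, timeout=60)
    response.raise_for_status()
    target = Path(dest)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(response.content)) as archive:
        archive.extractall(target)
    return target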

Always think of how we would like to use the solution before programming it. When building APIs or modules, think from the user's perspective, not the developer's. As data science continues to advance, more and more resources are being produced on how to improve Python code modularity and engineering skills.

Conclusion

Five software engineering practices that allow us to deploy machine learning projects

Description - Figure 1

Diagram depicting the five software engineering practices that allow us to deploy machine learning projects by serving our model as a RESTful web service.

Practice #1: Version control; The use of version control for analytics projects was covered in a previous article. This article presents a project structure to use within your version control system.

Practice #2: Documentation; Documenting the code is an important step to ensure your machine learning project is understandable and ready for deployment.

Practice #3: REST APIs; The Government of Canada has put an emphasis on the use of APIs as a means of deployment: a client-server web service following a Representational State Transfer (REST) architectural style.

Practice #4: Containerization; Containerization is a form of operating system virtualization where applications run in isolated user spaces.

Practice #5: Modularization; Modularization is a technique used to divide a software system into multiple discrete and independent modules capable of carrying out tasks independently.

In this article, we presented five software engineering practices that allow us to deploy machine learning projects by serving our model as a RESTful web service. We discussed the relevance of code versioning, documentation, REST APIs, containerization and code modularization as fundamental steps to follow in your SDLC. Introducing good software development practices and the tools mentioned in this article will improve your project, code collaboration and deployment. These are not the only good practices to focus on, but they are a good starting set to be aware of. For this article, we have created a basic project template following the practices mentioned here. Feel free to fork and reuse the template for your machine learning projects.
