This project deploys a pre-trained YOLOv7 object detection model to the cloud using AWS Elastic Beanstalk. The application lets users upload an image and receive a JSON response containing the detected objects along with their confidence scores. The objective was to build a full end-to-end pipeline, from model conversion to production deployment, making the solution scalable and accessible via a REST API. This includes converting the YOLOv7 model from PyTorch to ONNX, validating its performance, and comparing its predictions against the original to ensure consistency. Rather than training a model from scratch, the project relies on transfer learning: it reuses the robustness and accuracy of YOLOv7, a state-of-the-art object detector, and focuses the effort on integrating it into a production environment.
The following technologies were used to develop this project:
To prepare the YOLOv7 model for deployment in a cloud environment, the first step was converting it from its original PyTorch format to the ONNX
(Open Neural Network Exchange) format.
This step was crucial for optimizing the model for production and ensuring efficient inference.
When deploying machine learning models to production, especially in resource-constrained environments such as cloud-based APIs, it is important to minimize memory usage and latency. ONNX offers several advantages that made it ideal for this project: the exported model runs on the lightweight onnxruntime without requiring a full PyTorch installation, the inference graph can be simplified and optimized ahead of time, and the same file can be executed on CPU or GPU across platforms.
The conversion process was done using Google Colab for its GPU support and flexibility. After converting the model, it was tested and compared against the original PyTorch version to ensure that inference results remained consistent.
Credits: The original conversion script was adapted from a version shared by Wong Kin Yiu, the author of YOLOv7. The code was modified to fit the specific needs of this deployment pipeline.
The following code snippet shows the steps taken to convert the YOLOv7 model to ONNX format.
!pip install --upgrade setuptools pip --user
!pip install onnx
!pip install onnxruntime
#!pip install --ignore-installed PyYAML
#!pip install Pillow
!pip install "protobuf<4.21.3"
!pip install onnxruntime-gpu
!pip install "onnx>=1.9.0"
!pip install "onnx-simplifier>=0.3.6" --user
We then clone the official YOLOv7 repository by Wong Kin Yiu, which provides the training weights and export scripts required for model conversion.
!git clone https://github.com/WongKinYiu/yolov7
%cd yolov7
!ls
Download the pre-trained weights used for object detection. These weights were trained on the COCO dataset and will later be used for inference and export.
!wget https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7_training.pt
Before exporting the model, it's helpful to validate that the model works as expected in PyTorch by running inference on a sample image.
!python detect.py --weights yolov7_training.pt --source ./inference/images/bus.jpg
from PIL import Image
Image.open('/content/yolov7/yolov7/runs/detect/exp3/bus.jpg')
For graph manipulation and optimization before conversion, onnx_graphsurgeon is installed from NVIDIA’s NGC registry.
!pip install onnx_graphsurgeon --index-url https://pypi.ngc.nvidia.com
Finally, we convert the PyTorch model to ONNX using the provided export.py script, applying simplifications and configurations optimal for inference on AWS:
%cd /content/yolov7/yolov7/
!python export.py --weights ./yolov7_training.pt \
--grid --end2end --simplify \
--topk-all 100 --iou-thres 0.45 --conf-thres 0.25 \
--img-size 640 640 --max-wh 640
This produces a .onnx model file, which is validated and later uploaded to AWS S3, ready to be consumed by the Flask-based inference API.
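That upload step can be scripted. The snippet below is a minimal sketch using boto3 (not part of the project's requirements.txt, shown only for illustration); the bucket name modelv0 is taken from the S3_BUCKET_URL used later in the CI configuration, and the local path and object key are assumptions.
# Minimal sketch: push the exported ONNX model to S3 with boto3.
# Bucket name matches the S3_BUCKET_URL in the CI workflow; adjust paths/keys to your setup.
import boto3

s3 = boto3.client("s3")
s3.upload_file(
    Filename="/content/yolov7/yolov7/yolov7_training.onnx",  # local export path
    Bucket="modelv0",
    Key="yolov7_training.onnx",
)
print("Model uploaded to s3://modelv0/yolov7_training.onnx")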
After converting the model to ONNX format, we performed a manual inference to validate its functionality and ensure the outputs were consistent with the original PyTorch model. Using onnxruntime, we loaded the exported .onnx model and ran inference on the same image used in the PyTorch test. The image was preprocessed following YOLOv7's input requirements: resized and padded with the letterbox function, normalized, and converted to the format expected by the ONNX runtime.
import cv2
import time
import requests
import random
import numpy as np
import onnxruntime as ort
import matplotlib.pyplot as plt
from PIL import Image
from pathlib import Path
from collections import OrderedDict, namedtuple
## CONFIGURATION
cuda = False
w = "/content/yolov7/yolov7/yolov7_training.onnx"
# img = cv2.imread('/content/yolov7/yolov7/runs/detect/exp2/bus.jpg')
img_path = Path('/content/yolov7/yolov7/runs/detect/exp3/bus.jpg').as_posix()
img = cv2.imread(img_path)
names = ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light',
'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
'hair drier', 'toothbrush']
providers = ['CUDAExecutionProvider', 'CPUExecutionProvider'] if cuda else ['CPUExecutionProvider']
session = ort.InferenceSession(w, providers=providers)
def letterbox(im, new_shape=(640, 640), color=(114, 114, 114), auto=True, scaleup=True, stride=32):
    # Resize and pad image while meeting stride-multiple constraints
    shape = im.shape[:2]  # current shape [height, width]
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)

    # Scale ratio (new / old)
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
    if not scaleup:  # only scale down, do not scale up (for better val mAP)
        r = min(r, 1.0)

    # Compute padding
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # wh padding

    if auto:  # minimum rectangle
        dw, dh = np.mod(dw, stride), np.mod(dh, stride)  # wh padding

    dw /= 2  # divide padding into 2 sides
    dh /= 2

    if shape[::-1] != new_unpad:  # resize
        im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    im = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border
    return im, r, (dw, dh)
## Pre-processing
colors = {name:[random.randint(0, 255) for _ in range(3)] for i,name in enumerate(names)}
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
image = img.copy()
image, ratio, dwdh = letterbox(image, auto=False)
image = image.transpose((2, 0, 1))
image = np.expand_dims(image, 0)
image = np.ascontiguousarray(image)
im = image.astype(np.float32)
im /= 255
im.shape  # expect (1, 3, 640, 640) after letterboxing
outname = [i.name for i in session.get_outputs()]
inname = [i.name for i in session.get_inputs()]
inp = {inname[0]:im}
# ONNX inference
outputs = session.run(outname, inp)[0]
outputs
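Because the model was exported with the --end2end flag, each row of outputs is already a final detection after NMS. A quick sanity check of the layout (the column order matches the unpacking used in the rendering loop below):
# Each detection row: [batch_id, x0, y0, x1, y1, cls_id, score],
# with coordinates in the letterboxed 640x640 image space.
print(outputs.shape)   # (N, 7) for N detections
for det in outputs[:3]:  # peek at the first few rows
    print(det)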
To verify that the ONNX model performs equivalently to its PyTorch counterpart, we rendered the outputs from both inference pipelines on the same input image and compared the results visually. The following post-processing steps were applied to the ONNX outputs:
ori_images = [img.copy()]
for i, (batch_id, x0, y0, x1, y1, cls_id, score) in enumerate(outputs):
    image = ori_images[int(batch_id)]
    box = np.array([x0, y0, x1, y1])
    box -= np.array(dwdh * 2)
    box /= ratio
    box = box.round().astype(np.int32).tolist()
    cls_id = int(cls_id)
    score = round(float(score), 3)
    name = names[cls_id]
    color = colors[name]
    name += ' ' + str(score)
    cv2.rectangle(image, box[:2], box[2:], color, 2)
    cv2.putText(image, name, (box[0], box[1] - 2), cv2.FONT_HERSHEY_SIMPLEX, 0.75, [225, 255, 255], thickness=2)
imgpytorch = Image.open('/content/yolov7/yolov7/runs/detect/exp3/bus.jpg')
imgonnx = Image.fromarray(ori_images[0])
plt.figure(figsize=(14,10))
plt.subplot(121)
plt.imshow(imgpytorch)
plt.subplot(122)
plt.imshow(imgonnx)
plt.show()
Both results show consistent object detection performance, identifying the same instances (people and a bus) with similar bounding boxes and confidence scores. This visual parity confirms that the ONNX model maintains the predictive integrity of the original PyTorch version, validating the success of the model conversion process.
To ensure a maintainable and production-ready application, the project was structured following best practices for Python web development and model deployment. This step involved creating a clear directory layout, setting up a virtual environment, and specifying the project dependencies in a requirements.txt file.
Below is an overview of the main directory structure used in the project:
AWS-Project/
│
├── app/ # Core application logic
│ ├── application.py # Initializes and configures the Flask app
│ ├── yolocounterv1.py # Handles object counting logic
│ ├── yolomodel.py # Loads and runs ONNX model inference
│ ├── static/ # CSS and JS
│ └── templates/ # HTML template (Flask view)
│
├── tests/ # Automated tests
│ ├── conftest.py
│ ├── pytest.ini
│ ├── run_tests.py
│ ├── test_app.py # Unit tests for Flask endpoints
│ ├── test_integration.py # Integration tests
│ └── test_ui.py # Selenium UI tests
│
├── .github/ # GitHub Actions workflows for CI
├── .ebextensions/ # Elastic Beanstalk config files
├── .platform/ # Platform-specific settings
├── .env # Environment variables
├── .gitignore # Git exclusions
├── venv/ # Python virtual environment
├── app.py # Entry point to run the application locally
├── yolov7_training.onnx # Converted ONNX model
└── requirements.txt # List of dependencies
The app.py file in the root directory serves as the main entry point for local development and testing. It simply imports and runs the Flask app defined in app/application.py:
from app.application import application
if __name__ == '__main__':
    application.run(debug=True)
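The snippet above only boots the server; the actual routing lives in app/application.py. The sketch below is a simplified, hypothetical version of that module (names such as detect_objects and the "image" form field are assumptions, not the exact project code), illustrating how an uploaded file flows into the ONNX model and back out as the JSON shown later:
# Hypothetical, simplified sketch of app/application.py — the real module also
# serves the HTML template and delegates counting to yolocounterv1.py.
from flask import Flask, request, jsonify
from app.yolomodel import detect_objects  # assumed helper wrapping the onnxruntime session

application = Flask(__name__)

@application.route("/detect", methods=["POST"])
def detect():
    if "image" not in request.files:           # form-field name is an assumption
        return jsonify({"error": "no image provided"}), 400
    image_bytes = request.files["image"].read()
    detections, countings = detect_objects(image_bytes)
    return jsonify({"detections": detections, "countings": countings})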
To manage dependencies in isolation and avoid conflicts, a Python virtual environment (venv/) was created. This ensures reproducibility across environments and avoids polluting the system Python installation.
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
All required packages were listed in the requirements.txt file to make the environment easily reproducible, especially for cloud deployment:
onnxruntime==1.16.2
opencv_python==4.8.1.78
Pillow==10.0.1
requests==2.31.0
flask==3.0.0
numpy==1.26.2
python-dotenv
selenium
pytest
pytest-flask
webdriver-manager
pytest-cov
psutil
Dependencies include onnxruntime for model inference, opencv-python and Pillow for image handling, Flask for the web API, python-dotenv for configuration, and pytest, pytest-flask, Selenium, and webdriver-manager for testing.
To run the application locally, activate the virtual environment and execute:
python app.py
Once running, open your browser and navigate to the local host URL shown in the console (usually http://127.0.0.1:5000). The web interface will appear, allowing you to interact with the deployed model.
Before deploying the application to production, it is essential to verify that all components behave correctly in a controlled environment. This project uses three primary tools for testing:
Insomnia was used to manually verify the behavior of the /detect route during development. By sending a POST request with an image file, the API returns two structured outputs: a per-class object count (countings) and the list of individual detections (detections).
Here the image is sent to the API, and the response is displayed in JSON format:
Sample response from the API:
{
  "countings": {
    "backpack": 2,
    "bus": 1,
    "car": 11,
    "motorcycle": 1,
    "person": 4,
    "truck": 2
  },
  "detections": [
    [[379, 248, 959, 632], 2, "0.9502241", "car"],
    [[133, 176, 293, 520], 0, "0.9456609", "person"],
    [[0, 290, 255, 631], 2, "0.9187772", "car"],
    ...
    [[271, 107, 454, 276], 5, "0.2698659", "bus"]
  ]
}
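The same request can be reproduced from Python with the requests library. This is a minimal sketch: the /detect path is the route described above, while the "image" form-field name and the local file path are assumptions.
# Send an image to the running API and print the JSON response,
# mirroring the manual Insomnia request above.
import requests

url = "http://127.0.0.1:5000/detect"
with open("bus.jpg", "rb") as f:                        # any test image
    response = requests.post(url, files={"image": f})   # field name assumed

print(response.status_code)
print(response.json())  # {"countings": {...}, "detections": [...]}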
Use Cases Covered:
Pytest is a widely used Python testing framework. In this project it was used to validate the Flask endpoints (test_app.py), the integration between the model and the API (test_integration.py), and the Selenium-driven UI (test_ui.py):
============================= test session starts =============================
platform win32 -- Python 3.11.4, pytest-8.4.1
rootdir: /tests
collected 26 items
tests/test_app.py ..................... [ 80%]
tests/test_integration.py ... [ 92%]
tests/test_ui.py .. [100%]
============================= 26 passed in 46.70s =============================
✅ All tests passed successfully.
One of the unit-test patterns these suites rely on is sketched below.
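This illustrative example (fixture names and assertions are assumptions, not the exact project code) uses Flask's test client to exercise the main endpoints:
# Illustrative only — the real tests live in tests/test_app.py.
import pytest
from app.application import application

@pytest.fixture
def client():
    application.config["TESTING"] = True
    with application.test_client() as client:
        yield client

def test_index_page_loads(client):
    assert client.get("/").status_code == 200

def test_detect_rejects_missing_image(client):
    # Expected behavior assumed: a POST without a file is rejected
    response = client.post("/detect", data={})
    assert response.status_code == 400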
GitHub Actions plays a crucial role in automating the testing and validation process of our project. By integrating CI (Continuous Integration), we ensure that every new commit or pull request to the main branch is automatically verified through linting and testing. This helps catch errors early, enforce code quality, and maintain consistent behavior across environments.
GitHub Actions uses workflow files written in .yml format, typically located in the .github/workflows/ directory of the repository. In our project, we use a file named python-app.yml, which triggers the workflow on:
Here's what the CI pipeline does, step by step:
name: Python application

on:
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]

permissions:
  contents: read

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v4
    - name: Set up Python 3.11
      uses: actions/setup-python@v3
      with:
        python-version: "3.11"
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install flake8 pytest
        if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
    - name: Lint with flake8
      run: |
        flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
        flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
    - name: Set environment variables
      run: |
        echo "S3_BUCKET_URL=https://modelv0.s3.us-east-1.amazonaws.com/" >> $GITHUB_ENV
    - name: Test with pytest
      run: |
        pytest
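The S3_BUCKET_URL variable set in the workflow mirrors what the application expects at runtime; locally it comes from the .env file via python-dotenv. A minimal sketch of how such a setting might be read (the variable handling below is an assumption, not the project's exact code):
# Illustrative sketch: read S3_BUCKET_URL from the environment (or .env locally)
# and build the full URL of the ONNX model. The object key is an assumption.
import os
from dotenv import load_dotenv

load_dotenv()  # picks up .env during local development

S3_BUCKET_URL = os.environ["S3_BUCKET_URL"]           # e.g. https://modelv0.s3.us-east-1.amazonaws.com/
MODEL_URL = S3_BUCKET_URL + "yolov7_training.onnx"    # assumed object key

print(MODEL_URL)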
The application was deployed using AWS Elastic Beanstalk, a service that simplifies the process of deploying and scaling web applications. Since this process involves several specific configuration steps, such as setting up the environment, assigning service roles, and selecting the appropriate instance type, a dedicated presentation was created to explain each step visually and clearly.
In general terms, the deployment consists of compressing the application into a .zip file that includes the key files:
application.py
requirements.txt
the app/ directory
the .ebextensions/ directory
the .platform/ directory
The following video shows a demonstration of the application running successfully on the AWS cloud.
You can explore the complete source code on GitHub.