Gerardo Sánchez | AI Consultant

Intelligent Retail Checkout With an Android App

Ever wanted an easier way to check out at the store? This article looks at an intelligent self-checkout system embedded in a mobile app.

In many retail stores and supermarkets, self-checkout systems let the consumer register a purchase or look up prices. Many of these systems rely on barcodes, RFID tags, or QR codes. This article proposes an intelligent self-checkout system, embedded in a mobile app, that detects multiple products without any labels and displays a list with the price of each product and the total cost of the purchase.

Architecture Description

The next figure gives a high-level functional overview.

The workflow is:

  1. The user takes a picture of their shopping cart.
  2. A neural network (YOLO V3) performs object detection to find bottles, boxes, etc.
  3. The image of each detected object is cropped from the photo.
  4. Transfer learning, with SqueezeNet feeding a ConvNet, classifies the product in each sub-image and builds a list of product IDs.
  5. Finally, the cost of the purchase is retrieved from a product price list, and all the information about the detected products is displayed in the Android mobile app.

A Google Cloud instance acts as the server: it receives the uploaded images, runs all the artificial neural network models, and sends a product price list to the Android mobile app.

Image Detection Through YOLO V3

You Only Look Once (YOLO) is an artificial neural network used for object detection. The YOLO authors pre-train its classification backbone on the ImageNet 1000-class dataset (for 160 epochs in the YOLOv2 paper) before training the full network for detection.

During training, YOLO detects the objects in the images of our dataset, and the cropped object images form a new dataset. The resulting dataset is used to train the classification system. The process is shown in the next piece of code.
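As a rough illustration of the cropping step (a sketch under stated assumptions, not the article's original listing), the code below converts one detection into a pixel box and cuts out the sub-image. It assumes the detector reports boxes in the normalized Darknet style (center x, center y, width, height, each in [0, 1]); the function names `yolo_to_pixel_box` and `crop` are hypothetical, and the image is modeled as a plain row-major grid of pixels to keep the example dependency-free.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def yolo_to_pixel_box(cx: float, cy: float, w: float, h: float,
                      img_w: int, img_h: int) -> Box:
    """Convert a normalized box (center, width, height in [0, 1]) into
    integer pixel corners, clamped to the image bounds.

    Assumption: the detector outputs normalized center-based boxes.
    """
    left = int(round((cx - w / 2) * img_w))
    top = int(round((cy - h / 2) * img_h))
    right = int(round((cx + w / 2) * img_w))
    bottom = int(round((cy + h / 2) * img_h))
    # Detections near the border can extend past the image, so clamp them.
    return (max(0, left), max(0, top), min(img_w, right), min(img_h, bottom))


def crop(image: List[List[int]], box: Box) -> List[List[int]]:
    """Cut the sub-image for one detection out of a row-major pixel grid."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]
```

With a real photo, the same box would typically be passed to an image library's crop or array-slicing operation, and each resulting sub-image saved under the folder of its product class.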

Transfer Learning for Classification of Objects

Transfer learning is a deep learning technique that lets you use pre-trained ConvNet models either as an initialization or as a fixed feature extractor for the task of interest. These models are trained on a very large dataset (e.g., ImageNet, which contains 1.2 million images in 1000 categories).

We use SqueezeNet as a fixed feature extractor. SqueezeNet is a pre-trained ConvNet model used to classify the predominant object in an image. The next image shows the network architecture.

Only layers “conv1” through “fire9” are kept; the layers after “fire9” are discarded. The following code shows this process.

A convolutional neural network (ConvNet) is added on top of the SqueezeNet layers to perform the classification. Its architecture is shown in the next code listing.

During training, the dataset generated with YOLO is loaded and passed through the SqueezeNet model to obtain the corresponding output for each image.
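This evaluation loop can be sketched as follows. The sketch assumes a hypothetical callable `extract_features` that stands in for the truncated SqueezeNet, and it assumes the labels are one-hot encoded, which the article does not confirm. The arrays are named `XD` and `YD` after the feature and label arrays the article uses to train the ConvNet.

```python
from typing import Any, Callable, Iterable, List, Sequence, Tuple

# Class names as listed in the article.
LABELS = ["Mayonesa", "Coca-Cola", "Catsup", "Activia", "None"]


def one_hot(label: str, labels: Sequence[str] = LABELS) -> List[int]:
    """Encode a class name as a one-hot vector (assumed label encoding)."""
    index = labels.index(label)  # raises ValueError for an unknown label
    return [1 if i == index else 0 for i in range(len(labels))]


def build_training_set(
    samples: Iterable[Tuple[Any, str]],
    extract_features: Callable[[Any], List[float]],
) -> Tuple[List[List[float]], List[List[int]]]:
    """Run the frozen extractor over every (image, label) pair.

    `extract_features` is a hypothetical stand-in for the SqueezeNet
    layers used as a fixed feature extractor; it is never trained here.
    """
    XD: List[List[float]] = []  # extractor outputs, one entry per image
    YD: List[List[int]] = []    # matching one-hot labels
    for image, label in samples:
        XD.append(extract_features(image))
        YD.append(one_hot(label))
    return XD, YD
```

Because the extractor is fixed, its outputs can be computed once and cached, so only the small classifier on top has to be trained.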

Then the SqueezeNet outputs and the labels (Mayonesa (Mayonnaise), Coca-Cola, Catsup, Activia, None) fill XD and YD, respectively, and these are used to train the ConvNet. The following images show the confusion matrix.
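For readers who want to reproduce such a matrix from their own predictions, here is a minimal sketch. It assumes true and predicted labels are available as lists of class names; the helper names `confusion_matrix` and `accuracy` are illustrative, and rows are taken to be the true class and columns the predicted class.

```python
from typing import List, Sequence

# Class names as listed in the article.
LABELS = ["Mayonesa", "Coca-Cola", "Catsup", "Activia", "None"]


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str],
                     labels: Sequence[str] = LABELS) -> List[List[int]]:
    """Count predictions: matrix[i][j] is the number of samples whose
    true class is labels[i] and whose predicted class is labels[j]."""
    index = {name: i for i, name in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[predicted_label]] += 1
    return matrix


def accuracy(matrix: List[List[int]]) -> float:
    """Fraction of samples on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0
```

Off-diagonal cells show which products the classifier confuses with each other, which is more informative than a single accuracy figure when some products look alike.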

Backend in Python

The backend code is separated into two Python scripts: one runs the predictions of each model (YOLO, SqueezeNet, and ConvNet), and the other is the service that constructs the purchase ticket.
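The ticket construction can be sketched roughly as below. This is an illustration under assumptions, not the article's service: the function name `build_ticket`, the ticket layout, and the idea that the classifier's "None" class is simply absent from the price list are all hypothetical.

```python
from collections import Counter
from decimal import Decimal
from typing import Dict, Iterable, List


def build_ticket(detected: Iterable[str],
                 prices: Dict[str, Decimal]) -> Dict[str, object]:
    """Group the classified products and price them.

    Labels that are missing from `prices` (for example the "None" class)
    are skipped, on the assumption that they are not sellable products.
    """
    counts = Counter(name for name in detected if name in prices)
    lines: List[Dict[str, object]] = []
    for name, quantity in sorted(counts.items()):
        lines.append({
            "product": name,
            "quantity": quantity,
            "unit_price": prices[name],
            "subtotal": prices[name] * quantity,
        })
    # Decimal avoids floating-point rounding errors in money amounts.
    total = sum((line["subtotal"] for line in lines), Decimal("0"))
    return {"lines": lines, "total": total}
```

A call such as `build_ticket(["Coca-Cola", "None", "Coca-Cola"], prices)` would yield a single line with quantity 2 and drop the "None" detection; the resulting dictionary can be serialized and returned to the mobile app.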

Mobile App

The mobile app is developed in Cordova. The Cordova camera plugin takes a picture from the smartphone, and the Cordova file transfer plugin sends the image to the server. jQuery Mobile is used to display the price list sent by the server. The next image shows the mobile app views.
