BakeryCashier

Bread Auto-Calculator

Date: 2022/06/20

Author: Song Yeong Won / Song Tae Woong

GitHub: repository link

Demo Video: YouTube link

Introduction

In this project, we implement an automatic bread calculator to speed up checkout. The program aims to increase work efficiency by quickly and accurately detecting bread in a bakery and calculating its price. It is also intended to remove the inconvenience of scanning a barcode for each item and to make checkout more convenient for users. A total of five types of bread were used as objects in this program, and the bread came from Apple in the Tree at Handong University.

We trained the YOLOv5 model ourselves on custom data, using the DarkLabel 2.4 program to generate the additional training data. We used the open-source YOLOv5 code and Python in an Anaconda virtual environment, working in Visual Studio Code.

1. Requirement

Hardware

  • NVIDIA GeForce RTX 3080

  • HD Pro Webcam C920

Environment constraint

  • Camera angle : 37.5 degrees

  • Camera height : 47 cm from the tray

    img

    Figure 1. Experiment Environment

Software Installation

The software specification is as follows:

  • CUDA 11.6

  • cudatoolkit 11.3.1

  • Python 3.9.12

  • PyTorch 1.10

  • YOLOv5l model

Anaconda settings

Before starting, check that a GPU driver supporting your CUDA version is installed.

img

Figure 2. Check CUDA Version

Check your CUDA version and download the matching NVIDIA driver: click here
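
The version check can also be scripted. The sketch below assumes that the check shown in Figure 2 is the header printed by the `nvidia-smi` tool, which includes a "CUDA Version" field; the function names are hypothetical and are not part of the project code.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_cuda_version(smi_output: str) -> Optional[str]:
    """Extract the 'CUDA Version' value from nvidia-smi header text, or None."""
    match = re.search(r"CUDA Version:\s*([0-9]+(?:\.[0-9]+)?)", smi_output)
    return match.group(1) if match else None


def detect_cuda_version() -> Optional[str]:
    """Run nvidia-smi if it is on PATH and parse its output; None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None  # no NVIDIA driver tools found on this machine
    result = subprocess.run(
        ["nvidia-smi"], capture_output=True, text=True, check=False
    )
    return parse_cuda_version(result.stdout)
```

Calling `detect_cuda_version()` returns the highest CUDA version the installed driver supports, which should be at least the cudatoolkit version listed above.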

YOLOv5 Installation

Go to the YOLOv5 GitHub repository (https://github.com/ultralytics/yolov5) and download the repository as shown below. After entering the /yolov5-master folder, copy its path. Then open Anaconda Prompt in administrator mode and execute the code below in order.

Labeling

  • DarkLabel2.4

The DarkLabel 2.4 program was used to generate the custom data. With this program, bounding boxes were labeled directly on each frame of a video to create an image and label dataset. Unlike many other labeling programs, DarkLabel can label video directly, so a large amount of training data can be generated quickly.

Go to DarkLabel 2.4 and download the program as shown below. If it is not available, please download it here.

img

Figure 3. DarkLabel2.4

After running DarkLabel.exe, label the desired video or image.

img

Figure 4. DarkLabel2.4 Tool

  1. Image file path, used when labeling image files

  2. Use the Darknet YOLO labeling format

  3. To use your customized labels for model training, the number of classes and the class names of the COCO dataset must be changed.

Change COCO dataset classes 0 to 5, ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus'], to the 6 custom dataset classes: ['Apple Tart', 'Croissant', 'Chocolate', 'Bagel', 'White Donut', 'Pretzel']

  4. Save labels

  5. Save images

An example of using the DarkLabel 2.4 program:

img

Figure 5. Example of using DarkLabel2.4

If you keep pressing the space bar, you can label quickly, because the program keeps tracking the object from frame to frame and draws the bounding box for you. However, if an object moves or is occluded, tracking becomes inaccurate, so the labels on such frames should be removed and redrawn.

2. Training Procedure

Pretrained Checkpoints

The model should be selected by weighing the accuracy and processing speed the application needs. This project used the YOLOv5l model. The model should also suit the performance of the GPU, and the batch size must fit in the GPU's CUDA memory. A batch size of 4 was used to train this model. With a more powerful GPU, you can use YOLOv5l or a larger model.

The precision and recall obtained with the YOLOv5l model are reported in Section 4, Evaluation.

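| Model | size (pixels) | mAPval 0.5:0.95 | mAPval 0.5 | Speed CPU b1 (ms) | Speed V100 b1 (ms) | Speed V100 b32 (ms) | params (M) | FLOPs @640 (B) |
|---|---|---|---|---|---|---|---|---|
| YOLOv5n | 640 | 28.0 | 45.7 | 45 | 6.3 | 0.6 | 1.9 | 4.5 |
| YOLOv5s | 640 | 37.4 | 56.8 | 98 | 6.4 | 0.9 | 7.2 | 16.5 |
| YOLOv5m | 640 | 45.4 | 64.1 | 224 | 8.2 | 1.7 | 21.2 | 49.0 |
| YOLOv5l | 640 | 49.0 | 67.3 | 430 | 10.1 | 2.7 | 46.5 | 109.1 |
| YOLOv5x | 640 | 50.7 | 68.9 | 766 | 12.1 | 4.8 | 86.7 | 205.7 |
| YOLOv5n6 | 1280 | 36.0 | 54.4 | 153 | 8.1 | 2.1 | 3.2 | 4.6 |
| YOLOv5s6 | 1280 | 44.8 | 63.7 | 385 | 8.2 | 3.6 | 12.6 | 16.8 |
| YOLOv5m6 | 1280 | 51.3 | 69.3 | 887 | 11.1 | 6.8 | 35.7 | 50.0 |
| YOLOv5l6 | 1280 | 53.7 | 71.3 | 1784 | 15.8 | 10.5 | 76.8 | 111.4 |
| YOLOv5x6 | 1280 | 55.0 | 72.7 | 3136 | 26.2 | 19.4 | 140.7 | 209.8 |
| YOLOv5x6 + TTA | 1536 | 55.8 | 72.7 | - | - | - | - | - |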

Table 1. Model Performance

For more information: Click here

2.1 Customize datasets

img

img

Images data

Labels

Training a YOLOv5 model requires image files and label coordinate files, as shown in Figure 6. We generated the data in Figure 6 earlier with the DarkLabel program. The label coordinate file follows the format that YOLOv5 expects, as shown below.

img

Figure 7. Label txt file
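
In the YOLO label format, each line of a label file describes one object as `class x_center y_center width height`, with the four box values normalized to the range 0 to 1 by the image width and height. The following is a minimal sketch of that conversion; the function names and the example box are hypothetical and are not taken from the project.

```python
def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-coordinate box into one YOLO-format label line."""
    x_center = (x_min + x_max) / 2 / img_w   # normalized box center, x
    y_center = (y_min + y_max) / 2 / img_h   # normalized box center, y
    width = (x_max - x_min) / img_w          # normalized box width
    height = (y_max - y_min) / img_h         # normalized box height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


def parse_yolo_line(line):
    """Split a label line back into (class_id, (xc, yc, w, h))."""
    parts = line.split()
    return int(parts[0]), tuple(float(v) for v in parts[1:5])


# Hypothetical example: class 1 (Croissant) in a 640x480 image.
print(to_yolo_line(1, 100, 200, 300, 400, 640, 480))
# -> 1 0.312500 0.625000 0.312500 0.416667
```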

Total number of images : 5,546

Total number of label files : 5,546

2.2 Split Train and Validation set

Create a datasets folder at the same location as the yolov5-master folder.

Train image dataset path : datasets > bakery > images > train

Train label dataset path : datasets > bakery > labels > train

Val image dataset path : datasets > bakery > images > val

Val label dataset path : datasets > bakery > labels > val

img

Figure 8. Datasets path
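
The folder layout above can be produced with a short script. This is a sketch under assumptions: the source does not state the split ratio or how frames were assigned, so the 20% validation share, the random shuffle, and the function name are all hypothetical.

```python
import random
import shutil
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def split_dataset(image_dir, label_dir, out_root, val_ratio=0.2, seed=0):
    """Copy image/label pairs into images/{train,val} and labels/{train,val}."""
    image_dir, label_dir, out_root = Path(image_dir), Path(label_dir), Path(out_root)
    images = sorted(p for p in image_dir.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES)
    random.Random(seed).shuffle(images)       # reproducible shuffle
    n_val = int(len(images) * val_ratio)      # assumed validation share
    counts = {"train": 0, "val": 0}
    for index, image in enumerate(images):
        subset = "val" if index < n_val else "train"
        label = label_dir / (image.stem + ".txt")
        if not label.exists():
            continue                          # skip images without a label file
        for src, kind in ((image, "images"), (label, "labels")):
            dst_dir = out_root / kind / subset
            dst_dir.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst_dir / src.name)
        counts[subset] += 1
    return counts
```

For example, `split_dataset("raw/images", "raw/labels", "datasets/bakery")` would fill the four folders listed above (the `raw/` paths are placeholders). Because consecutive video frames look alike, a random split can place nearly identical frames in both sets; splitting by video segment is an alternative worth considering.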

2.3 Create a customized yaml file

Create a new bakery.yaml file (path : ./data).

img

Figure 9. yaml file path

Check that the train and val paths are set as follows.
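
A YOLOv5 dataset yaml file usually contains the dataset root (`path`), the train and val image folders, the number of classes (`nc`), and the class names (`names`). The sketch below generates such a file from Python. The root path is an assumption based on the folder layout in Section 2.2, the class names come from the labeling step, and the function names are hypothetical; the project's actual bakery.yaml may differ.

```python
from pathlib import Path

# Class order assumed to match the custom class list used for labeling.
CLASS_NAMES = ["Apple Tart", "Croissant", "Chocolate", "Bagel", "White Donut", "Pretzel"]


def build_dataset_yaml(root="../datasets/bakery", names=CLASS_NAMES):
    """Return the text of a YOLOv5-style dataset yaml file."""
    quoted = ", ".join(f"'{name}'" for name in names)
    return (
        f"path: {root}  # dataset root (assumed relative to yolov5-master)\n"
        "train: images/train  # training images, relative to path\n"
        "val: images/val  # validation images, relative to path\n"
        f"nc: {len(names)}  # number of classes\n"
        f"names: [{quoted}]\n"
    )


def write_dataset_yaml(out_path="data/bakery.yaml"):
    """Write the yaml text to out_path, creating the folder if needed."""
    out = Path(out_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(build_dataset_yaml(), encoding="utf-8")
    return out
```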

2.4 Model Training

When you start training, you must choose the image size, batch size, number of epochs, and model. Make sure that the path to bakery.yaml is correct relative to the directory from which the training command is run. The YOLOv5 model can be chosen from four sizes (s, m, l, x); the YOLOv5l model was used for this training. The batch size must be chosen to match GPU or CPU performance: a "cuda out of memory" error occurs if it is set too large, in which case training can be retried while gradually reducing the batch size. If the number of epochs is very large, there is a risk of overfitting, and if it is too small, the model may underfit. Some trial and error is required to find the optimal settings.

Figure 10 below shows the output window when only one epoch is run. The training results and the weight .pt files can be found in runs/train/exp(number).
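
As an illustration of these choices, the sketch below assembles a YOLOv5 `train.py` command line from Python. The batch size of 4 and the YOLOv5l weights follow the text; the image size of 640, the epoch count of 100, and the yaml path are assumptions, and both function names are hypothetical.

```python
import sys


def build_train_command(img_size=640, batch_size=4, epochs=100,
                        data="data/bakery.yaml", weights="yolov5l.pt"):
    """Build the argument list for YOLOv5's train.py (run from yolov5-master)."""
    return [
        sys.executable, "train.py",
        "--img", str(img_size),           # training image size (assumed)
        "--batch-size", str(batch_size),  # 4 was used in this project
        "--epochs", str(epochs),          # assumed value, tune by trial and error
        "--data", data,                   # dataset yaml from Section 2.3
        "--weights", weights,             # pretrained YOLOv5l checkpoint
    ]


def fallback_batch_sizes(start):
    """Yield batch sizes to try in turn after a CUDA out-of-memory error."""
    size = start
    while size >= 1:
        yield size
        size //= 2  # halve the batch size on each retry


# Usage (not executed here):
#   import subprocess
#   subprocess.run(build_train_command(), cwd="yolov5-master", check=True)
```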

img

Figure 10. Model Training

2.5 Using Trained Weight.pt

Training produces two weight files, best.pt and last.pt. The best.pt file holds the weights from the best-performing point in training, and last.pt holds the final weights when all training is done. Because the model may overfit toward the end of training when the number of epochs is large, we used best.pt.

The weight files are generated under the runs/train/exp(number) path.

img

img

Figure 11. Trained weight file

We renamed the best.pt file to bakery.pt.

bakery.pt file path : /yolov5-master

You can test the trained weight file with the code below.

img

Figure 12. Test

3. Algorithm

The algorithm has three main parts. The overall process of the program is as follows.

  1. Pre-processing

    • Rounding Tray

  2. Post-processing

    • Image Capture

    • Filtering Out of tray

    • Auto-calculation

  3. Application

    • KAKAOPAY QR Code

    • Image Concatenation

  • Flowchart

img

Figure 13. Flowchart

3.1 Pre-processing

Rounding Tray

The ROI required for object detection must be set first. We find the four vertices of the Rounding Tray and draw a rectangle, which is later used to determine whether an object lies inside it. This step uses the OpenCV inRange function and the HoughLinesP function. Since the original image is in BGR, the tray edge is extracted by first converting the image to HSV and then applying inRange with thresholds on hue, saturation, and value. Figure 14 (a) is the original frame, and (b) is the binary image after inRange processing. Tray detection runs while firstFrame = 1; if the tray lines are not extracted correctly, the step is repeated, and once the correct lines are found firstFrame becomes 0. This is the tray detection loop shown in the Figure 13 flowchart.

  • Rounding Tray using HoughLinesP

img

img

Source Image (a)

After Inrange (b)

Figure 14. HoughLinesP

Finally, HoughLinesP is tuned to extract the lines in the inRange result. HoughLinesP detects several lines, but the tray edge should be represented by a single box, so we take the minimum and maximum x and y values over all extracted lines. The result is as follows.

img

Figure 15. Tray detection
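
The min/max step described above can be sketched in plain Python. The input is assumed to be a sequence of (x1, y1, x2, y2) line segments, for example the array returned by cv2.HoughLinesP reshaped to rows of four values; the function name is hypothetical and this is not the project's own code.

```python
def tray_box_from_lines(lines):
    """Collapse line segments into one box (x_min, y_min, x_max, y_max).

    Returns None when no lines were detected, so the caller can retry
    tray detection on the next frame.
    """
    xs, ys = [], []
    for x1, y1, x2, y2 in lines:
        xs.extend((x1, x2))   # collect both endpoints of every segment
        ys.extend((y1, y2))
    if not xs:
        return None
    return min(xs), min(ys), max(xs), max(ys)


# Hypothetical segments along three edges of a tray.
segments = [(10, 20, 200, 22), (12, 300, 210, 298), (10, 20, 12, 300)]
print(tray_box_from_lines(segments))  # -> (10, 20, 210, 300)
```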

3.2 Post-processing

Image Capture

The first step of post-processing is image capture. When the 'c' key is pressed, the current frame is captured and object detection is performed on that frame. The captured frame is kept in memory, so pressing 'c' again continues to use the stored image. When 'r' is pressed, the capture is reset and the program returns to the current frame. Figure 16 shows the object detection result after the 'c' key is pressed.

img

Figure 16. Image Capture
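
The capture and reset behavior can be expressed as a small state update. This is a sketch, not the project's code: the function name is hypothetical, and the key value is assumed to come from something like `cv2.waitKey(1) & 0xFF` in the main loop.

```python
def update_capture(key, live_frame, captured_frame):
    """Return (current_frame, captured_frame) after handling one key press.

    key            : integer key code (assumed to come from the display loop)
    live_frame     : the newest frame from the webcam
    captured_frame : the stored capture, or None when nothing is captured
    """
    if key == ord("c") and captured_frame is None:
        captured_frame = live_frame   # store the frame at the moment of capture
    elif key == ord("r"):
        captured_frame = None         # reset: go back to the live stream
    # While a capture is stored, keep using it instead of the live frame.
    current = live_frame if captured_frame is None else captured_frame
    return current, captured_frame
```

Pressing 'c' a second time leaves the stored frame unchanged, which matches the description above.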

Filtering Out of tray

We filter out objects outside the tray so that the calculation is accurate. Filtering uses the center coordinates of each detected bounding box: if the center of the bounding box lies within the tray edge area, the object is treated as inside, and otherwise as outside. In Figure 17, objects outside the Rounding Tray are drawn with a red bounding box. The Rounding Tray stands in for the checkout counter where a real customer places items, so objects outside it are not included in the calculation.

img

Figure 17. Filtering result
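
A minimal sketch of the center-point test follows. Boxes and the tray are assumed to be (x_min, y_min, x_max, y_max) tuples in pixels; the names and the example coordinates are hypothetical.

```python
def box_center(box):
    """Center point of an (x_min, y_min, x_max, y_max) box."""
    x_min, y_min, x_max, y_max = box
    return (x_min + x_max) / 2, (y_min + y_max) / 2


def is_inside_tray(box, tray):
    """True when the center of the detection box lies within the tray box."""
    cx, cy = box_center(box)
    tx_min, ty_min, tx_max, ty_max = tray
    return tx_min <= cx <= tx_max and ty_min <= cy <= ty_max


tray = (100, 100, 500, 400)                          # hypothetical tray box
print(is_inside_tray((150, 150, 250, 250), tray))    # -> True
print(is_inside_tray((450, 350, 650, 550), tray))    # -> False
```

Using the center rather than the whole box means bread that slightly overhangs the tray edge still counts as inside.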

Auto-calculation

After the out-of-tray filtering step has identified the bread inside the Rounding Tray, the total price is calculated for that bread only. The class of each detected object is returned as an integer via int(cls). We therefore sum the prices assigned in advance to each class number over all detected objects.

If you train on more kinds of bread, you can add their class numbers and prices to the list below.

img

Figure 18. Class list
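
The price lookup can be sketched as a dictionary keyed by class number. The class order is assumed to follow the custom class list from the labeling step, and every price below is a made-up placeholder, not the project's actual price list.

```python
# Class number -> (name, price in won). All prices are hypothetical.
PRICE_TABLE = {
    0: ("Apple Tart", 3000),
    1: ("Croissant", 2500),
    2: ("Chocolate", 2000),
    3: ("Bagel", 2800),
    4: ("White Donut", 1800),
    5: ("Pretzel", 2200),
}


def total_price(class_ids, price_table=PRICE_TABLE):
    """Sum the prices of all detected objects that are inside the tray."""
    return sum(price_table[int(cls)][1] for cls in class_ids)


# Hypothetical detections inside the tray: one Apple Tart and two Croissants.
print(total_price([0, 1, 1]))  # -> 8000
```

Adding a new kind of bread then only requires one more entry in the table, provided the model has been retrained with that class.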

3.3 Application

We added display elements with actual commercial use in mind. The total price is displayed, along with a Kakao Pay QR code so that customers can pay. Figure 19 is the final result of combining three images: the webcam image, the QR code, and the price. In Figure 19, the total price is 7,800 won. This is the price of only the three objects inside the Rounding Tray; the bread outside the Rounding Tray, marked in red, is not included in the total. For bread inside the Rounding Tray, the price is also marked on each object.

img

Figure 19. Application Display

3.4 Customized detect.py

This is the final customized detect.py file.

4. Evaluation

  • In this project, we trained the YOLOv5l model on a customized dataset, and the training performance was very good. As the recall and precision graphs below show, the trained model reaches very high values. The trained model was evaluated on test video and produced valid results. The most important goal of this project was to recognize bread in real time and calculate the price, and we achieved the speed and accuracy we expected. It was therefore possible to implement a fast and accurate model with a low-cost webcam and GPU.

    Figure 20 shows the result of running validation. Both precision and recall reached a high value of 99%. Since the training data were extracted from video, several frames are likely to contain nearly the same image. Because such similar images may also appear in the validation set, precision and recall were higher than expected.

    img

    Figure 20. Validation result

    Figure 22 is a graph showing the F1-score after validation. The F1-score is the harmonic mean of precision and recall. Precision is the fraction of the model's positive predictions that are actually positive. Recall is the fraction of actual positives that the model predicts as positive. Raising precision makes classification more exact, but precision and recall are generally in a trade-off relationship: as precision rises, recall tends to fall. Since we need both to classify bread accurately and to detect the bread that is actually present, the harmonic mean, which balances precision and recall, was used as the basic evaluation criterion for model performance.

    img

    Figure 21. Model Evaluation

    Looking at Figure 22, the confidence of all classes is maintained above 0.8. Therefore, it can be confirmed that the actual bread type was accurately classified.

img

Figure 22. F1-score of Model
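
The three metrics discussed above follow directly from the counts of true positives (TP), false positives (FP), and false negatives (FN). The sketch below uses made-up counts for illustration only; they are not results from this project.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0   # correct share of predictions
    recall = tp / (tp + fn) if tp + fn else 0.0      # found share of real objects
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Hypothetical counts: 90 correct detections, 10 false alarms, 30 misses.
p, r, f1 = precision_recall_f1(90, 10, 30)
print(round(p, 3), round(r, 3), round(f1, 3))  # -> 0.9 0.75 0.818
```

Because the harmonic mean is pulled toward the smaller of the two values, a model cannot reach a high F1-score by maximizing precision alone.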

  • However, some issues remain. First, in this project we trained only on unwrapped bread. With packaged bread, training is expected to be harder, because light reflecting off the wrapper makes it difficult to distinguish the exact shape of the bread. To train on and classify wrapped bread, it would be important to augment the original images with cropping and rotation and to add images taken from various angles. In addition, a single bread type comes in various shapes, but in this project each type was trained using only one piece of bread. To apply the results in an actual store, more data is needed, along with a post-processing step that can distinguish breads of the same kind with similar shapes.

  • In this project, users can pay through QR code images. In order to implement it as an actual payment system, it is necessary to introduce an additional computer system.


5. Run Test Video

Finally, once training, pre-processing, and post-processing are complete, you can check the result on real-time video by executing the following code.

Reference

YOLOv5 : https://github.com/ultralytics/yolov5

YOLOv5 installation : Click here

Appendix
