To download the source video, go to the following URL:
https://github.com/SungJooo/DLIP_FINAL
There you will find a file called source.MOV. Download it to a folder of your choice.
With the software installation complete, open the folder that contains source.MOV in VS Code, create a new .py file, and paste in the "code to test on computer" code, which you can find in the appendix of this paper.
Near the top of the code (lines 8~13) you will see the two cap=cv2.VideoCapture(...) lines. To use the webcam for your test, keep the cap=cv2.VideoCapture(0) line and comment out the line above it. To use the source video, keep the cap=cv2.VideoCapture('source.MOV') line and comment out the line below it.
Then, run the code in VS Code.
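As an alternative to commenting lines in and out, the source can be chosen at run time. The following is a minimal sketch and not part of the original script: the helper name parse_source is hypothetical. It turns a digit string such as "0" into a camera index and leaves anything else unchanged as a file path, so the result can be passed directly to cv2.VideoCapture.

```python
def parse_source(arg):
    """Return an int camera index for digit strings, else the path unchanged.

    Hypothetical helper for illustration; not part of the original script.
    """
    text = str(arg).strip()
    return int(text) if text.isdigit() else text


# "0" selects the first camera, anything else is treated as a video file.
print(repr(parse_source("0")))           # 0
print(repr(parse_source("source.MOV")))  # 'source.MOV'

# In the test script the value would replace the hard-coded argument:
# cap = cv2.VideoCapture(parse_source("source.MOV"))
```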
Software Explanation
The algorithm works in three steps:
Object detection via YOLOv5 pretrained model
Object tracking using the detection data
People counting using the tracking data
The following are some important functions in the tracking and people counting algorithms.
matchboxes()
Uses the coordinates of all bounding boxes in previous frame and current frame to match the boxes with each other and track the objects
def matchboxes(coordlist, prev_coordlist, width):
Parameters
coordlist: list of bounding box coordinates in current frame
prev_coordlist: list of bounding box coordinates in previous frame
width: width of frame
Example Code
# list of boxes that have corresponding boxes in previous frame
i_list = matchboxes(coordlist, prev_coordlist, width)
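To make the matching idea concrete, here is a minimal, self-contained sketch. It is not the appendix code itself: the names box_center, overlap_area, and match_boxes_sketch are hypothetical, and the sketch pairs each current box with the unused previous box that overlaps it most. It keeps the appendix's condition that the box centers must be closer than width/20.

```python
import math


def box_center(box):
    """Center (x, y) of a box given as [xmin, ymin, xmax, ymax]."""
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)


def overlap_area(a, b):
    """Area of the intersection of two boxes, or 0 if they do not overlap."""
    dx = min(a[2], b[2]) - max(a[0], b[0])
    dy = min(a[3], b[3]) - max(a[1], b[1])
    return dx * dy if dx > 0 and dy > 0 else 0


def match_boxes_sketch(coordlist, prev_coordlist, width):
    """Pair each current box with the best-overlapping unused previous box.

    A pair is accepted only if the box centers are closer than width / 20.
    Returns a list of [current_box, previous_box] pairs.
    """
    pairs = []
    used_prev = set()
    for coord in coordlist:
        best_area, best_idx = 0, None
        for idx, prev in enumerate(prev_coordlist):
            if idx in used_prev:
                continue
            area = overlap_area(coord, prev)
            close = math.dist(box_center(coord), box_center(prev)) < width / 20
            if area > best_area and close:
                best_area, best_idx = area, idx
        if best_idx is not None:
            used_prev.add(best_idx)
            pairs.append([coord, prev_coordlist[best_idx]])
    return pairs


prev_boxes = [[100, 100, 200, 300], [400, 100, 500, 300]]
curr_boxes = [[110, 105, 210, 305], [800, 100, 900, 300]]
print(match_boxes_sketch(curr_boxes, prev_boxes, width=1920))
```

In this example only the first current box is matched. The second current box has no overlapping predecessor, so it would be treated as new, and the second previous box as disappeared.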
checkbot_box()
Checks whether the given box coordinates are near the bottom of the frame
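The bottom-edge check and the way the counter uses it can be sketched as follows. This mirrors the rule in the appendix code (a box counts as "at the bottom" when its lower edge is within height/54 of the frame bottom), but the function names touches_bottom and update_count are hypothetical and the check returns a boolean rather than 1 or 0.

```python
def touches_bottom(coords, height):
    """True if the box's lower edge lies within height / 54 of the frame bottom."""
    ymax = coords[3]
    return ymax > height - height / 54


def update_count(num_people, new_boxes, disappeared_boxes, height):
    """Apply the bottom-edge counting rule to one frame's tracking result."""
    for box in new_boxes:
        if touches_bottom(box, height):
            num_people -= 1  # a box appeared at the bottom edge
    for box in disappeared_boxes:
        if touches_bottom(box, height):
            num_people += 1  # a box vanished at the bottom edge
    return num_people


height = 1080  # the threshold is 1080 - 1080 / 54 = 1060
print(touches_bottom([300, 500, 500, 1075], height))  # True
print(touches_bottom([300, 200, 500, 900], height))   # False
print(update_count(0, [], [[300, 500, 500, 1075]], height))  # 1
```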
To set up the Raspberry Pi, you first need to install Raspberry Pi OS. Insert a micro SD card into your laptop using a card reader, then download the OS installer from the link. The installer is an .exe file named "imager_1.7.2.exe". When you run it, you must pick the 64-bit OS, which is required to run YOLOv5.
1.1. Remote controlling Raspberry Pi
To control the Raspberry Pi remotely from a laptop, add two files to the root folder of the SD card: a file named "ssh" with no extension, and a file named "wpa_supplicant.conf".
The file named "wpa_supplicant.conf" needs to include the following:
The name "JooDesk" and the password "wwjd0000" are the ID and password of the laptop's hotspot. Connecting through the laptop hotspot keeps the IP address from changing. The hotspot band must be set to 2.4 GHz.
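The file contents are not reproduced in this text. As a rough sketch, assuming the commonly used layout of a wpa_supplicant.conf for a single WPA-PSK network, the following snippet prints such a file for the hotspot described above. The function name wpa_supplicant_text is hypothetical, and the country code "KR" in the example is an assumption that should be replaced with your own two-letter code.

```python
def wpa_supplicant_text(ssid, psk, country):
    """Build the text of a basic wpa_supplicant.conf for one WPA-PSK network.

    Sketch of the commonly used layout; adjust to your own network.
    """
    return (
        f"country={country}\n"
        "ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev\n"
        "update_config=1\n"
        "\n"
        "network={\n"
        f'    ssid="{ssid}"\n'
        f'    psk="{psk}"\n'
        "}\n"
    )


# SSID and password are the hotspot values from the text; "KR" is assumed.
print(wpa_supplicant_text("JooDesk", "wwjd0000", country="KR"))
```

The printed text can be saved as wpa_supplicant.conf next to the empty ssh file.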
When you have completed the steps above, insert the micro SD card into the Raspberry Pi and boot it. After the Raspberry Pi boots, set your hostname and password.
Once the Raspberry Pi is running, you need to give it a static IP address. To do this, follow these instructions:
$ hostname -I # check the IP address of the Raspberry Pi
$ sudo apt-get install vim # install vim
$ sudo vi /etc/dhcpcd.conf # press 'i' to insert, 'Esc' to leave insert mode
Then add the following lines at the bottom of the file.
The added lines must not begin with "#" marks. After entering them, press "Esc" to leave insert mode, then type ":wq!" to save and quit vim. After this step, your Raspberry Pi will use the static IP address "192.168.137.110". Then reboot your Raspberry Pi with the command "$ reboot".
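The lines to add are not reproduced in this text. The sketch below generates a typical static-address block for dhcpcd.conf and checks that the address lies in the same subnet as the gateway. The function name dhcpcd_static_block is hypothetical, and the interface "wlan0", the /24 prefix, and the gateway "192.168.137.1" are assumptions about the laptop hotspot. Only the address 192.168.137.110 comes from the text.

```python
import ipaddress


def dhcpcd_static_block(address, gateway, prefix=24, interface="wlan0"):
    """Return dhcpcd.conf lines for a static IPv4 address on one interface.

    Raises ValueError if the address is outside the gateway's subnet.
    """
    network = ipaddress.ip_network(f"{gateway}/{prefix}", strict=False)
    if ipaddress.ip_address(address) not in network:
        raise ValueError(f"{address} is not inside {network}")
    return (
        f"interface {interface}\n"
        f"static ip_address={address}/{prefix}\n"
        f"static routers={gateway}\n"
        f"static domain_name_servers={gateway}\n"
    )


# The gateway is an assumed hotspot address; verify it on your laptop.
print(dhcpcd_static_block("192.168.137.110", "192.168.137.1"))
```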
After rebooting the Raspberry Pi, run the following commands for the next step.
Tightvncserver is a program that mirrors the Raspberry Pi screen on a laptop. The detailed steps are as follows.
1.2. PuTTY
PuTTY is a program for connecting to the Raspberry Pi over SSH. You can download it via the link.
Use the static IP address "192.168.137.110" and port number 22.
1.3. TightVNC
When you connect to the Raspberry Pi using PuTTY, you get only a terminal session. To use the Raspberry Pi through a GUI environment, use the TightVNC program. The installation link is here. When you install TightVNC, you have to set a password.
With the steps above, TightVNC is installed on the Raspberry Pi. To start TightVNC on the Raspberry Pi, run the following commands.
$ tightvncserver
$ sudo netstat -tulpn # check the listening ports on the Raspberry Pi
TightVNC uses port 5901 on the Raspberry Pi. With the command "sudo netstat -tulpn", you can check whether 0.0.0.0:5901 is in the "LISTEN" state. If it is, the Raspberry Pi is ready to be mirrored on your laptop. "$ vncpasswd" is the command for changing the TightVNC password.
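The same check can be made from the laptop side. The following is a minimal sketch using only the Python standard library: the function names port_is_open and report are hypothetical. It attempts a TCP connection to the SSH port (22) and the TightVNC port (5901) given in the text.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def report(host, ports=(("SSH", 22), ("TightVNC", 5901))):
    """Print whether each named port on the host accepts connections."""
    for name, port in ports:
        state = "open" if port_is_open(host, port) else "closed"
        print(f"{name} port {port}: {state}")
```

Calling report("192.168.137.110") from the laptop prints one line per port. A "closed" result for 5901 usually means tightvncserver has not been started yet.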
2. YOLOv5 in Raspberry Pi
First, clone the YOLOv5 repository onto the Raspberry Pi by entering the following command.
$ git clone https://github.com/ultralytics/yolov5
After this, a "yolov5" folder is created in the root folder of the Raspberry Pi. Follow the instructions to set up the environment for YOLOv5.
After running the commands, you can find the following image in "runs/detect/exp" on the Raspberry Pi.
To use an external camera (such as a Logitech webcam or a Picam), you can test object detection on its input. The test code is as follows. "source 0" refers to the external device you have connected to the Raspberry Pi. Additional devices are "source 1" and so on.
This is the pinmap of the Raspberry Pi 4 GPIO. We used GPIO21 (pin 40) as the voltage source for the light. To drive this pin from Python, the code is as follows.
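The snippet itself is not reproduced in this text. The sketch below is based on the RPi.GPIO calls used in the appendix script (BCM numbering, GPIO21 as an output that starts low). The function names and the fallback that prints the state when RPi.GPIO is unavailable are assumptions, added so the sketch also runs on a laptop.

```python
try:
    import RPi.GPIO as GPIO  # available on Raspberry Pi OS
except (ImportError, RuntimeError):
    GPIO = None  # not on a Raspberry Pi: fall back to printing the state

LIGHT_PIN = 21  # BCM numbering, physical pin 40


def setup_light():
    """Configure GPIO21 as an output that starts low (light off)."""
    if GPIO is not None:
        GPIO.setmode(GPIO.BCM)
        GPIO.setup(LIGHT_PIN, GPIO.OUT, initial=GPIO.LOW)


def set_light(on):
    """Drive the pin high when on is true, low otherwise; return the state."""
    if GPIO is not None:
        GPIO.output(LIGHT_PIN, GPIO.HIGH if on else GPIO.LOW)
    else:
        print("light", "ON" if on else "OFF")
    return bool(on)


def cleanup_light():
    """Switch the light off and release the GPIO pins."""
    if GPIO is not None:
        GPIO.output(LIGHT_PIN, GPIO.LOW)
        GPIO.cleanup()


setup_light()
set_light(True)   # light on while at least one person is counted
set_light(False)  # light off when the count returns to zero
cleanup_light()
```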
You will then find the three Python files that we wrote.
"DLIP_Final_00_test.py" checks that the yolov5n model works on the Raspberry Pi. It detects only the "person" class.
"DLIP_Final_01_fps.py" measures the FPS achieved with the yolov5n model. Because the model is still heavy for the Raspberry Pi, the FPS is about 2~2.5, and when working remotely it drops even lower if the WiFi connection is poor.
"DLIP_Final_10_LAST.py" turns the light on and off depending on whether a person enters or leaves.
To launch the code, run the following command in the DLIP_FINAL folder.
$ python DLIP_Final_10_LAST.py
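The frame rate shown by these scripts is the inverse of the time between consecutive frames, which is how the appendix code computes it. A minimal sketch of that measurement follows. The class name FpsMeter is hypothetical, and the sleep call stands in for model inference on one frame.

```python
import time


class FpsMeter:
    """Instantaneous frames per second from the time between tick() calls."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._last = None

    def tick(self):
        """Call once per processed frame; returns FPS, or 0.0 on the first call."""
        now = self._clock()
        fps = 0.0
        if self._last is not None and now > self._last:
            fps = 1.0 / (now - self._last)
        self._last = now
        return fps


meter = FpsMeter()
for _ in range(3):
    time.sleep(0.05)  # stand-in for running the model on one frame
    print("FPS: {:2.2f}".format(meter.tick()))
```

Returning 0.0 on the first call avoids reporting a meaningless value before two frames have been timed.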
Results and Analysis
The system was successfully able to:
film the doorway entrance
use the video footage to detect, track, and count the people in the frame
do all of this within a Raspberry Pi module, without connection to an external computer
Some issues were that the frame rate (around 2.5 fps) and accuracy (around 60 percent) of the detection model (YOLOv5 nano) were limited on a Raspberry Pi. YOLOv5 nano was nevertheless deemed the most suitable model: a lighter model would give a higher frame rate but lower accuracy, while a heavier model would give higher accuracy but a lower frame rate.
Below is a table of the frame rate depending on the device.
A possible solution to this problem would be to use a tensor-based object detection model instead of YOLOv5, which would increase the frame rate without sacrificing accuracy. This is because a TPU was used for this project, which is optimized to accelerate the computation of tensor-based models.
Appendix
Code to test on computer
import torch
import cv2
import random
from PIL import Image
import numpy as np
import math
import time
# Load the Model
model = torch.hub.load('ultralytics/yolov5', 'yolov5n', pretrained=True)
model.classes = [0]
# cap = cv2.VideoCapture('source.MOV')
cap = cv2.VideoCapture(0)

width = int(cap.get(3))
height = int(cap.get(4))
frameno = 0
num_people = 0
fpsStart = 0
fps = 0

# returns coordinates of box as list
def box_coords(box):
    xmin = int(box[0])
    ymin = int(box[1])
    xmax = int(box[2])
    ymax = int(box[3])
    return [xmin, ymin, xmax, ymax]

# checks if box touches the bottom of frame
def checkbot_box(coords, height):
    ymax = coords[3]
    if ymax > height-(height/54):
        return 1
    else:
        return 0

# returns center coordinates of box
def box_cent(coords):
    cent_x = int((coords[0]+coords[2])/2)
    cent_y = int((coords[1]+coords[3])/2)
    return [cent_x, cent_y]

# gets intersecting area of two boxes
def inters_area(coord1, coord2):
    xmin1 = coord1[0]
    ymin1 = coord1[1]
    xmax1 = coord1[2]
    ymax1 = coord1[3]
    xmin2 = coord2[0]
    ymin2 = coord2[1]
    xmax2 = coord2[2]
    ymax2 = coord2[3]
    dx = min(xmax1, xmax2)-max(xmin1, xmin2)
    dy = min(ymax1, ymax2)-max(ymin1, ymin2)
    if (dx > 0) and (dy > 0):
        return dx*dy
    else:
        return 0

# returns list of coordinates of boxes in current frame that are new (no corresponding box in previous frame)
def newbox(coordlist, i_list):
    new_list = []
    for k in coordlist:
        if k not in [i[0] for i in i_list]:
            new_list += [k]
    return new_list

# returns list of coordinates of boxes in previous frame that have disappeared (no corresponding box in current frame)
def dispbox(prev_coordlist, i_list):
    disp_list = []
    for k in prev_coordlist:
        if k not in [i[1] for i in i_list]:
            disp_list += [k]
    return disp_list

# finds which box in previous frame is the one in current frame (highest intersecting area)
def matchboxes(coordlist, prev_coordlist, width):
    i_list = []
    for coord in coordlist:
        area = 0
        add_ilist = []
        for prev_coord in prev_coordlist:
            if inters_area(coord, prev_coord) > area and (math.dist(box_cent(coord), box_cent(prev_coord)) < (width/20)):
                area = inters_area(coord, prev_coord)
                add_ilist = [[coord, prev_coord]]
                if coord not in [i[0] for i in i_list] and prev_coord not in [j[1] for j in i_list]:
                    i_list += add_ilist
    return i_list

# COUNT_PEOPLE_FRAMEOUT(prev_results, results, frame, rect_frame, num_people)
def COUNT_PEOPLE_FRAMEOUT(dataPre, dataCur, frame, frameCopy, num_people):
    # create lists of all box coordinates in previous and current frame
    prev_coordlist = []
    for j in range(len(dataPre.xyxy[0])):
        prev_coords = box_coords(dataPre.xyxy[0][j])
        prev_coordlist += [prev_coords]
    coordlist = []
    for k in range(len(dataCur.xyxy[0])):
        coords = box_coords(dataCur.xyxy[0][k])
        coordlist += [coords]
    for c in coordlist:
        cv2.rectangle(frameCopy, (c[0], c[1]), (c[2], c[3]), (255, 0, 0), thickness=-1)
    # list of boxes that have corresponding boxes in previous frame
    i_list = matchboxes(coordlist, prev_coordlist, width)
    # get list of boxes that are new in the frame
    new_list = newbox(coordlist, i_list)
    # get list of boxes that have disappeared
    disp_list = dispbox(prev_coordlist, i_list)
    # adjust number of people and draw rectangles
    for new_coords in new_list:
        if checkbot_box(new_coords, height) == 1:
            num_people -= 1
            cv2.rectangle(frameCopy, (new_coords[0], new_coords[1]), (new_coords[2], new_coords[3]), (0, 0, 255), thickness=-1)
    for disp_coords in disp_list:
        if checkbot_box(disp_coords, height) == 1:
            num_people += 1
            cv2.rectangle(frameCopy, (disp_coords[0], disp_coords[1]), (disp_coords[2], disp_coords[3]), (0, 255, 0), thickness=-1)
    # add the rectangles to the frame
    frame = cv2.addWeighted(frameCopy, 0.3, frame, 0.7, 1.0)
    return frame, num_people

# import RPi.GPIO as GPIO
# GPIO.setmode(GPIO.BCM)
# pin_num = 21
# GPIO.setup(pin_num, GPIO.OUT, initial=GPIO.LOW)

def GPIO_LIGHT(numPeople, frame):
    # if numPeople > 0: GPIO.output(pin_num, GPIO.HIGH)
    # else: GPIO.output(pin_num, GPIO.LOW)
    if numPeople > 0:
        cv2.circle(frame, (int(width*0.9), int(height*0.9)), radius=30, color=(255, 255, 255), thickness=cv2.FILLED)
    else:
        cv2.circle(frame, (int(width*0.9), int(height*0.9)), radius=30, color=(0, 0, 0), thickness=cv2.FILLED)

while(1):
    frameno += 1
    ret, frame = cap.read()
    # stop when the video ends or the camera returns no frame
    if not ret:
        break
    # create frames for color filling in
    rect_frame = frame.copy()
    results = model(frame)
    if frameno == 1:
        prev_results = results
    frame, num_people = COUNT_PEOPLE_FRAMEOUT(prev_results, results, frame, rect_frame, num_people)
    # send rasp GPIO command
    GPIO_LIGHT(num_people, frame)
    fpsEnd = time.time()
    timeDiff = fpsEnd - fpsStart
    fps = 1/timeDiff
    fpsStart = fpsEnd
    fpsText = "FPS: {:2.2f}".format(fps)
    cv2.putText(frame, fpsText, (30, 40), cv2.FONT_HERSHEY_COMPLEX, 1, (0, 255, 255), 2)
    num_peopletxt = "Number of people: "+str(num_people)
    cv2.putText(frame, num_peopletxt, (int(width/40), height-int(width/40)), cv2.FONT_HERSHEY_SIMPLEX, round(width/1000), (0, 0, 255), round(width/1000), cv2.LINE_AA)
    cv2.namedWindow("result", cv2.WINDOW_NORMAL)
    cv2.imshow("result", frame)
    prev_results = results
    k = cv2.waitKey(5) & 0xFF
    if k == 27:
        # GPIO.output(pin_num, GPIO.LOW)
        # GPIO.cleanup()
        break
    if k == 114 or k == 82:
        num_people = 0
Code to test on Raspberry Pi
import torch
import cv2
import random
from PIL import Image
import numpy as np
import math
import time
# Load the Model
model = torch.hub.load('ultralytics/yolov5', 'yolov5n', pretrained=True)
model.classes = [0]
cap = cv2.VideoCapture(0)

width = int(cap.get(3))
height = int(cap.get(4))
frameno = 0
num_people = 0
fpsStart = 0
fps = 0

# returns coordinates of box as list
def box_coords(box):
    xmin = int(box[0])
    ymin = int(box[1])
    xmax = int(box[2])
    ymax = int(box[3])
    return [xmin, ymin, xmax, ymax]

# checks if box touches the bottom of frame
def checkbot_box(coords, height):
    ymax = coords[3]
    if ymax > height-(height/54):
        return 1
    else:
        return 0

# returns center coordinates of box
def box_cent(coords):
    cent_x = int((coords[0]+coords[2])/2)
    cent_y = int((coords[1]+coords[3])/2)
    return [cent_x, cent_y]

# gets intersecting area of two boxes
def inters_area(coord1, coord2):
    xmin1 = coord1[0]
    ymin1 = coord1[1]
    xmax1 = coord1[2]
    ymax1 = coord1[3]
    xmin2 = coord2[0]
    ymin2 = coord2[1]
    xmax2 = coord2[2]
    ymax2 = coord2[3]
    dx = min(xmax1, xmax2)-max(xmin1, xmin2)
    dy = min(ymax1, ymax2)-max(ymin1, ymin2)
    if (dx > 0) and (dy > 0):
        return dx*dy
    else:
        return 0

# returns list of coordinates of boxes in current frame that are new (no corresponding box in previous frame)
def newbox(coordlist, i_list):
    new_list = []
    for k in coordlist:
        if k not in [i[0] for i in i_list]:
            new_list += [k]
    return new_list

# returns list of coordinates of boxes in previous frame that have disappeared (no corresponding box in current frame)
def dispbox(prev_coordlist, i_list):
    disp_list = []
    for k in prev_coordlist:
        if k not in [i[1] for i in i_list]:
            disp_list += [k]
    return disp_list

# finds which box in previous frame is the one in current frame (highest intersecting area)
def matchboxes(coordlist, prev_coordlist, width):
    i_list = []
    for coord in coordlist:
        area = 0
        add_ilist = []
        for prev_coord in prev_coordlist:
            if inters_area(coord, prev_coord) > area and (math.dist(box_cent(coord), box_cent(prev_coord)) < (4*width/20)):
                area = inters_area(coord, prev_coord)
                add_ilist = [[coord, prev_coord]]
                if coord not in [i[0] for i in i_list] and prev_coord not in [j[1] for j in i_list]:
                    i_list += add_ilist
    return i_list

# COUNT_PEOPLE_FRAMEOUT(prev_results, results, frame, rect_frame, num_people)
def COUNT_PEOPLE_FRAMEOUT(dataPre, dataCur, frame, frameCopy, num_people):
    # create lists of all box coordinates in previous and current frame
    prev_coordlist = []
    for j in range(len(dataPre.xyxy[0])):
        prev_coords = box_coords(dataPre.xyxy[0][j])
        prev_coordlist += [prev_coords]
    coordlist = []
    for k in range(len(dataCur.xyxy[0])):
        coords = box_coords(dataCur.xyxy[0][k])
        coordlist += [coords]
    for c in coordlist:
        cv2.rectangle(frameCopy, (c[0], c[1]), (c[2], c[3]), (255, 0, 0), thickness=-1)
    # list of boxes that have corresponding boxes in previous frame
    i_list = matchboxes(coordlist, prev_coordlist, width)
    # get list of boxes that are new in the frame
    new_list = newbox(coordlist, i_list)
    # get list of boxes that have disappeared
    disp_list = dispbox(prev_coordlist, i_list)
    # adjust number of people and draw rectangles
    for new_coords in new_list:
        if checkbot_box(new_coords, height) == 1:
            num_people -= 1
            cv2.rectangle(frameCopy, (new_coords[0], new_coords[1]), (new_coords[2], new_coords[3]), (0, 0, 255), thickness=-1)
    for disp_coords in disp_list:
        if checkbot_box(disp_coords, height) == 1:
            num_people += 1
            cv2.rectangle(frameCopy, (disp_coords[0], disp_coords[1]), (disp_coords[2], disp_coords[3]), (0, 255, 0), thickness=-1)
    # add the rectangles to the frame
    frame = cv2.addWeighted(frameCopy, 0.3, frame, 0.7, 1.0)
    return frame, num_people

import RPi.GPIO as GPIO
GPIO.setmode(GPIO.BCM)
pin_num = 21
GPIO.setup(pin_num, GPIO.OUT, initial=GPIO.LOW)

def GPIO_LIGHT(numPeople, frame):
    if numPeople > 0:
        GPIO.output(pin_num, GPIO.HIGH)
    else:
        GPIO.output(pin_num, GPIO.LOW)
    cv2.circle(frame, (int(width*0.9), int(height*0.9)), radius=31, color=(0, 0, 0), thickness=cv2.FILLED)
    if numPeople > 0:
        cv2.putText(frame, 'ON', (int(width*0.865), int(height*0.92)), cv2.FONT_HERSHEY_COMPLEX, 1, (0, 255, 255), 2)

resultFINAL = cv2.VideoWriter('demovideo.avi', cv2.VideoWriter_fourcc(*'XVID'), cap.get(cv2.CAP_PROP_FPS), (width, height))
# 3 is FPS / cap.get(cv.CAP_PROP_FPS)

while(1):
    frameno += 1
    _, frame = cap.read()
    # create frames for color filling in
    rect_frame = frame.copy()
    results = model(frame)
    if frameno == 1:
        prev_results = results
    frame, num_people = COUNT_PEOPLE_FRAMEOUT(prev_results, results, frame, rect_frame, num_people)
    # send rasp GPIO command
    GPIO_LIGHT(num_people, frame)
    fpsEnd = time.time()
    timeDiff = fpsEnd - fpsStart
    fps = 1/timeDiff
    fpsStart = fpsEnd
    fpsText = "FPS: {:2.2f}".format(fps)
    cv2.putText(frame, fpsText, (int(width/40), int(height/15)), cv2.FONT_HERSHEY_COMPLEX, 1, (0, 255, 255), 2)
    num_peopletxt = "Number of people entered: "+str(num_people)
    if num_people > 0:
        cv2.putText(frame, num_peopletxt, (int(width/40), height-int(width/40)), cv2.FONT_HERSHEY_COMPLEX, 0.8, (255, 255, 255), 2)
    else:
        cv2.putText(frame, num_peopletxt, (int(width/40), height-int(width/40)), cv2.FONT_HERSHEY_COMPLEX, 0.8, (255, 255, 0), 2)
    cv2.namedWindow("result", cv2.WINDOW_NORMAL)
    cv2.imshow("result", frame)
    resultFINAL.write(frame)
    prev_results = results
    k = cv2.waitKey(5) & 0xFF
    if k == 27:
        GPIO.output(pin_num, GPIO.LOW)
        GPIO.cleanup()
        break
    if k == 114 or k == 82:
        num_people = 0

cap.release()
resultFINAL.release()
cv2.destroyAllWindows()