In this lab, we build a program that gives real-time feedback on posture while exercising at the gym. The program is limited to the lat pull-down movement. Interest in health has grown since COVID-19, and so has the number of people who exercise alone. When exercising alone, however, it is difficult to tell whether the posture is correct, and the result is often muscle imbalance. To address this, we build a system that identifies each joint of a person, measures the left-right balance from the slopes of the upper-body joints on both sides, and gives feedback on the balance of force between the two sides. The tutorial runs in Visual Studio Code (VS Code), loads a webcam or video source, and processes the images in real time using OpenCV.
Demo
II. Requirement
Hardware
Logitech C922 pro Webcam
Lat Pull Down Machine
Software
Python 3.9.12
Tensorflow 2.9.1
numpy 1.21.5
OpenCV 4.5.5
MoveNet
III. Flow Chart
IV. Procedure
1. Setup
First, build the environment using Anaconda Prompt. It is important to install the version of each package listed in the requirements above so that image processing works correctly.
2. Installation
2-1. Install Anaconda
Anaconda: an installer for Python and its library packages.
Download the MoveNet model: TFLite model link (it must be saved in the same local folder as main.py)
4. Libraries You Need
import tensorflow as tf
import numpy as np
import cv2 as cv
from cv2 import *
from cv2.cv2 import *
from tkinter import *
5. Global Variable
5-1. Main Variable
Definition Body Parts: The deep learning model we use outputs location information for 17 joints. The joints are returned in a fixed order, as follows.
[ nose, left eye, right eye, left ear, right ear, left shoulder, right shoulder, left elbow, right elbow, left wrist, right wrist, left hip, right hip, left knee, right knee, left ankle, right ankle ]
Since this application uses only the shoulders, elbows, and wrists on both sides, we define the indices of these six joints (5 to 10 in the order above).
Definition of body edges: Pairs of joints are connected to draw a skeleton model.
Thresholding
CONFIDENCE_THRESHOLD: Minimum acceptable confidence for each joint position
CORRECT_RECOGNIGION:
START_THRESHOLD: Threshold value for a good pose
START_COUNT_THRESH: Minimum number of frames in the correct posture required before counting starts
User input: The user's target number of repetitions.
Flag
system_Flag: Controls whether processing starts.
start_Flag: Controls whether workout counting starts.
finish_Flag: Indicates that the exercise is over.
tk_Flag: Determines whether to receive new input from the user.
up_down_Flag: Tracks the down and up states needed to count a repetition.
For counting: Definitions for the number of good and bad repetitions, the accumulated elbow positions, their average values, the condition for starting an exercise, and the frame count.
For Balance: Definitions for the balance value, the minimum balance value, and the maximum balance value.
#===============================================#
#                Global Variable                #
#===============================================#

# Color Definition (BGR)
WHITE = (255, 255, 255)
RED = (0, 0, 255)
GREEN = (0, 255, 0)
PINK = (184, 53, 255)
YELLOW = (0, 255, 255)
BLUE = (255, 0, 0)
BLACK = (0, 0, 0)
PURPLE = (255, 102, 102)

# Font Definition
USER_FONT = FONT_HERSHEY_DUPLEX
fontScale = 1
fontThickness = 2

# Definition Body Parts
LEFT_SHOULDER = 5
RIGHT_SHOULDER = 6
LEFT_ELBOW = 7
RIGHT_ELBOW = 8
LEFT_WRIST = 9
RIGHT_WRIST = 10

# Definition of body edges
EDGES = {
    (LEFT_SHOULDER, LEFT_ELBOW): 'm',
    (LEFT_ELBOW, LEFT_WRIST): 'm',
    (RIGHT_SHOULDER, RIGHT_ELBOW): 'c',
    (RIGHT_ELBOW, RIGHT_WRIST): 'c',
    (LEFT_SHOULDER, RIGHT_SHOULDER): 'y',
}

# Thresholding
CONFIDENCE_THRESHOLD = 0.2  # For minimum confidence of output
CORRECT_RECOGNIGION = 5
START_THRESHOLD = 0.3       # For value of good pose
START_COUNT_THRESH = 30     # For start counting

# User input
inputCount = 11             # For user input

# Flag
system_Flag = False         # For start system
start_Flag = False          # For start counting process
finish_Flag = False         # For finish workout
tk_Flag = False             # For new user input
up_down_Flag = [False, False, False, False]  # 0: down_Flag, 1: up_Flag, 2: left_down_Flag, 3: right_down_Flag

# For counting
counting_List = [0, 0, 0, 0, 0.0, 0.0, 0]  # 0: Good count, 1: bad count, 2: left_E_ystack, 3: right_E_ystack, 4: left_E_avg, 5: right_E_avg, 6: count_frame
start_Count = 0             # For starting setting
# 0: set Count, 1: startCount, 2: userSet
offset_Text = ""

# For Balance
balance_List = [0.0, 10.0, 0.0]  # 0: balance, 1: good, 2: bad

# Frame count
frame_Num = 0               # for processing
position_FrameList = [0, 0]  # 0: worst / 1: Best

# Video naming
Video = "DEMO.mp4"
6. Function Definitions
6-1. Processing
Get the position of each of the 17 joints
The joint positions returned by the model are relative to the frame, so the function takes the frame, the keypoints (the joint information from the model), and the minimum confidence as inputs. It returns 'shape', which holds the positions of all joints in the frame, and 'posebuf', which holds the positions of only those joints whose confidence exceeds the minimum.
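The conversion described above can be sketched in a few lines. This is a minimal illustration, not the tutorial's own `Get_Shape_PoseBuf`: the function name `scale_keypoints` and its return layout are assumptions, and it relies only on MoveNet returning each keypoint as a normalized `(y, x, score)` triple.

```python
def scale_keypoints(keypoints, frame_height, frame_width, min_confidence):
    """Convert normalized (y, x, score) keypoints to pixel coordinates.

    Hypothetical helper: returns every joint in pixel units ("shaped") and
    the (index, joint) pairs whose score exceeds min_confidence ("pose_buf").
    """
    shaped = [
        (y * frame_height, x * frame_width, score)
        for (y, x, score) in keypoints
    ]
    pose_buf = [
        (index, joint)
        for index, joint in enumerate(shaped)
        if joint[2] > min_confidence
    ]
    return shaped, pose_buf


# Example: 17 joints, with only the two wrists filled in
example = [(0.0, 0.0, 0.0)] * 17
example[9] = (0.5, 0.25, 0.9)    # left wrist, confident
example[10] = (0.5, 0.75, 0.1)   # right wrist, below the 0.2 threshold
shaped, pose_buf = scale_keypoints(example, 1080, 1080, 0.2)
print(shaped[9])                 # left wrist in pixels
print([index for index, _ in pose_buf])
```

Keeping the unfiltered list alongside the filtered one lets later steps index joints by their fixed position (for example 9 for the left wrist) while drawing only the confident ones.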
The slope between the shoulder and wrist joints is calculated on each side, and one slope is subtracted from the other. If the difference stays within the allowable range for a set number of frames, a text indicating that the user is in the correct posture is shown and start_Flag is turned on (start_Flag = True). Otherwise, feedback text about the posture is shown.
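A rough sketch of this start check is shown here. It is not the tutorial's `Start_Postion_Adjustment`: the helper names are hypothetical, and comparing the absolute slopes of the two sides (which mirror each other) is an assumption. The two constants reuse the values from the global variables.

```python
START_THRESHOLD = 0.3     # allowed difference between the two slopes
START_COUNT_THRESH = 30   # frames the pose must hold before counting starts


def slope(shoulder, wrist):
    """Slope of the shoulder-to-wrist line; points are (y, x). None if vertical."""
    dy = wrist[0] - shoulder[0]
    dx = wrist[1] - shoulder[1]
    return None if dx == 0 else dy / dx


def update_start_state(left_shoulder, left_wrist,
                       right_shoulder, right_wrist, good_frames):
    """Return (start_flag, good_frames) after looking at one frame."""
    left = slope(left_shoulder, left_wrist)
    right = slope(right_shoulder, right_wrist)
    if left is None or right is None:
        return False, 0
    # Assumption: the arms mirror each other, so compare slope magnitudes.
    difference = abs(left) - abs(right)
    if abs(difference) <= START_THRESHOLD:
        good_frames += 1
    else:
        good_frames = 0   # any bad frame restarts the count
    return good_frames >= START_COUNT_THRESH, good_frames
```

The sign of `difference` tells which arm is steeper, which is one way the left or right feedback text could be chosen.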
The balance is the ratio of the left wrist's vertical position to the right wrist's vertical position, minus 1, so a value of 0 means both wrists are at the same height.
def Calculate_Balance(_shaped, _balance):
    Left_w_y = 0.0
    Right_w_y = 0.0
    Left_w_y, _, Left_w_c = _shaped[9]     # Left wrist
    Right_w_y, _, Right_w_c = _shaped[10]  # Right wrist

    # Balance > 0 : left wrist under the right wrist
    # Balance < 0 : right wrist under the left wrist
    # The balance is updated only when both wrists are detected with enough confidence
    if (Left_w_y != 0) and (Right_w_y != 0) and (Left_w_c > CONFIDENCE_THRESHOLD) and (Right_w_c > CONFIDENCE_THRESHOLD):
        _balance = Left_w_y/Right_w_y - 1.0

    return _balance
Count workout
Because the detected positions are noisy, the positions of both elbows are accumulated over 10 frames, and the flags are set from the average value.
A repetition is counted when one or both elbows go down below the shoulder and then rise again.
If both elbows go down and the balance value at that time is within the allowable range, the repetition is counted as correct posture.
Otherwise, if only one elbow goes down, or both elbows go down but the balance value exceeds the allowable range, the repetition is counted as bad posture.
Finally, when the correct posture count equals the target number of repetitions, finish_Flag is turned on (finish_Flag = True).
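The counting rules above amount to a small state machine. The sketch below is one possible reading, not the tutorial's `Count_Workout`: the class name, the `BALANCE_LIMIT` value, and the use of a single shoulder height are assumptions. Image y coordinates grow downward, so "below the shoulder" means a larger y.

```python
WINDOW = 10          # frames averaged to suppress keypoint noise
BALANCE_LIMIT = 0.1  # hypothetical allowable |balance|


class RepCounter:
    def __init__(self, target):
        self.target = target
        self.good = 0
        self.bad = 0
        self.finished = False
        self._left, self._right = [], []
        self._in_rep = False       # at least one elbow has gone down
        self._both_down = False    # both elbows were down in some window
        self._unbalanced = False   # balance exceeded the limit while down

    def update(self, left_elbow_y, right_elbow_y, shoulder_y, balance):
        """Feed one frame; counts change once per completed repetition."""
        self._left.append(left_elbow_y)
        self._right.append(right_elbow_y)
        if len(self._left) < WINDOW:
            return
        left_down = sum(self._left) / WINDOW > shoulder_y
        right_down = sum(self._right) / WINDOW > shoulder_y
        self._left.clear()
        self._right.clear()

        if left_down or right_down:
            self._in_rep = True
            if left_down and right_down:
                self._both_down = True
                if abs(balance) > BALANCE_LIMIT:
                    self._unbalanced = True
        elif self._in_rep:
            # Elbows are back above the shoulder: the repetition is complete.
            if self._both_down and not self._unbalanced:
                self.good += 1
            else:
                self.bad += 1
            self._in_rep = self._both_down = self._unbalanced = False
            self.finished = self.good >= self.target
```

Classifying the repetition only when the elbows rise again means a single noisy window cannot produce two counts for one pull.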
Open the video we are using and prepare for recording
# cv.VideoCapture(0) -> notebook cam
# cv.VideoCapture(1) -> another cam connected to my notebook
# cv.VideoCapture(filename.mp4) -> Video
cap = cv.VideoCapture("DEMO.mp4")

# Recording Video Configuration
w = round(cap.get(CAP_PROP_FRAME_WIDTH))
h = round(cap.get(CAP_PROP_FRAME_HEIGHT))
fps = cap.get(CAP_PROP_FPS)
fourcc = VideoWriter_fourcc(*'DIVX')
out = VideoWriter('output.avi', fourcc, fps, (w, h))
delay = round(1000/fps)

if (cap.isOpened() == False):  # if there is no video we can open, print error
    print("Not Open the VIDEO")
Start the system with while()
Each frame of the video is processed inside a while() loop.
First, when the user wants to enter the target count again (pressing the 'r' key sets tk_Flag = True), the input is requested again at the beginning of the loop.
Then cv.getTickCount() is called to record the start time used to measure the FPS.
Finally, a frame is read from the video.
#================== While Loop =================#
while cap.isOpened():
    # When you press the 'r' button -> restart
    if tk_Flag == True:
        tk = Tk()
        tk.title('Input Exercise Count')
        tk.geometry("200x100")
        label1 = Label(tk, text='Input Count').grid(row=0, column=0)
        entry1 = Entry(tk)
        entry1.grid(row=0, column=1)
        btn = Button(tk, text='Press Count', bg='black', fg='white', command=Input_Count).grid(row=1, column=0)
        exit_button = Button(tk, text='Exit', bg='black', fg='white', command=tk.destroy).grid(row=1, column=1)
        tk.mainloop()
        ResetPram()
        tk_Flag = False

    frame_Num += 1

    # Start Window Time
    startTime = cv.getTickCount()

    # Video Read
    ret, frame = cap.read()
    if ret == False:
        print("Video End")
        break
Resize the frame, set up the input and output details, and run the model
The frame is resized to a fixed size of 1080x1080.
Since the input size of the deep learning model we use is 192x192, the frame is then resized to 192x192 before it is passed to the model (using resize_with_pad()).
The input and output tensor details are needed to pass data to the model and read the result back, so they are retrieved first.
The image is then passed to the model and the output is read.
    # Reshape image
    frame = resize(frame, dsize=(1080, 1080), interpolation=INTER_LINEAR)
    img = frame.copy()
    img = tf.image.resize_with_pad(np.expand_dims(img, axis=0), 192, 192)
    input_image = tf.cast(img, dtype=tf.float32)

    # Setup input and Output
    input_details = interpreter.get_input_details()    # receive information of input tensor
    output_details = interpreter.get_output_details()  # receive information of output tensor

    # input to model
    interpreter.set_tensor(input_details[0]['index'], np.array(input_image))
    interpreter.invoke()

    # Get output to model
    keypoints_with_scores = interpreter.get_tensor(output_details[0]['index'])
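The snippet above relies on an `interpreter` object whose creation is not shown in this section. A minimal sketch of how it could be created with the TensorFlow Lite interpreter follows. The function name `load_movenet` is hypothetical, and the model file name must be replaced with whatever the downloaded file is actually called.

```python
from pathlib import Path


def load_movenet(model_path):
    """Create a TFLite interpreter for a MoveNet model file (hypothetical helper)."""
    path = Path(model_path)
    if not path.is_file():
        # Fail early with a clear message if the model was not downloaded
        raise FileNotFoundError(f"MoveNet model not found: {path}")
    import tensorflow as tf  # imported here so the path check runs first
    interpreter = tf.lite.Interpreter(model_path=str(path))
    interpreter.allocate_tensors()  # must be called before set_tensor/invoke
    return interpreter


# Usage (the file name is a placeholder for the downloaded model):
# interpreter = load_movenet("your_movenet_model.tflite")
```

Creating the interpreter once, before the while() loop, avoids reloading the model on every frame.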
Main code
We extract the joint information we want to use (wrist, elbow, and shoulder).
The system flag is set from the arm position. In the code below, when both wrists are lower in the image than their elbows (the resting condition), the flag is turned off (system_Flag = False); otherwise it is turned on (system_Flag = True).
If the system flag is on (system_Flag == True), the finish flag is checked, and if it is off, the function that adjusts the start flag is executed.
If the start flag is turned on by that function (start_Flag == True), the balance is calculated and the exercise count starts.
Otherwise, feedback on adjusting the starting posture is shown as text.
When the number of repetitions done with the correct posture equals the target number, the finish flag is turned on (finish_Flag == True).
When the finish flag is on, the system flag is turned off and the exercise result window appears.
    # =================== START POINT =================== #
    pose_Buf = []
    pose_Buf, shaped = Get_Shape_PoseBuf(frame, keypoints_with_scores, EDGES, CONFIDENCE_THRESHOLD)

    if shaped[RIGHT_WRIST][0] > shaped[RIGHT_ELBOW][0] and shaped[LEFT_WRIST][0] > shaped[LEFT_ELBOW][0]:  # Condition when resting
        system_Flag = False
    else:
        system_Flag = True

    if system_Flag == True:
        # Get Start Flag and start_count for counting, offset_Text
        if finish_Flag == False:
            start_Flag, start_Count, offset_Text = Start_Postion_Adjustment(frame, pose_Buf, start_Count, start_Flag, offset_Text)
            if start_Flag == False:
                show_Start_text(frame, offset_Text)

        # draw skeleton
        Draw_Connecting(frame, shaped, EDGES, CONFIDENCE_THRESHOLD)

        # Start count processing
        if start_Flag == True and finish_Flag == False:
            balance_List[0] = Calculate_Balance(shaped, balance_List[0])
            finish_Flag, counting_List, balance_List, up_down_Flag, counting_List[6] = Count_Workout(frame, shaped, inputCount, finish_Flag, counting_List, balance_List, up_down_Flag)
            show_Text(frame, balance_List[0], counting_List[0], counting_List[1], finish_Flag)
        # Finish Flag
        elif finish_Flag == True:
            show_Text(frame, balance_List[0], counting_List[0], counting_List[1], finish_Flag)
            system_Flag = False

        # Press Esc to Exit, Stop Video to 's'
        k = cv.waitKey(5) & 0xFF
        if k == 27:
            break
        elif k == ord('s'):
            cv.waitKey()
        elif k == ord('r'):
            tk_Flag = True

        # Time Loop End
        endTime = cv.getTickCount()
        # FPS Calculate
        FPS = round(getTickFrequency()/(endTime - startTime))
        # FPS Text
        FPS_Text = f"FPS: {FPS}"
        putText(frame, FPS_Text, (0, 20), USER_FONT, 0.8, RED)

        cv.imshow('MoveNet Lightning', frame)
        resizeWindow('MoveNet Lightning', 1080, 1080)

    else:
        if finish_Flag == False:
            ResetPram()
        else:
            break

        # Press Esc to Exit, Stop Video to 's'
        k = cv.waitKey(5) & 0xFF
        if k == 27:
            break
        elif k == ord('s'):
            cv.waitKey()
        elif k == ord('r'):
            tk_Flag = True

        # Time Loop End
        endTime = cv.getTickCount()
        # FPS Calculate
        FPS = round(getTickFrequency()/(endTime - startTime))
        # FPS Text
        FPS_Text = f"FPS: {FPS}"
        putText(frame, FPS_Text, (0, 20), USER_FONT, 0.8, RED)

        cv.imshow('MoveNet Lightning', frame)

    # Record Video
    out.write(frame)

cap.release()
cv.destroyAllWindows()
out.release()
[Figure: bad-count examples. Left: Bad Count. Right: Bad Count.]
8. Show the result of the workout
Show the result (worst pose, best pose, good pose count, bad pose count, and total count).
# ============================================================================== #
# =============================== Final result report ========================== #
# ============================================================================== #
if finish_Flag == True:  # When finish flag is on
    # image stack
    best_image = cv.imread('BestPose.jpg')
    best_image = cv.resize(best_image, (0, 0), None, .5, .5)
    worst_image = cv.imread('WorstPose.jpg')
    worst_image = cv.resize(worst_image, (0, 0), None, .5, .5)
    pose_result = np.vstack((best_image, worst_image))
    result_paper = np.zeros_like(pose_result)
    workout_result = np.hstack((result_paper, pose_result))

    # Put text
    # Make text for result image
    TEXT_GOOD = f"Best Pose(Balance {abs(round(balance_List[1], 3))})"
    TEXT_BED = f"Worst Pose(Balance {abs(round(balance_List[2], 3))})"

    # Make text for result report
    TEXT_RESULT1 = f"======================"
    TEXT_RESULT = f"---------Result---------"
    TEXT_RESULT2 = f"======================"
    TEXT_GOOD_COUNT = f"Count about Good Pose = {counting_List[0]}"
    TEXT_BED_COUNT = f"Count about Bad Pose = {counting_List[1]}"
    TEXT_COUNT = f"Count about All Pose = {counting_List[0]+counting_List[1]}"
    TEXT_RATIO = f"Performance ratio ="

    # Parameter for position
    Size_GOOD, _ = cv.getTextSize(TEXT_GOOD, USER_FONT, fontScale, fontThickness)
    Size_BED, _ = cv.getTextSize(TEXT_BED, USER_FONT, fontScale, fontThickness)
    width = best_image.shape[1]   # best image width (shape is rows, columns, channels)
    height = best_image.shape[0]  # best image height
    x_GOOD = Size_GOOD[0]
    y_GOOD = Size_GOOD[1]
    x_BED = Size_BED[0]
    y_BED = Size_BED[1]

    # Draw rect and Put Text for image
    cv.rectangle(workout_result, (width, 0), (width*2, y_GOOD+13), WHITE, -1)
    cv.rectangle(workout_result, (width, height), (width*2, y_GOOD+height+13), WHITE, -1)
    # Best-pose label: this call is an assumption, mirroring the worst-pose label below
    cv.putText(workout_result, TEXT_GOOD, (width+int(width/2)-int(x_GOOD/2), y_GOOD+5), USER_FONT, fontScale, BLACK, fontThickness)
    cv.putText(workout_result, TEXT_BED, (width+int(width/2)-int(x_BED/2), y_GOOD+height+5), USER_FONT, fontScale, BLACK, fontThickness)

    # Put Text for result report
    cv.putText(workout_result, TEXT_RESULT1, (0, y_GOOD), USER_FONT, fontScale, RED, fontThickness)
    cv.putText(workout_result, TEXT_RESULT, (0, y_GOOD*2), USER_FONT, fontScale, RED, fontThickness)
    cv.putText(workout_result, TEXT_RESULT2, (0, y_GOOD*3), USER_FONT, fontScale, RED, fontThickness)
    cv.putText(workout_result, TEXT_COUNT, (0, y_GOOD*4), USER_FONT, fontScale, WHITE, fontThickness)
    cv.putText(workout_result, TEXT_GOOD_COUNT, (0, y_GOOD*6), USER_FONT, fontScale, WHITE, fontThickness)
    cv.putText(workout_result, TEXT_BED_COUNT, (0, y_GOOD*8), USER_FONT, fontScale, WHITE, fontThickness)

    cv.imshow('Final result of your workout', workout_result)
    waitKey()
    cv.destroyAllWindows()
V. Result
1. Adjust Correct Starting Position
Adjust
Complete
2. Exercising
Contraction
Relaxation
3. Unbalance
Left
Right
4. Show Result
VI. Evaluation
Since we used a pre-trained model, we evaluated the algorithm we implemented rather than the model itself. The evaluation is divided into two parts: adjusting the correct starting position and counting the workout.
Adjust Correct Starting Position
For evaluation, we tested "Adjust Correct Starting Position" on a different lat pull-down machine, 20 times per person and 40 times in total. Here, Positive means the correct position, and Negative means a state in which the user needs to move to the right or left. The resulting heat map is as follows, and Accuracy, Precision, and Recall are computed from it.
- Accuracy: 87.5%
- Precision: 94.1%
- Recall: 80.0%
In the results above, Recall is the lowest value, which means that a correct position is often recognized as negative (FN). In other words, the threshold value should be adjusted so that a correct position is reliably recognized as positive.
Workout Counting
Next, "Workout Counting" was tested 20 times per person for 5 sets, 200 times in total. Here, Positive means exercising in the right posture, and Negative means exercising in the wrong posture. The resulting heat map is as follows, and Accuracy, Precision, and Recall are computed from it.
- Accuracy: 94.0%
- Precision: 89.3%
- Recall: 100.0%
In the results above, Precision is the lowest value, which means that a wrong posture is often recognized as positive (FP). A mirror was present during the experiment, and recognizing the person in the mirror lowered the precision. In other words, this program should be run in an environment where nothing other than the user can be recognized as a person.
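For reference, the three metrics in both experiments follow the usual confusion-matrix definitions. The heat maps are not reproduced here, so the counts in the sketch below are hypothetical: they were chosen only because they are consistent with the trial totals (40 and 200) and with the reported percentages.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) in percent, rounded to one decimal."""
    total = tp + fp + fn + tn
    accuracy = 100.0 * (tp + tn) / total
    precision = 100.0 * tp / (tp + fp)   # of all predicted positives, how many were right
    recall = 100.0 * tp / (tp + fn)      # of all actual positives, how many were found
    return round(accuracy, 1), round(precision, 1), round(recall, 1)


# Hypothetical counts consistent with the reported figures
start_position = classification_metrics(tp=16, fp=1, fn=4, tn=19)       # 40 trials
workout_counting = classification_metrics(tp=100, fp=12, fn=0, tn=88)   # 200 trials
print(start_position)    # (87.5, 94.1, 80.0)
print(workout_counting)  # (94.0, 89.3, 100.0)
```

With these counts, the lower recall in the first experiment comes from the false negatives, and the lower precision in the second comes from the false positives, matching the discussion above.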