Evaluation Metric

ROC Curve


The AUC-ROC curve is a performance measurement for classification problems at various threshold settings. ROC is a probability curve and AUC represents the degree or measure of separability.

ROC AUC tells how well the model is able to distinguish between the two classes of a binary problem.

Receiver Operating Characteristic (ROC) Curve

  • true positive rate (recall) vs false positive rate (FPR)

  • FPR is the ratio of negative instances that are incorrectly classified as positive

Area under the curve (AUC): a perfect classifier has ROC AUC = 1, while a purely random classifier has ROC AUC = 0.5.

  • E.g. a "find the person" task, where red marks the person class and green the non-person class
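A minimal sketch of computing the ROC curve and AUC with scikit-learn; the labels and scores below are illustrative placeholders, not values from the text.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# hypothetical ground-truth binary labels and predicted scores for the positive class
y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.9, 0.45])

# TPR (recall) vs. FPR at every threshold, and the area under that curve
fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)

print("FPR:", fpr)
print("TPR:", tpr)
print("ROC AUC:", auc)   # 1.0 = perfect classifier, 0.5 = purely random
```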

Sensitivity and Specificity (COVID-19 diagnosis example)

μ•žμ„œ λ°νžŒλŒ€λ‘œ 민감도와 νŠΉμ΄λ„ 검사 λͺ¨λ‘ 이미 μŒμ„±Β·μ–‘μ„±μ„ ν™•μΈν•œ λŒ€μƒμžλ₯Ό 놓고 μƒˆλ‘œμš΄ 진단법에 λŒ€ν•œ 정확도λ₯Ό λ°νžˆλŠ” 과정이닀.

λ―Όκ°λ„λŠ” 'μ–‘μ„± ν™˜μž 쀑 검사법이 μ§„λ‹¨ν•œ μ–‘μ„± 정확도'λΌλŠ” 의미고, νŠΉμ΄λ„λŠ” '정상인 쀑 검사법이 μ§„λ‹¨ν•œ 정상 정확도'λΌλŠ” μ˜λ―Έλ‹€.

μ‹€μ œ μ–‘μ„±Β·μŒμ„±κ΅°μ„ λŒ€μƒμœΌλ‘œ 진단 μ‹œν–‰ μ‹œ 얻을 수 μžˆλŠ” κ²°κ³Ό

각 ν‘œλ³Έμ— λŒ€ν•œ 검사가 λλ‚˜λ©΄ λŒ€μƒκ΅°μ€ μœ„μ˜ λ„€κ°œλ‘œ λΆ„λ₯˜λ˜κ³  μ΄λ•Œμ˜ λ―Όκ°λ„Β·νŠΉμ΄λ„λ₯Ό κ΅¬ν•˜λŠ” 곡식은 λ‹€μŒκ³Ό κ°™λ‹€.

  1. Sensitivity = the proportion of actual positive patients that the new test correctly identifies as positive

    β‘  / (β‘  + β‘‘)

  2. Specificity = the proportion of actual healthy people that the new test correctly identifies as negative

    β‘£ / (β‘’ + β‘£)
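As a minimal sketch of the two formulas above (assuming the four counts β‘  true positive, β‘‘ false negative, β‘’ false positive, β‘£ true negative are already known; the numbers used are hypothetical):

```python
def sensitivity_specificity(tp, fn, fp, tn):
    sensitivity = tp / (tp + fn)   # ① / (① + ②): detected positives among all actual positives
    specificity = tn / (fp + tn)   # ④ / (③ + ④): detected negatives among all actual negatives
    return sensitivity, specificity

# hypothetical counts for illustration only
sens, spec = sensitivity_specificity(tp=90, fn=10, fp=5, tn=95)
print(f"Sensitivity: {sens:.2f}, Specificity: {spec:.2f}")   # 0.90, 0.95
```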

Then which of the two, sensitivity or specificity, has the greater influence on the reliability of a diagnostic method? According to experts, the two criteria must hold together, and neither one takes precedence over the other.

Commissioner Jung Eun-kyeong explained, "If the Sensitivity and the Specificity differ greatly, it cannot be regarded as a proper diagnostic method. Depending on the disease more weight may fall on one side, but both must be satisfied."

Source: Hit News (http://www.hitnews.co.kr)

Top-1, Top-5 (ImageNet, ILSVRC)

The Top-5 error rate is the percentage of test examples for which the correct class was not in the top 5 predicted classes.

If a test image is a picture of a Persian cat, and the top 5 predicted classes in order are [Pomeranian (0.4), mongoose (0.25), dingo (0.15), Persian cat (0.1), tabby cat (0.02)], then it is still treated as being 'correct' because the actual class is in the top 5 predicted classes for this test image.
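A minimal sketch of how the Top-k error could be computed, assuming `probs` is an (N, num_classes) score matrix and `labels` holds the true class indices; the values mirror the Persian-cat example but the class ordering is made up.

```python
import numpy as np

def top_k_error(probs, labels, k=5):
    # indices of the k highest-scoring classes for each example
    top_k = np.argsort(probs, axis=1)[:, -k:]
    correct = np.any(top_k == labels[:, None], axis=1)   # is the true class among the top k?
    return 1.0 - correct.mean()                          # fraction of examples that miss

# one test image whose true class (index 3, "Persian cat" here) is ranked 4th
probs = np.array([[0.40, 0.25, 0.15, 0.10, 0.02, 0.08]])
labels = np.array([3])
print(top_k_error(probs, labels, k=5))   # 0.0 -> still counted as correct for Top-5
```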

For Object Detection

IoU

We need to evaluate the performance of both (1) classification and (2) localization of objects using bounding boxes in the image.

Object Detection uses the concept of Intersection over Union (IoU). IoU computes the intersection over the union of two bounding boxes: the ground-truth bounding box and the predicted bounding box. An IoU of 1 implies that the predicted and ground-truth bounding boxes overlap perfectly.
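A minimal sketch of the IoU computation for two axis-aligned boxes in (x1, y1, x2, y2) format; the coordinates are illustrative.

```python
def iou(box_a, box_b):
    # intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)

    # union = sum of both areas minus the intersection
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))   # 25 / 175 ≈ 0.14
```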

Set a threshold value for the IoU to determine if the object detection is valid or not.

If the IoU threshold is 0.5:

  • if IoU ≥ 0.5, classify the detection as a True Positive (TP)

  • if IoU < 0.5, it is a wrong detection; classify it as a False Positive (FP)

  • when a ground-truth object is present in the image and the model fails to detect it, classify it as a False Negative (FN)

  • True Negative (TN): every part of the image where we did not predict an object. This metric is not useful for object detection, so we ignore TN.

We also need to consider the confidence score (from the classification) for each detected object. Predicted bounding boxes with a confidence above the threshold are treated as positive detections, and those below the threshold are treated as negative.

Use Precision and Recall as the metrics to evaluate the performance. Precision and Recall are calculated using true positives(TP), false positives(FP) and false negatives(FN).
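A minimal sketch of turning detections into TP/FP counts and then precision/recall, assuming the `iou` helper above, detections given as dicts with a "box" and a "score", and a single image and class; greedy matching by confidence is one common convention, not necessarily the one used by any specific benchmark.

```python
def precision_recall(detections, ground_truths, iou_threshold=0.5):
    matched = set()
    tp, fp = 0, 0
    # process detections in descending order of confidence
    for det in sorted(detections, key=lambda d: d["score"], reverse=True):
        best_iou, best_gt = 0.0, None
        for i, gt_box in enumerate(ground_truths):
            overlap = iou(det["box"], gt_box)
            if overlap > best_iou:
                best_iou, best_gt = overlap, i
        if best_iou >= iou_threshold and best_gt not in matched:
            tp += 1                 # valid detection: IoU >= threshold and GT not matched yet
            matched.add(best_gt)
        else:
            fp += 1                 # wrong or duplicate detection
    fn = len(ground_truths) - len(matched)   # ground-truth objects the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```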

mAP


The original PASCAL VOC evaluation uses 11-point interpolated Average Precision to calculate mean Average Precision (mAP).

Step 1: Compute Precision and Recall (at the chosen IoU threshold) and plot the Precision-Recall curve.

Precision in the PR graph is not always monotonically decreasing, due to certain exceptions and/or a lack of data.

Example: the whole dataset contains only 5 apples. We collect all the predictions made for apples in all the images and rank them in descending order of predicted confidence (a prediction counts as correct when IoU > 0.5).

For example, at rank #3, assume only 3 apples have been predicted so far (2 of them correct):

Precision is the proportion of TP = 2/3 = 0.67

Recall is the proportion of TP out of the possible positives = 2/5 = 0.4
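A minimal sketch of Step 1 for the 5-apple example: rank the predictions by confidence and accumulate precision/recall. The TP/FP flags beyond rank #3 are made up for illustration.

```python
import numpy as np

total_positives = 5                        # the dataset contains 5 apples in total
is_tp = np.array([1, 1, 0, 1, 0, 1, 0])    # ranked by descending confidence; 1 = IoU > 0.5 (TP)

tp_cum = np.cumsum(is_tp)
fp_cum = np.cumsum(1 - is_tp)
precision = tp_cum / (tp_cum + fp_cum)     # TP among the predictions made so far
recall = tp_cum / total_positives          # TP among all 5 apples

print(precision[2], recall[2])             # at rank #3: 2 correct of 3 -> 0.67 and 0.4
```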

Step 2: Use the 11-point interpolation technique.

Take 11 equally spaced recall levels: 0.0, 0.1, 0.2, ..., 0.9, 1.0.

Point interpolation: at each recall level, take the maximum Precision value over all points with equal or higher Recall.

Step 3: Calculate the mean Average Precision (mAP).

Average Precision (AP) is the area under the Precision-Recall curve.

AP is calculated as the average of the interpolated Precision values at the 11 recall levels, AP = (1/11) Ξ£ p_interp(r) for r ∈ {0, 0.1, …, 1.0}, and mAP is the mean of the per-class AP values.

In our example, AP = (5 Γ— 1.0 + 4 Γ— 0.57 + 2 Γ— 0.5)/11.
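A minimal sketch of the 11-point interpolated AP, assuming `recall` and `precision` arrays ordered by descending confidence (e.g. the ones built in the earlier sketch).

```python
import numpy as np

def voc_ap_11point(recall, precision):
    ap = 0.0
    for r in np.linspace(0.0, 1.0, 11):     # recall levels 0.0, 0.1, ..., 1.0
        mask = recall >= r
        # interpolated precision: maximum precision at any recall >= r (0 if none)
        p_interp = precision[mask].max() if mask.any() else 0.0
        ap += p_interp / 11.0
    return ap
```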

For 20 different classes in PASCAL VOC, we compute an AP for every class and also provide an average for those 20 AP results.

However, the 11-point interpolation is less precise, and it loses the ability to measure the difference between methods with low AP. Therefore, a different AP calculation was adopted for PASCAL VOC after 2008.

AP (Area Under Curve, AUC)

For the later PASCAL VOC competitions (VOC2010–2012), no approximation or interpolation over 11 fixed points is needed. Instead of sampling 11 points, we sample p(rα΅’) at every recall level where the precision drops and compute AP as the sum of the rectangular blocks under the curve.
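A minimal sketch of this area-under-curve style AP: build the monotone precision envelope and sum the rectangular blocks wherever recall changes; input arrays as in the sketches above.

```python
import numpy as np

def voc_ap_all_points(recall, precision):
    # pad so the curve starts at recall 0 and ends at recall 1
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    # make precision non-increasing: take the max of all later points
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # sum the rectangles at every index where recall changes
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))
```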

COCO mAP

Recent research papers tend to report results on the COCO dataset only. COCO mAP uses a 101-point interpolated AP definition in the calculation. For COCO, AP is the average over multiple IoU thresholds (the minimum IoU required to count a detection as a positive match). AP@[.5:.95] corresponds to the average AP over IoU thresholds from 0.5 to 0.95 with a step size of 0.05. For the COCO competition, AP is the average over these 10 IoU levels across 80 categories (AP@[.50:.05:.95]: from 0.5 to 0.95 with a step size of 0.05). Several other metrics are also collected for the COCO dataset.
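A minimal sketch of the COCO-style averaging over IoU thresholds; `ap_at_iou` is a hypothetical callable that would compute the (101-point interpolated) AP at one IoU threshold and is not defined here.

```python
import numpy as np

def coco_ap(ap_at_iou):
    thresholds = np.arange(0.5, 1.0, 0.05)   # 0.50, 0.55, ..., 0.95 (10 IoU levels)
    return float(np.mean([ap_at_iou(t) for t in thresholds]))
```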
