Evaluation Metric

ROC Curve

Understanding AUC, ROC curve: click here

The AUC - ROC curve is a performance measurement for classification problems at various threshold settings. ROC is a probability curve and AUC represents the degree or measure of separability.

ROC AUC tells how well the model can distinguish between the two classes.

Receiver Operating Characteristic(ROC) Curve

  • true positive rate (recall) vs false positive rate (FPR)

  • FPR is the ratio of negative instances that are incorrectly classified as positive

Area under the curve (AUC): a perfect classifier has ROC AUC = 1; a purely random classifier has ROC AUC = 0.5.

  • E.g. Find a Person. Red: Person, Green: non-person
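
As a rough illustration (not from the original article), here is a minimal sketch of computing the ROC curve and ROC AUC with scikit-learn; the labels and scores below are made-up placeholders.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Toy binary labels and predicted probabilities (placeholder values).
y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.65, 0.3])

fpr, tpr, thresholds = roc_curve(y_true, y_score)   # FPR/TPR at each threshold
auc = roc_auc_score(y_true, y_score)                # area under the ROC curve
print(f"ROC AUC = {auc:.3f}")                       # 1.0 = perfect, 0.5 = random
```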

๋ฏผ๊ฐ๋„์™€ ํŠน์ด (Covid-19 ์ง„๋‹จ์˜ˆ์‹œ)

์•ž์„œ ๋ฐํžŒ๋Œ€๋กœ ๋ฏผ๊ฐ๋„์™€ ํŠน์ด๋„ ๊ฒ€์‚ฌ ๋ชจ๋‘ ์ด๋ฏธ ์Œ์„ฑยท์–‘์„ฑ์„ ํ™•์ธํ•œ ๋Œ€์ƒ์ž๋ฅผ ๋†“๊ณ  ์ƒˆ๋กœ์šด ์ง„๋‹จ๋ฒ•์— ๋Œ€ํ•œ ์ •ํ™•๋„๋ฅผ ๋ฐํžˆ๋Š” ๊ณผ์ •์ด๋‹ค.

๋ฏผ๊ฐ๋„๋Š” '์–‘์„ฑ ํ™˜์ž ์ค‘ ๊ฒ€์‚ฌ๋ฒ•์ด ์ง„๋‹จํ•œ ์–‘์„ฑ ์ •ํ™•๋„'๋ผ๋Š” ์˜๋ฏธ๊ณ , ํŠน์ด๋„๋Š” '์ •์ƒ์ธ ์ค‘ ๊ฒ€์‚ฌ๋ฒ•์ด ์ง„๋‹จํ•œ ์ •์ƒ ์ •ํ™•๋„'๋ผ๋Š” ์˜๋ฏธ๋‹ค.

Possible outcomes when the diagnosis is performed on the actual positive and negative groups: ① actually positive and test positive (true positive), ② actually positive but test negative (false negative), ③ actually negative but test positive (false positive), ④ actually negative and test negative (true negative).

Once every sample has been tested, the subjects fall into these four categories, and the formulas for sensitivity and specificity are as follows.

  1. ๋ฏผ๊ฐ๋„ = ์ƒˆ๋กœ์šด ์ง„๋‹จ๋ฒ•์ด ํŒ๋ช…ํ•œ ํ™˜์ž ์ค‘ ์‹ค์ œ ํ™˜์ž

    โ‘  / โ‘  + โ‘ก

  2. ํŠน์ด๋„ = ์ƒˆ๋กœ์šด ์ง„๋‹จ๋ฒ•์ด ํŒ๋ช…ํ•œ ์ •์ƒ์ธ ์ค‘ ์‹ค์ œ ์ •์ƒ

    โ‘ฃ / โ‘ข + โ‘ฃ
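
A minimal sketch of these two formulas in code, assuming the four counts ①–④ (TP, FN, FP, TN) are already known; the numbers below are placeholders, not figures from the article.

```python
# Placeholder confusion-matrix counts: ① TP, ② FN, ③ FP, ④ TN.
TP, FN, FP, TN = 90, 10, 5, 95

sensitivity = TP / (TP + FN)   # ① / (① + ②): actual positives the test catches
specificity = TN / (FP + TN)   # ④ / (③ + ④): actual negatives the test clears
print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```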

Then which of the two, sensitivity or specificity, has the greater influence on the reliability of a diagnostic method? According to experts, the two criteria must be satisfied together, and neither one is more valuable than the other.

Commissioner Jung Eun-kyeong explained, "If sensitivity and specificity differ greatly, it cannot be considered a proper diagnostic method," adding, "depending on the disease, the weight may lean to one side, but both must be satisfied."

Source: Hitnews (http://www.hitnews.co.kr)

Top-1, Top-5 (ImageNet, ILSVRC)

The Top-5 error rate is the percentage of test examples for which the correct class was not in the top 5 predicted classes.

If a test image is a picture of a Persian cat, and the top 5 predicted classes in order are [Pomeranian (0.4), mongoose (0.25), dingo (0.15), Persian cat (0.1), tabby cat (0.02)], then it is still treated as being 'correct' because the actual class is in the top 5 predicted classes for this test image.
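
A minimal sketch of the Top-5 error computation with NumPy, assuming `probs` is an (N, C) array of class scores and `labels` holds the ground-truth class indices (both are placeholders).

```python
import numpy as np

def top5_error(probs: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of examples whose true class is NOT among the 5 highest-scored classes."""
    top5 = np.argsort(probs, axis=1)[:, -5:]          # indices of the 5 largest scores per row
    correct = (top5 == labels[:, None]).any(axis=1)   # True if the ground truth is among them
    return 1.0 - correct.mean()

# Toy check: 3 examples, 10 classes, random scores.
rng = np.random.default_rng(0)
print(top5_error(rng.random((3, 10)), np.array([2, 7, 5])))
```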

For Object Detection

IoU

We need to evaluate the performance of both (1) classification and (2) localization, i.e. how well the predicted bounding boxes fit the objects in the image.

Object Detection uses the concept of Intersection over Union (IoU). IoU computes the intersection over the union of two bounding boxes: the ground-truth bounding box and the predicted bounding box. An IoU of 1 implies that the predicted and the ground-truth bounding boxes overlap perfectly.
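
A minimal IoU sketch for axis-aligned boxes, assuming the (x1, y1, x2, y2) corner format (the coordinate convention is an assumption, not specified in this section).

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])                 # intersection rectangle corners
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])

    inter = max(0, x2 - x1) * max(0, y2 - y1)    # 0 when the boxes do not overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))       # 25 / 175 ≈ 0.14
```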

Set a threshold value for the IoU to determine if the object detection is valid or not.

With an IoU threshold of 0.5:

  • if IoU ≥ 0.5, classify the detection as a True Positive (TP)

  • if IoU < 0.5, it is a wrong detection; classify it as a False Positive (FP)

  • When a ground truth is present in the image and model failed to detect the object, classify it as False Negative(FN).

  • True Negative (TN): TN is every part of the image where we did not predict an object. This metric is not useful for object detection, hence we ignore TN.

We also need to consider the confidence score (classification) of each detected object. Predicted bounding boxes with a confidence score above the threshold are considered positive, and all predicted bounding boxes below the threshold are considered negative.

Use Precision and Recall as the metrics to evaluate the performance. Precision and Recall are calculated using true positives (TP), false positives (FP), and false negatives (FN).
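
For example, once detections have been sorted into TP, FP and FN at a fixed IoU threshold, precision and recall come straight from the counts (the numbers below are placeholders).

```python
# Placeholder detection counts at IoU threshold 0.5.
TP, FP, FN = 7, 3, 2

precision = TP / (TP + FP)   # of all predicted boxes, how many matched a ground truth
recall    = TP / (TP + FN)   # of all ground-truth boxes, how many were detected
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```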

mAP

What is mAP: click here

The original PASCAL VOC evaluation uses 11-point interpolated average precision to calculate mean Average Precision (mAP).

Step 1: Plot Precision and Recall from IoU

Precision in the PR graph is not always monotonically decreasing, due to certain exceptions and/or lack of data.

Example: the whole dataset contains only 5 apples. We collect all the predictions made for apples in all the images and rank them in descending order of predicted confidence (a prediction counts as correct when IoU > 0.5).

For example, at rank #3, assume only 3 apple predictions have been made so far, 2 of which are correct.

Precision is the proportion of TP = 2/3 = 0.67

Recall is the proportion of TP out of the possible positives = 2/5 = 0.4
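
A small sketch of Step 1 in code: each ranked prediction is marked TP (1) or FP (0), and cumulative precision/recall are computed down the ranking. The 0/1 sequence below is made up, but it reproduces the rank-#3 numbers above (precision 0.67, recall 0.4) with 5 apples in total.

```python
# 1 = correct apple detection (TP), 0 = wrong detection (FP), sorted by confidence.
ranked_hits = [1, 1, 0, 1, 0, 1, 0, 1, 0, 0]   # made-up ranking for illustration
total_positives = 5                             # 5 apples in the whole dataset

tp = 0
for rank, hit in enumerate(ranked_hits, start=1):
    tp += hit
    precision = tp / rank                # TPs among the top-`rank` predictions
    recall = tp / total_positives        # TPs among all ground-truth apples
    print(rank, round(precision, 2), round(recall, 2))
```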

Step 2: Use the 11-point interpolation technique.

Sample 11 equally spaced recall levels: 0.0, 0.1, 0.2, 0.3, …, 0.9, 1.0.

Point interpolation: at each recall level, take the maximum precision value over all points with equal or greater recall.
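
A minimal sketch of the 11-point interpolation, assuming `precisions` and `recalls` are the cumulative values from Step 1 in rank order.

```python
import numpy as np

def ap_11_point(precisions, recalls):
    """11-point interpolated AP: average the interpolated precision at recall 0.0, 0.1, ..., 1.0."""
    precisions = np.asarray(precisions)
    recalls = np.asarray(recalls)
    ap = 0.0
    for r in np.linspace(0.0, 1.0, 11):
        mask = recalls >= r
        # Interpolation: maximum precision among all points with recall >= r (0 if none exist).
        ap += precisions[mask].max() if mask.any() else 0.0
    return ap / 11.0
```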

Step 3: Calculate the mean Average Precision(mAP)

Average Precision (AP) is the area under the precision-recall curve.

mAP is calculated as the mean of the per-class AP values: mAP = (1/N) × Σ APᵢ over the N classes.

In our example, AP = (5 × 1.0 + 4 × 0.57 + 2 × 0.5)/11 ≈ 0.75.

For 20 different classes in PASCAL VOC, we compute an AP for every class and also provide an average for those 20 AP results.

The 11-point interpolation has two drawbacks: first, it is less precise; second, it loses the ability to measure the difference between methods with low AP. Therefore, a different AP calculation was adopted after 2008 for PASCAL VOC.

AP (Area Under Curve, AUC)

This method is used for the later PASCAL VOC competitions (VOC2010–2012).

No approximation or interpolation at 11 fixed recall points is needed. Instead of sampling 11 points, we sample p(rᵢ) whenever the precision drops and compute AP as the sum of the rectangular blocks under the curve.
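
A rough sketch of this AUC-style AP (in the spirit of the standard post-2008 VOC evaluation), assuming `precisions`/`recalls` are the cumulative values in rank order:

```python
import numpy as np

def ap_auc(precisions, recalls):
    """AP as the exact area under the (monotonically interpolated) precision-recall curve."""
    p = np.concatenate(([0.0], np.asarray(precisions, dtype=float), [0.0]))
    r = np.concatenate(([0.0], np.asarray(recalls, dtype=float), [1.0]))

    # Make the precision envelope monotonically decreasing.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])

    # Sum the rectangular blocks wherever recall increases.
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))
```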

COCO mAP

Recent research papers tend to report results on the COCO dataset only. COCO mAP uses a 101-point interpolated AP definition in its calculation. For COCO, AP is the average over multiple IoU thresholds (the minimum IoU for a detection to count as a positive match): AP@[.5:.95] is the average AP for IoU from 0.5 to 0.95 with a step size of 0.05. For the COCO competition, AP is averaged over these 10 IoU levels and over all 80 categories (AP@[.50:.05:.95]: from 0.5 to 0.95 with a step size of 0.05). Several other metrics are also reported for the COCO dataset.
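
A sketch of the COCO-style averaging, assuming a per-class, per-threshold AP routine such as `ap_auc` above is already available (`compute_ap` below is a hypothetical helper, not a real COCO API call).

```python
import numpy as np

# 10 IoU thresholds: 0.50, 0.55, ..., 0.95.
iou_thresholds = np.linspace(0.5, 0.95, 10)

def coco_map(detections, ground_truths, classes, compute_ap):
    """AP@[.50:.05:.95]: average AP over all classes and all 10 IoU thresholds."""
    aps = [compute_ap(detections, ground_truths, cls, thr)
           for thr in iou_thresholds
           for cls in classes]
    return float(np.mean(aps))
```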
