---
title: "wconf: Weighted Confusion Matrix"
author: "Alexandru Monahov"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteEncoding{UTF-8}
  %\VignetteIndexEntry{wconf: Weighted Confusion Matrix}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  markdown:
    wrap: 72
---
```{r setup, include = FALSE, echo=FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
error = TRUE,
comment = "#>"
)
```
## The wconf package
**wconf is a package that allows users to create weighted confusion
matrices and accuracy scores**
Used to improve the model selection process, the package includes
several weighting schemes which can be parameterized, as well as the
option for custom weight configurations. Furthermore, users can decide
whether they wish to positively or negatively affect the accuracy score
as a result of applying weights to the confusion matrix. "wconf"
integrates with the "caret" package, but it can also work standalone
when provided data in matrix form.
#### **About confusion matrices**
Confusion matrices are used to visualize the performance of
classification models in tabular format. A confusion matrix takes the
form of an "n x n" matrix depicting:
a) the reference category, in columns;
b) the predicted category, in rows;
c) the number of observations corresponding to each combination of
"reference - predicted" category couples, as cells of the matrix.
Visually, the simplest binary classification confusion matrix takes on
the form:
$$
A = \begin{bmatrix} TP & FP \\ FN & TN \end{bmatrix}
$$

where:
$TP$ - True Positives - the number of observations that were "positive"
and were correctly predicted as being "positive"
$TN$ - True Negatives - the number of originally "negative" observations
that were correctly predicted by the model as being "negative".
$FP$ - False Positives - also called "Type 1 Error" - represents
observations that are in fact "negative", but were incorrectly
classified by the model as being "positive".
$FN$ - False Negatives - also called "Type 2 Error" - represents
observations that are in fact "positive", but were incorrectly
classified by the model as being "negative".
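In base R, a matrix with this layout can be obtained from `table()` by passing the predicted labels first and the reference labels second. A minimal sketch with made-up labels (the `predicted` and `reference` vectors are hypothetical toy data):

```r
# Hypothetical toy labels, for illustration only
reference <- factor(c("pos", "pos", "pos", "neg", "neg", "neg", "neg"),
                    levels = c("pos", "neg"))
predicted <- factor(c("pos", "pos", "neg", "neg", "neg", "pos", "neg"),
                    levels = c("pos", "neg"))

# Predicted categories in rows, reference categories in columns:
# cm["pos", "pos"] = TP, cm["pos", "neg"] = FP,
# cm["neg", "pos"] = FN, cm["neg", "neg"] = TN
cm <- table(predicted = predicted, reference = reference)
print(cm)
```

Here TP = 2, FP = 1, FN = 1 and TN = 3.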
The traditional accuracy metric is computed by adding the true positives
and true negatives and dividing the sum by the total number of
observations, $N = TP + TN + FP + FN$:

$$
ACC = \frac{TP + TN}{N}
$$
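A minimal base-R sketch of this formula, using made-up counts arranged as in the matrix above (TP = 40, FP = 10, FN = 5, TN = 45):

```r
# Toy binary confusion matrix in the [TP FP; FN TN] layout
cm <- matrix(c(40, 10,
                5, 45),
             nrow = 2, byrow = TRUE,
             dimnames = list(predicted = c("pos", "neg"),
                             reference = c("pos", "neg")))

# Standard accuracy: (TP + TN) / N, i.e. the diagonal over the grand total
acc <- sum(diag(cm)) / sum(cm)
print(acc)
```

With these counts, the accuracy is (40 + 45) / 100 = 0.85.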
A weighted confusion matrix attributes a weight to each cell of the
confusion matrix based on the distance between the predicted category
and the correct (reference) category. This is important for
multi-category classification problems (three or more categories) in
which the distance from the correct category matters, for example when
the categories are ordered.
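The weighted confusion matrix for the simple binary classification case
takes the form:

$$
A_w = \begin{bmatrix} w_1 \cdot TP & w_2 \cdot FP \\ w_2 \cdot FN & w_1 \cdot TN \end{bmatrix}
$$

where $w_1$ is the weight applied to correctly classified observations
and $w_2$ the weight applied to misclassified ones.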
In the case of the weighted confusion matrix, a weighted accuracy score
can be calculated by summing all of the elements of the weighted matrix
and dividing the result by the total number of observations, $N$:

$$
ACC_w = \frac{w_1 \cdot TP + w_2 \cdot FP + w_2 \cdot FN + w_1 \cdot TN}{N}
$$
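The same toy counts can be used to sketch the weighted score in base R. The weights `w1 = 1` and `w2 = 0.5` are illustrative choices, not package defaults; the denominator remains the unweighted number of observations, N:

```r
# Same toy counts, in the [TP FP; FN TN] layout
cm <- matrix(c(40, 10,
                5, 45), nrow = 2, byrow = TRUE)

# Illustrative weights: w1 for correct cells, w2 for misclassified cells
w1 <- 1
w2 <- 0.5
w <- matrix(c(w1, w2,
              w2, w1), nrow = 2, byrow = TRUE)

weighted_cm <- w * cm                       # element-by-element product
weighted_acc <- sum(weighted_cm) / sum(cm)  # divide by the unweighted N
print(weighted_acc)
```

This gives (40 + 5 + 2.5 + 45) / 100 = 0.925, compared with a standard accuracy of (40 + 45) / 100 = 0.85 for the same counts.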
#### **References**
For more details on the method, see the paper:
Monahov, A. (2023). Improved Accuracy Metrics for Classification with
Imbalanced Data and Where Distance from the Truth Matters, with the
Wconf R Package, Computing Methodology eJournal, SSRN.
## Functions
#### **weightmatrix - configure and visualize a weight matrix**
This function compiles a weight matrix according to one of several
weighting schemas and allows users to visualize the impact of the weight
matrix on each element of the confusion matrix.
In R, simply call the function:
``` r
weightmatrix(n, weight.type = "arithmetic", weight.penalty = FALSE, standard.deviation = 2, geometric.multiplier = 2, interval.high=1, interval.low = -1, custom.weights = NA, plot.weights = FALSE)
```
The function takes as input:
*n* -- the number of classes contained in the confusion matrix.
*weight.type* -- the weighting scheme to be used. Can be one of:

- "arithmetic" - a decreasing arithmetic progression weighting scheme;
- "geometric" - a decreasing geometric progression weighting scheme;
- "normal" - weights drawn from the right tail of a normal distribution;
- "interval" - weights contained on a user-defined interval;
- "custom" - custom weight vector defined by the user.
*weight.penalty* -- determines whether the weights associated with
non-diagonal elements generated by the "normal", "arithmetic" and
"geometric" weight types are positive or negative values. By default,
the value is set to FALSE, which means that generated weights will be
positive values.
*standard.deviation* -- standard deviation of the normal distribution,
if the normal distribution weighting schema is used.
*geometric.multiplier* -- the multiplier used to construct the geometric
progression series, if the geometric progression weighting scheme is
used.
*interval.high* -- the upper bound of the weight interval, if the
interval weighting scheme is used.
*interval.low* -- the lower bound of the weight interval, if the
interval weighting scheme is used.
*custom.weights* -- the vector of custom weights to be applied, if the
custom weighting scheme is selected. The length of the vector should
equal "n", but it can be longer, in which case the excess values are
ignored.

*plot.weights* -- optional setting to enable plotting of the weight
vector, which corresponds to the first column of the weight matrix.
The function outputs a matrix:

| Output | Description              |
|--------|--------------------------|
| w      | the n x n weight matrix. |
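The weight matrix can be thought of as assigning each cell a weight that depends only on its distance |i - j| from the diagonal. A minimal sketch of that idea for a user-supplied vector, assuming that element k + 1 of the vector is the weight for cells k categories away from the diagonal (this mirrors how custom weight vectors are used in the examples of this vignette, but the helper `weights_to_matrix()` is hypothetical and not part of wconf):

```r
# Hypothetical helper (not part of wconf): build an n x n weight matrix in
# which cell (i, j) receives the weight assigned to the distance |i - j|
weights_to_matrix <- function(weights, n) {
  stopifnot(length(weights) >= n)            # excess values are ignored
  distance <- abs(outer(seq_len(n), seq_len(n), "-"))
  matrix(weights[as.vector(distance) + 1], nrow = n, ncol = n)
}

w <- weights_to_matrix(c(1, 0.5, 0.1, 0), n = 4)
print(w)
```

The first column of `w` is the weight vector itself, which is consistent with the description of `plot.weights` above.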
#### **wconfusionmatrix - compute a weighted confusion matrix**
This function calculates the weighted confusion matrix by multiplying,
element-by-element, a weight matrix with a supplied confusion matrix
object.
In R, simply call the function:
``` r
wconfusionmatrix(m, weight.type = "arithmetic", weight.penalty = FALSE, standard.deviation = 2, geometric.multiplier = 2, interval.high=1, interval.low = -1, custom.weights = NA, print.weighted.accuracy = FALSE)
```
The function takes as input:
*m* -- the caret confusion matrix object or simple matrix.
*weight.type* -- the weighting scheme to be used. Can be one of:

- "arithmetic" - a decreasing arithmetic progression weighting scheme;
- "geometric" - a decreasing geometric progression weighting scheme;
- "normal" - weights drawn from the right tail of a normal distribution;
- "interval" - weights contained on a user-defined interval;
- "custom" - custom weight vector defined by the user.
*weight.penalty* -- determines whether the weights associated with
non-diagonal elements generated by the "normal", "arithmetic" and
"geometric" weight types are positive or negative values. By default,
the value is set to FALSE, which means that generated weights will be
positive values.
*standard.deviation* -- standard deviation of the normal distribution,
if the normal distribution weighting schema is used.
*geometric.multiplier* -- the multiplier used to construct the geometric
progression series, if the geometric progression weighting scheme is
used.
*interval.high* -- the upper bound of the weight interval, if the
interval weighting scheme is used.
*interval.low* -- the lower bound of the weight interval, if the
interval weighting scheme is used.
*custom.weights* -- the vector of custom weights to be applied, if the
custom weighting scheme is selected. The length of the vector should
equal "n", but it can be longer, in which case the excess values are
ignored.
*print.weighted.accuracy* -- optional setting to print the weighted
accuracy metric, which represents the sum of all weighted confusion
matrix cells divided by the total number of observations.
The function outputs a matrix:

| Output | Description                          |
|--------|--------------------------------------|
| w_m    | the n x n weighted confusion matrix. |
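Conceptually, the computation is an element-by-element product followed by a ratio. A minimal base-R sketch with a toy 3-class matrix and an illustrative weight matrix (these weights are chosen by hand and are not claimed to be the output of `weightmatrix()`):

```r
# Toy 3-class confusion matrix: predicted in rows, reference in columns
m <- matrix(c(30,  5,  1,
               4, 25,  6,
               0,  3, 26), nrow = 3, byrow = TRUE)

# Illustrative weights (chosen by hand): 1 on the diagonal,
# 0.5 one category away, 0 two categories away
w <- matrix(c(1,   0.5, 0,
              0.5, 1,   0.5,
              0,   0.5, 1), nrow = 3, byrow = TRUE)

w_m <- w * m                               # weighted confusion matrix
weighted_accuracy <- sum(w_m) / sum(m)     # weighted cells over total N
print(w_m)
print(weighted_accuracy)
```

The diagonal contributes 81, the four cells one category away contribute 0.5 x 18 = 9, and the two corner cells contribute nothing, so the weighted accuracy is 90 / 100 = 0.9, against a standard accuracy of 0.81.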
#### **rconfusionmatrix - compute a redistributed confusion matrix**
This function calculates the redistributed confusion matrix from a caret
confusionMatrix object or a simple matrix and optionally prints the
redistributed standard accuracy score. The redistributed confusion
matrix places greater significance on observations close to the
diagonal by applying a custom weighting scheme which transfers a
proportion of the non-diagonal observations to the diagonal, while
leaving the total number of observations unchanged.
In R, simply call the function:
``` r
rconfusionmatrix(m, custom.weights = c(0, 0.25, 0.1), print.weighted.accuracy = FALSE)
```
The function takes as input:
*m* -- the caret confusion matrix object or simple matrix.
*custom.weights* -- the vector of custom weights to be applied. Its
length should equal "n", but it can be longer, in which case the excess
values are ignored. The first element is also ignored, because it
represents the weighting applied to the diagonal: since redistribution
shifts a proportion of the non-diagonal observations towards the
diagonal, the weighting of the diagonal depends on the weights assigned
to the non-diagonal elements and is therefore not configurable by the
user.
*print.weighted.accuracy* -- optional setting to print the standard
redistributed accuracy metric, which represents the sum of all observations on
the diagonal divided by the total number of observations.
The function outputs a matrix:

| Output | Description                               |
|--------|-------------------------------------------|
| w_m    | the n x n redistributed confusion matrix. |
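A minimal sketch of the redistribution idea, under two assumptions that the description above does not pin down: an off-diagonal cell at distance d from the diagonal gives up the share `weights[d + 1]` of its observations, and those observations are moved to the diagonal cell of the same column (that is, to the correct reference category). The helper `redistribute()` is hypothetical and is not wconf's implementation:

```r
# Hypothetical helper (not wconf's implementation): move the share
# weights[d + 1] of each off-diagonal count at distance d = |i - j|
# to the diagonal cell of the same column, i.e. the reference category
redistribute <- function(m, weights) {
  n <- nrow(m)
  stopifnot(n == ncol(m), length(weights) >= n)
  r <- m
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (i != j) {
        moved <- m[i, j] * weights[abs(i - j) + 1]
        r[i, j] <- r[i, j] - moved
        r[j, j] <- r[j, j] + moved
      }
    }
  }
  r
}

m <- matrix(c(30,  5,  1,
               4, 25,  6,
               0,  3, 26), nrow = 3, byrow = TRUE)
r <- redistribute(m, c(0, 0.25, 0.1))   # first element (diagonal) is unused
print(r)
all.equal(sum(r), sum(m))                # total number of observations is kept
sum(diag(r)) / sum(r)                    # redistributed accuracy
```

Under these assumptions the redistributed accuracy is (81 + 0.25 x 18 + 0.1 x 1) / 100 = 0.856, the same value a weighted matrix with weights 1, 0.25 and 0.1 would give, while every column total is preserved.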
#### **balancedaccuracy - calculate accuracy scores for imbalanced data**
This function calculates classification accuracy scores using the
sine-based formulas proposed by Starovoitov and Golub (2020). The method
aims to improve on the standard balanced accuracy function by taking
into account the class distribution of errors, which makes it useful
when confronted with imbalanced data.
In R, simply call the function:
``` r
balancedaccuracy(m, print.scores = TRUE)
```
The function takes as input:
*m* -- the caret confusion matrix object or simple matrix.
*print.scores* -- used to display the accuracy scores when set to TRUE.
The function outputs a list of objects:

| Output     | Description       |
|------------|-------------------|
| ACCmetrics | accuracy metrics. |
## Examples
#### **Producing a weighted confusion matrix in conjunction with the caret package**
This example demonstrates the use of the wconf package on the iris
dataset included in R.
To load the wconf package, run the command:
```{r}
library(wconf)
```
We will attempt the more difficult task of predicting petal length from
sepal width. In addition, for this task, we are only given categorical
information about the length of the petals, specifically that they are:
- "Short (length from 1 cm to under 3 cm)"
- "Medium (length from 3 cm to under 5 cm)"
- "Long (length from 5 cm to under 7 cm)".
Numeric data is available for the sepal width.
Using caret, we train a multinomial logistic regression model that
predicts the petal length category from the numeric sepal width. We run
10-fold cross-validation, repeated 3 times, to obtain a less optimistic
estimate of out-of-sample performance and to select the model's tuning
parameter (for caret's "multinom" method, the weight decay).
Finally, we extract the confusion matrix. We wish to weight the
confusion matrix to reflect a preference for observations fitted closer
to the correct value: observations that are incorrectly classified, but
fall close to the correct category, should receive some degree of
positive value. Since our categories are equally spaced, we can use an
arithmetic weighting scheme.
Let's first visualize what this weighting schema would look like:
```{r}
# View the weight matrix and plot for a 3-category classification problem, using the arithmetic sequence option.
weightmatrix(3, weight.type = "arithmetic", plot.weights = TRUE)
```
To obtain the weighted confusion matrix, we run the "wconfusionmatrix"
command, provide it with the confusion matrix object generated by caret
and a weighting scheme, and, optionally, parameterize the scheme to suit
our objectives. The "wconfusionmatrix" function automatically determines
the dimensions of the weight matrix, so the user need only specify the
parameters associated with their weighting scheme of choice.

The following block of code produces the weighted confusion matrix to
our specifications.
```{r}
# Load libraries and perform transformations
library(caret)
data(iris)
iris$Petal.Length.Cat = cut(iris$Petal.Length, breaks=c(1, 3, 5, 7), right = FALSE)
# Train multinomial logistic regression model using caret
set.seed(1)
control <- trainControl(method="repeatedcv", number=10, repeats=3)
model <- train(Petal.Length.Cat ~ Sepal.Width, data=iris, method="multinom", trace = FALSE, trControl=control)
# Extract original data, predicted values and place them in a table
y = iris$Petal.Length.Cat
yhat = predict(model)
preds = table(data=yhat, reference=y)
# Construct the confusion matrix
confmat = confusionMatrix(preds)
# Compute the weighted confusion matrix and display the weighted accuracy score
wconfusionmatrix(confmat, weight.type = "arithmetic", print.weighted.accuracy = TRUE)
```
#### **Producing a redistributed confusion matrix from an existing confusion matrix**
A model was run to predict the performance of students with grades classified into
four buckets: 1 - poor, 2 - average, 3 - good, 4 - excellent.
```{r echo=FALSE}
mtx = t(matrix(
c(20, 0, 2, 1,
0, 34, 23, 7,
0, 0, 5, 3,
0, 0, 5, 1),
nrow = 4))
mtx
```
We notice that while the model gets it right for the first two grade categories
(poor and average), it does a worse job of correctly classifying students with higher
grades. However, upon more careful inspection, it seems that the model typically isn't very far off from the correct category - i.e. it is likely to classify good students as
excellent or average (neighboring grade categories), but not poor (far away category).
As such, in composing our accuracy metric, we could stand to benefit from allowing
observations classified into neighboring categories to produce a positive impact on
the accuracy metric.
We could construct a weighted confusion matrix to account for our preference. However,
if we also wish to use alternative accuracy measures such as the SinACC or BalACC
indicators, and compare them with the traditional accuracy metric, our weighting
scheme should not change the total number of observations, as measured by the sum of
the elements of the confusion matrix. To accommodate this, the newly developed
"rconfusionmatrix" function redistributes a proportion of the observations from nearby
categories to the correctly classified category, according to a user-specified
weighting scheme. This achieves an effect similar to the weighted confusion matrix,
but with the added benefit of keeping the total number of observations intact.
```{r}
rmtx = rconfusionmatrix(mtx, custom.weights = c(0, 0.5, 0.1, 0), print.weighted.accuracy = TRUE)
rmtx
```
This configuration redistributes to the diagonal 50% of the observations classified in a
category immediately neighboring the correct one, and 10% of the observations classified
two categories away from the true one. Observations classified three categories away
(the last weight, 0) are not redistributed.
The diagonal is weighted with zero to indicate that we are not removing any proportion
from this category. However, any value written here will be ignored, as the algorithm of
the function redistributes non-diagonal elements to the diagonal.
A notable aspect to consider is that the same accuracy score can be achieved with a
weighted matrix that applies a weight of 1 to the diagonal and the same weights to the
off-diagonal cells. However, in this case, the sum of the cells of the weighted matrix
differs from the total number of observations in the initial unweighted matrix.
```{r}
wmtx = wconfusionmatrix(mtx, weight.type = "custom", custom.weights = c(1, 0.5, 0.1, 0), print.weighted.accuracy = TRUE)
wmtx
```
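Assuming that redistribution moves the share $w_d$ of every off-diagonal count located $d$ categories from the diagonal onto the diagonal, the equivalence can be checked by hand. In the student matrix, the diagonal holds 60 of the 101 observations, and the off-diagonal counts at distances 1, 2 and 3 total 31, 9 and 1, so both approaches give:

$$
\frac{60 + 0.5 \cdot 31 + 0.1 \cdot 9 + 0 \cdot 1}{101} = \frac{76.4}{101} \approx 0.756
$$

compared with an unweighted accuracy of $60/101 \approx 0.594$. The redistributed matrix still sums to 101 observations, whereas the cells of the weighted matrix sum to 76.4.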
To calculate the extended SinACC and BalACC metrics, run the "balancedaccuracy" command
on the redistributed confusion matrix:
```{r}
balancedaccuracy(rmtx)
```
#### **Generating accuracy metrics for imbalanced data**
Let us now undertake an analysis of the classification performance of a
model on imbalanced data. To do so, we will make use of the
"balancedaccuracy" function.
Consider the following example of loans classified into different
categories of Loan-To-Value (LTV), an indicator which tells a bank
whether a loan has enough collateral to cover the loss if the client
defaults. Lower values of the indicator denote safer loans.
A bank's risk department has come up with a model that classifies loans
into one of four categories, depending on the LTV band of the loan. The
results are presented below:
```{r echo=FALSE}
mtx = t(matrix(
c(50, 0, 118, 5,
0, 1, 45, 27,
0, 84, 22, 1,
0, 22, 57, 4),
nrow = 4))
mtx
```
The classification categories can be interpreted in the following
manner:\
cat. 1 - loans with LTVs between 40% and 60%\
cat. 2 - loans with LTVs between 60% and 80%\
cat. 3 - loans with LTVs between 80% and 100%\
cat. 4 - loans with LTVs between 100% and 120%
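Let's look at the confusion matrix to get an idea of how well the
model performs. Recall that the columns hold the actual (reference) LTV
category of each loan and the rows hold the predicted category.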
For category 1 (safest loans, with an LTV ratio of 40%-60%), the model
correctly predicts all 50 loans that were issued in this category.

For category 2, only 1 of the 107 loans that were issued with an LTV
ratio of 60%-80% was correctly predicted.

The performance for category 3 is also poor: only 22 of the 242 loans
issued within this bucket were predicted correctly.

For category 4 loans (highest risk, with LTVs above 100%), only 4 of the
37 loans belonging to this class were predicted correctly.

Overall, we conclude that this model is very bad at predicting 3 out of
4 loan categories (categories 2 - 4), so we would want an accuracy
metric to assign it a low score.

Let's calculate the accuracy metrics of this model using the
"balancedaccuracy" function.
```{r}
balancedaccuracy(mtx)
```
Let's analyze the scores:

- SinACC - the Starovoitov-Golub Sine-Accuracy Function
- BalACC - the Balanced Accuracy Function
- ACC - the standard Accuracy Function
``` r
SinACC = 0.2557172 BalACC = 0.3020907 ACC = 0.1766055
```
For the SinACC and BalACC functions, we can also extract the
per-category accuracy metrics, which show us how well each category was
predicted.
``` r
Class accuracy metrics:
SinAcc
[,1] [,2] [,3] [,4]
1 6.63064e-05 0.01237203 0.01043053
BalAcc
[,1] [,2] [,3] [,4]
1 0.009345794 0.09090909 0.1081081
```
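The BalAcc values above are consistent with a simple reading: each class score is the share of the class's observations (its column total) that were predicted correctly, i.e. the per-class recall, and BalACC is the unweighted mean of these scores. The base-R sketch below reproduces the reported BalACC and ACC values under that reading; it is an inference from the printed numbers, not wconf's source code, and it does not attempt the SinACC formula.

```r
# Loan confusion matrix from the example: predicted in rows, reference in columns
mtx <- t(matrix(c(50,  0, 118,  5,
                   0,  1,  45, 27,
                   0, 84,  22,  1,
                   0, 22,  57,  4), nrow = 4))

class_balacc <- diag(mtx) / colSums(mtx)  # per-class share predicted correctly
balacc <- mean(class_balacc)              # unweighted mean over classes
acc <- sum(diag(mtx)) / sum(mtx)          # standard accuracy

print(round(class_balacc, 7))
print(round(c(BalACC = balacc, ACC = acc), 7))
```

Replacing 4 with 70, or 50 with 5000, in `mtx` reproduces the BalACC and ACC values of the two variants discussed next.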
We notice that, as all observations belonging to the first category were
correctly predicted as being in the first category, both the SinACC and
BalACC functions give it a score of 1 (or 100% correctly predicted).
For the other categories, SinACC penalizes incorrect predictions more
heavily than BalACC. As a consequence, the SinACC and BalACC
per-category scores will only be close to each other when the number of
correctly predicted cases significantly exceeds that of the incorrectly
predicted cases.

To exemplify this, consider the following case where, for the last
class, the number of correctly predicted observations has been set to
more than double the number of incorrectly predicted observations,
i.e. mtx[4,4] = 70.
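Reading the per-class BalAcc score as the share of a class's observations (its column total) that were predicted correctly, the numbers for the fourth class work out as follows: the misclassified loans in column 4 total $5 + 27 + 1 = 33$, so a correct count of 70 is more than $2 \cdot 33 = 66$, and

$$
BalAcc_4 = \frac{70}{70 + 33} = \frac{70}{103} \approx 0.6796
$$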
```{r}
mtx = t(matrix(
c(50, 0, 118, 5,
0, 1, 45, 27,
0, 84, 22, 1,
0, 22, 57, 70),
nrow = 4))
balancedaccuracy(mtx)
```
In this case:
``` r
SinACC = 0.411762 BalACC = 0.4449666 ACC = 0.2848606
SinAcc
[,1] [,2] [,3] [,4]
1 6.63064e-05 0.01237203 0.6346096
BalAcc
[,1] [,2] [,3] [,4]
1 0.009345794 0.09090909 0.6796117
```
The accuracy metrics for the 4th category for SinACC and BalACC are
relatively close to each other:
``` r
SinACC[,4] = 0.6346096 BalACC[,4] = 0.6796117
```
Notice, however, that both the SinACC and BalACC scores are invariant to
the distance of the predicted value from the correct category. If there
is value in assigning some positive weight to predictions classified in
the vicinity of the correct category or, conversely, applying a
supplementary penalty to predictions situated far away from the correct
category, then you should consider first redistributing observations in
the confusion matrix using the "rconfusionmatrix" function, and then
applying the "balancedaccuracy" function to the redistributed matrix.
Finally, let's consider the case when there is a disproportionately
large number of correctly classified observations in one of the
categories. We assume the following confusion matrix, in which mtx[1,1]
was changed to 5000:
```{r echo=FALSE}
mtx = t(matrix(
c(5000, 0, 118, 5,
0, 1, 45, 27,
0, 84, 22, 1,
0, 22, 57, 4),
nrow = 4))
```
When running the accuracy metrics, we obtain the following results.
```{r}
balancedaccuracy(mtx)
```
The standard accuracy score improves dramatically, given that it only
considers the total number of correctly classified observations. The
SinACC and BalACC scores, however, are unaffected. This is because, just
as in the initial case, the model correctly predicts 100% of the loans
that actually belong to the first category, so increasing the size of
that category does not change its per-category score.
``` r
SinACC = 0.2557172 BalACC = 0.3020907 ACC = 0.9333457
```
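The jump in ACC follows directly from the counts: the diagonal now holds $5000 + 1 + 22 + 4 = 5027$ of the $5000 + 107 + 242 + 37 = 5386$ observations, so

$$
ACC = \frac{5027}{5386} \approx 0.9333
$$

whereas the first class's per-class score remains $5000/5000 = 1$, exactly as it was with 50 loans.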
The SinACC score remains more conservative than BalACC, and the
difference between the two is unchanged.
## About the author
The wconf: Weighted Confusion Matrix package was programmed by Dr.
Alexandru Monahov.
Alexandru Monahov holds a PhD in Economics from the University Cote
d'Azur (Nice, France) and a Professional Certificate in Advanced Risk
Management from the New York Institute of Finance (New York, United
States). His Master's Degree in International Economics and Finance and
his Bachelor's Degree in Economics and Business Administration were
completed at the University of Nice (Nice, France).
His professional activity includes working for the Bank of England as a
Research Economist and as Expert Consultant at the National Bank of
Moldova, within the Financial Stability Division. Alexandru also
provides training for professionals in finance from Central Banks and
Ministries of Finance at the Center of Excellence in Finance (Ljubljana,
Slovenia) and the Centre for Central Banking Studies (London, UK).
Previously, he worked as assistant and, subsequently, associate
professor at the University of Nice and IAE in France, where he taught
Finance, Economics, Econometrics and Business Administration. He
developed training and professional education curricula for the Chambers
of Commerce and Industry and directed several continuing education
programs.
Dr. Monahov was awarded funding for continuing professional education by
the World Bank through the Reserve Advisory & Management Partnership
Program, a PhD scholarship by the Doctoral School of Nice and a
scholarship of the French Government.
Copyright Alexandru Monahov, 2024.
You may use, modify and redistribute this code, provided that you give credit to the author and make any derivative work available to the public for free.