quantificationlib.metrics.multiclass module¶
Score functions and loss functions for multiclass quantification problems
- bray_curtis(p_true, p_pred)[source]¶
Bray-Curtis dissimilarity
\(bcd = \sum_{j=1}^{j=l} | p_j - \hat{p}_j | / \sum_{j=1}^{j=l} (p_j + \hat{p}_j)\)
where l is the number of classes
- Parameters:
p_true (array_like, shape=(n_classes)) – True prevalences. In case of binary quantification, this parameter could be a single float value.
p_pred (array_like, shape=(n_classes)) – Predicted prevalences. In case of binary quantification, this parameter could be a single float value.
- Returns:
BCD – The Bray-Curtis dissimilarity
- Return type:
float
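Assuming both arguments are prevalence vectors, the formula can be sketched directly in NumPy (a hypothetical re-implementation for illustration, not the library's own code):

```python
import numpy as np

def bray_curtis_sketch(p_true, p_pred):
    # bcd = sum_j |p_j - p̂_j| / sum_j (p_j + p̂_j)
    p_true = np.asarray(p_true, dtype=float)
    p_pred = np.asarray(p_pred, dtype=float)
    return float(np.abs(p_true - p_pred).sum() / (p_true + p_pred).sum())

bcd = bray_curtis_sketch([0.5, 0.3, 0.2], [0.4, 0.4, 0.2])  # ≈ 0.1
```

When both vectors are proper prevalences (each sums to 1), the denominator is always 2.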
- brier_multi(p_true, p_pred)[source]¶
Brier score (classification) for multiclass problems
\[bsm = 1/n \sum_{i=1}^{i=n} \sum_{j=1}^{j=l} (\hat{p}_{ij} - p_{ij})^2\]
where l is the number of classes and n the number of predictions
- Parameters:
p_true (array_like, shape=(n_samples, n_classes)) – True class memberships (e.g. one-hot encoded) for each example, consistent with the per-prediction sum in the formula above.
p_pred (array_like, shape=(n_samples, n_classes)) – Predicted class probabilities for each example.
- Returns:
BSM – The Brier score for multiclass problems
- Return type:
float
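Following the formula (an average over n predictions of the per-example squared error across l classes), a minimal NumPy sketch could look like this; the shapes assumed here are (n_samples, n_classes) matrices:

```python
import numpy as np

def brier_multi_sketch(p_true, p_pred):
    # bsm = 1/n * sum_i sum_j (p̂_ij - p_ij)^2
    p_true = np.asarray(p_true, dtype=float)
    p_pred = np.asarray(p_pred, dtype=float)
    # sum over classes (axis=-1), then average over the n predictions
    return float(np.mean(np.sum((p_pred - p_true) ** 2, axis=-1)))
```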
- check_prevalences(p_true, p_pred)[source]¶
Check that p_true and p_pred are valid and consistent
- Parameters:
p_true (array_like, shape = (n_classes)) – True prevalences
p_pred (array_like, shape = (n_classes)) – Predicted prevalences
- Returns:
p_true (array-like of shape = (n_classes, 1)) – The converted and validated p_true array
p_pred (array-like of shape = (n_classes, 1)) – The converted and validated p_pred array
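The documented behavior (conversion to (n_classes, 1) column vectors plus a consistency check) might be sketched as follows; the exact validation the library performs may be more thorough:

```python
import numpy as np

def check_prevalences_sketch(p_true, p_pred):
    # Convert both inputs to float column vectors of shape (n_classes, 1)
    p_true = np.asarray(p_true, dtype=float).reshape(-1, 1)
    p_pred = np.asarray(p_pred, dtype=float).reshape(-1, 1)
    # Both vectors must describe the same number of classes
    if p_true.shape != p_pred.shape:
        raise ValueError("p_true and p_pred have inconsistent lengths")
    return p_true, p_pred

t, p = check_prevalences_sketch([0.5, 0.5], [0.4, 0.6])  # both shape (2, 1)
```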
- geometric_mean(y_true, y_pred, correction=0.0)[source]¶
Compute the geometric mean.
In quantification, the geometric mean is useful for training a classifier on imbalanced problems (a quite common issue). The geometric mean tries to maximize the accuracy for all classes at once: the per-class accuracies must be balanced to obtain a good score. It is computed as the root of the product of the per-class sensitivities.
The optimal value is 1 and the worst is 0 (which occurs when the accuracy for some class is 0). To deal with this worst case in highly multiclass problems, the sensitivity of unrecognized classes can be corrected to a given value instead of zero; see the correction parameter.
The implementation given here is a simplification of the one provided in the imbalanced-learn library.
- Parameters:
y_true (ndarray, shape (n_examples,)) – True class for each example
y_pred (array, shape (n_examples,)) – Predicted class returned by a classifier
correction (float, default=0.0) – Replaces the zero sensitivity of unrecognized classes with this value.
- Returns:
g_mean – The geometric mean
- Return type:
float
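The computation described above can be sketched as follows (a simplified, hypothetical re-implementation; the library's version builds on imbalanced-learn's):

```python
import numpy as np

def geometric_mean_sketch(y_true, y_pred, correction=0.0):
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # Per-class sensitivity (recall); classes the classifier never
    # recognizes get `correction` instead of 0
    recalls = []
    for c in np.unique(y_true):
        r = np.mean(y_pred[y_true == c] == c)
        recalls.append(r if r > 0 else correction)
    # Geometric mean: l-th root of the product of the l sensitivities
    return float(np.prod(recalls) ** (1.0 / len(recalls)))
```

With `correction=0.0` (the default), a single unrecognized class drives the whole score to 0, which is exactly the worst case the correction parameter is meant to soften.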
- hd(p_true, p_pred)[source]¶
Hellinger distance (HD)
\(hd = \sqrt{\sum_{j=1}^{j=l} (\sqrt{p_j} - \sqrt{\hat{p}_j})^2}\)
where l is the number of classes
- Parameters:
p_true (array_like, shape = (n_classes)) – True prevalences
p_pred (array_like, shape = (n_classes)) – Predicted prevalences
- Returns:
HD – The Hellinger distance
- Return type:
float
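A NumPy sketch of the distance as defined above (note that some references add a \(1/\sqrt{2}\) normalization factor; the formula here does not):

```python
import numpy as np

def hd_sketch(p_true, p_pred):
    # hd = sqrt( sum_j (sqrt(p_j) - sqrt(p̂_j))^2 )
    p_true = np.asarray(p_true, dtype=float)
    p_pred = np.asarray(p_pred, dtype=float)
    return float(np.sqrt(np.sum((np.sqrt(p_true) - np.sqrt(p_pred)) ** 2)))
```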
- jensenshannon(p_true, p_pred, epsilon=1e-20)[source]¶
Jensen-Shannon divergence (a=1/2)
\[jsd = 1/2 \sum_{j=1}^{j=l} p_j \cdot \log{(p_j + \epsilon)} + \hat{p}_j \cdot \log{(\hat{p}_j + \epsilon)} - (p_j + \hat{p}_j) \cdot \log{((p_j + \hat{p}_j + \epsilon) / 2)}\]
where l is the number of classes.
- Parameters:
p_true (array_like, shape=(n_classes)) – True prevalences. In case of binary quantification, this parameter could be a single float value.
p_pred (array_like, shape=(n_classes)) – Predicted prevalences. In case of binary quantification, this parameter could be a single float value.
epsilon (float, optional) – To avoid division by 0
- Returns:
JSD – The Jensen-Shannon divergence
- Return type:
float
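Transcribing the formula term by term gives the following sketch (an illustrative re-implementation, with the same epsilon smoothing):

```python
import numpy as np

def jsd_sketch(p_true, p_pred, epsilon=1e-20):
    # jsd = 1/2 * sum_j [ p_j*log(p_j+eps) + p̂_j*log(p̂_j+eps)
    #                     - (p_j+p̂_j)*log((p_j+p̂_j+eps)/2) ]
    p = np.asarray(p_true, dtype=float)
    q = np.asarray(p_pred, dtype=float)
    m = p + q
    return float(0.5 * np.sum(p * np.log(p + epsilon)
                              + q * np.log(q + epsilon)
                              - m * np.log((m + epsilon) / 2)))
```

With natural logarithms the divergence is 0 for identical distributions and log 2 for disjoint ones.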
- kld(p_true, p_pred, eps=1e-12)[source]¶
Kullback-Leibler divergence (KLD)
\(kld = \sum_{j=1}^{j=l} p_j \cdot \log{p_j/\hat{p}_j}\)
where l is the number of classes.
Also known as discrimination information, relative entropy or normalized cross-entropy (see [Esuli and Sebastiani 2010; Forman 2008]). KLD is a special case of the family of f-divergences.
- Parameters:
p_true (array_like, shape = (n_classes)) – True prevalences
p_pred (array_like, shape = (n_classes)) – Predicted prevalences
eps (float, optional) – To prevent division by 0 and Inf/NaN results
- Returns:
KLD – The Kullback-Leibler divergence
- Return type:
float
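A minimal sketch of the formula, assuming eps is applied to both numerator and denominator to guard the logarithm and the division:

```python
import numpy as np

def kld_sketch(p_true, p_pred, eps=1e-12):
    # kld = sum_j p_j * log(p_j / p̂_j), smoothed with eps
    p = np.asarray(p_true, dtype=float)
    q = np.asarray(p_pred, dtype=float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))
```

Note that KLD is not symmetric: `kld_sketch(p, q)` and `kld_sketch(q, p)` generally differ.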
- l1(p_true, p_pred)[source]¶
L1 loss function
\(l1 = \sum_{j=1}^{j=l} | p_j - \hat{p}_j |\)
where l is the number of classes
- Parameters:
p_true (array_like, shape = (n_classes)) – True prevalences
p_pred (array_like, shape = (n_classes)) – Predicted prevalences
- Returns:
l1 – The L1 loss
- Return type:
float
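The L1 loss is a one-liner in NumPy (illustrative sketch):

```python
import numpy as np

def l1_sketch(p_true, p_pred):
    # l1 = sum_j |p_j - p̂_j|
    return float(np.abs(np.asarray(p_true, dtype=float)
                        - np.asarray(p_pred, dtype=float)).sum())
```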
- l2(p_true, p_pred)[source]¶
L2 loss function
\(l2 = \sqrt{\sum_{j=1}^{j=l} (p_j - \hat{p}_j)^2}\)
where l is the number of classes
- Parameters:
p_true (array_like, shape = (n_classes)) – True prevalences
p_pred (array_like, shape = (n_classes)) – Predicted prevalences
- Returns:
l2 – The L2 loss
- Return type:
float
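Likewise for the L2 loss, which per the formula is the Euclidean norm of the difference vector (illustrative sketch):

```python
import numpy as np

def l2_sketch(p_true, p_pred):
    # l2 = sqrt( sum_j (p_j - p̂_j)^2 )
    diff = np.asarray(p_true, dtype=float) - np.asarray(p_pred, dtype=float)
    return float(np.sqrt(np.sum(diff ** 2)))
```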
- mean_absolute_error(p_true, p_pred)[source]¶
Mean absolute error
\(mae = 1/l \sum_{j=1}^{j=l} | p_j - \hat{p}_j |\)
where l is the number of classes
- Parameters:
p_true (array_like, shape = (n_classes)) – True prevalences
p_pred (array_like, shape = (n_classes)) – Predicted prevalences
- Returns:
MAE – The mean absolute error
- Return type:
float
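This is the L1 loss divided by the number of classes; a sketch:

```python
import numpy as np

def mae_sketch(p_true, p_pred):
    # mae = 1/l * sum_j |p_j - p̂_j|
    return float(np.mean(np.abs(np.asarray(p_true, dtype=float)
                                - np.asarray(p_pred, dtype=float))))
```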
- mean_squared_error(p_true, p_pred)[source]¶
Mean squared error
\(mse = 1/l \sum_{j=1}^{j=l} (p_j - \hat{p}_j)^2\)
where l is the number of classes
- Parameters:
p_true (array_like, shape = (n_classes)) – True prevalences
p_pred (array_like, shape = (n_classes)) – Predicted prevalences
- Returns:
MSE – The mean squared error
- Return type:
float
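The squared counterpart of the mean absolute error (illustrative sketch):

```python
import numpy as np

def mse_sketch(p_true, p_pred):
    # mse = 1/l * sum_j (p_j - p̂_j)^2
    diff = np.asarray(p_true, dtype=float) - np.asarray(p_pred, dtype=float)
    return float(np.mean(diff ** 2))
```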
- probsymmetric(p_true, p_pred, epsilon=1e-20)[source]¶
Probabilistic Symmetric distance
\(psd = 2 \cdot \sum_{j=1}^{j=l} (p_j - \hat{p}_j)^2 / (p_j + \hat{p}_j + \epsilon)\)
where l is the number of classes
- Parameters:
p_true (array_like, shape=(n_classes)) – True prevalences. In case of binary quantification, this parameter could be a single float value.
p_pred (array_like, shape=(n_classes)) – Predicted prevalences. In case of binary quantification, this parameter could be a single float value.
epsilon (float, optional) – To avoid division by 0
- Returns:
PSD – The probabilistic symmetric distance
- Return type:
float
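The formula above translates directly (a hypothetical sketch, with epsilon guarding empty classes in the denominator):

```python
import numpy as np

def psd_sketch(p_true, p_pred, epsilon=1e-20):
    # psd = 2 * sum_j (p_j - p̂_j)^2 / (p_j + p̂_j + eps)
    p = np.asarray(p_true, dtype=float)
    q = np.asarray(p_pred, dtype=float)
    return float(2 * np.sum((p - q) ** 2 / (p + q + epsilon)))
```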
- topsoe(p_true, p_pred, epsilon=1e-20)[source]¶
Topsoe distance
\[topsoe = \sum_{j=1}^{j=l} (p_j \cdot \log{((2 \cdot p_j + \epsilon)/( p_j + \hat{p}_j + \epsilon))}) + (\hat{p}_j \cdot \log{((2 \cdot \hat{p}_j + \epsilon)/( p_j + \hat{p}_j + \epsilon))})\]
where l is the number of classes.
- Parameters:
p_true (array_like, shape=(n_classes)) – True prevalences. In case of binary quantification, this parameter could be a single float value.
p_pred (array_like, shape=(n_classes)) – Predicted prevalences. In case of binary quantification, this parameter could be a single float value.
epsilon (float, optional) – To avoid division by 0
- Returns:
TOPSOE – The Topsoe distance
- Return type:
float
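A sketch of the Topsoe distance as defined above (an illustrative re-implementation; with natural logarithms it is 0 for identical distributions and 2 log 2 for disjoint ones):

```python
import numpy as np

def topsoe_sketch(p_true, p_pred, epsilon=1e-20):
    # topsoe = sum_j p_j*log((2p_j+eps)/(p_j+p̂_j+eps))
    #                + p̂_j*log((2p̂_j+eps)/(p_j+p̂_j+eps))
    p = np.asarray(p_true, dtype=float)
    q = np.asarray(p_pred, dtype=float)
    s = p + q + epsilon
    return float(np.sum(p * np.log((2 * p + epsilon) / s)
                        + q * np.log((2 * q + epsilon) / s)))
```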