quantificationlib.metrics.multiclass module

Score functions and loss functions for multiclass quantification problems

bray_curtis(p_true, p_pred)[source]

Bray-Curtis dissimilarity

\(bcd = \sum_{j=1}^{j=l} | p_j - \hat{p}_j | / \sum_{j=1}^{j=l} (p_j + \hat{p}_j)\)

where l is the number of classes

Parameters:
  • p_true (array_like, shape=(n_classes)) – True prevalences. In case of binary quantification, this parameter could be a single float value.

  • p_pred (array_like, shape=(n_classes)) – Predicted prevalences. In case of binary quantification, this parameter could be a single float value.

Returns:

BCD – The Bray-Curtis dissimilarity

Return type:

float
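
A minimal NumPy sketch of this formula (illustrative only: the name bray_curtis_sketch is hypothetical, and the validation done by check_prevalences is omitted):

    import numpy as np

    def bray_curtis_sketch(p_true, p_pred):
        # sum of absolute per-class differences over the sum of the totals
        p_true, p_pred = np.asarray(p_true), np.asarray(p_pred)
        return np.abs(p_true - p_pred).sum() / (p_true + p_pred).sum()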

brier_multi(p_true, p_pred)[source]

Brier score (classification) for multiclass problems

\[bsm = 1/n \sum_{i=1}^{i=n} \sum_{j=1}^{j=l} (\hat{p}_{ij} - p_{ij} )^2\]

where l is the number of classes and n the number of predictions

Parameters:
  • p_true (array_like, shape=(n_samples, n_classes)) – True class memberships for each example (typically one-hot encoded).

  • p_pred (array_like, shape=(n_samples, n_classes)) – Class probabilities predicted by a classifier for each example.

Returns:

BSM – The Brier score for multiclass problems

Return type:

float
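
A minimal NumPy sketch of this formula, assuming both inputs are (n_samples, n_classes) arrays (the name brier_multi_sketch is hypothetical, not the library's implementation):

    import numpy as np

    def brier_multi_sketch(p_true, p_pred):
        # for each example, sum the squared per-class errors; then average
        p_true, p_pred = np.asarray(p_true), np.asarray(p_pred)
        return np.mean(np.sum((p_pred - p_true) ** 2, axis=1))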

check_prevalences(p_true, p_pred)[source]

Check that p_true and p_pred are valid and consistent

Parameters:
  • p_true (array_like, shape = (n_classes)) – True prevalences

  • p_pred (array_like, shape = (n_classes)) – Predicted prevalences

Returns:

  • p_true (array-like of shape = (n_classes, 1)) – The converted and validated p_true array

  • p_pred (array-like of shape = (n_classes, 1)) – The converted and validated p_pred array
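
A rough sketch of this kind of validation (hypothetical code, not the library's actual checks, which may be stricter):

    import numpy as np

    def check_prevalences_sketch(p_true, p_pred):
        # convert to float column vectors and require matching shapes
        p_true = np.asarray(p_true, dtype=float).reshape(-1, 1)
        p_pred = np.asarray(p_pred, dtype=float).reshape(-1, 1)
        if p_true.shape != p_pred.shape:
            raise ValueError("p_true and p_pred have inconsistent shapes")
        return p_true, p_pred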

geometric_mean(y_true, y_pred, correction=0.0)[source]

Compute the geometric mean.

In quantification, the geometric mean is useful for training a classifier on imbalanced problems (a quite common issue). The geometric mean tries to maximize the accuracy for every class; the per-class accuracies must be balanced to obtain a good score. It is computed as the l-th root of the product of the per-class sensitivities, where l is the number of classes.

The optimal value is 1 and the worst is 0 (the latter occurs when the sensitivity for some class is 0). To deal with this worst case in highly multiclass problems, the sensitivity of unrecognized classes can be set to a given value instead of zero; see the correction parameter.

The implementation given here is a simplification of the one provided in the imbalanced-learn library.

Parameters:
  • y_true (ndarray, shape (n_examples,)) – True class for each example

  • y_pred (array, shape (n_examples,)) – Predicted class returned by a classifier

  • correction (float, default=0.0) – Replaces the (zero) sensitivity of unrecognized classes with this value.

Returns:

g_mean – The geometric mean

Return type:

float
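
A minimal sketch of the computation (the name geometric_mean_sketch is hypothetical; imbalanced-learn's version handles more edge cases):

    import numpy as np

    def geometric_mean_sketch(y_true, y_pred, correction=0.0):
        # per-class sensitivity (recall) for every class present in y_true
        y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
        classes = np.unique(y_true)
        recalls = np.array([np.mean(y_pred[y_true == c] == c) for c in classes])
        # unrecognized classes get the correction value instead of zero
        recalls[recalls == 0] = correction
        # geometric mean: the l-th root of the product of the l recalls
        return recalls.prod() ** (1.0 / len(classes))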

hd(p_true, p_pred)[source]

Hellinger distance (HD)

\(hd = \sqrt{\sum_{j=1}^{j=l} (\sqrt{p_j} - \sqrt{\hat{p}_j})^2}\)

where l is the number of classes

Parameters:
  • p_true (array_like, shape = (n_classes)) – True prevalences

  • p_pred (array_like, shape = (n_classes)) – Predicted prevalences

Returns:

HD – The Hellinger distance

Return type:

float
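
A one-line NumPy sketch (hypothetical helper name; validation omitted):

    import numpy as np

    def hd_sketch(p_true, p_pred):
        # Euclidean distance between the element-wise square roots
        p, q = np.asarray(p_true), np.asarray(p_pred)
        return np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))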

jensenshannon(p_true, p_pred, epsilon=1e-20)[source]

Jensen-Shannon divergence (a=1/2)

\[jsd = 1/2 \sum_{j=1}^{j=l} \left[ p_j \cdot \log{(p_j + \epsilon)} + \hat{p}_j \cdot \log{(\hat{p}_j + \epsilon)} - (p_j + \hat{p}_j) \cdot \log{((p_j + \hat{p}_j + \epsilon) / 2)} \right]\]

where l is the number of classes.

Parameters:
  • p_true (array_like, shape=(n_classes)) – True prevalences. In case of binary quantification, this parameter could be a single float value.

  • p_pred (array_like, shape=(n_classes)) – Predicted prevalences. In case of binary quantification, this parameter could be a single float value.

  • epsilon (float, optional) – To avoid taking the logarithm of 0

Returns:

JSD – The Jensen-Shannon divergence

Return type:

float
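
A minimal NumPy sketch following the formula above (the name jsd_sketch is hypothetical):

    import numpy as np

    def jsd_sketch(p_true, p_pred, epsilon=1e-20):
        p, q = np.asarray(p_true), np.asarray(p_pred)
        # entropy-style terms for p and p_hat, minus the term for their midpoint
        return 0.5 * np.sum(p * np.log(p + epsilon) + q * np.log(q + epsilon)
                            - (p + q) * np.log((p + q + epsilon) / 2))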

kld(p_true, p_pred, eps=1e-12)[source]

Kullback-Leibler divergence (KLD)

\(kld = \sum_{j=1}^{j=l} p_j \cdot \log{(p_j/\hat{p}_j)}\)

where l is the number of classes.

Also known as discrimination information, relative entropy or normalized cross-entropy (see [Esuli and Sebastiani 2010; Forman 2008]). KLD is a special case of the family of f-divergences.

Parameters:
  • p_true (array_like, shape = (n_classes)) – True prevalences

  • p_pred (array_like, shape = (n_classes)) – Predicted prevalences

  • eps (float) – To prevent division by 0 and the resulting Inf/NaN values

Returns:

KLD – The Kullback-Leibler divergence

Return type:

float
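
A minimal NumPy sketch (hypothetical name; the exact placement of eps is an assumption):

    import numpy as np

    def kld_sketch(p_true, p_pred, eps=1e-12):
        # eps keeps both the ratio and the logarithm finite
        p, q = np.asarray(p_true), np.asarray(p_pred)
        return np.sum(p * np.log((p + eps) / (q + eps)))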

l1(p_true, p_pred)[source]

L1 loss function

\(l1 = \sum_{j=1}^{j=l} | p_j - \hat{p}_j |\)

where l is the number of classes

Parameters:
  • p_true (array_like, shape = (n_classes)) – True prevalences

  • p_pred (array_like, shape = (n_classes)) – Predicted prevalences

Returns:

l1 – The L1 loss

Return type:

float
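
A one-line NumPy sketch (hypothetical helper name):

    import numpy as np

    def l1_sketch(p_true, p_pred):
        # sum of absolute per-class errors
        return np.abs(np.asarray(p_true) - np.asarray(p_pred)).sum()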

l2(p_true, p_pred)[source]

L2 loss function

\(l2 = \sqrt{\sum_{j=1}^{j=l} (p_j - \hat{p}_j)^2}\)

where l is the number of classes

Parameters:
  • p_true (array_like, shape = (n_classes)) – True prevalences

  • p_pred (array_like, shape = (n_classes)) – Predicted prevalences

Returns:

l2 – The L2 loss

Return type:

float
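
The same sketch style for L2 (hypothetical helper name):

    import numpy as np

    def l2_sketch(p_true, p_pred):
        # Euclidean norm of the per-class error vector
        return np.sqrt(np.sum((np.asarray(p_true) - np.asarray(p_pred)) ** 2))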

mean_absolute_error(p_true, p_pred)[source]

Mean absolute error

\(mae = 1/l \sum_{j=1}^{j=l} | p_j - \hat{p}_j |\)

where l is the number of classes

Parameters:
  • p_true (array_like, shape = (n_classes)) – True prevalences

  • p_pred (array_like, shape = (n_classes)) – Predicted prevalences

Returns:

MAE – The mean absolute error

Return type:

float
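
A one-line NumPy sketch (hypothetical helper name):

    import numpy as np

    def mae_sketch(p_true, p_pred):
        # mean of the absolute per-class errors
        return np.mean(np.abs(np.asarray(p_true) - np.asarray(p_pred)))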

mean_squared_error(p_true, p_pred)[source]

Mean squared error

\(mse = 1/l \sum_{j=1}^{j=l} (p_j - \hat{p}_j)^2\)

where l is the number of classes

Parameters:
  • p_true (array_like, shape = (n_classes)) – True prevalences

  • p_pred (array_like, shape = (n_classes)) – Predicted prevalences

Returns:

MSE – The mean squared error

Return type:

float
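
And its squared counterpart (hypothetical helper name):

    import numpy as np

    def mse_sketch(p_true, p_pred):
        # mean of the squared per-class errors
        return np.mean((np.asarray(p_true) - np.asarray(p_pred)) ** 2)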

probsymmetric(p_true, p_pred, epsilon=1e-20)[source]

Probabilistic Symmetric distance

\(psd = 2 \cdot \sum_{j=1}^{j=l} (p_j - \hat{p}_j)^2 / (p_j + \hat{p}_j + \epsilon)\)

where l is the number of classes

Parameters:
  • p_true (array_like, shape=(n_classes)) – True prevalences. In case of binary quantification, this parameter could be a single float value.

  • p_pred (array_like, shape=(n_classes)) – Predicted prevalences. In case of binary quantification, this parameter could be a single float value.

  • epsilon (float, optional) – To avoid division by 0

Returns:

PSD – The probabilistic symmetric distance

Return type:

float
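
A minimal NumPy sketch of the formula (hypothetical name):

    import numpy as np

    def psd_sketch(p_true, p_pred, epsilon=1e-20):
        # symmetric chi-square-style distance; epsilon guards the denominator
        p, q = np.asarray(p_true), np.asarray(p_pred)
        return 2 * np.sum((p - q) ** 2 / (p + q + epsilon))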

topsoe(p_true, p_pred, epsilon=1e-20)[source]

Topsoe distance

\[topsoe = \sum_{j=1}^{j=l} (p_j \cdot \log{((2 \cdot p_j + \epsilon)/( p_j + \hat{p}_j + \epsilon))}) + (\hat{p}_j \cdot \log{((2 \cdot \hat{p}_j + \epsilon)/( p_j + \hat{p}_j + \epsilon))})\]

where l is the number of classes.

Parameters:
  • p_true (array_like, shape=(n_classes)) – True prevalences. In case of binary quantification, this parameter could be a single float value.

  • p_pred (array_like, shape=(n_classes)) – Predicted prevalences. In case of binary quantification, this parameter could be a single float value.

  • epsilon (float, optional) – To avoid division by 0

Returns:

TOPSOE – The Topsoe distance

Return type:

float
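
A minimal NumPy sketch following the formula above (hypothetical name):

    import numpy as np

    def topsoe_sketch(p_true, p_pred, epsilon=1e-20):
        # two directed terms, each comparing a distribution to the sum p + p_hat
        p, q = np.asarray(p_true), np.asarray(p_pred)
        m = p + q + epsilon
        return np.sum(p * np.log((2 * p + epsilon) / m)
                      + q * np.log((2 * q + epsilon) / m))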