quantificationlib.decomposition.ordinal module

Generic decomposition quantifier based on the Frank and Hall approach

class FrankAndHallQuantifier(quantifier, estimator_train=None, estimator_test=None, verbose=0)[source]

Bases: UsingClassifiers

Implements a Frank and Hall Ordinal Quantifier given any base quantifier

Trains one quantifier for each model of the Frank and Hall (FH) decomposition. For instance, in an ordinal classification problem with classes ranging from 1-star to 5-star, FHQuantifier trains 4 quantifiers: 1 vs 2-3-4-5, 1-2 vs 3-4-5, 1-2-3 vs 4-5, 1-2-3-4 vs 5, and combines their predictions. The positive class corresponds to the left group of each quantifier ({1}, {1,2}, and so on)
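
The FH groupings for the 5-star example above can be enumerated with a short sketch (plain Python, illustrative only; not part of the quantificationlib API):

```python
# Enumerate the Frank and Hall binary models for a 5-class ordinal problem.
# Each model splits the ordered classes at one cut point; the left group
# plays the role of the positive class.
classes = [1, 2, 3, 4, 5]
models = [(classes[:k], classes[k:]) for k in range(1, len(classes))]
for left, right in models:
    print(left, "vs", right)
# [1] vs [2, 3, 4, 5]
# [1, 2] vs [3, 4, 5]
# [1, 2, 3] vs [4, 5]
# [1, 2, 3, 4] vs [5]
```

With n_classes ordered labels this always yields n_classes - 1 binary models, which is why the attributes below have a trailing dimension of n_classes - 1.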

The class works both with quantifiers that require classifiers and with those that do not. In the former case, the estimator used for the training distribution and the testing distribution must be a FrankAndHallClassifier

Parameters:
  • quantifier (quantifier object) – The base quantifier used to build the FH decomposition. Any quantifier can be used

  • estimator_train (estimator object, optional, (default=None)) – An estimator object implementing fit and one of predict or predict_proba. It is used to classify the examples of the training set and to obtain their distribution when the base quantifier is an instance of the class UsingClassifiers. Notice that some quantifiers of this kind, namely CC and PCC, do not require an estimator for the training distribution

  • estimator_test (estimator object, optional, (default=None)) – An estimator object implementing fit and one of predict or predict_proba. It is used to classify the examples of the testing bag and to obtain their distribution when the base quantifier is an instance of the class UsingClassifiers. For some experiments both estimators could be the same

  • verbose (int, optional, (default=0)) – The verbosity level. The default value, zero, means silent mode

quantifier

The base quantifier used to build the FH decomposition

Type:

quantifier object

estimator_train

Estimator used to classify the examples of the training set

Type:

estimator

estimator_test

Estimator used to classify the examples of the testing bag

Type:

estimator

predictions_train_

Predictions of the examples in the training set

Type:

ndarray, shape (n_examples, n_classes-1) (probabilistic)

predictions_test_

Predictions of the examples in the testing bag

Type:

ndarray, shape (n_examples, n_classes-1) (probabilistic)

needs_predictions_train

True if the base quantifier needs to estimate the training distribution

Type:

bool, (default=True)

probabilistic_predictions

Not used

Type:

bool

quantifiers_

List of quantifiers, one for each model of a FH decomposition. The number is equal to n_classes - 1

Type:

ndarray, shape (n_classes-1, )

classes_

Class labels

Type:

ndarray, shape (n_classes, )

y_ext_

Replication (repmat) of the true labels of the training set. When CV_estimator is used with averaged_predictions=False, predictions_train_ has more rows than y (by a factor of n_repetitions * n_folds of the underlying CV_estimator). In other cases, y_ext_ == y. y_ext_, rather than y, must be used in fit/predict methods whenever the true labels of the training set are needed
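
As an illustration of that replication (hypothetical numbers; the exact repetition order depends on the underlying CV_estimator):

```python
# Sketch of how y_ext_ relates to y when predictions_train_ is enlarged
# by a CV_estimator with averaged_predictions=False (numbers illustrative)
y = [0, 1, 2, 2]               # true labels of the training set
n_repetitions, n_folds = 2, 5  # assumed CV_estimator settings
factor = n_repetitions * n_folds
y_ext = y * factor             # repmat-style tiling to len(predictions_train_)
print(len(y_ext))  # 40
```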

Type:

ndarray, shape(len(predictions_train_), )

verbose

The verbosity level

Type:

int

static check_and_correct_prevalences_asc(prevalences)[source]

This function checks the prevalences of a quantifier based on the Frank and Hall decomposition and corrects those that are inconsistent. It is used by FrankAndHallQuantifier.

To obtain consistent prevalences, we need to ensure that the consecutive probabilities do not decrease.

Example:

Quantifier 1 vs 2-3-4   Prevalence({1}) = 0.3
Quantifier 1-2 vs 3-4   Prevalence({1,2}) = 0.2
Quantifier 1-2-3 vs 4   Prevalence({1,2,3}) = 0.6

This is inconsistent. Following (Destercke, Yang, 2014) the method computes the upper (adjusting from left to right) and the lower (from right to left) cumulative prevalences. These sets of values are monotonically increasing (from left to right) and monotonically decreasing (from right to left), respectively. The average value is assigned to each group

Example:

{1}   {1-2}  {1-2-3}
0.3   0.3    0.6      Upper cumulative prevalences (adjusting from left to right)
0.2   0.2    0.6      Lower cumulative prevalences (adjusting from right to left)
----------------
0.25  0.25   0.6      Averaged prevalences
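
The correction above can be sketched in a few lines of plain Python (a minimal re-implementation for illustration, not the library's own code):

```python
def correct_prevalences_asc(prevalences):
    """Average the upper and lower cumulative prevalences (Destercke & Yang, 2014)."""
    # Upper cumulative: sweep left to right, never letting values decrease
    upper, m = [], float("-inf")
    for p in prevalences:
        m = max(m, p)
        upper.append(m)
    # Lower cumulative: sweep right to left, never letting values increase
    lower, m = [], float("inf")
    for p in reversed(prevalences):
        m = min(m, p)
        lower.append(m)
    lower.reverse()
    # The corrected prevalence of each group is the average of both sweeps
    return [(u + lo) / 2 for u, lo in zip(upper, lower)]

print(correct_prevalences_asc([0.3, 0.2, 0.6]))  # [0.25, 0.25, 0.6]
```

The result is non-decreasing by construction, since both sweeps are non-decreasing from left to right.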
Parameters:

prevalences (array, shape(n_classes-1, )) – The prevalences of the binary quantifiers of a FrankAndHallQuantifier for a single dataset

Returns:

prevalences_ok – The corrected prevalences, ensuring that they do not decrease (from left to right)

Return type:

array, shape(n_classes-1, )

References

Sébastien Destercke, Gen Yang. Cautious Ordinal Classification by Binary Decomposition. Machine Learning and Knowledge Discovery in Databases - European Conference ECML/PKDD, Sep 2014, Nancy, France, pp. 323-337.

fit(X, y, predictions_train=None)[source]

Fits all the quantifiers of a FH decomposition

First, the method fits the estimators (estimator_train and estimator_test) (if needed) using the fit method of its superclass

Then, it creates (using deepcopy) the set of quantifiers_ (n_classes-1 quantifiers) and fits them

Parameters:
  • X (array-like, shape (n_examples, n_features)) – Data

  • y (array-like, shape (n_examples, )) – True classes

  • predictions_train (ndarray, optional, shape (n_examples, n_classes-1) (probs)) – Predictions of the examples in the training set

Raises:

ValueError – When estimator_train or estimator_test are not instances of OneVsRestClassifier

predict(X, predictions_test=None)[source]

Aggregates the prevalences of the quantifiers_ to compute the final prediction

In this kind of decomposition strategy it is important to ensure that the aggregated consecutive prevalences do not decrease:

Example:

Quantifier 1 vs 2-3-4   Prevalence({1}) = 0.3
Quantifier 1-2 vs 3-4   Prevalence({1,2}) = 0.2
Quantifier 1-2-3 vs 4   Prevalence({1,2,3}) = 0.6

This is inconsistent. Following (Destercke, Yang, 2014) the method computes the upper (adjusting from left to right) and the lower (from right to left) cumulative prevalences. These sets of values are monotonically increasing (from left to right) and monotonically decreasing (from right to left), respectively. The average value is assigned to each group and the prevalence for each class is computed as:

Prevalence({y_k}) = Prevalence({y_1,…,y_k}) - Prevalence({y_1,…,y_k-1})

Example:

{1}   {1-2}  {1-2-3}
0.3   0.3    0.6    Upper cumulative prevalences (adjusting from left to right)
0.2   0.2    0.6    Lower cumulative prevalences (adjusting from right to left)
----------------
0.25  0.25   0.6    Averaged prevalences

Prevalence({1}) = 0.25
Prevalence({2}) = Prevalence({1,2}) - Prevalence({1}) = 0.25 - 0.25 = 0
Prevalence({3}) = Prevalence({1,2,3}) - Prevalence({1,2}) = 0.6 - 0.25 = 0.35

The prevalence of the last class is computed as 1 minus the sum of the prevalences of the rest of the classes:

Prevalence({4}) = 1 - Prevalence({1,2,3}) = 1 - 0.6 = 0.4
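
The conversion from corrected cumulative prevalences to per-class prevalences can be written out as follows (plain Python, numbers taken from the example above):

```python
# Corrected cumulative prevalences for the groups {1}, {1,2}, {1,2,3}
cumulative = [0.25, 0.25, 0.6]

# First class equals the first group; middle classes are differences of
# consecutive groups; the last class closes the total to 1
prevalences = [cumulative[0]]
prevalences += [cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))]
prevalences.append(1.0 - cumulative[-1])
print(prevalences)  # [0.25, 0.0, 0.35, 0.4]
```

Because the cumulative values are non-decreasing and bounded by 1, every per-class prevalence is non-negative and the whole vector sums to 1.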
Parameters:
  • X ((sparse) array-like, shape (n_examples, n_features)) – Data

  • predictions_test (ndarray, shape (n_examples, n_classes-1) (default=None)) – Predictions for the testing bag

Returns:

prevalences – Contains the predicted prevalence for each class

Return type:

ndarray, shape(n_classes, )

References

Destercke, S., & Yang, G. (2014, September). Cautious ordinal classification by binary decomposition. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases (pp. 323-337).

set_fit_request(*, predictions_train='$UNCHANGED$')

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
  • predictions_train (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for predictions_train parameter in fit.

  • self (FrankAndHallQuantifier) –

Returns:

self – The updated object.

Return type:

object

set_predict_request(*, predictions_test='$UNCHANGED$')

Request metadata passed to the predict method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to predict.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
  • predictions_test (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for predictions_test parameter in predict.

  • self (FrankAndHallQuantifier) –

Returns:

self – The updated object.

Return type:

object