quantificationlib.baselines.cc module

Multiclass versions of the CC and PCC quantifiers

class CC(estimator_test=None, verbose=0)

Bases: UsingClassifiers

Multiclass Classify And Count method

prevalence (class_i) = (1/|Test|) * sum_{x in Test} I ( h(x) == class_i)
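As an illustration of the formula, here is a minimal, dependency-free sketch. The function name `classify_and_count` is hypothetical and is not part of quantificationlib; it only counts crisp predictions and divides by the size of the testing bag.

```python
from collections import Counter


def classify_and_count(crisp_predictions, classes):
    """Return the fraction of examples predicted as each class, in `classes` order."""
    if not crisp_predictions:
        raise ValueError("the testing bag is empty")
    counts = Counter(crisp_predictions)   # number of examples predicted per class
    n = len(crisp_predictions)            # |Test|
    return [counts[c] / n for c in classes]


prevalences = classify_and_count(["a", "b", "a", "c", "a"], classes=["a", "b", "c"])
print(prevalences)  # [0.6, 0.2, 0.2]
```

Classes that never appear among the predictions get a prevalence of 0, so the result always has one entry per class.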

This class works in two different ways:

  1. An estimator is used to classify the examples of the testing bag (the estimator may already be trained)

  2. The predictions for the examples can be passed directly to the predict method. This is useful for synthetic/artificial experiments

Parameters:
  • estimator_test (estimator object (default=None)) – An estimator object implementing fit and predict methods

  • verbose (int, optional, (default=0)) – The verbosity level. The default value, zero, means silent mode

estimator_train

Not used

Type:

None

estimator_test

Estimator used to classify the examples of the testing bag

Type:

estimator object

needs_predictions_train

It is False because CC quantifiers do not need to estimate the training distribution

Type:

bool, False

probabilistic_predictions

This means that predictions_test_ contains crisp predictions

Type:

bool, False

predictions_test_

Crisp predictions of the examples in the testing bag

Type:

ndarray, shape (n_examples, )

predictions_train_

Not used

Type:

None

classes_

Class labels

Type:

ndarray, shape (n_classes, )

y_ext_

True labels of the training set

Type:

ndarray, shape (n_examples, )

verbose

The verbosity level

Type:

int

Notes

At least one of estimator_test and predictions_test must not be None. If both are None, a ValueError exception is raised. If both are provided, predictions_test is used.
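The precedence described in this note can be sketched as follows. This is only an illustration under stated assumptions: `resolve_predictions` and `ConstantClassifier` are hypothetical names, not part of the library, and the real implementation may structure the check differently.

```python
def resolve_predictions(X, estimator_test=None, predictions_test=None):
    """Pick the predictions to quantify, following the precedence in the note."""
    if predictions_test is not None:
        # Explicit predictions win, even when an estimator is also available.
        return list(predictions_test)
    if estimator_test is None:
        raise ValueError("estimator_test and predictions_test cannot both be None")
    return list(estimator_test.predict(X))


class ConstantClassifier:
    """Hypothetical stand-in for a trained estimator: always predicts one label."""

    def __init__(self, label):
        self.label = label

    def predict(self, X):
        return [self.label for _ in X]


X = [[0.0], [1.0], [2.0]]
from_estimator = resolve_predictions(X, estimator_test=ConstantClassifier("pos"))
from_both = resolve_predictions(X, ConstantClassifier("pos"), ["neg", "pos", "neg"])

try:
    resolve_predictions(X)
    raised = False
except ValueError:
    raised = True
```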

References

George Forman. 2005. Counting positives accurately despite inaccurate classification. In Proceedings of the European Conference on Machine Learning (ECML’05). 564–575.

George Forman. 2008. Quantifying counts and costs via classification. Data Mining and Knowledge Discovery 17, 2 (2008), 164–206.

fit(X, y, predictions_train=None)

Fit the estimator for the testing bags when needed. The method checks whether the estimator is already trained by calling its predict method

Parameters:
  • X ((sparse) array-like, shape (n_examples, n_features)) – Data

  • y ((sparse) array-like, shape (n_examples, )) – True classes

  • predictions_train (None, not used) – Predictions of the examples in the training set.
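The "fit only when needed" behaviour of fit can be pictured with the sketch below. It assumes that an untrained estimator raises an exception when predict is called; `NotTrainedError`, `MajorityClassifier` and `fit_if_needed` are hypothetical names introduced for this example (scikit-learn estimators typically raise `NotFittedError` in that situation), and the library's actual check may differ.

```python
class NotTrainedError(Exception):
    """Hypothetical error raised by an estimator that has not been fitted."""


class MajorityClassifier:
    """Hypothetical toy estimator that predicts the most frequent training label."""

    def __init__(self):
        self.majority_ = None
        self.fit_calls = 0

    def fit(self, X, y):
        self.fit_calls += 1
        self.majority_ = max(set(y), key=list(y).count)
        return self

    def predict(self, X):
        if self.majority_ is None:
            raise NotTrainedError("call fit first")
        return [self.majority_ for _ in X]


def fit_if_needed(estimator, X, y):
    """Train `estimator` only when a trial call to predict shows it is untrained."""
    try:
        estimator.predict(X[:1])
    except NotTrainedError:
        estimator.fit(X, y)
    return estimator


X = [[0], [1], [2]]
y = ["a", "a", "b"]
clf = fit_if_needed(MajorityClassifier(), X, y)  # untrained: fit is called
clf = fit_if_needed(clf, X, y)                   # already trained: fit is skipped
```

Probing with predict avoids retraining an estimator that was fitted elsewhere, which matters when the same classifier is shared by several quantifiers.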

predict(X, predictions_test=None)

Predict the class distribution of a testing bag

The prevalence for each class is the proportion of examples predicted as belonging to that class

prevalence (class_i) = (1/|Test|) * sum_{x in Test} I ( h(x) == class_i)

Parameters:
  • X ((sparse) array-like, shape (n_examples, n_features)) – Data

  • predictions_test (ndarray, shape (n_examples, ) or (n_examples, n_classes) (default=None)) –

    They can be crisp values or probabilities. In the latter case, they are converted to crisp values using the __probs2crisps method.

    If predictions_test is not None, they are copied to predictions_test_ and used. If predictions_test is None, predictions for the testing examples are computed using the predict method of estimator_test (which must be an actual estimator)

Raises:

ValueError – When estimator_test and predictions_test are both None

Returns:

prevalences

Return type:

An ndarray, shape (n_classes, ), with the prevalence for each class
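When predictions_test holds probabilities, predict first turns them into crisp labels. The sketch below assumes the conversion picks the class with the highest probability in each row; `probs_to_crisp` is a hypothetical helper written for this example, not the library's private __probs2crisps method.

```python
def probs_to_crisp(probabilities, classes):
    """Map each row of class probabilities to the label with the highest probability."""
    crisp = []
    for row in probabilities:
        best = max(range(len(row)), key=row.__getitem__)  # first index wins on ties
        crisp.append(classes[best])
    return crisp


classes = ["neg", "pos"]
probabilities = [[0.9, 0.1], [0.4, 0.6], [0.7, 0.3], [0.2, 0.8]]

crisp = probs_to_crisp(probabilities, classes)
prevalences = [crisp.count(c) / len(crisp) for c in classes]
print(crisp)        # ['neg', 'pos', 'neg', 'pos']
print(prevalences)  # [0.5, 0.5]
```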

set_fit_request(*, predictions_train='$UNCHANGED$')

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
  • predictions_train (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for predictions_train parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_predict_request(*, predictions_test='$UNCHANGED$')

Request metadata passed to the predict method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to predict.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
  • predictions_test (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for predictions_test parameter in predict.

Returns:

self – The updated object.

Return type:

object

class PCC(estimator_test=None, verbose=0)

Bases: UsingClassifiers

Multiclass Probabilistic Classify And Count method

prevalence (class_i) = (1/|T|) * sum_{x in T} P( h(x) == class_i | x )
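Reading the formula as an average over the testing bag (the sum of the per-example probabilities divided by the number of examples), a minimal sketch looks like this. The function name `probabilistic_classify_and_count` is hypothetical and the code uses plain lists instead of arrays.

```python
def probabilistic_classify_and_count(probabilities):
    """Average each column of an (n_examples, n_classes) probability table."""
    if not probabilities:
        raise ValueError("the testing bag is empty")
    n = len(probabilities)  # |T|
    # zip(*rows) iterates over columns, i.e. one sequence of probabilities per class
    return [sum(column) / n for column in zip(*probabilities)]


probabilities = [[0.75, 0.25], [0.5, 0.5], [0.25, 0.75], [0.5, 0.5]]
prevalences = probabilistic_classify_and_count(probabilities)
print(prevalences)  # [0.5, 0.5]
```

If every row sums to 1, the estimated prevalences also sum to 1.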

This class works in two different ways:

  1. An estimator is used to classify the examples of the testing bag (the estimator may already be trained)

  2. The predictions for the examples can be passed directly to the predict method. This is useful for synthetic/artificial experiments

Parameters:
  • estimator_test (estimator object (default=None)) – An estimator object implementing fit and predict methods. It is used to classify the testing examples

  • verbose (int, optional, (default=0)) – The verbosity level. The default value, zero, means silent mode

estimator_test

Estimator used to classify the examples of the testing bag

Type:

estimator

predictions_test_

Probabilistic predictions of the examples in the testing bag

Type:

ndarray, shape (n_examples, n_classes)

estimator_train

Not used

Type:

None

predictions_train_

Not used

Type:

None

needs_predictions_train

It is False because PCC quantifiers do not need to estimate the training distribution

Type:

bool, False

probabilistic_predictions

This means that predictions_test_ contains probabilistic predictions

Type:

bool, True

classes_

Class labels

Type:

ndarray, shape (n_classes, )

y_ext_

True labels of the training set

Type:

ndarray, shape (n_examples, )

verbose

The verbosity level

Type:

int

Notes

At least one of estimator_test and predictions_test must not be None. If both are None, a ValueError exception is raised. If both are provided, predictions_test is used.

References

Antonio Bella, Cèsar Ferri, José Hernández-Orallo, and María José Ramírez-Quintana. 2010. Quantification via probability estimators. In Proceedings of the IEEE International Conference on Data Mining (ICDM’10). IEEE, 737–742.

fit(X, y, predictions_train=None)

Fit the estimator for the testing bags when needed. The method checks whether the estimator is already trained by calling its predict method

Parameters:
  • X ((sparse) array-like, shape (n_examples, n_features)) – Data

  • y ((sparse) array-like, shape (n_examples, )) – True classes

  • predictions_train (None, not used) – Predictions of the examples in the training set.

predict(X, predictions_test=None)

Predict the class distribution of a testing bag

The prevalence for each class is the average probability predicted for that class

prevalence (class_i) = (1/|T|) * sum_{x in T} P( h(x) == class_i | x )

Parameters:
  • X ((sparse) array-like, shape (n_examples, n_features)) – Data

  • predictions_test (ndarray, shape (n_examples, n_classes) (default=None)) –

    They must be probabilities (the estimator used must have a predict_proba method).

    If predictions_test is not None, they are copied to predictions_test_ and used. If predictions_test is None, predictions for the testing examples are computed using the predict method of estimator_test (which must be an actual estimator)

Raises:

ValueError – When estimator_test and predictions_test are both None

Returns:

prevalences

Return type:

An ndarray, shape (n_classes, ), with the prevalence for each class
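To see how averaging probabilities differs from counting crisp decisions, the sketch below applies both ideas to the same probabilistic predictions. Both functions are hypothetical illustrations, not library code, and the crisp decision is assumed to be the highest-probability class.

```python
def cc_prevalences(probabilities):
    """Count the highest-probability class of each row, then normalise."""
    n_classes = len(probabilities[0])
    winners = [max(range(n_classes), key=row.__getitem__) for row in probabilities]
    return [winners.count(k) / len(winners) for k in range(n_classes)]


def pcc_prevalences(probabilities):
    """Average the probability assigned to each class over the bag."""
    return [sum(column) / len(probabilities) for column in zip(*probabilities)]


# Three examples lean slightly towards class 0, one leans strongly towards class 1.
probabilities = [[0.5625, 0.4375], [0.625, 0.375], [0.5625, 0.4375], [0.25, 0.75]]

cc = cc_prevalences(probabilities)
pcc = pcc_prevalences(probabilities)
print(cc)   # [0.75, 0.25]
print(pcc)  # [0.5, 0.5]
```

Counting discards how confident each prediction is, while averaging keeps that information, so the two estimates can diverge when many examples lie close to the decision boundary.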

set_fit_request(*, predictions_train='$UNCHANGED$')

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
  • predictions_train (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for predictions_train parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_predict_request(*, predictions_test='$UNCHANGED$')

Request metadata passed to the predict method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to predict.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
  • predictions_test (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for predictions_test parameter in predict.

Returns:

self – The updated object.

Return type:

object