quantificationlib.baselines.cc module
Multiclass versions of the CC and PCC quantifiers
- class CC(estimator_test=None, verbose=0)
Bases: UsingClassifiers
Multiclass Classify and Count method
prevalence(class_i) = (1/|Test|) * sum_{x in Test} I(h(x) == class_i)
Here h(x) is the class predicted for example x, and I(.) is 1 when its argument is true and 0 otherwise.
This class works in two different ways:
An estimator is used to classify the examples of the testing bag (the estimator may already be trained).
The predictions for the examples are passed directly to the predict method. This is useful for synthetic/artificial experiments.
- Parameters:
estimator_test (estimator object (default=None)) – An estimator object implementing fit and predict methods
verbose (int, optional, (default=0)) – The verbosity level. The default value, zero, means silent mode
- estimator_train
- Type:
None (not used)
- estimator_test
Estimator used to classify the examples of the testing bag
- Type:
estimator object
- needs_predictions_train
It is False because CC quantifiers do not need to estimate the training distribution
- Type:
bool, False
- probabilistic_predictions
It is False because predictions_test_ contains crisp predictions
- Type:
bool, False
- predictions_test_
Crisp predictions of the examples in the testing bag
- Type:
ndarray, shape (n_examples, )
- predictions_train_
- Type:
None (not used)
- classes_
Class labels
- Type:
ndarray, shape (n_classes, )
- y_ext_
True labels of the training set
- Type:
ndarray, shape (n_examples, )
- verbose
The verbosity level
- Type:
int
Notes
At least one of estimator_test and predictions_test must not be None. If both are None, a ValueError is raised. If both are given, predictions_test is used.
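The precedence rule in this note can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's code: resolve_predictions and ConstantClassifier are hypothetical names introduced here, and the real class may perform additional checks.

```python
def resolve_predictions(estimator_test, X, predictions_test=None):
    """Pick the predictions for a testing bag (hypothetical helper).

    Mirrors the rule in the note: explicit predictions win, otherwise the
    estimator is used, and having neither is an error.
    """
    if predictions_test is not None:
        return list(predictions_test)
    if estimator_test is None:
        raise ValueError("estimator_test and predictions_test are both None")
    return list(estimator_test.predict(X))


class ConstantClassifier:
    """Hypothetical stand-in for a trained estimator: always predicts one label."""

    def __init__(self, label):
        self.label = label

    def predict(self, X):
        return [self.label for _ in X]


X = [[0.0], [1.0], [2.0]]
clf = ConstantClassifier("a")

# Only an estimator is available: its predictions are used.
from_estimator = resolve_predictions(clf, X)

# Both are available: the explicit predictions take precedence.
from_argument = resolve_predictions(clf, X, predictions_test=["b", "b", "a"])

# Neither is available: a ValueError is raised.
try:
    resolve_predictions(None, X)
    raised = False
except ValueError:
    raised = True
```

Passing predictions explicitly is what makes synthetic experiments possible: the bag's predictions can be generated without training any classifier.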
References
George Forman. 2005. Counting positives accurately despite inaccurate classification. In Proceedings of the European Conference on Machine Learning (ECML’05). 564–575.
George Forman. 2008. Quantifying counts and costs via classification. Data Mining and Knowledge Discovery 17, 2 (2008), 164–206.
- fit(X, y, predictions_train=None)
Fit the estimator for the testing bags when needed. The method checks whether the estimator is already trained by calling its predict method
- Parameters:
X ((sparse) array-like, shape (n_examples, n_features)) – Data
y ((sparse) array-like, shape (n_examples, )) – True classes
predictions_train (None, not used) – Predictions of the examples in the training set. CC does not use them.
- predict(X, predictions_test=None)
Predict the class distribution of a testing bag
The prevalence of each class is the proportion of examples predicted as belonging to that class
prevalence(class_i) = (1/|Test|) * sum_{x in Test} I(h(x) == class_i)
- Parameters:
X ((sparse) array-like, shape (n_examples, n_features)) – Data
predictions_test (ndarray, shape (n_examples, ) or (n_examples, n_classes) (default=None)) –
Predictions for the testing examples. They can be crisp values or probabilities. In the latter case, they are converted to crisp values using the __probs2crisps method.
If predictions_test is not None, they are copied to predictions_test_ and used. If predictions_test is None, predictions for the testing examples are computed using the predict method of estimator_test (which must be an actual estimator)
- Raises:
ValueError – When estimator_test and predictions_test are both None
- Returns:
prevalences
- Return type:
An ndarray, shape(n_classes, ) with the prevalence for each class
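As a minimal sketch of the formula above, the following plain-Python code counts crisp predictions per class and divides by the bag size. It is an illustration, not the library's implementation: classify_and_count and probs_to_crisp are hypothetical names, and the conversion from probabilities to crisp labels is assumed here to be an argmax over each row, which may differ from what __probs2crisps actually does.

```python
from collections import Counter


def classify_and_count(predictions, classes):
    """Fraction of examples predicted as each class, in the order of `classes`."""
    n = len(predictions)
    if n == 0:
        raise ValueError("the testing bag is empty")
    counts = Counter(predictions)
    return [counts[c] / n for c in classes]


def probs_to_crisp(probabilities, classes):
    """Assumed conversion: pick the label with the highest probability per row."""
    crisp = []
    for row in probabilities:
        best = max(range(len(row)), key=row.__getitem__)
        crisp.append(classes[best])
    return crisp


classes = ["a", "b", "c"]

# Crisp predictions for a bag of five examples.
crisp = ["a", "b", "a", "c", "a"]
prevalences = classify_and_count(crisp, classes)

# Probabilistic predictions are first reduced to crisp labels, then counted.
probs = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.4, 0.35, 0.25]]
from_probs = classify_and_count(probs_to_crisp(probs, classes), classes)
```

The returned list plays the role of the ndarray of shape (n_classes, ): one prevalence per class, summing to 1.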
- set_fit_request(*, predictions_train='$UNCHANGED$')
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
predictions_train (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the predictions_train parameter in fit.
- Returns:
self – The updated object.
- Return type:
object
- set_predict_request(*, predictions_test='$UNCHANGED$')
Request metadata passed to the predict method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to predict.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
predictions_test (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the predictions_test parameter in predict.
- Returns:
self – The updated object.
- Return type:
object
- class PCC(estimator_test=None, verbose=0)
Bases: UsingClassifiers
Multiclass Probabilistic Classify and Count method
prevalence(class_i) = (1/|T|) * sum_{x in T} P(h(x) == class_i | x)
Here T is the testing bag, so the prevalence of each class is its average predicted probability over the bag.
This class works in two different ways:
An estimator is used to classify the examples of the testing bag (the estimator may already be trained).
The predictions for the examples are passed directly to the predict method. This is useful for synthetic/artificial experiments.
- Parameters:
estimator_test (estimator object (default=None)) – An estimator object implementing fit and predict methods. It is used to classify the testing examples
verbose (int, optional, (default=0)) – The verbosity level. The default value, zero, means silent mode
- estimator_test
Estimator used to classify the examples of the testing bag
- Type:
estimator
- predictions_test_
Probabilistic predictions of the examples in the testing bag
- Type:
ndarray, shape (n_examples, n_classes)
- estimator_train
- Type:
None (not used)
- predictions_train_
- Type:
None (not used)
- needs_predictions_train
It is False because PCC quantifiers do not need to estimate the training distribution
- Type:
bool, False
- probabilistic_predictions
It is True because predictions_test_ contains probabilistic predictions
- Type:
bool, True
- classes_
Class labels
- Type:
ndarray, shape (n_classes, )
- y_ext_
True labels of the training set
- Type:
ndarray, shape (n_examples, )
- verbose
The verbosity level
- Type:
int
Notes
At least one of estimator_test and predictions_test must not be None. If both are None, a ValueError is raised. If both are given, predictions_test is used.
References
Antonio Bella, Cèsar Ferri, José Hernández-Orallo, and María José Ramírez-Quintana. 2010. Quantification via probability estimators. In Proceedings of the IEEE International Conference on Data Mining (ICDM’10). IEEE, 737–742.
- fit(X, y, predictions_train=None)
Fit the estimator for the testing bags when needed. The method checks whether the estimator is already trained by calling its predict method
- Parameters:
X ((sparse) array-like, shape (n_examples, n_features)) – Data
y ((sparse) array-like, shape (n_examples, )) – True classes
predictions_train (None, not used) – Predictions of the examples in the training set. PCC does not use them.
- predict(X, predictions_test=None)
Predict the class distribution of a testing bag
The prevalence of each class is the average predicted probability of that class over the testing bag
prevalence(class_i) = (1/|T|) * sum_{x in T} P(h(x) == class_i | x)
- Parameters:
X ((sparse) array-like, shape (n_examples, n_features)) – Data
predictions_test (ndarray, shape (n_examples, n_classes) (default=None)) –
Predictions for the testing examples. They must be probabilities (the estimator used must have a predict_proba method).
If predictions_test is not None, they are copied to predictions_test_ and used. If predictions_test is None, predictions for the testing examples are computed using estimator_test (which must be an actual estimator with a predict_proba method)
- Raises:
ValueError – When estimator_test and predictions_test are both None
- Returns:
prevalences
- Return type:
An ndarray, shape(n_classes, ) with the prevalence for each class
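The averaging described above can be sketched in a few lines of plain Python. This is an illustration of the formula only, not the library's implementation; probabilistic_classify_and_count is a hypothetical name, and the input is assumed to be a list of rows with one probability per class.

```python
def probabilistic_classify_and_count(probabilities):
    """Average each class column of an (n_examples, n_classes) probability table."""
    n = len(probabilities)
    if n == 0:
        raise ValueError("the testing bag is empty")
    n_classes = len(probabilities[0])
    return [sum(row[j] for row in probabilities) / n for j in range(n_classes)]


# Predicted class probabilities for a bag of three examples and three classes.
probs = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.4, 0.35, 0.25]]
prevalences = probabilistic_classify_and_count(probs)
```

Unlike Classify and Count, no example is forced into a single class: the third class receives a nonzero prevalence here even though it is never the most probable class for any example.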
- set_fit_request(*, predictions_train='$UNCHANGED$')
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
predictions_train (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the predictions_train parameter in fit.
- Returns:
self – The updated object.
- Return type:
object
- set_predict_request(*, predictions_test='$UNCHANGED$')
Request metadata passed to the predict method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to predict.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
predictions_test (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the predictions_test parameter in predict.
- Returns:
self – The updated object.
- Return type:
object