quantificationlib.binary.quantiles module
Binary quantifiers based on representing the distributions using quantiles
- class QUANTy(estimator_train=None, estimator_test=None, n_quantiles=8, distance=<function l2>, tol=1e-05, verbose=0)
Bases: UsingClassifiers
Generic binary quantifier for the quantiles-y method
The idea is to represent both the mixture of the training distribution and the testing distribution using quantiles of the predictions given by a classifier (y). The difference between the two is minimized using a distance/loss function. This method generalizes the PAC quantifier (Bella et al. 2013): PAC uses just 1 quantile, whereas this class lets you define more quantiles and use any distance/loss to measure distribution similarity. The distance parameter selects the distance used.
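As a rough illustration of the single-quantile case, the sketch below assumes that one quantile summarizes a distribution by the mean of the positive-class posteriors. This is an assumption made for illustration, not something taken from the library source. Under it, the mixture mean is linear in the prevalence p, so the best-matching p has a closed form. The function name `pac_prevalence` and the sample scores are hypothetical.

```python
from statistics import mean


def pac_prevalence(train_scores, train_labels, test_scores):
    """Estimate the positive prevalence by matching mean posteriors.

    Hypothetical helper, not part of quantificationlib. Scores are the
    classifier's P(positive | x); labels are 0 (negative) or 1 (positive).
    """
    mean_pos = mean(s for s, y in zip(train_scores, train_labels) if y == 1)
    mean_neg = mean(s for s, y in zip(train_scores, train_labels) if y == 0)
    mean_test = mean(test_scores)
    # Solve p * mean_pos + (1 - p) * mean_neg = mean_test for p.
    p = (mean_test - mean_neg) / (mean_pos - mean_neg)
    # Clip to the valid prevalence range.
    return min(1.0, max(0.0, p))


# Toy data: positives score high, negatives score low.
train_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
train_labels = [1, 1, 1, 0, 0, 0]
test_scores = [0.9, 0.8, 0.2, 0.1]
prevalence = pac_prevalence(train_scores, train_labels, test_scores)
```

With more than one quantile the mixture is no longer linear in p, which is why a numerical search over the prevalence is needed.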
This class (as every other class based on distribution matching using classifiers) works in two different ways:
1. Two estimators are used to classify training examples and testing examples in order to compute the distribution of both sets. Estimators can be already trained.
2. You can directly provide the predictions for the examples in the fit/predict methods. This is useful for synthetic/artificial experiments.
The goal in both cases is to guarantee that all methods based on distribution matching use exactly the same predictions when you compare quantifiers of this kind (and others that also employ an underlying classifier, for instance, CC/PCC and AC/PAC). In the first case, estimators are only trained once and can be shared among several quantifiers of this kind.
Multiclass quantification is not implemented yet for this object. It would need a more complex search algorithm (instead of golden_section_search).
- Parameters:
estimator_train (estimator object (default=None)) – An estimator object implementing fit and predict_proba. It is used to classify the examples of the training set and to compute the distribution of each class individually
estimator_test (estimator object (default=None)) – An estimator object implementing fit and predict_proba. It is used to classify the examples of the testing set and to compute the distribution of the whole testing set. For some experiments both estimator_train and estimator_test could be the same
n_quantiles (int, (default=8)) – Number of quantiles used to represent each distribution
distance (distance function (default=l2)) – The distance function used to compute the difference between the mixture of the training distribution and the testing distribution
tol (float, (default=1e-05)) – The precision of the solution when search is used to compute the prevalence
verbose (int, optional, (default=0)) – The verbosity level. The default value, zero, means silent mode
- estimator_train
Estimator used to classify the examples of the training set
- Type:
estimator
- estimator_test
Estimator used to classify the examples of the testing bag
- Type:
estimator
- predictions_train_
Predictions of the examples in the training set
- Type:
ndarray, shape (n_examples, n_classes) (probabilities)
- predictions_test_
Predictions of the examples in the testing bag
- Type:
ndarray, shape (n_examples, n_classes) (probabilities)
- needs_predictions_train
It is True because QUANTy quantifiers need to estimate the training distribution
- Type:
bool, True
- probabilistic_predictions
This means that predictions_train_/predictions_test_ contain probabilistic predictions
- Type:
bool, True
- classes_
Class labels
- Type:
ndarray, shape (n_classes, )
- y_ext_
Repeated copy (repmat) of the true labels of the training set. When CV_estimator is used with averaged_predictions=False, predictions_train_ will have a larger dimension (factor=n_repetitions * n_folds of the underlying CV) than y. In other cases, y_ext_ == y. y_ext_ is used instead of y in the fit/predict methods whenever the true labels of the training set are needed
- Type:
ndarray, shape (len(predictions_train_), 1)
- n_quantiles
The number of quantiles to represent data distribution
- Type:
int (default=8)
- distance
The name of the distance function used
- Type:
A distance function (default=l2)
- tol
The precision of the solution when search is used to compute the prevalence
- Type:
float
- train_distrib_
Contains predictions_train_ in ascending order
- Type:
ndarray, shape (n_examples, 1) binary quantification
- train_labels_
Contains the corresponding labels of the examples in train_distrib_ in the same order
- Type:
ndarray, shape (n_examples, 1) binary quantification
- test_distrib_
Contains the quantiles of the test distribution
- Type:
ndarray, shape (n_quantiles, 1)
- mixtures_
Contains the mixtures for all the prevalences in the range [0, 1] step=0.01. This speeds up the prediction for a collection of testing bags
- Type:
ndarray, shape (101, n_quantiles)
- verbose
The verbosity level
- Type:
int
Notes
Notice that, within each pair estimator_train/predictions_train and estimator_test/predictions_test, at least one element must not be None. If both are None, a ValueError exception is raised. If both are not None, predictions_train/predictions_test are used.
- fit(X, y, predictions_train=None)
This method performs the following operations: 1) fits the estimators for the training set and the testing set (if needed), and 2) computes predictions_train_ (probabilities) if needed. Both operations are performed by the fit method of its superclass. After that, the method sorts the predictions for the training set. The actual quantiles are computed by a mixture function because they depend on the class prevalence.
- Parameters:
X (array-like, shape (n_examples, n_features)) – Data
y (array-like, shape (n_examples, )) – True classes
predictions_train (ndarray, shape (n_examples, n_classes)) – Predictions of the examples in the training set
- Raises:
ValueError – When estimator_train and predictions_train are at the same time None or not None
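The sorting step and the prevalence-dependent mixture described for fit can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the library's implementation: it assumes each quantile is summarized by the weighted mean score inside an equal-mass bin, and that a mixture for prevalence p gives every positive example weight p / n_pos and every negative example weight (1 - p) / n_neg. The helper names `weighted_bin_means` and `mixture_quantiles` are hypothetical.

```python
def weighted_bin_means(scores, weights, n_quantiles):
    """Mean score inside each of n_quantiles equal-mass bins.

    Hypothetical helper: scores must be sorted in ascending order and
    weights must be non-negative and sum to 1.
    """
    bin_mass = 1.0 / n_quantiles
    means = []
    acc_sum, acc_mass = 0.0, 0.0
    for score, weight in zip(scores, weights):
        # An example may straddle a bin boundary, so its weight is split.
        while weight > 0 and len(means) < n_quantiles:
            take = min(weight, bin_mass - acc_mass)
            acc_sum += take * score
            acc_mass += take
            weight -= take
            if bin_mass - acc_mass <= 1e-12:  # bin is full, close it
                means.append(acc_sum / acc_mass)
                acc_sum, acc_mass = 0.0, 0.0
    if len(means) < n_quantiles and acc_mass > 0:
        # Rounding left the last bin slightly short of full mass.
        means.append(acc_sum / acc_mass)
    return means


def mixture_quantiles(sorted_scores, sorted_labels, prevalence, n_quantiles):
    """Quantiles of the training mixture for a given positive prevalence."""
    n_pos = sum(sorted_labels)
    n_neg = len(sorted_labels) - n_pos
    weights = [prevalence / n_pos if y == 1 else (1 - prevalence) / n_neg
               for y in sorted_labels]
    return weighted_bin_means(sorted_scores, weights, n_quantiles)


# Sort the training scores once, keeping the labels aligned.
scores = [0.9, 0.1, 0.8, 0.2]
labels = [1, 0, 1, 0]
order = sorted(range(len(scores)), key=scores.__getitem__)
train_distrib = [scores[i] for i in order]
train_labels = [labels[i] for i in order]

half = mixture_quantiles(train_distrib, train_labels, 0.5, 2)
# Precompute the mixtures on a grid of prevalences 0.00, 0.01, ..., 1.00.
mixtures = [mixture_quantiles(train_distrib, train_labels, step / 100, 2)
            for step in range(101)]
```

Precomputing the grid once is what makes it cheap to reuse the same training mixtures across many testing bags.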
- predict(X, predictions_test=None)
Predict the class distribution of a testing bag
First, predictions_test_ are computed (if needed, when the predictions_test parameter is None) by the super().predict() method.
After that, the method computes the quantiles for the testing bag by sorting the testing examples according to their posterior probabilities.
Finally, the prevalences are computed using golden section search and the distance function of the object
- Parameters:
X (array-like, shape (n_examples, n_features)) – Testing bag
predictions_test (ndarray, shape (n_examples, n_classes) (default=None)) – They must be probabilities (the estimator used must have a predict_proba method).
If estimator_test is None then predictions_test cannot be None. If predictions_test is None, predictions for the testing examples are computed using the predict_proba method of estimator_test (it must be an actual estimator)
- Raises:
ValueError – When estimator_test and predictions_test are at the same time None or not None
- Returns:
prevalences – Contains the predicted prevalence for each class
- Return type:
ndarray, shape(n_classes, )
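The final search step can be sketched as a standalone golden section search that minimizes an L2 distance over the prevalence. This is not the library's golden_section_search: the names `golden_section_minimize`, `l2_distance`, and `toy_mixture` are hypothetical, and the toy mixture simply interpolates between two made-up quantile vectors so that the example is self-contained. The ordering of the returned prevalences (negative class first) is also an assumption.

```python
import math


def l2_distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def golden_section_minimize(loss, low=0.0, high=1.0, tol=1e-5):
    """Minimize a unimodal loss on [low, high] to within tol."""
    inv_phi = (math.sqrt(5) - 1) / 2  # 1 / golden ratio, about 0.618
    a, b = low, high
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = loss(c), loss(d)
    while b - a > tol:
        if fc < fd:
            # Minimum lies in [a, d]; reuse c as the new right probe.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = loss(c)
        else:
            # Minimum lies in [c, b]; reuse d as the new left probe.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = loss(d)
    return (a + b) / 2


# Made-up quantile vectors for the negative and positive classes.
neg_q = [0.05, 0.10, 0.20, 0.35]
pos_q = [0.60, 0.75, 0.85, 0.95]


def toy_mixture(p):
    """Toy stand-in for the mixture function: linear interpolation."""
    return [p * hi + (1 - p) * lo for hi, lo in zip(pos_q, neg_q)]


test_q = toy_mixture(0.3)  # pretend these are the testing-bag quantiles
estimate = golden_section_minimize(
    lambda p: l2_distance(toy_mixture(p), test_q), tol=1e-5)
prevalences = [1 - estimate, estimate]
```

Golden section search needs only function evaluations and shrinks the interval by a constant factor per step, which suits a one-dimensional prevalence in [0, 1]; it assumes the loss is unimodal on that interval.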
- set_fit_request(*, predictions_train='$UNCHANGED$')
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
predictions_train (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for predictions_train parameter in fit.
self (QUANTy) –
- Returns:
self – The updated object.
- Return type:
object
- set_predict_request(*, predictions_test='$UNCHANGED$')
Request metadata passed to the predict method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to predict.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
predictions_test (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for predictions_test parameter in predict.
self (QUANTy) –
- Returns:
self – The updated object.
- Return type:
object