quantificationlib.binary.debias module
De-Bias quantifier (just for binary quantification)
- class DeBias(estimator_test=None, estimator_train=None, verbose=0)
Bases: UsingClassifiers
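Binary quantifier based on the De-Bias estimate proposed by Friedman. The prevalence of the positive class is estimated as
prevalence (positives) = prior(positives) + ( prevalence_PCC - prior(positives) ) / Vt
where prevalence_PCC is the prevalence estimated by Probabilistic Classify & Count on the testing bag (the mean predicted probability of the positive class) and
Vt = [ 1/|D| sum_{x in D} (P(h(x)==+1|x) - prior(positives))^2 ] / (prior(positives) * prior(negatives))
is computed over the training examples D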
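As a rough illustration of the two formulas above, the following NumPy sketch computes Vt from training probabilities and applies the correction to a testing bag. It is not the library's implementation: the function names `compute_vt` and `debias_positive_prevalence` and the toy data are assumptions made for this example.

```python
import numpy as np


def compute_vt(pos_probs_train, prior_pos):
    """Mean squared deviation of P(+|x) from the training prior,
    normalized by prior(positives) * prior(negatives).
    (Hypothetical helper, not part of quantificationlib.)"""
    pos_probs_train = np.asarray(pos_probs_train, dtype=float)
    mean_sq_dev = np.mean((pos_probs_train - prior_pos) ** 2)
    return mean_sq_dev / (prior_pos * (1.0 - prior_pos))


def debias_positive_prevalence(pos_probs_test, prior_pos, vt):
    """Apply prior + (prevalence_PCC - prior) / Vt to a testing bag.
    (Hypothetical helper, not part of quantificationlib.)"""
    prevalence_pcc = float(np.mean(pos_probs_test))  # PCC estimate
    return prior_pos + (prevalence_pcc - prior_pos) / vt


# Toy training data: true labels and predicted P(+|x) for each example.
y_train = np.array([1, 1, 0, 0, 0, 0])
pos_probs_train = np.array([0.9, 0.7, 0.3, 0.2, 0.1, 0.2])

prior_pos = float(np.mean(y_train == 1))
vt = compute_vt(pos_probs_train, prior_pos)

# Toy testing bag: only predicted probabilities are needed.
pos_probs_test = np.array([0.8, 0.4, 0.3, 0.2])
estimate = debias_positive_prevalence(pos_probs_test, prior_pos, vt)
print(f"Vt = {vt:.3f}, de-biased prevalence = {estimate:.3f}")
```

With a perfect classifier (probabilities of exactly 0 or 1) Vt equals 1 and the estimate reduces to prevalence_PCC. In this toy data Vt is below 1, so the correction moves the estimate further from the training prior than PCC alone would.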
This class works in two different ways:
1) An estimator is used to classify the examples of the testing bag (the estimator can be already trained)
2) The predictions for the examples are provided directly to the predict method. This is useful for synthetic/artificial experiments
- Parameters:
estimator_train (estimator object (default=None)) – An estimator object implementing fit and predict_proba. It is used to classify the examples of the training set and to obtain the probabilistic predictions from which Vt is computed
estimator_test (estimator object (default=None)) – An estimator object implementing fit and predict_proba. It is used to classify the examples of the testing set and to obtain their probabilistic predictions. For some experiments both estimators could be the same
verbose (int, optional, (default=0)) – The verbosity level. The default value, zero, means silent mode
- estimator_train
Estimator used to classify the examples of the training set.
- Type:
estimator
- estimator_test
Estimator used to classify the examples of the testing bag
- Type:
estimator
- predictions_train_
Predictions of the examples in the training set
- Type:
ndarray, shape (n_examples, n_classes) (probabilistic estimator)
- predictions_test_
Predictions of the examples in the testing bag
- Type:
ndarray, shape (n_examples, n_classes) (probabilistic estimator)
- probabilistic_predictions
This means that predictions_train_/predictions_test_ contain probabilistic predictions
- Type:
bool, True
- needs_predictions_train
It is True because DeBias quantifiers need to estimate the training distribution
- Type:
bool, True
- classes_
Class labels
- Type:
ndarray, shape (n_classes, )
- y_ext_
True labels of the training set
- Type:
ndarray, shape (n_examples, )
- train_prevs_
Prevalence of each class in the training set
- Type:
ndarray, shape (n_classes, )
- Vt_
The value of the equation
Vt = [ 1/|D| sum_{x in D} (P(h(x)==+1|x) - train_prevs_[1])^2 ] / (train_prevs_[1] * train_prevs_[0])
applied over the training examples D
- Type:
float
- verbose
The verbosity level
- Type:
int
Notes
For each pair, estimator_train/predictions_train and estimator_test/predictions_test, at least one element must not be None. If both elements of a pair are None, a ValueError exception is raised. If both are provided, the predictions (predictions_train/predictions_test) are used
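The precedence rule in this note can be sketched as a small helper. This is only an illustration of the documented behaviour, not code taken from the library: `resolve_predictions` and the stand-in estimator `ConstantProba` are hypothetical names introduced here.

```python
import numpy as np


def resolve_predictions(estimator, X, predictions):
    """Return the predictions to use for X (hypothetical helper).

    - precomputed predictions win when they are provided
    - otherwise the estimator's predict_proba is used
    - ValueError when neither is available
    """
    if predictions is not None:
        return np.asarray(predictions, dtype=float)
    if estimator is None:
        raise ValueError("estimator and predictions cannot both be None")
    return estimator.predict_proba(X)


class ConstantProba:
    """Stand-in estimator (an assumption for this example) that always
    predicts 0.5 for both classes."""

    def predict_proba(self, X):
        return np.tile([0.5, 0.5], (len(X), 1))


X = np.zeros((3, 2))
given = np.array([[0.2, 0.8], [0.6, 0.4], [0.1, 0.9]])

# Both provided: the precomputed predictions are used, not the estimator.
used = resolve_predictions(ConstantProba(), X, given)

# Only the estimator provided: it is asked for probabilities.
computed = resolve_predictions(ConstantProba(), X, None)
```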
References
Jerome Friedman. Class counts in future unlabeled samples. Presentation at MIT CSAIL Big Data Event, 2014.
- fit(X, y, predictions_train=None)
This method performs the following operations: 1) fits the estimators for the training set and the testing set (if needed), and 2) computes predictions_train_ (probabilities) if needed. Both operations are performed by the fit method of its superclass.
Finally, the method computes the value of Vt over the training examples D:
Vt = [ 1/|D| sum_{x in D} (P(h(x)==+1|x) - prior(positives))^2 ] / (prior(positives) * prior(negatives))
- Parameters:
X (array-like, shape (n_examples, n_features)) – Data
y (array-like, shape (n_examples, )) – True classes
predictions_train (ndarray, shape (n_examples, n_classes)) – Predictions of the training set
- Raises:
ValueError – When estimator_train and predictions_train are both None
AttributeError – When the number of classes is greater than 2
- predict(X, predictions_test=None)
Predict the class distribution of a testing bag
The prevalence for the positive class is
prevalence (positives) = prior(positives) + ( prevalence_PCC - prior(positives) ) / Vt
- Parameters:
X ((sparse) array-like, shape (n_examples, n_features)) – Data
predictions_test (ndarray, shape (n_examples, n_classes) (default=None)) – Predictions for the examples of the testing bag. They must be probabilities (the estimator used to obtain them must have a predict_proba method). If predictions_test is not None, they are copied to predictions_test_ and used. If predictions_test is None, the predictions for the testing examples are computed with estimator_test (which must be an actual estimator)
- Raises:
ValueError – When estimator_test and predictions_test are both None
- Returns:
prevalences – The prevalence for each class
- Return type:
ndarray, shape (n_classes, )
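To make the input and output shapes of predict concrete, here is a minimal sketch that turns a matrix of test probabilities into a two-element prevalence vector. The helper name `debias_distribution` is hypothetical, column 1 is assumed to hold the positive class (matching the use of train_prevs_[1] above), and clipping the result to [0, 1] is an assumption of this sketch rather than documented behaviour.

```python
import numpy as np


def debias_distribution(predictions_test, prior_pos, vt, clip=True):
    """Map (n_examples, 2) probabilities to prevalences of shape (2,).
    (Hypothetical helper, not part of quantificationlib.)"""
    predictions_test = np.asarray(predictions_test, dtype=float)
    if predictions_test.ndim != 2 or predictions_test.shape[1] != 2:
        raise ValueError("expected probabilities of shape (n_examples, 2)")

    # PCC prevalence: mean predicted probability of the positive class.
    pcc_pos = predictions_test[:, 1].mean()

    # De-Bias correction around the training prior.
    p_pos = prior_pos + (pcc_pos - prior_pos) / vt

    # Assumption: keep the estimate inside [0, 1]; a small Vt can
    # otherwise push the raw value outside the valid range.
    if clip:
        p_pos = float(np.clip(p_pos, 0.0, 1.0))

    return np.array([1.0 - p_pos, p_pos])


predictions_test = np.array([[0.2, 0.8],
                             [0.1, 0.9],
                             [0.4, 0.6],
                             [0.8, 0.2]])

prevalences = debias_distribution(predictions_test, prior_pos=0.3, vt=0.5)
clipped = debias_distribution(predictions_test, prior_pos=0.3, vt=0.25)
raw = debias_distribution(predictions_test, prior_pos=0.3, vt=0.25, clip=False)
```

The `raw` call shows why some form of range handling is worth considering: with a small Vt the unclipped estimate for the positive class exceeds 1.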
- set_fit_request(*, predictions_train='$UNCHANGED$')
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
predictions_train (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for predictions_train parameter in fit.
self (DeBias) –
- Returns:
self – The updated object.
- Return type:
object
- set_predict_request(*, predictions_test='$UNCHANGED$')
Request metadata passed to the predict method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to predict.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
predictions_test (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for predictions_test parameter in predict.
self (DeBias) –
- Returns:
self – The updated object.
- Return type:
object