quantificationlib.estimators.cross_validation module
Estimator object based on Cross Validation
- class CV_estimator(estimator, groups=None, cv='warn', n_jobs=None, fit_params=None, pre_dispatch='2*n_jobs', averaged_predictions=True, voting='hard', verbose=0)
Bases: BaseEstimator, ClassifierMixin
Cross Validation Estimator
An estimator whose model is the set of models trained during a cross-validation (CV). This object is needed to estimate the distributions of the training and testing sets. Its fit method trains the models of the CV, and the usual predict and predict_proba methods compute the predictions using those models. As a result, this object can be used by any distribution matching method that requires an estimator to represent the distributions.
- Parameters:
The parameters are mainly the same as those of the cross_validate method in sklearn.
estimator (estimator object implementing fit) – The object to use to fit the data.
groups (array-like, with shape (n_samples,), optional) – Group labels for the samples used while splitting the dataset into train/test set.
cv (int, cross-validation generator or an iterable, optional) –
Determines the cross-validation splitting strategy. Possible inputs for cv are:
None, to use the default 3-fold cross validation,
integer, to specify the number of folds in a (Stratified)KFold,
CV splitter,
An iterable yielding (train, test) splits as arrays of indices.
For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used. In all other cases, KFold is used.
n_jobs (int or None, optional (default=None)) – The number of CPUs to use to do the computation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.
fit_params (dict, optional) – Parameters to pass to the fit method of the estimator.
pre_dispatch (int, or string, optional) –
Controls the number of jobs that get dispatched during parallel execution. Reducing this number can be useful to avoid an explosion of memory consumption when more jobs get dispatched than CPUs can process. This parameter can be:
None, in which case all the jobs are immediately created and spawned. Use this for lightweight and fast-running jobs, to avoid delays due to on-demand spawning of the jobs
An int, giving the exact number of total jobs that are spawned
A string, giving an expression as a function of n_jobs, as in 2*n_jobs
averaged_predictions (bool, optional (default=True)) – If True, predict and predict_proba methods average the predictions given by estimators_ for each example
voting (str, {'hard', 'soft'} (default='hard')) – Only used when averaged_predictions is True. If ‘hard’, predict and predict_proba methods apply majority rule voting. If ‘soft’, predict the class label based on the argmax of the sums of the predicted probabilities, which is recommended for an ensemble of well-calibrated classifiers.
verbose (integer, optional (default=0)) – The verbosity level.
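The difference between the two voting options can be illustrated with a small NumPy sketch. It is not the library's implementation: the probability matrix and the helper names hard_vote and soft_vote are hypothetical, chosen so that the two rules disagree.

```python
import numpy as np

# Hypothetical predict_proba outputs of three fold models for a single
# example with two classes (rows: models, columns: classes).
probas = np.array([
    [0.1, 0.9],
    [0.6, 0.4],
    [0.6, 0.4],
])


def hard_vote(probas):
    """Majority rule over the crisp label predicted by each model."""
    votes = probas.argmax(axis=1)
    counts = np.bincount(votes, minlength=probas.shape[1])
    return int(counts.argmax())


def soft_vote(probas):
    """Argmax of the summed class probabilities."""
    return int(probas.sum(axis=0).argmax())


hard_label = hard_vote(probas)  # two of three models vote for class 0
soft_label = soft_vote(probas)  # summed probabilities favour class 1
```

Here one confident model outweighs two lukewarm ones under soft voting, which is why soft voting is recommended only when the classifiers are well calibrated.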
- estimator
The estimator to fit each model of the CV
- Type:
An estimator object
- estimators_
The list of estimators trained by the fit method. The number of estimators is equal to the number of folds times the number of repetitions.
- Type:
list of trained estimators
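As a quick check of the folds-times-repetitions count, the sketch below uses scikit-learn's RepeatedStratifiedKFold as the cv splitter. The choice of splitter and the values 5 and 2 are assumptions for illustration only.

```python
from sklearn.model_selection import RepeatedStratifiedKFold

n_folds, n_reps = 5, 2

# A repeated splitter yields n_folds splits for each of the n_reps repetitions.
cv = RepeatedStratifiedKFold(n_splits=n_folds, n_repeats=n_reps, random_state=0)

# One model is trained per split, so this is the expected length of estimators_.
n_models = cv.get_n_splits()
```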
- averaged_predictions
Determines whether the predictions for each example given by estimators_ are averaged or not
- Type:
bool
- voting
How predictions are aggregated:
‘hard’, applying majority rule voting
‘soft’, based on the argmax of the sums of the predicted probabilities
- Type:
str, {‘hard’, ‘soft’} (default=‘hard’)
- le_
Used to compute the class labels
- Type:
a LabelEncoder fitted object
- classes_
Class labels
- Type:
ndarray, shape (n_classes, )
- X_train_
Data. It is needed to obtain the predictions over the training set itself
- Type:
array-like, shape (n_examples, n_features)
- y_train_
True classes. It is needed to obtain the predictions over the training set itself
- Type:
array-like, shape (n_examples, )
- verbose
The verbosity level.
- Type:
integer
- fit(X, y)
Fit the models. It calls cross_validate to fit the models and saves them in the estimators_ attribute. It also stores some attributes needed by predict and predict_proba, namely le_, classes_, X_train_ and y_train_.
- Parameters:
X (array-like, shape (n_examples, n_features)) – Data
y (array-like, shape (n_examples, )) – True classes
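The fitting step described above can be approximated with scikit-learn alone. The following is a minimal sketch, not the library's code: the synthetic dataset, the LogisticRegression base estimator and the local variable names are assumptions. It only shows how cross_validate with return_estimator=True yields one fitted model per fold, and how those models can then be averaged on new data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.preprocessing import LabelEncoder

# Synthetic data standing in for a real training set (assumption).
X, y = make_classification(n_samples=120, n_features=5, random_state=0)

# Encode the labels, mirroring the role of the le_ and classes_ attributes.
le = LabelEncoder().fit(y)
classes = le.classes_

# return_estimator=True makes cross_validate keep the model fitted on each split.
cv_results = cross_validate(
    LogisticRegression(max_iter=1000),
    X,
    le.transform(y),
    cv=3,
    return_estimator=True,
)
estimators = list(cv_results["estimator"])

# Average the probabilistic predictions of the fold models on unseen data
# (here simply the first ten rows, reused for illustration).
X_new = X[:10]
avg_proba = np.mean([m.predict_proba(X_new) for m in estimators], axis=0)
preds = classes[avg_proba.argmax(axis=1)]
```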
- predict(X)
Returns the crisp predictions given by a CV estimator
- Parameters:
X (array-like, shape (n_examples, n_features)) – Test data
- Returns:
preds – Crisp predictions for the examples in X
- Training set:
averaged_predictions == True, shape(n_examples, )
averaged_predictions == False, shape(n_examples * n_reps, )
- Testing set:
averaged_predictions == True, shape(n_examples, )
averaged_predictions == False, shape(n_examples * n_reps * n_folds, )
- Return type:
array-like, shape depends on type of the examples and the value of averaged_predictions
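The testing-set shapes listed above follow from each of the n_reps * n_folds models predicting every test example. The NumPy sketch below reproduces only that shape arithmetic with hypothetical random predictions. The order in which the library concatenates predictions, and its tie-breaking rule, are not documented here and are assumptions.

```python
import numpy as np

n_folds, n_reps, n_examples = 3, 2, 4
n_models = n_folds * n_reps

# Hypothetical crisp binary predictions: one row per fold model,
# one column per test example.
rng = np.random.default_rng(0)
per_model_preds = rng.integers(0, 2, size=(n_models, n_examples))

# averaged_predictions == False: every model contributes one prediction
# per example, giving n_examples * n_reps * n_folds values in total.
stacked = per_model_preds.reshape(-1)

# averaged_predictions == True: a single prediction per example, obtained
# here by majority vote (ties go to class 1 in this sketch).
aggregated = (per_model_preds.mean(axis=0) >= 0.5).astype(int)
```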
- predict_proba(X)
Returns probabilistic predictions given by a CV estimator
- Parameters:
X (array-like, shape (n_examples, n_features)) – Test data
- Returns:
preds – Probabilistic predictions for the examples in X.
- Training set:
averaged_predictions == True, shape(n_examples, n_classes)
averaged_predictions == False, shape(n_examples * n_reps, n_classes)
- Testing set:
averaged_predictions == True, shape(n_examples, n_classes)
averaged_predictions == False, shape(n_examples * n_reps * n_folds, n_classes)
- Return type:
array-like, shape depends on type of the examples and the value of averaged_predictions
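The n_examples * n_reps factor for the training set is consistent with each training example being held out exactly once per repetition, so that only the models that did not see an example predict it. That reading is an assumption about the library's behaviour. The sketch below merely verifies the counting argument with scikit-learn's RepeatedKFold.

```python
import numpy as np
from sklearn.model_selection import RepeatedKFold

n_examples, n_folds, n_reps = 12, 3, 2
X = np.zeros((n_examples, 1))  # placeholder features; only indices matter

cv = RepeatedKFold(n_splits=n_folds, n_repeats=n_reps, random_state=0)

# Count how many times each training example lands in a held-out fold.
held_out_counts = np.zeros(n_examples, dtype=int)
for train_idx, test_idx in cv.split(X):
    held_out_counts[test_idx] += 1

# Total number of held-out predictions available for the training set.
total_held_out = int(held_out_counts.sum())
```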
- set_score_request(*, sample_weight='$UNCHANGED$')
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
self (CV_estimator) –
- Returns:
self – The updated object.
- Return type:
object