quantificationlib.bag_generator module¶
Classes and functions for generating bags of examples and distributions of bags with different kinds of drift
- class CovariateShift_BagGenerator(n_bags=1000, bag_size=None, random_state=2032, verbose=0)[source]¶
Bases:
BagGenerator
Generate bags with covariate shift
The idea is to pick a single instance from X and then randomly select the examples of the bag according to their distance to that instance
- Parameters:
n_bags (int, (default=1000)) – Number of bags
bag_size (int, (default=None)) – Number of examples in each bag
random_state (int, RandomState instance, (default=2032)) – To generate random numbers. If random_state is an int, it is the seed used by the random number generator; if random_state is a RandomState instance, it is used directly as the random number generator
verbose (int, optional, (default=0)) – The verbosity level. The default value, zero, means silent mode
- n_bags¶
Number of bags
- Type:
int
- bag_size¶
Number of examples in each bag
- Type:
int
- random_state¶
To generate random numbers
- Type:
int, RandomState instance
- verbose¶
The verbosity level
- Type:
int, optional
- prevalences_¶
i-th row contains the true prevalences of each generated bag
- Type:
array-like, shape (n_classes, n_bags)
- indexes_¶
i-th column contains the indexes of the examples for i-th bag
- Type:
array-like, shape (bag_size, n_bags)
- generate_bags(X, y)[source]¶
Create bags of examples simulating covariate shift
The method first picks a center example for each bag. The probability of selecting an example for the bag is proportional to its distance to the center example.
- Parameters:
X (array-like, shape (n_examples, n_features)) – Data
y (array-like, shape (n_examples, )) – True classes
- Returns:
prevalences (numpy array, shape (n_bags, n_classes)) – Each row contains the prevalences of the corresponding bag
indexes (numpy array, shape (bag_size, n_bags)) – Each column contains the indexes of the examples of the corresponding bag
- Raises:
ValueError – When random_state is neither an int nor a RandomState object
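The sampling idea described for generate_bags can be illustrated with a minimal sketch. It is not the library's implementation: the function name covariate_shift_bag and the weighting 1 / (1 + distance) are assumptions made for illustration, since this page does not give the exact formula that maps distances to selection probabilities. Only the Python standard library is used.

```python
import math
import random


def covariate_shift_bag(X, bag_size, rng):
    """Return the indexes of one bag drawn around a randomly chosen center.

    Assumption (hypothetical weighting, not taken from the library): each
    example gets weight 1 / (1 + distance to the center), so examples near
    the center are favoured.
    """
    center = X[rng.randrange(len(X))]
    distances = [math.dist(center, x) for x in X]
    weights = [1.0 / (1.0 + d) for d in distances]
    # Sample with replacement according to the distance-based weights.
    return rng.choices(range(len(X)), weights=weights, k=bag_size)


rng = random.Random(2032)
# Toy data: 200 two-dimensional examples.
X = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(200)]
bag = covariate_shift_bag(X, bag_size=50, rng=rng)
```

Repeating the call n_bags times, each time with a new center, would give one column of indexes per bag.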
- class PriorAndCovariateShift_BagGenerator(n_bags=1000, bag_size=None, method='Uniform', alphas=None, min_prevalence=None, random_state=2032, verbose=0)[source]¶
Bases:
BagGenerator
Generate bags with a mix of prior probability shift and covariate shift
This class generates the bags using two objects of the classes PriorShift_BagGenerator and CovariateShift_BagGenerator
- Parameters:
n_bags (int, (default=1000)) – Number of bags
bag_size (int, (default=None)) – Number of examples in each bag
method (str, (default='Uniform')) –
Method used to generate the distributions. Two methods available:
‘Uniform’ : the prevalences are uniformly distributed
‘Dirichlet’ : the prevalences are generated using the Dirichlet distribution
alphas (None, float or array-like, (default=None), shape (n_classes, ) when it is an array) – The parameters for the Dirichlet distribution when the selected method is ‘Dirichlet’
min_prevalence (None, float or array-like, (default=None)) – The minimum prevalence for each class. If None, the minimum prevalence is 0. If a single value is passed, all classes share that min_prevalence value. This parameter is only used when the ‘Uniform’ method is selected
random_state (int, RandomState instance, (default=2032)) – To generate random numbers. If random_state is an int, it is the seed used by the random number generator; if random_state is a RandomState instance, it is used directly as the random number generator
verbose (int, optional, (default=0)) – The verbosity level. The default value, zero, means silent mode
- n_bags¶
Number of bags
- Type:
int
- bag_size¶
Number of examples in each bag
- Type:
int
- min_prevalence¶
The min prevalence for each class.
- Type:
None, float or array-like
- random_state¶
To generate random numbers
- Type:
int, RandomState instance
- verbose¶
The verbosity level
- Type:
int, optional
- prevalences_¶
i-th row contains the true prevalences of each generated bag
- Type:
array-like, shape (n_classes, n_bags)
- indexes_¶
i-th column contains the indexes of the examples for i-th bag
- Type:
array-like, shape (bag_size, n_bags)
- generate_bags(X, y)[source]¶
Create bags of examples simulating prior probability shift and covariate shift. It uses instances of classes PriorShift_BagGenerator and CovariateShift_BagGenerator
- Parameters:
X (array-like, shape (n_examples, n_features)) – Data
y (array-like, shape (n_examples, )) – True classes
- Returns:
prevalences (numpy array, shape (n_bags, n_classes)) – Each row contains the prevalences of the corresponding bag
indexes (numpy array, shape (bag_size, n_bags)) – Each column contains the indexes of the examples of the corresponding bag
- Raises:
ValueError – When random_state is neither an int nor a RandomState object
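The layout of the returned values can be illustrated with a small sketch that recomputes the prevalences of each bag from its column of indexes. It follows the Returns description above (one column of indexes per bag, one row of prevalences per bag). Plain nested lists stand in for the numpy arrays, and the helper name bag_prevalences is hypothetical.

```python
from collections import Counter


def bag_prevalences(y, indexes, bag, classes):
    """Recompute the class prevalences of one bag from its column of indexes."""
    # indexes has shape (bag_size, n_bags): column `bag` lists the bag's examples.
    column = [row[bag] for row in indexes]
    counts = Counter(y[i] for i in column)
    return [counts[c] / len(column) for c in classes]


# Toy data: 6 examples, 2 classes, 2 bags of 4 examples each.
y = [0, 0, 0, 1, 1, 1]
indexes = [[0, 3],
           [1, 4],
           [2, 5],
           [3, 0]]
prevalences = [bag_prevalences(y, indexes, b, classes=[0, 1]) for b in range(2)]
# prevalences has one row per bag and one entry per class.
```

With numpy arrays, the same column would be obtained as indexes[:, bag], and the examples of the bag as X[indexes[:, bag]].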
- class PriorShift_BagGenerator(n_bags=1000, bag_size=None, method='Uniform', alphas=None, min_prevalence=None, random_state=2032, verbose=0)[source]¶
Bases:
BagGenerator
Generate bags with prior probability shift
- Parameters:
n_bags (int, (default=1000)) – Number of bags
bag_size (int, (default=None)) – Number of examples in each bag
method (str, (default='Uniform')) –
Method used to generate the distributions. Two methods available:
‘Uniform’ : the prevalences are uniformly distributed
‘Dirichlet’ : the prevalences are generated using the Dirichlet distribution
alphas (None, float or array-like, (default=None), shape (n_classes, ) when it is an array) – The parameters for the Dirichlet distribution when the selected method is ‘Dirichlet’
min_prevalence (None, float or array-like, (default=None)) – The minimum prevalence for each class. If None, the minimum prevalence is 0. If a single value is passed, all classes share that min_prevalence value. This parameter is only used when the ‘Uniform’ method is selected
random_state (int, RandomState instance, (default=2032)) – To generate random numbers. If random_state is an int, it is the seed used by the random number generator; if random_state is a RandomState instance, it is used directly as the random number generator
verbose (int, optional, (default=0)) – The verbosity level. The default value, zero, means silent mode
- n_bags¶
Number of bags
- Type:
int
- bag_size¶
Number of examples in each bag
- Type:
int
- method¶
Method used to generate the prevalences
- Type:
str
- alphas¶
Parameters of the Dirichlet distribution
- Type:
None, float or array-like
- min_prevalence¶
The min prevalence for each class
- Type:
None, float or array-like
- random_state¶
To generate random numbers
- Type:
int, RandomState instance
- verbose¶
The verbosity level
- Type:
int, optional
- prevalences_¶
i-th row contains the true prevalences of each generated bag
- Type:
array-like, shape (n_classes, n_bags)
- indexes_¶
i-th column contains the indexes of the examples for i-th bag
- Type:
array-like, shape (bag_size, n_bags)
- generate_bags(X, y)[source]¶
Create bags of examples simulating prior probability shift
Two different methods are implemented:
‘Uniform’ : the prevalences are uniformly distributed
‘Dirichlet’ : the prevalences are generated using the Dirichlet distribution
- Parameters:
X (array-like, shape (n_examples, n_features)) – Data
y (array-like, shape (n_examples, )) – True classes
- Returns:
prevalences (numpy array, shape (n_bags, n_classes)) – Each row contains the prevalences of the corresponding bag
indexes (numpy array, shape (bag_size, n_bags)) – Each column contains the indexes of the examples of the corresponding bag
- Raises:
ValueError – When random_state is neither an int nor a RandomState object, when the selected method is not implemented or when the parameters for the selected method are incorrect
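The general idea of prior probability shift can be sketched as follows: draw a prevalence vector, then sample examples class by class so that the bag matches it. This is an illustration under stated assumptions, not the library's code. The helper names draw_prevalences and prior_shift_bag are hypothetical. The uniform case is modelled as a Dirichlet distribution with every alpha equal to 1, which is uniform over the simplex. min_prevalence is not handled.

```python
import random


def draw_prevalences(n_classes, rng, alphas=None):
    """Draw one prevalence vector from a Dirichlet distribution.

    Assumption: alphas=None stands for the uniform case and uses alpha = 1
    for every class.
    """
    if alphas is None:
        alphas = [1.0] * n_classes
    # Normalised independent gamma draws follow a Dirichlet distribution.
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]


def prior_shift_bag(y, prevalences, bag_size, rng):
    """Sample bag indexes class by class so the bag matches `prevalences`."""
    classes = sorted(set(y))
    raw = [p * bag_size for p in prevalences]
    counts = [int(r) for r in raw]
    # Hand the leftover slots to the classes with the largest fractional parts.
    leftover = bag_size - sum(counts)
    order = sorted(range(len(raw)), key=lambda c: raw[c] - counts[c], reverse=True)
    for c in order[:leftover]:
        counts[c] += 1
    bag = []
    for cls, count in zip(classes, counts):
        members = [i for i, label in enumerate(y) if label == cls]
        # Sample with replacement within the class.
        bag.extend(rng.choices(members, k=count))
    return bag


rng = random.Random(2032)
y = [0] * 60 + [1] * 40
prevalences = draw_prevalences(2, rng)
bag = prior_shift_bag(y, prevalences, bag_size=100, rng=rng)
```

Sampling with replacement within each class keeps the sketch valid even when a class has fewer examples than the bag requires.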