quantificationlib.bag_generator module

Classes and functions for generating bags of examples and distributions of bags with different kinds of drift

class BagGenerator[source]

Bases: object

Base class for bag generator classes

generate_bags(X, y)[source]

class CovariateShift_BagGenerator(n_bags=1000, bag_size=None, random_state=2032, verbose=0)[source]

Bases: BagGenerator

Generate bags with covariate shift

The idea is to pick a single instance from X and then randomly select the examples of the bag according to their distance to that instance

Parameters:
  • n_bags (int, (default=1000)) – Number of bags

  • bag_size (int, (default=None)) – Number of examples in each bag

  • random_state (int, RandomState instance, (default=2032)) – To generate random numbers. If random_state is an int, it is the seed used by the random number generator; if random_state is a RandomState instance, it is used directly as the random number generator

  • verbose (int, optional, (default=0)) – The verbosity level. The default value, zero, means silent mode

n_bags

Number of bags

Type:

int

bag_size

Number of examples in each bag

Type:

int

random_state

To generate random numbers

Type:

int, RandomState instance

verbose

The verbosity level

Type:

int, optional

prevalences_

i-th row contains the true prevalences of each generated bag

Type:

array-like, shape (n_classes, n_bags)

indexes_

i-th column contains the indexes of the examples for i-th bag

Type:

array-like, shape (bag_size, n_bags)

generate_bags(X, y)[source]

Create bags of examples simulating covariate shift

The method first picks a center example for each bag. The probability of selecting an example for the bag is proportional to its distance to the center example.

Parameters:
  • X (array-like, shape (n_examples, n_features)) – Data

  • y (array-like, shape (n_examples, )) – True classes

Returns:

  • prevalences (numpy array, shape (n_bags, n_classes)) – Each row contains the prevalences of the corresponding bag

  • indexes (numpy array, shape (bag_size, n_bags)) – Each column contains the indexes of the examples of the bag

Raises:

ValueError – When random_state is neither an int nor a RandomState object
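
The distance-based selection described for generate_bags can be illustrated with a small NumPy sketch. This is not the library's code: the helper name sample_bag_by_distance is hypothetical, and the Euclidean metric, the literal "proportional to the distance" weighting, and sampling with replacement are all assumptions made for the example.

```python
import numpy as np


def sample_bag_by_distance(X, bag_size, rng):
    """Draw the indexes of one bag around a randomly chosen center example.

    Assumption: the weights follow the description literally, so the
    selection probability grows linearly with the Euclidean distance to
    the center. The library may use a different weighting.
    """
    X = np.asarray(X, dtype=float)
    center = rng.randint(0, len(X))                     # index of the center example
    distances = np.linalg.norm(X - X[center], axis=1)  # distance of each example to it
    total = distances.sum()
    if total == 0:
        # Every example coincides with the center: fall back to uniform sampling.
        probs = np.full(len(X), 1.0 / len(X))
    else:
        probs = distances / total
    # Sampling with replacement (an assumption) keeps any bag_size valid.
    return rng.choice(len(X), size=bag_size, replace=True, p=probs)


rng = np.random.RandomState(2032)
X = rng.rand(200, 2)
bag = sample_bag_by_distance(X, bag_size=50, rng=rng)
```

Repeating the call once per bag and stacking the results as columns would give an array with the (bag_size, n_bags) layout documented for indexes.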

class PriorAndCovariateShift_BagGenerator(n_bags=1000, bag_size=None, method='Uniform', alphas=None, min_prevalence=None, random_state=2032, verbose=0)[source]

Bases: BagGenerator

Generate bags with a mix of prior probability shift and covariate shift

This class generates the bags using two objects of the classes PriorShift_BagGenerator and CovariateShift_BagGenerator

Parameters:
  • n_bags (int, (default=1000)) – Number of bags

  • bag_size (int, (default=None)) – Number of examples in each bag

  • method (str, (default='Uniform')) –

    Method used to generate the distributions. Two methods available:

    • ‘Uniform’ : the prevalences are uniformly distributed

    • ‘Dirichlet’ : the prevalences are generated using the Dirichlet distribution

  • alphas (None, float or array-like, (default=None), shape (n_classes, ) when it is an array) – The parameters for the Dirichlet distribution when the selected method is ‘Dirichlet’

  • min_prevalence (None, float or array-like, (default=None)) – The minimum prevalence for each class. If None, the minimum prevalence is 0. If a single value is passed, all classes share the same min_prevalence value. This parameter is only used when the ‘Uniform’ method is selected

  • random_state (int, RandomState instance, (default=2032)) – To generate random numbers. If random_state is an int, it is the seed used by the random number generator; if random_state is a RandomState instance, it is used directly as the random number generator

  • verbose (int, optional, (default=0)) – The verbosity level. The default value, zero, means silent mode

n_bags

Number of bags

Type:

int

bag_size

Number of examples in each bag

Type:

int

min_prevalence

The min prevalence for each class.

Type:

None, float or array-like

random_state

To generate random numbers

Type:

int, RandomState instance

verbose

The verbosity level

Type:

int, optional

prevalences_

i-th row contains the true prevalences of each generated bag

Type:

array-like, shape (n_classes, n_bags)

indexes_

i-th column contains the indexes of the examples for i-th bag

Type:

array-like, shape (bag_size, n_bags)

generate_bags(X, y)[source]

Create bags of examples simulating prior probability shift and covariate shift. It uses instances of classes PriorShift_BagGenerator and CovariateShift_BagGenerator

Parameters:
  • X (array-like, shape (n_examples, n_features)) – Data

  • y (array-like, shape (n_examples, )) – True classes

Returns:

  • prevalences (numpy array, shape (n_bags, n_classes)) – Each row contains the prevalences of the corresponding bag

  • indexes (numpy array, shape (bag_size, n_bags)) – Each column contains the indexes of the examples of the bag

Raises:

ValueError – When random_state is neither an int nor a RandomState object
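
The handling of random_state described in the parameters and in the Raises entry can be sketched as follows. The helper resolve_random_state is hypothetical and only mirrors the documented behavior (an int is used as a seed, a RandomState instance is used directly, anything else raises ValueError); the library's actual checks may differ.

```python
import numpy as np


def resolve_random_state(random_state):
    """Return a RandomState built from an int seed or passed through as-is.

    Hypothetical helper: the name and the exact type checks are
    assumptions, not the library's code.
    """
    if isinstance(random_state, (int, np.integer)):
        # An int is interpreted as the seed of a new generator.
        return np.random.RandomState(random_state)
    if isinstance(random_state, np.random.RandomState):
        # An existing generator is used directly.
        return random_state
    raise ValueError("random_state must be an int or a RandomState instance")


rng_a = resolve_random_state(2032)
rng_b = resolve_random_state(2032)
# Two generators built from the same seed produce the same sequence.
same_draws = np.array_equal(rng_a.rand(3), rng_b.rand(3))
```

Passing the same int seed to two generators therefore yields reproducible bags, while passing a shared RandomState instance lets several generators draw from one stream.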

class PriorShift_BagGenerator(n_bags=1000, bag_size=None, method='Uniform', alphas=None, min_prevalence=None, random_state=2032, verbose=0)[source]

Bases: BagGenerator

Generate bags with prior probability shift

Parameters:
  • n_bags (int, (default=1000)) – Number of bags

  • bag_size (int, (default=None)) – Number of examples in each bag

  • method (str, (default='Uniform')) –

    Method used to generate the distributions. Two methods available:

    • ‘Uniform’ : the prevalences are uniformly distributed

    • ‘Dirichlet’ : the prevalences are generated using the Dirichlet distribution

  • alphas (None, float or array-like, (default=None), shape (n_classes, ) when it is an array) – The parameters for the Dirichlet distribution when the selected method is ‘Dirichlet’

  • min_prevalence (None, float or array-like, (default=None)) – The minimum prevalence for each class. If None, the minimum prevalence is 0. If a single value is passed, all classes share the same min_prevalence value. This parameter is only used when the ‘Uniform’ method is selected

  • random_state (int, RandomState instance, (default=2032)) – To generate random numbers. If random_state is an int, it is the seed used by the random number generator; if random_state is a RandomState instance, it is used directly as the random number generator

  • verbose (int, optional, (default=0)) – The verbosity level. The default value, zero, means silent mode

n_bags

Number of bags

Type:

int

bag_size

Number of examples in each bag

Type:

int

method

Method used to generate the prevalences

Type:

str

alphas

Parameters of the Dirichlet distribution

Type:

None, float or array-like

min_prevalence

The min prevalence for each class

Type:

None, float or array-like

random_state

To generate random numbers

Type:

int, RandomState instance

verbose

The verbosity level

Type:

int, optional

prevalences_

i-th row contains the true prevalences of each generated bag

Type:

array-like, shape (n_classes, n_bags)

indexes_

i-th column contains the indexes of the examples for i-th bag

Type:

array-like, shape (bag_size, n_bags)

generate_bags(X, y)[source]

Create bags of examples simulating prior probability shift

Two different methods are implemented:

  • ‘Uniform’ : the prevalences are uniformly distributed

  • ‘Dirichlet’ : the prevalences are generated using the Dirichlet distribution

Parameters:
  • X (array-like, shape (n_examples, n_features)) – Data

  • y (array-like, shape (n_examples, )) – True classes

Returns:

  • prevalences (numpy array, shape (n_bags, n_classes)) – Each row contains the prevalences of the corresponding bag

  • indexes (numpy array, shape (bag_size, n_bags)) – Each column contains the indexes of the examples of the bag

Raises:

ValueError – When random_state is neither an int nor a RandomState object, when the selected method is not implemented, or when the parameters for the selected method are incorrect
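
To make the two methods concrete, here is a minimal NumPy sketch of prior probability shift. It is an illustration under stated assumptions, not the library's implementation: the helpers draw_prevalences and sample_prior_shift_bag are hypothetical, min_prevalence is ignored, a missing alphas is treated as an error, per-class counts are drawn with a multinomial, and examples are sampled with replacement.

```python
import numpy as np


def draw_prevalences(n_classes, method, rng, alphas=None):
    """Draw one target prevalence vector that sums to 1 (hypothetical helper)."""
    if method == 'Uniform':
        # A Dirichlet with all parameters equal to 1 is uniform on the simplex.
        return rng.dirichlet(np.ones(n_classes))
    if method == 'Dirichlet':
        if alphas is None:
            # Assumption: this sketch requires alphas for the Dirichlet method.
            raise ValueError("alphas is required for the 'Dirichlet' method")
        # A single float is broadcast to one parameter per class.
        alphas = np.broadcast_to(np.asarray(alphas, dtype=float), (n_classes,))
        return rng.dirichlet(alphas)
    raise ValueError("method must be 'Uniform' or 'Dirichlet'")


def sample_prior_shift_bag(y, prevalence, bag_size, rng):
    """Return (indexes, realized prevalences) for one bag (hypothetical helper)."""
    y = np.asarray(y)
    classes = np.unique(y)
    # Number of examples to take from each class (multinomial is an assumption).
    counts = rng.multinomial(bag_size, prevalence)
    parts = [
        rng.choice(np.flatnonzero(y == c), size=n, replace=True)
        for c, n in zip(classes, counts)
    ]
    return np.concatenate(parts), counts / bag_size


rng = np.random.RandomState(2032)
y = np.array([0] * 70 + [1] * 30)
n_bags, bag_size, n_classes = 5, 20, 2

# Arrays arranged with the shapes listed under Returns.
prevalences = np.zeros((n_bags, n_classes))
indexes = np.zeros((bag_size, n_bags), dtype=int)
for b in range(n_bags):
    target = draw_prevalences(n_classes, 'Uniform', rng)
    indexes[:, b], prevalences[b] = sample_prior_shift_bag(y, target, bag_size, rng)

# The labels of the first bag are recovered from its column of indexes.
first_bag_labels = y[indexes[:, 0]]
```

With this layout, the examples of bag b are X[indexes[:, b]], and the class proportions of y[indexes[:, b]] match prevalences[b].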