quantificationlib.estimators.frank_and_hall module

Estimators based on Frank and Hall decomposition

class FHLabelBinarizer(neg_label=0, pos_label=1)[source]

Bases: LabelBinarizer

Binarize labels in a Frank and Hall decomposition

This type of decomposition works as follows. For instance, in an ordinal classification problem with classes ranging from 1-star to 5-star, Frank and Hall (FH) decomposition trains 4 binary classifiers: 1 vs 2-3-4-5, 1-2 vs 3-4-5, 1-2-3 vs 4-5, 1-2-3-4 vs 5, and combines their predictions.

To train all these binary classifiers, one needs to convert the original ordinal labels to binary labels for each of the binary problems of the Frank and Hall decomposition. FHLabelBinarizer makes this process easy using the transform method.

Parameters:
  • neg_label (int (default: 0)) – Value with which negative labels must be encoded.

  • pos_label (int (default: 1)) – Value with which positive labels must be encoded.

  • sparse_output (boolean (default: False)) – True if the array returned by transform should be in sparse CSR format.

set_inverse_transform_request(*, threshold='$UNCHANGED$')

Request metadata passed to the inverse_transform method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to inverse_transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to inverse_transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
  • threshold (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for threshold parameter in inverse_transform.

  • self (FHLabelBinarizer) –

Returns:

self – The updated object.

Return type:

object

transform(y)[source]

Transform ordinal labels to the Frank and Hall binary labels

Parameters:

y (array, (n_samples,)) – Class labels for a set of examples

Returns:

y_bin_fh – Each column contains the binary labels for the consecutive binary problems of a Frank and Hall decomposition from left to right. For instance, in a 4-class problem, each column corresponds to the following problems:

1st column: 1 vs 2-3-4

2nd column: 1-2 vs 3-4

3rd column: 1-2-3 vs 4

4th column: (not actually used)

Return type:

array, (n_samples, n_classes)
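
A minimal usage sketch (hypothetical data; it assumes the usual LabelBinarizer workflow of calling fit before transform):

import numpy as np
from quantificationlib.estimators.frank_and_hall import FHLabelBinarizer

y = np.array([1, 2, 3, 4, 2, 1])    # ordinal labels of a 4-class problem

binarizer = FHLabelBinarizer()
binarizer.fit(y)                    # learn the class set, as with LabelBinarizer
y_bin_fh = binarizer.transform(y)   # shape (n_samples, n_classes)

# Column j holds the binary labels of the j-th FH problem; for 4 classes:
# column 0 -> 1 vs 2-3-4, column 1 -> 1-2 vs 3-4, column 2 -> 1-2-3 vs 4,
# and the last column is not actually used
print(y_bin_fh.shape)               # (6, 4)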

class FrankAndHallClassifier(estimator=None, n_jobs=None, verbose=0, params_fit=None)[source]

Bases: BaseEstimator, ClassifierMixin

Ordinal Classifier following Frank and Hall binary decomposition

This type of decomposition works as follows. For instance, in an ordinal classification problem with classes ranging from 1-star to 5-star, Frank and Hall (FH) decomposition trains 4 binary classifiers: 1 vs 2-3-4-5, 1-2 vs 3-4-5, 1-2-3 vs 4-5, 1-2-3-4 vs 5, and combines their predictions.

Parameters:
  • estimator (estimator object (default=None)) – An estimator object implementing fit and one of predict or predict_proba. It is the base estimator used to learn the set of binary classifiers

  • n_jobs (int or None, optional (default=None)) – The number of jobs to use for the computation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors

  • params_fit (list of dictionaries with parameters for each binary estimator, optional) –

    Example: 5 classes/4 binary estimators:

    params_fit = [{'C': 0.0001}, {'C': 0.000001}, {'C': 0.000001}, {'C': 0.01}]

  • verbose (int, optional, (default=0)) – The verbosity level. The default value, zero, means silent mode

estimator

The base estimator used to build the FH decomposition

Type:

estimator object

n_jobs

The number of jobs to use for the computation.

Type:

int or None

params_fit

The parameters for each binary estimator

Type:

list of dictionaries

verbose

The verbosity level. The default value, zero, means silent mode

Type:

int

classes_

Class labels

Type:

ndarray, shape (n_classes, )

estimators_

List of binary estimators following the same order of the Frank and Hall decomposition:

estimators_[0] -> 1 vs 2-3-4-5, estimators_[1] -> 1-2 vs 3-4-5, …

Type:

ndarray, shape(n_classes-1,)

label_binarizer_

Object used to transform multiclass labels to binary labels and vice-versa

Type:

FHLabelBinarizer object

References

Eibe Frank and Mark Hall. 2001. A simple approach to ordinal classification. In Proceedings of the European Conference on Machine Learning. Springer, 145–156.
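
A minimal usage sketch with synthetic data (LogisticRegression is just one possible base estimator; any estimator implementing fit and predict_proba should work):

import numpy as np
from sklearn.linear_model import LogisticRegression
from quantificationlib.estimators.frank_and_hall import FrankAndHallClassifier

rng = np.random.RandomState(0)
X = rng.randn(200, 10)                  # synthetic features
y = rng.randint(1, 6, size=200)         # ordinal labels 1..5

fh = FrankAndHallClassifier(estimator=LogisticRegression())
fh.fit(X, y)                            # trains the 4 binary estimators
probs = fh.predict_proba(X[:5])         # shape (5, 5); rows need not sum to 1
preds = fh.predict(X[:5])               # class with the highest probability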

fit(X, y)[source]

Fits the set of estimators for the training set following the Frank and Hall decomposition

It learns a list of binary estimators following the same order of the Frank and Hall decomposition:

estimators_[0] -> 1 vs 2-3-4-5, estimators_[1] -> 1-2 vs 3-4-5, …

The left group of each classifier ({1}, {1,2}, …) is the positive class

Parameters:
  • X ((sparse) array-like, shape (n_examples, n_features)) – Data

  • y ((sparse) array-like, shape (n_examples, )) – True classes

Raises:

ValueError – When estimator is None

predict(X)[source]

Predict the class for each testing example

The method computes the probability of each class (using predict_proba) and returns the class with highest probability

Parameters:

X ((sparse) array-like, shape (n_examples, n_features)) – Data

Return type:

An array, shape(n_examples, ) with the predicted class for each example

Raises:

NotFittedError – When the estimators are not fitted yet

predict_proba(X)[source]

Predict the class probabilities for each example following the original rule proposed by Frank & Hall

If the classes are c_1 to c_k:

Pr(y = c_1) = Pr(y <= c_1)
Pr(y = c_i) = Pr(y > c_{i-1}) x (1 - Pr(y > c_i)),  1 < i < k
Pr(y = c_k) = Pr(y > c_{k-1})

Notice that sum_{i=1}^{k} Pr(y = c_i) may differ from 1; a code sketch of this rule is given at the end of this method's documentation.

Example with 5 classes:

We have 4 binary estimators, each returning two probabilities:
the probability of the left group and the probability of the right group,
denoted as e_i.left and e_i.right respectively,
where i is the number of the estimator, 1 <= i < k

Estimator 1:    c1  |   c2, c3, c4, c5          e1.left | e1.right
Estimator 2:    c1, c2  |   c3, c4, c5          e2.left | e2.right
Estimator 3:    c1, c2, c3  |   c4, c5          e3.left | e3.right
Estimator 4:    c1, c2, c3, c4  |   c5          e4.left | e4.right

Pr(y = c_1) = e1.left
Pr(y = c_2) = e1.right x e2.left
Pr(y = c_3) = e2.right x e3.left
Pr(y = c_4) = e3.right x e4.left
Pr(y = c_5) = e4.right
Parameters:

X ((sparse) array-like, shape (n_examples, n_features)) – Data

Return type:

An array, shape(n_examples, n_classes) with the class probabilities for each example

Raises:

NotFittedError – When the estimators are not fitted yet
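
The following sketch illustrates this combination rule (an illustration, not the library's internal code); p_left collects, for one example, the left-group probability returned by each of the k-1 binary estimators:

import numpy as np

def fh_combine(p_left):
    # p_left: shape (k-1,), left-group probabilities e_1.left, ..., e_{k-1}.left
    k = len(p_left) + 1
    probs = np.empty(k)
    probs[0] = p_left[0]                      # Pr(y = c_1) = e_1.left
    for i in range(1, k - 1):
        # Pr(y = c_{i+1}) = e_i.right * e_{i+1}.left  (arrays are 0-based)
        probs[i] = (1 - p_left[i - 1]) * p_left[i]
    probs[k - 1] = 1 - p_left[k - 2]          # Pr(y = c_k) = e_{k-1}.right
    return probs                              # the values need not sum to 1

print(fh_combine(np.array([0.9, 0.4, 0.2, 0.7])))   # sums to 1.92, not 1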

set_score_request(*, sample_weight='$UNCHANGED$')

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (FrankAndHallClassifier) –

Returns:

self – The updated object.

Return type:

object

class FrankAndHallMonotoneClassifier(estimator=None, n_jobs=None, verbose=0, params_fit=None)[source]

Bases: FrankAndHallClassifier

Ordinal Classifier following Frank and Hall binary decomposition but returning consistent probabilities

This type of decomposition works as follows. For instance, in an ordinal classification problem with classes ranging from 1-star to 5-star, Frank and Hall (FH) decomposition trains 4 binary classifiers: 1 vs 2-3-4-5, 1-2 vs 3-4-5, 1-2-3 vs 4-5, 1-2-3-4 vs 5, and combines their predictions.

The difference with FrankAndHallClassifier is that the original method devised by Frank & Hall was intended just for crisp predictions. The probabilities computed for all classes may not be consistent (in many cases they do not sum to 1).

Following (Destercke, Yang, 2014), this class computes the upper (adjusting from left to right) and the lower (adjusting from right to left) cumulative probabilities for each group of classes. These sets of values are monotonically increasing (from left to right) and monotonically decreasing (from right to left), respectively. The final probability assigned to each group is the average of both values, and the probability of each class is computed as:

Pr({y_k}) = Pr({y_1,…,y_k}) - Pr({y_1,…,y_{k-1}})

Parameters:
  • estimator (estimator object (default=None)) – An estimator object implementing fit and one of predict or predict_proba. It is the base estimator used to learn the set of binary classifiers

  • n_jobs (int or None, optional (default=None)) – The number of jobs to use for the computation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors

  • params_fit (list of dictionaries with parameters for each binary estimator, optional) –

    Example: 5 classes/4 binary estimators:

    params_fit = [{'C': 0.0001}, {'C': 0.000001}, {'C': 0.000001}, {'C': 0.01}]

  • verbose (int, optional, (default=0)) – The verbosity level. The default value, zero, means silent mode

estimator

The base estimator used to build the FH decomposition

Type:

estimator object

n_jobs

The number of jobs to use for the computation.

Type:

int or None

verbose

The verbosity level. The default value, zero, means silent mode

Type:

int

params_fit

The parameters for each binary estimator (not used in this class)

Type:

list of dictionaries

classes_

Class labels

Type:

ndarray, shape (n_classes, )

estimators_

List of binary estimators following the same order of the Frank and Hall decomposition:

estimators_[0] -> 1 vs 2-3-4-5, estimators_[1] -> 1-2 vs 3-4-5, …

Type:

ndarray, shape(n_classes-1,)

label_binarizer_

Object used to transform multiclass labels to binary labels and vice-versa

Type:

FHLabelBinarizer object

References

Sébastien Destercke and Gen Yang. 2014. Cautious ordinal classification by binary decomposition. In Machine Learning and Knowledge Discovery in Databases - European Conference ECML/PKDD, Nancy, France. 323–337.

fit(X, y)[source]

Fits the set of estimators for the training set following the Frank and Hall decomposition

It learns a list of binary estimators following the same order of the Frank and Hall decomposition:

estimators_[0] -> 1 vs 2-3-4-5, estimators_[1] -> 1-2 vs 3-4-5, …

The left group of each classifier ({1}, {1,2}, …) is the positive class

Parameters:
  • X ((sparse) array-like, shape (n_examples, n_features)) – Data

  • y ((sparse) array-like, shape (n_examples, )) – True classes

Raises:

ValueError – When estimator is None

predict(X)[source]

Predict the class for each testing example

The method computes the probability of each class (using predict_proba) and returns the class with highest probability

Parameters:

X ((sparse) array-like, shape (n_examples, n_features)) – Data

Return type:

An array, shape(n_examples, ) with the predicted class for each example

Raises:

NotFittedError – When the estimators are not fitted yet

predict_proba(X)[source]

Predict the class probabilities for each example following a new rule (different from the original one proposed by Frank & Hall)

To obtain consistent probabilities, we need to ensure that the aggregated consecutive probabilities do not decrease.

Example:

Classifier 1 vs 2-3-4   Pr({1}) = 0.3
Classifier 1-2 vs 3-4   Pr({1,2}) = 0.2
Classifier 1-2-3 vs 4   Pr({1,2,3}) = 0.6

This is inconsistent. Following (Destercke and Yang, 2014), the method computes the upper (adjusting from left to right) and the lower (adjusting from right to left) cumulative probabilities. These sets of values are monotonically increasing (from left to right) and monotonically decreasing (from right to left), respectively. The average value is assigned to each group, and the probability for each class is computed as:

Pr({y_k}) = Pr({y_1,…,y_k}) - Pr({y_1,…,y_{k-1}})

Example:

{1}   {1-2}  {1-2-3}
0.3   0.3    0.6    Upper cumulative probabilities (adjusting from left to right)
0.2   0.2    0.6    Lower cumulative probabilities (adjusting from right to left)
----------------
0.25  0.25   0.6    Averaged probability

Pr({1}) = 0.25
Pr({2}) = Pr({1,2}) - Pr({1}) = 0.25 - 0.25 = 0
Pr({3}) = Pr({1,2,3}) - Pr({1,2}) = 0.6 - 0.25 = 0.35

The probability of the last class is computed as 1 minus the sum of the probabilities of the remaining classes:

Pr({4}) = 1 - Pr({1,2,3}) = 1 - 0.6 = 0.4

(A code sketch of this adjustment is given at the end of this method's documentation.)
Parameters:

X ((sparse) array-like, shape (n_examples, n_features)) – Data

Return type:

An array, shape(n_examples, n_classes) with the class probabilities for each example

Raises:

NotFittedError – When the estimators are not fitted yet
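
The following sketch illustrates the monotone adjustment (an illustration, not the library's internal code); p_cum collects, for one example, the cumulative left-group probabilities returned by the k-1 binary estimators:

import numpy as np

def monotone_class_probs(p_cum):
    # p_cum: shape (k-1,), Pr({c_1}), Pr({c_1,c_2}), ..., possibly inconsistent
    upper = np.maximum.accumulate(p_cum)                 # left-to-right running max
    lower = np.minimum.accumulate(p_cum[::-1])[::-1]     # right-to-left running min
    avg = (upper + lower) / 2                            # consistent cumulative probs
    return np.diff(np.concatenate(([0.0], avg, [1.0])))  # per-class probabilities

print(monotone_class_probs(np.array([0.3, 0.2, 0.6])))   # [0.25 0.   0.35 0.4 ]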

set_score_request(*, sample_weight='$UNCHANGED$')

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (FrankAndHallMonotoneClassifier) –

Returns:

self – The updated object.

Return type:

object

class FrankAndHallTreeClassifier(estimator=None, n_jobs=None, verbose=0, performance_measure=<function binary_kld>, params_fit=None)[source]

Bases: FrankAndHallClassifier

Ordinal Classifier following Frank and Hall binary decomposition but organizing the binary models in a tree to compute the predictions

This type of decomposition works as follows. For instance, in an ordinal classification problem with classes ranging from 1-star to 5-star, Frank and Hall (FH) decomposition trains 4 binary classifiers: 1 vs 2-3-4-5, 1-2 vs 3-4-5, 1-2-3 vs 4-5, 1-2-3-4 vs 5, and combines their predictions.

The difference with FrankAndHallClassifier is that the original method devised by Frank & Hall computes the probability of each class applying the binary models from left to right: 1 vs 2-3-4-5, 1-2 vs 3-4-5, and so on. This classifier is based on the method proposed by (Da San Martino, Gao, and Sebastiani, 2016). The idea is to build a binary tree with the binary models of the Frank and Hall decomposition, selecting at each node the best possible model according to its quantification performance (applying the PCC algorithm with each binary classifier and using KLD as the performance measure).

Example:

                        1-2-3 vs 4-5
                       /            \
          1 vs 2-3-4-5               1-2-3-4 vs 5
         /            \              /           \
        1         1-2 vs 3-4-5      4             5
                 /            \
                2              3
Parameters:
  • estimator (estimator object (default=None)) – An estimator object implementing fit and one of predict or predict_proba. It is the base estimator used to learn the set of binary classifiers

  • n_jobs (int or None, optional (default=None)) – The number of jobs to use for the computation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors

  • performance_measure (a binary quantification performance measure, (default=binary_kld)) – The binary quantification performance measure used to estimate the goodness of each binary classifier used as quantifier

  • params_fit (list of dictionaries with parameters for each binary estimator, optional) –

    Example: 5 classes/4 binary estimators:

    params_fit = [{'C': 0.0001}, {'C': 0.000001}, {'C': 0.000001}, {'C': 0.01}]

  • verbose (int, optional, (default=0)) – The verbosity level. The default value, zero, means silent mode

estimator

The base estimator used to build the FH decomposition

Type:

estimator object

n_jobs

The number of jobs to use for the computation.

Type:

int or None

performance_measure

The binary quantification performance measure used to estimate the goodness of each binary classifier used as quantifier

Type:

str, or any binary quantification performance measure

verbose

The verbosity level. The default value, zero, means silent mode

Type:

int

params_fit

The parameters for each binary estimator

Type:

list of dictionaries

classes_

Class labels

Type:

ndarray, shape (n_classes, )

estimators_

List of binary estimators following the same order of the Frank and Hall decomposition:

estimators_[0] -> 1 vs 2-3-4-5, estimators_[1] -> 1-2 vs 3-4-5, …

Type:

ndarray, shape(n_classes-1,)

label_binarizer_

Object used to transform multiclass labels to binary labels and vice-versa

Type:

FHLabelBinarizer object

tree_

A tree with the binary classifiers ordered by their quantification performance (using KLD or other measure)

Type:

A tree

References

Giovanni Da San Martino, Wei Gao, and Fabrizio Sebastiani. 2016a. Ordinal text quantification. In Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval. 937–940.

Giovanni Da San Martino, Wei Gao, and Fabrizio Sebastiani. 2016b. QCRI at SemEval-2016 Task 4: Probabilistic methods for binary and ordinal quantification. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval'16). Association for Computational Linguistics, 58–63.

fit(X, y)[source]

Fits the set of estimators for the training set following the Frank and Hall decomposition and builds the binary tree to organize such estimators

Parameters:
  • X ((sparse) array-like, shape (n_examples, n_features)) – Data

  • y ((sparse) array-like, shape (n_examples, )) – True classes

Raises:

ValueError – When estimator is None

predict(X)[source]

Predict the class for each testing example

The method computes the probability of each class (using predict_proba) and returns the class with highest probability

Parameters:

X ((sparse) array-like, shape (n_examples, n_features)) – Data

Return type:

An array, shape(n_examples, ) with the predicted class for each example

Raises:

NotFittedError – When the estimators are not fitted yet

predict_proba(X)[source]

Predict the class probabilities for each example applying the binary tree of models

Example:

                        1-2-3 vs 4-5
                       /            \
          1 vs 2-3-4-5               1-2-3-4 vs 5
         /            \              /           \
        1         1-2 vs 3-4-5      4             5
                 /            \
                2              3

Imagine that, for a given example, the probabilities returned by each model are the following
(each model returns the probability of the left group of classes):

Pr({1,2,3}) = 0.2
Pr({1}) = 0.9
Pr({1,2,3,4}) = 0.7
Pr({1,2}) = 0.4

With these values, the probability of each class will be:

Pr({1}) = Pr({1,2,3}) * Pr({1}) = 0.2 * 0.9 = 0.18
Pr({2}) = Pr({1,2,3}) * (1 - Pr({1})) * Pr({1,2}) = 0.2 * 0.1 * 0.4 = 0.008
Pr({3}) = Pr({1,2,3}) * (1 - Pr({1})) * (1 - Pr({1,2})) = 0.2 * 0.1 * 0.6 = 0.012
Pr({4}) = (1 - Pr({1,2,3})) * Pr({1,2,3,4}) = 0.8 * 0.7 = 0.56
Pr({5}) = (1 - Pr({1,2,3})) * (1 - Pr({1,2,3,4})) = 0.8 * 0.3 = 0.24

(A code sketch of this computation is given at the end of this method's documentation.)
Parameters:

X ((sparse) array-like, shape (n_examples, n_features)) – Data

Return type:

An array, shape(n_examples, n_classes) with the class probabilities for each example

Raises:

NotFittedError – When the estimators are not fitted yet
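
The following sketch reproduces this computation over a hypothetical Node type (an illustration, not the library's QTree): internal nodes are labeled with the binary problem they solve, leaves with a single class, and p_left maps each internal node to the left-group probability returned by its model for one example:

from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    label: object                  # binary problem at internal nodes, class at leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def path_probs(node, p_left, prob=1.0, out=None):
    # Each class probability is the product of the branch probabilities on its
    # root-to-leaf path: p_left on a left branch, 1 - p_left on a right branch
    out = {} if out is None else out
    if node.left is None and node.right is None:   # leaf: a single class
        out[node.label] = prob
        return out
    p = p_left[node.label]
    path_probs(node.left, p_left, prob * p, out)
    path_probs(node.right, p_left, prob * (1 - p), out)
    return out

# The tree and the probabilities of the example above
tree = Node("1-2-3 vs 4-5",
            left=Node("1 vs 2-3-4-5",
                      left=Node(1),
                      right=Node("1-2 vs 3-4-5", left=Node(2), right=Node(3))),
            right=Node("1-2-3-4 vs 5", left=Node(4), right=Node(5)))
p_left = {"1-2-3 vs 4-5": 0.2, "1 vs 2-3-4-5": 0.9,
          "1-2 vs 3-4-5": 0.4, "1-2-3-4 vs 5": 0.7}
print(path_probs(tree, p_left))   # ~ {1: 0.18, 2: 0.008, 3: 0.012, 4: 0.56, 5: 0.24}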

set_score_request(*, sample_weight='$UNCHANGED$')

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (FrankAndHallTreeClassifier) –

Returns:

self – The updated object.

Return type:

object

class QTree(fhtree=None, pos_estimator=0, left=None, right=None)[source]

Bases: object

Auxiliary class to represent the binary trees needed by FrankAndHallTreeClassifier

Parameters:
  • fhtree (FrankAndHallTreeClassifier object (default=None)) –

  • pos_estimator (int, (default=0)) – Index of the estimator in the order defined by the Frank and Hall decomposition: 1 vs 2-3-4-5, 1-2 vs 3-4-5 and so on.

  • left (a QTree object (default=None)) – Left subTree of this node

  • right (a QTree object (default=None)) – Right subTree of this node

is_leaf()[source]

Check whether it is a leaf or not