quantificationlib.estimators.ordinal_ddag module

Estimator based on DDAGs (Decision Directed Acyclic Graphs)

class DDAGClassifier(estimator, n_jobs=None, predict_method='full_probabilistic', verbose=0)[source]

Bases: OneVsOneClassifier

Decision Directed Acyclic Graph ordinal classifier

This strategy consists of learning a classifier for each pair of classes, so it requires fitting n_classes * (n_classes - 1) / 2 classifiers. For this reason, this class derives from OneVsOneClassifier and reuses most of its functionality, mainly to train the binary models.

However, there are two main differences with respect to OneVsOneClassifier:

  1. This class is intended for ordinal classification; it does not make sense to use it for general multiclass classification

  2. The rule to make predictions is different. Here, the binary classifiers are arranged into a binary tree in which the classifier selected at each node is the one that deals with the two most distant remaining classes. Thus, the root contains the classifier that decides between the first class of the order and the last one, and this idea is applied recursively.

Example:

In an ordinal classification problem with classes ranging from 1-star to 5-star,
the corresponding DDAG is

                                          1|5

                     1|4                                       2|5

           1|3                 2|4                   2|4                  3|5

      1|2       2|3       2|3       3|4         2|3       3|4        3|4       4|5

    1     2   2     3   2     3   3     4     2     3   3     4    3     4   4     5

Since some subtrees are shared by different branches (for instance, the subtree rooted at node
2|4 is shared by the right subtree of 1|4 and the left subtree of 2|5), the tree can be depicted
more compactly:

                                          1|5

                                  1|4             2|5

                           1|3            2|4            3|5

                     1|2          2|3             3|4           4|5

                 1          2              3              4             5

in which all internal nodes (2|4, 2|3 and 3|4) and all leaves except the first and the
last one (2, 3 and 4) can be reached through different paths.
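The compact structure above can be reproduced with a short sketch (illustrative only, not library code; the function name is ours): the node for the remaining class range [lo, hi] holds the binary classifier lo|hi, and its children handle [lo, hi-1] and [lo+1, hi], so shared subtrees appear only once per level.

```python
# Illustrative sketch (not library code): enumerate the DDAG levels for
# n_classes ordered classes. The node for the remaining range [lo, hi]
# holds classifier lo|hi; its children handle [lo, hi-1] and [lo+1, hi].
def ddag_levels(n_classes):
    levels, frontier = [], [(1, n_classes)]
    while frontier[0][0] != frontier[0][1]:   # stop once the frontier is leaves
        levels.append(["%d|%d" % node for node in frontier])
        nxt = []
        for lo, hi in frontier:
            for child in ((lo, hi - 1), (lo + 1, hi)):
                if child not in nxt:          # shared subtrees kept only once
                    nxt.append(child)
        frontier = nxt
    return levels

for level in ddag_levels(5):
    print("  ".join(level))
# 1|5
# 1|4  2|5
# 1|3  2|4  3|5
# 1|2  2|3  3|4  4|5
```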

The class implements two different strategies to compute the probabilities for a given example:

‘full_probabilistic’

The probabilities computed by each node are propagated through the tree. For those leaves that can be reached following different paths (all except the leaves for the first and the last class), the probabilities are summed. With this method, all the classes may have a probability greater than 0.

Example: For a given example, the probability of the left class returned by each model is the following:

P(1|5) = 0.2
P(1|4) = 0.1
P(2|5) = 0.1
P(1|3) = 0.2
P(2|4) = 0.3
P(3|5) = 0.3
P(1|2) = 0.3
P(2|3) = 0.4
P(3|4) = 0.4
P(4|5) = 0.3

P(y=1) = P(1|5) * P(1|4) * P(1|3) * P(1|2) =
         0.2    * 0.1    * 0.2    * 0.3    = 0.0012

P(y=2) = P(1|5)     * P(1|4)     * P(1|3)     * (1-P(1|2)) +
         P(1|5)     * P(1|4)     * (1-P(1|3)) * P(2|3)     +
         P(1|5)     * (1-P(1|4)) * P(2|4)     * P(2|3)     +
         (1-P(1|5)) * P(2|5)     * P(2|4)     * P(2|3)     =
         0.2        * 0.1        * 0.2        * 0.7        +
         0.2        * 0.1        * 0.8        * 0.4        +
         0.2        * 0.9        * 0.3        * 0.4        +
         0.8        * 0.1        * 0.3        * 0.4        =
         0.0028 + 0.0064 + 0.0216 + 0.0096 = 0.0404

P(y=3) = P(1|5)     * P(1|4)     * (1-P(1|3)) * (1-P(2|3)) +
         P(1|5)     * (1-P(1|4)) * P(2|4)     * (1-P(2|3)) +
         P(1|5)     * (1-P(1|4)) * (1-P(2|4)) * P(3|4)     +
         (1-P(1|5)) * P(2|5)     * P(2|4)     * (1-P(2|3)) +
         (1-P(1|5)) * P(2|5)     * (1-P(2|4)) * P(3|4)     +
         (1-P(1|5)) * (1-P(2|5)) * P(3|5)     * P(3|4)     =
         0.2        * 0.1        * 0.8        * 0.6        +
         0.2        * 0.9        * 0.3        * 0.6        +
         0.2        * 0.9        * 0.7        * 0.4        +
         0.8        * 0.1        * 0.3        * 0.6        +
         0.8        * 0.1        * 0.7        * 0.4        +
         0.8        * 0.9        * 0.3        * 0.4        =
         0.0096 + 0.0324 + 0.0504 + 0.0144 + 0.0224 + 0.0864 = 0.2156

P(y=4) = P(1|5)     * (1-P(1|4)) * (1-P(2|4)) * (1-P(3|4)) +
         (1-P(1|5)) * P(2|5)     * (1-P(2|4)) * (1-P(3|4)) +
         (1-P(1|5)) * (1-P(2|5)) * P(3|5)     * (1-P(3|4)) +
         (1-P(1|5)) * (1-P(2|5)) * (1-P(3|5)) * P(4|5)     =
         0.2        * 0.9        * 0.7        * 0.6        +
         0.8        * 0.1        * 0.7        * 0.6        +
         0.8        * 0.9        * 0.3        * 0.6        +
         0.8        * 0.9        * 0.7        * 0.3        =
         0.0756 + 0.0336 + 0.1296 + 0.1512 = 0.3900

P(y=5) = (1-P(1|5)) * (1-P(2|5)) * (1-P(3|5)) * (1-P(4|5)) =
         0.8        * 0.9        * 0.7        * 0.7        = 0.3528

Thus, the probabilities returned by the `predict_proba` method would be:
        (0.0012, 0.0404, 0.2156, 0.3900, 0.3528)
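The computation above can be reproduced with a short recursive sketch (illustrative only; the function name and the P dictionary are ours, not part of the library): the probability mass entering a node lo|hi is split between its two children, and the mass reaching each leaf is accumulated.

```python
# Illustrative sketch of the 'full_probabilistic' rule (not library code).
# P[(lo, hi)] is the probability that binary model lo|hi assigns to the
# lower (left) class, taken from the worked example above.
P = {(1, 5): 0.2, (1, 4): 0.1, (2, 5): 0.1, (1, 3): 0.2, (2, 4): 0.3,
     (3, 5): 0.3, (1, 2): 0.3, (2, 3): 0.4, (3, 4): 0.4, (4, 5): 0.3}

def full_probabilistic(P, n_classes=5):
    """Propagate the node probabilities down the DDAG, summing the mass
    of every path that reaches each leaf."""
    probs = [0.0] * n_classes

    def descend(lo, hi, mass):
        if lo == hi:                        # leaf: accumulate this path's mass
            probs[lo - 1] += mass
            return
        p_left = P[(lo, hi)]                # node lo|hi splits the mass
        descend(lo, hi - 1, mass * p_left)          # lower classes survive
        descend(lo + 1, hi, mass * (1.0 - p_left))  # upper classes survive

    descend(1, n_classes, 1.0)
    return probs

print([round(p, 4) for p in full_probabilistic(P)])
# → [0.0012, 0.0404, 0.2156, 0.39, 0.3528]
```

Note that the five probabilities sum to 1, since every unit of mass entering the root ends up in exactly one leaf along each path.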
‘winner_node’

Uses the probabilities of the binary estimators to descend the tree until the level just above the leaves (as if those probabilities were binarized to 0 or 1). Then the method returns the probabilities of that binary estimator for the two consecutive classes involved, and zero for the rest of the classes.

Example: For a given example, the probability of the left class returned by each model is the following:

P(1|5) = 0.2
P(1|4) = 0.1
P(2|5) = 0.1
P(1|3) = 0.2
P(2|4) = 0.3
P(3|5) = 0.3
P(1|2) = 0.3
P(2|3) = 0.4
P(3|4) = 0.4
P(4|5) = 0.3

Taking binary decisions from the root, we reach the binary classifier 4|5, so
the returned probabilities are (0, 0, 0, 0.3, 0.7).
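The same descent can be sketched in a few lines (illustrative only; the function name and the P dictionary are ours): at each node the probability is binarized, one extreme class is discarded, and the loop stops at the node just above the leaves.

```python
# Illustrative sketch of the 'winner_node' rule (not library code).
# P[(lo, hi)] is the probability that binary model lo|hi assigns to the
# lower (left) class, as in the example above.
P = {(1, 5): 0.2, (1, 4): 0.1, (2, 5): 0.1, (1, 3): 0.2, (2, 4): 0.3,
     (3, 5): 0.3, (1, 2): 0.3, (2, 3): 0.4, (3, 4): 0.4, (4, 5): 0.3}

def winner_node(P, n_classes=5):
    lo, hi = 1, n_classes
    while hi - lo > 1:              # stop at the node just above the leaves
        if P[(lo, hi)] >= 0.5:
            hi -= 1                 # go left: discard the highest class
        else:
            lo += 1                 # go right: discard the lowest class
    probs = [0.0] * n_classes
    probs[lo - 1] = P[(lo, hi)]     # probabilities of the winning node lo|hi
    probs[hi - 1] = 1.0 - P[(lo, hi)]
    return probs

print(winner_node(P))
# → [0.0, 0.0, 0.0, 0.3, 0.7]
```

With the example values, the descent goes 1|5 → 2|5 → 3|5 and stops at 4|5, whose two probabilities fill the last two positions.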
Parameters:
  • estimator (estimator object (default=None)) – An estimator object implementing fit and one of predict or predict_proba. It is the base estimator used to learn the set of binary classifiers

  • n_jobs (int or None, optional (default=None)) – The number of jobs to use for the computation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors

  • predict_method (str, optional (default 'full_probabilistic')) –

    • ‘full_probabilistic’

    The probabilities computed by each node are propagated through the tree. For those leaves that can be reached following different paths (all except the leaves for the first and the last class), the probabilities are summed. With this method, all the classes may have a probability greater than 0.

    • ’winner_node’

    Uses the probabilities of the binary estimators to descend the tree until the level just above the leaves (as if those probabilities were binarized to 0 or 1). Then the method returns the probabilities of that binary estimator for the two consecutive classes involved, and zero for the rest of the classes.

  • verbose (int, optional, (default=0)) – The verbosity level. The default value, zero, means silent mode

estimator

An estimator object implementing fit and predict_proba.

Type:

estimator object

n_jobs

The number of jobs to use for the computation.

Type:

int or None

predict_method

The method used by predict_proba to compute the class probabilities of a given example

Type:

str

verbose

The verbosity level.

Type:

int

estimators_

Estimators used for predictions.

Type:

list of n_classes * (n_classes - 1) / 2 estimators

classes_

Array containing labels.

Type:

numpy array of shape [n_classes]

References

José Ramón Quevedo, Elena Montañés, Óscar Luaces, Juan José del Coz: Adapting decision DAGs for multipartite ranking. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2010), pp. 115-130. Springer, Berlin, Heidelberg.

predict(X)[source]

Predict the class for each testing example

The method computes the probability of each class (using predict_proba) and returns the class with the highest probability
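In other words, predict is the argmax of predict_proba over classes_. A minimal sketch using the probabilities from the worked example above (the array values are ours, for illustration):

```python
import numpy as np

# predict() simply picks, for each row of predict_proba(), the class in
# classes_ with the highest probability (values from the example above).
proba = np.array([[0.0012, 0.0404, 0.2156, 0.3900, 0.3528]])
classes_ = np.array([1, 2, 3, 4, 5])
predictions = classes_[np.argmax(proba, axis=1)]
print(predictions)
# → [4]
```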

Parameters:

X ((sparse) array-like, shape (n_examples, n_features)) – Data

Return type:

An array of shape (n_examples,) with the predicted class for each example

Raises:

NotFittedError – When the estimators are not fitted yet

predict_proba(X)[source]

Predict the class probabilities for each example

Two different methods are implemented depending on the value of predict_method attribute

‘full_probabilistic’

The probabilities computed by each node are propagated through the tree. For those leaves that can be reached following different paths (all except the leaves for the first and the last class), the probabilities are summed. With this method, all the classes may have a probability greater than 0.

‘winner_node’

Uses the probabilities of the binary estimators to descend the tree until the level just above the leaves (as if those probabilities were binarized to 0 or 1). Then the method returns the probabilities of that binary estimator for the two consecutive classes involved, and zero for the rest of the classes.

The method uses a recursive auxiliary function to compute the class probabilities

Parameters:

X ((sparse) array-like, shape (n_examples, n_features)) – Data

Return type:

An array of shape (n_examples, n_classes) with the class probabilities for each example

Raises:

NotFittedError – When the estimators are not fitted yet

set_partial_fit_request(*, classes='$UNCHANGED$')

Request metadata passed to the partial_fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to partial_fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to partial_fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
  • classes (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for classes parameter in partial_fit.

  • self (DDAGClassifier) –

Returns:

self – The updated object.

Return type:

object

set_score_request(*, sample_weight='$UNCHANGED$')

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (DDAGClassifier) –

Returns:

self – The updated object.

Return type:

object