lale.lib.aif360.orbis module

class lale.lib.aif360.orbis.Orbis(*, favorable_labels, protected_attributes, unfavorable_labels=None, estimator, redact=True, imbalance_repair_level=0.8, bias_repair_level=0.8, combine='keep_separate', sampling_strategy='mixed', replacement=False, n_jobs=1, random_state=None, k_neighbors=5)

Bases: PlannedIndividualOp

Experimental Orbis (Oversampling to Repair Bias and Imbalance Simultaneously) pre-estimator fairness mitigator.

This documentation is auto-generated from JSON schemas.

This operator is a work in progress and subject to change; so far it only supports pandas DataFrames. It uses SMOTE and RandomUnderSampler to resample the data, repairing not only class imbalance but also group bias. Internally, it replaces the class labels with the cross product of classes and groups, then changes the sizes of the resulting intersections to achieve the desired repair levels. Unlike other mitigators in lale.lib.aif360, this mitigator does not come from AIF360.
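The relabeling step described above can be illustrated with a minimal sketch. It is not the operator's actual implementation: the toy data and the names `labels`, `groups`, and `sizes` are hypothetical, and it only shows how class labels combine with group membership into intersections whose sizes a resampler could then adjust.

```python
from collections import Counter

# Hypothetical toy data: one class label and one group membership per sample.
labels = ["pos", "neg", "neg", "pos", "neg", "neg"]
groups = ["privileged", "privileged", "unprivileged",
          "privileged", "unprivileged", "unprivileged"]

# Replace each class label by a (class, group) pair, i.e. an element of the
# cross product of classes and groups.
intersections = list(zip(labels, groups))

# Count how many samples fall into each intersection. Resampling would then
# grow or shrink these counts toward the desired repair levels.
sizes = Counter(intersections)

for intersection, size in sorted(sizes.items()):
    print(intersection, size)
```

In this toy data the intersection ("pos", "unprivileged") is empty, which is the kind of skew that both imbalance repair and bias repair are meant to reduce.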

Parameters
  • favorable_labels (array, >=1 items, not for optimizer) –

    Label values which are considered favorable (i.e. “positive”).

    • items : union type

      • float

        Numerical value.

      • or string

        Literal string value.

      • or boolean

        Boolean value.

      • or array, >=2 items, <=2 items of items : float

        Numeric range [a,b] from a to b inclusive.

  • protected_attributes (array, >=1 items, not for optimizer) –

    Features for which fairness is desired.

    • items : dict

      • feature : union type

        Column name or column index.

        • string

        • or integer

      • reference_group : array, >=1 items

        Values or ranges that indicate being a member of the privileged group.

        • items : union type

          • string

            Literal value.

          • or float

            Numerical value.

          • or array, >=2 items, <=2 items of items : float

            Numeric range [a,b] from a to b inclusive.

      • monitored_group : union type, default None

        Values or ranges that indicate being a member of the unprivileged group.

        • None

          If monitored_group is not explicitly specified, consider any values not captured by reference_group as monitored.

        • or array, >=1 items

          • items : union type

            • string

              Literal value.

            • or float

              Numerical value.

            • or array, >=2 items, <=2 items of items : float

              Numeric range [a,b] from a to b inclusive.

  • unfavorable_labels (union type, not for optimizer, default None) –

    Label values which are considered unfavorable (i.e. “negative”).

    • None

      If unfavorable_labels is not explicitly specified, consider any labels not captured by favorable_labels as unfavorable.

    • or array, >=1 items

      • items : union type

        • float

          Numerical value.

        • or string

          Literal string value.

        • or boolean

          Boolean value.

        • or array, >=2 items, <=2 items of items : float

          Numeric range [a,b] from a to b inclusive.

  • estimator (operator, not for optimizer) – Nested classifier.

  • redact (boolean, optional, not for optimizer, default True) – Whether to redact protected attributes before data preparation (recommended) or not.

  • imbalance_repair_level (float, >=0.0, <=1.0, optional, default 0.8) –

    How much to repair for class imbalance (0 means original imbalance, 1 means perfect balance).

    See also constraint-1.

  • bias_repair_level (float, >=0.0, <=1.0, optional, default 0.8) –

    How much to repair for group bias (0 means original bias, 1 means perfect fairness).

    See also constraint-1.

  • combine (‘keep_separate’, ‘and’, ‘or’, or ‘error’, optional, not for optimizer, default ‘keep_separate’) – How to handle the case when there is more than one protected attribute.

  • sampling_strategy (‘under’, ‘over’, ‘mixed’, ‘minimum’, or ‘maximum’, optional, not for optimizer, default ‘mixed’) –

    How to change the intersection sizes.

    Possible choices are:

    • 'under': under-sample large intersections to desired repair levels;

    • 'over': over-sample small intersections to desired repair levels;

    • 'mixed': mix under- with over-sampling while keeping sizes similar to original;

    • 'minimum': under-sample everything to the size of the smallest intersection;

    • 'maximum': over-sample everything to the size of the largest intersection.

    See also constraint-1.

  • replacement (boolean, optional, not for optimizer, default False) – Whether under-sampling is with or without replacement.

  • n_jobs (integer, optional, not for optimizer, default 1) – The number of threads to use, if possible.

  • random_state (union type, optional, not for optimizer, default None) –

    Control the randomization of the algorithm.

    • None

      RandomState used by np.random

    • or integer

      The seed used by the random number generator.

    • or numpy.random.RandomState

      Random number generator instance.

  • k_neighbors (union type, optional, not for optimizer, default 5) –

    Number of nearest neighbours to use to construct synthetic samples.

    • integer

      Number of nearest neighbours to use to construct synthetic samples.

    • or Any

      An estimator that inherits from sklearn.neighbors.base.KNeighborsMixin that will be used to find the nearest neighbours.
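To make the shape of protected_attributes concrete, here is a small sketch of how the reference_group and monitored_group entries could be interpreted. The helpers `matches` and `group_of`, the feature names, and the example values are hypothetical and not part of lale. Returning None for a value that belongs to neither group is likewise an assumption of this sketch, not documented behavior.

```python
def matches(value, spec):
    """Return True if value is captured by any item of spec.

    Each item is either a literal value or a two-element numeric range [a, b],
    inclusive on both ends, as in the schema above.
    """
    for item in spec:
        if isinstance(item, (list, tuple)):
            low, high = item
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if is_number and low <= value <= high:
                return True
        elif value == item:
            return True
    return False


def group_of(value, attribute):
    """Classify value as 'reference', 'monitored', or None (assumption)."""
    if matches(value, attribute["reference_group"]):
        return "reference"
    monitored = attribute.get("monitored_group")
    # With monitored_group unset, anything outside reference_group is monitored.
    if monitored is None or matches(value, monitored):
        return "monitored"
    return None


# Hypothetical hyperparameter value following the schema above.
protected_attributes = [
    {"feature": "age", "reference_group": [[26, 1000]]},
    {"feature": "sex", "reference_group": ["male"], "monitored_group": ["female"]},
]

print(group_of(30, protected_attributes[0]))       # age 30 lies in [26, 1000]
print(group_of("female", protected_attributes[1]))
```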

Notes

constraint-1 : union type

When sampling_strategy is minimum or maximum, both repair levels must be 1.

  • sampling_strategy : negated type of ‘minimum’ or ‘maximum’

  • or dict

    • imbalance_repair_level : 1

    • bias_repair_level : 1
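As a rough sketch, constraint-1 can be read as the predicate below. The function names `satisfies_constraint_1` and `uniform_target_sizes` are hypothetical, and the second helper reflects only the descriptions of 'minimum' and 'maximum' given above (every intersection ends up the same size), not the operator's internal code.

```python
def satisfies_constraint_1(sampling_strategy, imbalance_repair_level, bias_repair_level):
    """Either the strategy is not 'minimum'/'maximum', or both levels are 1."""
    if sampling_strategy not in ("minimum", "maximum"):
        return True
    return imbalance_repair_level == 1 and bias_repair_level == 1


def uniform_target_sizes(sizes, sampling_strategy):
    """Target size per intersection for 'minimum' or 'maximum' (illustrative)."""
    if sampling_strategy == "minimum":
        target = min(sizes.values())  # under-sample everything to the smallest
    elif sampling_strategy == "maximum":
        target = max(sizes.values())  # over-sample everything to the largest
    else:
        raise ValueError("only 'minimum' and 'maximum' give uniform targets")
    return {key: target for key in sizes}


# The default hyperparameters ('mixed', 0.8, 0.8) satisfy the constraint.
print(satisfies_constraint_1("mixed", 0.8, 0.8))
# 'minimum' with partial repair levels violates it.
print(satisfies_constraint_1("minimum", 0.8, 0.8))
```

Because both strategies equalize all intersection sizes, a partial repair level would have no meaningful interpretation, which is a plausible reading of why the schema requires both levels to be 1 in that case.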

fit(X, y=None, **fit_params)

Train the operator.

Note: The fit method is not available until this operator is trainable.

Once this method is available, it will have the following signature:

Parameters
  • X (array) –

    Features; the outer array is over samples.

    • items : array

      • items : union type

        • float

        • or string

  • y (union type) –

    Target class labels; the array is over samples.

    • array of items : float

    • or array of items : string

predict(X, **predict_params)

Make predictions.

Note: The predict method is not available until this operator is trained.

Once this method is available, it will have the following signature:

Parameters

X (array) –

Features; the outer array is over samples.

  • items : array

    • items : union type

      • float

      • or string

Returns

result – Predicted class label per sample.

  • array of items : float

  • or array of items : string

Return type

union type

predict_proba(X)

Probability estimates for all classes.

Note: The predict_proba method is not available until this operator is trained.

Once this method is available, it will have the following signature:

Parameters

X (array) –

Features; the outer array is over samples.

  • items : array

    • items : union type

      • float

      • or string

Returns

result – The class probabilities of the input samples.

  • array of items : Any

  • or array of items : array of items : Any

Return type

union type
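The notes on fit, predict, and predict_proba follow the usual lale lifecycle: configuring the operator with hyperparameters yields a trainable operator, and calling fit yields a trained one. The sketch below shows that sequence under stated assumptions: the helper `train_and_predict`, the label "good", and the feature name "sex" are hypothetical; `estimator` is assumed to be a lale classifier operator; and `train_X` and `test_X` are assumed to be pandas DataFrames, as required above.

```python
# Hypothetical hyperparameters; the label and feature names are placeholders.
orbis_hyperparams = {
    "favorable_labels": ["good"],
    "protected_attributes": [{"feature": "sex", "reference_group": ["male"]}],
    "imbalance_repair_level": 0.8,
    "bias_repair_level": 0.8,
    "sampling_strategy": "mixed",
}


def train_and_predict(train_X, train_y, test_X, estimator):
    """Configure, train, and apply Orbis (sketch, not verified end to end)."""
    # Imported lazily so this sketch can be loaded without lale installed.
    from lale.lib.aif360.orbis import Orbis

    # Supplying hyperparameters makes the operator trainable, so fit exists.
    trainable = Orbis(estimator=estimator, **orbis_hyperparams)
    # fit returns a trained operator, on which predict and predict_proba exist.
    trained = trainable.fit(train_X, train_y)
    return trained.predict(test_X), trained.predict_proba(test_X)
```

Keeping the hyperparameters in a plain dict makes it easy to check them against constraint-1 before constructing the operator.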