lale.lib.aif360.orbis module

class lale.lib.aif360.orbis.Orbis(*, favorable_labels, protected_attributes, unfavorable_labels=None, estimator, redact=True, imbalance_repair_level=0.8, bias_repair_level=0.8, combine='keep_separate', sampling_strategy='mixed', replacement=False, n_jobs=1, random_state=None, k_neighbors=5)

Bases: PlannedIndividualOp

Experimental Orbis (Oversampling to Repair Bias and Imbalance Simultaneously) pre-estimator fairness mitigator.

This documentation is auto-generated from JSON schemas.

This operator is a work in progress and subject to change; so far it supports only pandas DataFrame input. It uses SMOTE and RandomUnderSampler to resample the data, repairing not only class imbalance but also group bias. Internally, it replaces the class labels with the cross product of classes and groups, then changes the sizes of the resulting intersections to achieve the desired repair levels. Unlike the other mitigators in lale.lib.aif360, this mitigator does not come from AIF360.
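The relabeling idea described above can be illustrated with a minimal, standard-library sketch. The toy labels and group names are hypothetical, and this is not the operator's actual implementation; it only shows what "cross product of classes and groups" means and how intersection sizes are obtained.

```python
from collections import Counter

# Hypothetical toy data: one class label and one group flag per sample.
labels = ["good", "good", "bad", "good", "bad", "good"]
groups = ["privileged", "privileged", "privileged",
          "unprivileged", "unprivileged", "privileged"]

# Replace each class label by a (class, group) pair, that is, by an
# element of the cross product of classes and groups.
intersections = list(zip(labels, groups))

# Size of each intersection; a resampler would grow or shrink these
# sizes to reach the desired repair levels.
sizes = Counter(intersections)

for intersection, size in sorted(sizes.items()):
    print(intersection, size)
```

With two classes and two groups there are four intersections, so the resampler works on a four-class problem instead of a two-class one.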

Parameters

favorable_labels (array, >=1 items, not for optimizer) –
Label values which are considered favorable (i.e. “positive”).
- items : union type
  - float
    
    Numerical value.
  - or string
    
    Literal string value.
  - or boolean
    
    Boolean value.
  - or array, >=2 items, <=2 items of items : float
    
    Numeric range [a,b] from a to b inclusive.
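As an illustration of how such a specification can be read, the following sketch checks one label against a favorable_labels list that mixes literals and inclusive ranges. The helper name is_favorable is hypothetical and is not part of Lale; the treatment of booleans versus numbers is an assumption of this sketch.

```python
def is_favorable(label, favorable_labels):
    """Return True if label matches a literal entry or an inclusive [a, b] range."""
    for entry in favorable_labels:
        if isinstance(entry, (list, tuple)):
            # A two-element array denotes the numeric range [a, b], inclusive.
            low, high = entry
            is_number = isinstance(label, (int, float)) and not isinstance(label, bool)
            if is_number and low <= label <= high:
                return True
        elif isinstance(entry, bool) == isinstance(label, bool) and entry == label:
            # Literal float, string, or boolean value (assumption: booleans
            # are never matched against plain numbers).
            return True
    return False


print(is_favorable("good", ["good"]))   # literal string
print(is_favorable(30, [[18, 30]]))     # upper bound of the range is included
print(is_favorable(30.5, [[18, 30]]))   # outside the range
```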
protected_attributes (array, >=1 items, not for optimizer) –
Features for which fairness is desired.
- items : dict
  - feature : union type
    
    Column name or column index.
    - string
    - or integer
  - reference_group : array, >=1 items
    
    Values or ranges that indicate being a member of the privileged group.
    - items : union type
      - string
        
        Literal value.
      - or float
        
        Numerical value.
      - or array, >=2 items, <=2 items of items : float
        
        Numeric range [a,b] from a to b inclusive.
  - monitored_group : union type, default None
    
    Values or ranges that indicate being a member of the unprivileged group.
    - None
      
      If monitored_group is not explicitly specified, consider any values not captured by reference_group as monitored.
    - or array, >=1 items
      - items : union type
        - string
          
          Literal value.
        - or float
          
          Numerical value.
        - or array, >=2 items, <=2 items of items : float
          
          Numeric range [a,b] from a to b inclusive.
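A hedged sketch of what a protected_attributes value can look like, and how a single feature value maps to a group under the rules above. The column names "sex" and "age", the helper names, and the choice to return None for a value that falls in neither explicitly listed group are assumptions of this sketch, not documented Lale behavior.

```python
# Hypothetical specification: one attribute with an implicit monitored group,
# one with explicit numeric ranges for both groups.
protected_attributes = [
    {"feature": "sex", "reference_group": ["male"]},
    {"feature": "age", "reference_group": [[26, 1000]], "monitored_group": [[0, 25]]},
]


def _matches(value, candidates):
    """True if value equals a literal or lies in an inclusive [a, b] range."""
    for candidate in candidates:
        if isinstance(candidate, (list, tuple)):
            low, high = candidate
            if isinstance(value, (int, float)) and low <= value <= high:
                return True
        elif candidate == value:
            return True
    return False


def group_of(value, attribute):
    """Classify value as 'reference', 'monitored', or None (neither)."""
    if _matches(value, attribute["reference_group"]):
        return "reference"
    monitored = attribute.get("monitored_group")
    # Default None: anything not captured by reference_group is monitored.
    if monitored is None or _matches(value, monitored):
        return "monitored"
    return None


print(group_of("female", protected_attributes[0]))
print(group_of(40, protected_attributes[1]))
```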
unfavorable_labels (union type, not for optimizer, default None) –
Label values which are considered unfavorable (i.e. “negative”).
- None
  
  If unfavorable_labels is not explicitly specified, consider any labels not captured by favorable_labels as unfavorable.
- or array, >=1 items
  - items : union type
    - float
      
      Numerical value.
    - or string
      
      Literal string value.
    - or boolean
      
      Boolean value.
    - or array, >=2 items, <=2 items of items : float
      
      Numeric range [a,b] from a to b inclusive.
estimator (operator, not for optimizer) – Nested classifier.
redact (boolean, optional, not for optimizer, default True) – Whether to redact protected attributes before data preparation (recommended) or not.
imbalance_repair_level (float, >=0.0, <=1.0, optional, default 0.8) –
How much to repair for class imbalance (0 means original imbalance, 1 means perfect balance).

See also constraint-1.
bias_repair_level (float, >=0.0, <=1.0, optional, default 0.8) –
How much to repair for group bias (0 means original bias, 1 means perfect fairness).

See also constraint-1.
combine (‘keep_separate’, ‘and’, ‘or’, or ‘error’, optional, not for optimizer, default ‘keep_separate’) – How to handle the case when there is more than one protected attribute.
sampling_strategy (‘under’, ‘over’, ‘mixed’, ‘minimum’, or ‘maximum’, optional, not for optimizer, default ‘mixed’) –
How to change the intersection sizes.
Possible choices are:
- 'under': under-sample large intersections to desired repair levels;
- 'over': over-sample small intersections to desired repair levels;
- 'mixed': mix under- with over-sampling while keeping sizes similar to original;
- 'minimum': under-sample everything to the size of the smallest intersection;
- 'maximum': over-sample everything to the size of the largest intersection.
See also constraint-1.
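For the two strategies whose targets are fully determined by the description above, the target sizes can be sketched as follows. The function name target_sizes and the toy intersection sizes are hypothetical; the 'under', 'over', and 'mixed' strategies depend on the repair levels in ways this page does not spell out, so the sketch deliberately leaves them out.

```python
def target_sizes(sizes, sampling_strategy):
    """Target size per intersection for the 'minimum' and 'maximum' strategies."""
    if sampling_strategy == "minimum":
        # Under-sample everything to the size of the smallest intersection.
        target = min(sizes.values())
    elif sampling_strategy == "maximum":
        # Over-sample everything to the size of the largest intersection.
        target = max(sizes.values())
    else:
        raise NotImplementedError(
            f"{sampling_strategy!r} depends on the repair levels; not sketched here"
        )
    return {intersection: target for intersection in sizes}


# Hypothetical sizes of the four (class, group) intersections.
sizes = {
    ("good", "privileged"): 60,
    ("bad", "privileged"): 20,
    ("good", "unprivileged"): 15,
    ("bad", "unprivileged"): 5,
}
print(target_sizes(sizes, "minimum"))
print(target_sizes(sizes, "maximum"))
```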
replacement (boolean, optional, not for optimizer, default False) – Whether under-sampling is with or without replacement.
n_jobs (integer, optional, not for optimizer, default 1) – The number of threads to open if possible.
random_state (union type, optional, not for optimizer, default None) –
Control the randomization of the algorithm.
- None
  
  RandomState used by np.random
- or integer
  
  The seed used by the random number generator
- or numpy.random.RandomState
  
  Random number generator instance.
k_neighbors (union type, optional, not for optimizer, default 5) –
Number of nearest neighbours to use to construct synthetic samples.
- integer
  
  Number of nearest neighbours to use to construct synthetic samples.
- or Any
  
  An estimator that inherits from sklearn.neighbors.KNeighborsMixin and is used to find the nearest neighbours.

Notes

constraint-1 : union type

When sampling_strategy is minimum or maximum, both repair levels must be 1.
- sampling_strategy : negated type of ‘minimum’ or ‘maximum’
- or dict
  - imbalance_repair_level : 1
  - bias_repair_level : 1
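The constraint can be read as a simple predicate over three hyperparameters. The sketch below uses a hypothetical helper name and mirrors the documented defaults; Lale performs its own schema validation, so this only illustrates the rule.

```python
def satisfies_constraint_1(sampling_strategy="mixed",
                           imbalance_repair_level=0.8,
                           bias_repair_level=0.8):
    """Check constraint-1 for one combination of hyperparameters."""
    if sampling_strategy not in ("minimum", "maximum"):
        # First alternative of the union: any other strategy is unconstrained.
        return True
    # Second alternative: both repair levels must be exactly 1.
    return imbalance_repair_level == 1 and bias_repair_level == 1


print(satisfies_constraint_1())                    # defaults
print(satisfies_constraint_1("minimum"))           # default repair levels are 0.8
print(satisfies_constraint_1("minimum", 1, 1))
```

Because both repair levels default to 0.8, choosing 'minimum' or 'maximum' requires setting imbalance_repair_level and bias_repair_level to 1 explicitly.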

fit(X, y=None, **fit_params)

Train the operator.

Note: The fit method is not available until this operator is trainable.

Once this method is available, it will have the following signature:

Parameters

X (array) –
Features; the outer array is over samples.
- items : array
  - items : union type
    - float
    - or string
y (union type) –
Target class labels; the array is over samples.
- array of items : float
- or array of items : string

predict(X, **predict_params)

Make predictions.

Note: The predict method is not available until this operator is trained.

Once this method is available, it will have the following signature:

Parameters

X (array) –
Features; the outer array is over samples.
- items : array
  - items : union type
    - float
    - or string

Returns

result – Predicted class label per sample.
- array of items : float
- or array of items : string

Return type

union type

predict_proba(X)

Probability estimates for all classes.

Note: The predict_proba method is not available until this operator is trained.

Once this method is available, it will have the following signature:

Parameters

X (array) –
Features; the outer array is over samples.
- items : array
  - items : union type
    - float
    - or string

Returns

result – The class probabilities of the input samples.
- array of items : Any
- or array of items : array of items : Any

Return type

union type
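Putting the pieces together, a minimal usage sketch might look like the following. It assumes that lale and its fairness dependencies are installed, that the inputs are pandas DataFrames with numeric feature columns, and that LogisticRegression from lale.lib.sklearn is a suitable nested classifier. The column name "sex", its encoding, and the label "good" are hypothetical. The Lale imports are placed inside the function so the hyperparameter dictionary can be inspected without Lale installed.

```python
# Hypothetical hyperparameters; names and defaults follow the signature above.
hyperparams = {
    "favorable_labels": ["good"],
    "protected_attributes": [{"feature": "sex", "reference_group": [1]}],
    "imbalance_repair_level": 0.8,
    "bias_repair_level": 0.8,
    "sampling_strategy": "mixed",
    "random_state": 42,
}


def train_and_predict(train_X, train_y, test_X):
    """Fit Orbis on training data and return predicted labels for test_X."""
    # Imported lazily: these require the lale package and its dependencies.
    from lale.lib.aif360.orbis import Orbis
    from lale.lib.sklearn import LogisticRegression

    # Configuring hyperparameters yields a trainable operator, so fit is available.
    trainable = Orbis(estimator=LogisticRegression(), **hyperparams)
    # fit returns a trained operator, on which predict becomes available.
    trained = trainable.fit(train_X, train_y)
    return trained.predict(test_X)
```

This follows the lifecycle noted in the method descriptions: fit is available once the operator is trainable, and predict and predict_proba once it is trained.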