lale.lib.rasl.target_encoder module

class lale.lib.rasl.target_encoder.TargetEncoder(*, verbose=0, cols=None, drop_invariant=False, return_df=True, handle_missing='value', handle_unknown='value', min_samples_leaf=1, smoothing=1.0, classes=None)

Bases: PlannedIndividualOp

Relational algebra reimplementation of scikit-learn contrib’s TargetEncoder transformer.

This documentation is auto-generated from JSON schemas.

Works on both pandas and Spark dataframes by using Aggregate for fit and Map for transform, which in turn use the appropriate backend.
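The split into an aggregate step for fit and a map step for transform can be sketched in plain Python. This is a simplified illustration, not the lale implementation: the helper names `aggregate_fit` and `map_transform` are hypothetical, only a single column is handled, and the regularization controlled by min_samples_leaf and smoothing is omitted.

```python
from collections import defaultdict

def aggregate_fit(categories, targets):
    """Aggregate step: one pass that collects count and sum per category."""
    counts = defaultdict(int)
    sums = defaultdict(float)
    for cat, y in zip(categories, targets):
        counts[cat] += 1
        sums[cat] += y
    # Unregularized encoding table: category -> mean of the target.
    return {cat: sums[cat] / counts[cat] for cat in counts}

def map_transform(categories, table):
    """Map step: a row-wise lookup, independent of all other rows."""
    return [table[cat] for cat in categories]

colors = ["red", "red", "blue", "blue"]
y = [1.0, 0.0, 1.0, 1.0]
table = aggregate_fit(colors, y)        # {'red': 0.5, 'blue': 1.0}
encoded = map_transform(colors, table)  # [0.5, 0.5, 1.0, 1.0]
print(encoded)
```

Because fit only needs grouped counts and sums and transform only needs a per-row lookup, both steps can be expressed against either a pandas or a Spark backend.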

Parameters

verbose (integer, not for optimizer, default 0) – Verbosity of the output, 0 for none.
cols (union type, not for optimizer, default None) –
Columns to encode.
- None
  
  All string columns will be encoded.
- or array of items : string
drop_invariant (False, not for optimizer, default False) – This implementation only supports drop_invariant=False.
return_df (True, not for optimizer, default True) – This implementation returns a pandas or spark dataframe if the input is a pandas or spark dataframe, respectively.
handle_missing ('value', not for optimizer, default 'value') – This implementation only supports handle_missing='value'.
handle_unknown ('value', not for optimizer, default 'value') – This implementation only supports handle_unknown='value'.
min_samples_leaf (integer, >=1, <=10 for optimizer, not for optimizer, default 1) – For regularization, the encoded value is a weighted average of the category mean and the global mean. The weight is an S-shaped curve between 0 and 1 with the number of samples for a category on the x-axis. The curve reaches 0.5 at min_samples_leaf (parameter k in the original paper).
smoothing (float, >0.0, <=10.0 for optimizer, not for optimizer, default 1.0) – Balances the category average against the prior. The value must be strictly greater than 0. Higher values mean stronger regularization and a flatter S-curve (see min_samples_leaf).
classes (union type, optional, not for optimizer, default None) –
- None
  
  Regression task.
- or array, >=2 items of items : float
  
  Classification task with numeric labels.
- or array, >=2 items of items : string
  
  Classification task with string labels.
- or array, >=2 items of items : boolean
  
  Classification task with Boolean labels.
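The interaction between min_samples_leaf and smoothing can be illustrated with a small sketch. It assumes the S-shaped curve is a logistic sigmoid centered at min_samples_leaf and scaled by smoothing, which matches the description above, but the exact formula is an assumption, not something this page states. The function names `blend_weight` and `encode_category` are hypothetical.

```python
import math

def blend_weight(n, min_samples_leaf=1, smoothing=1.0):
    """Weight given to the category mean for a category seen n times.

    Assumed form: a logistic sigmoid that equals 0.5 at n == min_samples_leaf.
    """
    return 1.0 / (1.0 + math.exp(-(n - min_samples_leaf) / smoothing))

def encode_category(n, category_mean, prior, min_samples_leaf=1, smoothing=1.0):
    """Weighted average of the category mean and the global mean (prior)."""
    w = blend_weight(n, min_samples_leaf, smoothing)
    return w * category_mean + (1.0 - w) * prior

# Compare a steep curve (smoothing=1.0) with a flat one (smoothing=10.0).
for n in (1, 5, 20):
    steep = blend_weight(n, min_samples_leaf=5, smoothing=1.0)
    flat = blend_weight(n, min_samples_leaf=5, smoothing=10.0)
    print(n, round(steep, 3), round(flat, 3))
```

Under this assumed form, both curves cross 0.5 at n = 5. With the larger smoothing value the weight moves away from 0.5 more slowly on either side, so the encoding stays closer to an even mix of category mean and prior.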

fit(X, y=None, **fit_params)

Train the operator.

Note: The fit method is not available until this operator is trainable.

Once this method is available, it will have the following signature:

Parameters

X (array) –
Features; the outer array is over samples.
- items : array
  - items : union type
    
    float
    
    or string
y (union type) –
Target class labels; the array is over samples.
- array of items : float
- or array of items : string

partial_fit(X, y=None, **fit_params)

Incremental fit to train the operator on a batch of samples.

Note: The partial_fit method is not available until this operator is trainable.
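One way to see why target encoding suits incremental training: per-category counts and sums are additive, so statistics from separate batches can be merged without revisiting earlier data. The sketch below illustrates that idea in plain Python. It is an assumption about how batch-wise fitting can work in general, not a description of the lale internals, and `batch_stats` and `merge_stats` are hypothetical helpers.

```python
def batch_stats(categories, targets):
    """Collect (count, sum of targets) per category for one batch."""
    stats = {}
    for cat, y in zip(categories, targets):
        count, total = stats.get(cat, (0, 0.0))
        stats[cat] = (count + 1, total + y)
    return stats

def merge_stats(left, right):
    """Combine two batches by adding counts and sums per category."""
    merged = dict(left)
    for cat, (count, total) in right.items():
        old_count, old_total = merged.get(cat, (0, 0.0))
        merged[cat] = (old_count + count, old_total + total)
    return merged

batch1 = batch_stats(["a", "b", "a"], [1.0, 0.0, 0.0])
batch2 = batch_stats(["b", "c"], [1.0, 1.0])
running = merge_stats(batch1, batch2)

# Merging batches gives the same statistics as one pass over all samples.
full = batch_stats(["a", "b", "a", "b", "c"], [1.0, 0.0, 0.0, 1.0, 1.0])
print(running == full)
```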

Once this method is available, it will have the following signature:

transform(X, y=None)

Transform the data.

Note: The transform method is not available until this operator is trained.

Once this method is available, it will have the following signature:

Parameters

X (array) –
Features; the outer array is over samples.
- items : array
  - items : union type
    
    float
    
    or string

Returns

result

Return type

array of items : array of items : float
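The input and output shapes of transform can be illustrated with a plain-Python sketch: rows that mix floats and strings go in, and rows containing only floats come out. The `tables` mapping and the `prior` value stand in for fitted state and are hypothetical. Falling back to the prior for an unseen category is an assumption about what handle_unknown='value' means, based on the scikit-learn contrib encoder, and is not stated on this page.

```python
def transform_rows(rows, tables, prior):
    """Replace encoded columns by their table value; pass other columns through.

    tables maps a column index to a {category: encoded value} dict.
    """
    result = []
    for row in rows:
        out = []
        for col, value in enumerate(row):
            if col in tables:
                # Assumed fallback: unseen categories receive the prior.
                out.append(tables[col].get(value, prior))
            else:
                out.append(float(value))
        result.append(out)
    return result

tables = {1: {"red": 0.5, "blue": 1.0}}  # hypothetical fitted encodings for column 1
prior = 0.75                             # hypothetical global target mean
rows = [[3.0, "red"], [1.5, "blue"], [2.0, "green"]]
encoded_rows = transform_rows(rows, tables, prior)
print(encoded_rows)
```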