lale.lib.rasl.target_encoder module
- class lale.lib.rasl.target_encoder.TargetEncoder(*, verbose=0, cols=None, drop_invariant=False, return_df=True, handle_missing='value', handle_unknown='value', min_samples_leaf=1, smoothing=1.0, classes=None)
Bases: PlannedIndividualOp
Relational algebra reimplementation of the TargetEncoder transformer from scikit-learn-contrib's category_encoders package.
This documentation is auto-generated from JSON schemas.
It works on both pandas and Spark dataframes by using Aggregate for fit and Map for transform, each of which dispatches to the appropriate backend.
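The fit/transform split described above can be illustrated with a minimal pure-Python sketch. It is not lale's implementation: the helpers `fit_target_encoding` and `transform_target_encoding` are hypothetical, rows are plain dicts rather than dataframes, and the smoothing controlled by min_samples_leaf and smoothing is omitted so that the aggregate-then-map structure stays visible. Unseen categories fall back to the global target mean, which is an assumption about what handle_unknown='value' produces.

```python
from collections import defaultdict


def fit_target_encoding(rows, y, col):
    """Aggregate step (hypothetical helper): per-category target mean plus the global mean."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row, target in zip(rows, y):
        sums[row[col]] += target
        counts[row[col]] += 1
    prior = sum(y) / len(y)
    means = {cat: sums[cat] / counts[cat] for cat in sums}
    return means, prior


def transform_target_encoding(rows, col, means, prior):
    """Map step (hypothetical helper): replace each category by its learned mean."""
    out = []
    for row in rows:
        new_row = dict(row)
        # Unseen categories fall back to the global mean (assumed handle_unknown='value' behavior).
        new_row[col] = means.get(row[col], prior)
        out.append(new_row)
    return out


rows = [{"color": "red"}, {"color": "blue"}, {"color": "red"}, {"color": "blue"}]
y = [1.0, 0.0, 1.0, 1.0]
means, prior = fit_target_encoding(rows, y, "color")
encoded = transform_target_encoding(rows + [{"color": "green"}], "color", means, prior)
print([r["color"] for r in encoded])
```

Here fit only needs one grouped pass over the training data, and transform is a per-row lookup, which is why the two phases map naturally onto an aggregation and a column-wise map.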
- Parameters
verbose (integer, not for optimizer, default 0) – Verbosity of the output, 0 for none.
cols (union type, not for optimizer, default None) –
Columns to encode.
None – All string columns will be encoded.
or array of items : string
drop_invariant (False, not for optimizer, default False) – This implementation only supports drop_invariant=False.
return_df (True, not for optimizer, default True) – This implementation returns a pandas or spark dataframe if the input is a pandas or spark dataframe, respectively.
handle_missing ('value', not for optimizer, default 'value') – This implementation only supports handle_missing='value'.
handle_unknown ('value', not for optimizer, default 'value') – This implementation only supports handle_unknown='value'.
min_samples_leaf (integer, >=1, <=10 for optimizer, not for optimizer, default 1) – For regularization, the encoding is a weighted average of the category mean and the global mean. The weight follows an S-shaped curve between 0 and 1 as a function of the number of samples in the category, and reaches 0.5 when that number equals min_samples_leaf (parameter k in the original paper).
smoothing (float, >0.0, <=10.0 for optimizer, not for optimizer, default 1.0) – Smoothing effect that balances the category average against the prior. Higher values mean stronger regularization and a flatter S-curve (see min_samples_leaf). The value must be strictly greater than 0.
classes (union type, optional, not for optimizer, default None) –
None – Regression task.
or array, >=2 items of items : float – Classification task with numeric labels.
or array, >=2 items of items : string – Classification task with string labels.
or array, >=2 items of items : boolean – Classification task with Boolean labels.
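The regularization controlled by min_samples_leaf and smoothing can be sketched as follows. This assumes the logistic weighting used by category_encoders' TargetEncoder, w(n) = 1 / (1 + exp(-(n - min_samples_leaf) / smoothing)), where n is the number of samples in a category, with the encoding blended as w * category_mean + (1 - w) * prior. Whether lale reproduces this formula exactly is an assumption, and the function `smoothed_encoding` is hypothetical.

```python
import math


def smoothed_encoding(category_mean, n, prior, min_samples_leaf=1, smoothing=1.0):
    """Blend a category's target mean with the prior using an S-shaped weight (hypothetical helper)."""
    if smoothing <= 0:
        raise ValueError("smoothing must be strictly greater than 0")
    # Logistic weight in (0, 1); it equals 0.5 when n == min_samples_leaf.
    weight = 1.0 / (1.0 + math.exp(-(n - min_samples_leaf) / smoothing))
    return weight * category_mean + (1.0 - weight) * prior


# A category seen exactly min_samples_leaf times sits halfway between its mean and the prior.
halfway = smoothed_encoding(1.0, n=3, prior=0.5, min_samples_leaf=3)
# A larger smoothing flattens the curve, pulling a well-populated category closer to the prior.
sharp = smoothed_encoding(1.0, n=10, prior=0.5, min_samples_leaf=3, smoothing=1.0)
flat = smoothed_encoding(1.0, n=10, prior=0.5, min_samples_leaf=3, smoothing=10.0)
print(halfway, sharp, flat)
```

Rare categories are therefore encoded close to the global mean, while frequent categories keep roughly their own mean; smoothing decides how abruptly the encoding moves from one regime to the other.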
- fit(X, y=None, **fit_params)
Train the operator.
Note: The fit method is not available until this operator is trainable.
Once this method is available, it will have the following signature:
- Parameters
X (array) –
Features; the outer array is over samples.
items : array
items : union type
float
or string
y (union type) –
Target values (class labels for classification, numeric values for regression); the array is over samples.
array of items : float
or array of items : string
- partial_fit(X, y=None, **fit_params)
Incremental fit to train the operator on a batch of samples.
Note: The partial_fit method is not available until this operator is trainable.
Once this method is available, it will have the following signature:
- transform(X, y=None)
Transform the data.
Note: The transform method is not available until this operator is trained.
Once this method is available, it will have the following signature:
- Parameters
X (array) –
Features; the outer array is over samples.
items : array
items : union type
float
or string
- Returns
result
- Return type
array of items : array of items : float