lale.lib.category_encoders.target_encoder module

class lale.lib.category_encoders.target_encoder.TargetEncoder(*, verbose=0, cols=None, drop_invariant=False, return_df=True, handle_missing='value', handle_unknown='value', min_samples_leaf=1, smoothing=1.0)

Bases: PlannedIndividualOp

Target encoder transformer from scikit-learn contrib that encodes categorical features as numbers.

This documentation is auto-generated from JSON schemas.

Parameters

verbose (integer, not for optimizer, default 0) – Verbosity of the output, 0 for none.
cols (union type, not for optimizer, default None) –
Columns to encode.
- None: all string columns will be encoded.
- or array of items : string
drop_invariant (boolean, not for optimizer, default False) – Whether to drop columns with 0 variance.
return_df (boolean, not for optimizer, default True) – Whether to return a pandas DataFrame from transform (otherwise it will be a numpy array).
handle_missing (‘error’, ‘return_nan’, or ‘value’, not for optimizer, default ‘value’) – How to handle missing values in the columns to encode. Given ‘value’, return the target mean.
handle_unknown (‘error’, ‘return_nan’, or ‘value’, not for optimizer, default ‘value’) – How to handle categories that were not seen during fit. Given ‘value’, return the target mean.
min_samples_leaf (integer, >=1, <=10 for optimizer, not for optimizer, default 1) – For regularization, a weighted average of the category mean and the global mean is taken. The weight is an S-shaped curve between 0 and 1 with the number of samples for a category on the x-axis. The curve reaches 0.5 at min_samples_leaf. This is the parameter k in the original paper.
smoothing (float, >0.0, <=10.0 for optimizer, not for optimizer, default 1.0) – Smoothing effect that balances the category average against the prior. The value must be strictly greater than 0. Higher values mean stronger regularization, that is, a flatter S-curve (see min_samples_leaf).
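
The interaction between min_samples_leaf and smoothing can be sketched in a few lines of plain Python. This is an illustration only: it assumes the S-shaped curve is a logistic function of the category's sample count, which matches the description above (weight 0.5 at min_samples_leaf, flatter for larger smoothing), but the underlying library may differ in details. The helper names blend_weight and encode_category are hypothetical and not part of lale or category_encoders.

```python
import math


def blend_weight(n, min_samples_leaf=1, smoothing=1.0):
    """Weight in (0, 1) given to the category mean (assumed logistic form).

    n is the number of training samples observed for the category.
    """
    return 1.0 / (1.0 + math.exp(-(n - min_samples_leaf) / smoothing))


def encode_category(category_mean, n, prior, min_samples_leaf=1, smoothing=1.0):
    """Weighted average of the category mean and the global mean (prior)."""
    w = blend_weight(n, min_samples_leaf, smoothing)
    return prior * (1.0 - w) + category_mean * w


# The curve reaches 0.5 when n equals min_samples_leaf.
half = blend_weight(5, min_samples_leaf=5, smoothing=2.0)

# A larger smoothing value flattens the curve, pulling the weight toward 0.5
# and therefore the encoding toward the prior for well-populated categories.
sharp = blend_weight(3, min_samples_leaf=1, smoothing=1.0)
flat = blend_weight(3, min_samples_leaf=1, smoothing=10.0)
```

Under this assumed form, rare categories (n well below min_samples_leaf) are encoded close to the global mean, while frequent categories are encoded close to their own mean.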

fit(X, y=None, **fit_params)

Train the operator.

Note: The fit method is not available until this operator is trainable.

Once this method is available, it will have the following signature:

Parameters

X (array) –
Features; the outer array is over samples.
- items : array
  - items : union type
    - float
    - or string
y (union type) –
Target class labels; the array is over samples.
- array of items : float
- or array of items : string

transform(X, y=None)

Transform the data.

Note: The transform method is not available until this operator is trained.

Once this method is available, it will have the following signature:

Parameters

X (array) –
Features; the outer array is over samples.
- items : array
  - items : union type
    - float
    - or string

Returns

result

Return type

array of items : array of items : float
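
To make the fit and transform contract concrete, the following is a minimal, self-contained sketch of unsmoothed target encoding in plain Python. It is not the operator's implementation and does not call lale. It only illustrates the idea that fit learns one number per category from y, that transform replaces each category with that number, and that an unseen category falls back to the global target mean, as described for handle_unknown=‘value’. The function names fit_target_means and transform_column are hypothetical.

```python
from collections import defaultdict


def fit_target_means(X, y, col):
    """Learn the mean target per category of column `col`, plus the global mean."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row, target in zip(X, y):
        sums[row[col]] += target
        counts[row[col]] += 1
    means = {category: sums[category] / counts[category] for category in sums}
    global_mean = sum(y) / len(y)
    return means, global_mean


def transform_column(X, col, means, global_mean):
    """Replace column `col` with learned means; unseen categories get the global mean."""
    encoded_rows = []
    for row in X:
        new_row = list(row)
        new_row[col] = means.get(row[col], global_mean)
        encoded_rows.append([float(value) for value in new_row])
    return encoded_rows


# Toy data: one string column and one numeric column.
X_train = [["red", 1.0], ["red", 2.0], ["blue", 3.0], ["blue", 4.0]]
y_train = [1.0, 0.0, 1.0, 1.0]

means, prior = fit_target_means(X_train, y_train, col=0)

# "green" was never seen during fit, so it is encoded as the global mean.
encoded = transform_column([["red", 5.0], ["green", 6.0]], 0, means, prior)
```

The result has the shape given under Return type: an outer array over samples whose items are arrays of floats. The sketch omits the regularization controlled by min_samples_leaf and smoothing.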