lale.lib.category_encoders.target_encoder module

class lale.lib.category_encoders.target_encoder.TargetEncoder(*, verbose=0, cols=None, drop_invariant=False, return_df=True, handle_missing='value', handle_unknown='value', min_samples_leaf=1, smoothing=1.0)

Bases: PlannedIndividualOp

Target encoder transformer from scikit-learn contrib that encodes categorical features as numbers.

This documentation is auto-generated from JSON schemas.

Parameters
  • verbose (integer, not for optimizer, default 0) – Verbosity of the output, 0 for none.

  • cols (union type, not for optimizer, default None) –

    Columns to encode.

    • None

      All string columns will be encoded.

    • or array of items : string

  • drop_invariant (boolean, not for optimizer, default False) – Whether to drop columns with 0 variance.

  • return_df (boolean, not for optimizer, default True) – Whether to return a pandas DataFrame from transform (otherwise it will be a numpy array).

  • handle_missing (‘error’, ‘return_nan’, or ‘value’, not for optimizer, default ‘value’) – Given ‘value’, return the target mean.

  • handle_unknown (‘error’, ‘return_nan’, or ‘value’, not for optimizer, default ‘value’) – Given ‘value’, return the target mean.

  • min_samples_leaf (integer, >=1, <=10 for optimizer, not for optimizer, default 1) – For regularization, the encoding is a weighted average of the category mean and the global mean. The weight is an S-shaped curve between 0 and 1, with the number of samples for a category on the x-axis. The curve reaches 0.5 at min_samples_leaf (parameter k in the original paper).

  • smoothing (float, >0.0, <=10.0 for optimizer, not for optimizer, default 1.0) – Smoothing effect to balance the categorical average against the prior. The value must be strictly greater than 0. Higher values mean stronger regularization and a flatter S-curve (see min_samples_leaf).
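
The interaction between min_samples_leaf and smoothing can be illustrated with a small sketch. It assumes the S-shaped curve is a logistic function of the category's sample count, which matches the description above (0.5 at min_samples_leaf, flatter for larger smoothing) but is not taken from this page. The function names shrinkage_weight and blended_encoding are hypothetical and are not part of lale or category_encoders.

```python
import math


def shrinkage_weight(n, min_samples_leaf=1, smoothing=1.0):
    """Weight given to the category mean for a category seen n times.

    Assumption: a logistic curve centred at min_samples_leaf whose
    steepness is controlled by smoothing.
    """
    return 1.0 / (1.0 + math.exp(-(n - min_samples_leaf) / smoothing))


def blended_encoding(category_mean, global_mean, n,
                     min_samples_leaf=1, smoothing=1.0):
    """Weighted average of the category mean and the global mean (prior)."""
    w = shrinkage_weight(n, min_samples_leaf, smoothing)
    return w * category_mean + (1.0 - w) * global_mean


# Rare categories are pulled towards the global mean,
# frequent categories keep a value close to their own mean.
for n in (1, 5, 20):
    w = shrinkage_weight(n, min_samples_leaf=5, smoothing=2.0)
    print(n, round(w, 3))
```

Under this assumption, a category with exactly min_samples_leaf samples is encoded halfway between its own mean and the global mean, and increasing smoothing moves all weights closer to 0.5.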

fit(X, y=None, **fit_params)

Train the operator.

Note: The fit method is not available until this operator is trainable.

Once this method is available, it will have the following signature:

Parameters
  • X (array) –

    Features; the outer array is over samples.

    • items : array

      • items : union type

        • float

        • or string

  • y (union type) –

    Target class labels; the array is over samples.

    • array of items : float

    • or array of items : string

transform(X, y=None)

Transform the data.

Note: The transform method is not available until this operator is trained.

Once this method is available, it will have the following signature:

Parameters
  • X (array) –

    Features; the outer array is over samples.

    • items : array

      • items : union type

        • float

        • or string

Returns

result

Return type

array of items : array of items : float
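
To make the fit and transform steps concrete, here is a minimal single-column sketch in plain Python. It is an illustration under stated assumptions, not the lale or category_encoders implementation: the class TinyTargetEncoder is hypothetical, the weight is assumed to be a logistic curve, and lale's operator lifecycle (an operator must be trainable before fit and trained before transform) is not modeled.

```python
import math
from collections import defaultdict


class TinyTargetEncoder:
    """Hypothetical one-column target encoder, for illustration only."""

    def __init__(self, min_samples_leaf=1, smoothing=1.0):
        self.min_samples_leaf = min_samples_leaf
        self.smoothing = smoothing

    def fit(self, values, y):
        # Collect the per-category target sum and sample count.
        sums, counts = defaultdict(float), defaultdict(int)
        for v, t in zip(values, y):
            sums[v] += t
            counts[v] += 1
        # Global target mean, used as the prior.
        self.prior_ = sum(y) / len(y)
        self.mapping_ = {}
        for v, n in counts.items():
            # Assumed logistic weight: 0.5 when n == min_samples_leaf.
            w = 1.0 / (1.0 + math.exp(-(n - self.min_samples_leaf) / self.smoothing))
            self.mapping_[v] = w * (sums[v] / n) + (1.0 - w) * self.prior_
        return self

    def transform(self, values):
        # Categories not seen during fit fall back to the global target
        # mean, mirroring the documented handle_unknown='value' behaviour.
        return [self.mapping_.get(v, self.prior_) for v in values]


enc = TinyTargetEncoder(min_samples_leaf=1, smoothing=1.0).fit(
    ["a", "a", "b", "b", "b", "c"],
    [1.0, 0.0, 1.0, 1.0, 1.0, 0.0],
)
encoded = enc.transform(["a", "b", "c", "zzz"])
print(encoded)
```

Each input string is replaced by a float, which corresponds to the documented return type of an array of arrays of floats once every encoded column is numeric.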