lale.lib.rasl.target_encoder module

class lale.lib.rasl.target_encoder.TargetEncoder(*, verbose=0, cols=None, drop_invariant=False, return_df=True, handle_missing='value', handle_unknown='value', min_samples_leaf=1, smoothing=1.0, classes=None)

Bases: PlannedIndividualOp

Relational algebra reimplementation of scikit-learn contrib’s TargetEncoder transformer.

This documentation is auto-generated from JSON schemas.

Works on both pandas and Spark dataframes by using Aggregate for fit and Map for transform, which in turn use the appropriate backend.

Parameters
  • verbose (integer, not for optimizer, default 0) – Verbosity of the output, 0 for none.

  • cols (union type, not for optimizer, default None) –

    Columns to encode.

    • None

      All string columns will be encoded.

    • or array of items : string

  • drop_invariant (False, not for optimizer, default False) – This implementation only supports drop_invariant=False.

  • return_df (True, not for optimizer, default True) – This implementation returns a pandas or Spark dataframe if the input is a pandas or Spark dataframe, respectively.

  • handle_missing ('value', not for optimizer, default 'value') – This implementation only supports handle_missing='value'.

  • handle_unknown ('value', not for optimizer, default 'value') – This implementation only supports handle_unknown='value'.

  • min_samples_leaf (integer, >=1, <=10 for optimizer, not for optimizer, default 1) – For regularization, the encoding is a weighted average of the category mean and the global mean. The weight follows an S-shaped curve between 0 and 1, with the number of samples in a category on the x-axis. The curve reaches 0.5 at min_samples_leaf (parameter k in the original paper).

  • smoothing (float, >0.0, <=10.0 for optimizer, not for optimizer, default 1.0) – Smoothing effect that balances the category average against the prior. The value must be strictly greater than 0. Higher values mean stronger regularization, that is, a flatter S-curve (see min_samples_leaf).

  • classes (union type, optional, not for optimizer, default None) –

    • None

      Regression task.

    • or array, >=2 items of items : float

      Classification task with numeric labels.

    • or array, >=2 items of items : string

      Classification task with string labels.

    • or array, >=2 items of items : boolean

      Classification task with Boolean labels.

fit(X, y=None, **fit_params)

Train the operator.

Note: The fit method is not available until this operator is trainable.

Once this method is available, it will have the following signature:

Parameters
  • X (array) –

    Features; the outer array is over samples.

    • items : array

      • items : union type

        • float

        • or string

  • y (union type) –

    Target class labels; the array is over samples.

    • array of items : float

    • or array of items : string
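
Since fit is described as an aggregation, the statistics it needs can be pictured as a per-category count and mean of y, plus the global mean. The following is a minimal sketch in plain Python of that idea for a single column and a numeric target; it is not lale's Aggregate operator, and aggregate_target_stats is a hypothetical helper.

```python
from collections import defaultdict


def aggregate_target_stats(column, y):
    """Return ({category: (count, mean)}, global_mean) for one categorical column."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    # Single pass over the samples, accumulating per-category sums and counts.
    for category, target in zip(column, y):
        sums[category] += target
        counts[category] += 1
    stats = {c: (counts[c], sums[c] / counts[c]) for c in counts}
    global_mean = sum(y) / len(y)
    return stats, global_mean


# Made-up data: one string column and a numeric target.
stats, prior = aggregate_target_stats(
    ["a", "b", "a", "c", "a"],
    [1.0, 0.0, 0.0, 1.0, 1.0],
)
```

Sums and counts are kept separately during accumulation because they can be added across batches, which is one plausible way to support incremental training. Whether this implementation does so is not stated in this documentation.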

partial_fit(X, y=None, **fit_params)

Incremental fit to train the operator on a batch of samples.

Note: The partial_fit method is not available until this operator is trainable.

Once this method is available, it will have the following signature:

transform(X, y=None)

Transform the data.

Note: The transform method is not available until this operator is trained.

Once this method is available, it will have the following signature:

Parameters
  • X (array) –

    Features; the outer array is over samples.

    • items : array

      • items : union type

        • float

        • or string

Returns
  result

Return type
  array of items : array of items : float
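
To make the input and output types concrete, here is a rough sketch of transform as a row-wise map in plain Python. It assumes that unseen categories fall back to the global target mean, which is what handle_unknown='value' means in the scikit-learn contrib encoder, and that numeric columns pass through unchanged. The helper encode_rows and all numbers are hypothetical, and this is not lale's Map operator.

```python
def encode_rows(X, encodings, global_mean):
    """Replace categorical values by floats.

    X: list of rows (lists of float or str).
    encodings: {column index: {category: encoded float}} learned during fit.
    global_mean: fallback used for categories not seen during fit (assumption).
    """
    out = []
    for row in X:
        new_row = []
        for j, value in enumerate(row):
            if j in encodings:
                # Encoded column: look up the category, fall back to the prior.
                new_row.append(encodings[j].get(value, global_mean))
            else:
                # Other columns are assumed to be numeric and are kept as floats.
                new_row.append(float(value))
        out.append(new_row)
    return out


# Made-up encodings for column 0; "z" was not seen during fit.
encodings = {0: {"a": 0.65, "b": 0.3}}
encoded = encode_rows(
    [["a", 1.5], ["b", 2.0], ["z", 3.0]],
    encodings,
    global_mean=0.6,
)
```

Every entry of the result is a float, matching the return type listed above.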