lale.lib.sklearn.target_encoder module

class lale.lib.sklearn.target_encoder.TargetEncoder(*, categories='auto', target_type='auto', smooth='auto', cv=5, shuffle=True, random_state=None)

Bases: PlannedIndividualOp

Target encoder for regression and classification targets.

This documentation is auto-generated from JSON schemas.

Parameters
  • categories (union type, not for optimizer, default 'auto') –

    Categories (unique values) per feature.

    • 'auto'

      Determine categories automatically from training data.

    • or array

      The ith list element holds the categories expected in the ith column.

      • items : union type

        • array of items : string

        • or array of items : float

          Should be sorted.

  • target_type (union type, not for optimizer, default 'auto') –

    Type of target.

    • 'auto'

      Type of target is inferred with type_of_target.

    • or 'continuous'

      Continuous target.

    • or 'binary'

      Binary target.

    • or 'multiclass'

      Multiclass target.

  • smooth (union type, optional, not for optimizer, default 'auto') –

    The amount of mixing of the target mean conditioned on the value of the category with the global target mean.

    • 'auto'

      Set to an empirical Bayes estimate.

    • or float, >=0.0, <=1.0

      A larger smooth value will put more weight on the global target mean.

  • cv (integer, >=1, optional, not for optimizer, default 5) – Determines the number of folds in the cross fitting strategy used in fit_transform. For classification targets, StratifiedKFold is used and for continuous targets, KFold is used.

  • shuffle (boolean, optional, not for optimizer, default True) – Whether to shuffle the data in fit_transform before splitting into folds. Note that the samples within each split will not be shuffled.

  • random_state (union type, optional, not for optimizer, default None) –

    When shuffle is True, random_state affects the ordering of the indices, which controls the randomness of each fold. Otherwise, this parameter has no effect. Pass an int for reproducible output across multiple function calls.

    • None

    • or numpy.random.RandomState

      Use the provided random state, only affecting other users of that same random state instance.

    • or integer

      Explicit seed.
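
The mixing controlled by smooth can be sketched in plain Python. This is a minimal illustration, not the operator's implementation: it assumes a continuous target and a fixed numeric smooth, and it assumes the blending weight n / (n + smooth) described for scikit-learn's target encoder, where n is the number of rows in a category. The helper name smoothed_target_means is hypothetical, and the 'auto' empirical Bayes estimate is not modeled.

```python
from collections import defaultdict


def smoothed_target_means(categories, targets, smooth):
    """Blend each category's target mean with the global target mean.

    Hypothetical helper for illustration only. Assumes a fixed numeric
    `smooth` and the weight n / (n + smooth) for the category mean.
    """
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1

    encodings = {}
    for cat, n in counts.items():
        weight = n / (n + smooth)  # share given to the category's own mean
        category_mean = sums[cat] / n
        encodings[cat] = weight * category_mean + (1 - weight) * global_mean
    return encodings, global_mean


colors = ["red", "red", "red", "blue"]
prices = [10.0, 20.0, 30.0, 100.0]

# smooth=0.0 keeps the raw per-category means.
no_smoothing, overall = smoothed_target_means(colors, prices, smooth=0.0)
# smooth=1.0 pulls every category toward the global mean of 40.0,
# and pulls the single-row category "blue" hardest.
some_smoothing, _ = smoothed_target_means(colors, prices, smooth=1.0)
print(no_smoothing, some_smoothing, overall)
```

Under these assumptions, a category with few rows moves further toward the global mean than a category with many rows for the same smooth value.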

fit(X, y=None, **fit_params)

Train the operator.

Note: The fit method is not available until this operator is trainable.

Once this method is available, it will have the following signature:

Parameters
  • X (array) –

    Features; the outer array is over samples.

    • items : union type

      • array of items : float

      • or array of items : string

  • y (array, optional) – The target data used to encode the categories.
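
The cv and shuffle parameters apply to the cross fitting done in fit_transform, whereas fit learns encodings from all rows. The difference can be sketched in plain Python. This is a simplified illustration under stated assumptions: the helpers category_means and cross_fit_encode are hypothetical, folds are assigned round-robin rather than by KFold or StratifiedKFold, and no smoothing is applied.

```python
import random
from collections import defaultdict


def category_means(cats, ys):
    """Plain per-category target means (no smoothing)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for c, y in zip(cats, ys):
        sums[c] += y
        counts[c] += 1
    return {c: sums[c] / counts[c] for c in counts}


def cross_fit_encode(cats, ys, cv=2, shuffle=True, seed=0):
    """Encode each row with means learned on the other folds only.

    Hypothetical helper: a row's own target never contributes to its
    encoding, which is the point of cross fitting.
    """
    if cv < 2:
        raise ValueError("this sketch needs at least two folds")
    indices = list(range(len(cats)))
    if shuffle:
        random.Random(seed).shuffle(indices)  # seeded for reproducibility
    folds = [indices[k::cv] for k in range(cv)]  # round-robin assignment

    encoded = [None] * len(cats)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        means = category_means([cats[i] for i in train], [ys[i] for i in train])
        # Categories unseen in the training folds fall back to the training mean.
        fallback = sum(ys[i] for i in train) / len(train)
        for i in held_out:
            encoded[i] = means.get(cats[i], fallback)
    return encoded


cats = ["a", "a", "b", "b"]
ys = [1.0, 3.0, 10.0, 20.0]

full_fit = [category_means(cats, ys)[c] for c in cats]      # all rows used
cross_fit = cross_fit_encode(cats, ys, cv=2, shuffle=False)  # out-of-fold only
print(full_fit, cross_fit)
```

In this sketch the two encodings differ for every row, which illustrates why encoding the training data with cross fitting is not the same as fitting on all rows and then transforming those same rows.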

transform(X, y=None)

Transform the data.

Note: The transform method is not available until this operator is trained.

Once this method is available, it will have the following signature:

Parameters
  • X (array) –

    Features; the outer array is over samples.

    • items : union type

      • array of items : float

      • or array of items : string

Returns

result – Transformed input; the outer array is over samples.

  • items : union type

    • array of items : float

    • or array of items : string

Return type

array