lale.lib.sklearn.one_hot_encoder module

class lale.lib.sklearn.one_hot_encoder.OneHotEncoder(*, categories='auto', dtype='float64', handle_unknown='error', drop=None, sparse_output=True, feature_name_combiner='concat')

Bases: PlannedIndividualOp

One-hot encoder transformer from scikit-learn that encodes categorical features as a one-hot numeric array.

This documentation is auto-generated from JSON schemas.

Parameters
  • categories (union type, not for optimizer, default 'auto') –

    • ‘auto’ or None

      Determine categories automatically from training data.

    • or array

      The ith list element holds the categories expected in the ith column.

      • items : union type

        • array of items : string

        • or array of items : float

          Should be sorted.

  • dtype (Any, not for optimizer, default 'float64') – Desired dtype of the output; must be a numeric type. See https://docs.scipy.org/doc/numpy-1.14.0/reference/arrays.scalars.html#arrays-scalars-built-in

  • handle_unknown (union type, not for optimizer, default 'error') –

    Specifies the way unknown categories are handled during transform.

    • ‘error’

      Raise an error if an unknown category is present during transform.

    • or ‘ignore’

      When an unknown category is encountered during transform, the resulting one-hot encoded columns for this feature will be all zeros. In the inverse transform, an unknown category will be denoted as None.

    • or ‘infrequent_if_exist’

      When an unknown category is encountered during transform, the resulting one-hot encoded columns for this feature will map to the infrequent category if it exists. The infrequent category will be mapped to the last position in the encoding. During inverse transform, an unknown category will be mapped to the category denoted 'infrequent' if it exists. If the 'infrequent' category does not exist, then transform and inverse_transform will handle an unknown category as with handle_unknown='ignore'. Infrequent categories exist based on scikit-learn's min_frequency and max_categories parameters. Read more in the scikit-learn User Guide.

  • drop (union type, optional, not for optimizer, default None) –

    Specifies a methodology to use to drop one of the categories per feature.

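    • ‘first’ or ‘if_binary’

      ‘first’ drops the first category in each feature. ‘if_binary’ drops the first category only in features that have exactly two categories.

    • or array of items : float, not for optimizer

      drop[i] is the category in feature X[:, i] that should be dropped.

    • or None

      Retain all categories.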

  • sparse_output (boolean, optional, not for optimizer, default True) – Returns a sparse matrix if set to True; otherwise returns a dense array.

  • feature_name_combiner (union type, optional, not for optimizer, default 'concat') –

    Used to create feature names to be returned by get_feature_names_out.

    • ‘concat’

      Concatenates the encoded feature name and the category as feature + “_” + str(category). For example, feature X with values 1, 6, 7 creates feature names X_1, X_6, X_7.

    • or callable, not for optimizer

      Callable with signature def callable(input_feature, category) that returns a string.

fit(X, y=None, **fit_params)

Train the operator.

Note: The fit method is not available until this operator is trainable.

Once this method is available, it will have the following signature:

Parameters
  • X (array) –

    Features; the outer array is over samples.

    • items : array

      • items : union type

        • float

        • or string

  • y (any type, optional) – Target class labels; the array is over samples.

transform(X, y=None)

Transform the data.

Note: The transform method is not available until this operator is trained.

Once this method is available, it will have the following signature:

Parameters
  • X (array) –

    Features; the outer array is over samples.

    • items : array

      • items : union type

        • float

        • or string

Returns

result – One-hot codes.

Return type

array of items : array of items : float