lale.lib.sklearn.ordinal_encoder module

class lale.lib.sklearn.ordinal_encoder.OrdinalEncoder(*, categories='auto', dtype='float64', handle_unknown='error', encode_unknown_with='auto', unknown_value=None, encoded_missing_value=nan, max_categories=None, min_frequency=None)

Bases: PlannedIndividualOp

Ordinal encoder transformer from scikit-learn that encodes categorical features as numbers.

This documentation is auto-generated from JSON schemas.

Parameters
  • categories (union type, not for optimizer, default 'auto') –

    • ‘auto’ or None

      Determine categories automatically from training data.

    • or array

      The ith list element holds the categories expected in the ith column.

      • items : union type

        • array of items : string

        • or array of items : float

          Should be sorted.
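    For illustration, here is a minimal pure-Python sketch of how the two forms of categories determine the codes for a single column. It assumes, as in scikit-learn, that ‘auto’ takes the sorted unique training values and that an explicit list fixes the codes by position. The helpers auto_categories and encode_column are hypothetical and not part of lale.

```python
def auto_categories(column):
    """Sorted unique values of one training column (the 'auto' behaviour)."""
    return sorted(set(column))


def encode_column(column, categories):
    """Map each value to its position in `categories`, as a float code."""
    index = {cat: float(i) for i, cat in enumerate(categories)}
    return [index[value] for value in column]


train = ["red", "green", "blue", "green"]

# categories='auto': codes follow the sorted order of the observed values.
auto = auto_categories(train)
print(auto)                          # ['blue', 'green', 'red']
print(encode_column(train, auto))    # [2.0, 1.0, 0.0, 1.0]

# Explicit categories: the order of the list fixes the codes instead.
explicit = ["red", "green", "blue"]
print(encode_column(train, explicit))  # [0.0, 1.0, 2.0, 1.0]
```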

  • dtype (Any, not for optimizer, default 'float64') – Desired dtype of output; must be a numeric type. See https://docs.scipy.org/doc/numpy-1.14.0/reference/arrays.scalars.html#arrays-scalars-built-in

  • handle_unknown (‘error’, ‘ignore’, or ‘use_encoded_value’, optional, not for optimizer, default ‘error’) –

    When set to ‘error’ an error will be raised in case an unknown categorical feature is present during transform.

    When set to ‘use_encoded_value’, the encoded value of unknown categories will be set to the value given for the parameter unknown_value. In inverse_transform, an unknown category will be denoted as None. When this parameter is set to ‘ignore’ and an unknown category is encountered during transform, the resulting encoding will be set to the value indicated by encode_unknown_with (this functionality is added by lale).

    See also constraint-1, constraint-2.
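    A simplified per-value sketch of the three handle_unknown modes. The function encode_value and its ignore_code argument are hypothetical: ignore_code stands in for whatever code encode_unknown_with resolves to, and the error message is illustrative rather than the library's own.

```python
def encode_value(value, categories, handle_unknown="error",
                 unknown_value=None, ignore_code=None):
    """Encode one value under the three handle_unknown modes (sketch)."""
    if value in categories:
        return float(categories.index(value))
    if handle_unknown == "error":
        # Illustrative message; the real operator's wording may differ.
        raise ValueError(f"Found unknown category {value!r} during transform")
    if handle_unknown == "use_encoded_value":
        return float(unknown_value)
    if handle_unknown == "ignore":
        # lale-specific: the code comes from encode_unknown_with.
        return float(ignore_code)
    raise ValueError(f"Invalid handle_unknown: {handle_unknown!r}")


cats = ["blue", "green", "red"]
print(encode_value("green", cats))                                        # 1.0
print(encode_value("pink", cats, "use_encoded_value", unknown_value=-1))  # -1.0
print(encode_value("pink", cats, "ignore", ignore_code=3))                # 3.0
try:
    encode_value("pink", cats)
except ValueError as err:
    print(err)  # Found unknown category 'pink' during transform
```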

  • encode_unknown_with (union type, optional, not for optimizer, default 'auto') –

    When an unknown categorical feature value is found during transform, and ‘handle_unknown’ is set to ‘ignore’, that value is encoded with this value. Default of ‘auto’ sets it to an integer equal to n+1, where n is the maximum encoding value based on known categories.

    • integer

    • or ‘auto’
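    A sketch of how ‘auto’ might be resolved, assuming the rule is applied column by column. A column whose known categories are encoded 0 through n gets n + 1 for unknown values, which equals its number of known categories. The helper name is hypothetical.

```python
def resolve_encode_unknown_with(categories_per_column, encode_unknown_with="auto"):
    """Return the code used for unknown values in each column (sketch)."""
    codes = []
    for categories in categories_per_column:
        if encode_unknown_with == "auto":
            # Known codes run 0..n, so n + 1 == len(categories).
            codes.append(len(categories))
        else:
            codes.append(int(encode_unknown_with))
    return codes


cats = [["blue", "green", "red"], ["S", "M"]]
print(resolve_encode_unknown_with(cats))      # [3, 2]
print(resolve_encode_unknown_with(cats, -1))  # [-1, -1]
```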

  • unknown_value (union type, optional, not for optimizer, default None) –

    When the parameter handle_unknown is set to ‘use_encoded_value’, this parameter is required and will set the encoded value of unknown categories.

    It has to be distinct from the values used to encode any of the categories in fit.

    • integer

    • or nan

    • or None

    See also constraint-1, constraint-2.
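    Since a column with k fitted categories uses the codes 0 through k - 1, "distinct" reduces to a range check. The sketch below (hypothetical helper) rejects an integer unknown_value that falls inside any column's range; scikit-learn performs a comparable check when fitting with handle_unknown='use_encoded_value'.

```python
import math


def check_unknown_value(unknown_value, categories_per_column):
    """Reject an integer unknown_value that collides with a fitted code (sketch)."""
    if unknown_value is None or (isinstance(unknown_value, float)
                                 and math.isnan(unknown_value)):
        return  # None and NaN can never collide with codes 0..k-1
    for column, categories in enumerate(categories_per_column):
        if 0 <= unknown_value < len(categories):
            raise ValueError(
                f"unknown_value={unknown_value} is already used to encode "
                f"a category in column {column}")


cats = [["blue", "green", "red"], ["S", "M"]]
check_unknown_value(-1, cats)            # OK: fitted codes are never negative
check_unknown_value(float("nan"), cats)  # OK
try:
    check_unknown_value(2, cats)         # 2 is the code of 'red' in column 0
except ValueError as err:
    print(err)
```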

  • encoded_missing_value (union type, optional, not for optimizer, default nan) –

    Encoded value of missing categories. If set to np.nan, then the dtype parameter must be a float dtype.

    • integer

    • or nan

    • or None
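    A sketch of the missing-value mapping, assuming None and float NaN are what count as missing in a column (hypothetical helpers; only the integer and NaN settings are covered). The last lines show why NaN requires a float output dtype: NaN has no integer representation.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def encode_with_missing(column, categories, encoded_missing_value=math.nan):
    """Encode one column, mapping missing entries to encoded_missing_value."""
    codes = {cat: float(i) for i, cat in enumerate(categories)}
    fill = float(encoded_missing_value)  # integer or NaN; output is float
    return [fill if is_missing(v) else codes[v] for v in column]


col = ["b", None, "a"]
print(encode_with_missing(col, ["a", "b"]))      # [1.0, nan, 0.0]
print(encode_with_missing(col, ["a", "b"], -1))  # [1.0, -1.0, 0.0]

# NaN cannot be stored in an integer type, hence the float-dtype requirement.
try:
    int(math.nan)
except ValueError as err:
    print(err)
```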

  • max_categories (union type, optional, not for optimizer, default None) –

    Specifies an upper limit to the number of output categories for each input feature when considering infrequent categories. If there are infrequent categories, max_categories includes the category representing the infrequent categories along with the frequent categories. If None, there is no limit to the number of output features.

    max_categories does not take missing or unknown categories into account. Setting unknown_value or encoded_missing_value to an integer adds one unique integer code each, so the output can contain up to max_categories + 2 integer codes.

    • integer, >1

    • or None
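    A sketch of the max_categories rule, modeled on scikit-learn's approach as we understand it: one code is reserved for the infrequent group, so the max_categories - 1 most frequent categories are kept and the others are grouped. Breaking count ties by sorted category order is an assumption of this sketch, and the helper name is hypothetical.

```python
from collections import Counter


def infrequent_by_max_categories(column, max_categories):
    """Categories grouped as infrequent under max_categories (sketch)."""
    counts = Counter(column)
    categories = sorted(counts)
    keep = max_categories - 1  # one code is reserved for the infrequent group
    if len(categories) <= keep:
        return []  # every category fits; nothing is grouped
    # Stable ascending sort by count; ties keep sorted category order.
    by_count = sorted(categories, key=lambda c: counts[c])
    return sorted(by_count[:len(categories) - keep])


col = ["a"] * 5 + ["b"] * 20 + ["c"] * 10 + ["d"] * 3
infrequent = infrequent_by_max_categories(col, 3)
print(infrequent)  # ['a', 'd']

# Output codes: the kept categories plus one shared code for the group.
n_codes = len(set(col)) - len(infrequent) + 1
print(n_codes)  # 3
```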

  • min_frequency (union type, optional, not for optimizer, default None) –

    Specifies the minimum frequency below which a category will be considered infrequent.

    • integer, >=1

      Categories that occur fewer than min_frequency times will be considered infrequent.

    • or float, >=0.0, <=1.0

      Categories that occur fewer than min_frequency * n_samples times will be considered infrequent.

    • or None
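    A sketch of the min_frequency threshold and of the resulting codes, with hypothetical helper names. It follows the convention of scikit-learn's OrdinalEncoder as we understand it: frequent categories keep their sorted order, and the infrequent group is encoded with the next code after them.

```python
from collections import Counter


def infrequent_by_min_frequency(column, min_frequency):
    """Categories whose count falls below the min_frequency threshold (sketch)."""
    counts = Counter(column)
    if isinstance(min_frequency, int):
        threshold = min_frequency                # absolute count
    else:
        threshold = min_frequency * len(column)  # fraction of n_samples
    return sorted(cat for cat, n in counts.items() if n < threshold)


def encode_with_infrequent(train_column, infrequent, values):
    """Frequent categories get 0..k-1 in sorted order; the infrequent group gets k.

    Assumes every entry of `values` was seen in `train_column`.
    """
    frequent = sorted(set(train_column) - set(infrequent))
    codes = {cat: float(i) for i, cat in enumerate(frequent)}
    group_code = float(len(frequent))
    return [codes[v] if v in codes else group_code for v in values]


X = ["dog"] * 5 + ["cat"] * 20 + ["rabbit"] * 10 + ["snake"] * 3
infrequent = infrequent_by_min_frequency(X, 6)
print(infrequent)  # ['dog', 'snake']
print(encode_with_infrequent(X, infrequent, ["dog", "cat", "rabbit", "snake"]))
# [2.0, 0.0, 1.0, 2.0]
print(infrequent_by_min_frequency(X, 0.1))  # ['snake'], since 3 < 0.1 * 38
```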

Notes

constraint-1 : union type

unknown_value should be an integer or np.nan when handle_unknown is ‘use_encoded_value’.

  • handle_unknown : negated type of ‘use_encoded_value’

  • or unknown_value : nan

  • or unknown_value : integer

constraint-2 : union type

unknown_value should only be set when handle_unknown is ‘use_encoded_value’.

  • handle_unknown : ‘use_encoded_value’

  • or unknown_value : None
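A pure-Python sketch of the two constraints as boolean checks. Each constraint is a union (logical or) of the listed branches, and a configuration satisfies it when at least one branch holds. The helper check_constraints is hypothetical; lale itself enforces these constraints through its JSON schemas.

```python
import math


def check_constraints(handle_unknown="error", unknown_value=None):
    """Return the names of violated constraints (empty list if valid)."""
    is_nan = isinstance(unknown_value, float) and math.isnan(unknown_value)
    is_int = isinstance(unknown_value, int) and not isinstance(unknown_value, bool)
    violated = []
    # constraint-1: use_encoded_value needs an integer or NaN unknown_value.
    if not (handle_unknown != "use_encoded_value" or is_nan or is_int):
        violated.append("constraint-1")
    # constraint-2: unknown_value may only be set with use_encoded_value.
    if not (handle_unknown == "use_encoded_value" or unknown_value is None):
        violated.append("constraint-2")
    return violated


print(check_constraints())                         # []
print(check_constraints("use_encoded_value", -1))  # []
print(check_constraints("use_encoded_value"))      # ['constraint-1']
print(check_constraints("ignore", -1))             # ['constraint-2']
```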

fit(X, y=None, **fit_params)

Train the operator.

Note: The fit method is not available until this operator is trainable.

Once this method is available, it will have the following signature:

Parameters
  • X (array) –

    Features; the outer array is over samples.

    • items : union type

      • array of items : float

      • or array of items : string

  • y (any type, optional) – Target class labels; the array is over samples.
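X is row-major: the outer list holds samples, and each inner list holds one sample's feature values. A minimal sketch of what fitting learns, assuming categories='auto' (sorted unique values per column). y plays no role here, just as scikit-learn's OrdinalEncoder.fit ignores it. The helper fit_categories is hypothetical.

```python
def fit_categories(X):
    """Learn sorted categories per column from row-major X (sketch)."""
    n_columns = len(X[0])
    return [sorted({row[j] for row in X}) for j in range(n_columns)]


X = [["red", "S"],
     ["blue", "M"],
     ["red", "L"]]  # outer list = samples, inner list = features
print(fit_categories(X))  # [['blue', 'red'], ['L', 'M', 'S']]
```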

transform(X, y=None)

Transform the data.

Note: The transform method is not available until this operator is trained.

Once this method is available, it will have the following signature:

Parameters
  • X (array) –

    Features; the outer array is over samples.

    • items : union type

      • array of items : float

      • or array of items : string

Returns

result – Ordinal codes.

Return type

array of items : array of items : float
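A sketch of the transform step, under the assumption of categories learned as sorted unique values per column and handle_unknown='error' semantics (an unknown value fails, here with a KeyError). The helper encode_rows is hypothetical. The result is a list of rows of float codes, matching the documented return type.

```python
def encode_rows(X, categories_per_column):
    """Encode row-major X with previously learned categories (sketch).

    An unknown value raises KeyError in this sketch.
    """
    lookups = [{cat: float(i) for i, cat in enumerate(cats)}
               for cats in categories_per_column]
    return [[lookups[j][value] for j, value in enumerate(row)] for row in X]


categories = [["blue", "red"], ["L", "M", "S"]]
print(encode_rows([["red", "S"], ["blue", "L"]], categories))
# [[1.0, 2.0], [0.0, 0.0]]
```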