lale.lib.sklearn.ordinal_encoder module
- class lale.lib.sklearn.ordinal_encoder.OrdinalEncoder(*, categories='auto', dtype='float64', handle_unknown='error', encode_unknown_with='auto', unknown_value=None, encoded_missing_value=nan, max_categories=None, min_frequency=None)
Bases: PlannedIndividualOp
Ordinal encoder transformer from scikit-learn that encodes categorical features as numbers.
This documentation is auto-generated from JSON schemas.
- Parameters
categories (union type, not for optimizer, default 'auto') –
‘auto’ or None
Determine categories automatically from training data.
or array
The ith list element holds the categories expected in the ith column.
items : union type
array of items : string
or array of items : float
Should be sorted.
dtype (Any, not for optimizer, default 'float64') – Desired dtype of the output; must be a numeric type. See https://docs.scipy.org/doc/numpy-1.14.0/reference/arrays.scalars.html#arrays-scalars-built-in
handle_unknown (‘error’, ‘ignore’, or ‘use_encoded_value’, optional, not for optimizer, default ‘error’) –
When set to ‘error’, an error is raised if an unknown categorical feature is present during transform.
When set to ‘use_encoded_value’, the encoded value of unknown categories is set to the value given by the parameter unknown_value. In inverse_transform, an unknown category is denoted as None.
When set to ‘ignore’ and an unknown category is encountered during transform, the resulting encoding is set to the value indicated by encode_unknown_with (this functionality is added by lale).
See also constraint-1, constraint-2.
encode_unknown_with (union type, optional, not for optimizer, default 'auto') –
When transform encounters an unknown categorical feature value and handle_unknown is set to ‘ignore’, the unknown value is encoded with this value. The default, ‘auto’, uses the integer n+1, where n is the maximum encoding value among the known categories.
integer
or ‘auto’
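The ‘auto’ rule described above can be illustrated with a small pure-Python sketch. It does not call lale or scikit-learn; the helper names fit_categories and encode_ignoring_unknown are hypothetical, and the sketch assumes that codes are assigned per column in sorted order and that n is taken per column.

```python
from typing import Dict, List, Union


def fit_categories(column: List[str]) -> Dict[str, int]:
    """Assign integer codes 0..k-1 to the known categories (assumed: sorted order)."""
    return {cat: code for code, cat in enumerate(sorted(set(column)))}


def encode_ignoring_unknown(
    column: List[str],
    mapping: Dict[str, int],
    encode_unknown_with: Union[int, str] = "auto",
) -> List[int]:
    """Mimic handle_unknown='ignore': unknown values get a single fallback code."""
    if encode_unknown_with == "auto":
        # n is the maximum code among known categories; unknowns get n + 1.
        unknown_code = max(mapping.values()) + 1
    else:
        unknown_code = int(encode_unknown_with)
    return [mapping.get(value, unknown_code) for value in column]


# Known categories: blue -> 0, green -> 1, red -> 2
mapping = fit_categories(["red", "green", "blue", "green"])
codes_auto = encode_ignoring_unknown(["green", "purple"], mapping)
codes_fixed = encode_ignoring_unknown(["green", "purple"], mapping, encode_unknown_with=99)
print(codes_auto)   # [1, 3]
print(codes_fixed)  # [1, 99]
```

With ‘auto’, the unknown code sits directly above the known codes; an explicit integer is useful when downstream code expects a fixed sentinel.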
unknown_value (union type, optional, not for optimizer, default None) –
When the parameter handle_unknown is set to ‘use_encoded_value’, this parameter is required and sets the encoded value of unknown categories.
It must be distinct from the values used to encode any of the categories in fit.
integer
or nan
or None
See also constraint-1, constraint-2.
encoded_missing_value (union type, optional, not for optimizer, default nan) –
Encoded value of missing categories. If set to np.nan, then the dtype parameter must be a float dtype.
integer
or nan
or None
max_categories (union type, optional, not for optimizer, default None) –
Specifies an upper limit to the number of output categories for each input feature when considering infrequent categories. If there are infrequent categories, max_categories includes the category representing the infrequent categories along with the frequent categories. If None, there is no limit to the number of output features.
max_categories does not take into account missing or unknown categories. Setting unknown_value or encoded_missing_value to an integer will increase the number of unique integer codes by one each. This can result in up to max_categories + 2 integer codes.
integer, >1
or None
min_frequency (union type, optional, not for optimizer, default None) –
Specifies the minimum frequency below which a category will be considered infrequent.
integer, >=1
Categories with a smaller cardinality will be considered infrequent.
or float, >=0.0, <=1.0
Categories with a smaller cardinality than min_frequency * n_samples will be considered infrequent.
or None
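The two min_frequency thresholds above can be sketched in plain Python. This is an illustration of the stated rule only, not the library's implementation; the function name infrequent_categories is hypothetical.

```python
from collections import Counter
from typing import List, Optional, Set, Union


def infrequent_categories(
    column: List[str], min_frequency: Optional[Union[int, float]]
) -> Set[str]:
    """Return the categories that fall below the min_frequency threshold."""
    if min_frequency is None:
        return set()  # no category is treated as infrequent
    counts = Counter(column)
    if isinstance(min_frequency, int):
        threshold = min_frequency  # absolute count
    else:
        threshold = min_frequency * len(column)  # fraction of n_samples
    return {cat for cat, count in counts.items() if count < threshold}


# 10 samples: a x5, b x3, c x1, d x1
column = ["a"] * 5 + ["b"] * 3 + ["c"] + ["d"]
by_count = infrequent_categories(column, 2)       # threshold: 2 samples
by_fraction = infrequent_categories(column, 0.4)  # threshold: 0.4 * 10 = 4 samples
print(sorted(by_count))     # ['c', 'd']
print(sorted(by_fraction))  # ['b', 'c', 'd']
```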
Notes
constraint-1 : union type
unknown_value should be an integer or np.nan when handle_unknown is ‘use_encoded_value’.
handle_unknown : negated type of ‘use_encoded_value’
or unknown_value : nan
or unknown_value : integer
constraint-2 : union type
unknown_value should only be set when handle_unknown is ‘use_encoded_value’.
handle_unknown : ‘use_encoded_value’
or unknown_value : None
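As a reading aid, the two constraints can be written as a single predicate over handle_unknown and unknown_value. This is a minimal sketch of the logic stated in the notes; the function name satisfies_constraints is hypothetical, and lale's actual validation is driven by the JSON schemas rather than by code like this.

```python
from math import isnan, nan
from typing import Union


def satisfies_constraints(
    handle_unknown: str, unknown_value: Union[int, float, None]
) -> bool:
    """Check constraint-1 and constraint-2 for one hyperparameter combination."""
    is_int = isinstance(unknown_value, int) and not isinstance(unknown_value, bool)
    is_nan = isinstance(unknown_value, float) and isnan(unknown_value)
    # constraint-1: with 'use_encoded_value', unknown_value must be an integer or nan.
    constraint_1 = handle_unknown != "use_encoded_value" or is_int or is_nan
    # constraint-2: unknown_value may only be set with 'use_encoded_value'.
    constraint_2 = handle_unknown == "use_encoded_value" or unknown_value is None
    return constraint_1 and constraint_2


print(satisfies_constraints("use_encoded_value", -1))    # True
print(satisfies_constraints("use_encoded_value", None))  # False (violates constraint-1)
print(satisfies_constraints("error", -1))                # False (violates constraint-2)
print(satisfies_constraints("ignore", None))             # True
```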
- fit(X, y=None, **fit_params)
Train the operator.
Note: The fit method is not available until this operator is trainable.
Once this method is available, it will have the following signature:
- Parameters
X (array) –
Features; the outer array is over samples.
items : union type
array of items : float
or array of items : string
y (any type, optional) – Target class labels; the array is over samples.
- transform(X, y=None)
Transform the data.
Note: The transform method is not available until this operator is trained.
Once this method is available, it will have the following signature:
- Parameters
X (array) –
Features; the outer array is over samples.
items : union type
array of items : float
or array of items : string
- Returns
result – Ordinal codes.
- Return type
array of items : array of items : float
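For orientation, the fit/transform flow can be sketched with scikit-learn's own OrdinalEncoder, which this operator wraps. The sketch assumes scikit-learn 0.24 or later (for ‘use_encoded_value’) and uses made-up data; the lale operator is expected to accept the same hyperparameters, but the lale-specific ‘ignore’ mode is not shown here.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Illustrative data: one categorical column.
X_train = np.array([["low"], ["medium"], ["high"]], dtype=object)
X_new = np.array([["low"], ["unseen"]], dtype=object)

# Unknown categories at transform time are encoded as -1.
enc = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
enc.fit(X_train)

# With categories='auto', categories are sorted: high -> 0, low -> 1, medium -> 2.
codes = enc.transform(X_new)
print(codes.tolist())  # [[1.0], [-1.0]]
```

The result is a two-dimensional float array, matching the return type documented above.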