lale.lib.sklearn.one_hot_encoder module

class lale.lib.sklearn.one_hot_encoder.OneHotEncoder(*, categories='auto', dtype='float64', handle_unknown='error', drop=None, sparse_output=True, feature_name_combiner='concat')

Bases: PlannedIndividualOp

One-hot encoder transformer from scikit-learn that encodes categorical features as numbers.

This documentation is auto-generated from JSON schemas.

Parameters

categories (union type, not for optimizer, default 'auto') –
- ‘auto’ or None
  
  Determine categories automatically from training data.
- or array
  The ith list element holds the categories expected in the ith column.
  - items : union type
    
    array of items : string
    
    or array of items : float
    
    Should be sorted.
dtype (Any, not for optimizer, default 'float64') – Desired dtype of the output; must be a numeric type. See https://docs.scipy.org/doc/numpy-1.14.0/reference/arrays.scalars.html#arrays-scalars-built-in
handle_unknown (union type, not for optimizer, default 'error') –
Specifies the way unknown categories are handled during transform.
- ‘error’
  
  Raise an error if an unknown category is present during transform.
- or ‘ignore’
  
  When an unknown category is encountered during transform, the resulting one-hot encoded columns for this feature will be all zeros. In the inverse transform, an unknown category will be denoted as None.
- or ‘infrequent_if_exist’
  
  When an unknown category is encountered during transform, the resulting one-hot encoded columns for this feature will map to the infrequent category if it exists. The infrequent category will be mapped to the last position in the encoding. During inverse transform, an unknown category will be mapped to the category denoted 'infrequent' if it exists. If the 'infrequent' category does not exist, then transform and inverse_transform will handle an unknown category as with handle_unknown='ignore'. Infrequent categories exist based on min_frequency and max_categories. Read more in the User Guide.
drop (union type, optional, not for optimizer, default None) –
Specifies a methodology to use to drop one of the categories per feature.
- ‘first’ or ‘if_binary’
- or array of items : float, not for optimizer
- or None
sparse_output (boolean, optional, not for optimizer, default True) – If True, transform returns a sparse matrix; otherwise it returns a dense array.
feature_name_combiner (union type, optional, not for optimizer, default 'concat') –
Used to create feature names to be returned by get_feature_names_out.
- ‘concat’
  
  Concatenates the encoded feature name and the category as feature + “_” + str(category). For example, feature X with values 1, 6, 7 yields feature names X_1, X_6, X_7.
- or callable, not for optimizer
  
  Callable with signature def callable(input_feature, category) that returns a string.
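
The naming rule for feature_name_combiner can be illustrated with a small, library-free sketch. It is not the scikit-learn or Lale implementation: concat_combiner only mirrors the documented feature + "_" + str(category) rule, and custom_combiner and feature_names are hypothetical helpers introduced for this example.

```python
def concat_combiner(input_feature, category):
    """Mirror the documented 'concat' rule: feature + "_" + str(category)."""
    return input_feature + "_" + str(category)


def custom_combiner(input_feature, category):
    """Hypothetical callable with the (input_feature, category) signature."""
    return f"{input_feature}[{category}]"


def feature_names(input_feature, categories, combiner=concat_combiner):
    """Build one output feature name per category of a single input feature."""
    return [combiner(input_feature, category) for category in categories]


# Feature X with values 1, 6, 7, as in the parameter description above.
concat_names = feature_names("X", [1, 6, 7])
custom_names = feature_names("X", [1, 6, 7], custom_combiner)
print(concat_names)  # ['X_1', 'X_6', 'X_7']
print(custom_names)  # ['X[1]', 'X[6]', 'X[7]']
```

A callable such as custom_combiner is the kind of object that could be passed as feature_name_combiner, since it takes the input feature name and a category and returns a string.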

fit(X, y=None, **fit_params)

Train the operator.

Note: The fit method is not available until this operator is trainable.

Once this method is available, it will have the following signature:

Parameters

X (array) –
Features; the outer array is over samples.
- items : array
  - items : union type
    
    float
    
    or string
y (any type, optional) – Target class labels; the array is over samples.
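
As a rough illustration of what training with categories='auto' has to determine, the sketch below collects the sorted distinct values of each column from an array-of-arrays input. It is a simplified stand-in written in plain Python, not the operator's actual code; learn_categories is a hypothetical helper, and each column is assumed to hold values of a single type (all strings or all floats) so that they can be sorted.

```python
def learn_categories(X):
    """Return, for each column, the sorted distinct values seen in X.

    X is a list of rows (the outer list is over samples), and every
    row is assumed to have the same number of columns.
    """
    n_columns = len(X[0])
    return [sorted({row[j] for row in X}) for j in range(n_columns)]


# Hypothetical training data: one string column and one float column.
X_train = [
    ["red", 1.0],
    ["green", 3.0],
    ["red", 2.0],
]
learned = learn_categories(X_train)
print(learned)  # [['green', 'red'], [1.0, 2.0, 3.0]]
```

The ith entry of the result plays the role of the categories expected in the ith column, which is also the shape the categories parameter takes when it is given explicitly.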

transform(X, y=None)

Transform the data.

Note: The transform method is not available until this operator is trained.

Once this method is available, it will have the following signature:

Parameters

X (array) –
Features; the outer array is over samples.
- items : array
  - items : union type
    
    float
    
    or string

Returns

result – One-hot codes.

Return type

array of items : array of items : float
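
To make the shape of the result concrete, here is a minimal plain-Python sketch of one-hot coding with the 'error' and 'ignore' settings of handle_unknown. It is an illustration under simplifying assumptions (dense list output, no drop, no infrequent categories), not the library's implementation; one_hot_transform and the example categories are hypothetical.

```python
def one_hot_transform(X, categories, handle_unknown="error"):
    """Encode each row of X as a flat list of 0.0/1.0 codes.

    categories[j] lists the known categories of column j. An unknown value
    raises ValueError under 'error' and yields all zeros for that column
    under 'ignore'.
    """
    result = []
    for row in X:
        encoded = []
        for value, known in zip(row, categories):
            if value not in known and handle_unknown == "error":
                raise ValueError(f"unknown category {value!r}")
            encoded.extend(1.0 if value == c else 0.0 for c in known)
        result.append(encoded)
    return result


# Hypothetical categories for a string column and a float column.
categories = [["green", "red"], [1.0, 2.0, 3.0]]

# "blue" was never seen, so with 'ignore' its two columns are all zeros.
codes = one_hot_transform(
    [["red", 2.0], ["blue", 1.0]], categories, handle_unknown="ignore"
)
print(codes)  # [[0.0, 1.0, 0.0, 1.0, 0.0], [0.0, 0.0, 1.0, 0.0, 0.0]]
```

Each output row has one position per known category across all columns, matching the documented return type of an array of arrays of floats. With the default handle_unknown="error", the same call would raise ValueError on the "blue" row instead.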