lale.lib.rasl.hashing_encoder module

class lale.lib.rasl.hashing_encoder.HashingEncoder(*, n_components=8, cols=None, hash_method='md5')

Bases: PlannedIndividualOp

Relational algebra reimplementation of scikit-learn contrib’s HashingEncoder transformer.

This documentation is auto-generated from JSON schemas.

Works on both pandas and Spark dataframes by using Map for transform, which in turn uses the appropriate backend.

Parameters
  • n_components (integer, not for optimizer, default 8) – how many bits to use to represent the feature.

  • cols (union type, not for optimizer, default None) –

    a list of columns to encode; if None, all string columns will be encoded.

    • None

    • or array of items : string

  • hash_method (‘sha512_224’, ‘blake2s’, ‘blake2b’, ‘sha1’, ‘sm3’, ‘shake_128’, ‘sha256’, ‘md5-sha1’, ‘shake_256’, ‘md5’, ‘sha3_224’, ‘sha3_512’, ‘sha512_256’, ‘sha3_256’, ‘sha512’, ‘sha3_384’, ‘sha384’, or ‘sha224’, not for optimizer, default ‘md5’) – which hashing method to use.
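To illustrate how n_components and hash_method interact, here is a minimal standard-library sketch of the hashing trick. It assumes that a value is converted to a string, hashed with the chosen hashlib algorithm, and reduced modulo n_components, in the style of the scikit-learn contrib encoder. The helper name bucket_index is hypothetical and is not part of lale. The sketch only covers fixed-length digests such as 'md5' or 'sha256'; the 'shake_*' methods need an explicit digest length and are not handled here.

```python
import hashlib


def bucket_index(value, n_components=8, hash_method="md5"):
    """Hypothetical helper: map one value to a column index in [0, n_components)."""
    hasher = hashlib.new(hash_method)
    # Hash the UTF-8 bytes of the value's string form.
    hasher.update(str(value).encode("utf-8"))
    # Interpret the hex digest as an integer and fold it into n_components buckets.
    return int(hasher.hexdigest(), 16) % n_components


# The same input always lands in the same bucket for a given configuration.
idx_md5 = bucket_index("red", n_components=8, hash_method="md5")
idx_sha = bucket_index("red", n_components=8, hash_method="sha256")
```

A smaller n_components gives a more compact output but makes collisions between distinct values more likely.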

fit(X, y=None, **fit_params)

Train the operator.

Note: The fit method is not available until this operator is trainable.

Once this method is available, it will have the following signature:

Parameters
  • X (array) –

    Features; the outer array is over samples.

    • items : array

      • items : union type

        • float

        • or string

  • y (any type, optional) – Target class labels; the array is over samples.

partial_fit(X, y=None, **fit_params)

Incremental fit to train the operator on a batch of samples.

Note: The partial_fit method is not available until this operator is trainable.

Once this method is available, it will have the following signature:

transform(X, y=None)

Transform the data.

Note: The transform method is not available until this operator is trained.

Once this method is available, it will have the following signature:

Parameters
  • X (array) –

    Features; the outer array is over samples.

    • items : array

      • items : union type

        • float

        • or string

Returns

result – Hash codes.

Return type

array of items : array of items : float
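As a rough picture of the input and output shapes described above, the following sketch encodes rows of mixed float and string values into rows of floats. It is an assumption-laden illustration, not lale's implementation: the function name hash_encode_rows is hypothetical, columns are selected by position rather than by name, and placing the hashed columns before the untouched ones is a choice made for this example only.

```python
import hashlib


def hash_encode_rows(rows, cols, n_components=8, hash_method="md5"):
    """Hypothetical sketch: replace the columns at positions `cols` with hash counts."""
    encoded = []
    for row in rows:
        counts = [0.0] * n_components
        for i in cols:
            # Hash the string form of the value and count it in one bucket.
            digest = hashlib.new(hash_method, str(row[i]).encode("utf-8")).hexdigest()
            counts[int(digest, 16) % n_components] += 1.0
        # Columns that are not encoded are passed through as floats.
        passthrough = [float(v) for i, v in enumerate(row) if i not in cols]
        encoded.append(counts + passthrough)
    return encoded


rows = [["red", 1.5], ["blue", 2.0]]
result = hash_encode_rows(rows, cols=[0], n_components=4)
```

Each output row here has n_components hash-count entries followed by the remaining numeric values, so the result is an array of arrays of floats.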