lale.lib.rasl.batching module

class lale.lib.rasl.batching.Batching(*, operator, batch_size=64, shuffle=False, num_workers=0, inmemory=False, num_epochs=None, max_resident=None, scoring=None, progress_callback=None, partial_transform=False, priority='resource_aware', verbose=0)

Bases: PlannedIndividualOp

Batching trains the given pipeline using batches.

This documentation is auto-generated from JSON schemas.

The same batch_size is used for every step of the pipeline. Intermediate outputs are serialized if so specified.
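As a rough illustration of what one batch_size means for a dataset, the sketch below splits a sequence of rows into consecutive batches, optionally shuffling first. It is not the library's implementation: the helper name iter_batches and its seed argument are hypothetical, and only batch_size and shuffle mirror the parameters documented here.

```python
import random
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(
    rows: Sequence[T],
    batch_size: int = 64,
    shuffle: bool = False,
    seed: int = 0,  # hypothetical: fixes the shuffle order for reproducibility
) -> Iterator[List[T]]:
    """Yield consecutive batches of at most batch_size rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    indices = list(range(len(rows)))
    if shuffle:
        # Shuffle the row order once, before any batch is cut.
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [rows[i] for i in indices[start:start + batch_size]]


batches = list(iter_batches(list(range(150)), batch_size=64))
print([len(b) for b in batches])  # [64, 64, 22]
```

The last batch is smaller whenever the number of rows is not a multiple of batch_size, so downstream steps should not assume a fixed batch length.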

Parameters
  • operator (operator, optional, not for optimizer) – A lale pipeline object to be trained and applied batch by batch.

  • batch_size (integer, >=1, >=32 for optimizer, <=128 for optimizer, uniform distribution, optional, default 64) – Batch size used for transform.

  • shuffle (boolean, optional, not for optimizer, default False) – Shuffle dataset before batching or not.

  • num_workers (integer, optional, not for optimizer, default 0) – Number of workers for pytorch dataloader.

  • inmemory (boolean, optional, not for optimizer, default False) –

    Whether all computations are done in memory or intermediate outputs are serialized. Only applies to transform/predict. For fit, use the max_resident argument.

  • num_epochs (union type, optional, not for optimizer, default None) –

    Number of epochs. If the operator has num_epochs as a parameter, that takes precedence.

    • integer

    • or None

  • max_resident (union type, optional, not for optimizer, default None) –

    Amount of memory to be used, in bytes.

    • integer

    • or None

  • scoring (union type, optional, not for optimizer, default None) –

    Batch-wise scoring metrics from lale.lib.rasl.

    • callable

    • or None

  • progress_callback (union type, optional, not for optimizer, default None) –

    Callback function to get performance metrics per batch.

    • callable

    • or None

  • partial_transform (boolean, optional, not for optimizer, default False) – Whether to allow partially-trained upstream operators to transform data for training downstream operators even before the upstream operator has been fully trained.

  • priority (‘batch’, ‘step’, or ‘resource_aware’, optional, not for optimizer, default ‘resource_aware’) – Scheduling priority in task graphs. “batch” executes tasks from earlier batches first. “step” executes tasks from earlier steps first, like a nested-loop algorithm. “resource_aware” executes tasks with less non-resident data first.

  • verbose (integer, optional, not for optimizer, default 0) – Verbosity level, higher values mean more information.
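The three priority values can be pictured as three sort keys over candidate tasks. The sketch below is only a mental model, not the scheduler itself: the Task record, its non_resident_bytes field, the order_tasks helper, and the tie-breaking rules are all assumptions made for illustration, and dependencies between tasks are ignored (the tasks are treated as if all were ready at once).

```python
from typing import List, NamedTuple


class Task(NamedTuple):
    batch: int               # index of the batch the task works on
    step: int                # index of the pipeline step the task belongs to
    non_resident_bytes: int  # hypothetical: data that would have to be loaded


def order_tasks(tasks: List[Task], priority: str = "resource_aware") -> List[Task]:
    """Rank ready tasks according to the chosen priority."""
    if priority == "batch":
        # Earlier batches first; within a batch, earlier steps first.
        return sorted(tasks, key=lambda t: (t.batch, t.step))
    if priority == "step":
        # Earlier steps first, like the outer loop of a nested loop.
        return sorted(tasks, key=lambda t: (t.step, t.batch))
    if priority == "resource_aware":
        # Least non-resident data first; the tie-break is an assumption.
        return sorted(tasks, key=lambda t: (t.non_resident_bytes, t.batch, t.step))
    raise ValueError(f"unknown priority: {priority!r}")


tasks = [Task(0, 0, 0), Task(0, 1, 300), Task(1, 0, 200), Task(1, 1, 100)]
for priority in ("batch", "step", "resource_aware"):
    print(priority, [(t.batch, t.step) for t in order_tasks(tasks, priority)])
```

With these made-up costs, "batch" finishes batch 0 before touching batch 1, "step" runs step 0 on every batch before step 1, and "resource_aware" picks whichever task needs the least data loaded.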

fit(X, y=None, **fit_params)

Train the operator.

Note: The fit method is not available until this operator is trainable.

Once this method is available, it will have the following signature:

Parameters
  • X (union type) –

    Features; the outer array is over samples.

    • array

      • items : union type

        • float

        • or string

        • or boolean

    • or array

      • items : array

        • items : union type

          • float

          • or string

          • or boolean

    • or dict

  • y (union type) –

    • array

      • items : union type

        • integer

        • or float

        • or string

    • or None

  • classes (union type, optional) –

    The classes (distinct label values) occurring in the entire training dataset.

    • array

      • items : union type

        • float

        • or string

        • or boolean

    • or None
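One plausible reason for the classes argument is that a single batch may not contain every label, so an incrementally trained classifier cannot infer the full label set from the first batch it sees. The sketch below illustrates that situation with plain Python. The helper collect_classes and the toy label batches are hypothetical and are not part of the documented API.

```python
from typing import Hashable, Iterable, List


def collect_classes(label_batches: Iterable[Iterable[Hashable]]) -> List[Hashable]:
    """Return the sorted set of labels seen across all batches."""
    seen = set()
    for batch in label_batches:
        seen.update(batch)
    return sorted(seen)


# Toy label batches; no single batch contains all three classes.
label_batches = [["cat", "cat", "dog"], ["dog", "dog"], ["bird", "cat"]]

all_classes = collect_classes(label_batches)
print(all_classes)  # ['bird', 'cat', 'dog']

# Labels that each individual batch is missing.
missing_per_batch = [sorted(set(all_classes) - set(b)) for b in label_batches]
print(missing_per_batch)  # [['bird'], ['bird', 'cat'], ['dog']]
```

A list such as all_classes is the kind of value one would expect to pass as classes, computed once over the whole training set before batch-wise training begins.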

predict(X, **predict_params)

Make predictions.

Note: The predict method is not available until this operator is trained.

Once this method is available, it will have the following signature:

Parameters
  • X (union type) –

    Features; the outer array is over samples.

    • array

      • items : union type

        • float

        • or string

        • or boolean

    • or array

      • items : array

        • items : union type

          • float

          • or string

          • or boolean

    • or any type

  • y (array, optional) –

    • items : union type

      • integer

      • or float

Returns

result – Output data schema for transformed data.

Return type

Any

transform(X, y=None)

Transform the data.

Note: The transform method is not available until this operator is trained.

Once this method is available, it will have the following signature:

Parameters
  • X (union type) –

    Features; the outer array is over samples.

    • array

      • items : union type

        • float

        • or string

        • or boolean

    • or array

      • items : array

        • items : union type

          • float

          • or string

          • or boolean

    • or any type

  • y (array, optional) –

    • items : union type

      • integer

      • or float

Returns

result – Output data schema for transformed data.

Return type

Any