lale.helpers module¶

class lale.helpers.GenSym(names: Set[str])[source]¶: Bases: object

lale.helpers.add_missing_values(orig_X, missing_rate=0.1, seed=None)[source]¶

lale.helpers.append_batch(data, batch_data)[source]¶

lale.helpers.are_hyperparameters_equal(hyperparam1, hyperparam2)[source]¶

lale.helpers.arg_name(pos=0, level=1) → Optional[str][source]¶

lale.helpers.assignee_name(level=1) → Optional[str][source]¶

lale.helpers.create_data_loader(X: Any, y: Optional[Any] = None, batch_size: int = 1, num_workers: int = 0, shuffle: bool = True)[source]¶

A function that takes a dataset as input and outputs a Pytorch dataloader.

Parameters

X (Input data.) – The formats supported are Pandas DataFrame, Numpy array, a sparse matrix, torch.tensor, torch.utils.data.Dataset, path to a HDF5 file, lale.util.batch_data_dictionary_dataset.BatchDataDict, a Python dictionary of the format {“dataset”: torch.utils.data.Dataset, “collate_fn”:collate_fn for torch.utils.data.DataLoader}
y (Labels., optional) – Supported formats are Numpy array or Pandas series, by default None
batch_size (int, optional) – Number of samples in each batch, by default 1
num_workers (int, optional) – Number of workers used by the data loader, by default 0
shuffle (boolean, optional, default True) – Whether to use SequentialSampler or RandomSampler for creating batches

Return type

torch.utils.data.DataLoader

Raises

TypeError – Raises a TypeError if the input format is not supported.

lale.helpers.create_individual_op_using_reflection(class_name, operator_name, param_dict)[source]¶

lale.helpers.create_instance_from_hyperopt_search_space(lale_object, hyperparams) → Operator[source]¶: Hyperparams is a n-tuple of dictionaries of hyper-parameters, each dictionary corresponds to an operator in the pipeline

lale.helpers.cross_val_score(estimator, X, y=None, scoring: ~typing.Any = <function accuracy_score>, cv: ~typing.Any = 5)[source]¶

Use the given estimator to perform fit and predict for splits defined by ‘cv’ and compute the given score on each of the splits.

Parameters

estimator (A valid sklearn_wrapper estimator) –
X (Valid data value that works with the estimator) –
y (Valid target value that works with the estimator) –
scoring (a scorer object from sklearn.metrics (https://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics)) – Default value is accuracy_score.
cv (an integer or an object that has a split function as a generator yielding (train, test) splits as arrays of indices.) – Integer value is used as number of folds in sklearn.model_selection.StratifiedKFold, default is 5. Note that any of the iterators from https://scikit-learn.org/stable/modules/cross_validation.html#cross-validation-iterators can be used here.

Returns

cv_results

Return type

a list of scores corresponding to each cross validation fold

lale.helpers.cross_val_score_track_trials(estimator, X, y=None, scoring: ~typing.Any = <function accuracy_score>, cv: ~typing.Any = 5, args_to_scorer: ~typing.Optional[~typing.Dict[str, ~typing.Any]] = None, args_to_cv: ~typing.Optional[~typing.Dict[str, ~typing.Any]] = None, **fit_params)[source]¶

Use the given estimator to perform fit and predict for splits defined by ‘cv’ and compute the given score on each of the splits.

Parameters

estimator (A valid sklearn_wrapper estimator) –
X (Valid data that works with the estimator) –
y (Valid target that works with the estimator) –
scoring (string or a scorer object created using) – https://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html#sklearn.metrics.make_scorer. A string from sklearn.metrics.SCORERS.keys() can be used or a scorer created from one of sklearn.metrics (https://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics). A completely custom scorer object can be created from a python function following the example at https://scikit-learn.org/stable/modules/model_evaluation.html The metric has to return a scalar value,
cv (an integer or an object that has a split function as a generator yielding (train, test) splits as arrays of indices.) – Integer value is used as number of folds in sklearn.model_selection.StratifiedKFold, default is 5. Note that any of the iterators from https://scikit-learn.org/stable/modules/cross_validation.html#cross-validation-iterators can be used here.
args_to_scorer (A dictionary of additional keyword arguments to pass to the scorer.) – Used for cases where the scorer has a signature such as scorer(estimator, X, y, **kwargs).
args_to_cv (A dictionary of additional keyword arguments to pass to the split method of cv.) – This is only applicable when cv is not an integer.
fit_params (Additional parameters that should be passed when calling fit on the estimator) –

Returns

cv_results

Return type

a list of scores corresponding to each cross validation fold

lale.helpers.data_to_json(data, subsample_array: bool = True) → Union[list, dict, int, float][source]¶

lale.helpers.dict_without(orig_dict: Dict[str, Any], key: str) → Dict[str, Any][source]¶

lale.helpers.find_lale_wrapper(sklearn_obj: Any) → Optional[Any][source]¶

Parameters: sklearn_obj – An sklearn compatible object that may have a lale wrapper
Returns: The lale wrapper type, or None if one could not be found

lale.helpers.fold_schema(X, y, cv=1, is_classifier=True)[source]¶

lale.helpers.get_estimator_param_name_from_hyperparams(hyperparams)[source]¶

lale.helpers.get_name_and_index(name: str) → Tuple[str, int][source]¶: given a name of the form “name@i”, returns (name, i) if given a name of the form “name”, returns (name, 0)

lale.helpers.get_sklearn_estimator_name() → str[source]¶: Some higher order sklearn operators changed the name of the nested estimatator in later versions. This returns the appropriate version dependent paramater name

lale.helpers.import_from_sklearn(sklearn_obj: Any, fitted: bool = True, in_place: bool = False)[source]¶

This method take an object and tries to wrap sklearn objects (at the top level or contained within hyperparameters of other sklearn objects). It will modify the object to add in the appropriate lale wrappers. It may also return a wrapper or different object than given.

Parameters

sklearn_obj – the object that we are going to try and wrap
fitted – should we return a TrainedOperator
in_place – should we try to mutate what we can in place, or should we aggressively deepcopy everything

Returns

The wrapped object (or the input object if we could not wrap it)

lale.helpers.import_from_sklearn_pipeline(sklearn_pipeline: Any, fitted: bool = True)[source]¶

Note: Same as import_from_sklearn. This alternative name exists for backwards compatibility.

This method take an object and tries to wrap sklearn objects (at the top level or contained within hyperparameters of other sklearn objects). It will modify the object to add in the appropriate lale wrappers. It may also return a wrapper or different object than given.

Parameters

sklearn_pipeline – the object that we are going to try and wrap
fitted – should we return a TrainedOperator

Returns

The wrapped object (or the input object if we could not wrap it)

lale.helpers.instantiate_from_hyperopt_search_space(obj_hyperparams, new_hyperparams)[source]¶

lale.helpers.is_empty_dict(val) → bool[source]¶

lale.helpers.is_numeric_structure(structure_type: str)[source]¶

lale.helpers.json_lookup(ptr, jsn, default=None)[source]¶

lale.helpers.make_array_index_name(index, is_tuple: bool = False)[source]¶

lale.helpers.make_degen_indexed_name(name, index)[source]¶

lale.helpers.make_indexed_name(name, index)[source]¶

lale.helpers.make_nested_hyperopt_space(sub_space)[source]¶

lale.helpers.ndarray_to_json(arr: ndarray, subsample_array: bool = True) → Union[list, dict][source]¶

lale.helpers.nest_HPparam(name: str, key: str)[source]¶

lale.helpers.nest_HPparams(name: str, grid: Mapping[str, V]) → Dict[str, V][source]¶

lale.helpers.nest_all_HPparams(name: str, grids: Iterable[Mapping[str, V]]) → List[Dict[str, V]][source]¶: Given the name of an operator in a pipeline, this transforms every key(parameter name) in the grids to use the operator name as a prefix (separated by __). This is the convention in scikit-learn pipelines.

lale.helpers.nest_choice_HPparam(key: str)[source]¶

lale.helpers.nest_choice_HPparams(grid: Mapping[str, V]) → Dict[str, V][source]¶

lale.helpers.nest_choice_all_HPparams(grids: Iterable[Mapping[str, V]]) → List[Dict[str, V]][source]¶: this transforms every key(parameter name) in the grids to be nested under a choice, using a ? as a prefix (separated by __). This is the convention in scikit-learn pipelines.

lale.helpers.partition_sklearn_choice_params(d: Dict[str, Any]) → Tuple[int, Dict[str, Any]][source]¶

lale.helpers.partition_sklearn_params(d: Dict[str, Any]) → Tuple[Dict[str, Any], Dict[str, Dict[str, Any]]][source]¶

lale.helpers.split_with_schemas(estimator, all_X, all_y, indices, train_indices=None)[source]¶

lale.helpers.to_graphviz(lale_operator: Operator, ipython_display: bool = True, call_depth: int = 1, **dot_graph_attr)[source]¶

lale.helpers.unnest_HPparams(k: str) → List[str][source]¶

lale.helpers.unnest_choice(k: str) → str[source]¶

class lale.helpers.val_wrapper(base)[source]¶

Bases: object

This is used to wrap values that cause problems for hyper-optimizer backends lale will unwrap these when given them as the value of a hyper-parameter

classmethod unwrap(obj)[source]¶

unwrap_self()[source]¶

lale.helpers.with_fixed_estimator_name(**kwargs)[source]¶: Some higher order sklearn operators changed the name of the nested estimator in later versions. This fixes up the arguments, renaming estimator and base_estimator appropriately.

lale.helpers.write_batch_output_to_file(file_obj, file_path, total_len, batch_idx, batch_X, batch_y, batch_out_X, batch_out_y)[source]¶