lale.lib.rasl.functions module

class lale.lib.rasl.functions.ColumnMonoidFactory(col_maker: Callable[[Union[str, int]], MonoidFactory[Any, bool, _D]])[source]

Bases: ColumnSelector[DictMonoid[_D]]

Given a MonoidFactory that decides whether a given column is valid, this class returns the list of valid columns.

from_monoid(monoid: DictMonoid[_D]) → List[Union[str, int]][source]

Given the monoid instance, return the appropriate type of output. This method may also modify self based on the monoid instance.

to_monoid(batch)[source]

Create a monoid instance representing the input data.

class lale.lib.rasl.functions.ColumnSelector(*args, **kwargs)[source]

Bases: MonoidFactory[Any, List[Union[str, int]], _D], Protocol

class lale.lib.rasl.functions.DictMonoid(m: Dict[Any, _D])[source]

Bases: Generic[_D], Monoid

Given a monoid, this class lifts it to a dictionary pointwise: two dictionaries are combined by combining the values stored under each key.

combine(other: DictMonoid[_D])[source]

Combines this monoid instance with another, producing a result. This operation must be observationally associative, satisfying x.from_monoid(a.combine(b.combine(c))) == x.from_monoid(a.combine(b).combine(c)), where x is the instance of MonoidFactory that created these instances.

property is_absorbing

A monoid value x is absorbing if for all y, x.combine(y) == x. This can help stop training early for monoids with learned coefficients.
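The pointwise lifting and the associativity requirement can be illustrated with a minimal, self-contained sketch. It does not import Lale: `SetUnion`, `PointwiseDict`, and `observe` are hypothetical stand-ins written for this example, and the sketch assumes both dictionaries share the same keys.

```python
from typing import Any, Dict, FrozenSet, Iterable


class SetUnion:
    """Hypothetical inner monoid: combining two instances unions their sets."""

    def __init__(self, values: Iterable[Any]):
        self.values = frozenset(values)

    def combine(self, other: "SetUnion") -> "SetUnion":
        return SetUnion(self.values | other.values)


class PointwiseDict:
    """Stand-in for a dictionary-lifted monoid (not Lale's DictMonoid itself)."""

    def __init__(self, m: Dict[Any, SetUnion]):
        self.m = m

    def combine(self, other: "PointwiseDict") -> "PointwiseDict":
        # Assumption: both sides have the same keys, so each value is
        # combined with the value stored under the same key on the other side.
        return PointwiseDict({k: v.combine(other.m[k]) for k, v in self.m.items()})


def observe(d: PointwiseDict) -> Dict[Any, FrozenSet[Any]]:
    """Plays the role of from_monoid: extract plain, comparable values."""
    return {key: value.values for key, value in d.m.items()}


a = PointwiseDict({"color": SetUnion({"red"}), "size": SetUnion({1})})
b = PointwiseDict({"color": SetUnion({"blue"}), "size": SetUnion({2})})
c = PointwiseDict({"color": SetUnion({"red"}), "size": SetUnion({3})})

# Grouping the combines differently yields the same observable result.
left = observe(a.combine(b).combine(c))
right = observe(a.combine(b.combine(c)))
```

Comparing through `observe` rather than comparing the monoid objects directly mirrors the "observationally associative" wording: only the value produced by the factory needs to agree.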

class lale.lib.rasl.functions.categorical(max_values: int = 5)[source]

Bases: ColumnMonoidFactory

Creates a MonoidFactory (and callable) for projecting categorical columns with sklearn’s ColumnTransformer or Lale’s Project operator.

Parameters

max_values (int) – Maximum number of unique values in a column for it to be considered categorical.

Returns

Function that, given a dataset X, returns a list of columns, containing either string column names or integer column indices.

Return type

callable
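As a rough illustration of the selection rule described above, the following sketch picks the columns whose number of unique values does not exceed `max_values`. It is not Lale's implementation: `select_categorical` is a hypothetical helper, and the dataset is assumed to be a plain dictionary mapping column names to lists of values.

```python
from typing import Any, Dict, Hashable, List, Sequence


def select_categorical(
    columns: Dict[Hashable, Sequence[Any]], max_values: int = 5
) -> List[Hashable]:
    """Return the names of columns with at most max_values unique values."""
    selected = []
    for name, values in columns.items():
        seen = set()
        for value in values:
            seen.add(value)
            if len(seen) > max_values:
                break  # too many unique values: not categorical
        else:
            # The loop finished without exceeding max_values.
            selected.append(name)
    return selected


data = {
    "color": ["red", "blue", "red", "green", "blue", "red"],
    "user_id": [101, 102, 103, 104, 105, 106],
    "flag": [0, 1, 0, 0, 1, 1],
}
cols = select_categorical(data, max_values=5)
```

Stopping as soon as the threshold is exceeded avoids scanning the rest of a high-cardinality column.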

class lale.lib.rasl.functions.categorical_column(col: Union[str, int], threshold: int = 5)[source]

Bases: MonoidFactory[Any, bool, _column_distinct_count_data]

Determines whether a column should be considered categorical by checking that it has no more than threshold distinct values.

from_monoid(monoid: _column_distinct_count_data) → bool[source]

Given the monoid instance, return the appropriate type of output. This method may also modify self based on the monoid instance.

to_monoid(batch) → _column_distinct_count_data[source]

Create a monoid instance representing the input data.

class lale.lib.rasl.functions.count_distinct_column(col: Union[str, int], limit: Optional[int] = None)[source]

Bases: MonoidFactory[Any, int, _column_distinct_count_data]

Counts the number of distinct elements in a given column. If a limit is specified, then, once the limit is reached, the count may no longer be accurate (but will always remain over the limit).

from_monoid(monoid: _column_distinct_count_data) → int[source]

Given the monoid instance, return the appropriate type of output. This method may also modify self based on the monoid instance.

to_monoid(batch) → _column_distinct_count_data[source]

Create a monoid instance representing the input data.
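The effect of a limit on a batch-wise distinct count can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `CappedDistinct` and `count_distinct` are hypothetical names, and the sketch assumes that a value becomes absorbing once it holds more than `limit` distinct elements.

```python
from typing import Any, Iterable, List, Optional


class CappedDistinct:
    """Hypothetical monoid tracking the distinct values seen so far."""

    def __init__(self, values: Iterable[Any], limit: Optional[int] = None):
        self.values = frozenset(values)
        self.limit = limit

    @property
    def is_absorbing(self) -> bool:
        # Assumption: once the limit is exceeded, further data is ignored.
        return self.limit is not None and len(self.values) > self.limit

    def combine(self, other: "CappedDistinct") -> "CappedDistinct":
        if self.is_absorbing:
            return self
        if other.is_absorbing:
            return other
        return CappedDistinct(self.values | other.values, self.limit)

    def count(self) -> int:
        return len(self.values)


def count_distinct(
    batches: List[List[Any]], limit: Optional[int] = None
) -> CappedDistinct:
    """Fold the per-batch monoid values into a single result."""
    acc = CappedDistinct((), limit)
    for batch in batches:
        acc = acc.combine(CappedDistinct(batch, limit))
    return acc


batches = [["a", "b", "a"], ["b", "c"], ["d", "e", "f"], ["g"]]
exact = count_distinct(batches)            # no limit: every value is tracked
capped = count_distinct(batches, limit=3)  # stops updating once over the limit
```

Here `exact.count()` reflects all seven distinct values, while `capped.count()` stops growing after the third batch pushes it past the limit, so it is inaccurate but still above 3.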

class lale.lib.rasl.functions.date_time(fmt)[source]

Bases: object

Creates a callable for projecting date/time columns with sklearn’s ColumnTransformer or Lale’s Project operator.

Parameters

fmt (str) – Format string for strptime(), see https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior

Returns

Function that, given a dataset X, returns a list of columns, containing either string column names or integer column indices.

Return type

callable
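A column test based on `strptime()` might look like the sketch below. It is only an approximation of the behavior described above: `select_date_time` is a hypothetical helper, the dataset is assumed to be a dictionary of column name to list of values, and a column is assumed to qualify only if every value is a string that parses with `fmt`.

```python
from datetime import datetime
from typing import Any, Dict, Hashable, List, Sequence


def select_date_time(
    columns: Dict[Hashable, Sequence[Any]], fmt: str
) -> List[Hashable]:
    """Return the names of columns whose values all parse with fmt."""

    def parses(value: Any) -> bool:
        if not isinstance(value, str):
            return False
        try:
            datetime.strptime(value, fmt)
        except ValueError:
            return False
        return True

    return [
        name
        for name, values in columns.items()
        if all(parses(value) for value in values)
    ]


data = {
    "signup": ["2021-03-01", "2021-12-31"],
    "note": ["2021-03-01", "n/a"],
    "age": [31, 47],
}
cols = select_date_time(data, "%Y-%m-%d")
```

Only `signup` passes here: `note` contains a string that does not match the format, and `age` holds non-string values.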

lale.lib.rasl.functions.filter_isnan(df: Any, column_name: str)[source]

lale.lib.rasl.functions.filter_isnotnan(df: Any, column_name: str)[source]

lale.lib.rasl.functions.filter_isnotnull(df: Any, column_name: str)[source]

lale.lib.rasl.functions.filter_isnull(df: Any, column_name: str)[source]

class lale.lib.rasl.functions.make_categorical_column(threshold=5)[source]

Bases: object