lale.lib.sklearn.feature_agglomeration module

class lale.lib.sklearn.feature_agglomeration.FeatureAgglomeration(*, n_clusters=2, memory=None, connectivity=None, compute_full_tree='auto', linkage='ward', pooling_func='<function mean>', distance_threshold=None, compute_distances=False, metric='euclidean')

Bases: PlannedIndividualOp

Feature agglomeration transformer from scikit-learn.

This documentation is auto-generated from JSON schemas.

Parameters
  • n_clusters (union type, optional, not for optimizer, default 2) –

    The number of clusters to find.

    • integer, >=2 for optimizer, <=’X/maxItems’, <=8 for optimizer

    • or None, not for optimizer

    See also constraint-3.

  • memory (union type, not for optimizer, default None) –

    Used to cache the output of the computation of the tree.

    • string

      Path to the caching directory.

    • or dict, not for optimizer

      Object with the joblib.Memory interface

    • or None

      No caching.

  • connectivity (union type, optional, not for optimizer, default None) –

    Connectivity matrix. Defines for each feature the neighboring features following a given structure of the data.

    • array of items : array of items : float

    • or callable, not for optimizer

      A callable that transforms the data into a connectivity matrix, such as derived from kneighbors_graph.

    • or None

  • compute_full_tree (union type, default 'auto') –

    Stop the construction of the tree early, at n_clusters.

    • boolean

    • or ‘auto’

    See also constraint-4.

  • linkage (‘ward’, ‘complete’, ‘average’, or ‘single’, optional, default ‘ward’) –

    Which linkage criterion to use. The linkage criterion determines which distance to use between sets of features.

    See also constraint-1.

  • pooling_func (callable, not for optimizer, default numpy.mean) – Combines the values of agglomerated features into a single value. It should accept an array of shape [M, N] and the keyword argument axis=1, and reduce it to an array of shape [M].

  • distance_threshold (union type, optional, not for optimizer, default None) –

    The linkage distance threshold above which clusters will not be merged.

    • float

    • or None

    See also constraint-3, constraint-4.

  • compute_distances (boolean, optional, not for optimizer, default False) – Computes distances between clusters even if distance_threshold is not used. This can be used to make dendrogram visualization, but introduces a computational and memory overhead.

  • metric (union type, optional, not for optimizer, default 'euclidean') –

    Metric used to compute the linkage. The default is ‘euclidean’.

    • ‘euclidean’, ‘l1’, ‘l2’, ‘manhattan’, ‘cosine’, or ‘precomputed’

    • or None, not for optimizer

      deprecated

    • or callable, not for optimizer

    See also constraint-1.
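
The pooling_func contract described above (accept an [M, N] array and axis=1, return an array of shape [M]) can be checked with a small NumPy sketch. The helper satisfies_pooling_contract and the custom value_range function are hypothetical names introduced for illustration only; neither is part of lale or scikit-learn.

```python
import numpy as np


def satisfies_pooling_contract(func, m=4, n=3):
    """Return True if func(block, axis=1) reduces shape (m, n) to shape (m,)."""
    block = np.arange(m * n, dtype=float).reshape(m, n)
    pooled = np.asarray(func(block, axis=1))
    return pooled.shape == (m,)


def value_range(block, axis=1):
    # Hypothetical custom pooling: spread between largest and smallest value.
    return np.max(block, axis=axis) - np.min(block, axis=axis)


print(satisfies_pooling_contract(np.mean))      # reduces along axis 1
print(satisfies_pooling_contract(np.median))    # reduces along axis 1
print(satisfies_pooling_contract(value_range))  # reduces along axis 1
print(satisfies_pooling_contract(np.cumsum))    # keeps shape (m, n), so it fails
```

Any callable that passes this check has the right shape behavior to be a candidate for pooling_func; whether it is a sensible way to summarize a feature cluster is a separate modeling decision.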

Notes

constraint-1 : union type

If linkage is “ward”, only “euclidean” is accepted for affinity (or metric).

  • affinity : ‘euclidean’ or None

  • or metric : ‘euclidean’ or None

  • or linkage : negated type of ‘ward’

constraint-2 : negated type of ‘X/isSparse’

A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.

constraint-3 : union type

n_clusters must be None if distance_threshold is not None.

  • n_clusters : None

  • or distance_threshold : None

constraint-4 : union type

compute_full_tree must be True if distance_threshold is not None.

  • compute_full_tree : ‘True’

  • or distance_threshold : None
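
The hyperparameter constraints above can be read as simple boolean conditions. The following is a minimal sketch, assuming only the parameters in the class signature; violated_constraints is a hypothetical helper, not a lale function. It skips constraint-2, which depends on the data rather than on hyperparameters, and the affinity alternative of constraint-1, which is not in the signature shown here.

```python
def violated_constraints(n_clusters=2, compute_full_tree="auto",
                         linkage="ward", distance_threshold=None,
                         metric="euclidean"):
    """Return the names of the documented constraints a setting violates."""
    violated = []
    # constraint-1: metric is 'euclidean' or None, or linkage is not 'ward'.
    if not (metric in ("euclidean", None) or linkage != "ward"):
        violated.append("constraint-1")
    # constraint-3: n_clusters is None, or distance_threshold is None.
    if not (n_clusters is None or distance_threshold is None):
        violated.append("constraint-3")
    # constraint-4: compute_full_tree is True, or distance_threshold is None.
    if not (compute_full_tree is True or distance_threshold is None):
        violated.append("constraint-4")
    return violated


print(violated_constraints())                                  # defaults
print(violated_constraints(metric="cosine"))                   # ward + cosine
print(violated_constraints(distance_threshold=0.5))            # threshold with defaults
print(violated_constraints(n_clusters=None, compute_full_tree=True,
                           distance_threshold=0.5))            # consistent setting
```

The third call shows the usual pitfall: setting distance_threshold while leaving n_clusters and compute_full_tree at their defaults breaks both constraint-3 and constraint-4.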

fit(X, y=None, **fit_params)

Train the operator.

Note: The fit method is not available until this operator is trainable.

Once this method is available, it will have the following signature:

Parameters
  • X (array of items : array of items : float) – The data

  • y (any type, optional) – Ignored

transform(X, y=None)

Transform the data.

Note: The transform method is not available until this operator is trained.

Once this method is available, it will have the following signature:

Parameters

X (array of items : array of items : float) – An M by N array of M observations in N dimensions.

Returns

result – The pooled values for each feature cluster.

Return type

array of items : array of items : float
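
To make the shape of the transform result concrete, here is a minimal NumPy sketch of the pooling step: columns that share a cluster label are reduced to a single output column. The function pool_by_cluster and the label assignment are assumptions made for illustration; the sketch does not perform the clustering itself and is not the library's implementation.

```python
import numpy as np


def pool_by_cluster(X, labels, pooling_func=np.mean):
    """Pool the columns of X that share a cluster label into one column each."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # One pooled column per distinct label, in sorted label order.
    pooled = [pooling_func(X[:, labels == k], axis=1) for k in np.unique(labels)]
    return np.column_stack(pooled)


X = np.array([[1.0, 3.0, 10.0],
              [2.0, 4.0, 20.0]])
labels = [0, 0, 1]  # hypothetical assignment: features 0 and 1 form one cluster

result = pool_by_cluster(X, labels)
print(result.shape)     # (2, 2): M observations by number of feature clusters
print(result.tolist())  # [[2.0, 10.0], [3.0, 20.0]]
```

The output keeps the M rows of the input and has one column per feature cluster, which matches the "array of items : array of items : float" return type listed above.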