lale.lib.sklearn.feature_agglomeration module

class lale.lib.sklearn.feature_agglomeration.FeatureAgglomeration(*, n_clusters=2, memory=None, connectivity=None, compute_full_tree='auto', linkage='ward', pooling_func='<function mean>', distance_threshold=None, compute_distances=False, metric='euclidean')

Bases: PlannedIndividualOp

Feature agglomeration transformer from scikit-learn.

This documentation is auto-generated from JSON schemas.

Parameters

n_clusters (union type, optional, not for optimizer, default 2) –
The number of clusters to find.
- integer, >=2 for optimizer, <=‘X/maxItems’, <=8 for optimizer
- or None, not for optimizer
See also constraint-3.
memory (union type, not for optimizer, default None) –
Used to cache the output of the computation of the tree.
- string
  
  Path to the caching directory.
- or dict, not for optimizer
  
  Object with the joblib.Memory interface
- or None
  
  No caching.
connectivity (union type, optional, not for optimizer, default None) –
Connectivity matrix. Defines for each feature the neighboring features following a given structure of the data.
- array of items : array of items : float
- or callable, not for optimizer
  
  A callable that transforms the data into a connectivity matrix, such as derived from kneighbors_graph.
- or None
compute_full_tree (union type, default 'auto') –
Whether to stop the construction of the tree early, at n_clusters.
- boolean
- or ‘auto’
See also constraint-4.
linkage (‘ward’, ‘complete’, ‘average’, or ‘single’, optional, default ‘ward’) –
Which linkage criterion to use. The linkage criterion determines which distance to use between sets of features.

See also constraint-1.
pooling_func (callable, not for optimizer, default numpy.mean) – Combines the values of agglomerated features into a single value. It should accept an array of shape [M, N] and the keyword argument axis=1, and reduce it to an array of shape [M].
distance_threshold (union type, optional, not for optimizer, default None) –
The linkage distance threshold above which clusters will not be merged.
- float
- or None
See also constraint-3, constraint-4.
compute_distances (boolean, optional, not for optimizer, default False) – Computes distances between clusters even if distance_threshold is not used. This can be used to visualize the dendrogram, but it introduces computational and memory overhead.
metric (union type, optional, not for optimizer, default 'euclidean') –
Metric used to compute the linkage. The default is ‘euclidean’.
- ‘euclidean’, ‘l1’, ‘l2’, ‘manhattan’, ‘cosine’, or ‘precomputed’
- or None, not for optimizer
  
  deprecated
- or callable, not for optimizer
See also constraint-1.

Notes

constraint-1 : union type

If linkage is “ward”, only “euclidean” is accepted for affinity (or metric).

affinity : ‘euclidean’ or None

or metric : ‘euclidean’ or None

or linkage : negated type of ‘ward’

constraint-2 : negated type of ‘X/isSparse’

A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.

constraint-3 : union type

n_clusters must be None if distance_threshold is not None.

n_clusters : None

or distance_threshold : None

constraint-4 : union type

compute_full_tree must be True if distance_threshold is not None.

compute_full_tree : ‘True’

or distance_threshold : None
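
The hyperparameter constraints above can be read as simple boolean checks. The following is a minimal sketch in plain Python, not part of lale: the helper `violated_constraints` and the `DEFAULTS` dict are hypothetical names introduced for illustration, and the deprecated `affinity` alias is left out. constraint-2 concerns the input data rather than the hyperparameters, so it is not covered here.

```python
# Hypothetical helper, not part of lale: checks the hyperparameter
# constraints listed above against a plain dict of settings.
DEFAULTS = {
    "n_clusters": 2,
    "compute_full_tree": "auto",
    "linkage": "ward",
    "distance_threshold": None,
    "metric": "euclidean",
}


def violated_constraints(**hyperparams):
    """Return the names of the constraints that the given settings violate."""
    hp = {**DEFAULTS, **hyperparams}
    violations = []

    # constraint-1: with ward linkage, the metric must be 'euclidean' or None.
    if hp["linkage"] == "ward" and hp["metric"] not in ("euclidean", None):
        violations.append("constraint-1")

    # constraint-3: n_clusters must be None if distance_threshold is not None.
    if hp["n_clusters"] is not None and hp["distance_threshold"] is not None:
        violations.append("constraint-3")

    # constraint-4: compute_full_tree must be True if distance_threshold is not None.
    if hp["compute_full_tree"] is not True and hp["distance_threshold"] is not None:
        violations.append("constraint-4")

    return violations


print(violated_constraints())                        # []
print(violated_constraints(distance_threshold=0.5))  # ['constraint-3', 'constraint-4']
print(violated_constraints(metric="cosine"))         # ['constraint-1']
```

Setting only `distance_threshold` fails two checks at once, because the defaults `n_clusters=2` and `compute_full_tree='auto'` both conflict with it. Passing `n_clusters=None` and `compute_full_tree=True` alongside the threshold satisfies all three.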

fit(X, y=None, **fit_params)

Train the operator.

Note: The fit method is not available until this operator is trainable.

Once this method is available, it will have the following signature:

Parameters

X (array of items : array of items : float) – The data
y (any type, optional) – Ignored

transform(X, y=None)

Transform the data.

Note: The transform method is not available until this operator is trained.

Once this method is available, it will have the following signature:

Parameters: X (array of items : array of items : float) – An M by N array of M observations in N dimensions or a length
Returns: result – The pooled values for each feature cluster.
Return type: array of items : array of items : float
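
To make "pooled values for each feature cluster" concrete, here is a minimal plain-Python sketch of the pooling step only. It is an illustration under stated assumptions, not the library's implementation: `mean_pool` and `pool_by_cluster` are hypothetical names, the `labels` list stands in for the cluster assignment a trained operator would have learned, and nested lists stand in for arrays. The pooling function follows the contract described for pooling_func: it receives an [M, N] block with axis=1 and returns M values.

```python
from statistics import fmean


def mean_pool(block, axis=1):
    """Reduce an [M, N] block (a list of M rows) to M values by row-wise mean."""
    if axis != 1:
        raise ValueError("this sketch only supports axis=1")
    return [fmean(row) for row in block]


def pool_by_cluster(X, labels, pooling_func=mean_pool):
    """Pool the columns of X that share a cluster label into one column each."""
    pooled_columns = []
    for cluster in sorted(set(labels)):
        # Indices of the features assigned to this cluster.
        cols = [j for j, label in enumerate(labels) if label == cluster]
        # The [M, N] block of those features, one row per observation.
        block = [[row[j] for j in cols] for row in X]
        pooled_columns.append(pooling_func(block, axis=1))
    # Transpose so the result has one row per observation.
    return [list(row) for row in zip(*pooled_columns)]


X = [[1.0, 2.0, 10.0],
     [3.0, 4.0, 20.0]]
labels = [0, 0, 1]  # hypothetical: features 0 and 1 share a cluster

result = pool_by_cluster(X, labels)
print(result)  # [[1.5, 10.0], [3.5, 20.0]]
```

The output has one column per cluster, so three input features grouped into two clusters yield a 2 by 2 result. Any callable with the same signature can replace `mean_pool`, for example a row-wise maximum.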