lale.lib.sklearn.feature_agglomeration module
- class lale.lib.sklearn.feature_agglomeration.FeatureAgglomeration(*, n_clusters=2, memory=None, connectivity=None, compute_full_tree='auto', linkage='ward', pooling_func='<function mean>', distance_threshold=None, compute_distances=False, metric='euclidean')
Bases: PlannedIndividualOp
Feature agglomeration transformer from scikit-learn.
This documentation is auto-generated from JSON schemas.
- Parameters
n_clusters (union type, optional, not for optimizer, default 2) –
The number of clusters to find.
integer, >=2 for optimizer, <=’X/maxItems’, <=8 for optimizer
or None, not for optimizer
See also constraint-3.
memory (union type, not for optimizer, default None) –
Used to cache the output of the computation of the tree.
string
Path to the caching directory.
or dict, not for optimizer
Object with the joblib.Memory interface
or None
No caching.
connectivity (union type, optional, not for optimizer, default None) –
Connectivity matrix. Defines for each feature the neighboring features following a given structure of the data.
array of items : array of items : float
or callable, not for optimizer
A callable that transforms the data into a connectivity matrix, such as derived from kneighbors_graph.
or None
compute_full_tree (union type, default 'auto') –
Stop construction of the tree early at n_clusters.
boolean
or ‘auto’
See also constraint-4.
linkage (‘ward’, ‘complete’, ‘average’, or ‘single’, optional, default ‘ward’) –
Which linkage criterion to use. The linkage criterion determines which distance to use between sets of features.
See also constraint-1.
pooling_func (callable, not for optimizer, default <function mean>) – Combines the values of agglomerated features into a single value. It should accept an array of shape [M, N] and the keyword argument axis=1, and reduce it to an array of size [M].
distance_threshold (union type, optional, not for optimizer, default None) –
The linkage distance threshold above which clusters will not be merged.
float
or None
See also constraint-3, constraint-4.
compute_distances (boolean, optional, not for optimizer, default False) – Computes distances between clusters even if distance_threshold is not used. This can be used for dendrogram visualization, but it introduces computational and memory overhead.
metric (union type, optional, not for optimizer, default 'euclidean') –
Metric used to compute the linkage. The default is ‘euclidean’.
‘euclidean’, ‘l1’, ‘l2’, ‘manhattan’, ‘cosine’, or ‘precomputed’
or None, not for optimizer
deprecated
or callable, not for optimizer
See also constraint-1.
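The calling convention required of pooling_func (an [M, N] array in, the keyword argument axis=1, an array of size [M] out) can be illustrated with a small sketch. It uses NumPy directly rather than the operator itself; the array `block` and the helper `pool_max` are hypothetical names introduced only for this illustration.

```python
import numpy as np

# Toy block of features assumed to have been merged into one cluster:
# M = 3 observations (rows), N = 2 features (columns).
block = np.array([[1.0, 3.0],
                  [2.0, 6.0],
                  [5.0, 5.0]])


def pool_max(values, axis=1):
    """Hypothetical custom pooling function: keep the largest value per row."""
    return np.max(values, axis=axis)


# Both callables accept an [M, N] array plus axis=1 and return an array of size [M].
mean_pooled = np.mean(block, axis=1)
max_pooled = pool_max(block, axis=1)

print(mean_pooled.shape, max_pooled.shape)
```

Any callable with this shape contract, such as a median, should be usable in the same position; whether a particular choice suits the data is left to the user.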
Notes
constraint-1 : union type
If linkage is “ward”, affinity (or metric) accepts only “euclidean”.
affinity : ‘euclidean’ or None
or metric : ‘euclidean’ or None
or linkage : negated type of ‘ward’
constraint-2 : negated type of ‘X/isSparse’
A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.
constraint-3 : union type
n_clusters must be None if distance_threshold is not None.
n_clusters : None
or distance_threshold : None
constraint-4 : union type
compute_full_tree must be True if distance_threshold is not None.
compute_full_tree : ‘True’
or distance_threshold : None
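The hyperparameter constraints above can be read as simple predicates. The following is a minimal sketch, not the schema validation that lale performs: `check_constraints` is a hypothetical helper, it covers only constraint-1, constraint-3, and constraint-4, and it checks only the metric argument for constraint-1 because affinity does not appear in the signature above. Constraint-2 concerns the input data rather than the hyperparameters and is omitted.

```python
def check_constraints(n_clusters=2, linkage="ward", metric="euclidean",
                      compute_full_tree="auto", distance_threshold=None):
    """Return the names of the violated constraints (hypothetical helper)."""
    violated = []
    # constraint-1: with linkage "ward", only "euclidean" (or None) is accepted.
    if linkage == "ward" and metric not in ("euclidean", None):
        violated.append("constraint-1")
    # constraint-3: n_clusters must be None if distance_threshold is not None.
    if n_clusters is not None and distance_threshold is not None:
        violated.append("constraint-3")
    # constraint-4: compute_full_tree must be True if distance_threshold is not None.
    if compute_full_tree is not True and distance_threshold is not None:
        violated.append("constraint-4")
    return violated


defaults_ok = check_constraints()
bad_metric = check_constraints(linkage="ward", metric="cosine")
bad_threshold = check_constraints(n_clusters=2, distance_threshold=0.5)
good_threshold = check_constraints(n_clusters=None, compute_full_tree=True,
                                   distance_threshold=0.5)
print(defaults_ok, bad_metric, bad_threshold, good_threshold)
```

The last call shows the combination the constraints allow when a distance threshold is used: n_clusters set to None and compute_full_tree set to True.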
- fit(X, y=None, **fit_params)
Train the operator.
Note: The fit method is not available until this operator is trainable.
Once this method is available, it will have the following signature:
- Parameters
X (array of items : array of items : float) – The data
y (any type, optional) – Ignored
- transform(X, y=None)
Transform the data.
Note: The transform method is not available until this operator is trained.
Once this method is available, it will have the following signature:
- Parameters
X (array of items : array of items : float) – An M by N array of M observations in N dimensions or a length M array of M one-dimensional observations.
- Returns
result – The pooled values for each feature cluster.
- Return type
array of items : array of items : float
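The fit and transform steps can be sketched as follows. This example calls the underlying scikit-learn class, sklearn.cluster.FeatureAgglomeration, directly, on the assumption that the lale operator forwards the same hyperparameters to it; with lale, transform would instead be called on the trained operator, as noted above. The random data and the names `X`, `agglo`, and `X_pooled` are illustrative only.

```python
import numpy as np
from sklearn.cluster import FeatureAgglomeration

# Illustrative data: M = 6 observations, N = 4 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))

# Group the 4 features into 2 clusters (default ward linkage, mean pooling).
agglo = FeatureAgglomeration(n_clusters=2)
agglo.fit(X)

# Each output column holds the pooled values of one feature cluster.
X_pooled = agglo.transform(X)
print(X_pooled.shape)  # (6, 2)
```

The number of rows is unchanged, and the number of columns equals n_clusters.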