lale.lib.autogen.latent_dirichlet_allocation module

class lale.lib.autogen.latent_dirichlet_allocation.LatentDirichletAllocation(*, n_components=10, doc_topic_prior=None, topic_word_prior=None, learning_method='batch', learning_decay=0.7, learning_offset=10.0, max_iter=10, batch_size=128, evaluate_every=-1, total_samples=1000000.0, perp_tol=0.1, mean_change_tol=0.001, max_doc_update_iter=100, n_jobs=1, verbose=0, random_state=None)

Bases: PlannedIndividualOp

Combined schema for expected data and hyperparameters.

This documentation is auto-generated from JSON schemas.

Parameters

n_components (integer, >=2 for optimizer, <='X/items/maxItems', <=256 for optimizer, uniform distribution, default 10) – Number of topics.
doc_topic_prior (union type, not for optimizer, default None) –
Prior of document topic distribution theta
- float
- or None
topic_word_prior (union type, not for optimizer, default None) –
Prior of topic word distribution beta
- float
- or None
learning_method (‘batch’ or ‘online’, default ‘batch’) – Method used to update _component
learning_decay (float, not for optimizer, default 0.7) – Parameter that controls the learning rate in the online learning method
learning_offset (float, not for optimizer, default 10.0) – A (positive) parameter that downweights early iterations in online learning
max_iter (integer, >=10 for optimizer, <=1000 for optimizer, uniform distribution, default 10) – The maximum number of iterations.
batch_size (integer, >=3 for optimizer, <=128 for optimizer, uniform distribution, default 128) – Number of documents to use in each EM iteration
evaluate_every (integer, >=-1 for optimizer, <=0 for optimizer, uniform distribution, default -1) – How often to evaluate perplexity
total_samples (union type, default 1000000.0) –
Total number of documents
- integer, not for optimizer
- or float, >=0.0 for optimizer, <=1.0 for optimizer, uniform distribution
perp_tol (float, not for optimizer, default 0.1) – Perplexity tolerance in batch learning
mean_change_tol (float, not for optimizer, default 0.001) – Stopping tolerance for updating document topic distribution in E-step.
max_doc_update_iter (integer, >=100 for optimizer, <=101 for optimizer, uniform distribution, default 100) – Max number of iterations for updating document topic distribution in the E-step.
n_jobs (union type, not for optimizer, default 1) –
The number of jobs to use in the E-step
- integer
- or None
verbose (integer, not for optimizer, default 0) – Verbosity level.
random_state (union type, not for optimizer, default None) –
If int, random_state is the seed used by the random number generator; if a RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
- integer
- or numpy.random.RandomState
- or None
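
How learning_decay and learning_offset interact in the online method can be sketched as follows. This assumes the step-size schedule from the online variational Bayes literature, where the weight of update t is (learning_offset + t) ** -learning_decay; the function name online_step_weight and the counter t are illustrative and not part of this operator's API, and the exact starting index of the counter is an implementation detail of the wrapped estimator.

```python
def online_step_weight(t, learning_decay=0.7, learning_offset=10.0):
    """Weight of the t-th mini-batch update (assumed schedule, see lead-in)."""
    if t < 0:
        raise ValueError("t must be non-negative")
    # A larger learning_offset shrinks the early weights;
    # a larger learning_decay makes the weights fall off faster.
    return (learning_offset + t) ** (-learning_decay)


# Weights for a few update counters, using the defaults listed above.
weights = [online_step_weight(t) for t in (0, 1, 10, 100)]
```

The weights decrease monotonically with t, so later mini-batches change the topic-word statistics less than early ones.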

Notes

constraint-1 : any type

fit(X, y=None, **fit_params)

Train the operator.

Note: The fit method is not available until this operator is trainable.

Once this method is available, it will have the following signature:

Parameters

X (union type) –
Document word matrix.
- array of items : Any
- or array of items : array of items : float
y (any type) –
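
As an illustration of the "array of items : array of items : float" form of X, the sketch below builds a document word matrix as nested lists of floats from tokenized documents. The helper build_doc_word_matrix and the sample documents are hypothetical and only show the expected shape: one row per document, one column per vocabulary word.

```python
from collections import Counter


def build_doc_word_matrix(tokenized_docs):
    """Return (vocabulary, matrix) where matrix[i][j] is the count of word j in document i."""
    # Sort the vocabulary so that column order is deterministic.
    vocabulary = sorted({token for doc in tokenized_docs for token in doc})
    matrix = []
    for doc in tokenized_docs:
        counts = Counter(doc)  # missing words count as 0
        matrix.append([float(counts[word]) for word in vocabulary])
    return vocabulary, matrix


# Made-up sample documents, for illustration only.
docs = [
    ["apple", "banana", "apple"],
    ["banana", "cherry"],
    ["cherry", "cherry", "apple"],
]
vocabulary, X = build_doc_word_matrix(docs)
```

In practice a vectorizer would usually produce this matrix; the point here is only the nested-array-of-floats layout that the schema describes.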

transform(X, y=None)

Transform the data.

Note: The transform method is not available until this operator is trained.

Once this method is available, it will have the following signature:

Parameters

X (union type) –
Document word matrix.
- array of items : Any
- or array of items : array of items : float

Returns

result – Document topic distribution for X.

Return type

Any
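
One way to read the returned document topic distribution is to pick the most probable topic per document. The sketch below assumes each row is a probability vector that sums to 1, which may not hold for every version of the wrapped estimator; the function dominant_topics and the sample values are hypothetical and not produced by this operator.

```python
def dominant_topics(doc_topic, tol=1e-6):
    """Return the index of the highest-probability topic for each document row."""
    result = []
    for row in doc_topic:
        # Guard the assumption that every row is a normalized distribution.
        if any(p < 0 for p in row) or abs(sum(row) - 1.0) > tol:
            raise ValueError("each row must be a probability distribution")
        result.append(max(range(len(row)), key=row.__getitem__))
    return result


# Made-up distribution for two documents over three topics.
example = [[0.7, 0.2, 0.1], [0.05, 0.15, 0.8]]
topics = dominant_topics(example)
```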