lale.lib.autogen.latent_dirichlet_allocation module
- class lale.lib.autogen.latent_dirichlet_allocation.LatentDirichletAllocation(*, n_components=10, doc_topic_prior=None, topic_word_prior=None, learning_method='batch', learning_decay=0.7, learning_offset=10.0, max_iter=10, batch_size=128, evaluate_every=-1, total_samples=1000000.0, perp_tol=0.1, mean_change_tol=0.001, max_doc_update_iter=100, n_jobs=1, verbose=0, random_state=None)
Bases: PlannedIndividualOp
Combined schema for expected data and hyperparameters.
This documentation is auto-generated from JSON schemas.
- Parameters
n_components (integer, >=2 for optimizer, <='X/items/maxItems', <=256 for optimizer, uniform distribution, default 10) – Number of topics.
doc_topic_prior (union type, not for optimizer, default None) –
Prior of document topic distribution theta
float
or None
topic_word_prior (union type, not for optimizer, default None) –
Prior of topic word distribution beta
float
or None
learning_method (‘batch’ or ‘online’, default ‘batch’) – Method used to update the topic-word components (_component).
learning_decay (float, not for optimizer, default 0.7) – Parameter that controls the learning rate in the online learning method.
learning_offset (float, not for optimizer, default 10.0) – A (positive) parameter that downweights early iterations in online learning
max_iter (integer, >=10 for optimizer, <=1000 for optimizer, uniform distribution, default 10) – The maximum number of iterations.
batch_size (integer, >=3 for optimizer, <=128 for optimizer, uniform distribution, default 128) – Number of documents to use in each EM iteration
evaluate_every (integer, >=-1 for optimizer, <=0 for optimizer, uniform distribution, default -1) – How often to evaluate perplexity
total_samples (union type, default 1000000.0) –
Total number of documents
integer, not for optimizer
or float, >=0.0 for optimizer, <=1.0 for optimizer, uniform distribution
perp_tol (float, not for optimizer, default 0.1) – Perplexity tolerance in batch learning
mean_change_tol (float, not for optimizer, default 0.001) – Stopping tolerance for updating document topic distribution in E-step.
max_doc_update_iter (integer, >=100 for optimizer, <=101 for optimizer, uniform distribution, default 100) – Max number of iterations for updating document topic distribution in the E-step.
n_jobs (union type, not for optimizer, default 1) –
The number of jobs to use in the E-step
integer
or None
verbose (integer, not for optimizer, default 0) – Verbosity level.
random_state (union type, not for optimizer, default None) –
If int, random_state is the seed used by the random number generator; if a RandomState instance, random_state is the random number generator itself; if None, the random number generator is the RandomState instance used by np.random.
integer
or numpy.random.RandomState
or None
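The "for optimizer" bounds in the parameter list describe the range an automated search would sample from; they are not hard limits on valid values. The sketch below illustrates that reading with the standard library only. SEARCH_RANGES and sample_config are hypothetical names, not part of lale, and the data-dependent bound on n_components ('X/items/maxItems') is not modeled here.

```python
import random

# Integer ranges copied from the "for optimizer" bounds in the parameter list.
# SEARCH_RANGES and sample_config are illustrative names, not lale APIs.
SEARCH_RANGES = {
    "n_components": (2, 256),
    "max_iter": (10, 1000),
    "batch_size": (3, 128),
    "evaluate_every": (-1, 0),
    "max_doc_update_iter": (100, 101),
}


def sample_config(rng):
    """Draw one configuration uniformly from the inclusive integer ranges."""
    return {name: rng.randint(lo, hi) for name, (lo, hi) in SEARCH_RANGES.items()}


# A seeded generator keeps the draw reproducible.
config = sample_config(random.Random(0))
print(config)
```

Parameters marked "not for optimizer" are left out of the dictionary, so under this reading they would keep their defaults during a search.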
Notes
constraint-1 : any type
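To make learning_decay and learning_offset more concrete: in online variational Bayes for LDA, the weight given to the mini-batch at update step t is commonly computed as (learning_offset + t) ** (-learning_decay). The sketch below assumes that schedule. Whether this operator uses exactly this formula internally is an assumption, and online_weight is a hypothetical helper.

```python
def online_weight(t, learning_decay=0.7, learning_offset=10.0):
    """Weight of the mini-batch at update step t (assumed schedule)."""
    return (learning_offset + t) ** (-learning_decay)


# A larger learning_offset shrinks the weight of early updates;
# a larger learning_decay makes the weight fall off faster as t grows.
for t in (0, 1, 10, 100):
    print(t, round(online_weight(t), 4))
```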
- fit(X, y=None, **fit_params)
Train the operator.
Note: The fit method is not available until this operator is trainable.
Once this method is available, it will have the following signature:
- Parameters
X (union type) –
Document word matrix.
array of items : Any
or array of items : array of items : float
y (any type) –
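X is a document word matrix: one row per document and one column per vocabulary term, with each entry typically holding the count of that term in that document. The pure-Python sketch below builds such a matrix from whitespace-tokenized text. The sample documents and build_doc_word_matrix are hypothetical, and in practice a vectorizer would usually produce this matrix.

```python
def build_doc_word_matrix(docs):
    """Return (vocabulary, matrix) with one row per document."""
    # Sorted vocabulary gives a stable column order.
    vocab = sorted({word for doc in docs for word in doc.split()})
    column = {word: j for j, word in enumerate(vocab)}
    matrix = [[0.0] * len(vocab) for _ in docs]
    for i, doc in enumerate(docs):
        for word in doc.split():
            matrix[i][column[word]] += 1.0
    return vocab, matrix


docs = ["apple banana apple", "banana cherry"]
vocab, X = build_doc_word_matrix(docs)
print(vocab)
print(X)
```

The resulting X is an array of arrays of floats, which matches the second alternative listed for the X parameter.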
- transform(X, y=None)
Transform the data.
Note: The transform method is not available until this operator is trained.
Once this method is available, it will have the following signature:
- Parameters
X (union type) –
Document word matrix.
array of items : Any
or array of items : array of items : float
- Returns
result – Document topic distribution for X.
- Return type
Any
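The returned document topic distribution can be read row by row, with one row per document and one column per topic. The sketch below picks the most probable topic for each document. The matrix values are made up for illustration and do not come from a real run, and dominant_topics is a hypothetical helper.

```python
def dominant_topics(doc_topic):
    """Return the index of the highest-weight topic for each document."""
    return [max(range(len(row)), key=row.__getitem__) for row in doc_topic]


# Illustrative values only: two documents, three topics.
doc_topic = [
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
]
print(dominant_topics(doc_topic))  # prints [0, 2]
```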