lale.lib.autogen.latent_dirichlet_allocation module

class lale.lib.autogen.latent_dirichlet_allocation.LatentDirichletAllocation(*, n_components=10, doc_topic_prior=None, topic_word_prior=None, learning_method='batch', learning_decay=0.7, learning_offset=10.0, max_iter=10, batch_size=128, evaluate_every=-1, total_samples=1000000.0, perp_tol=0.1, mean_change_tol=0.001, max_doc_update_iter=100, n_jobs=1, verbose=0, random_state=None)

Bases: PlannedIndividualOp

Combined schema for expected data and hyperparameters.

This documentation is auto-generated from JSON schemas.

Parameters
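  • n_components (integer, >=2 for optimizer, <='X/items/maxItems', <=256 for optimizer, uniform distribution, default 10) – Number of topics. The bound 'X/items/maxItems' is data-dependent: it refers to the number of columns (the vocabulary size) of X.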

  • doc_topic_prior (union type, not for optimizer, default None) –

    Prior of the document-topic distribution theta. If None, defaults to 1 / n_components.

    • float

    • or None

  • topic_word_prior (union type, not for optimizer, default None) –

    Prior of the topic-word distribution beta. If None, defaults to 1 / n_components.

    • float

    • or None

  • learning_method (‘batch’ or ‘online’, default ‘batch’) – Method used to update components_. ‘batch’ uses all training data in each EM update; ‘online’ uses a mini-batch of training data in each EM update to update components_ incrementally.

  • learning_decay (float, not for optimizer, default 0.7) – Controls the learning rate in the online learning method. Values in (0.5, 1.0] guarantee asymptotic convergence.

  • learning_offset (float, not for optimizer, default 10.0) – A positive parameter that downweights early iterations in online learning.

  • max_iter (integer, >=10 for optimizer, <=1000 for optimizer, uniform distribution, default 10) – The maximum number of iterations.

  • batch_size (integer, >=3 for optimizer, <=128 for optimizer, uniform distribution, default 128) – Number of documents to use in each EM iteration. Only used in online learning.

  • evaluate_every (integer, >=-1 for optimizer, <=0 for optimizer, uniform distribution, default -1) – How often to evaluate perplexity during fit. Set it to 0 or a negative number to skip perplexity evaluation entirely.

  • total_samples (union type, default 1000000.0) –

    Total number of documents. Only used in the partial_fit method.

    • integer, not for optimizer

    • or float, >=0.0 for optimizer, <=1.0 for optimizer, uniform distribution

  • perp_tol (float, not for optimizer, default 0.1) – Perplexity tolerance in batch learning. Only used when evaluate_every is greater than 0.

  • mean_change_tol (float, not for optimizer, default 0.001) – Stopping tolerance for updating document topic distribution in E-step.

  • max_doc_update_iter (integer, >=100 for optimizer, <=101 for optimizer, uniform distribution, default 100) – Max number of iterations for updating document topic distribution in the E-step.

  • n_jobs (union type, not for optimizer, default 1) –

    The number of jobs to use in the E-step; -1 uses all processors.

    • integer

    • or None

  • verbose (integer, not for optimizer, default 0) – Verbosity level.

  • random_state (union type, not for optimizer, default None) –

    If int, random_state is the seed used by the random number generator; if a RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.

    • integer

    • or numpy.random.RandomState

    • or None
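The sketch below illustrates several of the hyperparameters above (the None priors, an integer random_state, and online learning with total_samples). It calls scikit-learn's sklearn.decomposition.LatentDirichletAllocation directly, on the assumption (not stated on this page) that the lale wrapper forwards its hyperparameters unchanged to that estimator. The fitted attributes doc_topic_prior_, topic_word_prior_ and components_, as well as partial_fit, belong to scikit-learn, not to the lale operator documented here. The count matrix is made-up toy data.

```python
import numpy as np
# Assumption: the lale wrapper delegates to this scikit-learn estimator.
from sklearn.decomposition import LatentDirichletAllocation as SkLDA

# Toy document-word count matrix (invented): 6 documents, 5-word vocabulary.
X = np.array([
    [3, 2, 0, 0, 0],
    [4, 1, 1, 0, 0],
    [2, 3, 0, 0, 1],
    [0, 0, 3, 4, 2],
    [0, 1, 2, 3, 3],
    [0, 0, 4, 2, 3],
])

# Batch learning with the default priors (None) and a fixed integer seed.
lda = SkLDA(n_components=2, learning_method="batch", max_iter=20, random_state=0)
lda.fit(X)

# With both priors left as None, scikit-learn falls back to 1 / n_components.
print(lda.doc_topic_prior_, lda.topic_word_prior_)  # 0.5 0.5

# An integer random_state makes repeated fits reproducible.
lda_again = SkLDA(n_components=2, learning_method="batch", max_iter=20,
                  random_state=0).fit(X)
print(np.allclose(lda.components_, lda_again.components_))  # True

# Online learning: feed mini-batches through partial_fit. total_samples is
# only used by partial_fit, where it scales each mini-batch update.
online = SkLDA(n_components=2, learning_method="online", learning_decay=0.7,
               learning_offset=10.0, total_samples=X.shape[0], random_state=0)
for start in range(0, X.shape[0], 2):  # mini-batches of 2 documents
    online.partial_fit(X[start:start + 2])
print(online.components_.shape)  # (2, 5)
```

The mini-batches of two documents only keep the toy example small; with real corpora, batch_size and total_samples would reflect the actual data.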

Notes

constraint-1 : any type

fit(X, y=None, **fit_params)

Train the operator.

Note: The fit method is not available until this operator is trainable.

Once this method is available, it will have the following signature:

Parameters
  • X (union type) –

    Document word matrix.

    • array of items : Any

    • or array of items : array of items : float

  • y (any type) – Ignored; this operator is unsupervised.

transform(X, y=None)

Transform the data.

Note: The transform method is not available until this operator is trained.

Once this method is available, it will have the following signature:

Parameters
  • X (union type) –

    Document word matrix.

    • array of items : Any

    • or array of items : array of items : float

Returns

result – Document topic distribution for X.

Return type

Any