lale.lib.lightgbm.lgbm_regressor module

class lale.lib.lightgbm.lgbm_regressor.LGBMRegressor(*, boosting_type='gbdt', num_leaves=31, max_depth=-1, learning_rate=0.1, n_estimators=200, subsample_for_bin=200000, objective=None, class_weight=None, min_split_gain=0.0, min_child_weight=0.001, min_child_samples=20, subsample=1.0, subsample_freq=0, colsample_bytree=1.0, reg_alpha=0.0, reg_lambda=0.0, random_state=None, n_jobs=-1, silent='warn', importance_type='split', n_job=None)

Bases: PlannedIndividualOp

Combined schema for expected data and hyperparameters.

This documentation is auto-generated from JSON schemas.

Parameters
  • boosting_type (union type, optional, default 'gbdt') –

    • ‘gbdt’

      Traditional Gradient Boosting Decision Tree.

    • or ‘dart’

      Dropouts meet Multiple Additive Regression Trees.

    • or ‘goss’, not for optimizer

      Gradient-based One-Side Sampling.

    • or ‘rf’, not for optimizer

      Random Forest.

    See also constraint-1, constraint-2.

  • num_leaves (union type, optional, default 31) –

    Maximum tree leaves for base learners

    • integer, not for optimizer

    • or 2, 4, 8, 16, 32, 64, or 128

  • max_depth (union type, optional, default -1) –

    Maximum tree depth for base learners; <=0 means no limit.

    • integer, not for optimizer

    • or integer, >=3 for optimizer, <=5 for optimizer

  • learning_rate (float, >=0.02 for optimizer, <=1.0 for optimizer, loguniform distribution, optional, default 0.1) – Boosting learning rate.

  • n_estimators (integer, >=50 for optimizer, <=1000 for optimizer, uniform distribution, optional, default 200) – Number of boosted trees to fit.

  • subsample_for_bin (integer, optional, not for optimizer, default 200000) – Number of samples for constructing bins.

  • objective (union type, optional, not for optimizer, default None) –

    Specify the learning task and the corresponding learning objective or a custom objective function to be used

    • dict

    • or ‘regression’ or None

  • class_weight (union type, optional, not for optimizer, default None) –

    Weights associated with classes

    • dict

    • or ‘balanced’ or None

  • min_split_gain (float, optional, not for optimizer, default 0.0) – Minimum loss reduction required to make a further partition on a leaf node of the tree.

  • min_child_weight (float, >=0.0001 for optimizer, <=0.01 for optimizer, optional, default 0.001) – Minimum sum of instance weight (hessian) needed in a child (leaf).

  • min_child_samples (integer, >=5 for optimizer, <=30 for optimizer, uniform distribution, optional, default 20) – Minimum number of data needed in a child (leaf).

  • subsample (float, >=0.01 for optimizer, <=1.0 for optimizer, uniform distribution, optional, default 1.0) –

    Subsample ratio of the training instance.

    See also constraint-2.

  • subsample_freq (integer, >=0 for optimizer, <=5 for optimizer, uniform distribution, optional, default 0) –

    Frequency of subsampling; <=0 disables it.

    See also constraint-2.

  • colsample_bytree (float, >=0.01 for optimizer, <=1.0 for optimizer, optional, default 1.0) – Subsample ratio of columns when constructing each tree.

  • reg_alpha (float, >=0.0 for optimizer, <=1.0 for optimizer, optional, default 0.0) – L1 regularization term on weights.

  • reg_lambda (float, >=0.0 for optimizer, <=1.0 for optimizer, optional, default 0.0) – L2 regularization term on weights.

  • random_state (union type, optional, not for optimizer, default None) –

    Random number seed. If None, default seeds in C++ code will be used.

    • integer

    • or numpy.random.RandomState

    • or None

  • n_jobs (integer, optional, not for optimizer, default -1) – Number of parallel threads.

  • silent (union type, optional, not for optimizer, default 'warn') –

    Whether to print messages while running boosting.

    • ‘warn’

    • or boolean

  • importance_type (‘split’ or ‘gain’, optional, not for optimizer, default ‘split’) – The type of feature importance to be filled into feature_importances_.

  • n_job (union type, optional, not for optimizer, default None) –

    Number of parallel threads to use for training (can be changed at prediction time by passing it as an extra keyword argument). For better performance, it is recommended to set this to the number of physical cores in the CPU. Negative integers are interpreted following joblib’s formula (n_cpus + 1 + n_jobs), just like scikit-learn (so e.g. -1 means using all threads). A value of zero corresponds to the default number of threads configured for OpenMP in the system.

    • integer

      Number of parallel threads.

    • or None

      Use the number of physical cores in the system (its correct detection requires either the joblib or the psutil util libraries to be installed).
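
The "for optimizer" ranges and distributions listed above describe a search space. As a rough illustration only, the sketch below draws one configuration from a subset of those ranges using the standard library. The helper names sample_loguniform and sample_config are hypothetical and are not part of Lale; an actual optimizer may sample differently.

```python
import math
import random


def sample_loguniform(rng, low, high):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    value = math.exp(rng.uniform(math.log(low), math.log(high)))
    # Clamp to guard against floating-point round-off at the bounds.
    return min(max(value, low), high)


def sample_config(rng):
    """Draw one configuration from the documented 'for optimizer' ranges.

    Hypothetical helper: only hyperparameters with an explicit enumeration
    or distribution in the listing above are sampled here.
    """
    return {
        # 'goss' and 'rf' are marked "not for optimizer", so they are excluded.
        "boosting_type": rng.choice(["gbdt", "dart"]),
        "num_leaves": rng.choice([2, 4, 8, 16, 32, 64, 128]),
        "learning_rate": sample_loguniform(rng, 0.02, 1.0),
        "n_estimators": rng.randint(50, 1000),      # inclusive bounds
        "min_child_samples": rng.randint(5, 30),    # inclusive bounds
        "subsample": rng.uniform(0.01, 1.0),
        "subsample_freq": rng.randint(0, 5),        # inclusive bounds
    }


rng = random.Random(0)  # fixed seed so repeated runs draw the same values
config = sample_config(rng)
print(config)
```

A loguniform draw spends as much probability mass between 0.02 and 0.2 as between 0.1 and 1.0, which is usually what is wanted for a learning rate.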

Notes

constraint-1 : union type

boosting_type rf needs bagging (which means subsample_freq > 0 and subsample < 1.0)

  • boosting_type : negated type of ‘rf’

  • or intersection type

    • dict of subsample_freq : negated type of 0

    • and dict of subsample : negated type of 1.0

constraint-2 : union type

boosting_type goss cannot use bagging (which means subsample_freq = 0 and subsample = 1.0)

  • boosting_type : negated type of ‘goss’

  • or subsample_freq : 0

  • or subsample : 1.0
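
The two constraints above can be read as plain boolean conditions. The following sketch encodes them exactly as the schema lists them; the function name violated_constraints is hypothetical and is not a Lale API. Note that constraint-2 as listed is a union ("or"), so it is slightly more permissive than its prose description: a 'goss' setting with subsample_freq = 0 passes even if subsample < 1.0.

```python
def violated_constraints(boosting_type="gbdt", subsample=1.0, subsample_freq=0):
    """Return the names of the documented constraints that a setting violates."""
    violated = []

    # constraint-1: boosting_type is not 'rf',
    # or (subsample_freq is not 0 and subsample is not 1.0).
    if not (boosting_type != "rf" or (subsample_freq != 0 and subsample != 1.0)):
        violated.append("constraint-1")

    # constraint-2: boosting_type is not 'goss',
    # or subsample_freq is 0, or subsample is 1.0.
    if not (boosting_type != "goss" or subsample_freq == 0 or subsample == 1.0):
        violated.append("constraint-2")

    return violated


print(violated_constraints("rf"))                                     # ['constraint-1']
print(violated_constraints("rf", subsample=0.8, subsample_freq=1))    # []
print(violated_constraints("goss", subsample=0.8, subsample_freq=1))  # ['constraint-2']
```

With the default subsample and subsample_freq, 'rf' is therefore rejected until bagging is enabled, while 'gbdt' and 'dart' are never affected by either constraint.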

fit(X, y=None, **fit_params)

Train the operator.

Note: The fit method is not available until this operator is trainable.

Once this method is available, it will have the following signature:

Parameters
  • X (array of items : array of items : float) – The input samples. Internally, it will be converted to

  • y (array of items : float) – Target values (real numbers).

  • sample_weight (union type, optional, default None) –

    Weights of training data.

    • array of items : float

    • or None

  • init_score (union type, optional, default None) –

    Init score of training data.

    • array of items : float

    • or None

  • group (any type, optional, default None) – Group data of training data.

  • eval_set (any type, optional, default None) – A list of (X, y) tuple pairs to use as validation sets.

  • eval_names (any type, optional, default None) – Names of eval_set.

  • eval_sample_weight (any type, optional, default None) – Weights of eval data.

  • eval_class_weight (union type, optional, default None) –

    Class weights of eval data.

    • array of items : float

    • or None

  • eval_init_score (any type, optional, default None) – Init score of eval data.

  • eval_group (any type, optional, default None) – Group data of eval data.

  • eval_metric (union type, optional, default None) –

    string, list of strings, callable or None, optional (default=None).

    • array of items : string

    • or ‘l2’ or None

    • or callable

  • early_stopping_rounds (union type, optional, default None) –

    Activates early stopping. The model will train until the validation score stops improving.

    • integer

    • or None

  • verbose (union type, optional, default True) –

    Requires at least one evaluation dataset.

    • boolean

    • or integer

  • feature_name (union type, optional, default 'auto') –

    Feature names. If ‘auto’ and the data is a pandas DataFrame, the data column names are used.

    • array of items : string

    • or ‘auto’

  • categorical_feature (union type, optional, default 'auto') –

    Categorical features. If list of int, interpreted as indices. If list of strings, interpreted as feature names.

    • array

      • items : union type

        • string

        • or integer

    • or ‘auto’

  • callbacks (union type, optional, default None) –

    List of callback functions that are applied at each iteration.

    • array of items : dict

    • or None
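
To make the early_stopping_rounds description concrete, here is a minimal sketch of the stopping rule on a list of per-round validation scores. It assumes a lower-is-better metric such as ‘l2’ and is not LightGBM's actual implementation; the function name best_iteration and the score values are made up for illustration.

```python
def best_iteration(scores, early_stopping_rounds):
    """Return (stop_index, best_index) for a lower-is-better validation metric.

    Training stops at the first round that is early_stopping_rounds rounds
    past the best score seen so far without any improvement.
    """
    best_index = 0
    for i, score in enumerate(scores):
        if score < scores[best_index]:
            best_index = i  # new best validation score
        elif i - best_index >= early_stopping_rounds:
            return i, best_index  # patience exhausted
    return len(scores) - 1, best_index  # ran through all rounds


# Hypothetical validation l2 values, one per boosting round.
l2_per_round = [0.90, 0.70, 0.65, 0.66, 0.67, 0.64, 0.68]
print(best_iteration(l2_per_round, early_stopping_rounds=2))  # (4, 2)
```

In this toy run the better score at round 5 is never reached, which shows the trade-off: a small early_stopping_rounds saves training time but can stop before a later improvement.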

partial_fit(X, y=None, **fit_params)

Incremental fit to train the operator on a batch of samples.

Note: The partial_fit method is not available until this operator is trainable.

Once this method is available, it will have the following signature:

Parameters
  • X (array of items : array of items : float) – The input samples. Internally, it will be converted to

  • y (array of items : float) – Target values (real numbers).

  • sample_weight (union type, optional, default None) –

    Weights of training data.

    • array of items : float

    • or None

  • init_score (union type, optional, default None) –

    Init score of training data.

    • array of items : float

    • or None

  • group (any type, optional, default None) – Group data of training data.

  • eval_set (any type, optional, default None) – A list of (X, y) tuple pairs to use as validation sets.

  • eval_names (any type, optional, default None) – Names of eval_set.

  • eval_sample_weight (any type, optional, default None) – Weights of eval data.

  • eval_class_weight (union type, optional, default None) –

    Class weights of eval data.

    • array of items : float

    • or None

  • eval_init_score (any type, optional, default None) – Init score of eval data.

  • eval_group (any type, optional, default None) – Group data of eval data.

  • eval_metric (union type, optional, default None) –

    string, list of strings, callable or None, optional (default=None).

    • array of items : string

    • or ‘l2’ or None

    • or callable

  • early_stopping_rounds (union type, optional, default None) –

    Activates early stopping. The model will train until the validation score stops improving.

    • integer

    • or None

  • verbose (union type, optional, default True) –

    Requires at least one evaluation dataset.

    • boolean

    • or integer

  • feature_name (union type, optional, default 'auto') –

    Feature names. If ‘auto’ and the data is a pandas DataFrame, the data column names are used.

    • array of items : string

    • or ‘auto’

  • categorical_feature (union type, optional, default 'auto') –

    Categorical features. If list of int, interpreted as indices. If list of strings, interpreted as feature names.

    • array

      • items : union type

        • string

        • or integer

    • or ‘auto’

  • callbacks (union type, optional, default None) –

    List of callback functions that are applied at each iteration.

    • array of items : dict

    • or None
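
The categorical_feature parameter accepts a list that mixes integers (column indices) and strings (feature names). The sketch below shows one way such a list could be normalized to column indices; resolve_categorical and the example feature names are hypothetical, and the library's own resolution may differ in detail.

```python
def resolve_categorical(categorical_feature, feature_names):
    """Map a categorical_feature list of ints and/or strings to column indices."""
    if categorical_feature == "auto":
        return "auto"  # left for the library to decide
    indices = []
    for item in categorical_feature:
        if isinstance(item, int):
            indices.append(item)  # already a column index
        else:
            # A string is looked up by name; raises ValueError if unknown.
            indices.append(feature_names.index(item))
    return sorted(indices)


names = ["age", "city", "income"]  # hypothetical feature_name list
print(resolve_categorical(["city", 0], names))  # [0, 1]
```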

predict(X, **predict_params)

Make predictions.

Note: The predict method is not available until this operator is trained.

Once this method is available, it will have the following signature:

Parameters
  • X (array, optional of items : array of items : float) – Input features matrix.

  • raw_score (boolean, optional, default False) – Whether to predict raw scores.

  • num_iteration (union type, optional, default None) –

    Limit number of iterations in the prediction.

    • integer

    • or None

  • pred_leaf (boolean, optional, default False) – Whether to predict leaf index.

  • pred_contrib (boolean, optional, default False) – Whether to predict feature contributions.

Returns

result – Return the predicted value for each sample.

Return type

array of items : float
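
As a rough picture of what num_iteration does, a boosted model's prediction can be viewed as a sum over its trees, and limiting the iterations truncates that sum. The sketch below uses toy stand-in "trees" (plain functions) and is not how LightGBM evaluates a model internally; predict_one and toy_trees are hypothetical names.

```python
def predict_one(trees, x, num_iteration=None):
    """Sum the outputs of the first num_iteration trees (all trees if None)."""
    used = trees if num_iteration is None else trees[:num_iteration]
    return sum(tree(x) for tree in used)


# Toy stand-ins for fitted trees: each maps a feature vector to a leaf value.
toy_trees = [
    lambda x: 10.0 if x[0] > 0.5 else 2.0,
    lambda x: 1.0 if x[1] > 0.0 else -1.0,
    lambda x: 0.25,
]

sample = [0.9, -3.0]
print(predict_one(toy_trees, sample))                   # 9.25
print(predict_one(toy_trees, sample, num_iteration=1))  # 10.0
```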