lale.lib.sklearn.gradient_boosting_classifier module

class lale.lib.sklearn.gradient_boosting_classifier.GradientBoostingClassifier(*, loss='log_loss', learning_rate=0.1, n_estimators=100, subsample=1.0, criterion='friedman_mse', min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_depth=3, min_impurity_decrease=0.0, init=None, random_state=None, max_features=None, verbose=0, max_leaf_nodes=None, warm_start=False, validation_fraction=0.1, n_iter_no_change=None, tol=0.0001, ccp_alpha=0.0)

Bases: PlannedIndividualOp

Gradient boosting classifier from scikit-learn.

This documentation is auto-generated from JSON schemas.

Parameters
  • loss (‘log_loss’ or ‘exponential’, optional, default ‘log_loss’) – The loss function to be optimized. ‘log_loss’ refers to binomial and multinomial deviance, the same as used in logistic regression. It is a good choice for classification with probabilistic outputs. For loss ‘exponential’, gradient boosting recovers the AdaBoost algorithm.

  • learning_rate (float, >=0.01 for optimizer, <=1.0 for optimizer, loguniform distribution, optional, not for optimizer, default 0.1) – Learning rate shrinks the contribution of each tree by learning_rate.

  • n_estimators (integer, >=1, >=10 for optimizer, <=100 for optimizer, uniform distribution, optional, default 100) – The number of boosting stages to perform.

  • subsample (float, >0.0, >=0.01 for optimizer, <=1.0, <=1.0 for optimizer, uniform distribution, optional, not for optimizer, default 1.0) – The fraction of samples to be used for fitting the individual base learners.

  • criterion (‘squared_error’ or ‘friedman_mse’, optional, not for optimizer, default ‘friedman_mse’) – The function to measure the quality of a split. Supported criteria are ‘friedman_mse’ for the mean squared error with improvement score by Friedman and ‘squared_error’ for mean squared error. The default value of ‘friedman_mse’ is generally the best as it can provide a better approximation in some cases.

  • min_samples_split (union type, optional, default 2) –

    The minimum number of samples required to split an internal node:

    • integer, >=2, uniform distribution, not for optimizer

    • or float, >0.0, >=0.01 for optimizer, <=1.0, <=0.5 for optimizer, default 0.05

  • min_samples_leaf (union type, optional, default 1) –

    The minimum number of samples required to be at a leaf node.

    • integer, >=1, not for optimizer

    • or float, >0.0, >=0.01 for optimizer, <=0.5, default 0.05

  • min_weight_fraction_leaf (float, >=0.0, <=0.5, optional, not for optimizer, default 0.0) – The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.

  • max_depth (integer, >=3 for optimizer, <=5 for optimizer, optional, default 3) – Maximum depth of the individual regression estimators.

  • min_impurity_decrease (float, >=0.0, <=10.0 for optimizer, optional, not for optimizer, default 0.0) – A node will be split if this split induces a decrease of the impurity greater than or equal to this value.

  • init (union type, not for optimizer, default None) –

    An estimator object that is used to compute the initial predictions.

    • operator

    • or ‘zero’ or None

  • random_state (union type, optional, not for optimizer, default None) –

    If int, random_state is the seed used by the random number generator:

    • integer

    • or numpy.random.RandomState

    • or None

  • max_features (union type, optional, default None) –

    The number of features to consider when looking for the best split.

    • integer, >=2, <=’X/items/maxItems’, not for optimizer

      Consider max_features features at each split.

    • or float, >0.0, >=0.01 for optimizer, <1.0, uniform distribution, default 0.5

    • or ‘auto’, ‘sqrt’, ‘log2’, or None

  • verbose (integer, optional, not for optimizer, default 0) – Enable verbose output. If 1, it prints progress and performance once in a while; if greater than 1, it prints progress and performance for every tree.

  • max_leaf_nodes (union type, optional, not for optimizer, default None) –

    Grow trees with max_leaf_nodes in best-first fashion.

    • integer, >=1, >=3 for optimizer, <=1000 for optimizer

    • or None

      Unlimited number of leaf nodes.

  • warm_start (boolean, optional, not for optimizer, default False) – When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just erase the previous solution.

  • validation_fraction (float, >=0.0, <=1.0, optional, not for optimizer, default 0.1) – The proportion of training data to set aside as validation set for early stopping.

  • n_iter_no_change (union type, optional, not for optimizer, default None) –

    n_iter_no_change is used to decide whether early stopping will be used to terminate training when the validation score is not improving.

    • integer, >=5 for optimizer, <=10 for optimizer

    • or None

  • tol (float, >=1e-08 for optimizer, <=0.01 for optimizer, optional, not for optimizer, default 0.0001) – Tolerance for the early stopping. When the loss is not improving by at least tol for n_iter_no_change iterations (if set to a number), the training stops.

  • ccp_alpha (float, >=0.0, <=0.1 for optimizer, optional, not for optimizer, default 0.0) – Complexity parameter used for Minimal Cost-Complexity Pruning. The subtree with the largest cost complexity that is smaller than ccp_alpha will be chosen. By default, no pruning is performed.
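This lale operator forwards its hyperparameters to scikit-learn's GradientBoostingClassifier. As a minimal sketch of how the early-stopping hyperparameters (validation_fraction, n_iter_no_change, tol) interact, using the wrapped scikit-learn estimator directly (with lale you would instead import GradientBoostingClassifier from lale.lib.sklearn; the keyword arguments are the same):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Hold out 10% of the training data; stop adding stages once the held-out
# loss fails to improve by at least `tol` for `n_iter_no_change` rounds.
clf = GradientBoostingClassifier(
    n_estimators=500,        # upper bound on the number of boosting stages
    learning_rate=0.1,
    subsample=0.8,           # stochastic gradient boosting
    validation_fraction=0.1,
    n_iter_no_change=5,
    tol=1e-4,
    random_state=0,
)
clf.fit(X, y)
print(clf.n_estimators_)  # stages actually fit; early stopping may use < 500
```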

decision_function(X)

Confidence scores for all classes.

Note: The decision_function method is not available until this operator is trained.

Once this method is available, it will have the following signature:

Parameters

X (array of items : array of items : float) – Features; the outer array is over samples.

Returns

result – Confidence scores for samples for each class in the model.

  • array of items : array of items : float

    In the multi-way case, score per (sample, class) combination.

  • or array of items : float

    In the binary case, score for self._classes[1].

Return type

union type
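For illustration, the shape difference between the binary and multi-class cases can be seen with the wrapped scikit-learn estimator (a sketch; a trained lale operator exposes the same method):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Binary problem: decision_function returns one score per sample.
Xb, yb = make_classification(n_samples=100, n_features=5, random_state=0)
binary = GradientBoostingClassifier(n_estimators=20, random_state=0).fit(Xb, yb)
print(binary.decision_function(Xb).shape)  # (100,)

# Three-class problem: one score per (sample, class) combination.
Xm, ym = make_classification(
    n_samples=100, n_features=5, n_informative=3, n_classes=3, random_state=0
)
multi = GradientBoostingClassifier(n_estimators=20, random_state=0).fit(Xm, ym)
print(multi.decision_function(Xm).shape)  # (100, 3)
```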

fit(X, y=None, **fit_params)

Train the operator.

Note: The fit method is not available until this operator is trainable.

Once this method is available, it will have the following signature:

Parameters
  • X (array of items : array of items : float) – The input samples. Internally, it will be converted to dtype=np.float32.

  • y (union type) –

    Target values (strings or integers in classification, real numbers in regression).

    • array of items : float

    • or array of items : string

    • or array of items : boolean

  • sample_weight (union type, optional, default None) –

    Sample weights. If None, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node.

    • array of items : float

    • or None

  • monitor (union type, optional, default None) –

    The monitor is called after each iteration with the current iteration, a reference to the estimator, and the local variables of _fit_stages as keyword arguments: callable(i, self, locals()). If the callable returns True, the fitting procedure is stopped.

    • callable

    • or None
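A sketch of the monitor callback using the wrapped scikit-learn estimator directly (assumption: the lale wrapper forwards **fit_params to the underlying fit):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=150, n_features=5, random_state=0)

seen = []
def monitor(i, estimator, local_vars):
    # Called once after each boosting stage; returning True stops training early.
    seen.append(i)
    return False

clf = GradientBoostingClassifier(n_estimators=10, random_state=0)
clf.fit(X, y, monitor=monitor)
print(len(seen))  # 10 — one call per boosting stage
```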

predict(X, **predict_params)

Make predictions.

Note: The predict method is not available until this operator is trained.

Once this method is available, it will have the following signature:

Parameters

X (array of items : array of items : float, optional) – The input samples. Internally, it will be converted to dtype=np.float32.

Returns

result – The predicted values.

  • array of items : float

  • or array of items : string

  • or array of items : boolean

Return type

union type
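The union return type reflects that predicted labels keep the type of the y passed to fit. A sketch with string targets, using the wrapped scikit-learn estimator:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y_int = make_classification(n_samples=100, n_features=5, random_state=0)
y = ["spam" if label == 1 else "ham" for label in y_int]  # string targets

clf = GradientBoostingClassifier(n_estimators=20, random_state=0).fit(X, y)
preds = clf.predict(X)
print(sorted(set(preds)))  # string labels come back as strings
```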

predict_proba(X)

Probability estimates for all classes.

Note: The predict_proba method is not available until this operator is trained.

Once this method is available, it will have the following signature:

Parameters

X (array of items : array of items : float, optional) – The input samples. Internally, it will be converted to dtype=np.float32.

Returns

result – The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.

Return type

array of items : array of items : float
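Each row of the result is a per-sample distribution over classes, with columns ordered by the fitted classes_ attribute. A sketch with the wrapped scikit-learn estimator:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=100, n_features=5, random_state=0)
clf = GradientBoostingClassifier(n_estimators=20, random_state=0).fit(X, y)

proba = clf.predict_proba(X)
print(proba.shape)                        # (100, 2): samples x classes
print(np.allclose(proba.sum(axis=1), 1))  # True — each row sums to one
print(clf.classes_)                       # column order of `proba`
```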