lale.lib.xgboost.xgb_regressor module

class lale.lib.xgboost.xgb_regressor.XGBRegressor(*, max_depth=None, learning_rate=None, n_estimators, verbosity=None, silent=None, objective='reg:linear', booster=None, tree_method=None, n_jobs=1, nthread=None, gamma=None, min_child_weight=None, max_delta_step=None, subsample=None, colsample_bytree=None, colsample_bylevel=None, colsample_bynode=None, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, base_score=None, random_state=0, missing=nan, importance_type='gain', seed=None, monotone_constraints=None, interaction_constraints=None, num_parallel_tree=None, validate_parameters=None, gpu_id=None, enable_categorical=False, predictor=None, max_leaves=None, max_bin=None, grow_policy=None, sampling_method=None, max_cat_to_onehot=None, eval_metric=None, early_stopping_rounds=None, callbacks=None, feature_types, max_cat_threshold=None, device=None, multi_strategy=None)

Bases: PlannedIndividualOp

XGBRegressor gradient boosted decision trees.

This documentation is auto-generated from JSON schemas.
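
A minimal usage sketch of the operator's lifecycle (construct, fit, predict), assuming toy data from scikit-learn's make_regression; the hyperparameter values are only illustrative:

    from sklearn.datasets import make_regression

    from lale.lib.xgboost import XGBRegressor

    # Toy regression data, for illustration only.
    X, y = make_regression(n_samples=200, n_features=5, random_state=0)

    # Bind some hyperparameters to obtain a trainable operator, then fit and predict.
    trainable = XGBRegressor(n_estimators=100, max_depth=4, learning_rate=0.1)
    trained = trainable.fit(X, y)
    predictions = trained.predict(X)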

Parameters
  • max_depth (union type, default None) –

    Maximum tree depth for base learners.

    • integer, >=0, >=1 for optimizer, <=7 for optimizer, uniform distribution

    • or None, not for optimizer

  • learning_rate (union type, default None) –

    Boosting learning rate (xgb’s “eta”)

    • float, >=0.02 for optimizer, <=1 for optimizer, loguniform distribution

    • or None, not for optimizer

  • n_estimators (union type) –

    Number of trees to fit.

    • integer, >=50 for optimizer, <=1000 for optimizer, default 200

    • or None

  • verbosity (union type, >=0, <=3, not for optimizer, default None) –

    The degree of verbosity.

    • integer

    • or None

  • silent (union type, optional, not for optimizer, default None) –

    Deprecated and replaced with verbosity; kept for backward compatibility.

    • boolean

    • or None

  • objective (union type, not for optimizer, default 'reg:linear') –

    Specify the learning task and the corresponding learning objective or a custom objective function to be used.

    • ’reg:linear’, ‘reg:logistic’, ‘reg:gamma’, ‘reg:tweedie’, or ‘reg:squarederror’

    • or callable

  • booster (‘gbtree’, ‘gblinear’, ‘dart’, or None, not for optimizer, default None) – Specify which booster to use.

  • tree_method (‘auto’, ‘exact’, ‘approx’, ‘hist’, ‘gpu_hist’, or None, not for optimizer, default None) – Specify which tree method to use. Defaults to auto. If this parameter is set to default, XGBoost will choose the most conservative option available. Refer to https://xgboost.readthedocs.io/en/latest/parameter.html.

  • n_jobs (union type, not for optimizer, default 1) –

    Number of parallel threads used to run xgboost. (replaces nthread)

    • integer

    • or None

  • nthread (union type, optional, not for optimizer, default None) –

    Number of parallel threads used to run xgboost. Deprecated; please use n_jobs instead.

    • integer

    • or None

  • gamma (union type, default None) –

    Minimum loss reduction required to make a further partition on a leaf node of the tree.

    • float, >=0, <=1.0 for optimizer

    • or None, not for optimizer

  • min_child_weight (union type, default None) –

    Minimum sum of instance weight (hessian) needed in a child.

    • integer, >=2 for optimizer, <=20 for optimizer, uniform distribution

    • or None, not for optimizer

  • max_delta_step (union type, not for optimizer, default None) –

    Maximum delta step we allow each tree’s weight estimation to be.

    • None

    • or integer

  • subsample (union type, default None) –

    Subsample ratio of the training instance.

    • float, >0, >=0.01 for optimizer, <=1.0 for optimizer, uniform distribution

    • or None, not for optimizer

  • colsample_bytree (union type, not for optimizer, default None) –

    Subsample ratio of columns when constructing each tree.

    • float, >0, >=0.1 for optimizer, <=1, <=1.0 for optimizer, uniform distribution

    • or None, not for optimizer

  • colsample_bylevel (union type, not for optimizer, default None) –

    Subsample ratio of columns for each split, in each level.

    • float, >0, >=0.1 for optimizer, <=1, <=1.0 for optimizer, uniform distribution

    • or None, not for optimizer

  • colsample_bynode (union type, not for optimizer, default None) –

    Subsample ratio of columns for each split.

    • float, >0, <=1

    • or None, not for optimizer

  • reg_alpha (union type, default None) –

    L1 regularization term on weights

    • float, >=0 for optimizer, <=1 for optimizer, uniform distribution

    • or None, not for optimizer

  • reg_lambda (union type, default None) –

    L2 regularization term on weights

    • float, >=0.1 for optimizer, <=1 for optimizer, uniform distribution

    • or None, not for optimizer

  • scale_pos_weight (union type, not for optimizer, default None) –

    Balancing of positive and negative weights.

    • float

    • or None, not for optimizer

  • base_score (union type, not for optimizer, default None) –

    The initial prediction score of all instances, global bias.

    • float

    • or None, not for optimizer

  • random_state (union type, not for optimizer, default 0) –

    Random number seed. (replaces seed)

    • integer

    • or None

  • missing (union type, not for optimizer, default nan) –

    Value in the data which needs to be treated as a missing value. If None, defaults to np.nan.

    • float

    • or None or nan

  • importance_type (‘gain’, ‘weight’, ‘cover’, ‘total_gain’, ‘total_cover’, or None, optional, not for optimizer, default ‘gain’) – The feature importance type for the feature_importances_ property.

  • seed (any type, optional, not for optimizer, default None) – Deprecated and replaced with random_state; kept for backward compatibility.

  • monotone_constraints (union type, optional, not for optimizer, default None) –

    Constraint of variable monotonicity.

    • None

    • or string

  • interaction_constraints (union type, optional, not for optimizer, default None) –

    Constraints for interaction representing permitted interactions. The constraints must be specified in the form of a nested list, e.g. [[0, 1], [2, 3, 4]], where each inner list is a group of indices of features that are allowed to interact with each other.

    • None

    • or string

  • num_parallel_tree (union type, optional, not for optimizer, default None) –

    Used for boosting random forest.

    • None

    • or integer

  • validate_parameters (union type, optional, not for optimizer, default None) –

    Give warnings for unknown parameters.

    • None

    • or boolean

    • or integer

  • gpu_id (union type, optional, not for optimizer, default None) –

    Device ordinal.

    • integer

    • or None

  • enable_categorical (boolean, optional, not for optimizer, default False) – Experimental support for categorical data. Do not set to true unless you are interested in development. Only valid when gpu_hist and dataframe are used.

  • predictor (union type, optional, not for optimizer, default None) –

    Force XGBoost to use a specific predictor; available choices are [cpu_predictor, gpu_predictor].

    • string

    • or None

  • max_leaves (union type, optional, not for optimizer, default None) –

    Maximum number of leaves; 0 indicates no limit.

    • integer

    • or None, not for optimizer

  • max_bin (union type, optional, not for optimizer, default None) –

    If using histogram-based algorithm, maximum number of bins per feature.

    • integer

    • or None, not for optimizer

  • grow_policy (0, 1, ‘depthwise’, ‘lossguide’, or None, optional, not for optimizer, default None) –

    Tree growing policy.

    0 or depthwise: favor splitting at nodes closest to the root, i.e. grow depth-wise. 1 or lossguide: favor splitting at nodes with highest loss change.

  • sampling_method (‘uniform’, ‘gradient_based’, or None, optional, not for optimizer, default None) –

    Sampling method. Used only by the gpu_hist tree method.

    • uniform: select random training instances uniformly.

    • gradient_based: select random training instances with higher probability when the gradient and hessian are larger. (cf. CatBoost)

  • max_cat_to_onehot (union type, optional, not for optimizer, default None) –

    A threshold for deciding whether XGBoost should use one-hot encoding based splits for categorical data.

    • integer

    • or None

  • eval_metric (union type, optional, not for optimizer, default None) –

    Metric used for monitoring the training result and early stopping.

    • string

    • or array of items : string

    • or array of items : callable

    • or None

  • early_stopping_rounds (union type, optional, not for optimizer, default None) –

    Activates early stopping.

    Validation metric needs to improve at least once in every early_stopping_rounds round(s) to continue training.

    • integer

    • or None

  • callbacks (union type, optional, not for optimizer, default None) –

    List of callback functions that are applied at end of each iteration.

    It is possible to use predefined callbacks by using Callback API.

    • array of items : callable

    • or None

  • feature_types (Any, optional, not for optimizer) – Used for specifying feature types without constructing a dataframe. See DMatrix for details.

  • max_cat_threshold (union type, optional, not for optimizer, default None) –

    Maximum number of categories considered for each split.

    Used only by partition-based splits for preventing over-fitting. Also, enable_categorical needs to be set to have categorical feature support. See Categorical Data and Parameters for Categorical Feature for details.

    • integer, >=0, >=1 for optimizer, <=10 for optimizer, uniform distribution

    • or None

  • device (union type, optional, not for optimizer, default None) –

    Device ordinal

    • ’cpu’, ‘cuda’, or ‘gpu’

    • or None

  • multi_strategy (union type, optional, not for optimizer, default None) –

    The strategy used for training multi-target models, including multi-target regression and multi-class classification. See Multiple Outputs for more information.

    • ’one_output_per_tree’

      One model for each target.

    • or ‘multi_output_tree’

      Use multi-target trees.

    • or None
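
Hyperparameter ranges tagged "for optimizer" above define the search space when the operator is left planned, i.e. when its hyperparameters stay unbound. A sketch of such a search, assuming Lale's Hyperopt optimizer and illustrative values for max_evals, cv, and scoring:

    from sklearn.datasets import make_regression

    from lale.lib.lale import Hyperopt
    from lale.lib.xgboost import XGBRegressor

    X, y = make_regression(n_samples=300, n_features=10, random_state=0)

    # Leaving hyperparameters unbound keeps the operator planned, so Hyperopt
    # searches the ranges tagged "for optimizer" in the schema above.
    optimizer = Hyperopt(estimator=XGBRegressor, max_evals=10, cv=3, scoring="r2")
    trained = optimizer.fit(X, y)
    predictions = trained.predict(X)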

fit(X, y=None, **fit_params)

Train the operator.

Note: The fit method is not available until this operator is trainable.

Once this method is available, it will have the following signature:

Parameters
  • X (array of items : array of items : float) – Feature matrix

  • y (array of items : float) – Labels

  • sample_weight (union type, optional, default None) –

    Weight for each instance

    • array of items : float

    • or None

  • eval_set (union type, optional, default None) –

    A list of (X, y) pairs to use as a validation set for early stopping.

    • array

    • or None

  • sample_weight_eval_set (union type, optional, default None) –

    A list of the form [L_1, L_2, …, L_n], where each L_i is a list of instance weights on the i-th validation set.

    • array

    • or None

  • eval_metric (union type, optional, default None) –

    If a str, should be a built-in evaluation metric to use. See the XGBoost documentation on parameters for the available metrics.

    • array of items : string

    • or string

    • or None

    • or dict

  • early_stopping_rounds (union type, optional, default None) –

    Activates early stopping. Validation error needs to decrease at least every early_stopping_rounds round(s) to continue training.

    • integer

    • or None

  • verbose (boolean, optional, default True) – If verbose and an evaluation set is used, writes the evaluation metric measured on the validation set to stderr.

  • xgb_model (union type, optional, default None) –

    File name of a stored xgb model or a ‘Booster’ instance to be loaded before training (allows training continuation).

    • string

    • or None

  • callbacks (union type, optional, default None) –

    List of callback functions that are applied at each iteration.

    • array of items : dict

    • or None
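
A sketch of passing these fit params through to the underlying XGBoost fit, assuming an illustrative train/validation split and early-stopping budget:

    from sklearn.datasets import make_regression
    from sklearn.model_selection import train_test_split

    from lale.lib.xgboost import XGBRegressor

    X, y = make_regression(n_samples=500, n_features=10, random_state=0)
    X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

    trainable = XGBRegressor(n_estimators=200, learning_rate=0.1)

    # Early stopping monitors the validation pair supplied via eval_set.
    trained = trainable.fit(
        X_train,
        y_train,
        eval_set=[(X_valid, y_valid)],
        early_stopping_rounds=10,
        verbose=False,
    )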

partial_fit(X, y=None, **fit_params)

Incremental fit to train the operator on a batch of samples.

Note: The partial_fit method is not available until this operator is trainable.

Once this method is available, it will have the following signature:

Parameters
  • X (array of items : array of items : float) – Feature matrix

  • y (array of items : float) – Labels

  • sample_weight (union type, optional, default None) –

    Weight for each instance

    • array of items : float

    • or None

  • eval_set (union type, optional, default None) –

    A list of (X, y) pairs to use as a validation set for early stopping.

    • array

    • or None

  • sample_weight_eval_set (union type, optional, default None) –

    A list of the form [L_1, L_2, …, L_n], where each L_i is a list of instance weights on the i-th validation set.

    • array

    • or None

  • eval_metric (union type, optional, default None) –

    If a str, should be a built-in evaluation metric to use. See the XGBoost documentation on parameters for the available metrics.

    • array of items : string

    • or string

    • or None

    • or dict

  • early_stopping_rounds (union type, optional, default None) –

    Activates early stopping. Validation error needs to decrease at least every early_stopping_rounds round(s) to continue training.

    • integer

    • or None

  • verbose (boolean, optional, default True) – If verbose and an evaluation set is used, writes the evaluation metric measured on the validation set to stderr.

  • xgb_model (union type, optional, default None) –

    File name of a stored xgb model or a ‘Booster’ instance to be loaded before training (allows training continuation).

    • string

    • or None

  • callbacks (union type, optional, default None) –

    List of callback functions that are applied at each iteration.

    • array of items : dict

    • or None
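
A sketch of batch-wise training with partial_fit, assuming synthetic NumPy data and an illustrative batch size; each call returns a trained operator, which is reassigned rather than mutated in place:

    import numpy as np

    from lale.lib.xgboost import XGBRegressor

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 10))
    y = X @ rng.normal(size=10) + rng.normal(scale=0.1, size=1000)

    op = XGBRegressor(n_estimators=50)
    batch_size = 200  # illustrative batch size
    for start in range(0, X.shape[0], batch_size):
        # Each call continues training from the state produced by earlier batches.
        op = op.partial_fit(X[start:start + batch_size], y[start:start + batch_size])

    predictions = op.predict(X[:5])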

predict(X, **predict_params)

Make predictions.

Note: The predict method is not available until this operator is trained.

Once this method is available, it will have the following signature:

Parameters
  • X (array of items : array of items : float) – The dmatrix storing the input.

  • output_margin (boolean, optional, default False) – Whether to output the raw untransformed margin value.

  • ntree_limit (union type, optional) –

    Limit number of trees in the prediction; defaults to best_ntree_limit if defined

    • integer

    • or None

  • validate_features (boolean, optional, default True) – When this is True, validate that the Booster’s and data’s feature_names are identical.

Returns

result – Output data schema for predictions (target values).

Return type

array of items : float
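
A short sketch of the optional predict params, assuming a freshly trained operator on toy scikit-learn data:

    from sklearn.datasets import make_regression

    from lale.lib.xgboost import XGBRegressor

    X, y = make_regression(n_samples=200, n_features=5, random_state=0)
    trained = XGBRegressor(n_estimators=100).fit(X, y)

    # Plain predictions: an array of floats, matching the return type above.
    predictions = trained.predict(X)

    # Raw untransformed margin values, via the output_margin predict param.
    margins = trained.predict(X, output_margin=True)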