lale.lib.xgboost.xgb_regressor module

class lale.lib.xgboost.xgb_regressor.XGBRegressor(*, max_depth=None, learning_rate=None, n_estimators, verbosity=None, silent=None, objective='reg:linear', booster=None, tree_method=None, n_jobs=1, nthread=None, gamma=None, min_child_weight=None, max_delta_step=None, subsample=None, colsample_bytree=None, colsample_bylevel=None, colsample_bynode=None, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, base_score=None, random_state=0, missing=nan, importance_type='gain', seed=None, monotone_constraints=None, interaction_constraints=None, num_parallel_tree=None, validate_parameters=None, gpu_id=None, enable_categorical=False, predictor=None, max_leaves=None, max_bin=None, grow_policy=None, sampling_method=None, max_cat_to_onehot=None, eval_metric=None, early_stopping_rounds=None, callbacks=None, feature_types, max_cat_threshold=None, device=None, multi_strategy=None)

Bases: PlannedIndividualOp

XGBRegressor gradient boosted decision trees.

This documentation is auto-generated from JSON schemas.
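
A minimal usage sketch of the operator's lifecycle (construct, fit, predict), assuming toy data from scikit-learn's make_regression; the hyperparameter values are only illustrative:

    from sklearn.datasets import make_regression

    from lale.lib.xgboost import XGBRegressor

    # Toy regression data, for illustration only.
    X, y = make_regression(n_samples=200, n_features=5, random_state=0)

    # Bind some hyperparameters to obtain a trainable operator, then fit and predict.
    trainable = XGBRegressor(n_estimators=100, max_depth=4, learning_rate=0.1)
    trained = trainable.fit(X, y)
    predictions = trained.predict(X)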

Parameters
  • max_depth (union type, default None) –

    Maximum tree depth for base learners.

    • integer, >=0, >=1 for optimizer, <=7 for optimizer, uniform distribution

    • or None, not for optimizer

  • learning_rate (union type, default None) –

    Boosting learning rate (xgb’s “eta”)

    • float, >=0.02 for optimizer, <=1 for optimizer, loguniform distribution

    • or None, not for optimizer

  • n_estimators (union type) –

    Number of trees to fit.

    • integer, >=50 for optimizer, <=1000 for optimizer, default 200

    • or None

  • verbosity (union type, >=0, <=3, not for optimizer, default None) –

    The degree of verbosity.

    • integer

    • or None

  • silent (union type, optional, not for optimizer, default None) –

    Deprecated and replaced with verbosity; kept for backward compatibility.

    • boolean

    • or None

  • objective (union type, not for optimizer, default 'reg:linear') –

    Specify the learning task and the corresponding learning objective or a custom objective function to be used.

    • ’reg:linear’, ‘reg:logistic’, ‘reg:gamma’, ‘reg:tweedie’, or ‘reg:squarederror’

    • or callable

  • booster (‘gbtree’, ‘gblinear’, ‘dart’, or None, not for optimizer, default None) – Specify which booster to use.

  • tree_method (‘auto’, ‘exact’, ‘approx’, ‘hist’, ‘gpu_hist’, or None, not for optimizer, default None) – Specify which tree method to use. Defaults to auto. If this parameter is set to default, XGBoost will choose the most conservative option available. Refer to https://xgboost.readthedocs.io/en/latest/parameter.html.

  • n_jobs (union type, not for optimizer, default 1) –

    Number of parallel threads used to run xgboost. (replaces nthread)

    • integer

    • or None

  • nthread (union type, optional, not for optimizer, default None) –

    Number of parallel threads used to run xgboost. Deprecated; please use n_jobs instead.

    • integer

    • or None

  • gamma (union type, default None) –

    Minimum loss reduction required to make a further partition on a leaf node of the tree.

    • float, >=0, <=1.0 for optimizer

    • or None, not for optimizer

  • min_child_weight (union type, default None) –

    Minimum sum of instance weight (hessian) needed in a child.

    • integer, >=2 for optimizer, <=20 for optimizer, uniform distribution

    • or None, not for optimizer

  • max_delta_step (union type, not for optimizer, default None) –

    Maximum delta step we allow each tree’s weight estimation to be.

    • None

    • or integer

  • subsample (union type, default None) –

    Subsample ratio of the training instance.

    • float, >0, >=0.01 for optimizer, <=1.0 for optimizer, uniform distribution

    • or None, not for optimizer

  • colsample_bytree (union type, not for optimizer, default None) –

    Subsample ratio of columns when constructing each tree.

    • float, >0, >=0.1 for optimizer, <=1, <=1.0 for optimizer, uniform distribution

    • or None, not for optimizer

  • colsample_bylevel (union type, not for optimizer, default None) –

    Subsample ratio of columns for each split, in each level.

    • float, >0, >=0.1 for optimizer, <=1, <=1.0 for optimizer, uniform distribution

    • or None, not for optimizer

  • colsample_bynode (union type, not for optimizer, default None) –

    Subsample ratio of columns for each split.

    • float, >0, <=1

    • or None, not for optimizer

  • reg_alpha (union type, default None) –

    L1 regularization term on weights

    • float, >=0 for optimizer, <=1 for optimizer, uniform distribution

    • or None, not for optimizer

  • reg_lambda (union type, default None) –

    L2 regularization term on weights

    • float, >=0.1 for optimizer, <=1 for optimizer, uniform distribution

    • or None, not for optimizer

  • scale_pos_weight (union type, not for optimizer, default None) –

    Balancing of positive and negative weights.

    • float

    • or None, not for optimizer

  • base_score (union type, not for optimizer, default None) –

    The initial prediction score of all instances, global bias.

    • float

    • or None, not for optimizer

  • random_state (union type, not for optimizer, default 0) –

    Random number seed. (replaces seed)

    • integer

    • or None

  • missing (union type, not for optimizer, default nan) –

    Value in the data which needs to be treated as a missing value. If None, defaults to np.nan.

    • float

    • or None or nan

  • importance_type (‘gain’, ‘weight’, ‘cover’, ‘total_gain’, ‘total_cover’, or None, optional, not for optimizer, default ‘gain’) – The feature importance type for the feature_importances_ property.

  • seed (any type, optional, not for optimizer, default None) – Deprecated and replaced with random_state; kept for backward compatibility.

  • monotone_constraints (union type, optional, not for optimizer, default None) –

    Constraint of variable monotonicity.

    • None

    • or string

  • interaction_constraints (union type, optional, not for optimizer, default None) –

    Constraints for interaction representing permitted interactions. The constraints must be specified in the form of a nested list, e.g. [[0, 1], [2, 3, 4]], where each inner list is a group of indices of features that are allowed to interact with each other.

    • None

    • or string

  • num_parallel_tree (union type, optional, not for optimizer, default None) –

    Used for boosting random forest.

    • None

    • or integer

  • validate_parameters (union type, optional, not for optimizer, default None) –

    Give warnings for unknown parameters.

    • None

    • or boolean

    • or integer

  • gpu_id (union type, optional, not for optimizer, default None) –

    Device ordinal.

    • integer

    • or None

  • enable_categorical (boolean, optional, not for optimizer, default False) – Experimental support for categorical data. Do not set to true unless you are interested in development. Only valid when gpu_hist and dataframe are used.

  • predictor (union type, optional, not for optimizer, default None) –

    Force XGBoost to use a specific predictor; available choices are [cpu_predictor, gpu_predictor].

    • string

    • or None

  • max_leaves (union type, optional, not for optimizer, default None) –

    Maximum number of leaves; 0 indicates no limit.

    • integer

    • or None, not for optimizer

  • max_bin (union type, optional, not for optimizer, default None) –

    If using histogram-based algorithm, maximum number of bins per feature.

    • integer

    • or None, not for optimizer

  • grow_policy (0, 1, ‘depthwise’, ‘lossguide’, or None, optional, not for optimizer, default None) –

    Tree growing policy.

    0 or depthwise: favor splitting at nodes closest to the root, i.e. grow depth-wise. 1 or lossguide: favor splitting at nodes with highest loss change.

  • sampling_method (‘uniform’, ‘gradient_based’, or None, optional, not for optimizer, default None) –

    Sampling method. Used only by the gpu_hist tree method.

    • uniform: select random training instances uniformly.

    • gradient_based: select random training instances with higher probability when the gradient and hessian are larger. (cf. CatBoost)

  • max_cat_to_onehot (union type, optional, not for optimizer, default None) –

    A threshold for deciding whether XGBoost should use one-hot encoding based splits for categorical data.

    • integer

    • or None

  • eval_metric (union type, optional, not for optimizer, default None) –

    Metric used for monitoring the training result and early stopping.

    • string

    • or array of items : string

    • or array of items : callable

    • or None

  • early_stopping_rounds (union type, optional, not for optimizer, default None) –

    Activates early stopping.

    Validation metric needs to improve at least once in every early_stopping_rounds round(s) to continue training.

    • integer

    • or None

  • callbacks (union type, optional, not for optimizer, default None) –

    List of callback functions that are applied at end of each iteration.

    It is possible to use predefined callbacks by using Callback API.

    • array of items : callable

    • or None

  • feature_types (Any, optional, not for optimizer) – Used for specifying feature types without constructing a dataframe. See DMatrix for details.

  • max_cat_threshold (union type, optional, not for optimizer, default None) –

    Maximum number of categories considered for each split.

    Used only by partition-based splits for preventing over-fitting. Also, enable_categorical needs to be set to have categorical feature support. See Categorical Data and Parameters for Categorical Feature for details.

    • integer, >=0, >=1 for optimizer, <=10 for optimizer, uniform distribution

    • or None

  • device (union type, optional, not for optimizer, default None) –

    Device ordinal

    • ’cpu’, ‘cuda’, or ‘gpu’

    • or None

  • multi_strategy (union type, optional, not for optimizer, default None) –

    The strategy used for training multi-target models, including multi-target regression and multi-class classification. See Multiple Outputs for more information.

    • ’one_output_per_tree’

      One model for each target.

    • or ‘multi_output_tree’

      Use multi-target trees.

    • or None
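
Hyperparameter ranges tagged "for optimizer" above define the search space when the operator is left planned, i.e. when its hyperparameters stay unbound. A sketch of such a search, assuming Lale's Hyperopt optimizer and illustrative values for max_evals, cv, and scoring:

    from sklearn.datasets import make_regression

    from lale.lib.lale import Hyperopt
    from lale.lib.xgboost import XGBRegressor

    X, y = make_regression(n_samples=300, n_features=10, random_state=0)

    # Leaving hyperparameters unbound keeps the operator planned, so Hyperopt
    # searches the ranges tagged "for optimizer" in the schema above.
    optimizer = Hyperopt(estimator=XGBRegressor, max_evals=10, cv=3, scoring="r2")
    trained = optimizer.fit(X, y)
    predictions = trained.predict(X)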

fit(X, y=None, **fit_params)

Train the operator.

Note: The fit method is not available until this operator is trainable.

Once this method is available, it will have the following signature:

Parameters
  • X (array of items : array of items : float) – Feature matrix

  • y (array of items : float) – Labels

  • sample_weight (union type, optional, default None) –

    Weight for each instance

    • array of items : float

    • or None

  • eval_set (union type, optional, default None) –

    A list of (X, y) pairs to use as a validation set for early stopping.

    • array

    • or None

  • sample_weight_eval_set (union type, optional, default None) –

    A list of the form [L_1, L_2, …, L_n], where each L_i is a list of instance weights on the i-th validation set.

    • array

    • or None

  • eval_metric (union type, optional, default None) –

    If a str, should be a built-in evaluation metric to use. See the XGBoost documentation on parameters for the available metrics.

    • array of items : string

    • or string

    • or None

    • or dict

  • early_stopping_rounds (union type, optional, default None) –

    Activates early stopping. Validation error needs to decrease at least every early_stopping_rounds round(s) to continue training.

    • integer

    • or None

  • verbose (boolean, optional, default True) – If verbose and an evaluation set is used, writes the evaluation metric measured on the validation set to stderr.

  • xgb_model (union type, optional, default None) –

    File name of a stored xgb model or a ‘Booster’ instance to be loaded before training (allows training continuation).

    • string

    • or None

  • callbacks (union type, optional, default None) –

    List of callback functions that are applied at each iteration.

    • array of items : dict

    • or None
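
A sketch of passing these fit params through to the underlying XGBoost fit, assuming an illustrative train/validation split and early-stopping budget:

    from sklearn.datasets import make_regression
    from sklearn.model_selection import train_test_split

    from lale.lib.xgboost import XGBRegressor

    X, y = make_regression(n_samples=500, n_features=10, random_state=0)
    X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

    trainable = XGBRegressor(n_estimators=200, learning_rate=0.1)

    # Early stopping monitors the validation pair supplied via eval_set.
    trained = trainable.fit(
        X_train,
        y_train,
        eval_set=[(X_valid, y_valid)],
        early_stopping_rounds=10,
        verbose=False,
    )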

partial_fit(X, y=None, **fit_params)

Incremental fit to train the operator on a batch of samples.

Note: The partial_fit method is not available until this operator is trainable.

Once this method is available, it will have the following signature:

Parameters
  • X (array of items : array of items : float) – Feature matrix

  • y (array of items : float) – Labels

  • sample_weight (union type, optional, default None) –

    Weight for each instance

    • array of items : float

    • or None

  • eval_set (union type, optional, default None) –

    A list of (X, y) pairs to use as a validation set for early stopping.

    • array

    • or None

  • sample_weight_eval_set (union type, optional, default None) –

    A list of the form [L_1, L_2, …, L_n], where each L_i is a list of instance weights on the i-th validation set.

    • array

    • or None

  • eval_metric (union type, optional, default None) –

    If a str, should be a built-in evaluation metric to use. See the XGBoost documentation on parameters for the available metrics.

    • array of items : string

    • or string

    • or None

    • or dict

  • early_stopping_rounds (union type, optional, default None) –

    Activates early stopping. Validation error needs to decrease at least every early_stopping_rounds round(s) to continue training.

    • integer

    • or None

  • verbose (boolean, optional, default True) – If verbose and an evaluation set is used, writes the evaluation metric measured on the validation set to stderr.

  • xgb_model (union type, optional, default None) –

    File name of a stored xgb model or a ‘Booster’ instance to be loaded before training (allows training continuation).

    • string

    • or None

  • callbacks (union type, optional, default None) –

    List of callback functions that are applied at each iteration.

    • array of items : dict

    • or None
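
A sketch of batch-wise training with partial_fit, assuming synthetic NumPy data and an illustrative batch size; each call returns a trained operator, which is reassigned rather than mutated in place:

    import numpy as np

    from lale.lib.xgboost import XGBRegressor

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 10))
    y = X @ rng.normal(size=10) + rng.normal(scale=0.1, size=1000)

    op = XGBRegressor(n_estimators=50)
    batch_size = 200  # illustrative batch size
    for start in range(0, X.shape[0], batch_size):
        # Each call continues training from the state produced by earlier batches.
        op = op.partial_fit(X[start:start + batch_size], y[start:start + batch_size])

    predictions = op.predict(X[:5])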

predict(X, **predict_params)

Make predictions.

Note: The predict method is not available until this operator is trained.

Once this method is available, it will have the following signature:

Parameters
  • X (array of items : array of items : float) – The dmatrix storing the input.

  • output_margin (boolean, optional, default False) – Whether to output the raw untransformed margin value.

  • ntree_limit (union type, optional) –

    Limit number of trees in the prediction; defaults to best_ntree_limit if defined

    • integer

    • or None

  • validate_features (boolean, optional, default True) – When this is True, validate that the Booster’s and data’s feature_names are identical.

Returns

result – Output data schema for predictions (target values).

Return type

array of items : float
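
A short sketch of the optional predict params, assuming a freshly trained operator on toy scikit-learn data:

    from sklearn.datasets import make_regression

    from lale.lib.xgboost import XGBRegressor

    X, y = make_regression(n_samples=200, n_features=5, random_state=0)
    trained = XGBRegressor(n_estimators=100).fit(X, y)

    # Plain predictions: an array of floats, matching the return type above.
    predictions = trained.predict(X)

    # Raw untransformed margin values, via the output_margin predict param.
    margins = trained.predict(X, output_margin=True)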