lale.lib.sklearn.decision_tree_classifier module

class lale.lib.sklearn.decision_tree_classifier.DecisionTreeClassifier(*, criterion='gini', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None, random_state=None, max_leaf_nodes=None, min_impurity_decrease=0.0, class_weight, ccp_alpha=0.0, monotonic_cst=None)

Bases: PlannedIndividualOp

Decision tree classifier from scikit-learn.

This documentation is auto-generated from JSON schemas.

Parameters

criterion (‘gini’ or ‘entropy’, optional, default ‘gini’) – The function to measure the quality of a split.
splitter (‘best’ or ‘random’, optional, default ‘best’) – The strategy used to choose the split at each node.
max_depth (union type, optional, default None) –
The maximum depth of the tree.
- integer, >=1, >=3 for optimizer, <=5 for optimizer
- or None
  
  Nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.
min_samples_split (union type, optional, default 2) –
The minimum number of samples required to split an internal node.
- integer, >=2, <=’X/maxItems’, not for optimizer
  
  Consider min_samples_split as the minimum number.
- or float, >0.0, >=0.01 for optimizer, <=1.0, <=0.5 for optimizer
  
  min_samples_split is a fraction and ceil(min_samples_split * n_samples) is the minimum number of samples for each split.
min_samples_leaf (union type, optional, default 1) –
The minimum number of samples required to be at a leaf node.
- integer, >=1, <=’X/maxItems’, not for optimizer
  
  Consider min_samples_leaf as the minimum number.
- or float, >0.0, <=0.5, default 0.05
  
  min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) is the minimum number of samples for each node.
min_weight_fraction_leaf (float, >=0.0, <=0.5, optional, not for optimizer, default 0.0) – The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.
max_features (union type, optional, default None) –
The number of features to consider when looking for the best split.
- integer, >=2, <=’X/items/maxItems’, not for optimizer
  
  Consider max_features features at each split.
- or float, >0.0, >=0.01 for optimizer, <=1.0, uniform distribution, default 0.5
  
  max_features is a fraction and int(max_features * n_features) features are considered at each split.
- or ‘sqrt’, ‘log2’, or None
random_state (union type, optional, not for optimizer, default None) –
Seed of pseudo-random number generator.
- numpy.random.RandomState
- or None
  
  RandomState used by np.random
- or integer
  
  Explicit seed.
max_leaf_nodes (union type, optional, not for optimizer, default None) –
Grow a tree with max_leaf_nodes in best-first fashion.
- integer, >=1, >=3 for optimizer, <=1000 for optimizer
- or None
  
  Unlimited number of leaf nodes.
min_impurity_decrease (float, >=0.0, <=10.0 for optimizer, optional, not for optimizer, default 0.0) – A node will be split if this split induces a decrease of the impurity greater than or equal to this value.
class_weight (union type, not for optimizer) –
Weights associated with classes in the form {class_label: weight}.
- dict
- or array of items : dict
- or ‘balanced’ or None
ccp_alpha (float, >=0.0, <=0.1 for optimizer, optional, not for optimizer, default 0.0) – Complexity parameter used for Minimal Cost-Complexity Pruning. The subtree with the largest cost complexity that is smaller than ccp_alpha will be chosen. By default, no pruning is performed.
monotonic_cst (union type, optional, not for optimizer, default None) –
Indicates the monotonicity constraint to enforce on each feature. Monotonicity constraints are not supported for multioutput regressions (i.e. when n_outputs > 1) or regressions trained on data with missing values.
- array of items : -1, 0, or 1
  
  array-like of int of shape (n_features)
- or None
  
  No constraints are applied.
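
The fractional forms of min_samples_split, min_samples_leaf, and max_features described above can be illustrated with a small sketch. It only mirrors the formulas stated in the parameter descriptions (ceil for the sample thresholds, int for the feature count). The helper names resolve_min_samples and resolve_max_features are hypothetical and are not part of lale or scikit-learn, and the library itself may apply further checks that are not shown here.

```python
import math


def resolve_min_samples(value, n_samples):
    """Hypothetical helper: turn an integer-or-fraction setting for
    min_samples_split / min_samples_leaf into an absolute sample count."""
    if isinstance(value, int):
        # Integer form: the value already is the minimum number.
        return value
    # Float form: a fraction of the training set, rounded up.
    return math.ceil(value * n_samples)


def resolve_max_features(value, n_features):
    """Hypothetical helper: turn an integer-or-fraction max_features
    setting into a feature count (string options are not covered)."""
    if isinstance(value, int):
        # Integer form: consider exactly this many features per split.
        return value
    # Float form: a fraction of the features, truncated.
    return int(value * n_features)


n_samples, n_features = 150, 20  # assumed dataset size, for illustration only

print(resolve_min_samples(2, n_samples))      # 2
print(resolve_min_samples(0.05, n_samples))   # 8, since ceil(7.5) = 8
print(resolve_max_features(0.5, n_features))  # 10
```

Under these assumptions, the fractional settings scale with the dataset, while the integer settings stay fixed regardless of its size.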

fit(X, y=None, **fit_params)

Train the operator.

Note: The fit method is not available until this operator is trainable.

Once this method is available, it will have the following signature:

Parameters

X (array) –
The outer array is over samples aka rows.
- items : array of items : float
  
  The inner array is over features aka columns.
y (union type) –
The target classes.
- array of items : float
- or array of items : string
- or array of items : boolean
sample_weight (union type, optional) –
Sample weights.
- array of items : float
- or None
  
  Samples are equally weighted.
check_input (boolean, optional, default True) – Allows bypassing several input checks.
X_idx_sorted (union type, optional, default None) –
The indexes of the sorted training input samples. If many tree
- array of items : array of items : float
- or None
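
The layout that the fit parameters describe can be sketched with plain Python lists: X is an outer list over samples whose inner lists hold feature values, y holds one label per sample, and sample_weight, when given, holds one weight per sample. The toy data and the helper describe_fit_inputs are hypothetical and exist only to make the shapes concrete. They are not part of lale.

```python
# Hypothetical toy data, laid out the way the fit parameters describe it.
X = [
    [5.1, 3.5],  # sample 0: one float per feature
    [4.9, 3.0],  # sample 1
    [6.2, 2.9],  # sample 2
    [5.9, 3.2],  # sample 3
]
y = ["setosa", "setosa", "versicolor", "versicolor"]  # one label per sample
sample_weight = [1.0, 1.0, 2.0, 2.0]                  # one weight per sample


def describe_fit_inputs(X, y, sample_weight=None):
    """Hypothetical helper: check that the inputs line up and
    return (n_samples, n_features)."""
    n_samples = len(X)
    n_features = len(X[0])
    if any(len(row) != n_features for row in X):
        raise ValueError("every sample needs the same number of features")
    if len(y) != n_samples:
        raise ValueError("y needs one label per sample")
    if sample_weight is not None and len(sample_weight) != n_samples:
        raise ValueError("sample_weight needs one weight per sample")
    return n_samples, n_features


print(describe_fit_inputs(X, y, sample_weight))  # (4, 2)
```

Inputs shaped like this would then be passed as fit(X, y, sample_weight=sample_weight), following the signature above.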

predict(X, **predict_params)

Make predictions.

Note: The predict method is not available until this operator is trained.

Once this method is available, it will have the following signature:

Parameters

X (array, optional) –
The outer array is over samples aka rows.
- items : array of items : float
  
  The inner array is over features aka columns.
check_input (boolean, optional, default True) – Allows bypassing several input checks.

Returns

result – The predicted classes.

- array of items : float
- or array of items : string
- or array of items : boolean

Return type

union type

predict_proba(X)

Probability estimates for all classes.

Note: The predict_proba method is not available until this operator is trained.

Once this method is available, it will have the following signature:

Parameters

X (array, optional) –
The outer array is over samples aka rows.
- items : array of items : float
  
  The inner array is over features aka columns.
check_input (boolean, optional) – Run check_array on X.

Returns

result – The outer array is over samples aka rows.

items : array of items : float

The inner array has items corresponding to each class.

Return type

array
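
To make the shape of the predict_proba result concrete, the following sketch reads a result of the described form: an outer list over samples, with one probability per class in each inner list. The probabilities, the class labels, and their order are made-up assumptions, and most_likely_classes is a hypothetical helper, not a lale function.

```python
# Hypothetical predict_proba result for three samples and two classes.
classes = ["setosa", "versicolor"]  # assumed class order, one per column
proba = [
    [1.0, 0.0],    # sample 0
    [0.25, 0.75],  # sample 1
    [0.5, 0.5],    # sample 2 (a tie)
]


def most_likely_classes(proba, classes):
    """Hypothetical helper: pick the class with the highest probability
    in each row; on a tie, the first class in column order wins."""
    picks = []
    for row in proba:
        best_column = max(range(len(row)), key=row.__getitem__)
        picks.append(classes[best_column])
    return picks


print(most_likely_classes(proba, classes))
# ['setosa', 'versicolor', 'setosa']
```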