- Command-line output:
- ====================
- Decision tree
- Train and evaluate using a decision tree. Given a dataset containing numeric
- features and associated labels for each point in the dataset, this program can
- train a decision tree on that data.
- The training file and associated labels are specified with the
- '--training_file' and '--labels_file' parameters, respectively. The labels
- should be in the range [0, num_classes - 1]. Optionally, if '--labels_file' is
- not specified, the labels are assumed to be the last dimension of the training
- dataset.
- When a model is trained, the '--output_model_file' output parameter may be
- used to save the trained model. A model may be loaded for predictions with
- the '--input_model_file' parameter. The '--input_model_file' parameter may
- not be specified when the '--training_file' parameter is specified. The
- '--minimum_leaf_size' parameter specifies the minimum number of training
- points that must fall into each leaf for it to be split. If
- '--print_training_error' is specified, the training error will be printed.
- Test data may be specified with the '--test_file' parameter, and if
- performance numbers are desired for that test set, labels may be specified
- with the '--test_labels_file' parameter. Predictions for each test point may
- be saved via the '--predictions_file' output parameter. Class probabilities
- for each prediction may be saved with the '--probabilities_file' output
- parameter.
- For example, to train a decision tree with a minimum leaf size of 20 on the
- dataset contained in 'data.csv' with labels 'labels.csv', saving the output
- model to 'tree.bin' and printing the training error, one could call
- $ decision_tree --training_file data.csv --labels_file labels.csv \
-     --output_model_file tree.bin --minimum_leaf_size 20 --print_training_error
- Then, to use that model to classify points in 'test_set.csv' and print the
- test error given the labels 'test_labels.csv' using that model, while saving
- the predictions for each point to 'predictions.csv', one could call
- $ decision_tree --input_model_file tree.bin --test_file test_set.csv \
-     --test_labels_file test_labels.csv --predictions_file predictions.csv
- Optional input options:
- --help (-h) [bool] Default help info.
- --info [string] Get help on a specific module or option.
- Default value ''.
- --input_model_file (-m) [string]
- Pre-trained decision tree, to be used with test
- points. Default value ''.
- --labels_file (-l) [string] Training labels. Default value ''.
- --minimum_leaf_size (-n) [int]
- Minimum number of points in a leaf. Default
- value 20.
- --print_training_error (-e) [bool]
- Print the training error.
- --test_file (-T) [string] Matrix of test points. Default value ''.
- --test_labels_file (-L) [string]
- Test point labels, if accuracy calculation is
- desired. Default value ''.
- --training_file (-t) [string]
- Matrix of training points. Default value ''.
- --verbose (-v) [bool] Display informational messages and the full list
- of parameters and timers at the end of
- execution.
- --version (-V) [bool] Display the version of mlpack.
- Optional output options:
- --output_model_file (-M) [string]
- Output for trained decision tree. Default value
- ''.
- --predictions_file (-p) [string]
- Class predictions for each test point. Default
- value ''.
- --probabilities_file (-P) [string]
- Class probabilities for each test point.
- Default value ''.
- For further information, including relevant papers, citations, and theory,
- consult the documentation found at http://www.mlpack.org or included with your
- distribution of mlpack.
- ==========================
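The help text above requires labels in the range [0, num_classes - 1] and, when no labels file is given, takes them from the last dimension of the training data. The following is a minimal sketch of that preparation step in plain Python. It does not call mlpack; the helper names `split_labels` and `normalize_labels` are hypothetical, and it assumes one data point per row, so that the "last dimension" is the last value of each row.

```python
def split_labels(rows):
    """Separate the last value of every row as that row's label (assumed layout)."""
    features = [row[:-1] for row in rows]
    raw_labels = [row[-1] for row in rows]
    return features, raw_labels


def normalize_labels(raw_labels):
    """Map arbitrary label values onto the integers 0 .. num_classes - 1."""
    classes = sorted(set(raw_labels))
    index_of = {value: i for i, value in enumerate(classes)}
    return [index_of[value] for value in raw_labels], classes


# Hypothetical toy data: two features followed by a label in the last position.
rows = [
    [5.1, 3.5, 2.0],
    [4.9, 3.0, 2.0],
    [6.2, 2.9, 7.0],
    [5.9, 3.0, 4.0],
]
features, raw_labels = split_labels(rows)
labels, classes = normalize_labels(raw_labels)
print(labels)   # labels now lie in [0, num_classes - 1]
print(classes)  # original label value for each class index
```

Keeping the `classes` list makes it possible to translate predicted class indices back to the original label values afterwards.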
- Python binding output:
- ======================
- >>> help(decision_tree)
- Help on built-in function decision_tree in module mlpack.decision_tree:
- decision_tree(...)
- Decision tree
- Train and evaluate using a decision tree. Given a dataset containing numeric
- features and associated labels for each point in the dataset, this program can
- train a decision tree on that data.
- The training file and associated labels are specified with the 'training' and
- 'labels' parameters, respectively. The labels should be in the range [0,
- num_classes - 1]. Optionally, if 'labels' is not specified, the labels are
- assumed to be the last dimension of the training dataset.
- When a model is trained, the 'output_model' output parameter may be used to
- save the trained model. A model may be loaded for predictions with the
- 'input_model' parameter. The 'input_model' parameter may not be specified
- when the 'training' parameter is specified. The 'minimum_leaf_size' parameter
- specifies the minimum number of training points that must fall into each leaf
- for it to be split. If 'print_training_error' is specified, the training
- error will be printed.
- Test data may be specified with the 'test' parameter, and if performance
- numbers are desired for that test set, labels may be specified with the
- 'test_labels' parameter. Predictions for each test point may be saved via the
- 'predictions' output parameter. Class probabilities for each prediction may
- be saved with the 'probabilities' output parameter.
- For example, to train a decision tree with a minimum leaf size of 20 on the
- dataset contained in 'data' with labels 'labels', saving the output model to
- 'tree' and printing the training error, one could call
- >>> output = decision_tree(training=data, labels=labels, minimum_leaf_size=20,
- ...     print_training_error=True)
- >>> tree = output['output_model']
- Then, to use that model to classify points in 'test_set' and print the test
- error given the labels 'test_labels' using that model, while saving the
- predictions for each point to 'predictions', one could call
- >>> output = decision_tree(input_model=tree, test=test_set, test_labels=test_labels)
- >>> predictions = output['predictions']
- Parameters:
- - input_model (DecisionTreeModelType): Pre-trained decision tree, to be
- used with test points.
- - labels (row vector): Training labels.
- - minimum_leaf_size (int): Minimum number of points in a leaf.
- - print_training_error (bool): Print the training error.
- - test (matrix): Matrix of test points.
- - test_labels (row vector): Test point labels, if accuracy calculation
- is desired.
- - training (matrix): Matrix of training points.
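The help text says that supplying test labels yields performance numbers for the test set, but it does not state which metric is printed. As a rough sketch, assuming the number of interest is the fraction of misclassified test points, the same figure can be computed from the returned predictions. The function `error_rate` and the sample values are hypothetical and do not come from mlpack.

```python
def error_rate(predictions, true_labels):
    """Return the fraction of points whose predicted class differs from its label."""
    if len(predictions) != len(true_labels):
        raise ValueError("predictions and labels must have the same length")
    if not predictions:
        raise ValueError("at least one prediction is required")
    wrong = sum(1 for p, t in zip(predictions, true_labels) if p != t)
    return wrong / len(predictions)


# Hypothetical values standing in for output['predictions'] and test_labels.
predictions = [0, 1, 1, 2, 0]
test_labels = [0, 1, 2, 2, 1]
err = error_rate(predictions, test_labels)
print(f"test error: {err:.2f}, accuracy: {1 - err:.2f}")
```

With real data, `predictions` would be the vector saved through the 'predictions' output parameter, converted to a plain list if necessary.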