Command-line output:
====================

Decision tree

Train and evaluate using a decision tree. Given a dataset containing numeric
features and associated labels for each point in the dataset, this program can
train a decision tree on that data.

The training file and associated labels are specified with the
'--training_file' and '--labels_file' parameters, respectively. The labels
should be in the range [0, num_classes - 1]. Optionally, if '--labels_file' is
not specified, the labels are assumed to be the last dimension of the training
dataset.

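For instance, if the labels are stored as the last dimension of the training
file itself, '--labels_file' can simply be omitted; in this sketch the file
name 'data_with_labels.csv' is only illustrative:

$ decision_tree --training_file data_with_labels.csv --output_model_file tree.bin
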
When a model is trained, the '--output_model_file' output parameter may be
used to save the trained model. A model may be loaded for predictions with
the '--input_model_file' parameter. The '--input_model_file' parameter may
not be specified when the '--training_file' parameter is specified. The
'--minimum_leaf_size' parameter specifies the minimum number of training
points that must fall into each leaf for it to be split. If
'--print_training_error' is specified, the training error will be printed.

Test data may be specified with the '--test_file' parameter, and if
performance numbers are desired for that test set, labels may be specified
with the '--test_labels_file' parameter. Predictions for each test point may
be saved via the '--predictions_file' output parameter. Class probabilities
for each prediction may be saved with the '--probabilities_file' output
parameter.

For example, to train a decision tree with a minimum leaf size of 20 on the
dataset contained in 'data.csv' with labels 'labels.csv', saving the output
model to 'tree.bin' and printing the training error, one could call

$ decision_tree --training_file data.csv --labels_file labels.csv
--output_model_file tree.bin --minimum_leaf_size 20 --print_training_error

Then, to use that model to classify points in 'test_set.csv' and print the
test error given the labels 'test_labels.csv', while saving the predictions
for each point to 'predictions.csv', one could call

$ decision_tree --input_model_file tree.bin --test_file test_set.csv
--test_labels_file test_labels.csv --predictions_file predictions.csv

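Class probabilities can be saved in the same way; in this sketch the output
file name 'probabilities.csv' is only illustrative:

$ decision_tree --input_model_file tree.bin --test_file test_set.csv
--probabilities_file probabilities.csv
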
Optional input options:

  --help (-h) [bool]              Default help info.
  --info [string]                 Get help on a specific module or option.
                                  Default value ''.
  --input_model_file (-m) [string]
                                  Pre-trained decision tree, to be used with
                                  test points. Default value ''.
  --labels_file (-l) [string]     Training labels. Default value ''.
  --minimum_leaf_size (-n) [int]  Minimum number of points in a leaf. Default
                                  value 20.
  --print_training_error (-e) [bool]
                                  Print the training error.
  --test_file (-T) [string]       Matrix of test points. Default value ''.
  --test_labels_file (-L) [string]
                                  Test point labels, if accuracy calculation is
                                  desired. Default value ''.
  --training_file (-t) [string]   Matrix of training points. Default value ''.
  --verbose (-v) [bool]           Display informational messages and the full
                                  list of parameters and timers at the end of
                                  execution.
  --version (-V) [bool]           Display the version of mlpack.

Optional output options:

  --output_model_file (-M) [string]
                                  Output for trained decision tree. Default
                                  value ''.
  --predictions_file (-p) [string]
                                  Class predictions for each test point.
                                  Default value ''.
  --probabilities_file (-P) [string]
                                  Class probabilities for each test point.
                                  Default value ''.

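Any of these calls can also be combined with the '--verbose' option, which
displays informational messages and the full list of parameters and timers at
the end of execution; for example:

$ decision_tree --training_file data.csv --labels_file labels.csv --verbose
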
For further information, including relevant papers, citations, and theory,
consult the documentation found at http://www.mlpack.org or included with your
distribution of mlpack.

==========================

Python binding output:
======================

>>> help(decision_tree)
Help on built-in function decision_tree in module mlpack.decision_tree:

decision_tree(...)
Decision tree

Train and evaluate using a decision tree. Given a dataset containing numeric
features and associated labels for each point in the dataset, this program can
train a decision tree on that data.

The training file and associated labels are specified with the 'training' and
'labels' parameters, respectively. The labels should be in the range [0,
num_classes - 1]. Optionally, if 'labels' is not specified, the labels are
assumed to be the last dimension of the training dataset.

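As a brief sketch of the same behavior in Python, if the labels are already
stored as the last dimension of the training matrix, the 'labels' parameter
can simply be omitted (the array name 'data_with_labels' below is only
illustrative):

>>> output = decision_tree(training=data_with_labels, minimum_leaf_size=20)
>>> tree = output['output_model']
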
When a model is trained, the 'output_model' output parameter may be used to
save the trained model. A model may be loaded for predictions with the
'input_model' parameter. The 'input_model' parameter may not be specified
when the 'training' parameter is specified. The 'minimum_leaf_size' parameter
specifies the minimum number of training points that must fall into each leaf
for it to be split. If 'print_training_error' is specified, the training
error will be printed.

Test data may be specified with the 'test' parameter, and if performance
numbers are desired for that test set, labels may be specified with the
'test_labels' parameter. Predictions for each test point may be saved via the
'predictions' output parameter. Class probabilities for each prediction may
be saved with the 'probabilities' output parameter.

For example, to train a decision tree with a minimum leaf size of 20 on the
dataset contained in 'data' with labels 'labels', saving the output model to
'tree' and printing the training error, one could call

>>> output = decision_tree(training=data, labels=labels, minimum_leaf_size=20,
...                        print_training_error=True)
>>> tree = output['output_model']

Then, to use that model to classify points in 'test_set' and print the test
error given the labels 'test_labels', while saving the predictions for each
point to 'predictions', one could call

>>> output = decision_tree(input_model=tree, test=test_set,
...                        test_labels=test_labels)
>>> predictions = output['predictions']

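Class probabilities for the test points can be retrieved in the same way (the
variable name 'probs' below is only illustrative):

>>> output = decision_tree(input_model=tree, test=test_set)
>>> probs = output['probabilities']
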
Parameters:

- input_model (DecisionTreeModelType): Pre-trained decision tree, to be
      used with test points.
- labels (row vector): Training labels.
- minimum_leaf_size (int): Minimum number of points in a leaf.
- print_training_error (bool): Print the training error.
- test (matrix): Matrix of test points.
- test_labels (row vector): Test point labels, if accuracy calculation
      is desired.
- training (matrix): Matrix of training points.

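To make the end-to-end workflow concrete, the following short sketch builds a
small synthetic dataset with NumPy and runs the train/predict cycle described
above. It assumes the binding can be imported as 'from mlpack import
decision_tree' and that the parameter and output names match the listing
above; the data, variable names, and label rule are purely illustrative.

import numpy as np
from mlpack import decision_tree

# Synthetic two-class problem: 200 points with 5 numeric features each, where
# the (illustrative) label is 1 whenever the first feature exceeds 0.5.
data = np.random.rand(200, 5)
labels = (data[:, 0] > 0.5).astype(np.int64)  # labels in [0, num_classes - 1]

# Train a tree with a minimum leaf size of 20, printing the training error.
output = decision_tree(training=data, labels=labels, minimum_leaf_size=20,
                       print_training_error=True)
tree = output['output_model']

# Classify previously unseen points with the trained model.
test = np.random.rand(10, 5)
result = decision_tree(input_model=tree, test=test)
print(result['predictions'])      # predicted class for each test point
print(result['probabilities'])    # class probabilities for each test point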