Mar 22nd, 2018
### Enforce constness of SGMatrix / SGVector
Currently, one can always mutate a const matrix or vector by creating a shallow copy, which breaks the semantics of a const reference. We can make the copy constructors return deep copies and add move constructors. A move constructor should take ownership of the underlying memory and leave the original object uninitialized.

For example:
```
SGMatrix<T>::SGMatrix(const SGMatrix<T>& orig); // deep copy
SGMatrix<T>& SGMatrix<T>::operator=(const SGMatrix<T>&); // deep copy
SGMatrix<T>::SGMatrix(SGMatrix<T>&&); // move ctor
```
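
To make the intended semantics concrete, here is a minimal sketch. `ToyMatrix` is a hypothetical stand-in for `SGMatrix<T>` (it is not Shogun code, and it ignores reference counting); it only illustrates that a deep-copying copy constructor protects a const reference, while the move constructor hands over the buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// ToyMatrix is a hypothetical stand-in for SGMatrix<T>, not the real class.
template <typename T>
class ToyMatrix
{
public:
    ToyMatrix(std::size_t rows, std::size_t cols)
        : data_(new T[rows * cols]()), rows_(rows), cols_(cols) {}

    // Copy constructor: deep copy, so the copy never aliases the original.
    ToyMatrix(const ToyMatrix& orig)
        : data_(new T[orig.rows_ * orig.cols_]), rows_(orig.rows_), cols_(orig.cols_)
    {
        std::copy(orig.data_, orig.data_ + rows_ * cols_, data_);
    }

    // Move constructor: take over the buffer and leave the source empty.
    ToyMatrix(ToyMatrix&& orig) noexcept
        : data_(orig.data_), rows_(orig.rows_), cols_(orig.cols_)
    {
        orig.data_ = nullptr;
        orig.rows_ = orig.cols_ = 0;
    }

    // Copy assignment: deep copy via a temporary, then swap.
    ToyMatrix& operator=(const ToyMatrix& orig)
    {
        ToyMatrix tmp(orig);
        std::swap(data_, tmp.data_);
        std::swap(rows_, tmp.rows_);
        std::swap(cols_, tmp.cols_);
        return *this;
    }

    ~ToyMatrix() { delete[] data_; }

    // Column-major element access.
    T& operator()(std::size_t r, std::size_t c) { return data_[c * rows_ + r]; }
    const T& operator()(std::size_t r, std::size_t c) const { return data_[c * rows_ + r]; }

    const T* data() const { return data_; }
    bool empty() const { return data_ == nullptr; }

private:
    T* data_;
    std::size_t rows_;
    std::size_t cols_;
};

// Tries to mutate a const matrix through a copy; with a deep copy the
// original stays untouched, so the original value is returned.
inline double mutate_via_copy(const ToyMatrix<double>& m)
{
    ToyMatrix<double> copy(m);
    copy(0, 0) = 42.0;
    return m(0, 0);
}
```

With the current shallow-copy behavior, the equivalent of `mutate_via_copy` would change the caller's matrix; with deep copies it cannot.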

### Preprocessor
Preprocessors have a unified three-stage API:
```
(constructor) // initialize and set up parameters
void fit(CFeatures *); // fit to training features; preprocessors that do not require training data have an empty implementation
Some<CFeatures> apply(CFeatures *); // apply to features by creating a new instance; maybe 'transform' would be a better name
```

We will remove all preprocessor handling from `CFeatures` (I am not sure whether we should do this). To apply multiple preprocessors to a feature object, one can use a pipeline.
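
As an illustration of the three stages, here is a small sketch under simplifying assumptions: `ToyFeatures` (a plain vector) stands in for `CFeatures`, `std::shared_ptr` stands in for `Some<>`, and `MeanCenter` and `Scale` are hypothetical preprocessors, not existing Shogun classes.

```cpp
#include <cassert>
#include <memory>
#include <numeric>
#include <vector>

// Hypothetical stand-in for CFeatures.
using ToyFeatures = std::vector<double>;

class ToyPreprocessor
{
public:
    virtual ~ToyPreprocessor() = default;

    // Default: nothing to learn from the training data.
    virtual void fit(const ToyFeatures&) {}

    // Returns a new instance; the input features are left untouched.
    virtual std::shared_ptr<ToyFeatures> apply(const ToyFeatures& f) const = 0;
};

// Needs training data: learns the mean in fit() and subtracts it in apply().
class MeanCenter : public ToyPreprocessor
{
public:
    void fit(const ToyFeatures& f) override
    {
        mean_ = f.empty() ? 0.0 : std::accumulate(f.begin(), f.end(), 0.0) / f.size();
    }

    std::shared_ptr<ToyFeatures> apply(const ToyFeatures& f) const override
    {
        auto out = std::make_shared<ToyFeatures>(f);
        for (double& x : *out)
            x -= mean_;
        return out;
    }

private:
    double mean_ = 0.0;
};

// Needs no training data: fully configured by its constructor, inherits the empty fit().
class Scale : public ToyPreprocessor
{
public:
    explicit Scale(double factor) : factor_(factor) {}

    std::shared_ptr<ToyFeatures> apply(const ToyFeatures& f) const override
    {
        auto out = std::make_shared<ToyFeatures>(f);
        for (double& x : *out)
            x *= factor_;
        return out;
    }

private:
    double factor_;
};
```

Because `apply` returns a new instance instead of modifying its argument, the features themselves no longer need to know which preprocessors were applied to them.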

### Pipeline
A pipeline is a convenient wrapper around a collection of preprocessors, with a machine as the final stage.
```
class Pipeline
{
public:
    void add_preprocessor(CPreprocessor *);
    void add_machine(CMachine *);
    void fit(CFeatures *); // fit preprocessors on features one by one, then train the machine on the transformed features
    void fit(CFeatures *, CLabels *); // fit preprocessors, then fit the machine with the transformed features and training labels
    Some<CLabels> predict(CFeatures *);
};
```

A convenience function to create a pipeline, which needs some template tricks:
```
Some<Pipeline> make_pipeline(CPreprocessor *, CPreprocessor *, ..., CMachine *);
```
Or we can take a vector of preprocessors:
```
Some<Pipeline> make_pipeline(std::vector<CPreprocessor *> preprocessors, CMachine *);
```
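
One possible form of the template trick is a variadic template that peels off preprocessors one at a time until only the machine is left. The sketch below assumes hypothetical placeholder types (`ToyPreprocessor`, `ToyMachine`, `ToyPipeline`) and uses `std::shared_ptr` in place of `Some<>`; the helper `add_stages` is also an invented name.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical placeholders; the real CPreprocessor / CMachine are much richer.
struct ToyPreprocessor { virtual ~ToyPreprocessor() = default; };
struct ToyMachine { virtual ~ToyMachine() = default; };

class ToyPipeline
{
public:
    void add_preprocessor(std::shared_ptr<ToyPreprocessor> p) { stages_.push_back(std::move(p)); }
    void add_machine(std::shared_ptr<ToyMachine> m) { machine_ = std::move(m); }

    std::size_t num_preprocessors() const { return stages_.size(); }
    bool has_machine() const { return machine_ != nullptr; }

private:
    std::vector<std::shared_ptr<ToyPreprocessor>> stages_;
    std::shared_ptr<ToyMachine> machine_;
};

// Base case: the last argument must be the machine.
inline void add_stages(ToyPipeline& p, std::shared_ptr<ToyMachine> machine)
{
    p.add_machine(std::move(machine));
}

// Recursive case: take one preprocessor, then recurse on the remaining arguments.
template <typename... Rest>
void add_stages(ToyPipeline& p, std::shared_ptr<ToyPreprocessor> first, Rest... rest)
{
    p.add_preprocessor(std::move(first));
    add_stages(p, std::move(rest)...);
}

// Accepts any number of preprocessors followed by exactly one machine.
template <typename... Stages>
std::shared_ptr<ToyPipeline> make_pipeline(Stages... stages)
{
    auto p = std::make_shared<ToyPipeline>();
    add_stages(*p, std::move(stages)...);
    return p;
}
```

In this sketch, a call whose last argument is not a machine, or that has a machine in the middle, fails to compile because no `add_stages` overload matches. The vector-based overload avoids templates entirely but loses that compile-time check on the argument order.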

### View
```
Some<CFeatures> CFeatures::view(SGVector<index_t> index);
Some<CLabels> CLabels::view(SGVector<index_t> index);
```
The `view` method creates a new feature or label instance that shallow-copies the underlying data. A subset is added to the subset stack of the new instance. After this, we will make all subset APIs in `CFeatures` and `CLabels` private.

This is taken from @micmn's design; we still need to solve the issue of covariant return types.
```
// non-virtual
Some<Features> Features::view(SGVector<index_t> idx); // create a new instance

// non-virtual
Some<DenseFeatures<T>> DenseFeatures<T>::view(SGVector<index_t> idx) // do the type cast
{
    auto feats = wrap(
        static_cast<DenseFeatures<T>*>(
            Features::view(idx).get()));
    return feats;
}
```
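
The covariant-type workaround can be sketched with standard smart pointers. Everything here is an assumption made for illustration: `ToyFeatures` and `ToyDenseFeatures` are stand-ins, `std::shared_ptr` replaces `Some<>`, the virtual hook `view_impl` is a hypothetical name, and the subset stack is flattened into a single index vector.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

class ToyFeatures
{
public:
    virtual ~ToyFeatures() = default;

    // Non-virtual entry point: generic callers get a base-typed view.
    std::shared_ptr<ToyFeatures> view(const std::vector<std::size_t>& idx) const
    {
        return view_impl(idx);
    }

protected:
    // Hypothetical virtual hook that actually builds the new instance.
    virtual std::shared_ptr<ToyFeatures> view_impl(const std::vector<std::size_t>& idx) const = 0;
};

class ToyDenseFeatures : public ToyFeatures
{
public:
    explicit ToyDenseFeatures(std::shared_ptr<const std::vector<double>> data)
        : data_(std::move(data)) {}

    // Hides ToyFeatures::view and narrows the return type with a cast.
    std::shared_ptr<ToyDenseFeatures> view(const std::vector<std::size_t>& idx) const
    {
        return std::static_pointer_cast<ToyDenseFeatures>(ToyFeatures::view(idx));
    }

    std::size_t size() const { return has_subset_ ? subset_.size() : data_->size(); }
    double get(std::size_t i) const { return (*data_)[has_subset_ ? subset_[i] : i]; }

protected:
    std::shared_ptr<ToyFeatures> view_impl(const std::vector<std::size_t>& idx) const override
    {
        // The new instance shares the buffer; only the index mapping is new.
        auto v = std::make_shared<ToyDenseFeatures>(data_);
        v->has_subset_ = true;
        for (std::size_t i : idx)
            v->subset_.push_back(has_subset_ ? subset_[i] : i);
        return v;
    }

private:
    std::shared_ptr<const std::vector<double>> data_;
    std::vector<std::size_t> subset_;
    bool has_subset_ = false;
};
```

The cast in `ToyDenseFeatures::view` is safe only because `view_impl` of that class always creates a `ToyDenseFeatures`; each subclass has to uphold the same invariant.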

### Feature Iterator
There is a set of old iteration APIs in `CDotFeatures`:
```
void* get_feature_iterator(int32_t vector_index);
bool get_next_feature(int32_t& index, float64_t& value, void* iterator);
void free_feature_iterator(void* iterator);
```
These APIs expose data as raw pointers. We will adapt them to the new iterator design in `DotIterator`. This also involves refactoring LibLinear, where they are mostly used.
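
To show the kind of call site this refactoring aims for, here is a sketch of a typed, range-based iteration over one feature vector. It does not reproduce the actual `DotIterator` interface; `ToySparseVector` and `dense_dot` are hypothetical names used only for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical sparse feature vector: a list of (feature index, value) pairs.
class ToySparseVector
{
public:
    using Entry = std::pair<std::int32_t, double>;
    using const_iterator = std::vector<Entry>::const_iterator;

    explicit ToySparseVector(std::vector<Entry> entries) : entries_(std::move(entries)) {}

    // Typed iterators replace the void* handle plus the explicit free call.
    const_iterator begin() const { return entries_.begin(); }
    const_iterator end() const { return entries_.end(); }

private:
    std::vector<Entry> entries_;
};

// Dot product with a dense weight vector, the typical inner loop of a linear solver.
// Assumes every feature index is a valid position in w.
inline double dense_dot(const ToySparseVector& x, const std::vector<double>& w)
{
    double sum = 0.0;
    for (const auto& entry : x)
        sum += entry.second * w[entry.first];
    return sum;
}
```

Compared with the `get_feature_iterator` / `get_next_feature` / `free_feature_iterator` triple, there is no untyped handle to leak and the loop works on a const object.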

### Refactor non-const methods of features
We could start with `CDotFeatures`, which is the superclass of many other feature types. It has many non-const methods, for example:
```
virtual float64_t dot(int32_t vec_idx1, CDotFeatures* df, int32_t vec_idx2)=0;
virtual void add_to_dense_vec(float64_t alpha, int32_t vec_idx1, float64_t* vec2, int32_t vec2_len, bool abs_val=false)=0;
```
They are non-const because they call other non-const methods to get the actual feature vector, e.g. `get_feature_vector(int32_t num, int32_t& len, bool& dofree)` in the `CDenseFeatures` case, which may compute the feature vector on the fly (using the subclass implementation) and then cache it. We can make the cache mutable so that these methods can be const.

We will add locks to `CCache` for thread safety, enabling concurrent access to features.
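
The two ideas combine naturally: a `mutable` cache guarded by a `mutable` mutex lets `dot` be const and safe to call from several threads. The sketch below is not `CCache`; `ToyCachedFeatures` and its on-the-fly computation are hypothetical, and a single coarse lock is assumed for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <unordered_map>
#include <vector>

class ToyCachedFeatures
{
public:
    explicit ToyCachedFeatures(std::size_t dim) : dim_(dim) {}

    // const, although the first access to a vector fills the cache.
    std::vector<double> get_feature_vector(std::size_t num) const
    {
        std::lock_guard<std::mutex> guard(lock_);
        auto it = cache_.find(num);
        if (it == cache_.end())
        {
            ++num_computed_;
            it = cache_.emplace(num, compute(num)).first;
        }
        // Returned by value so that no reference into the cache escapes the lock.
        return it->second;
    }

    // Can now be const because it only calls const methods.
    double dot(std::size_t a, std::size_t b) const
    {
        const std::vector<double> va = get_feature_vector(a);
        const std::vector<double> vb = get_feature_vector(b);
        double sum = 0.0;
        for (std::size_t i = 0; i < dim_; ++i)
            sum += va[i] * vb[i];
        return sum;
    }

    std::size_t num_computed() const
    {
        std::lock_guard<std::mutex> guard(lock_);
        return num_computed_;
    }

private:
    // Stand-in for an expensive on-the-fly computation: vector num is (num, ..., num).
    std::vector<double> compute(std::size_t num) const
    {
        return std::vector<double>(dim_, static_cast<double>(num));
    }

    std::size_t dim_;
    mutable std::mutex lock_;
    mutable std::unordered_map<std::size_t, std::vector<double>> cache_;
    mutable std::size_t num_computed_ = 0;
};
```

Returning copies keeps the sketch simple but costs an allocation per access; a real implementation would more likely hand out reference-counted vectors.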