- ### Enforce constness of SGMatrix / SGVector
- Currently, a const `SGMatrix` or `SGVector` can always be mutated by creating a shallow copy that shares the underlying memory, which breaks the semantics of a const reference. We can make the copy constructor and copy assignment operator perform a deep copy, and add move constructors. The move constructor should take ownership of the underlying memory and leave the source object uninitialized.
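- For example:
- ```
- template <class T> SGMatrix<T>::SGMatrix(const SGMatrix<T>& orig);             // copy ctor: deep copy
- template <class T> SGMatrix<T>& SGMatrix<T>::operator=(const SGMatrix<T>& orig); // copy assignment: deep copy
- template <class T> SGMatrix<T>::SGMatrix(SGMatrix<T>&& orig);                  // move ctor: takes ownership of orig's memory
- ```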
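To make the intended semantics concrete, here is a minimal sketch of a matrix type whose copy constructor and copy assignment perform a deep copy, while its move constructor transfers ownership and leaves the source empty. `Matrix<T>` is a hypothetical stand-in, not Shogun's actual `SGMatrix`; copy assignment uses the copy-and-swap idiom.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical stand-in for SGMatrix<T> that owns a column-major buffer.
template <class T>
class Matrix
{
public:
    Matrix() = default;
    Matrix(std::size_t rows, std::size_t cols)
        : m_rows(rows), m_cols(cols), m_data(new T[rows * cols]()) {}

    // Copy ctor: deep copy, so a copy made from a const reference can be
    // mutated without affecting the original.
    Matrix(const Matrix& orig)
        : m_rows(orig.m_rows), m_cols(orig.m_cols),
          m_data(orig.m_data ? new T[orig.size()] : nullptr)
    {
        if (m_data)
            std::copy(orig.m_data, orig.m_data + orig.size(), m_data);
    }

    // Copy assignment: deep copy via copy-and-swap (also safe for self-assignment).
    Matrix& operator=(const Matrix& orig)
    {
        Matrix tmp(orig);
        swap(tmp);
        return *this;
    }

    // Move ctor: take ownership of orig's buffer and leave orig empty.
    Matrix(Matrix&& orig) noexcept
        : m_rows(std::exchange(orig.m_rows, 0)),
          m_cols(std::exchange(orig.m_cols, 0)),
          m_data(std::exchange(orig.m_data, nullptr)) {}

    ~Matrix() { delete[] m_data; }

    void swap(Matrix& other) noexcept
    {
        std::swap(m_rows, other.m_rows);
        std::swap(m_cols, other.m_cols);
        std::swap(m_data, other.m_data);
    }

    std::size_t size() const { return m_rows * m_cols; }
    const T* data() const { return m_data; }

    T& operator()(std::size_t i, std::size_t j)
    {
        assert(i < m_rows && j < m_cols);
        return m_data[j * m_rows + i];
    }
    const T& operator()(std::size_t i, std::size_t j) const
    {
        assert(i < m_rows && j < m_cols);
        return m_data[j * m_rows + i];
    }

private:
    std::size_t m_rows = 0;
    std::size_t m_cols = 0;
    T* m_data = nullptr;
};
```

Because only the move constructor is user-declared, assigning from an rvalue falls back to the deep-copying copy assignment; a move assignment operator could be added in the same way if needed.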
- ### Preprocessor
- Preprocessors have a unified three-stage API:
- ```
- (constructor)                      // initialize and set up parameters
- void fit(CFeatures*);              // learn from the training features; preprocessors that need no training data have an empty implementation
- Some<CFeatures> apply(CFeatures*); // apply to features by creating a new instance; maybe 'transform' would be a better name
- ```
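A minimal sketch of the three-stage API, assuming hypothetical `Features`, `Preprocessor`, `CenterPreprocessor` and `NormalizePreprocessor` types (dense vectors held in `std::vector`) and using `std::shared_ptr` in place of `Some`. `CenterPreprocessor` needs training data, while `NormalizePreprocessor` does not and therefore has an empty `fit`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical minimal feature container: one dense vector per example.
struct Features
{
    std::vector<std::vector<double>> vectors;
};

// Hypothetical base class mirroring the three-stage API.
class Preprocessor
{
public:
    virtual ~Preprocessor() = default;
    // Learn parameters from training features (no-op for stateless ones).
    virtual void fit(const Features& features) = 0;
    // Return a new, transformed instance; the input is left untouched.
    virtual std::shared_ptr<Features> apply(const Features& features) const = 0;
};

// Needs training data: learns the per-dimension mean, then subtracts it.
class CenterPreprocessor : public Preprocessor
{
public:
    void fit(const Features& features) override
    {
        assert(!features.vectors.empty());
        m_mean.assign(features.vectors.front().size(), 0.0);
        for (const auto& v : features.vectors)
            for (std::size_t d = 0; d < v.size(); ++d)
                m_mean[d] += v[d];
        for (auto& m : m_mean)
            m /= static_cast<double>(features.vectors.size());
    }

    std::shared_ptr<Features> apply(const Features& features) const override
    {
        auto result = std::make_shared<Features>(features); // new instance
        for (auto& v : result->vectors)
        {
            assert(v.size() == m_mean.size()); // fit() must have been called
            for (std::size_t d = 0; d < v.size(); ++d)
                v[d] -= m_mean[d];
        }
        return result;
    }

private:
    std::vector<double> m_mean;
};

// Needs no training data: scales each vector to unit L2 norm.
class NormalizePreprocessor : public Preprocessor
{
public:
    void fit(const Features&) override {} // empty implementation

    std::shared_ptr<Features> apply(const Features& features) const override
    {
        auto result = std::make_shared<Features>(features);
        for (auto& v : result->vectors)
        {
            double norm = 0.0;
            for (double x : v)
                norm += x * x;
            norm = std::sqrt(norm);
            if (norm > 0.0)
                for (double& x : v)
                    x /= norm;
        }
        return result;
    }
};
```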
- We will remove all preprocessor handling from `CFeatures` (I am not yet sure whether we should do this). To apply multiple preprocessors to features, one can use a pipeline.
- ### Pipeline
- A pipeline is a convenience wrapper around a sequence of preprocessors, with a machine as the final stage.
- ```
- class Pipeline
- {
- public:
-     void add_preprocessor(CPreprocessor*);
-     void add_machine(CMachine*);
-     void fit(CFeatures*);              // fit the preprocessors one by one, then train the machine on the transformed features
-     void fit(CFeatures*, CLabels*);    // fit the preprocessors, then train the machine on the transformed features and the training labels
-     Some<CLabels> predict(CFeatures*); // transform the features with the fitted preprocessors, then predict with the machine
- };
- ```
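The following sketch shows how `fit` and `predict` could chain the stages. All types are hypothetical minimal stand-ins (one scalar feature per example, `std::shared_ptr` instead of `Some`), and only the supervised `fit(features, labels)` overload is shown; the unsupervised overload would differ only in how the machine is trained.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical minimal types: one scalar feature and one label per example.
using Features = std::vector<double>;
using Labels = std::vector<double>;

struct Preprocessor
{
    virtual ~Preprocessor() = default;
    virtual void fit(const Features& features) = 0;
    virtual Features apply(const Features& features) const = 0;
};

struct Machine
{
    virtual ~Machine() = default;
    virtual void train(const Features& features, const Labels& labels) = 0;
    virtual Labels predict(const Features& features) const = 0;
};

class Pipeline
{
public:
    void add_preprocessor(std::shared_ptr<Preprocessor> p) { m_preprocessors.push_back(std::move(p)); }
    void add_machine(std::shared_ptr<Machine> m) { m_machine = std::move(m); }

    // Fit each preprocessor on the output of the previous one, then train
    // the machine on the fully transformed features.
    void fit(const Features& features, const Labels& labels)
    {
        assert(m_machine && "pipeline needs a machine as its final stage");
        Features current = features;
        for (const auto& p : m_preprocessors)
        {
            p->fit(current);
            current = p->apply(current);
        }
        m_machine->train(current, labels);
    }

    // Transform with the already fitted preprocessors (no refitting), then predict.
    Labels predict(const Features& features) const
    {
        assert(m_machine && "pipeline needs a machine as its final stage");
        Features current = features;
        for (const auto& p : m_preprocessors)
            current = p->apply(current);
        return m_machine->predict(current);
    }

private:
    std::vector<std::shared_ptr<Preprocessor>> m_preprocessors;
    std::shared_ptr<Machine> m_machine;
};

// Example stage: subtract the mean learned during fit.
struct Center : Preprocessor
{
    double mean = 0.0;
    void fit(const Features& x) override
    {
        assert(!x.empty());
        mean = 0.0;
        for (double v : x) mean += v;
        mean /= static_cast<double>(x.size());
    }
    Features apply(const Features& x) const override
    {
        Features out(x);
        for (double& v : out) v -= mean;
        return out;
    }
};

// Example machine: least-squares line through the origin, y = w * x.
struct LineThroughOrigin : Machine
{
    double w = 0.0;
    void train(const Features& x, const Labels& y) override
    {
        assert(x.size() == y.size());
        double xy = 0.0, xx = 0.0;
        for (std::size_t i = 0; i < x.size(); ++i) { xy += x[i] * y[i]; xx += x[i] * x[i]; }
        w = xx > 0.0 ? xy / xx : 0.0;
    }
    Labels predict(const Features& x) const override
    {
        Labels out;
        for (double v : x) out.push_back(w * v);
        return out;
    }
};
```

`predict` only applies the preprocessors and never refits them, so test data is transformed with the parameters learned during `fit`.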
- A convenience function creates a pipeline in one call; the variadic form needs some template tricks.
- ```
- Some<Pipeline> make_pipeline(CPreprocessor*, CPreprocessor*, ..., CMachine*);
- ```
- Alternatively, it can take a vector of preprocessors:
- ```
- Some<Pipeline> make_pipeline(std::vector<CPreprocessor*> preprocessors, CMachine*);
- ```
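One way to implement the variadic form is a pair of recursive overloads of a helper: the non-variadic base case accepts the final machine, and `static_assert` checks each stage's type at compile time. `Preprocessor`, `Machine`, `Pipeline` and the helper `add_stages` are hypothetical, and `std::shared_ptr` stands in for `Some`.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical minimal stage types (stand-ins for CPreprocessor / CMachine).
struct Preprocessor { virtual ~Preprocessor() = default; };
struct Machine { virtual ~Machine() = default; };

struct Pipeline
{
    std::vector<std::shared_ptr<Preprocessor>> preprocessors;
    std::shared_ptr<Machine> machine;
};

// Base case: exactly one argument left, which must be the machine.
// Partial ordering prefers this overload over the variadic one below.
template <class M>
void add_stages(Pipeline& pipe, std::shared_ptr<M> machine)
{
    static_assert(std::is_base_of<Machine, M>::value,
                  "the final stage of make_pipeline must be a Machine");
    pipe.machine = std::move(machine);
}

// Recursive case: take one preprocessor, then recurse on the rest.
template <class P, class... Rest>
void add_stages(Pipeline& pipe, std::shared_ptr<P> preprocessor, Rest&&... rest)
{
    static_assert(std::is_base_of<Preprocessor, P>::value,
                  "all stages except the last must be Preprocessors");
    pipe.preprocessors.push_back(std::move(preprocessor));
    add_stages(pipe, std::forward<Rest>(rest)...);
}

template <class... Stages>
std::shared_ptr<Pipeline> make_pipeline(Stages&&... stages)
{
    static_assert(sizeof...(Stages) >= 1, "make_pipeline needs at least a machine");
    auto pipe = std::make_shared<Pipeline>();
    add_stages(*pipe, std::forward<Stages>(stages)...);
    assert(pipe->machine);
    return pipe;
}
```

Passing a preprocessor as the last argument, or a machine anywhere else, fails at compile time. The vector-based signature needs no template machinery at all, at the cost of building the vector at the call site.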
- ### View
- ```
- Some<CFeatures> CFeatures::view(SGVector<index_t> index);
- Some<CLabels> CLabels::view(SGVector<index_t> index);
- ```
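- Calling `view` creates a new features or labels instance that shallow-copies the underlying data, and adds the given indices as a subset to the subset stack of the new instance. Once this is in place, we will make all subset APIs in `CFeatures` and `CLabels` private.
- Following @micmn's design, we need to solve the covariant return type problem: C++ only supports covariant return types for raw pointers and references, so an override returning `Some<DenseFeatures<T>>` cannot replace one returning `Some<Features>`. Instead, `view` is non-virtual and each subclass hides it with a version that casts to its own type: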
- ```
- // non-virtual
- Some<Features> Features::view(SGVector<index_t> idx); // create a new instance
- // non-virtual: hides Features::view and narrows the return type
- template <class T>
- Some<DenseFeatures<T>> DenseFeatures<T>::view(SGVector<index_t> idx)
- {
-     // reuse the base implementation, then cast to the concrete type
-     auto feats = wrap(static_cast<DenseFeatures<T>*>(Features::view(idx).get()));
-     return feats;
- }
- ```
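The covariant-type workaround can be sketched with standard smart pointers. The base class's non-virtual `view` does the work through a virtual `duplicate()` hook (a hypothetical name), and each subclass hides `view` with a version that narrows the return type via `std::static_pointer_cast`. `Features`, `DenseFeatures<T>` and the subset handling are simplified assumptions, not Shogun's implementation; the data is shared through a `std::shared_ptr` to mimic a shallow copy.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

using index_t = int; // assumed index type for this sketch

// Hypothetical base class; std::shared_ptr stands in for Some<T>.
class Features
{
public:
    virtual ~Features() = default;

    // Non-virtual: shallow-copy this object, then push the subset on the copy.
    std::shared_ptr<Features> view(const std::vector<index_t>& idx) const
    {
        std::shared_ptr<Features> copy = duplicate();
        std::vector<index_t> composed = idx;
        for (index_t& i : composed)
        {
            assert(i >= 0 && static_cast<std::size_t>(i) < num_vectors());
            // Compose with the active subset so back() always indexes the raw data.
            if (!m_subsets.empty())
                i = m_subsets.back()[static_cast<std::size_t>(i)];
        }
        copy->m_subsets.push_back(std::move(composed));
        return copy;
    }

    std::size_t num_subsets() const { return m_subsets.size(); }
    std::size_t num_vectors() const
    {
        return m_subsets.empty() ? num_underlying() : m_subsets.back().size();
    }

protected:
    // Each subclass shallow-copies itself, preserving its dynamic type.
    virtual std::shared_ptr<Features> duplicate() const = 0;
    virtual std::size_t num_underlying() const = 0;

    std::vector<std::vector<index_t>> m_subsets; // subset stack
};

template <class T>
class DenseFeatures : public Features
{
public:
    explicit DenseFeatures(std::vector<T> data)
        : m_data(std::make_shared<std::vector<T>>(std::move(data))) {}

    // Non-virtual, hides Features::view and returns the more specific type.
    // The cast is safe because duplicate() always yields a DenseFeatures<T>.
    std::shared_ptr<DenseFeatures<T>> view(const std::vector<index_t>& idx) const
    {
        return std::static_pointer_cast<DenseFeatures<T>>(Features::view(idx));
    }

    // Element i as seen through the active subset.
    const T& at(std::size_t i) const
    {
        assert(i < num_vectors());
        std::size_t real = m_subsets.empty() ? i : static_cast<std::size_t>(m_subsets.back()[i]);
        return (*m_data)[real];
    }

    const std::vector<T>* raw() const { return m_data.get(); }

protected:
    std::shared_ptr<Features> duplicate() const override
    {
        return std::make_shared<DenseFeatures<T>>(*this); // copies the pointer, not the data
    }
    std::size_t num_underlying() const override { return m_data->size(); }

private:
    std::shared_ptr<std::vector<T>> m_data; // shared between views
};
```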
- ### Feature Iterator
- There is a set of legacy iteration APIs in `CDotFeatures`:
- ```
- void* get_feature_iterator(int32_t vector_index);
- bool get_next_feature(int32_t& index, float64_t& value, void* iterator);
- void free_feature_iterator(void* iterator);
- ```
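- These APIs expose the iteration state as an untyped `void*`, which is not type safe and must be released manually with `free_feature_iterator`. We will adapt them to the new iterator design in `DotIterator`. This also involves refactoring LibLinear, where they are mostly used.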
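As an illustration of the direction, and not the actual `DotIterator` interface, the sketch below replaces the `void*`-based triple with a typed iterator over the nonzero entries of a hypothetical `SparseVector`, usable in a range-based `for` loop. `NonZeroIterator`, `nonzeros` and `dense_dot` are made-up names; `dense_dot` mimics the sparse-times-dense dot product that linear solvers such as LibLinear compute in their inner loops.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical sparse feature vector stored as parallel index/value arrays.
struct SparseVector
{
    std::vector<int32_t> indices;
    std::vector<double> values;
};

// Typed input iterator yielding (index, value) pairs. Its state lives in
// the iterator object, so no void* casts and no manual free are needed.
class NonZeroIterator
{
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = std::pair<int32_t, double>;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = value_type;

    NonZeroIterator(const SparseVector* vec, std::size_t pos) : m_vec(vec), m_pos(pos) {}

    value_type operator*() const { return {m_vec->indices[m_pos], m_vec->values[m_pos]}; }
    NonZeroIterator& operator++() { ++m_pos; return *this; }
    NonZeroIterator operator++(int) { NonZeroIterator tmp = *this; ++m_pos; return tmp; }
    bool operator==(const NonZeroIterator& o) const { return m_vec == o.m_vec && m_pos == o.m_pos; }
    bool operator!=(const NonZeroIterator& o) const { return !(*this == o); }

private:
    const SparseVector* m_vec;
    std::size_t m_pos;
};

// Range wrapper so callers can write: for (auto nz : nonzeros(v)) { ... }
struct NonZeroRange
{
    const SparseVector* vec;
    NonZeroIterator begin() const { return {vec, 0}; }
    NonZeroIterator end() const { return {vec, vec->indices.size()}; }
};

inline NonZeroRange nonzeros(const SparseVector& v) { return {&v}; }

// Dot product of a sparse example with a dense weight vector.
inline double dense_dot(const SparseVector& x, const std::vector<double>& w)
{
    double result = 0.0;
    for (auto nz : nonzeros(x))
    {
        assert(nz.first >= 0 && static_cast<std::size_t>(nz.first) < w.size());
        result += nz.second * w[static_cast<std::size_t>(nz.first)];
    }
    return result;
}
```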
- ### Refactor non-const methods of features
- We could start with `CDotFeatures`, the superclass of many other feature types. It has many non-const methods, for example:
- ```
- virtual float64_t dot(int32_t vec_idx1, CDotFeatures* df, int32_t vec_idx2)=0;
- virtual void add_to_dense_vec(float64_t alpha, int32_t vec_idx1, float64_t* vec2, int32_t vec2_len, bool abs_val=false)=0;
- ```
- They are non-const because they call other non-const methods to obtain the actual feature vector, e.g. `get_feature_vector(int32_t num, int32_t& len, bool& dofree)` in the case of `CDenseFeatures`, which may compute the feature vector on the fly (via the subclass implementation) and then cache it. We can make the cache `mutable` so that these methods can be const.
- We will add locks to `CCache` to make it thread safe, so that features can be accessed concurrently.
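A minimal sketch of the `mutable` cache idea, assuming a hypothetical `ComputedFeatures` class whose vectors are computed on demand. The cache and its mutex are `mutable`, so `get_feature_vector` and `dot` can be `const`; a plain `std::mutex` stands in for the locks proposed for `CCache`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <unordered_map>
#include <vector>

// Hypothetical features whose vectors are computed on demand and cached.
class ComputedFeatures
{
public:
    ComputedFeatures(int32_t num_vectors, int32_t dim) : m_num(num_vectors), m_dim(dim) {}

    // Logically const: the observable features never change, only the cache.
    std::vector<double> get_feature_vector(int32_t idx) const
    {
        assert(idx >= 0 && idx < m_num);
        std::lock_guard<std::mutex> lock(m_cache_mutex);
        auto it = m_cache.find(idx);
        if (it == m_cache.end())
            it = m_cache.emplace(idx, compute(idx)).first;
        return it->second; // copy out while holding the lock
    }

    // Can now be const as well, since it only calls const methods.
    double dot(int32_t idx1, const ComputedFeatures& other, int32_t idx2) const
    {
        std::vector<double> a = get_feature_vector(idx1);
        std::vector<double> b = other.get_feature_vector(idx2);
        assert(a.size() == b.size());
        double result = 0.0;
        for (std::size_t i = 0; i < a.size(); ++i)
            result += a[i] * b[i];
        return result;
    }

    std::size_t num_computed() const
    {
        std::lock_guard<std::mutex> lock(m_cache_mutex);
        return m_cache.size();
    }

private:
    // Stand-in for an expensive, subclass-specific computation.
    std::vector<double> compute(int32_t idx) const
    {
        std::vector<double> v(static_cast<std::size_t>(m_dim));
        for (int32_t d = 0; d < m_dim; ++d)
            v[static_cast<std::size_t>(d)] = static_cast<double>(idx + d);
        return v;
    }

    int32_t m_num;
    int32_t m_dim;
    mutable std::mutex m_cache_mutex;                                 // guards m_cache
    mutable std::unordered_map<int32_t, std::vector<double>> m_cache; // filled from const methods
};
```

The getter returns a copy while holding the lock; handing out a pointer into the cache instead would require a guarantee that cached entries are not evicted or modified while another thread is reading them.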