- Hello Benoit,
- Thank you very much for your input! I really like your suggestion about the more efficient approach and I will insert it in the code :D
- I'm now finishing a final run on the webspam dataset and will send a link tonight with all the results combined.
- I do have a question, though: I got a bit confused about the hashing process and would like to clarify a few things.
- For n-grams/tokens we had the following pseudo-algorithm:
- for f in tokens {
- h_idx = Hash( f, seed) % target_dim;
- vec[h_idx]++;
- }
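For concreteness, the token loop above can be sketched as runnable Python. This is only a minimal sketch: `seeded_hash` is a hypothetical stand-in for `Hash(f, seed)` (here built on MD5 purely for determinism), and `hash_tokens` is an illustrative name, not something from the actual code base.

```python
import hashlib

def seeded_hash(feature, seed):
    """Hypothetical stand-in for Hash(f, seed): deterministic 64-bit value."""
    digest = hashlib.md5(f"{seed}:{feature}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little")

def hash_tokens(tokens, target_dim, seed=0):
    """Count each token into the bucket chosen by its seeded hash."""
    vec = [0] * target_dim
    for f in tokens:
        h_idx = seeded_hash(f, seed) % target_dim
        vec[h_idx] += 1  # every occurrence adds 1, as in the pseudocode
    return vec
```

Repeated tokens always land in the same bucket, so the vector entries sum to the number of tokens regardless of collisions.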
- Now, for numerical features, are we supposed to do the following? (I ask because I was still following the procedure above.)
- for ( i=0; i<data.size(); i++) {
- h_idx = Hash( i, seed:i ) % target_dim;
- vec[h_idx] += data[i]; // instead of +1
- }
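A runnable Python rendering of the numerical-feature loop as written in the question might look like the following. It is a sketch that mirrors the pseudocode literally, including using the feature index `i` as the seed; whether that is the intended scheme is exactly what is being asked. `seeded_hash` and `hash_numeric` are hypothetical names introduced for illustration.

```python
import hashlib

def seeded_hash(key, seed):
    """Hypothetical stand-in for Hash(key, seed): deterministic 64-bit value."""
    digest = hashlib.md5(f"{seed}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little")

def hash_numeric(data, target_dim):
    """Add each feature value into the bucket chosen by hashing its index."""
    vec = [0.0] * target_dim
    for i, value in enumerate(data):
        # Mirrors the question's pseudocode: key = i, seed = i (an assumption).
        h_idx = seeded_hash(i, seed=i) % target_dim
        vec[h_idx] += value  # accumulate the value instead of adding 1
    return vec
```

Under this scheme colliding features simply add together, so the entries of the hashed vector sum to the sum of the input values.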
- And if so, would the quadratic features take the following form?
- for ( i=0; i<data.size(); i++) {
- for ( j = i; j < data.size(); j++ ) {
- h_idx = Hash( i * data.size() + j, seed: i * data.size() + j ) % target_dim;
- vec[h_idx] += data[i] * data[j];
- }
- }
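The quadratic variant in the question can likewise be sketched in Python. Again this only mirrors the pseudocode under stated assumptions: each pair `(i, j)` with `j >= i` gets the flat key `i * n + j`, which is used both as the hashed value and as the seed, and `seeded_hash` / `hash_quadratic` are hypothetical names, not the real implementation.

```python
import hashlib

def seeded_hash(key, seed):
    """Hypothetical stand-in for Hash(key, seed): deterministic 64-bit value."""
    digest = hashlib.md5(f"{seed}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little")

def hash_quadratic(data, target_dim):
    """Hash every product data[i] * data[j] with j >= i into target_dim buckets."""
    n = len(data)
    vec = [0.0] * target_dim
    for i in range(n):
        for j in range(i, n):
            pair_key = i * n + j  # unique flat index for the pair (i, j)
            # Mirrors the question's pseudocode: key and seed are both pair_key.
            h_idx = seeded_hash(pair_key, seed=pair_key) % target_dim
            vec[h_idx] += data[i] * data[j]
    return vec
```

For `n` features this touches `n * (n + 1) / 2` pairs, so the cost grows quadratically with the input size even though the output stays at `target_dim` entries.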
- I hope it's not too confusing.
- Looking forward to your reply.
- Vangelis