Jul 23rd, 2013

Hello Benoit,

Thank you very much for your input! I really like your suggestion about the more efficient approach, and I will add it to the code :D
I'm now finishing a final run on the webspam dataset and will send a link by tonight with all the results combined, somehow!

To be honest, I do have a question: I got a bit confused about the hashing process and would like to clarify a few things.

For n-grams/tokens, we had the following pseudo-algorithm:

    for (const auto& f : tokens) {
        h_idx = Hash(f, seed) % target_dim;
        vec[h_idx]++;
    }
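For reference, a runnable C++ sketch of the token loop above. The thread does not say which `Hash` is used, so `seeded_hash` here is a hypothetical stand-in (64-bit FNV-1a with the seed XORed into the offset basis); MurmurHash or any other well-mixed seeded hash could be swapped in. `hash_tokens` is likewise an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical seeded hash (the thread does not name one): 64-bit FNV-1a
// with the seed XORed into the offset basis.
std::uint64_t seeded_hash(const std::string& key, std::uint64_t seed) {
    std::uint64_t h = 14695981039346656037ULL ^ seed;
    for (unsigned char c : key) {
        h ^= c;
        h *= 1099511628211ULL;  // FNV-1a 64-bit prime
    }
    return h;
}

// Hashing trick for tokens / n-grams: every occurrence of a token adds 1
// to the bucket its hash selects, so repeated tokens accumulate counts.
std::vector<double> hash_tokens(const std::vector<std::string>& tokens,
                                std::uint64_t seed, std::size_t target_dim) {
    assert(target_dim > 0);
    std::vector<double> vec(target_dim, 0.0);
    for (const auto& f : tokens) {
        const std::size_t h_idx = seeded_hash(f, seed) % target_dim;
        vec[h_idx] += 1.0;
    }
    return vec;
}
```

For example, `hash_tokens({"the", "cat", "the"}, 42, 16)` returns a 16-entry vector whose entries sum to 3, with "the" contributing 2 to a single bucket.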

Now, for numerical features, are we supposed to do the following instead? (I ask because I was still following the procedure above.)

    for (i = 0; i < data.size(); i++) {
        h_idx = Hash(i, /* seed = */ i) % target_dim;
        vec[h_idx] += data[i]; // instead of +1
    }
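A minimal C++ sketch of the numerical variant, in which the feature index `i` plays the role of the token and `data[i]` replaces the `+1`. Unlike the loop above, this sketch assumes the index is hashed with one fixed `seed` (the same convention as the token case) rather than `seed = i`; that choice, the FNV-1a stand-in `seeded_hash`, and the name `hash_numeric` are all assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical seeded hash of an integer key: 64-bit FNV-1a over the
// key's eight bytes, with the seed XORed into the offset basis.
std::uint64_t seeded_hash(std::uint64_t key, std::uint64_t seed) {
    std::uint64_t h = 14695981039346656037ULL ^ seed;
    for (int b = 0; b < 8; ++b) {
        h ^= (key >> (8 * b)) & 0xFFu;
        h *= 1099511628211ULL;  // FNV-1a 64-bit prime
    }
    return h;
}

// Hashing trick for dense numerical features: feature i lands in the
// bucket chosen by hashing its index, and its value (not 1) is added.
// Colliding features simply sum in the same bucket.
std::vector<double> hash_numeric(const std::vector<double>& data,
                                 std::uint64_t seed, std::size_t target_dim) {
    assert(target_dim > 0);
    std::vector<double> vec(target_dim, 0.0);
    for (std::size_t i = 0; i < data.size(); ++i) {
        const std::size_t h_idx = seeded_hash(i, seed) % target_dim;
        vec[h_idx] += data[i];
    }
    return vec;
}
```

Because collisions only add values together, the entries of the hashed vector always sum to the sum of `data`, and the same `seed` always maps a given index to the same bucket.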

And so, would that result in quadratic features of the following form?

    for (i = 0; i < data.size(); i++) {
        for (j = i; j < data.size(); j++) {
            h_idx = Hash(i * data.size() + j, /* seed = */ i * data.size() + j) % target_dim;
            vec[h_idx] += data[i] * data[j];
        }
    }
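A sketch of the quadratic version along the same lines. The key `i * n + j` (with `n = data.size()`) is unique for every pair `0 <= i <= j < n`, so distinct pairs can only share a bucket through hash collisions; with `n` features the loop produces `n(n+1)/2` pair features. As before, the fixed `seed`, the FNV-1a stand-in `seeded_hash`, and the name `hash_quadratic` are assumptions, not something fixed in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical seeded hash of an integer key: 64-bit FNV-1a over the
// key's eight bytes, with the seed XORed into the offset basis.
std::uint64_t seeded_hash(std::uint64_t key, std::uint64_t seed) {
    std::uint64_t h = 14695981039346656037ULL ^ seed;
    for (int b = 0; b < 8; ++b) {
        h ^= (key >> (8 * b)) & 0xFFu;
        h *= 1099511628211ULL;  // FNV-1a 64-bit prime
    }
    return h;
}

// Quadratic (pairwise) features via the hashing trick: each pair (i, j)
// with j >= i gets the unique key i * n + j and adds data[i] * data[j]
// to the bucket that key hashes to.
std::vector<double> hash_quadratic(const std::vector<double>& data,
                                   std::uint64_t seed, std::size_t target_dim) {
    assert(target_dim > 0);
    const std::size_t n = data.size();
    std::vector<double> vec(target_dim, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i; j < n; ++j) {
            const std::uint64_t key = static_cast<std::uint64_t>(i) * n + j;
            const std::size_t h_idx = seeded_hash(key, seed) % target_dim;
            vec[h_idx] += data[i] * data[j];
        }
    }
    return vec;
}
```

As a sanity check, with `target_dim = 1` every pair lands in bucket 0, so for `data = {1, 2, 3}` that bucket holds 1 + 2 + 3 + 4 + 6 + 9 = 25.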

I hope it's not too confusing.
Looking forward to your reply.

Vangelis