Cook

Oct 10th, 2011
Hi TOR,
I have a few questions about SearchDigest:
Is it feasible in the future to create an exclusion page in the MediaWiki namespace, where we could make a list of all of the search terms we don't want to display in the digest? If not, is there any other way to handle the spam?
Are there any other reasons that you think the full list of search results is useless, besides the spam and the blue links? If we had a way to control the spam, and the pages were removed from the list once created, I don't see why we would have a need to reset the list at all. The needed results would continue to rise to the top, and we'd still have a much larger sample.
How does the rolling weekly/monthly work? If it involves time stamps, would it be possible to have multiple rolling lists? If we're just looking for the most popular search terms right now, it seems like the best way to see them is through a daily list, but the others would also be useful. Could that be done without sacrificing too many resources?
Do the earlier results that were removed recently still exist anywhere?
If you still don't want to keep the full list of results somewhere, could the monthly versions of the CSV downloads be available somewhere, perhaps even in the same download?
Thanks a ton for continuing to work on this extension.
--Cook
--------
Hey Cook,

Thanks for contacting us. I’m glad you like SearchDigest!

For spam filtering, I plan to hook SearchDigest up to our unified spam protection system, Phalanx, which should get rid of most spam. If that doesn’t reduce the amount of spam to a manageable level, we can include a MediaWiki blacklist as an additional step.

As for the full list vs. periodic purge, my reasoning was that you would have search terms that are neither spam nor titles of pages you would want to create (either as redirects or as actual content pages). Stuff like “What is the name of...?”.

The system as it is now is not a rolling monthly window but rather a reset once per month. Each search query is stored as a single record holding the query string and the number of times it was searched. This conserves disk space, but it means there are no timestamps. The earlier results are dropped at each reset.
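
A minimal sketch of the storage model described above, assuming an in-memory counter in place of the real database table. The class and method names (`SearchCounts`, `record`, `top`, `reset`) are hypothetical and are not taken from the SearchDigest extension; the sketch only illustrates why a count-per-query record cannot support rolling windows.

```python
from collections import Counter


class SearchCounts:
    """Hypothetical stand-in for the per-query counter described above.

    One record per query string, holding only a count. No timestamps are
    stored, so there is no way to ask "what was popular in the last 7 days".
    """

    def __init__(self):
        self.counts = Counter()

    def record(self, query):
        # Increment the single record for this query string.
        self.counts[query] += 1

    def top(self, n=50):
        # Most-searched queries first, as (query, count) pairs.
        return self.counts.most_common(n)

    def reset(self):
        # The periodic purge: every earlier result is dropped.
        self.counts.clear()


digest = SearchCounts()
for q in ["dragon", "dragon", "sword", "dragon", "sword", "shield"]:
    digest.record(q)
print(digest.top(2))
digest.reset()
print(digest.top(2))
```

Supporting daily, weekly, and monthly lists at the same time would need either a timestamp per search or a separate counter per period, which is where the extra storage cost comes from.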

Regarding your question about CSV and data retention, it should be pretty easy to set up a bot that copies the data using the CSV form and stores it somewhere, if you want to.
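
As a rough sketch of the archiving half of such a bot, assuming the CSV text has already been downloaded by some other means: the function name `archive_snapshot`, the `searchdigest-YYYY-MM-DD.csv` file naming, and the sample rows are all hypothetical. The export URL and the column layout of the real CSV are not given here, so the code does not fetch anything and does not assume specific columns.

```python
import csv
import io
from datetime import date
from pathlib import Path


def archive_snapshot(csv_text, archive_dir, day=None):
    """Store one CSV export under a date-stamped file name (hypothetical scheme).

    Keeping one file per day preserves the history that the monthly
    reset would otherwise drop.
    """
    day = day or date.today()
    # Parse once so an empty or broken download is not archived silently.
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("empty CSV export; refusing to archive")
    target = Path(archive_dir) / f"searchdigest-{day.isoformat()}.csv"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(csv_text, encoding="utf-8")
    return target


if __name__ == "__main__":
    # Made-up sample data; the real export's columns may differ.
    sample = "dragon,3\nsword,2\n"
    print(archive_snapshot(sample, "searchdigest-archive"))
```

Run on a schedule shortly before each reset, this would leave one file per snapshot that can be compared or merged later.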

I will be working on fixing the CSV output (currently only the 50 most popular queries are listed). Once that’s fixed, you will be able to download and play with the data, and even store it on the wiki if you want.

Keep in mind that this is still an experimental feature and things might change at any moment, including the behaviour I described above. I will be taking your suggestions into account when working on SearchDigest. Also, be aware that this is a side project and we can't put much time and effort into it right now, so please bear with us: improvements and fixes to SearchDigest might take some time to complete.

Once again, thanks for your input. Happy editing!

Lucas 'TOR' Garczewski
Community Engineer
Wikia, Inc.