Jan 13th, 2018

# #Core Idea
- easy to start with and tough to master
- brainstorming

# #d11hack
We started brainstorming almost a week prior to the event about various problems that might get people interested. The [Web Crawler](link to the problem statement) was the one that struck the right balance of complexity and fun and adhered to our philosophy:
> A good game is easy to start with but tough to master.

# #The problem statement
<!-- The Problem Statement -->
- Graph traversal problem
- Little bit of web

Building a web crawler is essentially a graph traversal problem. You have pages that link to other pages. It's possible that some pages link back and you end up in a cycle, so you need to keep track of all the pages that have already been visited and make sure they are not visited again.

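The visited-set idea above can be sketched in a few lines of JavaScript. This is a minimal illustration, not the challenge's actual code: the page names and the in-memory adjacency map (standing in for real HTTP fetches) are made up.

```javascript
// A tiny "site" as an adjacency map; '/b' links back to '/a', forming a cycle.
const pages = {
  '/a': ['/b', '/c'],
  '/b': ['/a', '/d'],
  '/c': [],
  '/d': ['/b'],
};

// Breadth-first traversal that records visited pages so a cycle
// (like /a -> /b -> /a) is crawled only once.
function crawl(pages, start) {
  const visited = new Set([start]);
  const queue = [start];
  while (queue.length > 0) {
    const url = queue.shift();
    for (const link of pages[url] || []) {
      if (!visited.has(link)) { // skip pages we've already seen
        visited.add(link);
        queue.push(link);
      }
    }
  }
  return [...visited];
}

console.log(crawl(pages, '/a')); // each page appears exactly once
```

Without the `visited` set, the `/a -> /b -> /a` edge would loop forever.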
Going to a particular page takes time, and crawling serially is not the most efficient approach. Ideally you should parallelize as much as possible, i.e. since each page has links to multiple other pages, it's better to request them in parallel and continue parsing as and when each response is received.

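One way to get that parallelism is to fetch each "frontier" of newly discovered pages concurrently with `Promise.all`. Again a sketch under assumptions: `fetchPage` and the `site` map below are stubs we invented to stand in for real HTTP requests.

```javascript
// Stub standing in for an HTTP request: resolves with the links found on a page.
const site = {
  '/index': ['/a', '/b'],
  '/a': ['/c'],
  '/b': [],
  '/c': [],
};
function fetchPage(url) {
  return Promise.resolve(site[url] || []);
}

// Breadth-first crawl that requests each frontier of pages in parallel
// instead of one at a time, while still tracking visited pages.
async function crawlParallel(start) {
  const visited = new Set([start]);
  let frontier = [start];
  while (frontier.length > 0) {
    const linkLists = await Promise.all(frontier.map(fetchPage));
    frontier = linkLists.flat().filter((url) => {
      if (visited.has(url)) return false;
      visited.add(url);
      return true;
    });
  }
  return [...visited];
}

crawlParallel('/index').then((pages) => console.log(pages.length)); // 4 pages
```

A serial crawler pays one round-trip per page; this version pays roughly one round-trip per level of the graph.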
<!-- Project setup -->
- Travis running time
- Benchmarking set
- Git repos setup
- Plagiarism
- Open sourced 4 days before JSFoo
- Branches contained solutions

To make people's lives easier we created an open-source GitHub repo with the boilerplate code and a unit test. At Dream11 we take TDD to heart :P. So we added unit tests beforehand to help developers test/benchmark their code once they were done. We also integrated the project with Travis so that your code could be executed automatically on a standard machine setup. This also saved us a lot of time later when we added our hidden test cases ;)

Initially we were worried about plagiarism, since everyone had to publish their solutions by raising a pull request to an open-source project. But then we realized that most devs take days to understand what others have coded, and given the tight deadline in the middle of JSFoo, people would prefer doing it themselves rather than copying. We even went as far as publishing the project almost a week before JSFoo actually started :D
PS: You might want to keep a watch on your GitHub repo for the next challenge

<!-- Caveats -->
- Lexicographically smallest
- With some latency/IO
- Throttled server
- Any library
- Any hacks
- Hidden test cases
- Last-minute test cases
- Travis back-merges
- Server throttles/delays/connection resets etc.
- Failures post hidden tests disappointment.

We added some caveats to the challenge. For instance, while crawling you also needed to find the lexicographically smallest word on the website. The webserver would automatically start throttling responses after some time and eventually send a `429` response. You could use any library but had to pass all our test cases. Initially we had only one test case, so that people could get started. We observed people solving it in 10 seconds, whereas per our internal benchmarks it should take at least 34 seconds. At the end of day one we added some more test cases to be fair to those who were taking longer to crawl the website. We made this change and back-merged it into all 21 pull requests that we had received on day one. Unsurprisingly, only two of them passed all the cases; the others passed just the case they had hard-coded the results for.

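A crawler that ignored the throttling would stall once the `429` responses began, so one reasonable tactic is to back off and retry. The sketch below is our own illustration, not a submitted solution: `fakeFetch` (which answers `429` twice before succeeding) and the retry parameters are invented for the example.

```javascript
// Fake fetch standing in for HTTP: returns 429 for the first two calls, then 200.
let calls = 0;
function fakeFetch(url) {
  calls += 1;
  return Promise.resolve({ status: calls < 3 ? 429 : 200, body: 'ok' });
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry with a growing delay whenever the server answers 429 (Too Many Requests).
async function fetchWithBackoff(url, attempts = 5, delayMs = 10) {
  for (let i = 0; i < attempts; i += 1) {
    const res = await fakeFetch(url);
    if (res.status !== 429) return res;
    await sleep(delayMs * (i + 1)); // linear backoff between retries
  }
  throw new Error('still throttled after ' + attempts + ' attempts');
}

fetchWithBackoff('/page').then((res) => console.log(res.status)); // 200
```

The key point is that a `429` is recoverable: treat it as "wait and retry", not as a failed page.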
Our deadline was 3:30pm on day two of JSFoo, and we had announced that we would be adding more test cases at 3:30, so participants had to make sure their solution was a generic one and not something that worked for just one test case. Even though people were initially disappointed, most of them updated their code to make it work for all cases.

<!-- Solutions -->
- Crazy hacks!

People submitted amazing solutions, and most of them managed to finish in under 30 seconds, much less than our original estimate of 34 seconds. People were trying to beat each other by throttling requests based on the size of the graph. One of the solutions exploited the hints that the server returns in the special headers `X-RateLimit-Limit` and `X-RateLimit-Remaining`, indicating when it was going to throttle next.

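The header trick works because the server announces its own quota, so a client can pace itself instead of waiting to be throttled. A minimal sketch of the idea follows; the pacing rule (wait a full window once `X-RateLimit-Remaining` hits zero) is our simplified illustration, not the winning solution's code, and the window length is an assumed parameter.

```javascript
// Given the rate-limit headers from a response, decide how many milliseconds
// to wait before the next request: fire immediately while quota remains,
// otherwise wait out an (assumed) rate-limit window.
function delayFromHeaders(headers, windowMs = 1000) {
  const limit = Number(headers['x-ratelimit-limit']);
  const remaining = Number(headers['x-ratelimit-remaining']);
  if (!Number.isFinite(limit) || !Number.isFinite(remaining)) {
    return 0; // headers absent or malformed: don't wait
  }
  if (remaining > 0) return 0; // quota left: no need to wait
  return windowMs;             // quota exhausted: back off for a window
}

console.log(delayFromHeaders({ 'x-ratelimit-limit': '60', 'x-ratelimit-remaining': '0' })); // 1000
```

Exact window semantics vary between servers, so a real client would also read any reset header the server provides.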
<!-- Winner declarations -->
- Benchmark cases

For benchmarking we considered only the first test case, the one given on day 1, for which people had optimized the most. Though a submission was supposed to pass all the tests, the run times of the other cases were not considered when building the leaderboard.

<!-- Actual Winners -->