a guest Jun 24th, 2018 77 Never
Not a member of Pastebin yet? Sign Up, it unlocks many cool features!
  1. I was recently asked this question in an interview:
  2. Given a log file which has customer id and corresponding to that id it has a page id visited by that customer. Given such log files for 3 consecutive days, design an algo to find those customers who visited the site on exactly 2 out of 3 days and visited at least 3 distinct pages. Discuss the design and space complexity.
  4. I thought of this approach:
  5. 1)sorting the 3 log files
  6. 2) looking through the top of each file: look for the same customer ids and see the number of different days they visited the pages. If the same customer has visited the pages on 3 different days, then we can discard that customer and move on looking to the next customer in each file. Else, if this customer has visited the the pages on exactly 2 different days, we will keep storing his page ids in a set.
  8. But, I am not sure whether this approach is the most efficient, also what will be the approach which will scale well if I have to find k visited sites out of K sites, visiting exactly N distinct pages. I thought that k-d trees(2-d trees) may help because they can handle multiple queries.
  10. So, what will be the best data strucuture and the algorithm to handle these kinds of queries.
RAW Paste Data
We use cookies for various purposes including analytics. By continuing to use Pastebin, you agree to our use of cookies as described in the Cookies Policy. OK, I Understand