Schema Design in Mongo
======================

table => collection
row => JSON doc

embed vs. link
--------------

* "contains" relationship: embed (pre-joined in a sense)
* 4MB limit on document size (embedded data counts toward it; raised to 16MB in later releases)

order = {
    _id: ...,
    ...
    lineitems: [],
    shippingaddress: {},
    total: 0,
    tax: 0,
    subtotal: 0
}

db.orders.ensureIndex({})

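* reach into objects via dot-notation expressions

db.factories.insert( { name: "xyz", metro: { city: "New York", state: "NY" } } )
db.factories.ensureIndex({'metro.city': 1, 'metro.state': 1})
db.factories.find({'metro.state': 'NY'})

* note: a compound index is used by prefix, so a find on 'metro.state' alone cannot use the index above efficiently; it would want its own index on 'metro.state'
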
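To make the "reach into objects" idea concrete, here is a minimal pure-Python sketch of how a dotted key such as 'metro.state' can be resolved against a nested document. It is only an illustration of the lookup, not MongoDB's implementation; the helper names get_path and find are made up for this example.

```python
def get_path(doc, dotted_key):
    """Follow a dotted key such as 'metro.state' through nested dicts."""
    current = doc
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            return None  # the path does not exist in this document
        current = current[part]
    return current


def find(docs, query):
    """Return the docs whose nested fields equal every value in `query`."""
    return [d for d in docs
            if all(get_path(d, key) == value for key, value in query.items())]


# Sample data: the first doc is from the notes, the second is invented.
factories = [
    {"name": "xyz", "metro": {"city": "New York", "state": "NY"}},
    {"name": "abc", "metro": {"city": "Austin", "state": "TX"}},
]

matches = find(factories, {"metro.state": "NY"})
print([f["name"] for f in matches])
```

The real server resolves the same dotted path, but can do so through an index instead of scanning every document.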

Map/Reduce
----------

map

_id: ...
--------
* automatically indexed
* unique
* invariant
* or use ObjectId (best for sharded collection)

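Atomic
------
compare and swap

db.inventory.update({_id: n._id, qty: qty_old}, n)

* the update matches only if qty still equals qty_old, so a concurrent change turns it into a no-op that can be retried
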
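The compare-and-swap pattern above can be sketched without a server. This is an in-memory stand-in, assuming a hypothetical Inventory class and decrement helper; the lock only plays the role of the server's per-document atomicity, and the retry limit is an arbitrary choice.

```python
import threading


class Inventory:
    """In-memory stand-in for a collection keyed by _id (hypothetical)."""

    def __init__(self):
        self._docs = {}
        self._lock = threading.Lock()  # stands in for per-document atomicity

    def insert(self, doc):
        self._docs[doc["_id"]] = dict(doc)

    def find_one(self, _id):
        return dict(self._docs[_id])

    def update_if_qty(self, _id, qty_old, new_doc):
        """Replace the doc only if its qty still equals qty_old.

        Mirrors update({_id: ..., qty: qty_old}, n): returns True when the
        swap happened, False when the doc is missing or qty has changed.
        """
        with self._lock:
            current = self._docs.get(_id)
            if current is None or current["qty"] != qty_old:
                return False
            self._docs[_id] = dict(new_doc)
            return True


def decrement(inv, _id, amount, max_retries=5):
    """Read, modify, then compare-and-swap; retry if another writer won."""
    for _ in range(max_retries):
        n = inv.find_one(_id)
        qty_old = n["qty"]
        if qty_old < amount:
            return False  # not enough stock
        n["qty"] = qty_old - amount
        if inv.update_if_qty(_id, qty_old, n):
            return True
    return False


inv = Inventory()
inv.insert({"_id": 1, "qty": 10})
print(decrement(inv, 1, 3), inv.find_one(1)["qty"])
```

Against a real server, the caller would check how many documents the update matched in order to decide whether to retry.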

MySQL to MongoDB
================

physical HW or dedicated VM

disk + memory = happy mongo

migrating data
--------------
* read from old storage, write to the new storage
* moved 5 billion rows from MySQL
  - 100,000 inserts/sec
  - cpu-bound

Wordnik reads + creates Java objects @ 250/sec (!)

disk space
----------

CMS and MongoDB
===============

GridFS - rich media storage, binary objects

Event Logging
=============

* who does what/how? -- funnels
* how valuable are groups of users? -- virality
* are our changes working? -- retention, funnel conversion

backend dreams
--------------
* flexible
* scalable
* queryable
* easy to work with

Enter Mongo
===========
* schemaless
* rich data manipulation/access
* at home in web-centric toolchain

Event example
=============

[{
    name: 'front_page/broadcast_link',
    date: '',
    unique_id: 'sfsadfas',
    bucket: 'big_red_button'
},
{
    name: 'front_page/broadcast_link',
    date: '',
    unique_id: 'sfsadfas',
    bucket: 'small_blue_button'
}]

Processing Data
---------------
Python - 1 process, 1 machine
Map/Reduce - Hadoop
Config Docs - name of events to track, realtime?, unique?
Generate/Apply MongoDB operations

example:

how many times each event occurred per bucket

( for small collections, use collection.group() )
( for large collections, use collection.mapReduce() )

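The "events per bucket" count can be written as a map step and a reduce step. The sketch below runs both in plain Python to show the shape of the computation; it is not the JavaScript the speakers generated, and the sample events and function names are assumptions made for illustration.

```python
from collections import defaultdict


def map_event(event):
    """Map step: emit one (key, 1) pair per event, keyed by name and bucket."""
    yield (event["name"], event["bucket"]), 1


def reduce_counts(key, values):
    """Reduce step: sum the emitted counts for one key."""
    return sum(values)


def map_reduce(events):
    """Group the mapped values by key, then reduce each group."""
    grouped = defaultdict(list)
    for event in events:
        for key, value in map_event(event):
            grouped[key].append(value)
    return {key: reduce_counts(key, values) for key, values in grouped.items()}


# Invented sample events in the shape shown in the notes.
events = [
    {"name": "front_page/broadcast_link", "unique_id": "u1", "bucket": "big_red_button"},
    {"name": "front_page/broadcast_link", "unique_id": "u2", "bucket": "big_red_button"},
    {"name": "front_page/broadcast_link", "unique_id": "u3", "bucket": "small_blue_button"},
]

counts = map_reduce(events)
for (name, bucket), total in sorted(counts.items()):
    print(name, bucket, total)
```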

for every event that comes in, increment its counter
each event builds up into a heap (group)

for event in events:
    key = event['name']
    if key in matchers:
        count_key = ""
        db.event_counts.update()
        ...

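One way the elided update could look is an upsert with the $inc operator, so that each incoming event bumps a counter document. The sketch below only builds the filter and update documents and applies them to an in-memory dict; the count_key layout, the field name "count", and both helper names are assumptions, since the notes leave those details out.

```python
def build_increment(event, matchers):
    """Return (filter, update) for a counter upsert, or None if untracked."""
    name = event["name"]
    if name not in matchers:
        return None
    count_key = f"{name}:{event['bucket']}"  # hypothetical key layout
    return {"_id": count_key}, {"$inc": {"count": 1}}


def apply_upsert(counters, flt, update):
    """In-memory stand-in for an upsert with $inc on a dict of documents.

    With PyMongo the equivalent call would be roughly
    db.event_counts.update_one(flt, update, upsert=True).
    """
    doc = counters.setdefault(flt["_id"], {"_id": flt["_id"]})
    for field, amount in update["$inc"].items():
        doc[field] = doc.get(field, 0) + amount


matchers = {"front_page/broadcast_link"}  # names of events to track
events = [
    {"name": "front_page/broadcast_link", "unique_id": "u1", "bucket": "big_red_button"},
    {"name": "front_page/broadcast_link", "unique_id": "u2", "bucket": "big_red_button"},
    {"name": "settings/open", "unique_id": "u2", "bucket": "big_red_button"},
]

counters = {}
for event in events:
    op = build_increment(event, matchers)
    if op is not None:
        apply_upsert(counters, *op)

print(counters)
```

Because $inc is applied on the server, concurrent writers do not need a read-modify-write cycle for simple counts.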

event: someone clicks the broadcast button
    (auth)
    click "allow" or "disallow" box
    (share with friends)
    start broadcasting

tracking 36 different events
* counts
* periodic map/reduce: run every 15 min for more complex analysis
  - generated JavaScript map/reduce

Funnel Calculation
------------------
* per user rollup:
  - for each user, which funnel steps they have reached, with constraints applied
  - a map to get unique users, a reduce to count which unique events each one triggered
* per bucket rollup:
  - for each bucket, count of users at each step (abandoned/completed)

calculations done in batch

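The two rollups can be sketched as two batch passes in plain Python. The step names in FUNNEL are hypothetical, and the only "constraint" modelled here is that a user counts at a step only if every earlier step was also reached; the talk's actual constraints are not recorded in these notes.

```python
from collections import defaultdict

# Hypothetical funnel steps, in order.
FUNNEL = ["click_broadcast", "allow", "start_broadcasting"]


def per_user_rollup(events):
    """Map each (bucket, user) to the set of funnel steps that user triggered."""
    steps = defaultdict(set)
    for e in events:
        if e["name"] in FUNNEL:
            steps[(e["bucket"], e["unique_id"])].add(e["name"])
    return steps


def per_bucket_rollup(user_steps):
    """Count, per bucket, how many users reached each step in order."""
    counts = defaultdict(lambda: [0] * len(FUNNEL))
    for (bucket, _user), reached in user_steps.items():
        for i, step in enumerate(FUNNEL):
            if step not in reached:
                break  # user abandoned the funnel at this step
            counts[bucket][i] += 1
    return {bucket: dict(zip(FUNNEL, c)) for bucket, c in counts.items()}


# Invented sample events; u2 clicks twice but is counted once.
events = [
    {"name": "click_broadcast", "unique_id": "u1", "bucket": "big_red_button"},
    {"name": "allow", "unique_id": "u1", "bucket": "big_red_button"},
    {"name": "start_broadcasting", "unique_id": "u1", "bucket": "big_red_button"},
    {"name": "click_broadcast", "unique_id": "u2", "bucket": "big_red_button"},
    {"name": "click_broadcast", "unique_id": "u2", "bucket": "big_red_button"},
    {"name": "click_broadcast", "unique_id": "u3", "bucket": "small_blue_button"},
    {"name": "allow", "unique_id": "u3", "bucket": "small_blue_button"},
]

funnel = per_bucket_rollup(per_user_rollup(events))
for bucket, steps in sorted(funnel.items()):
    print(bucket, steps)
```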

Future
------
* migrate Postgres stuff to mongo
* batch jobs for funnel, retention, virality

Observations
------------
* big deletes seem to slow things down... so a capped collection might be a good idea

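A capped collection has a fixed size and overwrites its oldest documents in insertion order, which is why it removes the need for big deletes. The behaviour can be pictured with a bounded deque. The CappedLog class below is a hypothetical in-memory analogy capped by document count; a real capped collection is sized in bytes, with an optional document limit.

```python
from collections import deque


class CappedLog:
    """In-memory analogy for a capped collection (hypothetical class)."""

    def __init__(self, max_docs):
        # deque(maxlen=...) silently drops the oldest item when full
        self._docs = deque(maxlen=max_docs)

    def insert(self, doc):
        self._docs.append(doc)

    def find(self):
        """Return documents in insertion order, oldest first."""
        return list(self._docs)


log = CappedLog(max_docs=3)
for i in range(5):
    log.insert({"seq": i})

print([d["seq"] for d in log.find()])
```

With PyMongo, a capped collection would be created with something like db.create_collection("events", capped=True, size=...), where the collection name and size are placeholders.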