SHARE
TWEET

Untitled

a guest Jun 21st, 2015 261 Never
Not a member of Pastebin yet? Sign Up, it unlocks many cool features!
  1. # Normalization in data modeling
  2.  
  3. Say you have a service where users can add products they like to a wish list, representing this with JSON, you might expect something like this:
  4.  
  5. ```json
  6. [
  7.   {
  8.     "user_id": 3,
  9.     "email": "foo@example.com",
  10.     "wishlist": [
  11.       {
  12.         "product_id": 323,
  13.         "product_price": 3.29
  14.       },
  15.       {
  16.         "product_id": 4942,
  17.         "product_price": 15.99
  18.       }
  19.     ]
  20.   },
  21.   {
  22.     "user_id": 7,
  23.     "email": "bar@example.com",
  24.     "wishlist": [
  25.       {
  26.         "product_id": 87,
  27.         "product_price": 8.98
  28.       },
  29.       {
  30.         "product_id": 4942,
  31.         "product_price": 15.99
  32.       }
  33.     ]
  34.   }
  35. ]
  36. ```
  37.  
  38. However, notice that there are two copies of the product with id 4942, this is common with the EAV way of modeling the data, it is also referred to as "denormalized" data. To get a better idea of what that means, let's normalize the above example using relations:
  39.  
  40. ### the users relation
  41. | user_id | email |
  42. |---------|-------|
  43. | 3       | foo@example.com |
  44. | 7       | bar@example.com |
  45.  
  46. ## the products relation
  47.  
  48. | product_id | product_price |
  49. |------------|---------------|
  50. | 323        | 3.29          |
  51. | 4942       | 15.99         |
  52. | 87         | 8.98          |
  53.  
  54. And to tie the two together, the
  55.  
  56. ## the wishlist relation
  57. | user_id | product_id |
  58. |---------|------------|
  59. | 3       | 323        |
  60. | 3       | 4942       |
  61. | 7       | 87         |
  62. | 7       | 4942       |
  63.  
  64.  
  65. So, this way you avoid unncessary duplication, you store the users in their own table, the products in their own table, and then use a third table to store the links between them. This will allow for much more efficient storage of data, and also help with consistency. For example, if product 4942 goes on sale, the denormalized form only has to update the price for the single record in the products table. By contrast, in the denormalized (JSON object above) form, you'd have to iterate over all the objects, dig into each of the wishlists and update the price in all of them.
  66.  
  67. ## Querying
  68. To query out the full records (JOINed together), you could run a query like:
  69.  
  70. ```SQL
  71. SELECT u.user_id, u.email, p.product_id, p.product_price
  72. FROM users u
  73. JOIN wishlist w
  74. JOIN products p
  75. ON u.user_id = w.user_id AND
  76.    w.product_id = p.product_id
  77. ```
  78.  
  79.  
  80. | user_id | email | product_id | product_price |
  81. |---------|-------|------------|---------------|
  82. | 3       |foo@example.com | 323        | 3.29 |
  83. | 3       |foo@example.com | 4942       | 15.99|
  84. | 7       |bar@example.com | 87         | 8.98 |
  85. | 7       |bar@example.com | 4942       | 15.99|
RAW Paste Data
We use cookies for various purposes including analytics. By continuing to use Pastebin, you agree to our use of cookies as described in the Cookies Policy. OK, I Understand
 
Top