Advertisement
Guest User

Untitled

a guest
Aug 23rd, 2019
78
0
Never
Not a member of Pastebin yet? Sign Up, it unlocks many cool features!
text 1.74 KB | None | 0 0
  1. /*
  2. Pairwise distance between every single members of the parliament
  3. active in that specific term, for every single voting that happened.
  4.  
  5. Generates a 61GB dataset for the 14,452 voting events.
  6.  
  7. Next step is reducing the amount of data, using roling window aggregation
  8. on the time dimension, and clustering on the people dimension.
  9. */
  10.  
  11. WITH
  12.  
  13. extended AS (
  14. SELECT
  15. person,
  16. party,
  17. vote_timestamp,
  18. -- to reduce data volume, we aggregate
  19. -- distances down to weekly granurality
  20. --
  21. -- also, analytic functions don't work
  22. -- on date/time data types, we need number
  23. DATE(vote_timestamp) AS date_week, -- human readable
  24. DATE_DIFF( -- for analytic functions
  25. DATE(vote_timestamp),
  26. DATE('2009-01-01'),
  27. WEEK
  28. ) AS week_no,
  29. vote_subject_hash,
  30. cycle,
  31. (
  32. CASE
  33. WHEN vote = 'igen'
  34. THEN 1
  35. WHEN vote = 'nem'
  36. THEN 0
  37. -- considering absence
  38. -- and not voting equal
  39. ELSE .5
  40. END
  41. ) AS vote
  42. FROM
  43. `atlo-parliament-polarization.scraped.ready_for_distance_calculation`
  44. )
  45.  
  46. SELECT
  47. date_week,
  48. week_no,
  49. vote_timestamp,
  50. vote_subject_hash,
  51. cycle,
  52. ARRAY_AGG(
  53. STRUCT(
  54. q1.person AS source_person,
  55. q1.party AS source_party,
  56. q2.person AS target_person,
  57. q2.party AS target_party,
  58. ABS(q1.vote - q2.vote) AS distance
  59. )
  60. ) AS edges
  61. FROM
  62. extended AS q1
  63. FULL OUTER JOIN
  64. extended AS q2
  65. USING(
  66. date_week,
  67. week_no,
  68. cycle,
  69. vote_timestamp,
  70. vote_subject_hash
  71. )
  72. WHERE
  73. -- calculating edges only once
  74. q1.person > q2.person
  75. GROUP BY
  76. date_week,
  77. week_no,
  78. vote_timestamp,
  79. vote_subject_hash,
  80. cycle
Advertisement
Add Comment
Please, Sign In to add comment
Advertisement