Guest User

Untitled

a guest
Jul 17th, 2018
102
0
Never
Not a member of Pastebin yet? Sign Up, it unlocks many cool features!
text 2.79 KB | None | 0 0
  1. # Running vacuum and analyze in Sinter
  2.  
  3. dbt and Sinter have the ability to run regular Redshift maintenance jobs. It's great to set these up early on in a project so that things stay clean as the project grows, and implementing these jobs in Sinter allows the same easy transparency and notifications as with your other dbt jobs.
  4.  
  5. This document will go through the specific steps necessary to configure vacuum and analyze jobs in the current version of dbt and Sinter. In the future, there will likely be a more idiomatically consistent way to express this logic using native dbt operations. Currently, this does work even if it is not elegant.
  6.  
  7.  
  8. ### Step 1: Create the macros
  9. macros/redshift_maintenance.sql
  10.  
  11. These macros will get called in a special operation model and are responsible for actually writing the SQL for the maintenance commands. While it is possible to write much more specific `vacuum` and `analyze` commands, often it is enough to simply execute the their default behaviors.
  12.  
  13. ```SQL
  14. {% macro vacuum(run) %}
  15. {% if run == true %}
  16. vacuum
  17. {% else %}
  18. select 1 as test
  19. {% endif %}
  20. {% endmacro %}
  21.  
  22. {% macro analyze(run) %}
  23. {% if run == true %}
  24. analyze
  25. {% else %}
  26. select 1 as test
  27. {% endif %}
  28. {% endmacro %}
  29.  
  30. ```
  31.  
  32.  
  33. ### Step 2: Create the model
  34. models/admin/warehouse_operation.sql
  35.  
  36. This model is called specifically to run maintenance jobs. While it will always be run with your project on a standard `dbt run`, it won't actually perform maintenance tasks unless the appropriate variable is passed in from the command line. As a convenience, it will log all runs of itself and track whether or not that run caused it to actually run the maintenance tasks.
  37.  
  38. ```SQL
  39. {{
  40. config({
  41. "materialized" : 'incremental',
  42. "sql_where" : 'TRUE',
  43. "post-hook" : [
  44. after_commit("{{ vacuum( var('maintenance', false) ) }}"),
  45. after_commit("{{ analyze( var('maintenance', false) ) }}")
  46. ]
  47. })
  48. }}
  49.  
  50. select
  51. current_timestamp as run_at,
  52. {{var('maintenance', false)}} as maintenance_jobs_run
  53. ```
  54.  
  55.  
  56. ### Step 3: Run the project
  57.  
  58. ```BASH
  59. dbt run --models warehouse_operation --vars '{ "maintenance": true }'
  60. ```
  61.  
  62. This command can be run locally or in Sinter.
  63.  
  64.  
  65. ### Recommendations
  66.  
  67. Your mileage may vary, but here are some good default recommendations for implementation:
  68.  
  69. - Schedule these jobs to run once a week on weekends.
  70. - Make sure your Sinter schedule doesn't cause other jobs to be run at the same time as this maintenance job.
  71. - If you're running maintenance tasks on an existing cluster for the first time, babysit them manually. Frequently, the first time these tasks are run on a cluster they take significantly longer because the cluster requires more work to be done. Once you're in a known good state, weekly jobs can be automated.
Add Comment
Please, Sign In to add comment