Not a member of Pastebin yet?
Sign Up,
it unlocks many cool features!
- # Running vacuum and analyze in Sinter
- dbt and Sinter have the ability to run regular Redshift maintenance jobs. It's great to set these up early on in a project so that things stay clean as the project grows, and implementing these jobs in Sinter allows the same easy transparency and notifications as with your other dbt jobs.
- This document will go through the specific steps necessary to configure vacuum and analyze jobs in the current version of dbt and Sinter. In the future, there will likely be a more idiomatically consistent way to express this logic using native dbt operations. Currently, this does work even if it is not elegant.
- ### Step 1: Create the macros
- macros/redshift_maintenance.sql
- These macros will get called in a special operation model and are responsible for actually writing the SQL for the maintenance commands. While it is possible to write much more specific `vacuum` and `analyze` commands, often it is enough to simply execute the their default behaviors.
- ```SQL
- {% macro vacuum(run) %}
- {% if run == true %}
- vacuum
- {% else %}
- select 1 as test
- {% endif %}
- {% endmacro %}
- {% macro analyze(run) %}
- {% if run == true %}
- analyze
- {% else %}
- select 1 as test
- {% endif %}
- {% endmacro %}
- ```
- ### Step 2: Create the model
- models/admin/warehouse_operation.sql
- This model is called specifically to run maintenance jobs. While it will always be run with your project on a standard `dbt run`, it won't actually perform maintenance tasks unless the appropriate variable is passed in from the command line. As a convenience, it will log all runs of itself and track whether or not that run caused it to actually run the maintenance tasks.
- ```SQL
- {{
- config({
- "materialized" : 'incremental',
- "sql_where" : 'TRUE',
- "post-hook" : [
- after_commit("{{ vacuum( var('maintenance', false) ) }}"),
- after_commit("{{ analyze( var('maintenance', false) ) }}")
- ]
- })
- }}
- select
- current_timestamp as run_at,
- {{var('maintenance', false)}} as maintenance_jobs_run
- ```
- ### Step 3: Run the project
- ```BASH
- dbt run --models warehouse_operation --vars '{ "maintenance": true }'
- ```
- This command can be run locally or in Sinter.
- ### Recommendations
- Your mileage may vary, but here are some good default recommendations for implementation:
- - Schedule these jobs to run once a week on weekends.
- - Make sure your Sinter schedule doesn't cause other jobs to be run at the same time as this maintenance job.
- - If you're running maintenance tasks on an existing cluster for the first time, babysit them manually. Frequently, the first time these tasks are run on a cluster they take significantly longer because the cluster requires more work to be done. Once you're in a known good state, weekly jobs can be automated.
Add Comment
Please, Sign In to add comment