Advertisement
Not a member of Pastebin yet?
Sign Up,
it unlocks many cool features!
- ### Finding long running queries in Postgres
- There have been a few times in the last several months at $DAY_JOB during which the following query,
- or some slight variation of it,
- has been used to identify long-running and potentially problematic queries:
- ```
- select pid,
- now() - query_start as duration,
- query,
- state
- from pg_stat_activity
- ```
- I believe that this can yield misleading results as other things need to be taken into consideration.
- To illustrate this, I'm going to iterate on the above query in one session,
- and issue other test queries in another one.
- OK... let's start with a clean slate and run this query for the first time;
- on my local machine I happen to see the following:
- ```
- pid | query | state | duration
- -------+-----------------------------------------+--------+-----------------
- 84306 | SHOW TRANSACTION ISOLATION LEVEL | idle | 00:25:31.513976
- 84307 | SHOW TRANSACTION ISOLATION LEVEL | idle | 00:25:31.509142
- 84318 | SHOW TRANSACTION ISOLATION LEVEL | idle | 00:25:01.31746
- 84319 | SHOW TRANSACTION ISOLATION LEVEL | idle | 00:25:01.311403
- 83673 | select pid, +| active | 00:00:00
- | query, +| |
- | state, +| |
- | now() - query_start as duration+| |
- | from pg_stat_activity; | |
- (5 rows)
- ```
- This is all stuff we can ignore for this demonstration so I'm going to alter the query a little bit to exclude them:
- ```
- select pid,
- now() - query_start as duration,
- query,
- state
- from pg_stat_activity
- where query not ilike '%pg_stat_activity%'
- and query not ilike '%transaction%';
- ```
- This is a little better:
- ```
- pid | duration | query | state
- -----+----------+-------+-------
- (0 rows)
- ```
- Let's issue a query from another session:
- ```
- genome_development=# \timing
- Timing is on.
- genome_development=# select 'Hi, I''m not a long-running query!';
- ?column?
- -----------------------------------
- Hi, I'm not a long-running query!
- (1 row)
- Time: 0.314 ms
- ```
- ... and then wait a few moments before looking at things in our main session:
- ```
- genome_development=# select pid,
- now() - query_start as duration,
- query,
- state
- from pg_stat_activity
- where query not ilike '%pg_stat_activity%'
- and query not ilike '%transaction%';
- pid | duration | query | state
- -------+----------------+----------------------------------------------+-------
- 85188 | 00:01:08.68768 | select 'Hi, I''m not a long-running query!'; | idle
- (1 row)
- ```
- What's this? This says that the duration is over 1 minute...
- but `psql` in the other session said that our query took 0.314 ms.
- What's happening here?
- The problem here is that there are other fields in this view that we need to look at to determine what's _actually_ taking a long time.
- Notice that the state of this row is `idle`.
- That means the process that this row represents is _not_ currently executing the query.
- To see this, let's issue this which is purposefully constructed to take 30 seconds to run:
- ```
- select 'But _I_ am indeed a long-running query.' from pg_sleep(30);
- ```
- ... and quickly switch back to the main session to see what the view is tells us:
- ```
- genome_development=# select pid,
- now() - query_start as duration,
- query,
- state
- pg_stat_activity
- where query not ilike '%pg_stat_activity%'
- and query not ilike '%transaction%';
- pid | duration | query | state
- -------+-----------------+---------------------------------------------------------------------+--------
- 85188 | 00:00:03.551807 | select 'But _I_ am indeed a long-running query.' from pg_sleep(30); | active
- (1 row)
- ```
- Wait... is this saying that the query took just under 4 seconds?
- No... because the process state is still `active` and the query is still running.
- If we wait for the full thirty seconds to pass, we should see a state change:
- ```
- genome_development=# select pid,
- now() - query_start as duration,
- query,
- state
- pg_stat_activity
- where query not ilike '%pg_stat_activity%'
- and query not ilike '%transaction%';
- pid | duration | query | state
- -------+-----------------+---------------------------------------------------------------------+-------
- 85188 | 00:00:45.365226 | select 'But _I_ am indeed a long-running query.' from pg_sleep(30); | idle
- (1 row)
- ```
- Aha... the process is `idle` now... but wait... the duration is wrong again.
- What is going on?
- The problem this time is that we need to look at something in addition to the `query_start` field to determine how long a query actually took to run.
- It turns out that the `pg_stat_activity` view also has a `state_change` field that we can exploit to properly figure this out.
- According to the documentation, the `state_change` field is the "Time when the state was last changed".
- (Duh.)
- So, if a process completes executing a query, then its state will go from `active` to `idle`, and thus we should see that the difference between the `state_change` and `query_start` field will be the actual execution time:
- Let's see:
- ```
- genome_development=# select pid,
- query_start,
- state_change,
- query,
- state
- from pg_stat_activity
- where query not ilike '%pg_stat_activity%'
- and query not ilike '%transaction%';
- pid | query_start | state_change | query | state
- -------+-------------------------------+-------------------------------+---------------------------------------------------------------------+-------
- 85188 | 2017-06-23 13:48:23.776142-04 | 2017-06-23 13:48:53.778372-04 | select 'But _I_ am indeed a long-running query.' from pg_sleep(30); | idle
- (1 row)
- ```
- Well, look at that... they're 30 seconds apart, just as we expected.
- So... what did we learn here?
- What we need to do is make sure that we _only_ look at processes that are currently `active`;
- those are the ones that are actually running queries.
Advertisement
Add Comment
Please, Sign In to add comment
Advertisement