A Database Session Leak Can Slow Down Your Database

Sahil Aggarwal
3 min readSep 21, 2019

In this article, I will explain some consequences of session leaks, which I faced when an issue came to me.

Session Leak is very common in developers’ lives. In this article, I will explain some consequences of session leaks, which I faced when an issue came to me and how I came to the root cause of the issue.

Issue and Analysis

Issue: Load is increasing on the server and Postgres queries are consuming CPU and taking time.

I had to resolve this issue and find the root cause of it.

Analysis: After all my debugging, I came to the following analysis:

There is a table that has 200 rows, but the number of live tuples showing there is more than that (around 60K). We are using Postgresql 9.3.

The following are the queries that I ran.

select count(*) from subscriber_offset_manager;count
-------
200 (1 row)
SELECT schemaname,relname,n_live_tup,n_dead_tup FROM pg_stat_user_tables where relname='subscriber_offset_manager' ORDER BY n_dead_tup ;schemaname | relname | n_live_tup | n_dead_tup
------------+---------------------------+------------+------------public | subscriber_offset_manager | 61453 | 5 (1 row)

But as seen from pg_stat_activity and pg_locks, we are not able to track any open connection.

SELECT query, state,locktype,mode FROM pg_locks JOIN pg_stat_activity USING (pid) WHERE relation::regclass = 'subscriber_offset_manager'::regclass ;query | state | locktype | mode
-------+-------+----------+------
(0 rows)

I also tried full vacuum on this table. Below were the results:

  • All the times no rows were removed
  • Many times all the live tuples become dead tuples.

Here is the output of the running vacuum command:

vacuum FULL VERBOSE ANALYZE subscriber_offset_manager;INFO: vacuuming "public.subscriber_offset_manager"INFO: "subscriber_offset_manager": found 0 removable, 67920 nonremovable row versions in714 pagesDETAIL: 67720 dead row versions cannot be removed yet. CPU 0.01s/0.06u sec elapsed 0.13 sec.INFO: analyzing "public.subscriber_offset_manager"INFO: "subscriber_offset_manager": scanned 710 of 710 pages, containing 200 live rows and67720 dead rows; 200 rows in sample, 200 estimated total rows VACUUMafter that i checked for live and dead tuples for that table as follows :SELECT schemaname,relname,n_live_tup,n_dead_tup FROM pg_stat_user_tables where relname='subscriber_offset_manager' ORDER BY n_dead_tup ;schemaname | relname | n_live_tup | n_dead_tup
------------+---------------------------+------------+------------public | subscriber_offset_manager | 200 | 67749

After 10 seconds:

SELECT schemaname,relname,n_live_tup,n_dead_tup FROM pg_stat_user_tables where relname='subscriber_offset_manager' ORDER BY n_dead_tup ;schemaname | relname | n_live_tup | n_dead_tup
------------+---------------------------+------------+------------public | subscriber_offset_manager | 68325 | 132

All the dead tuples moved to live tuples instead of cleaning up.

One more interesting observation: When I stop my Java app and then do a full vacuum, it works fine (number of rows and live tuples become equal). So there is something wrong if we select and update continuously from the Java app.

After all the research and analysis and help for stack overflow and after following many links, I found the following root cause.

Root Cause:

When there is one long-running transaction or a database session leak, then dead tuples are created after the start time of that transaction and will not be cleaned up by the vacuum for all the tables for that database. This is due to the PostgreSQL vacuum process that checks for a transaction ID less than the transaction ID of the oldest transaction for cleaning dead rows. The transaction ID is generated globally.

When I checked, I found a transaction that was opened for too long and when I killed it, the vacuum worked fine.

Please read the below-given links for detailed info and effects of not ending database transaction and some Postgres internals. These links helped me a lot in solving the issue.

1.Question asked by me on StackOverFlow: https://stackoverflow.com/questions/51844616/high-number-of-live-dead-tuples-in-postgresql-vacuum-not-working/51849491?noredirect=1#comment90728543_51849491

2. When Postgresql Vacuum does not Work: https://www.cybertec-postgresql.com/en/reasons-why-vacuum-wont-remove-dead-rows/

3. Consequences of not ending a database transaction: https://stackoverflow.com/questions/33815267/what-are-the-consequences-of-not-ending-a-database-transaction/33815610#33815610

4.Some Postgres internals and how the vacuum works internally in Postgres: https://fritshoogland.wordpress.com/category/postgresql/

Explanation of this blog is as follows :

--

--