Recently I had issues on a production database running slow on tables with more than 1M rows.
This is a complex Ruby on Rails app connected to a Postgresql database, I went through the whole stack to figure out that it’s a simple index problem. Very few or no related doc was found about this issue, therefor I quickly post a summary of my adventure.
In Rails I often write something simple as ActivityLog.first or Player.find(123).activity_logs.first , where the model ActivityLog have a default scope describe like so:
class ActivityLog < ActiveRecord::Base default_scope { order("created_at desc") } end
This leads to a SQL query looking like this:
SELECT "activity_logs".* FROM "activity_logs" WHERE "activity_logs"."player_id" = 123 ORDER BY created_at ASC LIMIT 1;
This is probably one of the most common query in my Rails applications, and certainly not only in my applications.
Unfortunately, this may become a slow query on big tables. Indeed, if this query is often called your app may become from sticky to unusable (the latter was my case).
The following example illustrate the issue.
I recall it: this may be a very common query:
# EXPLAIN ANALYSE SELECT "activity_logs".* FROM "activity_logs" WHERE "activity_logs"."player_id" = 123 ORDER BY created_at ASC LIMIT 1; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.42..464.30 rows=1 width=708) (actual time=1124.357..1124.357 rows=0 loops=1) -> Index Scan using index_activity_logs_on_created_at on activity_logs (cost=0.42..421660.15 rows=909 width=708) (actual time=1124.356..1124.356 rows=0 loops=1) Filter: (player_id = 123) Rows Removed by Filter: 1033340 Planning time: 0.084 ms Execution time: 1124.378 ms (6 rows)
This query on my dev machine, which is way more powerful than our production VPS takes 1’124ms to run. If called only once per user’s click, this would immediately degrade your Apdex score.
As described in Postresql doc and by reading the EXPLAIN (ANALYZE, BUFFERS) output, a lot of disk access caused by the filter on player_id makes the query slow.
The numbers provided by BUFFERS help to identify which parts of the query are the most I/O-intensive.
# EXPLAIN (ANALYSE, BUFFERS) SELECT "activity_logs".* FROM "activity_logs" WHERE "activity_logs"."player_id" = 123 ORDER BY created_at ASC LIMIT 1; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.42..464.30 rows=1 width=708) (actual time=1127.860..1127.860 rows=0 loops=1) Buffers: shared hit=405695 read=445170 -> Index Scan using index_activity_logs_on_created_at on activity_logs (cost=0.42..421660.15 rows=909 width=708) (actual time=1127.858..1127.858 rows=0 loops=1) Filter: (player_id = 123) Rows Removed by Filter: 1033340 Buffers: shared hit=405695 read=445170 Planning time: 0.083 ms Execution time: 1127.880 ms (8 rows)
The planner took the decision to use an index scan on created_at (the ORDER BY attribute) and then filter player_id which in this situation isn’t the best choice. Disabling it with SET enable_indexscan = OFF; improve drastically the performances, but it’s not a production solution.
# SET enable_indexscan = OFF; # EXPLAIN ANALYSE SELECT "activity_logs".* FROM "activity_logs" WHERE "activity_logs"."player_id" = 123 ORDER BY created_at ASC LIMIT 1; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=3395.09..3395.09 rows=1 width=708) (actual time=0.026..0.026 rows=0 loops=1) -> Sort (cost=3395.09..3397.36 rows=909 width=708) (actual time=0.025..0.025 rows=0 loops=1) Sort Key: created_at Sort Method: quicksort Memory: 25kB -> Bitmap Heap Scan on activity_logs (cost=19.47..3390.54 rows=909 width=708) (actual time=0.021..0.021 rows=0 loops=1) Recheck Cond: (player_id = 123) -> Bitmap Index Scan on index_activity_logs_on_player_id (cost=0.00..19.24 rows=909 width=0) (actual time=0.019..0.019 rows=0 loops=1) Index Cond: (player_id = 268) Planning time: 0.087 ms Execution time: 0.052 ms (10 rows)
The query execution time drop down to 0.05ms which is factor of 22’480 from the previous planner decision !
As described here another way to trick the planner is to add another field in the ORDER BY clause.
# EXPLAIN (ANALYSE, BUFFERS) SELECT "activity_logs".* FROM "activity_logs" WHERE "activity_logs"."player_id" = 268 ORDER BY created_at ASC, id ASC LIMIT 1; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=3395.09..3395.09 rows=1 width=708) (actual time=0.038..0.038 rows=0 loops=1) Buffers: shared read=3 -> Sort (cost=3395.09..3397.36 rows=909 width=708) (actual time=0.037..0.037 rows=0 loops=1) Sort Key: created_at, id Sort Method: quicksort Memory: 25kB Buffers: shared read=3 -> Bitmap Heap Scan on activity_logs (cost=19.47..3390.54 rows=909 width=708) (actual time=0.034..0.034 rows=0 loops=1) Recheck Cond: (player_id = 268) Buffers: shared read=3 -> Bitmap Index Scan on index_activity_logs_on_player_id (cost=0.00..19.24 rows=909 width=0) (actual time=0.032..0.032 rows=0 loops=1) Index Cond: (player_id = 268) Buffers: shared read=3 Planning time: 0.093 ms Execution time: 0.060 ms (14 rows)
Which improve the execution time in a comparable way. In a hurry this the solution a adopted, I have quickly patched my scope that way to improve globally the performances. It was safe and easy to commit with the advantage not to expand database indices size.
class ActivityLog < ActiveRecord::Base default_scope { order("created_at desc, id desc") } end
But for the sake of a better understanding I dug a bit more this issue. I have more than one big table, and I have more than one app online… I don’t like to keep such gray zone and smoking work-around in my code.
First, I tried to help the planner change its decision by tweaking statistics on every column, but it didn’t changed planner decision and didn’t improve the execution time.
# ALTER TABLE activity_logs ALTER COLUMN created_at SET STATISTICS 1000; # EXPLAIN (ANALYSE, BUFFERS) SELECT "activity_logs".* FROM "activity_logs" WHERE "activity_logs"."player_id" = 268 ORDER BY created_at ASC LIMIT 1; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.42..474.81 rows=1 width=707) (actual time=1145.167..1145.167 rows=0 loops=1) Buffers: shared hit=405660 read=445205 -> Index Scan using index_activity_logs_on_created_at on activity_logs (cost=0.42..421251.54 rows=888 width=707) (actual time=1145.165..1145.165 rows=0 loops=1) Filter: (player_id = 268) Rows Removed by Filter: 1033340 Buffers: shared hit=405660 read=445205 Planning time: 0.092 ms Execution time: 1145.187 ms (8 rows)
Then I realized that simply building an index on player_id, my condition, and created_at, my ordering attribute, should help Postgersql extracting my query while using the actual planner decision.
CREATE INDEX activity_logs_player_id_created_at ON activity_logs (player_id, created_at);
…and the end result is pleasant enough. 0.03ms – my work is almost done.
# SET enable_indexscan = ON; # EXPLAIN (ANALYSE, BUFFERS) SELECT "activity_logs".* FROM "activity_logs" WHERE "activity_logs"."player_id" = 268 ORDER BY created_at ASC LIMIT 1; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.42..4.45 rows=1 width=707) (actual time=0.013..0.013 rows=0 loops=1) Buffers: shared hit=3 -> Index Scan using activity_logs_player_id_created_at on activity_logs (cost=0.42..3570.31 rows=888 width=707) (actual time=0.012..0.012 rows=0 loops=1) Index Cond: (player_id = 268) Buffers: shared hit=3 Planning time: 0.099 ms Execution time: 0.031 ms (7 rows)
On the Rails side a simple migration like the following change my user’s life, and restored my peace of mind.
class AddIndicesToBigTables < ActiveRecord::Migration def change add_index :activity_logs, [:player_id, :created_at], unique: false end end