Problem Statement
A report query is not completing as expected. The underlying query is as follows:
SQL1SELECT d.user_identifier,2 SUM(3 CASE4 WHEN CHAR_LENGTH(l.message) < 159 THEN 15 ELSE CEIL(CHAR_LENGTH(l.message) / 150)6 END7 ) AS msg_blocks8FROM log l9JOIN users u ON u.user_id = l.user_id10JOIN dashboard d ON d.dashboard_id = u.dashboard_id11WHERE (d.dashboard_id > 69 AND d.dashboard_id < 118)12 AND l.timestamp >= '2025-01-01 00:00:00'13 AND l.reply_message = 114GROUP BY d.user_identifier;15
Problem Replication
Confirmed that the original query hangs for more than 100 seconds in the database management tool.
Troubleshooting
Suspected that the CASE and CHAR_LENGTH() functions might be causing full table scans. A slight refactor was attempted:
SQL1SELECT d.user_identifier,2 SUM(3 CASE4 WHEN CHAR_LENGTH(l.message) < 159 THEN 15 ELSE CEIL(CHAR_LENGTH(l.message) / 150)6 END7 ) AS msg_blocks8FROM log l9JOIN users u ON u.user_id = l.user_id10JOIN dashboard d ON d.dashboard_id = u.dashboard_id11WHERE (d.dashboard_id > 69 AND d.dashboard_id < 118)12 AND l.timestamp >= '2025-01-01 00:00:00'13 AND l.reply_message = 114GROUP BY d.user_identifier;15
There was no improvement in performance; execution still exceeded 100 seconds.
Isolating the Issue with a Simplified Query
SQL1SELECT2 l.user_id,3 SUM(4 CASE5 WHEN CHAR_LENGTH(l.message) < 159 THEN 16 ELSE CEIL(CHAR_LENGTH(l.message) / 150)7 END8 ) AS msg_blocks9FROM log l10WHERE l.timestamp >= '2025-01-01 00:00:00'11 AND l.reply_message = 112GROUP BY l.user_id;13
Even without joins, the query still takes over 100 seconds.
Indexing Investigation
Checked the number of rows scanned:
SQL1SELECT COUNT(l.user_id) AS record_count2FROM log l3WHERE l.timestamp >= '2025-01-01 00:00:00'4 AND l.reply_message = 1;5
Returned approximately 55,000 rows but still took more than 10 seconds, indicating potential indexing issues.
Index Addition
SQL1ALTER TABLE log ADD INDEX idx_log_reply_timestamp (reply_message, timestamp);2
This improved count queries, but the main SUM query still showed no significant improvement. Precomputing results using temporary tables or CTEs was considered but would not address the root cause.
Checking for Table Fragmentation
SQL1SELECT2 TABLE_NAME,3 ENGINE,4 DATA_LENGTH,5 INDEX_LENGTH,6 DATA_FREE7FROM information_schema.tables8WHERE TABLE_NAME = 'log';9
Results showed significant fragmentation, with approximately 30% of the table's size marked as free space. Rebuilding the table was considered.
Table Rebuild Using Online Schema Change
Backup
BASH1sudo mysqldump -u root -p --single-transaction --quick --triggers database_name log | sudo tee /home/backups/mysql/log_backup_$(date +\%F).sql > /dev/null2
Checked the backup with:
BASH1head -n 20 /home/backups/mysql/log_backup_$(date +\%F).sql2
Running pt-online-schema-change
BASH1pt-online-schema-change --execute --alter "ENGINE=InnoDB" \2 --user=root --ask-pass \3 --socket=/var/run/mysqld/mysqld.sock \4 --alter-foreign-keys-method=auto D=database_name,t=log5
The rebuild took approximately 30 minutes.
Solution Testing
Retested the query post-rebuild:
SQL1SELECT l.user_id,2 SUM(3 CASE4 WHEN CHAR_LENGTH(l.message) < 159 THEN 15 ELSE CEIL(CHAR_LENGTH(l.message) / 150)6 END7 ) AS msg_blocks8FROM log l9WHERE l.timestamp >= '2025-01-01 00:00:00'10 AND l.reply_message = 111GROUP BY l.user_id;12
Performance improved but issues persisted for queries spanning dates beyond mid-January.
Additional Index Optimization
Added a composite index incorporating user_id:
SQL1ALTER TABLE log ADD INDEX idx_log_reply_timestamp_user (reply_message, timestamp, user_id);2
This improved performance for mid-range queries but did not resolve issues for broader date ranges.
Execution Plan Issues and Index Forcing
Using EXPLAIN, MySQL reported an optimal execution plan, but the actual execution time did not reflect this. Forcing the index manually resolved the issue:
SQL1SELECT l.user_id,2 SUM(3 CASE4 WHEN CHAR_LENGTH(l.message) < 159 THEN 15 ELSE CEIL(CHAR_LENGTH(l.message) / 150)6 END7 ) AS msg_blocks8FROM log l FORCE INDEX (idx_log_reply_timestamp_user)9WHERE l.timestamp >= '2025-01-01 00:00:00'10 AND l.reply_message = 111GROUP BY l.user_id;12
Recommendations
Recreate Old Indexes – If MySQL is misreporting index usage, dropping and recreating indexes may help.
Partitioning – The log table should be partitioned by timestamp to optimize time-based queries.
Automate Table Rebuilds – Periodic rebuilds using pt-online-schema-change via a cron job could prevent fragmentation issues.
Index Optimization Monitoring – Regularly check query plans and adjust indexes to ensure optimal performance.