I have a table called
PendingExpense with a few simple columns but a lot of rows. I'm working on some queries for paginated GET responses, but running into some confusion working with the queries and the MySQL optimizer seemingly making a senseless decision to do a full index scan for the
ORDER BY clause before filtering from the
This is on MySQL version 8.0.23.
PendingExpense DDL (note, a companyId and loginCredentialId is how I specify a user in my schema):
create table PendingExpense ( ID bigint auto_increment primary key, LOGINCREDENTIALID int null, COMPANYID int null, DATE datetime null, -- ... other rows that don't pertain, e.g. amount, status, type, state, country, merchant ) create index IN_PendingExpense_COMPANYID_ASC_LOGINCREDENTIALID_ASC on PendingExpense (COMPANYID, LOGINCREDENTIALID); create index IN_PendingExpense_LOGINCREDENTIALID_ASC on PendingExpense (LOGINCREDENTIALID); create index IN_PendingExpense_Date on PendingExpense (DATE);
Then here are the two queries I'm comparing, they are identical other than the index hint. I'm including the execution plans for both immediately below:
Query 1 (no hints):
explain analyze select id from PendingExpense where COMPANYID = 1641 and LOGINCREDENTIALID = 2451 order by date DESC, id DESC limit 101; -- takes 5.5 seconds
-> Limit: 101 row(s) (cost=2356102.00 rows=101) (actual time=2292.676..4474.843 rows=101 loops=1) -> Filter: ((PendingExpense.LOGINCREDENTIALID = 2451) and (PendingExpense.COMPANYID = 1641)) (cost=2356102.00 rows=105) (actual time=2292.675..4474.818 rows=101 loops=1) -> Index scan on PendingExpense using IN_PendintExpense_Date (reverse) (cost=2356102.00 rows=5660) (actual time=0.088..4371.774 rows=1491859 loops=1)
Query 2 (index hint):
explain analyze select id from PendingExpense use index (IN_PendingExpense_COMPANYID_ASC_LOGINCREDENTIALID_ASC) where COMPANYID = 1641 and LOGINCREDENTIALID = 2451 order by date desc, id desc limit 101; -- .184 seconds
-> Limit: 101 row(s) (cost=9722.30 rows=101) (actual time=38.255..38.267 rows=101 loops=1) -> Sort: PendingExpense.`DATE` DESC, PendingExpense.ID DESC, limit input to 101 row(s) per chunk (cost=9722.30 rows=27778) (actual time=38.254..38.259 rows=101 loops=1) -> Index lookup on PendingExpense using IN_PendingExpense_COMPANYID_ASC_LOGINCREDENTIALID_ASC (COMPANYID=1641, LOGINCREDENTIALID=2451) (actual time=0.046..35.410 rows=14170 loops=1)
Essentially, I'm confused why MySql chooses to do the full index scan first before filtering on companyId / loginCredentialId when the index already exists for those two, causing significant inefficiencies. I'd much prefer to not have to specify index hints in my code/queries for cleanliness. I was under the impression MySQL generally chooses to run the where clause filtering first, especially if an index already exists for it.
Any help / hints / insight would be appreciated here. Thanks!