Fixed email_recipients indexes to match query usage #19918
closes https://linear.app/tryghost/issue/ENG-791/migration-to-fix-email-recipients-indexes

Our indexes over single columns (`delivered_at`, `opened_at`, `failed_at`) were ineffective because the only time we query those columns is alongside `email_id`, meaning we were frequently performing full table scans on very large tables during our email analytics jobs.

- added migration to add new indexes covering `email_id` and the respective columns
- added migration to drop the old indexes that weren't being used in any query plans
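The index change described above can be sketched in a self-contained way. The snippet below is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` as a stand-in for MySQL, a trimmed-down hypothetical version of the `email_recipients` table, and hypothetical index names. It is not the migration from this PR, and SQLite's planner output differs from MySQL's `EXPLAIN`.

```python
import sqlite3

# Assumption: SQLite in memory stands in for MySQL, and the table is a
# trimmed-down, hypothetical version of email_recipients.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE email_recipients (
        id INTEGER PRIMARY KEY,
        email_id TEXT NOT NULL,
        delivered_at TEXT,
        opened_at TEXT,
        failed_at TEXT
    )
    """
)

# The timestamp column is only ever filtered together with email_id.
QUERY = (
    "SELECT COUNT(*) FROM email_recipients "
    "WHERE email_id = ? AND delivered_at IS NOT NULL"
)


def plan(connection):
    """Return the planner's description of how QUERY would be executed."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY, ("email-1",)).fetchall()
    # The last column of each row holds the human-readable plan detail.
    return " | ".join(row[-1] for row in rows)


# Before: a single-column index on the timestamp only (hypothetical name).
conn.execute(
    "CREATE INDEX email_recipients_delivered_at_index "
    "ON email_recipients (delivered_at)"
)
before = plan(conn)

# After: drop the old index and add one covering email_id plus the timestamp.
conn.execute("DROP INDEX email_recipients_delivered_at_index")
conn.execute(
    "CREATE INDEX email_recipients_email_id_delivered_at_index "
    "ON email_recipients (email_id, delivered_at)"
)
after = plan(conn)

print("before:", before)
print("after: ", after)
```

With only the composite index in place, the planner can seek directly to the rows for one `email_id` instead of examining rows that belong to every other email.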
It looks like this PR contains a migration 👀
General requirements
Schema changes
Data changes
|
@kevinansfield there are a few more things I think we should get clear on:
Thank you! |
There's no query cache in MySQL 8 as far as I can tell. To avoid cases of query data being in memory I was restarting the server between query runs. |
In this case we're replacing an already existing index so we shouldn't have huge differences. I can take a look at timing a batch insert operation. |
I did try looking into the The explain output does have |
My understanding from the docs is that In the properly-indexed query it's showing a Here's the full table create table output for each before:
after:
|
Could we use the optimizer cost model to find out what the impact of the cost of the query is to MySQL? |
I had used before:
after:
|
With the queries split out as in #19917, there may be some marginal benefit to changing the queries to match typical patterns. Counting failed records is quick because there are usually very few of them.
Conversely counting delivered records with
If we can flip that to count only the non-delivered records then it's a much quicker query
However, that requires us to already have a total
From there it leaves us with calculating the |
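The "flip the count" idea in this comment can be sketched as below. This is a rough illustration, not the queries used in Ghost: it assumes SQLite via Python's `sqlite3` as a stand-in for MySQL, a hypothetical trimmed-down table and index name, and that a trusted total recipient count for the email is already available from somewhere else (here just a local variable).

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; table and index names
# are hypothetical and trimmed down for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE email_recipients ("
    "id INTEGER PRIMARY KEY, email_id TEXT NOT NULL, delivered_at TEXT)"
)
conn.execute(
    "CREATE INDEX email_recipients_email_id_delivered_at_index "
    "ON email_recipients (email_id, delivered_at)"
)

# 1,000 recipients for one email, of which the first 10 were never delivered.
rows = [
    ("email-1", None if i < 10 else "2024-03-25 10:00:00")
    for i in range(1000)
]
conn.executemany(
    "INSERT INTO email_recipients (email_id, delivered_at) VALUES (?, ?)", rows
)


def delivered_count_direct(connection, email_id):
    """Count delivered rows directly; touches most rows for a typical email."""
    return connection.execute(
        "SELECT COUNT(*) FROM email_recipients "
        "WHERE email_id = ? AND delivered_at IS NOT NULL",
        (email_id,),
    ).fetchone()[0]


def delivered_count_flipped(connection, email_id, total):
    """Count the (usually few) non-delivered rows and subtract from the total."""
    not_delivered = connection.execute(
        "SELECT COUNT(*) FROM email_recipients "
        "WHERE email_id = ? AND delivered_at IS NULL",
        (email_id,),
    ).fetchone()[0]
    return total - not_delivered


total = 1000  # Assumption: the total recipient count is already known.
direct = delivered_count_direct(conn, "email-1")
flipped = delivered_count_flipped(conn, "email-1", total)
print(direct, flipped)
```

Both functions return the same number as long as the supplied total is accurate, which is the precondition the comment raises. The flipped version only pays off when non-delivered rows are a small fraction of an email's recipients.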
Local runtime with ~2M email_recipient rows:
Explain output...
before:
after: