-
Notifications
You must be signed in to change notification settings - Fork 766
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: Positions table index on Odometer and Car_ID to improve response time of some queries #3801
perf: Positions table index on Odometer and Car_ID to improve response time of some queries #3801
Conversation
✅ Deploy Preview for teslamate ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
Thanks for the change. I appreciate the performance improvements as well. I feel like we should change it in the first place and add your suggested migration in addition. @brianmay What do you think? |
I had thought about this before, but adding index in positions means the index needs to be updated many times a second, this could possibly cause write issues for RPI etc... Not a problem for most, but I know we have a lot of RPI users too. What dashboard specifically are you noticing this wtih? Im trying to think of use cases where odometer is a useful index, but I don't have all the SQL up to look. Also there is already a car_id index. |
Some random thoughts:
|
Agree, I mean you're probably right, but I'd love to test the code some. So Ill await what dashboard is that slow first. |
Hi, the odometer is used on several dashboards, but it goes unnoticed because they are usually time filtered (30 days, 90 days, etc). But in the Battery Health dashboard there is a panel for the trips to get the distance logged in TeslaMate and it's extremely slow, specially this query:
I have a database with more than 16 millions of records in the table "positions" (of 3 cars as shown in the following screenshot): That query without this suggested index takes around 26 secs to run and with the index just 55ms (tested on a cloud server with 4 cores and 8GB of RAM). You said in the previous comment:
Yes, it is, and it's necessary (since all dashboards have to filter by car), but at the same time there is another composite index based on "date, drive_id" which is also used a lot (even having already an index based on date). That is why I'm proposing this composite index based on "odometer, car_id", which will be very useful especially for those who have more than one car in the database. If a simple index is created based only on the odometer field, it would work, but it would not be enough when there are more cars registered. |
Sorry to push back, I don't think you understand indexing. Indexing is used to FIND records. I doubt any of the dashboards have odometer in a join or in the where statement. Indexes are not used to provide data *with a few exceptions, B-tree indexes allow for efficient searching, insertion, and deletion of data. In your case, since you have such a large data set, if you want to add an index, go for it. It doesn't hurt anything most likely for you. But I really don't see a benefit to adding it to everyone. For my above comment of B-Tree indexing not helping most dashboards, it's prob accurate. But for a MIN/MAX situation it likely helps because it can just check the MIN/MAX of the index. But for returning 1:1 values and most use cases, it would not benefit anything. Also, most of the dashboards should be using the drive data NOT the positions data for min/max odometer readings. If they are not, they should be reviewed and possibly enhanced. Also, you can show the explain analyze for more indexing details and how the query optimizer works. EX: select min(start_km), max(end_km) https://explain-postgresql.com/archive/explain/b4250b3cd7cbef4f2d81032ac1fcef85:0:2024-04-03 using positions: |
Hi, I understand what is an Index for and agree with you, but this is the use case: In the battery health dashboard, we have this information: Where: "Logged" is the distance traveled that is saved on Teslamate database, and "Mileage" is the distance the car has traveled since using Teslamate. These are the queries:
Any problem here, fast and clean.
Without the index takes more than 20 seconds. With the index just milliseconds.
And the special one is this:
Without the index takes almost a minute. With the index just milliseconds. In both last queries we cannot use the drives table (of course it will be so fast), but, what about if there some unclosed drives? or if Teslamate was offline while driving. The real information about the odometer is only in the positions table, so we have to use it. The case here is to show the user, about how much mileages have been not saved in the database, when for some reason a drive hasn't been fully recorded, for example due to a bug or an unexpected restart and that Teslamate has not been able to record, either due to lack of connection, areas without signal, or that it has been out of service. I am open to hearing any improvements you can come up with to improve query performance for this data without the need of creating this proposed index. |
Explain is your friend here :-). Without the new index:
Adding the new index:
Running the above again:
The 1st is the exact same. It only touches the drives table. Hardly surprising. The 2nd one has changed the "index scan" to a "index only scan". I think this means we get get the max odometer value straight from the index, instead of having the search every car_id=1 entry. Cheap. The third query we are looking for every odometer entry, finding one with the latest date. The new index isn't helping here. Instead we do a search on the date index for the latest date, and then scan all of these entries for car_id=1. Which is going to be expensive if you have a lot of cars, but cheap if you only have 1 car. To me this looks like it should be improved if we had an index on (car_id, date) or (date, car_id). The 4th case I think instead of scanning each any every entry sequentially, we can get the max and min values straight from the index. |
So I am going to go back to my previous statement, wouldn't it be far more efficient just to use the drives for the query instead? A quick PR with a simple query change and that long query time go away. It made little difference for me, but bet it will be for you. The only difference may be the dashboard WHILE you are driving. But honestly, really who does that? I think thats a edge case that really shouldnt matter. One thing I dont mind is the query that orders by date :) That's a good use of indexing as the date field is indexed.
And why can't we use the other one? It seems a weird way to get the MAX record basically. I'm sure it was created a while ago, but that doesn't mean it's the best way to do it. Same results, much faster /w the 2nd.
And the other can easily be done as:
Working with existing indexed can also make some of these queries MUCH faster. EX: Changing
to:
made that query a lot faster. I threw together the code, so it's poorly written, but it gives an example of working with the existing indexing can make much of our existing queries MUCH fast. |
Thank you. You're right, it only makes a little difference of data using only the "drives" Table instead of using positions. |
Ok, I've managed to include these query changes on this PR #3806 |
This is an interesting dialog considering the issue that was raised here #3727 |
a similar index has been created (ideal_battery_range_km) in #4200. after 2 weeks of usage this index is within top 3 indexes used within PG. in addition it solves issues like #4225 (while this has also been addressed by filtering for date via #4228). in general dashboarding should follow these practices: |
I've realized that some queries from Grafana dashboard that shows information from the odometer are very slow specially on databases with too much data logged and for those who has more than one car.
This response speed is greatly improved with an index on the "positions" table based on the "odometer" and "car_id" fields.