Speed up accessors for time series and supplemental attributes #416
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@ Coverage Diff @@
## main #416 +/- ##
==========================================
- Coverage 78.07% 77.90% -0.17%
==========================================
Files 69 69
Lines 5504 5612 +108
==========================================
+ Hits 4297 4372 +75
- Misses 1207 1240 +33
Flags with carried forward coverage won't be shown.
@@ -567,7 +567,7 @@ function _transform_single_time_series!(
         horizon = params.horizon,
         time_series_type = DeterministicSingleTimeSeries,
         scaling_factor_multiplier = get_scaling_factor_multiplier(metadata),
-        internal = get_internal(metadata),
+        internal = InfrastructureSystemsInternal(),
The previous behavior was incorrect but harmless. It caused the metadata of the transformed object to have the same UUID as the original metadata, but no code ever looked at that.
src/time_series_manager.jl (outdated)
@@ -110,7 +110,11 @@ function add_time_series!(mgr::TimeSeriesManager, batch::Vector{TimeSeriesAssoci
     for uuid in new_ts_uuids
         serialize_time_series!(mgr.data_store, time_series_uuids[uuid])
     end
-    add_metadata!(mgr.metadata_store, owners, all_metadata)
+    if length(all_metadata) == 1
+        add_metadata!(mgr.metadata_store, owners[1], all_metadata[1])
This is faster than calling the array version.
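For context, a hedged sketch of the complete branch this comment refers to, assuming the else case simply falls back to the existing array method (the hunk above is cut off before it):

    if length(all_metadata) == 1
        # Fast path: a single association skips the bulk/array machinery.
        add_metadata!(mgr.metadata_store, owners[1], all_metadata[1])
    else
        # Multiple associations still go through the array method.
        add_metadata!(mgr.metadata_store, owners, all_metadata)
    end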
@@ -23,14 +23,17 @@ const SUPPLEMENTAL_ATTRIBUTE_TABLE_NAME = "supplemental_attributes"

 mutable struct SupplementalAttributeAssociations
     db::SQLite.DB
+    # If we don't cache SQL statements, there is a cost of 3-4 us on every query.
+    cached_statements::Dict{String, SQLite.Stmt}
I made some minor changes to the supplemental attribute associations, but did not profile them. If the time series changes work out, I can pursue these in greater detail.
+    # Caching compiled SQL statements saves 3-4 us per query.
+    # DBInterface.jl does something similar with @prepare.
+    # We need this to be tied to our connection.
+    cached_statements::Dict{String, SQLite.Stmt}
This applies to selective queries that we run frequently.
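A minimal, self-contained sketch of the statement-caching pattern described in these comments; the struct name, constructor, and execute_cached helper are illustrative assumptions rather than the package's actual code:

    using SQLite
    using DBInterface

    mutable struct Associations
        db::SQLite.DB
        # Re-using compiled statements avoids the 3-4 us cost of preparing the
        # SQL text on every query.
        cached_statements::Dict{String, SQLite.Stmt}
    end

    Associations(db::SQLite.DB) = Associations(db, Dict{String, SQLite.Stmt}())

    function execute_cached(assoc::Associations, sql::String, params = ())
        # Compile each unique SQL string once; later calls re-use the statement.
        stmt = get!(assoc.cached_statements, sql) do
            DBInterface.prepare(assoc.db, sql)
        end
        return DBInterface.execute(stmt, params)
    end

Because a prepared SQLite.Stmt is tied to the SQLite.DB it was compiled against, the cache has to live alongside the connection, as the comment in the diff notes.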
+    # This caching allows the code to skip some string interpolations.
+    # It is experimental for PowerSimulations, which calls has_metadata frequently.
+    # It may not be necessary. Savings are less than 1 us.
+    has_metadata_statements::Dict{HasMetadataQueryKey, SQLite.Stmt}
This cache applies only to the has_time_series path, which is the most important one to optimize for PSI, and is possibly over-engineered.
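A hypothetical illustration of what keying cached statements by query shape (rather than by interpolated SQL string) can look like; HasMetadataQueryKey's fields, the column names, and the table name below are assumptions, not the package's schema:

    using SQLite
    using DBInterface

    struct HasMetadataQueryKey
        by_type::Bool   # filter on time series type?
        by_name::Bool   # filter on name?
    end

    function get_has_metadata_statement!(
        statements::Dict{HasMetadataQueryKey, SQLite.Stmt},
        db::SQLite.DB,
        key::HasMetadataQueryKey,
    )
        # The SQL string is built and prepared once per query shape; afterwards
        # the hot path is a Dict lookup with no string interpolation.
        return get!(statements, key) do
            clauses = ["owner_uuid = ?"]
            key.by_type && push!(clauses, "time_series_type = ?")
            key.by_name && push!(clauses, "name = ?")
            sql = "SELECT COUNT(*) AS count FROM time_series_metadata WHERE " *
                  join(clauses, " AND ")
            DBInterface.prepare(db, sql)
        end
    end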
@daniel-thom now there is a workaround in PowerSimulations to handle this case ("The fact that we expect has_time_series(component, Deterministic) to return true when only DeterministicSingleTimeSeries is present causes extra latency."), so any impact of that can be addressed.
@daniel-thom I tested this branch and it should be merged. These are the results on the EI:

main:
Build of The EI: 407.636684 seconds (3.26 G allocations: 109.041 GiB, 43.34% gc time, 17.21% compilation time: 2% of which was recompilation)
Loop 10 times over all StaticInjections and call

This branch:
Build of The EI: 384.807498 seconds (3.25 G allocations: 108.283 GiB, 44.03% gc time, 17.02% compilation time: 2% of which was recompilation)
Loop 10 times over all StaticInjections and call
Latency fixes
The existing code had a few issues that caused slowness when calling has_time_series or get_time_series:

- Some required values were not set after to_json(sys) followed by System(file.json). The new code ensures that they are always set. This could have caused latency of up to 30-40 us on each query.
- SQLite.DBInterface.execute(db, "a sql query as string") has an overhead of ~3-4 us. The delay seems to come from compiling the SQL statement. I'll note that Python does not suffer from this latency. Interpolating strings causes additional latency. This new code caches the statements. It also avoids string interpolations, but only for the has_time_series code path. This is perhaps over-engineered. If this still isn't fast enough for PSI, we need to re-think the entire idea and allow the application to define what time series attributes should be cached.
- get_time_series results in a SQL query to retrieve the time series metadata, which is stored as a JSON string, and then deserializes it into an object. That takes ~30 us. In most cases, this is harmless: reading the data from disk takes a lot longer than that. Some users might want to use the in-memory time series storage and so might notice this delay. To solve that, I split the metadata-as-JSON data into a separate table and added caching of the real metadata objects. Deserialization will only occur on the first read request (a minimal sketch of this caching idea follows this list).
- There was a bug that caused has_time_series(component, Deterministic) to return false if the component has DeterministicSingleTimeSeries. This PR fixes the bug.
- The fact that we expect has_time_series(component, Deterministic) to return true when only DeterministicSingleTimeSeries is present causes extra latency. We could remove that latency by changing the requirement (make the user ask for the exact type stored) or add another column to the database table. We agreed to change this behavior in the next release.
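A minimal sketch of the lazy-deserialization caching described in the third bullet, using hypothetical names (MetadataCache, get_metadata!) and JSON3 as a stand-in for whatever serializer the package actually uses:

    using JSON3
    using UUIDs

    struct MetadataCache
        # Deserialized metadata objects keyed by their UUID.
        by_uuid::Dict{UUID, Any}
    end

    MetadataCache() = MetadataCache(Dict{UUID, Any}())

    function get_metadata!(cache::MetadataCache, uuid::UUID, json_text::AbstractString)
        # The ~30 us parse happens only on the first request for a given UUID;
        # subsequent reads return the cached object.
        return get!(cache.by_uuid, uuid) do
            JSON3.read(json_text)
        end
    end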
Bulk time series addition

This PR introduces a simplification to bulk time series addition. Previously, the user had to call bulk_add_time_series!() with an iterable of TimeSeriesAssociation instances. The code performed bulk database queries to check for duplicates and insert new rows. That was misguided. The code is now easier to write for users. They will do this instead:
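A hedged usage sketch of the simpler pattern described here, assuming a do-block update context named begin_time_series_update and the existing per-component add_time_series! signature; the PR's actual example may differ:

    # pairs_to_add is a hypothetical iterable of (component, time_series) pairs.
    begin_time_series_update(sys) do
        for (component, time_series) in pairs_to_add
            add_time_series!(sys, component, time_series)
        end
    end

The point of the change, as described above, is that users write ordinary add_time_series! calls instead of assembling TimeSeriesAssociation objects for a special bulk API.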
The new code is ~10% faster than the old code. The user can still call the existing function bulk_add_time_series!; it redirects to the new code. We can deprecate that function now and remove it in the next major release.

Supplemental attributes
Fixes #417 and #382