Add support for a time series storage file that can be shared across systems. #412

daniel-thom · 2024-11-06T00:05:08Z

The current system design creates a time series storage file (HDF5) that contains all time series arrays used by components in the system. Whenever components are removed from the system, associated time series arrays are also removed.

This has a downside: when people create derived systems, much of the time series data is copied. We can instead create a time series storage file that is shared across systems. We wouldn't want to track references across systems, so deleting time series arrays would likely not be supported.

This would be very beneficial in an HPC environment where all compute nodes have access to a shared filesystem. For non-HPC environments, we could expand the feature by providing a way to store the data in a database server, such as PostgreSQL. That may present performance challenges when reading the data. We would need to benchmark whether our existing data prefetching behavior is sufficient. It would also require some thought about how to store multi-dimensional data.

Also, the current code generates UUIDs as unique identifiers for time series arrays. We probably want to change the identifiers to be hashes of the data. That would allow us to avoid storing the same data multiple times. We would have to account for two arrays with different dimensions that produce the same hash, which could easily occur in dummy data used in tests.
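
One possible way to handle the dimension problem is to mix the array's shape into the hash along with its values. The following is only a minimal sketch in Python using the standard library, not the package's implementation: the function name `content_id`, the choice of SHA-256, and the flat row-major `values` plus `shape` representation are all assumptions made for illustration.

```python
import hashlib
import struct


def content_id(values, shape):
    """Return a hex digest identifying an array by its shape and contents.

    Hypothetical helper: `values` is a flat sequence of floats in row-major
    order and `shape` is a tuple of dimension lengths.
    """
    expected = 1
    for dim in shape:
        expected *= dim
    if expected != len(values):
        raise ValueError("shape does not match the number of values")

    hasher = hashlib.sha256()
    # Hash the number of dimensions and each dimension length first, so that
    # arrays with identical values but different shapes get different IDs.
    hasher.update(struct.pack("<Q", len(shape)))
    for dim in shape:
        hasher.update(struct.pack("<Q", dim))
    # Hash the values as little-endian 64-bit floats.
    hasher.update(struct.pack(f"<{len(values)}d", *values))
    return hasher.hexdigest()


# Dummy data like this is where shape-blind hashing would collide.
zeros_2x3 = content_id([0.0] * 6, (2, 3))
zeros_3x2 = content_id([0.0] * 6, (3, 2))
zeros_2x3_again = content_id([0.0] * 6, (2, 3))
```

With this scheme, `zeros_2x3` and `zeros_2x3_again` are equal, so the same array would be stored once, while `zeros_3x2` gets a different identifier even though its values are identical. A real implementation would likely also need to include the element type in the hash.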
