The current system design creates a time series storage file (HDF5) that contains all time series arrays used by components in the system. Whenever components are removed from the system, associated time series arrays are also removed.
This has a downside: when people create derived systems, much of the time series data is copied. We could instead create a time series storage file that is shared across systems. We wouldn't want to track references across systems, so deleting time series arrays would likely not be supported.
This would be very beneficial in an HPC environment where all compute nodes have access to a shared filesystem. For non-HPC environments, we could expand the feature by providing a way to store the data in a database server, such as PostgreSQL. That may present performance challenges when reading the data. We would need benchmarks to determine whether our existing data prefetching behavior is sufficient. It would also require some thought about how to store multi-dimensional data.
Also, the current code generates UUIDs as unique identifiers for time series arrays. We probably want to change the identifiers to be hashes of the data. That would allow us to prevent storing the same data multiple times. We would have to account for two arrays with different dimensions that create the same hash (which could easily occur in dummy data used in tests).
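As a rough illustration of the hashing idea, here is a minimal Python sketch, not the project's actual code. The names `content_id` and `DedupStore` are hypothetical, the dictionary stands in for the shared storage file, and values are assumed to be float64. Feeding the dimensions into the hash ahead of the data is one possible way to keep same-bytes arrays with different shapes from colliding.

```python
import hashlib
import struct


def content_id(values, shape):
    """Return a hex digest identifying an array by its shape and contents.

    Hypothetical helper: `values` is a flat sequence of floats and `shape`
    is a tuple of dimension sizes.
    """
    expected = 1
    for dim in shape:
        expected *= dim
    if expected != len(values):
        raise ValueError("shape does not match number of values")

    hasher = hashlib.sha256()
    # Hash the number of dimensions and each dimension first, so arrays with
    # identical bytes but different shapes get different identifiers.
    hasher.update(struct.pack("<Q", len(shape)))
    for dim in shape:
        hasher.update(struct.pack("<Q", dim))
    # Hash the values as little-endian float64 (an assumption of this sketch).
    hasher.update(struct.pack(f"<{len(values)}d", *values))
    return hasher.hexdigest()


class DedupStore:
    """In-memory stand-in for a shared storage file keyed by content hash."""

    def __init__(self):
        self._arrays = {}

    def add(self, values, shape):
        key = content_id(values, shape)
        # Store only when the key is new; repeated data is not written again.
        self._arrays.setdefault(key, (tuple(shape), list(values)))
        return key

    def __len__(self):
        return len(self._arrays)
```

With this sketch, adding six zeros as a 1-D array and again as a 2x3 array produces two distinct keys, while adding the 1-D array a second time returns the existing key and stores nothing new. The store has no delete method, which matches the idea above of not tracking references across systems.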