introduce data splits in dataset descriptor (#4012)

Summary:
Pull Request resolved: #4012

This should enable us to read units of data where the unit is a split instead of reading X number of rows.

Reviewed By: kuarora, asadoughi

Differential Revision: D65429573

fbshipit-source-id: 27d901fe83840c3b2bd3cca66fbad3721b12a9ec
facebookresearch · Nov 6, 2024 · 1f2b7ce
1 parent cfd4804
commit 1f2b7ce
Showing 1 changed file with 9 additions and 0 deletions.
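
The summary describes reading data one split at a time rather than a fixed number of rows. The commit marks the new fields as unused in open-source, so their exact semantics are not visible here. The following is only a minimal sketch under stated assumptions: each entry of `splits` is taken to be an opaque identifier for one unit of data, and `SplitDescriptorSketch`, `iter_split_units`, `read_split`, and `fake_storage` are hypothetical names that do not exist in `bench_fw`.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional


@dataclass
class SplitDescriptorSketch:
    # Hypothetical stand-in for DatasetDescriptor; it mirrors only the
    # three fields added by this commit.
    splits_distribution: Optional[List[List[bytes]]] = None
    splits: Optional[List[bytes]] = None
    serialized_df: Optional[str] = None


def iter_split_units(
    desc: SplitDescriptorSketch,
    read_split: Callable[[bytes], List[List[float]]],
) -> Iterator[List[List[float]]]:
    """Yield one batch of rows per split (assumed semantics, not confirmed)."""
    for split in desc.splits or []:
        # The caller supplies the loader; the split identifier stays opaque.
        yield read_split(split)


# Toy in-memory "storage" keyed by split identifier (illustrative data only).
fake_storage = {
    b"split-0": [[0.0, 1.0], [2.0, 3.0]],
    b"split-1": [[4.0, 5.0]],
}

desc = SplitDescriptorSketch(splits=[b"split-0", b"split-1"])
batches = list(iter_split_units(desc, fake_storage.__getitem__))
rows_per_split = [len(batch) for batch in batches]
```

In this sketch the batch size is whatever each split happens to contain, so the reader never needs to know a row count in advance. A descriptor with `splits=None` simply yields nothing.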
diff --git a/benchs/bench_fw/descriptors.py b/benchs/bench_fw/descriptors.py
@@ -85,6 +85,15 @@ class DatasetDescriptor:
 
     embedding_id_column: Optional[str] = None
 
+    # unused in open-source
+    splits_distribution: Optional[List[List[bytes]]] = None
+
+    # unused in open-source
+    splits: Optional[List[bytes]] = None
+
+    # unused in open-source
+    serialized_df: Optional[str] = None
+
     sampling_rate: Optional[float] = None
 
     # sampling column for xdb