diff --git a/draft-irtf-cfrg-vdaf.md b/draft-irtf-cfrg-vdaf.md index 245b7fa6..83980b88 100644 --- a/draft-irtf-cfrg-vdaf.md +++ b/draft-irtf-cfrg-vdaf.md @@ -355,11 +355,11 @@ In fact, our framework admits DAFs with slightly more functionality, computing aggregation functions of the form ~~~ -F(agg_param, meas_1, ..., meas_BATCH_SIZE) = - G(agg_param, meas_1) + ... + G(agg_param, meas_BATCH_SIZE) +F(agg_param, meas_1, ..., meas_M) = + G(agg_param, meas_1) + ... + G(agg_param, meas_M) ~~~ -where `meas_1, ..., meas_BATCH_SIZE` are the measurements, `G` is a possibly +where `meas_1, ..., meas_M` are the measurements, `G` is a possibly non-linear function, and `agg_param` is a parameter of that function chosen by the data collector. This paradigm, known as function secret sharing {{BGI15}}, allows for more sophisticated data analysis tasks, such as grouping metrics by @@ -859,7 +859,7 @@ Some common functionalities: | +-->| Aggregator 1 |--+ | | | +--------------+ | | +--------+-+ | ^ | +->+-----------+ -| Client |---+ | +--->| Collector |--> Aggregate +| Client |---+ | +--->| Collector |--> aggregate +--------+-+ +->+-----------+ | ... | | | @@ -928,90 +928,60 @@ additional discussion. A DAF scheme is used to compute a particular "aggregation function" over a set of measurements generated by Clients. Depending on the aggregation function, the -Collector might select an "aggregation parameter" and disseminates it to the +Collector might select an "aggregation parameter" and disseminate it to the Aggregators. The semantics of this parameter is specific to the aggregation function, but in general it is used to represent the set of "queries" that can -be made on the measurement set. For example, the aggregation parameter is used -to represent the candidate prefixes in Poplar1 {{poplar1}}. +be made by the Collector on the batch of measurements. For example, the +aggregation parameter is used to represent the candidate prefixes in the +Poplar1 VDAF {{poplar1}}. 
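As a toy illustration of this paradigm (an assumption for this example, not drawn from the document), `G` might indicate whether a measurement begins with a candidate prefix chosen by the Collector, so that `F` counts the matching measurements:

```python
# Hypothetical example: the aggregation parameter is a candidate prefix,
# and F(agg_param, ...) = G(agg_param, meas_1) + ... + G(agg_param, meas_M).

def G(agg_param: str, meas: str) -> int:
    # Non-linear per-measurement function: 1 if `meas` begins with the
    # candidate prefix, 0 otherwise.
    return 1 if meas.startswith(agg_param) else 0

def F(agg_param: str, measurements: list[str]) -> int:
    # The aggregate is the sum of G evaluated on each measurement.
    return sum(G(agg_param, meas) for meas in measurements)

measurements = ["0010", "0011", "1101"]
assert F("00", measurements) == 2  # two measurements start with "00"
assert F("11", measurements) == 1
```

A scheme like Poplar1 computes this same function, but with `G` evaluated on secret shares of each measurement rather than on the measurements themselves.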
Execution of a DAF has four distinct stages:

-* Sharding - Each Client generates input shares from its measurement and
-  distributes them among the Aggregators.
-* Preparation - Each Aggregator converts each input share into an output share
+* Sharding: Each Client generates input shares from its measurement and
+  distributes them among the Aggregators. In addition to the input shares, the
+  Client generates a "public share" during this step that is disseminated to
+  all of the Aggregators.
+* Preparation: Each Aggregator converts each input share into an output share
   compatible with the aggregation function. This computation involves the
   aggregation parameter. In general, each aggregation parameter may result in a
-  different an output share.
-* Aggregation - Each Aggregator combines a sequence of output shares into its
+  different output share.
+* Aggregation: Each Aggregator combines a sequence of output shares into its
   aggregate share and sends the aggregate share to the Collector.
-* Unsharding - The Collector combines the aggregate shares into the aggregate
+* Unsharding: The Collector combines the aggregate shares into the aggregate
   result.

-Sharding and Preparation are done once per measurement. Aggregation and
-Unsharding are done over a batch of measurements (more precisely, over the
+Sharding and preparation are done once per measurement. Aggregation and
+unsharding are done over a batch of measurements (more precisely, over the
 recovered output shares).

-A concrete DAF specifies an algorithm for the computation needed in each of
-these stages. The interface of each algorithm is defined in the remainder of
-this section. In addition, a concrete DAF defines the associated constants and
-types enumerated in the following table.
+A concrete DAF specifies the algorithm for the computation needed in each of
+these stages. The interface, denoted `Daf`, is defined in the remainder of this
+section.
In addition, a concrete DAF defines the associated constants and types +enumerated in the following table. | Parameter | Description | |:------------------|:---------------------------------------------------------------| | `ID: int` | Algorithm identifier for this DAF, in `[0, 2^32)`. | | `SHARES: int` | Number of input shares into which each measurement is sharded. | -| `NONCE_SIZE: int` | Size of the nonce passed by the application. | -| `RAND_SIZE: int` | Size of the random byte string passed to sharding algorithm. | +| `NONCE_SIZE: int` | Size of the nonce associated with the report. | +| `RAND_SIZE: int` | Size of the random byte string consumed by the sharding algorithm. | | `Measurement` | Type of each measurement. | | `PublicShare` | Type of each public share. | | `InputShare` | Type of each input share. | -| `AggParam` | Type of aggregation parameter. | +| `AggParam` | Type of the aggregation parameter. | | `OutShare` | Type of each output share. | | `AggShare` | Type of the aggregate share. | | `AggResult` | Type of the aggregate result. | {: #daf-param title="Constants and types defined by each concrete DAF."} -These types define the inputs and outputs of DAF methods at various stages of -the computation. Some of these values need to be written to the network in -order to carry out the computation. In particular, it is RECOMMENDED that -concrete instantiations of the `Daf` interface specify a method of encoding the -`PublicShare`, `InputShare`, and `AggShare`. - -Each DAF is identified by a unique, 32-bit integer `ID`. Identifiers for each -(V)DAF specified in this document are defined in {{codepoints}}. +The types in this table define the inputs and outputs of DAF methods at various +stages of the computation. Some of these values need to be written to the +network in order to carry forward the computation. 
In particular, it is
+RECOMMENDED that concrete instantiations of the `Daf` interface specify a
+standard encoding of the `PublicShare`, `InputShare`, `AggParam`, and
+`AggShare`.

 ## Sharding {#sec-daf-shard}

-In order to protect the privacy of its measurements, a DAF Client shards its
-measurements into a sequence of input shares. The `shard` method is used for
-this purpose.
-
-* `daf.shard(ctx: bytes, measurement: Measurement, nonce: bytes, rand: bytes)
-  -> tuple[PublicShare, list[InputShare]]` is the randomized sharding algorithm
-  run by each Client that consumes the application context, a measurement, and
-  a nonce and produces a "public share" distributed to each of the Aggregate
-  and a corresponding sequence of input shares, one for each Aggregator.
-
-  Pre-conditions:
-
-  * `nonce` MUST have length equal to `NONCE_SIZE` and MUST be generated using
-    a CSPRNG.
-
-  * `rand` consists of the random bytes consumed by the algorithm. It MUST have
-    length equal to `RAND_SIZE` and MUST be generated using a CSPRNG.
-
-  Post-conditions:
-
-  * The number of input shares MUST equal `SHARES`.
-
-Sharding is bound to a specific "application context". The application context
-is a string intended to uniquely identify an instance of the higher level
-protocol that uses the DAF. This is intended to ensure that aggregation succeeds
-only if the Clients and Aggregators agree on the application context.
-(Preparation binds the application context, too; see {{sec-daf-prepare}}.) Note
-that, unlike VDAFs ({{vdaf}}), there is no explicit signal of disagreement; it
-may only manifest as a garbled aggregate result.
-
 ~~~~
 Client
 ======
@@ -1033,53 +1003,82 @@ may only manifest as a garbled aggregate result.
     V     V                  V     V                V     V
   Aggregator 0            Aggregator 1          Aggregator SHARES-1
 ~~~~
-{: #shard-flow title="The Client divides its measurement into input shares and distributes them to the Aggregators.
The public share is broadcast to all Aggregators."}
+{: #shard-flow title="Illustration of the sharding algorithm."}
+
+The sharding algorithm run by each Client is defined as follows:
+
+* `daf.shard(ctx: bytes, measurement: Measurement, nonce: bytes, rand: bytes)
+  -> tuple[PublicShare, list[InputShare]]` consumes the "application context"
+  (defined below), a measurement, and a nonce and produces the public share,
+  distributed to each of the Aggregators, and the input shares, one for each
+  Aggregator.
+
+  Pre-conditions:
+
+  * `nonce` MUST have length equal to `daf.NONCE_SIZE` and MUST be generated
+    using a cryptographically secure random number generator (CSPRNG).
+
+  * `rand` consists of the random bytes consumed by the algorithm. It MUST have
+    length equal to `daf.RAND_SIZE` and MUST be generated using a CSPRNG.
+
+  Post-conditions:
+
+  * The number of input shares MUST equal `daf.SHARES`.
+
+Sharding is bound to a specific "application context". The application context
+is a string intended to uniquely identify an instance of the higher level
+protocol that uses the DAF. The goal of binding the application context to DAF
+execution is to ensure that aggregation succeeds only if the Clients and
+Aggregators agree on the application context. (Preparation binds the
+application context, too; see {{sec-daf-prepare}}.) Note that, unlike VDAFs
+({{vdaf}}), there is no explicit signal of disagreement; it may only manifest
+as a garbled aggregate result.
+
+The nonce is a public random value associated with the report. It is referred
+to as a nonce because normally it will also be used as a unique identifier for
+that report in the context of some application. The randomness requirement is
+especially important for VDAFs, where it may be used by the Aggregators to
+derive per-report randomness for verification of the computation. See
+{{nonce-requirements}} for details.
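As a rough sketch of this interface (the constants, the additive secret sharing over a toy modulus, and the empty public share are all assumptions for this example, not any DAF defined in this document), a Client might shard an integer measurement like so:

```python
import secrets

# Illustrative constants; a concrete DAF defines its own values.
SHARES = 2
NONCE_SIZE = 16
RAND_SIZE = 16
MOD = 2**64  # toy modulus; real schemes use a finite field

def shard(ctx: bytes, measurement: int, nonce: bytes,
          rand: bytes) -> tuple[bytes, list[int]]:
    # Enforce the pre-conditions stated above. This toy scheme otherwise
    # ignores `ctx` and `nonce`; a real DAF binds both into the shares.
    assert len(nonce) == NONCE_SIZE
    assert len(rand) == RAND_SIZE
    # Derive the first share from the random input; the second share is
    # whatever remains, so that the shares add up to the measurement.
    share_0 = int.from_bytes(rand, 'big') % MOD
    share_1 = (measurement - share_0) % MOD
    public_share = b''  # this toy scheme has no public share
    return public_share, [share_0, share_1]

nonce = secrets.token_bytes(NONCE_SIZE)  # MUST come from a CSPRNG
rand = secrets.token_bytes(RAND_SIZE)    # likewise
_, input_shares = shard(b'example-app', 23, nonce, rand)
assert len(input_shares) == SHARES
assert sum(input_shares) % MOD == 23  # shares recombine to the measurement
```

Each share on its own is uniformly distributed, so neither Aggregator learns anything about the measurement from its share alone.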
## Preparation {#sec-daf-prepare} -Once an Aggregator has received the public share and one of the input shares, -the next step is to prepare the input share for aggregation. This is -accomplished using the following algorithm: +Once an Aggregator has received the public share and its input share, the next +step is to prepare the input share for aggregation. This is accomplished using +the preparation algorithm: * `daf.prep(ctx: bytes, agg_id: int, agg_param: AggParam, nonce: bytes, - public_share: PublicShare, input_share: InputShare) -> OutShare` is the - deterministic preparation algorithm. It takes as input the public share and - one of the input shares generated by a Client, the application context, the - Aggregator's unique identifier, the aggregation parameter selected by the - Collector, and a nonce and returns an output share. + public_share: PublicShare, input_share: InputShare) -> OutShare` consumes the + public share and one of the input shares generated by the Client, the + application context, the Aggregator's unique identifier, the aggregation + parameter selected by the Collector, and the report nonce and returns an + output share. Pre-conditions: - * `agg_id` MUST be in `[0, SHARES)` and match the index of + * `agg_id` MUST be in the range `[0, daf.SHARES)` and match the index of `input_share` in the sequence of input shares produced by the Client. - * `nonce` MUST have length `NONCE_SIZE`. -## Validity of Aggregation Parameters {#sec-daf-validity-scopes} - -Concrete DAFs implementations MAY impose certain restrictions for input shares -and aggregation parameters. Protocols using a DAF MUST ensure that for each -input share and aggregation parameter `agg_param`, `daf.prep` is only called if -`daf.is_valid(agg_param, previous_agg_params)` returns True, where -`previous_agg_params` contains all aggregation parameters that have previously -been used with the same input share. + * `nonce` MUST have length `daf.NONCE_SIZE`. 
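For a linear scheme, such as a hypothetical DAF built from plain additive secret sharing (assumed here purely for illustration), preparation might amount to checking the pre-conditions and passing the input share through unchanged:

```python
# Illustrative constants for a hypothetical two-party, linear DAF.
SHARES = 2
NONCE_SIZE = 16

def prep(ctx: bytes, agg_id: int, agg_param: None, nonce: bytes,
         public_share: bytes, input_share: int) -> int:
    # Pre-conditions from the text above.
    assert 0 <= agg_id < SHARES
    assert len(nonce) == NONCE_SIZE
    # This toy scheme supports a single query, so `agg_param` is unused and
    # the output share is the input share itself.
    return input_share

out_share = prep(b'example-app', 0, None, bytes(NONCE_SIZE), b'', 1234)
assert out_share == 1234
```

In general the mapping is not this trivial: each choice of aggregation parameter may produce a different output share from the same input share.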
-DAFs MUST implement the following function: +The Aggregators MUST agree on the value of the aggregation parameter. +Otherwise, the aggregate result may be computed incorrectly by the Collector. -* `daf.is_valid(agg_param: AggParam, previous_agg_params: list[AggParam]) -> - bool`: checks if the `agg_param` is compatible with all elements of - `previous_agg_params`. +## Validity of Aggregation Parameters {#sec-daf-validity-scopes} -## Aggregation {#sec-daf-aggregate} +In general, it is permissible to aggregate a batch of reports multiple times. +However, to prevent privacy violations, DAFs may impose certain restrictions on +the aggregation parameters selected by the Collector. Restrictions are +expressed by the aggregation-parameter validity function: -Once an Aggregator holds an output share, it adds it into its aggregate share -for the batch (where batches are defined by the application). +* `daf.is_valid(agg_param: AggParam, previous_agg_params: list[AggParam]) -> + bool` returns `True` if `agg_param` is allowed given the sequence + `previous_agg_params` of previously accepted aggregation parameters. -* `daf.agg_init(agg_param: AggParam) -> AggShare` returns an empty aggregate - share. It is called to initialize aggregation of a batch of measurements. +Prior to accepting an aggregation parameter from the Collector and beginning +preparation, each Aggregator MUST validate it using this function. -* `daf.agg_update(agg_param: AggParam, agg_share: AggShare, out_share: - OutShare) -> AggShare` accumulates an output share into an aggregate share - and returns the updated aggregate share. +## Aggregation {#sec-daf-aggregate} ~~~~ Aggregator j @@ -1099,14 +1098,24 @@ for the batch (where batches are defined by the application). | V +------------+ - | agg_update |<--- out_share_MEAS_COUNT + | agg_update |<--- out_share_M +------------+ | V agg_share_j ~~~~ -{: #aggregate-flow title="Local aggregation of output shares by Aggregator j. 
-The number of measurements in the batch is denoted by MEAS_COUNT."} +{: #aggregate-flow title="Illustration of aggregation. The number of measurements in the batch is denoted by M."} + +Once an Aggregator holds an output share, it adds it into its aggregate share +for the batch. This streaming aggregation process is implemented by the +following pair of algorithms: + +* `daf.agg_init(agg_param: AggParam) -> AggShare` returns an empty aggregate + share. It is called to initialize aggregation of a batch of measurements. + +* `daf.agg_update(agg_param: AggParam, agg_share: AggShare, out_share: + OutShare) -> AggShare` accumulates an output share into an aggregate share + and returns the updated aggregate share. In many situations it is desirable to split an aggregate share across multiple storage elements, then merge the aggregate shares together just before @@ -1119,28 +1128,17 @@ with the following method: ### Aggregation Order {#agg-order} For most DAFs and VDAFs, the outcome of aggregation is not sensitive to the -order in which ouptut shares are aggregated. That is, aggregate shares can be -updated and merged with other aggregate shares in any order. For instance, for -both Prio3 ({{prio3}}) and Poplar1 ({{poplar1}}), the aggregate shares and -output shares both have the same type, a vector over some finite field -({{field}}); aggregation involves simply adding vectors toegther. +order in which output shares are aggregated. This means that aggregate shares +can be updated or merged with other aggregate shares in any order. For +instance, for both Prio3 ({{prio3}}) and Poplar1 ({{poplar1}}), the aggregate +shares and output shares both have the same type, a vector over some finite +field ({{field}}); and aggregation involves simply adding vectors together. -In theory, however, correct execution may require each Aggregator to aggregate -output shares in the same order. 
+In theory, however, there may be a DAF or VDAF for which correct execution +requires each Aggregator to aggregate output shares in the same order. ## Unsharding {#sec-daf-unshard} -After the Aggregators have aggregated a sufficient number of output shares, each -sends its aggregate share to the Collector, who runs the following algorithm to -recover the following output: - -* `daf.unshard(agg_param: AggParam, agg_shares: list[AggShare], - num_measurements: int) -> AggResult` is run by the Collector in order to - compute the aggregate result from the Aggregators' shares. The length of - `agg_shares` MUST be `SHARES`. `num_measurements` is the number of - measurements that contributed to each of the aggregate shares. This algorithm - is deterministic. - ~~~~ Aggregator 0 Aggregator 1 Aggregator SHARES-1 ============ ============ =================== @@ -1158,12 +1156,27 @@ recover the following output: Collector ========= ~~~~ -{: #unshard-flow title="Computation of the final aggregate result from aggregate -shares."} +{: #unshard-flow title="Illustration of unsharding."} + +After the Aggregators have aggregated all measurements in the batch, each sends +its aggregate share to the Collector, who runs the unsharding algorithm to +recover the aggregate result: + +* `daf.unshard(agg_param: AggParam, agg_shares: list[AggShare], + num_measurements: int) -> AggResult` consumes the aggregate shares + and produces the aggregate result. + + Pre-conditions: + + * The length of `agg_shares` MUST be `SHARES`. + + * `num_measurements` MUST equal the number of measurements in the batch. ## Execution of a DAF {#daf-execution} -Securely executing a DAF involves emulating the following procedure. +Secure execution of a DAF involves simulating the following procedure over an +insecure network. + ~~~ python def run_daf( @@ -1238,59 +1251,50 @@ provide in each. 
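To make the end-to-end dataflow concrete, the following self-contained sketch runs a toy DAF (assumed for illustration only: two Aggregators, additive shares modulo a toy modulus, no public share, no aggregation parameter) through all four stages in the manner of `run_daf`:

```python
import secrets

MOD = 2**64
SHARES = 2

def shard(measurement: int, rand: bytes) -> list[int]:
    # Split the measurement into two additive shares.
    share_0 = int.from_bytes(rand, 'big') % MOD
    return [share_0, (measurement - share_0) % MOD]

def prep(input_share: int) -> int:
    return input_share  # linear toy: output share == input share

def agg_init() -> int:
    return 0  # empty aggregate share

def agg_update(agg_share: int, out_share: int) -> int:
    return (agg_share + out_share) % MOD

def unshard(agg_shares: list[int]) -> int:
    return sum(agg_shares) % MOD

# Sharding and preparation happen once per measurement; aggregation and
# unsharding happen once per batch.
measurements = [1, 0, 1, 1]
agg_shares = [agg_init() for _ in range(SHARES)]
for meas in measurements:
    input_shares = shard(meas, secrets.token_bytes(16))
    for j in range(SHARES):
        agg_shares[j] = agg_update(agg_shares[j], prep(input_shares[j]))
assert unshard(agg_shares) == 3  # the sum of the measurements
```

Note that neither Aggregator's aggregate share reveals the batch sum on its own; only the Collector, holding both, can unshard.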
# Definition of VDAFs {#vdaf} -Like DAFs described in the previous section, a VDAF scheme is used to compute a -particular aggregation function over a set of Client-generated measurements. -Evaluation of a VDAF involves the same four stages as for DAFs: Sharding, -Preparation, Aggregation, and Unsharding. However, the Preparation stage will -require interaction among the Aggregators in order to facilitate verifiability -of the computation's correctness. Accommodating this interaction will require -syntactic changes. - -Overall execution of a VDAF comprises the following stages: - -* Sharding - Computing input shares from an individual measurement -* Preparation - Conversion and verification of input shares to output shares - compatible with the aggregation function being computed -* Aggregation - Combining a sequence of output shares into an aggregate share -* Unsharding - Combining a sequence of aggregate shares into an aggregate - result - -In contrast to DAFs, the Preparation stage for VDAFs now performs an additional -task: verification of the validity of the recovered output shares. This process -ensures that aggregating the output shares will not lead to a garbled aggregate -result. - -The remainder of this section defines the VDAF interface. The attributes are -listed in {{vdaf-param}} are defined by each concrete VDAF. - -| Parameter | Description | -|:------------------|:-------------------------| -| `ID` | Algorithm identifier for this VDAF. | -| `VERIFY_KEY_SIZE` | Size (in bytes) of the verification key ({{sec-vdaf-prepare}}). | -| `RAND_SIZE` | Size of the random byte string passed to sharding algorithm. | -| `NONCE_SIZE` | Size (in bytes) of the nonce. | -| `ROUNDS` | Number of rounds of communication during the Preparation stage ({{sec-vdaf-prepare}}). | -| `SHARES` | Number of input shares into which each measurement is sharded ({{sec-vdaf-shard}}). | -| `Measurement` | Type of each measurement. | -| `PublicShare` | Type of each public share. 
|
-| `InputShare` | Type of each input share. |
-| `AggParam` | Type of aggregation parameter. |
-| `OutShare` | Type of each output share. |
-| `AggShare` | Type of the aggregate share. |
-| `AggResult` | Type of the aggregate result. |
-| `PrepState` | Aggregator's state during preparation. |
-| `PrepShare` | Type of each prep share. |
-| `PrepMessage` | Type of each prep message. |
+VDAFs are identical to DAFs except that preparation is an interactive process
+carried out by the Aggregators. If successful, this process results in each
+Aggregator computing an output share. The process will fail if, for example,
+the underlying measurement is invalid.
+
+Failure manifests as an exception raised by one of the algorithms defined in
+this section. If an exception is raised during preparation, the Aggregators
+MUST remove the report from the batch and not attempt to aggregate it.
+Otherwise, a malicious Client can cause the Collector to compute a malformed
+aggregate result.
+
+The remainder of this section defines the VDAF interface, which we denote by
+`Vdaf`. The attributes listed in {{vdaf-param}} are defined by each concrete
+VDAF.
+
+| Parameter              | Description                                                     |
+|:-----------------------|:---------------------------------------------------------------|
+| `ID: int`              | Algorithm identifier for this VDAF, in `[0, 2^32)`.             |
+| `SHARES: int`          | Number of input shares into which each measurement is sharded.  |
+| `ROUNDS: int`          | Number of rounds of communication during preparation.           |
+| `NONCE_SIZE: int`      | Size of the report nonce.                                       |
+| `RAND_SIZE: int`       | Size of the random byte string consumed during sharding.        |
+| `VERIFY_KEY_SIZE: int` | Size of the verification key used during preparation.           |
+| `Measurement`          | Type of each measurement.                                       |
+| `PublicShare`          | Type of each public share.                                      |
+| `InputShare`           | Type of each input share.                                       |
+| `AggParam`             | Type of the aggregation parameter.                              |
+| `OutShare`             | Type of each output share.
|
+| `AggShare`             | Type of the aggregate share.                                    |
+| `AggResult`            | Type of the aggregate result.                                   |
+| `PrepState`            | Type of the prep state.                                         |
+| `PrepShare`            | Type of each prep share.                                        |
+| `PrepMessage`          | Type of each prep message.                                      |
 {: #vdaf-param title="Constants and types defined by each concrete VDAF."}

-Some of these values need to be written to the network in order to carry out
-the computation. In particular, it is RECOMMENDED that concrete instantiations
-of the `Vdaf` interface specify a method of encoding the `PublicShare`,
-`InputShare`, `AggShare`, `PrepShare`, and `PrepMessage`.
+Some of the types in the table above need to be written to the network in
+order to carry out the computation. It is RECOMMENDED that concrete
+instantiations of the `Vdaf` interface specify a method of encoding the
+`PublicShare`, `InputShare`, `AggParam`, `AggShare`, `PrepShare`, and
+`PrepMessage`.

-Each VDAF is identified by a unique, 32-bit integer `ID`. Identifiers for each
-(V)DAF specified in this document are defined in {{codepoints}}. The following
-method is defined for each VDAF specified in this document:
+Each VDAF is identified by a unique, 32-bit integer, denoted `ID`. Identifiers
+for each VDAF specified in this document are defined in {{codepoints}}. The
+following method is used by both Prio3 and Poplar1:

 ~~~ python
 def domain_separation_tag(self, usage: int, ctx: bytes) -> bytes:
@@ -1305,62 +1309,26 @@ def domain_separation_tag(self, usage: int, ctx: bytes) -> bytes:
     return format_dst(0, self.ID, usage) + ctx
 ~~~

-It is used to construct a domain separation tag for an instance of `Xof` used by
-the VDAF. (See {{xof}}.)
+The output, called the "domain separation tag", is used in our constructions
+for domain separation. Function `format_dst()` is defined in {{dst-binder}}.

 ## Sharding {#sec-vdaf-shard}

-Sharding transforms a measurement and nonce into a public share and input shares
-as it does in DAFs (cf.
{{sec-daf-shard}}): - -* `vdaf.shard(ctx: bytes, measurement: Measurement, nonce: bytes, rand: bytes) - -> tuple[PublicShare, list[InputShare]]` is the randomized sharding algorithm - run by each Client that consumes the application context, a measurement, and - a nonce and produces a public share distributed to each of the Aggregate and - a corresponding sequence of input shares, one for each Aggregator. Depending - on the VDAF, the input shares may encode additional information used to - verify the recovered output shares (e.g., the "proof shares" in Prio3 - {{prio3}}) +Sharding is as described for DAFs in {{sec-daf-shard}}. The public share and +input shares encode additional information used during preparation to validate +the output shares before they are aggregated (e.g., the "proof shares" in +{{prio3}}). - Pre-conditions: - - * `nonce` MUST have length equal to `NONCE_SIZE` and MUST be generated using - a CSPRNG. (See {{security}} for details.) - - * `rand` consists of the random bytes consumed by the algorithm. It MUST have - length equal to `RAND_SIZE` and MUST be generated using a CSPRNG. - - Post-conditions: - - * The number of input shares MUST equal `SHARES`. - -Like DAFs, sharding is bound to the application context via the `ctx` string. -Again, this is intended to ensure that aggregation succeeds only if the Clients -and Aggregators agree on the application context. Unlike DAFs, however, -disagreement on the context should manifest as a failure to validate the -report, causing the report to be rejected without garbling the aggregate +Like DAFs, sharding is bound to the application context via the application +context string. Again, this is intended to ensure that aggregation succeeds +only if the Clients and Aggregators agree on the application context. Unlike +DAFs, however, disagreement on the context should manifest as a preparation +failure, causing the report to be rejected without garbling the aggregate result. 
The application context also provides some defense-in-depth against cross protocol attacks; see {{deep}}. ## Preparation {#sec-vdaf-prepare} -To recover and verify output shares, the Aggregators interact with one another -over `ROUNDS` rounds. Prior to each round, each Aggregator constructs an -outbound message. Next, the sequence of outbound messages is combined into a -single message, called a "preparation message", or "prep message" for short. -(Each of the outbound messages are called "preparation-message shares", or -"prep shares" for short.) Finally, the preparation message is distributed to -the Aggregators to begin the next round. - -An Aggregator begins the first round with its input share and it begins each -subsequent round with the previous prep message. Its output in the last round -is its output share and its output in each of the preceding rounds is a prep -share. - -This process involves a value called the "aggregation parameter" used to map the -input shares to output shares. The Aggregators need to agree on this parameter -before they can begin preparing the measurement shares for aggregation. - ~~~~ Aggregator 0 Aggregator 1 Aggregator SHARES-1 ============ ============ =================== @@ -1388,28 +1356,46 @@ before they can begin preparing the measurement shares for aggregation. V V V out_share_0 out_share_1 out_share_[SHARES-1] ~~~~ -{: #prep-flow title="VDAF preparation process on the input shares for a single -measurement. At the end of the computation, each Aggregator holds an output -share or an error."} +{: #prep-flow title="Illustration of interactive VDAF preparation."} -To facilitate the preparation process, a concrete VDAF implements the following -methods: +Preparation is organized into a number of rounds. The number of rounds depends +on the VDAF: Prio3 ({{prio3}}) has one round and Poplar1 ({{poplar1}}) has two. + +Aggregators retain some local state between successive rounds of preparation. 
+This is referred to as "preparation state" or "prep state" for short.
+
+At the start of each round, each Aggregator broadcasts a message called a
+"preparation share", or "prep share" for short. The prep shares are then
+combined into a single message called the "preparation message", or "prep
+message". The prep message MAY be computed by any one of the Aggregators.
+
+The prep message is disseminated to each of the Aggregators to begin the next
+round or, in the last round, to compute the output shares. An Aggregator
+begins the first round with its input share and it begins each subsequent round
+with the current prep state and the previous prep message. Its output in the
+last round is its output share and its output in each of the preceding rounds
+is a prep share.
+
+Just as for DAFs ({{sec-daf-prepare}}), preparation involves an aggregation
+parameter. The aggregation parameter is consumed by each Aggregator before the
+first round of communication.
+
+Unlike DAFs, VDAF preparation involves a secret "verification key" held by each
+of the Aggregators. This key is used to verify the validity of the output
+shares they compute. It is up to the high level protocol in which the VDAF is
+used to arrange for the distribution of the verification key prior to
+generating and processing reports. See {{security}} for details.
+
+Preparation is implemented by the following set of algorithms:

 * `vdaf.prep_init(verify_key: bytes, ctx: bytes, agg_id: int, agg_param:
   AggParam, nonce: bytes, public_share: PublicShare, input_share: InputShare)
   -> tuple[PrepState, PrepShare]` is the deterministic preparation-state
-  initialization algorithm run by each Aggregator to begin processing its input
-  share into an output share.
Its inputs are the shared verification key - (`verify_key`), the application context (`ctx`), the Aggregator's unique - identifier (`agg_id`), the aggregation parameter (`agg_param`), the nonce - provided by the environment (`nonce`, see {{vdaf-execution}}), the public - share (`public_share`), and one of the input shares generated by the Client - (`input_share`). Its output is the Aggregator's initial preparation state and - initial prep share. - - It is up to the high level protocol in which the VDAF is used to arrange for - the distribution of the verification key prior to generating and processing - reports. (See {{security}} for details.) + initialization algorithm run by each Aggregator. It consumes the shared + verification key, the application context, the Aggregator's unique + identifier, the aggregation parameter chosen by the Collector, the report + nonce, the public share, and one of the input shares generated by the Client. + It produces the Aggregator's initial prep state and prep share. Protocols MUST ensure that public share consumed by each of the Aggregators is identical. This is security critical for VDAFs such as Poplar1. @@ -1422,89 +1408,56 @@ methods: Client. * `nonce` MUST have length `vdaf.NONCE_SIZE`. +* `vdaf.prep_shares_to_prep(ctx: bytes, agg_param: AggParam, prep_shares: + list[PrepShare]) -> PrepMessage` is the deterministic preparation-message + pre-processing algorithm. It combines the prep shares produced by the + Aggregators in the previous round into the prep message consumed by each + Aggregator to start the next round. + * `vdaf.prep_next(ctx: bytes, prep_state: PrepState, prep_msg: PrepMessage) -> tuple[PrepState, PrepShare] | OutShare` is the deterministic preparation-state update algorithm run by each Aggregator. It updates the - Aggregator's preparation state (`prep_state`) and returns either its next - preparation state and its message share for the current round or, if this is - the last round, its output share. 
An exception is raised if a valid output - share could not be recovered. The input of this algorithm is the inbound - preparation message. + Aggregator's prep state (`prep_state`) and returns either its next prep state + and prep share for the next round or, if this is the last round, its output + share. -* `vdaf.prep_shares_to_prep(ctx: bytes, agg_param: AggParam, prep_shares: - list[PrepShare]) -> PrepMessage` is the deterministic preparation-message - pre-processing algorithm. It combines the prep shares generated by the - Aggregators in the previous round into the prep message consumed by each in - the next round. - -In effect, each Aggregator moves through a linear state machine with `ROUNDS` -states. The Aggregator enters the first state on using the initialization -algorithm, and the update algorithm advances the Aggregator to the next state. -Thus, in addition to defining the number of rounds (`ROUNDS`), a VDAF instance -defines the state of the Aggregator after each round. - -The preparation-state update accomplishes two tasks: recovery of output shares -from the input shares and ensuring that the recovered output shares are valid. -The abstraction boundary is drawn so that an Aggregator only recovers an output -share if it is deemed valid (at least, based on the Aggregator's view of the -protocol). Another way to draw this boundary would be to have the Aggregators -recover output shares first, then verify that they are valid. However, this -would allow the possibility of misusing the API by, say, aggregating an invalid -output share. Moreover, in protocols like Prio+ {{AGJOP21}} based on oblivious -transfer, it is necessary for the Aggregators to interact in order to recover -aggregatable output shares at all. +An exception may be raised by one of these algorithms, in which case the report +MUST be deemed invalid and not processed any further. 
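The round structure can be sketched as follows, using a deliberately insecure one-round toy VDAF (the helper names and the validity check here are assumptions for illustration; revealing shares in the prep shares provides no privacy, and a real VDAF never does this):

```python
MOD = 2**64
SHARES = 2
ROUNDS = 1

def prep_init(verify_key, ctx, agg_id, agg_param, nonce,
              public_share, input_share):
    # Toy: the prep state is the share itself, and the prep share simply
    # reveals it. A real VDAF broadcasts only verification material.
    return input_share, input_share

def prep_shares_to_prep(ctx, agg_param, prep_shares):
    # Combine the prep shares into the prep message; here, this recovers
    # the measurement in the clear.
    return sum(prep_shares) % MOD

def prep_next(ctx, prep_state, prep_msg):
    # Last round: check validity, then emit the output share.
    if prep_msg not in (0, 1):
        raise ValueError('measurement is out of range')
    return prep_state  # output share

def prepare(input_shares):
    # Round trip for one report: init, combine, advance.
    initial = [prep_init(b'', b'', j, None, b'', b'', input_shares[j])
               for j in range(SHARES)]
    states = [st for (st, _) in initial]
    prep_shares = [sh for (_, sh) in initial]
    prep_msg = prep_shares_to_prep(b'', None, prep_shares)
    return [prep_next(b'', st, prep_msg) for st in states]

# A valid report (additive shares of 1) prepares successfully...
out_shares = prepare([5, (1 - 5) % MOD])
assert sum(out_shares) % MOD == 1
# ...while an invalid one raises, and the report MUST be dropped.
try:
    prepare([3, 4])  # shares of 7, which is out of range
    assert False, 'expected rejection'
except ValueError:
    pass
```

Prio3 follows this one-round shape with a cryptographic validity proof in place of the cleartext check; Poplar1 uses two such rounds.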
+
+Implementation note: The preparation process accomplishes two tasks: recovery
+of output shares from the input shares and ensuring that the recovered output
+shares are valid. The abstraction boundary is drawn so that an Aggregator only
+recovers an output share if the underlying data is deemed valid (at least,
+based on the Aggregator's view of the protocol). Another way to draw this
+boundary would be to have the Aggregators recover output shares first, then
+verify that they are valid. However, this would allow the possibility of
+misusing the API by, say, aggregating an invalid output share. Moreover, in
+protocols like Prio+ {{AGJOP21}} based on oblivious transfer, it is necessary
+for the Aggregators to interact in order to recover aggregatable output shares
+at all.

## Validity of Aggregation Parameters {#sec-vdaf-validity-scopes}

-Similar to DAFs (see {{sec-daf-validity-scopes}}), VDAFs MAY impose
-restrictions for input shares and aggregation parameters. Protocols using a VDAF
-MUST ensure that for each input share and aggregation parameter `agg_param`, the
-preparation phase (including `vdaf.prep_init`, `vdaf.prep_next`, and
-`vdaf.prep_shares_to_prep`; see {{sec-vdaf-prepare}}) is only called if
-`vdaf.is_valid(agg_param, previous_agg_params)` returns True, where
-`previous_agg_params` contains all aggregation parameters that have previously
-been used with the same input share.
-
-VDAFs MUST implement the following function:
-
-* `vdaf.is_valid(agg_param: AggParam, previous_agg_params: list[AggParam]) ->
-  bool`: checks if the `agg_param` is compatible with all elements of
-  `previous_agg_params`.
+Aggregation parameter validation is as described for DAFs in
+{{sec-daf-validity-scopes}}. Again, each Aggregator MUST validate each
+aggregation parameter received from the Collector before beginning preparation
+with that parameter.
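As an illustration, a Poplar1-style validity rule might look like the following sketch. It assumes, purely for illustration, that an aggregation parameter is a `(level, prefixes)` pair and that no level may be queried more than once per report; the actual rule is defined individually by each VDAF.

```python
def is_valid(agg_param, previous_agg_params):
    # Sketch of a Poplar1-style check over assumed (level, prefixes)
    # pairs: reject an aggregation parameter whose level has already
    # been queried for this report, since re-querying a level may leak
    # additional information about the underlying measurements.
    level, _prefixes = agg_param
    return all(level != prev_level
               for prev_level, _ in previous_agg_params)
```

An Aggregator would evaluate this check against the parameters it has already used for the same report, refusing to begin preparation when it returns `False`.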
## Aggregation {#sec-vdaf-aggregate} -VDAF aggregation is identical to DAF aggregation ({{sec-daf-aggregate}}): - -* `vdaf.agg_init(agg_param: AggParam) -> AggShare` returns an empty aggregate - share. It is called to initialize aggregation of a batch of measurements. - -* `vdaf.agg_update(agg_param: AggParam, agg_share: AggShare, out_share: - OutShare) -> AggShare` accumulates an output share into an aggregate share - and returns the updated aggregate share. - -* `vdaf.merge(agg_param: AggParam, agg_shares: list[AggShare]) -> AggShare` - merges a sequence of aggregate shares into a single aggregate share. - -The data flow for this stage is illustrated in {{aggregate-flow}}. Like DAFs, -computation of the VDAF aggregate is not usually sensitive to the order in -which output shares are aggregated. See {{agg-order}}. +Aggregation is identical to DAF aggregation as described in +{{sec-daf-aggregate}}. Like DAFs, computation of the VDAF aggregate is not +usually sensitive to the order in which output shares are aggregated. See +{{agg-order}}. ## Unsharding {#sec-vdaf-unshard} -VDAF Unsharding is identical to DAF Unsharding (cf. {{sec-daf-unshard}}): - -* `vdaf.unshard(agg_param: AggParam, agg_shares: list[AggShare], - num_measurements: int) -> AggResult` is run by the Collector in order to - compute the aggregate result from the Aggregators' shares. The length of - `agg_shares` MUST be `SHARES`. `num_measurements` is the number of - measurements that contributed to each of the aggregate shares. This algorithm - is deterministic. - -The data flow for this stage is illustrated in {{unshard-flow}}. +Unsharding is identical to DAF unsharding as described in {{sec-daf-unshard}}. ## Execution of a VDAF {#vdaf-execution} -Secure execution of a VDAF involves simulating the following procedure. +Secure execution of a VDAF involves simulating the following procedure over an +insecure network. 
~~~ python def run_vdaf( @@ -1599,11 +1552,9 @@ def run_vdaf( ~~~ The inputs to this algorithm are the aggregation parameter, a list of -measurements, and a nonce for each measurement. This document does not specify -how the nonces are chosen, but security requires that the nonces be unique. See -{{security}} for details. As explained in {{daf-execution}}, the secure -execution of a VDAF requires the application to instantiate secure channels -between each of the protocol participants. +measurements, and a nonce for each measurement. As explained in +{{daf-execution}}, the secure execution of a VDAF requires the application to +instantiate secure channels between each of the protocol participants. ## Communication Patterns for Preparation {#vdaf-prep-comm}