
Proposal features: 16 -> 32 bit range check on LogUp #702

Open
1 of 4 tasks
hero78119 opened this issue Dec 5, 2024 · 3 comments
Labels
enhancement New feature or request

Comments


hero78119 commented Dec 5, 2024

Background

Ceno relies heavily on LogUp for range checks, e.g. the 16-bit range check.
[figure: lookup statistics]
See circuit statistics in #585 (comment).

Most lookup operations come from the 16-bit range check. Take the add opcode as an example; its 9 lookups break down as:

  • rs1, rs2, rd offline memory checks assert timestamp < global timestamp => 3 * 2 = 6 16-bit range checks
  • rd split into 2 16-bit range checks
  • program fetch: 1 lookup

So overall there are 6 + 2 + 1 = 9 lookups.

If we instead do a 32-bit range check:

  • rs1, rs2, rd offline memory check timestamp checks => 3 lookups
  • rd range: 1 lookup
  • program fetch: 1 lookup

Overall that is 5 lookups, which crosses the 2^3 = 8 power-of-two boundary. The leaf layer of the tower sumcheck is cut in half, so the expected latency should roughly halve as well.
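As a back-of-the-envelope check of the counts above (my own sketch, not Ceno code; `next_pow2` models padding the tower leaf layer to a power of two):

```python
# Sketch: per-opcode lookup counts for ADD, and the padded tower-sumcheck
# leaf size, assuming leaves are padded to the next power of two.

def next_pow2(n: int) -> int:
    return 1 << (n - 1).bit_length()

# 16-bit range checks: 3 registers * 2 ts limbs + 2 rd limbs + 1 fetch
lookups_16 = 3 * 2 + 2 + 1   # = 9, pads to 16
# 32-bit range checks: 3 ts checks + 1 rd check + 1 fetch
lookups_32 = 3 + 1 + 1       # = 5, pads to 8

print(lookups_16, next_pow2(lookups_16))  # 9 16
print(lookups_32, next_pow2(lookups_32))  # 5 8
```

The padded leaf size dropping from 16 to 8 is the claimed halving of the tower-sumcheck leaf layer.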

As a side effect, we also save a bunch of witness columns (witin) that exist only to hold 16-bit limbs, which also benefits the mPCS since there are fewer polynomials to commit.

Design Rationales

On the right-hand side of the LogUp formula we have m(x) and T(x). One nice property of the 32-bit range-check table T(x) is that we can skip its commitment and PCS, since the verifier can evaluate T(r) succinctly via the tricks here. The remaining challenge is how to deal with the huge, sparse polynomial m(x).

Via the Spartan paper, p. 29, §7.2.2 (the SPARK sparse polynomial commitment), we can view the sparse m(x) as a list of tuples [(i, j, M(i,j))] and commit to 3 dense polynomials, even though the dense size of the original m(x) is 2^32. The insight of SPARK is that by splitting the index into i and j polynomials, each committed polynomial's size matches the number of non-zero entries of m(x). The most innovative part is the memory-in-the-head offline memory check, which reduces the dense size of audit_ts_(row/col) from 2^32 to 2^16.

Originally my question was: why split into row + col instead of just row — is it only to deal with R1CS matrices? After some thought I found it is not; the key point of splitting into row and col is that the size of |audit_ts| can be reduced exponentially, from 2^32 to 2^16. The math works because the identity polynomial eq is splittable: each half has exponentially reduced size and can be evaluated separately.
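The eq-splitting can be checked on a toy example (my own illustration over a toy prime field, not the Ceno implementation): eq over the concatenated (row, col) point factors into eq_row * eq_col, so a sparse m can be evaluated from its non-zero entries alone.

```python
# SPARK-style sparse evaluation: m(r) = sum over nonzero entries of
# eq(r_row, bits(i)) * eq(r_col, bits(j)) * M, matching the dense sum.
P = 2**61 - 1  # toy prime field

def eq(r, bits):
    # eq(r, b) = prod_k (r_k*b_k + (1-r_k)*(1-b_k)) mod P
    acc = 1
    for rk, bk in zip(r, bits):
        acc = acc * (rk * bk + (1 - rk) * (1 - bk)) % P
    return acc

def to_bits(x, n):
    return [(x >> k) & 1 for k in range(n)]

n_row, n_col = 2, 2
# sparse m: (row, col, value); dense size would be 2^(2+2) = 16
entries = [(0, 3, 7), (2, 1, 5), (3, 3, 11)]
r_row, r_col = [3, 8], [5, 9]  # toy challenge point

# sparse evaluation: only the non-zero entries contribute
v_sparse = sum(eq(r_row, to_bits(i, n_row)) * eq(r_col, to_bits(j, n_col)) * M
               for i, j, M in entries) % P

# reference: dense evaluation over the full hypercube
dense = [[0] * (1 << n_col) for _ in range(1 << n_row)]
for i, j, M in entries:
    dense[i][j] = M
r = r_row + r_col
v_dense = sum(eq(r, to_bits(i, n_row) + to_bits(j, n_col)) * dense[i][j]
              for i in range(1 << n_row) for j in range(1 << n_col)) % P

print(v_sparse == v_dense)  # True
```

The factorization works because eq is a product over coordinates, so it splits cleanly at the row/col boundary.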

The prover needs to commit to i, j, M, along with read_ts_row, write_ts_row, audit_ts_row, read_ts_col, write_ts_col, audit_ts_col as required by the SPARK protocol.

With SPARK, the end-to-end table proving flow will look like this:

  • prover commits to the original polynomials for the table proof, plus a few extra polynomials for SPARK: i, j, M, read_ts_row, write_ts_row, audit_ts_row, read_ts_col, write_ts_col, audit_ts_col
  • derive challenges
  • generate the tower sumcheck proof; at the end, for the LogUp m(x), we have a random point $r$ and evaluation v = m($r$)
  • generate the mPCS proof for the original polynomials
  • generate spark_proof(r, v=m($r$), [i, j, M, read_ts_row, write_ts_row, audit_ts_row, read_ts_col, write_ts_col, audit_ts_col])
    • tower_product_sumcheck: for the row and col offline memory checks, following SPARK
    • one sumcheck: v = $\sum_k e_{row}(k) * e_{col}(k) * M(k)$
    • check that the offline memory check formulas for row and col are satisfied
    • derive a new random point $r'$ and 9 evaluations i($r'$), j($r'$), M($r'$), read_ts_row($r'$), write_ts_row($r'$), ...
  • generate the mPCS proof for the SPARK polynomials
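The row/col offline memory check above follows the memory-in-the-head technique. As a toy illustration (my own simplification, not the actual SPARK circuit), the core invariant is a multiset identity over (address, value, timestamp) tuples: Init plus the write set equals the read set plus the final audit state.

```python
# Offline memory check sketch: each access at global timestamp t reads
# (addr, val, old_ts) and writes back (addr, val, t). Honest execution
# satisfies Init + WS == RS + Audit as multisets.
from collections import Counter

def memory_check(values, accesses):
    ts = {a: 0 for a in range(len(values))}
    RS, WS = Counter(), Counter()
    for t, a in enumerate(accesses, start=1):
        RS[(a, values[a], ts[a])] += 1   # read carries the old timestamp
        WS[(a, values[a], t)] += 1       # write carries the global timestamp
        ts[a] = t
    init  = Counter((a, values[a], 0) for a in range(len(values)))
    audit = Counter((a, values[a], ts[a]) for a in range(len(values)))
    return init + WS == RS + audit

print(memory_check([10, 20, 30, 40], [2, 0, 2, 1, 2]))  # True
```

In SPARK this check is run twice, once on the row address stream and once on the col stream, which is what keeps each audit_ts vector at 2^16 size.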

What is the overhead?

In the table proof, a new SPARK proof flow is added sequentially to the critical path, so the overall latency of the table proof increases. However, every opcode proof benefits substantially from 32-bit range checks, and since opcode proofs dominate the overall cost, the added table-proof overhead is probably negligible.

In more detail, the proving-time overhead of SPARK is dominated by |read_ts_row|, |write_ts_row|, |read_ts_col|, |write_ts_col|, which scale with the number of non-zero entries of m(x). In real-world workloads, the more repeated values are range-checked, the fewer non-zero entries m(x) has, so the cost drops considerably. The worst case occurs when all looked-up values are distinct.

Sub-task breakdown

Other side effects

This feature relies on the base field being able to hold 32-bit RV32 values, so we need to stick with Goldilocks64.
The future roadmap will therefore be Goldilocks64 -> binary field, without a Mersenne31/BabyBear transition.

@hero78119 hero78119 changed the title Proposal features: 16 -> 32 bit range check via LogUp Proposal features: 16 -> 32 bit range check on LogUp Dec 5, 2024
@hero78119 hero78119 added the enhancement New feature or request label Dec 5, 2024

hero78119 commented Dec 5, 2024

Related to #285 and #686.

hero78119 commented

An update on one blocking issue that needs to be addressed before carrying on with the tasks.

Sparse Fraction-Sum

Given the RHS of LogUp: $\sum_{i} \frac{m_i}{X + t_i}$ for i in [0..T].

We have already figured out a PCS to verify the pairs ($\vec{r}$, t($\vec{r}$)) and ($\vec{r}$, m($\vec{r}$)), given the sparsity of m and the structure of t.

However, one challenge remains: how to conduct the fraction sum $\sum_{i} \frac{m_i}{X + t_i}$ for i in [0..T] succinctly. We can't use a dense product argument, since the prover cost would be dominated by the table size T. Thus we need to develop a sparse fraction-sum protocol.
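The sparsity argument can be seen concretely (my own toy sketch over a small prime field, with a table of size 2^8 standing in for 2^32): only the non-zero entries of m contribute to the sum, which is exactly what a sparse protocol must exploit to avoid the O(T) dense cost.

```python
# LogUp RHS: sum_i m_i / (X + t_i) over a toy prime field. The sparse
# sum over non-zero multiplicities equals the dense sum over the table.
P = 2**31 - 1  # Mersenne prime, toy field

def inv(a):
    # modular inverse via Fermat's little theorem
    return pow(a, P - 2, P)

T = 1 << 8                    # toy table size standing in for 2^32
t = list(range(T))            # range-check table: t_i = i
m = {5: 3, 17: 1, 200: 2}     # sparse multiplicities of looked-up values
X = 123456789                 # verifier challenge

dense  = sum(m.get(i, 0) * inv((X + t[i]) % P) for i in range(T)) % P
# since t_i = i here, the dict keys double as the table values
sparse = sum(mi * inv((X + i) % P) for i, mi in m.items()) % P

print(dense == sparse)  # True
```

The dense loop touches all T entries; the sparse loop touches only len(m) of them.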

There are two directions

direction 1: 2-stage fraction sum

A quick attempt is to convert to a 2-stage fraction sum $\sum_{\vec{b_1}, \vec{b_2}} \frac{m(\vec{b_1}, \vec{b_2})}{X + t(\vec{b_1}, \vec{b_2})}$, motivated by Spartan. The idea is to split the variables into 2 stages; in the first stage, when doing the low-degree extension, we only go through the non-zero entries of $m$. The difficulty is the following: define $\frac{p(\vec{b_1})}{q(\vec{b_1})} = \sum_{\vec{b_2}} \frac{m(\vec{b_1}, \vec{b_2})}{X + t(\vec{b_1}, \vec{b_2})}$ to agree on the boolean hypercube $\vec{b_1}$, and run the first-stage fraction sum as $\sum_{\vec{b_1}} \frac{p(\vec{b_1})}{q(\vec{b_1})}$. At the end of the product argument we get $v_p = p(\vec{r_1})$ and $v_q = q(\vec{r_1})$; however $\frac{p(\vec{r_1})}{q(\vec{r_1})} \ne \sum_{\vec{b_2}} \frac{m(\vec{r_1}, \vec{b_2})}{X + t(\vec{r_1}, \vec{b_2})}$, because the two sides are rational functions rather than multilinear polynomials, so agreement on the hypercube does not extend to a random point.
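The mismatch shows up already on a tiny example (my own construction over the rationals, with hypothetical toy values for m and t and one variable per stage):

```python
# Why the naive 2-stage split fails: p and q match the inner fraction
# sum on the boolean values of b1, but their multilinear extensions at
# a random r1 do NOT match the inner sum evaluated at the extensions
# of m and t.
from fractions import Fraction as F

X = F(97)
m = [[1, 2], [3, 0]]    # toy m(b1, b2), indexed [b1][b2]
t = [[5, 7], [11, 13]]  # toy t(b1, b2)

def ext(f, r1, b2):  # multilinear extension in b1 only
    return (1 - r1) * f[0][b2] + r1 * f[1][b2]

def pq(b1):
    # stage-1 values on the hypercube: p/q = sum_{b2} m/(X+t)
    q = (X + t[b1][0]) * (X + t[b1][1])
    p = m[b1][0] * (X + t[b1][1]) + m[b1][1] * (X + t[b1][0])
    return p, q

r1 = F(3)
p_r = (1 - r1) * pq(0)[0] + r1 * pq(1)[0]   # MLE of p at r1
q_r = (1 - r1) * pq(0)[1] + r1 * pq(1)[1]   # MLE of q at r1

inner = sum(ext(m, r1, b2) / (X + ext(t, r1, b2)) for b2 in (0, 1))

print(p_r / q_r == inner)  # False: the two sides disagree off the cube
```

At b1 in {0, 1} both sides agree by construction; the disagreement at r1 = 3 is the whole obstruction to this direction.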

direction 2: develop a sparse fraction-sum argument

This direction aims to solve the fraction-sum leaf-layer problem directly. In each fraction-sum layer we run the sumcheck $\sum_{\vec{b}} p_1(\vec{b}) * q_2(\vec{b}) + p_2(\vec{b}) * q_1(\vec{b}) + \alpha * q_1(\vec{b}) * q_2(\vec{b})$. In our case $p_{1/2}$ is sparse, while $q_{1/2}$ is dense, large, and structured. We need a method that lets the prover exploit the structure of q ($q_{next} = q_1(\vec{b}) * q_2(\vec{b})$) such that both

  • fix_variable
  • uni-variate evaluation

can be computed succinctly.

@hero78119 hero78119 mentioned this issue Dec 9, 2024
4 tasks

hero78119 commented Dec 17, 2024

> An update on one blocking issue that needs to be addressed before carrying on with the tasks: Sparse Fraction-Sum ... (quoted from the previous comment)

Lev hinted at a nice solution. The idea is to transform the table $t$ into another vector $t_{regularized}$ and feed $t_{regularized}$ to the fraction-sum input layer instead. Denote a selector polynomial s such that s satisfies the following:

for all x in the boolean hypercube [0, 2^n - 1]:

  • s(x) = 1, if p(x) != 0
  • s(x) = 0, if p(x) = 0

and derive $t_{regularized}$ via an extra "sparse" sumcheck:
$t_{regularized}(r) = \sum_x eq(r, x) \times (s(x) \times t(x) + (1-s(x)))$

What makes $t_{regularized}$ special is that it is 1 in most places, and 1 has nice properties during the uni-variate/fix_variable steps of each sumcheck round.

For example, in fix_variable, folding 2 consecutive positions that are both 1 gives (1-r) * 1 + r * 1 = 1: the result is still 1.

Furthermore, in the first fraction-sum layer, the sparsity of the denominator is preserved: $q_1(x) * q_2(x)$ is still 1 in most places.

So the prover just needs to maintain $t$ in sparse format during the sumcheck rounds, and likewise $s$, recording only the positions whose evaluation is != 1.
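A minimal sketch of this sparse bookkeeping (my own illustration, using `Fraction` in place of a real field): fix_variable on a mostly-1 vector only has to visit pairs that contain a non-1 value, since a pair of 1s folds back to 1.

```python
# fix_variable on a mostly-1 vector, dense vs sparse. The sparse form
# stores only the non-1 positions; everything else is implicitly 1.
from fractions import Fraction as F

def fix_variable_dense(v, r):
    return [(1 - r) * v[2*i] + r * v[2*i + 1] for i in range(len(v) // 2)]

def fix_variable_sparse(non_ones, n, r):
    # non_ones: {index: value}; all other positions are 1, and
    # (1-r)*1 + r*1 = 1, so only pairs touching a non-1 entry change
    out = {}
    for i in range(n // 2):
        lo = non_ones.get(2*i, F(1))
        hi = non_ones.get(2*i + 1, F(1))
        if lo != 1 or hi != 1:
            out[i] = (1 - r) * lo + r * hi
    return out

n = 16
non_ones = {3: F(42), 10: F(7)}  # t_regularized: non-1 only where s(x)=1
dense = [non_ones.get(i, F(1)) for i in range(n)]
r = F(5)

d = fix_variable_dense(dense, r)
s = fix_variable_sparse(non_ones, n, r)
print(sorted(s))  # [1, 5]: only the pairs containing 3 and 10 changed
```

The sparse pass does work proportional to the non-1 entries, mirroring how the prover keeps $t$ and $s$ in sparse format across rounds.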

Besides, we don't need to commit to the sparse polynomial s(x), because s(x) shares the same sparse index tuples as $m(x)$: [(i, j, 1)], with a constant-1 value vector. So s(x) can share the same opening in the SPARK PCS.

We also don't need to commit to $t_{regularized}$, as its correctness will be reduced to $t$ in the GKR context.

reference of sparse sumcheck

github-merge-queue bot pushed a commit that referenced this issue Jan 2, 2025
Previously, to validate the idea quickly and avoid massive changes, new
functionality was added with the `_v2` suffix. After experiments showed good
results, all logic switched to the v2 version long ago, and v1 is no longer
used. This PR cleans up all leftover v1 code, doing renaming and file
replacement without modifying existing logic.

In summary
- `sumcheck/src/prover_v2.rs` ->  `sumcheck/src/prover.rs`
- `multilinear_extensions/src/virtual_poly_v2.rs` ->
`multilinear_extensions/src/virtual_poly.rs`
- clean up all `V2` suffix

This addresses the previously outdated PR #162, and serves as preparation for #788 and #702.