
Proposal features: 16 -> 32 bit range check on LogUp #702

Open
1 of 4 tasks
hero78119 opened this issue Dec 5, 2024 · 3 comments
Labels
enhancement New feature or request

Comments


hero78119 commented Dec 5, 2024

Background

Ceno relies heavily on LogUp for range checks, e.g. the 16-bit range check.
[figure: lookup statistics]
See circuit statistics in #585 (comment).

Most lookup operations come from the 16-bit range check. Take the add opcode as an example; its 9 lookups break down as:

  • rs1, rs2, rd offline memory checks assert timestamp < global timestamp => 3 * 2 = 6 16-bit range checks
  • rd split into 2 16-bit range checks
  • program fetch: 1 lookup

So overall there are 6 + 2 + 1 = 9 lookups.

If we instead do a 32-bit range check:

  • rs1, rs2, rd offline memory check timestamp checks => 3 lookups
  • rd range: 1 lookup
  • program fetch: 1 lookup

Overall that is 5 lookups, which crosses the 2^3 = 8 power-of-two boundary. The leaf layer of the tower sumcheck is cut in half, so the expected latency should roughly halve as well.
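As a back-of-the-envelope check of the counts above (my own sketch, not Ceno code; `next_pow2` models padding the tower leaf layer to a power of two):

```python
# Sketch: per-opcode lookup counts for ADD, and the padded tower-sumcheck
# leaf size, assuming leaves are padded to the next power of two.

def next_pow2(n: int) -> int:
    return 1 << (n - 1).bit_length()

# 16-bit range checks: 3 registers * 2 ts limbs + 2 rd limbs + 1 fetch
lookups_16 = 3 * 2 + 2 + 1   # = 9, pads to 16
# 32-bit range checks: 3 ts checks + 1 rd check + 1 fetch
lookups_32 = 3 + 1 + 1       # = 5, pads to 8

print(lookups_16, next_pow2(lookups_16))  # 9 16
print(lookups_32, next_pow2(lookups_32))  # 5 8
```

The padded leaf size dropping from 16 to 8 is the claimed halving of the tower-sumcheck leaf layer.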

As a side effect, we also save a bunch of witness columns (witin) that exist only to hold 16-bit limbs, which also benefits the mPCS since there are fewer polynomials to commit.

Design Rationales

On the right-hand side of the LogUp formula we have m(x) and T(x). One nice property of the 32-bit range-check table T(x) is that we can skip its commitment and PCS, since the verifier can evaluate T(r) succinctly via the tricks here. The remaining challenge is how to deal with the huge, sparse polynomial m(x).

Via the Spartan paper, p. 29, §7.2.2 (the SPARK sparse polynomial commitment), we can view the sparse m(x) as a list of tuples [(i, j, M(i,j))] and commit to 3 dense polynomials, even though the dense size of the original m(x) is 2^32. The insight of SPARK is that by splitting the index into i and j polynomials, each committed polynomial's size matches the number of non-zero entries of m(x). The most innovative part is the memory-in-the-head offline memory check, which reduces the dense size of audit_ts_(row/col) from 2^32 to 2^16.

Originally my question was: why split into row + col instead of just row — is it only to deal with R1CS matrices? After some thought I found it is not; the key point of splitting into row and col is that the size of |audit_ts| can be reduced exponentially, from 2^32 to 2^16. The math works because the identity polynomial eq is splittable: each half has exponentially reduced size and can be evaluated separately.
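The eq-splitting can be checked on a toy example (my own illustration over a toy prime field, not the Ceno implementation): eq over the concatenated (row, col) point factors into eq_row * eq_col, so a sparse m can be evaluated from its non-zero entries alone.

```python
# SPARK-style sparse evaluation: m(r) = sum over nonzero entries of
# eq(r_row, bits(i)) * eq(r_col, bits(j)) * M, matching the dense sum.
P = 2**61 - 1  # toy prime field

def eq(r, bits):
    # eq(r, b) = prod_k (r_k*b_k + (1-r_k)*(1-b_k)) mod P
    acc = 1
    for rk, bk in zip(r, bits):
        acc = acc * (rk * bk + (1 - rk) * (1 - bk)) % P
    return acc

def to_bits(x, n):
    return [(x >> k) & 1 for k in range(n)]

n_row, n_col = 2, 2
# sparse m: (row, col, value); dense size would be 2^(2+2) = 16
entries = [(0, 3, 7), (2, 1, 5), (3, 3, 11)]
r_row, r_col = [3, 8], [5, 9]  # toy challenge point

# sparse evaluation: only the non-zero entries contribute
v_sparse = sum(eq(r_row, to_bits(i, n_row)) * eq(r_col, to_bits(j, n_col)) * M
               for i, j, M in entries) % P

# reference: dense evaluation over the full hypercube
dense = [[0] * (1 << n_col) for _ in range(1 << n_row)]
for i, j, M in entries:
    dense[i][j] = M
r = r_row + r_col
v_dense = sum(eq(r, to_bits(i, n_row) + to_bits(j, n_col)) * dense[i][j]
              for i in range(1 << n_row) for j in range(1 << n_col)) % P

print(v_sparse == v_dense)  # True
```

The factorization works because eq is a product over coordinates, so it splits cleanly at the row/col boundary.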

The prover needs to commit to i, j, M, along with read_ts_row, write_ts_row, audit_ts_row, read_ts_col, write_ts_col, audit_ts_col as required by the SPARK protocol.

With SPARK, the end-to-end table proving flow will look like this:

  • prover commits to the original polynomials for the table proof, plus a few extra polynomials for SPARK: i, j, M, read_ts_row, write_ts_row, audit_ts_row, read_ts_col, write_ts_col, audit_ts_col
  • derive challenges
  • generate the tower sumcheck proof; at the end, for the LogUp m(x), we have a random point $r$ and evaluation v = m($r$)
  • generate the mPCS proof for the original polynomials
  • generate spark_proof(r, v=m($r$), [i, j, M, read_ts_row, write_ts_row, audit_ts_row, read_ts_col, write_ts_col, audit_ts_col])
    • tower_product_sumcheck: for the row and col offline memory checks, following SPARK
    • one sumcheck: v = $\sum_k e_{row}(k) * e_{col}(k) * M(k)$
    • check that the offline memory check formulas for row and col are satisfied
    • derive a new random point $r'$ and 9 evaluations i($r'$), j($r'$), M($r'$), read_ts_row($r'$), write_ts_row($r'$), ...
  • generate the mPCS proof for the SPARK polynomials
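The row/col offline memory check above follows the memory-in-the-head technique. As a toy illustration (my own simplification, not the actual SPARK circuit), the core invariant is a multiset identity over (address, value, timestamp) tuples: Init plus the write set equals the read set plus the final audit state.

```python
# Offline memory check sketch: each access at global timestamp t reads
# (addr, val, old_ts) and writes back (addr, val, t). Honest execution
# satisfies Init + WS == RS + Audit as multisets.
from collections import Counter

def memory_check(values, accesses):
    ts = {a: 0 for a in range(len(values))}
    RS, WS = Counter(), Counter()
    for t, a in enumerate(accesses, start=1):
        RS[(a, values[a], ts[a])] += 1   # read carries the old timestamp
        WS[(a, values[a], t)] += 1       # write carries the global timestamp
        ts[a] = t
    init  = Counter((a, values[a], 0) for a in range(len(values)))
    audit = Counter((a, values[a], ts[a]) for a in range(len(values)))
    return init + WS == RS + audit

print(memory_check([10, 20, 30, 40], [2, 0, 2, 1, 2]))  # True
```

In SPARK this check is run twice, once on the row address stream and once on the col stream, which is what keeps each audit_ts vector at 2^16 size.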

What is the overhead?

In the table proof, a new SPARK proof flow is added sequentially to the critical path, so the overall latency of the table proof increases. However, every opcode proof benefits substantially from 32-bit range checks, and since opcode proofs dominate the overall cost, the added table-proof overhead is probably negligible.

In more detail, the proving-time overhead of SPARK is dominated by |read_ts_row|, |write_ts_row|, |read_ts_col|, |write_ts_col|, which scale with the number of non-zero entries of m(x). In real-world workloads, the more repeated values are range-checked, the fewer non-zero entries m(x) has, so the cost drops considerably. The worst case occurs when all looked-up values are distinct.

Sub-task breakdown

Other side effects

This feature relies on the base field being able to hold 32-bit RV32 values, so we need to stick with Goldilocks64.
The future roadmap will therefore be Goldilocks64 -> binary field, without a Mersenne31/BabyBear transition.

@hero78119 hero78119 changed the title Proposal features: 16 -> 32 bit range check via LogUp Proposal features: 16 -> 32 bit range check on LogUp Dec 5, 2024
@hero78119 hero78119 added the enhancement New feature or request label Dec 5, 2024

hero78119 commented Dec 5, 2024

Related to #285 and #686.

hero78119 commented

An update on one blocking issue that needs to be addressed before carrying on with the tasks.

Sparse Fraction-Sum

Given the RHS of LogUp: $\sum_{i} \frac{m_i}{X + t_i}$ for i in [0..T].

We have already figured out a PCS to verify the pairs ($\vec{r}$, t($\vec{r}$)) and ($\vec{r}$, m($\vec{r}$)), given the sparsity of m and the structure of t.

However, one challenge remains: how to conduct the fraction sum $\sum_{i} \frac{m_i}{X + t_i}$ for i in [0..T] succinctly. We can't use a dense product argument, since the prover cost would be dominated by the table size T. Thus we need to develop a sparse fraction-sum protocol.
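The sparsity argument can be seen concretely (my own toy sketch over a small prime field, with a table of size 2^8 standing in for 2^32): only the non-zero entries of m contribute to the sum, which is exactly what a sparse protocol must exploit to avoid the O(T) dense cost.

```python
# LogUp RHS: sum_i m_i / (X + t_i) over a toy prime field. The sparse
# sum over non-zero multiplicities equals the dense sum over the table.
P = 2**31 - 1  # Mersenne prime, toy field

def inv(a):
    # modular inverse via Fermat's little theorem
    return pow(a, P - 2, P)

T = 1 << 8                    # toy table size standing in for 2^32
t = list(range(T))            # range-check table: t_i = i
m = {5: 3, 17: 1, 200: 2}     # sparse multiplicities of looked-up values
X = 123456789                 # verifier challenge

dense  = sum(m.get(i, 0) * inv((X + t[i]) % P) for i in range(T)) % P
# since t_i = i here, the dict keys double as the table values
sparse = sum(mi * inv((X + i) % P) for i, mi in m.items()) % P

print(dense == sparse)  # True
```

The dense loop touches all T entries; the sparse loop touches only len(m) of them.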

There are two directions

direction 1: 2-stage fraction sum

A quick attempt is to convert to a 2-stage fraction sum $\sum_{\vec{b_1}, \vec{b_2}} \frac{m(\vec{b_1}, \vec{b_2})}{X + t(\vec{b_1}, \vec{b_2})}$, motivated by Spartan. The idea is to split the variables into 2 stages; in the first stage, when doing the low-degree extension, we only go through the non-zero entries of $m$. The difficulty is the following: define $\frac{p(\vec{b_1})}{q(\vec{b_1})} = \sum_{\vec{b_2}} \frac{m(\vec{b_1}, \vec{b_2})}{X + t(\vec{b_1}, \vec{b_2})}$ to agree on the boolean hypercube $\vec{b_1}$, and run the first-stage fraction sum as $\sum_{\vec{b_1}} \frac{p(\vec{b_1})}{q(\vec{b_1})}$. At the end of the product argument we get $v_p = p(\vec{r_1})$ and $v_q = q(\vec{r_1})$; however $\frac{p(\vec{r_1})}{q(\vec{r_1})} \ne \sum_{\vec{b_2}} \frac{m(\vec{r_1}, \vec{b_2})}{X + t(\vec{r_1}, \vec{b_2})}$, because the two sides are rational functions rather than multilinear polynomials, so agreement on the hypercube does not extend to a random point.
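The mismatch shows up already on a tiny example (my own construction over the rationals, with hypothetical toy values for m and t and one variable per stage):

```python
# Why the naive 2-stage split fails: p and q match the inner fraction
# sum on the boolean values of b1, but their multilinear extensions at
# a random r1 do NOT match the inner sum evaluated at the extensions
# of m and t.
from fractions import Fraction as F

X = F(97)
m = [[1, 2], [3, 0]]    # toy m(b1, b2), indexed [b1][b2]
t = [[5, 7], [11, 13]]  # toy t(b1, b2)

def ext(f, r1, b2):  # multilinear extension in b1 only
    return (1 - r1) * f[0][b2] + r1 * f[1][b2]

def pq(b1):
    # stage-1 values on the hypercube: p/q = sum_{b2} m/(X+t)
    q = (X + t[b1][0]) * (X + t[b1][1])
    p = m[b1][0] * (X + t[b1][1]) + m[b1][1] * (X + t[b1][0])
    return p, q

r1 = F(3)
p_r = (1 - r1) * pq(0)[0] + r1 * pq(1)[0]   # MLE of p at r1
q_r = (1 - r1) * pq(0)[1] + r1 * pq(1)[1]   # MLE of q at r1

inner = sum(ext(m, r1, b2) / (X + ext(t, r1, b2)) for b2 in (0, 1))

print(p_r / q_r == inner)  # False: the two sides disagree off the cube
```

At b1 in {0, 1} both sides agree by construction; the disagreement at r1 = 3 is the whole obstruction to this direction.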

direction 2: develop a sparse fraction-sum argument

This direction aims to solve the fraction-sum leaf-layer problem directly. In each fraction-sum layer we run the sumcheck $\sum_{\vec{b}} p_1(\vec{b}) * q_2(\vec{b}) + p_2(\vec{b}) * q_1(\vec{b}) + \alpha * q_1(\vec{b}) * q_2(\vec{b})$. In our case $p_{1/2}$ is sparse, while $q_{1/2}$ is dense, large, and structured. We need a method that lets the prover exploit the structure of q ($q_{next} = q_1(\vec{b}) * q_2(\vec{b})$) such that both

  • fix_variable
  • uni-variate evaluation

can be computed succinctly.

@hero78119 hero78119 mentioned this issue Dec 9, 2024
4 tasks

hero78119 commented Dec 17, 2024

> An update on one blocking issue that needs to be addressed before carrying on with the tasks: Sparse Fraction-Sum ... (quoted from the previous comment)

Lev hinted at a nice solution. The idea is to transform the table $t$ into another vector $t_{regularized}$ and feed $t_{regularized}$ to the fraction-sum input layer instead. Denote a selector polynomial s such that s satisfies the following:

for all x in the boolean hypercube [0, 2^n - 1]:

  • s(x) = 1, if p(x) != 0
  • s(x) = 0, if p(x) = 0

and derive $t_{regularized}$ via an extra "sparse" sumcheck:
$t_{regularized}(r) = \sum_x eq(r, x) \times (s(x) \times t(x) + (1-s(x)))$

What makes $t_{regularized}$ special is that it is 1 in most places, and 1 has nice properties during the uni-variate/fix_variable steps of each sumcheck round.

For example, in fix_variable, folding 2 consecutive positions that are both 1 gives (1-r) * 1 + r * 1 = 1: the result is still 1.

Furthermore, in the first fraction-sum layer, the sparsity of the denominator is preserved: $q_1(x) * q_2(x)$ is still 1 in most places.

So the prover just needs to maintain $t$ in sparse format during the sumcheck rounds, and likewise $s$, recording only the positions whose evaluation is != 1.
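A minimal sketch of this sparse bookkeeping (my own illustration, using `Fraction` in place of a real field): fix_variable on a mostly-1 vector only has to visit pairs that contain a non-1 value, since a pair of 1s folds back to 1.

```python
# fix_variable on a mostly-1 vector, dense vs sparse. The sparse form
# stores only the non-1 positions; everything else is implicitly 1.
from fractions import Fraction as F

def fix_variable_dense(v, r):
    return [(1 - r) * v[2*i] + r * v[2*i + 1] for i in range(len(v) // 2)]

def fix_variable_sparse(non_ones, n, r):
    # non_ones: {index: value}; all other positions are 1, and
    # (1-r)*1 + r*1 = 1, so only pairs touching a non-1 entry change
    out = {}
    for i in range(n // 2):
        lo = non_ones.get(2*i, F(1))
        hi = non_ones.get(2*i + 1, F(1))
        if lo != 1 or hi != 1:
            out[i] = (1 - r) * lo + r * hi
    return out

n = 16
non_ones = {3: F(42), 10: F(7)}  # t_regularized: non-1 only where s(x)=1
dense = [non_ones.get(i, F(1)) for i in range(n)]
r = F(5)

d = fix_variable_dense(dense, r)
s = fix_variable_sparse(non_ones, n, r)
print(sorted(s))  # [1, 5]: only the pairs containing 3 and 10 changed
```

The sparse pass does work proportional to the non-1 entries, mirroring how the prover keeps $t$ and $s$ in sparse format across rounds.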

Besides, we don't need to commit to the sparse polynomial s(x), because s(x) shares the same sparse index tuples as $m(x)$: [(i, j, 1)], with a constant-1 value vector. So s(x) can share the same opening in the SPARK PCS.

We also don't need to commit to $t_{regularized}$, as its correctness will be reduced to $t$ in the GKR context.

reference of sparse sumcheck

github-merge-queue bot pushed a commit that referenced this issue Jan 2, 2025
Previously, to validate the idea quickly and avoid massive changes, new
functionality was added with the `_v2` suffix. After experiments showed good
results, all logic switched to the v2 version long ago, and v1 is no longer
used. This PR cleans up all leftover v1 code, doing renaming and file
replacement without modifying existing logic.

In summary
- `sumcheck/src/prover_v2.rs` ->  `sumcheck/src/prover.rs`
- `multilinear_extensions/src/virtual_poly_v2.rs` ->
`multilinear_extensions/src/virtual_poly.rs`
- clean up all `V2` suffix

This addresses the previously outdated PR #162, and serves as preparation for #788 and #702.