# Order and prove

In this document, we distinguish between the sequencing and proving processes, and emphasize why these two stages of the overall transaction flow need to be kept separate.

Decoupling sequencing from proving enhances the system's efficiency.

Central to the zkEVM architecture are the verifier smart contract, the prover, the aggregator, and the sequencer.

## Typical state transition

Recall that a proof of correct execution of the transactions in a batch is generated, and this proof needs to be verified by the verifier smart contract.

A new L2 state is reached, and it is dubbed the _consolidated state_.

The processed batch is referred to as a _consolidated batch_. That is, it has been verified by the L1 smart contract.

Submitting the proof $\pi$, together with the corresponding publics, for verification of correct execution is an L1 transaction.

Once the smart contract successfully verifies this proof, the L2 state is proved to have evolved correctly from the old state $S^{L2_x}_i$ to the new state $S^{L2_x}_{i+1}$ according to the processed batch.
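
To make the flow concrete, here is a minimal Python sketch of the consolidation step. All names are hypothetical, and the verifier is a stub; the real check is performed by the verifier smart contract on L1, not by off-chain code.

```python
# Minimal sketch of consolidation (hypothetical names; the actual check
# is performed by the L1 verifier smart contract).

def verify(proof: bytes, old_root: str, new_root: str) -> bool:
    """Stub standing in for the verifier smart contract's proof check."""
    return True  # the real verifier validates `proof` against the publics

def consolidate(old_root: str, new_root: str, proof: bytes) -> str:
    # The L1 transaction carries the proof pi together with its publics.
    if not verify(proof, old_root, new_root):
        raise ValueError("proof rejected: batch not consolidated")
    # On success, the new state root becomes the consolidated state.
    return new_root
```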

The figure below depicts a state transition.



## zkEVM key performance indicators

The zkEVM's key performance indicators (KPIs) are _Delay_ and _Throughput_.

_Delay_ refers to the time elapsed from when a user sends an L2 transaction until the transaction's execution results are reflected in the L2 state.

This is a major KPI for a positive user experience (UX).

_Throughput_ measures the system's capacity for processing transactions.

It can be measured in transactions per second, gas per second, or batches per second.
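
These units are related through the average batch size and gas usage. The sketch below converts between them; the batch size and per-transaction gas are hypothetical placeholders, and only the 120-second proving time is taken from the figures quoted later in this document.

```python
# Converting between throughput units. txs_per_batch and gas_per_tx are
# hypothetical placeholders, not zkEVM constants.

batches_per_second = 1 / 120      # one batch proved every 120 seconds
txs_per_batch = 500               # hypothetical average batch size
gas_per_tx = 50_000               # hypothetical average gas per transaction

txs_per_second = batches_per_second * txs_per_batch
gas_per_second = txs_per_second * gas_per_tx

print(f"{txs_per_second:.2f} tx/s, {gas_per_second:,.0f} gas/s")
# ~4.17 tx/s, ~208,333 gas/s
```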

Let's analyze and possibly re-engineer the current system to improve these KPIs.

There are three parameters affecting these KPIs: $\mathtt{close\_a\_batch\_time}$, $\mathtt{prove\_a\_batch\_time}$, and $\mathtt{block\_time}$.

1. $\mathtt{close\_a\_batch\_time}$: The time taken to gather enough transactions to close a batch, or the timeout set for this purpose, whichever comes first.

  

2. $\mathtt{prove\_a\_batch\_time}$: The time taken to generate a proof for a single batch. The size of the batch naturally affects this time.

  

3. $\mathtt{block\_time}$: The L1 block time, which is the minimum time it takes for an L1 transaction to be executed.

  

Let's explore how these parameters impact the _Delay_ and _Throughput_ of the full system.

### Processing pipeline

Consider an example of a simplified processing pipeline, similar to an assembly line in a factory, as illustrated in the figure below.

We identify two key performance indicators (KPIs) of interest: lead time (or delay) and production rate (or throughput).

- Lead time refers to how long it takes to produce the first car after starting the line.

  The lead time, as indicated in the figure below, is 6 hours.

- Production rate tells us how many cars can be produced per unit of time.

  The production rate is 1 car every 6 hours, which is equivalent to $\mathtt{\frac{1}{6}}$ cars per hour.



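A small Python sketch captures these two KPIs. We assume, as the figure's numbers imply, that one car moves through the entire line at a time (the stages are not pipelined), and the 1-hour final stage is inferred from the 6-hour total.

```python
# Lead time and production rate for the assembly-line example, assuming
# one car occupies the whole line at a time (as 1/6 cars/h implies).

def lead_time(stage_hours: list[float]) -> float:
    """Delay: hours from starting the line to the first finished car."""
    return sum(stage_hours)

def production_rate(stage_hours: list[float]) -> float:
    """Throughput: cars produced per hour."""
    return 1 / lead_time(stage_hours)

stages = [3, 2, 1]  # engine mounting, body mounting, inferred final stage
print(lead_time(stages))        # 6 hours
print(production_rate(stages))  # 0.1666... = 1/6 cars per hour
```
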
#### Two scaling methods

The following question arises: How can we improve both the delay and throughput metrics?

The objective is to increase the throughput and reduce the delay.

Two methods are employed: horizontal scaling and vertical scaling.

- Horizontal scaling involves adding more operators of smaller power in parallel, boosting throughput.

  In this case, the delay remains the same. (In our system, operators are equivalent to CPUs.)

- Vertical scaling, on the other hand, entails adding more powerful operators. This is equivalent to adding faster CPUs.

  In this case, the delay is reduced and the throughput is increased.

As expected, vertical scaling is more expensive to implement than horizontal scaling.

Consider the figure below, depicting two ways of scaling the previous _processing pipeline_ scenario.

(a) Serial horizontal scaling. Replacing the 3-hour operator (at the engine-mounting stage) with three 1-hour operators working serially, and replacing the 2-hour operator (at the body-mounting stage) with two 1-hour operators.

(b) Vertical scaling. Replacing the 3-hour operator (at the engine-mounting stage) with one _3-times_ faster operator, capable of performing the normal 3-hour engine-mounting task in only one hour, and replacing the 2-hour operator (at the body-mounting stage) with one _2-times_ faster operator, capable of performing the usual 2-hour body-mounting task in just an hour.



Initially, before any scaling was applied, the delay was 6 hours and the throughput was $\mathtt{\frac{1}{6}}$ cars per hour.

The serial horizontal scaling, (a), does not result in any improvement in either the delay or the throughput.

But the vertical scaling, (b), results in a delay of 3 hours, while the throughput increases to _1 car every 3 hours_, which is equivalent to $\frac{1}{3}$ _cars per hour_.
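
The same unpipelined model from the earlier sketch confirms these numbers; the stage lists below are read off the figure.

```python
# KPIs for the two scaling scenarios, using the same unpipelined model
# as before (delay = sum of stage times, throughput = 1 / delay).

scenarios = {
    "original":              [3, 2, 1],
    "serial horizontal (a)": [1, 1, 1, 1, 1, 1],  # 3h -> 3x1h, 2h -> 2x1h
    "vertical (b)":          [1, 1, 1],           # faster operators
}

for name, stages in scenarios.items():
    delay = sum(stages)  # hours
    print(f"{name}: delay {delay} h, throughput {1 / delay:.3f} cars/h")

# original:              delay 6 h, throughput 0.167 cars/h
# serial horizontal (a): delay 6 h, throughput 0.167 cars/h (no gain)
# vertical (b):          delay 3 h, throughput 0.333 cars/h
```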

## Improving batch processing KPIs

In the current design, batch processing consists of: closing a batch, generating the proof, and submitting the _verify-this-batch_ L1 transaction to the verifier smart contract.

The two zkEVM KPIs can be computed as follows:

- Delay is computed with this formula,

$$
\texttt{delay} = \mathtt{close\_a\_batch\_time} + \mathtt{prove\_a\_batch\_time} + \mathtt{block\_time}\ [\text{seconds}]
$$

- And throughput is given by this formula,

$$
\texttt{throughput} = \dfrac{1}{\mathtt{prove\_a\_batch\_time}}\ [\text{batches per second}]
$$

When computing throughput, we assume that closing, proving, and verifying a batch can be done in parallel with other batches.

Thus, in practice, throughput is determined by the longest part of the process, which is proving the batch:

$$
\begin{aligned}
&\mathtt{close\_a\_batch\_time} \ll \mathtt{prove\_a\_batch\_time} \\
&\mathtt{block\_time} \ll \mathtt{prove\_a\_batch\_time}
\end{aligned}
$$

To provide specific numbers:

$$
\begin{aligned}
&\mathtt{block\_time} = 15\ \text{seconds}\ (\text{average})\\
&\mathtt{prove\_a\_batch\_time} = 120\ \text{seconds}\ (\text{min})\\
&\mathtt{close\_a\_batch\_time} = 3\ \text{seconds}\ (\text{max})\\
\end{aligned}
$$
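
Plugging these numbers into the two formulas gives the baseline KPIs of the current design:

```python
# Baseline KPIs of the current design, using the numbers quoted above.

close_a_batch_time = 3     # seconds (max)
prove_a_batch_time = 120   # seconds (min)
block_time = 15            # seconds (average)

delay = close_a_batch_time + prove_a_batch_time + block_time
throughput = 1 / prove_a_batch_time  # batches per second

print(delay)       # 138 seconds
print(throughput)  # ~0.0083 batches/s, i.e. one batch every two minutes
```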

### Improving KPIs with vertical scaling

The aim is to increase throughput and reduce delay.

The main limiting factor in this case is the $\mathtt{prove\_a\_batch\_time}$.

Vertical scaling means adding more resources to the existing machines.

It can be achieved by running provers on more powerful machines, optimizing the proving system, or a combination of both.

Although vertical scaling seems like a straightforward solution to speed up proof generation, it has limitations:

- Cost-effectiveness: Upgrading to very powerful machines often results in diminishing returns. The cost increase might not be proportional to the performance gain, especially for high-end hardware.
- Optimization challenges: Optimizing the prover system itself can be complex and time-consuming.

### Improving KPIs with horizontal scaling

Another option is to scale the system horizontally.

Horizontal scaling involves adding more processing units (workers) to distribute the workload across multiple machines and leverage additional hardware resources in parallel.

In the context of a batch processing system, this translates to spinning up multiple provers to work in parallel.

#### Naive horizontal scaling

Consider the figure below, depicting a naive implementation of horizontal scaling, which involves:

1. Parallelized proof generation, by spinning up multiple provers.
2. Proof reception, where each prover individually sends the proof it generated to the aggregator.
3. Proof verification, where the aggregator puts all these proofs into an L1 transaction and sends it to the smart contract for verification of the batches.



This approach means closing batches serially, while generating their proofs in parallel.

Notice that, as depicted in the figure above, the proofs $\pi_a$, $\pi_b$ and $\pi_c$ are serially input to the L1 smart contract for verification.

This means the overall verification cost is proportional to the number of proofs sent to the aggregator.

The disadvantage of the naive approach is its cost, in terms of the space occupied by each proof and the verification expenses that accumulate with every additional proof.

#### Proof aggregation in horizontal scaling

Another option is to scale the system horizontally with proof aggregation, as shown in the figure below.

Here’s how it works:

1. Parallelized proof generation, by instantiating multiple provers.
2. Proof reception, where each prover individually sends the proof it generated to the aggregator.
3. Proof aggregation, where the proofs are aggregated into a single proof.
4. Proof verification, which here means encapsulating only one proof, the aggregated proof, in an L1 transaction and transmitting it to the smart contract for batch verification.

The foundation of this approach rests on zkEVM's custom cryptographic backend, designed specifically to support proof aggregation.

It allows multiple proofs to be combined into a single verifiable proof.

As depicted in the figure below, the proofs $\pi_a$, $\pi_b$ and $\pi_c$ are aggregated into a single proof $\pi_{a,b,c,...}$.

The key advantage is constant verification costs on L1, regardless of the number of proofs being aggregated.



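A rough cost model contrasts the two approaches. The per-proof gas figure below is a hypothetical placeholder, not a measured zkEVM cost; only the linear-versus-constant shape is the point.

```python
# L1 verification cost: naive vs. aggregated horizontal scaling.
# VERIFY_ONE_PROOF_GAS is a hypothetical placeholder, not a measured cost.

VERIFY_ONE_PROOF_GAS = 300_000

def naive_l1_cost(num_proofs: int) -> int:
    # Naive: every proof is verified on L1, so cost grows linearly.
    return num_proofs * VERIFY_ONE_PROOF_GAS

def aggregated_l1_cost(num_proofs: int) -> int:
    # Aggregation: only the single aggregated proof is verified on L1;
    # the aggregation work itself happens off-chain.
    return VERIFY_ONE_PROOF_GAS

for n in (1, 10, 100):
    print(n, naive_l1_cost(n), aggregated_l1_cost(n))
# 1    300000      300000
# 10   3000000     300000
# 100  30000000    300000
```
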
#### Deep dive into horizontal scaling

Let’s delve deeper into how the use of proof aggregation boosts the system’s throughput.

A crucial metric in this process is $\mathtt{aggregation\_time}$, which represents the time it takes to combine the proofs of $N$ batches; this is close to 12 seconds.

Throughput, measured in batches per second, can be computed as follows:

$$
\dfrac{N}{ \text{max} \big(\mathtt{prove\_a\_batch\_time},\ N · \mathtt{close\_a\_batch\_time},\ \mathtt{block\_time},\ \mathtt{aggregation\_time}\big)}
$$

Observe that, since closing batches, proving them, and aggregating their proofs can run in parallel for a set of $N$ batches, processing all $N$ batches takes as long as the slowest of these operations.

Hence the denominator, in the above formula, is the maximum among the values: $\mathtt{prove\_a\_batch\_time}$, $N · \mathtt{close\_a\_batch\_time}$, $\mathtt{block\_time}$, and $\mathtt{aggregation\_time}$.

This means, in the case where the maximum in the denominator is $\mathtt{prove\_a\_batch\_time}$, the system's throughput increases by a factor of $N$.
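
Evaluating this formula with the timing numbers from earlier (and the 12-second aggregation time) shows the scaling behavior:

```python
# Throughput of N aggregated batches, using the timing numbers above.

close_a_batch_time = 3     # seconds
prove_a_batch_time = 120   # seconds
block_time = 15            # seconds
aggregation_time = 12      # seconds

def throughput(n: int) -> float:
    bottleneck = max(prove_a_batch_time,
                     n * close_a_batch_time,
                     block_time,
                     aggregation_time)
    return n / bottleneck  # batches per second

print(throughput(1))   # ~0.0083, the single-batch baseline
print(throughput(10))  # ~0.0833, a 10x improvement
print(throughput(40))  # ~0.3333; past N = 40, N*close_a_batch_time
                       # dominates and throughput plateaus at ~1/3
```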

Delay in this scenario can be computed as follows:

$$
\texttt{delay} = N·\mathtt{close\_a\_batch\_time} + \mathtt{prove\_a\_batch\_time} + \mathtt{aggregation\_time} + \mathtt{block\_time}
$$

A straightforward aggregation of batches substantially increases delay relative to a single-batch approach.
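
With the same numbers as before, the increase is easy to quantify:

```python
# Delay for N sequenced-and-aggregated batches (same timing numbers).

close_a_batch_time, prove_a_batch_time = 3, 120  # seconds
block_time, aggregation_time = 15, 12            # seconds

def delay(n: int) -> float:
    return (n * close_a_batch_time + prove_a_batch_time
            + aggregation_time + block_time)

print(delay(1))   # 150 s, already above the 138 s single-batch delay
print(delay(40))  # 267 s: the throughput gain comes at the cost of delay
```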

As discussed earlier, delay is a critical factor for user experience.

To retain the throughput gains while reducing the delay, we can adopt a two-step approach for batch processing: first, _order_ (also known as _sequence_) and then _prove_.

This segmentation allows for optimization in each step, potentially reducing the overall delay while maintaining improvements in throughput.

### Reducing delay by order-then-prove

The rationale behind decoupling batch ordering (sequencing) from batch proving is twofold:

- Ensure swift responses to users regarding their L2 transactions with minimal delay.
- Enable transaction aggregation for maximizing system throughput.

Sequencing an L2 batch involves deciding which L2 transactions should be part of the next batch. That is, deciding when to create or close the batch, and sending it to L1.

As the sequence of batches is written to L1, data availability and immutability are ensured on L1.

Sequenced batches may not be proved immediately, but they are guaranteed to be proved eventually.

This creates a state within the L2 system that reflects the eventual outcome of executing those transactions, even though the proof hasn’t been completed yet.

Such a state is called a *virtual state* because it represents a future state to be consolidated once the proof is processed.

More precisely, the virtual state is the state reached after executing and sequencing batches in L1, before they are validated using proofs.



It’s crucial for users to understand that once a transaction is in the virtual state, its processing is guaranteed.

A notable improvement lies in the ability to close batches more rapidly than the block time, providing a more efficient and expedited processing mechanism.

Let’s adopt a revised definition for the delay:

- The duration from the moment a user submits an L2 transaction until that transaction reaches the virtual state.

From the user’s perspective, once the transaction is in the virtual state, it can be regarded as processed.

$$
\mathtt{delay}^{(\mathtt{to\_virtual})} = \mathtt{close\_a\_batch\_time} + \mathtt{block\_time}
$$

The smart contract allows us to sequence multiple batches in a single L1 transaction, in which case the delay can be computed as shown below.

$$
\mathtt{delay}^{(\mathtt{to\_virtual})} = N · \mathtt{close\_a\_batch\_time} + \mathtt{block\_time}
$$

Note that this is a significant reduction in delay: the $\mathtt{prove\_a\_batch\_time}$ and $\mathtt{aggregation\_time}$ terms no longer appear.
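
With the same numbers as before, the improvement is substantial:

```python
# Delay to the virtual state: proving and aggregation drop out entirely.

close_a_batch_time = 3   # seconds
block_time = 15          # seconds

def delay_to_virtual(n: int) -> float:
    return n * close_a_batch_time + block_time

print(delay_to_virtual(1))   # 18 s, down from 150 s
print(delay_to_virtual(10))  # 45 s, down from 177 s
```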

Below, we present several advantages of decoupling batch sequencing from batch proving:

- Queue management: This approach enables effective management of the queue of sequenced batches that await consolidation.
- Flexibility in delay and prover resources: It becomes possible to adjust the delay by changing the number of provers in operation.
- User perception: Decoupling allows for adjustments in delay and resource allocation without impacting the perceived delay experienced by users.