Benchmarks

Performance metrics for HotShot consensus and Tiramisu data availability in Espresso's Cappuccino testnet release

As part of launching the Cappuccino testnet and releasing our implementation of HotShot under the MIT license, we are publishing benchmarks for the performance of this release. Compared to earlier benchmarks, these results reflect the addition of Tiramisu DA's Savoiardi layer to the HotShot protocol.

In our evaluations, we progressively increased the block size from 50KB to 20MB and tested on network sizes ranging from 10 to 1000 nodes. In all settings, a subset of 10 nodes serves as both validators and the committee for Tiramisu DA's Mascarpone layer. As shown in the figure below, throughput rises with increasing load without a corresponding increase in latency, up to a certain saturation point. Beyond this point, latency begins to increase while throughput either remains steady or rises only slightly. The table below shows the benchmark data for a 5MB block size, which is approximately this turning point.

| Network Size | Mascarpone Committee Size | Block Size (MB) | Average Latency (s) | Average View Time (s) | Throughput (MB/s) |
| --- | --- | --- | --- | --- | --- |
| 10 | 10 | 5 | 3 | 1.08 | 4.58 |
| 100 | 10 | 5 | 2 | 0.85 | 5.76 |
| 200 | 10 | 5 | 4 | 1.21 | 4.04 |
| 500 | 10 | 5 | 9 | 1.97 | 2.48 |
| 1000 | 10 | 5 | 21 | 5.56 | 0.88 |
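
As a rough sanity check (not part of the benchmark tooling), throughput should be close to the block size divided by the average view time, since roughly one block is committed per view; failed views and reporting overhead account for the small remaining gap. A minimal sketch using the table's values:

```rust
// Rough sanity check: throughput ≈ block size / average view time.
// Illustrative only; values are copied from the table above.
fn main() {
    let block_size_mb = 5.0;
    // (network size, average view time in seconds, reported throughput in MB/s)
    let rows = [
        (10, 1.08, 4.58),
        (100, 0.85, 5.76),
        (200, 1.21, 4.04),
        (500, 1.97, 2.48),
        (1000, 5.56, 0.88),
    ];
    for (nodes, view_time_s, reported_mb_s) in rows {
        let estimate = block_size_mb / view_time_s;
        println!("{nodes} nodes: estimated {estimate:.2} MB/s, reported {reported_mb_s} MB/s");
    }
}
```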

Experimental Setup

These benchmarks were run on HotShot version 0.5.63.

We conducted our experiments on two types of machines:

  • CDN Instances: Our CDN (repository located here) is a distributed and fault-tolerant system responsible for routing messages between validators. The CDN was run across 3 Amazon EC2 m6a.xlarge instances located in the us-east-2 region. Each instance ran a broker, which is the component responsible for forwarding messages to their intended recipients. One instance also ran the marshal service, which is the service that facilitates the authentication and marshaling of validators to a specific broker. Each instance had 4 vCPUs and 16.0 GiB memory.

  • Validator Instances: HotShot nodes were run on Amazon ECS tasks with 2 vCPUs and 4 GiB memory. Nodes were equally distributed among the us-east-2a, us-east-2b and us-east-2c availability zones.
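
To make the message flow above concrete, the sketch below models the roles just described. All types and functions are hypothetical and do not reproduce the actual CDN API: the marshal authenticates a validator and assigns it a broker, and the broker then forwards messages to their intended recipients.

```rust
// Hypothetical sketch of the CDN roles described above; the real CDN
// (brokers + marshal) has its own API, which this does not reproduce.
struct Marshal;
struct Broker {
    address: String,
}

impl Marshal {
    /// Authenticate a validator and hand back the broker it should connect to.
    fn assign_broker(&self, validator_key: &str) -> Broker {
        // The real marshal performs authentication; here we simply pick a broker.
        let _ = validator_key;
        Broker { address: "broker-1.example:1234".to_string() }
    }
}

impl Broker {
    /// Forward a message to its intended recipients.
    fn forward(&self, recipient: &str, message: &[u8]) {
        println!("broker {} -> {}: {} bytes", self.address, recipient, message.len());
    }
}

fn main() {
    let marshal = Marshal;
    let broker = marshal.assign_broker("validator-pubkey");
    broker.forward("all-validators", b"proposal bytes");
}
```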

Data Calculation

Each benchmark was run until 100 blocks were committed. After each benchmark run, nodes reported:

  • the total time elapsed for the run

  • the throughput per second

  • the total latency

  • the total number of blocks committed

  • the total number of views it took to reach 100 commits

  • the number of failed views (views that failed to make progress)

These values were collected and averaged in the final results. Note that throughput is measured in megabytes per second, not mebibytes per second.
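
A minimal sketch of this aggregation step is shown below; the field and function names are hypothetical and do not mirror HotShot's actual metrics code.

```rust
// Hypothetical per-node report mirroring the values listed above; names are
// illustrative and are not HotShot's actual metrics types.
struct NodeReport {
    total_time_s: f64,     // total time elapsed for the run
    throughput_mb_s: f64,  // megabytes (not mebibytes) per second
    total_latency_s: f64,  // summed latency over committed blocks
    blocks_committed: u64, // total number of blocks committed
    total_views: u64,      // views taken to reach the commit target
    failed_views: u64,     // views that failed to make progress
}

/// Average per-node reports into the headline numbers shown in the table.
fn aggregate(reports: &[NodeReport]) -> (f64, f64, f64, u64) {
    let n = reports.len() as f64;
    let throughput = reports.iter().map(|r| r.throughput_mb_s).sum::<f64>() / n;
    let latency = reports
        .iter()
        .map(|r| r.total_latency_s / r.blocks_committed as f64)
        .sum::<f64>() / n;
    let view_time = reports
        .iter()
        .map(|r| r.total_time_s / r.total_views as f64)
        .sum::<f64>() / n;
    let failed = reports.iter().map(|r| r.failed_views).sum();
    (throughput, latency, view_time, failed)
}

fn main() {
    // Single made-up report, roughly matching the 10-node row of the table.
    let reports = [NodeReport {
        total_time_s: 108.0,
        throughput_mb_s: 4.58,
        total_latency_s: 300.0,
        blocks_committed: 100,
        total_views: 100,
        failed_views: 0,
    }];
    let (t, l, v, f) = aggregate(&reports);
    println!("throughput {t:.2} MB/s, latency {l:.1} s, view time {v:.2} s, {f} failed views");
}
```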

Analysis of Results

  • Our implementation of the Tiramisu data availability protocol achieves better maximum throughput in large networks than standard consensus protocols where data is sent to all nodes. That said, this particular implementation’s latency is worse in large networks. However, we’ve identified several implementation-specific bottlenecks that, once addressed, should fix this issue.

  • During benchmarks where the network is unsaturated with data, small network sizes (10 and 100 nodes) achieve finality in ~1s, and large network sizes (500 and 1000 nodes) achieve finality in 2-5s.

  • The primary bottlenecks of this particular implementation are twofold:

    • Our current implementation of Tiramisu DA's Savoiardi layer is compute-intensive. This causes builders, leaders, and Mascarpone DA committee members to spend additional time computing Savoiardi shares during each view. This bottleneck can be addressed by parallelizing the intensive compute more effectively (see the sketch after this list), dynamically tuning Savoiardi parameters such as multiplicity to encode block data more efficiently, experimenting with different hardware such as GPUs, and having the Cocoa layer optimistically calculate Savoiardi shares.

    • The builder used in these benchmarks is a simple, naive builder. Unlike a sophisticated builder, this builder does no optimistic execution or optimistic Savoiardi calculations. The simple builder does not begin building blocks until the HotShot leader requests it to do so. This causes the builder to be slow in returning block data to the HotShot leader, thus adding unneeded latency to each view. This bottleneck can be addressed by using a sophisticated builder that optimistically builds blocks.

  • This implementation of HotShot uses the HotStuff-1 protocol. We plan to upgrade to the HotStuff-2 protocol in the future, which will reduce commit latency significantly.
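
To illustrate the first bottleneck above: Savoiardi shares can in principle be computed independently per storage node, so the work parallelizes across cores. The sketch below only shows the shape of that idea, assuming the rayon crate and a placeholder compute_share function; it is not the actual Savoiardi implementation.

```rust
use rayon::prelude::*; // external crate: rayon

/// Placeholder for the per-node Savoiardi share computation; in the real
/// implementation this is the expensive polynomial-evaluation work.
fn compute_share(payload: &[u8], node_index: usize) -> Vec<u8> {
    // Stand-in "share": in reality this would be field-element evaluations.
    payload.iter().map(|b| b.wrapping_add(node_index as u8)).collect()
}

/// Compute one share per storage node, spreading the work across CPU cores.
fn compute_all_shares(payload: &[u8], num_nodes: usize) -> Vec<Vec<u8>> {
    (0..num_nodes)
        .into_par_iter()
        .map(|i| compute_share(payload, i))
        .collect()
}

fn main() {
    let payload = vec![0u8; 5 * 1024 * 1024]; // 5MB block, as in the benchmarks
    let shares = compute_all_shares(&payload, 10); // 10-node Mascarpone committee
    println!("computed {} shares", shares.len());
}
```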

Notes

  • These benchmarks did not use a public transaction mempool. Instead, block builders were configured to build predetermined-sized blocks each view. This configuration is equivalent to block builders only building blocks with privately-sent transactions. A public mempool is part of the current HotShot implementation, however, and will be included in future benchmarks. Note that throughput and latency results will differ with the inclusion of the public mempool.

  • Multiplicity: Tiramisu DA's Savoiardi VID scheme is inspired by Ethereum's danksharding proposal, where the block payload is viewed as a list of polynomial coefficients. Ordinarily, these coefficients are partitioned into multiple polynomials, and each storage node gets one evaluation from each of those polynomials. At the other extreme, one could instead gather these coefficients into a single high-degree polynomial, and give each storage node multiple evaluations from this polynomial. We use the word “multiplicity” to denote the number of evaluations per polynomial sent to each storage node. Multiplicity is a parameter that can be tuned between two extremes to optimize performance.
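
To make this trade-off concrete, the illustrative calculation below (with hypothetical parameter names, not the actual Savoiardi parameterization) fixes the payload size and the amount of data each storage node receives, and shows how raising the multiplicity reduces the number of polynomials while raising each polynomial's degree.

```rust
// Illustrative arithmetic only; parameter names are hypothetical and this is
// not the actual Savoiardi parameterization.
fn main() {
    let payload_coeffs = 4096; // payload viewed as this many polynomial coefficients
    let evals_per_node = 16;   // total evaluations each storage node receives

    // For each multiplicity, the coefficients are gathered into fewer, higher-degree
    // polynomials, and each node receives `multiplicity` evaluations per polynomial.
    for multiplicity in [1, 2, 4, 8, 16] {
        let num_polynomials = evals_per_node / multiplicity;
        let coeffs_per_polynomial = payload_coeffs / num_polynomials;
        println!(
            "multiplicity {multiplicity:2}: {num_polynomials:2} polynomials, \
             degree < {coeffs_per_polynomial}, {evals_per_node} evaluations per node"
        );
    }
}
```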
