SNARKs 101

Deep RnD
8 min readFeb 1, 2023

Today we’re going to give the first of tutorial series on SNARKs. We’ll start a little bit about SNARK design, what SNARKs even are, and how they’re built.

SNARKs are pretty exciting right now as they’re being built for blockchain scalability. This article won’t be super technical and in the next parts we’ll have a deeper dive into the nuts and bolts.

What is a SNARK?

So in a SNARK, some untrusted prover P claims to know some witness w satisfying some property.

So a couple examples that come up a lot in cryptography:

  1. A pre-image w of some designated output y of a cryptographic hash function h, where w such that h(w) = y. Where the prover and verifier have agreed in advance on the hash function h and the output y. Here the prover is claiming to know w and the verifier doesn’t know w, but wants to make sure the prover does.
  2. A witness might be a private key w corresponding to a designated public key y for some crypto system.

Trivial Proof

There’s always a trivial proof for the prover’s claim, which is to just sending the witness w, that it claims to know, to the verifier. The verifier can just check, that w satisfies the claim property.

From our previous examples, the verifier could just evaluate the hash function at w and confirm it equals y.

How SNARK comes to rescue?

So what a SNARK does is achieve the same goal, but with hopefully much better cost for the verifier and what it stand for is Succinct Non-interactive Argument of Knowledge. So let’s break each of these words and understand what it actually means.

Succinct means that the SNARK proof must be shorter than the witness w. In a trivial proof, it’s just the witness, so anything that’s shorter, qualifies as succinct. Some researcher want succinct to be at least log(w), but from our practical point of view, anything non-trivial should be considered succinct.

In principle, you’re not going to get a SNARK proof, that’s smaller than several hundred bits. So if your witness is very short, in practice, you’re not going to be more succinct than that. But there are statements that could have larger witnesses or the prover is proving to know many pre-images. So while there is a limit to the succinctness, it depends on the SNARK transparency (will be discussed later). For non-transparent SNARKs, the most succinct version still have to contain at least three cryptographic group elements, totaling to around 700 bits. For transparent ones, it will be at least log(w), which can be still be quite large.

Non-interactive means the proof is static and can be posted on a website or sent an email. No interaction is needed during the verification phase.

Argument of knowledge just basically means if the prover can’t break some crypto system, then it really must know w in order to produce a convincing proof.

Ideally, checking the proof should be faster than the runtime of the verifier in the trivial proof system. This property is not actually captured in the SNARK acronym, however we do want that the verifier would save work — as compared to the trivial proof system. Otherwise, it could be you have a short proof, that takes a lot of time to check — this is not good. In terms of speed, for the non-transparent SNARKs it should be achievable to get into the range of milliseconds, when the proof is very short, and for the transparent SNARKs it would grow to tens or hundreds of milliseconds.

How are SNARKs designed?

To design a SNARK, one would ideally start with something like a computer program, written in a high-level language (C, Java etc). And that computer program is just going to take the witness, that the prover claims to know as input, and check that the witness satisfies the property claimed by the prover.

Following our previous hash pre-image example, where the prover claims to know a pre-image w, under cryptographic hash h of some hash output y. The program would just evaluate the hash on the witness and check if it equals the claim value, or h(w) = y.

Frontend

However, applying SNARK optimisation directly to a computer program conceptually can’t be done, so at at high level most SNARKs work via a two-step paradigm. The first step of this two-step process is to transform that computer program into an equivalent model, that is amenable to probabilistic proof systems. And typically this is some kind of circuit satisfiability instance and is called the frontend of the system.

Usually, rank-1 constraint system (R1CS) is used, the idea of which is that it keeps track of the values that each variable assumes during the computation, and binds the relationships among all those variables that are implied by the computation itself. This format is not really “human readable”, so this translation will typically be performed by some kind of “compiler”. So the frontend ensures that the circuit instance is equivalent to the original computer program.

Backend

The next step would be to apply a SNARK for circuit satisfiability to the circuit and that’s called the backend. In the backend, the prover claims to know a satisfying assignment to a circuit. So all of these SNARK names out there in the literature, which maybe you’ve encountered some of them before, they refer to the backend the probabilistic proof system. Examples are Groth16, Marlin, PlonK, Ligero, Hyrax, Libra, Virgo, Brakedown, Supersonic, etc.

Conceptually the verifier randomly challenges the prover, but the randomness and any interaction is completely pulled out of the protocol using cryptography. In that way you wind up with the static proof where the verifier never has to send a challenge to the approver.

Simpler put, in principle how you should view these steps as follows — step one means you need to basically re-prepare the same program in a way that you could check it through a number of random challenges in the step two.

Performance Matters

If you don’t care about really squeezing as much overhead out as you possibly can, making these steps as performative as possible, there are fairly straightforward ways to do both steps. Even the simplest things for step two use some of the most celebrated ideas in theory of computer science. But if you really want these things to perform, then everything becomes very challenging to understand. The details of different transformations and to make sure it’s bug-free and all of that just get complicated.

Some computer programs are much more amenable to being turned into small circuits than others. Some aren’t. The design of the system starts with both steps close to the best we can hope for which log factors in either steps. But concretely those log factors matter and the constants matter and everything in practice can get quite nasty. The problem of making production ready, performant and reliably system lies in the details of implementation and optimisations.

The really efficient transformations from programs to circuits will likely to add more variables and extend the witness with advice inputs to help to keep the circuit small.

With regards to frontend, taking arbitrary C or Java program and spitting out a circuit is still an open problem, so in reality frontend is being written in very low level language such as bellman and circom, and almost just writing out the circuit gate by gate. Other approaches will have something slightly higher level than that, but it’s still like an assembly-like language e.g. WebAssembly.

Backend Goals

Our backend goal is to have a SNARK for circuit satisfiability where:

  1. The prover runs close to linear time, that is it is not much slower than just what it would take to evaluate the circuit gate by gate on the witness with no guarantee of correctness. so that’s sort of the best
  2. The proof size is to be as small as possible, from a few hundred bytes to a few kilobytes. The smallest proof for non-transparent SNARKs will be a few hundred bytes or it turns out if you if you want a sort of a a nice version of non-transparency there’ll be a hundred bytes to a few kilobytes. Actually even the transparent ones can we can get down to a few kilobytes.
  3. Verifier is to be fast as possible. From a few milliseconds, is sort of the fastest that exist right now, to maybe 10 times that — under 100 msec.

There are other properties we would want beyond just fast prover and fast verifier. We want the SNARK to be transparent (next section) and the other thing we would ideally like is for the SNARK to be post quantum secure. Meaning if there’s the prover has a quantum computer and is trying to find convincing proofs of false statements it still can’t do.

Transparency

So in a non-transparent SNARK, the prover will have to know what’s called a common reference string (CRS). And what it is, is a very long string, the size of the circuit, which serves as the pair of proving and verification keys. These “keys” are used by the prover and verifier to generate and verify proofs for a specific problem (or constraint system), respectively.

In setup process, there are random elements which are sampled and must be kept secret — if the prover knows them, they will be able to create proofs which are verified successfully, without using an actual solution to the problem during the proving process. In other words, to forge proofs and break soundness. This randomness is also known as “toxic waste”.

There are ways to avoid this worry and not put trust in a single entity. For public circuits, these usually involve a Multi-Party Computation (MPC) — a process in which multiple players donate their own randomness, which they destroy afterwards. The interesting fact is that it’s enough that one player is honest and destroys the malicious actor mislead randomness for the whole process to be secure.

If SNARK is transparent it means you just don’t have to worry about any of the CRS and the toxic waste at all. There is no trusted setup, there’s no trapdoor, that someone might know and use to forge proofs of false statements.

Trusted Setup

The setup is a process where the CRS is called Trusted Setup and ideally you don’t want to have to run a different trusted setup for every time you tweak your circuit. So usually, you’d like 200 people get together run the trusted setup through MPC and use it from now on doesn’t matter what circuit they use, as long as the circuit isn’t too big.

And that’s kind of what people are doing today like zCash and others ran some of these trusted setups a while back and now other projects get to use those same strings as long as they believe that someone in that ceremony from long ago was honest and didn’t collude with every other person in the ceremony.

There is also potential issue with the increased complexity and thus possible issues arising from that, such as zCash CRS bug.

Stay Tuned

In the next article we’ll be talking about SNARK backend design and its key components.

--

--