Formally Verifying Curve25519-dalek: A Two-Team Approach
Elliptic-curve cryptography underpins much of modern security, from TLS handshakes to digital signatures to privacy-preserving tools such as Signal Messenger. Yet even carefully written, battle-tested cryptographic libraries can harbor subtle bugs that are nearly impossible to catch through testing alone. The alert fragmentation attack uncovered by formal verification is a stark example: a tiny, overlooked flaw in TLS message handling can fully break authentication across all protocol versions, even when the underlying cryptography is mathematically sound. Our curve25519-dalek project applies the same rigor proactively: we use formal verification to provide machine-checked mathematical guarantees that curve25519-dalek, one of the most widely used elliptic-curve cryptography libraries in the Rust ecosystem, behaves exactly as its mathematics demands. We undertook this project both for the peace of mind of the over 70 million people who rely on curve25519-dalek, and to improve AI-powered verification workflows for future projects. Across the 35,000 lines of code (35 KLOC) we inspected, we found zero defects. This blog post describes our results and lessons learned.
What Are We Verifying?
The curve25519-dalek library provides a Rust implementation of Curve25519 group operations, which underlie cryptographic protocols used in systems such as Signal, Tor, and TLS. Our verification effort spans a Rust crate consisting of 288 functions, covering the core cryptographic primitives and algorithms that make up the library. A complete verification must reason about the entire cryptographic stack, from low-level arithmetic to higher-level protocol operations:
Finite-field arithmetic over GF(2^255 - 19), ensuring that optimized limb-based implementations correctly realize the intended mathematical field operations.
Scalar arithmetic, including canonical representations and reductions modulo the group order that underpin secure scalar multiplication.
Elliptic-curve group operations, covering both Edwards and Montgomery forms of Curve25519, and proving that the implemented formulas correctly realize the intended group laws.
Encoding and abstraction layers, such as compressed representations and the Ristretto construction, which eliminate cofactor-related pitfalls while preserving the underlying algebraic structure.
Protocol-level building blocks, including optimized scalar multiplication algorithms and other higher-level primitives used by real-world applications.
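To make the first of these layers concrete, here is a minimal Python model (ours, not the library's code) of the radix-2^51 limb representation used by the serial 64-bit backend, and the kind of correspondence between limb vectors and field values that the proofs must establish:

```python
# Toy model of a GF(2^255 - 19) element as five 64-bit limbs, each nominally
# below 2^51, with value sum(limb[i] * 2^(51*i)) mod p. This mirrors the shape
# of the representation; it is NOT the optimized dalek implementation.
P = 2**255 - 19

def limbs_to_int(limbs):
    """Interpret five 51-bit limbs as the integer they encode."""
    return sum(l << (51 * i) for i, l in enumerate(limbs))

def int_to_limbs(n):
    """Canonical five-limb encoding of n mod p."""
    n %= P
    return [(n >> (51 * i)) & ((1 << 51) - 1) for i in range(5)]

def mul(a, b):
    """Reference field multiplication: the spec any optimized limb code must meet."""
    return int_to_limbs(limbs_to_int(a) * limbs_to_int(b) % P)

# The correctness statement the proofs establish, in miniature: for all
# in-range limb vectors a and b, the implementation's output denotes the
# product of the values that a and b denote, modulo p.
a = int_to_limbs(2**200 + 12345)
b = int_to_limbs(3)
assert limbs_to_int(mul(a, b)) % P == (2**200 + 12345) * 3 % P
```

The verification burden lies in showing that the library's carry-propagating, overflow-avoiding limb code agrees with this mathematical reference on every input.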
Formally verifying such a system is challenging because it requires connecting machine-level Rust implementations with deep algebraic properties of elliptic curves and finite fields, while also accounting for optimized representations and multiple coordinate systems used throughout the codebase. To our knowledge, this is the first end-to-end formal verification of an elliptic-curve cryptographic library at this scale, spanning both implementation-level correctness and the underlying mathematical guarantees.
Two Verification Teams, One Shared Goal
The project is structured around two complementary verification efforts that together form an unbroken chain of trust from the Rust source code to the underlying mathematics. One team uses Lean: with Aeneas, the semantics of the Rust code are translated into Lean, where they can be equipped with lemmas and proofs about the translated code's properties. The other team uses Verus, which embeds pre- and postconditions directly into the Rust source via Rust-native macros (no-ops to the Rust compiler), and allows lemmas and proofs to be written as Verus-dialect Rust code.
The two teams initially set out to verify different Rust source files in curve25519-dalek, with their coverage areas partitioned so that together they span the entire library. This division of labor was intentional: it allowed each team to apply the verification methodology best suited to each component, while the union of both efforts achieved full library coverage. It is worth noting, however, that curve25519-dalek provides multiple back ends (e.g., serial 32-bit, serial 64-bit, and SIMD implementations). In this work, both teams focused exclusively on the serial 64-bit back end, which is widely used in practice. Thus, our notion of “full library coverage” refers specifically to this back end.
Once the combined efforts of Verus and Lean established the correctness of the entire library, both teams started working toward independently verifying full coverage (the goal was not only to verify, but also to better understand the strengths and weaknesses of the two approaches for future work).
This division allowed each team to play to its strengths. Verus excels at reasoning about low-level Rust implementations, especially files that involve machine-level concerns such as integer overflow or native Rust types. Lean, on the other hand, is better suited to abstract mathematical reasoning and to proving general mathematical lemmas (e.g., Fermat's Little Theorem), which are often cumbersome to express directly in Verus. As we gained a better understanding of the verification process and the time it required, we were able to extend our scope to the entire crate in both languages.
Team 1 — Verus: Specifications and Compile-Time Proofs
In Verus, one writes the mathematical specifications directly alongside the Rust source, using pre- and postcondition annotations (`requires`, `ensures`), specification functions (`spec fn`), and proofs (`proof fn`). To verify a crate, `cargo verus verify` builds all non-Verus dependencies with standard rustc, then routes the target crate through the Verus compiler, which checks that the executable code satisfies its specifications via an SMT solver (Z3).
The Verus effort covers the full stack — from `FieldElement51` arithmetic and Fermat's Little Theorem, through scalar and Montgomery reduction, up to Edwards/Ristretto point operations and the Lizard protocol. The resulting `*_specs` modules constitute the project's authoritative mathematical ground truth: they precisely state what the library is supposed to compute, without ambiguity.
Team 2 — Lean 4: Implementation-Level Proofs via Aeneas Extraction
The Lean team takes a complementary approach. Instead of annotating the Rust source directly, it uses Aeneas to automatically translate the Rust implementation into a functional Lean representation that preserves the exact computational behavior of the original code.
Team members then write carefully designed specifications for the extracted Lean code, capturing and constraining its intended correctness properties. By formally proving that the extracted Lean program satisfies these specifications, the team establishes the correctness of the extracted model. Because the extraction preserves the semantics of the Rust implementation, these proofs also provide guarantees about the correctness of the original Rust code.
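In schematic form, and with hypothetical names rather than the project's actual definitions, the pattern looks like this in Lean:

```lean
-- Sketch with hypothetical names (FieldElement, IsBounded, toNat); this is
-- not the project's actual code. Aeneas models Rust's partiality (panics,
-- arithmetic overflow) with a Result monad, so an extracted function has
-- the shape:
--
--   def FieldElement.add (a b : FieldElement) : Result FieldElement := ...
--
-- A hand-written specification theorem then states that, under suitable
-- bounds, the extracted code cannot fail and computes the intended value:
--
--   theorem FieldElement.add_spec (a b : FieldElement)
--       (ha : a.IsBounded) (hb : b.IsBounded) :
--       ∃ c, FieldElement.add a b = .ok c ∧
--         c.toNat % p = (a.toNat + b.toNat) % p := by
--     ...
--
-- Proving and composing such theorems for every extracted function
-- establishes correctness of the extracted model, and hence of the
-- original Rust code.
```

The key point is that the success case (`.ok`) must be proven reachable: absence of panics and overflows falls out of the same theorem as functional correctness.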
Results
For Lean verification, earlier limitations of Aeneas led us to introduce several modifications to the Rust code, as well as three modifications to the Lean code generated by Aeneas. These semantically equivalent workarounds are no longer needed, and we are eliminating them. All modifications have been appropriately documented.
For Verus, we had to sometimes introduce minor syntactic tweaks (preserving semantics) to remain in the Rust fragment currently supported by the Verus checker. Some examples include:
Introducing wrappers around external functions, so that they could be annotated with postconditions, e.g. [1]
Replacing iterators with alternative syntax, since they are unsupported in Verus at the time of writing, e.g. [2]
De-chaining method application by introducing named intermediate values, to allow proofs to refer to those values, e.g. [3]
Constructing a constant-initialized vector manually, instead of with the vec! macro, since that is unsupported in Verus, e.g. [4]
De-sugaring overloaded arithmetic operator use, due to lack of Verus support, e.g. [5]
Changing an anonymous struct field into a named one, e.g. [6]
All changes made include a tag and a reference to the original code, typically within an <ORIGINAL CODE> comment block.
Challenges in the verification
Formally verifying curve25519-dalek is challenging due to both the depth of the underlying mathematics and the complexity of the Rust implementation. On the mathematical side, the verification requires reasoning about sophisticated algebraic structures underlying elliptic-curve cryptography. This includes finite fields, elliptic-curve groups, and the relationships between different curve representations. Bridging the gap between these abstract mathematical objects and their concrete implementations requires substantial formalization effort. On the program side, the difficulty comes from the scale and structure of the code.
During the Lean verification, proof states can contain a large number of local variables and intermediate hypotheses generated by the translation from Rust code. This often makes it difficult for formalizers to quickly understand the current proof state and determine the next proof step.
During the Verus verification, the proof state, unlike in Lean, is not tracked explicitly, so a reader must build their own mental model when following a proof. This can be easier or harder depending on how much a proof uses sub-goal scoping (`assert … by`). In addition, since Verus is SMT-based, it is subject to proof instability, a phenomenon where seemingly unrelated changes in the repository break existing proofs because they perturb internal solver heuristics. This can happen when a proof relies on a term implicitly deducible from the context by some nontrivial deduction step (or sequence of steps); the solver will often instantiate some, but not all, pattern-matched terms, deciding which ones to omit based on internal metrics. To stabilize such proofs, one needs to state the critical term explicitly instead of relying on its implicit derivation. Together, these factors make the verification process both mathematically and technically demanding.
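As an illustration of this stabilization pattern, consider the following Verus-dialect sketch (our own toy example, not code from the project; `lemma_mul_is_distributive_add` is a vstd arithmetic lemma):

```rust
// Fragile: the proof leaves the solver to discover the distributivity fact
// on its own. Whether it does can depend on unrelated context, which is
// exactly the instability described above.
proof fn fragile(a: int, b: int, c: int) {
    assert(a * (b + c) == a * b + a * c);  // may stop verifying as context grows
}

// Stabilized: the critical fact is stated explicitly inside a scoped
// `assert ... by` block, invoking the relevant lemma by name, so the proof
// no longer depends on solver heuristics to find it.
proof fn stable(a: int, b: int, c: int) {
    assert(a * (b + c) == a * b + a * c) by {
        lemma_mul_is_distributive_add(a, b, c);
    }
}
```

The scoped form is also easier to read: the `by` block names exactly which facts the assertion rests on.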
Findings and Improvements
While using Verus to formally verify `Scalar52::to_bytes` (formerly `as_bytes`), our team discovered incomplete documentation of the function's preconditions: the function is only injective under constraints that are tighter than plain limb-boundedness.
The dalek maintainers confirmed this is expected: `to_bytes` is only meant to be called after Montgomery reduction, which guarantees the tighter bound. The type `Scalar52` is internal (not part of the public API), so the invariant is enforced by usage rather than by the type system. They acknowledged this is a potential misuse foothold and opened issue #898 to amend the documentation.
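A toy model makes the finding concrete. Five 52-bit limbs can hold up to 260 bits of value, but the byte encoding keeps only 256, so two limb vectors that both satisfy the plain per-limb bound can encode to the same bytes (Python sketch of the packing idea, not the actual `Scalar52` code):

```python
# Simplified model of packing five 52-bit limbs into 32 little-endian bytes.
# Plain limb-boundedness (each limb < 2^52) allows values up to ~2^260,
# whereas 32 bytes hold only 256 bits, so the top bits are silently dropped.
MASK52 = (1 << 52) - 1

def to_bytes_model(limbs):
    """Pack five 52-bit limbs into 32 bytes, truncating at 256 bits."""
    n = sum(l << (52 * i) for i, l in enumerate(limbs))
    return (n & ((1 << 256) - 1)).to_bytes(32, "little")

a = [1, 0, 0, 0, 0]        # encodes the value 1
b = [1, 0, 0, 0, 1 << 48]  # encodes 1 + 2^256: the extra bit lies past byte 31
assert all(l <= MASK52 for l in b)             # both satisfy the plain limb bound
assert a != b
assert to_bytes_model(a) == to_bytes_model(b)  # ...yet they encode identically
```

After Montgomery reduction the value is far below 2^256, which restores injectivity; the finding was that this precondition was undocumented, not that the code was wrong.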
During the verification process, we interacted extensively with the Aeneas toolchain developers, and we are grateful for their support of this project. We assisted toolchain development by testing it on the rather diverse code we targeted, reporting issues, and discussing new functionality. Moving forward, we aim to contribute more to this and other critical open-source verification tools.
The project has also served as a useful reference for other verification efforts, particularly in terms of structuring specifications and maintaining scalability in large verification codebases.
Similarly, we are also in touch with the Verus developers regarding the Verus standard library. We have already contributed a selection of lemmas, mostly concerning the 128-bit datatypes, which are used less commonly than the others and sometimes require special-case treatment in proofs. There are also plans underway for a large portion of our generic mathematics theorem library to become part of the standard library in the future.
How does AI help our verification?
AI tools have significantly improved our productivity in the verification process. Coding agents such as Claude Code are particularly helpful at generating draft Lean specifications and proof scaffolding for Rust functions. They can quickly draft definitions, invariants, and lemma statements that would otherwise require substantial manual effort. In practice, we observe that successful AI-assisted proving often depends on properly decomposing a theorem into smaller subgoals. When a proof is structured so that each subgoal is sufficiently small and well-scoped, AI tools are much more likely to produce useful proof steps. To further improve the effectiveness of AI in our workflow, we have also developed Formal Verification Skills, a curated set of prompts, patterns, and workflows designed to enhance the ability of coding agents to assist with formal verification tasks. These techniques help AI better understand proof contexts, suggest appropriate tactics, and generate structured proof outlines, making AI a practical collaborator in large-scale verification projects.
Despite the comparatively smaller body of Verus code available to train on, AI assistants (Claude, in particular) provide support for Verus proofs as well. While they were of relatively limited use at the start of the project, we have observed that since around December 2025, the quality of AI-generated proofs has improved to the point of usability. The two main areas where AI-assisted Verus code could be further improved are conciseness (redundant or duplicated code is often generated, such that manual trimming can reduce it by up to 50%) and sub-goal scoping (AI assistants are comparatively bad at structuring proofs into scoped `assert … by` blocks, unlike humans, for whom this structure aids understanding). As a general rule, then, AI-generated code requires manual review to be maintainable, since the above issues often exacerbate proof instability.
Summary
Formal verification of real-world cryptographic libraries remains challenging due to both mathematical complexity and implementation detail. In this post, we presented our work on verifying curve25519-dalek, a widely used Rust elliptic-curve library, using Verus and Lean, outlining the verification approach, the key challenges, and how AI tools can assist in scaling formal proofs.
Code Snippets
[1]
```rust
#[verifier::external_body]
pub fn ct_eq_bytes32(a: &[u8; 32], b: &[u8; 32]) -> (c: Choice)
    ensures
        choice_is_true(c) == (*a == *b),
{
    a.ct_eq(b)
}

...

impl ConstantTimeEq for CompressedEdwardsY {
    fn ct_eq(&self, other: &CompressedEdwardsY) -> (result: Choice)
        ensures
            choice_is_true(result) == (self.0 == other.0),
    {
        /* <VERIFICATION NOTE>
        Use wrapper function ct_eq_bytes32 instead of direct subtle call to ct_eq for Verus compatibility.
        </VERIFICATION NOTE> */
        /* <ORIGINAL CODE>
        self.as_bytes().ct_eq(other.as_bytes())
        </ORIGINAL CODE> */
        ct_eq_bytes32(self.as_bytes(), other.as_bytes())
    }
}
```
[2]
```rust
impl EdwardsPoint {
    /// Original `Sum` implementation using `Iterator::fold`.
    ///
    /// This is used for exec correctness/performance, but is not verified directly.
    /// The verified implementation is `Sum::sum` below, which reduces to `sum_of_slice`.
    /// Functional equivalence is tested in `mod test_sum` (at the bottom of this file).
    #[verifier::external_body]
    pub fn sum_original<T, I>(iter: I) -> (result: EdwardsPoint) where
        T: Borrow<EdwardsPoint>,
        I: Iterator<Item = T>,
    {
        iter.fold(EdwardsPoint::identity(), |acc, item| acc + item.borrow())
    }
}

impl<T> Sum<T> for EdwardsPoint where T: Borrow<EdwardsPoint> {
    fn sum<I>(iter: I) -> (result: Self) where I: Iterator<Item = T>
        requires
            forall|i: int|
                0 <= i < spec_points_from_iter::<T, I>(iter).len()
                    ==> #[trigger] is_well_formed_edwards_point(
                        spec_points_from_iter::<T, I>(iter)[i],
                    ),
        ensures
            is_well_formed_edwards_point(result),
            edwards_point_as_affine(result) == sum_of_points(spec_points_from_iter::<T, I>(iter)),
    {
        let points = collect_points_from_iter(iter);
        EdwardsPoint::sum_of_slice(&points)
    }
    ...
}
```
[3]
```rust
impl Neg for &Scalar {
    type Output = Scalar;

    #[allow(non_snake_case)]
    fn neg(self) -> (result: Scalar)
        ensures
            group_canonical(scalar_as_nat(self) + scalar_as_nat(&result)) == 0,
    {
        /* <ORIGINAL CODE>
        let self_R = UnpackedScalar::mul_internal(&self.unpack(), &constants::R);
        let self_mod_l = UnpackedScalar::montgomery_reduce(&self_R);
        UnpackedScalar::sub(&UnpackedScalar::ZERO, &self_mod_l).pack()
        </ORIGINAL CODE> */
        ...
        // Execute the actual computation
        let self_unpacked = self.unpack();
        ...
        let self_R = UnpackedScalar::mul_internal(&self_unpacked, &constants::R);
        ...
        let sub_result = UnpackedScalar::sub(&UnpackedScalar::ZERO, &self_mod_l);
        let result = sub_result.pack();
        ...
        result
    }
}
```
[4]
```rust
pub(crate) fn batch_invert(
    inputs: &mut [FieldElement],
) {
    ...
    /* <VERIFICATION NOTE>
    Build vec manually instead of vec![one; n] for Verus compatibility
    </VERIFICATION NOTE> */
    /* <ORIGINAL CODE>
    let mut scratch = vec![FieldElement::ONE; n];
    </ORIGINAL CODE> */
    let mut scratch = Vec::new();
    for _k in 0..n // Added loop variable _k to help with verification
        invariant
            scratch.len() == _k,
            scratch@ =~= Seq::new(_k as nat, |j: int| one),
    {
        scratch.push(one);
    }
}
```
[5]
```rust
impl Neg for Scalar {
    type Output = Scalar;

    fn neg(self) -> (result: Scalar)
        ensures
            group_canonical(scalar_as_nat(&self) + scalar_as_nat(&result)) == 0,
    {
        let result = (&self).neg();
        result
    }
    /* <ORIGINAL CODE>
    fn neg(self) -> Scalar {
        -&self
    }
    </ORIGINAL CODE> */
}
```
[6]
```rust
pub struct FieldElement51 {
    // ADAPTED CODE LINE: we give a name to the field: "limbs"
    pub limbs: [u64; 5],
}
```