The Next 700 EVM Languages

Transcript

This is a re-recording of a talk I gave at Devcon 7, where I share a few thoughts on programming language characteristics that, as smart contract developers, we should really care about.

A lot of this is informed by my work on OpenZeppelin Contracts, the widely used Solidity library. During the 6 years that I was a developer and maintainer for the project, I had to read and write a lot of Solidity, so this is a little retrospective on that.

The original ambition set out for OpenZeppelin Contracts was to curate a repository of common types of contracts that developers could reuse instead of writing their own and risking introducing bugs in that code. It quickly became obvious that the code we could offer in the library was only a starting point for developers and that they usually needed to add custom behavior on top. So it became a goal for the code not only to be secure on its own but to remain secure in the context of the developer’s own code, and ideally to help prevent errors in that code too. Another way to state this is that OpenZeppelin Contracts had to be conceived as a library of abstractions, and in particular safe abstractions.

I’ll show an example of this that I really like. The ERC20 token is a staple of Ethereum. An implementation of ERC20 has to ensure some basic properties, such as that an account has enough balance to initiate a transfer. It also has to preserve certain state invariants, such as the sum of balances matching the total supply. And on top of this it has to emit events that track changes in balances. All of these things are relatively easy to ensure in a finished implementation, but in a library we’re only providing a starting point that developers will extend. So can we ensure that these properties still hold in the final token contract that a developer puts together?

In fact it was easy to break these assurances in early versions of the library.

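// Illustrative: assumes an early ERC20 whose state (balances, totalSupply)
// was visible to contracts that inherit from it.
import {ERC20} from "@openzeppelin/contracts";

contract Token is ERC20 {
    constructor(address premint) {
        uint256 amount = 10000e18;
        balances[premint] += amount; // bug: total supply is not updated
    }
}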

Even if the ERC20 implementation in the library was correct, this simple contract isn’t, because it breaks the total supply invariant: we’re increasing a balance without also increasing the total supply. And even once we fix that, it’s still not correct, because the Transfer event is missing.

import {ERC20} from "@openzeppelin/contracts";

contract Token is ERC20 {
    constructor(address premint) {
        uint256 amount = 10000e18;
        balances[premint] += amount;
        totalSupply += amount;
        emit Transfer(address(0), premint, amount); // mints come from the zero address
    }
}

At some point we will have fixed everything. But this is the kind of thing we’d like to make easier. We can start by providing an abstraction: a _mint function that always does all three things and makes sure the invariants are preserved.

import {ERC20} from "@openzeppelin/contracts";

contract Token is ERC20 {
    constructor(address premint) {
        uint256 amount = 10000e18;
        _mint(premint, amount);
    }
}

This is really good, but is it enough? Suppose an auditor sees a contract that uses OpenZeppelin’s ERC20 but extends it. Can they assume that these invariants hold? Again, not in the earlier versions of the library. They would have to keep in mind, and check as they go through each line of code, that supply is created exclusively through _mint.

The auditor, and honestly every person involved in development and review, has a lot more important things they should be able to focus on. So we want to offload this kind of concern to a machine, like a linter, or a static analyzer. And in fact the compiler can take care of this for us, if we make the balances and supply variables private. So we end up with an abstraction, with encapsulated state, that provides a safe interface to manipulate ERC20 balances.
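A minimal sketch of this encapsulation, assuming a hypothetical stripped-down token (MiniERC20) rather than OpenZeppelin’s actual implementation: the state is private, and _mint is the only path a derived contract has for creating supply, so the balance update, the supply update, and the event always happen together.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

// Hypothetical, stripped-down token core (not OpenZeppelin's code).
// State is private: derived contracts cannot write it directly.
contract MiniERC20 {
    event Transfer(address indexed from, address indexed to, uint256 value);

    mapping(address => uint256) private _balances;
    uint256 private _totalSupply;

    function totalSupply() public view returns (uint256) {
        return _totalSupply;
    }

    function balanceOf(address account) public view returns (uint256) {
        return _balances[account];
    }

    // The only way for derived contracts to create supply: it updates the
    // balance and the total supply, and emits the event, all together.
    function _mint(address to, uint256 amount) internal {
        require(to != address(0), "mint to zero address");
        _totalSupply += amount;
        _balances[to] += amount;
        emit Transfer(address(0), to, amount);
    }
}

contract Token is MiniERC20 {
    constructor(address premint) {
        _mint(premint, 10000e18);
        // _balances[premint] += 1; // would not compile: _balances is private in MiniERC20
    }
}
```

Uncommenting the direct write in the constructor turns the invariant violation from something an auditor has to catch into a compile error.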

These sound like simple mistakes, and a private variable doesn’t sound like much. But we still see this problem show up, even very recently. Maker, a well-respected project in the space, deployed the Savings DAI token in 2023 without the Transfer event for mints. As a result, all kinds of tooling (like explorers, wallets, or tax software) struggle with this token. Unfortunately they didn’t use a library or a good ERC20 abstraction, and by implementing the logic at a lower level, they ended up with a non-standard token.

What I think this example shows is the importance of libraries, but more specifically the importance of abstractions, and building on safe abstractions that encode and preserve the properties that we’re interested in.

I mentioned previously that developers need to add custom behavior on top of the basics provided by the library, and additionally the library wants to provide some of these as opt-in modules. We call this extensibility and modularity, and essentially the only mechanism in Solidity to create extensible and modular abstractions is inheritance, often multiple inheritance.

So let’s look at an example, another fruitful abstraction in the context of ERC20, the transfer hook.

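contract ERC20Votes is ERC20 {
    // Each public transfer entry point is overridden separately.
    // _moveVotingPower is the extension's own bookkeeping (not shown).
    function transfer(address to, uint256 amount) public virtual override returns (bool) {
        _moveVotingPower(msg.sender, to, amount);
        return super.transfer(to, amount);
    }

    function transferFrom(address from, address to, uint256 amount) public virtual override returns (bool) {
        _moveVotingPower(from, to, amount);
        return super.transferFrom(from, to, amount);
    }
}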

A lot of the customizations people do to ERC20 tokens consist of adding behavior to transfers. At the beginning this was done by inheriting the ERC20 contract and overriding the transfer and transferFrom functions separately to add the behavior. There is a risk here that the author of the extension would override one transfer function and not the other, and we haven’t even touched on other transfer-like functions such as _mint. This would result in inconsistent behavior and very likely a bug.

The abstraction that resolves this is the _beforeTokenTransfer “hook”: a function that the contract invokes at the beginning of every transfer-like function. Custom behavior that is added on this hook is then automatically applied everywhere it should be, and the various features of the contract like transfer or _mint behave consistently with regard to the extension, which is a great improvement.
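To make the mechanism concrete, here is a minimal sketch of a base contract that routes every transfer-like function through the hook. It is a hypothetical, stripped-down token (HookedERC20 is not OpenZeppelin’s implementation); only transfer, _transfer, and _mint are shown, and the counting extension stands in for real logic like voting power.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

// Hypothetical sketch of the hook pattern (not OpenZeppelin's code):
// every transfer-like path calls _beforeTokenTransfer first.
contract HookedERC20 {
    event Transfer(address indexed from, address indexed to, uint256 value);

    mapping(address => uint256) private _balances;
    uint256 private _totalSupply;

    function balanceOf(address account) public view returns (uint256) { return _balances[account]; }
    function totalSupply() public view returns (uint256) { return _totalSupply; }

    function transfer(address to, uint256 amount) public returns (bool) {
        _transfer(msg.sender, to, amount);
        return true;
    }

    function _transfer(address from, address to, uint256 amount) internal {
        _beforeTokenTransfer(from, to, amount); // hook runs on transfers...
        require(_balances[from] >= amount, "insufficient balance");
        _balances[from] -= amount;
        _balances[to] += amount;
        emit Transfer(from, to, amount);
    }

    function _mint(address to, uint256 amount) internal {
        _beforeTokenTransfer(address(0), to, amount); // ...and on mints
        _totalSupply += amount;
        _balances[to] += amount;
        emit Transfer(address(0), to, amount);
    }

    // Extension point: does nothing by default, overridden by extensions.
    function _beforeTokenTransfer(address, address, uint256) internal virtual {}
}

// A hypothetical extension that counts every movement of tokens, mints included.
contract CountingToken is HookedERC20 {
    uint256 public movements;

    constructor() {
        _mint(msg.sender, 100);
    }

    function _beforeTokenTransfer(address from, address to, uint256 amount) internal virtual override {
        movements += 1;
        super._beforeTokenTransfer(from, to, amount);
    }
}
```

The extension overrides one function and is applied to both the mint in the constructor and later transfers, which is the consistency the hook is meant to provide.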

contract ERC20Votes is ERC20 {
    function _beforeTokenTransfer(address from, address to, uint256 amount) internal virtual override {
        _moveVotingPower(from, to, amount);
        super._beforeTokenTransfer(from, to, amount);
    }
}

To use this hook abstraction one has to inherit from the contract and override the hook function. But I want to highlight the second line in this function, where we use super, which is extremely important. Because this is using inheritance and overrides, there’s actually a chain of overrides, all of which we have to execute. If we don’t have this line, we’re not able to combine this extension with other extensions that register the same hook… and it’s actually even worse than that, because we can combine it, and it will compile, but it won’t behave the way we expect. In order to correctly use this abstraction, the hook, we have to remember to add this second line, and to do it correctly, by passing the right arguments and so on.
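The failure mode is easy to reproduce. In this hypothetical sketch (the contract names are made up, and token logic is left out), two extensions override the same hook, but one of them forgets the super call. Solidity linearizes Combined as Combined, ExtensionB, ExtensionA, Base, so the chain stops at ExtensionB and ExtensionA’s hook never runs, even though everything compiles.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

// Hypothetical minimal base exposing only the hook (token logic omitted).
contract Base {
    function _beforeTokenTransfer(address, address, uint256) internal virtual {}

    function simulateTransfer() external {
        _beforeTokenTransfer(msg.sender, address(1), 1);
    }
}

contract ExtensionA is Base {
    uint256 public aCalls;

    function _beforeTokenTransfer(address from, address to, uint256 amount) internal virtual override {
        aCalls += 1;
        super._beforeTokenTransfer(from, to, amount);
    }
}

// Same shape as ExtensionA, but the author forgot the super call.
contract ExtensionB is Base {
    uint256 public bCalls;

    function _beforeTokenTransfer(address, address, uint256) internal virtual override {
        bCalls += 1;
        // missing: super._beforeTokenTransfer(from, to, amount);
    }
}

// Linearization: Combined -> ExtensionB -> ExtensionA -> Base.
contract Combined is ExtensionA, ExtensionB {
    function _beforeTokenTransfer(address from, address to, uint256 amount)
        internal
        override(ExtensionA, ExtensionB)
    {
        super._beforeTokenTransfer(from, to, amount); // reaches ExtensionB only
    }
}
```

Swapping the order to `is ExtensionB, ExtensionA` would happen to run both hooks, which is exactly the problem: correctness depends on an inheritance order that nothing in the language checks.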

So this is not really a great abstraction, because abstractions should hide irrelevant details, allow us to focus on the high-level goal we’re trying to achieve, and help us avoid mistakes while doing so. In this case we would like to focus on the voting power logic, but the accidental aspects of inheritance are front and center, distracting us and exposing us to a very serious mistake.

Unlike the previous example with private variables, in this case the compiler is not able to help us at all. So if we want to offload the concern we need some custom tool to do that. And although tooling is fine, it is opt-in, and as library authors we have to assume most developers will not use it.

The ideal implementation of a hook abstraction would not require this line to be used safely. So why does OpenZeppelin Contracts do this? Well, because again inheritance is more or less the only mechanism in the language if we want to express extensibility patterns like this one. Even though there are some alternatives we could construct, they’re not considered viable because they get in the way of another important goal: efficiency.

Around 2020 a couple of interesting things happened. The Istanbul hard fork, which went live at the end of 2019, quadrupled the cost of reading storage (SLOAD went from 200 to 800 gas). On top of that, the chain became seriously congested, and it was suddenly very expensive to transact on-chain. As a result, gas became an extremely important concern for everyone. Users were paying really high fees, and app developers were being bashed by users for writing gas-inefficient code.

For library authors like us this meant that gas efficiency became a lot higher priority. Users now wanted not just security but also, and maybe even mainly, efficiency. So we had to keep these two things in mind and in some cases find the right balance between them.

Around this time we began to see a really interesting phenomenon. It was the rise of assembly, hand-written inline assembly in Solidity code. In my opinion, the developer community really overcorrected on this, looking to shave just a few units of gas here and there, even though other primitives in the EVM can cost thousands or even tens of thousands of gas.

So I’m quite critical of this movement, but I have to recognize that the proponents of assembly are onto something. They’ve observed that their hand-written assembly can be much more efficient than what the compiler generates, and I think this is generally true. The higher-level abstractions of Solidity are not zero-cost abstractions, in the sense defined in the C++ design principles:

when you use a zero-cost abstraction, you get at least as good performance as if you had handcoded it

This idea has existed for a long time; C++ was created in the 1980s. But recently the idea has gained new prominence with the huge popularity of Rust. It’s hard not to be impressed by the success of Rust. Its presence in the blockchain space is particularly strong, and among smart contract programming languages its influence is overwhelming: Cairo, Noir, Move, Sway, Stylus, Solana, all of these are either Rust-inspired or straight-up Rust. And I don’t think this is a coincidence. Smart contracts need to be safe, and they need to make efficient use of resources, and these are things Rust shines at.

So let’s look at an example of how Solidity is not a zero-cost abstraction language.

function processProof(bytes32[] memory proof, bytes32 leaf) pure returns (bytes32) {
    bytes32 computedHash = leaf;
    for (uint256 i = 0; i < proof.length; i++) {
        computedHash = keccak256(bytes.concat(computedHash, proof[i]));
    }
    return computedHash;
}

This is a function from a Merkle proof library that computes a Merkle root from a leaf and a Merkle proof. In this function we’re using a number of abstractions that are high-level, in the sense that they don’t map directly to EVM instructions, such as this for loop or the bytes.concat function that we’ll focus on. It turns out that processProof has a very undesirable performance characteristic: it allocates memory proportional to the size of the proof. Each bytes.concat call creates a new 64-byte array in memory that is used once and never needed again; it just sits there, increasing the memory size and affecting the cost of the rest of the contract. We got this as a consequence of using an abstraction, because the compiler was not smart enough to eliminate or fuse these allocations.
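One way to observe this is to read Solidity’s free memory pointer (kept at memory address 0x40) before and after the call. The sketch below is a hypothetical harness (MemoryProbe and memoryGrowth are made-up names, not part of any library); the exact number of bytes depends on the compiler version and settings, so it only reports the growth rather than assuming a particular figure.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

// The concat-based version from the text (bytes.concat needs Solidity >= 0.8.4).
function processProof(bytes32[] memory proof, bytes32 leaf) pure returns (bytes32) {
    bytes32 computedHash = leaf;
    for (uint256 i = 0; i < proof.length; i++) {
        // Each iteration allocates a fresh 64-byte `bytes` array in memory.
        computedHash = keccak256(bytes.concat(computedHash, proof[i]));
    }
    return computedHash;
}

// Hypothetical probe: measures how far the free memory pointer advances
// while processProof runs on a proof of length n.
contract MemoryProbe {
    function memoryGrowth(uint256 n) external pure returns (uint256 grown, bytes32 root) {
        bytes32[] memory proof = new bytes32[](n);
        for (uint256 i = 0; i < n; i++) {
            proof[i] = bytes32(i + 1); // arbitrary sibling hashes
        }
        uint256 ptrBefore;
        uint256 ptrAfter;
        assembly { ptrBefore := mload(0x40) }
        root = processProof(proof, bytes32(0));
        assembly { ptrAfter := mload(0x40) }
        grown = ptrAfter - ptrBefore;
    }
}
```

With an empty proof the pointer does not move; as the proof gets longer, the growth increases with it, and none of that memory is reclaimed for the rest of the call.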

So this is a place where OpenZeppelin chooses to use assembly.

function processProof(bytes32[] memory proof, bytes32 leaf) pure returns (bytes32) {
    bytes32 computedHash = leaf;
    for (uint256 i = 0; i < proof.length; i++) {
        computedHash = efficientKeccak256(computedHash, proof[i]);
    }
    return computedHash;
}

function efficientKeccak256(bytes32 a, bytes32 b) pure returns (bytes32 value) {
    assembly ("memory-safe") {
        mstore(0x00, a)
        mstore(0x20, b)
        value := keccak256(0x00, 0x40)
    }
}

This new implementation is optimal with respect to allocations. In fact, it does zero allocations, because it uses the 64 bytes of scratch space (memory 0x00 to 0x3f) that Solidity reserves for short-lived use like this. We also wrapped the little bit of assembly in a helper function, which lets the original function stay at a higher level of abstraction so as not to distract from the logic it implements.
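To check that the assembly helper is a drop-in replacement, one can compare it against the allocating expression on the same inputs while watching the free memory pointer. This is a hypothetical harness (ScratchSpaceCheck and compare are made-up names); it assumes Solidity 0.8.13 or later for the "memory-safe" annotation.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

// The helper from the text: hashes a || b using only the scratch space.
function efficientKeccak256(bytes32 a, bytes32 b) pure returns (bytes32 value) {
    assembly ("memory-safe") {
        mstore(0x00, a)
        mstore(0x20, b)
        value := keccak256(0x00, 0x40)
    }
}

// Hypothetical check: same 64 input bytes, same hash, and the free memory
// pointer does not move while the helper runs.
contract ScratchSpaceCheck {
    function compare(bytes32 a, bytes32 b) external pure returns (bool sameHash, uint256 grown) {
        uint256 ptrBefore;
        uint256 ptrAfter;
        assembly { ptrBefore := mload(0x40) }
        bytes32 fast = efficientKeccak256(a, b);
        assembly { ptrAfter := mload(0x40) }
        grown = ptrAfter - ptrBefore;
        // Reference value computed with the allocating abstraction.
        sameHash = fast == keccak256(bytes.concat(a, b));
    }
}
```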

But as a side note, this is technically not zero-cost either, because as far as I found in testing, this function call is not inlined so it does carry some small overhead.

So in this case we got better performance by handcoding in assembly a piece of the logic that we needed. And we also had the discipline to contain it in a nice helper function.

But I really think it’s a mistake to normalize assembly in our smart contracts. Because performance, or gas efficiency, is not the only requirement. Smart contracts have to work, they have to protect funds, they have to be robust against all kinds of attacks. They have to implement these high level goals. When we write assembly we have to think about low level details, like memory layout and dirty bits, and these distract us from those high-level goals. I strongly believe we should be closer to the higher-level end of the spectrum.

This is why high-level languages were developed, and why we moved from assembly to C and eventually to JavaScript: they allowed us to solve more and more complex problems. And this is naturally the direction that smart contracts will go in as well.

Now to be fair, the fact that the Solidity compiler is missing optimizations reflects some reasonable decisions by the Solidity team. Smart contracts need high assurance and the compiler is a big piece of that. Last year, a compiler bug in Vyper resulted in millions of dollars stolen. Last month, a bug in the Sway compiler resulted in frozen funds. And while these may not have been optimizer bugs, the additional complexity required to implement optimizations is certainly more surface where bugs can show up. And this is even more true in Solidity’s “legacy pipeline”, which was not designed for optimization at all.

Still, we do need efficient smart contracts. Having to resort to assembly for basic efficiency requirements is not sustainable. So we need to correct course, we need to be able to work at a higher level of abstraction to solve the complex problems of the future. And it’s worth pointing out that higher level languages are not only good for humans, such as developers and auditors, but also for computers, such as static analysis and formal verification, things that we absolutely need.

There is work underway to improve the state of current languages. The Solidity team has been working on the IR pipeline, which is more amenable to optimization, and on a new version of the language that will have better abstraction mechanisms, like generics. We now even have alternative compilers in development that might help improve the language in various ways. The Vyper team has been working on improving their security processes, raising the bar on optimizations, building their own IR, and has recently introduced modules for abstraction, something the language previously lacked.

These efforts are very valuable. We should invest to iteratively improve our current languages and compilers. But it’s still early, and there’s no reason to believe we’ve found the best way to program smart contracts, so I also think we should keep exploring and trying new ideas in language design and compiler construction.

Personally, I’m currently exploring a direction for an EVM language that I find promising, which I’m calling EVML, and which mainly takes ideas from functional languages and their tradition. So why am I interested in this direction in particular? I’ve been talking a lot about abstractions today, and I think the combination of first-class functions, algebraic data types, and an expressive type system is a kind of Swiss Army knife of abstractions that can get us really far. In terms of efficiency, there is a lot to learn from Rust, which has all these features and compiles them efficiently, but also from Haskell and OCaml, whose communities have done great work and published a lot of research on how to compile them well. There are reasons to believe this might even be easier in the EVM. But perhaps more importantly, I think a language with these characteristics can be a small one, with a formal specification and support for formal reasoning, and as a result we could have high confidence in the compiler, perhaps eventually even a fully verified compiler. This language is a work in progress, but I’ll be sharing more about it soon.

To wrap up, I’d like to explain the title of this talk. It was inspired by an influential paper from 1965 called “The Next 700 Programming Languages”, which proposed some of these ideas I just discussed as the basis for programming language design. The truth is I can’t tell you what the next 700 EVM languages are, but what I wanted to communicate today is that we should explore this space, as it might just enable us to overcome some significant limitations that we face today.