If you are familiar with a language like JavaScript, you probably never think about how your variables are stored, except when dealing with their scope. When you write programs to run on a distributed system like a blockchain, you have to think about things a bit differently. Solidity is a compiled language: each operation is converted into lower-level opcodes, which the EVM can understand and interpret. Every operation in your program is executed by every node in the network, which is why every operation costs ‘gas’, to prevent spam and infinite loops. In Solidity, getting to know the machine-readable operations and their associated costs literally saves you money.
Gas optimization is a challenge specific to developing smart contracts on Ethereum and other EVM-compatible chains. To be successful, we need to learn how Solidity handles our variables and functions under the hood. Some of the techniques we cover will violate well-known code patterns. Before optimizing, we should always consider the technical debt and maintenance costs we might incur.
Variable life cycle optimisations
Initialisation
Every variable assignment in Solidity costs gas. When initializing variables, we can waste gas by explicitly assigning default values that the variables already have.
uint256 value1; // this one is cheaper
uint256 value2 = 0;
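The following is a minimal sketch, assuming Solidity 0.8.x. The contract Defaults and its members are hypothetical. It shows that state variables, named return values and loop counters already start at their default values, so an explicit "= 0" is unnecessary.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract illustrating default values.
contract Defaults {
    uint256 public count;   // starts at 0 without an assignment
    bool public paused;     // starts at false
    address public owner;   // starts at address(0)

    function sumTo(uint256 n) external pure returns (uint256 total) {
        // The named return value total and the counter i both start at 0
        for (uint256 i; i < n; i++) {
            total += i;
        }
    }
}
```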
Deletion
Ethereum gives us a gas refund when we clear storage, that is, when we set a non-zero storage slot back to zero. The refund is meant as an incentive to free up space on the blockchain, and we can use it to reduce the gas cost of our transactions. Since the London upgrade (EIP-3529), clearing a slot refunds 4,800 gas, and the total refund is capped at one fifth of the gas used by the transaction. Before London, the refund was 15,000 gas, capped at half. Deleting with the delete keyword is equivalent to assigning the initial value for the data type, such as 0 for integers.
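The following is a minimal sketch of clearing storage with delete, assuming Solidity 0.8.x. The Registry contract and its functions are hypothetical. The refund is applied by the EVM at the end of the transaction, so it does not appear anywhere in the code.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical registry that frees its storage when an entry is removed.
contract Registry {
    mapping(uint256 => address) public ownerOf;

    function register(uint256 id) external {
        require(ownerOf[id] == address(0), "already registered");
        ownerOf[id] = msg.sender;
    }

    function unregister(uint256 id) external {
        require(ownerOf[id] == msg.sender, "not owner");
        // Same as ownerOf[id] = address(0); clearing a non-zero slot
        // is what earns the SSTORE refund.
        delete ownerOf[id];
    }
}
```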
Variable storage optimisations
Storing data on a normal computer is easy and cheap. However, blockchains are distributed systems in which every node (or computer) has to store the data locally. That makes storing data expensive, so it is discouraged except where necessary.
Variable packing
Solidity contracts have contiguous 32 byte (256 bit) slots used for storage. When we arrange variables to fit in a single slot, it is called variable packing. Variable packing is like a game of Tetris. If a variable we are trying to pack exceeds the 32 byte limit of the current slot, it gets stored in a new one. We must figure out which variables fit together the best to minimize wasted space. Because each storage slot costs gas, variable packing helps us optimize our gas usage by reducing the number of slots our contract requires. Let’s look at an example:
uint128 a;
uint256 b;
uint128 c;
These variables are not packed. If b were packed with a, the slot would exceed the 32-byte limit, so b is placed in a new storage slot instead. The same thing happens with c and b, so the three variables take up three slots.
uint128 a;
uint128 c;
uint256 b;
These variables are packed. Packing c with a does not exceed the 32-byte limit, so they are stored in the same slot, and the three variables take up two slots instead of three. Keep variable packing in mind when choosing data types: a smaller version of a data type is only useful if it helps pack the variable into a storage slot. If a uint128 does not pack, we might as well use a uint256. Also pay attention to the upgradability of your program. See the post about anti-patterns, which discusses the risks of the data separation pattern.
The other consideration is unpacking. The EVM operates on 32 bytes at a time, so the compiler has to emit extra operations to mask and convert values of smaller types. If we are not saving gas by packing a variable, it is cheaper to use a 32-byte type such as uint256.
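You can check the layout yourself with a sketch like the following, assuming Solidity 0.8.x. Inline assembly exposes the storage slot (.slot) and byte offset (.offset) of each state variable. The contract names and the layout() functions are hypothetical.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Three slots: a in slot 0, b in slot 1, c in slot 2.
contract Unpacked {
    uint128 a;
    uint256 b;
    uint128 c;

    function layout() external view returns (uint256 slotA, uint256 slotB, uint256 slotC) {
        assembly {
            slotA := a.slot
            slotB := b.slot
            slotC := c.slot
        }
    }
}

// Two slots: a and c share slot 0, b gets slot 1.
contract Packed {
    uint128 a;
    uint128 c;
    uint256 b;

    function layout() external view returns (uint256 slotA, uint256 slotC, uint256 offsetC, uint256 slotB) {
        assembly {
            slotA := a.slot
            slotC := c.slot
            offsetC := c.offset // byte offset of c inside slot 0
            slotB := b.slot
        }
    }
}
```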
Data location
Variable packing only occurs in storage: memory and calldata do not get packed. You will not save space by trying to pack function arguments or local variables.
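The standard ABI encoding, which is how function arguments travel in calldata, pads every value to 32 bytes. The following sketch assumes Solidity 0.8.x, and the Padding contract is hypothetical. It compares the encoded sizes. abi.encodePacked does pack, but calls are not encoded that way.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

contract Padding {
    function sizes() external pure returns (uint256 smallLen, uint256 wideLen, uint256 packedLen) {
        // Three uint8 arguments still take 3 x 32 = 96 bytes in ABI encoding
        smallLen = abi.encode(uint8(1), uint8(2), uint8(3)).length;
        // Three uint256 arguments take the same 96 bytes
        wideLen = abi.encode(uint256(1), uint256(2), uint256(3)).length;
        // Non-standard packed encoding: one byte per uint8
        packedLen = abi.encodePacked(uint8(1), uint8(2), uint8(3)).length;
    }
}
```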
Reference data types
Let’s look at the size of some common data types in Solidity:
- uint256 is 32 bytes
- uint128 is 16 bytes
- uint64 is 8 bytes
- address (and address payable) is 20 bytes
- bool is 1 byte
- string is one byte per character for ASCII text (strings are UTF-8 encoded, so other characters take 2 to 4 bytes)
You can break a uint down further into other sizes, such as uint8, uint16 and uint32, up to uint256 in steps of 8 bits. Keep in mind how overflow behaves. Below Solidity 0.8.0, an overflowing integer silently wraps around. From version 0.8.0 onwards, the overflowing operation reverts and your function fails. The largest value of a uintN is 2^N - 1, meaning uint8 goes up to 2^8 - 1 = 255.
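The following is a small sketch of both overflow behaviours, assuming Solidity 0.8.x. The Limits contract and its functions are hypothetical. An unchecked block restores the pre-0.8.0 wrap-around behaviour.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

contract Limits {
    function maxUint8() external pure returns (uint8) {
        return type(uint8).max; // 2^8 - 1 = 255
    }

    // Checked arithmetic, the default since 0.8.0: 255 + 1 reverts
    function checkedIncrement(uint8 x) external pure returns (uint8) {
        return x + 1;
    }

    // Wrap-around arithmetic, as before 0.8.0: 255 + 1 == 0
    function wrappingIncrement(uint8 x) external pure returns (uint8) {
        unchecked {
            return x + 1;
        }
    }
}
```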
Structs and arrays always begin in a new storage slot, and the variable that follows them also starts a new slot. Their contents, however, are packed normally, so a uint8 array takes up less storage than a uint256 array of equal length. Counter-intuitively, it can be more gas efficient to initialize a tightly packed struct in storage with separate assignments instead of a single assignment. Separate assignments make it easier for the optimizer to update all the variables at once. Initialize structs like this:
// points is a mapping(uint256 => Point), where struct Point { uint128 x; uint128 y; }
Point storage p = points[id];
p.x = x;
p.y = y;
Instead of:
points[id] = Point(x, y);
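The following sketch puts both styles side by side, assuming Solidity 0.8.x. The Points contract and its function names are hypothetical. Both functions store the same data. Which one is cheaper can depend on the compiler version and optimizer settings, so it is worth comparing them with your test framework's gas report.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

contract Points {
    // Tightly packed: x and y share one 32-byte slot
    struct Point {
        uint128 x;
        uint128 y;
    }

    mapping(uint256 => Point) public points;

    // Separate assignments through a storage pointer
    function setSeparately(uint256 id, uint128 x, uint128 y) external {
        Point storage p = points[id];
        p.x = x;
        p.y = y;
    }

    // One assignment of a struct built in memory
    function setAtOnce(uint256 id, uint128 x, uint128 y) external {
        points[id] = Point(x, y);
    }
}
```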
Inheritance
When we extend a contract, the variables in the child can be packed with the variables in the parent. The order of variables is determined by C3 linearization. For most applications, all you need to know is that child variables come after parent variables.
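The following is a minimal sketch, assuming Solidity 0.8.x, with hypothetical contracts Base and Child. It shows a child variable packed into the parent's last slot, using the same .slot and .offset inline assembly accessors.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

contract Base {
    uint128 internal a; // slot 0, offset 0
}

contract Child is Base {
    uint128 internal b; // packed into slot 0 after a, at offset 16
    uint256 internal c; // does not fit in slot 0, so it starts slot 1

    function layout() external view returns (uint256 slotB, uint256 offsetB, uint256 slotC) {
        assembly {
            slotB := b.slot
            offsetB := b.offset
            slotC := c.slot
        }
    }
}
```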
Storing data in events
Data that does not need to be accessed on-chain can be stored in events to save gas. While this technique can work, it is not recommended. Events are not meant for data storage, and contracts cannot read event logs at all. If the data we need was emitted in an event a long time ago, retrieving it can be very time consuming because of the number of blocks we need to search. One workaround is an off-chain service that indexes the events and then calls your contract with the search results.
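The following is a minimal sketch, assuming Solidity 0.8.x. The Notes contract and the NoteAdded event are hypothetical. The note text goes only into the event log, and storage holds a single counter. Off-chain clients can read the text from the logs, but no contract can.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

contract Notes {
    // Indexed fields become log topics that off-chain clients can filter on
    event NoteAdded(address indexed author, uint256 indexed id, string text);

    uint256 public noteCount; // the only value kept in storage

    function addNote(string calldata text) external returns (uint256 id) {
        id = noteCount++;
        emit NoteAdded(msg.sender, id, text); // text is never written to storage
    }
}
```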
Data types optimisations
We have to manage trade-offs when selecting data types to optimize gas: different situations can make the same data type cheap or expensive.
Memory vs. Storage
Performing operations on memory, or on calldata (which is similar to memory), is always cheaper than on storage. A common way to reduce the number of storage operations is to manipulate a local variable and then assign the result to a storage variable. We see this often in loops:
uint256 returnRate = 105; // 1.05 per timestep, 2 decimal places
uint256 totalReturn; // 2 decimal places, 0 means "not set yet"

function updateTotalReturn(uint256 timesteps) external {
    uint256 r = totalReturn;
    if (r == 0) r = 100; // start from 1.00 if unset
    for (uint256 i = 0; i < timesteps; i++) {
        r = r * returnRate / 100; // returnRate is read from storage on every iteration
    }
    totalReturn = r; // a single SSTORE at the end
}
In the updateTotalReturn method, we use the local variable r to store intermediate values and assign only the final value to our storage variable totalReturn.
EVM Assembly
If we break down how this works, we still read from the returnRate storage variable once per iteration, so the number of reads is O(n). In some languages that might not be a problem. Once you understand how data is stored on the blockchain, however, you realize that reading the local variable r is a cheap stack or memory operation, while reading returnRate is a storage operation. The two compile to different opcodes.
Whenever you read the variable returnRate, you are getting data from the blockchain state, a database that every node in the network has to keep and validate. This is done through an opcode called SLOAD, which according to the Ethereum Yellow Paper costs up to 2,100 gas to execute. Since EIP-2929, the first SLOAD of a slot in a transaction costs 2,100 gas (a cold read). After that, the slot is considered warm, and reading it again costs 100 gas.
So if updateTotalReturn is called with timesteps = 10, you end up spending at least 2,100 + 9 × 100 = 3,000 gas just on reading this variable. To combat that, you can copy the value into a local variable once and read it from there, which is much cheaper (around 3 gas per read). Copying from storage once (SLOAD + MSTORE) costs about 2,103 gas, and reading the local copy ten times costs about 30 gas. That makes a total of roughly 2,133 gas, almost 30% less gas spent on this variable. The code would look like this:
uint256 returnRate = 105; // 1.05 per timestep, 2 decimal places
uint256 totalReturn; // 2 decimal places, 0 means "not set yet"

function updateTotalReturn(uint256 timesteps) external {
    uint256 r = totalReturn;
    if (r == 0) r = 100; // start from 1.00 if unset
    uint256 rate = returnRate; // one SLOAD, then cheap local reads
    for (uint256 i = 0; i < timesteps; i++) {
        r = r * rate / 100;
    }
    totalReturn = r;
}
SSTORE and SLOAD are two of the more expensive opcodes in the EVM, for the reason outlined above, so it is better to work with memory (MLOAD and MSTORE) when possible. However, there are times when you need to set a variable on creation or deployment and do not expect it to change. In these cases, you can declare it constant or immutable, which tells the Solidity compiler that the value will never change. Let’s take an example simplified contract:
contract Token {
    uint8 VERSION = 1;
    uint256 decimals;

    constructor(uint256 val) {
        decimals = val;
    }
}
The decimals variable in this contract is only there for display purposes in the frontend. The token's balances and transfers never use it. ERC-20 lists decimals() as an optional method that tells users of the contract how to format amounts. This also means that the variable should never change. To indicate that, we have two options: constant or immutable.
According to the docs: The compiler does not reserve a storage slot for these variables, and every occurrence is replaced by the respective value. Constant variables are replaced at compile time by their values, while immutable variables are replaced at deployment time. Either way, we avoid the annoying fees required with doing an SLOAD operation. One simple fix looks like:
contract Token {
uint8 constant VERSION = 1;
uint256 immutable decimals;
constructor(uint256 val) {
decimals = val;
}
}
One small caveat: the value of a constant must be fixed at compile time. A constant cannot read storage, blockchain data or execution data such as msg.value and gasleft(), and it cannot call external contracts. None of the following compile:
uint256 constant VERSION1 = block.number;
uint256 constant VERSION2 = address(this).balance;
uint256 constant VERSION3 = msg.value;
uint256 constant VERSION4 = gasleft();
Immutable variables, however, can be assigned such values, either in the constructor or at the point of declaration.
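The following is a minimal sketch, assuming Solidity 0.8.x. The DeployInfo contract and its variable names are hypothetical. Each immutable captures a value that only exists at deployment time, which a constant could not do.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

contract DeployInfo {
    uint256 public constant VERSION = 1;      // known at compile time
    address public immutable deployer;        // known only at deployment
    uint256 public immutable deployedAtBlock;
    uint256 public immutable initialBalance;

    // payable, so ether may be sent along with the deployment
    constructor() payable {
        deployer = msg.sender;
        deployedAtBlock = block.number;
        initialBalance = address(this).balance;
    }
}
```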
Fixed vs. Dynamic
Fixed-size variables are generally cheaper than dynamic ones. If we know how long an array should be, we specify a fixed size:
uint256[12] monthlyTransfers;
The same rule applies to strings. A string or bytes variable is dynamically sized, so we should use a bytes32 if our string is short enough to fit.
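The following is a minimal sketch of both ideas, assuming Solidity 0.8.x. The Ledger contract, the record function and the label value are hypothetical. The fixed-size array occupies 12 consecutive slots with no separate length slot. The short label fits in a single bytes32 slot.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

contract Ledger {
    // Fixed size: 12 consecutive storage slots, no length to store or update
    uint256[12] public monthlyTransfers;

    // 14 ASCII characters fit in one bytes32 slot
    bytes32 public label = "monthly-ledger";

    function record(uint256 month, uint256 amount) external {
        // Reverts with Panic(0x32) if month >= 12
        monthlyTransfers[month] += amount;
    }
}
```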
Also worth noticing: many developers add a reason string to their require statements, and we can make these cheaper by keeping each string within 32 bytes, so that it fits in a single 32-byte word.
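The following is a minimal sketch, assuming Solidity 0.8.x. The Vault contract and its withdraw function are hypothetical. The reason string "insufficient balance" is 20 bytes long, so it stays within one 32-byte word.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

contract Vault {
    function withdraw(uint256 amount, uint256 balance) external pure returns (uint256) {
        // 20-byte reason string: fits in a single 32-byte word
        require(amount <= balance, "insufficient balance");
        return balance - amount;
    }
}
```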
If we absolutely need a dynamic array, it is best to structure our functions to be additive instead of subtractive. Extending an array with push costs constant gas, whereas truncating an array costs gas linear in the number of removed elements, because each one has to be zeroed.
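The following is a minimal sketch, assuming Solidity 0.8.x. The History contract and its functions are hypothetical. Appending touches one element and the length, however long the array is. Truncating pays for every element removed.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

contract History {
    uint256[] public entries;

    // Additive: one element write plus a length update
    function append(uint256 value) external {
        entries.push(value);
    }

    // Subtractive: one pop, and one zeroed slot, per removed element
    function truncate(uint256 count) external {
        for (uint256 i = 0; i < count; i++) {
            entries.pop(); // reverts with Panic(0x31) if the array is empty
        }
    }

    function size() external view returns (uint256) {
        return entries.length;
    }
}
```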
Mapping vs. Array
Most of the time it will be better to use a mapping instead of an array, because of its cheaper operations. However, an array can be the correct choice when using smaller data types. Array elements are packed like other storage variables and the reduced storage space can outweigh the cost of an array’s more expensive operations. This is most useful when working with large arrays.
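The following sketch contrasts the two choices, assuming Solidity 0.8.x. The Scores contract and its members are hypothetical. A mapping gives each value its own slot at a hashed location and has no length to maintain. A uint8 array packs 32 values into each slot.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

contract Scores {
    // One slot per key, at a hashed location; no length, no bounds check
    mapping(address => uint256) public scoreOf;

    // Small element type: 32 uint8 values share one 32-byte slot
    uint8[] public levels;

    function setScore(address player, uint256 score) external {
        scoreOf[player] = score;
    }

    function addLevel(uint8 level) external {
        levels.push(level);
    }
}
```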
Overall
Keep these techniques in mind when you write your Solidity code. Also remember that while gas optimisations may save thousands, an exploit left unnoticed can cost hundreds of thousands. Always, always prioritise simplicity, maintainability and modularity of the code over local optimisations!
Originally published at https://deeprnd.blogspot.com on January 5, 2022.