Testing all the code

Ensuring the security of deployed Ethereum contracts is one of the most important challenges facing Ethereum developers today. After the unfortunate problems with the DAO, the resulting hard fork and associated community fallout, hopefully no-one reading this article would disagree. Indeed, there was a whole session devoted to the topic at the recent DevCon2, with talks spanning formal verification, coding best practices and more. However, the Ethereum community seems to have been silent regarding one of the first tools invented to help avoid unintended behaviour from code — code coverage.

Code coverage, first devised in the 1960s, encapsulates the idea that tests should ‘touch’ all of the code under test. If the tests execute all of your code, and the results from the tests are as expected, it is less likely your code contains unforeseen bugs. Untested code could do anything!

I will freely admit this is not a silver bullet. Writing tests achieving even 100% code coverage would not have caught the reentrancy bug that was the fatal flaw in the DAO unless someone had written a particularly inspired test. Indeed, blindly writing tests to achieve 100% coverage is probably not the best use of a developer’s time. Code coverage should only be treated as another arrow in our collective quiver.

Results

I have written a tool called SolCover. It uses Istanbul (see later) to generate the reports themselves. The HTML report is probably the easiest for users to interpret:

Snippet from some code coverage results.

The numbers in the margin indicate each line here is run 17 times during the tests. Which seems good! But we need to look deeper. The black block containing a yellow ‘I’ tells us the ‘if’ statement never evaluated to true in any of the tests, and this is also shown by ‘throw’ being highlighted in red, which indicates that statement is never executed. Perhaps this is an oversight in the testing, and a user with the wrong permissions is just never used. But perhaps the ‘userIsInRole’ function has a bug, and only ever returns ‘true’, and this represents a security hole? The only way to know is to write a test that should trigger that branch…

If you are not interested in the methodology used and just want to use SolCover, you can get it from GitHub, where the usage instructions can also be found. Be aware that it is probably still extremely fragile at the moment. Otherwise, carry on for an overview of the general method and then some nitty-gritty details.

Implementation

This was only as easy to put together as it was because of the excellent tools that already exist, both in Ethereum and the wider community. Istanbul is used throughout the JavaScript community for code coverage, and is well supported and documented. It generates a coverage.json file, which is then interpreted in conjunction with the source files to make the coverage report. My script generates the coverage.json file independently of Istanbul, but in the expected format, and just uses Istanbul to generate the reports after the fact.
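To give a feel for what "the expected format" means, here is a small hand-built object in the shape Istanbul's reporters read: per file, hit counters for statements (`s`), branches (`b`) and functions (`f`), plus maps giving each one's position in the source. The file name, function name, positions and counts are all made up for illustration; this is a sketch of the format, not SolCover's actual output.

```javascript
// Hand-built coverage object in the shape Istanbul's reporters read.
// Every name, position and count below is hypothetical.
const coverage = {
  'contracts/Example.sol': {
    path: 'contracts/Example.sol',
    s: { 1: 17, 2: 0 },   // statement id -> times executed
    b: { 1: [0, 17] },    // branch id -> hits for each path of the branch
    f: { 1: 17 },         // function id -> times called
    statementMap: {
      1: { start: { line: 3, column: 4 }, end: { line: 3, column: 40 } },
      2: { start: { line: 4, column: 8 }, end: { line: 4, column: 14 } },
    },
    branchMap: {
      1: {
        line: 3,
        type: 'if',
        locations: [
          { start: { line: 3, column: 4 }, end: { line: 3, column: 4 } },
          { start: { line: 3, column: 4 }, end: { line: 3, column: 4 } },
        ],
      },
    },
    fnMap: {
      1: {
        name: 'doThing',
        line: 2,
        loc: { start: { line: 2, column: 2 }, end: { line: 6, column: 3 } },
      },
    },
  },
};

// This string is what would be written to coverage.json.
const coverageJson = JSON.stringify(coverage, null, 2);
```

A statement with a count of 0, like statement 2 here, is what shows up highlighted in red in the HTML report.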

For the testing itself, it is assumed Truffle is being used, as this is what we’re using at Colony. This is a great testing framework that uses the JavaScript frameworks Mocha and Chai for running and writing the tests and makes it very easy to start writing tests. Indeed, it expects it of you, with the demo project created with the ‘truffle init’ command pointedly containing a ‘test’ directory.

How does SolCover know which lines or statements have been run? By changing the source code through a process called instrumentation, which is how most coverage tools work. Before every line or statement of interest, a call to emit an event is inserted that indicates the immediately following piece of code has been run. These events are then interpreted alongside the source files to generate the ‘coverage.json’ file, which is used to generate the report itself.
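The second half of that process, turning a stream of events into hit counts, can be sketched in a few lines. The event shape used here (`{ file, line }`) and the function name are assumptions made for this example, not SolCover's real format.

```javascript
// Tally coverage events into per-file, per-line hit counts.
// The { file, line } event shape is an assumption for this sketch.
function tallyLineHits(events) {
  const hits = {};
  for (const { file, line } of events) {
    hits[file] = hits[file] || {};
    hits[file][line] = (hits[file][line] || 0) + 1;
  }
  return hits;
}

// Each entry stands in for one event emitted by an instrumented contract.
const exampleEvents = [
  { file: 'Example.sol', line: 3 },
  { file: 'Example.sol', line: 3 },
  { file: 'Example.sol', line: 5 },
];

const lineHits = tallyLineHits(exampleEvents);
```

Lines that never emit an event do not appear in the tally at all, so a real tool also needs the list of instrumented lines from the parsing step in order to report them with a count of zero.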

The instrumentation is done by parsing the contracts with SolParse, and then based on the statements there, altering the contract appropriately. There were technical issues surrounding injecting these alterations to the contract, however, some of which are described in the ‘instrumentation notes and difficulties’ section below.

In the context of Ethereum, a large amount of instrumentation on an already large contract requires increasing the block gas limit so the modified contract can be deployed. This is trivial to do when using TestRPC (requiring only a command line flag), which SolCover uses when running the tests. However, it does also require modification of a project’s truffle.js due to the increased contract size, and therefore deployment costs.
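As a rough illustration only: assuming a Truffle version that reads per-network settings from a `networks` key (other versions lay the file out differently), the change might look something like the following. The gas figure is an arbitrary placeholder, and TestRPC would need its block gas limit raised to at least the same value.

```javascript
// truffle.js (sketch): raise the gas sent with deployments so the
// larger, instrumented contracts can be deployed.
module.exports = {
  networks: {
    development: {
      host: 'localhost',
      port: 8545,
      network_id: '*',
      gas: 0xfffffff, // placeholder value, not a recommendation
    },
  },
};
```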

Instrumentation notes and difficulties

Modifiers

Modifier definitions are treated exactly the same as function definitions, and are therefore counted in the ‘function coverage’ metric.

The ternary operator

The ternary operator is an incredibly useful tool, which allows concise code to be written without sprawling ‘if’ statements. A simple use of it looks like:
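For instance, with illustrative values for y and z (the declarations are JavaScript, but the assignment line itself reads the same in Solidity):

```javascript
// Illustrative values; any two comparable values would do.
const y = 3;
const z = 3;
let x;

// The statement of interest: a ternary on the right-hand side of an assignment.
x = y == z ? 1 : 2;
```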

This snippet sets ‘x’ to either 1 or 2, depending on whether y is equal to z or not. We wish to instrument it and so be able to track whether the statement has resolved to both false and true during our tests. In JavaScript, Istanbul would instrument this code in the following way:
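A sketch of the pattern, wrapped in a function so it can be run: each arm of the ternary becomes a parenthesised pair that first bumps a branch counter and then yields the original value. The counter object `cov` and the function name `pick` are hypothetical stand-ins; Istanbul generates its own variable names.

```javascript
// Hypothetical counter store standing in for Istanbul's per-file coverage object.
const cov = { b: { 1: [0, 0] } };

function pick(y, z) {
  let x;
  // Comma operator: increment the branch counter, then evaluate to the original value.
  x = y == z ? (cov.b[1][0]++, 1) : (cov.b[1][1]++, 2);
  return x;
}

pick(1, 1); // takes the 'true' arm
pick(1, 2); // takes the 'false' arm
pick(3, 4); // takes the 'false' arm again
```

After these three calls the first counter holds 1 and the second holds 2, so both arms are known to have been exercised.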

The comma operator is used here to change the underlying code as little as possible. Unfortunately, the comma operator does not exist in Solidity at the statement level (which is certainly confusing to me). To instrument this code successfully, we have to modify it more than I would like:

This abuses the fact that on the left hand side of an assignment operator, a tuple is allowed to have empty elements. This works, and is probably the best we can achieve here without the comma operator as a first-class citizen.

.call() and throw

In Ethereum, functions in a contract can be called locally, which means they do not actually cause changes to the blockchain. They are executed, usually to read a value or estimate gas costs for a transaction, and then reverted without the transaction being broadcast to the network. Unfortunately, this means if we relied on a filter — which is the usual way to access events — these events would be invisible to us, and would not be seen by our coverage.

Similarly, if a transaction fails because a throw statement is hit, the transaction is still created but all changes are reverted (other than all gas sent with the transaction being taken). This includes events. That means ordinarily, our test coverage would not be able to tell us when our contracts threw, which we should certainly be making sure they do at appropriate points.

To get around these problems, I modified ethereumjs-vm to log all events as they were executed on the VM to a separate file. Even when the changes due to a transaction are reverted, this file logs the event occurred, however quickly it was reverted. This allows us to track coverage even in these cases, and get a true idea of our test coverage.
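The idea can be shown with a toy model. This is not ethereumjs-vm and uses none of its API; the class and its fields are invented purely to show why a side-channel log still sees events that a revert hides from filters.

```javascript
// Toy model, NOT ethereumjs-vm: logs from a failed call are discarded,
// but a side-channel log records every event the moment it is emitted.
class ToyVm {
  constructor() {
    this.committedLogs = []; // what an event filter would eventually see
    this.sideLog = [];       // stands in for the separate log file
  }

  run(fn) {
    const pending = [];
    const emit = (event) => {
      this.sideLog.push(event); // written immediately, never rolled back
      pending.push(event);
    };
    try {
      fn(emit);
      this.committedLogs.push(...pending);
      return true;
    } catch (err) {
      return false; // pending logs are dropped, as with a reverted transaction
    }
  }
}

const vm = new ToyVm();
vm.run((emit) => { emit('line 3'); });
vm.run((emit) => { emit('line 3'); emit('line 4'); throw new Error('throw'); });
```

Here the filter-style log ends up with a single entry, while the side log holds all three events, including the two from the call that threw.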

Invisible else branches

Rather than explain the subtlety here, I will just link to an excellent blogpost discussing the importance of branch coverage as well as line coverage, before noting we also have to transform ‘if’ statements in nonintuitive ways. If we have the very reasonable looking code:

This needs to be transformed to track if all possible branches have been tried. The transformation used is again more invasive than I would like, but works.
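The general idea, sketched here in JavaScript rather than Solidity, is to make the invisible else branch explicit so that skipping the ‘if’ body also leaves a trace. The function names and the `branchHits` counter are hypothetical, and the real instrumentation emits events rather than incrementing a counter.

```javascript
// Hypothetical counter for the two paths through the 'if'.
const branchHits = { taken: 0, notTaken: 0 };

// Original shape: no else, so skipping the body leaves no record.
function canWithdraw(balance, amount) {
  if (amount > balance) {
    return false;
  }
  return true;
}

// Instrumented shape: an explicit else records the path where the condition was false.
function canWithdrawInstrumented(balance, amount) {
  if (amount > balance) {
    branchHits.taken++;
    return false;
  } else {
    branchHits.notTaken++;
  }
  return true;
}

canWithdrawInstrumented(10, 5);
canWithdrawInstrumented(10, 7);
```

Both calls above skip the body, so `taken` stays at zero: line coverage alone would not flag this, but the branch counter shows the condition was never true.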

Other caveats

It is entirely possible that when running SolCover, your tests will fail if they test or rely on properties of your contracts related to how much gas they use. This is because the instrumentation process significantly increases the cost of a deployment (up to ~4 times), as well as the cost of function calls due to the extra events. I have already made the gas cost of the EVM’s LOG call zero to help with this, but further work should probably be done on this front. Given that this tool already uses its own modified TestRPC, this is easy to implement with few other side effects.

Conclusion

This only really rates as a first stab at a coverage tool. Some types of statements could easily be instrumented incorrectly, or have been overlooked in this implementation. Eagle-eyed developers will note — presumably with a wry grin — that there are very few tests in the repository! Issues describing problems users encounter are of course welcome at GitHub, especially if they are accompanied by pull requests!


About the author

Alex considers himself a recovering astrophysicist. After completing his Ph.D. at Cambridge, he decided academia wasn’t all it was cracked up to be, and started working for Colony. As a cofounder, he does whatever needs doing, but is happiest when faced with a screen full of code.

Alex can be found on GitHub, Twitter and probably anywhere the username ‘area’ is used.

To keep up to date on the development of Colony, you can find us on Twitter, or sign up for our (infrequent!) newsletter on our website.