Soft Forks, Censorship, and Stake

It turns out soft forks are indistinguishable from censorship. In that case, we had better make sure control over them is adequately distributed among the stake holders of the network, lest we be subject to soft-fork attacks.


There are four broad categories of attacks in blockchain land:

1) safety violation – a fork in the blockchain; the source of double spends
2) liveness violation – no new blocks are added; the chain halts
3) censorship – certain valid transactions are not included in blocks
4) invalid execution – certain invalid transactions or data are included in a block

I first heard this stated by Vlad Zamfir, and I believe it is correct. One might wish to add additional categories of attack, such as privacy invasion (for instance de-anonymizing transactions) and manipulation of fees/capacity (for instance by sending lots of transactions with high fees), but these are more probably consequences of design than explicit failure scenarios – Bitcoin was never really designed to be that anonymous, and higher fees are just the healthy fee market kicking in.

The first two attacks, safety and liveness violation, are properties of the underlying protocol, and are the focus of traditional research on distributed consensus. Any viable algorithm must be proven to satisfy safety and liveness. The latter two, avoiding censorship and invalid state transitions, are more economic and political in nature, and are emphasized much more by the blockchain community, where censorship resistance and code-as-law are cherished ideals.

Ironically, and seemingly less well known, the latter two attacks, of censorship and invalid execution, correspond exactly to what are called “soft forks” and “hard forks”, respectively.

Forks: Soft and Hard

A soft fork is typically described as an update to the protocol that is backwards compatible – peers that do not upgrade can still get along just fine. Sometimes soft forks are said to correspond to a “narrowing” of the rules – transactions which used to be perfectly acceptable may no longer be accepted if they do not account for the new rules. This is blatant censorship. That is, a soft fork is indistinguishable from censorship.

A hard fork is typically described as an update to the protocol that is not backwards compatible – peers that do not upgrade will fork off onto their own chain, as the new rules appear invalid. Sometimes hard forks are said to correspond to a “widening” of the rules – transactions which used to be invalid may now be valid under the new rules. That is, a hard fork is indistinguishable from invalid execution.
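To make the "narrowing"/"widening" picture concrete, here is a toy sketch in Python. This is not any real client's validation code; the transaction fields are made up purely for illustration, and the rule set is reduced to a single predicate over a transaction dict.

def old_valid(tx):
    return tx.get('value', 0) >= 0

def soft_forked_valid(tx):
    # Narrowing: anything valid here was already valid under the old rules,
    # but some previously valid transactions are now rejected. To an old node,
    # this just looks like those transactions being censored.
    return old_valid(tx) and tx.get('follows_new_rule', False)

def hard_forked_valid(tx):
    # Widening: some transactions invalid under the old rules are now valid.
    # To an old node, blocks containing them look like invalid execution.
    return old_valid(tx) or tx.get('uses_new_feature', False)

tx = {'value': 10}
print old_valid(tx), soft_forked_valid(tx), hard_forked_valid(tx)   # True False True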

It is humorous for proponents of Bitcoin to champion its censorship resistance when in fact the very thing that has enabled Bitcoin to improve its feature set is blatant censorship. The difference is, censorship in the form of a soft fork is explicitly voted for before it activates, and has some guise of community support, though this need not be the case.


In Bitcoin, at least, voting for a soft fork is strictly the domain of miners, and as we saw in July 2015 with BIP66, they can even screw it up. But the distribution of bitcoin users is much different from that of bitcoin miners, and the users have no say in a soft fork (which is, remember, indistinguishable from censorship). Of course there is a meta-protocol argument that miners will only implement soft forks (censorship) that are good for everyone, like Pay-to-Script-Hash and Segregated Witness, lest they sacrifice the utility of the system and the value of their seigniorage. But so long as they appease the majority, there is little to prevent them from performing what might be called “soft fork attacks” as they please. Remember: implementing segregated witness as a soft fork amounts to burdening the bitcoin ecosystem with a substantial amount of additional technical debt.

Proof of stake attempts to mitigate this issue by distributing soft-fork/censorship voting power in proportion to a user’s “stake” in the system. How to measure such an abstract quantity as “stake” is of course extremely difficult, given the various axes of participation or stake one might have in a network – for instance how do you compare someone with high net worth but low transaction volume against someone with low net worth but high transaction volume? What about the offering of additional services which complement the network, like exchanges, embedded currencies, and payment channels?

While it might be difficult to quantify “stake”, presumably PoS does a better job than mining power, especially in today’s age of hyper-specialized hardware. However, a naive PoS suffers from the nothing-at-stake problem, and so far our only solution to that is security deposits or “validator bonds”.


Security deposits were introduced to proof of stake by Vitalik to ensure that validators who attempt to double spend can be punished by “slashing” the deposit when evidence is published that the validator double-signed. The first implementation of such an approach came in the form of Tendermint, which used a classic Byzantine Fault Tolerant consensus algorithm to order transactions in blocks, where the voting power of a validator in the consensus is proportional to the security deposit. Of course, validators must eventually be able to unbond and get their money back, which introduces the so-called long-range nothing-at-stake attack: nodes joining the network for the first time must authenticate the state via extra-protocol means (for instance by asking friends or checking social media), and must keep up with changes to the validator set, lest they fall subject to attack by unbonded validators who have created an alternative chain from back when they were still bonded.

The problem with this solution is that not everyone can be, or even wants to be, a validator at the same time, as there are practical considerations which limit the number of participants in a consensus algorithm at once, and being a validator requires significant expertise in maintaining a secure server on a live adversarial network. Thus, the distribution of soft-fork/censorship voting power follows that of the bonds, not of the actual stake in the currency, bringing us back to the situation in proof-of-work where validators are easily able to soft-fork attack the network.

That this is a problem with security deposit based PoS was recently pointed out by Zack Hess in the design documentation for his Flying Fox blockchain. Flying Fox is a security deposit based PoS chain with an emphasis on state channels, where most computation happens off chain between the relevant participants and honest behaviour can be enforced by resorting to the blockchain when necessary. Flying Fox’s solution to the soft-fork attack issue is to introduce a different metric for stake, a different “bond”, based on how much money you have locked in state channels, rather than on a security deposit. Thus there are effectively two bonds: the small bond, a deposit used for economic security such that it can be slashed if the validator misbehaves, and the big bond, which amounts to the deposit held in payment channels, and which serves as a measure of a user’s actual stake in the system (having a lot bonded in a payment channel and being online to service it is a major commitment to the value of the network). The big bond cannot be slashed (it does not protect against double spends), but is instead used to select validators, and once selected, the small bond, which can be slashed, is placed by a validator and used to ensure they do not double sign.


Another approach to the problem of adequately representing stake holders in the consensus process (i.e. in soft forks/censorship) is to use delegation: non-validators can delegate their coins to validators, thereby contributing to the security deposit. This is the approach taken by Atom, a scalable multi-asset proof of stake blockchain based on Tendermint. Delegation allows one to receive a portion of the transaction fees paid to the validator, but also puts the funds at risk if the validator double-signs. Validators will thus partake in some kind of “delegate campaign”, where they attempt to convince stakeholders to delegate to them by advertising the reasons why they are the best protector of a private key or the ideal representative of a stakeholder’s values.

Of course, as we know quite well, representational governance doesn’t always work very well, and more or less degrades into “who can run the best campaign”. That said, since this is cryptoland, we can introduce additional constraints which limit the ability of a validator to vote for a soft-fork without the approval of her constituents. Alternatively, reputational markets and the ability to “switch delegates” may be sufficient incentive for delegates to behave honestly or as their constituents would like.


One trouble with miners is their capacity for censorship. Since they determine what goes in the blockchain, they can co-ordinate to prevent certain kinds of transactions. Such behaviour is also known as a soft fork. People often think of a soft-fork as “any update that is backwards compatible”. But it is more illuminating to think of it as explicit censorship of transactions which do not account for the new rules found in the update.

Proof of stake attempts to solve this issue by distributing the voting power across all stake holders, but secure proof of stake, with security deposits, re-introduces the problem once again. The solution the second time around is to use investment in a useful service (like payment channels) as a direct measure of stake, or to allow delegation, such that a representational governance system with low switching costs ensues.

Addendum – How Ethereum Could Have Soft-Forked The DAO

One of the initial proposals for dealing with The DAO fiasco was a soft fork which would censor all transactions to The DAO contracts. The issue with this is that it is not known whether a transaction will call a particular contract until run time; if such transactions are rejected, and hence gas is not collected, they open a denial-of-service attack against miners, who may have to do considerable computation only to find out the transaction should be censored. However, as discovered by Zack Hess, it is indeed possible to implement this censorship without the DoS vulnerability, at the cost of significant additional complexity. To wit: an additional state tree and transaction log is added to the Ethereum blockchain. Any transaction which ought to be censored will be included in the new transaction log, and have the appropriate gas deducted in the new state tree. Thus the current state will result from the combination of state trees. Anyone who has not upgraded will not be aware of this new state tree, and so will perceive incorrect balances. This will appear to them indistinguishable from censorship. Of course, this would make quite the mess of the protocol, but it is an interesting intellectual exercise. A more refined taxonomy of forks is certainly needed – coming soon!
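For intuition, here is a toy model of that scheme in Python. This is emphatically not Ethereum client code: the state trees are just dicts of balances, and the transaction fields are made up. It only illustrates the core idea that censored transactions still get charged gas, but only in a shadow state tree and shadow log that upgraded nodes maintain alongside the real state.

def apply_tx(main_state, shadow_state, shadow_log, tx, is_censored):
    gas_fee = tx['gas_used'] * tx['gas_price']
    if is_censored(tx):
        # only upgraded nodes know about these two structures
        shadow_state[tx['sender']] = shadow_state.get(tx['sender'], 0) - gas_fee
        shadow_log.append(tx)
        return False                      # excluded from normal execution
    main_state[tx['sender']] = main_state.get(tx['sender'], 0) - gas_fee - tx['value']
    main_state[tx['to']] = main_state.get(tx['to'], 0) + tx['value']
    return True

main, shadow, log = {'alice': 100}, {'alice': 100}, []
dao_call = {'sender': 'alice', 'to': 'the_dao', 'value': 10, 'gas_used': 5, 'gas_price': 1}
apply_tx(main, shadow, log, dao_call, lambda tx: tx['to'] == 'the_dao')
print main, shadow, log   # alice untouched in main, charged gas in the shadow tree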

Guide to the Ethereum Virtual Machine

Ethereum is hot. Really hot. It’s worth around a billion dollars and its community is tightly knit, having just successfully executed a contentious hard fork. Serious researchers and serious bankers alike are flocking to the platform in droves. No less so given the catastrophe of the arrogantly named The DAO, which exposed for the world the enormous difficulty of programming Ethereum contracts securely.

I’ve been building in, on, and around Ethereum for over two years now, since long before solidity was a language that could be used by mere mortals.  Actually, it still shouldn’t be used by mere mortals, unless they are well versed in proper solidity coding practices. That said, there is a dearth of documentation about the low-level design of the Ethereum Virtual Machine. Sure, there’s the yellow paper, and various bits of solidity documentation, but both leave something wanting.

Here, I introduce an overarching Guide to the Ethereum Virtual Machine. The guide is hosted on github, with an associated set of simple tools for studying and executing contracts. One of these tools, simply named evm, is particularly useful, as it allows contracts to be deployed and interacted with entirely from the command line, without ever running a full node. Let this page serve as an introduction and index to the various writeups and tooling being developed to facilitate this initiative of making the low-level design of Ethereum more accessible.


Guide to the EVM:

In-depth analysis of the EtherSignal contract:


Many thanks are owed to Rick Dudley for motivating me to write these guides.

Feel free to contact me for consulting or speaking engagements:

ethan at tendermint dot com




Why Bernie Sanders is the best Libertarian candidate in the running

It is every libertarian’s duty to vote for the most socialist President America has seen in decades.


Because the reality of the situation is that the banker-political class has accumulated trillions of dollars through intensive co-operation with decades of government coercion of the populace. In other words, the people who currently hold most of the capital are very bad capitalists – their capital is a phony capital contrived out of fiat money printing, extensive regulations to curb competition, and artificially high barriers to entry.

Implementing a libertarian government under this context is a false libertarianism. It’s like building up a cattle farm and trying to survive on it as a vegan.

What we need at this point is a reset. The markets are too far perturbed from the idealized “free market” by these artificially immense forces that are “too big to fail”. It’s a sort of hysteresis, a dynamical systems term describing how a little push in one direction can require a really large push to get back in the other direction.

Now, I think it’s important to establish that government is doomed, regardless. The new technological elite is ever more rapidly replacing critical functions of society, and will in time come to supplant the entire democratic and political infrastructure through a system of open source software projects. But the incumbents in finance and politics want to either restrict this development (think backlash against bittorrent, uber, bitcoin, and other disruptive technologies), or want to cement themselves at the top of it.

What Bernie Sanders is proposing is a massive redistribution of wealth from the banking-political class towards the education of the masses, to effectively expand the scope of the new technological elite, affording us a more democratic and ultimately freer market in the future technopia, as more individuals are able to participate at technical levels. Effectively, whether he knows it or not, his proposals will expedite the process of technology replacing government, by increasing the class of individuals that can contribute to said technology.

Socialist as this redistribution of wealth sounds, it is absolutely critical if America is going to pull itself out of the slump which has hollowed out its middle class through the artificial manipulation of markets by government. It’s like taking one step back (socialism) to take two steps forward (greater population-wide STEM competence and more radical technological innovation). Whereas jumping to a libertarian society today might be one step forward now in exchange for two steps back later (when the majority populace is locked in technological incompetence by extended ineligibility to access education and the entrenched fascism of the upper class bad-capitalists).

In other words, Bernie is the best Libertarian candidate out there.

So do the right thing for Liberty. Vote for Bernie.

Interactive Pyethereum Demo

Ethereum development continues to chug along and the python client is now interoperable with the golang and C++ clients. So you can boot up a pyethereum node, download the latest testnet blockchain, and even issue transactions with the API server, but a nice interface for playing with contracts is still lacking. Even with the C++ or golang clients, this can be frustrating, despite the pretty GUI clients. So, let’s take advantage of the python interpreter to interactively play with an ethereum blockchain in pyethereum and get a feel for the Ethereum Virtual Machine and the pyethereum API. This will also incidentally give us a handle on building native ethereum apps on top of pyethereum.

First thing you’ll want to do is install a copy of pyethereum. You can go for the official version, but I’ve got a clone with some helpful debug flags turned on:

git clone
cd pyethereum
sudo pip install -r requirements.txt
sudo python setup.py install
cd ..

If you like, you can do your installs in a virtualenv to keep your global python environment clean. For the sake of simplicity, we install globally here.

You’ll also want the serpent library, so we can write contracts in a high-level language

git clone
cd serpent
make && sudo make install
sudo python setup.py install
cd ..

Note: we will be using the example contracts provided by Vitalik in serpent/examples. Vitalik also provided an introduction to using pyethereum and serpent back in April, and much of this will be similar to that, but with some more details.

So, now that we have pyethereum and serpent installed, let’s fire up a python interpreter. I use iPython, mostly because it has tab-completion and lets me explore imported packages really easily (and gives access to function definition and associated documentation). In fact, I figured out most of this tutorial by importing different pyethereum modules in iPython and using tab-completion to explore their innards. If you don’t have iPython, a standard python interpreter is fine, just ignore any mention of tab-completion.


The focus of this tutorial will be simple contract writing and execution on the blockchain. We need to import the following packages

from pyethereum import transactions, blocks, processblock, utils
import serpent

Before we get started, let’s set up logging so we can get some useful print statements


Now let’s get started. The first order of business is to create an address for ourselves. We start by creating an elliptic curve private key. Since a private key is any 256-bit number, the easiest thing to do is take the sha3 hash of some phrase. Let’s actually create two addresses, so we can send funds back and forth.

key = utils.sha3("this is an insecure passphrase")
key2 = utils.sha3("37Dhsg17e92dfghKa Th!s i$ mUcH better r920167dfghjiHFGJsbcm")
addr = utils.privtoaddr(key)
addr2 = utils.privtoaddr(key2)

Now we can create a genesis block, and allocate for ourselves some amount of ether

genesis = blocks.genesis({addr:10**18})

Note that while typically the blockchain grows via mining, we will do all our work here in the genesis block, and will not simulate mining or the growth of the chain. This is sufficient for illustrative purposes in the python interpreter environment, but will have subtle differences from a “real” instance of the blockchain.

Now, if you type genesis. and hit tab, you should see all the wonderful functions available to the block. Try exploring some of them. In particular, let’s take a look at our account:


Accounts have four attributes: an incrementing nonce, a balance, code, and storage. Since this account is not a contract, the code and storage attributes should be 0. And since we have yet to make any transactions, the nonce should start at 0. So let’s send some ether from one account to the other. We begin by constructing and signing a transaction:

tx = transactions.Transaction(0, 1000, 1000, addr2, 56789000, "").sign(key)

The form for a transaction is Transaction(nonce, gasprice, startgas, to, value, data). So here, we are sending 56789000 wei from addr to addr2, using a startgas of 1000 and a gasprice of 1000. Since we are not calling a contract, the data is empty, and since it’s our first transaction, the nonce is 0. Now, we apply the transaction to the block:

success, _ = processblock.apply_transaction(genesis, tx)

You should see a bunch of debug output, telling you about the transaction and the state of the block before and after applying it. apply_transaction returns a boolean indicating the success of the transaction. The second return value is variable: if the transaction actually creates a contract, it’s the address of that contract; if it calls a contract, it’s the return value; otherwise, it’s an empty string (so we ignore it here).

If we now check on our accounts, we should see the result of the transaction:


Neat, right? Let’s now fire up a simple contract, and see how it works. Vitalik has kindly provided a number of example contracts in the serpent/examples directory. We’ll start with namecoin.se, since it is essentially just a simple key-value store. We compile it straight to EVM bytecode here, though serpent can also show you the intermediate LLL code and the EVM assembly that this contract compiles down to:

serpent_code = open('serpent/examples/namecoin.se').read()
byte_code = serpent.compile(serpent_code)

Now, we create a transaction object that will create a new contract with the code we have just compiled, and we apply it to the block

tx2 = transactions.contract(genesis.get_nonce(addr), 10**12, 1000, 0, byte_code).sign(key)
success, contract_addr = processblock.apply_transaction(genesis, tx2)

The debug print should be long and messy. Note that no contract code has been executed here. However, the code has been copied into memory and returned, as this is what must be run when the contract is called. This mechanism is useful for including initialization code in the contract, as we shall see later. Now, let’s check on the new address just returned:


Since it is a contract, it has a code attribute, namely, the result of our compile. We can cause the code to execute by sending the contract a message! Since it’s a key-value store, it takes two data arguments, a key, and a value:

msg = transactions.Transaction(genesis.get_nonce(addr), 1000, 10000, contract_addr, 0, serpent.encode_datalist(['myname', 69])).sign(key)
s, r = processblock.apply_transaction(genesis, msg)

Note that the data must be properly encoded when sent to a contract, and serpent.encode_datalist() takes care of that for us. So now, finally, we have some interesting debug output, since this time we have actually executed code in the ethereum virtual machine. See if you can follow the trace. We load the first data argument, ‘myname’, which is a string, but since the EVM has no concept of strings, it converts the ASCII bytes into a number (in this case 120368310349157; you can get the same result with int('myname'.encode('hex'), 16)). Next we check if there is any value at that storage location. In this case, there isn’t. So we load the next data argument (recall that data arguments come in 32-byte chunks), in this case, 69, and we store it at the location corresponding to the key (‘myname’). Now, let’s check the contract address again:


Wooohooo! Our value has been successfully stored in the contract’s storage! We can retrieve it programmatically with

genesis.get_storage_data(contract_addr, 'myname')

Sure enough, it returns 69. Now if this isn’t fun times, I don’t know what is! Let’s store another value:

msg = transactions.Transaction(genesis.get_nonce(addr), 1000, 10000, contract_addr, 0, serpent.encode_datalist(['is-this-fun', 'yes-it-is'])).sign(key)
s, r = processblock.apply_transaction(genesis, msg)
genesis.get_storage_data(contract_addr, 'is-this-fun')

Again, we see that the value was stored. Though, of course, it is returned as an integer. If you want to see the string:

utils.int_to_big_endian(genesis.get_storage_data(contract_addr, 'is-this-fun'))

Beautiful. Now, let’s try and overwrite one of the values:

msg = transactions.Transaction(genesis.get_nonce(addr), 1000, 10000, contract_addr, 0, serpent.encode_datalist(['is-this-fun', 'not-really'])).sign(key)
s, r = processblock.apply_transaction(genesis, msg)
genesis.get_storage_data(contract_addr, 'is-this-fun')

Notice how the value was not overwritten! This is explicit in the contract’s source code, and we see now that it works as intended. Good stuff.

Let’s try a slightly more advanced contract, subcurrency.se, which introduces some important tools. Take a look:

init:
    # Initial: premine 1000000 units to creator
    contract.storage[msg.sender] = 1000000
code:
    # If a message with one item is sent, that's a balance query
    if msg.datasize == 1:
        addr = msg.data[0]
        return(contract.storage[addr])
    # If a message with two items [to, value] are sent, that's a transfer request
    else:
        from = msg.sender
        fromvalue = contract.storage[from]
        to = msg.data[0]
        value = msg.data[1]
        if fromvalue >= value:
            contract.storage[from] = fromvalue - value
            contract.storage[to] = contract.storage[to] + value

The code should be relatively self-explanatory – it’s a subcurrency, and users can send that subcurrency back and forth between accounts, but only if they have enough. An important feature is the init/code segmentation. When the contract is created, everything in the init block is run, and the code block is copied into memory and returned. When the contract is called, the code block will execute.

The API for this contract is simple. If one argument is passed, it’s a balance query. You might think this is kind of silly, since we can just query the blockchain directly, though this way other contracts can use this contract’s API as well. If two arguments are passed, the first is the address we are sending funds to, and the second is the amount to send. So let’s compile the code, create the contract, and apply it to the blockchain:

code = serpent.compile(open('serpent/examples/subcurrency.se').read())
tx = transactions.contract(genesis.get_nonce(addr), 1000, 10000, 0, code).sign(key)
s, caddr = processblock.apply_transaction(genesis, tx)

Looking at the debug output, we see that there is some VM execution here, corresponding to the init block, where a certain amount of the subcurrency is stored at our address. We can confirm this

genesis.get_storage_data(caddr, addr)

Ok, so we have 1000000 subcurrency units. What about our other address?

genesis.get_storage_data(caddr, addr2)

Indeed, it has none. So let’s try and send 100 units from this second address to the other, and ensure it fails:

tx = transactions.Transaction(genesis.get_nonce(addr2), 1000, 10000, caddr, 0, serpent.encode_datalist([addr, 100])).sign(key2)
s, r = processblock.apply_transaction(genesis, tx)

And we should see that nothing has changed. So now let’s send funds the other way around:

tx = transactions.Transaction(genesis.get_nonce(addr), 1000, 10000, caddr, 0, serpent.encode_datalist([addr2, 100])).sign(key)
s, r = processblock.apply_transaction(genesis, tx)

Tada! We see that now there are two addresses with funds in this subcurrency contract. Scrolling through the debug output can be painful, because of all the memory printouts. We can disable the memory print flag if desired, but let’s leave it. Finally, let’s try sending more funds than we have from addr2:

tx = transactions.Transaction(genesis.get_nonce(addr2), 1000, 10000, caddr, 0, serpent.encode_datalist([addr, 500])).sign(key2)
s, r = processblock.apply_transaction(genesis, tx)

Of course, nothing changed, since we didn’t have enough funds. So our contract works as expected! How about that – a brand new crypto currency in just a few lines of code!

From here, there are many things to try. You can explore the other example contracts in this way to get a feel for how they work, and you can test your own contracts as well. The most common problem I have had is not supplying enough gas in a transaction – so look out for that.

I should note that pyethereum now includes a nice module for testing contracts, so you don’t need as much boilerplate as I provided here. Check it out at:

In another installment, we’ll return to the state trie in pyethereum using real addresses and accounts, and flesh it out more clearly, and hopefully in fewer words, than before.

Understanding the ethereum trie

The other day I finally got around to reading the entire ethereum yellow paper and to figuring out how the modified Merkle-patricia-tree (trie) works. So let’s go through a brief but hopefully complete explanation of the trie, using examples.

A block in the ethereum blockchain consists of a header, a list of transactions, and a list of uncle blocks. Included in the header is a transaction root hash, which is used to validate the list of transactions. While transactions are sent over the wire from peer to peer as a simple list, they must be assembled into a special data structure called a trie to compute the root hash. Note that this data structure is not needed except to verify blocks (and hence of course to mine them), and can technically be discarded once a block has been verified. However, it is implied that the transaction lists are stored locally in a trie, and serialized to lists to send to clients requesting the blockchain. Of course, that client will then reconstruct the transaction list trie for each block to verify the root hash. Note that RLP (recursive length prefix encoding), ethereum’s home-rolled encoding system, is used to encode all entries in the trie.

A trie is also known as a radix tree, and the ethereum implementation introduces a couple modifications to boost efficiency. In a normal radix tree, a key is the actual path taken through the tree to get to the corresponding value. That is, beginning from the root node of the tree, each character in the key tells you which child node to follow to get to the corresponding value, where the values are stored in the leaf nodes that terminate every path through the tree. Supposing the keys come from an alphabet containing N characters, each node in the tree can have up to N children, and the maximum depth of the tree is the maximum length of a key.

Radix trees are nice because they allow keys that begin with the same sequence of characters to have values that are closer together in the tree. There are also no key collisions in a trie, like there might be in hash-tables. They can, however, be rather inefficient, like when you have a long key where no other key shares a common prefix. Then you have to travel (and store) a considerable number of nodes in the tree to get to the value, despite there being no other values along the path.

The ethereum implementation of radix trees introduces a number of improvements. First, to make the tree cryptographically secure, each node is referenced by its hash, which in current implementations is used for look-up in a leveldb database. With this scheme, the root node becomes a cryptographic fingerprint of the entire data structure (hence, Merkle). Second, a number of node ‘types’ are introduced to improve efficiency. There is the blank node, which is simply empty, and the standard leaf node, which is a simple list of [key, value]. Then there are extension nodes, which are also simple [key, value] lists, but where value is a hash of some other node. The hash can be used to look up that node in the database. Finally, there are branch nodes, which are lists of length 17. The first 16 elements correspond to the 16 possible hex characters in a key, and the final element holds a value if there is a [key, value] pair where the key ends at the branch node. If you don’t get it yet, don’t worry, no one does 😀. We will work through examples to make it all clear.
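In the meantime, here is a cheat sheet of those four node shapes as toy Python stand-ins. The keys, values, and hashes are placeholder strings only, not real encodings:

blank_node = ''                                               # empty
leaf_node = ['<encoded key>', '<rlp-encoded value>']          # key ends here, value stored directly
extension_node = ['<encoded key>', '<hash of child node>']    # hash used to look up the child in the db
branch_node = [''] * 16 + ['<value if a key ends here>']      # one slot per hex character, plus a value slot
branch_node[0xa] = '<child node for nibble 0xa>'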

One more important thing is a special hex-prefix (HP) encoding used for keys. As mentioned, the alphabet is hex, so there are 16 possible children for each node. Since there are two kinds of [key, value] nodes (leaf and extension), a special ‘terminator’ flag is used to denote which type the key refers to. If the terminator flag is on, the key refers to a leaf node, and the corresponding value is the value for that key. If it’s off, then the value is a hash to be used to look up the corresponding node in the db. HP also encodes whether or not the key is of odd or even length. Finally, we note that a single hex character, or 4-bit binary number, is known as a nibble.

The HP specification is rather simple. A nibble is prepended to the key that encodes both the terminator status and parity. The least significant bit in the nibble encodes parity, while the next lowest encodes terminator status. If the key was in fact even, then we add another nibble, of value 0, to maintain overall evenness (so we can properly represent it in bytes).
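As a sanity check on those rules, here is a minimal sketch of HP encoding. This is my own toy function, not pyethereum's implementation; it takes a list of nibbles and a terminator flag:

def hp_encode(nibbles, terminator):
    flags = 2 if terminator else 0                 # second-lowest bit: leaf (terminator) vs extension
    if len(nibbles) % 2:
        nibbles = [flags + 1] + nibbles            # lowest bit: odd-length key
    else:
        nibbles = [flags, 0] + nibbles             # even length: flag nibble plus a zero pad
    return ''.join(chr(16 * nibbles[i] + nibbles[i + 1])
                   for i in range(0, len(nibbles), 2))

# the key '\x01\x01\x02' is the nibbles [0,1,0,1,0,2]; as a leaf (terminating) key it encodes to:
print hp_encode([0, 1, 0, 1, 0, 2], True).encode('hex')   # 20010102, as in the first exercise below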

Ok. So this all sounds fine and dandy, and you probably read about it here or here, or if you’re quite brave, here, but let’s get down and dirty with some python examples. I’ve set up a little repo on github that you can clone and follow along with.

git clone

Basically I just grabbed the necessary files from the pyethereum repo (trie.py and the modules it depends on) and wrote a bunch of exercises as short python scripts that you can try out. I also added some print statements to help you see what’s going on in trie.py, though due to recursion this can get messy, so there’s a flag at the top of trie.py allowing you to turn printing on/off. Please feel free to improve the print statements and send a pull request! You should be in the trie directory after cloning, and run your scripts with python exercises/exA.py, where A is the exercise number. So let’s start with ex1.py.

In ex1.py, we initialize a trie with a blank root, and add a single entry:

state = trie.Trie('triedb', trie.BLANK_ROOT)
state.update('\x01\x01\x02', rlp.encode(['hello']))
print state.root_hash.encode('hex')

Here, we’re using '\x01\x01\x02' as the key and 'hello' as the value. The key should be a string (max 32 bytes, typically a big-endian integer or an address), and the value an rlp encoding of arbitrary data. Note we could have used something simpler, like 'dog', as our key, but let’s keep it real with raw bytes. We can follow through the code in trie.py to see what happens under the hood. Basically, in this case, since we start with a blank node, the update creates a new leaf node (adding the terminator flag to the key), rlp encodes it, takes the hash, and stores [hash, rlp(node)] in the database. The print statement should display the hash, which we can use from now on as the root hash for our trie. Finally, for completeness, we look at the HP encoding of the key:

k, v = state.root_node
print 'root node:', [k, v]
print 'hp encoded key, in hex', k.encode('hex')

The output of ex1.py is:

root hash 15da97c42b7ed2e1c0c8dab6a6d7e3d9dc0a75580bbc4f1f29c33996d1415dcc
root node: [' \x01\x01\x02', '\xc6\x85hello']
hp encoded key, in hex: 20010102

Note the final 6 nibbles are the key we used, 010102, while the first two give us the HP encoding. The first nibble tells us that this is a terminator node (since it would be 10 in binary, so the second least significant bit is on), and since the key was even length (least significant bit is 0), we add a second 0 nibble.

Moving on to exercise 2, we initialize a trie that starts with the previous hash:

state = trie.Trie('triedb', '15da97c42b7ed2e1c0c8dab6a6d7e3d9dc0a75580bbc4f1f29c33996d1415dcc'.decode('hex'))
print state.root_node

The print statement should give us the [key, value] pair we previously stored. Great. Let’s add some more entries. We’re going to try this a few different ways, so we can clearly see the different possibilities. We’ll use multiple ex2 python files, initializing the trie from the original hash each time. First, let’s make an entry with the same key we already used but a different value. Since the new value will lead to a new hash, we will have two tries, referenced by two different hashes, both starting with the same key (the rest of ex2a.py):

state.update('\x01\x01\x02', rlp.encode(['hellothere']))
print state.root_hash.encode('hex')
print state.root_node

The output for ex2a.py is:
[' \x01\x01\x02', '\xcb\x8ahellothere']

So that’s not all that interesting, but it’s nice that we didn’t overwrite the original entry, and can still access both using their respective hashes. Now, let’s add an entry that uses the same key but with a different final nibble (ex2b.py):

state.update('\x01\x01\x03', rlp.encode(['hellothere']))
print 'root hash:', state.root_hash.encode('hex')
k, v = state.root_node
print 'root node:', [k, v]
print 'hp encoded key, in hex:', k.encode('hex')

The ‘root node’ print statement should return something mostly unintelligible. That’s because it’s giving us a [key, value] node where the key is the common prefix of our two keys ([0,1,0,1,0]), encoded using HP to include a non-terminator flag and an indication that the key is odd-length, and the value is the hash of the rlp encoding of the node we’re interested in. That is, it’s an extension node. We can use the hash to look up the node in the database:

print state._get_node_type(state.root_node) == trie.NODE_TYPE_EXTENSION
common_prefix_key, node_hash = state.root_node
print state._decode_to_node(node_hash)
print state._get_node_type(state._decode_to_node(node_hash)) == trie.NODE_TYPE_BRANCH

And the output for ex2b.py:

root hash: b5e187f15f1a250e51a78561e29ccfc0a7f48e06d19ce02f98dd61159e81f71d
root node: ['\x10\x10\x10', '"\x01\xab\x83u\x15o\'\xf7T-h\xde\x94K/\xba\xa3[\x83l\x94\xe7\xb3\x8a\xcf\n\nt\xbb\xef\xd9']
hp encoded key, in hex: 101010
['', '', [' ', '\xc6\x85hello'], [' ', '\xcb\x8ahellothere'], '', '', '', '', '', '', '', '', '', '', '', '', '']

This result is rather interesting. What we have here is a branch node, a list with 17 entries. Note the difference in our original keys: they both start with [0,1,0,1,0], and one ends in 2 while the other ends in 3. So, when we add the new entry (key ending in 3), the node that previously held the key ending in 2 is replaced by two nodes: the root becomes a [key, value] extension node, where key is the HP-encoded common prefix of the two keys and value is the hash of a new branch node, which can be used to look up that branch node in the db. The entry at index 2 of this branch node is the original node with key ending in 2 (‘hello’), while the entry at index 3 is the new node (‘hellothere’). Since both keys are only one nibble longer than the key for the branch node itself, the final nibble is encoded implicitly by the position of the nodes in the branch node. And since that exhausts all the characters in the keys, these nodes are stored with empty keys in the branch node.

You’ll note I added a couple print statements to verify that these nodes are in fact what I say they are – extension and branch nodes, respectively. Also note that there is a general rule here for storing nodes in branch nodes: if the rlp encoding of the node is less than 32 bytes, the node is stored directly in an element of the branch node. If the rlp encoding is 32 bytes or longer, then a hash of the node is stored in the branch node, which can be used to look up the node of interest in the db.
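That rule is easy to express as a little helper. Here is a sketch of my own (not pyethereum's internal function), assuming the rlp and utils modules grabbed from pyethereum are available for rlp.encode and utils.sha3:

def child_reference(node):
    # decide how a parent should reference this child node
    encoded = rlp.encode(node)
    if len(encoded) < 32:
        return node                 # small node: embedded directly in the parent
    return utils.sha3(encoded)      # otherwise: referenced by hash, stored separately in the db

print child_reference([' ', '\xc6\x85hello'])   # a short leaf node, so it gets embedded as-is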

Ok, so that was pretty cool. Let’s do it again but with a key equal to the first few nibbles of our original key (ex2c.py):

state.update('\x01\x01', rlp.encode(['hellothere']))

Again, we see that this results in the creation of a branch node, but something different has happened. The branch node corresponds to the key ‘\x01\x01’, but there is also a value with that key (‘hellothere’). Hence, that value is placed in the final (17th) position of the branch node. The other entry, with key ‘\x01\x01\x02’, is placed in the position corresponding to the next nibble in its key, in this case, 0. Since its key hasn’t been fully exhausted, we store the leftover nibbles (in this case, just ‘2’) in the key position for the node. Hence the output:

[['2', '\xc6\x85hello'], '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '\xcb\x8ahellothere']

Make sense? Let’s do one final component of exercise 2 (ex2d.py). Here, we add a new entry with a key that is identical to the original key, but has an additional two nibbles:

state.update('\x01\x01\x02\x57', rlp.encode(['hellothere']))

In this case, the opposite of what we just saw happens! The original entry’s value is stored at the final position of the branch node, where the key for the branch node is the key for that value (‘\x01\x01\x02’). The second entry is stored at the position of its next nibble (5), with a key equal to the remaining nibbles (just 7):

['', '', '', '', '', ['7', '\xcb\x8ahellothere'], '', '', '', '', '', '', '', '', '', '', '\xc6\x85hello']

Tada! Try playing around a bit to make sure you understand what’s going on here. Nodes are stored in the database according to the hash of their rlp encoding. Once a node is retrieved, keys are used to travel a path through a further series of nodes (which may involve more hash lookups) to reach the final value. Of course, we’ve only used two entries in each of these examples to keep things simple, but that has been sufficient to expose the basic mechanics of the trie. We could add more entries to fill up the branch node, but since we already understand how that works, let’s move on to something more complicated. In exercise 3, we will add a third entry, which shares a common prefix with the second entry. This one’s a little longer, but the result is totally awesome (ex3.py):

state = trie.Trie('triedb', '15da97c42b7ed2e1c0c8dab6a6d7e3d9dc0a75580bbc4f1f29c33996d1415dcc'.decode('hex'))
print state.root_hash.encode('hex')
print state.root_node
print ''
state.update('\x01\x01\x02\x55', rlp.encode(['hellothere']))
print 'root hash:', state.root_hash.encode('hex')
print 'root node:', state.root_node
print 'branch node it points to:', state._decode_to_node(state.root_node[1])
print ''

Nothing new yet. Initialize from original hash, add a new node with key '\x01\x01\x02\x55'. Creates a branch node and points to it with a hash. We know this. Now the fun stuff:

state.update('\x01\x01\x02\x57', rlp.encode(['jimbojones']))
print 'root hash:', state.root_hash.encode('hex')
print 'root node:', state.root_node
branch_node = state._decode_to_node(state.root_node[1])
print 'branch node it points to:', branch_node

We’re doing the same thing – add a new node, this time with key '\x01\x01\x02\x57' and value 'jimbojones'. But now, in our branch node, where there used to be a node with value 'hellothere' (i.e. at index 5), there is a messy ole hash! What do we do with hashes in tries? We use ’em to look up more nodes, of course!

next_hash = branch_node[5]
print 'hash stored in branch node:', next_hash.encode('hex')
print 'branch node it points to:', state._decode_to_node(next_hash)

And the output:

root hash: 17fe8af9c6e73de00ed5fd45d07e88b0c852da5dd4ee43870a26c39fc0ec6fb3
root node: ['\x00\x01\x01\x02', '\r\xca6X\xe5T\xd0\xbd\xf6\xd7\x19@\xd1E\t\x8ehW\x03\x8a\xbd\xa3\xb2\x92!\xae{2\x1bp\x06\xbb']
branch node it points to: ['', '', '', '', '', ['5', '\xcb\x8ahellothere'], '', '', '', '', '', '', '', '', '', '', '\xc6\x85hello']

root hash: fcb2e3098029e816b04d99d7e1bba22d7b77336f9fe8604f2adfb04bcf04a727
root node: ['\x00\x01\x01\x02', '\xd5/\xaf\x1f\xdeO!u>&3h_+\xac?\xf1\xf3*\xb7)3\xec\xe9\xd5\x9f2\xcaoc\x95m']
branch node it points to: ['', '', '', '', '', '\x00&\x15\xb7\xc4\x05\xf6\xf3F2\x9a(N\x8f\xb2H\xe75\xcf\xfa\x89C-\xab\xa2\x9eV\xe4\x14\xdfl0', '', '', '', '', '', '', '', '', '', '', '\xc6\x85hello']
hash stored in branch node: 002615b7c405f6f346329a284e8fb248e735cffa89432daba29e56e414df6c30
branch node it points to: ['', '', '', '', '', [' ', '\xcb\x8ahellothere'], '', [' ', '\xcb\x8ajimbojones'], '', '', '', '', '', '', '', '', '']

Tada! So this hash, which corresponds to key [0,1,0,1,0,2,5], points to another branch node which holds our values 'hellothere' and 'jimbojones' at the appropriate positions. I recommend experimenting a little further by adding some new entries, specifically, try filling in the final branch node some more, including the last position.

Ok! So this has been pretty cool. Hopefully by now you have a pretty solid understanding of how the trie works, the HP encoding, the different node types, and how the nodes are connected and refer to each other. As a final exercise, let’s do some look-ups.

state = trie.Trie('triedb', 'b5e187f15f1a250e51a78561e29ccfc0a7f48e06d19ce02f98dd61159e81f71d'.decode('hex'))
print 'using root hash from ex2b'
print rlp.decode(state.get('\x01\x01\x03'))
print ''
state = trie.Trie('triedb', 'fcb2e3098029e816b04d99d7e1bba22d7b77336f9fe8604f2adfb04bcf04a727'.decode('hex'))
print 'using root hash from ex3'
print rlp.decode(state.get('\x01\x01\x02'))
print rlp.decode(state.get('\x01\x01\x02\x55'))
print rlp.decode(state.get('\x01\x01\x02\x57'))

You should see the values we stored in previous exercises.

And that’s that! Now, you might wonder, “so, how is all this trie stuff actually used in ethereum?” Great question. And my repository does not have the solutions. But if you clone the official pyethereum repo, and do a quick grep -r 'Trie' . , it should clue you in. What we find is that a trie is used in two key places: to encode transaction lists in a block, and to encode the state of a block. For transactions, the keys are big-endian integers representing the transaction count in the current block. For the state trie, the keys are ethereum addresses. It is essential for any full node to maintain the state trie, as it must be used to verify new transactions (since contract data must be referenced). Unlike bitcoin, however, there is no need to store old transactions for verification purposes, since there is a state database. So technically the transaction tries don’t need to be stored. Of course, if no one keeps them around, then no one will ever be able to verify from the genesis block up to the current state again, so it makes sense to hang on to them.
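As a tiny illustration of the state-trie case, here is a sketch using the same Trie class from the exercises: keys are 20-byte addresses and values are rlp-encoded account data (grossly simplified to a single string here; the addresses are arbitrary placeholders, not real accounts):

state = trie.Trie('triedb', trie.BLANK_ROOT)
alice = ('00' * 19 + '01').decode('hex')   # placeholder 20-byte addresses
bob   = ('00' * 19 + '02').decode('hex')
state.update(alice, rlp.encode(['balance-and-other-account-fields']))
state.update(bob, rlp.encode(['another-account']))
print state.root_hash.encode('hex')   # a single hash fingerprinting the entire state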

And there you have it folks. We have achieved an understanding of the ethereum trie. Now go forth, and trie it!

Getting set up with nginx, php, and postgresql on ubuntu

So I’ve been working off a BeagleBone Black the last couple months as my primary webserver and learning arena.  It’s been fun getting to know linux a little better after all these years on a Mac.  And while the BBB has served me well in learning to host my own server and site, its ARM processor has been somewhat restrictive.  I wasn’t able to set up MongoDB, and I couldn’t find a version of php5.5.  There may be solutions out there, and if I was dedicated enough maybe I could even concoct them, but I figure, why not just move the whole thing to the cloud anyway?

My main web project is a wiki-like platform whose goal is a better user experience than the likes of Wikipedia. Interaction with information on the internet could be made far better than it is today, and that’s what I’m striving for with YanlJ. YanlJ is currently built with PHP against a postgresql database on the BBB, served up by nginx. Let’s move the whole stack to the cloud.

I had already set up an account with Amazon, and I’ve got ssh access to my EC2 instance. Great. Let’s grab nginx and build from source so we can change the server response header.

curl -O http://nginx.org/download/nginx-1.7.1.tar.gz
tar zxvf nginx-1.7.1.tar.gz

There are a couple files we have to change.  First,

vi src/http/ngx_http_header_filter_module.c

Find the line that reads:

static char ngx_http_server_string[] = "Server: nginx" CRLF;

and change the ‘nginx’ to whatever you want:

static char ngx_http_server_string[] = "Server: YanlJ" CRLF;


Next, open the version header:

vi src/core/nginx.h

find the line that reads

#define NGINX_VER          "nginx/" NGINX_VERSION

and again change the ‘nginx’ to whatever your heart desires. Finally, feel free to change the NGINX_VERSION number to something a tad more magical.
Now, we’re nearly ready to build.  Nginx needs PCRE (Perl Compatible Regular Expression Library) for url rewrites, so let’s grab that first.

sudo apt-get update
sudo apt-get install libpcre3 libpcre3-dev

We’re going to want to build nginx with https compatibility, so we better grab the openSSL developer library as well.

sudo apt-get install libssl-dev

Now, we’re ready to build:

./configure --with-http_ssl_module
make
sudo make install

Since I didn’t specify, nginx was installed in /usr/local. Let’s boot up the server:

sudo /usr/local/nginx/sbin/nginx

If we visit the EC2 ip address in the browser now, we should see the nginx welcome page. Of course, this assumes we’ve already opened port 80 in the security group via the Amazon EC2 console. We can shut down the server with

sudo /usr/local/nginx/sbin/nginx -s stop

Right. Moving along now to PHP. Getting nginx and PHP to work together can be a little tricky at first, so let’s see what we can do. Start by grabbing the necessary PHP packages:

sudo apt-get install php5-common php5-cli php5-fpm php5-pgsql

Easy. Note we need the last package so php will be able to speak to the postgresql database. But first we have to edit the nginx configuration files to talk to php.

cd /usr/local/nginx/conf
vi nginx.conf

This is where all the server configuration lives. There are some commented-out lines that you can supposedly uncomment to get php functionality; however, that will not work for us, since we went with the FastCGI process manager for php (php5-fpm) rather than just grabbing the cgi. Here’s the important part of my config setup, for testing purposes:

    server {
        listen       80;
        server_name  localhost;

        root /home/ubuntu/web_test;
        index index.php index.html;

        location / {
            try_files $uri $uri.php $uri.html =404;
        }

        error_page   500 502 503 504  /50x.html;
        location = /50x.html {
            root   html;
        }

        location ~ \.php$ {
            fastcgi_split_path_info ^(.+\.php)(/.+)$;
            fastcgi_pass unix:/var/run/php5-fpm.sock;
            fastcgi_index index.php;
            include fastcgi.conf;
        }
    }


Of course, this implies that we have a directory at /home/ubuntu/web_test, inside of which is an index.php file. My index.php looks like this:

<?php echo phpinfo(); ?>

Restart the nginx server (assuming we’re still in the nginx dir) with:

sudo sbin/nginx -s reload

And now if we point our browser at the server, we see a pile of information about our php setup. Tada!

Unfortunately, if it did not work, you may have some hunting to do. Go through the instructions again carefully and consult other instructions on the web. Getting php and nginx to work properly the first time can be a bit of a pain!

Now, we’ve got our nginx server serving up php pages. Let’s install postgresql so we can run our data-driven application:

sudo apt-get install postgresql

We’ll use a dedicated user for managing the database (this user was setup during installation of postgresql).

sudo -u postgres psql postgres
\password postgres

The psql command starts the postgres interactive shell, in this case to modify the user named postgres, and to change its password. Exit the shell with \quit

Now, we’re ready to clone YanlJ!

git clone
cd YanlJ

YanlJ comes with setup scripts for creating the database, but she uses authorization details in a file `auth.txt`, which stores the username and password for accessing and modifying the database over the database server (from PHP). Let’s create that file

echo usr_name > auth.txt
echo some_pwd >> auth.txt

And run the setup script:

bash setup/

The setup script should create the database ('wikidb') and set up all the appropriate tables, granting the user in 'auth.txt' permission to modify them. You can pull up an interactive database shell and list all current tables like so:

sudo -u postgres psql -d wikidb
\dt

And that’s that! The database is set up for YanlJ. The last thing we have to do is some final nginx configuration. Getting back to the nginx conf folder, the best way to do this is to create a new folder, sites-enabled, keep configuration files for each site in there, and include a reference to them in the main nginx.conf.

cd /usr/local/nginx/conf
mkdir sites-enabled
vi sites-enabled/yanlj.conf

Let’s take the server block we used to test our setup before and copy it into yanlj.conf, making sure to change the root path to point at the yanlj directory instead of web_test. Then, in nginx.conf, simply add this line where the server block used to be:

include sites-enabled/yanlj.conf;

Now reload nginx. When we point the browser at the server, we should see YanlJ! Of course, YanlJ has been written to only be accessible over https. We can comment this requirement out, or add another server block to our yanlj.conf file for https connections. For this we’ll need ssl certificates:

mkdir ~/certs
cd ~/certs
sudo openssl genrsa -out server.key 1024
sudo openssl req -new -key server.key -out server.csr
sudo openssl x509 -req -days 365 -in server.csr -signkey server.key -out server.crt

Now, let’s add the https server block to yanlj.conf:

server {
  listen 443;
  server_name localhost;

  ssl                  on;
  ssl_certificate      /home/ubuntu/certs/server.crt;
  ssl_certificate_key  /home/ubuntu/certs/server.key;

  keepalive_timeout    70;

  ssl_protocols  SSLv2 SSLv3 TLSv1;

  root /home/ubuntu/programming/YanlJ;
  index index.php index.html;

  location / {
    try_files $uri $uri.php $uri.html =404;
  }

  location ~ \.php$ {
    fastcgi_split_path_info ^(.+\.php)(/.+)$;
    fastcgi_pass unix:/var/run/php5-fpm.sock;
    fastcgi_index index.php;
    include fastcgi.conf;
  }
}

Reload nginx one more time

sudo /usr/local/nginx/sbin/nginx -s reload

and we’re done! Of course, since we haven’t registered with a certificate authority, the browser will warn that the site is not trusted, but just add the exception and move along.

We did it. We moved the stack over from BBB to AWS EC2. Have fun playing with YanlJ, and please, if you’re a developer, feel free to make pull requests – she needs plenty of work!

And so it begins!

I’ve blogged before:  A lot of it is semi-nonsensical ramblings, poetry, or existentialist musings.  I’d like to be a bit more focused and technical here, or at least better organized about where to find the technical stuff and where to find the art. Prepare for recounts of my journeys into the myriad worlds of coding, cryptography, culture, and the cosmos at large.  Hope you learn something!