BigchainDB has taken a big data distributed database and given it blockchain characteristics. The result: a decentralised database capable of one million writes per second, petabytes of storage and sub-second latency.

The BigchainDB white paper states that its "permissioning system enables configurations ranging from private enterprise blockchain databases to open, public blockchain databases. BigchainDB is complementary to decentralised processing platforms like Ethereum, and decentralised file systems like InterPlanetary File System (IPFS)".

Large databases are designed to scale linearly: the more nodes you add, the better the performance you get. Bitcoin grows well until performance plummets at about 8,000 to 10,000 nodes. BigchainDB leverages database replication, where three or more copies of each piece of data are spread across the network.

Bruce Pon, co-founder of BigchainDB, told IBTimes: "Distributed databases have their own consensus algorithms underneath to make sure that the transactions are ordered properly and everybody has the right copy of them. They do certain things like have a replication factor so there are, like, three copies across the entire network and then if one falls over then they make sure that gets replicated to another node very quickly.

"Then we have sharding. Sharding conserves storage, where each node stores a portion of the overall amount to be stored. For reliability a few copies of each piece of data is made. This is called fractional replication. Ultimately we were limited by the I/O. In our experiments, the 10 Gbps limit of the network was what held back higher performance."

Pon explained that BigchainDB's design was basically meant to "get out of the way" and allow the underlying database architecture to improve performance and capacity as nodes are added. "On top of that we built a federation where each node votes on every transaction, which is a layer on top of a distributed database, so that every transaction has to have a certain quorum of votes for a transaction to pass." Such a federated model could generally have anywhere from five to 60 or so validating nodes, he said.
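
The voting layer can be sketched in a few lines of Python. Pon does not say what the quorum threshold is, so the simple majority used as the default here is an assumption, as are the function name transaction_passes and the node names. It is an illustration of quorum voting in general rather than BigchainDB's own code.

```python
from typing import Dict, Optional


def transaction_passes(
    votes: Dict[str, bool],
    federation_size: int,
    quorum: Optional[int] = None,
) -> bool:
    """Return True if enough federation nodes voted to accept the transaction.

    `votes` maps a node name to its vote (True = valid, False = invalid).
    Nodes that have not voted yet are simply absent from the mapping.
    """
    if quorum is None:
        # Assumed default: a simple majority of the whole federation.
        quorum = federation_size // 2 + 1
    yes_votes = sum(1 for accepted in votes.values() if accepted)
    return yes_votes >= quorum


if __name__ == "__main__":
    # Hypothetical five-node federation, the smallest size Pon mentions.
    votes = {"node-a": True, "node-b": True, "node-c": False, "node-d": True}
    print(transaction_passes(votes, federation_size=5))
```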

"The second innovation is what we call pipelining where we laid out all the transactions in a row and then we validate them after. So you could write as many transactions as you want and then very miliseconds afterwards it gets validated. That allows you to write as fast as you can and validate afterwards.

He said that if there happens to be a bad transaction in the pipe that you have laid, it can be taken out of the block, pushed forward and laid back down on the track and then revalidated. "In Bitcoin there's a validation at the point of entry on the ledger, so you have train plus track at the same time and so the train and track gets built at the same time. We did it so that you build the track and then the train comes over a little bit later," said Pon.
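
A rough Python sketch of this write-first, validate-later approach follows. It reflects one reading of Pon's description: blocks are written without any check, and when a block turns out to contain a bad transaction, that transaction is dropped and the rest of the block is laid back down at the end of the queue to be validated again. The class name, the is_valid callback and the transaction format are all hypothetical.

```python
from collections import deque
from typing import Callable, Deque, List

Transaction = dict
Block = List[Transaction]


class WriteThenValidatePipeline:
    """Blocks are written immediately and validated in a later, separate step."""

    def __init__(self, is_valid: Callable[[Transaction], bool]) -> None:
        self.is_valid = is_valid
        self.unvalidated: Deque[Block] = deque()  # the "track" laid so far
        self.validated: List[Block] = []
        self.discarded: List[Transaction] = []

    def write_block(self, block: Block) -> None:
        # No validation at the point of entry: writing is just an append.
        self.unvalidated.append(list(block))

    def validate_next(self) -> bool:
        """Validate the oldest unvalidated block; return True if it passed."""
        block = self.unvalidated.popleft()
        bad = [tx for tx in block if not self.is_valid(tx)]
        if not bad:
            self.validated.append(block)
            return True
        # Take the bad transactions out, then re-lay the remainder of the
        # block at the end of the queue so that it is validated again.
        self.discarded.extend(bad)
        good = [tx for tx in block if self.is_valid(tx)]
        if good:
            self.unvalidated.append(good)
        return False

    def validate_all(self) -> None:
        while self.unvalidated:
            self.validate_next()


if __name__ == "__main__":
    pipeline = WriteThenValidatePipeline(lambda tx: tx["amount"] > 0)
    pipeline.write_block([{"id": 1, "amount": 5}, {"id": 2, "amount": -1}])
    pipeline.write_block([{"id": 3, "amount": 7}])
    pipeline.validate_all()
    print(pipeline.validated, pipeline.discarded)
```

The point of the design is that write_block never waits for validation, so the write rate is limited only by how fast blocks can be appended.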

Big data

BigchainDB grew out of digital art ownership platform Ascribe, where co-founders Bruce Pon, Trent McConaghy and Masha McConaghy began to encounter scalability problems. Pon explained that Trent McConaghy comes from a machine learning background and that the team drew upon its experience of big data databases and protocol engineering to design BigchainDB.

They began by doing a complete analysis of about 100 big data databases. "We started with one distributed database, it's called RethinkDB and it's one of the most popular databases that nobody has heard of, but it's extremely powerful.

"But if Cassandra, MongoDB, Oracle or MySQL wanted a blockchainified database, there are technical adjustments that can be made to make it happen. The conceptual approach we used for building BigchainDB on top of RethinkDB can be applied. Ideally, within five years there are multiple blockchain databases to choose from - this makes it easier for enterprise to adopt blockchains. We've focused on implementing BigchainDB on RethinkDB for this first phase - they've been great partners."

Since releasing its white paper, BigchainDB said it has had hundreds of enquiries. Pon added: "It seems like we filled a gap in the blockchain ecosystem. We play very well with Ethereum, with Chain, Eris, any of these platforms out there.

"BigchainDB obviates the need to have a data stored within the Ethereum because it's inherently inefficient and requires additional code to make the data queryable. The ideal stack is to have an Ethereum smart contract layer running in their Virtual machine, with BigchainDB as their blockchain database that can house tokens for ether, tickets, serials of physical goods or casino chips. Whatever you want to track in your system in a decentralised way."