Building a Blockchain with Python πβοΈ
Always asked yourself how blockchain works? Well, ask no more. Weβre going to implement a simple version of a Blockchain written in Python interactively. π It means that you will be able to run the code straight from your browser.
With just enough theory and a high focus on practice.
Note that if you're reading this article in AMP mode or from mobile you won't be able to run Python code from your browser, but you can still see the code samples and illustrations.
The purpose here is to give you βenoughβ theory so it doesnβt get boring and so we take baby steps towards learning about blockchains.
βοΈ What is a blockchain?
Blockchain is the technology that powers Bitcoin, Ethereum, and Crypto. Itβs a public ledger that guarantees information is decentralized.
For Bitcoin it means transactions are records and that each record is compliant and secure in the ledger.
It literally manages a chain of blocks, and each block contains many transactions.
Blocks can be identified by two means: height or hash. The hash is long and complex while height is quite easy since it's just the block number in the chain.
In our example above it's fine to say that the block of height 1 is also the block of hash 6b86
.
The first block ever is called the Genesis Block π±, it's a special block with a single transaction (50 BTC to Satoshi Nakamoto).
What makes it special is that it's not tied to any real previous block, that's the only and unique exception to the Blockchain.
Since the blockchain is public and open, you can check and see the genesis block here (or any other block really).
I'm going to abstract how transactions are validated, so we can focus on the blockchain itself. We're assuming that every transaction is legit and needs no more than "from who", "to who" and "how much".
If you're interested in knowing more, I'll be happy to make a part two, and implement transaction details, validation, and some cryptography (yet again, interactively!).
Here're the entities Block
and Transaction
we're creating for the scope of this article:
from dataclasses import dataclass
@dataclass
class Transaction:
sender: str
receiver: str
amount: float
@dataclass
class Block:
version: int
timestamp: float
nonce: int
difficulty: int
previous_hash: str
transactions: list[Transaction]
βοΈ What is Bitcoin mining?
Bitcoin mining is what keeps the network consistent and working. That's what allows people to send and receive bitcoins.
Let's analyze first how the economy works with banks.
Inevitably there's a central authority (the bank) that makes things happen:
- Validates you have enough balance to send money;
- Moves money between accounts;
- Attempt to detect frauds and block them;
Since Bitcoin proposes to be completely decentralized (thus we can't have a single central authority) it needs to create a way so people collaborate and replace the need for banks.
In simple words: "Mining" is the process that allows people to use bitcoin as a decentralized currency and that incentivizes people to collaborate.
Instead of a single entity, we have many different entities that communicate between themselves, they're in charge of checking whether John has enough balance, and if he does, they're in charge of moving the money straight to Mary.
π Why Blockchain and Bitcoin are secure?
Ok, let's be honest here. If you gave me $10,000 to deliver it to your friend
would you trust me?
Include in this question many different people I don't know, and an even bigger amount of money and this question gets even more obvious.
If a malicious miner is trying to insert invalid blocks or even remove a block from the Blockchain, he's rejected by all due to the consensus of the network! Attempting to modify the Blockchain is a waste of time and effort.
That's what makes Bitcoin and Blockchain an awesome combination. It forces miners to be honest and (implicitly) punishes those who aren't.
π° Why miners help the Blockchain
But still, how the hell can you get people to contribute? I mean... Why would you do all this work? Why people would want to help?
For a single reason:
Money!
And the money comes from 2 sources:
- Bitcoin mining rewards
- Transaction fees
Every mined block automatically rewards the miner with some bitcoin. The number of Bitcoins earned decreases over time.
This process is called "Bitcoin Halving" and every ~4 years rewards are cut by half.
π Bitcoin Halving Dates
Year | Block Height | Block reward |
---|---|---|
2009 | 1 | 50 BTC |
2012 | 210,000 | 25 BTC |
2016 | 420,000 | 12.5 BTC |
π© 2020 (We're here) | 630,000 | 6.25 BTC |
2024 | 840,000 | 3.125 BTC |
It means that (as of February 2022) miners earn 6.25 BTC for every mined block and that by 2024 they will be making only 3.125 BTC.
Note that rewards are created from nothing, analogous to some central authority printing money.
This process goes on until the rewards become 0 (around the year 2140). We say Bitcoin is a deflationary currency because it has a maximum limit of bitcoins that can be generated (a total of 21 million) enforced by its algorithm.
π± Transaction fees
Transaction fees are another incentive for miners to do their job. For every transaction, you include some extra amount so the miner can take it as his fee.
Take as an example John and Mary:
- John wants to send 0.050 BTC to Mary
- John sends 0.051 BTC where only 0.050 BTC is addressed to Mary
- A miner receives John's transaction and notices 0.001 BTC is unaddressed
- The miner takes 0.001 BTC as his fee and proceeds to transfer 0.050 BTC to Mary
- If the miner mines it, he makes an additional 6.25 BTC as a reward
πβοΈ Proof of Work with Python Example
Enough theory already. Let's see some code.
What deserves your attention from our Entities here is Block's nonce
, difficulty
, and previous_hash
properties since most of the Transaction
complexity will be abstracted in this article.
from dataclasses import dataclass
@dataclass
class Transaction:
sender: str
receiver: str
amount: float
@dataclass
class Block:
version: int
timestamp: float
nonce: int
difficulty: int
previous_hash: str
transactions: list[Transaction]
Nonce
means N
umber only used once
and that's the magic number every computer is fighting to find. They pick any transactions they want from a pool and then compete to decide who finds the nonce, so they can mine the block, and receive the reward of doing so.
Difficulty
is the value defined by the network that makes it harder (or easier under certain circumstances) to find a valid nonce
.
Previous Hash
simply points to the previous block, so you can't move blocks out of order, making it a chain of sequential blocks.
Note that if ANY information inside a block changes it will produce a completely different hash.
With all the block's properties (version, timestamp*ΒΉ, previous hash, transactions*Β², difficulty*ΒΉ, and nonce*ΒΉ) you can calculate the block hash.
*1 All these fields are required for the mining process to work.
*2 The transactions field in this article is a simplification of the merkle root
, we can discuss it more in-depth over the next articles if you guys show interest.
But hey, it's starting to get boring again πͺ.
Let's try it together so we can have some fun.
Let's pretend we're willing to add a new block to the Blockchain after the block 68ffd1
, which means we need to define the hash of our new block.
import json
from hashlib import sha256
from datetime import datetime
from dataclasses import asdict
HEX_BASE_TO_NUMBER = 16
DIFFICULTY = 1
PREV_BLOCK_HASH = "68ffd13b24f9d73399a80aad9de06f676001fed56648314526cd23a4d18fef16"
TRANSACTIONS = (
Transaction("sender1", "receiver1", 1),
Transaction("sender2", "receiver2", 0.5),
Transaction("sender3", "receiver3", 2.7),
)
def create_block(nonce: int, difficulty: int, previous_hash=PREV_BLOCK_HASH) -> Block:
cur_timestamp = datetime.now().timestamp()
return Block(
version=1,
timestamp=cur_timestamp,
nonce=nonce,
previous_hash=previous_hash,
difficulty=difficulty,
transactions=TRANSACTIONS
)
def encode_block(block: Block) -> str:
encoded_block = json.dumps(asdict(block)).encode()
return sha256(sha256(encoded_block).digest()).hexdigest()
def calculate_hash(nonce: int) -> str:
block = create_block(nonce, DIFFICULTY)
encoded_block_hash = encode_block(block)
return encoded_block_hash
result = calculate_hash(nonce=1)
print("Block hash as hex: ", result)
print("Block hash as number: {:,}".format(int(result, HEX_BASE_TO_NUMBER)))
Please, run the code above and you're going to receive as output the block hash like:
9caed6c391d31790d8409804ae3cb41b19951b6f76fa21958b2be35dab38ed09
and the respective decimal number like:
70,869,718,014,525,929,260,730,978,
743,189,498,965,715,300,689,702,298,
738,657,158,148,123,721,788,681
Don't be afraid of running it many times. Every run will produce a different output, and that's because the variables (i.e. timestamp) change every second!.
πͺ What is Proof of Work and Blockchain mining?
The code you run above was quite fast, and that's a problem.
The Bitcoin network enforces blocks to be mined every ~10 minutes otherwise we would mine too many blocks which would produce too many bitcoins as rewards.
Blockchain ensures it won't be a problem by adjusting the difficulty
field with bits.
Having a difficulty
set to 15 bits generates a difficulty target
of 3,533,694,129,556,768,659,166,
595,001,485,837,031,654,967,793,
751,237,916,243,212,402,585,239,552
.
We can represent this calculation as:
BYTE_IN_BITS = 256
DIFFICULTY = 15 # π Play with this number! Make it bigger or smaller
def calculate_difficulty_target(difficulty_bits: int) -> int:
return 2 ** (BYTE_IN_BITS - difficulty_bits)
decimal_target = calculate_difficulty_target(DIFFICULTY)
print("β Any mined block hash needs to be lesser than: {:,}".format(decimal_target))
Such values restrain miners from mining blocks too quickly. They need to keep mining and changing the nonce
until they get a block hash that is lesser than the difficulty target. And this often takes around 10 minutes.
Try yourself with Python. See how long it takes for you to find a random hash with a simplified variation (no timestamp, no Merkle tree, etc) that is lesser than the difficulty target:
import time
from datetime import timedelta, datetime
from dataclasses import asdict
BYTE_IN_BITS = 256
HEX_BASE_TO_NUMBER = 16
SECONDS_TO_EXPIRE = 20
DIFFICULTY = 10 # π Play with this number! Make it bigger or smaller
def calculate_difficulty_target(difficulty_bits: int) -> int:
return 2 ** (BYTE_IN_BITS - difficulty_bits)
decimal_target = calculate_difficulty_target(DIFFICULTY)
def mine_proof_of_work(nonce: int) -> tuple[bool, Block]:
block = create_block(nonce, DIFFICULTY)
encoded_block = encode_block(block)
block_encoded_as_number = int(encoded_block, HEX_BASE_TO_NUMBER)
if block_encoded_as_number < decimal_target:
return True, block
return False, block
print("β Any mined block hash needs to be lesser than: {:,}".format(decimal_target))
nonce = 0
start_time = time.time()
found = False
while not found:
found, block = mine_proof_of_work(nonce)
if not found:
print(f"β Nonce {nonce} didn't meet the difficulty target...")
nonce += 1
end_time = time.time()
elapsed_time = end_time - start_time
if elapsed_time > SECONDS_TO_EXPIRE:
raise TimeoutError(
f"Couldn't find a block with difficulty {DIFFICULTY} fast enough"
)
print(f"β
Nonce {nonce} meet difficulty target, you mined the block in {timedelta(seconds=elapsed_time)}!")
print()
print("Here's the winning block: π§±")
print(asdict(block))
You may notice it takes longer as you increase the difficulty bits, that's why this process is called "Proof of Work".
It's easier to validate the block, but hard to find the winning nonce! If node A receives a correct block hash, we know for sure that node A did the work (thus proof of work!).
Also, every node is frequently adjusting the difficulty to ensure blocks will take ~10 minutes to be mined.
If it's too slow π’, then the network decreases π the difficulty like when China banned Bitcoin mining.
The reverse is also true.
If it's too fast π¨, then the network increases π the difficulty.
You can see the difficulty over time here: https://www.blockchain.com/charts/difficulty.
A simple version of a blockchain could be represented as:
import time
from datetime import timedelta, datetime
BYTE_IN_BITS = 256
HEX_BASE_TO_NUMBER = 16
SECONDS_TO_EXPIRE = 20
GENESIS_BLOCK = Block(
version=1,
timestamp=1231476865,
difficulty=1,
nonce=1,
previous_hash="000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f",
transactions=(Transaction("Satoshi Nakamoto", "Satoshi Nakamoto", 50),)
)
def calculate_difficulty_target(difficulty_bits: int) -> int:
return 2 ** (BYTE_IN_BITS - difficulty_bits)
class Blockchain:
VERSION = 1
DIFFICULTY = 10
MINUTES_TOLERANCE = 1
def __init__(self):
self.chain = [GENESIS_BLOCK]
def get_last_block(self) -> Block:
return self.chain[-1]
def get_difficulty(self) -> int:
return self.DIFFICULTY
def add_block(self, block: Block) -> bool:
is_valid = self._validate_block(block)
if is_valid:
self.chain.append(block)
return is_valid
def _validate_block(self, candidate: Block) -> bool:
if candidate.version != self.VERSION:
return False
last_block = self.get_last_block()
if candidate.previous_hash != encode_block(last_block):
return False
if candidate.difficulty != self.DIFFICULTY:
return False
min_allowed_time = (datetime.now() - timedelta(minutes=self.MINUTES_TOLERANCE)).timestamp()
if candidate.timestamp < min_allowed_time:
return False
candidate_hash = encode_block(candidate)
candidate_decimal = int(candidate_hash, HEX_BASE_TO_NUMBER)
target = calculate_difficulty_target(self.DIFFICULTY)
is_block_valid = candidate_decimal < target
return is_block_valid
blockchain = Blockchain()
def mine_proof_of_work(nonce: int, difficulty: int, prev_hash: str) -> tuple[bool, Block]:
block = create_block(nonce, difficulty, prev_hash)
encoded_block = encode_block(block)
block_encoded_as_number = int(encoded_block, HEX_BASE_TO_NUMBER)
decimal_target = calculate_difficulty_target(difficulty)
if block_encoded_as_number < decimal_target:
return True, block
return False, block
nonce = 0
start_time = time.time()
found = False
prev_hash = encode_block(blockchain.get_last_block())
while not found:
found, block = mine_proof_of_work(nonce, blockchain.get_difficulty(), prev_hash)
if found:
blockchain.add_block(block)
else:
print(f"β Nonce {nonce} didn't meet the difficulty target...")
nonce += 1
end_time = time.time()
elapsed_time = end_time - start_time
if elapsed_time > SECONDS_TO_EXPIRE:
raise TimeoutError(
f"Couldn't find a block with difficulty {blockchain.get_difficulty()} fast enough"
)
print(
f"β
Nonce {nonce} meet difficulty target, you mined the block in {timedelta(seconds=elapsed_time)}!"
)
print("Here's the the blockchain: βοΈ")
print(blockchain.chain)
π What is a blockchain fork?
Notice that every single miner is trying to find the next block with the winning nonce, but only one can earn it, and it's whoever finds it first!
Consider 3 miners trying to find the block 504:
That's why it's so important to have the most potent hardware, so you can find it faster!
If miner B finds it, it quickly propagates block 504 to the whole network, which is an implicit way of telling your fellow miners: "Sorry, you lost".
All miners then quickly attempt to mine the next one (In our example: 505).
But hey, if it's a competition, what would happen if two nodes find the same block at the same time?
We call this event fork (see the picture below to understand why). Now it gets truly fun, they're competing for the reward and no other node in the network knows for sure who will win!
Note that Miner B doesn't care who wins... Actually, other miners don't care either (yes, they're selfish) so whoever adds the new block to any path will persist.
This leads to another blockchain rule: The longest valid path is the source of truth!
A few nodes will try to append a new Block 506 after Block 505-C.
Other nodes will try to append a new Block 506 after Block 505-A.
The winning path becomes the valid chain. The losing group discards their old chain and moves into mining the next block.
All nodes that believed 505-A had a chance just lost their time and energy. As soon as Block 506 is added on top of 505-C it then becomes the valid chain.
Yes, blockchains can be cold like that. No mercy.
β Blockchain confirmations
That's why people often wait for at least 2 confirmation blocks to validate a Bitcoin transaction (like Coinbase shows a transaction between Pending and Complete).
They want to ensure that your transaction won't be reverted due to some temporary fork.
π₯£ What are Mining Pools?
The blockchain world is cold and painful out there, miners need emotional support to face such hard moments.
Everything isn't about just competing, miners can team up as well! We call "Mining Pools" when a group of miners teams up to find the winning nonce and they all split the rewards between themselves!
Now instead of individual miners competing against each other, we have big mining pools competing against each other.
After all, it's hard for a few miners to compete against big mining farms:
You can see some mining pools for Bitcoin here: https://btc.com/stats/pool
Joining a big mining pool means you have more power to find blocks faster, which also means you need to split your rewards with more miners.
I truly hope you enjoyed reading this far!
Most concepts and learnings were extracted from the book Mastering Bitcoin.
I'm on the journey of making learning fun and interactive for everybody.
If you enjoyed reading this and if you're curious to understand how transactions can be validated in the network, let me know. I gauge interest to decide on what to write next.