Building a Blockchain with Python πŸβ›“οΈ

Always asked yourself how blockchain works? Well, ask no more. We’re going to implement a simple version of a Blockchain written in Python interactively. πŸ‘ˆ It means that you will be able to run the code straight from your browser.

With just enough theory and a high focus on practice.

Note that if you're reading this article in AMP mode or from mobile you won't be able to run Python code from your browser, but you can still see the code samples and illustrations.

Example of executing Python interactively

The purpose here is to give you β€œenough” theory so it doesn’t get boring and so we take baby steps towards learning about blockchains.

⛓️ What is a blockchain?

Blockchain is the technology that powers Bitcoin, Ethereum, and Crypto. It’s a public ledger that guarantees information is decentralized.

For Bitcoin it means transactions are records and that each record is compliant and secure in the ledger.

It literally manages a chain of blocks, and each block contains many transactions.

A chain of blocks

Blocks can be identified by two means: height or hash. The hash is long and complex while height is quite easy since it's just the block number in the chain.

In our example above it's fine to say that the block of height 1 is also the block of hash 6b86.

The first block ever is called the Genesis Block 🌱, it's a special block with a single transaction (50 BTC to Satoshi Nakamoto).

What makes it special is that it's not tied to any real previous block, that's the only and unique exception to the Blockchain.

Since the blockchain is public and open, you can check and see the genesis block here (or any other block really).

I'm going to abstract how transactions are validated, so we can focus on the blockchain itself. We're assuming that every transaction is legit and needs no more than "from who", "to who" and "how much".

If you're interested in knowing more, I'll be happy to make a part two, and implement transaction details, validation, and some cryptography (yet again, interactively!).

Here're the entities Block and Transaction we're creating for the scope of this article:

Blockchain EntitiesΒ 
See as Python Code 🐍
from dataclasses import dataclass


@dataclass
class Transaction:
  sender: str
  receiver: str
  amount: float


@dataclass
class Block:
  version: int
  timestamp: float
  nonce: int
  difficulty: int
  previous_hash: str
  transactions: list[Transaction]

⛏️ What is Bitcoin mining?

Bitcoin mining is what keeps the network consistent and working. That's what allows people to send and receive bitcoins.

Let's analyze first how the economy works with banks.

A central authority is in charge of validating and moving money

Inevitably there's a central authority (the bank) that makes things happen:

  • Validates you have enough balance to send money;
  • Moves money between accounts;
  • Attempt to detect frauds and block them;

Since Bitcoin proposes to be completely decentralized (thus we can't have a single central authority) it needs to create a way so people collaborate and replace the need for banks.

In simple words: "Mining" is the process that allows people to use bitcoin as a decentralized currency and that incentivizes people to collaborate.

Decentralized authorities communicate between themselves to validate

Instead of a single entity, we have many different entities that communicate between themselves, they're in charge of checking whether John has enough balance, and if he does, they're in charge of moving the money straight to Mary.

πŸ”’ Why Blockchain and Bitcoin are secure?

Ok, let's be honest here. If you gave me $10,000 to deliver it to your friend

would you trust me?

Include in this question many different people I don't know, and an even bigger amount of money and this question gets even more obvious.

Me not trusting you

If a malicious miner is trying to insert invalid blocks or even remove a block from the Blockchain, he's rejected by all due to the consensus of the network! Attempting to modify the Blockchain is a waste of time and effort.

Bad actors are inevitably isolated from the Blockchain

That's what makes Bitcoin and Blockchain an awesome combination. It forces miners to be honest and (implicitly) punishes those who aren't.

πŸ’° Why miners help the Blockchain

But still, how the hell can you get people to contribute? I mean... Why would you do all this work? Why people would want to help?

For a single reason:

Money

Money!

And the money comes from 2 sources:

  • Bitcoin mining rewards
  • Transaction fees

Every mined block automatically rewards the miner with some bitcoin. The number of Bitcoins earned decreases over time.

This process is called "Bitcoin Halving" and every ~4 years rewards are cut by half.

πŸ“† Bitcoin Halving Dates

Year Block Height Block reward
2009 1 50 BTC
2012 210,000 25 BTC
2016 420,000 12.5 BTC
🚩 2020 (We're here) 630,000 6.25 BTC
2024 840,000 3.125 BTC

It means that (as of February 2022) miners earn 6.25 BTC for every mined block and that by 2024 they will be making only 3.125 BTC.

Note that rewards are created from nothing, analogous to some central authority printing money.

This process goes on until the rewards become 0 (around the year 2140). We say Bitcoin is a deflationary currency because it has a maximum limit of bitcoins that can be generated (a total of 21 million) enforced by its algorithm.

πŸ’± Transaction fees

Transaction fees are another incentive for miners to do their job. For every transaction, you include some extra amount so the miner can take it as his fee.

Take as an example John and Mary:

  • John wants to send 0.050 BTC to Mary
  • John sends 0.051 BTC where only 0.050 BTC is addressed to Mary
  • A miner receives John's transaction and notices 0.001 BTC is unaddressed
  • The miner takes 0.001 BTC as his fee and proceeds to transfer 0.050 BTC to Mary
  • If the miner mines it, he makes an additional 6.25 BTC as a reward
John sends 0.051, the Miner can take 0.001 as fee + the block reward

πŸβ›οΈ Proof of Work with Python Example

Enough theory already. Let's see some code.

What deserves your attention from our Entities here is Block's nonce, difficulty, and previous_hash properties since most of the Transaction complexity will be abstracted in this article.

See as Python Code 🐍
from dataclasses import dataclass


@dataclass
class Transaction:
  sender: str
  receiver: str
  amount: float


@dataclass
class Block:
  version: int
  timestamp: float
  nonce: int
  difficulty: int
  previous_hash: str
  transactions: list[Transaction]

Nonce means Number only used once and that's the magic number every computer is fighting to find. They pick any transactions they want from a pool and then compete to decide who finds the nonce, so they can mine the block, and receive the reward of doing so.

Difficulty is the value defined by the network that makes it harder (or easier under certain circumstances) to find a valid nonce.

Previous Hash simply points to the previous block, so you can't move blocks out of order, making it a chain of sequential blocks.

Note that if ANY information inside a block changes it will produce a completely different hash.

With all the block's properties (version, timestamp*ΒΉ, previous hash, transactions*Β², difficulty*ΒΉ, and nonce*ΒΉ) you can calculate the block hash.

*1 All these fields are required for the mining process to work.

*2 The transactions field in this article is a simplification of the merkle root, we can discuss it more in-depth over the next articles if you guys show interest.

But hey, it's starting to get boring again πŸ˜ͺ.

Let's try it together so we can have some fun.

Let's pretend we're willing to add a new block to the Blockchain after the block 68ffd1, which means we need to define the hash of our new block.

Appending block after 503
import json
from hashlib import sha256
from datetime import datetime
from dataclasses import asdict

HEX_BASE_TO_NUMBER = 16
DIFFICULTY = 1
PREV_BLOCK_HASH = "68ffd13b24f9d73399a80aad9de06f676001fed56648314526cd23a4d18fef16"
TRANSACTIONS = (
    Transaction("sender1", "receiver1", 1),
    Transaction("sender2", "receiver2", 0.5),
    Transaction("sender3", "receiver3", 2.7),
)

def create_block(nonce: int, difficulty: int, previous_hash=PREV_BLOCK_HASH) -> Block:
    cur_timestamp = datetime.now().timestamp()
    return Block(
        version=1,
        timestamp=cur_timestamp,
        nonce=nonce,
        previous_hash=previous_hash,
        difficulty=difficulty,
        transactions=TRANSACTIONS
    )

def encode_block(block: Block) -> str:
    encoded_block = json.dumps(asdict(block)).encode()
    return sha256(sha256(encoded_block).digest()).hexdigest()

def calculate_hash(nonce: int) -> str:
    block = create_block(nonce, DIFFICULTY)
    encoded_block_hash = encode_block(block)
    return encoded_block_hash


result = calculate_hash(nonce=1)
print("Block hash as hex:   ", result)
print("Block hash as number: {:,}".format(int(result, HEX_BASE_TO_NUMBER)))

Please, run the code above and you're going to receive as output the block hash like:

9caed6c391d31790d8409804ae3cb41b19951b6f76fa21958b2be35dab38ed09

and the respective decimal number like:

70,869,718,014,525,929,260,730,978,
743,189,498,965,715,300,689,702,298,
738,657,158,148,123,721,788,681

Don't be afraid of running it many times. Every run will produce a different output, and that's because the variables (i.e. timestamp) change every second!.

πŸ’ͺ What is Proof of Work and Blockchain mining?

The code you run above was quite fast, and that's a problem.

The Bitcoin network enforces blocks to be mined every ~10 minutes otherwise we would mine too many blocks which would produce too many bitcoins as rewards.

Blockchain ensures it won't be a problem by adjusting the difficulty field with bits.

Having a difficulty set to 15 bits generates a difficulty target of 3,533,694,129,556,768,659,166,
595,001,485,837,031,654,967,793,
751,237,916,243,212,402,585,239,552.

We can represent this calculation as:

BYTE_IN_BITS = 256
DIFFICULTY = 15  # πŸ‘ˆ Play with this number! Make it bigger or smaller

def calculate_difficulty_target(difficulty_bits: int) -> int:
    return 2 ** (BYTE_IN_BITS - difficulty_bits)


decimal_target = calculate_difficulty_target(DIFFICULTY)

print("❗ Any mined block hash needs to be lesser than: {:,}".format(decimal_target))

Such values restrain miners from mining blocks too quickly. They need to keep mining and changing the nonce until they get a block hash that is lesser than the difficulty target. And this often takes around 10 minutes.

The higher the difficulty the longer it takes to mine a blockΒ 

Try yourself with Python. See how long it takes for you to find a random hash with a simplified variation (no timestamp, no Merkle tree, etc) that is lesser than the difficulty target:

import time
from datetime import timedelta, datetime
from dataclasses import asdict


BYTE_IN_BITS = 256
HEX_BASE_TO_NUMBER = 16
SECONDS_TO_EXPIRE = 20
DIFFICULTY = 10  # πŸ‘ˆ Play with this number! Make it bigger or smaller


def calculate_difficulty_target(difficulty_bits: int) -> int:
    return 2 ** (BYTE_IN_BITS - difficulty_bits)


decimal_target = calculate_difficulty_target(DIFFICULTY)


def mine_proof_of_work(nonce: int) -> tuple[bool, Block]:
    block = create_block(nonce, DIFFICULTY)
    encoded_block = encode_block(block)
    block_encoded_as_number = int(encoded_block, HEX_BASE_TO_NUMBER)

    if block_encoded_as_number < decimal_target:
        return True, block

    return False, block


print("❗ Any mined block hash needs to be lesser than: {:,}".format(decimal_target))

nonce = 0
start_time = time.time()
found = False

while not found:
    found, block = mine_proof_of_work(nonce)

    if not found:
        print(f"❌ Nonce {nonce} didn't meet the difficulty target...")
        nonce += 1

    end_time = time.time()
    elapsed_time = end_time - start_time

    if elapsed_time > SECONDS_TO_EXPIRE:
        raise TimeoutError(
            f"Couldn't find a block with difficulty {DIFFICULTY} fast enough"
        )

print(f"βœ… Nonce {nonce} meet difficulty target, you mined the block in {timedelta(seconds=elapsed_time)}!")
print()
print("Here's the winning block: 🧱")
print(asdict(block))

You may notice it takes longer as you increase the difficulty bits, that's why this process is called "Proof of Work".

It's easier to validate the block, but hard to find the winning nonce! If node A receives a correct block hash, we know for sure that node A did the work (thus proof of work!).

Also, every node is frequently adjusting the difficulty to ensure blocks will take ~10 minutes to be mined.

If it's too slow 🐒, then the network decreases πŸ“‰ the difficulty like when China banned Bitcoin mining.

The reverse is also true.

If it's too fast πŸ’¨, then the network increases πŸ“ˆ the difficulty.

You can see the difficulty over time here: https://www.blockchain.com/charts/difficulty.

A simple version of a blockchain could be represented as:

import time
from datetime import timedelta, datetime


BYTE_IN_BITS = 256
HEX_BASE_TO_NUMBER = 16
SECONDS_TO_EXPIRE = 20


GENESIS_BLOCK = Block(
    version=1,
    timestamp=1231476865,
    difficulty=1,
    nonce=1,
    previous_hash="000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f",
    transactions=(Transaction("Satoshi Nakamoto", "Satoshi Nakamoto", 50),)
)


def calculate_difficulty_target(difficulty_bits: int) -> int:
    return 2 ** (BYTE_IN_BITS - difficulty_bits)


class Blockchain:
    VERSION = 1
    DIFFICULTY = 10
    MINUTES_TOLERANCE = 1

    def __init__(self):
        self.chain = [GENESIS_BLOCK]

    def get_last_block(self) -> Block:
        return self.chain[-1]

    def get_difficulty(self) -> int:
        return self.DIFFICULTY

    def add_block(self, block: Block) -> bool:
        is_valid = self._validate_block(block)
        if is_valid:
            self.chain.append(block)

        return is_valid

    def _validate_block(self, candidate: Block) -> bool:
        if candidate.version != self.VERSION:
            return False

        last_block = self.get_last_block()
        if candidate.previous_hash != encode_block(last_block):
            return False
        
        if candidate.difficulty != self.DIFFICULTY:
            return False

        min_allowed_time = (datetime.now() - timedelta(minutes=self.MINUTES_TOLERANCE)).timestamp()
        if candidate.timestamp < min_allowed_time:
            return False

        candidate_hash = encode_block(candidate)
        candidate_decimal = int(candidate_hash, HEX_BASE_TO_NUMBER)

        target = calculate_difficulty_target(self.DIFFICULTY)
        is_block_valid = candidate_decimal < target

        return is_block_valid


blockchain = Blockchain()


def mine_proof_of_work(nonce: int, difficulty: int, prev_hash: str) -> tuple[bool, Block]:
    block = create_block(nonce, difficulty, prev_hash)
    encoded_block = encode_block(block)
    block_encoded_as_number = int(encoded_block, HEX_BASE_TO_NUMBER)
    decimal_target = calculate_difficulty_target(difficulty)

    if block_encoded_as_number < decimal_target:
        return True, block

    return False, block


nonce = 0
start_time = time.time()
found = False
prev_hash = encode_block(blockchain.get_last_block())

while not found:
    found, block = mine_proof_of_work(nonce, blockchain.get_difficulty(), prev_hash)

    if found:
        blockchain.add_block(block)
    else:
        print(f"❌ Nonce {nonce} didn't meet the difficulty target...")
        nonce += 1

    end_time = time.time()
    elapsed_time = end_time - start_time

    if elapsed_time > SECONDS_TO_EXPIRE:
        raise TimeoutError(
            f"Couldn't find a block with difficulty {blockchain.get_difficulty()} fast enough"
        )

print(
    f"βœ… Nonce {nonce} meet difficulty target, you mined the block in {timedelta(seconds=elapsed_time)}!"
)
print("Here's the the blockchain: ⛓️")
print(blockchain.chain)

πŸ† What is a blockchain fork?

Notice that every single miner is trying to find the next block with the winning nonce, but only one can earn it, and it's whoever finds it first!

Consider 3 miners trying to find the block 504:

3 miners struggling to find the winning Nonce for block 504

That's why it's so important to have the most potent hardware, so you can find it faster!

If miner B finds it, it quickly propagates block 504 to the whole network, which is an implicit way of telling your fellow miners: "Sorry, you lost".

All miners then quickly attempt to mine the next one (In our example: 505).

Miner B found the nonce, Miners A and C keep trying the next block

But hey, if it's a competition, what would happen if two nodes find the same block at the same time?

We call this event fork (see the picture below to understand why). Now it gets truly fun, they're competing for the reward and no other node in the network knows for sure who will win!

Mining fork caused by Miners A and C

Note that Miner B doesn't care who wins... Actually, other miners don't care either (yes, they're selfish) so whoever adds the new block to any path will persist.

This leads to another blockchain rule: The longest valid path is the source of truth!

A few nodes will try to append a new Block 506 after Block 505-C.

Other nodes will try to append a new Block 506 after Block 505-A.

The winning path becomes the valid chain. The losing group discards their old chain and moves into mining the next block.

Eventually, the fork is dissolved and the longest path wins

All nodes that believed 505-A had a chance just lost their time and energy. As soon as Block 506 is added on top of 505-C it then becomes the valid chain.

Yes, blockchains can be cold like that. No mercy.

βœ… Blockchain confirmations

That's why people often wait for at least 2 confirmation blocks to validate a Bitcoin transaction (like Coinbase shows a transaction between Pending and Complete).

They want to ensure that your transaction won't be reverted due to some temporary fork.

πŸ₯£ What are Mining Pools?

The blockchain world is cold and painful out there, miners need emotional support to face such hard moments.

Everything isn't about just competing, miners can team up as well! We call "Mining Pools" when a group of miners teams up to find the winning nonce and they all split the rewards between themselves!

Now instead of individual miners competing against each other, we have big mining pools competing against each other.

After all, it's hard for a few miners to compete against big mining farms:

Mining farm

You can see some mining pools for Bitcoin here: https://btc.com/stats/pool

Joining a big mining pool means you have more power to find blocks faster, which also means you need to split your rewards with more miners.

Extracted from https://btc.com/stats/pool

I truly hope you enjoyed reading this far!

Most concepts and learnings were extracted from the book Mastering Bitcoin.

I'm on the journey of making learning fun and interactive for everybody.

If you enjoyed reading this and if you're curious to understand how transactions can be validated in the network, let me know. I gauge interest to decide on what to write next.