Abstract your code

Implementation abstraction makes your code flexible and decoupled from vendors or hard implementations, and finally, it's quite easy to follow, yet is constantly ignored.

This post would fit perfectly in a series named “Coding Practices that should be obvious, but for some unknown reason aren't”.

You might think that I'm about to mention interfaces, but not really. Python has no "interface" and you can still follow the principle to make your life (and hopefully yours colleagues too) easier.

Every time that you write a piece of code that is very specific on the “how-to” and not the “what-to” you're cursed with code hard to change and evolve (sometimes called “legacy code”).

Over time I noticed this simple concept being ignored over and over again, people leaking details when:

  • “querying database” ❌ instead of querying for an entity
  • “reading stream from Kafka” ❌ instead of listening to events
  • “popping SQS messages” ❌ instead of receiving messages
  • “pushing a function to celery” ❌ instead of scheduling a job

I'm about to show you one obvious example of where it might happen, and how to prevent it.

Just think about a Flask API that allows users to upload purchase receipts to store them in the cloud.

Flask API allows uploading files to S3

Here's the code split into two files main.py and s3.py, let's see them:

# Located under: s3.py
class S3Storage:
    def store(self, bucket: str, key: str, file: bytes):
        # Implementation details of using botocore to upload bytes to S3
        ...

    def retrieve_obj(self, bucket: str, key: str) -> S3Object:
        # Note it returns a specific "S3 Object" 👆
        # Implementation details of using botocore to read data from S3
        ...
The initial version of s3.py
# Located under: main.py
from s3 import S3Storage
...

s3_storage = S3Storage()
s3_bucket = "purchase-receipts"
s3_key = "submissions/receipt.jpeg"


@app.route('/purchase-receipts', methods=['POST'])
def submit_receipt(receipt_upload: FileStorage):
    s3_content = receipt_upload.read()
    s3_storage.store(s3_bucket, s3_key, s3_content)

    return "OK"


@app.route('/purchase-receipts')
def get_receipt():
    s3_obj = s3_storage.retrieve_obj(s3_bucket, s3_key)
    content = s3_obj.response['Body'].read()

    return content
The initial version of main.py

What annoys me most is that the implementation details are leaked all over the place to the caller. The way we defined our "Storage" class makes it impossible for the client to don't be overloaded with AWS terms: "Bucket name", "Key", S3Storage class, and finally the return type S3Object makes our code completely coupled to the implementation details!

Your code should only be tied to the business purpose.

💦 Leaking implementation details is bad

To make this statement obvious let's say that you recently received a lifetime 80% discount to use GCP to store such data, you pitched it to your team, and it sounds like a decent discount for the long term.

The appropriate solution that I'd expect would be to just replace the S3Storage implementation and don't even touch the main.py file that contains our API code.

# Located under: s3.py
class S3Storage:
    def store(self, bucket: str, key: str, file: bytes):
        # Implementation details of using botocore to upload bytes to S3
        ...

    def retrieve_obj(self, bucket: str, key: str) -> S3Object:
        # Implementation details of using botocore to read data from S3
        ...

# you probably would need to rename the file as well to gcp.py for consistency
# 👇 Would become something like this
class GCPStorage:
    def store(self, bucket: str, blob_name: str, blob: bytes):
        # Implementation details of using gcp sdk to upload bytes to GCP
        ...

    def retrieve_obj(self, bucket: str, key: str) -> GCPObject:
        # Implementation details of using gcp sdk to read data from GCP
        ...
The second version of s3.py (or maybe gcp.py?)

Unfortunately, just modifying this file won't be enough because we're leaking all these details to main.py which forces us to rename variables and make the change bigger and the review process longer and more error-prone.

Business tightly tied to the implementation details

🎯 Focus on the business value

This problem gets easier when we focus on what problem we're solving regardless of how.

Let's rewrite the code, but this time, focusing on the problem.

# Now located under: storages.py
class PurchaseReceiptsStorage:  # 👈 Do not leak "HOW" we're storing it
    def store(self, filename: str, file: bytes):  # 👈 Only asks for the absolutely necessary
        # Implementation details for either S3, GCP or whatever
        ...

    def retrieve_receipt(self, filename: str) -> bytes:  # 👈 Only asks for the absolutely necessary
        # Return some data type that can be easily consumed by the client regardless of details 👆
        # Implementation details for either S3, GCP or whatever
        ...
Final conversion from s3.py to storages.py
from storages import PurchaseReceiptsStorage  # 👈 Changes shouldn't modify this

storage = PurchaseReceiptsStorage()  # 👈 Business value clear
filename = "receipt.jpeg"

@app.route('/purchase-receipts', methods=['POST'])
def submit_receipt(receipt_upload: FileStorage):
    content = receipt_upload.read()
    storage.store(filename, content)  # 👈 Things that are expected for the caller to know

    return "OK"


@app.route('/purchase-receipts')
def get_receipt():
    content = storage.retrieve_receipt(filename)  # 👈 Returns a known common type not tied to any lib

    return content
main.py file focused on the business

Any change or maintenance that you need to do over the new PurchaseReceiptsStorage is pretty much agnostic. We're just storing a file, the client shouldn't care how. It's none of his business.

Business value free from annoying details
  • For storing we only provide a default filename, and the content itself.
  • For reading, we just specify the filename.
  • It's expected for the storage class to internally understand that "Since I manage the storage of receipts, I know which bucket, cloud, API, etc to use".