Abstract your code
Abstracting the implementation makes your code flexible and decoupled from vendors and hard-wired implementations. The principle is quite easy to follow, yet it is constantly ignored.
This post would fit perfectly in a series named “Coding Practices that should be obvious, but for some unknown reason aren't”.
You might think that I'm about to mention interfaces, but not really. Python has no "interface" construct, and you can still follow the principle to make your life (and hopefully your colleagues' too) easier.
Every time you write a piece of code that is very specific about the “how-to” rather than the “what-to”, you're cursed with code that is hard to change and evolve (sometimes called “legacy code”).
Over time I've noticed this simple concept being ignored over and over again, with people leaking details when:
- “querying database” ❌ instead of querying for an entity
- “reading stream from Kafka” ❌ instead of listening to events
- “popping SQS messages” ❌ instead of receiving messages
- “pushing a function to Celery” ❌ instead of scheduling a job
I'm about to show you one obvious example of where it might happen, and how to prevent it.
Just think about a Flask API that allows users to upload purchase receipts to store them in the cloud.
Here's the code, split into two files, main.py and s3.py. Let's see them:
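As a rough sketch of the kind of code I mean (not the exact listing): everything here is an assumption made for illustration, including the method names upload and download, the bucket name, and the FakeS3Client, which is a hypothetical in-memory stand-in for boto3.client("s3") so the example runs offline. The Flask route is reduced to a plain function.

```python
# --- s3.py (sketch) ---
from dataclasses import dataclass


class FakeS3Client:
    """Hypothetical in-memory stand-in for boto3.client("s3")."""

    def __init__(self):
        self._objects = {}

    def put_object(self, Bucket, Key, Body):
        self._objects[(Bucket, Key)] = Body

    def get_object(self, Bucket, Key):
        # The real client returns a streaming body; plain bytes keep this simple.
        return {"Body": self._objects[(Bucket, Key)]}


@dataclass
class S3Object:
    bucket_name: str
    key: str
    body: bytes


class S3Storage:
    def __init__(self, client):
        self._client = client

    def upload(self, bucket_name, key, body):
        self._client.put_object(Bucket=bucket_name, Key=key, Body=body)
        return S3Object(bucket_name, key, body)

    def download(self, bucket_name, key):
        response = self._client.get_object(Bucket=bucket_name, Key=key)
        return S3Object(bucket_name, key, response["Body"])


# --- main.py (sketch) ---
RECEIPTS_BUCKET_NAME = "purchase-receipts"  # hypothetical bucket name

s3_storage = S3Storage(FakeS3Client())


def upload_receipt(user_id, filename, content):
    # The API layer has to know about buckets, keys, and S3Object.
    s3_key = f"{user_id}/{filename}"
    s3_object = s3_storage.upload(RECEIPTS_BUCKET_NAME, s3_key, content)
    return {"bucket": s3_object.bucket_name, "key": s3_object.key}
```

Notice how much AWS vocabulary ends up in upload_receipt, a function that should only care about receipts.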
What annoys me most is that the implementation details are leaked all over the place to the caller. The way we defined our "Storage" class makes it impossible for the client to avoid being overloaded with AWS terms: "Bucket name", "Key", the S3Storage class, and finally the return type S3Object. All of this makes our code completely coupled to the implementation details!
Your code should only be tied to the business purpose.
💦 Leaking implementation details is bad
To make this statement obvious, let's say that you recently received a lifetime 80% discount to use GCP to store such data. You pitched it to your team, and it sounds like a decent deal for the long term.
The solution I'd expect would be to just replace the S3Storage implementation without even touching the main.py file that contains our API code.
Unfortunately, modifying the storage implementation alone won't be enough, because we're leaking all these details to main.py. That forces us to rename variables there as well, which makes the change bigger and the review process longer and more error-prone.
🎯 Focus on the business value
This gets easier when we focus on what problem we're solving, regardless of how we solve it.
Let's rewrite the code, but this time, focusing on the problem.
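A minimal sketch of what the problem-focused version could look like. The method names store and read, the internal location prefix, and the InMemoryBackend (a hypothetical stand-in for whichever cloud SDK is really used) are my assumptions for illustration, and the Flask route is again reduced to a plain function.

```python
# --- receipts_storage.py (sketch) ---
class InMemoryBackend:
    """Hypothetical stand-in for the real cloud SDK, so the sketch runs offline."""

    def __init__(self):
        self._blobs = {}

    def write(self, path, content):
        self._blobs[path] = content

    def read(self, path):
        return self._blobs[path]


class PurchaseReceiptsStorage:
    # Bucket, container, or folder: callers never see this detail.
    _LOCATION = "purchase-receipts"

    def __init__(self, backend=None):
        self._backend = InMemoryBackend() if backend is None else backend

    def store(self, filename, content):
        """Store a receipt and return the filename it was stored under."""
        self._backend.write(f"{self._LOCATION}/{filename}", content)
        return filename

    def read(self, filename):
        """Return the content of a previously stored receipt."""
        return self._backend.read(f"{self._LOCATION}/{filename}")


# --- main.py (sketch) ---
receipts = PurchaseReceiptsStorage()


def upload_receipt(filename, content):
    # No buckets, keys, or vendor types: only receipts.
    stored_as = receipts.store(filename, content)
    return {"filename": stored_as}
```

The API code now talks only about filenames and content, so nothing in it hints at where the bytes end up.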
Any change or maintenance that you need to do on the new PurchaseReceiptsStorage stays hidden from its callers. We're just storing a file; the client shouldn't care how. It's none of their business.
- For storing, we only provide a default filename and the content itself.
- For reading, we just specify the filename.
- The storage class is expected to understand internally: "Since I manage the storage of receipts, I know which bucket, cloud, API, etc. to use".
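To see why this pays off when the GCP discount shows up, here is a small sketch with two interchangeable implementations. Both classes are hypothetical, with dictionaries standing in for the real vendors, and the store and read method names are my assumption. The caller works with either one unchanged, with no formal interface needed.

```python
class ReceiptsOnS3:
    """Hypothetical S3-flavoured implementation; a dict stands in for the bucket."""

    def __init__(self):
        self._bucket = {}

    def store(self, filename, content):
        self._bucket[f"receipts/{filename}"] = content
        return filename

    def read(self, filename):
        return self._bucket[f"receipts/{filename}"]


class ReceiptsOnGCS:
    """Hypothetical GCS-flavoured implementation; a dict stands in for the blobs."""

    def __init__(self):
        self._blobs = {}

    def store(self, filename, content):
        self._blobs[filename] = content
        return filename

    def read(self, filename):
        return self._blobs[filename]


def handle_upload(storage, filename, content):
    # Caller code: identical no matter which vendor sits behind `storage`.
    storage.store(filename, content)
    return storage.read(filename)


# The same caller runs against both implementations.
for storage in (ReceiptsOnS3(), ReceiptsOnGCS()):
    assert handle_upload(storage, "receipt.pdf", b"%PDF-1.7") == b"%PDF-1.7"
```

Swapping vendors becomes a change inside the storage module, and the review only has to look there.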