Organize Python code like a PRO 🐍📦

gui commited 3 years ago

For every minute spent in organizing, an hour is earned.
by Benjamin Franklin

Python is different from languages like C# or Java where they enforce you to have classes named after the file they live in.

So far Python is one of the most flexible languages I had contact with and everything too flexible enhances the odds of bad decisions.

Do you want to keep all project classes in a single main.py file? Yes, it works.
Do you need to read an os environment var? Just read it right there.
Do you need to modify a function behavior? Why not a decorator!?

Many decisions that are easy to implement may backfire producing code that is extremely hard to maintain.

This is not necessarily bad if you know what you're doing.

During this chapter, I'm going to present to you guidelines that worked for me over the past working in different companies and with many different people.

🌳 Structure your Python project

Let's focus first on directory structure, file naming, and module organization.

I recommend you to keep all your module files inside a src dir, and all tests living side by side with it:

Top-Level project

<project>
├── src
│   ├── <module>/*
│   │    ├── __init__.py
│   │    └── many_files.py
│   │
│   └── tests/*
│        └── many_tests.py
│
├── .gitignore
├── pyproject.toml
└── README.md

Where <module> is your main module. If in doubt, consider what people would pip install and how you would like to import module.

Frequently it has the same name as the top project. This isn't a rule though.

🎯 The reasoning behind a `src` directory

I've seen many projects doing differently.

Some variations include no src dir with all project modules around the tree.

This is quite annoying because of the lack of order, producing things like (example):

non_recommended_project
├── <module_a>/*
│     ├── __init__.py
│     └── many_files.py
│
├── .gitignore
│
├── tests/*
│    └── many_tests.py
│
├── pyproject.toml
│
├── <module_b>/*
│     ├── __init__.py
│     └── many_files.py
│
└── README.md

It's boring to have things so apart due to the alphabetical sorting of the IDE.

The main reason behind the src dir is to keep active project code concentrated inside a single directory while settings, CI/CD setup, and project metadata can reside outside of it.

The only drawback of doing it is that you can't import module_a in your python code out of the box. We need to set up the project to be installed under this repository. We're going to discuss how to solve this soon in this chapter.

🏷️ How to name files

Rule 1: There are no files

First of all, in Python there are no such things as "files" and I noticed this is the main source of confusion for beginners.

If you're inside a directory that contains any __init__.py it's a directory composed of modules, not files.

See each module as a namespace.

I mean namespace because you can't say for sure whether they have many functions, classes, or just constants. It can have virtually all of them or just a bunch of some.

Rule 2: Keep things together as needed

It’s fine to have several classes within a single module, and you should do so. (when classes are related to the module, obviously.)

Only break it down when your module gets too big, or when it handles different concerns.

Often, people think it’s a bad practice due to some experience with other languages that enforce the other way around (e.g. Java and C#).

Rule 3: By default give plural names

As a rule of thumb, name your modules in the plural and name them after a business context.

There're exceptions to this rule though! Modules can be named core, main.py, and similar to represent a single thing. Use your judgment, if in doubt stick to the plural rule.

🔎 Real-life example when naming modules

I'll share a Google Maps Crawler project that I built as an example.

This project is responsible for crawling data from Google Maps using Selenium and outputting it (Read more here if curious).

This is the current project tree outlining exceptions to the #3 rule:

gmaps_crawler
├── src
│   └── gmaps_crawler
│        ├── __init__.py
│        ├── config.py 👈 (Singular)
│        ├── drivers.py
│        ├── entities.py
│        ├── exceptions.py
│        ├── facades.py
│        ├── main.py  👈 (Singular)
│        └── storages.py
│
├── .gitignore
├── pyproject.toml
└── README.md

It seems very natural to import classes and functions like:

from gmaps_crawler.storages import get_storage
from gmaps_crawler.entities import Place
from gmaps_crawler.exceptions import CantEmitPlace

I can understand that I might have one or many exception classes inside exceptions and so on.

The beauty about having plural modules is that:

They're not too small (e.g. one per class)
You can at any moment break it down into smaller modules if required
They give you a strong sense of knowing what might exist inside

🔖 Naming classes, functions, and variables

Some people claim naming things is hard. It gets less hard when you define some guidelines.

👊 Functions and Methods should be verbs

Functions and methods represent an action or actionable stuff.

Something "isn't". Something is "happening".

Actions are clearly stated by verbs.

A few good examples from REAL projects I worked on before:

def get_orders():
    ...

def acknowledge_event():
    ...

def get_delivery_information():
    ...

def publish():
    ...

A few bad examples:

def email_send():
    ...

def api_call():
   ...

def specific_stuff():
   ...

They're a bit unclear whether they return an object to allow me to perform the API call or if it actually sends the email for example.

I can picture a scenario like this:

email_send.title = "title"
email_send.dispatch()

Example of a misleading function name

Exceptions to this rule are just a few but they exist.

Creating a main() function to be invoked in the main entry point of your application is a good reason to skip this rule.
Using @property to treat a class method as an attribute is also valid.

🐶 Variables and Constants should be nouns

Should always be nouns, never verbs (which clarifies the difference between functions).

Good examples:

plane = Plane()
customer_id = 5
KEY_COMPARISON = "abc"

Bad examples:

fly = Plane()
get_customer_id = 5
COMPARE_KEY = "abc"

If your variable/constant is a list or collection, make it plural!

planes: list[Plane] = [Plane()] # 👈 Even if it contains only one item
customer_ids: set[int] = {5, 12, 22}
KEY_MAP: dict[str, str] = {"123": "abc"} # 👈 Dicts are kept singular

🏛️ Classes should be self explanatory, but Suffixes are fine

Prefer classes with self explanatory names. It's fine to have suffixes like Service, Strategy, Middleware, but only when extremely necessary to make its purpose clear.

Always name it in singular instead of plural. Plural reminds us of collections (e.g. if I read orders I assume it's a list or iterable), so remind yourself that once a class is instantiated it becomes a single object.

Classes representing entities

Classes that represent things from the business context should be named as is (nouns!). Like Order, Sale, Store, Restaurant and so on.

Example of suffixes usage

Let’s consider you want to create a class responsible for sending emails. If you name it just as "Email", its purpose is not clear.

Someone might think it may represent an entity e.g.

email = Email() # inferred usage example
email.title = "Title"
email.body = create_body()
email.send_to = "guilatrova.dev"

send_email(email)

You should name it "EmailSender" or "EmailService".

🐪 Casing conventions

By default follow these naming conventions:

Type	Public	Internal
Packages (directories)	`lower_with_under`	-
Modules (files)	`lower_with_under.py`	-
Classes	`CapWords`	-
Functions and methods	`lower_with_under()`	`_lower_with_under()`
Constants	`ALL_CAPS_UNDER`	`_ALL_CAPS_UNDER`

⚠️ Disclaimer about """private""" methods.

Some people found out that if you have __method(self) (any method starting with two underscores) Python won't let outside classes/methods invoke it normally which leads them to think it's fine.

If you came from a C# environment like myself it might sound weird that you can't protect a method.

But Guido (Python's creator) has a good reason behind it:

"We're all consenting adults here"

It means that if you're aware you shouldn't be invoking a method, then you shouldn't unless you know what you're doing.

After all, if you really decided to invoke that method, you're going to do something dirty to make it happen (known as "Reflection" in C#).

Mark your private method/function with a single initial underscore to state it's intended for private use only and live with it.

↪️ When to create a function or a class in Python?

This is a common question I received a few times.

If you follow the above recommendations you're going to have clear modules and clear modules are an effective way to organize functions:

from gmaps_crawler import storages

storages.get_storage()  # 👈 Similar to a class, except it's not instantied and has a plural name
storages.save_to_storage()  # 👈 Potential function inside module

Sometimes you can identify subsets of functions inside a module. When this happens a class makes more sense:

Example on grouping different subset of functions

Consider the same storages module with 4 functions:

def format_for_debug(some_data):
    ...

def save_debug(some_data):
    """Prints in the screen"""
    formatted_data = format_for_debug(some_data)
    print(formatted_data)


def create_s3(bucket):
    """Create s3 bucket if it doesn't exists"""
    ...

def save_s3(some_data):
    s3 = create_s3("bucket_name")
    ...

S3 is a cloud storage to store any sort of data provided by Amazon (AWS). It's like Google Drive for software.

We can say that:

The developer can save data in DEBUG mode (that just prints on the screen) or on S3 (that stores data on the cloud).
save_debug uses the format_for_debug function
save_s3 uses the create_s3 function

I can see two groups of functions and no reason to keep them in different modules as they seem small, thus I'd enjoy having them defined as classes:

class DebugStorage:
    def format_for_debug(self, some_data):
        ...

    def save_debug(self, some_data):
        """Prints in the screen"""
        formatted_data = self.format_for_debug(some_data)
        print(formatted_data)


class S3Storage:
    def create_s3(self, bucket):
        """Create s3 bucket if it doesn't exists"""
        ...

    def save_s3(self, some_data):
        s3 = self.create_s3("bucket_name")
        ...

Here's a rule of thumb:

Always start with functions
Grow to classes once you feel you can group different subsets of functions

🚪 Creating modules and entry points

Every application has an entry point.

It means that there's a single module (aka file) that runs your application. It can be either a single script or a big module.

Whenever you're creating an entry point, make sure to add a condition to ensure it's being executed and not imported:

def execute_main():
    ...


if __name__ == "__main__":  # 👈 Add this condition
    execute_main()

By doing that you ensure that any imports won't trigger your code by accident. Unless it's explicitly executed.

Defining main for modules

You might have noticed some python packages that can be invoked by passing down -m like:

python -m pytest
python -m tryceratops
python -m faust
python -m flake8
python -m black

Such packages are treated almost like regular commands since you can also run them as:

pytest
tryceratops
faust
flake8
black

To make this happen you need to specify a single __main__.py file inside your main module:

<project>
├── src
│   ├── example_module 👈 Main module
│   │    ├── __init__.py
│   │    ├── __main__.py 👈 Add it here
│   │    └── many_files.py
│   │
│   └── tests/*
│        └── many_tests.py
│
├── .gitignore
├── pyproject.toml
└── README.md

Don't forget you still need to include the check __name__ == "__main__" inside your __main__.py file.

When you install your module, you can run your project as python -m example_module.

📖 Hey!

This is an initial draft from a book that I'm writing!

~~If you're interested make sure to subscribe to the newsletter and follow me on Twitter to be notified when the book is out!~~

The first chapter is out with a special discount!

I'm also open to feedback, get in touch either through email or Twitter DMs if you have any.