Gui Commits

Python ChatGPT API and DeepSeek API: Straight‑to‑the‑Point Guide 🐍🤖

Guilherme Latrova — Wed, 04 Jun 2025 11:26:21 GMT

ChatGPT is the future, and not knowing its capabilities will limit your ability to produce high-quality code faster than your peers.

No fluff — just working answers and bite‑sized code samples.

🤔 Can I actually use ChatGPT in Python?

Yes! Integrating ChatGPT with Python is simpler than cooking instant noodles 🍜. All you need is to pip install openai, your API key, and a few lines of Python code. Let’s do this!

🤩 Is ChatGPT Python API free?

No. Even if you subscribe to ChatGPT Plus/Pro, API usage is billed separately.

OpenAI’s API runs on a pay-as-you-go model, so you can begin experimenting for just a few dollars. If you're just learning, as little as $5 is enough to get started.

💸 Python ChatGPT API Pricing

Prices vary by model and are quoted per 1M tokens.

At the time of writing, consider these three examples using the popular models o1, 4o and gpt-image-1 (the one used to create those cool Studio Ghibli-style pictures):

Alias	Model	Input ($ / 1M tokens)	Cached Input ($ / 1M tokens)	Output ($ / 1M tokens)
`gpt-4o`	`gpt-4o-2024-08-06`	$2.50	$1.25	$10.00
`o1`	`o1-2024-12-17`	$15.00	$7.50	$60.00
`gpt-image-1`	`gpt-image-1`	$10.00	$2.50	$40.00

Prices might have gone up (or down - who knows?), so make sure to evaluate the Official ChatGPT Pricing Page.

🧩 What are ChatGPT Input and Output tokens?

Tokens are how ChatGPT reads and writes text. Instead of full words, it breaks everything into small chunks called tokens — like pieces of words, punctuation, or spaces.

You pay for both the tokens you send (input) and the ones you get back (output).

Text	Token Count
Hello	1 token
Hello, world!	4 tokens (`Hello`, `,`, `world`, `!`)
I love Python.	4 tokens (`I`, `love`, `Python`, `.`)

Don't be fooled: 1 word != 1 token. One token is roughly 4 characters of English text, and longer or more complex words are split into several tokens.

Word Example	Token Count Breakdown
`unbelievable`	`un`, `believable` → 2 tokens
`extraordinary`	`extra`, `ordinary` → 2 tokens
`internationalization`	`international`, `ization` → 2 tokens
`transportation`	`trans`, `port`, `ation` → 3 tokens
`misunderstanding`	`mis`, `under`, `standing` → 3 tokens
`counterproductive`	`counter`, `pro`, `duct`, `ive` → 4 tokens
`disproportionately`	`dis`, `pro`, `portion`, `ately` → 4 tokens

And you don't have to take my word for it: you can go straight to the OpenAI Tokenizer and check for yourself.
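If you just want a ballpark figure, the "about 4 characters per token" rule of thumb can be sketched in plain Python. This is only a heuristic under that assumption, and `estimate_tokens` is a hypothetical helper (not part of any OpenAI library); real counts come from the tokenizer and will differ.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate using the ~4 characters per token rule of thumb."""
    if not text:
        return 0
    # Non-empty text always costs at least one token
    return max(1, round(len(text) / chars_per_token))


if __name__ == "__main__":
    for sample in ["Hello", "I love Python.", "internationalization"]:
        print(f"{sample!r}: ~{estimate_tokens(sample)} tokens (estimate only)")
```

Treat the result as a budget hint, not a bill: the tables above show how a single long word can split into several tokens.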

💰 Caching ChatGPT Tokens to save money

Being practical, how can I optimize my tokens so I pay less?

Well, you could shorten prompts and messages. After all, we just learned that fewer tokens = lower cost. But that's not easy.

Instead of putting effort into picking the right tokens (or words), you can try to use caching as much as possible to your advantage.

When you use the ChatGPT API you're automatically using the caching capability. The API caches the longest prefix of a recently used prompt. After the first call you get 5–10 minutes to reuse it.

Prompts can be cached to save you some money

Request #	Prompt	Token Count
#1	You're a Python Expert: How to print hello world?	11 Tokens - No cache
#2	You're a Python Expert: How to sum two numbers?	11 Tokens, 7 Tokens Cached
#3 (After ~10 min of inactivity)	You're a Python Expert: How to sum two numbers?	11 Tokens - No cache

Cache hits are billed at a discount that depends on the model. For gpt-4o it's 50% (e.g. $2.50 regular -> $1.25 cached), while gpt-image-1 in the pricing table above drops from $10.00 to $2.50.

The API response reports token usage, so you can check whether caching worked as you expected. We'll see that later.

Don't forget you also pay for output tokens which you can't really control.
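Since the cache works on the prompt prefix, the practical takeaway is to put the static part of your prompt first and the part that changes last. Here's a small illustration of that idea. It compares characters locally, while the real cache matches tokens on OpenAI's side, so treat `shared_prefix` (a hypothetical helper) as a way to reason about prompt layout, not as a measurement.

```python
import os


def shared_prefix(previous_prompt: str, new_prompt: str) -> str:
    """Return the leading text both prompts have in common (character-level)."""
    return os.path.commonprefix([previous_prompt, new_prompt])


# Static instructions first, question last: a long prefix is shared
first = "You're a Python Expert: How to print hello world?"
second = "You're a Python Expert: How to sum two numbers?"
print(repr(shared_prefix(first, second)))

# Question first, static instructions last: nothing is shared
first_reversed = "Print hello world. You're a Python Expert."
second_reversed = "Sum two numbers. You're a Python Expert."
print(repr(shared_prefix(first_reversed, second_reversed)))
```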

🔑 Creating a ChatGPT API Token (If you don't have one yet)

If you have already created your account you can skip this section.

Just give a cool name to your organization (either "Personal" or "Hobbies" should be fine).

Step 1 - Create a new org

Step 2 - Generate your key

Step 3 - Get your key

Step 4 - Pay to use ChatGPT API

Pay for it. If you're just experimenting, I recommend paying only $5 and sticking to model gpt-4.1-nano which is the cheapest.

How to create my ChatGPT API Token

Assuming you already have an account.

We can start by visiting OpenAI's Platform API Keys Page (Not the regular ChatGPT page).

Click on Create new secret key and submit it:

Creating a ChatGPT API Key
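Once you have the key, one common way to keep it out of your source code is an environment variable. The sketch below assumes you exported it as `OPENAI_API_KEY` (the variable the `openai` client looks up by default when no `api_key` argument is passed); `load_api_key` is a hypothetical helper name.

```python
import os


def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Read the API key from the environment and fail loudly if it's missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key
```

You could then build the client with `OpenAI(api_key=load_api_key())`. The examples in this article keep the key inline only for brevity.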

🧑‍💻 ChatGPT API Example using responses.create

To get you started you must install openai by running: pip install openai. I'm also installing rich to display the data nicely.

import typing as t
from openai import OpenAI
from rich import print

API_KEY: t.Final = "sk-proj-[REDACTED]"
MODEL: t.Final = "gpt-4.1-nano"  # Cheapest: $0.10 Input / $0.40 Output


class PythonExpert:
    def __init__(self):
        self.client = OpenAI(api_key=API_KEY)

    def ask_the_expert(self, question: str):
        response = self.client.responses.create(
            model=MODEL,
            instructions="You're a Python expert. Answer the question as best you can.",
            input=question,
        )

        return response


def main():
    expert = PythonExpert()

    question = 'How do I print "Hello, world!" in Python?'
    response = expert.ask_the_expert(question)
    answer = response.output_text

    print(f"[yellow]Question:[/yellow] {question}")
    print(f"\tAnswer: {answer}")


if __name__ == "__main__":
    main()

Note we're using client.responses.create. This API was released in 2025, and it supports background execution, web search, file search, remembering conversation history, and interacting with a GUI.

You might decide to stick to the old client.chat.completions.create() if you want to keep the history and context locally.
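If you go that route, keeping the history locally usually means holding a list of messages and resending it on every call. A minimal sketch, assuming a `LocalChat` wrapper (hypothetical name) around any client that exposes `chat.completions.create`:

```python
import typing as t

Message = t.Dict[str, str]


class LocalChat:
    """Keeps the whole conversation in a local list and resends it on every call."""

    def __init__(self, client: t.Any, model: str, system_prompt: str) -> None:
        self.client = client
        self.model = model
        self.messages: t.List[Message] = [
            {"role": "system", "content": system_prompt},
        ]

    def ask(self, question: str) -> str:
        self.messages.append({"role": "user", "content": question})
        response = self.client.chat.completions.create(
            model=self.model,
            messages=self.messages,
        )
        answer = response.choices[0].message.content
        # Store the reply so the next question carries the full context
        self.messages.append({"role": "assistant", "content": answer})
        return answer
```

With the real SDK you would pass `OpenAI(api_key=...)` as `client`. The class never imports `openai` itself, so you can also hand it a stub in tests. Keep in mind the history grows with every turn, and so do the input tokens you pay for.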

🧑‍💻 How to measure ChatGPT API Token Caching?

Let's say you want to measure your prompts to see how much caching you can get to optimize costs.

ChatGPT caching only kicks in for prompts with 1,024 tokens or more. For our example I’m going to provide an extremely long instruction and be as verbose as possible:

import typing as t
from openai import OpenAI
from rich import print
from rich.console import Console
from rich.table import Table as RichTable
from textwrap import dedent


API_KEY: t.Final = "sk-proj-[REDACTED]"
MODEL: t.Final = "gpt-4.1-nano"  # 👈 Cheapest: $0.10 Input / $0.40 Output


class PythonExpert:
    def __init__(self):
        self.client = OpenAI(api_key=API_KEY)

    # NOTE: 👇 Since we're simulating token caching, we need to ensure it has at least 1,024 tokens.
    # Part of this initial prompt boilerplate will ALSO be cached.
    BASE_PROMPT_BULLSHIT = dedent(
        """
    ─────────────────────────  PYTHON EXPERT SYSTEM REFERENCE GUIDE  ─────────────────────────

    SECTION 1  - Code Style (PEP 8)
    ▸ Prefer *snake_case* for variables and functions, *PascalCase* for classes, and UPPER_SNAKE_CASE for module -level constants.
    ▸ Keep lines ≤ 79 chars; wrap long expressions with implied line -continuations inside (), [] or {}.
    ▸ Imports: standard lib ▸ third -party ▸ local, each block alphabetised; never use wildcard imports.
    ▸ Use f -strings for interpolation; reserve `%` formatting for logging -style placeholders.

    SECTION 2  - Typing & Static Analysis
    ▸ Add *type hints* (PEP 484) to every public function: def square(n: int | float) -> int | float: …
    ▸ Avoid `Any`; prefer Protocols and generics for flexible APIs.
    ▸ Run *mypy* or *pyright* in CI; treat *warnings as errors* to prevent regressions.
    ▸ Use *typing -extensions* for back -ports of upcoming features (e.g. TypeAliasType).

    SECTION 3  - Performance & Profiling
    ▸ Use built -ins and std -lib (sum, max, heapq) before reaching for numpy or pandas; C -optimised code often beats naive C -extensions.
    ▸ Profile first!  `python  -m cProfile  -o stats.prof main.py` + *snakeviz* to visualise hotspots.
    ▸ Favour list -comprehensions over explicit loops where readability permits; avoid premature micro -optimisation.
    ▸ For numeric hotspots consider `numba` or Cython; for I/O hot paths, use buffering and async.

    SECTION 4  - Concurrency & Parallelism
    ▸ *asyncio* excels at I/O -bound workloads: await network, file, or DB calls without blocking the event -loop.
    ▸ For CPU -bound tasks use `concurrent.futures.ProcessPoolExecutor` or *multiprocessing*; the GIL limits pure threads.
    ▸ Shield long awaitables with `asyncio.to_thread` in 3.9+ when you need to run a sync function without freezing awaitables.
    ▸ Never share mutable state across processes without proper IPC (queues, managers, shared memory).

    SECTION 5  - Packaging & Distribution
    ▸ Adopt *pyproject.toml*; specify build -system (`[build -system] requires = ["setuptools>=64", "wheel"]`).
    ▸ `python  -m build` produces sdist + wheel; upload via *twine* to TestPyPI first.
    ▸ Use *semantic -versioning*: MAJOR → breaking, MINOR → features, PATCH → fixes.
    ▸ Provide rich metadata (classifiers, project -urls) so pip search surfaces your project.

    SECTION 6  - Testing Philosophy
    ▸ Prefer *pytest*; write small, deterministic tests—no sleeps or network calls.
    ▸ Isolate side -effects with fixtures + tmp_path; parametrize happy -path and edge cases.
    ▸ Aim for behaviour over implementation: changing internals should not break tests as long as public contract holds.
    ▸ Track coverage but don't chase 100 %; guard against critical regressions instead.

    SECTION 7 - Debugging & Logging
    ▸ Insert `breakpoint()` (Python 3.7+) to drop into pdb without imports; use `pdbpp` for nicer colours & sticky mode.
    ▸ Configure *logging* early: level via env var, write JSON logs in production, colourised human -friendly logs locally.
    ▸ Never log secrets; scrub tokens/IP addresses with custom filters or structlog processors.
    ▸ Prefer structured logging over free -text for easier log aggregation and querying.

    SECTION 8 - Security Best Practices
    ▸ Load secrets from the environment or a secrets -manager—never hard -code keys.
    ▸ Pin dependencies with hashes (`pip -tools`, `poetry lock --no-update`); audit with `pip-audit` or GitHub Dependabot.
    ▸ Validate user input; distrust deserialisation (yaml.load, pickle).  Use `json.loads` or pydantic models instead.
    ▸ Keep Python patched (security -fix releases); run containers as non -root and drop capabilities.

    SECTION 9 - Data Classes & Validation
    ▸ Use `@dataclass(slots=True, frozen=True)` for lightweight value objects; benefits: immutability & memory savings.
    ▸ For external data, model with *pydantic* or *attrs* for runtime validation and parsing.
    ▸ Document JSON schema; version breaking changes.  Provide migration scripts between schema versions.
    ▸ Convert between domain models and persistence DTOs to keep layers isolated.

    SECTION 10 - Command -line Interfaces (CLI)
    ▸ Prefer *typer* (built on click) for ergonomic CLIs with auto -generated help and type -hints.
    ▸ Support `--version`, `--help`, exit codes (0 success, non -zero failure).  Provide rich `stderr` messages for errors.
    ▸ Package entry -points under `[project.scripts]` in *pyproject.toml* so `pipx` users can install system -wide.
    ▸ Test CLI commands with `pytest` + `capsys` or *click.testing*'s runner.

    SECTION 11 - Configuration Management
    ▸ Hierarchy: CLI args ▶ env vars ▶ `.env` file ▶ config file ▶ defaults.  Later overrides earlier.
    ▸ Use *dynaconf* or `pydantic.Settings` for 12 -factor -style config loading.
    ▸ Keep secrets out of git; supply sample env files for local dev.
    ▸ Provide schema validation so a broken config fails fast at startup.

    SECTION 12 - Documentation & Docstrings
    ▸ Write *Google -style* or *NumPy -style* docstrings; include type hints, parameter descriptions, return values, raises.
    ▸ Generate docs with *mkdocs -material* or *Sphinx* + *autodoc*; host on GitHub Pages.
    ▸ Keep examples runnable: embed doctests or use *pytest -doctestplus*.
    ▸ Treat docs as code: review PRs, run spell -checkers (codespell), and enforce link rot checks.

    ───────────────────────────────────────────────────────────────────────────────────────────
    """
    )

    def ask_the_expert(self, question: str):
        response = self.client.responses.create(
            model=MODEL,
            instructions=self.BASE_PROMPT_BULLSHIT,
            input=question,
        )

        return response

# 👇 Organize & Output it nicely
def _create_table() -> RichTable:
    table = RichTable(title="Token Usage Summary")
    table.add_column("[bold]Prompt[/bold]")
    table.add_column("[bold]Input Tokens[/bold]", justify="right")
    table.add_column("[bold]Cached Tokens[/bold]", justify="right")
    table.add_column("[bold]Output Tokens[/bold]", justify="right")
    table.add_section()

    return table


QUESTIONS = [
    'How do I print "Hello, world!" in Python, and why is `print` a function rather than a statement?',
    # 👆 Verbose
    # 👇 Somewhat simple to simulate cache hitting
    'How do I print "Hello, world!" in Python?',
]


def main():
    expert = PythonExpert()
    table = _create_table()
    console = Console()

    for question in QUESTIONS:
        response = expert.ask_the_expert(question)

        answer = response.output_text
        input_tokens = response.usage.input_tokens
        cached = response.usage.input_tokens_details.cached_tokens
        output_tokens = response.usage.output_tokens

        print(f"[yellow]Question:[/yellow] {question}")
        print(f"\tAnswer: {answer}")

        table.add_row(
            question,
            str(input_tokens),
            str(cached),
            str(output_tokens),
        )

    console.print(table)


if __name__ == "__main__":
    main()

Token Usage through API Example

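Since gpt-4.1-nano charges $0.10 (input) / $0.025 (cached input) / $0.40 (output) per 1M tokens, we can estimate each request as (uncached × $0.10 + cached × $0.025 + output × $0.40) / 1,000,000:

Prompt #	Uncached Input Tokens	Cached Input Tokens	Output Tokens	Estimated Cost
#1	1,369	0	227	$0.0002277
#2	107	1,262	30	$0.00005425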
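The arithmetic behind that table can be wrapped in a small helper. This sketch assumes `usage.input_tokens` already includes the cached tokens (which matches the numbers here: 107 + 1,262 = 1,369) and uses the gpt-4.1-nano prices quoted in this article, which may be outdated by the time you read it. `estimate_cost` is a hypothetical name.

```python
# USD per 1M tokens for gpt-4.1-nano, as quoted in this article (may be outdated)
PRICE_INPUT = 0.10
PRICE_CACHED_INPUT = 0.025
PRICE_OUTPUT = 0.40


def estimate_cost(input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request.

    Assumes `input_tokens` is the total input count, cached tokens included.
    """
    uncached_tokens = input_tokens - cached_tokens
    total = (
        uncached_tokens * PRICE_INPUT
        + cached_tokens * PRICE_CACHED_INPUT
        + output_tokens * PRICE_OUTPUT
    )
    return total / 1_000_000


if __name__ == "__main__":
    # (input_tokens, cached_tokens, output_tokens) from the two requests above
    for usage in [(1369, 0, 227), (1369, 1262, 30)]:
        print(f"{usage}: ${estimate_cost(*usage):.8f}")
```

In the caching script you would feed it `response.usage.input_tokens`, `response.usage.input_tokens_details.cached_tokens` and `response.usage.output_tokens`.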

📐 Structuring ChatGPT responses as JSON Models

When you're creating your tool you probably want a structured response to ensure the output has a known format.

You can achieve that by using Pydantic and ChatGPT's client.responses.parse.

Define your expected response model with Pydantic
Update your code to use client.responses.parse
Update the instructions to mention what you expect to be parsed
Pass your model as the text_format keyword argument
Read the result from response.output_parsed

import typing as t
from openai import OpenAI
from rich import print
from pydantic import BaseModel

API_KEY: t.Final = "sk-proj-[REDACTED]"
MODEL: t.Final = "gpt-4.1-nano"  # Cheapest: $0.10 Input / $0.40 Output


# 👇 Define the expected output model
class ExpertResponse(BaseModel):
    explanation: str
    example_code: str


class PythonExpert:
    def __init__(self):
        self.client = OpenAI(api_key=API_KEY)

    def ask_the_expert(self, question: str) -> ExpertResponse:
        # 👇 Use `responses.parse`
        response = self.client.responses.parse(
            model=MODEL,
            # 👇 Explain what you're willing to receive
            instructions="You're a Python expert. Answer the question as best you can. Give a brief explanation and provide example code if applicable.",
            input=question,
            # 👇 Pass expected output model
            text_format=ExpertResponse,
        )

        # 👇 Return the parsed model
        return response.output_parsed


def main():
    expert = PythonExpert()

    question = 'How do I print "Hello, world!" in Python?'
    response = expert.ask_the_expert(question)

    print(f"[yellow]Question:[/yellow] {question}")
    print(f"[yellow]Explanation:[/yellow]\n\t {response.explanation}")
    print(f"[yellow]Example Code[/yellow]:\n\t{response.example_code}")


if __name__ == "__main__":
    main()

And it works as expected:

JSON Output Example

Without it, you’d have to include more instructions to force ChatGPT to reply in the expected format. I've done it many times, and it's not a pleasant experience.
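For comparison, the manual route tends to look something like the sketch below: you ask for JSON in the prompt, then parse and validate the raw text yourself. The `parse_expert_reply` helper and the `ExpertReply` dataclass are hypothetical, and only the standard library is used.

```python
import json
from dataclasses import dataclass


@dataclass
class ExpertReply:
    explanation: str
    example_code: str


def parse_expert_reply(raw_text: str) -> ExpertReply:
    """Parse a reply that was *asked* to be JSON; raise ValueError if it isn't usable."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Reply is not valid JSON: {exc}") from exc

    if not isinstance(data, dict):
        raise ValueError("Reply must be a JSON object")

    # Every expected field must be present and must be a string
    expected = ("explanation", "example_code")
    invalid = [key for key in expected if not isinstance(data.get(key), str)]
    if invalid:
        raise ValueError(f"Missing or non-string fields: {invalid}")

    return ExpertReply(explanation=data["explanation"], example_code=data["example_code"])
```

Every new field means more prompt text and more validation code, which is exactly the chore the parsed response takes off your hands.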

💰 Cheaper API alternative to OpenAI's ChatGPT

DeepSeek is significantly cheaper than ChatGPT (though a bit slower, in my experience):

ChatGPT Model	Pricing/1M Input Tokens	DeepSeek Model	Pricing/1M Input Tokens
`gpt-4o`	$2.50	`deepseek-chat`	$0.27
`o3`	$10.00	`deepseek-reasoner`	$0.55

Still not cheap enough? You can go even lower by calling it during off-peak hours, which gives you up to 75% off:

DeepSeek Model	Cached	Regular Hours Pricing/1M Tokens	Discount Off-Peak Hours/1M Tokens
`deepseek-chat`	No	$0.27	$0.135
`deepseek-chat`	Yes	$0.07	$0.035
`deepseek-reasoner`	No	$0.55	$0.135
`deepseek-reasoner`	Yes	$0.14	$0.035

🐳 Python DeepSeek API vs ChatGPT API

Alright, let’s use DeepSeek, but... Now I have to install yet another lib and modify my working code?

No. You don't need to install anything else but openai (as we already did).

You can create your DeepSeek API Key here.

Create DeepSeek API Key

The only catch: you can't use client.responses.create; you need client.chat.completions.create instead.

import typing as t
from openai import OpenAI
from rich import print

API_KEY: t.Final = "sk-[REDACTED]"
MODEL: t.Final = "deepseek-chat"


class PythonExpert:
    def __init__(self):
        self.client = OpenAI(
            api_key=API_KEY,
            # 👇 The trick!
            base_url="https://api.deepseek.com/v1",
        )

    def ask_the_expert(self, question: str):
        response = self.client.chat.completions.create(
            model=MODEL,
            messages=[
                # 👇 System message to set the context, analogous to `instructions`
                {
                    "role": "system",
                    "content": "You're a Python expert. Answer the question as best you can.",
                },
                # 👇 Actual request
                {"role": "user", "content": question},
            ],
        )

        return response


def main():
    expert = PythonExpert()

    question = 'How do I print "Hello, world!" in Python?'
    response = expert.ask_the_expert(question)
    # 👇 Not very user-friendly
    answer = response.choices[0].message.content

    print(f"[yellow]Question:[/yellow] {question}")
    print(f"\tAnswer: {answer}")


if __name__ == "__main__":
    main()

I hope you enjoyed learning these tricks. If this was useful, give me a follow on X to be notified when I post new Python tricks.

Add docstrings to Python Enum members

Guilherme Latrova — Wed, 03 Jul 2024 18:39:16 GMT

Recently I learned that Python Enum members don't "support" docstrings natively which is quite annoying.

I had to manage a list of feature flags and provide a good description for each of them through FastAPI:

from enum import StrEnum, auto

class FeatureFlag(StrEnum):
    """Holds a list of available feature toggles"""
    
    ENABLE_CHAT_GPT = auto()
    """Requested by a few customers to enable Chat GPT within the app"""
    
    DARK_MODE = auto()
    """Fixes your color schema"""

Python Enum member docstring limitation

To my surprise when I extracted each member's __doc__ (which should contain the docstring) I got the FeatureFlag class docstring instead.

print(FeatureFlag.ENABLE_CHAT_GPT.__doc__) # ❌ output: Holds a list of available feature toggles

print(FeatureFlag.DARK_MODE.__doc__) # ❌ output: Holds a list of available feature toggles

This is definitely not what we expect. We want the enum member's docstring.

How to add docstring to enum members?

The best solution I could come up with is to override the __new__ method and require members to take two string values:

from enum import StrEnum, auto
import typing as t


class FeatureFlag(StrEnum):
    """Holds a list of available feature toggles"""
    
    # 👇 Magic method needed to enforce members to take docstrings
    def __new__(cls, value: str, docstr: str) -> t.Self:
        member = str.__new__(cls, value)

        member._value_ = value
        member.__doc__ = docstr.strip()  # 🪄 Magic

        return member


    ENABLE_CHAT_GPT = (auto(), """Requested by a few customers to enable Chat GPT within the app""")
        
    DARK_MODE = (auto(), """Fixes your color schema""")
    

print(FeatureFlag.ENABLE_CHAT_GPT.__doc__) # ✅ output: Requested by a few customers to enable Chat GPT within the app
print(FeatureFlag.DARK_MODE.__doc__) # ✅ output: Fixes your color schema

This might not be the cleanest, but it works and it's simple to maintain.

Now I can iterate over the enum members and take each "docstring":

flags = [dict(key=flag.value, description=flag.__doc__) for flag in FeatureFlag]
print(flags)

# outputs:
# [
#   {'key': 'enable_chat_gpt', 'description': 'Requested by a few customers to enable Chat GPT within the app'},
#   {'key': 'dark_mode', 'description': 'Fixes your color schema'}
# ]
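If you're on a Python version older than 3.11, where `StrEnum` and `typing.Self` aren't available, the same idea can be sketched with a `str` + `Enum` mix-in and explicit values. This is an adaptation under that assumption rather than the snippet above verbatim:

```python
from enum import Enum


class FeatureFlag(str, Enum):
    """Holds a list of available feature toggles"""

    # Each member is declared as (value, docstring)
    def __new__(cls, value: str, docstr: str) -> "FeatureFlag":
        member = str.__new__(cls, value)

        member._value_ = value
        member.__doc__ = docstr.strip()

        return member

    ENABLE_CHAT_GPT = (
        "enable_chat_gpt",
        "Requested by a few customers to enable Chat GPT within the app",
    )
    DARK_MODE = ("dark_mode", "Fixes your color schema")


print(FeatureFlag.DARK_MODE.__doc__)  # Fixes your color schema
# Lookup by value still works because only `value` is stored in `_value_`
print(FeatureFlag("dark_mode") is FeatureFlag.DARK_MODE)  # True
```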

Generic functions and generic classes in Python

Guilherme Latrova — Fri, 01 Mar 2024 23:20:35 GMT

"Generic" is a term used for any typing that might change based on the context.

If we have a function that may take either strings or ints and return the sum or concatenation of both values, without generics we would have to define two distinct functions:

def sum_numbers(v1: int, v2: int) -> int:
    return v1 + v2

def concat_strs(v1: str, v2: str) -> str:
    return v1 + v2

numbers = sum_numbers(10, 20)
strs = concat_strs("app", "le")

print(numbers)
print(strs)

Even though we got the proper typing, this isn't good. My code is mostly duplicated just for the sake of types - that's not really what we want here.

That's where generics can make our life easier. We reuse the code snippet while keeping the dynamic typing based on context.

This is quite popular in TypeScript but doesn't seem as popular in Python.

🏷️ Python generic in functions

To make this happen we need to define a typing.TypeVar to be used as the type of each argument and output.

import typing as t


T = t.TypeVar("T") # Defines the TypeVar

# v1, v2 and outcome should all be of type T
def sum(v1: T, v2: T) -> T:
    return v1 + v2

numbers = sum(10, 20)
strs = sum("app", "le")

print(numbers)
print(strs)

Single generic function with proper typing

Now your IDE infers the output type based on the argument type.

🤌 Narrow down TypeVar types

This will work with any value that can be "summed". This is problematic though because someone may attempt to pass an invalid type that wouldn't make sense.

Like:

will_fail = sum(Exception("what?"), Exception("lol"))

Even though the IDE will resolve the outcome to another Exception, the execution will raise an exception: TypeError: unsupported operand type(s) for +: 'Exception' and 'Exception'.

To resolve this, we can restrict the allowed types with TypeVar's bound argument.

T = t.TypeVar("T", bound=str | int | float)

This ensures your type checker flags any argument that is not a str, int, or float.
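A related option worth knowing is a constrained TypeVar, where you list the allowed types as positional arguments. With a union bound, some type checkers may still accept a call that mixes types (T can resolve to the union itself), whereas constraints require T to resolve to exactly one of the listed types per call. A minimal sketch, using `add` as a hypothetical name to avoid shadowing the built-in `sum`:

```python
import typing as t

# Constrained TypeVar: T must resolve to exactly one of these types per call
T = t.TypeVar("T", int, float, str)


def add(v1: T, v2: T) -> T:
    return v1 + v2


print(add(10, 20))  # 30
print(add("app", "le"))  # apple

# add(10, "le")  # a type checker should reject this: the arguments don't share one allowed type
```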

🎩 Using generics in classes

Generic classes also exist, and they allow more complex configurations.

For our scenario, imagine a data-layer class that reads data from a data source and parses it into some Python model.

typing.TypeVar is not enough anymore, we also need typing.Generic.

Each data-layer class will be responsible for:

Returning one entity based on id
Listing all entities
Creating an entity

Note we're not implementing this functionality as our purpose is only to understand how to use generics to define complex classes.

import typing as t
from datetime import datetime
from abc import ABC, abstractmethod

T = t.TypeVar("T") # 👈 We still use the TypeVar

# 👇 Now we must also rely on Generic to say the class
# accepts a type
class BaseDatabase(t.Generic[T], ABC):
    @abstractmethod
    def get_by_id(self, id: int) -> T:
        ...

    @abstractmethod
    def list_all(self) -> list[T]:
        ...

    @abstractmethod
    def create(self, entity: T) -> None:
        ...

We can start with the CompanyDatabase class:

class Company:  # Model sample for the company DB
    name: str
    phone: str
    address: str


class CompanyDatabase(BaseDatabase[Company]):
    def get_by_id(self, id: int) -> Company:
        return super().get_by_id(id)

    def list_all(self) -> list[Company]:
        return super().list_all()

    def create(self, entity: Company) -> None:
        return super().create(entity)

Note the IDE is capable of defining the correct typing by itself:

IDE setting correct types from generic

and even if we don't create the methods, it also guesses everything correctly:

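class Employee:  # Model sample for the employee DB
    name: str


class EmployeeDatabase(BaseDatabase[Employee]):
    pass

# ⚠️ Typing illustration only: at runtime, instantiating this class raises
# TypeError because the abstract methods are not implemented.
employee_db = EmployeeDatabase()

found_employee = employee_db.get_by_id(1)
all_employees = employee_db.list_all()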

IDE guessing expected types from generic
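To see the generic class doing something at runtime too, here's a self-contained sketch with a dict-backed implementation. `InMemoryDatabase` and its storage strategy are hypothetical, picked only to keep the example runnable; a real data layer would talk to an actual data source.

```python
import typing as t
from abc import ABC, abstractmethod
from dataclasses import dataclass

T = t.TypeVar("T")


class BaseDatabase(t.Generic[T], ABC):
    @abstractmethod
    def get_by_id(self, id: int) -> T: ...

    @abstractmethod
    def list_all(self) -> t.List[T]: ...

    @abstractmethod
    def create(self, entity: T) -> None: ...


class InMemoryDatabase(BaseDatabase[T]):
    """Still generic: T is only decided by the subclass below."""

    def __init__(self) -> None:
        self._rows: t.Dict[int, T] = {}

    def get_by_id(self, id: int) -> T:
        return self._rows[id]

    def list_all(self) -> t.List[T]:
        return list(self._rows.values())

    def create(self, entity: T) -> None:
        self._rows[len(self._rows) + 1] = entity  # ids start at 1


@dataclass
class Employee:
    name: str


class EmployeeDatabase(InMemoryDatabase[Employee]):
    pass


employee_db = EmployeeDatabase()
employee_db.create(Employee(name="Ada"))
found_employee = employee_db.get_by_id(1)  # inferred as Employee
print(found_employee.name)  # Ada
```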

👎 Overcoming Python generic limitations

Unfortunately, Python is not as strong as TypeScript regarding typing.

Functions, unlike classes, can't use Generic, which means we can't do this:

# ⚠️⚠️⚠️ Broken code for concept only:
import typing as t

T = t.TypeVar("T")

def get_something(t.Generic[T], v: str) -> T:
    ...

some_str = get_something[str]("str")
some_int = get_something[int]("10")

This won't work.

PEP 484 suggests we pass the actual type as an argument to allow proper inference.

import typing as t


T = t.TypeVar("T")

# 👇 We must declare that we're receiving a TYPE of T
def get_something(output_type: t.Type[T], v: str) -> T:
    ...

# 👇 Now it works
some_str = get_something(str, "str")
some_int = get_something(int, "10")

The IDE recognizes it correctly:

Overcoming Python's limitation

But I still don't like it 😅 Is it just me, or does it seem a bit hacky?

Using generic classes to behave as functions

I don't know about you, but I'd like to keep the syntax we already follow using brackets, like:

x = list[str] # [str]
x = set[int] # [int]

class EmployeeDatabase(BaseDatabase[Employee]): # [Employee]
    ...

# WTH? This feels weird
x = get_something(int, "10")

I want something that feels more natural, like what we do in TypeScript.

TypeScript relies consistently on <> for generics. I do believe Python should follow the same logic with [].

To make this happen we must define a class and override its __new__ magic method to behave like a function:

import typing as t

T = t.TypeVar("T")

# 👇 Keep class name lowercase so it feels like a function, name it as you want your '''function''' to be named:
class get_something(t.Generic[T]):

    # 👇 This is the secret
    def __new__(
        cls,
        v: str, # Add here as many args you think your function should take
    ):
        generated_instance = super().__new__(cls)
        return generated_instance.execute(v)

    # 👇 Pretend this is your actual function implementation, name it anything you wish
    def execute(self, v: str) -> T: # 👈 Define T as the return type
        ... # Do whatever you want

        return t.cast(T, v) # 👈 Ensure returned type is T

# 💁‍♂️🪄🐰 It just works
some_str = get_something[str]("str")
some_int = get_something[int]("10")

Overcoming Python's limitation again but pretty

It feels better to me.

Note this is not something "new" I'm coming up with. Some standard Python "functions" (that are not actually functions), such as defaultdict, do the same:

defaultdict lied all the time to you
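You can check this yourself. A small sketch, assuming Python 3.9+ (where standard collections accept `[]` at runtime):

```python
import inspect
from collections import defaultdict

print(inspect.isclass(defaultdict))  # True
print(inspect.isfunction(defaultdict))  # False

# Same bracket syntax as the class-based trick above
word_counts = defaultdict[str, int](int)
for word in ["spam", "eggs", "spam"]:
    word_counts[word] += 1

print(dict(word_counts))  # {'spam': 2, 'eggs': 1}
```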

Follow me for more Python magic.

How to run pytest in parallel on GitHub actions

Guilherme Latrova — Sun, 11 Feb 2024 20:13:14 GMT

In my current company, we write lots of integration tests. Integration tests are great because they encompass the whole functionality of a use case end to end which frequently includes the database.

The downside is that they tend to be slow: for each test you need to set up the data, execute it, and then clear the database for the next test.

As you write many tests, it's expected that the whole suite will take ~5min or more to complete. Today ours takes ~12min.

This is not good for Pull Requests, as any PR has to wait 12 minutes before the developer can merge their work:

Before: Integration tests take 12min to complete

This is too slow to merge a PR. The quickest way to improve our time would be to parallelize the integration tests.

As a result, we cut the time back down to ~5min, which is more than 50% faster, with little effort:

Then: Integration tests broken into 4 workers take a total of 5min to complete

In this article, I'm going to explain what was done and how you can reproduce it in your environment.

I know CircleCI has a feature and guide dedicated to splitting tests that is even easier to use and set up, but this is not common for other CIs. For GitHub Actions we had to implement something similar ourselves.

🐢 How it was before

We had two parallel jobs already:

Unit testing and linting (~2min)
Integration testing (~13min)

Before: GitHub Actions Workflow

For integration testing we need to use MongoDB and Redis to test our features end to end so we get one container up for each.

We have some Machine Learning and AWS specific tests, so we decided to exclude them from our Pull Request checks and just test them locally.

name: Python lint and test

on:
  pull_request:

permissions:
  contents: read

jobs:
  unit_tests_lint: # 👈 We're not interested in these as they're fast enough
    runs-on: ubuntu-latest
    steps: ...

  integration_tests: # 👈🎯 Focus here
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3

      - uses: ./.github/workflows/setup_python # 👈 Installs/Caches poetry

      - run: mv ./.github/.env .env # 👈 Default vars for our tests

        # 👇 Allows our docker compose to be used within GitHub container
      - uses: KengoTODA/actions-setup-docker-compose@main
        with:
          version: "1.29.2"

        # 👇 Now we can just get the services we use up
      - run: docker-compose up -d mongo redis

        # 👇 Executes our integration test suite
      - name: Integration Tests without ML
        run: poetry run pytest src/tests/integration -m 'not (ml or aws_deps)'

🔀 Pytest split tests

We want somehow to split the tests into groups so we can test them in parallel.

Overview: Before vs Goal

Before explicitly touching the GitHub workflow yaml, we need to teach pytest how to split tests.

Luckily pytest supports hooks we can leverage to achieve our goal.

For this, we need pytest_collection_modifyitems which we create inside src/tests/integration/conftest.py.

We need to create the conftest file inside the proper directory; otherwise it might affect all tests (including unit ones, which we don't want to parallelize).

Picture something like:

src
└── tests
    ├── unit
    │   ├── test_xxx.py
    │   └── test_yyy.py
    │
    └── integration
        ├── conftest.py # 👈
        ├── test_xxx.py
        └── test_yyy.py

Let's start with this stub and build on it step by step:

import pytest

def pytest_collection_modifyitems(
    session: pytest.Session,
    config: pytest.Config,
    items: list[pytest.Item] # 👈 Contains an ordered list of tests pytest found
) -> None:
    selected = [...] # Decide how to split/pick
    deselected = [...] # Decide how to deselect remaining tests

    config.hook.pytest_deselected(items=deselected) # 👈 Marks as deselected
    items[:] = selected # 👈 Overwrites current selection

We need something robust that achieves three minor objectives:

(Local DevExp) Identify when running locally so we don't split
(Purpose) Smartly select the proper range for each worker
(Maintenance) Be easily extensible to X workers

My approach here was to take environment variables as optional arguments:

Environment variable	Purpose	Example value
GITHUB_WORKER_ID	Holds the current worker id	1
GITHUB_TOTAL_WORKERS	Counts how many workers we have in total	400

Now we can start filling in these values:

import math
import os
import pytest

def pytest_collection_modifyitems(
    session: pytest.Session,
    config: pytest.Config,
    items: list[pytest.Item]
) -> None:
    # 👇 Make these vars optional so locally we don't have to set anything
    current_worker = int(os.getenv("GITHUB_WORKER_ID", 0)) - 1
    total_workers = int(os.getenv("GITHUB_TOTAL_WORKERS", 0))

    # 👇 If there's no workers we can affirm we won't split
    if total_workers:
        # 👇 Decide how many tests per worker
        num_tests = len(items)
        matrix_size = math.ceil(num_tests / total_workers)

        # 👇 Select the test range with start and end
        start = current_worker * matrix_size
        end = (current_worker + 1) * matrix_size

        # 👇 Collect the tests that are going to be deselected
        deselected_items = items[:start] + items[end:]
        config.hook.pytest_deselected(items=deselected_items)

        # 👇 Set which tests are going to be handled
        items[:] = items[start:end]
        print(f" Executing {start} - {end} tests")

Now you can run your integration tests locally and... Nothing changed which is our goal.
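To sanity-check the slicing arithmetic outside of pytest, here is a minimal sketch. The helper name `worker_slice` is hypothetical (it is not part of pytest or of the hook above); it only reproduces the start/end math, with the end clamped to the number of tests, so you can see that every test lands on exactly one worker.

```python
import math


def worker_slice(num_tests: int, worker_id: int, total_workers: int) -> tuple[int, int]:
    """Return the (start, end) range for a 1-based worker id.

    Hypothetical helper mirroring the math used in the hook above.
    """
    matrix_size = math.ceil(num_tests / total_workers)
    start = (worker_id - 1) * matrix_size
    end = min(worker_id * matrix_size, num_tests)  # 👈 Clamp the last chunk
    return start, end


tests = [f"test_{i}" for i in range(10)]  # 👈 Pretend pytest collected 10 tests
chunks = [tests[slice(*worker_slice(len(tests), w, 4))] for w in (1, 2, 3, 4)]

for worker_id, chunk in enumerate(chunks, start=1):
    print(f"Worker {worker_id}: {chunk}")
```

With 10 tests and 4 workers, the first three workers get 3 tests each and the last one gets the single leftover test.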

🐇 Split pytest tests across GitHub workers

Whenever we want to spin up multiple workers on GitHub, we use matrix and pass any values we want. GitHub creates one worker for each matrix value.

This is commonly used to run tests on different OSes or Python versions. Take this example from Gracy.

We run the same suite for Linux only (1) across 4 Python versions. So 1 * 4 = 4 parallel workers running tests.

For our case, we just want to assign worker ids (bare ints) which is fine to do as:

  integration_tests:
    runs-on: ubuntu-latest
+   strategy:
+     matrix:
+       worker_id: [1, 2, 3, 4]

    steps:
      - uses: actions/checkout@v3

      - uses: ./.github/workflows/setup_python

      - run: mv ./.github/.env .env

      - uses: KengoTODA/actions-setup-docker-compose@main
        with:
          version: "1.29.2"

      - run: docker-compose up -d mongo redis

+     # 👇 We need to set up the env vars from values taken from the matrix
+     - name: Set up worker env vars
+       run: |
+         echo "GITHUB_WORKER_ID=${{matrix.worker_id}}" >> $GITHUB_ENV
+         echo "GITHUB_TOTAL_WORKERS=4" >> $GITHUB_ENV

+     # 👇 Rename to something clearer
-     - name: Integration Tests without ML
+     - name: Integration Tests without ML - Worker. ${{matrix.worker_id}}
        run: poetry run pytest src/tests/integration -m 'not (ml or aws_deps)'

Note for this case I explicitly defined GITHUB_TOTAL_WORKERS to 4, so our snippet will count all the tests we have and split them evenly for each worker.

GitHub Workflow split across 4 workers

Using this example, each dev who wants to merge a PR will have to wait ~4min, in contrast to ~13min before.

There's still space for further improvement though:

Worker	Time Spent
3	2m 11s
2	2m 42s
4	3m 55s
1	4m 31s

Worker 3 finished early while Worker 1 kept running for another ~2min.

This means that if we group the slow tests and spread them more evenly across the workers, we can probably get everything done in around ~3m 30s.
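As a rough sanity check on that estimate, the best case is every worker taking the average of the four timings in the table above. Here is a quick sketch of the arithmetic, assuming the work could be redistributed perfectly (which real test suites rarely allow):

```python
from datetime import timedelta

# 👇 Timings copied from the table above, converted to seconds
worker_seconds = {
    1: 4 * 60 + 31,
    2: 2 * 60 + 42,
    3: 2 * 60 + 11,
    4: 3 * 60 + 55,
}

total = sum(worker_seconds.values())
ideal = total / len(worker_seconds)  # 👈 Perfectly balanced split

print(f"Total: {timedelta(seconds=total)}")
print(f"Ideal per worker: {timedelta(seconds=round(ideal))}")
```

The perfectly even split lands at roughly 3m 20s per worker, so ~3m 30s looks like a realistic target. Actually grouping tests by how long they take is a separate problem.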

This feels like an idea for another blog post though.

If you learned something new today consider giving me a follow on X.

Effective Python Async like a PRO 🐍🔀

Guilherme Latrova — Thu, 15 Dec 2022 14:23:47 GMT

I noticed some people using the async syntax without knowing what they were doing.

First, they think async is parallel which is not true as I explain in another article.

Then they write code that doesn't take any advantage of Python async. In other words, they write sync code with async syntax.

The goal of this post is to point out these performance issues and help you benefit the most from async code.

🤔 When to use Python Async

Async only makes sense if you're doing IO.

There's ZERO benefit in using async for stuff like this, which is CPU-bound:

import asyncio


async def sum_two_numbers_async(n1: int, n2: int) -> int:
    return n1 + n2


async def main():
    await sum_two_numbers_async(2, 2)
    await sum_two_numbers_async(4, 4)


asyncio.run(main())

Your code might even get slower by doing that due to the Event Loop.

That's because Python async only optimizes IDLE time!

Sync vs Async vs Parallel pic.twitter.com/hZEXfkKmU0
— Gui Latrova (@guilatrova) December 13, 2022

If these concepts are new to you, read this article first:

Async python in real life 🐍🔀

Await Async Python applied with real examples. I show a slow API server and a slow database, and explain why async is not parallel but concurrent....


IO-bound operations are related to reading/writing operations.

A good example would be:

Requesting some data from HTTP
Reading/Writing some json/txt file
Reading data from a database

👆 All these operations consist of waiting for the data to be available.

While the data is UNAVAILABLE the EVENT LOOP does something else.

This is Concurrency.

NOT ~~Parallelism~~.
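To see the event loop "doing something else" during idle time, here is a minimal sketch. The `fake_io` coroutine and its delays are made up for illustration: `asyncio.sleep` stands in for a real HTTP call or database query, and `asyncio.gather` (covered later in this post) runs both coroutines concurrently.

```python
import asyncio

events: list[str] = []


async def fake_io(name: str, delay: float) -> None:
    events.append(f"{name} started")
    await asyncio.sleep(delay)  # 👈 Stands in for an HTTP call or a DB query
    events.append(f"{name} finished")


async def main() -> None:
    # 👇 While "slow" is waiting, the event loop starts and finishes "fast"
    await asyncio.gather(fake_io("slow", 0.2), fake_io("fast", 0.1))


asyncio.run(main())
print(events)
```

Both operations start before either finishes, and "fast" completes while "slow" is still waiting. Everything happens in a single thread, so nothing here runs in parallel.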

🖼️ Python Async Await Example

Let's set up a scenario to get started.

We need to build a simple Pokedex that queries for 3 pokemons simultaneously (so we benefit from async).

After querying the pokemons we're going to build an object with them, so:

Step	Operation type
Query pokeapi.co	IO-bound
Build an object holding the data	CPU-bound

I'll be using pydantic for model parsing and httpx for HTTP as its syntax is compatible with requests.

🐌 Use Python `async` and `await`

Let's start with the basic scenario that everybody writes and proudly says: "the code is async".

Take your time to visualize:

The model class
The parse_pokemon function (CPU-bound)
The get_pokemon function (IO-bound)
The get_all function

import asyncio
from datetime import timedelta
import time
import httpx
from pydantic import BaseModel

class Pokemon(BaseModel): # 👈 Defines model to parse pokemon
    name: str
    types: list[str]

def parse_pokemon(pokemon_data: dict) -> Pokemon: # 👈 CPU-bound operation
    print("🔄 Parsing pokemon")

    poke_types = []
    for poke_type in pokemon_data["types"]:
        poke_types.append(poke_type["type"]["name"])

    return Pokemon(name=pokemon_data['name'], types=poke_types)

async def get_pokemon(name: str) -> dict | None: # 👈 IO-bound operation
    async with httpx.AsyncClient() as client:
        print(f"🔍 Querying for '{name}'")
        resp = await client.get(f"https://pokeapi.co/api/v2/pokemon/{name}")
        print(f"🙌 Got data for '{name}'")

        try:
            resp.raise_for_status()

        except httpx.HTTPStatusError as err:
            if err.response.status_code == 404:
                return None

            raise

        else:
            return resp.json()

async def get_all(*names: str): # 👈 Async
    started_at = time.time()

    for name in names: # 👈 Iterates over all names
        if data := await get_pokemon(name): # 👈 Invokes async function
            pokemon = parse_pokemon(data)
            print(f"💁 {pokemon.name} is of type(s) {','.join(pokemon.types)}")
        else:
            print(f"❌ No data found for '{name}'")

    finished_at = time.time()
    elapsed_time = finished_at - started_at
    print(f"⏲️ Done in {timedelta(seconds=elapsed_time)}")


POKE_NAMES = ["blaziken", "pikachu", "lugia", "bad_name"]
asyncio.run(get_all(*POKE_NAMES))

This produces the following output:

🔍 Querying for 'blaziken'
🙌 Got data for 'blaziken'
🔄 Parsing pokemon
💁 blaziken is of type(s) fire,fighting

🔍 Querying for 'pikachu'
🙌 Got data for 'pikachu'
🔄 Parsing pokemon
💁 pikachu is of type(s) electric

🔍 Querying for 'lugia'
🙌 Got data for 'lugia'
🔄 Parsing pokemon
💁 lugia is of type(s) psychic,flying

🔍 Querying for 'bad_name'
🙌 Got data for 'bad_name'
❌ No data found for 'bad_name'

⏲️ Done in 0:00:02.152331

This is bad usage for this scenario.

If you analyze the output you'll understand that:

We're requesting one HTTP resource at a time thus it doesn't matter if we use async or not.

Let's fix that! 🧑‍🏭

Use Python `asyncio.create_task` and `asyncio.gather`

If you want 2 or more functions to run concurrently, you need asyncio.create_task.

Creating a task triggers the async operation, and it needs to be awaited at some point.

For example:

task = asyncio.create_task(my_async_function('arg1'))
result = await task

As we're creating many tasks, we need asyncio.gather which awaits all tasks to be done.

This is our code now (check the get_all function):

import asyncio
from datetime import timedelta
import time
import httpx
from pydantic import BaseModel

class Pokemon(BaseModel):
    name: str
    types: list[str]

def parse_pokemon(pokemon_data: dict) -> Pokemon:
    print("🔄 Parsing pokemon")

    poke_types = []
    for poke_type in pokemon_data["types"]:
        poke_types.append(poke_type["type"]["name"])

    return Pokemon(name=pokemon_data['name'], types=poke_types)

async def get_pokemon(name: str) -> dict | None:
    async with httpx.AsyncClient() as client:
        print(f"🔍 Querying for '{name}'")
        resp = await client.get(f"https://pokeapi.co/api/v2/pokemon/{name}")
        print(f"🙌 Got data for '{name}'")

        try:
            resp.raise_for_status()

        except httpx.HTTPStatusError as err:
            if err.response.status_code == 404:
                return None

            raise

        else:
            return resp.json()

async def get_all(*names: str):
    started_at = time.time()

    # 👇 Create tasks, so we start requesting all of them concurrently
    tasks = [asyncio.create_task(get_pokemon(name)) for name in names]

    # 👇 Await ALL
    results = await asyncio.gather(*tasks)

    for result in results:
        if result:
            pokemon = parse_pokemon(result)
            print(f"💁 {pokemon.name} is of type(s) {','.join(pokemon.types)}")
        else:
            print("❌ No data found for...")

    finished_at = time.time()
    elapsed_time = finished_at - started_at
    print(f"⏲️ Done in {timedelta(seconds=elapsed_time)}")


POKE_NAMES = ["blaziken", "pikachu", "lugia", "bad_name"]
asyncio.run(get_all(*POKE_NAMES))

And this is the output:

🔍 Querying for 'blaziken'
🔍 Querying for 'pikachu'
🔍 Querying for 'lugia'
🔍 Querying for 'bad_name'

🙌 Got data for 'lugia'
🙌 Got data for 'blaziken'
🙌 Got data for 'pikachu'
🙌 Got data for 'bad_name'

🔄 Parsing pokemon
💁 blaziken is of type(s) fire,fighting
🔄 Parsing pokemon
💁 pikachu is of type(s) electric
🔄 Parsing pokemon
💁 lugia is of type(s) psychic,flying
❌ No data found for...

⏲️ Done in 0:00:00.495780

We dropped from ~2s to 500ms just by using Python async correctly.

Note how:

We query everything right away in the order passed (e.g. blaziken first)
We retrieve the data in a random order as they become available (e.g. Now lugia comes first)
We parse the data in sequence (it's CPU-bound anyway)
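The difference between completion order and result order can be reproduced without any network access. A minimal sketch, where `fake_fetch` and its delays are invented stand-ins for the real HTTP calls:

```python
import asyncio

finished_order: list[str] = []


async def fake_fetch(name: str, delay: float) -> str:
    await asyncio.sleep(delay)  # 👈 Simulated network latency (made-up numbers)
    finished_order.append(name)
    return name


async def main() -> list[str]:
    tasks = [
        asyncio.create_task(fake_fetch("blaziken", 0.15)),
        asyncio.create_task(fake_fetch("pikachu", 0.10)),
        asyncio.create_task(fake_fetch("lugia", 0.05)),
    ]
    # 👇 gather returns results in the order the tasks were passed in
    return await asyncio.gather(*tasks)


results = asyncio.run(main())
print(f"Finished: {finished_order}")
print(f"Results:  {results}")
```

Even though lugia finishes first here, `asyncio.gather` still hands back the results in the order the tasks were passed, which is why the parsing step above always starts with blaziken.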

Use Python `asyncio.as_completed`

There will be moments when you don't have to wait for every single task to finish before you start processing results.

That's our scenario: we can start parsing as soon as the first response becomes available.

We do this with asyncio.as_completed, which returns an iterator of awaitables that yield results in the order the tasks finish:

import asyncio
from datetime import timedelta
import time
import httpx
from pydantic import BaseModel

class Pokemon(BaseModel):
    name: str
    types: list[str]

def parse_pokemon(pokemon_data: dict) -> Pokemon:
    print(f"🔄 Parsing pokemon '{pokemon_data['name']}'")

    poke_types = []
    for poke_type in pokemon_data["types"]:
        poke_types.append(poke_type["type"]["name"])

    return Pokemon(name=pokemon_data['name'], types=poke_types)

async def get_pokemon(name: str) -> dict | None:
    async with httpx.AsyncClient() as client:
        print(f"🔍 Querying for '{name}'")
        resp = await client.get(f"https://pokeapi.co/api/v2/pokemon/{name}")
        print(f"🙌 Got data for '{name}'")

        try:
            resp.raise_for_status()

        except httpx.HTTPStatusError as err:
            if err.response.status_code == 404:
                return None

            raise

        else:
            return resp.json()

async def get_all(*names: str):
    started_at = time.time()

    tasks = [asyncio.create_task(get_pokemon(name)) for name in names]

    # 👇 Process the tasks individually as they become available
    for coro in asyncio.as_completed(tasks):
        result = await coro # 👈 You still need to await

        if result:
            pokemon = parse_pokemon(result)
            print(f"💁 {pokemon.name} is of type(s) {','.join(pokemon.types)}")
        else:
            print("❌ No data found for...")

    finished_at = time.time()
    elapsed_time = finished_at - started_at
    print(f"⏲️ Done in {timedelta(seconds=elapsed_time)}")


POKE_NAMES = ["blaziken", "pikachu", "lugia", "bad_name"]
asyncio.run(get_all(*POKE_NAMES))

The benefit is not easily visible:

🔍 Querying for 'blaziken'
🔍 Querying for 'pikachu'
🔍 Querying for 'lugia'
🔍 Querying for 'bad_name'

🙌 Got data for 'blaziken'
🔄 Parsing pokemon 'blaziken'
💁 blaziken is of type(s) fire,fighting
🙌 Got data for 'bad_name'
🙌 Got data for 'lugia'
🙌 Got data for 'pikachu'
❌ No data found for...
🔄 Parsing pokemon 'lugia'
💁 lugia is of type(s) psychic,flying
🔄 Parsing pokemon 'pikachu'
💁 pikachu is of type(s) electric

⏲️ Done in 0:00:00.316266

We still query everything at once (which is good).

Note how the order is completely mixed up though.

It means that Python processed each response as soon as it became available, giving the other requests time to finish in the meantime.
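Since the real output depends on network timing, here is a deterministic sketch of the same behavior. The `fake_fetch` coroutine and its delays are invented for illustration; only `asyncio.create_task` and `asyncio.as_completed` are the real APIs.

```python
import asyncio


async def fake_fetch(name: str, delay: float) -> str:
    await asyncio.sleep(delay)  # 👈 Simulated latency (made-up numbers)
    return name


async def main() -> list[str]:
    tasks = [
        asyncio.create_task(fake_fetch("blaziken", 0.15)),
        asyncio.create_task(fake_fetch("pikachu", 0.05)),
        asyncio.create_task(fake_fetch("lugia", 0.10)),
    ]

    processed = []
    # 👇 Each iteration hands us whichever task finishes next
    for next_done in asyncio.as_completed(tasks):
        processed.append(await next_done)

    return processed


processed = asyncio.run(main())
print(processed)
```

The results come back ordered by how long each fake request took, not by the order in which the tasks were created.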

You'll become a better developer if you understand when/why to use async, await, create_task, gather, and as_completed.

This is part of the book I'm currently writing. If you want to stop writing 'OK-code' that works and start writing 'GREAT-code', you should consider getting your copy before I finish writing it (I'll increase the price once it's released).

Get your copy here:

Python Like a PRO 🐍📚 Book

⚠️📚 This book is still under development (that’s why it’s so cheap right now, the price will increase once all chapters are published). You need to know what the hell you’re doing 🔥🐍 Python is one of the most flexible languages I have had contact with. Everything too flexible enhances the odds of ba…

Gumroad

🔀 Real-life scenario using Async IO

I'm currently working for another SF startup, Silk Security, and we rely a lot on third-party integrations and their APIs.

We query a lot of data, and we need to do it as fast as possible.

For example, we query Snyk's API to collect code vulnerabilities.

Snyk's data is composed of Organizations that contain many Projects that contain many Issues.

It means that we need to list all organizations and their projects before getting any issues.

So picture it as:

Snyk Flow Overview

Note how many queries we need to do! We do them concurrently.

We need to be careful with rate limiting issues that the API may throw. To resolve that we limit the number of queries we do in a single shot, and we start running some processing before querying for more data.

This lets us save time without hitting any rate limits imposed by the API.

See a redacted code snippet from a real project running in production:

def _iter_grouped(self, issues: list[ResultType], group_count: int):
    group_count = min([len(issues), group_count])

    return zip(*[iter(issues)] * group_count)


async def get_issue_details(self):
    ...

    # NOTE: We need to be careful here, we can't create tasks for every issue or Snyk will raise 449
    # Instead, let's do it in chunks, and let's yield as it's done, so we can spend some time processing it
    # and we can query Snyk again.
    chunk_count = 4 # 👈 Limit to 4 queries at a time
    coro: Awaitable[tuple[ResultType | None]]
    for issues in self._iter_grouped(issues, chunk_count):
        tasks = [asyncio.create_task(self._get_data(project, issue)) for issue in issues]

        for coro in asyncio.as_completed(tasks):
            issue, details = await coro

            yield issue, details

We can represent it as:

Generators + asyncio.as_completed flow
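The redacted snippet can't run on its own, so here is a self-contained sketch of the same "chunks + as_completed + yield" idea. Everything named here (`iter_chunks`, `fake_get_details`, the integer issue ids) is hypothetical and only stands in for the real integration code. One deliberate difference: the `zip(*[iter(...)] * n)` idiom drops a trailing group smaller than n, so this sketch slices the list instead to keep the leftovers. That is a swap-in, not what the production code does.

```python
import asyncio
from collections.abc import AsyncIterator, Iterator


def iter_chunks(items: list[int], size: int) -> Iterator[list[int]]:
    """Yield consecutive slices of at most `size` items (leftovers included)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


async def fake_get_details(issue_id: int) -> tuple[int, str]:
    await asyncio.sleep(0.01)  # 👈 Stands in for the real API call
    return issue_id, f"details for issue {issue_id}"


async def get_issue_details(
    issue_ids: list[int], chunk_size: int = 4
) -> AsyncIterator[tuple[int, str]]:
    for chunk in iter_chunks(issue_ids, chunk_size):
        # 👇 At most `chunk_size` requests are in flight at any time
        tasks = [asyncio.create_task(fake_get_details(issue_id)) for issue_id in chunk]

        # 👇 Yield each result as soon as it's ready so the caller can process it
        for next_done in asyncio.as_completed(tasks):
            yield await next_done


async def main() -> list[tuple[int, str]]:
    return [pair async for pair in get_issue_details(list(range(10)))]


collected = asyncio.run(main())
print(f"Collected {len(collected)} issues")
```

The caller consumes it with `async for`, so the next chunk of requests only starts once the previous chunk has been fully handed over.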

If you learned something new today consider giving me a follow on Twitter. I frequently share Python content and cool projects. DMs are open to any feedback.

Pyrun: Execute Python inside your Twitter, Facebook, Linkedin

Guilherme Latrova — Thu, 01 Sep 2022 22:04:10 GMT

Pyrun

Extracts data from tweets and runs Python code inside Twitter

Chrome Web Store

Stack: Typescript, React, Browser Extension, Pyodide

Running Python inside Twitter with a single click

👨🏻‍💻 Pyrun integrates a small IDE window in your feed, so you can execute Python code with a single button!

🤔 Why execute Python inside Twitter?

At this point you probably know I'm a huge fan of making learning interactive and engaging.

That's how my brain works: I see, then I try, then I understand.

I believe that shallow consumption without action prevents us from learning. Reading without practice is not good enough.

Here's where this extension comes in. I frequently write tweets that teach some Python:

If you have a list and want to take any random value from it in Python 🐍

👉 Use choice

Easy! #pyrun pic.twitter.com/LvlEWD9rxe
— Gui Latrova (@guilatrova) July 25, 2022

👤 Who's this for?

It's for content creators who believe talk is cheap and want to improve their engagement with their audience.

It's for the avid learners who want to execute, edit and feel the code!

Pyrun Working Demo

✍️ I'm a creator, how can I produce Python tweets that my audience can execute?

💁‍♂️ Easy:

Post an image from carbon, snappify, or anything else as you normally would
Put the raw code as your image's ALT keeping all white spaces, comments, etc.
When posting make sure to include the #pyrun tag anywhere in your tweet

That's it. People who don't use the extension can still consume your content as before (static boring images), and extension users can now EXECUTE IT!

🧐 How to run Python inside other pages?

The proposal is simple:

Tweets and Posts teaching Python can be executed with a single click. No installs. No manual typing.

This idea struck me randomly during a night stay in São Paulo at 4 am.

What do you do when you have a stupid idea that you don't even know if it works?

You spend the next hours trying to implement it of course!

This is a test for a project I'm currently building at 4 am (Yes).

Please, ignore it and keep scrolling. 😁

Thank you! #pyrun pic.twitter.com/oHPXZLlzuB
— Gui Latrova (@guilatrova) July 15, 2022

👓 Reading the Python Code

The first challenge was: How to read the Python code?

I thought it would be easy to use OCR to read images with monospace fonts. Unfortunately, it didn't work.

The OCR tools I used had to be in Javascript since I'm injecting code into the browser. Such tools aren't built to "read code", but actual words and sentences.

So it's extremely tricky for these tools to recognize: def func: because it doesn't make much sense when read by a human.

Let's not even mention spaces, tabs, and line breaks.

The best I could do is extract code from the ALT which preserves the original content and all characters.

Tweet's ALT Example

Next, to be able to capture the code, I had to:

Find relevant tweets
Read each image's ALT

🕵️ Getting relevant tweets

Finding the correct tweet seemed hard at first. How can I know the content (1) is about Python, (2) has code, and that (3) the image's alt is properly set?

The best approach I found was to require tweets to include a specific tag: #pyrun. This is both a technical limitation and a feature.

Now your audience can filter every tweet that can be executed!

I wrote an XPath query to look for it:

const TARGET_TWEET_TAG = "#pyrun";
const xpath = `//a[text()='${TARGET_TWEET_TAG}']`;

I also added some simple styling to make it stand out and appended an execute button right after it.

Tweet standing out with #pyrun tag

Then it worked and it was awesome!

"I nailed it" - I thought...

Until I realized that Twitter loads tweets as you scroll. So I needed to keep "listening" for new tweets to inject my code into.

Finding new tweets as you scroll

Once I figured it out it was easy.

You can notice that as you scroll some logs are being emitted counting how many tweets it finds.

It works the same for other tools:

Pyrun running on Facebook and Linkedin example: pic.twitter.com/MomugKMoPq
— Gui Latrova (@guilatrova) September 15, 2022

🐍 Run Pyodide inside a Chrome Extension

This issue took me 2 months due to my lack of experience building Chrome Extensions 🙈 (Hey, better late than never).

I learned that I can't just follow Pyodide's tutorial to get it to work. It fails miserably. Installing the npm package didn't work either.

Then I decided to find who else did something similar before and I stumbled on Swindle.

Swindle is an open-source extension that allows you to run Python (powered by Pyodide) inside the DevTools.

I noticed it has its own "Pyodide bootstrap flow" and I realized I would probably have to do the same...

It was not a pleasant experience. I opened the original Pyodide's file on Github and replicated every line, recompiled, repackaged, and retested it.

It's surprising that it took me only 2 months 😅.

Then I learned that the recent Chrome manifest v3 forbids the usage of Javascript's eval inside extensions and guess what? I also found out that Pyodide uses Javascript's eval.

😮‍💨 That was a nightmare. It cost me so much to get it working and then this...

"i hate programming" meme based upon @Nasser_Junior comic. pic.twitter.com/KrpAmHL644
— nixCraft (@nixcraft) November 29, 2020

Looks like the manifest v3 requires you to run eval code inside an iframed sandbox.

That's how you can imagine it:

Main window and sandbox communicating

Effectively, that's what you can find in your DOM if you install the extension and inspect the page:

IDE and Sandbox in DOM

The first element holds a React-rendered IDE, while the sandbox iframe is an invisible element responsible only for listening to posted messages, executing the code, and emitting outputs back to the main window.

🧑‍💻 Editor area and Output console

It was simple to set up an editor. I just had to install react-ace.

I didn't have the same luck with the console though.

I tried many, but none of them was simple enough for my needs: a styled output where I can jot down many lines.

I ended up building a simple div that keeps appending lines as child elements inside it. It works fine:

Output example

🧑‍💻 What about the future?

Content creators are everywhere, and social media tools share some "common features".
They allow posts with tags and images, and images with ALT.
This is enough to get this extension up and running anywhere.

Take LinkedIn as an example:

What if I could do the same I did for Twitter inside Linkedin? 🐍

Both posts with images +ALT and tags 🤔

🤷‍♂️ The hardest work is already implemented: An extension holding an IDE that runs Python

🥷 I found out I can "hack" the HTML to allow higher ALT length. (and it works!) pic.twitter.com/lWRZbr4Bco
— Gui Latrova (@guilatrova) September 3, 2022

I imagine supporting more Social Media tools depending on the audience feedback.

Today this extension supports just Python, but it should be quite easy to support Javascript since it requires no setup/preparation!

Make sure to follow me on Twitter to know if any of these will ever happen.

🌟 Tips

I had never built a Chrome extension before. I faced many challenges and wanted to develop one fast.

I leaned on many different open-source projects to learn how they solved similar problems.

I'm going to list some of them for reference (and also to say thanks to the maintainers!):

Manifest v3 + Sandboxes

I learned more about web3 and sandboxes by looking at how this project worked

GitHub - jorgenbuilder/chrome-dfinity-decoder: Decode responses from the Dfinity blockchain in chrome devtools


Even though I couldn't take any meaningful piece of code, both projects helped me understand that I had to define a custom bootstrapping for Pyodide.

GitHub - grimmer0125/embedded-pydicom-react-viewer: Medical DICOM file P10 Viewer/Chrome Extension + Python Code In Browser (-Pyodide-> WebAssembly) + Pydicom parser + TypeScript React App (CRA). Use d4c-queue npm lib.


GitHub - Mario2334/swindle


GitHub - alexmojaki/futurecoder: 100% free and interactive Python course for beginners


Python Match Case is more powerful than you think 🐍🕹️

Guilherme Latrova — Mon, 22 Aug 2022 11:05:31 GMT

Python 3.10 brought the match case syntax which is similar to the switch case from other languages.

It's just similar though. Python's match case is WAY MORE POWERFUL than the switch case because it's Structural Pattern Matching.

You don't know what I mean? I'm going to show you what it can do with examples!

Note that if you're reading this article in AMP mode or from mobile you won't be able to run Python code from your browser, but you can still see the code samples.

Example of executing Python interactively

Match Case is similar to a Switch Case

It's still possible to use match case as a common switch case:

from http import HTTPStatus
import random

http_status = random.choice(
    [
        HTTPStatus.OK,
        HTTPStatus.BAD_REQUEST,
        HTTPStatus.INTERNAL_SERVER_ERROR,
    ]
)

# 👇 Simplest example, can be easily replaced by a dictionary
match http_status:
    case HTTPStatus.OK: # 👈 "case" + "value" syntax
        print("Everything is good!")

    case HTTPStatus.BAD_REQUEST:
        print("You did something wrong!")

    case HTTPStatus.INTERNAL_SERVER_ERROR:
        print("Oops... Is the server down!?.")

    case _: # 👈 Default syntax
        print("Invalid or unknown status.")

Boring, right? It can be easily replaced by a dictionary with fewer lines, see:

from http import HTTPStatus
import random

http_status = random.choice(
    [
        HTTPStatus.OK,
        HTTPStatus.BAD_REQUEST,
        HTTPStatus.INTERNAL_SERVER_ERROR,
    ]
)

dictmap = {
    HTTPStatus.OK: "Everything is good!",
    HTTPStatus.BAD_REQUEST: "You did something wrong!",
    HTTPStatus.INTERNAL_SERVER_ERROR: "Oops... Is the server down!?.",
}

message = dictmap.get(http_status, "Invalid or unknown status.")
print(message)

Match Case matching many different values

As I mentioned initially, the match case goes beyond a regular switch case.

Let's match specific status codes, combining alternatives with | (the OR pattern):

from http import HTTPStatus
import random

http_status = random.choice(list(HTTPStatus))

match http_status:
    case 200 | 201 | 204 as status:
        # 👆 Using "as status" extracts its value
        print(f"Everything is good! {status = }") # 👈 Now status can be used inside handler

    case 400 | 404 as status:
        print(f"You did something wrong! {status = }")

    case 500 as status:
        print(f"Oops... Is the server down!? {status = }")

    case _ as status:
        print(f"No clue what to do with {status = }!")

Note we used as status to extract the value into a variable that can be used inside the handler.

Match Case with conditionals (guards)

Not exciting yet? Ok, let's improve it a little.

You can see we are missing many status codes in the previous example.

What if we want to match ranges as:

<200,
200-399,
400-499, and
>=500?

We can use guards for that:

from http import HTTPStatus
import random

http_status = random.choice(list(HTTPStatus))

match http_status:
    # 💁‍♂️ Note we don't match a specific value as we use "_" (underscore)
    # 👇✅ Match any value, as long as status is between 200-399
    case _ as status if status >= HTTPStatus.OK and status < HTTPStatus.BAD_REQUEST:
        print(f"✅ Everything is good! {status = }")
        # 👆📤 We took 'status' by using the 'as status' syntax

    # 👇❌ Match any value, as long as status is between 400-499
    case _ as status if status >= HTTPStatus.BAD_REQUEST and status < HTTPStatus.INTERNAL_SERVER_ERROR:
        print(f"❌ You did something wrong! {status = }")

    # 👇💣 Match any value, as long as status is >=500
    case _ as status if status >= HTTPStatus.INTERNAL_SERVER_ERROR:
        print(f"💣 Oops... Is the server down!? {status = }.")

    # 👇❓ Match any value that we didn't catch before (<200)
    case _ as status:
        print(f"❓ No clue what to do with {status = }!")

👆 Note we didn't use any specific value inside our case statements.

We used _ (underscore) to match all because we wanted to check ranges instead of specific values.

A "guard" is what we call validating the matched pattern with an if, as we did above.

Match Case lists value, position, and length

You can match lists based on values at a specific position and even length!

See some examples below where we match:

Any list with exactly 3 items, extracting these items as vars
Any list with more than 3 items by using *_
Any list starting with a specific value + possible combinations
Any list starting with a specific value

baskets = [
    ["apple", "pear", "banana"], # 🍎 🍐 🍌
    ["chocolate", "strawberry"], # 🍫 🍓
    ["chocolate", "banana"], # 🍫 🍌
    ["chocolate", "pineapple"], # 🍫 🍍
    ["apple", "pear", "banana", "chocolate"], # 🍎 🍐 🍌 🍫
]

def resolve_basket(basket: list):
    match basket:

        # 👇 Matches any 3 items
        case [i1, i2, i3]: # 👈 These are extracted as vars and used here 👇
            print(f"Wow, your basket is full with: '{i1}', '{i2}' and '{i3}'")

        # 👇 Matches >= 4 items
        case [_, _, _, *_] as basket_items:
            print(f"Wow, your basket has so many items: {len(basket_items)}")

        # 👇 2 items. First should be 🍫, second should be 🍓 or 🍌
        case ["chocolate", "strawberry" | "banana"]:
            print("This is a superb combination. 🍫 + 🍓|🍌")

        # 👇 2 items. First should be 🍫, second should be 🍍
        case ["chocolate", "pineapple"]:
            print("Eww, really? 🍫 + 🍍 = ?")

        # 👇 Any amount of items starting with 🍫
        case ["chocolate", *_]:
            print("I don't know what you plan but it looks delicious. 🍫")

        # 👇 If nothing matched before
        case _:
            print("Don't be cheap, buy something else")


for basket in baskets:
    print(f"📥 {basket}")
    resolve_basket(basket)
    print()

Match Case dicts

We can do a lot with dicts!

Let's see many examples with dicts holding str keys and either int or str as their values.

We can match existing keys, value types, and dict length.

mappings: list[dict[str, str | int]] = [
    {"name": "Gui Latrova", "twitter_handle": "@guilatrova"},
    {"name": "John Doe"},
    {"name": "JOHN DOE"},
    {"name": 123456},
    {"full_name": "Peter Parker"},
    {"full_name": "Peter Parker", "age": 16}
]

def resolve_mapping(mapping: dict[str, str | int]):
    match mapping:
        # 👇 Matches any
        #    (1) "name" AND any (2) "twitter_handle"
        case {"name": name, "twitter_handle": handle}:
            print(f"😉 Make sure to follow {name} at {handle} to keep learning") # 😉 This is good advice

        # 👇 Matches any
        #    (1) "name" (2) if val is str and (3) it's all UPPER CASED
        case {"name": str() as name} if name == name.upper():
            print(f"😥 Hey, there's no need to shout, {name}!")

        # 👇 Matches any
        #    (1) "name" (2) if val is str. It will fall here whenever the above 👆 doesn't match
        case {"name": str() as name}:
            print(f"👋 Hi {name}!")

        # 👇 Matches any
        #    (1) "name" (2) if val is int.
        case {"name": int()}:
            print("🤖 Are you a robot or what? How can I say your name? ")

        # 👇 Matches any
        #    (1) "full_name" (2) and NOTHING else
        case {"full_name": full_name, **remainder} if not remainder:
            print(f"Thanks mr/ms {full_name}!")

        # 👇 Matches any
        #    (1) "full_name" (2) and ANYTHING else
        case {"full_name": full_name, **remainder}:
            print(f"Just your full name is fine! No need to share {list(remainder.keys())}")


for mapping in mappings:
    print(f"📥 {mapping}")
    resolve_mapping(mapping)
    print()

Match Case classes instances and props

The first time I saw:

class Example:
    ...

var = Example()

match var:
    case Example(): # 👈 This syntax is a bit weird
        ...

I thought we could be instantiating the class 😅 which is wrong.

This syntax means: "Instance of type Example with any props."

Above you probably saw us doing that for int() and str(). The logic is the same.

Check a few examples:

Matching a class instance with the property name equals End
Matching any instance based on the type
Matching instances with specific properties set to 0
Extracting class properties to be used inside the handler

from dataclasses import dataclass

@dataclass
class Move:
    x: int # horizontal
    y: int # vertical

@dataclass
class Action:
    name: str

@dataclass
class UnknownStep:
    random_value = "Darth Vader riding a monocycle"


steps = [
    Move(1, 0),
    Move(2, 5),
    Move(0, 5),
    Action("Work"),
    Move(0, 0),
    Action("Rest"),
    Move(0, 0),
    UnknownStep(),
    Action("End"),
]


def resolve_step(step):
    match step:
        # 👇 Match any action that has name = "End"
        case Action(name="End"): # 👈 Note we're not instantiating anything
            print("🔚 Flow finished")

        # 👇 Match any Action type
        case Action():
            print("👏 Good to see you're doing something")

        # 👇 Match any Move with x,y == 0,0
        case Move(0, 0):
            print("💂 You're not really moving, stop pretending")

        # 👇 Match any Move with y = 0
        case Move(x, 0):
            print(f"➡️ You're moving horizontally to {x}")

        # 👇 Match any Move with x = 0
        case Move(0, y):
            print(f"🔝 You're moving vertically to {y}")

        # 👇 Match any Move type
        case Move(x, y):
            print(f"🗺️ You're moving to ({x}, {y})")

        # 👇 When nothing matches
        case _:
            print("❓ I've got no idea what you're doing")


for step in steps:
    print(f"📥 {step}")
    resolve_step(step)
    print()

Keep Learning with me

🐍 Python Match Case got released a few months ago but is still not well comprehended.

Too many people think this is a regular switch case.

Guess what? It's not! 🙅🏻

This is DAY ONE of a series of tweets on how powerful Python Match is to be executed with #pyrun pic.twitter.com/VJJsapkL3W
— Gui Latrova (@guilatrova) September 7, 2022

Organize Python code like a PRO 🐍📦

Guilherme Latrova — Thu, 07 Jul 2022 09:36:38 GMT

For every minute spent in organizing, an hour is earned.
by Benjamin Franklin

Python is different from languages like C# or Java, which expect you to name classes after the file they live in.

So far, Python is one of the most flexible languages I've worked with, and too much flexibility increases the odds of bad decisions.

Do you want to keep all project classes in a single main.py file? Yes, it works.
Do you need to read an os environment var? Just read it right there.
Do you need to modify a function behavior? Why not a decorator!?

Many decisions that are easy to implement may backfire producing code that is extremely hard to maintain.

This is not necessarily bad if you know what you're doing.

In this chapter, I'm going to present guidelines that have worked for me over the past years, working at different companies and with many different people.

🌳 Structure your Python project

Let's focus first on directory structure, file naming, and module organization.

I recommend keeping all your module files inside a src dir, with all tests living side by side with them:

Top-Level project


├── src
│   ├── <module>/*
│   │    ├── __init__.py
│   │    └── many_files.py
│   │
│   └── tests/*
│        └── many_tests.py
│
├── .gitignore
├── pyproject.toml
└── README.md

Where <module> is your main module. If in doubt, consider what people would pip install and how you would like to import it.

Frequently it has the same name as the top project. This isn't a rule though.

🎯 The reasoning behind a `src` directory

I've seen many projects do it differently.

Some variations have no src dir, with the project modules scattered around the tree.

This is quite annoying because of the lack of order, producing things like (example):

non_recommended_project
├── <module_a>/*
│     ├── __init__.py
│     └── many_files.py
│
├── .gitignore
│
├── tests/*
│    └── many_tests.py
│
├── pyproject.toml
│
├── <module_b>/*
│     ├── __init__.py
│     └── many_files.py
│
└── README.md

It's annoying to have related things so far apart just because of the IDE's alphabetical sorting.

The main reason behind the src dir is to keep active project code concentrated inside a single directory while settings, CI/CD setup, and project metadata can reside outside of it.

The only drawback is that you can't import module_a in your Python code out of the box: the project first has to be set up so it gets installed from this repository. We'll discuss how to solve this later in this chapter.

🏷️ How to name files

Rule 1: There are no files

First of all, in Python there are no such things as "files", and I've noticed this is the main source of confusion for beginners.

If you're inside a directory that contains an __init__.py, it's a directory composed of modules, not files.

See each module as a namespace.

I say namespace because you can't say for sure whether a module holds many functions, classes, or just constants. It can have virtually all of them or just a handful of some.

Rule 2: Keep things together as needed

It’s fine to have several classes within a single module, and you should do so (when the classes are related to the module, obviously).

Only break it down when your module gets too big, or when it handles different concerns.

Often, people think it’s a bad practice due to some experience with other languages that enforce the other way around (e.g. Java and C#).

Rule 3: By default give plural names

As a rule of thumb, name your modules in the plural and name them after a business context.

There are exceptions to this rule though! Modules can be named core, main, and similar to represent a single thing. Use your judgment; if in doubt, stick to the plural rule.

🔎 Real-life example when naming modules

I'll share a Google Maps Crawler project that I built as an example.

This project is responsible for crawling data from Google Maps using Selenium and outputting it (Read more here if curious).

This is the current project tree outlining exceptions to the #3 rule:

gmaps_crawler
├── src
│   └── gmaps_crawler
│        ├── __init__.py
│        ├── config.py 👈 (Singular)
│        ├── drivers.py
│        ├── entities.py
│        ├── exceptions.py
│        ├── facades.py
│        ├── main.py  👈 (Singular)
│        └── storages.py
│
├── .gitignore
├── pyproject.toml
└── README.md

It seems very natural to import classes and functions like:

from gmaps_crawler.storages import get_storage
from gmaps_crawler.entities import Place
from gmaps_crawler.exceptions import CantEmitPlace

I can understand that I might have one or many exception classes inside exceptions and so on.

The beauty of plural modules is that:

They're not too small (e.g. one per class)
You can break them down into smaller modules at any moment if required
They give you a strong sense of what might exist inside

🔖 Naming classes, functions, and variables

Some people claim naming things is hard. It gets less hard when you define some guidelines.

👊 Functions and Methods should be verbs

Functions and methods represent actions.

A function "isn't" something; it's something "happening".

Actions are clearly stated by verbs.

A few good examples from REAL projects I worked on before:

def get_orders():
    ...

def acknowledge_event():
    ...

def get_delivery_information():
    ...

def publish():
    ...

A few bad examples:

def email_send():
    ...

def api_call():
   ...

def specific_stuff():
   ...

It's unclear whether they perform the action themselves or return an object that lets me perform it. For example, does email_send actually send the email, or does it give me something to send it with?

I can picture a scenario like this:

email_send.title = "title"
email_send.dispatch()

Example of a misleading function name

There are only a few exceptions to this rule, but they exist.

Creating a main() function to be invoked in the main entry point of your application is a good reason to skip this rule.
Using @property to treat a class method as an attribute is also valid.

🐶 Variables and Constants should be nouns

They should always be nouns, never verbs (which makes the difference from functions clear).

Good examples:

plane = Plane()
customer_id = 5
KEY_COMPARISON = "abc"

Bad examples:

fly = Plane()
get_customer_id = 5
COMPARE_KEY = "abc"

If your variable/constant is a list or collection, make it plural!

planes: list[Plane] = [Plane()] # 👈 Even if it contains only one item
customer_ids: set[int] = {5, 12, 22}
KEY_MAP: dict[str, str] = {"123": "abc"} # 👈 Dicts are kept singular

🏛️ Classes should be self explanatory, but Suffixes are fine

Prefer classes with self-explanatory names. It's fine to have suffixes like Service, Strategy, Middleware, but only when really necessary to make their purpose clear.

Always name classes in the singular rather than the plural. Plural reminds us of collections (e.g. if I read orders I assume it's a list or iterable), so remind yourself that once a class is instantiated it becomes a single object.

Classes representing entities

Classes that represent things from the business context should be named as is (nouns!). Like Order, Sale, Store, Restaurant and so on.

Example of suffixes usage

Say you want to create a class responsible for sending emails. If you name it just "Email", its purpose is not clear.

Someone might think it represents an entity, e.g.:

email = Email() # inferred usage example
email.title = "Title"
email.body = create_body()
email.send_to = "guilatrova.dev"

send_email(email)

You should name it "EmailSender" or "EmailService".

🐪 Casing conventions

By default follow these naming conventions:

Type	Public	Internal
Packages (directories)	`lower_with_under`	-
Modules (files)	`lower_with_under.py`	-
Classes	`CapWords`	-
Functions and methods	`lower_with_under()`	`_lower_with_under()`
Constants	`ALL_CAPS_UNDER`	`_ALL_CAPS_UNDER`

⚠️ Disclaimer about """private""" methods.

Some people found out that if you name a method __method(self) (any name starting with two underscores), Python won't let code outside the class invoke it under that name, which leads them to think they've found a way to make methods private.

If you came from a C# environment like myself it might sound weird that you can't protect a method.

But Guido (Python's creator) has a good reason behind it:

"We're all consenting adults here"

It means that if you're aware you shouldn't be invoking a method, then you shouldn't unless you know what you're doing.

After all, if you really decided to invoke that method, you're going to do something dirty to make it happen (known as "Reflection" in C#).

Mark your private method/function with a single initial underscore to state it's intended for private use only and live with it.
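What the double underscore actually triggers is name mangling: inside the class body, __method is rewritten to _ClassName__method. A minimal sketch (Account and its members are hypothetical names) shows that the method is merely renamed, not protected:

```python
class Account:
    def __init__(self):
        self._balance = 10   # single underscore: a convention meaning "internal"
        self.__pin = 1234    # double underscore: stored as _Account__pin

    def __audit(self) -> str:  # stored on the class as _Account__audit
        return "audited"


account = Account()

try:
    account.__audit()  # outside the class the name is not mangled, so lookup fails
except AttributeError as ex:
    print(f"Blocked? {ex}")

# Anyone who knows the mangled name can still call it.
print(account._Account__audit())
print(account._Account__pin)
```

Since the "protection" is only a rename, the single underscore communicates the same intent with less surprise.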

↪️ When to create a function or a class in Python?

This is a question I've received a few times.

If you follow the above recommendations you're going to have clear modules and clear modules are an effective way to organize functions:

from gmaps_crawler import storages

storages.get_storage()  # 👈 Similar to a class, except it's not instantiated and has a plural name
storages.save_to_storage()  # 👈 Potential function inside module

Sometimes you can identify subsets of functions inside a module. When this happens a class makes more sense:

Example on grouping different subset of functions

Consider the same storages module with 4 functions:

def format_for_debug(some_data):
    ...

def save_debug(some_data):
    """Prints on the screen"""
    formatted_data = format_for_debug(some_data)
    print(formatted_data)


def create_s3(bucket):
    """Create s3 bucket if it doesn't exist"""
    ...

def save_s3(some_data):
    s3 = create_s3("bucket_name")
    ...

S3 is a cloud storage service provided by Amazon (AWS) that can hold any sort of data. It's like Google Drive for software.

We can say that:

The developer can save data in DEBUG mode (that just prints on the screen) or on S3 (that stores data on the cloud).
save_debug uses the format_for_debug function
save_s3 uses the create_s3 function

I can see two groups of functions, and since they're small there's no reason to split them into different modules. I'd rather define them as classes:

class DebugStorage:
    def format_for_debug(self, some_data):
        ...

    def save_debug(self, some_data):
        """Prints on the screen"""
        formatted_data = self.format_for_debug(some_data)
        print(formatted_data)


class S3Storage:
    def create_s3(self, bucket):
        """Create s3 bucket if it doesn't exist"""
        ...

    def save_s3(self, some_data):
        s3 = self.create_s3("bucket_name")
        ...

Here's a rule of thumb:

Always start with functions
Grow to classes once you feel you can group different subsets of functions

🚪 Creating modules and entry points

Every application has an entry point.

It means that there's a single module (aka file) that runs your application. It can be either a single script or a big module.

Whenever you're creating an entry point, make sure to add a condition to ensure it's being executed and not imported:

def execute_main():
    ...


if __name__ == "__main__":  # 👈 Add this condition
    execute_main()

By doing that you ensure that imports won't trigger your code by accident; it only runs when the module is explicitly executed.

Defining main for modules

You might have noticed some Python packages that can be invoked by passing -m, like:

python -m pytest
python -m tryceratops
python -m faust
python -m flake8
python -m black

Such packages are treated almost like regular commands since you can also run them as:

pytest
tryceratops
faust
flake8
black

To make python -m work, you need a single __main__.py file inside your main module (the bare command form additionally relies on a script entry point declared in the package metadata):


├── src
│   ├── example_module 👈 Main module
│   │    ├── __init__.py
│   │    ├── __main__.py 👈 Add it here
│   │    └── many_files.py
│   │
│   └── tests/*
│        └── many_tests.py
│
├── .gitignore
├── pyproject.toml
└── README.md

Don't forget you still need to include the check __name__ == "__main__" inside your __main__.py file.

When you install your module, you can run your project as python -m example_module.

📖 Hey!

This is an initial draft from a book that I'm writing!

~~If you're interested make sure to subscribe to the newsletter and follow me on Twitter to be notified when the book is out!~~

The first chapter is out with a special discount!

Python Like a PRO 🐍📚 Book

Gumroad

I'm also open to feedback, get in touch either through email or Twitter DMs if you have any.

Python 3.11 What's New?

Guilherme Latrova — Tue, 28 Jun 2022 13:48:44 GMT

Here is a selection of the major changes coming in Python 3.11:

1️⃣ Better Error Handling

Better error messages to easily spot issues in your code.

Consider some code trying to read an invalid key from your dict.

In this case, instagram is a nonexistent key that process_dict attempts to read:

invalid_dict = dict(
    youtube="Gui Commits",
    blog="https://guicommits.com",
    twitter="guilatrova",
)

def process_dict(input_dict: dict):
    handles = ["youtube", "blog", "twitter", "instagram"]
    social_links = []
    for handle in handles:
        social_links += input_dict[handle]

process_dict(invalid_dict)

If you were running it in Python 3.10, you would see this:

Python 3.10 Displaying an error

Now in Python 3.11, this is what it looks like:

The characters "~" and "^" point out exactly where the issue is located.

Python 3.11 Displaying an error

2️⃣ Exception Groups

Now we have an ExceptionGroup that groups many other exceptions inside.

Each exception living inside a group can be individually captured by the except* syntax. See:

def raise_exception_group():
    raise ExceptionGroup(
        "Description from ExceptionGroup", # Group wrapper
        [
            ValueError("ValueError"), # First exception in group
            TypeError("TypeError") # Second exception in group
        ]
    )


def main():
    try:
        raise_exception_group()
    except* ValueError:
        print("Value Error!")
    except* TypeError:
        print("Type Error!")


main()

Note the code above would print both Value Error! and Type Error!. Every except* block whose type matches at least one exception inside the ExceptionGroup gets executed.

It can get a bit crazy since you can have groups inside groups!

def raise_exception_group():
    raise ExceptionGroup(
        "Description from ExceptionGroup",
        [
            ValueError("ValueError"),
            TypeError("TypeError"),
            ExceptionGroup("Another group", # Second group inside main group
                [
                    Exception("Another error")
                ]
            )
        ]
    )

3️⃣ Exception `add_note`

Now you can keep enriching your exceptions with further data to add even more context to them!

def raise_exception():
    raise Exception("How are you not subscribed to this blog's newsletter? 🤯😱")


def main():
    try:
        raise_exception()
    except Exception as ex:
        ex.add_note("Not following on Twitter would also be a mistake! 😜") # 👈 New method!
        raise


main()

So your traceback would display all notes added:

Python 3.11 exception's add_note feature giving you some good advice

4️⃣ New Type Hints

`Self` Type

The Self type solves a common issue: type checkers and IDEs can't infer that a method returning self returns the subclass type when it's called on a subclass.

Consider this working code from Python 3.10 and earlier:

from __future__ import annotations

class Shape:
    def set_scale(self, scale: float) -> Shape:
        self.scale = scale
        return self

class Circle(Shape):
    def set_radius(self, r: float) -> Circle:
        self.radius = r
        return self

# 🔴 Invalid inferred type!
Circle().set_scale(0.5)
Circle().set_scale(0.5).set_radius(2.7)

My IDE keeps telling me that Circle.set_scale returns Shape, which is not true!

This gets resolved with the new Self type:

from typing import Self  # 👈 New typing

class Shape:
    def set_scale(self, scale: float) -> Self:
        self.scale = scale
        return self

class Circle(Shape):
    def set_radius(self, r: float) -> Self:
        self.radius = r
        return self

# 🟢 Correct inferred type!
Circle().set_scale(0.5)
Circle().set_scale(0.5).set_radius(2.7)

`LiteralString`

Think of LiteralString as a string that the developer typed directly in the source code, as opposed to one built from arbitrary input.

from typing import LiteralString # 👈 New import

def inferred(s: str) -> bool:
    if s == "Did you already sign up for our newsletter?":
        print(s)  # 👈 Python identifies this as Literal String
        return True
    else:
        print(s)  # 👈 This is still any string
        return False


def defined(s: LiteralString) -> bool:
    print(s)
    return True


defined("literal value")

It can be useful to spot and prevent SQL Injection issues.

`NotRequired` for `TypedDict`

This feature is already common in TypeScript.

When typing a dictionary, you can declare either:

One value as potentially null (already supported)
One key as potentially missing (new in Python 3.11)

See a few examples of how it would work for a dict with three keys:

blog as required key and required value
twitter as required key (but optional value)
instagram as optional key (but required value)

from typing import NotRequired, TypedDict

class MediaLinks(TypedDict):
    blog: str
    twitter: str | None # 👈 Key is required, value can be either null or str
    instagram: NotRequired[str] # 👈 Key is not required, but if set it should be str


# 🟢 Valid
links_with_instagram: MediaLinks = {"blog": "Gui Commits", "twitter": "@guilatrova", "instagram": "non existent"}
links_without_instagram: MediaLinks = {"blog": "Gui Commits", "twitter": None}


# 🔴 Invalid
links1: MediaLinks = {"blog": "Gui Commits", "twitter": "@guilatrova", "instagram": None}
# 1) 'instagram' can't be None. Mypy output:
# Incompatible types (expression has type "None", TypedDict item "instagram" has type "str")

links2: MediaLinks = {"blog": "Gui Commits"}
# 2) 'twitter' is expected. Mypy output:
# Missing key "twitter" for TypedDict "MediaLinks"

links3: MediaLinks = {"blog": "Gui Commits", "twitter": "@guilatrova", "youtube": "Someday?"}
# 3) 'youtube' isn't expected. Mypy output:
# Extra key "youtube" for TypedDict "MediaLinks"

Remember that Python doesn't validate type hints at runtime.

You must use a type checker such as Mypy to guarantee your typings are correct.

💡 How to use new types before Python 3.11 is released?

Did you know you can start using NotRequired, Self, and LiteralString types already?

You just have to install typing_extensions:

👉 pip install typing_extensions

💁‍♂️ Import types from typing_extensions and that's it.

5️⃣ Performance Improvement

Python 3.11 is on average 25% faster than 3.10, thanks to CPython improvements that provide faster startup and faster runtime.

6️⃣ New default module: `tomllib`

Now Python supports reading TOML out of the box:

import tomllib

toml_str = """
[build]
python-version = "3.11.0"
python-implementation = "CPython"

[tool.poetry.dev-dependencies]
tryceratops = "^1.1.0"
"""

data = tomllib.loads(toml_str)
print(data)

TOML is the format behind pyproject.toml files. See this real file from the Tryceratops project.

When releasing your Python code to production🐍

Keep your PRODUCTION and DEV dependencies split!

Use something powerful like Poetry from @SDisPater

• poetry add [PROD_DEPENDENCY]
• poetry add -D [DEV_DEPENDENCY]

(No, you can't rage pip install) pic.twitter.com/T0V0ODjtWc
— Gui Latrova (@guilatrova) June 26, 2022

7️⃣ StrEnum

New type StrEnum with auto() support, so you don't have to type each value by hand:

from enum import StrEnum, auto

class Status(StrEnum):
    ON_HOLD = auto()
    IN_PROGRESS = auto()
    ON_REVIEW = auto()
    DONE = auto()


print(Status.ON_HOLD) # Output: on_hold
print(Status.ON_HOLD == "on_hold") # Output: True

Now that you know about Python 3.11 changes

If you enjoyed this article did you know you can receive such updates in a shorter/funnier way?

Check out this 👇 thread and consider following @guilatrova 😉

Hey, do you know the major changes from 🐍 Python 3.11 (Beta) so far?

Here's a funny/short thread to keep you updated!

✅ Summary

1. Performance improvement
2. Better error messages
3. Exception groups
4. Exception add_note()
5. New type hints
6. New standard module

1/10 pic.twitter.com/TOg3GPtunN
— Gui Latrova (@guilatrova) June 13, 2022

Building a Blockchain with Python 🐍⛓️

Guilherme Latrova — Fri, 20 May 2022 11:18:51 GMT

Have you always asked yourself how a blockchain works? Well, ask no more. We’re going to implement a simple version of a Blockchain written in Python interactively. 👈 It means that you will be able to run the code straight from your browser.

With just enough theory and a high focus on practice.

Note that if you're reading this article in AMP mode or from mobile you won't be able to run Python code from your browser, but you can still see the code samples and illustrations.

Example of executing Python interactively

The purpose here is to give you “enough” theory so it doesn’t get boring and so we take baby steps towards learning about blockchains.

⛓️ What is a blockchain?

Blockchain is the technology that powers Bitcoin, Ethereum, and other cryptocurrencies. It’s a public ledger whose records are kept in a decentralized way.

For Bitcoin, it means each transaction is a record, and each record is kept valid and secure in the ledger.

It literally manages a chain of blocks, and each block contains many transactions.

A chain of blocks

Blocks can be identified by two means: height or hash. The hash is long and complex while height is quite easy since it's just the block number in the chain.

In our example above it's fine to say that the block of height 1 is also the block of hash 6b86.

The first block ever is called the Genesis Block 🌱. It's a special block with a single transaction (50 BTC to Satoshi Nakamoto).

What makes it special is that it's not tied to any previous block; that's the one and only exception in the blockchain.

Since the blockchain is public and open, you can check and see the genesis block here (or any other block really).

I'm going to abstract how transactions are validated, so we can focus on the blockchain itself. We're assuming that every transaction is legit and needs no more than "from who", "to who" and "how much".

If you're interested in knowing more, I'll be happy to make a part two, and implement transaction details, validation, and some cryptography (yet again, interactively!).

Here are the entities, Block and Transaction, that we're creating for the scope of this article:

Blockchain Entities

See as Python Code 🐍

How to use yield in Python

Guilherme Latrova — Thu, 24 Mar 2022 10:29:56 GMT

There’s nothing scarier than code that you don’t understand. A few years ago I got concerned every time I spotted a yield command. It seemed to be returning something... Somehow it worked, even though I had never used that command before. Is it like return?

Well, they’re similar, but they work differently for sure! You're about to learn how yield works and what the hell generators are in a FUN WAY.

You'll be able to run the python commands right away from this blog as we progress so you can not only understand but SEE how it works.

(It might be disabled if reading on AMP mode or on small screens though. If that's the case you might want to come back to this article later.)

You can open the Python environment by either hitting ctrl + ' or manually clicking on the bar at the bottom of this blog.

Executing the Python code can be achieved by either ctrl + Enter / cmd + Enter, or just click Run.

Example of executing Python interactively

What does `yield` do in Python?

Any function that contains a yield command automatically returns a generator object when called. Generators are functions that can be paused and resumed.

Try this:

def mygenerator():
    n = 1
    yield n
    n += 2
    yield n

print(mygenerator())

By just using yield we create generators

And you’re going to see something like <generator object mygenerator at 0x...> as your output.

Not very impressive, right? That's because we didn't use the generator object just yet. Let’s see what’s inside the generator we just returned.

Please, wrap the object with next:

def mygenerator():
    n = 1
    yield n
    n += 2
    yield n

print(next(mygenerator()))

Now you should see 1 as the result.

You're probably familiar with the return command, and this would be the expected output if a function attempted to return twice: the first return interrupts the function and returns its result.

As we discussed before, this is not what happens with generators: the function is only paused, and the code after the first yield stays on hold until we ask for the next value.
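To see the resume step by hand, keep a reference to the generator object and call next on it more than once. Note that next(mygenerator()) creates a brand-new generator on every call, so it would always give 1. A minimal sketch:

```python
def mygenerator():
    n = 1
    yield n
    n += 2
    yield n


gen = mygenerator()   # 👈 Keep the same generator object around

first = next(gen)     # Runs until the first yield
second = next(gen)    # Resumes right after it and runs until the second yield
print(first, second)

try:
    next(gen)         # Nothing left to yield
except StopIteration:
    exhausted = True
    print("Generator is exhausted")
```

Loops and list() call next for you and stop quietly when StopIteration is raised.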

Let's make it obvious by adding prints, and by forcing it to run fully with list:

def mygenerator():
    n = 1
    print("A short pause")
    yield n
    
    print("The pause is over!")
    n += 2
    yield n

print(list(mygenerator()))

You should see all our prints, and all yield values ([1, 3]).

Cool right?

Unfortunately, it was not so obvious to see the "pauses" happening. Let's be a bit more verbose now by iterating over the generator, and doing calculations before the generator finishes:

def mygenerator():
    print("Gen: Yielding 1")
    yield 1

    print("Gen: Yielding 2")
    yield 2
    
    print("Gen: Yielding 3")
    yield 3


gen = mygenerator()
for n in gen:
    print(f"For: I got {n}")
    print("For: Since generator is paused, I can do some calculation")
    print(f"For: {n} * 10 = {n * 10}") 
    print()

Note how we can calculate (multiply by ten) even before the generator finishes processing!

My goal here is to make things stupidly obvious, so here's how it would differ from a regular function. Try this and check the output:

def myregularfunc():
    ret = []
    for n in range(1, 4):
        print(f"Fun: Preparing {n}")
        ret.append(n)

    print("Returning full list")
    return ret
    
ret = myregularfunc()
print()
for n in ret:
    print(f"For: I got {n}")
    print(f"For: {n} * 10 = {n * 10}") 
    print()

Note how inevitably we have to wait for myregularfunc to end in order to do anything we need with its results.

🏷 Python type hint for Generators

Do you want to find out how to type hint your generator functions? Easy!

from typing import Iterator, Iterable, Generator

def mygen1() -> Iterator[int]:
    """Iterator is fine"""
    yield 1
    
def mygen2() -> Iterable[int]:
    """Iterable is equally fine"""
    yield 2

def mygenverbose() -> Generator[str, None, None]:
    """
    Verbose way of defining a generator, it receives 3 types:
    (1) Yield type
    (2) Send type (if set, otherwise None)
    (3) Return type (if returning at the end, otherwise None)    
    """
    yield "That's verbose"

print(mygen1)
print(mygen2)
print(mygenverbose)

According to Python docs

I'm personally used to Generator[type, None, None] for keeping things explicit. Feel free to pick whatever you prefer.
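The second and third type parameters only matter when a generator receives values through send or finishes with a return. A small sketch to make them concrete (running_total is a hypothetical example, not from the docs):

```python
from typing import Generator


def running_total() -> Generator[int, int, str]:
    """Yields int totals, receives int amounts via send(), returns a str."""
    total = 0
    while total < 10:
        received = yield total   # 👈 The value passed to send() lands here
        total += received
    return f"done at {total}"    # 👈 Travels inside StopIteration.value


gen = running_total()
first = next(gen)        # Start the generator: runs until the first yield
second = gen.send(4)     # Resume with 4, get the new total back
third = gen.send(3)

try:
    gen.send(5)          # Total reaches 12, the loop ends and the function returns
except StopIteration as stop:
    result = stop.value

print(first, second, third, result)
```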

🔂 Generators of generators

There are also moments where we want to generate from generators. Using just yield won't work, see:

from typing import Generator

def starting_five() -> Generator[int, None, None]:
    """Generator that returns integers from 1-5"""
    for n in range(1, 6):
        yield n
        
        
def ending_five() -> Generator[int, None, None]:
    """Generator that returns integers from 6-10"""
    for n in range(6, 11):
        yield n
        
        
def all_ten() -> Generator[int, None, None]:
    """Generator that relies on other generators"""
    yield starting_five() # This is broken
    yield ending_five()  # This is broken
    
    
print(list(all_ten()))

By doing this you're creating a generator that returns generator objects (The type hint would be something like: Generator[Generator[int, None, None], None, None]) which is not what we want!

To get it working you have to use yield from!

Fix it yourself! Replace each bare yield ... in all_ten with yield from ...

And you should have the expected result: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
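In case you want to compare your answer, here's one way the fixed version could look, using the same three functions:

```python
from typing import Generator


def starting_five() -> Generator[int, None, None]:
    """Generator that returns integers from 1-5"""
    for n in range(1, 6):
        yield n


def ending_five() -> Generator[int, None, None]:
    """Generator that returns integers from 6-10"""
    for n in range(6, 11):
        yield n


def all_ten() -> Generator[int, None, None]:
    """Generator that relies on other generators"""
    yield from starting_five()  # 👈 Yields every item of the inner generator
    yield from ending_five()


result = list(all_ten())
print(result)
```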

⏱ Python Generators Use Cases

I enjoy bringing you real examples that I actually use, and not just a bunch of syntax. So here are 3 situations where I've had (and still have) to deal with generators:

1️⃣ To make your code cleaner

Have you realized that functions that build lists to return need more lines? You need to set up the list, append, and then return it. The Effective Python book recommends generators whenever you need to return a list or iterable.

from typing import List, Generator


def regularfunc() -> List[int]:
    """This works and it's fine."""
    ret = []
    for n in range(10):
        ret.append(n * 10)
        
    return ret
    

def smallergen() -> Generator[int, None, None]:
    """This also works. Favor this one."""
    for n in range(10):
        yield n * 10
        
        
# Intended Usage:
print("Regular func:", regularfunc())
print("Smaller gen:", list(smallergen()))

2️⃣ To read large files (and save memory)

Once again: Generators are functions that can be paused and resumed.

It means that when you need to load a big file like a CSV you can read it line by line, instead of loading the whole thing and wasting all your memory.

Check:

def create_dummy_file():
    with open("loadme.csv", "w") as f:
        f.write("Num,Name\n")
    
        for n in range(10):
            f.write(f"{n},John Doe\n")
            
            
def read_the_whole_file():
    with open("loadme.csv", "r") as f:
        return f.readlines()


def read_line_by_line():
    with open("loadme.csv", "r") as f: 
        for row in f:
            yield row
            

create_dummy_file()

# Bad idea when file is big:
csvcontent = read_the_whole_file()
for row in csvcontent:
    # Even though we only use one line, it already loaded all lines
    print("Eager load:", row)

# Good idea when file is big:
csvcontent = read_line_by_line()
for row in csvcontent:
    # only loads a row to process
    print("Lazy load:", row)

Even though the output is the same, whenever you're reading a file that might be big and needs processing, stick to generators. It may save you from some MemoryErrors.

3️⃣ When writing tests with Pytest

Pytest has the concept of fixtures (out of scope for this article, so I'll be brief), and sometimes there are commands that you want executed before AND after a test. See:

FILECONTENT = """
import pytest

@pytest.fixture
def notifyonlybeforefixture():
    print('Before the test starts')
    
    
@pytest.fixture
def notifycompletefixture():
    print('Before the test starts')
    yield # trick!
    print('After the test ends')
    


def test_fixtures_before(notifyonlybeforefixture):
    print("During test")
    assert False
    
    
def test_fixtures_complete(notifycompletefixture):
    print("During test")
    assert False
"""

# Note: This is only needed so you can test from your browser.
with open("mytest.py", "w") as f:
    f.write(FILECONTENT)


import pytest
pytest.main(["-v", "--cache-clear", "mytest.py"])

Since you're running Python from your browser, there's no test file available by default. We have to generate one named mytest.py so we can execute pytest on it. I hope this quirk won't make it too hard for you to follow.

Note how just adding a yield inside our notifycompletefixture function is enough to let pytest know it's ok to execute the test and, once it's completed, come back and resume the fixture from where it left off.

That's valuable when you need to tear down things (maybe close a file, clear database state, or similar).
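If the mechanics feel like magic, here is a rough sketch of the idea using a plain generator. This is not pytest's actual implementation, and run_with_fixture is a helper I made up for illustration: it advances the generator once for the setup, runs the test, then resumes the generator for the teardown.

```python
from typing import Callable, Generator, List

events: List[str] = []


def notify_complete() -> Generator[None, None, None]:
    events.append("before")
    yield  # the runner pauses the fixture here while the test executes
    events.append("after")


def run_with_fixture(
    fixture: Callable[[], Generator[None, None, None]],
    test: Callable[[], None],
) -> None:
    """Hypothetical mini-runner: setup, test, teardown."""
    gen = fixture()
    next(gen)  # run the setup part, up to the yield
    try:
        test()
    finally:
        # Resume after the yield so the teardown runs even if the test fails
        next(gen, None)


def failing_test() -> None:
    events.append("during")
    assert False


try:
    run_with_fixture(notify_complete, failing_test)
except AssertionError:
    pass  # the test failed on purpose, just like in the pytest example

print(events)
```

The events list ends up with the setup, the test, and the teardown in that order, even though the test itself failed.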

By the way, do you know what else can be paused and resumed? Python Async. If you don't know what I'm talking about you should totally read this article.

Thank you for reading this far! I deeply appreciate your time and I hope you had fun playing with Python without having to leave the browser!

That was my first attempt to make Python learning fun and less boring. It would be an honor for me to hear your feedback: How was it?

Message @guilatrova

Abstract your code

Guilherme Latrova — Wed, 26 Jan 2022 23:41:37 GMT

Abstracting implementation details makes your code flexible and decoupled from vendors or rigid implementations. The principle is quite easy to follow, yet it's constantly ignored.

This post would fit perfectly in a series named “Coding Practices that should be obvious, but for some unknown reason aren't”.

You might think that I'm about to mention interfaces, but not really. Python has no "interface" and you can still follow the principle to make your life (and hopefully your colleagues' lives too) easier.

Every time you write a piece of code that is very specific about the “how-to” and not the “what-to”, you're cursed with code that is hard to change and evolve (sometimes called “legacy code”).

Over time I noticed this simple concept being ignored over and over again, people leaking details when:

“querying database” ❌ instead of querying for an entity
“reading stream from Kafka” ❌ instead of listening to events
“popping SQS messages” ❌ instead of receiving messages
“pushing a function to celery” ❌ instead of scheduling a job

I'm about to show you one obvious example of where it might happen, and how to prevent it.

Just think about a Flask API that allows users to upload purchase receipts to store them in the cloud.

Flask API allows uploading files to S3

Here's the code split into two files main.py and s3.py, let's see them:

# Located under: s3.py
class S3Storage:
    def store(self, bucket: str, key: str, file: bytes):
        # Implementation details of using botocore to upload bytes to S3
        ...

    def retrieve_obj(self, bucket: str, key: str) -> S3Object:
        # Note it returns a specific "S3 Object" 👆
        # Implementation details of using botocore to read data from S3
        ...

The initial version of s3.py

# Located under: main.py
from s3 import S3Storage
...

s3_storage = S3Storage()
s3_bucket = "purchase-receipts"
s3_key = "submissions/receipt.jpeg"


@app.route('/purchase-receipts', methods=['POST'])
def submit_receipt(receipt_upload: FileStorage):
    s3_content = receipt_upload.read()
    s3_storage.store(s3_bucket, s3_key, s3_content)

    return "OK"


@app.route('/purchase-receipts')
def get_receipt():
    s3_obj = s3_storage.retrieve_obj(s3_bucket, s3_key)
    content = s3_obj.response['Body'].read()

    return content

The initial version of main.py

What annoys me most is that the implementation details are leaked all over the place to the caller. The way we defined our "Storage" class makes it impossible for the client not to be overloaded with AWS terms: "Bucket name", "Key", the S3Storage class, and finally the return type S3Object. Together they leave our code completely coupled to the implementation details!

Your code should only be tied to the business purpose.

💦 Leaking implementation details is bad

To make this statement obvious let's say that you recently received a lifetime 80% discount to use GCP to store such data, you pitched it to your team, and it sounds like a decent discount for the long term.

The solution I'd expect would be to just replace the S3Storage implementation without even touching the main.py file that contains our API code.

# Located under: s3.py
class S3Storage:
    def store(self, bucket: str, key: str, file: bytes):
        # Implementation details of using botocore to upload bytes to S3
        ...

    def retrieve_obj(self, bucket: str, key: str) -> S3Object:
        # Implementation details of using botocore to read data from S3
        ...

# you probably would need to rename the file as well to gcp.py for consistency
# 👇 Would become something like this
class GCPStorage:
    def store(self, bucket: str, blob_name: str, blob: bytes):
        # Implementation details of using gcp sdk to upload bytes to GCP
        ...

    def retrieve_obj(self, bucket: str, key: str) -> GCPObject:
        # Implementation details of using gcp sdk to read data from GCP
        ...

The second version of s3.py (or maybe gcp.py?)

Unfortunately, just modifying this file won't be enough. We're leaking all these details to main.py, which forces us to rename variables there too, making the change bigger and the review process longer and more error-prone.

Business tightly tied to the implementation details

🎯 Focus on the business value

This problem gets easier when we focus on what problem we're solving regardless of how.

Let's rewrite the code, but this time, focusing on the problem.

# Now located under: storages.py
class PurchaseReceiptsStorage:  # 👈 Do not leak "HOW" we're storing it
    def store(self, filename: str, file: bytes):  # 👈 Only asks for the absolutely necessary
        # Implementation details for either S3, GCP or whatever
        ...

    def retrieve_receipt(self, filename: str) -> bytes:  # 👈 Only asks for the absolutely necessary
        # Return some data type that can be easily consumed by the client regardless of details 👆
        # Implementation details for either S3, GCP or whatever
        ...

Final conversion from s3.py to storages.py

from storages import PurchaseReceiptsStorage  # 👈 Changes shouldn't modify this

storage = PurchaseReceiptsStorage()  # 👈 Business value clear
filename = "receipt.jpeg"

@app.route('/purchase-receipts', methods=['POST'])
def submit_receipt(receipt_upload: FileStorage):
    content = receipt_upload.read()
    storage.store(filename, content)  # 👈 Things that are expected for the caller to know

    return "OK"


@app.route('/purchase-receipts')
def get_receipt():
    content = storage.retrieve_receipt(filename)  # 👈 Returns a known common type not tied to any lib

    return content

main.py file focused on the business

Any change or maintenance that you need to do on the new PurchaseReceiptsStorage stays invisible to main.py. We're just storing a file, and the client shouldn't care how. It's none of its business.

Business value free from annoying details

For storing we only provide a default filename, and the content itself.
For reading, we just specify the filename.
It's expected for the storage class to internally understand that "Since I manage the storage of receipts, I know which bucket, cloud, API, etc to use".
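To make that last point concrete, here is a minimal runnable sketch. It's only an illustration under one loud assumption: a temporary local directory stands in for the bucket/cloud, so nothing here talks to S3 or GCP. The point is that the "where" lives inside the class and never reaches the caller.

```python
import tempfile
from pathlib import Path


class PurchaseReceiptsStorage:
    """Stores purchase receipts; callers never learn where or how."""

    def __init__(self) -> None:
        # Assumption for this sketch: a temp directory plays the role of
        # the bucket. A real implementation would keep its S3 or GCP
        # client and bucket name here instead.
        self._root = Path(tempfile.mkdtemp(prefix="receipts-"))

    def store(self, filename: str, file: bytes) -> None:
        (self._root / filename).write_bytes(file)

    def retrieve_receipt(self, filename: str) -> bytes:
        # Plain bytes: a common type not tied to any vendor library
        return (self._root / filename).read_bytes()


storage = PurchaseReceiptsStorage()
storage.store("receipt.jpeg", b"fake image bytes")
print(storage.retrieve_receipt("receipt.jpeg"))
```

Swapping the temp directory for a cloud SDK would only touch the body of this class. The two calls at the bottom, which are all main.py knows about, would stay the same.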

Do you want to code a feature that might change?

Yes, it always changes.

Abstract your code from implementation details and you'll be good. pic.twitter.com/gWJKAE0r3P
— Gui Latrova (@guilatrova) November 18, 2021

AWS Organizations with Terraform Workspaces

Guilherme Latrova — Thu, 25 Nov 2021 11:37:59 GMT

There are three boring things in life that DevOps engineers need to do:

Grant the correct set of permissions per dev, so they don't explore more than they should, and don't have less than they have to;
Replicate resources across environments;
Watch out for costs;

And for all of those, having AWS Organizations with terraform workspaces is the way to go.

I'm about to show you how to use AWS Organizations to your advantage with meaningful examples, and how to use terraform to manage it and replicate resources across them.

This post assumes you know the basics of AWS and Terraform. I'll mention things like IAM and SQS, and show some Terraform code.

🗄️ What are AWS Organizations?

AWS Organizations

It's an AWS account that is defined as an organization and that manages child AWS accounts. The parent (or root) account is then responsible for paying the bills of these accounts.

AWS Organizations sample

AWS Organizations with expanded accounts sample

✨ Benefits of AWS organizations

A well structured setup makes a lot of sense for:

Different environments (i.e. staging and production)
Specific projects that may use many resources and that require observation
Isolated customer projects

By default, a setup for different projects will allow you to have:

Safer configuration of resources since they can't touch each other unless you explicitly allow them to. Also, it's a good incentive for you to create more VPCs, roles, etc., instead of reusing the same ones.

Avoid resource naming hell: you no longer need to create, for example, 3 RDS instances named main-database-dev, main-database-staging, and main-database-production, and you won't connect a service to another environment's database by mistake. All of them can have the exact same name, main-database, each living in its own isolated account.

Consolidated billing of accounts, so you can understand which customer of yours consumes more resources, how much it costs you to keep a staging environment, and so on.

Billing allows you to filter by account

🔓 How do permissions and access across accounts work?

I'll use a realistic example from past places I worked.

It's common for companies to have at least three environments, so it's good to keep them split into different accounts:

AWS Organizations per environment

Environment	Motivation
dev	Environment with constant changes and testing, very unstable.
staging	Slightly more stable environment, used frequently for testing and release candidates.
production	Stable environment, this is the one customers use and consume.

Since the child accounts are separate accounts with their own IAM, RDS, VPCs, etc., the only way for the root account to interact with any of them is by assuming a role.

AWS Roles can limit access to resources and organizations

Note that you can easily create custom IAM users inside each new account as you would normally do. I don't recommend that approach because:

Now you have to watch out for many different user accounts (e.g. are all of them rotating their secrets? Do they have MFA enabled?)
Engineers end up juggling many access keys and secrets

It's way easier (and consequently safer) to manage such access through roles because then you can limit which user from the root account can assume what role from the child account.

AWS IAM set to allow John to access staging resources as an admin

As soon as you create a managed child account, you need to deal with roles and permissions.

It might be hard to visualize all the permissions in place. Let's break it down to keep things simple:

1️⃣ Root account: Permission to assume a role in the Staging account

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "",
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            // 👇 000000000000 represents the child AWS account
            "Resource": "arn:aws:iam::000000000000:role/adm-role"
            // 👆 'adm-role' the role that lives inside Staging account
        }
    ]
}

If you prefer terraform, here's the creation of an IAM group that has the permissions mentioned above:

# 1️⃣ Set up some variables for organization
locals {
    # ROOT account
    group_name = "staging-developers"
    policy_name = "staging-access"
    iam_path = "/"

    # PROJECT account
    child_aws_account = "000000000000"
    role_in_child_aws_account = "adm-role"

}

# 2️⃣ Create a group that can access staging
resource "aws_iam_group" "staging_group" {
  name = local.group_name
  path = local.iam_path
}

# 3️⃣ Defines what can be done on what/where
data "aws_iam_policy_document" "staging_access_spec" {
  statement {
    actions = [
      "sts:AssumeRole",  # 👈 You can "AssumeRole"
    ]

    # 👇 Upon this resource (i.e. inside this AWS account, this role)
    resources = ["arn:aws:iam::${local.child_aws_account}:role/${local.role_in_child_aws_account}"]
  }
}

# 4️⃣ 👇 For the group we just created, attach the policy we just defined
resource "aws_iam_group_policy" "staging_group_policy" {
  name  = local.policy_name
  group = aws_iam_group.staging_group.name

  policy = data.aws_iam_policy_document.staging_access_spec.json
}

2️⃣ Staging account: adm-role permissions to resources and trust relationships

Note: By default, AWS creates the exact definitions below when you create a child account.

Let's state that since it's staging, it's fine to be permissive. This role contains an admin policy granting access to everything.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "*",
            "Resource": "*"
        }
    ]
}

But we limit who can assume that role! The trusted entity allows just one specific AWS account to do that:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::000000000000:root"  // 👈 Where 0... is the full account id
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
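Since the two halves are easy to mix up, here is a small Python sketch that builds both documents side by side. The function names and the 111111111111 root account id are placeholders I made up for illustration. The first policy lives in the root account and says "you may assume that role over there", while the second is attached to the role in the child account and says "I trust that account to assume me".

```python
import json


def assume_role_policy(child_account_id: str, role_name: str) -> dict:
    """Policy for the root account: allows assuming a role in a child account."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                "Resource": f"arn:aws:iam::{child_account_id}:role/{role_name}",
            }
        ],
    }


def trust_policy(trusted_account_id: str) -> dict:
    """Trust policy for the role in the child account: who may assume it."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Placeholder account ids, not real ones
ROOT_ACCOUNT = "111111111111"
STAGING_ACCOUNT = "000000000000"

print(json.dumps(assume_role_policy(STAGING_ACCOUNT, "adm-role"), indent=4))
print(json.dumps(trust_policy(ROOT_ACCOUNT), indent=2))
```

Both sides have to agree: if either document is missing or points at the wrong account, the role can't be assumed.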

We can narrow down the permissions by having many roles in a production account and allowing only a subset of users to assume specific roles.

🛂 Accessing AWS child accounts from root account

It might sound extremely boring or slow to access such accounts. It turns out that once you understand how the permissions work (as explained above), it becomes simple. See:

🪖 Set up AWS console to access child account

To do that, you need to be logged in as an IAM user (i.e. you can't do this with a root user).

AWS Console options to switch role

Then you're free to access child accounts by filling in the form:

AWS Console form to assume a new role for another account

And finally, you should see the console again, but this time for another account:

AWS Orgs look when you assume a new role

🪖🎖️ Pro Tip: Use an extension to skip the form

Even though the above method is easy, it's quite boring.

Instead, I recommend using the AWS Extend Switch Roles extension (available for Chrome, Firefox, and Edge).

It allows you to switch between roles easily, so you're always one click away from any account:

Switch Roles extension in action

The example configuration would be like this:

[Staging Account]
aws_account_id = 000000000000
role_name = admin-role
region = us-east-1

⛑️ Set up AWS CLI to assume the role

Accessing such accounts through the CLI is even easier, and no, you don't have to manually run aws sts assume-role.

Here is the setup for your .aws/credentials:

# 👇 Usual setup of a regular user
[root-account]
aws_access_key_id = AKIAXXXXXXXXXXXXXXXX
aws_secret_access_key = XXXXXXXXXXXXXXXX

# 👇 Define the profile for the child account
# role_arn: which role to assume?
# source_profile: use the profile above to assume this role
[staging-account]
role_arn = arn:aws:iam::000000000000:role/admin-role
source_profile = root-account

So anytime you run export AWS_PROFILE=staging-account, your AWS CLI will automatically assume the role for you and give you access to the resources you should have. Pretty cool, huh?

Check it yourself:

❯ export AWS_PROFILE=staging-account
❯ aws sts get-caller-identity

// Output
{
    "UserId": "AROATLMRPSWKBES5PAFXV:botocore-session-1637767588",
    "Account": "000000000000",
    "Arn": "arn:aws:sts::000000000000:assumed-role/admin-role/botocore-session-1637767588"
}
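To visualize how the two profiles chain together, here is a tiny Python sketch. To be clear, this is not how the AWS CLI is implemented internally, and describe_profile is a made-up helper. It only mirrors the lookup order: a role profile borrows keys from its source_profile, then assumes the role.

```python
import configparser

CREDENTIALS = """\
[root-account]
aws_access_key_id = AKIAXXXXXXXXXXXXXXXX
aws_secret_access_key = XXXXXXXXXXXXXXXX

[staging-account]
role_arn = arn:aws:iam::000000000000:role/admin-role
source_profile = root-account
"""


def describe_profile(credentials_text: str, profile: str) -> dict:
    """Hypothetical helper: tells where the keys come from and which role is assumed."""
    parser = configparser.ConfigParser()
    parser.read_string(credentials_text)
    section = parser[profile]

    if "role_arn" not in section:
        # Plain profile: its own access keys are used directly
        return {"keys_from": profile, "assumes": None}

    # Role profile: keys come from source_profile, then the role is assumed
    return {"keys_from": section["source_profile"], "assumes": section["role_arn"]}


print(describe_profile(CREDENTIALS, "root-account"))
print(describe_profile(CREDENTIALS, "staging-account"))
```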

🪐 Replicating resources with Terraform workspaces

Now that you have different AWS accounts, you might wonder how they work with Terraform. I have seen too much repetition both inside AWS (resource naming hell) and within Terraform (WET code). Let's solve the resource replication first, and then move on to the multi-account setup.

As an example, let's consider we want to have different SQS queues per environment, and for simplicity let's just stick to staging and production accounts.

So, the first thing that I see people doing is:

.
src
└── queues
    ├── main.tf  # 👈 Resources defined here
    ├── vars.tf
    └── state.tf # 👈 Terraform backend state config

# main.tf
# 👇 Defines we're using AWS cloud provider
provider "aws" {
  region = "us-east-1"
}

# 👇 Defines SQS for staging
module "staging_sqs" {
  source  = "terraform-aws-modules/sqs/aws"
  version = "~> 2.0"

  # 👇 Naming hell, we add a prefix to specify the var
  name = var.staging_sqs_name
  message_retention_seconds = 86400  # 👈 Staging messages set to 1day

  # 👇 We tag the resource with the env
  tags = {
    env = "staging"
  }
}

# 👇 Defines SQS for prod
module "prod_sqs" {
  source  = "terraform-aws-modules/sqs/aws"
  version = "~> 2.0"

  # 👇 Naming hell, we add a prefix to specify the var
  name = var.prod_sqs_name
  message_retention_seconds = 259200  # 👈 Prod messages set to 3days

  # 👇 We tag the resource with the env
  tags = {
    env = "prod"
  }
}

Three things to keep in mind:

We don't want to copy/paste code to represent the same resource per environment, that sucks
Different environments have different resource configurations (e.g. Staging databases can be way smaller than production ones), and for our example, the SQS queues have different tags and message retention time
We need to create them in different AWS accounts but associated with the root account

That's where terraform workspaces shine! You can have the same definition of resources with different states.

Terraform replicating resources with workspaces

If you've never tried it, I recommend playing with it by running:

❯ terraform workspace list
* default

❯ terraform workspace new staging
Created and switched to workspace "staging"!

You're now on a new, empty workspace. Workspaces isolate their state,
so if you run "terraform plan" Terraform will not see any existing state
for this configuration.

❯ terraform workspace list
  default
* staging

Keep in mind that every workspace has an independent state, and you can switch between workspaces with terraform workspace select <name>.

Considering you created two workspaces named staging and prod, we need to modify our directory structure a bit to keep different configs and update our main.tf file:

.
src
└── queues
    ├── workspaces # 👈 New directory
    │   ├── prod.tfvars # 👈 I recommend naming it after the workspace to keep it obvious
    │   └── staging.tfvars
    ├── main.tf
    ├── vars.tf
    └── state.tf

And now our main.tf can be updated to be like:

provider "aws" {
  region = "us-east-1"
}

module "sqs" {  # 👈 Cleaner name
  source  = "terraform-aws-modules/sqs/aws"
  version = "~> 2.0"

  name = "${terraform.workspace}-sqs" # 👈 Keeping the same structure for example
  message_retention_seconds = var.message_retention_seconds

  # 👇 We tag the resource with the env
  tags = {
    env = terraform.workspace
  }
}

Way better right? The biggest difference is the directory structure, where we included two new files prod.tfvars and staging.tfvars.

These files are quite simple though, see staging.tfvars:

# staging.tfvars
message_retention_seconds = 86400

and prod.tfvars:

# prod.tfvars
message_retention_seconds = 259200

And finally, the expected usage would be, for example:

❯ terraform workspace select staging
Switched to workspace "staging".

# 👇 Now you must use `-var-file`
terraform plan -var-file ./workspaces/staging.tfvars
terraform apply -var-file ./workspaces/staging.tfvars
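One risk with this flow is selecting one workspace and passing the var file of another. A small wrapper can derive both from the same name. Here is a sketch of that idea in Python. The terraform_commands helper is hypothetical and it only builds the commands, it doesn't run them:

```python
from typing import List


def terraform_commands(workspace: str, action: str = "plan") -> List[List[str]]:
    """Build the commands for one workspace, keeping the var file in sync."""
    # Relies on the convention above: the tfvars file is named after the workspace
    var_file = f"./workspaces/{workspace}.tfvars"
    return [
        ["terraform", "workspace", "select", workspace],
        ["terraform", action, "-var-file", var_file],
    ]


for command in terraform_commands("staging"):
    print(" ".join(command))
```

If you want to actually execute them, each list can be handed to subprocess.run(command, check=True) from the directory that holds main.tf.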

Here's how the state is stored inside an S3 bucket:

Terraform workspaces state stored in an S3 bucket

🏗🪐 Terraform with AWS Organizations (multi-accounts)

That's way better already, but all resources still live in the same account. Let's fix that by telling Terraform to use a different account per environment.

Terraform assumes different roles to create resources

Given that you have a nice, reusable directory structure, we just need to modify three files:

main.tf with the assume_role option
prod.tfvars, and staging.tfvars with respective AWS config vars

# main.tf
provider "aws" {
  region = "us-east-1"

  # 👇 Identifies which role terraform should assume when planning and applying resources
  assume_role {
    role_arn = var.aws_role
    # 👆 We can keep different vars per environment!
  }
}

...

Keep in mind that with this setup Terraform still stores the state in a bucket located in the root account, but it assumes the given role before any interaction with the cloud resources.

Of course, you can set the aws_role var per environment:

# staging.tfvars
aws_role = "arn:aws:iam::000000000000:role/admin-role"  # 👈 Staging account id

# prod.tfvars
aws_role = "arn:aws:iam::000000000000:role/admin-role"  # 👈 Use the production account id here

✨ And that's it! Now you have reusable terraform code spread across environments, and resources well named. Cheers! 🍻

👀 Want to see a real project?

I'm building a microservice architecture in public in a new series called the AntifragileDev while keeping the whole code open source and sharing my journey and learnings as I go.

You probably saw some references from my new project in the AWS org examples above, right? 😁 I'll share everything (including costs) of keeping a microservice architecture up and running.

If that's something you're interested in, you should follow me on Twitter.

Did you know you can create multiple AWS accounts for better management?

This feature is called "AWS organizations".

The root account will be in charge of paying for the usage of children accounts.

👇🧵 Thread pic.twitter.com/AjXPdsGlgG
— Gui Latrova (@guilatrova) November 23, 2021

This post is part of this project, where I bring real needs, build all microservices in public, and keep them open source.

👇 And by the way, this is the infrastructure project I meant, feel free to explore!

GitHub - guilatrova/restaurant-directory-listing-infra: Terraform infrastructure code that generates the infra for a Restaurant Listing Directory using AWS Organizations


Project: Google Maps Crawler 🗺🪲

Guilherme Latrova — Fri, 19 Nov 2021 14:49:02 GMT

Stack: Python, Selenium

GitHub - guilatrova/GMaps-Crawler: Google Maps crawler using Selenium


This is the first project created for the Antifragile Dev series, and its purpose is to collect data from Google Maps and do pretty much whatever we want with it.

🤖 What is Selenium

Let's keep it simple: Selenium is a tool that manipulates and interacts with the browser as a regular user would.

It can be used to automate tests by simulating user behavior, e.g. typing, clicking, scrolling, interacting with content, and checking that outputs are displayed correctly.

For this project, I didn't test anything. Instead, I used it to capture data that would be very boring to collect manually - we call it "web scraping".

Google Maps Crawler running - Example

Create a Selenium Webdriver

This is what it looks like to create a "Selenium web driver" that will interact with Google Chrome:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

IMPLICIT_WAIT = 5  # seconds


def create_driver(headless=False):
    chrome_options = Options()
    if headless:  # 👈 Optional condition to "hide" the browser window
        chrome_options.add_argument("--headless")

    # Selenium 4 style: the driver path is passed through a Service object
    service = Service(ChromeDriverManager().install())
    driver = webdriver.Chrome(service=service, options=chrome_options)
    # 👆  Creation of the "driver" that we're using to interact with the browser

    driver.implicitly_wait(IMPLICIT_WAIT)
    # 👆 How long Selenium keeps looking for an element before giving up

    return driver

Creating a Chrome driver with Selenium

👆 Note we're using ChromeDriverManager to install the required dependencies for Selenium to manipulate the Chrome browser. That makes the setup a lot easier!

The minimum knowledge you need to get started now:

Visit a page
Find an element on the page you want to interact with
Wait for something to happen
Interact with the element

To do any of those, it's important that you understand a thing or two about HTML. Check some basic commands:

from selenium.webdriver.common.by import By

driver = create_driver()  # Function defined in the previous example
url = "https://www.google.com/maps"  # 👈 Example URL, swap in the page you want

driver.get(url)  # 👈 Visits a page

# 👇 Finding elements

driver.find_elements(By.XPATH, "*")          # 👈 Get all direct elements
driver.find_element(By.CSS_SELECTOR, "#btn") # 👈 Get one element with id "btn"
driver.find_elements(By.TAG_NAME, "h1")      # 👈 Get all 'h1' elements
driver.find_elements(By.CLASS_NAME, "cls")   # 👈 Get all elements with classname "cls"

Examples on visiting a page and finding elements with Selenium
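The "wait for something to happen" step deserves a picture too. Conceptually, waiting is just polling: keep checking until the thing shows up or time runs out. Below is a plain Python sketch of that idea. It is not Selenium's implementation, and wait_until and fake_find_button are names I made up so the snippet runs without a browser.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], Optional[T]], timeout: float = 5.0, poll: float = 0.1) -> T:
    """Call condition() until it returns something truthy or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Condition not met within {timeout} seconds")
        time.sleep(poll)


# Stand-in for a page that only "renders" the button on the third check
attempts = {"count": 0}


def fake_find_button() -> Optional[str]:
    attempts["count"] += 1
    return "button" if attempts["count"] >= 3 else None


print(wait_until(fake_find_button, timeout=2.0, poll=0.01))
```

With a real driver, the condition could be something like lambda: driver.find_elements(By.CSS_SELECTOR, "#btn"), since find_elements returns an empty list while nothing matches. In practice you'd likely reach for Selenium's own WebDriverWait, which packages this same pattern.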

Defining "the best" way to find an element is harder though...

🐛 How to debug Selenium with VSCode

My debug process for such applications is always the same. These sites don't want to help you scrape their content, so they make it really hard with random ids and class names.

Say you want to get the business hours of a restaurant. It's not as straightforward as it looks, because nothing makes much sense:

Random ids hard to understand

Scraping data from such sites is quite painful, and keep in mind they might change their markup anytime, and of course they won't notify you.

This is hard to get right on the first shot, so I'm sharing a trick I use to make my life less painful. You can set breakpoints in VSCode at specific moments, and then manipulate the driver right from the debug window.

It's great to minimize guesswork.

💡 Tips and challenges on scraping data with Python and Selenium!

1️⃣ Data inconsistency

As I was trying to scrape business hours from restaurants on Google Maps, I found some random text in the middle! [...]

👇🧵 Solution and more challenges in thread pic.twitter.com/gAEPbP7hOM
— Gui Latrova (@guilatrova) November 17, 2021

Finally, make sure your code stays readable. Selenium scripts get messy very quickly, so you always want meaningful methods and functions.

Check a small piece of this project code:

    def get_place_details(self):
        self.wait_restaurant_title_show()

        # DATA
        restaurant_name = self.get_restaurant_name()
        address = self.get_address()
        place = Place(restaurant_name, address)

        if self.expand_hours():
            place.business_hours = self.get_business_hours()

        # TRAITS
        place.extra_attrs = self.get_place_extra_attrs()
        traits_handler = self.get_region(PlaceDetailRegion.TRAITS)
        traits_handler.click()
        place.traits = self.get_traits()

        # REVIEWS
        place.rate, place.reviews = self.get_review()

        # PHOTOS
        place.photo_link = self.get_image_link()

        self.storage.save(place)
        self.hit_back()

The goal is for the code to be self-explanatory and simple to read.

🤔 Why didn't you use the Google Maps API?

Mostly due to some feature limitations and rate-limiting.

Also, I'm still hacking this project and I don't even know whether it will work, so I felt like just trying to get something simple real quick to move on.

🟢 What's next?

Since we want to build a microservice architecture, we took our initial step:

To get some source of data to display

Now we must publish it to SQS as an event. Unfortunately, we don't have any infra yet... Well, I guess it's time for terraform and CDK. If you don't know those yet, it will be your chance to learn something fun and implemented in a real project.

Watch out for the next blog posts!

🟡 What's pending?

I made a few decisions that are worth sharing:

The application is not scaling yet

The application doesn't crawl until the last page

I'm running it from my own computer

👆 I don't want to bother with collecting more cities, scheduling runs, etc. for now. I'll get back to it later. We must progress and deliver something simpler but working, and having ~10 restaurants is enough for now.

Follow me on Twitter to keep watching as the project evolves!

How to use Python 🐍 iterators IRL applied to a real project!

Let's go step-by-step on how to transform 💩 code in art:

👇🧵 Open Source Repository at the end pic.twitter.com/DQf4wK34nZ
— Gui Latrova (@guilatrova) November 18, 2021

Restaurant Directory Listing - Call for Proposal

Guilherme Latrova — Thu, 11 Nov 2021 21:55:00 GMT

Frustrated by regular schooling and motivated to build things on my own, I'm going to build a microservice architecture in public with friends, as promised:

I'm planning to build a whole microservice architecture in public, sharing all decisions and keeping the whole code open source for anyone to give feedback and learn from.

Follow me if you're interested in that kind of stuff!
— Gui Latrova (@guilatrova) September 28, 2021

And by my friends I mean you. You're welcome to join, do things on your own terms, and share your progress.

I'll open space here in my blog to people who wrap up different solutions.

If you're a watcher, you're welcome too! We're building everything in public, so you can observe and criticize.

🏗 The Project

We're building a directory listing for restaurants inspired by nomadlist.com.


So instead of looking for restaurants as someone would on Google, we're creating an app to list them focusing on the pictures and other attributes the restaurant may have.

The goal is not to be original, but to build in public, and share learnings.

🌟 Challenges to keep in mind

Backend

We want to have some initial data to display, so find a way to extract data from Google Maps or some other source you trust
Ensure that updates to your source can be reflected in your app if and when you want them to be
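
If you're unsure where to start with the second point, here is one possible sketch (not the required approach). It assumes your source gives every restaurant a stable identifier, called `source_id` here; that field, the `Restaurant` class, and the `sync` function are all hypothetical names made up for illustration. Re-running the import then becomes an upsert keyed on that ID:

```python
from dataclasses import dataclass, field


@dataclass
class Restaurant:
    source_id: str  # hypothetical stable ID provided by the source you chose
    name: str
    city: str
    tags: set[str] = field(default_factory=set)


def sync(store: dict[str, Restaurant], fetched: list[Restaurant]) -> dict[str, int]:
    """Upsert fetched records into the store and count what changed."""
    stats = {"created": 0, "updated": 0, "unchanged": 0}
    for item in fetched:
        current = store.get(item.source_id)
        if current is None:
            stats["created"] += 1
        elif current != item:  # dataclasses compare field by field
            stats["updated"] += 1
        else:
            stats["unchanged"] += 1
        store[item.source_id] = item
    return stats


# In-memory dict standing in for a real database (an assumption for the demo)
store: dict[str, Restaurant] = {}
first = sync(store, [Restaurant("a1", "Cafe Uno", "Lisbon", {"coffee shop"})])
second = sync(store, [
    Restaurant("a1", "Café Uno", "Lisbon", {"coffee shop"}),
    Restaurant("b2", "Kid's Corner", "Lisbon", {"nice place for kids"}),
])
print(first)
print(second)
```

The same idea carries over to a real database: keep the source's ID as a unique key and let each import overwrite the matching row.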

Frontend

We want to allow visitors to rank places, favorite them, and filter by things like "coffee shop", or maybe "nice place for kids"
By default, we show the restaurants in the same city as the visitor
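
The filtering and "same city by default" rules could be sketched roughly as below. This is only an illustration: `Place` and `list_places` are made-up names, and how you detect the visitor's city (IP lookup, browser geolocation, a plain dropdown) is left entirely to you, so it arrives here as a simple argument.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Place:
    name: str
    city: str
    tags: set[str] = field(default_factory=set)


def list_places(
    places: list[Place],
    visitor_city: str,
    tag: Optional[str] = None,
    city: Optional[str] = None,
) -> list[Place]:
    """Return places in the chosen city (defaults to the visitor's), optionally filtered by tag."""
    target_city = (city or visitor_city).casefold()
    return [
        place
        for place in places
        if place.city.casefold() == target_city
        and (tag is None or tag in place.tags)
    ]


# Made-up sample data, just to exercise the function
places = [
    Place("Cafe Uno", "Lisbon", {"coffee shop"}),
    Place("Kid's Corner", "Lisbon", {"nice place for kids"}),
    Place("Porto Grill", "Porto", {"coffee shop"}),
]
coffee_nearby = [p.name for p in list_places(places, "lisbon", tag="coffee shop")]
everything_in_porto = [p.name for p in list_places(places, "lisbon", city="Porto")]
print(coffee_nearby)
print(everything_in_porto)
```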

And that's it. It's intentionally open-ended to push you to bootstrap your own solutions and put into practice your problem-solving skills.

I'm going to share my approach to the issue later on this blog, and progress on Twitter. Subscribe to the newsletter so you won't miss it!

🗺️ How is it going to work?

It's simple. No need for fancy project launches, it's just us: friends hacking and learning together.

You're going to read the proposed challenge above, understand the features, define which technologies you want to use, design your architecture, and start building it.

You're in control! Wanna try that amazing framework? Maybe play with Deno? Why not Flutter? Well, you decide, that's your call!

There's no need to do it the "microservice way" either. Feel free to build a monolith, that's OK.

And if you're a beginner and there's something you have no clue how to do, you can watch first and then replicate it on your own terms.

Once you have it defined, do as follows:

Fork this repo: https://github.com/guilatrova/antifragile-dev-1
Follow the template
Push your changes to your own repo, and
Get in touch to let me know you're participating, so I can share your work publicly:

Reach out either through Twitter or my personal email:

Let the world know you're participating

Don't forget to share it with your friends, they might have an opinion on how to do it themselves, or they might enjoy watching you do it.

And most importantly: as you evolve, share your work!

Excited to see your evolution! If you want to see mine as well, consider following me on Twitter and subscribing to the newsletter for updates on all the participants.
