Reading Time: 7 minutes

In this post, I would like to present some SQL problems along with the way how to solve them. In the following examples, I use the PostgreSQL, most of them will also work in other databases.

 

In the beginning, let’s take a table with user’s comments:

Goal: We need to find the last user’s comment and its date

We could try to group data by users.id and use MAX function on created_at, but content and id will not be available.

The solution to this problem is to use DISTINCT ON with ORDER BY

We received unique values for a user according to the selected order. Please note that the columns used in the DISTINCT ON clause must be located also at the beginning of ORDER BY

 

In the next example, we have a table of comments, each of them contains its unique number

As we can see, we lack numbers from 197/2018 to 201/2018

Goal: We need to find all the missing numbers

Helpful for us will be the GENERATE_SERIES function, let’s look at an example:

This function has generated a series of data for the given range. Let’s try to connect it with our comments, choosing the proper joining condition.

We are on the right way, to get missing numbers we need to check where is the NULL

We have found the missing numbers!

 

We’ll use the comments table again, to show another issue

Goal: Let’s find how many comments we received in each hour for a certain day.

We will probably want to group the data by the creation date: date_trunc (‘hour’, created_at)

This is the right approach, but we miss the other hours, where the number of entries is equal to 0. We can use GENERATE_SERIES again, this time we will generate a series of dates:

We are getting close to our goal, we must combine the two previous queries.

Success!

 

Sometimes we decide to use a custom solution to store certain data in arrays. In this case, I will use a simple implementation of the surveys:

As we can see, this is a multiple choice survey, the options table looks like:

The responses sent by the user are stored in the survey_responses table, which looks like:

Goal: We have to count how many times each answer has been chosen

We will use the UNNEST function which expands an array to a set of rows

Let’s add grouping by the response

The result is ready, we can add another join with survey_options to get the answer’s content.

The report is ready!

 

Consider now the following table containing users with basic information such as first name, last name, and age.

Goal: We have to count how many users are in specific age ranges, e.g., under 25, between 25 and 50, and over 50.

This is not a problem if we use a separate query for each case, but if we want to do it with a single query? We can use the expression in the aggregate function. We check the condition, then cast it on int, true gives us the value 1, false 0. All you need to do is use SUM function on these values.

It is done!

 

In the next example, we’ll use a similar table with users, this time they have a certain number of points, it is kind of ranking.

Goal: We want to find, how many points each user lacks to the leader

We will use a window function to solve our problem. Let’s add the column which contains the highest result, we will use the MAX function.

In leader_points column all rows have the same value, is a number of points that leader got. All we need to do, is calculate the difference of points, then the solution is ready:

However, we must be aware of certain problems that may come up if we add some conditions. Window functions work on data set of the current query, not on the whole table. If we want to check the difference between a leader for people under 20:

Saul is the leader now, not Emmy. To get the correct results, window function must be used in the subquery.

 

Goal 2: Let’s change the rules, we have to check how much points user is missing for the predecessor

Here we can use window functions too, this time the LAG function – which refers to the previous a row in the result, let’s look at an example:

In previous_points we can see the result that the predecessor received. Simply subtract the values from both columns and we have the result. We must remember that, in the case of the leader, we have NULL value, so we need to use COALESCE

 

The last example I would like to show the transfer of funds between two clients (functionality similar to the banking system), we have 2 users:

Goal: We want to send 100$ from Saul to Emmy.

Before the transfer, make sure that the status has not changed – Saul has at least 100$. In the network, we can find different solutions to this problem, they usually use transactions and locking. The following method performs an atomic update, using only a single query:

In WHERE we restrict the update to 2 users, then we check all users definitely exist and whether the balance is sufficient. In CASE statement, we consider two cases – the user’s balance of id 14 must be increased whereas the user’s with id 11 decreased.

Thank you for reading, I hope that the examples will be useful for you.