Tips to Improve your SQL

This list is for anyone who has worked with or is interested in working with SQL. The tips here will range from Beginner tips to help people who are just starting to use SQL, to Intermediate tips that will help people who have already been dabbling with SQL.


1. Format your code for readability (Beginner)

I know this one seems obvious but a lot of SQL is written without regard to formatting. By taking just a few extra moments to format your code, you help increase readability and make the code easier to understand. You can rest assured that months, or even years, from now those SQL queries will still be easier to modify. And of course, this will also help avoid the hassle of straining your eyes while trying to decipher a block of text.

The following 2 queries do the same thing. They JOIN users, purchases and items in order and calculate the total each user spent on purchases. Those results will be ordered by name while also filtering out users that paid less than $10.00. As you can see below, it is much easier to understand the clauses like SELECT, FROM, JOIN, GROUP BY, when they are separated. As your queries continue to grow, the benefits of formatting your SQL become even more apparent.

Unformatted:

select users.id, users.name, sum(item.price) from users join purchases on purchases.user_id = users.id join items on items.id = purchases.item_id group by users.id, users.name group by users.id, users.name order by users.name having sum(items.price) > 10.00;

Formatted:

SELECT
  users.id,
  users.name,
  SUM(items.price)
FROM users
  JOIN purchases
    ON purchases.user_id = users.id
  JOIN items
    ON items.id = purchases.item_id
GROUP BY users.id, users.name
ORDER BY users.name
HAVING SUM(items.price) > 10.00;

2. Table aliases when querying for more than 1 table (Beginner)

Table aliases help shorten your code while increasing readibility and eliminating ambiguity.

Without Alias

SELECT
  name,
SUM(price)
FROM users
  JOIN purchases
    ON purchases.user_id = users.id
  JOIN items
    ON items.id = purchases.item_id;

With Alias

SELECT
  u.name,
SUM(i.price)
FROM users u
  JOIN purchases p
    ON p.user_id = u.id
  JOIN items i
    ON i.id = p.item_id;

While it’s obvious to see the condensed query is easer to read, it also removes ambiguity which can lead to errors. For example, lets say the users table and the items table both have a name column, then the SQL interpreter won’t be able determine which of the two you are referring to. As an alternative, you could simply write users.name, but the alias takes up less space which is especially beneficial when dealing with long multi-line queries.


3. Common Table Expressions or CTEs for readability (Beginner)

You may already use CTE’s and refer to them as WITH queries. Common Table Expressions are used to evaluate a query and have a temporary result for subsequent use in additional SQL statements. This again allows for better readability than using subqueries. The following queries perform the same task: Retrieve the names and total money spent by users with subscriptions.

Subquery (No CTE)

SELECT
  su.name,
SUM(i.price)
FROM (
  SELECT u.*
  FROM users u
    JOIN subscriptions s
      ON s.user_id = u.id
) su
  JOIN purchases p
    ON p.user_id = su.id 
  JOIN items i
    ON i.id = p.item_id;

CTE

WITH subscribed_users AS (
  SELECT u.*
FROM users u
  JOIN subscriptions s
    ON s.user_id = u.id
)
SELECT
  su.name,
SUM(i.price)
FROM subscribed_users su
  JOIN purchases p
    ON p.user_id = su.id
  JOIN items i
    ON i.id = p.item_id;

In the examples above, the first SQL statement uses a subquery to fetch users with subscriptions only, and JOIN the results with purchases and items. This can quickly become difficult to read, understand, and debug.

However, with the CTE example we can easily read the component to get users with subscriptions before moving on to the rest of the query.


4. Common Table Expressions for controlling the query planner (Intermediate)

In some flavors of SQL, like PostgreSQL, CTEs also force the query planner (the underlying system/code that decides exactly how to evaluate your code) to evaluate the query in the order you’ve provided.

While the query planner does it’s very best to find the optimal way to evaluate your queries. Sometimes, for reasons that are complicated to debug, the query planner won’t evaluate filtering in the most optimal order.

For instance, let’s look at the queries from the previous example. Imagine we had millions of users, along with millions of items and billions of purchases. Evaluating a JOIN between that many users, items, and purchases is costly. What if only a tenth of those users had subscriptions?

The idea behind the subquery is that we can filter down to 10% of members with subscriptions before JOINing them with the millions of items and billion of purchases.

It’s also worth noting at times the query planner will fail to filter properly and instead perform the JOIN using the table used in the subquery, comapring to all of the records for all of the tables in the query. However, by using a CTE, you force the query planner to evaluate the CTE first. This guarantees it will JOIN only the subscribed users to items and purchases, which will drastically reduce the amount of comparisons (the ON clause) that need to be made, resulting in a dramatic reduction of time for evaluation and execution.

5. Optimize with Indices (Intermediate)

An index in a database is similar to an index in a book (depends on the database solution). Some database solutions create indices by storing references to the location of the data (exactly like book indices), other solutions will actually store the entire record (imagine if each word in a book’s index was followed by the paragraph it was referring to), and some solutions use a combination of both.

The purpose of indices are to make it easier and quicker on us and the query planner to find certain things. There are algorithms that the query planner can leverage to find records indexed/sorted by a column (read more on sorting algorithms here).

The best columns to index are columns that are queried often. For example, if you query a table by the name column often, then an index on the name column would be a good idea. Primary keys (the id column) and foreign keys (a column that references another table’s id) are good candidates for indices because they are often used in JOINs.

Pitfall: Don’t just create indices for every column though! Every time the table’s data changes, whether it’s an INSERT, UPDATE or DELETE then that means the index has to be updated as well. If you have several column indices on a table that changes a lot, each index has to be recalculated after every change which could end up being more costly in the long run than not having indices, or having just 1 index.

More Tips on the Way!

We have more articles with more tips on SQL coming. If there is anything in particular you wish we could explore, explain, or offer our tips on please let us know!

In the meantime you can work on your SQL skills by finding tutorials online (plenty of free material that can be found easily on Google) or by completing challenges on HackerRank or LeetCode. Have fun learning!