Nowadays, relational databases are one of the most popular solutions for storing data. If you have a small storage and simple model — you’re probably starting with RDBMS while you design the system. But while your system is growing, you’re storing more and more data. In the end you can end with degrading of your database: queries become too slow and data can be unavailable, performance is poor and you can’t improve it by just adding more CPU or memory. Here, let me explain how you can solve these problems.
Let’s start from the beginning. You have a relational database. It is well structured and your business is happy. All writing to the DB coming in real-time by batches. Besides, you have comprehensive reports that are built on several tables.
Everything is great… until data becomes too large! During this time, the database experiences a lot of problems with additional data and some processes take longer than usual. This all was happening smoothly and invisible, before you found a lot of processes had been rejected! You’re trying to exceed timeouts, to chunk some data and you see how your life is going on harder day by day… What you can do with that? Here’s some tips!
The first obvious thing to do is to grab statistics. Different databases have their own instruments for this. The common things — they could show you which queries were running, how many times they were executed, how long they took to execute, CPU usage.
Also for each individual query you could try to find how the query will be executed, it is called query explainer. This shows you what the database should do in order to provide results for particular queries. But be aware that query explainer on different databases, e.g. on test and prod environments, it could show different results. And moreover, on the same database for the same queries, but different parameters it could be either different.
Analyze the result and take decision how to change your queries. You could also add hints to the queries.
If optimization doesn’t help you, then you could think about adding additional indexes.
Of course, you can’t just add any indexes, it doesn’t work in this way! So, you should do it wisely. Firstly, remember that as read data that have indexes will be faster, but changing with data as deleting/upserting will be slower. Secondly, indexes require additional space on your disk, as rule of thumb indexes could require almost 30–50 % of data volume! You should check in the documentation of the databases which type of indexes it supports and if they align with your problem.
Adding indexes should be done together with optimizing query steps.
When database execute query it tries to find the most optimal way to do it. And statistics on indexes help to make correct decision, but statistics could be old, and database could start make incorrect decision on order of the query. This could be easily seeing by query explainer.
From my experience updating statistics could be done in parallel with all other processes. And it’s not consuming task. But check the documentation on your database.
Because of inserting, deleting and updating index data could be stored not in optimized way. If we are talking on tree indexes then the tree should be balanced in order for database make less time for search particular key.
Be careful and remember, that reorganization of indexes is a heavy task and usually shouldn’t be executed during your other queries on that table. That should be a separate isolated query, that you can schedule on the time when your system is less loaded.
The other thing you could do is just remove constraints from the fields. Which constraints I’m talking about? NOT NULL, NOT EMPTY, or a more comprehensive one. Check how many CPUs resources you could save by removing them. In some cases this is a bottleneck. This approach can help you to gain the profit almost for free! And constraints can be out on client-level that you can allow to interact with the database rather than on the database itself.
Remove foreign keys
When you just started to learn relational databases, you could see that one interesting feature is to add foreign keys on tables. That helps to avoid inserting incorrect data. But, as for other constraints, this one consumes a lot of valuable computation power! Consider removing it. Some experienced developers say that you should never use foreign keys, and I can agree with them and vote for even not creating them!
Another step is don’t do what you learned about relational databases: don’t use normalization. Do opposite — denormalization of your data. Denormalization usually means that you could remove joins in your queries. But the side-effects are:
- You should manually handle consistency of data in different table and use transactions for it
- That brings overhead on inserting data and makes queries be slower
- That requires more space on the disk
Distributes data among servers
You could split your data among different servers. Think for features that could help to separate your model on different servers logically. For example, you could split by regions, by date, or some client’s attributes. This fully depends on your business model.
The other question is about how to do it. It’s a non-trivial task that requires a lot of changes in the model and APIs for using data. Such migration is a major release for your system.
Split database by services
Or you could try to split data on two different parts: “online data” and “reporting data”. The disadvantage is that the synchronization task becomes very complex and requires a lot of architectural decisions.
Improve the hardware for database
Yes, seriously, buy better hardware for your database. If your start with very cheap host, then you could easily upgrade it. But you can’t do it infinitely, more comprehensive hardware will become very costly and not only in price of buying but also in maintaining it.
There are specialized hardware for the databases that could cost sky-high prices. But I don’t think that it’s your case, this is mostly for the huge companies that store terabytes of data.
We tried different steps in order to make the database work faster. I can’t say that I mentioned all the possible ways, but I highlighted the most important of them. I hope you liked it!
Make your business happier with new solutions for your database!