Library

More details about 1.7 Terabyte Firebird SQL database

Alexey Kovyazin, 26-May-2014
Some days ago we have published an article about our tests, devoted to the relationship between Firebird performance and databases growth, where we have tested (among others) the 1.7 Terabyte Firebird SQL database. Here you can find more details about such a big database.

Tables

In the table 1 you can find list of tables with key characteristics. This statistics was taken with gstat –a –r and interpreted with our IBAnalyst tool.

Table Records RecLength, bytes Data Pages Table size, Mb Indices size, Mb Total, %
ORDER_LINE 6300024797 60.09 38880344 607505.3 50582.3 34
STOCK 2100000000 298.88 43783806 684121.9 15809.61 39
ORDERS 630004141 29.00 2692324 42067.56 4250.38 2
HISTORY 630000434 48.77 3446455 53850.86 0.00 3
CUSTOMER 630000000 577.52 24226054 378532.0 8675.78 21
NEW_ORDER 188999981 13.00 623764 9746.31 1260.84 1
DISTRICT 210000 103.92 1860 29.06 1.27 ~0
ITEM 100000 82.73 756 11.81 0.52 ~0
WAREHOUSE 21000 97.90 179 2.80 0.11 ~0

Table 1. Tables and their main parameters in 1.7Terabyte Firebird SQL database

As you can see, the biggest table is ORDER_LINE – it contains more than 6 billion of records. It’s size is about 600Gb and there are 50Gb of indices for this table. This table occupies 34% of the database.

Table STOCK is also very big – it contains ~2 billion of records. Though the record number is smaller than in table ORDER_LINE, STOCK occupies ~680Gb in the database (39%), because record length in STOCK is 298.88 bytes against only 60.09 bytes in ORDER_LINE. Accordingly, size of associated indices for STOCK is only 15Gb.

The third biggest table, CUSTOMER, has even bigger record length – 577.52 bytes, and occupies 378Gb (21%) with only 630 million of records.

Indices

There are not many indices in this database, since it is designed only for tests – majority of tables have only 1 index – primary key. In the real world applications developers create many indices to serve particular users requests, but here indices are concentrated on the fastest inserts and updates of the data – as a result, indices size is only 5-10% of table size, while usually it is 30-50% (if you have indices size more than 50% of the table size, consider redesign of database schema, or, may be, drop useless or non-effective indices).

In the table 2 you can find the list of all indices in this database and their key characteristics, taken from gstat statistics and analyzed by IBAnalyst:

Index Table Depth Keys # Key length, bytes # of unique keys Size, Mb
ORDER_LINE_PK ORDER_LINE 4 6300024797 1.41 6300024797 50582.34
STOCK_PK STOCK 4 2100000000 1.00 2100000000 15809.61
ORDERS_PK ORDERS 3 630004141 1.01 630004141 4250.38
CUSTOMER_LAST CUSTOMER 3 630000000 0.00 1000 4029.64
CUSTOMER_PK CUSTOMER 3 630000000 1.01 188999981 1260.84
NEW_ORDER_PK NEW_ORDER 3 188999981 1.01 188999981 1260.84
DISTRICT_PK DISTRICT 2 210000 1.37 210000 1.27
ITEM_PK ITEM 2 100000 1.00 100000 0.52
WAREHOUSE_PK WAREHOUSE 2 21000 1.00 21000 0.11

Table 2. Indices of Firebird 1.7Tb database

As you can see, all indices have very small key size – the biggest is 1.41, it means that index is very effectively compressed – for example, primary for ORDER_LINE contains 6 billion of keys in 50 Gb of space. This is very good result.

There are 2 indices which have depth = 4. It means that engine need to perform 4 reads of index pages to find requested value. It is recommended that index depth should be not higher than 3, and if it is higher, database page size should be increased. However, we already have the maximum page size in this database (16Kb), so we need to live with it.

Heating

Before we will continue with queries, let's perform “heating”. When database is big, system data, associated with tables, are also big, and it takes some time for Firebird to cache necessary system pages. For example, table ORDER_LINE has 10110 pointer pages. Heating is necessary for any big database.

Without heating the first execution of queries will take significantly longer than with heating.

To heat database cache, let’s perform series of simple queries to load system data into cache:

select first 1 * from ORDER_LINE;
select first 1 * from STOCK;
select first 1 * from customer;
select first 1 * from ORDERS;
select first 1 * from ITEM;

This is enough for such a simple data schema. For big and complex databases with hundreds of tables heating schema could be much more complex, it should be carefully developed.

As a result, memory consumption of Firebird process (Firebird SuperServer 2.5.2 64 bit) grows:

heating firebird cache

Figure 1. Heating of database cache

Queries

After heating we can run several typical queries to estimate how users will work with such a big Firebird database: what will be response times for typical queries and read operations.

Query Plan and statistics Description
select first 10 w_id, w_name, c_id, c_last
from WAREHOUSE, customer
where c_w_id = w_id and c_w_id = 10000
PLAN JOIN (WAREHOUSE NATURAL,
CUSTOMER INDEX (CUSTOMER_PK))
STATISTICS
Current memory = 166919744
Delta memory = 5616
Max memory = 166977488
Elapsed time= 0.08 sec
Buffers = 10000
Reads = 1
Writes 0
Fetches = 51
Join table WAREHOUSE (21000 records) with CUSTOMER (630 million records), with condition for specific warehouse
select count(*)
from WAREHOUSE, customer
where c_w_id = w_id and c_w_id = 10000
PLAN JOIN (WAREHOUSE INDEX (WAREHOUSE_PK),
CUSTOMER INDEX (CUSTOMER_PK))
COUNT
============
30000

Current memory = 166918560
Delta memory = -1184
Max memory = 166977488
Elapsed time= 0.16 sec
Buffers = 10000
Reads = 1175
Writes 0
Fetches = 60040
Count records for the previous query.
SELECT first 10 *
FROM ORDER_LINE
WHERE OL_W_ID = 10050
PLAN (ORDER_LINE INDEX (ORDER_LINE_PK))
STATISTICS
Current memory = 167013640
Delta memory = 95080
Max memory = 167055968
Elapsed time= 0.80 sec
Buffers = 10000
Reads = 10285
Writes 0
Fetches = 10424
Query to the largest table ORDER_LINE (~6billion records) with condition to select order_line records for specified warehouse id. Since primary key ORDER_LINE_PK is composite, and contains warehouse ID (CONSTRAINT ORDER_LINE_PK: Primary key (OL_W_ID, OL_D_ID, OL_O_ID, OL_NUMBER), it is effectively used in this query.
SELECT count(*)
FROM ORDER_LINE
WHERE OL_W_ID = 10050;
PLAN (ORDER_LINE INDEX (ORDER_LINE_PK))

COUNT
============
299509
Current memory = 167125464
Delta memory = -7160
Max memory = 167267976
Elapsed time= 3.41 sec
Buffers = 10000
Reads = 1994
Writes 0
Fetches = 599170
Count for previous query. The second query with the same parameters will be much faster, but queries with different parameters show similar result.
Select first 10 w_id, w_name, c_id, c_last
rom WAREHOUSE, CUSTOMER
where c_w_id = w_id and (c_w_id > 8000)
and (c_w_id < 10000)
PLAN JOIN (WAREHOUSE INDEX (WAREHOUSE_PK),
CUSTOMER INDEX (CUSTOMER_PK))

STATISTICS
Current memory = 167687664
Delta memory = -651888
Max memory = 168931400
Elapsed time= 0.19 sec
Buffers = 10000
Reads = 31
Writes 0
Fetches = 63
Query with join and 2 conditions.
select first 10 c1.C_ID, c1.C_FIRST, c1.C_LAST, o1.O_ID,
o1.O_OL_CNT, i1.I_NAME, i1.I_PRICE, ol1.OL_AMOUNT
from customer c1
join orders o1 on (c1.c_w_id = o1.O_w_ID
and c1.C_D_ID = o1.O_D_ID and c1.C_ID = o1.O_C_ID)
join ORDER_LINE ol1 on (ol1.ol_w_id = c1.C_W_ID
and ol1.OL_D_ID = c1.C_D_ID
and ol1.OL_O_ID = o1.O_ID)
join Item i1 on (ol1.OL_I_ID = i1.i_id)
where o1.o_d_id = 1 and o1.o_w_id = 1 and c1.c_id = 10;
PLAN JOIN (C1 INDEX (CUSTOMER_PK),
O1 INDEX (ORDERS_PK),
OL1 INDEX (ORDER_LINE_PK),
I1 INDEX (ITEM_PK))

STATISTICS
Current memory = 166807128
Delta memory = 6016
Max memory = 166879200
Elapsed time= 0.20 sec
Buffers = 9999
Reads = 1
Writes 0
Fetches = 4960
Query to show details for specific customer in the specific warehouse and district. It demonstrates performance of more complex joins.

Summary

As you can see, 1.7 Terabyte Firebird database shows pretty good performance for common read queries. If query is well-designed and uses good indices, performance is good even on the very big database. Of course, there are pain points, like long fetch time for very large datasets and long time for count operations (since Firebird visits all pages to get an exact number of records for the particular query in the specific transaction), but these pain points are well-known by experienced Firebird developers, and there are methods to design database in the way to workaround them.

But what about performance degradation for inserts and updates? Is it possible to compensate it with smart tuning of Firebird configuration?

In the next article we will consider tuning options for Firebird 1.7 terabyte database – we will boost our bird and make it fly.