Lesson 1 of the SQL Optimization Course: The Art of MySQL Data Types：Part 1

Lesson 1 of the SQL Optimization Course: The Art of MySQL Data Types

Introduction

For relational databases, the design of tables and SQL is written are particularly crucial. It wouldn’t be an exaggeration to say that they account for 90% of performance. So this time, specifically targeting these two major knowledge areas, we’ll conduct a detailed analysis for you, peeling back the layers.

This Series uses plain and understandable language and selects a large number of examples to elaborate on the subtleties for you.

🧑‍💻 Target audience:

DBA
Database developers
Students

We will use MySQL as the demonstration database.

Tips: Because this is the first lession, there is quite a lot of content. Please enjoy.

Lession 1: The Art of MySQL Data Types

Today’s opening topic is the data types of MySQL, which mainly includes two parts:

The selection of MySQL field types, where the appropriateness of field type definitions directly affects the occupation of disk space and the optimal performance of MySQL.
The application of MySQL data types in stored programs, such as stored functions, stored procedures, triggers, etc.

Introduction to MySQL Data Types

MySQL data types can be roughly divided into the following categories:

Numeric types: tinyint, int, bigint
Floating-point/fixed-point types: float, double, decimal, etc.
String types: char, varchar
Date and time types: date, datetime, time, timestamp
Binary types: binary, varbinary
Bit types
Enumeration types
Set types

Of course, these are not all. The following content will be introduced in the next lession.

Large object types, including: text, blob
Document types, Json

1. Numeric types

If it is used to store integers, different types should be chosen according to the different ranges.

Range	Data type	Size
-128 - 127 / 0- 255	tinyint	1 Byte
-pow(2, 31) - pow(2,31)-1 / 0 - pow(2,31)-1	int32	1 Byte
-pow(2, 63) - pow(2,63)-1 / 0 - pow(2,63)-1	int64	1 Byte

The above are several examples of integer selection. Integers have the widest range of applications. They can be used to store numbers, timestamps, and encodings of other types converted to numbers, such as IPv4, etc.

Example 1

Using int32 to store IPv4 addresses saves space compared to simply using strings. For table x1 and field ipaddr, use the function inet_aton, and for retrieval, use the function inet_ntoa.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
mysql>create table x1( ipaddr int unsigned);
Query OK, 0 rows affected (0.04 sec)

mysql>insert into x1 values (inet_aton('192.168.2.171'));
Query OK, 1 row affected (0.01 sec)

mysql>insert into x1 values (inet_aton('192.168.2.172'));
Query OK, 1 row affected (0.01 sec)

mysql>insert into x1 values (inet_aton('192.168.2.173'));
Query OK, 1 row affected (0.01 sec)

mysql>insert into x1 values (inet_aton('192.168.2.174'));
Query OK, 1 row affected (0.01 sec)

Alter the table structure and add a virtual column to convert the field ipaddr.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
mysql>alter table x1 add column \
   ->ipaddr_real varchar(20) generated always as (inet_ntoa(ipaddr)) virtual;
Query OK, 0 rows affected (0.02 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql>select * from x1;
+------------+---------------+
| ipaddr     | ipaddr_real   |
+------------+---------------+
| 3232236203 | 192.168.2.171 |
| 3232236204 | 192.168.2.172 |
| 3232236205 | 192.168.2.173 |
| 3232236206 | 192.168.2.174 |
+------------+---------------+
4 rows in set (0.00 sec)

Example 2

For a simple comparison of the occupied disk space size, I defined three tables t1, t2, and t3. The fields are tinyint, int32, and int64 respectively. Each table has the same data, and the number of records is 20 million rows.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
mysql>CALL `get_schema_records`();

+------------+------------+------------+
| t1 records | t2 records | t3 records |
+------------+------------+------------+
|   20000000 |   20000000 |   20000000 |
+------------+------------+------------+
1 row in set (1.89 sec)

Query OK, 0 rows affected (1.89 sec)

Check the occupied disk space. The table t3 occupies the most, and t1 occupies the least. Therefore, if the storage range of integers has a fixed upper limit and there is no need for capacity expansion in the future, it is recommended to choose the smallest type. Of course, this also applies to other types.

1
2
3
4
5
root@ytt-pc:/var/lib/mysql/3305/ytt# ls -sihl
Total 3.0G
3541825 861M -rw-r----- 1 mysql mysql 860M 12月 10 11:36 t1.ibd
3541820 989M -rw-r----- 1 mysql mysql 988M 12月 10 11:38 t2.ibd
3541823 1.2G -rw-r----- 1 mysql mysql 1.2G 12月 10 11:39 t3.ibd

2. Floating-point numbers / Fixed-point numbers

First, for floating-point numbers, both float and double represent floating-point numbers. The simple difference to remember is that float occupies 4 Byte by default.

The p in float(p) represents the minimum precision of the integer part. If p > 24, it is directly converted to double, occupying 8 Byte. The maximum value of p is 53, but the maximum value has the problem of inaccurate calculation.

Then for fixed-point numbers, including decimal and its synonym numeric, the integer part and the decimal part of the fixed-point number are stored separately, and the maximum effective precision cannot exceed 65.

So the difference from float is in precise storage. If precise storage or precise calculation is necessary, it is best to define it as decimal.

Example 3

Create a table y1 and give different types to fields f1, f2, and f3 respectively.

1
2
mysql>create table y1(f1 float,f2 double,f3 decimal(10,2));
Query OK, 0 rows affected (0.03 sec)

Insert some values.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
mysql>insert into y1 values (10.2,10.2,10.2);
Query OK, 1 row affected (0.01 sec)

mysql>insert into y1 values (100.12,100.12,100.12);
Query OK, 1 row affected (0.01 sec)

mysql>insert into y1 values (1001.12,1001.12,1001.12);
Query OK, 1 row affected (0.01 sec)

mysql>insert into y1 values (12001.12,12001.12,12001.12);
Query OK, 1 row affected (0.01 sec)

mysql>insert into y1 values (12001222.12,12001222.12,12001222.12);
Query OK, 1 row affected (0.00 sec)

Retrieve the number of records and find that the number of records stored in field f1 is already inaccurate.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
mysql>select * from y1;
+---------+----------+----------+
| f1      | f2       | f3       |
+---------+----------+----------+
|    10.2 |     10.2 |    10.20 |
|  100.12 |   100.12 |   100.12 |
| 1001.12 |  1001.12 |  1001.12 |
| 12001.1 | 12001.12 | 12001.12 |
+---------+----------+----------+
4 rows in set (0.00 sec)

Then increase the precision of f3 and see the storage of double.

1
2
3
mysql>alter table y1 modify f3 decimal(50,10);
Query OK, 5 rows affected (0.06 sec)
Records: 5  Duplicates: 0  Warnings: 0

Insert a larger record.

1
2
mysql>insert into y1 values (123456789010.1234567,123456789010.1234567,123456789010.1234567);
Query OK, 1 row affected (0.01 sec)

Retrieve and find that only the value of f3 is precise.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
mysql>select * from y1;
+--------------+--------------------+-------------------------+
| f1           | f2                 | f3                      |
+--------------+--------------------+-------------------------+
|         10.2 |               10.2 |           10.2000000000 |
|       100.12 |             100.12 |          100.1200000000 |
|      1001.12 |            1001.12 |         1001.1200000000 |
|      12001.1 |           12001.12 |        12001.1200000000 |
|     12001200 |        12001222.12 |     12001222.1200000000 |
| 123457000000 | 123456789010.12346 | 123456789010.1234567000 |
+--------------+--------------------+-------------------------+
6 rows in set (0.00 sec)

3. String types

String types, like integer types, have a wide range of uses. They are used to store characters, strings, and all unknown types in MySQL.

It can simply be said to be an all-purpose type!

Type	Max Value	Length
chat	255	Immutable
varchar	65535	Variable

char(10) represents that a maximum of 10 characters are supported for storage. Although varchar(10) can store the same number of characters as char(10), the difference is that the varchar type stores the actual size, while the char stores the theoretical fixed size. The specific number of bytes is related to the character set.

Example 4

For example, in the following table t4, there are two fields c1 and c2, which are char and varchar respectively.

1
2
mysql>create table t4 (c1 char(20),c2 varchar(20));
Query OK, 0 rows affected (0.02 sec)

Insert a record.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
mysql>set @a = "I am a DBA!";
Query OK, 0 rows affected (0.00 sec)

mysql>insert into t4 values (@a,@a);
Query OK, 1 row affected (0.00 sec)

mysql>select * from t4;
+--------------------------------------+--------------------------------------+
| c1                                   | c2                                   |
+--------------------------------------+--------------------------------------+
| I am a DBA! | I am a DBA! |
+--------------------------------------+--------------------------------------+
1 row in set (0.00 sec)

Take out the values of the two fields respectively and find that c1 has much more content than c2, including the number of characters and the number of bytes.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
mysql>SELECT 'c1' AS 'column list',
   -> char_length(c1) '  as characters',
   -> length(c1) ' as bytes'
   -> FROM t4
   -> UNION all
   -> SELECT 'c2',
   -> char_length(c2) as ' characters',
   -> length(c2) as ' bytes'
   -> FROM t4;
+-------------+---------------+----------+
| column list | as characters | as bytes |
+-------------+---------------+----------+
| c1          |            20 |       44 |
| c2          |            11 |       22 |
+-------------+---------------+----------+
2 rows in set, 4 warnings (0.00 sec)

So in the selection of char and varchar, attention should be paid to whether the value range is appropriate. For example, for a fixed-length value, char must be selected; for an uncertain value, varchar is selected.

4. Data tpyes

Date types include date, time, datetime, timestamp, and year. Year occupies 1 Byte, and date occupies 3 Bytes.

When not including decimal places, time, timestamp, and datetime occupy 3 Bytes, 4 Bytes, and 8 Bytes respectively; the decimal place part is calculated for disk occupation separately, as shown in the following table.

Decimal	Byte
0	0
1,2	1
3,4	2
5,6	3

Note: The timestamp represented by timestamp is an integer stored in int32, and its value range is from ‘1970-01-01 00:00:01.000000’ to ‘2038-01-19 03:14:07.999999’; The value range of datetime is from ‘1000-01-01 00:00:00.000000’ to ‘9999-12-31 23:59:59.999999’.

To sum up, the selection of date types follows the following principles:

If the time may exceed the timestamp range, datetime is preferred.
If it is necessary to obtain the year value separately, such as partitioning by year or retrieving by year, it is best to add a year type to the table to participate.
If it is necessary to obtain the date or time separately, it is best to store it separately instead of simply using datetime or timestamp. When retrieving later, add a function filter to avoid additional consumption caused by writing SQL later.
If there is a need to save milliseconds and the like, it is best to use the characteristics of the time type itself instead of directly using the character type instead. The additional consumption of resource caused by the type conversion inside MySQL also needs to be considered.

Example 5

Create table t5, separate all these possible required fields, so it will be very easy to write SQL statements in the future.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
mysql>create table t5 (c1 date,c2 datetime(3),c3 timestamp(3),c4 time(3),c5 year);
Query OK, 0 rows affected (1.01 sec)

mysql>set @a='2018-03-25 12:22:33.342';
Query OK, 0 rows affected (0.00 sec)

mysql>insert into t5 values (date(@a), @a,@a,time(@a),year(@a));
Query OK, 1 row affected (0.01 sec)

mysql>select * from t5;
+------------+-------------------------+-------------------------+--------------+------+
| c1         | c2                      | c3                      | c4           | c5   |
+------------+-------------------------+-------------------------+--------------+------+
| 2018-03-25 | 2018-03-25 12:22:33.342 | 2018-03-25 12:22:33.342 | 12:22:33.342 | 2018 |
+------------+-------------------------+-------------------------+--------------+------+

Of course, this situation occupies additional disk space. If you want to strike a balance between ease of use and large space occupation, you can use the virtual column of MySQL for real-time calculation. For example, suppose the field c5 does not exist and you want to get the result of c5.

1
2
3
mysql>alter table t5 drop c5, add c5 year generated always as (year(c1)) virtual;
Query OK, 1 row affected (2.46 sec)
Records: 1  Duplicates: 0  Warnings: 0

5. Binary types

Binary and varbinary correspond to the binary storage of char and varchar, and the related characteristics are the same. The differences are as follows:

binary(10)/varbinary(10) represents not the number of characters, but the number of bytes.
The line terminators are different. The line terminator of char is \0, and the line terminator of binary is 0x00.
Because it is binary storage, character encoding and sorting rules like this are directly invalid.

Example 6

Let’s take a look at this simple example of binary access, still the previous variable @a.

Remember! Here, you need to calculate the number of bytes occupied by @a in advance to prevent storage overflow.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
mysql>set @a = "I am a DBA!";
Query OK, 0 rows affected (0.00 sec)

mysql>create table t6 (c1 binary(28),c2 varbinary(28));
Query OK, 0 rows affected (0.03 sec)

mysql>insert into t6 values (@a,@a);
Query OK, 1 row affected (0.01 sec)

mysql>select * from t6;
+------------------------------+------------------------------+
| c1                           | c2                           |
+------------------------------+------------------------------+
| I am a DBA!         | I am a DBA!         |
+------------------------------+------------------------------+
1 row in set (0.00 sec)

6. Bit types

bit is the type for storing bit positions in MySQL, supporting a maximum of 64 bit positions, and is directly stored in binary form. It is generally used to store status-type information. For example, gender, true or false, etc. It has the following characteristics:

For bit(8), if only 1 bit is stored simply, 0 is padded on the left as 00000001.
When querying, data can be filtered directly in decimal.
If an index is added to this field, MySQL will not do type conversion by itself and can only filter with binary.

Example 7

Create table c1, and define a bit position for the field gender.

1
2
mysql>create table c1(gender bit(1));
Query OK, 0 rows affected (0.02 sec)

Insert two records.

1
2
3
4
5
mysql>insert into c1 values (b'0');
Query OK, 1 row affected (0.01 sec)

mysql>insert into c1 values (b'1');
Query OK, 1 row affected (0.00 sec)

Retrieve the data and display it directly in decimal.

1
2
3
4
5
6
7
8
mysql>select gender+0 'f1'  from c1;
+------+
| f1   |
+------+
|    0 |
|    1 |
+------+
2 rows in set (0.01 sec)

It is also possible to use type display conversion.

1
2
3
4
5
6
7
8
mysql>select cast(gender as unsigned)  'f1' from c1;
+------+
| f1   |
+------+
|    0 |
|    1 |
+------+
2 rows in set (0.00 sec)

Filtering data is the same. Either binary or direct decimal is fine.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
mysql>select conv(gender,16,10) as gender \
   -> from c1 where gender = b'1'; 
+--------+
| gender |
+--------+
| 1      |
+--------+
1 row in set (0.00 sec)

mysql>select conv(gender,16,10) as gender \
    -> from c1 where gender = '1';
+--------+
| gender |
+--------+
| 1      |
+--------+
1 row in set (0.00 sec)

In fact, for such a scenario, it can also be defined as char(0), which is also a very optimized usage similar to bit.

1
2
mysql>create table c2(gender char(0));
Query OK, 0 rows affected (0.03 sec)

Now I’ll simply create some test data for table c1.

1
2
3
4
5
6
7
mysql>select count(*) from c1;
+----------+
| count(*) |
+----------+
| 33554432 |
+----------+
1 row in set (1.37 sec)

Insert all the data of c1 into c2.

1
2
3
mysql>insert into c2 select if(gender = 0,'',null) from c1;
Query OK, 33554432 rows affected (2 min 18.80 sec)
Records: 33554432  Duplicates: 0  Warnings: 0

The disk space occupied by the two tables is about the same.

1
2
3
4
root@ytt-pc:/var/lib/mysql/3305/ytt# ls -sihl
Total 1.9G
4085684 933M -rw-r----- 1 mysql mysql 932M 12月 11 10:16 c1.ibd
4082686 917M -rw-r----- 1 mysql mysql 916M 12月 11 10:22 c2.ibd

The retrieval methods are slightly different, but the efficiency is about the same.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
mysql>select count(*) from c1 where gender = 0;
+----------+
| count(*) |
+----------+
| 16772213 |
+----------+
1 row in set (12.03 sec)

mysql>select count(*) from c2 where gender = '';
+----------+
| count(*) |
+----------+
| 16772213 |
+----------+
1 row in set (12.53 sec)

So, it can be said that the character type is truly an all-purpose type.

7. Enumeration types

Enumeration types, namely enum. It is suitable for situations where all known values have been planned in advance and it is best not to add new values in the future. Enumeration types have the following characteristics:

It occupies a maximum of 2 Bytes.
It supports a maximum of 65,535 different elements.
MySQL stores it in the form of subscripts in the background, that is, in the form of tinyint or smallint, and the subscript starts from 1.
When sorting, it is sorted according to the subscript, not according to the data type of the elements inside. So pay special attention to this point.

Example 8

Create table t7.

1
2
mysql>create table t7(c1 enum('mysql','oracle','dble','postgresql','mongodb','redis','db2','sql server'));
Query OK, 0 rows affected (0.03 sec)

Insert data using subscripts.

1
2
3
4
5
mysql>insert into t7 values (1);
Query OK, 1 row affected (0.40 sec)

mysql>insert into t7 values (2);
Query OK, 1 row affected (0.00 sec)

Insert data using real elements.

1
2
3
4
5
6
7
8
mysql>insert into t7 values ('postgresql');
Query OK, 1 row affected (0.01 sec)

mysql>insert into t7 values ('dble');
Query OK, 1 row affected (0.01 sec)

mysql>insert into t7 values ('sql server');
Query OK, 1 row affected (0.01 sec)

When the query results come out, it is found that it is sorted according to the previously defined elements, that is, sorted by subscript.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
mysql>select * from t7 order by c1;

+------------+
| c1         |
+------------+
| mysql      |
| oracle     |
| dble       |
| postgresql |
| sql server |
+------------+
5 rows in set (0.00 sec)

8. Set types

The set type SET is similar to enumeration. You also need to know in advance how many elements there are. SET has the following characteristics:

It occupies a maximum of 8 Bytes, int64.
It is stored internally in the form of binary bits. If the corresponding subscripts are viewed in decimal, they are 1, 2, 4, 8,…, pow(2, 63) respectively.
It supports a maximum of 64 different elements. For duplicate element insertions, the duplicates are directly removed when retrieved.
Elements can be inserted in combination. For example, those with subscripts 1 and 2 can be inserted together, and you can directly insert 3.

Example 9

Define the field c1 of table c7 as a set type, containing 8 values, that is, the maximum of the following table is pow(2, 7).

1
2
mysql>create table c7(c1 set('mysql','oracle','dble','postgresql','mongodb','redis','db2','sql server'));
Query OK, 0 rows affected (0.02 sec)

Insert all combinations from 1 to 128.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
mysql>INSERT INTO c7
WITH RECURSIVE ytt_number (cnt) AS (
        SELECT 1 AS cnt
        UNION ALL
        SELECT cnt + 1
        FROM ytt_number
        WHERE cnt < pow(2, 7)
    )
SELECT *
FROM ytt_number;
Query OK, 128 rows affected (0.01 sec)
Records: 128  Duplicates: 0  Warnings: 0

When querying, you can also directly use the subscript or the element to search.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
mysql>select * from c7 where c1 = 1;
+-------+
| c1    |
+-------+
| mysql |
+-------+
1 row in set (0.00 sec)

mysql>select * from c7 where c1 = 'mysql';
+-------+
| c1    |
+-------+
| mysql |
+-------+
1 row in set (0.00 sec)

9.Usage of data types in stored functions

In functions, except for explicitly declared variables, the data types of default Session variables are very weak and are freely converted depending on the given values.

Example 10

Define a function that returns the product of two given parameters. There are two variables in the definition. One is v_tmp explicitly defined as int64, and the other, @vresult, freely changes its type depending on the type of the given value.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
DELIMITER $$
CREATE DEFINER=`ytt`@`127.0.0.1` FUNCTION `ytt_sample_data_type`(`f1` INT, `f2` INT) RETURNS varchar(100) CHARSET latin1
   NO SQL
begin
declare v_tmp bigint default 0;
set v_tmp = f1 * f2;
set @v_result = concat('The result is: ''',v_tmp,'''.');
return @v_result;
end$$
DELIMITER ;

Call it simply.

1
2
3
4
5
6
7
mysql>select ytt_sample_data_type(1111,222) 'result';
+--------------------------+
| result                   |
+--------------------------+
| The result is: '246642'. |
+--------------------------+
1 row in set (0.00 sec)

Summary

This article has given a brief introduction to the basic data types of MySQL and used some easy-to-understand examples to sort out these types. In actual scenarios, it is recommended to choose the most suitable type, and it is not recommended to simply follow the maximization principle for all data types. For example, if varchar(100) can be used, do not use varchar(1000).

👋 See you in the next lesson.

What is SQLFlash?

SQLFlash is your AI-powered SQL Optimization Partner.

Based on AI models, we accurately identify SQL performance bottlenecks and optimize query performance, freeing you from the cumbersome SQL tuning process so you can fully focus on developing and implementing business logic.

Lesson 1 of the SQL Optimization Course: The Art of MySQL Data Types：Part 1

Table of Contents