MySql UTF8 character set

MySQL database has two implementations for utf8 character set:

  • utf8 (As of MySQL 4.1)
  • utf8mb4 (As of MySQL 5.5)

In this writing, I am going to explain the difference between these two character set and how you can store data with utf8mb4 character set in your MySQL database.

The key point is, utf8 uses a maximum of 3 bytes per character and the utf8mb4 character set uses a maximum of 4 bytes per character. For this reason, utf8mb4 can store additional characters that cannot be stored by utf8mb3 (alias for utf8) character set.

To be consistent, I suggest that all of the below items have utf8mb4 character set if you want to store data properly. I haven’t try that but, you could loose data or maybe get an exception during dml statement if one of the items is not properly set to utf8mb4.

  • Server Character Set
  • Database Character Set
  • Schema Character Set
  • Table Character Set
  • Column Character Set

Now, I am going to explain how you can change each items’ character set value and how you can check the values of them. I did my experimentation on MySQL 5.7 and there could be additional or different steps for other versions. The configurations are inherited from each other and it should be enough to satisfy most generic condition. If you encounter a problem, you can check configurations from top to the bottom to be sure everything is as expected.

Server Character Set:

While MySQL database standing up, it reads necessary properties from different configuration files. You can find related information from this link. For my Ubuntu environment, I choose to change ‘/etc/mysql/my.cnf’ file and for my Windows environment I choose to change ‘C:/ProgramData/MySQL/MySQL Server 5.7/my.ini’ file. But, be careful while changing files. Because, if you define same property twice or remove a required property, your database will not stand up. Suspect from your conf files and investigate deeper your changes if any unexpected behavior occurs. Here is the configurations:

[client]
default-character-set=utf8

[mysql]
default-character-set=utf8

[mysqld]
character-set-server=utf8mb4
collation-server=utf8mb4_general_ci

In MySQL database, there isn’t any different configuration parameter to configure default-character-set option for utf8mb4 different than utf8. As you can see, we change character-set-server to ‘utf8mb4’ and collation-server to ‘utf8mb4_general_ci’. Collation affects the order of characters when you need ordering. Further reading about collations can be found from this link. After you change the files, MySQL Server must be restarted. Below commands may change according to service names and for Windows you may need to open cmd as an administrator.

On Linux:

sudo service mysql restart

On Windows:

net stop MySQL57
net start MySQL57

When MySQL server started, you can connect to database and run below command to check if your changes works:

show variables
where variable_name like '%char%';

You must see character set of server similar to below:
MySQL Character Set

Database Character Set:
Database character set can be specified when you are creating the database or you can alter the database after you created it.

During create phase below command will be enough:

CREATE DATABASE your_database_name CHARACTER SET utf8mb4;

For an existing database, you can change character set using below command:

ALTER DATABASE your_database_name CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;

You can control the changes with the same query as server character set:

show variables
where variable_name like '%char%';

Database Character Set

Schema Character Set:
If you already executed above steps, you do not need to do anything specific for the rest. But just for the migration issues, I will explain how to alter and check the status of remaining database objects.

For an existing schema:

ALTER SCHEMA your_schema_name CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;

You can check the changes with below query:

select *
from information_schema.schemata 
where schema_name = "your_schema_name";

Table Character Set:

For an existing table:

ALTER TABLE your_schema_name.your_table_name CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;

You can control the changes with below query:

select *
from information_schema.tables t,
     information_schema.collation_character_set_applicability ccsa
where ccsa.collation_name = t.table_collation
and   t.table_schema = "your_schema_name"
and   t.table_name = "your_table_name";

Columns Character Set:

For an existing column:

ALTER TABLE your_schema_name.your_table_name MODIFY your_column_name VARCHAR(4000) CHARACTER SET utf8mb4;

You can control the changes with below query:

select *
from information_schema.columns 
where table_schema = "your_schema_name"
and   table_name = "your_table_name"
and   column_name = "your_column_name";

Connection String of Application:
If you want connection of your application to use utf8, you could use below connection string in your Java application. I think same approach can be implemented for other languages.

jdbc:mysql://localhost:3306/your_database_name?useUnicode=true

Developers Rock!!!

Advertisements
This entry was posted in Java, MySql and tagged , , , , , , , , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s