Introduction
Tungsten Clustering depends on a number of prerequisites and best practices to function optimally.
In this blog post, we explore a critical yet easily overlooked step when installing a Tungsten Cluster node: configuring the number of open files available under `systemd` control.
For a Tungsten Cluster to function properly, please ensure that start-at-boot / stop-at-shutdown has been configured using `deployall`. Please read the previous post in this series for more details:
https://www.continuent.com/resources/blog/tungsten-clustering-plugging-holes-risk-mitigation-through-best-practices
The open files limit (`LimitNOFILE`) is one key setting that must be correct in the `systemd` service files for both MySQL Server and the Tungsten-owned processes. If it is set too low, or not set at all, there will be insufficient resources available for production loads.
In a near-future version, the systemd service files installed by the Tungsten `deployall` script will already have the correct values.
The Question
Recently a customer asked us:
“What caused the MySQL Server to run out of available open files?”
Plug the Hole: Root Cause
Tungsten-owned processes and the MySQL Server did not have the proper number of open files configured when under `systemd` control.
Tell Me More
Although you may have the correct settings in the `limits.conf` file (as seen by `ulimit -a`), these values are not applied to services started via `systemd` (https://access.redhat.com/solutions/1257953).
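One way to see this for yourself is to compare the limit in your login shell with the limit a running service actually received. The sketch below is illustrative only: the helper name `nofile_limits` is hypothetical, and it assumes a Linux host where `/proc/<pid>/limits` is available.

```shell
# Hypothetical helper: print the soft and hard "Max open files" limits
# found in the contents of a /proc/<pid>/limits file read from stdin.
nofile_limits() {
    awk '/^Max open files/ { print $4, $5 }'
}
```

For example, compare the shell's own limit with the limit of the running MySQL Server (this assumes the process is named mysqld):
shell> ulimit -n
shell> sudo cat /proc/$(pgrep -o -x mysqld)/limits | nofile_limits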
You may see entries like the following in the MySQL Server error log:
[Warning] [MY-000000] [Server] Too many connections
...
[Warning] [MY-010140] [Server] Could not increase number of max_open_files to more than 10000 (request: 65535)
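To check whether a server has already hit this limit, you could scan the error log for those two messages. This is a minimal sketch: the helper name `find_open_file_warnings` is hypothetical, and the log path in the usage line is an assumption that varies by installation (check the MySQL `log_error` setting).

```shell
# Hypothetical helper: read a MySQL error log on stdin and print only the
# lines that point to exhausted connections or a capped open files limit.
find_open_file_warnings() {
    grep -E 'Too many connections|Could not increase number of max_open_files'
}
```

For example, with an assumed log location:
shell> sudo cat /var/log/mysqld.log | find_open_file_warnings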
If the setting is missing from the systemd service file, then the defaults provided by the operating system may be too low depending on the OS vendor and version.
If the setting `LimitNOFILE` is listed in the systemd service file, it is possible that the provided value is too low for a production server.
For example, you may check the current values via the `systemctl show` command.
This example shows the values BEFORE `deployall` has been run, which means that the Tungsten-specific service files are not yet found under `/etc/systemd/system/`.
The values displayed are the default values provided by the OS:
shell> for d in tconnector tmanager treplicator mysqld; do echo $d; sudo systemctl show $d | grep LimitNOFILE; done
tconnector
LimitNOFILE=1048576
tmanager
LimitNOFILE=1048576
treplicator
LimitNOFILE=1048576
mysqld
LimitNOFILE=18446744073709551615
After the `deployall` command has been run, the service files exist, and the values displayed are the defaults the OS applies in that case, which are very different from the ones we saw above:
shell> for d in tconnector tmanager treplicator mysqld; do echo $d; sudo systemctl show $d | grep LimitNOFILE; done
tconnector
LimitNOFILE=4096
tmanager
LimitNOFILE=4096
treplicator
LimitNOFILE=4096
mysqld
LimitNOFILE=18446744073709551615
The above Tungsten-specific values are too low for a production server! (The mysqld value is fine.)
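A small helper can make this check explicit. The sketch below is an illustration, not part of the Tungsten tooling: `limit_ok` and `report_low_limits` are hypothetical names, and the default threshold of 65535 is the minimum this post recommends. The comparison is done in `awk` because the value systemd reports for `infinity` (18446744073709551615) is too large for shell integer arithmetic.

```shell
# Hypothetical helper: succeed when a LimitNOFILE value meets a minimum.
# Usage: limit_ok VALUE [MINIMUM]   (MINIMUM defaults to 65535)
limit_ok() {
    case "$1" in
        infinity) return 0 ;;          # systemd keyword for "no limit"
        ''|*[!0-9]*) return 1 ;;       # reject empty or non-numeric input
    esac
    # awk compares as floating point, so very large values do not overflow
    awk -v v="$1" -v m="${2:-65535}" 'BEGIN { exit !(v + 0 >= m + 0) }'
}

# Hypothetical helper: report every listed service whose limit is too low.
report_low_limits() {
    for unit in "$@"; do
        value=$(systemctl show "$unit" --property=LimitNOFILE | cut -d= -f2)
        limit_ok "$value" || echo "$unit: LimitNOFILE=$value is too low"
    done
}
```

For example:
shell> report_low_limits tconnector tmanager treplicator mysqld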
By the way, in our lab environment, the MySQL Server has already been configured properly:
shell> sudo ls -l /etc/systemd/system/*.service
-rw-r--r-- 1 root root 2014 Dec 20 17:58 /etc/systemd/system/mysqld.service
shell> sudo grep -H LimitNOFILE /etc/systemd/system/*.service
/etc/systemd/system/mysqld.service:LimitNOFILE=infinity
shell> sudo systemctl show mysqld | grep LimitNOFILE
LimitNOFILE=18446744073709551615
If you do NOT have the service file in place for the MySQL Server, you may need to manually copy the template into place before editing it:
shell> sudo cp /lib/systemd/system/mysql.service /etc/systemd/system/
Please note that the filename `mysql.service` shown above will vary based on multiple factors. You must check to be sure you are using the correct file. For example, in some cases the filename would be `mysqld.service` instead.
These steps are clearly shown in our prerequisites docs:
https://docs.continuent.com/tungsten-clustering-7.0/prerequisite-host.html#prerequisite-host-user
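If you are unsure which MySQL service filename applies on a given host, a quick check like the one below may help. It is only a sketch: `find_mysql_unit` is a hypothetical helper, the two candidate names are just the ones mentioned in this post, and the directories in the usage line are common locations rather than a complete list.

```shell
# Hypothetical helper: print the first MySQL unit file found in the given
# directories, checking only the two filenames mentioned in this post.
find_mysql_unit() {
    for dir in "$@"; do
        for name in mysqld.service mysql.service; do
            if [ -f "$dir/$name" ]; then
                echo "$dir/$name"
                return 0
            fi
        done
    done
    return 1    # nothing found; the unit may use a different name
}
```

For example:
shell> find_mysql_unit /etc/systemd/system /lib/systemd/system /usr/lib/systemd/system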
Plug the Hole: Solution
The solution is to configure the correct number of open files for both the Tungsten Cluster and the MySQL Server when using `systemd`, then restart the processes to ensure the new settings take effect.
Be sure you have followed all the steps needed to properly deploy the Tungsten software into `systemd` control using the `deployall` command shown in the documentation here:
https://docs.continuent.com/tungsten-clustering-7.0/prerequisite-host.html#prerequisite-host-user
and also explained in the previous blog post here:
https://www.continuent.com/resources/blog/tungsten-clustering-plugging-holes-risk-mitigation-through-best-practices
Once the `deployall` steps have been completed, you will find 3 new `systemd` service files for the Tungsten-owned processes (along with the existing one for `mysqld`):
shell> sudo ls -l /etc/systemd/system/*.service
-rw-r--r-- 1 root root 2014 Dec 20 17:58 /etc/systemd/system/mysqld.service
-rw-r--r-- 1 root root 401 Dec 20 19:50 /etc/systemd/system/tconnector.service
-rw-r--r-- 1 root root 421 Dec 20 19:50 /etc/systemd/system/tmanager.service
-rw-r--r-- 1 root root 477 Dec 20 19:50 /etc/systemd/system/treplicator.service
shell> sudo grep -H LimitNOFILE /etc/systemd/system/*.service
/etc/systemd/system/mysqld.service:LimitNOFILE=infinity
shell> for d in tconnector tmanager treplicator mysqld; do echo $d; sudo systemctl show $d | grep LimitNOFILE; done
tconnector
LimitNOFILE=4096
tmanager
LimitNOFILE=4096
treplicator
LimitNOFILE=4096
mysqld
LimitNOFILE=18446744073709551615
Note that `grep` finds no `LimitNOFILE` setting in any of the new Tungsten-specific service files.
Also, please note that the values of `LimitNOFILE` for the new Tungsten-specific services are far too low at 4096; we want them to be at least 65535.
Solution Steps: Summary
- Edit each of the `/etc/systemd/system/t*.service` files and add either `LimitNOFILE=65535` or `LimitNOFILE=infinity` to the `[Service]` stanza.
- Do the same with the `/etc/systemd/system/mysql*.service` file, if needed.
- Reload systemd via `sudo systemctl daemon-reload`.
- Validate the environment via `systemctl show`.
- Restart each of the processes via `sudo systemctl restart {tconnector|tmanager|treplicator|mysqld}`.
Solution Step 1: Edit the Tungsten-Specific systemd Service Files
shell> sudo vi /etc/systemd/system/treplicator.service /etc/systemd/system/tmanager.service /etc/systemd/system/tconnector.service
3 files to edit
[Service]
…
LimitNOFILE=65535
~OR~
LimitNOFILE=infinity
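If you have many nodes to update, the same edit can be scripted instead of made by hand in `vi`. The following is a sketch under stated assumptions, not an official Tungsten procedure: `set_limit_nofile` is a hypothetical helper, it assumes each file has exactly one `[Service]` line, and you should back up the service files and review the result before reloading systemd.

```shell
# Hypothetical helper: set LimitNOFILE in the [Service] stanza of a unit file.
# Usage: set_limit_nofile FILE VALUE   (run as root for files under /etc)
set_limit_nofile() {
    tmp=$(mktemp) || return 1
    if awk -v v="$2" '
        /^LimitNOFILE=/ { next }                    # drop any existing setting
        { print }                                   # keep every other line
        /^\[Service\]/  { print "LimitNOFILE=" v }  # add it right after [Service]
    ' "$1" > "$tmp"; then
        cat "$tmp" > "$1"    # overwrite in place, keeping owner and mode
        rc=$?
    else
        rc=1
    fi
    rm -f "$tmp"
    return "$rc"
}
```

For example, from a root shell in which the function has been defined:
shell> for f in /etc/systemd/system/treplicator.service /etc/systemd/system/tmanager.service /etc/systemd/system/tconnector.service; do set_limit_nofile "$f" 65535; done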
Solution Step 2: Edit the MySQL-Specific systemd Service Files, If Needed
shell> sudo vi /etc/systemd/system/mysqld.service
1 file to edit
[Service]
…
LimitNOFILE=65535
~OR~
LimitNOFILE=infinity
Solution Step 3: Reload the systemd Service
shell> sudo systemctl daemon-reload
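If you are not sure whether a reload has happened since the last edit, systemd exposes a `NeedDaemonReload` property for each unit. The sketch below is illustrative; `needs_reload` is a hypothetical helper that only interprets the output of `systemctl show`.

```shell
# Hypothetical helper: read "systemctl show" output on stdin and succeed
# when systemd reports that the unit file changed since the last reload.
needs_reload() {
    grep -q '^NeedDaemonReload=yes$'
}
```

For example:
shell> systemctl show treplicator --property=NeedDaemonReload | needs_reload && echo "reload still needed"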
Solution Step 4: Validate the Environment
If you used LimitNOFILE=65535, check that the Tungsten values are now correct:
shell> for d in tconnector tmanager treplicator mysqld; do echo $d; sudo systemctl show $d | grep LimitNOFILE; done
tconnector
LimitNOFILE=65535
tmanager
LimitNOFILE=65535
treplicator
LimitNOFILE=65535
mysqld
LimitNOFILE=18446744073709551615
If you used LimitNOFILE=infinity, check that the Tungsten values are now correct:
shell> for d in tconnector tmanager treplicator mysqld; do echo $d; sudo systemctl show $d | grep LimitNOFILE; done
tconnector
LimitNOFILE=18446744073709551615
tmanager
LimitNOFILE=18446744073709551615
treplicator
LimitNOFILE=18446744073709551615
mysqld
LimitNOFILE=18446744073709551615
Solution Step 5: Restart Each of the Processes, So They Get the Correct Environment
On one node, set the policy to maintenance mode:
shell> cctrl
cctrl> set policy maintenance
cctrl> exit
Next, on each node in turn, restart the needed services:
shell> sudo systemctl restart mysqld    # only if needed
shell> sudo systemctl restart treplicator
shell> sudo systemctl restart tmanager
shell> sudo systemctl restart tconnector
Finally, set the policy back to automatic on a single node:
shell> cctrl
cctrl> set policy automatic
cctrl> exit
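Because `systemctl show` reports the configured limit, it can be worth confirming that each restarted process is actually running with it. The following sketch assumes a Linux host with `/proc`; `main_pid_from_show` and `show_running_limits` are hypothetical helpers, and `MainPID` is the property systemd reports for a service's main process (which may be a wrapper process, depending on how the service is started).

```shell
# Hypothetical helper: extract the PID from "systemctl show" output on stdin.
main_pid_from_show() {
    sed -n 's/^MainPID=\([0-9][0-9]*\)$/\1/p'
}

# Hypothetical helper: print the live "Max open files" line for each
# listed service that currently has a main process.
show_running_limits() {
    for unit in "$@"; do
        pid=$(systemctl show "$unit" --property=MainPID | main_pid_from_show)
        if [ -n "$pid" ] && [ "$pid" -ne 0 ]; then
            echo "$unit (pid $pid):"
            grep '^Max open files' "/proc/$pid/limits"
        else
            echo "$unit: not running"
        fi
    done
}
```

For example (run as root if the limits files are not readable by your user):
shell> show_running_limits tconnector tmanager treplicator mysqld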
Wrap-Up
In this post, we explored a critical yet easily overlooked step when installing a Tungsten Cluster node: configuring the open files limit when using `systemd` control.
In a near-future version, the systemd service files installed by the Tungsten deployall script will already have the correct values.
Smooth sailing!