MariaDB tuning for Zabbix – part 2

Last year, I wrote a post about MariaDB tuning for Zabbix. My Zabbix environment has grown since then and is now a “Very large” deployment according to the Zabbix documentation. One of the issues you will run into in a large deployment is the housekeeping process, which is in charge of deleting old/expired data from the database. In particular, it is the history and trend housekeeping that becomes a performance bottleneck. One of the first signs of this problem is that the trigger “Zabbix server: Utilization of housekeeper processes over 75%” fires repeatedly on the Zabbix server and stays active longer each time. In my case, this trigger fired three times a day and stayed active for about two hours each time. While the housekeeping process was running, the Zabbix GUI was noticeably slow and searching for historical data was extremely slow.

There are two steps to solving this problem. First, you must define a data retention period for history and trends. History is an exact representation of the gathered data, while trends hold the hourly minimum, average and maximum of the history. For example, in the “ICMP ping” template, history is kept for 31 days and trends for 1 year with an update interval of 1 minute for all items. This means that one row per item is written to the database every minute (no discard preprocessing in this example) to preserve the history, while one row per item is added each hour to track trends. For reference, I keep history for 15 days and trends for 1 year.
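To make the retention arithmetic concrete, here is a small sketch that counts how many rows a single item keeps in the database. The function name and parameters are my own for illustration, and it assumes every collected value is stored (no discard preprocessing) and that the item is numeric, so it has trends at all.

```python
def retained_rows(update_interval_s, history_days, trend_days):
    """Rows kept per item for the given retention settings.

    Assumes one history row per update interval and one trend row per hour.
    """
    history_rows = (86400 // update_interval_s) * history_days
    trend_rows = 24 * trend_days
    return history_rows, trend_rows


# "ICMP ping" template defaults: 1 minute interval, 31 days history, 1 year trends
print(retained_rows(60, 31, 365))   # (44640, 8760)

# My settings: 15 days history, 1 year trends
print(retained_rows(60, 15, 365))   # (21600, 8760)
```

Multiply those numbers by the number of items in your deployment and it becomes clear why deleting history row by row is expensive.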

The second step is to partition the history and trend tables in MariaDB by range. Dropping a partition in MySQL/MariaDB is almost instantaneous, unlike deleting the same rows one by one. The goal is to partition the various history and trend tables into smaller subsets (history tables by day, trend tables by month) and then have a cron job on the Zabbix server drop old partitions and create new ones once a day. This allows us to disable the housekeeping process for history and trends in the Zabbix GUI (Administration – Housekeeping). How this is done is well documented by Zabbix here.
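As a rough illustration of what range partitioning by day looks like, the sketch below only generates the SQL text for an initial `ALTER TABLE ... PARTITION BY RANGE` statement. It assumes the table has the integer Unix-timestamp column `clock` (as the Zabbix history tables do), that day boundaries are taken in UTC, and it uses a partition naming scheme (`pYYYY_MM_DD`) that I made up for this example. The Zabbix documentation is the authoritative procedure.

```python
from datetime import date, datetime, timedelta, timezone


def day_partitions(first_day, count):
    """Yield (partition_name, upper_bound_epoch) for `count` consecutive days."""
    for offset in range(count):
        day = first_day + timedelta(days=offset)
        nxt = day + timedelta(days=1)
        # A partition holds rows with clock < midnight (UTC) of the following day.
        upper = int(datetime(nxt.year, nxt.month, nxt.day,
                             tzinfo=timezone.utc).timestamp())
        yield f"p{day:%Y_%m_%d}", upper


def initial_partition_sql(table, first_day, count):
    """Build the statement that converts `table` to daily range partitions."""
    parts = ",\n  ".join(
        f"PARTITION {name} VALUES LESS THAN ({upper})"
        for name, upper in day_partitions(first_day, count)
    )
    return f"ALTER TABLE `{table}` PARTITION BY RANGE (clock) (\n  {parts}\n);"


print(initial_partition_sql("history", date(2024, 1, 1), 3))
```

The same idea applies to the trend tables, with monthly instead of daily boundaries.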

For reference, with a 200 GB database and pretty solid hardware (disk), it took around 24 hours to partition everything. One problem I ran into was that the MariaDB tmp directory was not big enough (the operation needed more than 50 GB in my case), so make sure you have enough space. To maintain the partitions (drop old and create new partitions), I wrote a Python script (based on this) and run it locally on the Zabbix server from cron once each night.
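The core logic of such a maintenance job can be sketched as follows. This is not my actual script, only a minimal outline under a few assumptions: daily partitions named `pYYYY_MM_DD` (a hypothetical naming scheme), UTC day boundaries, and a list of existing partition names that you have already fetched, for example from `information_schema.PARTITIONS`. The function only returns SQL statements; executing them is left to whatever database client you use.

```python
from datetime import date, datetime, timedelta, timezone

NAME_FORMAT = "p%Y_%m_%d"  # hypothetical partition naming scheme


def upper_bound(day):
    """Epoch of midnight (UTC) after `day`; rows with clock below it belong to `day`."""
    nxt = day + timedelta(days=1)
    return int(datetime(nxt.year, nxt.month, nxt.day,
                        tzinfo=timezone.utc).timestamp())


def maintenance_sql(table, existing, today, keep_days, ahead_days):
    """Return statements that drop expired partitions and add upcoming ones."""
    days = sorted(datetime.strptime(n, NAME_FORMAT).date() for n in existing)
    oldest_kept = today - timedelta(days=keep_days)
    statements = [
        f"ALTER TABLE `{table}` DROP PARTITION {d.strftime(NAME_FORMAT)};"
        for d in days if d < oldest_kept
    ]
    # ADD PARTITION only accepts ranges above the current highest partition.
    newest = days[-1] if days else today - timedelta(days=1)
    for offset in range(ahead_days + 1):
        day = today + timedelta(days=offset)
        if day > newest:
            statements.append(
                f"ALTER TABLE `{table}` ADD PARTITION "
                f"(PARTITION {day.strftime(NAME_FORMAT)} "
                f"VALUES LESS THAN ({upper_bound(day)}));"
            )
    return statements


existing = ["p2024_01_01", "p2024_01_16", "p2024_01_17"]
for stmt in maintenance_sql("history", existing, date(2024, 1, 17), 15, 1):
    print(stmt)
```

Creating partitions a few days ahead gives you some margin if the cron job fails for a night, since inserts fail when no partition covers the incoming timestamp.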

With history and trends partitioned, housekeeping disabled and some simple tuning, I’m able to manage a pretty large Zabbix deployment without any performance issues.

