Parallel processing of incremental backups in XtraBackup reduces prepare time and changes I/O behavior under load.
The problem stays hidden until an incremental backup grows to thousands of .delta files. In the classic model, XtraBackup processed them sequentially. This meant high per-file overhead and prepare time that grew linearly with the number of files. In large installations with many tables (.ibd), prepare became a bottleneck and lengthened recovery time.
The solution is to add parallelism to the change application phase. In XtraBackup versions 8.0.35-33 and 8.4.0-3, a --parallel flag was introduced for the xtrabackup --prepare --apply-log-only command. Now the system first scans the backup directory and builds a queue of .delta files. Then multiple threads process this queue in parallel. This is a trade-off: the speedup comes at the cost of more contention for disk I/O and some thread-management overhead.
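As a rough illustration of the command described above, the following Python sketch assembles the argument list for one incremental prepare step. The directory paths and the helper name build_prepare_command are hypothetical. The --target-dir and --incremental-dir options are the usual way to point xtrabackup at the base and incremental directories, but check them against the documentation for your version before relying on this.

```python
import shlex


def build_prepare_command(target_dir, incremental_dir, threads=8):
    """Return the argv list for applying one incremental backup to a base backup.

    `threads` becomes the value of --parallel; 8 is only a starting point.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return [
        "xtrabackup",
        "--prepare",
        "--apply-log-only",
        f"--target-dir={target_dir}",
        f"--incremental-dir={incremental_dir}",
        f"--parallel={threads}",
    ]


if __name__ == "__main__":
    # Hypothetical paths. To actually run the command, pass the list to
    # subprocess.run(cmd, check=True).
    cmd = build_prepare_command("/backups/base", "/backups/inc1")
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting problems when the backup paths contain spaces.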
The implementation changed the execution model itself. Previously, each file was processed as soon as it was discovered. Now the files first go into an explicit task queue. Each thread takes a .delta file from the queue and applies its changes to the corresponding InnoDB file (.ibd). This scales better with a large number of small files. With fewer or larger files, the gain is limited. Beyond roughly 16 threads, performance may plateau or even degrade slightly because of thread-management overhead.
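The scan-then-queue model can be sketched in a few lines of Python. This is a conceptual illustration only, not XtraBackup's actual implementation. The function names are invented for the example, apply_delta is a placeholder that does no real page patching, and the assumption that a delta file is named after its target with a .delta suffix (for example t1.ibd.delta) is used purely to map a task to its .ibd file.

```python
import queue
import threading
from pathlib import Path


def apply_delta(delta_path):
    """Placeholder for the real work: read the .delta file and patch the .ibd.

    Here it only derives the target name, e.g. "t1.ibd.delta" -> "t1.ibd".
    """
    return delta_path.name[: -len(".delta")]


def prepare_parallel(backup_dir, threads=8):
    # Phase 1: scan the backup directory and enqueue every .delta file.
    tasks = queue.Queue()
    for path in sorted(Path(backup_dir).rglob("*.delta")):
        tasks.put(path)

    applied = []
    lock = threading.Lock()

    def worker():
        # Phase 2: each thread pulls files until the queue is empty.
        while True:
            try:
                path = tasks.get_nowait()
            except queue.Empty:
                return
            target = apply_delta(path)
            with lock:
                applied.append(target)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return sorted(applied)
```

Because the queue is filled completely before any worker starts, a worker that finds it empty can exit safely. The sketch also shows why many small files parallelize well: each task is independent and touches a different target file.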
Results show that the gain depends on the backup structure. In a scenario with 20,608 files of 2.5 MB, increasing --parallel from 1 to 64 raised disk write IOPS from 18.2K to 85K. Prepare time decreased from 3.76 minutes to about one minute. This is roughly a 3.5x speedup for a 4.67x increase in write IOPS. In another case, time decreased from 237 minutes to 6 minutes, a speedup of up to 40x. There is no universal value, but a practical starting point is 8 threads.
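The ratios quoted above can be checked with a few lines of arithmetic. The sketch below uses only the figures given in the text. The one derived value is the prepare time implied by a 3.5x speedup, which works out to slightly more than one minute and is consistent with "about one minute".

```python
def ratio(before, after):
    """How many times larger `before` is than `after`."""
    return before / after


iops_growth = ratio(85.0, 18.2)      # write IOPS at 64 threads vs 1 thread
implied_minutes = 3.76 / 3.5         # prepare time implied by a 3.5x speedup
large_case_speedup = ratio(237, 6)   # second scenario, minutes before vs after

print(f"IOPS growth: {iops_growth:.2f}x")                  # 4.67x
print(f"Implied prepare time: {implied_minutes:.2f} min")  # 1.07 min
print(f"Second-case speedup: {large_case_speedup:.1f}x")   # 39.5x
```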
It is important to understand the causal relationship: the speedup comes not from “magic,” but from better disk utilization and from parallelizing many small operations. If the system is already hitting IOPS or CPU limits, increasing --parallel will not help. In such conditions, a regression is even possible. This is a typical trade-off between the latency of individual operations and the overall throughput of the system.
From an operational perspective, this change shifts the focus of tuning. Previously, the key factor was the backup structure. Now, it is the balance between the number of threads and the capabilities of the storage. Observability through IOPS metrics becomes critical for finding the optimal value.
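One way to turn such measurements into a setting is to time prepare at several thread counts on a test restore, then pick the smallest count that is close to the fastest run. The helper below is a minimal sketch of that selection rule. The name pick_parallel and the 5% tolerance are assumptions made for the example, and the timings in the usage lines are invented placeholders rather than measured results.

```python
def pick_parallel(timings, tolerance=0.05):
    """Return the smallest thread count whose prepare time is within
    `tolerance` (a fraction) of the fastest observed time.

    `timings` maps a --parallel value to the measured prepare time in seconds.
    """
    if not timings:
        raise ValueError("no measurements supplied")
    best = min(timings.values())
    good_enough = [n for n, secs in timings.items() if secs <= best * (1 + tolerance)]
    return min(good_enough)


if __name__ == "__main__":
    # Placeholder numbers for illustration only; substitute your own measurements.
    example = {1: 100.0, 8: 30.0, 16: 29.0, 32: 31.0}
    print(pick_parallel(example))
```

Preferring the smallest count within tolerance leaves I/O headroom for other workloads on the same storage, which matters when the extra threads buy only a marginal improvement.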
In the industry, this approach has long been used for tasks with a high number of small files. XtraBackup is effectively catching up to this pattern. This is an evolutionary improvement that addresses a real bottleneck without changing the backup format.