See this previous question for some background. I'm trying to renumber a corrupted MPTT tree using SQL. The script is logically correct; it is just much too slow.
I repeatedly need to execute these two queries:
UPDATE `tree`
SET    `rght` = `rght` + 2
WHERE  `rght` > currentLeft;
UPDATE `tree`
SET    `lft` = `lft` + 2
WHERE  `lft` > currentLeft;
The table is defined as follows:
CREATE TABLE `tree` (
  `id`        char(36) NOT NULL DEFAULT '',
  `parent_id` char(36) DEFAULT NULL,
  `lft`       int(11) unsigned DEFAULT NULL,
  `rght`      int(11) unsigned DEFAULT NULL,
  ... (a couple of more columns) ...,
  PRIMARY KEY (`id`),
  KEY `parent_id` (`parent_id`),
  KEY `lft` (`lft`),
  KEY `rght` (`rght`),
  ... (a few more indexes) ...
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The database is MySQL 5.1.37. There are currently ~120,000 records in the table. Each of the two UPDATE queries takes roughly 15 to 20 seconds to execute. The WHERE condition may match a majority of the records, so almost all records need to be updated each time. In the worst case, both queries are executed as many times as there are records in the table.
Is there a way to optimize this query by keeping the values in memory, delaying writing to disk, delaying index updates or something along these lines? The bottleneck seems to be hard disk throughput right now, as MySQL seems to be writing everything back to disk immediately.
Any suggestion appreciated.
I have never used it, but if you have enough memory, try a MEMORY table.
Create a table with the same structure as `tree`, fill it with INSERT INTO ... SELECT ... FROM `tree`, run your script against the MEMORY table, and write the result back.
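The steps above might look roughly like the following. This is only a sketch under several assumptions: `tree_mem` is a made-up name, only the columns shown in the question are copied (MEMORY tables cannot hold BLOB or TEXT columns, so the elided columns are left out), and the `max_heap_table_size` value is a guess that has to be large enough for the whole table.

```sql
-- Assumption: raise the per-table size limit for this session (here 256 MB)
-- so the copy is not rejected with a "table is full" error.
SET max_heap_table_size = 268435456;

-- Hypothetical working copy holding only the columns the renumbering needs.
CREATE TABLE tree_mem (
  id        CHAR(36) NOT NULL,
  parent_id CHAR(36) DEFAULT NULL,
  lft       INT UNSIGNED DEFAULT NULL,
  rght      INT UNSIGNED DEFAULT NULL,
  PRIMARY KEY (id),
  -- Range conditions (lft > x) need BTREE; MEMORY indexes default to HASH.
  KEY lft  USING BTREE (lft),
  KEY rght USING BTREE (rght)
) ENGINE=MEMORY DEFAULT CHARSET=utf8;

-- Copy the current values into memory.
INSERT INTO tree_mem (id, parent_id, lft, rght)
SELECT id, parent_id, lft, rght FROM tree;

-- ... run the renumbering script against tree_mem instead of tree ...

-- Write the renumbered values back to the on-disk table in one statement.
UPDATE tree t
JOIN tree_mem m ON m.id = t.id
SET t.lft = m.lft, t.rght = m.rght;

DROP TABLE tree_mem;
```

Keep in mind that the contents of a MEMORY table are lost if the server restarts, so nothing is saved until the final write-back has run.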
Expanding on some ideas from the comments, as requested:
By default, InnoDB flushes its log to disk after every commit. You can wrap multiple updates in a single transaction so that they share one commit, or change this parameter:
http://dev.mysql.com/doc/refman/5.1/en/innodb-parameters.html#sysvar_innodb_flush_log_at_trx_commit
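Both options could be tried roughly as follows. This is a sketch, not a tuned setup: `@currentLeft` and its value are placeholders for whatever the renumbering script supplies, and changing the global setting needs the SUPER privilege.

```sql
-- Option 1: run many updates inside one transaction so the log is
-- flushed once at COMMIT instead of after every statement.
SET @currentLeft = 10;  -- placeholder value for illustration only

START TRANSACTION;
UPDATE `tree` SET `rght` = `rght` + 2 WHERE `rght` > @currentLeft;
UPDATE `tree` SET `lft`  = `lft`  + 2 WHERE `lft`  > @currentLeft;
-- ... further iterations of the script ...
COMMIT;

-- Option 2: relax log flushing while the script runs.
-- With the value 2 the log is written at each commit but only flushed
-- to disk about once per second, so a crash can lose recent transactions.
SET GLOBAL innodb_flush_log_at_trx_commit = 2;
-- ... run the renumbering script ...
SET GLOBAL innodb_flush_log_at_trx_commit = 1;  -- restore the default
```

With very long transactions the amount of undo information InnoDB has to keep grows, so committing every few hundred iterations may be a reasonable middle ground.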
The isolation level is simple to change; just make sure the level fits your design. It probably won't help here because a range update is being used, but it is good to know about when you are looking for more concurrency:
http://dev.mysql.com/doc/refman/5.1/en/set-transaction.html
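For reference, changing the level for the current connection looks something like this. READ COMMITTED is used purely as an illustration; whether it suits your design is for you to judge.

```sql
-- Applies to subsequent transactions on this connection only.
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- ... run the script ...

-- Back to InnoDB's default level.
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
```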
Ultimately, given the range update in the query, your best bet is the MEMORY table that andrem pointed out. You will probably also gain some performance by using BTREE indexes instead of the MEMORY engine's default, HASH:
http://www.mysqlperformanceblog.com/2008/02/01/performance-gotcha-of-mysql-memory-tables/
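To check which index type a MEMORY table is using and switch it, something along these lines should work. The table name `tree_mem` is hypothetical, and the index names `lft` and `rght` are assumed to match those in the question's table definition.

```sql
-- The Index_type column shows HASH or BTREE for each index.
SHOW INDEX FROM tree_mem;

-- Rebuild the two range-scanned indexes as BTREE.
ALTER TABLE tree_mem
  DROP INDEX lft,
  DROP INDEX rght,
  ADD INDEX lft  USING BTREE (lft),
  ADD INDEX rght USING BTREE (rght);

-- Check whether the optimizer now considers the index for a range condition.
EXPLAIN SELECT id FROM tree_mem WHERE lft > 1000;
```

A HASH index can only serve equality lookups, which is why it cannot help with the `lft > ...` and `rght > ...` conditions.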