Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

loading enwiki-latest-categorylinks.sql into mysql

Tags:

mysql

I want to load the wikipedia category-links table into the MySql database. I used the following command in the terminal:

mysql -u root -proot categorylinks < enwiki-latest-categorylinks.sql

The file is 11.6 GB in size and running since 3 days. Is there any better way to load it faster or some settings i need to set to make it run faster ???

like image 235
crazymav Avatar asked Oct 11 '25 07:10

crazymav


1 Answers

1) Separate the SQL dump file to two parts: creation and data

  1. grep -n "Dumping data" enwiki-latest-redirect.sql

This will return the line number separating the creation and data of the table parts of the file. Let it be 46.

  1. tail -n +46 enwiki-latest-redirect.sql > redirect -data.sql

The number we get in the last command is used as the input to tail to get the data part of the file.

  1. head -46 enwiki-latest-redirect.sql > redirect-creation.sql

The number we get in the first command is used as the input to head to get the table creation part of the file.

2) Remove indexes from the redirect-creation.sql

The following is the original CREATE TABLE statement in the dump:

CREATE TABLE `redirect` ( 
 `rd_from` int(8) unsigned NOT NULL DEFAULT '0', 
 `rd_namespace` int(11) NOT NULL DEFAULT '0', 
 `rd_title` varbinary(255) NOT NULL DEFAULT '', 
 `rd_interwiki` varbinary(32) DEFAULT NULL, 
 `rd_fragment` varbinary(255) DEFAULT NULL,   
  PRIMARY KEY (`rd_from`),   
  KEY `rd_ns_title` (`rd_namespace`,`rd_title`,`rd_from`)
) ENGINE=InnoDB DEFAULT CHARSET=binary; 

The following is the desired version after removing the indexes:

CREATE TABLE `redirect` ( 
 `rd_from` int(8) unsigned NOT NULL DEFAULT '0', 
 `rd_namespace` int(11) NOT NULL DEFAULT '0', 
 `rd_title` varbinary(255) NOT NULL DEFAULT '', 
 `rd_interwiki` varbinary(32) DEFAULT NULL, 
 `rd_fragment` varbinary(255) DEFAULT NULL, 
) ENGINE=InnoDB DEFAULT CHARSET=binary; 

Edit the page-creation.sql to remove indexes as shown above.

3) Load the created files

mysql -u root -p wikipedia < redirect-creation.sql
mysql -u root -p wikipedia < redirect-data.sql

4) Create indexes on fields you want after loading the data

create index idx_redirect_rd_from on redirect(rd_from)

I am able to load the categorylinks table in under an hour on my Macbook Pro by following the steps above.

like image 62
hemanth.b Avatar answered Oct 13 '25 20:10

hemanth.b