 

Delete MediaWiki spam users via MySQL

As of MediaWiki 1.31 there is no extension to mass-delete spam users (only manual merge & delete). We would like to delete the users directly via MySQL, but there are warnings that this method may destroy your database because of referencing tables. When deleting user rows, are there ways to make sure no references are harmed? Any experiences or recommendations?

asked Nov 15 '25 by Tristan

1 Answer

Today I was facing this issue with an old MediaWiki 1.23 installation and searched around a bit.

  • MediaWiki Manual:Database layout
  • Database Schema Diagram
  • Manual:Page table - deleting pages with their relationships in text and revision table

Based on the information above, I experimented a bit.

First, I wanted to assess the damage a bit:

external links

select count(*) from externallinks;

select convert(el_to using utf8) as href
from externallinks l;

There were some 150,000 external links.
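To see which pages those links actually sit on, the externallinks table can be joined against the page table via el_from (which holds the referring page id in the standard schema); a sketch:

select p.page_id,
  convert(p.page_title using utf8) as pagetitle,
  count(*) as linkcount
from externallinks l
inner join page p
  on l.el_from = p.page_id
group by p.page_id
order by linkcount desc;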

SQL query to find out about users:

select 
  convert(user_name using utf8) as name,
  convert(user_touched using utf8) as time,
  user_editcount 
from user 
order by 2 desc

In my case all spam users were created in the same time period.
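If that is also true for you, the registration timestamp can be used to isolate them; a sketch, assuming the user_registration column is populated (the date range below is just a placeholder):

select user_id,
  convert(user_name using utf8) as name,
  user_registration
from user
where user_registration between '20180101000000' and '20180401000000'
order by user_registration;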

SQL query with a join over the page, revision, text and user tables:

select 
  convert(u.user_name using utf8) as username,
  p.page_id,
  convert(p.page_title using utf8) as pagetitle,
  r.rev_user as userid,
  convert(t.old_text using utf8) as text
from page p
inner join revision r
  on p.page_id=r.rev_page
inner join user u
  on r.rev_user=u.user_id  
inner join text t
  on r.rev_text_id=t.old_id
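
To review only the suspected spam pages before touching anything, the same join can be restricted to revisions by the suspect authors; a sketch, assuming you already know which user ids are legitimate (user_id=1 in my case, see below):

select distinct
  p.page_id,
  convert(p.page_title using utf8) as pagetitle
from page p
inner join revision r
  on p.page_id=r.rev_page
where not r.rev_user in (1)
order by p.page_id;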

SQL query to find the number of revisions per user:

select count(*),u.user_id,convert(u.user_name using utf8) as username
from revision r
inner join user u
on r.rev_user=u.user_id
group by 2
order by 1 desc 

In my case all "good" pages where fortunately only created by one user with the user_id=1 so I could assess the damage by:

select count(*) as textcount from text where old_id in (select rev_text_id from revision where not rev_user in (1));

which gave me over half a million hits as a result, meaning deletion is better done in a stepwise way:

select count(*) as textcount from text where old_id in (select rev_text_id from revision where not rev_user in (1)); 
set autocommit=0;
start transaction;
delete from text where old_id in (select rev_text_id from revision where not rev_user in (1)) limit 2000; 
commit;

Please note that the limit of 2000 already led to a runtime of some 2 minutes. So I'd have to run the above SQL statement some 250 times, waiting 2 minutes each time ...
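Rather than running that by hand 250 times, a stored-procedure loop can repeat the batched delete until nothing is left; a minimal sketch, assuming the same user_id=1 whitelist as above (the procedure name delete_spam_text is made up):

delimiter //
create procedure delete_spam_text()
begin
  declare deleted int default 1;
  while deleted > 0 do
    -- delete the next batch of suspect text rows
    delete from text
    where old_id in (select rev_text_id from revision where not rev_user in (1))
    limit 2000;
    -- row_count() still refers to the delete at this point
    set deleted = row_count();
    commit;
  end while;
end//
delimiter ;

call delete_spam_text();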

If you run into delete timing problems, you might consider the hints in:

  • How can I improve DELETE FROM performance on large InnoDB tables?
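
One technique often suggested there is to materialize the ids to delete into a helper table first and then delete via a join instead of a repeated IN subquery; a sketch (the table name spam_text_ids is made up):

create table spam_text_ids (old_id int unsigned not null primary key);

insert into spam_text_ids
select distinct rev_text_id from revision where not rev_user in (1);

delete t
from text t
inner join spam_text_ids s
  on t.old_id = s.old_id;

drop table spam_text_ids;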

You can check your table status with:

show table status from <wiki-databasename>;

In my case the tables were using InnoDB.

I tried increasing innodb_buffer_pool_size to 128 MByte, but this didn't have a positive effect; the deletion was still slow.
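For reference, the current value can be checked from within MySQL, and on newer servers it can even be changed at runtime; a sketch:

show variables like 'innodb_buffer_pool_size';
-- on MySQL 5.7.5+ the value can be changed at runtime, e.g. to 128 MByte:
-- set global innodb_buffer_pool_size = 134217728;
-- on older versions it has to be set in my.cnf and mysqld restarted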

I'll still try to get this finished and work my way through by deleting the relevant rows in the following tables (roughly as sketched below the list):

  • externallinks
  • revision
  • page
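
A rough sketch of those deletions, again assuming user_id=1 is the only legitimate author; verify the counts first and wrap the statements in a transaction before committing:

-- external links of pages that have spam revisions
delete from externallinks
where el_from in (select distinct rev_page from revision where not rev_user in (1));

-- the spam revisions themselves
delete from revision where not rev_user in (1);

-- pages with no revision left were pure spam pages
delete from page
where page_id not in (select distinct rev_page from revision);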

I also checked the files in /var/lib/mysql/. Since I had innodb_file_per_table on, I saw that quite a few tables had grown very large.
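The same size overview is also available from within MySQL via information_schema; a sketch (replace <wiki-databasename> as in the show table status example above):

select table_name,
  round((data_length + index_length) / 1024 / 1024) as size_mb
from information_schema.tables
where table_schema = '<wiki-databasename>'
order by size_mb desc;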

So I looked into

  • https://www.percona.com/blog/2013/09/25/how-to-reclaim-space-in-innodb-when-innodb_file_per_table-is-on/

and started

optimize table text

This took 8 hours to complete.

Fortunately, in my case it's not really a production wiki. I just wanted to check the feasibility of the approach, and it looks like it depends a lot on the number of rows involved.

The API-based and maintenance-script-based approaches might be much more effective, depending on the concrete scenario.

answered Nov 17 '25 by Wolfgang Fahl