As of MediaWiki 1.31 there is no extension to mass delete spam users (only manual merge & delete). We would like to delete the users via MySQL, but there are warnings that this method may destroy your database because of referenced tables. When deleting user tables/rows, are there ways to make sure no references are harmed? Any experiences or recommendations?
Today I faced this issue with an old MediaWiki 1.23 and searched around a bit.
Based on the information above I experimented a bit.
First I wanted to assess the damage:
External links:
select count(*) from externallinks;

select convert(el_to using utf8) as href
from externallinks l;

There were some 150,000 external links.
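To see which domains dominate those links, a grouped variant can help. This is just a sketch; the limit of 30 is an arbitrary choice:
-- group the external links by scheme+domain and count them
select substring_index(convert(el_to using utf8), '/', 3) as domain,
count(*) as links
from externallinks
group by domain
order by links desc
limit 30;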
SQL query to find out about the users:
select
convert(user_name using utf8) as name,
convert(user_touched using utf8) as time,
user_editcount
from user
order by 2 desc
In my case all spam users were created in the same time period.
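Since the spam accounts showed up in one time window, you can also list accounts by registration date. The date range below is purely hypothetical; adjust it to your own spam wave:
select
user_id,
convert(user_name using utf8) as name,
convert(user_registration using utf8) as registered,
user_editcount
from user
where user_registration between '20180101000000' and '20180301000000'
order by user_registration;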
SQL query with a join over the page, revision, text and user tables:
select
convert(u.user_name using utf8) as username,
p.page_id,
convert(p.page_title using utf8) as pagetitle,
r.rev_user as userid,
convert(t.old_text using utf8) as text
from page p
inner join revision r
on p.page_id=r.rev_page
inner join user u
on r.rev_user=u.user_id
inner join text t
on r.rev_text_id=t.old_id
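To spot-check only the suspicious content, the same join can be restricted and limited to a small sample. The sketch below assumes that, as in my case, only the account with user_id=1 is legitimate:
select
convert(u.user_name using utf8) as username,
p.page_id,
convert(p.page_title using utf8) as pagetitle,
convert(t.old_text using utf8) as text
from page p
inner join revision r
on p.page_id=r.rev_page
inner join user u
on r.rev_user=u.user_id
inner join text t
on r.rev_text_id=t.old_id
where not u.user_id in (1)
limit 50;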
SQL query to find number of revisions per user:
select count(*),u.user_id,convert(u.user_name using utf8) as username
from revision r
inner join user u
on r.rev_user=u.user_id
group by 2
order by 1 desc
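A variant that also shows each user's latest edit timestamp makes the time window of the spam wave easier to spot; this is just a sketch based on the query above:
select count(*) as revisions,
u.user_id,
convert(u.user_name using utf8) as username,
max(convert(r.rev_timestamp using utf8)) as lastedit
from revision r
inner join user u
on r.rev_user=u.user_id
group by 2,3
order by 1 desc;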
In my case all "good" pages were fortunately created by only one user with user_id=1, so I could assess the damage with:
select count(*) as textcount from text where old_id in (select rev_text_id from revision where not rev_user in (1));
This gave me over half a million hits as a result, which means deletion is better done in a stepwise way:
set autocommit=0;
start transaction;
delete from text where old_id in (select rev_text_id from revision where not rev_user in (1)) limit 2000;
commit;
Please note that the limit of 2000 already led to a runtime of some 2 minutes. So I'd have to run the above SQL statement some 250 times, waiting 2 minutes each time ...
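Instead of re-running the statement by hand, the batching can be wrapped in a stored procedure that loops until nothing is left to delete. This is only a sketch under the same assumption (user_id=1 is the only legitimate user); the procedure name is made up, and it will still run for hours, just unattended:
delimiter //
create procedure delete_spam_text()
begin
  declare affected int default 1;
  set autocommit=0;
  while affected > 0 do
    -- delete the next batch of 2000 text rows belonging to spam revisions
    delete from text
    where old_id in (select rev_text_id from revision where not rev_user in (1))
    limit 2000;
    -- row_count() reflects the delete above; 0 means we are done
    set affected = row_count();
    commit;
  end while;
end //
delimiter ;

call delete_spam_text();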
If you run into delete timing problems you might consider the hints in:
You can check your table status with:
show table status from <wiki-databasename>;
In my case the tables were using InnoDB.
I tried increasing innodb_buffer_pool_size to 128 MByte, but this didn't have a positive effect. The deletion was still slow.
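For reference, you can check the current value from the MySQL prompt. On the MySQL versions typically paired with a 1.23-era wiki the setting is not dynamic, so changing it means putting innodb_buffer_pool_size = 128M into the [mysqld] section of my.cnf and restarting the server:
show variables like 'innodb_buffer_pool_size';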
I'll still try to get this finished and work my way through by deleting the relevant rows in
I also checked the files in /var/lib/mysql/. Since I had innodb_file_per_table switched on, I saw that quite a few tables had grown very large.
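The same picture can be had from information_schema, including the reclaimable space that optimize table would free. The database name below is a placeholder; replace it with your own:
select table_name,
round((data_length + index_length) / 1024 / 1024) as size_mb,
round(data_free / 1024 / 1024) as free_mb
from information_schema.tables
where table_schema = 'wikidb'
order by size_mb desc;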
So I looked into
and started
optimize table text
This took 8 hours to complete.
Fortunately in my case it's not really a production wiki. I just wanted to check the feasibility of the approach and it looks like it depends a lot on the number of rows involved.
The API and maintenance-script based approaches might be much more effective depending on the concrete scenario.
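Coming back to the original question about harming references: before deleting rows from the user table itself, you can count what still points at a given user id. This is a sketch for the pre-actor-migration schema used by 1.23/1.31, with a made-up user id; other tables such as ipblocks, image or filearchive may also matter depending on your wiki:
-- 4711 is a made-up spam user id
set @uid = 4711;
select 'revision' as tbl, count(*) as refs from revision where rev_user = @uid
union all
select 'archive', count(*) from archive where ar_user = @uid
union all
select 'logging', count(*) from logging where log_user = @uid
union all
select 'recentchanges', count(*) from recentchanges where rc_user = @uid
union all
select 'user_groups', count(*) from user_groups where ug_user = @uid
union all
select 'watchlist', count(*) from watchlist where wl_user = @uid;
Only when all counts are zero (or the referencing rows have been cleaned up as well) would I consider deleting the user row itself.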