

Block certain namespaces' webpages from anonymous (non-registered) users with some information-security method


For each general-audience webpage (i.e. any main-namespace page, such as an article page or a Category: page), the MediaWiki content management system generates tens, hundreds, or even thousands of additional webpages (link pages, revision pages, revision-diff pages, etc.), and for me that's a serious SEO problem.
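
To give a concrete idea (the page title and revision IDs here are made-up examples), a single article typically drags along peripheral URLs such as:

/index.php?title=Some_Article&action=history
/index.php?title=Some_Article&action=edit
/index.php?title=Some_Article&oldid=12345
/index.php?title=Some_Article&diff=12350&oldid=12345
/index.php?title=Special:WhatLinksHere/Some_Article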

MediaWiki doesn't offer any quick way, in core or outside it, to restrict all these "peripheral webpages" (for lack of a better term) to registered users, so any "anonymous user" that is actually a Google crawler will crawl them anew each time, and this can easily exhaust the crawl budget allocated to the website.

Blocking these pages with a blunt robots.txt such as the following is nice, but robots.txt blocking is by nature only "advisory": directives can become outdated, they won't necessarily affect every search engine, and patterns like the ones below aren't approachable for users who don't know, or don't know well enough, this kind of wildcard syntax.

User-agent: *
Sitemap: https://example.com/sitemap/sitemap.xml
Disallow: /index.php?
Disallow: /index.php/*:
Allow: /index.php/Category:
Allow: /index.php/קטגוריה:

As of the time of publishing this post, MediaWiki has no configuration setting to block everything outside the main namespace from anonymous users (so that it would never even be discovered by search engines in the first place). For me that's a serious SEO problem, because it lets thousands, if not tens or hundreds of thousands, of possibly irrelevant webpages be discovered and most likely also periodically crawled (whether or not they get indexed), and that simply "eats" any plausible crawl budget.


Blocking these webpages at the server level via Apache directives and regex isn't good either, because I do want to serve them, just not to anonymous users (a group that includes crawlers).
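
For illustration only (a rough .htaccess sketch; the pattern mirrors the robots.txt above and would need adjusting), the sort of blanket server-level rule I mean is:

# Forbid every /index.php request that carries a query string, for all visitors.
# This keeps crawlers out, but it also locks out logged-in users.
RewriteEngine On
RewriteCond %{QUERY_STRING} .
RewriteRule ^index\.php$ - [F]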

But maybe some Web Application Firewall (WAF) could help.
I host my website on a shared-server plan at Namecheap, with cPanel and the Apache ModSecurity WAF (or another WAF).
Could this be used to solve my problem, and if so, how?
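
For instance, I imagine something along the lines of the following ModSecurity sketch, though I'm not at all sure of the syntax or the logic; the rule IDs, the marker name and the "_session" cookie-name fragment are only placeholders, and the /index.php/*: namespace paths from the robots.txt above would need their own rule:

# If the request carries a MediaWiki session cookie, skip the block below.
SecRule REQUEST_HEADERS:Cookie "@rx _session=" "id:1000101,phase:1,pass,nolog,skipAfter:END_ANON_NS_BLOCK"
# Otherwise deny /index.php URLs that carry a query string (the peripheral pages).
SecRule REQUEST_URI "@rx ^/index\.php\?" "id:1000102,phase:1,deny,status:403,msg:'MediaWiki peripheral page requested anonymously'"
SecMarker END_ANON_NS_BLOCK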
