Block certain-namespace webpages for anonymous (unregistered) users with some information-security method
For each general-audience webpage (i.e. any main-namespace page such as an article page or Category: page), the MediaWiki content management system creates roughly 10, 100, 1,000 or even more auxiliary webpages (link pages, revision pages, revision-diff pages, etc.), and for me that's a serious SEO problem.
MediaWiki doesn't have any fast way, in core or in extensions, to lock all these "peripheral webpages" (for lack of a better term) to registered users only. So naturally any anonymous user, which includes Google's crawler, will crawl them anew each time, and this can easily exhaust the crawl budget allocated for the website.
Blocking these pages with a blunt robots.txt such as the following is nice, but robots.txt blocking is by nature only advisory; directives can go stale; directives won't necessarily affect all search engines; and the following rules aren't readable for users who don't know, or don't know enough, wildcard-pattern syntax.
User-agent: *
Sitemap: https://example.com/sitemap/sitemap.xml
Disallow: /index.php?
Disallow: /index.php/*:
Allow: /index.php/Category:
Allow: /index.php/קטגוריה:
As of the time of publishing this post, MediaWiki has no command to hide everything outside the main namespace from anonymous users (so that those pages would never even be discovered by search engines in the first place). For me that's a serious SEO problem: it lets thousands, if not tens or hundreds of thousands, of likely irrelevant webpages be discovered and, most probably, periodically crawled (whether or not they get indexed), and that just eats whatever crawl budget the site has.
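For context, the closest core mechanism seems to be MediaWiki's coarse read-restriction settings in LocalSettings.php. The setting names below ($wgGroupPermissions, $wgWhitelistRead, $wgWhitelistReadRegexp) are real core configuration variables, but the exact patterns are an untested sketch, not a verified solution for this wiki:

```php
<?php
# Sketch for LocalSettings.php. The whitelist values below are
# illustrative placeholders, not tested configuration.

# Deny page reads to anonymous users...
$wgGroupPermissions['*']['read'] = false;

# ...then whitelist the pages anonymous users (and crawlers) may still see.
$wgWhitelistRead = [ 'Main Page', 'Special:UserLogin' ];

# Core also supports regex whitelisting, which could approximate
# "main namespace and Category: pages only" (pattern is a guess):
$wgWhitelistReadRegexp = [ '/^(?:[^:]+|Category:.+|קטגוריה:.+)$/u' ];
```

The caveat is that this restriction works per page title, not per URL variant, so it may not by itself hide the peripheral action/diff/oldid URLs of a whitelisted title.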
Blocking these webpages at the server level via Apache directives and regex isn't good either, because I do want to serve them, just not to anonymous users (which includes crawlers).
But maybe some Web Application Firewall (WAF) could help.
I host my website on a shared server plan at Namecheap, with cPanel and Apache's ModSecurity WAF (or another WAF).
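In principle, a chained ModSecurity rule could deny the peripheral URLs to clients that present no MediaWiki session cookie. Here is a sketch in ModSecurity v2 rule syntax; the rule id, the query-string pattern, and the cookie fragment "UserID" are all assumptions (MediaWiki's actual cookie name depends on $wgCookiePrefix, and many shared hosts don't allow custom SecRules at all):

```apache
# Sketch of a chained ModSecurity v2 rule (assumptions: custom rules are
# permitted on this hosting plan, and the login cookie name contains
# "UserID", which depends on the wiki's $wgCookiePrefix).

# Match "peripheral" query strings (action=, diff=, oldid=)...
SecRule QUERY_STRING "@rx (?:^|&)(?:action|diff|oldid)=" \
    "id:1000001,phase:1,t:none,deny,status:403,chain"
    # ...and only deny when no MediaWiki session cookie is present.
    SecRule REQUEST_HEADERS:Cookie "!@contains UserID" "t:none"
```

If custom SecRules aren't permitted on the shared plan, the same check might instead be expressed with mod_rewrite in .htaccess (a RewriteCond on %{QUERY_STRING} plus one on %{HTTP_COOKIE}).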
Can this be used to solve my problem, and if so, how?