When Google or another search engine visits your site to read its content and store it in a search index, it first looks for a special file called robots.txt. This file is a set of instructions that tells search engines which parts of the site they may crawl and which they may not. We can use these rules to make sure search engines don't waste time on links that have no valuable content, and that they avoid links that produce faceted content.
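
To get a feel for how a crawler reads these instructions, here is a minimal sketch using Python's standard urllib.robotparser module. The two Disallow rules are taken from the suggested file further down; the helper name is_allowed is made up for this example. Note that this parser only does plain prefix matching, so the wildcard rules (the ones containing *) are left out here.

```python
from urllib.robotparser import RobotFileParser

# Two prefix rules from the suggested file. No blank lines inside the group:
# this parser treats a blank line as the end of a group of rules.
ROBOTS_TXT = """
User-Agent: *
Disallow: /Account/LogIn/
Disallow: /Search/Q/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.strip().splitlines())


def is_allowed(url, user_agent="*"):
    """Return True if the rules above let user_agent crawl url (hypothetical helper)."""
    return parser.can_fetch(user_agent, url)


print(is_allowed("https://domain.tld/Account/LogIn/"))  # False
print(is_allowed("https://domain.tld/index.php"))       # True
```

Real search engines use their own parsers, so treat this only as a quick local check.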

 

Why is this important?

Search engines need to look at and store as many of the pages on the internet as possible. There are an estimated 4.5 billion or more active web pages today. That's a lot of work for Google.

It cannot look at and store every single page, so it has to decide what to keep and how long to spend on your site indexing pages. This allowance is called a crawl budget.

How many pages a day Google will index depends on many factors, including how fresh the site is, how much content you have and how popular your site is. Some websites will have Google index as few as 30 links a day. We want every link to count and not waste Google's time.

 

What does the suggested Robots.txt file do?

The rules optimised for ClicShopping exclude site areas that have no unique content and only redirect to existing topics. Also excluded are areas such as the privacy policy, cookie policy, log in and register pages and so on. Submit buttons and filters are excluded too, to prevent faceted pages. Finally, user profiles are excluded, as these offer little valuable content for Google but contain around 150 redirect links. Given that Google has mere seconds on your site, these links to content that exists elsewhere eat up your crawl budget quickly.
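
To put the numbers from this post side by side, here is a small back-of-the-envelope sketch. It assumes a steady rate of 30 links a day and that all of the roughly 150 redirect links get crawled, which is a simplification; real crawl behaviour varies. The function name days_to_crawl is made up for this example.

```python
import math


def days_to_crawl(url_count, urls_per_day):
    """Days needed to crawl url_count links at a fixed daily rate (simplified model)."""
    if urls_per_day <= 0:
        raise ValueError("urls_per_day must be positive")
    return math.ceil(url_count / urls_per_day)


# 150 redirect links at 30 links a day
print(days_to_crawl(150, 30))  # 5
```

Under those assumptions, five days of crawling would go to links that add nothing new to the index.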

 

What is the suggested Robots.txt file?

Here is the content of the suggested robots.txt file. If your ClicShopping is installed inside a directory, you will need to apply it to the root of your site manually. So, for example, if your site was at /home/site/public_html/myDirectory/, you would need to create this robots.txt file and add it to /home/site/public_html. It's simple: just edit robots.txt and change the information inside it.
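
The reason for the root placement is that crawlers request robots.txt from the top of the host, not from the directory the shop lives in. A minimal sketch of that lookup, using only the Python standard library (the function name robots_txt_url is made up for this example, and domain.tld is the placeholder domain used in this post):

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(site_url):
    """Build the robots.txt URL a crawler would request for site_url."""
    parts = urlsplit(site_url)
    # Keep scheme and host, drop the directory, query and fragment
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://domain.tld/myDirectory/"))
# https://domain.tld/robots.txt
```

A robots.txt left inside myDirectory would therefore never be read.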

 

Example of robots.txt

Note: domain.tld must be replaced with your own domain.
 

# Rules for ClicShopping (https://www.clicshopping.org)
User-Agent: *
# Block pages with no unique content
Disallow: /Account/LogIn/
Disallow: /Account/CreatePro
Disallow: /Account/Create
Disallow: /Account/PasswordForgotten
Disallow: /Search/AdvancedSearch/
Disallow: /Search/Q/
# Block faceted pages and 301 redirect pages
Disallow: /*?page=
Disallow: /*?sort=

# Sitemap URL
Sitemap: https://domain.tld/index.php?Sitemap&GoogleSitemapIndex
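
The two rules with * rely on wildcard matching as Google and other major search engines document it: * stands for any run of characters and a trailing $ anchors the end of the URL. The sketch below is only an approximation of that behaviour for testing your own URLs, not the code any search engine actually runs; the function names are made up for this example.

```python
import re


def rule_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regular expression."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything literally, then let each * match any run of characters
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def rule_matches(pattern, path_and_query):
    """True if the pattern matches from the start of the path plus query string."""
    return rule_to_regex(pattern).match(path_and_query) is not None


print(rule_matches("/*?page=", "/index.php?page=2"))         # True
print(rule_matches("/*?page=", "/index.php?sort=1&page=2"))  # False
```

The second result shows a limit worth knowing: /*?page= only catches URLs where page is the first parameter after the question mark.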

 
