Allowing Facebook / facebookexternalhit in the Robots.txt File

Link: https://support.brilliantdirectories.com/support/solutions/articles/12000100532-allowing-facebook-facebookexternalhit-in-the-robots-txt-file

To share content from a website on Facebook, the "facebookexternalhit" bot must be allowed in the website's robots.txt file. This can be done by adding the following lines to the robots.txt file through the site's admin area:


User-agent: facebookexternalhit
Disallow: /api/
Allow: /


Add this block to the robots.txt file anywhere above the final lines at the end of the file.
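
Here is a minimal sketch of how the finished file might be arranged; the catch-all "User-agent: *" group and the Sitemap line are placeholders, so the final lines of an actual file may differ. Because crawlers apply the most specific matching "User-agent" group, the dedicated facebookexternalhit block takes precedence over the catch-all group for Facebook's crawler:


User-agent: facebookexternalhit
Disallow: /api/
Allow: /

User-agent: *
Disallow: /api/

Sitemap: https://www.example.com/sitemap.xml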



Bandwidth


IMPORTANT: Allowing this bot can dramatically increase a website's bandwidth usage


We have observed that Facebook's "facebookexternalhit" bot consumes an extremely large amount of bandwidth on websites that allow this bot in their robots.txt file.


While allowing this bot is required for sharing content on Facebook, Facebook also appears to abuse this permission by crawling websites aggressively, possibly to extract additional data or for some other undesirable purpose.


The resulting bandwidth usage far exceeds what would be required for sharing content on Facebook and has led to performance issues for some websites.


We have seen the Facebook bot consume 10x to 100x the monthly bandwidth of desirable bots, such as those used by Google and Bing. This can force a site to purchase additional bandwidth it would not otherwise need, and in extreme cases can cause performance issues.
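
To gauge this on a specific site, one approach is to total the response bytes served to each crawler from the web server access log. Below is a minimal sketch in Python, assuming an Nginx/Apache combined-format log; the log path is hypothetical and should be adjusted for your server:

import re
from collections import defaultdict

# Combined log format: IP - user [time] "request" status bytes "referer" "agent".
# The path below is an assumption; point it at your server's access log.
LOG_PATH = "/var/log/nginx/access.log"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} (\d+|-) "[^"]*" "([^"]*)"'
)

# User-agent substrings that identify each crawler.
BOTS = {"facebookexternalhit": "Facebook", "Googlebot": "Google", "bingbot": "Bing"}

totals = defaultdict(int)
with open(LOG_PATH) as log:
    for line in log:
        match = LOG_LINE.match(line)
        if not match:
            continue
        size, agent = match.groups()
        if size == "-":
            continue  # no response body logged for this request
        for token, name in BOTS.items():
            if token in agent:
                totals[name] += int(size)
                break

for name, total in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {total / 1_048_576:.1f} MB")

Running this over a month of logs gives a rough per-bot total that can be compared directly.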

Here is an example of the Facebook bot consuming 10x the bandwidth of Bing and 100x the bandwidth of Google on the same site over the same time period:

[Image: monthly bandwidth usage by bot, with facebookexternalhit far exceeding Bingbot and Googlebot]

For these reasons, we do not allow this bot by default and recommend allowing it only when absolutely necessary.
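
If the bandwidth cost ever outweighs the sharing benefit for a particular site, one way to stop the crawling (a sketch, not necessarily the platform's default configuration) is to replace the block above with an explicit disallow for this bot:


User-agent: facebookexternalhit
Disallow: /


Per the requirement described above, content shared from the site will then no longer be fetched properly by Facebook.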