5. Use external hosting or a CDN to speed up spider crawling, lighten server response load, and cut wasted bandwidth. Most sites today lean heavily on images, video, and other media, and these assets consume the most download bandwidth. If images are served from an external host or CDN, a large share of spider crawl traffic is offloaded from the main server.
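A minimal markup sketch of the idea: heavy media is referenced from a separate CDN hostname so its downloads never hit the main server. The domain `cdn.example.com` and the file paths are placeholders, not real URLs.

```
<!-- Media served from a CDN domain: spider image/video downloads
     no longer count against the main server's bandwidth quota -->
<img src="https://cdn.example.com/images/product-01.jpg" alt="Product photo">
<video src="https://cdn.example.com/video/demo.mp4" controls></video>
```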
1. Identify and block fake spider IPs. Web log analysis shows that many visits claiming to be the Baidu spider or the Google spider are in fact fake. By resolving these IPs and blocking the fakes, we not only save bandwidth but also reduce the risk of the site being scraped. To check whether an IP really belongs to a search engine spider, on Windows open Start, run cmd, and execute `nslookup <IP>`. A genuine search spider's IP resolves to a hostname carrying the spider's domain, while a fake spider's does not.
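The same check can be scripted. The sketch below assumes the verification method the major engines publish: reverse-resolve the IP, check the hostname suffix, then forward-resolve the hostname and confirm it maps back to the same IP. The domain list is an assumption for illustration and should be checked against each engine's own documentation.

```python
import socket

# Hostname suffixes used by genuine crawlers (assumed list for
# illustration -- update it from each search engine's documentation).
SPIDER_DOMAINS = ("googlebot.com", "google.com", "baidu.com", "baidu.jp")

def hostname_is_spider(hostname: str) -> bool:
    """True if a reverse-DNS hostname ends with a known spider domain."""
    return hostname.rstrip(".").endswith(
        tuple("." + d for d in SPIDER_DOMAINS)
    )

def verify_spider(ip: str) -> bool:
    """Reverse-resolve the IP (like `nslookup <IP>`), then
    forward-confirm that the hostname maps back to the same IP."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)
    except socket.herror:
        return False  # no PTR record at all: treat as fake
    if not hostname_is_spider(hostname):
        return False  # resolves, but not to a spider domain: fake
    try:
        return ip in socket.gethostbyname_ex(hostname)[2]
    except socket.gaierror:
        return False
```

A hostname like `baiduspider-123-125-71-116.crawl.baidu.com` passes the suffix check, while an arbitrary hostname spoofing the Baiduspider user agent does not; the forward confirmation step defeats attackers who control their own reverse DNS.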
2. Block spiders that are invalid or contribute little to SEO. For example, Googlebot crawls very heavily, yet in many industries Google delivers very little traffic and its SEO value is poor, so such sites can block Googlebot and save a great deal of bandwidth, as some well-known sites have done. Beyond Google, spiders such as Pangu Search and Bing bring very little traffic, or almost none, and can likewise be blocked.
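Blocking a whole spider is usually done in robots.txt. The sketch below uses the user-agent tokens Google and Bing publish for their crawlers; tokens for other spiders should be confirmed against each engine's documentation before relying on them.

```
# robots.txt -- turn away low-value spiders site-wide
User-agent: Googlebot
Disallow: /

User-agent: bingbot
Disallow: /

# All other spiders may crawl everything
User-agent: *
Disallow:
```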
Sites on virtual (shared) hosting usually face traffic quotas, and if most of that quota is consumed by spiders, extra money has to be spent on additional bandwidth. So when a large share of a site's traffic is being wasted on spider crawls, what techniques can limit the waste without hurting SEO? We can use the following methods:
4. Limit crawling of low-value page areas to improve crawl efficiency and reduce crawl bandwidth. Every page contains noise areas that are useless to spiders, such as login and registration sections, copyright notices, helper navigation links, and template modules that spiders cannot render anyway. By applying nofollow tags, or loading these areas via Ajax or JavaScript, we can limit or block crawling of them and shrink the volume crawled.
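Both techniques in markup form, as a sketch: `rel="nofollow"` on links spiders need not follow, and a module injected by JavaScript so that most spiders skip its content. The element id and the fragment URL are illustrative placeholders.

```
<!-- Links of no SEO value: nofollow tells spiders not to follow them -->
<a href="/login" rel="nofollow">Login</a>
<a href="/register" rel="nofollow">Register</a>

<!-- Template module loaded client-side, so crawlers that do not
     execute JavaScript never fetch or index its content -->
<div id="promo"></div>
<script>
  fetch("/fragments/promo.html")
    .then(function (r) { return r.text(); })
    .then(function (html) {
      document.getElementById("promo").innerHTML = html;
    });
</script>
```

Note that major spiders increasingly execute JavaScript, so nofollow and robots.txt rules are the more dependable of these options.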
3. Use robots.txt to keep spiders off invalid or duplicate pages. Some pages that once existed no longer do, and some pages have both a dynamic and a static URL; because backlinks or database entries still point at these URLs, spiders crawl them again and again. We can dig the URLs that return 404 out of the logs and block them in robots.txt, which improves crawl quality while cutting wasted bandwidth.
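A robots.txt sketch of this rule. The paths are illustrative placeholders to be replaced with URLs found in your own logs, and the `*` wildcard in `Disallow` is an extension honored by the major engines (Google, Baidu, Bing) rather than part of the original robots.txt standard.

```
User-agent: *
# Dynamic duplicates of pages that also have static URLs
Disallow: /*?id=
# A removed section that still gets crawled and returns 404
Disallow: /old-promo/
```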