Cloaking Warning
Cloaking is an unethical method (spam) of improving a website's ranking in the
search engines. Cloaking is discussed here for educational purposes only. You are
advised NOT to implement cloaking as an SEO method on any of your websites. If you do,
it will in fact hurt your rankings in the search engines, and your site may even
get banned. We do not teach, implement or endorse
any unethical search engine optimization methods.
What is cloaking?
Cloaking is a technique that serves visitors a different page from the one listed
in the search engine or directory. It is primarily used to show an optimized page
to the search engines and a different page to humans at the same URL.
In other words, browsers such as Netscape and MSIE are served one page,
and spiders visiting the same address are served a different page. Most search
engines will penalize a site if they discover that it is using cloaking. Some go
even further and delete the site from their index.
How is cloaking implemented?
There are two main methods of delivering cloaked pages: looking at the
User-Agent HTTP header of the request, or looking at the IP address of whoever
is requesting the page. The two methods are named "Agent Name Delivery" and
"IP Address Delivery," respectively. To cloak a web page effectively, the web
server must be able to determine whether the visitor is a human or a
search engine spider.
Let's look at Agent Name Delivery first.
Agent Name Delivery
Your first step is to read your web server's log files and analyse the traffic
to your site. It's a skill you are going to need if you want to cloak.
The first clue in spotting spiders is to look in your log files for requests
for your robots.txt file. The robots.txt file is usually the
first thing a spider looks for when it visits your site. Because humans rarely
look at your robots.txt file, you can generally assume that anything
requesting it is a spider. You can then identify which spider it is
by looking at the agent name in the request for your
robots.txt file. A few common agent names are listed below.
AltaVista = Scooter
Excite = Architext
Google = Googlebot
Inktomi = Slurp
Lycos = T-Rex
NorthernLight = Gulliver
If you do not have a robots.txt file, look in your log files for the errors
(404 Not Found) logged when agents requested the file and it did not exist.
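The log check described above can be sketched in Python. This is only a minimal sketch: it assumes the server writes the Apache "combined" log format, and the function name `robots_txt_agents` and the two sample lines (which use reserved documentation IP addresses) are made up for illustration. It counts which agent names asked for robots.txt, including requests that ended in a 404.

```python
import re
from collections import Counter

# Assumption: the server writes the Apache/NCSA "combined" log format.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def robots_txt_agents(lines):
    """Count (agent name, status code) pairs for requests to /robots.txt."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match and match.group("path") == "/robots.txt":
            counts[(match.group("agent"), match.group("status"))] += 1
    return counts

# Invented sample lines for illustration; the IP addresses are documentation ranges.
SAMPLE_LOG = [
    '192.0.2.10 - - [10/Oct/2003:13:55:36 -0700] "GET /robots.txt HTTP/1.0" 404 208 '
    '"-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
    '198.51.100.5 - - [10/Oct/2003:13:56:01 -0700] "GET /index.html HTTP/1.1" 200 5120 '
    '"-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
]

if __name__ == "__main__":
    for (agent, status), hits in robots_txt_agents(SAMPLE_LOG).items():
        print(f"{hits:4d}  {status}  {agent}")
```

To run it against a real log, pass an open file object instead of `SAMPLE_LOG`; lines that do not match the assumed format are simply skipped.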
All browsers and search engine spiders have a name. The user agent field for a
human visitor usually lists what web browser software is being used, such as:
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
The user agent field for a search engine usually identifies the search engine
robot, such as this user agent field for Yahoo Slurp:
"Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
With this knowledge, delivering a specific page based on agent name is a rather
simple task. You simply utilize a web script that says something to this effect:
if Agent Name equals A or B or C, serve Page-1 (the spider page),
else serve Page-2, where Page-2 would be the viewer's page.
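The test at the heart of that rule (does the agent name match a known spider?) might look like the Python sketch below. The function name `is_spider` and the token list are assumptions for illustration; the tokens are taken from the agent names listed earlier and are far from complete. Note that the User-Agent header is supplied by the client, so this check only classifies what a visitor claims to be.

```python
# Substrings taken from the agent names listed earlier; real spiders use many more.
SPIDER_TOKENS = ("scooter", "architext", "googlebot", "slurp", "t-rex", "gulliver")

def is_spider(user_agent: str) -> bool:
    """Return True if the User-Agent header contains a known spider name."""
    agent = user_agent.lower()  # agent names vary in capitalisation
    return any(token in agent for token in SPIDER_TOKENS)

browser = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
spider = "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"

print(is_spider(browser))  # False
print(is_spider(spider))   # True
```

Because any browser or script can send a spider's agent name, a human can pass this check just by changing one header, which is the main weakness of Agent Name Delivery.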
Next, we look at IP Address Delivery.
IP Address Delivery
An IP address (Internet Protocol Address) is a numeric address which identifies
your connection to the Internet. Search engine spiders not only have a name
to identify themselves but also have an IP address. For example,
web site traffic from 64.62.82.x is most likely a visit from Googlebot,
the famous Google search engine spider. It is the address range of just one of
Google's many spiders.
Since you can 'sniff' for the IP address when someone visits your site, you can
use this information to serve specific pages to the spiders. This method is
more complicated than Agent Name Delivery because it requires you to maintain
an exhaustive list of IP addresses. Also, IP addresses can change and new
ones are always being added.
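The address lookup described above can be sketched with Python's standard `ipaddress` module. This is an illustrative sketch only: the function name `is_spider_ip` is hypothetical, and the single network in the list is simply the 64.62.82.x example from the text written as a /24, not a verified or current list of spider addresses.

```python
import ipaddress

# Illustrative only: this /24 is the 64.62.82.x example from the text,
# not a published or up-to-date list of spider addresses.
SPIDER_NETWORKS = [ipaddress.ip_network("64.62.82.0/24")]

def is_spider_ip(remote_addr: str) -> bool:
    """Return True if the address falls inside any listed network."""
    try:
        address = ipaddress.ip_address(remote_addr)
    except ValueError:
        return False  # malformed address: treat as not a spider
    return any(address in network for network in SPIDER_NETWORKS)

print(is_spider_ip("64.62.82.17"))  # True
print(is_spider_ip("203.0.113.9"))  # False
```

The sketch also shows why the method is hard to maintain: every new or changed spider address means another entry in `SPIDER_NETWORKS`, and any address missing from the list is treated as a human visitor.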
The advantage of IP Address Delivery is that a visitor cannot easily 'fake' a
spider's IP address the way an agent name can be faked. Consequently, it is much
harder for anyone else to see the code that is presented to the spider.
But there are thousands of spiders crawling the web, and if you are cloaking
using IP Address Delivery, knowing who these spiders are is going to be very
important to you. If your site is cloaked and an unrecognized spider visits, it's
too late to worry about whether your cloaking script served up the right page.
Penalty for Cloaking
Cloaking is in violation of most search engine policies and is very likely to
get your site banned. This is what Google says on its Information for Webmasters
page:
"To preserve the accuracy and quality of our search results, Google may
permanently ban from our index any sites or site authors that engage in
cloaking to distort their search rankings."