Anatomy Of An Internet Search Engine
The Author Background: Dave Davies is the
CEO of Beanstalk Search Engine Positioning http://www.beanstalk-inc.com.
He has been optimizing and ranking websites for over four
years and has a solid history of success.
Anatomy
Of An Internet Search Engine By Dave Davies
For some unfortunate souls, SEO is simply the learning
of tricks and techniques that, according to their understanding,
should propel their site into the top rankings on the major search
engines. This understanding of the way SEO works can be effective
for a time; however, it contains one basic flaw: the rules change.
Search engines are in a constant state of evolution in order to
keep up with the SEOs, in much the same way that Norton, McAfee,
AVG or any of the other anti-virus software companies are constantly
trying to keep up with the virus writers.
Basing your entire website's future on one simple
set of rules (read: tricks) about how the search engines will rank
your site contains an additional flaw: there are more factors being
considered than any SEO is aware of and can confirm. That's right,
I will freely admit that there are factors at work that I may not
be aware of, and even for those that I am aware of I cannot tell you
with 100% accuracy the exact weight they are given in the overall
algorithm. Even if I could, the algorithm would change a few weeks
later and, what's more (hold on to your hats for this one), there is
more than one search engine.
So if we cannot base our optimization on a set of hard-and-fast
rules, what can we do? The key, my friends, is not to understand the
tricks but rather what they accomplish. Reflecting back on my high
school math teacher, Mr. Barry Nicholl, I recall a silly story that
had a great impact. One weekend he had the entire class watch Dumbo
The Flying Elephant (there was actually going to be a question about
it on our test). Why? The lesson we were to get from it is that
formulas (like tricks) are the feather in the story. They are unnecessary,
and yet we hold on to them in the false belief that it is the feather
that works and not the logic. Indeed, it is not the tricks and
techniques that work but rather the logic they follow, and that is
their shortcoming.
And So What Is Necessary?
To rank a website highly and keep it ranking over time, one must
optimize it with one primary understanding: that a search engine
is a living thing. Obviously this is not to say that search engines
have brains; I will leave those tales to Orson Scott Card and other
science fiction writers. However, their very nature results in a
lifelike being with far more storage capacity.
Consider for a moment how a search engine functions: it goes
out into the world, follows the road signs and paths to get where
it's going, and collects all of the information in its path. From
this point, the information is sent back to a group of servers where
algorithms are applied in order to determine the importance of specific
documents. How are these algorithms generated? They are created
by human beings who have a great deal of experience in understanding
the fundamentals of the Internet and the documents it contains, and
who also have the capacity to learn from their mistakes and update
the algorithms accordingly. Essentially we have an entity that collects
data, stores it, and then sorts through it to determine what's important,
which it's happy to share with others, and what's unimportant, which
it keeps tucked away.
So Let's Break It Down ...
To gain a true understanding of what a search engine is, it's simple
enough to compare it to the human anatomy: though not breathing,
it contains many of the same core functions required for life.
These are:
The Lungs & Other Vital Organs - The lungs
of a search engine, and indeed the vast majority of its vital organs,
are contained within the datacenters in which the engine is housed,
be it in the form of power, Internet connectivity, etc. As with the
human body, we do not generally consider these important in defining
who we are; however, we're certainly grateful to have them and need
them all to function properly.
The Arms & Legs - Think of the links from
the engine itself as the arms and legs. These are the vehicles by
which we get where we need to go and retrieve what needs to be accessed.
While we don't commonly think of these as functions when we're considering
SEO, they are the purpose of the entire thing. Much as the human
body is designed primarily to keep you mobile and able to access
other things, so too is the entire search engine designed primarily
to access the outside world.
The Eyes - The eyes of the search engine are the
spiders (AKA robots or crawlers). These are the 1s and 0s that the
search engines send out over the Internet to retrieve documents.
In the case of all the major search engines, the spiders crawl from
one page to another following the links, as you would look down
various paths along your way. Fortunately for the spiders, they are
traveling mainly over fiber optic connections, and so their ability
to travel at light speed enables them to visit all the paths they
come across, whereas we as mere humans have to be a bit more selective.
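To make the idea of following links from page to page concrete, here is a minimal sketch of a spider in Python. It is only an illustration, not how any real engine works: the "web" is a hypothetical in-memory dictionary (`TOY_WEB`) rather than live pages, and the breadth-first visiting order is an assumption chosen for simplicity.

```python
from collections import deque

# Hypothetical in-memory "web": each page maps to the pages it links to.
TOY_WEB = {
    "home": ["about", "articles"],
    "about": ["home"],
    "articles": ["seo-basics", "home"],
    "seo-basics": [],
    "orphan": ["home"],  # no page links here, so the spider never finds it
}


def crawl(start, web):
    """Return pages in the order a breadth-first spider would visit them."""
    visited = []
    seen = {start}          # pages already queued, so none is fetched twice
    queue = deque([start])
    while queue:
        page = queue.popleft()
        visited.append(page)
        for link in web.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


if __name__ == "__main__":
    print(crawl("home", TOY_WEB))
```

Starting from "home", the sketch reaches every page that some visited page links to. The "orphan" page is never visited, because no path of links leads to it.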
The Brain - The brain of a search engine, like
the human brain, is the most complex of its functions and components.
The brain must have instinct, must know, and must learn in order
to function properly. A search engine (and by search engine we mean
the natural listings of the major engines) must also include these
critical three components in order to survive.
The Instinct - The instinct of a search engine
is defined in its core functions, that is, the crawling of sites
and either the inability to read specific types of data or the
programmed response to ignore files meeting specific criteria.
Even the programmed responses become automated by the engines and
thus fall under the category of instinct, much as the westernized
human instinct to jump away from a large spider is learned. An infant
would probably watch the spider or even eat it, meaning this is not
an automatic human reaction.
The instinct of a search engine is important to understand; however,
once one understands what can and cannot be read and how the spiders
will crawl a site, this will become instinct for you too and can
then safely be stored in the "autopilot" part of your
brain.
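A "programmed response to ignore files meeting specific criteria" can be pictured as a simple check made before a spider fetches a URL. The Python sketch below is an illustration under stated assumptions: the list of extensions (`UNREADABLE_EXTENSIONS`) and the function name `should_crawl` are hypothetical, and real engines apply their own, unpublished criteria.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical list of file types this toy spider cannot read.
UNREADABLE_EXTENSIONS = {".exe", ".zip", ".swf"}


def should_crawl(url):
    """Return False for URLs whose file extension is on the ignore list."""
    path = urlparse(url).path                      # drop query string etc.
    extension = posixpath.splitext(path)[1].lower()
    return extension not in UNREADABLE_EXTENSIONS


if __name__ == "__main__":
    for url in ("http://example.com/page.html", "http://example.com/intro.SWF"):
        print(url, should_crawl(url))
```

The check runs the same way every time with no judgment involved, which is what makes it "instinct" in the sense used above.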
The Knowing - Search engines know by crawling.
What they know goes far beyond what is commonly perceived by most
users, webmasters and SEOs. While the vast storehouse we call the
Internet provides billions upon billions of pages of data for the
search engines to know, they also pick up more than that. Search
engines know a number of different methods for storing data, presenting
data, prioritizing data and, of course, ways of tricking the engines
themselves.
While the search engine spiders are crawling the web, they are grabbing
the stores of data that exist and sending them back to the datacenters,
where that information is processed through existing algorithms
and spam filters and attains a ranking based on the engine's
current understanding of the way the Internet and the documents
contained within it work.
Similar to the way we process an article from a newspaper based
on our current understanding of the world, the search engines process
and rank documents based on what they understand to be true in the
way documents are organized on the Internet.
The Learning - Once it is understood that search
engines rank documents based on a specific understanding of the
way the Internet functions, it follows that a search engine must have
the ability to "learn". Only then can new document types and
technologies be read, and the algorithm be changed as new
understandings of the functionality of the Internet are uncovered.
Aside from needing the ability to properly spider
documents stored in newer technologies, search engines must also
have the ability to detect and accurately penalize spam, as well
as to accurately rank websites based on new understandings of the way
documents are organized and links arranged. Examples of areas where
search engines must learn on an ongoing basis include, but are most
certainly not limited to:
- Understanding the relevancy of the content between sites where
a link is found
- Attaining the ability to view the content on documents contained
within new technologies such as database types, Flash, etc.
- Understanding the various methods used to hide text, links, etc.
in order to penalize sites engaging in these tactics
- Learning, from current results and any shortcomings in them, what
tweaks to current algorithms or what additional considerations must
be taken into account to improve the relevancy of the results in
the future.
The learning of a search engine generally comes from the uber-geeks
hired by the search engines and from their users. Once a factor is taken
into account and programmed into the algorithm, it then moves into
the "knowing" category until the next round of updates.
How This Helps in SEO
This is the point at which you may be asking yourself, "This
is all well and good, but exactly how does this help ME?" An
understanding of how search engines function, how they learn, and
how they live is one of the most important understandings you can
have in optimizing a website. This understanding will ensure that
you don't simply apply random tricks in hopes that you've listened
to the right person in the forums that day, but rather that you consider
what the search engine is trying to do and whether a tactic fits
with the long-term goals of the engine.
For a while keyword density spamming was all the rage among the
less ethical SEOs, as was building networks of websites to link together
in order to boost link popularity. Neither of these tactics works
today, and why? They do not fit with the long-term goals of the search
engine. Search engines, like humans, want to survive. If the results
they provide are poor then the engine will die a slow but steady
death, and so they evolve.
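For readers unfamiliar with the term, keyword density is simply the share of a page's words taken up by one keyword. The Python sketch below shows one way to compute it. The tokenizing rule and the `threshold` default in `looks_stuffed` are assumptions made for illustration only; no engine publishes the figure it uses, if it uses one at all.

```python
import re


def keyword_density(text, keyword):
    """Fraction of the words in `text` that are exactly `keyword`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)


def looks_stuffed(text, keyword, threshold=0.2):
    """Flag text whose density exceeds a hypothetical threshold."""
    return keyword_density(text, keyword) > threshold


if __name__ == "__main__":
    sample = "cheap widgets cheap widgets buy cheap widgets"
    print(keyword_density(sample, "cheap"), looks_stuffed(sample, "cheap"))
```

A measure this simple is trivial for a webmaster to inflate and just as trivial for an engine to detect, which is one way to see why the tactic stopped working.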
When considering any tactic you must ask: does this fit with
the long-term goals of the engine? Does this tactic in general serve
to provide better results for the largest number of searches? If
the answer is yes then the tactic is sound.
For example, the overall relevancy of your website (i.e. does the
majority of your content focus on a single subject) has become more
important over the past year or so. Does this help the searcher?
The searcher will find more content on the subject they have searched
for on larger sites with larger amounts of related content, and thus
this shift does help the searcher overall. A tactic that includes
the addition of more content to your site is thus a solid one, as
it helps build the overall relevancy of your website and gives the
visitor more and updated information at their disposal once they
get there.
Another example would be in link building. Reciprocal links are
becoming less relevant and reciprocal links between unrelated sites
are virtually irrelevant. If you are engaging in reciprocal link
building, ensure that the sites you link to are related to your site's
content. As a search engine I would want to know that a site in
my results also provided links to other related sites, thus increasing
the chance that the searcher was going to find the information that
they are looking for one way or another without having to switch
to a different search engine.
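Reciprocal links are easy to spot once the links between sites are known, which hints at why an engine can discount them so readily. The Python sketch below finds every pair of sites that link to each other. The site names and the `reciprocal_pairs` function are hypothetical, and judging whether two sites are related is a separate problem that this sketch does not attempt.

```python
def reciprocal_pairs(links):
    """Return sorted pairs of sites that link to each other.

    `links` maps each site to the set of sites it links to.
    """
    pairs = set()
    for site, targets in links.items():
        for target in targets:
            # A pair is reciprocal when the target links back to the site.
            if site != target and site in links.get(target, set()):
                pairs.add(tuple(sorted((site, target))))
    return sorted(pairs)


if __name__ == "__main__":
    toy_links = {
        "seo-blog": {"hosting-co", "pet-store"},
        "hosting-co": {"seo-blog"},
        "pet-store": set(),
    }
    print(reciprocal_pairs(toy_links))
```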
In Short
In short, think ahead. Understand that search engines are organic
beings that will continue to evolve. Help feed them when they visit
your site and they will return often and reward your efforts. Use
unethical tactics and you may hold a good position for a while but
in the end, if you do not use tactics that provide for good overall
results, you will not hold your position for long. They will learn.