How Search Engines Work

Hello everybody! As promised in my previous post, I’m back with another topic: how search engines work. Almost all of us know the term “search engine”, and very few would be unfamiliar with it. With the help of search engines (like Google, Yahoo, AltaVista, etc.), we can search the Internet for any topic we want. Ever wondered how these search engines work?

Search engines use automated software programs known as spiders or bots to survey the Web and build their databases. These programs retrieve web documents and analyze them. Data collected from each web page is then added to the search engine's index. When you enter a query at a search engine site, your input is checked against the search engine’s index of all the web pages it has analyzed. The best-matching URLs are then returned to you as hits, ranked with the most relevant results at the top. The most common form of searching on the Web is text search, and most search engines do their text query and retrieval using keywords.
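To make the index-then-query idea concrete, here is a minimal sketch of an inverted index, which is one common way such an index can be organised. The page URLs, the sample text, and the function names (`tokenize`, `build_index`, `search`) are made up for illustration, and it does no ranking at all; real engines are far more elaborate.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query (unranked)."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical pages standing in for documents a spider has retrieved.
pages = {
    "http://example.com/a": "Search engines crawl the web",
    "http://example.com/b": "Spiders build the search index",
}
index = build_index(pages)
print(sorted(search(index, "search index")))
```

The query is answered from the index alone; the pages themselves are not re-read at search time.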

Unless the author of the Web document specifies the keywords for her document (this is possible by using meta tags), it’s up to the search engine to determine them. Essentially, this means that search engines pull out and index words that appear to be significant. Since search engines are software programs, not rational human beings, they work according to rules established by their creators for what words are usually important in a broad range of documents. The title of a page, for example, usually gives useful information about the subject of the page (if it doesn’t, it should!). Words that are mentioned towards the beginning of a document (think of the “topic sentence” in a high school essay, where you lay out the subject you intend to discuss) are given more weight by most search engines. The same goes for words that are repeated several times throughout the document.
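The three rules above (title words, early words, repeated words) can be sketched as a toy weighting function. The bonus values and the size of the "early" window below are arbitrary assumptions chosen for illustration, not figures used by any real search engine, and the sketch does not filter out common words such as "and".

```python
import re
from collections import Counter

# Assumed, illustrative values; real engines do not publish theirs.
TITLE_BONUS = 3.0
EARLY_BONUS = 2.0
EARLY_WINDOW = 5  # how many leading body words count as "early"


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_weights(title, body):
    """Score each word by repetition, early position, and title use."""
    body_words = tokenize(body)
    weights = Counter()
    for word in body_words:                      # one point per occurrence
        weights[word] += 1.0
    for word in set(body_words[:EARLY_WINDOW]):  # bonus for appearing early
        weights[word] += EARLY_BONUS
    for word in set(tokenize(title)):            # bonus for appearing in the title
        weights[word] += TITLE_BONUS
    return weights


weights = keyword_weights(
    "Growing tomatoes",
    "Tomatoes need sun. Water tomatoes weekly and feed them monthly.",
)
print(weights.most_common(1))
```

Here "tomatoes" comes out on top because it is repeated, appears early, and is in the title.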

Search engines can be broadly classified into two main categories: crawler-based search engines and human-powered directories. Let us now briefly see what each of these means.

Crawler-Based Search Engines

Crawler-based search engines, such as Google, create their listings automatically. They “crawl” or “spider” the web, then people search through what they have found.

If you change your web pages, crawler-based search engines eventually find these changes, and that can affect how you are listed. Page titles, body copy and other elements all play a role.

Human-Powered Directories

A human-powered directory, such as the Open Directory, depends on humans for its listings. You submit a short description to the directory for your entire site, or editors write one for sites they review. A search looks for matches only in the descriptions submitted.

Changing your web pages has no effect on your listing. Things that are useful for improving a listing with a search engine have nothing to do with improving a listing in a directory. The only exception is that a good site, with good content, might be more likely to get reviewed for free than a poor site.
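As a rough illustration of why page changes do not affect a directory listing, the sketch below searches only the submitted descriptions and never looks at the page text. The directory entry, the page text, and the function name `directory_search` are invented for the example.

```python
# Hypothetical directory: one human-written description per site.
directory = {
    "http://example.com/": "Recipes and tips for home bakers",
}

# The actual page content, which a directory search never consults.
page_text = {
    "http://example.com/": "Sourdough starter guide for beginners",
}


def directory_search(directory, query):
    """Return sites whose description contains every query word."""
    words = query.lower().split()
    return [
        url
        for url, description in directory.items()
        if all(word in description.lower().split() for word in words)
    ]


print(directory_search(directory, "recipes"))    # matches the description
print(directory_search(directory, "sourdough"))  # only in the page text
```

The second search finds nothing, even though the word is on the page itself.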

Parts of a Crawler-Based Search Engine

Crawler-based search engines have three major elements. First is the spider, also called the crawler. The spider visits a web page, reads it, and then follows links to other pages within the site. This is what it means when someone refers to a site being “spidered” or “crawled.” The spider returns to the site on a regular basis, such as every month or two, to look for changes.
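Here is a minimal sketch of that visit, read, and follow-links loop, using only Python's standard library. To keep it runnable without a network connection, the "web" is a small in-memory dictionary (`FAKE_SITE`) and the `fetch` argument is a stand-in for a real HTTP download; both are assumptions made for the example. A real spider would also respect robots.txt, rate limits, and revisit schedules.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch):
    """Visit pages breadth-first, following links within the same site."""
    site = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue:
        url = queue.popleft()
        html = fetch(url)          # hypothetical fetch: returns HTML or None
        if html is None:
            continue               # page could not be retrieved
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if urlparse(link).netloc == site and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


# Made-up site used in place of real HTTP requests.
FAKE_SITE = {
    "http://example.com/": '<a href="/about">About</a> <a href="http://other.example/">Other</a>',
    "http://example.com/about": '<a href="/">Home</a> <a href="/missing">Gone</a>',
}

pages = crawl("http://example.com/", FAKE_SITE.get)
print(sorted(pages))
```

The `seen` set is what stops the spider from looping forever between pages that link to each other.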

Everything the spider finds goes into the second part of the search engine, the index. The index, sometimes called the catalog, is like a giant book containing a copy of every web page that the spider finds. If a web page changes, then this book is updated with new information.

Sometimes it can take a while for new pages or changes that the spider finds to be added to the index. Thus, a web page may have been “spidered” but not yet “indexed.” Until it is indexed — added to the index — it is not available to those searching with the search engine.
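The gap between “spidered” and “indexed” can be pictured as a waiting area between the two parts. In this sketch the class `TinyEngine` and its method names are hypothetical; it only shows that a page the spider has fetched stays invisible to searches until an indexing pass has processed it. It does not handle removing stale words when a page changes.

```python
class TinyEngine:
    """Toy engine that separates spidering from indexing."""

    def __init__(self):
        self.pending = {}  # spidered pages waiting to be indexed
        self.index = {}    # word -> set of URLs, used by searches

    def spider(self, url, text):
        """Record a fetched page; it is not searchable yet."""
        self.pending[url] = text

    def run_indexer(self):
        """Move every pending page into the searchable index."""
        for url, text in self.pending.items():
            for word in text.lower().split():
                self.index.setdefault(word, set()).add(url)
        self.pending.clear()

    def search(self, word):
        """Look up a single word in the index only."""
        return self.index.get(word.lower(), set())


engine = TinyEngine()
engine.spider("http://example.com/new", "fresh page about spiders")
before = engine.search("spiders")  # spidered, but not yet indexed
engine.run_indexer()
after = engine.search("spiders")   # now indexed and searchable
print(before, after)
```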

Search engine software is the third part of a search engine. This is the program that sifts through the millions of pages recorded in the index to find matches to a search and rank them in order of what it believes is most relevant.
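As a very rough sketch of this matching and ranking step, the function below scores each page by how many times the query words occur in it and sorts the matches by that score. Counting occurrences is just one simple, assumed notion of relevance picked for illustration; the sample pages and the name `rank` are made up, and real ranking software weighs many more signals.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(pages, query):
    """Return matching URLs, highest occurrence count first."""
    query_words = tokenize(query)
    scored = []
    for url, text in pages.items():
        words = tokenize(text)
        score = sum(words.count(q) for q in query_words)
        if score > 0:  # keep only pages that match at least one word
            scored.append((score, url))
    # Sort by descending score, then by URL so ties are stable.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [url for score, url in scored]


# Hypothetical indexed pages.
pages = {
    "http://example.com/a": "python python tutorial",
    "http://example.com/b": "python news",
    "http://example.com/c": "cooking",
}
print(rank(pages, "python tutorial"))
```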

I have tried to give a simple, basic idea of how search engines work. I hope it has been useful to you all. Any suggestions or feedback are warmly welcome.

Enjoy!


6 Responses to “How Search Engines Work ???”

  1. w0lf says:

    Nicely explained. I would be expecting a detailed post on Google search bots and how dorks works sometimes from you :)

  2. Ne0 says:

    Thanks w0lf. I’ll certainly try to deliver it. I’m presently searching on another topic though… Cryptographic Files Systems…. I’ll be writing on it by today… Thanks again !!

  3. w0lf says:

    That seems to be an interesting topic. Awaiting to read the post :)

  4. Ne0 says:

    Thanks w0lf. Yes I’ll upload it by today… I’m just gathering stuff required…..

  5. Robin says:

    Great explanation. Easy to understand by basic internet user even.

  6. Ne0 says:

    Thanks Robin ! Your other suggestions / comments are also welcomed ….
