The Dark Internet

Hey there!

I’m back with a new topic. Today I’ll be discussing what is known as the “Dark Internet” or the “Invisible Web”. Many of you might wonder what this is all about. Well then, folks, read on...

Almost all of us are familiar with the term WWW, or the World Wide Web. There are millions of websites on the Internet that may match your search criteria. For example, if I want to see how many sites are related to information technology, I simply open any of the popular web search engines (Google, Yahoo, AltaVista, and many more), enter my search string, and the search engine gives me a long list of websites related to information technology.

But have you ever wondered whether the results you get are just the tip of the iceberg? In reality, there may be much more available on the Internet that either could not be found by the search engine or is not displayed in the list you got. This hidden content of the Internet is what we call “The Dark Internet”.

The first version of this web page was written in 2000, when this topic was new and baffling to many web searchers. Since then, search engines’ crawlers and indexing programs have overcome many of the technical barriers that made it impossible for them to find and provide invisible web pages.

These types of pages used to be invisible but can now be found in most search engine results:

  • Pages in non-HTML formats (PDF, Word, Excel, PowerPoint), now converted into HTML.
  • Script-based pages, whose URLs contain a ? or other script coding.
  • Pages generated dynamically by other types of database software (e.g., Active Server Pages, ColdFusion). These can be indexed if there is a stable URL somewhere that search engine crawlers can find.
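
To make the “URLs contain a ?” point concrete, here is a minimal Python sketch that splits a URL and reports whether it carries a query string. The function name `describe_url` and the example address are made up for illustration; real crawlers use far more elaborate rules than this.

```python
from urllib.parse import urlsplit, parse_qs


def describe_url(url):
    """Report whether a URL looks script-based, i.e. has a query string after '?'."""
    parts = urlsplit(url)
    return {
        "path": parts.path,
        # A non-empty query string is the part after the '?'.
        "is_script_based": bool(parts.query),
        # parse_qs maps each parameter name to a list of its values.
        "params": parse_qs(parts.query),
    }


# Hypothetical address, used only as an example.
info = describe_url("http://example.com/catalog.asp?subject=history&page=2")
print(info["is_script_based"], info["params"])
```

A crawler can index such a page as long as it comes across this exact, stable URL somewhere; what it cannot do is guess the parameter values on its own.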

However, there are still many pages that remain hidden from the visible web. A page can stay invisible for any of these reasons:

1) Most of the invisible web is made up of the contents of thousands of specialized searchable databases (library catalogs, article databases, etc.). When you search in one of these, the results are generated “on the fly” in answer to your search. Because the crawler programs cannot type or think, they cannot enter passwords on a login screen or keywords in a search box. Thus, these databases must be searched separately.

2) Search engine companies exclude some types of pages by policy, to avoid cluttering their databases with unwanted content.

3) Think of the billions of possible web pages generated by searches for books in library catalogs, public-record databases, etc. Each of these is created in response to a specific need. Search engines do not want all these pages in their web databases, since they generally are not of broad interest.

4) A web page creator who does not want his/her page showing up in search engines can insert special “meta tags” (for example, a robots meta tag with the value noindex) that are not displayed on the screen but tell most search engines’ crawlers not to index the page.
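
Reasons 1) and 4) can be pictured with a small Python sketch of what a very simple crawler “sees” on a page. It collects ordinary links it could follow, notices a search form it has no way to fill in, and checks for a robots meta tag asking not to be indexed. The class name `CrawlerView` and the sample page are invented for this example, and this is only a rough illustration, not how any particular search engine works.

```python
from html.parser import HTMLParser


class CrawlerView(HTMLParser):
    """Collects what a naive crawler could notice on a single HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []               # plain links the crawler can follow
        self.has_search_form = False  # a form it cannot type keywords into
        self.noindex = False          # page asks crawlers not to index it

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "form":
            self.has_search_form = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = (attrs.get("content") or "").lower()
            directives = [d.strip() for d in content.split(",")]
            self.noindex = self.noindex or "noindex" in directives


# Hypothetical page, used only as an example.
page = (
    '<html><head><meta name="robots" content="noindex, nofollow"></head>'
    '<body><a href="/about.html">About</a>'
    '<form action="/search"><input name="q"></form></body></html>'
)
view = CrawlerView()
view.feed(page)
print(view.links, view.has_search_form, view.noindex)
```

The link is something the crawler can follow, but whatever sits behind the form stays out of reach unless some other page links directly to the results.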

You can find searchable databases containing invisible web pages in the course of routine searching in most general web directories. Use Google and other search engines to locate searchable databases by searching a subject term and the word “database”. If the database uses the word database in its own pages, you are likely to find it in Google. The word “database” is also useful in searching a topic in the Google Directory or the Yahoo! directory, because they sometimes use the term to describe searchable databases in their listings.
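
The “subject term plus the word database” trick is easy to sketch in Python. The helper names below are made up for this example, and the search address (Google’s usual search page with its `q` parameter) is an assumption on my part; swap in whichever search engine you prefer.

```python
from urllib.parse import urlencode


def build_database_query(subject):
    """Append the word 'database' to a subject term, as suggested above."""
    return f"{subject.strip()} database"


def build_search_url(subject, base="https://www.google.com/search"):
    """Build a search URL for the query; `base` is an assumed search address."""
    return base + "?" + urlencode({"q": build_database_query(subject)})


print(build_search_url("plant genomics"))
```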

In addition to what you find in search engine results (including Google Scholar) and most web directories, there are other gold mines you have to search directly. These include the licensed article databases, magazines, reference works, news archives, and other research resources that libraries and some industries buy for those authorized to use them. The contents of these are not freely available: libraries and corporations buy the rights for their authorized users to view the contents. If they appear free, it’s because you are somehow authorized to search and read the contents (library card holder, member of the company, etc.).

In my next post, I’ll discuss how exactly search engines find the page or information that you specify in the search string.

Do keep reading the posts...

Enjoy!

4 Responses to “The Dark Internet”

  1. w0lf says:

    Nice post! Your posts are always informative, dude.

    But I am a bit confused: is this Dark Internet limited to search bots and their indexing techniques, or does it have a broader scope?

  2. Ne0 says:

    Thanks for the compliments, w0lf! I found this topic while searching for my previous post on “Anti-Forensic Techniques…..”. This post was related to how extremists use this side of the Internet to conceal their activities. It’s not only related to search engines. There are many systems on the public Internet that are hidden, or that can be accessed only with a predefined system configuration (e.g., a website on a server that can be accessed only from a Linux machine with Firefox installed). Again, such organizations use these techniques to hide themselves on the public network. Hope this clears your doubt. You can always get back to me, though! :)

  3. w0lf says:

    Ahaaa! Now this nicely clears my doubt. Thanks for the reply, Ne0 :)

  4. Ne0 says:

    You are most welcome, w0lf! It’s completely my pleasure.