VOLUME 32, NUMBER 22 THURSDAY, March 1, 2001
Reporter
Electronic Highways

The Invisible Web

According to a study published by Brightplanet (http://www.brightplanet.com), only a small fraction of the information available on the Web is accessible through search engines. Their white paper, "The Deep Web: Surfacing Hidden Value" (http://www.completeplanet.com/Tutorials/DeepWeb/index.asp), reveals that the "deep" Web, more commonly referred to as the "invisible" or "hidden" Web, is about 500 times larger than the known "surface" World Wide Web. Brightplanet believes that "Internet searching today can be compared to dragging a net across the surface of the ocean. There is a wealth of information that is deep, and therefore missed." Material on the deep Web is missed because most of it is stored in databases, or in formats such as PDF, Flash and streaming media, that generally are inaccessible to the software that compiles search-engine indexes.
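To make the point about databases concrete, here is a minimal sketch of an indexer that only follows links. Everything in it is hypothetical (the toy site, the `fetch` helper, the sample records), and it is not how any particular search engine works. It only illustrates that a record served in response to a typed query is never reached by a program that does nothing but follow links.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical toy site: static pages keyed by path.
STATIC_PAGES = {
    "/": '<a href="/about">About</a> <a href="/search">Search our catalog</a>',
    "/about": "<p>About this library.</p>",
    "/search": '<form action="/results"><input name="q"></form>',
}

# Hypothetical database records, served only in response to a form query.
DATABASE = {"whales": "Report on whale migration", "tides": "Tide tables 2001"}


def fetch(path):
    """Return the HTML for a path, or None if nothing is served there."""
    if path in STATIC_PAGES:
        return STATIC_PAGES[path]
    if path.startswith("/results?q="):
        record = DATABASE.get(path.split("=", 1)[1])
        return f"<p>{record}</p>" if record else None
    return None


class LinkExtractor(HTMLParser):
    """Collect href values from anchor tags; forms are ignored."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start="/"):
    """Breadth-first crawl that follows links only and never submits forms."""
    index, queue = {}, deque([start])
    while queue:
        path = queue.popleft()
        if path in index:
            continue
        html = fetch(path)
        if html is None:
            continue
        index[path] = html
        parser = LinkExtractor()
        parser.feed(html)
        queue.extend(parser.links)
    return index


surface_index = crawl()
print(sorted(surface_index))  # the pages the link-following crawler found
```

The crawl reaches the search form itself but none of the records behind it, even though `fetch("/results?q=whales")` would return one to a person who typed the query.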

Some key findings from the Brightplanet study:

• The deep Web is 400 to 550 times larger than the commonly defined World Wide Web
• More than an estimated 100,000 deep Web sites presently exist
• The deep Web is the fastest-growing category of new information on the Internet
• Deep Web content is highly relevant to every information need, market and domain
• More than half of the deep Web content resides in topic-specific databases
• A full 95 percent of the deep Web is publicly accessible information, not subject to fees or subscriptions

So how does one gain access to this huge, unindexed trove of information? Brightplanet has developed a partial solution with a product called "LexiBot." With a single search request, the software queries not only the Web pages indexed by traditional search engines but also more than 600 "hidden" databases simultaneously.

But LexiBot, even with its additional searching capabilities, reveals only a small portion of the invisible Web, and it requires patience: a typical search can take five to 30 minutes to complete, and more complex requests can take more than an hour. LexiBot also costs $89.95, but a free 30-day trial can be downloaded at http://www.lexibot.com/index.asp.
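The general idea of sending one request to many sources at once can be sketched as follows. This is an illustration only and makes no claim about how LexiBot is built: the three source functions and their sample records are hypothetical stand-ins for a search-engine index and two searchable databases.

```python
from concurrent.futures import ThreadPoolExecutor


def _match(query, titles):
    """Return the titles that contain the query, ignoring case."""
    return [t for t in titles if query.lower() in t.lower()]


# Hypothetical sources: each takes a query and returns matching titles.
def search_engine_index(query):
    return _match(query, ["Whale watching tours", "Whale facts for kids"])


def fisheries_database(query):
    return _match(query, ["Whale migration survey 1999", "Cod stock assessment"])


def patent_database(query):
    return _match(query, ["Improved fishing net", "Sonar buoy"])


SOURCES = {
    "search_engine_index": search_engine_index,
    "fisheries_database": fisheries_database,
    "patent_database": patent_database,
}


def federated_search(query, sources=SOURCES):
    """Send one query to every source in parallel and merge the results."""
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        # Submit all queries first so the sources are searched concurrently.
        futures = {name: pool.submit(fn, query) for name, fn in sources.items()}
        merged = []
        for name, future in futures.items():
            for title in future.result():
                merged.append((name, title))
    return merged


for source, title in federated_search("whale"):
    print(f"{source}: {title}")
```

With real databases, each call would be a network request, so the total wait is set by the slowest source. That is one plausible reason a search across hundreds of databases takes minutes and not seconds.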

Fortunately, there are free Internet sites that provide guides to the thousands of databases that make up the hidden Web. One of the best sites, Direct Search (http://gwis2.circ.gwu.edu/~gprice/direct.htm), describes itself as "a growing compilation of links to the search interfaces of resources that contain data not easily or entirely searchable/accessible from general search tools like Alta Vista, Google or Hotbot." Direct Search provides annotated links to well over 1,000 searchable, interactive databases. The site is maintained by Gary Price, a librarian at George Washington University and the co-author of the forthcoming book "The Invisible Web" (CyberAge Books, due out in July). Other sites that provide links to hidden Web databases include InvisibleWeb.com (http://www.invisibleweb.com/), The Big Hub (http://bighub.com/), AlphaSearch (http://www.calvin.edu/library/searreso/internet/as/) and Lycos Invisible Web Catalog (http://dir.lycos.com/Reference/Searchable_Databases/).

But for many people, guides, even well-compiled ones, are not a good replacement for a one-stop, one-search-box approach to searching the Web, so the pressure is on the major search engines to "reveal" more of the "invisible" or "hidden" regions of cyberspace. Some progress is being made. For example, Google (http://www.google.com/) recently introduced a feature that allows searchers to find information contained in Adobe Portable Document Format (PDF) files. While HTML files make up the bulk of documents on the Web, PDF files are abundant, and many government agencies and non-profit organizations use PDF as their format of choice when providing publications on the Internet.

Inevitably, market forces will motivate search engines to unmask more and more of the invisible Web. Let's hope the data sources that get exposed are precious nuggets of information and not fool's gold!

-Gemma DeVinney and Don Hartman, University Libraries
