http://www.webcrawler.com/WebCrawler/Help/FAQ.html (Einblicke ins Internet, 10/1995)



Answers to Frequently Asked Questions

Searching questions:

Getting your data into WebCrawler: how does it work?

Questions about the WebCrawler Service

What are those funny numbers down the left side of my results?

These "funny numbers" are called relevance numbers. To compute the relevance number for a particular document, WebCrawler takes the total number of times each of the words in your query appears in the document and divides it by the total number of words in the document. This is computed for each document that contains all of the words in your query. The document with the largest number computed in this fashion is arbitrarily assigned a score of 100, and the scores of the remaining documents are scaled to it. In this way, these relevance numbers allow you to easily see which documents are the most relevant to your query, and which are the least.
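The calculation described above can be sketched in a few lines of Python. This is only an illustration, not WebCrawler's actual code: the function names, the whitespace tokenization, the lowercasing, and the rounding to whole numbers are all assumptions made for the example.

```python
from collections import Counter


def raw_score(query_words, doc_words):
    """Occurrences of the query words divided by the document length.

    Returns None when the document lacks any query word, since only
    documents containing all of the query words are scored.
    """
    counts = Counter(doc_words)
    if not doc_words or any(counts[w] == 0 for w in query_words):
        return None
    return sum(counts[w] for w in query_words) / len(doc_words)


def relevance_numbers(query, docs):
    """Scale raw scores so that the best-matching document gets 100."""
    # Assumption: words are split on whitespace and compared in lowercase;
    # repeated query words are counted once.
    query_words = list(dict.fromkeys(query.lower().split()))
    raw = {}
    for name, text in docs.items():
        score = raw_score(query_words, text.lower().split())
        if score is not None:
            raw[name] = score
    if not raw:
        return {}
    best = max(raw.values())
    # Assumption: scores are rounded to whole numbers for display.
    return {name: round(100 * score / best) for name, score in raw.items()}
```

For example, with the query "crawler index", a four-word document containing those words three times has a raw score of 0.75, and an eight-word document containing them twice has a raw score of 0.25. After scaling, the first gets 100 and the second gets 33.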

How can I see ALL of the results for a query?

In theory, nothing prevents the WebCrawler from returning all of a query's results to you. However, some queries can return over 20,000 different URLs, or roughly 2MB of HTML. Queries of that size are expensive to handle, especially on a heavily loaded server like the WebCrawler, so we have limited the number of results to 100 at a time. At the bottom of each results page is a button that fetches the next group of results. By clicking it repeatedly, you can traverse the entire list of results, if you wish. To save time, however, we recommend that you first narrow your query as much as possible before using this feature.
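The 100-at-a-time limit amounts to slicing the full result list into pages. The sketch below shows one way to do that. The function name `results_page` and the zero-based page number are hypothetical choices for illustration, and the FAQ does not say how the server actually implements the "next group" button.

```python
def results_page(results, page, page_size=100):
    """Return one group of results and whether another group follows.

    `page` is zero-based (an assumption for this sketch): page 0 holds
    results 0-99, page 1 holds results 100-199, and so on.
    """
    start = page * page_size
    chunk = results[start:start + page_size]
    has_next = start + page_size < len(results)
    return chunk, has_next
```

A caller would show the "next" button only while `has_next` is true, so a query with 250 matches would be delivered as groups of 100, 100, and 50.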

How come the WebCrawler doesn't find an answer for my query?

The WebCrawler has indexed a gigabyte of source material. To keep the index small, it is aggressive about not indexing certain words it finds in source files. Some of these are common words, like "www" or "web". These aren't informative words in a query, because nearly every document contains them! It also throws out combinations of letters and numbers, most of which are junk. If you're having trouble searching, see these examples of searches that work.
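To make the filtering concrete, here is a minimal sketch of the two rules mentioned above: dropping common words and dropping tokens that mix letters and numbers. It is a guess at the general idea rather than WebCrawler's real indexer. The stop list contains only the two words this FAQ names, and the tokenization rule and function names are assumptions.

```python
import re

# Only the two common words the FAQ mentions; the real list is not given.
STOP_WORDS = {"www", "web"}


def is_mixed_alnum(token):
    """True when a token contains both letters and digits, e.g. 'x86'."""
    has_letter = re.search(r"[a-z]", token) is not None
    has_digit = re.search(r"[0-9]", token) is not None
    return has_letter and has_digit


def indexable_words(text):
    """Return the words from `text` that this sketch would keep in the index."""
    # Assumption: tokens are runs of ASCII letters and digits, lowercased.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS and not is_mixed_alnum(t)]
```

Under these assumptions, a query word such as "www" or "node42" would never match, because the word was never placed in the index in the first place.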

I can't find my home page in a WebCrawler search. Why not?

The WebCrawler may not have reached your page in its last indexing run, or your server may have been down when it tried. Either way, you can submit your URL for indexing during a future run of the WebCrawler.

How do I get the WebCrawler to visit my page?

If the WebCrawler has not visited your page, you can submit your URL to the WebCrawler, and the page will be visited on the WebCrawler's next run. We are now striving to update the WebCrawler weekly, so please wait at least that long before worrying that your page is not in the index.

Who operates the WebCrawler?

The WebCrawler is operated by America Online, Inc. at their Web Studios in San Francisco, CA. We do it for fun! The WebCrawler will always remain a free service to the Internet. So enjoy!

How big is the WebCrawler Database?

The content index is about 100MB. It contains information on over 190,000 different documents that the WebCrawler has explored. The rest of the WebCrawler database (tables of all known, unvisited documents) occupies another 200MB or so, and contains data on over 1,800,000 different documents. As you can see, the WebCrawler has a ways to go before it explores all the documents it knows about!

On what platform does the WebCrawler run?

The WebCrawler Query Server runs on five Pentium-based machines running NEXTSTEP. Each machine has a single 500MB disk and 96MB of memory. When it is building an index, the WebCrawler runs on a similarly configured machine.


Copyright © 1995, America Online, Inc.

info@webcrawler.com