This index is a collection of URL references, built up and indexed with a hacked version of WAIS. The index is constructed by a spider that walks the web, building a graph of the links it finds in an Oracle database and using WAIS to index the full text of each document. There are currently 36,195 documents in the index.
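As a rough illustration of the graph-building half of that process, here is a minimal sketch of a breadth-first spider that records one (source, target) row per link. It assumes a hypothetical in-memory link map (`TOY_WEB`) in place of live HTTP fetches, and it uses sqlite3 purely as a stand-in for the Oracle database. The table name `edges`, the function `crawl`, and the `max_depth` parameter are all illustrative, not the spider's actual design.

```python
import sqlite3
from collections import deque

# Hypothetical stand-in for the live web: page URL -> outgoing link URLs.
# A real spider would fetch each page over HTTP and parse its anchors.
TOY_WEB = {
    "http://rbse.jsc.nasa.gov/": ["http://a.example/", "http://b.example/x.html"],
    "http://a.example/": ["http://b.example/x.html", "http://c.example/y.gif"],
    "http://b.example/x.html": ["http://rbse.jsc.nasa.gov/"],
}

def crawl(start, links_of, max_depth=5):
    """Breadth-first walk from `start`; returns a sqlite3 connection whose
    `edges` table holds one (source, target) row per link seen."""
    db = sqlite3.connect(":memory:")  # stand-in for the Oracle database
    db.execute("CREATE TABLE edges (source TEXT, target TEXT)")
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand pages beyond the probe depth
        for target in links_of(url):
            db.execute("INSERT INTO edges VALUES (?, ?)", (url, target))
            if target not in seen:  # queue each page at most once
                seen.add(target)
                queue.append((target, depth + 1))
    db.commit()
    return db

db = crawl("http://rbse.jsc.nasa.gov/", lambda u: TOY_WEB.get(u, []))
print(db.execute("SELECT COUNT(*) FROM edges").fetchone()[0])  # 5
```

Keeping the graph as a flat edge table means summary figures such as distinct sources and targets reduce to simple aggregate queries.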
Here are some stats concerning the spider's graph:
                   Distinct  Distinct   Total
Date      Time     Sources   Targets    Edges    Notes
=======   =======  =======   =======   =======   =====
2/20/94   9:50AM   13,082    24,421    103,417
          5:00PM   13,789    33,715    118,930
          9:30PM   14,490    37,981    128,541
2/21/94   8:30AM   16,690    48,341    162,226
          11:15AM  17,278    53,803    171,957
          12:15PM  17,617    62,397    182,880
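If the graph is stored as a table of (source, target) link rows, the three numeric columns above could be computed with a single aggregate query. The sketch below assumes a hypothetical `edges(source, target)` table in sqlite3, standing in for the Oracle schema. It also assumes that "Total Edges" counts every link row. Whether the original counted duplicate links between the same pair of pages is not stated.

```python
import sqlite3

def graph_stats(db):
    """Return (distinct sources, distinct targets, total edges) for the
    hypothetical `edges` table."""
    return db.execute(
        "SELECT COUNT(DISTINCT source), COUNT(DISTINCT target), COUNT(*) FROM edges"
    ).fetchone()

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE edges (source TEXT, target TEXT)")
db.executemany("INSERT INTO edges VALUES (?, ?)", [
    ("http://a.example/", "http://b.example/x.html"),
    ("http://a.example/", "http://c.example/"),
    ("http://b.example/x.html", "http://c.example/"),
])
print(graph_stats(db))  # (2, 2, 3)
```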
Note that this is only a snapshot of a portion of the web: effectively a five-level breadth-first probe from our home page. (It's not a complete probe because I ran out of tablespace in Oracle...) The index was built from the source HTML documents in the graph, plus those target documents in the graph that were identifiably HTML, i.e., URLs matching the pattern "*.html" or "http:*/" (the latter being links that use http as their protocol and point at the default page of a directory).
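The "identifiably HTML" test can be expressed as a glob match against the two patterns quoted above. This is a minimal sketch: `fnmatch.fnmatchcase` is a stand-in for however the original spider actually matched them, and the function name `identifiably_html` is hypothetical. The match is case-sensitive, so a URL ending in ".HTML" would not qualify under this sketch.

```python
from fnmatch import fnmatchcase

# The two patterns quoted on this page; glob matching is an assumption.
HTML_PATTERNS = ("*.html", "http:*/")

def identifiably_html(url):
    """True if `url` matches one of the patterns used to pick index targets."""
    return any(fnmatchcase(url, p) for p in HTML_PATTERNS)

for url in ["http://b.example/x.html", "http://a.example/docs/",
            "http://c.example/y.gif", "ftp://d.example/pub/"]:
    print(url, identifiably_html(url))
# http://b.example/x.html True
# http://a.example/docs/ True
# http://c.example/y.gif False
# ftp://d.example/pub/ False
```

Note that "*.html" alone does not check the protocol, so an ftp URL ending in ".html" would also pass; only the directory pattern is restricted to http.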
This service is offered as part of the experimental prototype under construction by the Repository Based Software Engineering (RBSE) project.
eichmann@rbse.jsc.nasa.gov