Data Sources

Below you can find a complete list of Archive95's data sources, along with extra information you may find useful. To learn more about how Archive95 works, check out the About page.

Overall Statistics

       URLs: 126,000
    Orphans: 78,575
Error Pages: 6,003

Grand Total: 210,578
 (- Errors): 204,575

Screenshots: 8,046
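
The percentages listed for each source below appear to be that source's share of the matching overall figure. A minimal Python sketch of the arithmetic, using the overall totals above and the Silicon Surf CD as the worked example. The function names and the one-decimal rounding are assumptions made for illustration, not taken from Archive95 itself.

```python
# Overall figures copied from the statistics above.
OVERALL = {"urls": 126_000, "orphans": 78_575, "errors": 6_003}


def grand_totals(urls, orphans, errors):
    """Return (grand total, grand total minus error pages)."""
    total = urls + orphans + errors
    return total, total - errors


def share(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


overall_total, overall_no_errors = grand_totals(**OVERALL)

# Silicon Surf CD: 5,031 URLs, no orphans, no error pages.
sgi_total, sgi_no_errors = grand_totals(5_031, 0, 0)

print(overall_total, overall_no_errors)
print(share(5_031, OVERALL["urls"]))            # share of all URLs
print(share(sgi_total, overall_total))          # share of the grand total
print(share(sgi_no_errors, overall_no_errors))  # share excluding error pages
```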

BMUG PD-ROM Fall 1995

Author: Berkeley Macintosh Users Group
Archive Date: 1994-06-03
Publish Date: 1995
Link: http://archive.org/details/cdrom-bmug-pdrom-f95
ID: bmug

Description:
A public domain CD-ROM published by the Berkeley Macintosh Users Group for the Fall 1995 season. Among the included content is a database of announcements and message board posts pertaining to up-and-coming servers on the internet. Curiously, and for reasons unknown, one of these entries is the raw HTML of Microsoft's homepage from June 3rd, 1994.
Data Integrity:
The markup appears to be unaltered.
Remediations:
N/A
Statistics:
       URLs: 1 (<0.1%)
    Orphans: 0
Error Pages: 0

Grand Total: 1 (<0.1%)
 (- Errors): 1 (<0.1%)

Screenshots: 0

Silicon Surf CD

Author: Silicon Graphics
Archive Date: 1994-12
Publish Date: 1995
Link: http://employees.org/~drich/SGI/SiliconSurf/
ID: sgi

Description:
A complete copy of Silicon Graphics' website from December 1994, which according to IRIX Network was distributed on CD. It was preserved online by Daniel Rich, a former SGI employee, in 2014.
Data Integrity:
All pages under the Edu/ directory exhibit an issue where every slash character is preceded by two extra dots.
Remediations:
The issue with the Edu/ directory has been fixed.
Statistics:
       URLs: 5,031 (4%)
    Orphans: 0
Error Pages: 0

Grand Total: 5,031 (2.4%)
 (- Errors): 5,031 (2.5%)

Screenshots: 0
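
One reading of the Edu/ issue is that every "/" in a link was stored as "../". Under that assumption only (the page does not say how the fix was actually applied), the damage could be undone with a plain substring replacement. The function name is hypothetical.

```python
def strip_slash_dots(url: str) -> str:
    """Undo the assumed corruption in which every "/" became "../"."""
    # str.replace scans left to right without overlapping matches, so a
    # genuine "../" (stored as "..../" under this assumption) collapses
    # back to "../" instead of disappearing entirely.
    return url.replace("../", "/")


print(strip_slash_dots("Edu../index.html"))
print(strip_slash_dots("..../images../logo.gif"))
```

If the corruption took a different form, for example only affecting some attributes, a blanket replacement like this would be too aggressive.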

The Risc Disc Volume 2

Author: Uniqueway
Archive Date: ~1995-05
Publish Date: 1995-10
Link: http://archive.org/details/cdrom-riscos-risc-disc-2
ID: riscdisc

Description:
The second volume in a series of software CDs for RISC OS, published in October 1995. It contains offline copies of numerous websites, which were collected throughout the year.
Data Integrity:
Links and images have been modified to point to their offline equivalents where applicable; all other links have either been commented out or removed entirely. There is no explicit indication of what URL maps to which file, and although the structure of the CD-ROM generally reflects the source URLs, all file and directory names have been truncated and converted to uppercase, and all symbols have been turned into underscores.
Remediations:
Commented-out links have been restored. Each file's original URL has been reconstructed as accurately as possible using clues from the file tree and the Wayback Machine. URLs with a low certainty of being accurate are accompanied by a warning in the navigation bar.
Statistics:
       URLs: 3,020 (2.4%)
    Orphans: 0
Error Pages: 1 (<0.1%)

Grand Total: 3,021 (1.4%)
 (- Errors): 3,020 (1.5%)

Screenshots: 0

World Wide Web Directory

Author: Jamsa Press
Archive Date: ~1995-04
Publish Date: 1995-06
Link: http://archive.org/details/www-dir-cd
ID: wwwdir

Description:
A Yellow Pages-style directory of webpages published in June 1995, bundled with a book of the same name. The CD-ROM contains the raw HTML and a browser screenshot of every included page, captured 1-2 months prior.
Data Integrity:
All data is stored in Microsoft's proprietary MVB format. The HTML data is double-encoded, and a small number of pages exhibit corruption issues or were not fully downloaded.
Remediations:
As there were no working decompilers for MVB files, the data was copied out semi-automatically using Microsoft Multimedia Viewer and an AutoHotkey script. The double-encoding on HTML files has been reversed, although corrupted and partially downloaded files have been left as-is.
Statistics:
       URLs: 7,990 (6.3%)
    Orphans: 0
Error Pages: 0

Grand Total: 7,990 (3.8%)
 (- Errors): 7,990 (3.9%)

Screenshots: 8,046 (100%)
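
The page does not specify what kind of double-encoding the HTML suffered. Assuming it means HTML entity escaping applied twice (so that "<" is stored as "&amp;lt;"), a reversal could be sketched with Python's standard html module. The function name is illustrative.

```python
import html


def undo_double_encoding(text: str) -> str:
    """Reverse two rounds of HTML entity escaping (assumed encoding)."""
    # html.unescape performs a single pass, so it is applied twice.
    return html.unescape(html.unescape(text))


raw = '<a href="x.html">Q&amp;A</a>'
stored = html.escape(html.escape(raw))  # simulate the assumed damage
print(stored)
print(undo_double_encoding(stored))
```

Unescaping exactly twice, rather than until the text stops changing, keeps entities that the original page legitimately contained, such as the "&amp;" in the sample.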

Einblicke ins Internet

Author: Carl Hanser Verlag
Archive Date: ~1995-06
Publish Date: 1995-10-24
Link: http://cs.rit.edu/~ats/books/cd
ID: einblicke

Description:
An internet CD published in Germany in October 1995. It was developed at Osnabrück University and contains a wide assortment of primarily English and German-language web content, collected during the middle months of 1995. In 2015, its files were uploaded to an open directory by Axel-Tobias Schreiner, one of its co-authors.
Data Integrity:
A footer has been added to each page denoting its original URL. Links and images have been modified to point to their offline equivalents where applicable; all others point to error/placeholder files, and in the case of links, an additional link to the live web is inserted next to the original. All form elements have been replaced with text explaining that the functionality is not available. A text document is present mapping each file to its respective URL.
Remediations:
The custom footers have been removed. Links to the live web have been moved back to their original elements. Placeholder images have been replaced with the browser default for missing/broken images. The replacement text for form elements has been changed.
Statistics:
       URLs: 27,151 (21.5%)
    Orphans: 12 (<0.1%)
Error Pages: 39 (0.6%)

Grand Total: 27,202 (12.9%)
 (- Errors): 27,163 (13.3%)

Screenshots: 0

HotWired Trade Show Demo

Author: HotWired
Archive Date: 1995-11-13
Publish Date: 2006-07-27
Link: http://archive.gyford.com/1995/11/13/
ID: hotwired

Description:
An offline version of hotwired.com from November 1995, showcased at trade shows where internet connectivity was not available. It was preserved in 2006 by Phil Gyford, a former Wired UK employee, who wrote a blog post about it.
Data Integrity:
The implied URLs of some files may not be fully accurate. Some external links have been modified to point to offline copies located in-site, within which every link points back to the previous page.
Remediations:
The in-site copies of external pages have been left as-is, since they are impossible to distinguish from the rest of the site and do not have their original URLs stored anywhere.
Statistics:
       URLs: 1,437 (1.1%)
    Orphans: 0
Error Pages: 1 (<0.1%)

Grand Total: 1,438 (0.7%)
 (- Errors): 1,437 (0.7%)

Screenshots: 0

Chip Special: Fun im Internet

Author: Chip
Archive Date: 1995-10-15 to 1996-01-16
Publish Date: 1996-01
Link: http://archive.org/details/chipspecialbestwebsites_1_96
ID: chipfun

Description:
A coverdisc bundled with a special issue of the German computer magazine Chip, published in January 1996. It contains a variety of web content collected between October 1995 and January 1996.
Data Integrity:
Links and images have been modified to point to their offline equivalents where applicable. A <base> tag has been added to the top of each page. WebWhacker logs are present mapping each file to its respective URL.
Remediations:
The <base> tags have been removed.
Statistics:
       URLs: 1,978 (1.6%)
    Orphans: 0
Error Pages: 3 (<0.1%)

Grand Total: 1,981 (0.9%)
 (- Errors): 1,978 (1%)

Screenshots: 0
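
Removing the added <base> tags is a simple text transformation. The sketch below shows one way it might be done with a regular expression. It is not Archive95's actual code, and the names are hypothetical.

```python
import re

# Matches a <base ...> tag in any letter case, plus one trailing line break.
# The \b keeps <basefont> from being matched by accident.
BASE_TAG = re.compile(r"<base\b[^>]*>\r?\n?", re.IGNORECASE)


def strip_base_tags(markup: str) -> str:
    """Remove every <base> tag from the given markup."""
    return BASE_TAG.sub("", markup)


page = '<BASE HREF="http://example.com/">\n<html><basefont size=3></html>'
print(strip_base_tags(page))
```

Note that this would also remove a <base> tag that was part of the original page, so a real cleanup would need some way to tell the two apart, for instance by only removing a tag at the very top of the file.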

Chip Special: Spiele im Internet

Author: Chip
Archive Date: 1996-02-21 to 1996-02-28
Publish Date: 1996-05
Link: http://archive.org/details/chipspecialbestwebsites_2_96
ID: chipspiele

Description:
The follow-up to "Chip Special: Fun im Internet", published in May 1996. It contains primarily video game-related web content collected in February 1996.
Data Integrity:
Links and images have been modified to point to their offline equivalents where applicable. WebWhacker logs are present mapping each file to its respective URL.
Remediations:
N/A
Statistics:
       URLs: 1,198 (1%)
    Orphans: 0
Error Pages: 2 (<0.1%)

Grand Total: 1,200 (0.6%)
 (- Errors): 1,198 (0.6%)

Screenshots: 0

PC Press Internet CD

Author: PC Press
Archive Date: 1996-03-17 to 1996-03-28
Publish Date: 1996-05
Link: http://archive.org/details/pc-press-internet-cd
ID: pcpress

Description:
An internet CD accompanying a special issue of the PC Press magazine, published in Serbia in May 1996. It contains a large amount of web content across multiple genres and languages, including a comprehensive snapshot of the fledgling Serbian web (according to PC Press's 30th anniversary book, Serbia received internet access in February 1996). All of the included web content was collected in March 1996.
Data Integrity:
Metadata has been added to the top of each page by the in-house downloader used to retrieve them. Links and images have been modified to point to their offline equivalents where applicable, although some point to files that do not exist. Each website present is accompanied by a text document mapping each file to its respective URL; however, many of these documents exhibit corruption or URL formatting issues, and a handful of them are missing entirely.
Remediations:
The downloader metadata at the top of each page has been removed. Incorrectly-formatted URLs have been fixed, and files with corrupted or missing URLs have been preserved as orphans.
Statistics:
       URLs: 34,575 (27.4%)
    Orphans: 743 (0.9%)
Error Pages: 1,924 (32.1%)

Grand Total: 37,242 (17.7%)
 (- Errors): 35,318 (17.3%)

Screenshots: 0

A Internet em CD-ROM

Author: José Magalhães
Archive Date: ~1996-04
Publish Date: 1996
Link: http://archive.org/details/a-internet-em-cd-rom
ID: roteiro

Description:
An internet CD bundled with the book "Novo Roteiro Prático da Internet", published in Portugal in late 1996. It was developed at the University of Minho and contains an immense amount of web content collected from early to mid-1996. In 2011, a copy of the CD-ROM was donated to Arquivo.pt and integrated into their web archive; however, the original CD-ROM was not preserved publicly, and most of the issues listed below were not addressed.
Data Integrity:
There are nearly 250 identical copies of the University of Minho IT department homepage under incorrect URLs. An <a name> tag and a footer denoting the file's original URL have been added to most text files (both HTML and plaintext) as well as some binary files such as images (causing them not to render). HTTP header information has additionally been added to the beginning of some files. Since the CD-ROM uses a numeric file structure, it is impossible to determine the original URLs of files that do not contain a footer. Links and images have been modified to point to their offline equivalents where applicable; all others point to an error page with no indication of the original URL. Occasionally the modified URLs are incorrect, and in the case of images, this can result in severe visual issues. A port number has been injected into all plaintext-rendered URLs within pages, sometimes causing overflow bugs where the port number lands inside a closing tag.
Remediations:
All identical copies of the University of Minho IT department homepage have been marked as invalid. The custom headers and footers have been removed from all files, including binary ones, and files without a URL have been preserved as orphans. Incorrect link/image URLs have been left as-is since they are virtually impossible to distinguish from correct URLs. The injected port numbers have also been left intact for similar reasons, although the instances that were inserted into closing tags have been fixed.
Statistics:
       URLs: 16,140 (12.8%)
    Orphans: 47,451 (60.4%)
Error Pages: 3,873 (64.5%)

Grand Total: 67,464 (32%)
 (- Errors): 63,591 (31.1%)

Screenshots: 0
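
How the identical homepage copies were identified is not stated. One common approach, shown here purely as an illustration, is to group files by a hash of their contents. The file names and function name are hypothetical.

```python
import hashlib
from collections import defaultdict


def group_identical(files: dict[str, bytes]) -> list[list[str]]:
    """Return groups of file names whose contents are byte-for-byte identical."""
    by_digest = defaultdict(list)
    for name, data in files.items():
        by_digest[hashlib.sha256(data).hexdigest()].append(name)
    # Only digests shared by more than one file indicate duplicates.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


sample = {
    "0001.htm": b"<html>IT department</html>",
    "0002.htm": b"<html>Something else</html>",
    "0003.htm": b"<html>IT department</html>",
}
print(group_identical(sample))
```

This only finds exact duplicates, so it would have to run after the per-file footers are stripped, since those differ from file to file.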

NetControl Archived Pages (1996)

Author: NetControl
Archive Date: ~1996
Publish Date: 2003
Link: http://netcontrol.net/english/
ID: netcontrol96

Description:
An online directory of archived pages provided as a supplement to NetControl's Greek-language "internet on paper". Although the directory is presented as one unified list, the pages appear to have been collected across two stints: first throughout 1996 into early 1997, and then again throughout 1998 into early 1999. This data source corresponds to the earlier archives.
Data Integrity:
Each page's markup has received significant alterations, including the removal of whitespace and comments, and the wrapping of text content. The title element of each page has been modified to indicate its archived nature, and a header message has been added that additionally centers the page. A script for serving ads has been injected into each page, although it is no longer functional. All email addresses have been obfuscated through a Cloudflare mechanism, and a script has been added to the bottom of each page to decode them on the client side. Images and frames have been modified to point to their locally-available equivalents, which are organized into a flat file structure with no indication of their original URLs.
Remediations:
The header messages, scripts and injected title text have been removed. All email addresses have been decoded and reverted to their original forms. Images and frames without a URL have been preserved as orphans.
Statistics:
       URLs: 2,194 (1.7%)
    Orphans: 19,386 (24.7%)
Error Pages: 95 (1.6%)

Grand Total: 21,675 (10.3%)
 (- Errors): 21,580 (10.5%)

Screenshots: 0
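
Cloudflare's email obfuscation is widely described as a hex string (typically carried in a data-cfemail attribute) whose first byte is an XOR key for the remaining bytes. Assuming that scheme applies to these pages, decoding could be sketched as follows. The function names are illustrative, and the encoder exists only to produce test input.

```python
def decode_cfemail(encoded: str) -> str:
    """Decode a Cloudflare-obfuscated address given as a hex string."""
    raw = bytes.fromhex(encoded)
    key = raw[0]  # the first byte is the XOR key
    return bytes(b ^ key for b in raw[1:]).decode("utf-8")


def encode_cfemail(address: str, key: int = 0x42) -> str:
    """Inverse of decode_cfemail, included only for demonstration."""
    data = address.encode("utf-8")
    return bytes([key] + [b ^ key for b in data]).hex()


print(decode_cfemail("422302206c21"))
print(decode_cfemail(encode_cfemail("info@example.com")))
```

Restoring a page fully would also mean replacing the placeholder element with the decoded address, which depends on the exact markup Cloudflare inserted.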

Amiga Plus Extra No. 5

Author: Amiga Plus
Archive Date: 1997-06-15 to 1997-08-05
Publish Date: 1997-10
Link: http://archive.org/details/amiga-plus-extra-cd-5-97
ID: amigaplus

Description:
A coverdisc included with the October/November 1997 issue of Amiga Plus, a German Amiga magazine. It contains extensive captures of many computer-related websites, collected primarily in August 1997.
Data Integrity:
Each HTML file is available in two versions: one being the original, unmodified file, and the other being modified to point all links and images to their offline equivalents where applicable. Although the filesystem generally resembles the source URLs, all file and directory names have been converted to lowercase and truncated to 30 characters.
Remediations:
All URLs have been fully reconstructed using evidence from the unmodified HTML files and the Wayback Machine.
Statistics:
       URLs: 12,808 (10.2%)
    Orphans: 0
Error Pages: 4 (0.1%)

Grand Total: 12,812 (6.1%)
 (- Errors): 12,808 (6.3%)

Screenshots: 0
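
Because the unmodified HTML files still contain the original links, one plausible way to reconstruct URLs is to predict the on-disc name each linked URL would produce and compare it with the files present. The sketch below assumes the host name is the top-level directory, which the description does not confirm. All names are hypothetical.

```python
from urllib.parse import urlsplit

MAX_NAME_LENGTH = 30  # truncation length stated in the description above


def disc_path(url: str) -> str:
    """Predict the on-disc path: lowercase, each segment cut to 30 characters."""
    parts = urlsplit(url)
    segments = [parts.netloc] + [s for s in parts.path.split("/") if s]
    return "/".join(s.lower()[:MAX_NAME_LENGTH] for s in segments)


def match_urls(disc_paths, candidate_urls):
    """Map each on-disc path to the candidate URLs that would produce it."""
    matches = {path: [] for path in disc_paths}
    for url in candidate_urls:
        predicted = disc_path(url)
        if predicted in matches:
            matches[predicted].append(url)
    return matches


links = ["http://Example.com/Docs/Index.HTML", "http://example.com/other.html"]
print(match_urls(["example.com/docs/index.html"], links))
```

A path that matches more than one candidate, or none, would still need manual checking, for example against the Wayback Machine as mentioned above.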

Internet on a CD

Author: Packard Bell
Archive Date: 1998-06-29 to 1998-10-13
Publish Date: 1998
Link: http://archive.org/details/internet-on-a-cd
ID: netonacd

Description:
An internet CD which was distributed by Packard Bell to (presumably) European customers who purchased their PCs during the 1998 Christmas season, according to Popzazzle. It contains offline copies of various websites in multiple languages, collected primarily in July 1998.
Data Integrity:
Links and images have been modified to point to their offline equivalents where applicable, with the original URLs being moved to their own HTML attribute. Links to content unavailable on the CD-ROM have been pointed to customized error pages linking to the live web.
Remediations:
The URLs for links and images have been moved back to their original attributes. Links to error pages now point directly to the URLs on those pages.
Statistics:
       URLs: 11,275 (8.9%)
    Orphans: 230 (0.3%)
Error Pages: 16 (0.3%)

Grand Total: 11,521 (5.5%)
 (- Errors): 11,505 (5.6%)

Screenshots: 0

NetControl Archived Pages (1998)

Author: NetControl
Archive Date: ~1998
Publish Date: 2003
Link: http://netcontrol.net/english/
ID: netcontrol98

Description:
An online directory of archived pages provided as a supplement to NetControl's Greek-language "internet on paper". Although the directory is presented as one unified list, the pages appear to have been collected across two stints: first throughout 1996 into early 1997, and then again throughout 1998 into early 1999. This data source corresponds to the later archives.
Data Integrity:
Each page's markup has received significant alterations, including the removal of whitespace and comments, and the wrapping of text content. The title element of each page has been modified to indicate its archived nature, and a header message has been added that additionally centers the page. A script for serving ads has been injected into each page, although it is no longer functional. All email addresses have been obfuscated through a Cloudflare mechanism, and a script has been added to the bottom of each page to decode them on the client side. Images and frames have been modified to point to their locally-available equivalents, which are organized into a flat file structure with no indication of their original URLs.
Remediations:
The header messages, scripts and injected title text have been removed. All email addresses have been decoded and reverted to their original forms. Images and frames without a URL have been preserved as orphans.
Statistics:
       URLs: 1,202 (1%)
    Orphans: 10,753 (13.7%)
Error Pages: 45 (0.7%)

Grand Total: 12,000 (5.7%)
 (- Errors): 11,955 (5.8%)

Screenshots: 0
