The sources that power Archive95 are valuable today from a web preservation standpoint, but they served different purposes in their time, and their authors freely altered or omitted data as they saw fit. As a result, each source must be audited and (if necessary) undergo a restoration process before it can be added to the archive.
This process may include reconstructing URLs or reverting changes to page links, and is usually effective in returning the data to a more "correct" state, although it has its limitations. For example, it is not always possible to restore page elements that have been outright removed rather than transformed, or to map files to the correct URLs when that information is not made explicit (files without a known URL are called "orphans").
This page provides not only basic information about the sources featured in Archive95, but also notes on the integrity of their data, so as to give an idea of the restoration effort required for each source.
Total URLs: 101,950
Total Orphans: 31,282
Grand Total: 133,232
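The per-source statistics further down this page can be cross-checked against these totals. The following is a minimal sketch of that arithmetic, assuming each percentage is the source's combined count of URLs and orphans divided by the grand total and rounded to one decimal place; the function name `share` is illustrative only.

```python
GRAND_TOTAL = 133_232  # grand total stated on this page


def share(urls: int, orphans: int, grand_total: int = GRAND_TOTAL) -> tuple[int, float]:
    """Return a source's total file count and its percentage of the archive."""
    total = urls + orphans
    # Assumption: percentages are rounded to one decimal place.
    return total, round(100 * total / grand_total, 1)


# Example: a source with 27,363 URLs and 17 orphans.
total, percent = share(27_363, 17)
print(total, percent)
```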
Author: Silicon Graphics
Archive Date: 01/1995
Publish Date: 1995
Description: A complete copy of Silicon Graphics' website circa January 1995, which according to IRIX Network was distributed on CD. It was preserved online by Daniel Rich, a former SGI employee, in 2014.
Data Integrity: All pages in the Edu/ directory exhibit significant breakage due to every slash character being prepended with two dots.
Link: https://employees.org/~drich/SGI/SiliconSurf/
Statistics: 5,016 URLs, 0 orphans (5,016 total; 3.8% of entire archive)
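The slash breakage described for this source is, in principle, reversible. The sketch below assumes the damage turned every `/` into `../` and nothing else; the helper name `restore_slashes` is hypothetical and this is not necessarily how Archive95 performs the repair. Under that assumption every slash in the damaged text is preceded by the two added dots, so removing them also recovers genuine `../` parent references.

```python
def restore_slashes(damaged: str) -> str:
    """Undo the assumed damage in which every '/' was rewritten as '../'."""
    # Each '/' in the damaged text carries two extra dots in front of it,
    # so collapsing '../' back to '/' inverts the change exactly.
    return damaged.replace("../", "/")


print(restore_slashes("Edu../index.html"))
print(restore_slashes("..../images../logo.gif"))
```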
Author: Uniqueway
Archive Date: ~04/1995
Publish Date: 10/1995
Description: The second in a series of software CDs for RISC OS, published in October 1995. It contains offline copies of numerous websites that were collected throughout the year.
Data Integrity: Links to content present on the CD-ROM have been modified to point within the filesystem; all other links have either been commented out or removed entirely. There is no explicit indication of which URL maps to which file, and although the structure of the CD-ROM is reflective of the source URLs, all file and directory names have been truncated and converted to uppercase, and all symbols have been turned into underscores.
Link: https://archive.org/details/cdrom-riscos-risc-disc-2
Statistics: 3,023 URLs, 0 orphans (3,023 total; 2.3% of entire archive)
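Because the name mangling on this disc is lossy, a candidate URL can only be tested against a file name, not derived from it. The following is a rough sketch of such a test. It assumes that "symbols" means any character other than an ASCII letter or digit, and it takes the truncation length as a parameter because the limit is not stated here; `cd_name` and `matches` are hypothetical helpers.

```python
import re


def cd_name(segment: str, max_len: int) -> str:
    """Mangle one URL path segment the way the disc is assumed to have done."""
    # Assumption: every non-alphanumeric character became an underscore.
    name = re.sub(r"[^A-Za-z0-9]", "_", segment).upper()
    return name[:max_len]


def matches(segment: str, cd_entry: str, max_len: int) -> bool:
    """Check whether a candidate URL segment could have produced a disc entry."""
    return cd_name(segment, max_len) == cd_entry


# The limit of 10 characters is used purely for illustration.
print(matches("home-page.html", "HOME_PAGE_", max_len=10))
```

A match only shows that the candidate is consistent with the file name; several different URLs can mangle to the same name.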
Author: Jamsa Press
Archive Date: ~04/1995
Publish Date: 06/1995
Description: A Yellow Pages-esque directory of webpages published in June 1995, bundled with a book of the same name. The CD-ROM contains the HTML code and a browser screenshot of every included page, captured 1-2 months prior.
Data Integrity: A very small handful of pages exhibit corruption issues or were not fully downloaded.
Link: https://archive.org/details/www-dir-cd
Statistics: 7,992 URLs, 0 orphans (7,992 total; 6.0% of entire archive)
Author: Carl Hanser Verlag
Archive Date: ~06/1995
Publish Date: 10/1995
Description: A CD-ROM published in Germany in October 1995, intended to provide those who were not yet online with a glimpse into the internet. It contains a wide assortment of offline web material collected during the middle months of 1995. Its contents were uploaded to an open directory by Axel-Tobias Schreiner, one of its co-authors, in 2015.
Data Integrity: All URLs have been modified to point within the CD-ROM's filesystem. Links to content unavailable on the CD-ROM are accompanied by links to the live web, with the originals pointing to an error page. Unavailable images point to a placeholder picture without any indication of the image's original URL. Form elements have been removed, and a footer has been added to each page denoting its original URL.
Link: https://cs.rit.edu/~ats/books/cd
Statistics: 27,363 URLs, 17 orphans (27,380 total; 20.6% of entire archive)
Author: HotWired
Archive Date: 11/1995
Publish Date: 07/2006
Description: An offline version of hotwired.com from November 1995 that was showcased at trade shows where internet connectivity was not possible. It was preserved in 2006 by Phil Gyford, a former Wired UK employee, who wrote a blog post about it.
Data Integrity: The implied URLs of some pages and files may not be entirely accurate. Some off-site links have been modified to point to offline versions, within which every link points back to the previous page.
Link: https://archive.gyford.com/1995/11/13/
Statistics: 1,440 URLs, 0 orphans (1,440 total; 1.1% of entire archive)
Author: Chip
Archive Date: ~11/1995
Publish Date: 01/1996
Description: A coverdisc bundled with a special issue of the German computer magazine Chip, published in January 1996. It features numerous webpages across a variety of genres, collected in October and November 1995.
Data Integrity: A base tag has been added to the top of each page, and the URLs of content present on the CD-ROM have been changed to their locally-available equivalents.
Link: https://archive.org/details/chipspecialbestwebsites_1_96
Statistics: 2,019 URLs, 0 orphans (2,019 total; 1.5% of entire archive)
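A base tag records the address that a page's relative links should be resolved against, which makes it a useful hint when reconstructing URLs. The sketch below uses Python's standard `html.parser` and `urllib.parse.urljoin` to resolve a relative link against the first base tag found in a page. It assumes the base tag carries the page's original URL and only helps for links still in their original relative form; `BaseFinder` and `original_url` are illustrative names, and the example URL is made up.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class BaseFinder(HTMLParser):
    """Records the href of the first <base> tag seen."""

    def __init__(self):
        super().__init__()
        self.base = None

    def handle_starttag(self, tag, attrs):
        if tag == "base" and self.base is None:
            self.base = dict(attrs).get("href")


def original_url(page_html: str, relative_link: str):
    """Resolve a relative link against the page's base tag, if it has one."""
    finder = BaseFinder()
    finder.feed(page_html)
    if finder.base is None:
        return None
    return urljoin(finder.base, relative_link)


page = '<html><head><base href="http://www.example.com/games/index.html"></head></html>'
print(original_url(page, "pics/logo.gif"))
```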
Author: Chip
Archive Date: 02/1996
Publish Date: 02/1996
Description: A coverdisc bundled with a special issue of the German computer magazine Chip, published in February 1996. It features primarily video game-related webpages, all collected the same month the CD-ROM was published.
Data Integrity: The URLs of content present on the CD-ROM have been changed to their locally-available equivalents.
Link: https://archive.org/details/chipspecialbestwebsites_2_96
Statistics: 1,208 URLs, 0 orphans (1,208 total; 0.9% of entire archive)
Author: PC Press
Archive Date: 03/1996
Publish Date: 03/1996
Description: A CD-ROM bundled with a special issue of PC Press magazine, published in Serbia in March 1996. Like Einblicke ins Internet, it was intended to provide an offline internet experience for those who were not connected. All of the included web material dates from the month the CD-ROM was published.
Data Integrity: A metadata element has been added to the top of each page by the downloader used to retrieve them. All URLs have been modified to point within the CD-ROM's filesystem, although there are large gaps of missing files that are referred to by existing ones. Each website present is accompanied by a list mapping each file to its respective URL, but many of these lists are either corrupted or missing, and the ones that are intact often exhibit serious issues.
Link: https://archive.org/details/pc-press-internet-cd
Statistics: 26,319 URLs, 812 orphans (27,131 total; 20.4% of entire archive)
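The split between URLs and orphans for a source like this one can be pictured as follows. This is only a sketch: it assumes a file-to-URL list has already been parsed into a dictionary (the on-disc list format is not described here), and it treats an entry as unusable when its URL lacks a recognised scheme or a host. The names `looks_valid` and `split_orphans` are hypothetical.

```python
from urllib.parse import urlsplit

# Assumption: only these schemes count as plausible for this era.
KNOWN_SCHEMES = {"http", "https", "ftp", "gopher"}


def looks_valid(url: str) -> bool:
    """Very loose plausibility check for a mapped URL."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False
    return parts.scheme in KNOWN_SCHEMES and bool(parts.netloc)


def split_orphans(files, url_map):
    """Separate files with a usable URL from those without one (orphans)."""
    mapped, orphans = {}, []
    for name in files:
        url = url_map.get(name)
        if url is not None and looks_valid(url):
            mapped[name] = url
        else:
            orphans.append(name)
    return mapped, orphans


files = ["a.htm", "b.htm", "c.htm"]
url_map = {"a.htm": "http://www.example.com/", "b.htm": "garbage"}
print(split_orphans(files, url_map))
```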
Author: NetControl
Archive Date: ~1996
Publish Date: 2003
Description: An online directory of archived pages provided as a supplement to NetControl's Greek-language "internet on paper". Although the directory is presented as one unified list, the pages appear to have been collected across two stints: first throughout 1996 into early 1997, and then again throughout 1998 into early 1999. This source corresponds to the earlier archives.
Data Integrity: Each page's markup has received significant alterations, including the removal of whitespace and the wrapping of text content. The title element of each page has been modified to indicate its archived nature, and a header message has been added that additionally centers the page. A script for serving ads has been injected into each page, but is no longer functional. All email addresses have been encrypted, and an additional script has been injected at the bottom of the page to decrypt them on the client-side. Page images and frames are organized into a flat structure with no sub-folders, making it impossible to determine their original URLs.
Link: https://netcontrol.net/english/
Statistics: 2,197 URLs, 19,487 orphans (21,684 total; 16.3% of entire archive)
Author: Amiga Plus
Archive Date: 08/1997
Publish Date: 10/1997
Description: A coverdisc included with the October/November 1997 issue of Amiga Plus, a German Amiga magazine. It contains extensive copies of many computer-related websites, collected in August 1997.
Data Integrity: Each HTML file is available in two versions: the original, unmodified file and a copy adapted to the CD-ROM's filesystem. Although the filesystem is reflective of the source URLs, all file and directory names have been converted to lowercase and truncated to 30 characters.
Link: https://archive.org/details/amiga-plus-extra-cd-5-97
Statistics: 12,879 URLs, 0 orphans (12,879 total; 9.7% of entire archive)
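Since lowercasing and truncation can make distinct URLs land on the same disc path, it helps to index candidate URLs by the path they would produce and flag any collisions. The sketch below assumes that the host name forms the top-level directory and that each path segment was lowercased and cut to 30 characters independently; the disc's actual layout may differ, and `cd_path` and `index_by_cd_path` are hypothetical helpers.

```python
from collections import defaultdict
from urllib.parse import urlsplit

MAX_LEN = 30  # truncation length stated for this disc


def cd_path(url: str) -> str:
    """Predict the disc path for a URL under the assumed layout."""
    parts = urlsplit(url)
    segments = [parts.netloc] + [s for s in parts.path.split("/") if s]
    return "/".join(s.lower()[:MAX_LEN] for s in segments)


def index_by_cd_path(urls):
    """Group candidate URLs by predicted disc path; lists longer than one are ambiguous."""
    index = defaultdict(list)
    for url in urls:
        index[cd_path(url)].append(url)
    return dict(index)


print(cd_path("http://www.Example.com/Docs/ReadMe.html"))
```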
Author: Packard Bell
Archive Date: 07/1998
Publish Date: 1998
Description: According to Popzazzle, this CD-ROM was distributed by Packard Bell to (European?) customers who purchased their PCs during the 1998 Christmas season. It contains offline copies of a variety of websites in multiple languages, collected in July 1998.
Data Integrity: All URLs have been modified to point within the CD-ROM's filesystem, with the original URLs being moved to their own HTML attribute. Links to content unavailable on the CD-ROM have been pointed to customized error pages linking to the live web.
Link: https://archive.org/details/internet-on-a-cd
Statistics: 11,291 URLs, 231 orphans (11,522 total; 8.6% of entire archive)
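When the original URL survives in a separate attribute, reverting a link is largely mechanical. The following is a simplified sketch using regular expressions. The attribute name `orighref` is a placeholder, since the real name is not given here, and the code assumes double-quoted attribute values and at most one `href` or `src` per tag; a production tool would want a proper HTML parser.

```python
import re

ORIG_ATTR = "orighref"  # hypothetical attribute name, not taken from the disc


def restore_tag(tag: str, orig_attr: str = ORIG_ATTR) -> str:
    """Move the preserved URL back into href/src and drop the extra attribute."""
    pattern = rf'\s{re.escape(orig_attr)}="([^"]*)"'
    orig = re.search(pattern, tag, flags=re.IGNORECASE)
    if orig is None:
        return tag  # nothing to restore in this tag
    url = orig.group(1)
    tag = tag[:orig.start()] + tag[orig.end():]
    return re.sub(
        r'\b(href|src)="[^"]*"',
        lambda m: f'{m.group(1)}="{url}"',
        tag,
        count=1,
        flags=re.IGNORECASE,
    )


def restore_page(html: str, orig_attr: str = ORIG_ATTR) -> str:
    """Apply restore_tag to every tag in a page."""
    return re.sub(r"<[^>]+>", lambda m: restore_tag(m.group(0), orig_attr), html)


sample = '<a href="local/page.htm" orighref="http://www.example.com/page.html">x</a>'
print(restore_page(sample))
```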
Author: NetControl
Archive Date: ~1998
Publish Date: 2003
Description: An online directory of archived pages provided as a supplement to NetControl's Greek-language "internet on paper". Although the directory is presented as one unified list, the pages appear to have been collected across two stints: first throughout 1996 into early 1997, and then again throughout 1998 into early 1999. This source corresponds to the later archives.
Data Integrity: Each page's markup has received significant alterations, including the removal of whitespace and the wrapping of text content. The title element of each page has been modified to indicate its archived nature, and a header message has been added that additionally centers the page. A script for serving ads has been injected into each page, but is no longer functional. All email addresses have been encrypted, and an additional script has been injected at the bottom of the page to decrypt them on the client-side. Page images and frames are organized into a flat structure with no sub-folders, making it impossible to determine their original URLs.
Link: https://netcontrol.net/english/
Statistics: 1,203 URLs, 10,735 orphans (11,938 total; 9.0% of entire archive)