Design & Architecture of an Astrophysics Information System

http://guinan.gsfc.nasa.gov/AIS.html (World Wide Web Directory, 06/1995)

Alan Richmond, Hughes STX, NASA/Goddard Space Flight Center-HEASARC.

Nick White, NASA/Goddard Space Flight Center-HEASARC.

This is a preprint and is subject to change.

Abstract

The High Energy Astrophysics Science Archive Research Center, HEASARC, at NASA/GSFC can be accessed through the WWW. The HEASARC server allows access to a vast selection of data from X-ray and Gamma-ray astronomy missions including ROSAT, ASCA, Compton GRO, Einstein and EXOSAT. There is a beta release astronomical forms interface to the HEASARC database management system, which allows browsing of various astronomical catalogs and archival datafiles. This report describes our aims & objectives, experiences, & conclusions in updating access methods for HEASARC. It also outlines the technical solution we adopted.

Introduction
HEASARC
WWW
Development
StarTrax
Servers
CGI
Structure
Exposition
Summary
References

Introduction

There was a queer momentary sensation of being turned inside out. It lasted an instant and Baley knew it was a jump, that oddly incomprehensible, almost mystical, momentary transition through hyperspace that transferred a ship and all it contained from one point in space to another, light years away. Another lapse of time and another Jump, still another lapse, still another Jump.
Isaac Asimov, The Naked Sun, Ballantine, 1957.

Science fiction writers solved the problem of cosmic distance by utilizing the mathematical property of hyperspace that all points are interconnected. Theodore Nelson applied this concept to text, coining the term hypertext. Tim Berners-Lee of CERN implemented a variant of that concept on the Internet, as the World-Wide Web. And in 1993, Marc Andreessen and his colleagues at the NCSA astonished the Internet community with a graphical (GUI) browser called Mosaic, which finally let the full potential of hypertext blaze forth, like a supernova.

As we have learnt from previous revolutions in technology, especially in the fields of transportation and communication, there will be profound challenges to our abilities to harness & benefit from the powers opening up before us. The first & most obvious of those is to do with navigation. How do you locate the information you want? A second has to do with quality, or pollution. Cyberspace is a new frontier (Richmond 1993). The lawyers haven't got here yet. There are many links (e.g. from What's New) that look interesting, but end up either not living up to snuff, or in some cases, just not living at all.

We've taken great pains to ensure that when you find our WWW servers at HEASARC, then, if you're looking for information from, or about, HEA missions, such as ROSAT, ASCA, etc., you'll not be let down. We hope you'll find it a rewarding experience, and the rest of this paper addresses the technical issues underlying that goal.

HEASARC

The HEASARC provides an on-line service to allow remote login to the HEASARC data holdings & to data analysis software. The emphasis is on browsing of the data, such that a user can make a quick-look assessment of its worth before exporting it -- or part of it -- to his or her home site. Rather than invent yet another on-line system, the HEASARC adopted an existing system -- the one developed for the EXOSAT mission by the European Space Agency (ESA). The advantage of this system is that it provides the capability not only to access the data, but also to display & analyze it remotely.

At the heart of the system is the BROWSE program, a command-driven environment that allows a user to search one or more database tables by coordinates, name, object class, or any other valid parameter combination. The user then can display the selected data, or run analysis software on it.

Some time ago, it was decided to replace the old BROWSE CLI with a GUI. Our first prototype used a commercial portability toolkit; but then, in 1993, we observed the massive trend in the astronomical community towards the use of the World-Wide Web (WWW), & tools such as NCSA's Mosaic, Gopher, & WAIS.

We determined that our objectives could be met by using the facilities of the WWW, instead of the commercial product that we had been using. We initiated StarTrax-NGB as a WWW node to provide uniform access to multiple services of the HEASARC, e.g. bulletins, catalogs, & proposal & analysis tools of BROWSE & ROSAT MIPS.

WWW

The World-Wide Web, a distributed hypertext-based information system developed at CERN, is a globally interconnected network of hypermedia information comprising the Internet, a protocol for transmission of hypermedia documents, & a set of servers that respond to requests from browsers (or clients) for those documents.

Hypermedia (or, more loosely, hypertext) documents contain hyperlinks to other documents, anywhere on the Web. A hyperlink is a segment of text, or an inline image that refers to another document (text, sound, image, movie) elsewhere on the Web. When a hyperlink is selected, the referenced document is fetched from the Internet, & is displayed appropriately.

A WWW browser allows navigation through hypertext links defined as Uniform Resource Locators (URLs). The user simply specifies the link to be followed & a new document is retrieved & presented. WWW provides gateway access to several of the other systems, including Z39.50 (Fullton 1993).
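
To make the anatomy of a URL concrete, the following is a minimal sketch in Python (our choice for illustration only, not a language used in this project). It splits the address of this document into the parts a browser needs: the protocol, the host running the server, & the document requested from it.

```python
from urllib.parse import urlsplit

# The URL of this document, as given at the top of the page.
parts = urlsplit("http://guinan.gsfc.nasa.gov/AIS.html")

print(parts.scheme)   # protocol used to fetch the document
print(parts.netloc)   # host running the server
print(parts.path)     # document requested from that server
```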

NCSA Mosaic is a multi-platform GUI hypermedia browser that helps you to find, fetch, & display documents & data from the Internet. Although there have been browsers for the Web since 1991, Mosaic shot rapidly to prominence because it seamlessly integrates a great deal of useful functionality in a very pleasant & easy-to-use user interface. Incorporating Web, gopher, WAIS, NNTP, & FTP protocols, NCSA Mosaic talks natively to these servers, and can gateway to others (Hardin 1993).

Development

The major difference between the `classical' and the WWW modes of development is that, because the WWW provides a substantial pre-existing infrastructure supporting a client-server architecture on the Internet, one can deliver application functionality at a much faster rate -- say, an order of magnitude faster (to be quantified later).

In the `classical' mode -- which we started this project in -- the developer(s) spend a great deal of time wallowing in relatively low level code. We haven't yet achieved the goal of `reusability' in spite of the promises of modern software engineering methodologies.

In WWW mode, using HTML (HyperText Markup Language) (& ideally, Perl), since you build on that infrastructure, you can deliver functionality very quickly. This not only delivers the promises of rapid prototyping (throw one away), but also of rapid development.

We divided the development between the two of us: one is a scientist with a positive interest in this development; the other is a software developer with technical competence in the WWW. The work was very much a synergy: the scientist would propose a design, and the software developer would explain to him why it was impossible to implement. A little while later (depending on how impossible) the developer would invite the scientist to try out his design; the scientist would then propose impossible improvements...

The iteration cycle time was often of the order of minutes, rather than hours or days -- as classically. Not only was this due to the pre-existing WWW functionality, but also because we chose to use Perl, instead of C. Perl is ideally suited to this kind of work, because it too provides a great deal of ready-made high-level functionality. For example, we converted some 30 lines of C code, for decoding URLs, into 3 lines of Perl. This is certainly an order of magnitude improvement...
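
To illustrate how compact URL decoding becomes in a high-level language, here is a minimal sketch in Python rather than the Perl we actually used. The function name decode_url_component is hypothetical, and the sketch assumes single-byte (ASCII or Latin-1) escapes only.

```python
import re

def decode_url_component(text):
    """Decode a form-encoded string: '+' means space, %XX is a hex-escaped byte."""
    text = text.replace("+", " ")
    # Replace each %XX escape with the character whose code is the hex value XX.
    return re.sub(r"%([0-9A-Fa-f]{2})", lambda m: chr(int(m.group(1), 16)), text)

print(decode_url_component("Cyg+X-1%2C+3C+273"))  # prints "Cyg X-1, 3C 273"
```

In present-day Python the standard library function urllib.parse.unquote_plus does the same job, including multi-byte encodings.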

StarTrax

The Home Page

The home page has 2 main features: a clickable image map, and a list of the HEASARC services available from the StarTrax-NGB WWW servers. This list is made up of hypertext links, or hyperlinks, which means that clicking the mouse on one of the underlined phrases activates a new page associated with the corresponding HEASARC service.

xpaint was used to create the composite image that is now most of the home page. That composite image was ISMAP'ed so that the grid areas could link to other parts of the server. The image map displays a similar list of HEASARC services, spread out on a pictorial background. Clicking on or near a phrase activates the associated page. Some of the main features are:

Hypertext Help & Documentation.

We have implemented Help as hypertext. For example, see the Introduction page giving basic instruction, overview, & installation instructions.

Feedback

The user can enter comments about StarTrax-NGB, which will be e-mailed to the HEASARC.

User Registration

An HTML form on the client side collects user information. A program on the server side processes the data, & answers the client by echoing what was entered. This is similar to the Feedback process.

Catalog Searches

Search by Name

Searches the HEASARC catalogs for named objects. The user is able to specify the equinox. It will use a Name Resolver, which is primarily a function to translate the name of an astronomical object to its coordinates.

Cone Search

This allows the user to input RA & Dec in either degrees or hms, plus the outer cone radius in arc min, arc sec, or degrees. The user is able to specify the equinox.
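
The geometry behind a cone search can be sketched as follows, in Python for illustration. This is not the BROWSE implementation: all function names are hypothetical, the unit labels 'deg', 'arcmin' & 'arcsec' are our own, and equinox conversion is left out entirely.

```python
import math

def hms_to_degrees(hours, minutes, seconds):
    """Convert right ascension from hours/minutes/seconds to degrees (1 h = 15 deg)."""
    return 15.0 * (hours + minutes / 60.0 + seconds / 3600.0)

def radius_to_degrees(value, unit):
    """Convert a cone radius given in 'deg', 'arcmin' or 'arcsec' to degrees."""
    factors = {"deg": 1.0, "arcmin": 1.0 / 60.0, "arcsec": 1.0 / 3600.0}
    return value * factors[unit]

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine form); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # min() guards against rounding pushing sqrt(a) fractionally above 1.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))

def in_cone(ra, dec, centre_ra, centre_dec, radius_deg):
    """True if the position (ra, dec) lies within radius_deg of the cone centre."""
    return angular_separation_deg(ra, dec, centre_ra, centre_dec) <= radius_deg
```

A catalog record is selected when in_cone is true for its position, with the user's radius first converted by radius_to_degrees. The haversine form is chosen here because it stays accurate for the small separations typical of such searches.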

Servers

A daemon is a program that constantly monitors a network port, waiting for a particular signal so that it may carry out its functions. A server is the daemon together with a collection of supporting programs.
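
As a toy illustration of a daemon in this sense, the following Python sketch listens on a port and answers each line it receives. It is not any of the HTTP daemons discussed here; the names EchoHandler & start_daemon are hypothetical, and port 0 simply asks the operating system for any free port.

```python
import socketserver
import threading

class EchoHandler(socketserver.StreamRequestHandler):
    """Handle one connection: read a single line and send a reply."""
    def handle(self):
        line = self.rfile.readline().strip()
        self.wfile.write(b"You sent: " + line + b"\n")

def start_daemon(host="127.0.0.1", port=0):
    """Start listening in a background thread and return the server object."""
    server = socketserver.TCPServer((host, port), EchoHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

The port actually in use is available as server.server_address[1]; calling server.shutdown() followed by server.server_close() stops the daemon.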

The daemon chosen determines the functionality of the server. It basically controls the flow of information between itself and the client program accessing information on the server. There is a variety of ways to structure & present information resources, & the selection of one or more of these depends on several factors; e.g. natural structure of the resource; user's expectations; available technology. Until quite recently it was usually easiest to impose hierarchical structure. Selecting one or more daemons is fundamental to establishing a reliable information service. None of them yet fulfills all requirements -- they are all still evolving -- so we minimised the effect of this by using a combination of daemons. There are significant differences between them that have to be well understood to successfully maintain a robust & useful information service. Consideration also needs to be given to anticipated longer-term maintenance & support from the developers of these daemons.

After installing & testing 3 of the major HTTP daemons (NCSA's httpd, Plexus, GN) we settled on NCSA's httpd, supplemented with some Perl programs. The information supplier can structure resources into any arbitrary `network topology'. The httpd daemon will also be supplemented with Gopher & WAIS daemons. This will greatly extend the search capabilities of the server while providing additional network communications back ends.

CGI

CGI is an interface for running external programs, or gateways, under an information server. Currently, the supported information servers are HTTP servers. Gateways are programs which handle information requests and return the appropriate document or generate a document on the fly. With CGI, the server can serve information which is not in a form readable by the client (such as an SQL database), and act as a gateway between the two to produce something which clients can use.

Gateways can be used for a variety of purposes, the most common being the handling of ISINDEX and FORM requests for HTTP. Gateway programs, or scripts, are executable programs which can be run by themselves. They have been made external programs in order to allow them to run under various (possibly very different) information servers interchangeably.
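
A gateway program of this kind can be sketched in a few lines. The example below is in Python for illustration (our own gateways are Perl programs), and the form field name object is a hypothetical one chosen for the sketch. It reads the query from the QUERY_STRING environment variable set by the server, and writes a header, a blank line, & an HTML document generated on the fly.

```python
import html
import os
import sys
from urllib.parse import parse_qs

def build_response(environ):
    """Return a complete CGI response (header, blank line, body) for a GET query."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    # 'object' is a hypothetical form field name used only for this sketch.
    name = params.get("object", ["(none given)"])[0]
    # Escape the user's text so it cannot inject markup into the reply.
    body = "<html><body><p>You asked about: %s</p></body></html>" % html.escape(name)
    return "Content-type: text/html\r\n\r\n" + body

if __name__ == "__main__":
    # When run by the server, the environment carries the request details.
    sys.stdout.write(build_response(os.environ))
```

Because the program only reads its environment & writes to standard output, it can be run by hand for testing, which is what allows the same gateway to sit under different information servers.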

Structure

Click on any blue square to get documentation directly from the source code.

Exposition

Querying the HEASARC catalogs and retrieving associated data products is a multi-phase process:

The user selects an observatory name displayed on a browser.
This sends a URL to the HEASARC WWW server, which dispatches it to a Perl program: Squery.pl. Replies to browsers are made as MIME types (Multipurpose Internet Mail Extensions), typically text/html, which is hypertext markup language.
The corresponding search form is generated (cone, PI/object name).
A HEASARC catalog is queried for records & data products.
The user may select a target having data products from the list.
The user may select an associated data product.
The user can click on one of these listed data products, to retrieve it. This invokes ftp directly on the machine holding the data products. The client interacts directly with the server at the data publishing site to retrieve images and relevant textual data. This need not be the same site as the server.
The data product is delivered to the user.
The client side has a mechanism for specifying which `viewer' to use for a given data-type, e.g. xv for GIFs. These have to be MIME types (Multipurpose Internet Mail Extensions) or sub-types (e.g. x-fits). The browser consults the mime.types configuration file to determine what action to take with the data. For example, it may invoke a `viewer' such as saoimage.
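
The last step above, choosing a viewer from the data type, can be sketched as a pair of lookup tables. This Python sketch is illustrative only and is not how any particular browser is implemented. The full type name image/x-fits is an assumption (the text names only the sub-type x-fits), and the function names are hypothetical.

```python
import os

# Extension -> MIME type, in the spirit of a mime.types file.
# 'image/x-fits' is an assumption; the text only names the sub-type x-fits.
MIME_TYPES = {".gif": "image/gif", ".fits": "image/x-fits", ".html": "text/html"}

# MIME type -> external viewer, using the two viewers named in the text.
VIEWERS = {"image/gif": "xv", "image/x-fits": "saoimage"}

def mime_type_for(filename):
    """Guess the MIME type from the file extension, with a generic fallback."""
    ext = os.path.splitext(filename)[1].lower()
    return MIME_TYPES.get(ext, "application/octet-stream")

def viewer_command(filename):
    """Return the argument list that would launch a viewer, or None if no viewer is configured."""
    viewer = VIEWERS.get(mime_type_for(filename))
    return [viewer, filename] if viewer else None
```

A type with no configured viewer, such as text/html, returns None here, standing for the cases a browser displays itself or offers to save.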

Summary

We have found the WWW method of development to be remarkably effective, when applied to client-server style information systems. The greatest advantage for the developer accrues from the high level of functionality built into the browsers & servers. The price to pay is some loss of control & flexibility; you cannot do everything you might wish. In particular, the browser side is relatively dumb -- it's currently not easy to add much functionality there, in contrast to the server side through the CGI. In spite of these sometimes severe disadvantages, we are sure the WWW is a very effective platform for rapid information systems development in astronomy & astrophysics.

References

Fullton, J., 1993, Distributed Astronomical Data Archives, ADASS proceedings.
Hardin, J., 1993, Human Collaboration Technologies for the Internet -- NCSA Mosaic & NCSA Collage, ADASS proceedings.
Richmond, A., 1993, Towards an Astrophysical Cyberspace: The Evolution of User Interfaces, ADASS proceedings.