Data Storage Implementations
Databases: Overview
A set of seven or so databases specifies the structure and task of WebSubmit.
Databases are used because they are easier to update and replace than code.
In essence we will write a generalized meta-WebSubmit and implement particular
versions for particular problem domains and compute systems by preparing
alternate databases. The databases presently utilized in WebSubmit are
-
Master database
-
Authentication database
Databases are most easily maintained using the GUI manager provided with
WebSubmit. See the document describing
this manager for further details.
Structure and Implementation
Generic Database Structure
Each database takes three different storage forms at different
points in its life cycle.
-
As an ASCII text file (*.db) which
consists of individual records, each of which is a colon-delimited collection
of fields. This is the main external form used to preserve the database
between invocations of the CGI scripts. An attribute list is included
within the database, so the data is completely self-contained.
-
As a Tcl array whose indices are a combination of the database keys and
attributes. This is the internal representation used whenever database
data must be accessed or manipulated within a script.
-
As a serialized version of the Tcl array (see the section on Object Serialization).
This is another external form of representation used as a performance-enhancing
tool.
Consider the following ASCII version of a database that contains names,
ages, and identification numbers for people in an organization.
::DB_ATTRIBUTES:: key:first key:last age idNumber
john : doe : 35 : 0
jane : doe : 42 : 1
The key attributes are first and last, and the non-key attributes are
age and idNumber. Once this database is read by WebSubmit, it is
stored internally in an array called DB
as follows:
DB(john_doe,age) = 35
DB(john_doe,idNumber) = 0
DB(jane_doe,age) = 42
DB(jane_doe,idNumber) = 1
The keys are concatenated with underscore to form the complete key used
for storage; the key and attribute are concatenated with comma to form
the index used to access individual attributes for database records.
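The indexing convention above can be sketched in a few lines of Tcl. This is a minimal illustration, not WebSubmit's own code; the helper name dbIndex is hypothetical.

```tcl
# Hypothetical helper (not part of WebSubmit): build the array index for one
# attribute of one record from its key values and the attribute name.
proc dbIndex {keyValues attr} {
    # Keys are joined with "_"; the key and attribute are joined with ",".
    return "[join $keyValues _],$attr"
}

# Fill a DB array with the two sample records.
set DB([dbIndex {john doe} age])      35
set DB([dbIndex {john doe} idNumber]) 0
set DB([dbIndex {jane doe} age])      42
set DB([dbIndex {jane doe} idNumber]) 1

# Read one attribute back using the literal index form.
puts "jane doe is $DB(jane_doe,age) years old"
```

Because the index is an ordinary string, any script that knows the key values and the attribute name can reach a field without scanning the records.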
Specific Form of ASCII Database Files
All ASCII WebSubmit databases possess the same basic structure. Each
database is a simple text file that contains
not only the records of the database, but comments and a header specifying
the list of key and attribute names. In this way, the database is self-contained
and does not require reference to other objects.
Comments
Comment lines begin with # and are ignored by the database parser.
Blank lines are also ignored by the parser.
Attributes
The attributes for each database are specified on a single line within
the database file. For ease of reading, it is suggested (although not necessary)
that this line be placed before all records within the database. The line
to specify attributes has the form
::DB_ATTRIBUTES:: a_1 a_2 a_3 ... a_n
where a_i represents the name
given to attribute i. If an attribute (or group of attributes) is
to be used as a key (unique identifier) for the record, then it must be
preceded by the modifier key:.
Otherwise, the name of an
attribute can be any ASCII string that does not contain white-space.
Avoiding the use of non-standard or control characters for attributes (or
values, for that matter) is recommended, since this may create problems
with the parser.
Simple names should be chosen for attributes. For example, attributes
for a birth record database might look like
::DB_ATTRIBUTES:: key:SSN name age dob address
where key:SSN marks the person's
Social Security Number as the key for each record. As indicated, it is possible to
have multiple key fields in a database. The actual key that results
(from the standpoint of WebSubmit internal data structures) is constructed
by concatenating the several key values, separating them with an underscore.
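As a rough sketch of how an attribute line could be interpreted, the following Tcl procedure separates key attributes from non-key attributes. The procedure name parseAttributeLine and its return shape (a two-element list) are assumptions made for illustration; WebSubmit's actual parser may differ.

```tcl
# Hypothetical sketch: split a ::DB_ATTRIBUTES:: line into key attribute
# names and non-key attribute names.
proc parseAttributeLine {line} {
    # Break the line into white-space separated words.
    set words [regexp -all -inline {\S+} $line]
    if {[lindex $words 0] ne "::DB_ATTRIBUTES::"} {
        error "not an attribute line: $line"
    }
    set keys {}
    set attrs {}
    foreach word [lrange $words 1 end] {
        if {[string match key:* $word]} {
            # Drop the 4-character "key:" modifier to get the bare name.
            lappend keys [string range $word 4 end]
        } else {
            lappend attrs $word
        }
    }
    return [list $keys $attrs]
}

set parsed [parseAttributeLine "::DB_ATTRIBUTES:: key:SSN name age dob address"]
puts "keys:       [lindex $parsed 0]"
puts "attributes: [lindex $parsed 1]"
```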
Data Records
Each entry within the database is referred to hereafter as a data
record or simply a record. Each record is distinguished by a
key or set of keys that are unique for that record. If a database has multiple
records with identical keys, a warning will be issued by the database parser.
The structure of each record is simple: a colon-separated list of attribute
values. Each attribute value must lie within a specified domain (see individual
specifications below). Attribute values cannot contain colon (:) characters,
since this
character acts as the internal field separator for the database.
If there is no value for a given attribute in a given record, then the
value must be specified as *, as opposed to just leaving the field empty.
The format for a given record is very important, because the database parser
allows for records that span multiple lines. In order to achieve this flexibility,
the structure within each record must conform to the following guidelines:
-
Each new record entry must begin with optional white-space, followed by
the ASCII text value of the first attribute.
-
A continuation line (i.e., the continuation of a record from a previous
line) is indicated by having a line that begins with optional white-space
followed by a pipe character (|).
-
A record may span at most N lines, where N is the total number of attributes
for the database (including keys).
-
Formatting of lines for an individual record does not need to be consistent.
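The record rules can be illustrated with a simplified Tcl sketch. It handles only records written on a single line and leaves continuation lines out entirely; the procedure name parseRecords, its arguments, and the wording of the warnings are assumptions rather than WebSubmit's real interface. A * value is stored unchanged.

```tcl
# Hypothetical sketch (not WebSubmit's parser): load single-line records into
# the array named by arrayVar and return a list of warning messages.
proc parseRecords {text keyCount attrNames arrayVar} {
    upvar 1 $arrayVar db
    set warnings {}
    set expected [expr {$keyCount + [llength $attrNames]}]
    foreach line [split $text \n] {
        set line [string trim $line]
        # Skip blank lines, comments, and the attribute header.
        if {$line eq "" || [string index $line 0] eq "#"
                || [string match ::DB_ATTRIBUTES::* $line]} {
            continue
        }
        # Split on the colon separator and trim surrounding white-space.
        set fields {}
        foreach field [split $line :] {
            lappend fields [string trim $field]
        }
        if {[llength $fields] != $expected} {
            lappend warnings "wrong field count: $line"
            continue
        }
        # Join the key values with an underscore to form the record key.
        set key [join [lrange $fields 0 [expr {$keyCount - 1}]] _]
        if {[info exists seen($key)]} {
            lappend warnings "duplicate key: $key"
        }
        set seen($key) 1
        foreach attr $attrNames value [lrange $fields $keyCount end] {
            set db($key,$attr) $value
        }
    }
    return $warnings
}

# Illustrative data only: the second record has no idNumber, written as *.
set text {
# People database
john : doe : 35 : 0
jane : doe : 42 : *
}
set warnings [parseRecords $text 2 {age idNumber} DB]
puts "warnings: [llength $warnings]"
puts "jane's idNumber field: $DB(jane_doe,idNumber)"
```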
Sample Database
The following will serve as an example of a simple database that contains
all of the features mentioned in the description above. A comment is given
above each relevant entry to indicate its purpose. The database is an employee
telephone database for a small company. A numeric identifier (index) in conjunction
with a division (group) acts as a composite
key, and there are three additional attributes (name,
extension, office).
# The following is a sample database for an employee
# telephone list
# Attributes for the database
::DB_ATTRIBUTES:: key:index key:group name extension office
##################
# Database records
##################
# A simple record
0001 : adm : Ralph Warren : x5893 : 112 Admin
# A record that spans two lines with initial white-space
    0001 : res : Jane Doe
    : x4120 : 356 Research
# Another simple record
0002 : res : Amir Gupta : x8473 : B-225 Research
# A three-line record with variable formatting
0002 : adm : Pamela Wen
: x2991
    : 820 Admin
In practice, the lines would probably not be split as they are in the
above example. This was merely done for illustrative purposes. Also, the
number of comments in this database is probably excessive and unnecessary,
since individual records will rarely need comment. It is recommended, however,
that some explanatory information about the purpose of the database be
placed at the top.
Database Specifications
Master Database Specification
The master page (the main WebSubmit page) is described by the master database
and built by the master CGI script. The master database specifies the layout
of this page by indicating the hierarchy of modules that the page should
reflect.
Key Attributes
moduleName
Ordering
The database is semi-ordered: structure is imposed by the modulePath
attribute, but order at the same level of modulePath
(same pathname header) is defined by the order of the records in the database.
Attributes and Domains
-
moduleName:
Key attribute, unique for each record, which specifies a name for the script
or container in the record
-
Domain: Standard
identifiers
-
auto:
Used to indicate that this is an auto-indexed database. There is
no corresponding field for this attribute.
-
Domain: NONE
(no corresponding field value)
-
host: The hostname to which this application
corresponds. The hostname is primarily used as a means for accessing
the proper CGI scripts in the host's module hierarchy (contained in $wsRootDir/modules/$hostname).
Hostnames for containers are meaningless, although they can be helpful
from an organizational standpoint.
-
title:
The title for this script or container, as it appears on the master page
-
state:
ON or OFF, depending on whether the module or container for this record
should appear on the master page. Note that if a container is
turned OFF, then none of the children
inside this container will be visible on the master page.
-
type:
CONTAINER or SCRIPT,
depending on whether the record corresponds to a container for other modules
or to an actual script that is to be executed. Scripts generate hyperlinks
on the master page that point to the appropriate CGI script. Containers
just appear as text on the master page.
-
Domain: {CONTAINER,
SCRIPT}
-
build:
MANUAL or AUTO.
MANUAL indicates a pre-existing
CGI script, AUTO corresponds to
a CGI script that is to be generated automatically from a forms database.
-
index:
An index that indicates the relative position of the script or container
in its parent container. Indices run from 1 to the number of scripts
or containers inside a given parent container. Numbering should be
restarted for each set of children corresponding to a given container.
-
Domain: Positive
integers
-
defaultMode:
BASIC or ADVANCED.
This attribute only has meaning for CGI scripts that have an associated
modality.
-
Domain: {BASIC,
ADVANCED}
-
sTitle:
A short title for the module corresponding to this record. This short
title will appear in the toolbar of the main WebSubmit page to allow rapid
linking to modules. This attribute has no meaning for containers
and should be given a value of * for these.
-
modulePath:
A /-separated collection of moduleNames
that establishes the relationships between containers and their children.
This looks like a file pathname, and is used as a way to generate hierarchically-nested
HTML lists that correspond to the containers and scripts.
-
Domain: Valid
pathname (key1/key2/.../keyn)
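To illustrate how a modulePath value could drive nested list generation, the sketch below derives a nesting depth and a parent path from it. It assumes that the last path component is the record's own moduleName, which the specification above does not state explicitly; the procedure name describePath and the sample paths are hypothetical.

```tcl
# Hypothetical sketch: derive nesting information from a modulePath value.
# Assumption: the final component names the record itself.
proc describePath {modulePath} {
    set parts  [split $modulePath /]
    # Depth 0 means the entry sits at the top level of the master page.
    set depth  [expr {[llength $parts] - 1}]
    # Everything before the last component is the parent container's path.
    set parent [join [lrange $parts 0 end-1] /]
    return [list $depth $parent [lindex $parts end]]
}

# Print an indented outline for a few made-up paths.
foreach path {tools tools/solvers tools/solvers/fft} {
    set info [describePath $path]
    puts "[string repeat {  } [lindex $info 0]][lindex $info 2]"
}
```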
Authentication Database Specification
The authentication database contains information about valid certificate
issuers, administrative users, and regular users of the WebSubmit system.
Each user has a unique WebSubmit identification number and a state that
determines whether the user is currently being granted access to WebSubmit
facilities. This database also contains login name information for each
remote host that is acting as a compute system. In this sense, it possesses
a variable length attribute list that indicates all compute systems in
the current WebSubmit network.
Key Attributes
wsID
Ordering
Not ordered.
Attributes and Domains
-
wsID: The WebSubmit
user ID.
-
Domain: Valid user IDs (format varies
from site to site).
-
userType: Privileges for user
-
userDir: Directory for user session and configuration information
-
Domain: Valid subdirectories of $wsRootDir/user
-
userName: The full
name of the user corresponding to wsID.
-
Email: Email address for
user
-
Domain: Valid e-mail addresses (fully-qualified
hostnames only)
-
status: The status
of the current user.
-
Domain: {active, inactive}.
-
hostNames (variable
length): The user's login name on hostName.
-
Domain: Valid login names.
Additional Notes
A word of explanation is merited at this point, since the attributes
in this database are somewhat different from others. Since there will,
in general, be a variable number of WebSubmit compute systems, there will
also be a variable number of attributes in this database. For each
compute system, there will be a hostName attribute, with the
value of this attribute corresponding to the login name of userName
on hostName. The following sample record has wsID
ws000, userName John Q. Public,
Email jqp@random.site.gov,
status active, and remoteHost(login)
pairs danube.nist.gov(jqp),
tiber.nist.gov(john), and granta.nist.gov(johnqp):
::DB_ATTRIBUTES:: key:wsID userName Email status danube.nist.gov granta.nist.gov tiber.nist.gov
ws000 : John Q. Public : jqp@random.site.gov : active : jqp : johnqp : john
This enables the WebSubmit system to perform remote actions on compute
systems via the secure scp and ssh protocols. More information about
authentication in WebSubmit can be found in the section on Authentication.
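Once such a record is held in a Tcl array, the login name for a given compute system is a single array lookup. The sketch below is illustrative only: the array name AUTH, the procedure loginFor, and the decision to refuse users whose status is not active are assumptions, not documented WebSubmit behavior.

```tcl
# Sample record from the text, stored with "wsID,attribute" indices.
array set AUTH {
    ws000,userName        {John Q. Public}
    ws000,Email           jqp@random.site.gov
    ws000,status          active
    ws000,danube.nist.gov jqp
    ws000,tiber.nist.gov  john
    ws000,granta.nist.gov johnqp
}

# Hypothetical helper: return the user's login name on a compute system.
proc loginFor {arrayVar wsID host} {
    upvar 1 $arrayVar db
    if {![info exists db($wsID,status)] || $db($wsID,status) ne "active"} {
        error "user $wsID is unknown or inactive"
    }
    if {![info exists db($wsID,$host)]} {
        error "no login recorded for $wsID on $host"
    }
    return $db($wsID,$host)
}

puts [loginFor AUTH ws000 tiber.nist.gov]
```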
Object Serialization
The process of reading databases can be time-consuming, and performance
is a consideration when working with CGI applications. In an effort
to reduce the overhead associated with reading databases, a method of object
serialization (similar in spirit to that done in Java) was adopted.
A serialized object is essentially a representation of an internal Tcl
data structure using a series of Tcl commands. Tcl provides a mechanism
for loading Tcl code from within a Tcl script (via the source
command). Hence, a serialized object can fill one or more Tcl variables
or data structures simply by invoking the source
command. For example, after a database is read, all of its data can
be serialized; the next time the database needs to be read, the serialized
version is loaded via source rather
than reading the database. The serialized version is loaded only
if it is newer than the actual database it represents; in this way, changes
to the true database are reflected properly.
One important note about object serialization: sourcing arbitrary Tcl files
can be dangerous, since these files could contain commands damaging to the system.
For this reason, all serialized files are sourced within a safe Tcl
interpreter. The data from this interpreter is then passed
into the main interpreter, provided no problems were encountered.
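The freshness check and the safe evaluation step might look roughly like the following. This is a sketch under stated assumptions, not WebSubmit's implementation: the procedure names are hypothetical, the serialized text is passed in as a string rather than read from a file, and the array is copied out with array get.

```tcl
# Hypothetical helper: true when the serialized file exists and is newer than
# the ASCII database it was generated from.
proc serializedIsFresh {dbFile serFile} {
    return [expr {[file exists $serFile]
            && [file mtime $serFile] > [file mtime $dbFile]}]
}

# Hypothetical helper: evaluate serialized text in a safe interpreter, then
# copy the named array back to the caller as a flat index/value list.
proc loadSerialized {script arrayName} {
    set child [interp create -safe]
    # A trailing "return" in the script yields code 2, which is accepted.
    set code [catch {interp eval $child $script} message]
    if {$code != 0 && $code != 2} {
        interp delete $child
        error "serialized data rejected: $message"
    }
    set data [interp eval $child [list array get $arrayName]]
    interp delete $child
    return $data
}

# Illustrative serialized text for two records.
set script {
    namespace eval webSubmit::foo {
        set DB(john_doe,age) 35
        set DB(jane_doe,age) 42
    }
}
array set DB [loadSerialized $script ::webSubmit::foo::DB]
puts $DB(jane_doe,age)
```

Commands such as exec and open are hidden in a safe interpreter, so a serialized file that tries to use them fails inside the child and nothing is copied into the main interpreter.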
As an example, consider the simple database given at the beginning of
this document:
::DB_ATTRIBUTES:: key:first key:last age idNumber
john : doe : 35 : 0
jane : doe : 42 : 1
The serialization of this database would look like the following:
# Serialization of simple database on Thu Apr 30 16:41:21 EDT 1998
namespace eval webSubmit::foo {
    set DB(john_doe,age) 35
    set DB(john_doe,idNumber) 0
    set DB(jane_doe,age) 42
    set DB(jane_doe,idNumber) 1
    set dbKeyList [list john_doe jane_doe]
    set attrList [list age idNumber]
}
return 0
Data is encapsulated inside a namespace (here given as webSubmit::foo)
to avoid interfering with other databases stored in arrays like DB.
dbKeyList and attrList
are additional properties of the database that are carried with it.
Simply sourcing the Tcl file that contains this information
creates the variables webSubmit::foo::DB,
webSubmit::foo::dbKeyList, and
webSubmit::foo::attrList.
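For completeness, a serializer that produces text of this shape can be sketched as follows. The procedure name serializeDb and its argument order are hypothetical, and the output differs cosmetically from the listing above (lists are written in brace form and no header comment or return line is emitted). The list command is used so that values containing white-space are quoted correctly.

```tcl
# Hypothetical sketch: turn an in-memory database array into Tcl commands
# wrapped in a namespace, suitable for later evaluation.
proc serializeDb {ns arrayVar keyList attrList} {
    upvar 1 $arrayVar db
    set body "\n"
    # One "set" command per array element, in a stable sorted order.
    foreach index [lsort [array names db]] {
        append body "    " [list set DB($index) $db($index)] "\n"
    }
    append body "    " [list set dbKeyList $keyList] "\n"
    append body "    " [list set attrList $attrList] "\n"
    # list quotes the body so the result is a single well-formed command.
    return [list namespace eval $ns $body]
}

array set people {
    john_doe,age 35  john_doe,idNumber 0
    jane_doe,age 42  jane_doe,idNumber 1
}
set script [serializeDb webSubmit::foo people {john_doe jane_doe} {age idNumber}]
puts $script

# Round trip: evaluate the text in a separate interpreter and read a value.
set child [interp create]
interp eval $child $script
puts [interp eval $child {set ::webSubmit::foo::DB(jane_doe,age)}]
interp delete $child
```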