Supporting archive communities in the framework of the Open Archives Initiative

Gary Simons, SIL International
Steven Bird, University of Pennsylvania

18 November 2000


What is an archive community?

One of the strengths of the Open Archives Initiative is that it allows participating archives (as data providers) to employ multiple metadata standards. Different metadata standards are typically motivated by the different needs of different user communities. From the standpoint of the Open Archives Initiative, we can thus define an archive community as:

A community of users, data providers, and service providers who are united by their common interest in the approach to information archiving embodied in a particular metadata standard.

What does an archive community need?

People who want to use archived resources that follow a particular approach (as embodied in a metadata standard) face two problems: finding these resources among the vast array of resources on the Internet, and, once they have found one, judging the quality of the content and services offered by its data provider. This leads to the following requirement statements:

The existence of multiple metadata formats complicates things for service providers, since a service must make assumptions about the metadata it will process. A service provider aimed at the needs of a particular archive community will necessarily build its service to exploit the details of that community's metadata format. For such a service to process the metadata records it is built around, those records must be well-formed and valid with respect to the metadata schema. This leads to two requirements:

What is the function of a community provider?

To meet these needs of an archive community, we propose that a third kind of provider be added to the Open Archives model, namely, a community provider. A community provider is a server that implements a protocol that supports the above requirements of an archive community. Given the definitions of the current Open Archives model, such a server would be neither a data provider (since it does not manage a document collection) nor a service provider (since it does not create end-user services based on data stored in archives). Rather, it provides essential services needed by both data providers and service providers.

A community provider has two primary functions under which we can recognize significant subfunctions:

Proposed protocol for a community provider

This section lists the protocol requests (verbs) proposed for implementing a Community Provider within the Open Archives Initiative framework. It offers a core protocol only; it does not address the sharing of peer reviews, on which some ideas are given in the final section.
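To make the request style concrete, here is a minimal Python sketch of how a client might address a community provider. It assumes, by analogy with the OAI protocol, that each request is an HTTP GET with the verb and its arguments passed as query parameters. The build_request helper and the identifier argument name are hypothetical, since the argument lists for several verbs are not yet specified.

```python
from urllib.parse import urlencode

# The verbs proposed for the community provider protocol.
COMMUNITY_VERBS = {
    "Identify", "GetSchema", "GetDTD", "ValidateRecord", "RegisterMember",
    "RevalidateMember", "GetMember", "ListMembers", "ListServices",
}


def build_request(base_url: str, verb: str, **args: str) -> str:
    """Build an HTTP GET URL for a community provider request.

    Assumption: as in OAI, the verb and its arguments travel as
    query-string parameters, with the verb first.
    """
    if verb not in COMMUNITY_VERBS:
        raise ValueError(f"unknown verb: {verb}")
    params = {"verb": verb, **args}
    return f"{base_url}?{urlencode(params)}"
```

For example, build_request("http://community.example.org/cp", "GetSchema") would yield http://community.example.org/cp?verb=GetSchema; argument values are percent-encoded, so an identifier such as oai:a:1 is sent as oai%3Aa%3A1.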

1. Identify

Identify
Retrieve identity information about the community

Arguments

None

Behavior

Returns information about the server and about the community it serves.

2. GetSchema

GetSchema
Retrieve the schema for the community's metadata format

Arguments

None

Behavior

Returns the authoritative version of the XML schema that defines the metadata format that the community is built around.

3. GetDTD

GetDTD
Retrieve the DTD for the community's metadata format

Arguments

None

Behavior

Returns an XML DTD that corresponds to the XML schema defining the metadata format that the community is built around. One element of the response packet indicates whether this DTD is authoritative (that is, completely equivalent to the Schema) and, if it is not, whether it is stronger (accepts a subset of the documents accepted by the Schema), weaker (accepts a superset of the documents accepted by the Schema), or overlapping (rejects some documents that the Schema would accept and also accepts some that the Schema would reject).
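Deciding which of the four labels applies is a question about the relationship between the two sets of accepted documents. A provider might approximate it empirically, as in the Python sketch below, by running both validators over a test corpus and recording each document's outcome. This is an assumption about how the label could be derived, not part of the proposal, and a finite sample can only reveal differences: an "authoritative" result means that no difference was observed, not that equivalence was proved.

```python
from typing import Iterable, Tuple


def classify_dtd(results: Iterable[Tuple[bool, bool]]) -> str:
    """Classify a DTD relative to the schema from (schema_ok, dtd_ok) pairs.

    Each pair records whether one test document was accepted by the
    schema and by the DTD. Only observed differences are detected.
    """
    dtd_rejects_more = False  # DTD rejected a document the schema accepted
    dtd_accepts_more = False  # DTD accepted a document the schema rejected
    for schema_ok, dtd_ok in results:
        if schema_ok and not dtd_ok:
            dtd_rejects_more = True
        elif dtd_ok and not schema_ok:
            dtd_accepts_more = True
    if dtd_rejects_more and dtd_accepts_more:
        return "overlapping"
    if dtd_rejects_more:
        return "stronger"   # accepts a subset of what the schema accepts
    if dtd_accepts_more:
        return "weaker"     # accepts a superset of what the schema accepts
    return "authoritative"  # no difference observed in this sample
```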

4. ValidateRecord

ValidateRecord
Validate an individual record

Arguments

Behavior

The community server connects with the repository named in the identifier, and requests that record in the metadata format of the community. It then validates the metadata record against the community's schema and returns a report indicating either that the record conforms to the metadata standard or that it does not. In the latter case, the report should contain a message explaining at least one way in which the record fails to conform.
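The following Python sketch shows the shape of the ValidateRecord behavior. The record-fetching function (for example, one that issues an OAI GetRecord request in the community's format) and the validator are passed in as hypothetical callables, because the Python standard library has no XML Schema validator. The require_elements helper is only a toy stand-in that checks for required child elements; it is not real schema validation.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ValidationReport:
    identifier: str
    conforms: bool
    message: Optional[str] = None  # explains at least one way the record fails


def validate_record(
    identifier: str,
    fetch_record: Callable[[str], str],                      # hypothetical fetcher
    validate: Callable[[ET.Element], Optional[str]],          # None means "conforms"
) -> ValidationReport:
    """Fetch one record in the community format and report on its validity."""
    xml_text = fetch_record(identifier)
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # A record that is not even well-formed cannot be valid.
        return ValidationReport(identifier, False, f"not well-formed XML: {err}")
    error = validate(root)
    return ValidationReport(identifier, error is None, error)


def require_elements(*names: str) -> Callable[[ET.Element], Optional[str]]:
    """Toy validator: report the first required child element that is missing."""
    def check(root: ET.Element) -> Optional[str]:
        for name in names:
            if root.find(name) is None:
                return f"missing required element <{name}>"
        return None
    return check
```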

5. RegisterMember

RegisterMember
Register a new community member

Arguments

Behavior

This is how an existing OAI data provider makes the request to join the community. The request results in the archive being added to the membership list maintained by the community server only if every record in the archive which claims to offer metadata in the community's standard is found to conform to the authoritative metadata schema. Otherwise, the request results in a message explaining why the requesting archive cannot be registered.

The registration process would proceed something like this:

  1. If the named archive is already registered, return a message to that effect.
  2. Send a request to the OAI server (see "Support needed from OAI" below) to confirm that the supplied identifier belongs to a compliant OAI data provider (and obtain its base URL in the process). If not, return a message explaining that the archive must first register with OAI.
  3. Send a ListMetadataFormats request back to the requesting archive to ensure that it claims to support the community's metadata standard. If not, return a message explaining that the archive must support the community's metadata standard before it can register to join the community.
  4. Send a ListRecords request to the archive with metadataPrefix set to the community's metadata standard.
  5. Validate each retrieved record against the metadata schema. If any fail to validate, return a message explaining that registration has failed, along with a description of how some specific records are not valid.
  6. If all records are valid, registration succeeds. The community provider adds a record to its database containing at least the name, the unique identifier, and the base URL of the data server for the newly registered member. The fields for date of joining the community and date when metadata records were last validated are set to the current timestamp. A message indicating successful registration is returned.
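The six registration steps can be sketched in Python as follows. The OAI lookup, ListMetadataFormats, and ListRecords calls are represented by hypothetical callables (oai_lookup, list_formats, list_records), and the membership database by an in-memory dictionary; a real provider would issue HTTP requests and persist the records.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, Iterable, Optional, Set, Tuple


@dataclass
class Member:
    name: str
    identifier: str
    base_url: str
    date_joined: datetime
    last_validated: datetime


def register_member(
    identifier: str,
    members: Dict[str, Member],
    community_prefix: str,
    oai_lookup: Callable[[str], Optional[Tuple[str, str]]],         # -> (name, base URL)
    list_formats: Callable[[str], Set[str]],                        # base URL -> prefixes
    list_records: Callable[[str, str], Iterable[Tuple[str, str]]],  # -> (record id, XML)
    validate: Callable[[str], Optional[str]],                       # XML -> error or None
    now: Optional[datetime] = None,
) -> str:
    """Hypothetical RegisterMember handler following the six steps."""
    # Timestamp taken before harvesting, so records changed mid-harvest
    # fall inside the window of the next revalidation.
    stamp = now if now is not None else datetime.now(timezone.utc)
    # Step 1: already registered?
    if identifier in members:
        return f"{identifier} is already registered"
    # Step 2: is this a compliant OAI data provider?
    info = oai_lookup(identifier)
    if info is None:
        return f"{identifier} must first register with OAI"
    name, base_url = info
    # Step 3: does the archive claim to support the community's format?
    if community_prefix not in list_formats(base_url):
        return f"{identifier} must support {community_prefix} before joining"
    # Steps 4-5: harvest every record in that format and validate it.
    failures = []
    for record_id, xml_text in list_records(base_url, community_prefix):
        error = validate(xml_text)
        if error is not None:
            failures.append(f"{record_id}: {error}")
    if failures:
        return "registration failed: " + "; ".join(failures)
    # Step 6: record the new member; both dates start at the same timestamp.
    members[identifier] = Member(name, identifier, base_url, stamp, stamp)
    return f"{identifier} registered"
```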

6. RevalidateMember

RevalidateMember
Revalidate an existing community member

Arguments

Behavior

This is how a member of the community updates its registration to indicate that all new metadata records have been validated against the schema. The registration is updated only if every record added or modified since the last revalidation is found to conform to the authoritative metadata schema. (If the OAI protocol is augmented to deal with expiration of records, then records that have been reactivated following expiration would show up among those that are new or different.) Otherwise, the request results in a message explaining why the revalidation did not succeed.

The revalidation process would proceed something like this:

  1. If the named archive is not yet registered, return a message explaining that it is necessary to be a registered member of the community.
  2. Send a ListRecords request with metadataPrefix set to the community's metadata standard and from set to the timestamp recorded in the database for the last validation.
  3. Validate each retrieved record against the metadata schema. If any fail to validate, return a message explaining that revalidation has failed, along with a description of how some specific records are not valid.
  4. If all new and changed records are valid, revalidation succeeds. The community provider updates the database record for this archive by setting the date when metadata records were last validated to the current timestamp. A message indicating successful revalidation is returned.
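A corresponding Python sketch for revalidation, under the same assumptions: hypothetical callables stand in for the OAI requests and an in-memory dictionary stands in for the database. Here list_records also receives the last-validation timestamp, playing the role of the OAI from argument so that only new and changed records are harvested.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, Iterable, Optional, Tuple


@dataclass
class Member:
    name: str
    identifier: str
    base_url: str
    date_joined: datetime
    last_validated: datetime


def revalidate_member(
    identifier: str,
    members: Dict[str, Member],
    community_prefix: str,
    list_records: Callable[[str, str, datetime], Iterable[Tuple[str, str]]],
    validate: Callable[[str], Optional[str]],  # XML -> error message or None
    now: Optional[datetime] = None,
) -> str:
    """Hypothetical RevalidateMember handler following the four steps."""
    # Captured before harvesting so that records changed mid-harvest
    # are picked up by the next revalidation.
    stamp = now if now is not None else datetime.now(timezone.utc)
    # Step 1: only registered members can be revalidated.
    member = members.get(identifier)
    if member is None:
        return f"{identifier} must be a registered member of the community"
    # Steps 2-3: harvest records added or changed since the last validation.
    failures = []
    for record_id, xml_text in list_records(
            member.base_url, community_prefix, member.last_validated):
        error = validate(xml_text)
        if error is not None:
            failures.append(f"{record_id}: {error}")
    if failures:
        return "revalidation failed: " + "; ".join(failures)
    # Step 4: advance the last-validated date; the join date is unchanged.
    member.last_validated = stamp
    return f"{identifier} revalidated"
```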

Note that neither this operation nor RegisterMember needs security measures to ensure that the requesting agent is indeed the archive itself. In this model of community, membership is not a covenant between the archive and the community; it is a certification, on behalf of the community, that the archive conforms to the community's metadata standard. Under this model, it would be appropriate for any community member to request certification of a particular archive that claims to use the community's metadata standard. In fact, the community provider would typically generate revalidation requests for its members on a regular basis.

7. GetMember

GetMember
Get information on a community member

Arguments

Behavior

If the requested archive is a registered member of the community, return an XML representation of its registration information. If not, return an error report.
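A minimal Python sketch of GetMember, assuming an in-memory table of members. The element names (member, baseURL, dateJoined, and so on) and the error report format are hypothetical; the proposal does not fix an XML vocabulary for registration information.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import datetime
from typing import Dict


@dataclass
class Member:
    name: str
    identifier: str
    base_url: str
    date_joined: datetime
    last_validated: datetime


def get_member(identifier: str, members: Dict[str, Member]) -> str:
    """Return a member's registration information as XML, or an error report."""
    member = members.get(identifier)
    if member is None:
        error = ET.Element("error")
        error.text = f"{identifier} is not a registered member"
        return ET.tostring(error, encoding="unicode")
    root = ET.Element("member")
    fields = [
        ("name", member.name),
        ("identifier", member.identifier),
        ("baseURL", member.base_url),
        ("dateJoined", member.date_joined.isoformat()),
        ("lastValidated", member.last_validated.isoformat()),
    ]
    for tag, value in fields:
        ET.SubElement(root, tag).text = value
    return ET.tostring(root, encoding="unicode")
```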

8. ListMembers

ListMembers
List members of the community

Arguments

Behavior

Returns a list of the data providers that are members of the community. The data element returned for each member should include subelements for at least the name, the unique identifier, the base URL of its data server, the date it joined the community, and the date its metadata records were last validated.
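The ListMembers response could be built as in this Python sketch. The element names are hypothetical. Since the proposal asks for at least the five listed subelements, the sketch emits those in a fixed order, keeps any additional fields, and rejects an entry that lacks a required one rather than emitting it incomplete.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Mapping

# Minimum subelements for each member; the names are hypothetical.
REQUIRED_FIELDS = ("name", "identifier", "baseURL", "dateJoined", "lastValidated")


def list_members(members: Iterable[Mapping[str, str]]) -> str:
    """Serialize member entries as XML, one <member> per data provider."""
    root = ET.Element("listMembers")
    for entry in members:
        missing = [f for f in REQUIRED_FIELDS if f not in entry]
        if missing:
            raise ValueError(f"member entry lacks {', '.join(missing)}")
        member = ET.SubElement(root, "member")
        # Required fields first, in a fixed order ...
        for field in REQUIRED_FIELDS:
            ET.SubElement(member, field).text = entry[field]
        # ... followed by any extra fields the provider records.
        for field, value in entry.items():
            if field not in REQUIRED_FIELDS:
                ET.SubElement(member, field).text = value
    return ET.tostring(root, encoding="unicode")
```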

9. ListServices

ListServices
List service providers for the community

Arguments

Behavior

Returns a list of the service providers that are members of the community. The data element returned for each member should include subelements for at least the name of the service, the URL, the date it joined the community, and a description of the service.

Note that the protocol contains no registration request for service providers, because a request to be recognized as a service provider for the community cannot be validated automatically. Instead, the request to register a service provider would be made by email to the leaders of the archive community. Before adding the service provider to the community's list of service providers, they would try the service and verify that it uses the community's metadata appropriately to perform a useful service for the community.

Support needed from OAI

This proposal for implementing a Community Provider protocol depends crucially on a service that is not currently supported by the Open Archives Initiative. Currently, the registration of a data provider as an Open Archive is handled by filling out a standard informational template that is stored as an HTML page on the data provider's site. This page is then linked to from a list of participating sites that is stored as an HTML page on the Open Archives site.

In order to implement the proposed community provider protocol, we need access to an OAI server that would offer a protocol that allows systems to request the registration information for a participating data provider and receive it back in XML format. This corresponds to request 7, GetMember, in the above protocol. If an OAI server were to handle just this one request, the above community provider protocol could be implemented:

The central OAI server could also implement the entire Community Provider protocol in order to serve as a community provider for the entire Open Archives community. As such it would provide the authoritative Dublin Core schema, validate metadata records against that schema, supply the list of participating open archives that service providers would use to drive their metadata harvesting operations, and supply a complete list of service providers. The only case in which the OAI server protocol would need to diverge from the community provider protocol is in the RegisterMember request. That request would require an additional parameter to supply all the registration information (i.e. the information that is currently put into the HTML page). A workable approach would be for the registering repository to post the registration information in an XML format as a page on its site and provide the URL to the page as a further required argument to the RegisterMember request. The OAI server would then access the file and validate it against the schema for registration information before proceeding with the registration process.

Another possibly useful extension for an OAI server would be for it to add a set of functions for keeping track of communities (RegisterCommunity, GetCommunity, ListCommunities). It could then add an identifier parameter to the GetSchema request, so that any participating archive or service could request the authoritative schema for a named community (which the OAI server would implement by looking up the base URL of the community provider and then sending it a GetSchema request). By using a community identifier rather than a URL to refer to a schema, we would avoid the potential problem that different providers could provide different versions of the same metadata schema (as is currently possible since the registration information for a data provider uses a URL to identify the schema for each metadata format it supports).
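The indirection described here, from a community identifier to the community provider's base URL and then to a forwarded GetSchema request, can be sketched in Python as below. The method names only mirror the proposed RegisterCommunity, GetCommunity, and ListCommunities requests; the in-memory registry and the send_request(base_url, verb) callable are assumptions standing in for real storage and HTTP.

```python
from typing import Callable, Dict, List


class CommunityRegistry:
    """Hypothetical OAI-server registry mapping community ids to providers."""

    def __init__(self, send_request: Callable[[str, str], str]):
        self._base_urls: Dict[str, str] = {}
        self._send = send_request  # (base URL, verb) -> response text

    def register_community(self, community_id: str, base_url: str) -> None:
        if community_id in self._base_urls:
            raise ValueError(f"{community_id} is already registered")
        self._base_urls[community_id] = base_url

    def get_community(self, community_id: str) -> str:
        # Raises KeyError for an unknown community.
        return self._base_urls[community_id]

    def list_communities(self) -> List[str]:
        return sorted(self._base_urls)

    def get_schema(self, community_id: str) -> str:
        # Resolve the identifier, then forward GetSchema to that provider,
        # so every caller receives the same authoritative schema.
        return self._send(self.get_community(community_id), "GetSchema")
```

Because callers name a community rather than a schema URL, two data providers cannot point at divergent copies of the same schema; the registry always forwards to the one community provider.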

Handling peer review

The proposed requirement that

The users within an archive community (as in any academic community) need to be able to share peer reviews of the content of particular archived items and of the services offered by particular archives.

may not be one that would be generally adopted for a community provider. It could be a requirement that we would implement specially for the Open Language Archives Community. An intermediate position would be that the notion of offering archive community users a means to review an archive as a whole (and to read other users' reviews) is functionality that should be built into the general protocol for community providers, but not the ability to review individual archive items. The latter could be a service offered by a service provider.

We do not attempt to fully specify the protocol for peer reviews at this point, but offer the following list of requests as suggestive of the direction we might go. All requests have a required parameter, identifier, which identifies the archive or the archive item of interest: