| Date issued: | 2009-05-30 |
|---|---|
| Status of document: | Draft Implementation Note. This is only a preliminary draft that is still under development; it has not yet been presented to the whole community for review. |
| This version: | http://www.language-archives.org/NOTE/oxygen-20090530.html |
| Latest version: | http://www.language-archives.org/NOTE/oxygen.html |
| Previous version: | None. |
| Abstract: | Describes how to configure and use the <oXygen/> XML Editor as a tool for creating or validating a static repository that complies with the [OLAC-Repositories] standard. |
| Editors: | |
Copyright © 2009 Gary Simons (SIL International). This material may be distributed and repurposed subject to the terms and conditions set forth in the Creative Commons Attribution-ShareAlike 2.5 License.
<oXygen/> XML Editor is a full-featured cross-platform XML editor (for Windows, Mac OS X, Linux and Eclipse) that provides the tools needed for XML authoring and application development [oXygen]. It is a mature product (in its Version 10.2 release at the time of writing) that is popular among academics due to its affordable academic license (priced at $48 at the time of writing) [oXygen-Academic]. The fully functional product may be downloaded and freely used for a 30-day evaluation period [oXygen-Download].
To install the program, simply download the <oXygen/> package for the appropriate platform from [oXygen-Download] and follow the instructions provided on that page.
The XML Schema validator that is built into <oXygen/> XML Editor is Apache Xerces. For many years, <oXygen/> did not work properly with OLAC repositories because Xerces incorrectly failed to validate the XML schemas for Qualified Dublin Core on which the OLAC metadata standard is based (see Appendix A of [DC-Notes]). These problems have since been fixed, so the latest version of <oXygen/> handles an OLAC repository correctly without any special configuration to use a different schema validator.
This document describes how <oXygen/> XML Editor can be used to create and validate a static repository of language resource descriptions that complies with the [OLAC-Repositories] standard. Those who create a static repository by running a script over an existing data source do not need a tool like <oXygen/> for creating the repository, but they do need a tool for validating the output of their script; section 2 of this note describes how <oXygen/> can be used for this purpose. Section 3 goes on to describe how it can also be used to create and maintain a static repository as an original data source.
After starting <oXygen/> XML Editor, use "File / Open" or "File / Open URL" to load the static repository that you wish to validate. For instance, a sample repository can be found at the following URL:
http://www.language-archives.org/OLAC/1.1/static-repository.xml
Click on the Validate Document button (the red check mark in the toolbar). Alternatively you may select "Document / Validate / Validate Document" from the menu bar or use the corresponding keyboard shortcut: Ctrl+Shift+V. The message in the status bar at the bottom of the window will say "Validation — in progress". On completion it will report either "Document is valid" or "Validation failed" (followed by a count of errors).
When errors are detected, they are indicated in a number of ways:
- An invalid tag or value is indicated in the document by a red underline. The underline disappears as soon as the problem is fixed in the editor. Even if you are testing the validity of a repository that is being generated by a script, you may find this feature useful, since you can edit the document to test a fix before making it in the script.
- The vertical scroll bar to the right of the editing pane shows a red mark at the location of each error in the document. Drag the scroll bar slider to line up with one of those marks to bring the corresponding error into view in the editing pane.
- Clicking within document content that has a red underline causes the associated error message to be displayed at the bottom of the editing pane. If the message is truncated, double-click on it to pop up a dialog box showing the full message.
- Document validation also opens a pane below the editing pane which shows a scrolling list of all error messages. Clicking on a message in that list scrolls the document to the location of the error and places the edit cursor at that point in the document. If the error message is truncated, hover the mouse pointer over the message to pop up a text box showing the full message.
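If your repository is generated by a script, a cheap pre-check before opening the output in <oXygen/> is to confirm that it is at least well-formed XML, since a malformed file cannot be schema-validated at all. The following is a minimal sketch using only Python's standard `xml.etree.ElementTree` module; the function names are hypothetical, and it checks well-formedness only, not conformance to the OLAC schemas.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(xml_data):
    """Return None if xml_data (str or bytes) is well-formed XML.

    Otherwise return a (line, column, message) tuple describing where the
    parser stopped. This does not check the document against any schema.
    """
    try:
        ET.fromstring(xml_data)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None


def check_file(path):
    """Check a file on disk, e.g. the output of a repository-building script."""
    # Read bytes so that the encoding in the XML declaration is respected.
    with open(path, "rb") as f:
        return well_formedness_error(f.read())
```

For example, `check_file("my-repository.xml")` (a hypothetical file name) returns `None` for a well-formed file, or the line and column where the parser gave up, which you can then jump to in <oXygen/>.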
If you are unable to make sense of an error message, try the SaxonSA schema validator that is also built into <oXygen/> XML Editor; its messages may shed more light on the problem. This validator is invoked by clicking the "SaxonSA" button in the toolbar. The errors reported by SaxonSA show up in the scrolling list below the editing pane. Click on the tabs below the list, "Custom Validation - SaxonSA" versus "Errors", to toggle between the two sets of error messages.
<oXygen/> XML Editor may also be used to create a static repository in the first place, and then to maintain it as records need to be added or modified. The easiest way to create a new repository is to load the sample repository from the OLAC site and save it to your computer with a name that is appropriate for the repository you want to create. The sample repository is located at:
http://www.language-archives.org/OLAC/1.1/static-repository.xml
Open the renamed repository file with <oXygen/> XML Editor and use the editor to change the repository as needed. The primary advantage of starting with the existing sample is that all of the namespace declarations in the root element are known to be correct.
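When you adapt the sample by hand, it is easy to delete a namespace declaration by accident. One way to check for this, sketched below under the assumption that you have saved both the sample and your own repository locally, is to compare the prefix/URI pairs declared in each file. The function names are hypothetical; `XMLPullParser` and its `start-ns` events are part of Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET


def declared_namespaces(xml_text):
    """Return the set of (prefix, uri) pairs declared anywhere in xml_text.

    The default namespace is reported with the prefix "".
    """
    parser = ET.XMLPullParser(events=("start-ns",))
    parser.feed(xml_text)
    parser.close()
    # For "start-ns" events the payload is a (prefix, uri) tuple.
    return {pair for _event, pair in parser.read_events()}


def missing_declarations(sample_text, mine_text):
    """Pairs declared in the sample but absent from your repository."""
    return declared_namespaces(sample_text) - declared_namespaces(mine_text)
```

Read both files as text (for instance with `open(path, encoding="utf-8").read()`, assuming UTF-8) and pass them in. A reported pair is worth a look, though it is not necessarily an error if it was declared only inside a sample record that you deleted.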
The Identify section provides information about your repository. Some of the elements describe generic behavior that is true of all OLAC static repositories and do not need to be changed; others are specific to your repository and must be changed. The following are the elements that you will need to edit:
- oai:repositoryName
Enter the name of your repository as you want it to appear in the Participating Archives list [OLAC-Archives].
- oai:baseURL
Enter the URL that this static repository will have when it is posted on the web.
- oai:adminEmail
Enter your email address. This is the contact address used by others in the OLAC community who encounter problems when trying to harvest your repository.
- oai:earliestDatestamp
Enter a date that is equal to (or is prior to) the date of the earliest <oai:datestamp> of all the records listed in the ListRecords section of the repository. (Some harvesters use this value with incremental harvesting as the date before which there are guaranteed to be no records.)
- repositoryIdentifier
Enter an identifier that will uniquely identify this repository among all the participating archives. It must be based on a registered domain name, typically of the sponsoring institution. Click "Machine readable list of registered archives" on [OLAC-Archives] to see a list of identifiers for current participants. A single institution may use subdomain names to distinguish multiple metadata repositories. A host institution may also use subdomain names to create identifiers for personal repositories.
- sampleIdentifier
Enter the <oai:identifier> of one of the records listed below in the ListRecords section. It must be the identifier of an existing record, not a hypothetical one.
- olac-archive
Every attribute and subelement of the <olac-archive> element must be updated. See section 3 of [OLAC-Repositories] for the definition of all the attributes and elements. In addition to simply updating the contents that you see in the sample repository, be aware of the following special cases:
The <participant> element is repeatable. Edit the instance that is already there to give your name and role and email address; the address must match what you entered above for <oai:adminEmail>. You may now add entries for any other persons who have key roles in the archive. In addition to providing contact information for the OLAC community, doing so also creates a subscription to the automatically generated report on repository usage statistics and metadata quality metrics that is emailed quarterly. Do the following to add a participant: place the cursor after </participant>, press Enter to open a new line, type "<" to open a new tag, select participant in the context menu that pops up (either by double-clicking or pressing Enter), and fill in the attribute values.
Three of the elements are optional: <archiveURL>, <institutionURL>, and <location>. If one of these elements is not relevant for your repository, delete the whole element.
Another element is optional but not present in the sample repository, namely, <archivalSubmissionPolicy>. If that is relevant for your repository, add it to the end of the description as follows: place the cursor after </access>, press Enter to open a new line, type "<" to open a new tag, select archivalSubmissionPolicy in the context menu that pops up, and fill in the appropriate policy description.
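Several of the requirements above tie the Identify section to the records further down: the earliest datestamp, the sample identifier, the participant email that must match <oai:adminEmail>, and (as described for <oai:identifier> below) the "oai:" plus repositoryIdentifier prefix of every record identifier. The Python sketch below cross-checks them. It matches elements by local name using the `{*}` namespace wildcard (Python 3.8 or later), so it does not depend on the exact namespace URIs; it assumes datestamps begin with a YYYY-MM-DD date and that the participant's address is held in an attribute named `email`. It is a supplement to, not a replacement for, schema validation and the checks run by the OLAC registration page.

```python
import xml.etree.ElementTree as ET
from datetime import date


def _text(root, path):
    """Stripped text of the first element matching path, or None."""
    el = root.find(path)
    if el is None or el.text is None:
        return None
    return el.text.strip()


def _day(stamp):
    # Compare at day granularity; assumes stamps start with YYYY-MM-DD.
    return date.fromisoformat(stamp.strip()[:10])


def identify_problems(xml_text):
    """Return a list of inconsistencies between Identify and the records."""
    root = ET.fromstring(xml_text)
    problems = []

    ids = [e.text.strip() for e in root.findall(".//{*}header/{*}identifier") if e.text]
    stamps = [e.text for e in root.findall(".//{*}header/{*}datestamp") if e.text]

    earliest = _text(root, ".//{*}earliestDatestamp")
    if earliest and stamps and _day(earliest) > min(_day(s) for s in stamps):
        problems.append("earliestDatestamp is later than some record datestamp")

    sample = _text(root, ".//{*}sampleIdentifier")
    if sample not in ids:
        problems.append("sampleIdentifier %r is not the identifier of any record" % sample)

    prefix = "oai:%s:" % _text(root, ".//{*}repositoryIdentifier")
    for rid in ids:
        if not rid.startswith(prefix):
            problems.append("record identifier %r does not start with %r" % (rid, prefix))

    admin = _text(root, ".//{*}adminEmail")
    emails = [p.get("email") for p in root.findall(".//{*}participant")]
    if admin not in emails:
        problems.append("no participant has the adminEmail address %r" % admin)

    return problems
```

An empty list means only that these particular cross-checks passed.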
Make no changes to the ListMetadataFormats section.
The <ListRecords metadataPrefix="olac"> section is where all of the OLAC metadata records are entered. The sample repository contains two records. You can start by deleting one of those records; then edit the remaining record to be the first record in your repository.
The <oai:header> part of the record has the following two obligatory elements:
- oai:identifier
Enter here the unique OAI identifier for the item. The identifier is constructed from three parts separated by colons: the literal string "oai", the repositoryIdentifier as entered above, and a string that identifies the item uniquely within the archive's holdings. This last part, following the second colon, is the only part that varies from record to record. You may assign this local identifier in whatever way you want: if the items in your collection already have some kind of unique identifier, you may use that; otherwise you could, for instance, assign serial numbers sequentially. The identifier for a record becomes fixed the first time the record is harvested. After that it should not be changed, since the harvester matches identifiers to know which stored metadata record to update when the repository is reharvested.
- oai:datestamp
Enter here the latest revision date for this metadata record. The datestamp does not describe a date related to the resource itself (for this, use the date elements within the metadata part of the record). Rather, it refers only to the expression of the metadata in this record. Whenever the record is updated, the datestamp must be updated. Otherwise, incremental harvesting will not know that the record has changed and so will fail to reharvest it.
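The identifier scheme and the role of the datestamp can be illustrated with a short Python sketch. The helper names and the repository identifier `example.org` are hypothetical, and `changed_since` is a deliberately simplified model of incremental harvesting (select the records whose datestamp is on or after a given date), not OLAC's actual harvester.

```python
from datetime import date


def make_oai_identifier(repository_identifier, local_id):
    """Build oai:<repositoryIdentifier>:<local id>."""
    if not local_id:
        raise ValueError("local identifier must not be empty")
    return "oai:%s:%s" % (repository_identifier, local_id)


def local_part(oai_identifier):
    """Return the part after the second colon (the only part that varies)."""
    # maxsplit=2 keeps any further colons inside the local part.
    scheme, _repo, local = oai_identifier.split(":", 2)
    if scheme != "oai":
        raise ValueError("not an oai identifier: %r" % oai_identifier)
    return local


def changed_since(records, from_day):
    """Identifiers a simple incremental harvest would pick up.

    records maps identifier -> datestamp string (YYYY-MM-DD); a record is
    selected when its datestamp is on or after from_day.
    """
    return sorted(rid for rid, stamp in records.items()
                  if date.fromisoformat(stamp) >= from_day)
```

For instance, with `records = {"oai:example.org:1": "2009-05-30", "oai:example.org:2": "2009-01-15"}`, a harvest from `date(2009, 3, 1)` picks up only the first record. If the second record had been edited without updating its datestamp, the change would go unnoticed.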
The <oai:metadata> part of the record is where the OLAC metadata record goes. Inside the <olac:olac> container, all of the metadata elements are optional and repeatable and may occur in any order. When you want to add a new metadata element, click at the end of the element you want to place it after. Then press Enter to open up a new line, type "<" to open a new tag, scroll through the pop-up list of available metadata elements, and press Enter to select the element you want. Also, you may start typing the tag name to narrow the selection list. As you are creating a metadata description, consult the OLAC Metadata Usage Guidelines [OLAC-Usage] for definitions of the available elements and examples of their use.
After you have created your first metadata record, use the Free-standing Metadata Service [OLAC-Free] to obtain an analysis of the completeness and quality of the metadata record. This is done as follows:
1. In <oXygen/> XML Editor, click at the beginning of the <olac:olac> tag and drag all the way to the end of the corresponding end tag in order to select the entire OLAC metadata record.
2. Press Ctrl+C (or select "Edit / Copy" from the menu bar) to copy the record.
3. Open the Free-standing Metadata Service by clicking on this link: [OLAC-Free].
4. Click on the "Clear the text area" button located below the submission form.
5. Click inside the text box, then press Ctrl+V (or select "Edit / Paste" from the menu bar) to paste the record into the submission form.
6. Click on the "Analyze" button located below the submission form. If the result shows validation errors, click the browser's Back button to return to the submission form, edit the contents to correct the errors, and click "Analyze" again.
7. Scroll to the end of the resulting page to see the Metadata quality analysis section. It suggests things that can be added to the record in order to better follow OLAC's best practice recommendations for language resource description [OLAC-BPR] and thereby increase your repository's score with respect to the OLAC metadata metrics [OLAC-Metrics].
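If you want to run several records through the service, the selection and copy steps above can also be scripted. The sketch below pulls the <olac:olac> element of the record with a given <oai:identifier> out of the repository text and serializes it, ready to paste into the submission form. The function name is hypothetical, and it assumes the OLAC 1.1 namespace URI is http://www.language-archives.org/OLAC/1.1/ and that Dublin Core elements use the usual dc and dcterms namespaces; `register_namespace` only controls which prefixes appear in the output, and it changes a module-wide setting.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; these only affect the prefixes used on output.
ET.register_namespace("olac", "http://www.language-archives.org/OLAC/1.1/")
ET.register_namespace("dc", "http://purl.org/dc/elements/1.1/")
ET.register_namespace("dcterms", "http://purl.org/dc/terms/")
ET.register_namespace("xsi", "http://www.w3.org/2001/XMLSchema-instance")


def olac_record_text(repository_xml, oai_identifier):
    """Return the <olac:olac> element of the matching record as text, or None."""
    root = ET.fromstring(repository_xml)
    for record in root.findall(".//{*}record"):
        ident = record.find("{*}header/{*}identifier")
        if ident is not None and (ident.text or "").strip() == oai_identifier:
            olac = record.find(".//{*}olac")
            if olac is None:
                return None
            # tostring() also emits the element's tail whitespace; strip it.
            return ET.tostring(olac, encoding="unicode").strip()
    return None
```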
Once you have completed the first record, you may find it helpful to turn it into a template for the remaining records. Make a copy of the complete <oai:record> and remove the content that is specific to the first record, leaving everything (both tags and content) that you think is likely to be reused in other records. Then, every time you want to add a new record, start by pasting in a copy of this template record.
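The same template idea carries over to a script that generates records. In this minimal Python sketch (the function name is hypothetical), the template is an <oai:record> element already parsed with `xml.etree.ElementTree`; each new record is a deep copy given a fresh identifier and datestamp, so the template itself is never modified.

```python
import copy
import xml.etree.ElementTree as ET


def new_record_from_template(template: ET.Element, oai_identifier: str,
                             datestamp: str) -> ET.Element:
    """Return a new <oai:record> built from a parsed template record."""
    # Deep-copy so that repeated calls never modify the template.
    record = copy.deepcopy(template)
    ident = record.find("{*}header/{*}identifier")
    stamp = record.find("{*}header/{*}datestamp")
    if ident is None or stamp is None:
        raise ValueError("template lacks header identifier or datestamp")
    ident.text = oai_identifier
    stamp.text = datestamp
    return record
```

Each returned record can then be appended to the parsed <ListRecords> element with `append()` before the tree is written back out.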
When your static repository is complete and is valid according to the schema validator in <oXygen/> XML Editor, you are ready to register with OLAC. The steps are as follows:
1. Post your repository on your institution's public web site. If you do not have access to a public web site, contact the OLAC administrators; a static repository can also be hosted on the OLAC site.
2. Go to the OLAC Archive Registration page [OLAC-Register], paste the URL of your repository file into the text box, and click the "Validate" button. This performs the same schema validation that you have been performing in <oXygen/>, and in addition tests for other conditions described in [OLAC-Repositories] that go beyond schema validation. If the final line says "FAILURE," you still have work to do. Make note of all the individual tests that failed, fix the offending content in the repository, and then repeat the validation process. If the XML Schema Validation is failing, click on the "error logs" link to see the errors generated by the validators. The two validators report errors differently, so look at both logs to get a fuller idea of what is wrong.
3. If the final line of the validation report says "SUCCESS," you will be presented with a button that allows you to submit a registration request. After clicking the button, you should get an email confirmation of the submission. You will receive further notification after the repository has been reviewed against the criteria listed in section 6 of [OLAC-Repositories] and accepted for harvesting.