By Andrew Battista and Stephen Balogh
This is the third in a series of seven blog posts on the development of GeoBlacklight at NYU. We’ve incorporated many ideas after a great week of learning at Geo4LibCamp at Stanford University (Jan. 25-27) (see Darren Hardy’s summary here). For an overview of the project and an outline of the posts, which are now on a somewhat “revised” timeline, click here or visit the GeoBlacklight project page.
About Geospatial Metadata
The Geo4LibCamp un-conference at Stanford was filled with incredible discussions about the challenges of developing a spatial data infrastructure and wrangling metadata into a form that works with GeoBlacklight. Creating and managing geospatial metadata, more than any other issue, occupied people’s attention at Geo4LibCamp. How can we make metadata efficiently? How compliant with existing standards should it be? And are there easy ways to leverage the metadata generated by other institutions to bolster one’s own collection? In this post, we are going to talk a little bit about how we’ve been addressing these questions at NYU up until this point.
About GeoBlacklight Metadata
As they developed the GeoBlacklight project, Stanford’s librarians implemented a custom GeoBlacklight metadata schema to facilitate the discovery of geospatial data. The schema is closely related to Dublin Core and is a redaction of much longer and more granular geospatial metadata standards, most notably ISO 19139 and FGDC. In terms of building collections, there are several key challenges to producing metadata records that conform to these standards. To start, they are difficult to author from scratch and difficult to edit or transform. Worse, it’s unrealistic to expect non-experts or researchers who would like to contribute geospatial data to create or alter ISO or FGDC-compliant metadata records themselves. John Huck’s tweet from the conference sums it up.
#Geo4LibCamp day 2: uncontroversial view: ISO is too heavy
— John Huck (@jhuck_AB) January 26, 2016
The great intervention of GeoBlacklight metadata is that it implies a distinction between metadata for documentation or posterity and metadata for the sake of discovery. In short, GeoBlacklight metadata is a minimalist conflation of several geospatial metadata standards, shaped by the technological interfaces behind the application, especially Apache Solr. You need it to make a GeoBlacklight instance “work.” Thus, even though it is a reduced standard, you still need to find efficient ways of generating GeoBlacklight metadata records if you want to develop a collection of spatial data.
How does GeoBlacklight Metadata Work?
GeoBlacklight (the application) is inextricable from the metadata behind it. This may sound like a simplistic statement to those who are used to working with discovery systems in libraries, but for us, it was an epiphany and an important part of understanding how GeoBlacklight functions. For those who haven’t seen it, here’s a breakdown of the main elements in a GeoBlacklight metadata record:
The schema incorporates key elements needed for discovery, including subject, place name (dct:spatial) and file type. There are additional elements as well that pertain to the spatial discovery and Solr index, as seen in this sample complete GeoBlacklight record (click it to enlarge).
Although Darren Hardy and Kim Durante do a great job of explaining what each element in the set means in their Code4Lib article, a few of them deserve more commentary. The dct:references field follows a key-value schema, which accounts for multiple elements that get exposed in the GeoBlacklight interface. For instance, the “http://schema.org/url” key links back to the archival copy of the data in NYU’s institutional repository (the FDA). Simply put, whatever URL you place in the record as the value of the “http://schema.org/url” key (in this case, a link to a repository record) will appear prominently on the item result within GeoBlacklight. Similarly, the “http://www.opengis.net/def/serviceType/ogc/wfs” key points to the URL specific to our deployment of Geoserver, which allows the map to be previewed and downloaded in multiple formats within GeoBlacklight.
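To make the key-value structure of dct:references concrete, here is a minimal, hypothetical sketch of such a record expressed as a Python dictionary. The field names follow the GeoBlacklight schema discussed above, but the identifiers, titles, and URLs are invented for illustration and are not taken from an actual NYU record.

```python
import json

# A pared-down, illustrative GeoBlacklight-style record. Field names follow the
# published schema; the identifiers and URLs below are placeholders, not real records.
record = {
    "dc_identifier_s": "nyu_2451_EXAMPLE",
    "dc_title_s": "Example Census Boundaries",
    "dc_rights_s": "Restricted",
    "dct_provenance_s": "NYU",
    "dct_spatial_sm": ["New York, New York, United States"],  # GeoNames place name string
    "layer_geom_type_s": "Polygon",
    # dct_references_s is itself a JSON object serialized as a string:
    # each key identifies the kind of link, each value is the URL exposed in the UI.
    "dct_references_s": json.dumps({
        # link back to the archival copy in the institutional repository
        "http://schema.org/url": "https://archive.example.edu/handle/2451/EXAMPLE",
        # link to a Web Feature Service endpoint used for preview and download
        "http://www.opengis.net/def/serviceType/ogc/wfs": "https://maps.example.edu/geoserver/wfs",
    }),
    "solr_geom": "ENVELOPE(-74.3, -73.6, 41.0, 40.4)",  # W, E, N, S bounding box for Solr
}

print(json.dumps(record, indent=2))
```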
The other very important element in this set is the dct:spatial field. The value of this field is always a string that comes from the GeoNames ontology, but there are other items in each GeoNames entry that propagate elsewhere in the metadata record. Specifically, from this entry, you can take the dct:relation field values and the georss_box values. We won’t continue to belabor the anatomy of GeoBlacklight metadata records here, especially because Andrew volunteered to write a more detailed document later that provides commentary on the standard. Suffice it to say that while simpler and more compact than most geospatial metadata standards, GeoBlacklight still requires some work to author.
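To illustrate how a single GeoNames selection can propagate into other parts of a record, here is a rough sketch that pulls a GeoNames entry over its JSON web service and derives a bounding box from it. The endpoint, the demo username, and the exact response fields are assumptions to be checked against the GeoNames documentation; this is not the code we run in production.

```python
import requests

def geonames_bbox(geoname_id, username="demo"):
    """Fetch a GeoNames entry and return (place_name, bbox), where bbox is a
    'S W N E' string suitable for a georss-style box. Assumes the getJSON
    web service and its 'bbox' element; verify against the GeoNames docs."""
    resp = requests.get(
        "http://api.geonames.org/getJSON",
        params={"geonameId": geoname_id, "username": username},
        timeout=10,
    )
    resp.raise_for_status()
    entry = resp.json()
    box = entry.get("bbox", {})
    bbox = "{south} {west} {north} {east}".format(**box) if box else None
    return entry.get("name"), bbox

# Hypothetical usage: 5128581 is the GeoNames id for New York City.
# name, bbox = geonames_bbox(5128581, username="your_geonames_account")
```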
Ways of Authoring Metadata
There are multiple ways to author GeoBlacklight metadata, some of which were covered in detail at Geo4LibCamp, the Digital Library Federation, and elsewhere. Kim Durante uses a combination of editing with ESRI’s ArcCatalog and transforming existing ISO or FGDC metadata documents (in XML format) with a series of XSLT workflows. Other librarians build upon these transforms and patch them together with metadata alterations in ArcCatalog, while others, such as the CIC group, use GeoNetwork to generate metadata that eventually will be GeoBlacklight compliant. In short, there is no perfect way to create GeoBlacklight metadata from scratch, and it inevitably requires a lot of work.
At NYU, we’ve begun deploying a somewhat hacked version of Omeka to generate GeoBlacklight metadata from scratch. Omeka is a great tool because it allows us to mediate the metadata creation process in several different ways. Most people have encountered Omeka as an archive web publishing platform; we are not using it as such. It’s a means to an end, and that end is getting GeoBlacklight metadata to index into our instance of GeoBlacklight. We delivered a presentation to the OpenGeoPortal Metadata Working Group on January 12, 2016 about our process with Omeka (the slides are available here and the recording is available here).
What Omeka Does
We’re going to ease up on the blow-by-blow description of Omeka in this post and encourage you to listen to the recording and look at the slides. However, here are a few summary points. The most important feature of Omeka is that it allows us to call on existing APIs to promote authority control as we catalog GIS data. In particular, the GeoNames ontology, which has a robust API, can populate other relevant parts of the record just by having a person select a unique place name. In cases where we want to encourage multiple values for enhanced discovery (e.g., LCSH subject headings), users can easily add fields to account for multiple values. Finally, and most importantly, Omeka exports records in the .json format we need to index into GeoBlacklight.
There are other benefits as well:
- Batch Uploads: Like other institutions, we tend to collect items in batches or sets, and in many cases, much of the metadata is consistent and easily replicated. An example is India Census Data from ML InfoMap. Using a CSV, we can populate all of the relevant fields on a spreadsheet and then use the Batch Import plug-in to load the records into Omeka. Then, all we have to do is go to each metadata record, “ping” the GeoNames field, and export the records. (A rough sketch of this CSV-to-record idea appears after this list.)
- Customizations of User Interface: We’ve altered the element prompts to provide people with clear instructions that help them fill out the record from scratch. Our goal here is to anticipate a self-deposit form function.
- Ability to Distribute Accounts: We often have student workers, of varying levels of engagement and commitment, who have extra time to help with GIS metadata creation. Omeka serves as an effective way to collaborate on larger-scale projects, and its ease of use allows student workers to log in and enrich bare-bones metadata, for instance. It’s very easy to create accounts and passwords and give them to students, researchers, or anyone else. No other library systems are implicated.
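As promised in the Batch Uploads item above, here is a hedged sketch of how a spreadsheet of largely repeated values could become individual GeoBlacklight-style JSON records. The column names and output layout are invented for illustration; our actual batch workflow runs through Omeka’s import plug-in rather than a standalone script.

```python
import csv
import json
from pathlib import Path

def csv_to_geoblacklight(csv_path, out_dir):
    """Turn each CSV row into a GeoBlacklight-style JSON file.
    Column names here (identifier, title, place, year) are hypothetical."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            record = {
                "dc_identifier_s": row["identifier"],
                "dc_title_s": row["title"],
                "dct_spatial_sm": [row["place"]],
                "solr_year_i": int(row["year"]),
                # Fields shared by the whole batch can be filled in once here,
                # e.g. provenance, rights, publisher.
                "dct_provenance_s": "NYU",
            }
            (out / f"{row['identifier']}.json").write_text(
                json.dumps(record, indent=2), encoding="utf-8"
            )

# Hypothetical usage with an invented filename:
# csv_to_geoblacklight("india_census_batch.csv", "records/")
```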
We’ve created a practice user account for Omeka, and we invite anyone in the community to give it a shot. Just send us an e-mail for the credentials. Further, the GeoBlacklight plug-in is housed on Stephen’s GitHub account and is available for anyone to download (although you will need to alter the code base to make it conform to your institution’s setup).
Alternative Tools for Authoring Metadata
Although Omeka has worked well for us so far and meets several of our needs, we don’t imagine it to be a long-term solution for creating GeoBlacklight metadata. We’re very interested in the progress of the OpenGeoPortal Metadata Toolkit. Other options, like Elevator and Catmandu, also show promise. We hope that more streamlined options will become available, and we are even working on a homegrown, long-term solution that we can incorporate into our infrastructure.
Depositing into OpenGeoMetadata
The final step of our metadata creation process is sharing. We have chosen to push all of our GeoBlacklight metadata records into OpenGeoMetadata, a consortium of shared metadata repositories managed by Jack Reed at Stanford University. Essentially, OpenGeoMetadata is a series of GitHub repositories from which anyone can index records into a local instance of GeoBlacklight. The goal is to facilitate cross-institution collaboration and collectively grow the amount of geospatial data that can be discovered with a single search. From each repository, administrators can index a set of GeoBlacklight records into their Solr core, and the items will surface within the application. If you’re interested in contributing to OpenGeoMetadata, get in touch with Jack Reed at Stanford.
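For anyone curious what “indexing a set of records into a Solr core” might look like in practice, here is a minimal sketch that posts a directory of GeoBlacklight JSON documents to Solr’s update handler. The host, core name, and directory path are placeholders, and production workflows (including the tooling the GeoBlacklight community maintains) handle this far more robustly.

```python
import json
from pathlib import Path

import requests

SOLR_UPDATE_URL = "http://localhost:8983/solr/geoblacklight/update"  # placeholder host and core

def index_records(record_dir):
    """Read every GeoBlacklight JSON record in a directory and post the batch
    to Solr, committing at the end. Assumes each file holds a single record."""
    docs = [json.loads(p.read_text(encoding="utf-8"))
            for p in Path(record_dir).glob("*.json")]
    resp = requests.post(
        SOLR_UPDATE_URL,
        params={"commit": "true"},
        json=docs,  # Solr's JSON update handler accepts an array of documents
        timeout=30,
    )
    resp.raise_for_status()
    return len(docs)

# Hypothetical usage against a local checkout of shared metadata:
# index_records("opengeometadata/records/")
```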
Next steps for the GeoBlacklight community
The Geo4LibCamp at Stanford was an unqualified success, not only because we learned so much about making geospatial metadata, but also because we were able to crystallize some areas for improvement. Chief among them: the community needs to foster best practices for generating GeoBlacklight metadata. Ultimately, the creation of GeoBlacklight metadata is a user experience issue; unless metadata is created according to a consistent standard of completeness, the application will behave differently depending on which record is accessed. We are excited to see how the community continues to handle these challenges.
Coming up Next
In the next post, we’ll be talking in much more detail about the technological infrastructure behind this collection process. Stay tuned.