Skip to content

Latest commit

 

History

History
113 lines (84 loc) · 6.23 KB

03.a-map-metadata.md

File metadata and controls

113 lines (84 loc) · 6.23 KB

Mapping of Metadata

This document describes how to convert, map and homogenize community-specific structured and formated XML records to B2FIND schema JSON records. These JSON records are necessary to make metadata available with CKAN. Thus, this part provides the interface between the harvested data and the actual metadata catalogue. A vital part of the conversion step is the semantical mapping.

Environment

Ubuntu 14.04 server

Prerequisites

1. The script mdmanger.py

The python script mdmanger.py is used in the m mode for this training module.

2. Spreadsheet template

If you are a community interested in publishing your metadata within EUDAT-B2FIND, the description of the specific mapping to the B2FIND schema is needed. For this, the Mapping V n.m (YYMMDD) tab in the excel template Community-B2FIND.template.xlsx has to be filled out (see Specification of metadata). This is, however, not necessary for this training. We will continue with the fishproject as an example.

3. The mapping files

The mapping files are configuration files named according to the pattern <community>-<mdformat>.xml. We provide some examples in the mapfiles directory. If you already followed part 01.b, you will find your mapping for the fishproject here. The specific files define the mapping rules for XML records, originated from <community> and formated in <mdformat>, and turn them into well-formatted JSON files.

4. The XML samples

The mapping script expects the XML files that are to be mapped to reside in the directory oaidata/<Community>-<MDformat>.

The Mapping process

In this section, we focus on the technical part of the mapping, i.e. how to apply the script.

1. Configuration of the mapping

From part 01.b, you will already have mapping files for the fishproject. We also provide the mapping for the DataCite community and their ANDS collection:

  1. fishproject-oai_dc.xml for mapping the DublinCore elements in the XML records of the fishProject onto the B2FIND schema.
  2. datacite-oai_dc.xml for mapping the DublinCore elements in the DataCite XML records onto the B2FIND schema.

If the previous training module was executed successfully on your machine, the XML metadata files should be available:

  1. fishproject samples: If you managed to generate metadata as described in 01.b, the XML files should already reside in oaidata/fishproject-oai_dc/SET_1/xml. If not, create the latter directory and copy the provided samples to this directory:
$ mkdir -p oaidata/fishproject-oai_dc/SET_1/xml
$ cp samples/DC_examples/fishProject/xml/*.xml oaidata/fishproject-oai_dc/SET_1/xml
  1. datacite samples: If you already managed to harvest metadata as described in 02.b, the XML files should already reside in /var/lib/tomcat7/webapps/oai/WEB-INF/harvested_records/-oai_dc-ANDS-CENTRE-1 if you harvested them via the web application; or in /home/<user>/B2FIND-Training/oaidata/oaidata/datacite-oai_dc/ANDS.CENTRE-1/xml if you harvested them via the mdmanager script. If not, please create this directory and copy the provided samples into it:
mkdir -p oaidata/datacite-oai_dc/ANDS.CENTRE-1/xml
cp samples/DC_examples/ANDS.CENTRE/xml/*.xml oaidata/oaidata/datacite-oai_dc/ANDS.CENTRE-1/xml

To perform the mapping properly, you need execute the script in --mode m and you need to set the parameters as follows.

Parameter Description Option Pos. in request line Comments Example1 Example 2
Community Name of communtiy or project -c or --community 1 Is used in paths fishproject datacite
Source OAI-URL -s or --source 2 Must be set, but irrelevant for the mapping http://localhost:8181/oai/provider http://oai.datacite.org/oai
Verb The OAI-PMH request or verb --verb 3 Not relevant for the mapping ListIdentifiers ListRecords
MDprefix (OAI) meta data format --mdprefix 4 Set during OAI harvesting oai_dc oai_dc
Subset (OAI) subset --mdsubset 5 If not set during OAI harvesting, it's set to SET_# SET_1 ANDS.CENTRE-1

For example, for the fishProject (and with the subset sample, as set in module 01.b), the following script should be run:

$ ./mdmanager.py --mode m -c fishproject -s http://localhost:8181/oai/provider --mdsubset sample_1 --mdprefix oai_dc

Note This command will not do anything since there is no mapping file mapfiles/fishproject-oai_dc.xml.

./mdmanager.py --mode m -c fishproject -s http://localhost:8181/oai/provider --mdsubset sample_1 --mdprefix oai_dc
|- Run mode:   	Mapping
|- Start : 	2018-04-03 09:54:26


|- Mapping started : 2018-04-03 09:54:26

You can copy the general oai mapping:

cp mapfiles/datacite-oai_dc.xml mapfiles/fishproject-oai_dc.xml
Version:  	2.0
Run mode:   	Mapping
Start : 	2016-11-07 14:47:43


|- Mapping started : 2016-11-07 14:47:43
	[==========          ]     4 ( 50%) in 0 sec
	[====================]     8 (100%) in 0 sec
   	|- Finished   |@ 14:47:45   |
	| Provided | Mapped | Failed |
	|        8 |      8 |      0 |

You may have to correct and adapt your mapping to assure fault-free operation. If the program finished successfully, please check the resulting JSON files in your project directory, and analyse and discuss the resulting JSON records:

$ ls oaidata/fishproject-oai_dc/sample_1/json

Excercise Perform the mapping for the DataCite community. Note that here the OAI subset ANDS-CENTRE-1 must be specified.