This document describes how to convert, map and homogenize community-specific structured and formated XML records to B2FIND schema JSON records. These JSON records are necessary to make metadata available with CKAN. Thus, this part provides the interface between the harvested data and the actual metadata catalogue. A vital part of the conversion step is the semantical mapping.
Ubuntu 14.04 server
The python script mdmanger.py
is used in the m
mode for this training module.
If you are a community interested in publishing your metadata within EUDAT-B2FIND, the description of the specific mapping to the B2FIND schema is needed. For this, the Mapping V n.m (YYMMDD) tab in the excel template Community-B2FIND.template.xlsx has to be filled out (see Specification of metadata). This is, however, not necessary for this training. We will continue with the fishproject as an example.
The mapping files are configuration files named according to the pattern <community>-<mdformat>.xml
. We provide some examples in the mapfiles
directory. If you already followed part 01.b, you will find your mapping for the fishproject here. The specific files define the mapping rules for XML records, originated from <community>
and formated in <mdformat>
, and turn them into well-formatted JSON files.
The mapping script expects the XML files that are to be mapped to reside in the directory oaidata/<Community>-<MDformat>
.
In this section, we focus on the technical part of the mapping, i.e. how to apply the script.
From part 01.b, you will already have mapping files for the fishproject. We also provide the mapping for the DataCite community and their ANDS collection:
- fishproject-oai_dc.xml for mapping the DublinCore elements in the XML records of the fishProject onto the B2FIND schema.
- datacite-oai_dc.xml for mapping the DublinCore elements in the DataCite XML records onto the B2FIND schema.
If the previous training module was executed successfully on your machine, the XML metadata files should be available:
- fishproject samples: If you managed to generate metadata as described in 01.b, the XML files should already reside in
oaidata/fishproject-oai_dc/SET_1/xml
. If not, create the latter directory and copy the provided samples to this directory:
$ mkdir -p oaidata/fishproject-oai_dc/SET_1/xml
$ cp samples/DC_examples/fishProject/xml/*.xml oaidata/fishproject-oai_dc/SET_1/xml
- datacite samples: If you already managed to harvest metadata as described in 02.b, the XML files should already reside in
/var/lib/tomcat7/webapps/oai/WEB-INF/harvested_records/-oai_dc-ANDS-CENTRE-1
if you harvested them via the web application; or in/home/<user>/B2FIND-Training/oaidata/oaidata/datacite-oai_dc/ANDS.CENTRE-1/xml
if you harvested them via themdmanager
script. If not, please create this directory and copy the provided samples into it:
mkdir -p oaidata/datacite-oai_dc/ANDS.CENTRE-1/xml
cp samples/DC_examples/ANDS.CENTRE/xml/*.xml oaidata/oaidata/datacite-oai_dc/ANDS.CENTRE-1/xml
To perform the mapping properly, you need execute the script in --mode m
and you need to set the parameters as follows.
Parameter | Description | Option | Pos. in request line | Comments | Example1 | Example 2 |
---|---|---|---|---|---|---|
Community | Name of communtiy or project | -c or --community | 1 | Is used in paths | fishproject | datacite |
Source | OAI-URL | -s or --source | 2 | Must be set, but irrelevant for the mapping | http://localhost:8181/oai/provider | http://oai.datacite.org/oai |
Verb | The OAI-PMH request or verb |
--verb | 3 | Not relevant for the mapping | ListIdentifiers | ListRecords |
MDprefix | (OAI) meta data format | --mdprefix | 4 | Set during OAI harvesting | oai_dc | oai_dc |
Subset | (OAI) subset | --mdsubset | 5 | If not set during OAI harvesting, it's set to SET_# | SET_1 | ANDS.CENTRE-1 |
For example, for the fishProject (and with the subset sample
, as set in module 01.b), the following script should be run:
$ ./mdmanager.py --mode m -c fishproject -s http://localhost:8181/oai/provider --mdsubset sample_1 --mdprefix oai_dc
Note This command will not do anything since there is no mapping file mapfiles/fishproject-oai_dc.xml
.
./mdmanager.py --mode m -c fishproject -s http://localhost:8181/oai/provider --mdsubset sample_1 --mdprefix oai_dc
|- Run mode: Mapping
|- Start : 2018-04-03 09:54:26
|- Mapping started : 2018-04-03 09:54:26
You can copy the general oai mapping:
cp mapfiles/datacite-oai_dc.xml mapfiles/fishproject-oai_dc.xml
Version: 2.0
Run mode: Mapping
Start : 2016-11-07 14:47:43
|- Mapping started : 2016-11-07 14:47:43
[========== ] 4 ( 50%) in 0 sec
[====================] 8 (100%) in 0 sec
|- Finished |@ 14:47:45 |
| Provided | Mapped | Failed |
| 8 | 8 | 0 |
You may have to correct and adapt your mapping to assure fault-free operation. If the program finished successfully, please check the resulting JSON files in your project directory, and analyse and discuss the resulting JSON records:
$ ls oaidata/fishproject-oai_dc/sample_1/json
Excercise
Perform the mapping for the DataCite community. Note that here the OAI subset ANDS-CENTRE-1
must be specified.