This document describes how to validate mapped metadata formatted in the B2FIND schema (or any other schema).
Ubuntu 14.04 server
The Python script mdmanager.py is used in mode ‘v’ for this training module. The same script was used in the previous modules; see this module for more information about the script.
In this section we also use the Mapping V n.m (YYMMDD) tab of the Community-B2FIND.template.xlsx spreadsheet. Typically, a version of this spreadsheet should already have been created for your community or project by this stage (see section 01.a).
The spreadsheet is, however, not strictly required for this training.
We use the same configuration files, named like <community>-<mdformat>.xml, that were already used in the previous module 03.a.
The validation script expects the mapped JSON files that are to be checked to reside in the directory oaidata/<projectname>/<subset>/json. If the corresponding mapping was executed successfully in section 03.a, these files will be available.
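Before starting the validation it can be useful to confirm that the mapped JSON files are really in place. The following is a minimal sketch, not part of mdmanager.py: the helper name check_mapped_json is hypothetical, and the directory used in the example call is an assumption based on the paths shown in this module, so adjust it to your own community and subset.

```python
import json
from pathlib import Path


def check_mapped_json(json_dir):
    """Return (valid, broken) lists of the *.json file names found in json_dir."""
    valid, broken = [], []
    for path in sorted(Path(json_dir).glob("*.json")):
        try:
            with path.open(encoding="utf-8") as handle:
                json.load(handle)  # only checks that the file parses as JSON
            valid.append(path.name)
        except (OSError, ValueError):
            # ValueError covers both JSON syntax errors and decoding errors
            broken.append(path.name)
    return valid, broken


if __name__ == "__main__":
    # Assumed location of the mapped files; change it to match your setup.
    valid, broken = check_mapped_json("oaidata/fishproject-oai_dc/sample_1/json")
    print(f"{len(valid)} readable JSON file(s), {len(broken)} unreadable")
```

If the script reports zero readable files, the mapping step of section 03.a most likely has to be repeated before the validation can produce meaningful statistics.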
Validation is executed using the option --mode v. Apart from this, the options are the same as those used for the mapping in the previous section. See the example below.
./mdmanager.py --mode v -c fishproject -s http://localhost:8181/oai/provider --mdsubset sample_1 --mdprefix oai_dc
Version: 2.0
Run mode: Validating
Start : 2016-10-31 14:29:52
|- Validating started : 2016-10-31 14:29:52
[===== ] 4 / 50% in 0 sec
[========== ] 8 / 100% in 0 sec
Statistics of
community fishproject
subset sample_1
# of records 8
see in /home/rda/B2FIND-Training/oaidata/fishproject-oai_dc/sample_1/validation.stat
The file validation.stat contains statistical information about the coverage of the B2FIND fields.
$ less /home/rda/B2FIND-Training/oaidata/fishproject-oai_dc/sample_1/validation.stat
Statistics of
community fishproject
subset sample_1
# of records 8
see as well /home/rda/B2FIND-Training/oaidata/fishproject-oai_dc/sample_1/validation.stat
|-> Facet name <-- XPATH
|- Mapped | Validated |
|-- # | % | # | % |
| Value statistics:
|- #Occ : Value |
----------------------------------------------------------
|-> title <-- //dc:title/text()
|-- 8 | 100 | 8 | 100
|- 1 : Haplochromis compressiceps |
|- 1 : Cyphotilapia frontosus |
|- 1 : Neolamprologus brichardi |
|- 1 : Lamprologus tretocephalus |
|- 1 : Neolamprologus multifasciatus |
|- 1 : Neolamprologus cylindricus |
|- 1 : Julidochromis ornatus |
|- 1 : Julidochromis regani |
|-> notes <-- string-join(distinct-values(//dc:description/text()), '\n')
|-- 0 | 0 | 0 | 0
.....
In this case the coverage of the facet title is 100 percent; in other words, a title is assigned to all eight datasets. The XPATH mapping rule that was used is shown as well, here //dc:title/text().
For the facet notes (Description) we have the reverse case: no value could be assigned to any of the eight datasets.
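For larger communities it can be tedious to scan validation.stat by eye. The sketch below extracts the facet name, the mapping rule and the coverage numbers for each facet. It assumes the file layout seen in the excerpt above (a "|->" line followed by a "|--" line with four numbers); the function name parse_validation_stat and the dictionary keys are hypothetical, and other versions of mdmanager.py may format the file differently.

```python
import re

# Patterns inferred from the validation.stat excerpt shown above (an assumption).
FACET_RE = re.compile(r"^\|->\s*(?P<facet>.+?)\s*<--\s*(?P<rule>.*)$")
COUNTS_RE = re.compile(
    r"^\|--\s*(\d+)\s*\|\s*(\d+(?:\.\d+)?)\s*\|\s*(\d+)\s*\|\s*(\d+(?:\.\d+)?)"
)


def parse_validation_stat(text):
    """Return one dict per facet with its mapping rule and coverage numbers."""
    facets = []
    pending = None  # (facet, rule) still waiting for its line of counts
    for raw_line in text.splitlines():
        line = raw_line.strip()
        facet_match = FACET_RE.match(line)
        if facet_match:
            pending = (facet_match.group("facet"), facet_match.group("rule").strip())
            continue
        counts_match = COUNTS_RE.match(line)
        if counts_match and pending is not None:
            mapped, mapped_pct, validated, validated_pct = counts_match.groups()
            facets.append({
                "facet": pending[0],
                "rule": pending[1],
                "mapped": int(mapped),
                "mapped_pct": float(mapped_pct),
                "validated": int(validated),
                "validated_pct": float(validated_pct),
            })
            pending = None
    return facets


# Shortened copy of the excerpt above, including the non-numeric header lines.
SAMPLE = r"""
|-> Facet name <-- XPATH
|-- # | % | # | % |
|-> title <-- //dc:title/text()
|-- 8 | 100 | 8 | 100
|- 1 : Julidochromis regani |
|-> notes <-- string-join(distinct-values(//dc:description/text()), '\n')
|-- 0 | 0 | 0 | 0
"""

if __name__ == "__main__":
    for entry in parse_validation_stat(SAMPLE):
        flag = "" if entry["mapped_pct"] > 0 else "  <-- not mapped"
        print(f'{entry["facet"]}: {entry["mapped_pct"]:.0f}%{flag}')
```

The header lines of the file are skipped automatically because their "|--" line contains no numbers. To use the sketch on a real file, read its content with open(...).read() and pass the text to parse_validation_stat.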
Exercise
Analyse why notes is not properly mapped and update the mapfiles for generation (mapfiles/fishproject-oai_dc.csv) and mapping (mapfiles/fishproject-oai_dc.xml) such that this facet is also mapped successfully.
The output of the validation file is used to fill in column E (XPATH mapping rule) and column G (Coverage (% of mapped datasets [in ...) of the spreadsheet.
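To transfer these values into the spreadsheet, one option is to write them as tab-separated text that can be pasted into the sheet. This is only a sketch: the function name coverage_rows_as_tsv and the header names facet, xpath_rule and coverage_pct are hypothetical and do not correspond to the real column titles of the template; the example values are taken from the validation.stat excerpt above.

```python
import csv
import io


def coverage_rows_as_tsv(rows):
    """Format (facet, rule, coverage) tuples as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    # Hypothetical header names, not the column titles of the template.
    writer.writerow(["facet", "xpath_rule", "coverage_pct"])
    for facet, rule, coverage in rows:
        writer.writerow([facet, rule, coverage])
    return buffer.getvalue()


if __name__ == "__main__":
    # Values for the facet title as reported in validation.stat above.
    print(coverage_rows_as_tsv([("title", "//dc:title/text()", 100)]), end="")
```

The rule then goes into column E and the coverage value into column G for the row of the corresponding facet.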
Exercise
Perform the validation for the metadata harvested from DataCite as well.
Note: If you successfully completed the last few modules (harvest and mapping procedure) for the datacite community, the mapped JSON files produced in the directory oaidata/datacite-oai_dc/ANDS.CENTRE-1/json/ can be used. If these files are not available for any reason, copy the JSON files from the samples path as follows:
cp samples/DC_examples/ANDS.CENTRE/json/*.json oaidata/datacite-oai_dc/ANDS.CENTRE-1/json/