Validation of Metadata

This document describes how to validate mapped metadata formatted in the B2FIND schema (or any other schema).

Environment

Ubuntu 14.04 server

Prerequisites

1. The script mdmanager.py

As in the previous modules, this training module uses the Python script mdmanager.py, here in mode 'v'. See the previous modules for more information about the script.

2. Spreadsheet template

In this section we also use the Mapping V n.m (YYMMDD) tab of the Community-B2FIND.template.xlsx spreadsheet. Typically, a version of this spreadsheet should already have been created for your community or project by this stage (see section 01.a).

The spreadsheet is, however, not strictly required for this training.

3. The map files

We use the same configuration files, named like <community>-<mdformat>.xml, that were already used in the previous module 03.a.

4. Some JSON samples

The validation script expects the mapped JSON files that are to be checked to reside in the directory oaidata/<projectname>/<subset>/json. If the corresponding mapping in section 03.a was executed successfully, these files will be available.
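
Before running the validation, it can help to confirm that the mapped JSON files are actually in place. The following is a minimal sketch and not part of mdmanager.py: the helper name find_mapped_json is hypothetical, and the example path simply follows the directory pattern described above.

```python
import json
from pathlib import Path


def find_mapped_json(json_dir):
    """Return (valid, broken) lists of *.json file names found in json_dir.

    Hypothetical helper, not part of mdmanager.py. A file counts as valid
    if it can be parsed as JSON; its content is not checked against the
    B2FIND schema.
    """
    valid, broken = [], []
    for path in sorted(Path(json_dir).glob("*.json")):
        try:
            with open(path, encoding="utf-8") as handle:
                json.load(handle)
            valid.append(path.name)
        except (OSError, ValueError):
            # json.JSONDecodeError and UnicodeDecodeError are ValueErrors
            broken.append(path.name)
    return valid, broken


if __name__ == "__main__":
    # Example path following the oaidata/<projectname>/<subset>/json pattern;
    # adjust it to your own project and subset.
    valid, broken = find_mapped_json("oaidata/fishproject-oai_dc/sample_1/json")
    print(f"{len(valid)} parsable JSON file(s), {len(broken)} broken")
```

An empty result for both lists usually means that the mapping step has not been run yet or that the path is wrong.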

The Validation process

Validation is executed using the option --mode v. The remaining options are the same as those used for the mapping in the previous section. See the example below.

./mdmanager.py --mode v -c fishproject -s http://localhost:8181/oai/provider --mdsubset sample_1 --mdprefix oai_dc
Version:  	2.0
Run mode:   	Validating
Start : 	2016-10-31 14:29:52


|- Validating started : 2016-10-31 14:29:52
	[=====               ] 4 / 50% in 0 sec
	[==========          ] 8 / 100% in 0 sec

 Statistics of
	community	fishproject
	subset		sample_1
	# of records	8
  see in /home/rda/B2FIND-Training/oaidata/fishproject-oai_dc/sample_1/validation.stat

The file validation.stat contains statistical information about the coverage of the B2FIND fields.

$ less /home/rda/B2FIND-Training/oaidata/fishproject-oai_dc/sample_1/validation.stat

 Statistics of
        community       fishproject
        subset          sample_1
        # of records    8
  see as well /home/rda/B2FIND-Training/oaidata/fishproject-oai_dc/sample_1/validation.stat

 |-> Facet name       <-- XPATH                
  |- Mapped     | Validated | 
  |--     # |    % |     # |    % |
      | Value statistics:
      |- #Occ  : Value                          |
 ----------------------------------------------------------

 |-> title            <--       //dc:title/text()

  |--     8 |  100 |     8 |  100
      |- 1     : Haplochromis compressiceps          |
      |- 1     : Cyphotilapia frontosus              |
      |- 1     : Neolamprologus brichardi            |
      |- 1     : Lamprologus tretocephalus           |
      |- 1     : Neolamprologus multifasciatus       |
      |- 1     : Neolamprologus cylindricus          |
      |- 1     : Julidochromis ornatus               |
      |- 1     : Julidochromis regani                |
|-> notes            <--       string-join(distinct-values(//dc:description/text()), '\n')

  |--     0 |    0 |     0 |    0

.....

In this case the coverage of the facet title is 100 percent; in other words, a title is assigned to all eight datasets. The XPATH mapping rule that was used is shown as well, here //dc:title/text().

For the facet notes (Description) we have the opposite case: no value could be assigned for any of the eight datasets.
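
To spot facets with poor coverage without reading the whole file, the statistics can be extracted with a few lines of Python. This is a minimal sketch that assumes validation.stat always follows the layout of the excerpt above (a `|->` line with facet name and XPATH rule, followed by a `|--` line with four numbers); the function name parse_validation_stat is hypothetical and not part of mdmanager.py.

```python
import re

# "|-> <facet> <-- <xpath rule>"; the legend line ("Facet name") does not
# match because the facet name must be a single word.
FACET_RE = re.compile(r"^\s*\|->\s*(\S+)\s+<--\s*(.*?)\s*$")
# "|-- <mapped #> | <mapped %> | <validated #> | <validated %>"
COUNT_RE = re.compile(
    r"^\s*\|--\s*(\d+)\s*\|\s*([\d.]+)\s*\|\s*(\d+)\s*\|\s*([\d.]+)"
)

# Shortened excerpt of the file shown above.
SAMPLE = r"""
 |-> Facet name       <-- XPATH
  |- Mapped     | Validated |
  |--     # |    % |     # |    % |
 ----------------------------------------------------------

 |-> title            <--       //dc:title/text()

  |--     8 |  100 |     8 |  100
      |- 1     : Haplochromis compressiceps          |
      |- 1     : Cyphotilapia frontosus              |
|-> notes            <--       string-join(distinct-values(//dc:description/text()), '\n')

  |--     0 |    0 |     0 |    0
"""


def parse_validation_stat(text):
    """Map each facet name to its XPATH rule and coverage figures."""
    facets = {}
    current = None
    for line in text.splitlines():
        facet = FACET_RE.match(line)
        if facet:
            current = facet.group(1)
            facets[current] = {"xpath": facet.group(2)}
            continue
        counts = COUNT_RE.match(line)
        if counts and current is not None and "mapped" not in facets[current]:
            mapped, mapped_pct, validated, validated_pct = counts.groups()
            facets[current].update(
                mapped=int(mapped),
                mapped_pct=float(mapped_pct),
                validated=int(validated),
                validated_pct=float(validated_pct),
            )
    return facets


if __name__ == "__main__":
    for name, info in parse_validation_stat(SAMPLE).items():
        # List facets for which no dataset received a value.
        if info.get("mapped") == 0:
            print(f"{name}: not mapped, rule was {info['xpath']}")
```

To use it on a real run, read the content of validation.stat into a string and pass that string to the function instead of SAMPLE.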

Exercise: Analyse why notes is not properly mapped and update the mapfiles for generation (mapfiles/fishproject-oai_dc.csv) and mapping (mapfiles/fishproject-oai_dc.xml) such that this facet is mapped successfully as well.

The output in the validation file is used to fill in column E (XPATH mapping rule) and column G (Coverage (% of mapped datasets [in ...) of the spreadsheet.

Exercise: Perform the validation for the metadata harvested from DataCite as well.

Note: If you successfully performed the last few modules (harvest and mapping procedure) for the datacite community, the mapped JSON files produced in the directory oaidata/datacite-oai_dc/ANDS.CENTRE-1/json/ can be used. If these files are not available for any reason, copy the JSON files from the samples path as follows:

cp samples/DC_examples/ANDS.CENTRE/json/*.json oaidata/datacite-oai_dc/ANDS.CENTRE-1/json/
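
The cp command fails if the target directory does not exist yet. As an alternative, the copy step can be sketched in Python; the helper name copy_sample_json is hypothetical and not part of the training material, and it creates the target directory when needed.

```python
import shutil
from pathlib import Path


def copy_sample_json(source_dir, target_dir):
    """Copy all *.json files from source_dir into target_dir.

    Hypothetical helper mirroring the cp command above. Unlike plain cp,
    it creates target_dir (including parents) if it does not exist yet.
    Returns the sorted list of copied file names.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(Path(source_dir).glob("*.json")):
        shutil.copy2(path, target / path.name)
        copied.append(path.name)
    return copied
```

Called as copy_sample_json("samples/DC_examples/ANDS.CENTRE/json", "oaidata/datacite-oai_dc/ANDS.CENTRE-1/json") from the training directory, it performs the same copy as the command above.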