ESA CCI file checking #53

knappett · 2024-12-11T14:43:32Z

To let checksit check ESA CCI files, specs yaml files have been added that check CCI file attributes and file names.

The existing file name checking function, check_file_name, in generic.py was hardcoded to NCAS file formats. A new function, check_generic_file_name, has therefore been added so that file names with a user-specified number of fields can be checked. The new function requires a specs yaml file that defines the vocab_checks, segregator, and extension fields. The 'spec_verbose' flag can also be set to output details of the vocab checks performed.
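As a rough illustration of the approach described above, the sketch below splits a file name on a separator, checks the extension, and compares each field against a set of allowed values. It is not the checksit implementation: the function name check_name_fields, the "field0", "field1", ... key format, and the use of plain lists of allowed values (rather than vocabulary or URL lookups) are all assumptions made for the example.

```python
def check_name_fields(file_name, vocab_checks, segregator="_", extension=".nc"):
    """Return a list of error strings; an empty list means the name passed.

    vocab_checks maps hypothetical keys "field0", "field1", ... to either a
    list of allowed values or None (meaning any value is accepted).
    """
    errors = []

    # Check against the defined file extension first.
    if not file_name.endswith(extension):
        errors.append(f"[file-name] expected extension '{extension}'")
        return errors

    # Strip the extension and split the remainder into fields.
    stem = file_name[: len(file_name) - len(extension)]
    parts = stem.split(segregator)

    # Compare the number of name parts with the number of fields in the spec,
    # rather than relying on a try/except around a missing key.
    if len(parts) != len(vocab_checks):
        errors.append(
            f"[file-name] expected {len(vocab_checks)} fields, found {len(parts)}"
        )
        return errors

    for num, part in enumerate(parts):
        allowed = vocab_checks[f"field{num}"]
        if allowed is not None and part not in allowed:
            errors.append(
                f"[file-name] field{num} value '{part}' not in {sorted(allowed)}"
            )
    return errors


# Hypothetical spec with three fields; the last field accepts any value.
example_spec = {"field0": ["ESACCI"], "field1": ["L3C", "L4"], "field2": None}
print(check_name_fields("ESACCI-L4-20200101.nc", example_spec, segregator="-"))
```

Returning a list of messages rather than raising keeps all problems with a single file name visible in one pass.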

In order to enable the ESA CCI online vocabulary to be checked, the _load_from_url private method within the Vocabs class (cvs.py) has been modified to be more generic so that it can call either _load_from_url_ncas or _load_from_url_esacci, depending on the vocab url provided.
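The dispatch described above might look something like the following sketch. Only the method names _load_from_url, _load_from_url_ncas and _load_from_url_esacci come from this PR; the class name VocabsSketch, the rule used to tell the two vocab sources apart (a substring test on the URL), and the stub loader bodies are assumptions for illustration, and no network request is made.

```python
class VocabsSketch:
    """Toy stand-in for the Vocabs class, showing only the dispatch step."""

    def _load_from_url(self, vocab_id_url):
        # Assumption: ESA CCI vocab URLs can be recognised by a substring.
        # The real condition in cvs.py may differ.
        if "esacci" in vocab_id_url.lower():
            return self._load_from_url_esacci(vocab_id_url)
        return self._load_from_url_ncas(vocab_id_url)

    def _load_from_url_ncas(self, vocab_id_url):
        # Stub: the real method fetches and parses the NCAS vocabulary.
        return ("ncas", vocab_id_url)

    def _load_from_url_esacci(self, vocab_id_url):
        # Stub: the real method fetches and parses the ESA CCI vocabulary.
        return ("esacci", vocab_id_url)


vocabs = VocabsSketch()
print(vocabs._load_from_url("__URL__example.org/esacci/vocab.json"))
print(vocabs._load_from_url("__URL__example.org/ncas/vocab.json"))
```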

dwest77a

Four comments on a few things you could do differently. Only the one about the vocab_list is a definite bug fix; the rest are just suggestions.

dwest77a · 2024-12-13T14:37:08Z

checksit/cvs.py

@@ -25,26 +25,53 @@ def _load(self, vocab_id):
        vocab_file = os.path.join(vocabs_dir, f"{vocab_id}.json")
        self._vocabs[vocab_id] = json.load(open(vocab_file))

+    def _load_from_url_ncas(self, vocab_id_url):


Adding this comment here, but it applies to all functions. Here are some things you can optionally add to make debugging easier later:

Docstrings: At the start of the function add a section denoted by three quotes ("""docstring""") where you can write a description of what the function does.

[Optional] Type hints: Add hints for what the parameters of the function should be and what the function returns. E.g. def add(a: int, b: int) -> int:
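Put together, the two suggestions above could look like this for the reviewer's add example. The docstring wording is illustrative only.

```python
def add(a: int, b: int) -> int:
    """Return the sum of two integers.

    The docstring is the first statement in the function body and is
    available at runtime via add.__doc__ and help(add).
    """
    return a + b


print(add(2, 3))
```

Type hints are not enforced at runtime, so they document intent and help static checkers rather than changing behaviour.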

Throughout the repo there's currently a mix of functions with and without docstrings. While I agree with both points, I think this could do with its own dedicated PR to blitz through the whole repo.

dwest77a · 2024-12-13T14:47:57Z

checksit/generic.py

+    # check against defined file extension
+
+    vocab_checks = vocab_checks or {}
+    try:


If segregator and extension are dicts, you can use:

seg = segregator.get("seg", "_") # Where the second parameter is the default return value in the case where the first one is not present in the dict
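A small sketch of that suggestion, assuming segregator and extension arrive as dicts (or None) from the specs yaml file. The "seg" key is taken from the comment above; the "ext" key, the default values, and the function name read_name_settings are hypothetical.

```python
def read_name_settings(segregator=None, extension=None):
    """Return (separator, extension), falling back to defaults when a key is absent."""
    # Treat a missing section as an empty dict, mirroring
    # "vocab_checks = vocab_checks or {}" in the diff above.
    segregator = segregator or {}
    extension = extension or {}

    # dict.get returns the second argument when the key is not present,
    # so no try/except KeyError is needed.
    seg = segregator.get("seg", "_")
    ext = extension.get("ext", ".nc")  # "ext" key and ".nc" default are assumptions
    return seg, ext


print(read_name_settings({"seg": "-"}, {"ext": ".nc"}))
print(read_name_settings())
```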

dwest77a · 2024-12-13T14:49:50Z

checksit/generic.py

+        else:
+            field=vocab_checks["field"+num]
+
+            if field.startswith('__vocabs__') or field.startswith('__URL__'):


Repeating the last comment on nested ifs, you can use continue to skip loops where something isn't true rather than having a nested set of conditions.
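For illustration, the loop below uses continue as a guard so the main body stays at one indentation level. The '__vocabs__' and '__URL__' prefixes come from the diff above; the function name, the dict layout, and the example values are assumptions.

```python
def fields_needing_lookup(vocab_checks):
    """Return the names of fields whose rule refers to a vocabulary or URL."""
    selected = []
    for name, field in vocab_checks.items():
        # Guard clause: skip anything that is not a vocab or URL rule,
        # instead of wrapping the rest of the loop body in a nested if.
        if not field.startswith(("__vocabs__", "__URL__")):
            continue
        selected.append(name)
    return selected


# Hypothetical rules, for demonstration only.
example_checks = {
    "field0": "__vocabs__:example-vocab",
    "field1": "some-other-rule",
    "field2": "__URL__example.org/vocab.json",
}
print(fields_needing_lookup(example_checks))
```

str.startswith accepts a tuple of prefixes, which also removes the need for the "or" between the two calls.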

joshua-hampton · 2024-12-13T15:35:12Z

checksit/cvs.py

+            f"{vocab_id_url_base}/releases/latest"
+        ).url.split("/")[-1]
+        vocab_id_url = vocab_id_url.replace("__latest__", latest_version)
+        res = requests.get(vocab_id_url.replace("__URL__", "https://"))


I'm not sure the replace section in here is needed, as it should have been done at the start of the _load_from_url function (I can see that this duplication was there before though, so this needs to be checked).
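As a quick way to reason about the duplication, the sketch below shows that replacing the "__URL__" placeholder a second time is a no-op, since str.replace returns the string unchanged when the substring is absent. The helper name normalise_vocab_url and the example URL are hypothetical; whether the second replace in cvs.py can actually be dropped still depends on every call path going through _load_from_url first.

```python
def normalise_vocab_url(vocab_id_url: str) -> str:
    """Swap the '__URL__' placeholder for an 'https://' scheme."""
    return vocab_id_url.replace("__URL__", "https://")


once = normalise_vocab_url("__URL__example.org/vocab.json")
twice = normalise_vocab_url(once)  # second call finds no placeholder

print(once)
print(once == twice)
```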

…github, and reinstated the if 'latest' statement in the correct place. Also removed some unnecessary if/else indentation.

joshua-hampton · 2024-12-20T09:58:54Z

I'm happy to accept this as it is, with some of the outstanding comments (formatting, docstrings, type hints) being something I'll look at repo-wide after Christmas.

knappett added 8 commits October 23, 2024 10:57

Initial commit of esa-cci yml files

f0f8876

Working version to check standard esa cci file names against vocab co…

78e3e80

…nfig files or url.

Moved esa cci yaml files into the esa-cci-v1.0 subdirectory.

4a7de35

Modified generic.py with try/except statement for checking filename f…

b914fc0

…ields so that an error message is shown if the number of fields in the yaml file is exceeded.

Updated check_generic_file_name with a check of the number of file na…

445b0a4

…me parts against the number defined in the user supplied yaml file, in place of a try-except statement.

Added test_check_generic_file_name to test_generic.py.

80db5cb

Added keyword spec_verbose to check_generic_file_name to ensure that …

f0ef22c

…specs comparison information is only printed to screen when spec_verbose is set in the yaml file.

Added new ESA CCI filename specs files. Added more verbose error mess…

0a47805

…ages to generic.py and updated test_generic.

knappett requested review from joshua-hampton and agstephens December 11, 2024 14:45

Fixed issue with removesuffix.

5d1ef42

dwest77a reviewed Dec 13, 2024
