Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESA CCI file checking #53

Merged
merged 11 commits into from
Dec 20, 2024
Merged

ESA CCI file checking #53

merged 11 commits into from
Dec 20, 2024

Conversation

knappett
Copy link
Collaborator

@knappett knappett commented Dec 11, 2024

To add functionality to checksit for checking ESA CCI files, specs yaml files to check CCI file attributes and file names have been added.

The existing file name checking function - check_file_name - in generic.py was was hardcoded to NCAS file formats. A new function - check_generic_file_name - has therefore been added to enable file names with a user specified number of fields to be checked. The new check_generic_file_name function requires a specs yaml file to define vocab_checks, segregator, and extension fields. The 'spec_verbose' flag can also be set in order to output details of the vocab checks performed.

In order to enable the ESA CCI online vocabulary to be checked, the _load_from_url private method within the Vocabs class (cvs.py) has been modified to be more generic so that it can call either _load_from_url_ncas or _load_from_url_esacci, depending on the vocab url provided.

…ields so tha ant error messageis shown if the number of fields in the yaml file is exceeded.
…me parts against the number defined in the user supplied yaml file, in place of a try-except statment.
…specs comparison information is only printed to screen when spec_verbose is set in the yaml file.
…ages to generic.py and updated test_generic.
checksit/cvs.py Outdated Show resolved Hide resolved
Copy link

@dwest77a dwest77a left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Four comments for a few things you can do differently, only the one about the vocab_list is a definite bug fix, the rest are all just suggestions.

checksit/cvs.py Outdated
@@ -25,26 +25,53 @@ def _load(self, vocab_id):
vocab_file = os.path.join(vocabs_dir, f"{vocab_id}.json")
self._vocabs[vocab_id] = json.load(open(vocab_file))

def _load_from_url_ncas(self, vocab_id_url):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding this comment which applies to all functions. Here's some things you can optionally add to make it easier to debug later:

  • Docstrings: At the start of the function add a section denoted by three quotes ("""docstring""") where you can write a description of what the function does.
  • [Optional] Type hints: Add hints for what the parameters of the function should be and what the function returns. E.g def add(a: int, b: int) -> int:

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Throughout the repo there's currently a mix of docstrings or no docstrings - while I agree with both points, I think this could do with it's own dedicated PR to blitz through the whole repo

checksit/cvs.py Outdated Show resolved Hide resolved
# check against defined file extension

vocab_checks = vocab_checks or {}
try:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If segregator and extension are dicts, you can use:

seg = segregator.get("seg", "_") # Where the second parameter is the default return value in the case where the first one is not present in the dict

else:
field=vocab_checks["field"+num]

if field.startswith('__vocabs__') or field.startswith('__URL__'):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Repeating the last comment on nested ifs, you can use continue to skip loops where something isn't true rather than having a nested set of conditions.

checksit/cvs.py Outdated Show resolved Hide resolved
checksit/cvs.py Outdated Show resolved Hide resolved
checksit/cvs.py Outdated Show resolved Hide resolved
checksit/cvs.py Outdated Show resolved Hide resolved
f"{vocab_id_url_base}/releases/latest"
).url.split("/")[-1]
vocab_id_url = vocab_id_url.replace("__latest__", latest_version)
res = requests.get(vocab_id_url.replace("__URL__", "https://"))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure the replace section in here is needed, as it should have been done at the start of the _load_from_url function (I can see that this duplication was there before though, so this needs to be checked).

…github, and reinstated the if 'latest' statement in the correct place. Also removed some unnecessary if/else indentation.
@joshua-hampton
Copy link
Collaborator

I'm happy to accept this as it is, with some of the outstanding comments (formatting, docstrings, type hints) being something I'll look at repo-wide after Christmas.

@joshua-hampton joshua-hampton merged commit 3ee22df into main Dec 20, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants