-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ESA CCI file checking #53
Conversation
…nfig files orurl.
…ields so tha ant error messageis shown if the number of fields in the yaml file is exceeded.
…me parts against the number defined in the user supplied yaml file, in place of a try-except statment.
…specs comparison information is only printed to screen when spec_verbose is set in the yaml file.
…ages to generic.py and updated test_generic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Four comments for a few things you can do differently, only the one about the vocab_list is a definite bug fix, the rest are all just suggestions.
checksit/cvs.py
Outdated
@@ -25,26 +25,53 @@ def _load(self, vocab_id): | |||
vocab_file = os.path.join(vocabs_dir, f"{vocab_id}.json") | |||
self._vocabs[vocab_id] = json.load(open(vocab_file)) | |||
|
|||
def _load_from_url_ncas(self, vocab_id_url): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding this comment which applies to all functions. Here's some things you can optionally add to make it easier to debug later:
- Docstrings: At the start of the function add a section denoted by three quotes ("""docstring""") where you can write a description of what the function does.
- [Optional] Type hints: Add hints for what the parameters of the function should be and what the function returns. E.g def add(a: int, b: int) -> int:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Throughout the repo there's currently a mix of docstrings or no docstrings - while I agree with both points, I think this could do with it's own dedicated PR to blitz through the whole repo
# check against defined file extension | ||
|
||
vocab_checks = vocab_checks or {} | ||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If segregator and extension are dicts, you can use:
seg = segregator.get("seg", "_") # Where the second parameter is the default return value in the case where the first one is not present in the dict
else: | ||
field=vocab_checks["field"+num] | ||
|
||
if field.startswith('__vocabs__') or field.startswith('__URL__'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Repeating the last comment on nested ifs, you can use continue to skip loops where something isn't true rather than having a nested set of conditions.
f"{vocab_id_url_base}/releases/latest" | ||
).url.split("/")[-1] | ||
vocab_id_url = vocab_id_url.replace("__latest__", latest_version) | ||
res = requests.get(vocab_id_url.replace("__URL__", "https://")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure the replace section in here is needed, as it should have been done at the start of the _load_from_url
function (I can see that this duplication was there before though, so this needs to be checked).
…github, and reinstated the if 'latest' statement in the correct place. Also removed some unnecessary if/else indentation.
I'm happy to accept this as it is, with some of the outstanding comments (formatting, docstrings, type hints) being something I'll look at repo-wide after Christmas. |
To add functionality to checksit for checking ESA CCI files, specs yaml files to check CCI file attributes and file names have been added.
The existing file name checking function - check_file_name - in generic.py was was hardcoded to NCAS file formats. A new function - check_generic_file_name - has therefore been added to enable file names with a user specified number of fields to be checked. The new check_generic_file_name function requires a specs yaml file to define vocab_checks, segregator, and extension fields. The 'spec_verbose' flag can also be set in order to output details of the vocab checks performed.
In order to enable the ESA CCI online vocabulary to be checked, the _load_from_url private method within the Vocabs class (cvs.py) has been modified to be more generic so that it can call either _load_from_url_ncas or _load_from_url_esacci, depending on the vocab url provided.