Ncas image v1.0 #36

Merged
merged 122 commits into from
Jan 15, 2024
Commits
fb87ce3
First commit of image reader
joshua-hampton May 17, 2023
4ea4456
this is a test
Aug 7, 2023
493ebb2
Merge pull request #31 from cedadev/main
joshua-hampton Aug 7, 2023
e9ed8dd
_attrs_dict splits the line once at the first :
Aug 7, 2023
f6b688a
Add image reader to parse_file_header
Aug 7, 2023
0415305
Undo test comment
Aug 7, 2023
f2ffc5c
New file for NCAS image
Aug 7, 2023
ad9acf4
New file for NCAS image global attrs
Aug 7, 2023
b7026c7
#24 moving to use global attributes file
Aug 7, 2023
ea79ab9
#23 rename from image to photo
Aug 7, 2023
c3a7d96
#24 new plot file
Aug 7, 2023
2783d2d
#24 new image checks
Aug 8, 2023
0aa4d13
#24 removed plot/photo specific checks
Aug 8, 2023
72009e9
#24 tidying up after chat with Graham and Davey
Aug 8, 2023
aaef23e
#24 new rules for NCAS image
Aug 8, 2023
b2b2f32
# 24 import image reader
Aug 8, 2023
28e09a2
#24 import image reader
Aug 8, 2023
1b982ca
Merge branch 'ncas-image' of https://github.com/cedadev/checksit into…
Aug 8, 2023
fe3a8e4
#24 adapt to key/value format
Aug 9, 2023
8de3455
#24 import datetime
Aug 9, 2023
f33da82
#24 regex edits
Aug 9, 2023
2a77e88
#24 regex brackets correction
Aug 9, 2023
3f20290
#24 regex brackets correction
Aug 9, 2023
a661571
#24 swapping -args for -j due to newline/.
Aug 10, 2023
19acb9c
#24 specifying relation uuid length
Aug 10, 2023
e4c42dd
changing folder name
Aug 10, 2023
f8852d7
#24 adding warnings code
Aug 10, 2023
4ffac7a
#24 tidying up
Aug 10, 2023
a94fb16
#24 warnings for vocab_attrs
Aug 10, 2023
0cf0e53
#24 name warning
Aug 10, 2023
83b01ba
#24 relation warning
Aug 21, 2023
1d456ea
#24 rights warning
Aug 21, 2023
835b052
#24 WebStatement warning
Aug 21, 2023
ca064ee
#24 credit warning
Aug 21, 2023
22d12bc
#24 location must have at least one comma & space
Aug 21, 2023
68e4730
#24 location warning
Aug 21, 2023
edbbf1e
#24 Headline warning function
Aug 21, 2023
067c668
#24 check url exists
Aug 21, 2023
6671191
#24 url check ContributerIdentifier
Aug 21, 2023
7a58a2e
#24 WebStatement valid URL check
Aug 22, 2023
c4fa3d8
#24 relation url valid check
Aug 22, 2023
e4022a6
#24 change url valid checks to warnings
Aug 22, 2023
a38e7f7
#24 space optional in name regex
Aug 22, 2023
d75f6e5
#24 change list of names to a warning
Shanrahan16 Aug 22, 2023
bc010a7
#24 remove WebStatement valid URL check - regex ok
Shanrahan16 Aug 22, 2023
47a70c3
#24 title_check
Shanrahan16 Aug 22, 2023
73dd4f4
#24 compare Title to actual file name
Shanrahan16 Aug 23, 2023
2b61a16
#24 tidying up
Shanrahan16 Aug 23, 2023
d3f6f2f
#24 latitude/longitude range checks
Shanrahan16 Aug 23, 2023
ee27e80
#24 adding possiblility of - within lat/long regex
Shanrahan16 Aug 23, 2023
f9ae14f
#24 tidying up
Shanrahan16 Aug 24, 2023
bf2846d
#24 allow muliple checks for each metadata key
Shanrahan16 Aug 24, 2023
00b9519
#24 changing the yaml file to allow multiple
Shanrahan16 Aug 24, 2023
fb904ee
#24 tidying up
Shanrahan16 Aug 24, 2023
a982333
#24 correcting headline capital letter check
Shanrahan16 Aug 24, 2023
5af1750
#24 fixing title check
Shanrahan16 Aug 24, 2023
f8d32f3
#24 tidying up
Shanrahan16 Aug 24, 2023
deb379e
#24 test images
Shanrahan16 Aug 25, 2023
8e54b9c
#24 title data product warning
Shanrahan16 Aug 30, 2023
0182c3a
#24 error/warning wording
Shanrahan16 Aug 31, 2023
ff96edd
#24 new test images
Shanrahan16 Aug 31, 2023
324df5c
#24 test images
Shanrahan16 Aug 31, 2023
622fc13
#24 updating for errors & warnings being returned
Shanrahan16 Sep 4, 2023
d94b392
#24 tidying up
Shanrahan16 Sep 4, 2023
1cec480
#24 resolves list index - warnings for vocab_attrs
Shanrahan16 Sep 5, 2023
986e5d7
#24 addresses key error from inpt
Shanrahan16 Sep 5, 2023
49f92bf
#24 reducing traceback output when filepath wrong
Shanrahan16 Sep 5, 2023
14cd3da
#24 adding requests to requirements
Shanrahan16 Sep 6, 2023
911588f
#24 allow decimal altitudes-warning if not integer
Shanrahan16 Sep 6, 2023
54c3221
#24 location- allowing digits, hyphens, accents...
Shanrahan16 Sep 6, 2023
932903d
#24 ncas email warning
Shanrahan16 Sep 6, 2023
eb21897
#24 valid email error
Shanrahan16 Sep 6, 2023
bf0ef8a
#24 allow apostrophes, hyphens & accents in names
Shanrahan16 Sep 6, 2023
c3edfae
#24 all characters for names
Shanrahan16 Sep 6, 2023
036a815
#24 allow special characters in title
Shanrahan16 Sep 6, 2023
06671da
#24 combining if statements
Shanrahan16 Sep 7, 2023
9f4f201
#24 reorder so functn won't error out if <32 char
Shanrahan16 Sep 7, 2023
48462fe
#24 raise exception if no space in relation
Shanrahan16 Sep 7, 2023
156eafe
#24 changing Python error to a checksit error
Shanrahan16 Sep 7, 2023
f54cbec
#24 tidying up
Shanrahan16 Sep 7, 2023
bbc3f4f
#24 name characters warning
Shanrahan16 Sep 7, 2023
e3f1d3b
#24 renaming name-format regex
Shanrahan16 Sep 7, 2023
1470757
#24 missing comma
Shanrahan16 Sep 7, 2023
ca803c5
#24 name characters separate from format
Shanrahan16 Sep 7, 2023
d6ce055
#24 allow any metadata tags in image reader
Shanrahan16 Sep 7, 2023
4eb3a04
#24 removing tags dictionary as no longer used
Shanrahan16 Sep 7, 2023
4fe1e59
#24 test images
Shanrahan16 Sep 8, 2023
ae016de
#24 fixing relation url error output
Shanrahan16 Sep 8, 2023
c93b664
#24 tidying up
Shanrahan16 Sep 8, 2023
6373fdd
#24 more test images
Shanrahan16 Sep 8, 2023
d771550
#24 stopping empty warning being returned- url
Shanrahan16 Sep 8, 2023
0e8d1b5
#24 tidying up & stop url redirecting
Shanrahan16 Sep 8, 2023
58a9e45
#24 removing url requests.get()
Shanrahan16 Sep 21, 2023
5fe4f4d
#24 tidying up
Shanrahan16 Sep 21, 2023
0211340
Tightened lat/lon checks around edge of allowable values
joshua-hampton Nov 22, 2023
af1b08a
Change to regex checks for lat and lon for images
joshua-hampton Dec 1, 2023
73fe8ac
Auto-find specs for NCAS-IMAGE files
joshua-hampton Dec 1, 2023
23106b6
Merge branch 'main' into ncas-image
joshua-hampton Jan 10, 2024
e05e1e1
Create image extensions constant
joshua-hampton Jan 10, 2024
0c98439
Remove old comments
joshua-hampton Jan 10, 2024
0e1a8e3
Removed forgotten bracket
joshua-hampton Jan 10, 2024
45e4ccd
Initial upload of tests on images
joshua-hampton Jan 10, 2024
9db5006
Futher checks for test images
joshua-hampton Jan 11, 2024
0512475
Add skip spellcheck flag
joshua-hampton Jan 11, 2024
4fc8055
Add test_images checks to workflow
joshua-hampton Jan 11, 2024
e3b91af
Install udunits2 in workflow
joshua-hampton Jan 11, 2024
0930d16
Install checksit as part of workflow
joshua-hampton Jan 11, 2024
1861ad4
Troubleshooting workflow
joshua-hampton Jan 11, 2024
e597fd4
Troubleshooting workflow
joshua-hampton Jan 11, 2024
d3987fb
Troubleshooting - checking directory
joshua-hampton Jan 11, 2024
c6ba504
Troubleshooting - checking directory
joshua-hampton Jan 11, 2024
869ac10
Troubleshooting - checking directory
joshua-hampton Jan 11, 2024
5a93184
Checking file path of images
joshua-hampton Jan 11, 2024
5dd94ff
Checking file path of images
joshua-hampton Jan 11, 2024
e851b25
Checking file path of images
joshua-hampton Jan 11, 2024
6bfeb17
Checking file path of images exists
joshua-hampton Jan 11, 2024
65f1258
Checking file path of images
joshua-hampton Jan 11, 2024
44f218a
Troubleshooting - make sure file parser works as expected
joshua-hampton Jan 11, 2024
70995d8
Check exiftool exists
joshua-hampton Jan 11, 2024
272cab8
Check exiftool exists
joshua-hampton Jan 11, 2024
cf4c9dd
Restore to pre-test phase
joshua-hampton Jan 11, 2024
f0b4304
Install exiftool
joshua-hampton Jan 11, 2024
18 changes: 17 additions & 1 deletion .github/workflows/main.yml
@@ -20,12 +20,28 @@ jobs:
       uses: actions/setup-python@v2
       with:
         python-version: ${{ matrix.python-version }}
+    - name: Install udunits
+      run: |
+        cd /opt
+        curl -O https://downloads.unidata.ucar.edu/udunits/2.2.28/udunits-2.2.28.tar.gz
+        tar -xzvf udunits-2.2.28.tar.gz
+        cd udunits-2.2.28
+        ./configure
+        make all install
+        ln -sf /opt/lib/* $LD_LIBRARY_PATH
+    - name: Install exiftool
+      run: |
+        sudo apt install libimage-exiftool-perl -y
     - name: Install dependencies
       run: |
         python -m pip install --upgrade pip
         pip install flake8 black pytest
         if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
         if [ -f requirements_dev.txt ]; then pip install -r requirements_dev.txt; fi
+    - name: Look for exiftool
+      run: |
+        which exiftool
     - name: Test with pytest
       run: |
-        python -m pytest -v tests/test_readers.py
+        export UDUNITS2_XML_PATH=/opt/share/udunits/udunits2.xml
+        python -m pytest -v tests/test_readers.py tests/test_images.py
48 changes: 39 additions & 9 deletions checksit/check.py
@@ -10,13 +10,14 @@

 from .cvs import vocabs, vocabs_prefix
 from .rules import rules, rules_prefix
-from .readers import pp, badc_csv, cdl, yml
+from .readers import pp, badc_csv, cdl, yml, image
 from .specs import SpecificationChecker
 from .utils import get_file_base, extension, UNDEFINED
 from .config import get_config
 from .make_specs import make_amof_specs

 AMOF_CONVENTIONS = ['"CF-1.6, NCAS-AMF-2.0.0"']
+IMAGE_EXTENSIONS = ["png", "jpg", "jpeg"]
 conf = get_config()


@@ -229,12 +230,16 @@ def check_file(self, file_path, template="auto", mappings=None, extra_rules=None

         # tmpl = self.parse_file_header(template, auto_cache=auto_cache, verbose=verbose)

-        ### Check for AMOF netCDF file and gather specs ###
-        if template == "auto" and file_path.split('.')[-1] == 'nc':
-            # Look for AMOF Convention string in Conventions global attr, if it exists
-            if ':Conventions' in file_content.cdl:
-                conventions = file_content.cdl.split(':Conventions =')[1].split(';')[0].strip()
-                if "NCAS-AMOF" in conventions or "NCAS-GENERAL" in conventions or "NCAS-AMF" in conventions:
+        ### Check for NCAS data files and gather specs ###
+        # if template and specs are "default" values, check to see if
+        # file is an ncas file (assuming file name starts with instrument name)
+        if (template == "auto" and specs == None and
+                file_path.split("/")[-1].startswith("ncas-")):
+            # find appropriate specs depending on convention
+            if file_path.split(".")[-1] == "nc" and ":Conventions" in file_content.cdl:
+                conventions = file_content.cdl.split(":Conventions =")[1].split(";")[0].strip()
+                # NCAS-GENERAL file
+                if any(name in conventions for name in ["NCAS-GENERAL", "NCAS-AMF", "NCAS-AMOF"]):
                     if verbose:
                         print("\nNCAS-AMOF file detected, finding correct spec files")
                         print("Finding correct AMOF version...")
@@ -245,7 +250,7 @@ def check_file(self, file_path, template="auto", mappings=None, extra_rules=None
                     # check specs exist for that version
                     specs_dir = os.path.join(conf["settings"].get("specs_dir", "./specs"), f"groups/{spec_folder}")
                     if not os.path.exists(specs_dir):
-                        if verbose: print(f"Specs for version {version_number} not found, attempting download...")
+                        if verbose: print(f"Specs for version NCAS-GENERAL-{version_number} not found, attempting download...")
                         try:
                             vocabs_dir = os.path.join(conf["settings"].get("vocabs_dir", "./checksit/vocabs"), f"AMF_CVs/{version_number}")
                             cvs = urllib.request.urlopen(f"https://github.com/ncasuk/AMF_CVs/tree/v{version_number}/AMF_CVs")
@@ -280,7 +285,6 @@ def check_file(self, file_path, template="auto", mappings=None, extra_rules=None
sys.exit()
except:
raise


# get deployment mode and data product, to then get specs
deployment_mode = file_content.cdl.split(':deployment_mode =')[1].split(';')[0].strip().strip('"')
@@ -291,6 +295,30 @@ def check_file(self, file_path, template="auto", mappings=None, extra_rules=None
                     # don't need to do template check
                     template = "off"

+                # NCAS-RADAR (coming soon...)
+                # if "NCAS-Radar" in conventions
+
+            elif (file_path.split(".")[-1].lower() in IMAGE_EXTENSIONS and
+                    "XMP-photoshop:Instructions" in file_content.global_attrs.keys()):
+                conventions = file_content.global_attrs["XMP-photoshop:Instructions"]
+                if "National Centre for Atmospheric Science Image Metadata Standard" in file_content.global_attrs["XMP-photoshop:Instructions"].replace("\n"," "):
+                    if verbose:
+                        print("\nNCAS-IMAGE file detected, finding correct spec files")
+                        print("Finding correct IMAGE version...")
+                    version_number = conventions.replace("\n"," ").split("Metadata Standard ")[1].split(":")[0]
+                    spec_folder = f"ncas-image-{version_number}"
+                    if verbose: print(f" {version_number}")
+                    specs_dir = os.path.join(conf["settings"].get("specs_dir", "./specs"), f"groups/{spec_folder}")
+                    if not os.path.exists(specs_dir):
+                        print(f"[ERROR] specs for NCAS-IMAGE {version_number} can not be found.")
+                        print("Aborting...")
+                        sys.exit()
+                    product = file_path.split('/')[-1].split('_')[3]
+                    product_spec = f"{spec_folder}/amof-{product}"
+                    specs = [product_spec, f"{spec_folder}/amof-image-global-attrs"]
+                    template = "off"
+
+

         if template == "off":
             tmpl = template
@@ -404,6 +432,8 @@ def parse_file_header(self, file_path, auto_cache=False, verbose=False):
             reader = badc_csv
         elif ext in ("yml"):
             reader = yml
+        elif ext.lower() in IMAGE_EXTENSIONS:
+            reader = image
         else:
             raise Exception(f"No known reader for file with extension: {ext}")
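
The NCAS-IMAGE branch in `check_file` above derives its spec names from two strings: the `XMP-photoshop:Instructions` tag and the file name. A minimal standalone sketch of that derivation follows, assuming the tag reads roughly "... Image Metadata Standard <version>: ..." and that the data product is the fourth underscore-separated field of the file name. The helper name `image_specs`, the `STANDARD_NAME` constant, and the example file name are hypothetical and not part of checksit. Unlike the diff, this sketch returns `None` instead of exiting, and it does not check that the specs directory exists.

```python
import os

# Wording taken from the diff; the constant name itself is an assumption.
STANDARD_NAME = "National Centre for Atmospheric Science Image Metadata Standard"
IMAGE_EXTENSIONS = ["png", "jpg", "jpeg"]


def image_specs(file_path, instructions):
    """Return spec names for an NCAS-IMAGE file, or None if it does not look like one."""
    file_name = os.path.basename(file_path)
    extension = file_name.split(".")[-1].lower()
    if extension not in IMAGE_EXTENSIONS or not file_name.startswith("ncas-"):
        return None
    # The tag value may be wrapped over several lines, so flatten it first.
    text = instructions.replace("\n", " ")
    _, found, rest = text.partition(STANDARD_NAME)
    if not found:
        return None
    # The version sits between the standard name and the next colon.
    version_number = rest.split(":")[0].strip()
    spec_folder = f"ncas-image-{version_number}"
    # Assumed file name layout: the fourth field is the data product.
    fields = file_name.split("_")
    if len(fields) < 4:
        return None
    product = fields[3]
    return [f"{spec_folder}/amof-{product}", f"{spec_folder}/amof-image-global-attrs"]


if __name__ == "__main__":
    tag = "National Centre for Atmospheric Science Image Metadata\nStandard 1.0: example text"
    print(image_specs("/data/ncas-cam-9_lab_20230101_photo_v1.0.JPG", tag))
```

Using `str.partition` rather than indexing into `split(...)[1]` avoids an `IndexError` when the standard name is absent, which is a design choice of this sketch rather than something the PR does.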

6 changes: 4 additions & 2 deletions checksit/generic.py
@@ -99,7 +99,7 @@ def check_global_attrs(dct, defined_attrs=None, vocab_attrs=None, regex_attrs=No
errors.append(f"[global-attributes:**************:{attr}]: No value defined for attribute '{attr}'.")
else:
errors.extend(vocabs.check(vocab_attrs[attr], dct["global_attributes"].get(attr), label=f"[global-attributes:******:{attr}]***"))

for attr in regex_attrs:
if attr not in dct['global_attributes']:
errors.append(
@@ -123,7 +123,9 @@ def check_global_attrs(dct, defined_attrs=None, vocab_attrs=None, regex_attrs=No
         elif is_undefined(dct['global_attributes'].get(attr)):
             errors.append(f"[global-attributes:**************:{attr}]: No value defined for attribute '{attr}'.")
         else:
-            errors.extend(rules.check(rules_attrs[attr], dct['global_attributes'].get(attr), label=f"[global-attributes:******:{attr}]***"))
+            rules_check_output = rules.check(rules_attrs[attr], dct['global_attributes'].get(attr), context=dct['inpt'], label=f"[global-attributes:******:{attr}]***")
+            warnings.extend(rules_check_output[1])
+            errors.extend(rules_check_output[0])


     return errors, warnings
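
The change above treats the result of `rules.check` as a pair, with errors at index 0 and warnings at index 1, and passes the input path through as `context`. A small sketch of that convention follows. The rule functions `title_check` and `altitude_check`, the aggregator `run_rules`, and the attribute keys `Title` and `Altitude` are all hypothetical illustrations loosely based on the commit messages, not checksit's actual rules.

```python
import os


def title_check(value, context, label=""):
    """Hypothetical rule: the title should equal the name of the file being checked."""
    errors, warnings = [], []
    file_name = os.path.basename(context)
    if value != file_name:
        errors.append(f"{label} '{value}' must match the file name '{file_name}'.")
    return errors, warnings


def altitude_check(value, context, label=""):
    """Hypothetical rule: non-numeric is an error, a decimal value only a warning."""
    errors, warnings = [], []
    try:
        altitude = float(value)
    except ValueError:
        errors.append(f"{label} '{value}' is not a number.")
        return errors, warnings
    if not altitude.is_integer():
        warnings.append(f"{label} '{value}' should be an integer.")
    return errors, warnings


def run_rules(attrs, rule_funcs, context):
    """Collect errors and warnings separately, as check_global_attrs does after this PR."""
    errors, warnings = [], []
    for attr, func in rule_funcs.items():
        rule_errors, rule_warnings = func(attrs[attr], context, label=f"[global-attributes:{attr}]")
        errors.extend(rule_errors)
        warnings.extend(rule_warnings)
    return errors, warnings


if __name__ == "__main__":
    attrs = {"Title": "a.jpg", "Altitude": "12.5"}
    rule_funcs = {"Title": title_check, "Altitude": altitude_check}
    print(run_rules(attrs, rule_funcs, "/tmp/b.jpg"))
```

Unpacking the pair by name, as in `run_rules`, reads a little more clearly than indexing with `[0]` and `[1]`, though both behave the same.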
7 changes: 5 additions & 2 deletions checksit/readers/cdl.py
@@ -2,6 +2,7 @@
 import re
 import yaml
 import subprocess as sp
+import sys

 from ..cvs import vocabs, vocabs_prefix

@@ -40,7 +41,8 @@ def _parse(self, inpt):

         for s in self.CDL_SPLITTERS:
             if s not in cdl_lines:
-                raise Exception(f"Invalid file or CDL contents provided: '{inpt[:100]}...'")
+                print(f"Please check your command - invalid file or CDL contents provided: '{inpt[:100]}...'")
+                sys.exit(1)

         sections = self._get_sections(cdl_lines, split_patterns=self.CDL_SPLITTERS, start_at=1)

@@ -188,7 +190,8 @@ def to_yaml(self):
     def to_dict(self):
         return {"dimensions": self.dimensions,
                 "variables": self.variables,
-                "global_attributes": self.global_attrs}
+                "global_attributes": self.global_attrs,
+                "inpt": self.inpt}


 def read(fpath, verbose=False):
60 changes: 60 additions & 0 deletions checksit/readers/image.py
@@ -0,0 +1,60 @@
+import subprocess as sp
+import yaml
+
+def get_output(cmd):
+    subp = sp.Popen(cmd, shell=True, stdout=sp.PIPE, stderr=sp.PIPE)
+    return subp.stdout.read().decode("charmap"), subp.stderr.read().decode("charmap")
+
+
+class ImageParser:
+
+    def __init__(self, inpt, verbose=False):
+        self.inpt = inpt
+        self.verbose = verbose
+        self.base_exiftool_arguments = ["exiftool", "-G1", "-j", "-c", "%+.6f"]
+        self._find_exiftool()
+        self._parse(inpt)
+
+    def _parse(self, inpt):
+        if self.verbose: print(f"[INFO] Parsing input: {inpt[:100]}...")
+        self.global_attrs = {}
+        exiftool_arguments = self.base_exiftool_arguments + [inpt]
+        exiftool_return_string = sp.check_output(exiftool_arguments)
+        raw_global_attrs = yaml.load(exiftool_return_string, Loader=yaml.SafeLoader)[0]
+        for tag_name in raw_global_attrs.keys():
+            value_type = type(raw_global_attrs[tag_name])
+            if value_type == list:
+                self.global_attrs[tag_name] = str(raw_global_attrs[tag_name][0])
+            else:
+                self.global_attrs[tag_name] = str(raw_global_attrs[tag_name])
+
+    def _find_exiftool(self):
+        if self.verbose: print("[INFO] Searching for exiftool...")
+        which_output, which_error = get_output("which exiftool")
+        if which_error.startswith("which: no exiftool in"):
+            msg = (
+                f"'exiftool' required to read image file metadata but cannot be found.\n"
+                f" Visit https://exiftool.org/ for information on 'exiftool'."
+            )
+            raise RuntimeError(msg)
+        else:
+            self.exiftool_location = which_output.strip()
+            if self.verbose: print(f"[INFO] Found exiftool at {self.exiftool_location}.")
+
+    def _attrs_dict(self,content_lines):
+        attr_dict = {}
+        for line in content_lines:
+            if self.verbose: print(f"WORKING ON LINE: {line}")
+            key_0 = line.split("=",1)[0].strip()
+            key = key_0[1:] #removes first character - unwanted quotation marks
+            value = line.split("=",1)[1].strip()
+            attr_dict[key] = value
+        return attr_dict
+
+    def to_dict(self):
+        return {"global_attributes": self.global_attrs, "inpt": self.inpt}
+
+
+def read(fpath, verbose=False):
+    return ImageParser(fpath, verbose=verbose)
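
`_find_exiftool` above decides whether exiftool is missing by matching the text that `which` writes to stderr, and that wording may differ between platforms. One possible alternative, sketched below and not what the PR does, is to look the executable up with the standard library's `shutil.which` and to parse the `-j` output with `json`. The function names `find_exiftool`, `flatten_tags`, and `read_tags` are hypothetical. The flattening step mirrors the diff: list values keep only their first item, and every value is converted to a string.

```python
import json
import shutil
import subprocess


def find_exiftool():
    """Return the path to exiftool, or raise if it is not on PATH."""
    location = shutil.which("exiftool")
    if location is None:
        raise RuntimeError(
            "'exiftool' required to read image file metadata but cannot be found.\n"
            "  Visit https://exiftool.org/ for information on 'exiftool'."
        )
    return location


def flatten_tags(exiftool_json):
    """Turn exiftool's JSON output (a one-element list) into a dict of strings."""
    raw = json.loads(exiftool_json)[0]
    flat = {}
    for tag_name, value in raw.items():
        if isinstance(value, list):
            # Same behaviour as the diff: only the first list item is kept.
            flat[tag_name] = str(value[0])
        else:
            flat[tag_name] = str(value)
    return flat


def read_tags(path):
    """Run exiftool with the same flags as the diff and return the flattened tags."""
    cmd = [find_exiftool(), "-G1", "-j", "-c", "%+.6f", path]
    return flatten_tags(subprocess.check_output(cmd))


if __name__ == "__main__":
    sample = '[{"XMP-dc:Title": "a.jpg", "XMP-dc:Creator": ["A", "B"], "EXIF:ISO": 100}]'
    print(flatten_tags(sample))
```

Keeping the parsing in `flatten_tags` separate from the subprocess call means it can be exercised in tests without exiftool installed.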
