Merge pull request #136 from PDOK/PDOK/validate_empty_geoms
Check for empty geometries
kad-korpem authored Dec 3, 2024
2 parents 2b5f1b6 + a9661c2 commit 8baf13f
Showing 7 changed files with 105 additions and 24 deletions.
52 changes: 28 additions & 24 deletions README.md
@@ -4,7 +4,7 @@

Geopackages are a data format that have a deliberately broad application, so many of the requirements are dependent on your use.

The PDOK geopackage validator is used by [PDOK](https://www.pdok.nl/). PDOK is part of the Dutch government. This geopackage validator is used to validate a [set of requirements](#what-does-it-do) to make sure geopackages adhere to our standardized ETL pipeline. It is possible to use this for your own purposes as described [here](https://github.com/PDOK/geopackage-validator/issues/115#issuecomment-1529488733). The validations will not change (except for bugfixes); **new validations are always added to the list**. In case you are looking for a more generic validator. These do exist and can be found:
The PDOK geopackage validator is used by [PDOK](https://www.pdok.nl/). PDOK is part of the Dutch government. This geopackage validator is used to validate a [set of requirements](#what-does-it-do) to make sure geopackages adhere to our standardized ETL pipeline. It is possible to use this for your own purposes as described [here](https://github.com/PDOK/geopackage-validator/issues/115#issuecomment-1529488733). The validations will not change (except for bugfixes); **new validations are always added to the list**. In case you are looking for a more generic validator. These do exist and can be found:

- [teamengine](https://cite.opengeospatial.org/teamengine) (official OGC, Java)
- [teamengine Github](https://github.com/opengeospatial/teamengine)
@@ -13,49 +13,50 @@ The PDOK geopackage validator is used by [PDOK](https://www.pdok.nl/). PDOK is p

## Table of Contents

- [geopackage-validator](#geopackage-validator)
- [geopackage-validator](#pdok-geopackage-validator)
- [Table of Contents](#table-of-contents)
- [What does it do](#what-does-it-do)
- [Geopackage versions](#geopackage-versions)
- [Installation](#installation)
- [Docker](#docker-installation)
- [Usage](#usage)
- [RQ8 Validation](#local-rq8-validation)
- [Show validations](#local-show-validations)
- [Generate table definitions](#local-generate-table-definitions)
- [RQ8 Validation](#rq8-validation)
- [Show validations](#show-validations)
- [Generate table definitions](#generate-table-definitions)
- [Local development](#local-development)
- [Usage](#usage-1)
- [Docker run](#docker-run)
- [Python console](#python-console)
- [Code style](#code-style)
- [Tests](#tests)
- [Releasing](#releasing)

## TL;DR Commands

Either run through [docker](#docker) or [locally](#local).
Either run through [docker](#docker) or [locally](#local).

### Docker

Validate a GeoPackage with the default set of validation rules:

```sh
gpkg_path=relative/path/to/the.gpkg
docker run -v "$(pwd)":/gpkg --rm pdok/geopackage-validator validate --gpkg-path "/gpkg/${gpkg_path}"
docker run -v "$(pwd)":/gpkg --rm pdok/geopackage-validator validate --gpkg-path "/gpkg/${gpkg_path}"
```

Validate a GeoPackage with the default set of validation rules including a schema:

```sh
schema_path=relative/path/to/the/schema.json
gpkg_path=relative/path/to/the.gpkg
docker run -v "$(pwd)":/gpkg --rm pdok/geopackage-validator validate --gpkg-path "/gpkg/${gpkg_path}" --table-definitions-path "/gpkg/${schema_path}"
docker run -v "$(pwd)":/gpkg --rm pdok/geopackage-validator validate --gpkg-path "/gpkg/${gpkg_path}" --table-definitions-path "/gpkg/${schema_path}"
```

Generate a schema:
Generate a schema:

```sh
schema_path=relative/path/to/the/schema.json
gpkg_path=relative/path/to/the.gpkg
docker run -v "$(pwd)":/gpkg --rm pdok/geopackage-validator generate-definitions --gpkg-path "/gpkg/${gpkg_path}" > "$schema_path"
docker run -v "$(pwd)":/gpkg --rm pdok/geopackage-validator generate-definitions --gpkg-path "/gpkg/${gpkg_path}" > "$schema_path"
```

### Local
@@ -64,23 +65,23 @@ For a local setup we require/tested against python > 3.6 and gdal = 3.4.

```sh
gpkg_path=relative/path/to/the.gpkg
geopackage-validator validate --gpkg-path "/gpkg/${gpkg_path}"
geopackage-validator validate --gpkg-path "/gpkg/${gpkg_path}"
```

Validate a GeoPackage with the default set of validation rules including a schema:

```sh
schema_path=relative/path/to/the/schema.json
gpkg_path=relative/path/to/the.gpkg
geopackage-validator validate --gpkg-path "/gpkg/${gpkg_path}" --table-definitions-path "/gpkg/${schema_path}"
geopackage-validator validate --gpkg-path "/gpkg/${gpkg_path}" --table-definitions-path "/gpkg/${schema_path}"
```

Generate a schema:
Generate a schema:

```sh
schema_path=relative/path/to/the/schema.json
gpkg_path=relative/path/to/the.gpkg
geopackage-validator generate-definitions --gpkg-path "/gpkg/${gpkg_path}" > "$schema_path"
geopackage-validator generate-definitions --gpkg-path "/gpkg/${gpkg_path}" > "$schema_path"
```

## What does it do
@@ -111,20 +112,22 @@ The current checks are (see also the 'show-validations' command):
| RQ21 | All layer and column names shall not be longer than 57 characters. |
| RQ22 | Only the following EPSG spatial reference systems are allowed: 28992, 3034, 3035, 3040, 3041, 3042, 3043, 3044, 3045, 3046, 3047, 3048, 3049, 3857, 4258, 4326, 4936, 4937, 5730, 7409. |
| RQ23 | Geometry should be valid and simple. |
| RQ24 | Geometry should not be empty (e.g. 'POINT EMPTY', represented as 'POINT(NaN NaN)'). |
| RC17 | It is recommended to name all GEOMETRY type columns 'geom'. |
| RC18 | It is recommended to give all GEOMETRY type columns the same name. |
| RC19 | It is recommended to only use multidimensional geometry coordinates (elevation and measurement) when necessary. |
| RC20 | It is recommended that all (MULTI)POLYGON geometries have a counter-clockwise orientation for their exterior ring, and a clockwise direction for all interior rings. |
| RC20 | It is recommended that all (MULTI)POLYGON geometries have a counter-clockwise orientation for their exterior ring, and a clockwise direction for all interior rings. |
| UNKNOWN_WARNINGS | It is recommended that the unexpected (GDAL) warnings are looked into. |

\* Legacy requirements are only executed with the validate command when explicitly requested in the validation set.
\** Since version 0.8.0 the recommendations are part of the same sequence as the requirements. From now on a check will always maintain the integer part of the code. Even if at a later time the validation type can shift between requirement and recommendation.
\** Since version 0.8.0 the recommendations are part of the same sequence as the requirements. From now on a check will always maintain the integer part of the code. Even if at a later time the validation type can shift between requirement and recommendation.

An explanation in Dutch with a reason for each rule can be found [here](https://www.pdok.nl/voor-data-aanbieders#:~:text=Regels%20in%20detail).

## Geopackage versions

The Geopackage validator supports the following Geopackage versions:

- 1.4
- 1.3.1
- 1.3
@@ -133,11 +136,12 @@ The Geopackage validator support the following Geopackage versions:
## Installation

This package requires:

- [GDAL](https://gdal.org/) version >= 3.2.1.
- [Spatialite](https://www.gaia-gis.it/fossil/libspatialite/index) version >= 5.0.0
- And python >= 3.8 to run.

We recommend using the docker image. When above requirements are met the package can be installed using pip (`pip install pdok-geopackage-validator`).
We recommend using the docker image. When above requirements are met the package can be installed using pip (`pip install pdok-geopackage-validator`).

### Docker Installation

@@ -167,7 +171,7 @@ To validate RQ8 you have to generate definitions first.

```bash
docker run -v ${PWD}:/gpkg --rm pdok/geopackage-validator geopackage-validator generate-definitions --gpkg-path /path/to/file.gpkg
````
```

### Validate

@@ -402,14 +406,14 @@ Options:

## Local development

We advise using docker-compose for local development. This allows live editing and testing code with the correct gdal/ogr version with spatialite 5.0.0.
First build the local image with your machines user id and group id:
We advise using docker-compose for local development. This allows live editing and testing code with the correct gdal/ogr version with spatialite 5.0.0.
First build the local image with your machines user id and group id:

```bash
docker-compose build --build-arg USER_ID=`id -u` --build-arg GROUP_ID=`id -g`
```

### Usage
### Docker run

There will be a script you can run like this:

@@ -422,7 +426,7 @@ to point the docker-compose to other files, you can add or edit the volumes in t

### Python console

Ipython is available in the docker:
Ipython is available in the docker:

```bash
docker-compose run --rm validator ipython
@@ -435,7 +439,7 @@ work on it, run the following command periodically:

```bash
docker-compose run --rm validator black .
```
```

### Tests

4 changes: 4 additions & 0 deletions geopackage_validator/validations/__init__.py
@@ -10,6 +10,9 @@
    ValidGeometryValidator,
    ValidGeometryValidatorV0,
)
from geopackage_validator.validations.geometry_empty_check import (
    EmptyGeometryValidator,
)
from geopackage_validator.validations.layerfeature_check import (
    OGRIndexValidator,
    NonEmptyLayerValidator,
@@ -62,6 +65,7 @@
    "GpkgGeometryTypeNameValidator",
    "GeometryTypeEqualsGpkgDefinitionValidator",
    "PolygonWindingOrderValidator",
    "EmptyGeometryValidator",
    # Recommendations
    "GeomColumnNameValidator",
    "GeomColumnNameEqualValidator",
45 changes: 45 additions & 0 deletions geopackage_validator/validations/geometry_empty_check.py
@@ -0,0 +1,45 @@
from typing import Iterable, Tuple
from geopackage_validator.validations import validator
from geopackage_validator import utils

SQL_EMPTY_TEMPLATE = """SELECT count(row_id) AS count, row_id
FROM(
    SELECT
        cast(rowid AS INTEGER) AS row_id
    FROM "{table_name}" WHERE ST_IsEmpty("{column_name}") = 1
);"""


def query_geometry_empty(dataset, sql_template) -> Iterable[Tuple[str, str, int, int]]:
    columns = utils.dataset_geometry_tables(dataset)

    for table_name, column_name, _ in columns:
        validations = dataset.ExecuteSQL(
            sql_template.format(table_name=table_name, column_name=column_name)
        )
        for count, row_id in validations:
            yield table_name, column_name, count, row_id
        dataset.ReleaseResultSet(validations)


class EmptyGeometryValidator(validator.Validator):
    """Geometries should not be empty."""

    code = 24
    level = validator.ValidationLevel.ERROR
    message = "Found empty geometry in table: {table_name}, column {column_name}, {count} {count_label}, example id {row_id}"

    def check(self) -> Iterable[str]:
        result = query_geometry_empty(self.dataset, SQL_EMPTY_TEMPLATE)

        return [
            self.message.format(
                table_name=table_name,
                column_name=column_name,
                count=count,
                count_label=("time" if count == 1 else "times"),
                row_id=row_id,
            )
            for table_name, column_name, count, row_id in result
            if count > 0
        ]
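
The query in `SQL_EMPTY_TEMPLATE` aggregates without a `GROUP BY`, so SQLite returns exactly one row per table even when nothing matches, which is presumably why `check` filters on `count > 0`. The sketch below illustrates that behaviour with only the standard-library `sqlite3` module. It is not the validator's code path: SpatiaLite's `ST_IsEmpty` is not loaded here, so a hypothetical integer column `is_empty` stands in for it, and the table name `stations_demo` and both helper functions are made up for illustration.

```python
import sqlite3

# Same shape as SQL_EMPTY_TEMPLATE, but a plain integer flag stands in for
# ST_IsEmpty(...) = 1 (assumption: SpatiaLite is not available in this sketch).
SQL_DEMO_TEMPLATE = """SELECT count(row_id) AS count, row_id
FROM(
    SELECT
        cast(rowid AS INTEGER) AS row_id
    FROM "{table_name}" WHERE "{column_name}" = 1
);"""


def build_demo_connection(flags):
    """Create an in-memory table (hypothetical name) with one row per flag."""
    connection = sqlite3.connect(":memory:")
    connection.execute('CREATE TABLE "stations_demo" (is_empty INTEGER)')
    connection.executemany(
        'INSERT INTO "stations_demo" (is_empty) VALUES (?)',
        [(flag,) for flag in flags],
    )
    return connection


def count_flagged(connection, table_name, column_name):
    """Return the single (count, example_row_id) row the aggregate query yields."""
    sql = SQL_DEMO_TEMPLATE.format(table_name=table_name, column_name=column_name)
    return connection.execute(sql).fetchone()


# No flagged rows: the aggregate still yields one row, with count 0 and NULL id.
no_matches = count_flagged(build_demo_connection([0, 0, 0]), "stations_demo", "is_empty")

# Two flagged rows: count is 2 and row_id is taken from one of the matching rows.
two_matches = count_flagged(build_demo_connection([0, 1, 1]), "stations_demo", "is_empty")
```

Because the example `row_id` comes from an arbitrarily selected matching row, it is best read as a pointer to one offending feature rather than the first one.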
Binary file added tests/data/test_geometry_empty.gpkg
Binary file not shown.
1 change: 1 addition & 0 deletions tests/test_validate.py
@@ -35,6 +35,7 @@ def test_determine_validations_to_use_none():
"RQ21",
"RQ22",
"RQ23",
"RQ24",
"RC17",
"RC18",
"RC19",
20 changes: 20 additions & 0 deletions tests/validations/test_geometry_empty_check.py
@@ -0,0 +1,20 @@
from geopackage_validator.utils import open_dataset
from geopackage_validator.validations.geometry_empty_check import (
    EmptyGeometryValidator,
)


def test_with_gpkg_empty():
    dataset = open_dataset("tests/data/test_geometry_empty.gpkg")
    result = list(EmptyGeometryValidator(dataset).check())
    assert len(result) == 1
    assert (
        result[0]
        == "Found empty geometry in table: stations, column geom, 45 times, example id 129"
    )


def test_with_gpkg_allcorrect():
    dataset = open_dataset("tests/data/test_allcorrect.gpkg")
    result = list(EmptyGeometryValidator(dataset).check())
    assert len(result) == 0
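
The fixture-based test above only exercises the plural form of the message. As a sketch of how `count_label` changes the text, the snippet below copies the message template from `EmptyGeometryValidator` into a standalone helper. `format_empty_geometry_message` is a hypothetical name and not part of the package, and the singular sample values are invented for illustration.

```python
# Template copied from EmptyGeometryValidator.message in this commit.
MESSAGE = "Found empty geometry in table: {table_name}, column {column_name}, {count} {count_label}, example id {row_id}"


def format_empty_geometry_message(table_name, column_name, count, row_id):
    """Hypothetical helper mirroring the formatting done inside check()."""
    return MESSAGE.format(
        table_name=table_name,
        column_name=column_name,
        count=count,
        # Singular label for exactly one hit, plural otherwise.
        count_label=("time" if count == 1 else "times"),
        row_id=row_id,
    )


# Invented values for the singular case; the plural case reuses the test's values.
singular = format_empty_geometry_message("stations", "geom", 1, 7)
plural = format_empty_geometry_message("stations", "geom", 45, 129)
```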
7 changes: 7 additions & 0 deletions tests/validations/test_geometry_valid_check.py
@@ -39,6 +39,13 @@ def test_with_gpkg_valid_simple():
    assert checks[0][4] == 1


def test_with_gpkg_empty():
    # geometries that are empty are still considered valid
    dataset = open_dataset("tests/data/test_geometry_empty.gpkg")
    checks = list(query_geometry_valid(dataset, SQL_VALID_TEMPLATE))
    assert len(checks) == 0


def test_with_gpkg_allcorrect():
    dataset = open_dataset("tests/data/test_allcorrect.gpkg")
    checks = list(query_geometry_valid(dataset, SQL_VALID_TEMPLATE))
