Merge pull request #77 from linkml/enrich
Schema enricher and Docker config
cmungall authored Jul 6, 2022
2 parents be6d7f0 + c0295e4 commit 9e75445
Showing 10 changed files with 190 additions and 10 deletions.
29 changes: 29 additions & 0 deletions Dockerfile
@@ -0,0 +1,29 @@
# set base image (host OS)
FROM python:3.9

# https://stackoverflow.com/questions/53835198/integrating-python-poetry-with-docker
ENV YOUR_ENV=${YOUR_ENV} \
PYTHONFAULTHANDLER=1 \
PYTHONUNBUFFERED=1 \
PYTHONHASHSEED=random \
PIP_NO_CACHE_DIR=off \
PIP_DISABLE_PIP_VERSION_CHECK=on \
PIP_DEFAULT_TIMEOUT=100 \
POETRY_VERSION=1.1.13

# System deps:
RUN pip install "poetry==$POETRY_VERSION"

# set the working directory in the container
WORKDIR /work

RUN pip install schema-automator

#COPY poetry.lock pyproject.toml /code/

# Project initialization:
#RUN poetry install


# command to run on container start
CMD [ "bash" ]
41 changes: 41 additions & 0 deletions Makefile
@@ -1,7 +1,10 @@
VERSION = $(shell git tag | tail -1)

.PHONY: all clean test

all: clean test target/soil_meanings.yaml


clean:
	rm -rf target/soil_meanings.yaml
	rm -rf target/soil_meanings_generated.yaml
@@ -63,3 +66,41 @@ target/availabilities_g_s_strain_202112151116_org_meanings_curated.yaml: target/
# this can be used outside the poetry environment
bin/schemauto:
	echo `poetry run which schemauto` '"$$@"' > $@ && chmod +x $@


################################################
#### Commands for building the Docker image ####
################################################

IM=linkml/schema-automator

docker-build-no-cache:
	@docker build --no-cache -t $(IM):$(VERSION) . \
	&& docker tag $(IM):$(VERSION) $(IM):latest

docker-build:
	@docker build -t $(IM):$(VERSION) . \
	&& docker tag $(IM):$(VERSION) $(IM):latest

docker-build-use-cache-dev:
	@docker build -t $(DEV):$(VERSION) . \
	&& docker tag $(DEV):$(VERSION) $(DEV):latest

docker-clean:
	docker kill $(IM) || echo not running ;
	docker rm $(IM) || echo not made

docker-publish-no-build:
	@docker push $(IM):$(VERSION) \
	&& docker push $(IM):latest

docker-publish-dev-no-build:
	@docker push $(DEV):$(VERSION) \
	&& docker push $(DEV):latest

docker-publish: docker-build
	@docker push $(IM):$(VERSION) \
	&& docker push $(IM):latest

docker-run:
	@docker run -v $(PWD):/work -w /work -ti $(IM):$(VERSION)
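
The `VERSION` variable in this Makefile takes the last line of `git tag`, which Git lists in lexical order by default, so the selected tag may not be the newest release once a version component reaches two digits. The following is a minimal Python sketch of that difference. The tag list and the `version_key`, `latest_lexical`, and `latest_by_version` helpers are illustrative assumptions, not taken from the repository.

```python
import re
from typing import List, Tuple

# Hypothetical tag list, for illustration only.
TAGS: List[str] = ["v0.1.4", "v0.2.0", "v0.9.0", "v0.10.0"]


def version_key(tag: str) -> Tuple[int, ...]:
    """Turn a tag such as 'v0.10.0' into (0, 10, 0) for numeric comparison."""
    return tuple(int(part) for part in re.findall(r"\d+", tag))


def latest_lexical(tags: List[str]) -> str:
    """Mimic `git tag | tail -1` under Git's default lexical ordering."""
    return sorted(tags)[-1]


def latest_by_version(tags: List[str]) -> str:
    """Pick the tag with the highest numeric version."""
    return max(tags, key=version_key)


print("lexical:", latest_lexical(TAGS))
print("by version:", latest_by_version(TAGS))
```

If this matters for the image tags, `git tag --sort=version:refname` is one way to get version ordering from Git itself.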
4 changes: 2 additions & 2 deletions docs/conf.py
@@ -20,8 +20,8 @@
# -- Project information -----------------------------------------------------

project = 'Schema Automator'
copyright = '2022, Chris Mungall'
author = 'Chris Mungall, Harshad Hegde'
copyright = '2022, LinkML Developers'
author = 'Chris Mungall, Harshad Hegde, Mark Miller'

# The full version, including alpha/beta/rc tags
# release = '0.1.4'
19 changes: 18 additions & 1 deletion docs/index.rst
@@ -1,12 +1,29 @@
LinkML Schema Automator
============================================

Schema Automator is a toolkit for bootstrapping and automatically enhancing LinkML schemas from a variety of sources
Schema Automator is a toolkit for bootstrapping and automatically enhancing schemas from a variety of sources.

Use cases include:

1. Inferring an initial schema or data dictionary from a dataset that is a collection of TSVs
2. Automatically annotating schema elements and enumerations using the BioPortal annotator
3. Importing from a language like RDFS/OWL

The primary output of Schema Automator is a `LinkML Schema <https://linkml.io/linkml>`_. This can be converted to other
schema frameworks, including:

* JSON-Schema
* SQL DDL
* SHACL
* ShEx
* RDFS/OWL
* Python dataclasses or Pydantic

.. toctree::
   :maxdepth: 3
   :caption: Contents:

   index
   introduction
   install
   cli
19 changes: 19 additions & 0 deletions docs/install.rst
@@ -1,4 +1,7 @@
Installation
============

Direct Installation
-------------------

``schema-automator`` and its components require Python 3.9 or greater.
@@ -13,3 +16,19 @@ To check this works:
   schemauto --help

Running via Docker
------------------

You can use the `Schema Automator Docker Container <https://hub.docker.com/r/linkml/schema-automator>`_.

To start a shell:

.. code:: bash

   docker run -v $PWD:/work -w /work -ti linkml/schema-automator

Within the shell you should see all your files, and you should have access to the ``schemauto`` command:

.. code:: bash

   schemauto --help
16 changes: 9 additions & 7 deletions docs/introduction.rst
@@ -1,4 +1,6 @@
LinkML Schema Automator
.. _introduction:

Introduction
=======================

This is a toolkit that assists with generating and enhancing schemas and data models from a variety
@@ -17,17 +19,17 @@ See :ref:`generalizers`

Generalizers allow you to *bootstrap* a schema by generalizing from existing data files:

- TSVs and spreadsheets
- SQLite databases
- RDF instance graphs
* TSVs and spreadsheets
* SQLite databases
* RDF instance graphs

Importing from alternative modeling framework
Importing from alternative modeling frameworks
----------------------------------------------

See :ref:`importers`

- OWL (but this only works for schema-style OWL)
- JSON-Schema
* OWL (but this only works for schema-style OWL)
* JSON-Schema

In the future, other frameworks will be supported.

19 changes: 19 additions & 0 deletions schema_automator/annotators/schema_annotator.py
@@ -90,6 +90,25 @@ def annotate_schema(self, schema: Union[SchemaDefinition, str], curie_only=True)

return sv.schema

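    def enrich(self, schema: Union[SchemaDefinition, str]) -> SchemaDefinition:
        """
        Fill in missing element descriptions using definitions from the ontology.

        Elements that already have a description are left unchanged.
        """
        sv = SchemaView(schema)
        oi = self.ontology_implementation
        for elt_name, elt in sv.all_elements().items():
            # Candidate CURIEs: the element's own URI, then all of its mappings
            curies = [sv.get_uri(elt)]
            for rel, ms in sv.get_mappings(elt_name).items():
                curies += ms
            for x in curies:
                if elt.description:
                    break
                try:
                    defn = oi.get_definition_by_curie(x)
                    if defn:
                        elt.description = defn
                except Exception:
                    # skip CURIEs that the ontology source cannot resolve
                    pass
        return sv.schema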


@click.command()
@click.argument('schema')
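
The lookup order used by `enrich` can be illustrated without LinkML or OAK. The sketch below is a standalone approximation under stated assumptions: plain dictionaries stand in for schema elements, and a hypothetical `lookup` callable stands in for the ontology implementation. The names `enrich_descriptions`, `DEFINITIONS`, and `ELEMENTS` are illustrative and not part of schema-automator.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical in-memory definitions standing in for an ontology source.
DEFINITIONS: Dict[str, str] = {
    "SO:0000704": "A region (or regions) that includes all of the sequence "
                  "elements necessary to encode a functional transcript.",
}

# Hypothetical schema elements: each has a URI and optional mappings.
ELEMENTS: Dict[str, dict] = {
    "Gene": {"uri": "SO:0000704", "mappings": []},
    "part_of": {"uri": "test:part_of", "mappings": []},
}


def enrich_descriptions(
    elements: Dict[str, dict],
    lookup: Callable[[str], Optional[str]],
) -> Dict[str, dict]:
    """Fill in missing descriptions from the first CURIE that has a definition."""
    for elt in elements.values():
        # Try the element's own URI first, then each of its mappings.
        curies: List[str] = [elt["uri"]] + list(elt.get("mappings", []))
        for curie in curies:
            if elt.get("description"):
                break  # never overwrite an existing description
            try:
                definition = lookup(curie)
            except Exception:
                continue  # a failed lookup should not abort the whole run
            if definition:
                elt["description"] = definition
    return elements


enriched = enrich_descriptions(ELEMENTS, DEFINITIONS.get)
print(enriched["Gene"]["description"])
```

Checking for an existing description before each lookup means that curated text always takes precedence over text pulled from the ontology.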
17 changes: 17 additions & 0 deletions schema_automator/cli.py
@@ -245,6 +245,23 @@ def annotate_schema(schema: str, input: str, output: str, curie_only: bool, **ar
write_schema(schema, output)


@main.command()
@click.argument('schema')
@click.option('--input', '-i', help="OAK input ontology selector")
@output_option
def enrich_schema(schema: str, input: str, output: str, **args):
    """
    Enrich a schema with descriptions looked up in an ontology

    The ontology is specified as an OAK selector via --input
    """
    impl = get_implementation_from_shorthand(input)
    logging.basicConfig(level=logging.INFO)
    annr = SchemaAnnotator(impl)
    schema = annr.enrich(schema)
    write_schema(schema, output)


@main.command()
@click.argument('schema')
@output_option
4 changes: 4 additions & 0 deletions tests/resources/so-mini.obo
@@ -0,0 +1,4 @@
[Term]
id: SO:0000704
name: gene
def: "A region (or regions) that includes all of the sequence elements necessary to encode a functional transcript. A gene may include regulatory regions, transcribed regions and/or other functional sequence regions." []
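
To show what this fixture provides to the enricher, here is a minimal standard-library sketch that pulls the `def:` text out of an OBO stanza. It is a simplified illustration, not the parser OAK uses: the `parse_obo_definitions` helper and `DEF_PATTERN` are assumptions made for this example, and escape sequences inside the quoted text are captured but not unescaped.

```python
import re
from typing import Dict, Optional

# Matches a `def:` tag and captures the quoted text (escaped quotes allowed).
DEF_PATTERN = re.compile(r'^def:\s*"((?:[^"\\]|\\.)*)"')


def parse_obo_definitions(text: str) -> Dict[str, str]:
    """Map each stanza id to its definition text, if the stanza has one."""
    definitions: Dict[str, str] = {}
    current_id: Optional[str] = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("["):
            current_id = None  # a new stanza starts; forget the previous id
        elif line.startswith("id:"):
            current_id = line[len("id:"):].strip()
        elif current_id is not None:
            match = DEF_PATTERN.match(line)
            if match:
                definitions[current_id] = match.group(1)
    return definitions


# Contents of tests/resources/so-mini.obo as shown in this commit.
SO_MINI = "\n".join([
    "[Term]",
    "id: SO:0000704",
    "name: gene",
    'def: "A region (or regions) that includes all of the sequence elements '
    'necessary to encode a functional transcript. A gene may include '
    'regulatory regions, transcribed regions and/or other functional '
    'sequence regions." []',
])

definitions = parse_obo_definitions(SO_MINI)
print(definitions["SO:0000704"])
```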
32 changes: 32 additions & 0 deletions tests/test_annotators/test_schema_enricher.py
@@ -0,0 +1,32 @@
# -*- coding: utf-8 -*-

import logging
import os
import unittest
from linkml.utils.schema_builder import SchemaBuilder
from linkml_runtime.dumpers import yaml_dumper
from linkml_runtime.linkml_model import SchemaDefinition, EnumDefinition, PermissibleValue
from oaklib.implementations import BioportalImplementation
from oaklib.selector import get_implementation_from_shorthand

from schema_automator.annotators.schema_annotator import SchemaAnnotator
from linkml.generators.yamlgen import YAMLGenerator
from tests import INPUT_DIR, OUTPUT_DIR


class SchemaEnricherTestCase(unittest.TestCase):

    def setUp(self) -> None:
        impl = get_implementation_from_shorthand(os.path.join(INPUT_DIR, "so-mini.obo"))
        self.annotator = SchemaAnnotator(impl)

    def test_enrich(self):
        s = SchemaDefinition(id='test', name='test')
        sb = SchemaBuilder(s)
        sb.add_class('Gene', class_uri="SO:0000704").add_slot('part_of')
        s = self.annotator.enrich(sb.schema)
        #print(yaml_dumper.dumps(s))
        assert s.classes['Gene'].description.startswith("A region")


