Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/browser db #29

Closed
wants to merge 33 commits into from
Closed
Show file tree
Hide file tree
Changes from 23 commits
Commits
Show all changes
33 commits
Select commit Hold shift + click to select a range
c38a13b
Initial failing browser db test
SvenMarcus May 2, 2023
ca7f4df
Passes first DB test. Needs refactoring!
SvenMarcus May 3, 2023
5542972
Begin implementing repository
SvenMarcus May 4, 2023
a525a1d
Initial failing browser db test
SvenMarcus May 2, 2023
3f0e307
Passes first DB test. Needs refactoring!
SvenMarcus May 3, 2023
c908ad8
Begin implementing repository
SvenMarcus May 4, 2023
aaf6fec
Refactor DB test. Make repository setup async
SvenMarcus May 5, 2023
0801e99
Merge branch 'feature/browser-db' of https://github.com/SvenMarcus/oc…
SvenMarcus May 8, 2023
4e625e8
Work on test setup
SvenMarcus May 11, 2023
91ca4df
Refactor fixtures. Begin refactor towards running browser
SvenMarcus May 16, 2023
6a321eb
Factory responsible for starting browsers.
SvenMarcus May 19, 2023
9e57fee
Remove RedirectMap from workspace endpoint. Restoring factory control…
SvenMarcus May 23, 2023
1406a15
Use FastAPI dependency injection
SvenMarcus Jun 5, 2023
a0877cf
Update docker-compose.yml and init.sh for MongoDB
SvenMarcus Jun 5, 2023
25683e0
Working Docker Browsers. Subprocesses are crashing
SvenMarcus Jun 6, 2023
9124bf3
Working subprocess browsers
SvenMarcus Jun 7, 2023
fe176f1
Try ports instead of taking from set
SvenMarcus Jun 12, 2023
59ae366
Mark launch tests as integration
SvenMarcus Jun 12, 2023
f7cdf42
Add simple test to ping browser. Remove unused Port class
SvenMarcus Jun 13, 2023
f16cff5
Build browse-ocrd docker image in CI
SvenMarcus Jun 13, 2023
d1a0167
Introduce first method for repository
SvenMarcus Jun 13, 2023
fd4767b
Fix mypy failure
SvenMarcus Jun 13, 2023
94a5966
Clean unreachable browsers on startup. Refactor workspace endpoint
SvenMarcus Jun 14, 2023
546aecd
Catch exception when killing browser fails
SvenMarcus Jun 19, 2023
98d03ce
Pass SubProcessBrowser Tests. Refactor towards common port binding fu…
SvenMarcus Jun 21, 2023
a8bf2b8
Better, but still insufficient, handling of native browser launches
SvenMarcus Jun 23, 2023
4efba42
Fix duplicate window bug
SvenMarcus Jul 14, 2023
037633c
Fix the tests
SvenMarcus Jul 14, 2023
de3c815
Passes mypy
SvenMarcus Jul 14, 2023
18519c0
Introduce Fixture facade
SvenMarcus Jul 18, 2023
5554c85
Refactor existing tests for Fixture API
SvenMarcus Jul 18, 2023
9246c61
Refactor existing tests for Fixture API
SvenMarcus Jul 18, 2023
f6babea
Remove unnecessary markers
SvenMarcus Jul 19, 2023
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions .github/workflows/test-ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -50,6 +50,9 @@ jobs:
python-version: ${{ env.PYTHON_VERSION }}
cache: 'pip'

- name: Build browse-ocrd Docker image
run: make build-browse-ocrd-docker

- name: Install dependencies using pip
run: pip install -e ".[dev]"

Expand Down
2 changes: 1 addition & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -2,5 +2,5 @@
authorized_keys
__pycache__/
.python-version
.pdm.toml
.pdm-python
.pdm.lock
5 changes: 5 additions & 0 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,11 @@ build:
pull:
docker pull $(TAGNAME)


build-browse-ocrd-docker:
docker build -t ocrd-browser:latest -f docker-browse-ocrd/Dockerfile docker-browse-ocrd
Comment on lines +14 to +15
Copy link
Member

@bertsky bertsky Jun 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should advertise this in make help and readme?

Also, shouldn't we make this a dependency for make test straight away? (If the image already exists and nothing in the build context changed, then the build cache will skip rebuild.)

EDIT: does not make sense – pytest under docker run precludes docker mode. So conversely, we need this as a dependency when running tests without make test (i.e. via native nox or pytest directly)...



define HELP
cat <<"EOF"
Targets:
Expand Down
17 changes: 17 additions & 0 deletions docker-browse-ocrd/Dockerfile
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
FROM python:3.7

RUN apt-get update \
&& apt-get install -y --no-install-recommends libcairo2-dev libgtk-3-bin libgtk-3-dev libglib2.0-dev libgtksourceview-3.0-dev libgirepository1.0-dev gir1.2-webkit2-4.0 pkg-config cmake \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
&& apt-get install -y --no-install-recommends libcairo2-dev libgtk-3-bin libgtk-3-dev libglib2.0-dev libgtksourceview-3.0-dev libgirepository1.0-dev gir1.2-webkit2-4.0 pkg-config cmake \
&& apt-get install -o Acquire::Retries=3 -y --no-install-recommends libcairo2-dev libgtk-3-bin libgtk-3-dev libglib2.0-dev libgtksourceview-3.0-dev libgirepository1.0-dev gir1.2-webkit2-4.0 pkg-config cmake \

&& pip3 install -U setuptools \
&& pip3 install browse-ocrd

ENV GDK_BACKEND broadway
ENV BROADWAY_DISPLAY :5

EXPOSE 8085
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
EXPOSE 8085
EXPOSE 8085
VOLUME /data


COPY init.sh /init.sh

RUN chmod +x /init.sh

CMD ["/init.sh"]
5 changes: 5 additions & 0 deletions docker-browse-ocrd/init.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
#!/usr/bin/env bash

set -x
nohup broadwayd :5 &
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why nohup? This will (attempt to) create a nohup.out (which at runtime we might not even have write access to). Also, why should broadwayd be kept alive if the shell running init.sh (and browse-ocrd) is gone? (Sounds like an invitation for zombies to me...)

Probably not relevant though, because when browse-ocrd and thus init.sh exits, the whole container should stop as well.

(In contrast, in the subprocess mode, we explicitly kill $! the broadwayd after browse-ocrd exits.)

browse-ocrd /data/mets.xml
bertsky marked this conversation as resolved.
Show resolved Hide resolved
25 changes: 25 additions & 0 deletions docker-compose.yml
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@ services:
environment:
MONITOR_PORT_LOG: ${MONITOR_PORT_LOG}
CONTROLLER: "${CONTROLLER_HOST}:${CONTROLLER_PORT_SSH}"
DB_CONNECTION: "mongodb://${DB_ROOT_USER:-root}:${DB_ROOT_PASSWORD:-root_password}@ocrd-database:27017"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to be picky, but we already use DB_* in top-level .env for the MySQL database in Kitodo, and it clashes on DB_ROOT_PASSWORD (but not DB_ROOT_USER). Do we simply tie these two databases to the same credentials here, or rather create a separate set of variables?


ports:
- ${MONITOR_PORT_WEB}:5000
Expand All @@ -37,5 +38,29 @@ services:
# DOZZLE_USERNAME=
# DOZZLE_PASSWORD=

ocrd-database:
image: "mongo:latest"
container_name: ocrd-database
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the fixed container_name here? IIUC it should also work having Docker daemon assign a default name (based on the Docker network, as all other services), but still referencing MongoDB by the service name (i.e. Docker should map service to host names).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me it does work without container_name (keeping the MongoDB URL as is).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait. I do get an occasional failure to start mongo-express now (Could not connect to database using connectionString: mongodb://root:1234@ocrd-database:27017/") – despite the condition: service_started...


environment:
MONGO_INITDB_ROOT_USERNAME: ${DB_ROOT_USER:-root}
MONGO_INITDB_ROOT_PASSWORD: ${DB_ROOT_PASSWORD:-root_password}

volumes:
- db-volume:/data/db


mongo-express:
image: mongo-express:latest
depends_on:
- ocrd-database
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- ocrd-database
ocrd-database
condition: service_started

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see afdc359

ports:
- 8081:8081
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the external port be controlled by an environment variable (which can be set via .env, esp. in the superrepository)? Or is mongo-express only for testing/debugging, i.e. will never get enabled by ocrd_kitodo/docker-compose.yml anyway?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- 8081:8081
- ${MONITOR_PORT_DBE:-8081}:8081

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see f5db85d

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Native port 8081 is occupied on my server. Perhaps we should make this configurable as MONITOR_PORT_DBE?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see f5db85d

environment:
ME_CONFIG_MONGODB_ADMINUSERNAME: ${DB_ROOT_USER:-root}
ME_CONFIG_MONGODB_ADMINPASSWORD: ${DB_ROOT_PASSWORD:-root_password}
ME_CONFIG_MONGODB_SERVER: ocrd-database

volumes:
db-volume:
shared:
1 change: 1 addition & 0 deletions init.sh
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@ fi
export OCRD_BROWSER__MODE=native
export OCRD_BROWSER__WORKSPACE_DIR=/data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does seem to work:

Suggested change
export OCRD_BROWSER__WORKSPACE_DIR=/data
export OCRD_BROWSER__WORKSPACE_DIR=/data/ocr-d

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see cb5e065

export OCRD_BROWSER__PORT_RANGE="[9000,9100]"
export OCRD_BROWSER__DB_CONNECTION_STRING=$DB_CONNECTION
export OCRD_LOGVIEW__PORT=$MONITOR_PORT_LOG
export OCRD_CONTROLLER__JOB_DIR=/run/lock/ocrd.jobs
export OCRD_CONTROLLER__HOST=$CONTROLLER_HOST
Expand Down
20 changes: 5 additions & 15 deletions ocrdbrowser/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,33 +5,23 @@
OcrdBrowser,
OcrdBrowserClient,
OcrdBrowserFactory,
filter_owned,
in_other_workspaces,
in_same_workspace,
launch,
stop_all,
stop_owned_in_workspace,
)
from ._docker import DockerOcrdBrowserFactory
from ._port import NoPortsAvailableError
from ._subprocess import SubProcessOcrdBrowserFactory
from ._client import HttpBrowserClient
from ._docker import DockerOcrdBrowser, DockerOcrdBrowserFactory
from ._port import NoPortsAvailableError
from ._subprocess import SubProcessOcrdBrowser, SubProcessOcrdBrowserFactory

__all__ = [
"Channel",
"ChannelClosed",
"DockerOcrdBrowser",
"DockerOcrdBrowserFactory",
"HttpBrowserClient",
"NoPortsAvailableError",
"OcrdBrowser",
"OcrdBrowserClient",
"OcrdBrowserFactory",
"SubProcessOcrdBrowser",
"SubProcessOcrdBrowserFactory",
"filter_owned",
"launch",
"in_other_workspaces",
"in_same_workspace",
"stop_all",
"stop_owned_in_workspace",
"workspace",
]
70 changes: 4 additions & 66 deletions ocrdbrowser/_browser.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,9 @@


class OcrdBrowser(Protocol):
def process_id(self) -> str:
...

def address(self) -> str:
...

Expand All @@ -18,9 +21,6 @@ def workspace(self) -> str:
def client(self) -> OcrdBrowserClient:
...

async def start(self) -> None:
...

async def stop(self) -> None:
...

Expand All @@ -46,67 +46,5 @@ def open_channel(self) -> AsyncContextManager[Channel]:


class OcrdBrowserFactory(Protocol):
def __call__(self, owner: str, workspace_path: str) -> OcrdBrowser:
async def __call__(self, owner: str, workspace_path: str) -> OcrdBrowser:
...


BrowserProcesses = set[OcrdBrowser]


async def launch(
workspace_path: str,
owner: str,
browser_factory: OcrdBrowserFactory,
running_browsers: BrowserProcesses | None = None,
) -> OcrdBrowser:
running_browsers = running_browsers or set()
owned_processes = filter_owned(owner, running_browsers)
in_workspace = in_same_workspace(workspace_path, owned_processes)

if in_workspace:
return in_workspace.pop()

return await start_process(browser_factory, workspace_path, owner)


def in_same_workspace(
workspace_path: str, browser_processes: BrowserProcesses
) -> BrowserProcesses:
workspace_path = path.abspath(workspace_path)
return {
p for p in browser_processes if path.abspath(p.workspace()) == workspace_path
}


def in_other_workspaces(
workspace_path: str, browser_processes: BrowserProcesses
) -> BrowserProcesses:
workspace_path = path.abspath(workspace_path)
return {p for p in browser_processes if p.workspace() != workspace_path}


def filter_owned(owner: str, running_processes: BrowserProcesses) -> BrowserProcesses:
return {p for p in running_processes if p.owner() == owner}


async def stop_all(owned_processes: BrowserProcesses) -> None:
async with asyncio.TaskGroup() as group:
for p in owned_processes:
group.create_task(p.stop())


async def stop_owned_in_workspace(
owner: str, workspace: str, browsers: set[OcrdBrowser]
) -> set[OcrdBrowser]:
owned = filter_owned(owner, browsers)
in_workspace = in_same_workspace(workspace, owned)
await stop_all(in_workspace)
return in_workspace


async def start_process(
process_factory: OcrdBrowserFactory, workspace_path: str, owner: str
) -> OcrdBrowser:
process = process_factory(owner, workspace_path)
await process.start()
return process
13 changes: 10 additions & 3 deletions ocrdbrowser/_client.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
from __future__ import annotations

import logging

from types import TracebackType
from typing import AsyncContextManager, Type, cast

Expand Down Expand Up @@ -66,9 +68,14 @@ def __init__(self, address: str) -> None:
self.address = address

async def get(self, resource: str) -> bytes:
async with httpx.AsyncClient(base_url=self.address) as client:
response = await client.get(resource)
return response.content
try:
async with httpx.AsyncClient(base_url=self.address) as client:
response = await client.get(resource)
return response.content
except Exception as ex:
logging.error(f"Tried to connect to {self.address}")
logging.error(f"Requested resource {resource}")
raise ConnectionError from ex

def open_channel(self) -> AsyncContextManager[Channel]:
return WebSocketChannel(self.address + "/socket")
99 changes: 68 additions & 31 deletions ocrdbrowser/_docker.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
from typing import Any

from ._browser import OcrdBrowser, OcrdBrowserClient
from ._port import Port
from ._port import NoPortsAvailableError
from ._client import HttpBrowserClient

_docker_run = "docker run --rm -d --name {} -v {}:/data -p {}:8085 ocrd-browser:latest"
Expand All @@ -19,51 +19,53 @@ async def _run_command(cmd: str, *args: Any) -> asyncio.subprocess.Process:
return await asyncio.create_subprocess_shell(
command,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
)


class DockerOcrdBrowser:
def __init__(self, host: str, port: Port, owner: str, workspace: str) -> None:
self._host = host
self._port = port
def __init__(
self, owner: str, workspace: str, address: str, process_id: str
) -> None:
self._owner = owner
self._workspace = path.abspath(workspace)
self.id: str | None = None
self._workspace = workspace
self._address = address
self._process_id: str = process_id

def process_id(self) -> str:
return self._process_id

def address(self) -> str:
return f"{self._host}:{self._port}"
return self._address

def workspace(self) -> str:
return self._workspace

def owner(self) -> str:
return self._owner

async def start(self) -> None:
cmd = await _run_command(
_docker_run, self._container_name(), self._workspace, self._port.get()
)
self.id = str(cmd.stdout).strip()

async def stop(self) -> None:
cmd = await _run_command(
_docker_stop, self._container_name(), self.workspace(), self._port.get()
)
cmd = await _run_command(_docker_stop, self._process_id)

if cmd.returncode != 0:
logging.info(
f"Stopping container {self.id} returned exit code {cmd.returncode}"
f"Stopping container {self.process_id} returned exit code {cmd.returncode}"
)

self._port.release()
self.id = None

def client(self) -> OcrdBrowserClient:
return HttpBrowserClient(self.address())

def _container_name(self) -> str:
workspace = path.basename(self.workspace())
return f"ocrd-browser-{self.owner()}-{workspace}"

def _container_name(owner: str, workspace: str) -> str:
workspace = path.basename(workspace)
return f"ocrd-browser-{owner}-{workspace}"


async def log_from_stream(stream: asyncio.StreamReader | None) -> None:
if not stream:
return

logging.info(await stream.read())


class DockerOcrdBrowserFactory:
Expand All @@ -72,16 +74,51 @@ def __init__(self, host: str, available_ports: set[int]) -> None:
self._ports = available_ports
self._containers: list[DockerOcrdBrowser] = []

def __call__(self, owner: str, workspace_path: str) -> OcrdBrowser:
container = DockerOcrdBrowser(
self._host, Port(self._ports), owner, workspace_path
)
self._containers.append(container)
return container
async def __call__(self, owner: str, workspace_path: str) -> OcrdBrowser:
abs_workspace = path.abspath(workspace_path)

container: DockerOcrdBrowser | None = None
for port in self._ports:
cmd = await _run_command(
_docker_run,
_container_name(owner, abs_workspace),
abs_workspace,
port,
)

return_code = await self.wait_for(cmd)
if return_code != 0:
continue

container = DockerOcrdBrowser(
owner,
abs_workspace,
f"{self._host}:{port}",
await self._read_container_id(cmd),
)
self._containers.append(container)
return container

raise NoPortsAvailableError()

async def wait_for(self, cmd: asyncio.subprocess.Process) -> int:
return_code = await cmd.wait()
await log_from_stream(cmd.stderr)

return return_code

async def _read_container_id(self, cmd: asyncio.subprocess.Process) -> str:
stdout = cmd.stdout
container_id = ""
if stdout:
container_id = str(await stdout.read()).strip()

return container_id

async def stop_all(self) -> None:
running_ids = [c.id for c in self._containers if c.id]
running_ids = [c.process_id() for c in self._containers]
if running_ids:
await _run_command(_docker_kill, " ".join(running_ids))
cmd = await _run_command(_docker_kill, " ".join(running_ids))
await cmd.wait()

self._containers = []
Loading