Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/hooks track operation #739

Merged
merged 82 commits into from
Dec 4, 2023
Merged
Show file tree
Hide file tree
Changes from 17 commits
Commits
Show all changes
82 commits
Select commit Hold shift + click to select a range
bf647a8
added the track_operations.py file
melodyyzh Feb 15, 2023
d330db1
added util.py file
melodyyzh Feb 15, 2023
e15f0be
modified track_operations.py
melodyyzh Feb 15, 2023
3345cb1
Merge branch 'master' into feature/hooks-TrackOperation
melodyyzh Mar 1, 2023
f18c07e
Updated code for modern signac-flow api.
klywang Mar 1, 2023
ef2751e
modified __init__.py, util.py, track_operations.py files
melodyyzh Mar 9, 2023
2787f2a
Merge branch 'feature/hooks-TrackOperation' of https://github.com/glo…
melodyyzh Mar 10, 2023
9c64821
Flake8 changes.
klywang Mar 29, 2023
c151c00
Added more tests to make sure metadata for on start matches with expe…
klywang Mar 29, 2023
a2f7e7f
Added test for success and exception hooks.
klywang Mar 29, 2023
f802f64
Merge branch 'main' into feature/hooks-TrackOperation
klywang Apr 5, 2023
bb2b8ad
Added git tracking files from execution-hooks branch.
klywang Apr 5, 2023
b36ca27
Fixed error and started work on git tests.
klywang Apr 5, 2023
d649690
Fixed git util file.
klywang Apr 13, 2023
90dcaee
Ignore tests that require git if git is not installed.
klywang Apr 13, 2023
edd8fde
Changed operation name in pytest for strict git false.
klywang Apr 13, 2023
e7fd3f8
Started strict git tests.
klywang Apr 13, 2023
f866571
Merge branch 'main' into feature/hooks-TrackOperation
klywang Apr 19, 2023
1328b5c
Updated to project.path.
klywang Apr 19, 2023
16cdcbd
Implemented git dirty tests.
klywang Apr 19, 2023
36dc228
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 19, 2023
ec374de
Updated changelog.
klywang Apr 19, 2023
f9c29a0
Merge branch 'feature/hooks-TrackOperation' of https://github.com/glo…
klywang Apr 19, 2023
362f1d8
Update changelog.txt
melodyyzh Jun 2, 2023
c602605
Update flow/hooks/git_util.py
melodyyzh Jun 2, 2023
e65215d
added doc strings and removed unused variables in test_project.py
melodyyzh Jun 19, 2023
4a084c9
Merge branch 'main' into feature/hooks-TrackOperation
melodyyzh Jun 19, 2023
fc617e0
update tests to avoid issue with python 3.8
melodyyzh Jun 22, 2023
8f87ffb
made additional edits on the docstrings
melodyyzh Jun 25, 2023
fc8d024
Update flow/hooks/git_util.py
melodyyzh Jul 5, 2023
acde532
Update flow/hooks/track_operations.py
melodyyzh Jul 5, 2023
58cc823
Fixed import issue.
klywang Jul 5, 2023
057e0e8
Initialize repository for git test.
klywang Jul 5, 2023
4883020
Test if git repository is dirty.
klywang Jul 5, 2023
252553c
Removed old unused file.
klywang Jul 5, 2023
29e1dd7
Use project instead of _project and removed unneeded old string.
klywang Jul 5, 2023
ac439fc
Removed __call__
klywang Jul 5, 2023
7658eed
Updated docstring and removed more instances of _project.
klywang Jul 5, 2023
08d89e9
Merge branch 'main' into feature/hooks-TrackOperation
klywang Jul 7, 2023
f26410f
Updated changelog.
klywang Jul 7, 2023
836dac0
Merge branch 'main' into feature/hooks-TrackOperation
b-butler Oct 11, 2023
63a08c1
test: Add GitPython to test dependencies
b-butler Oct 13, 2023
b68866c
doc: apply suggestions/corrections to documentation
b-butler Oct 18, 2023
957b295
doc: Add TrackOperations
b-butler Oct 18, 2023
af11f3f
feat: Add support for storing metadata in job document.
b-butler Oct 18, 2023
b459955
doc: Refactor documentation style for collect_metadata
b-butler Oct 18, 2023
1707bf0
fix: Prior feature to write to document.
b-butler Oct 19, 2023
cb55e29
test: Test new TrackOperations changes/features
b-butler Oct 19, 2023
9f3adfd
refactor: Simplify metadata collection logic
b-butler Oct 19, 2023
6d262f0
doc: Update collect_metadata docstring's possible todo
b-butler Oct 19, 2023
a74b05b
ci: Fix oldest dependency tests without git.
b-butler Oct 20, 2023
19962a4
test: Split TrackOperation project into git_strict and not
b-butler Oct 20, 2023
98b3261
test: Actually fix test errors without git.
b-butler Oct 23, 2023
551e456
refactor: Store git info if possible when strict_git=False
b-butler Oct 26, 2023
19d318b
refactor: change handling of erroring on dirty git
b-butler Oct 26, 2023
1843d91
test: Refactor TestHooksTrackOperationsNotStrict with more options
b-butler Oct 26, 2023
2ef3f04
test: Fix condition test (make more strict).
b-butler Oct 27, 2023
a8efc65
test: Use == on project path and remove comment
b-butler Oct 27, 2023
8888e47
doc: Correct TrackOperations docstring.
b-butler Oct 27, 2023
1d9498b
test: Fix path test for CI with detailed comment on necessity
b-butler Oct 27, 2023
f97ee59
Merge branch 'main' into feature/hooks-TrackOperation
b-butler Nov 2, 2023
1923a4c
doc: Apply Bradley's fixes
b-butler Nov 7, 2023
bad7e30
refactor: metadata schema
b-butler Nov 7, 2023
d961499
test: Require GitPython
b-butler Nov 7, 2023
31d5f8c
test: Skip check for GitHub Actions MacOS runners.
b-butler Nov 7, 2023
0e458cc
doc: Correct incomplete sentence.
b-butler Nov 7, 2023
0535e25
refactor: Switch back to filebased logging
b-butler Nov 7, 2023
4c2207c
doc: Document the schema keys in the hook.
b-butler Nov 7, 2023
6c84699
doc: Fix documentation for current code.
b-butler Nov 7, 2023
60007ad
ci: Update oldest requirements.
b-butler Nov 7, 2023
4d5aa94
doc: Fix documentation rendering.
b-butler Nov 7, 2023
8761ce0
doc: Fix nested list formatting.
b-butler Nov 7, 2023
514eda6
doc: Fix method reference.
b-butler Nov 7, 2023
d83d0e5
ci: Try to get working version of GitPython
b-butler Nov 7, 2023
387ff7b
ci (WIP): Test if latest GitPython works.
b-butler Nov 7, 2023
d8876fe
Merge branch 'main' into feature/hooks-TrackOperation
b-butler Nov 8, 2023
fc08084
Merge branch 'main' into feature/hooks-TrackOperation
b-butler Nov 13, 2023
a7bbb4a
doc: Fix typos
b-butler Nov 14, 2023
6411bd5
refactor: Change default name of file
b-butler Dec 4, 2023
815f07e
Merge branch 'main' into feature/hooks-TrackOperation
b-butler Dec 4, 2023
c759957
misc: Formatting to make pre-commit pass.
b-butler Dec 4, 2023
d35908d
test: Refactor conditional to not use pass
b-butler Dec 4, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .github/workflows/ci-oldest-reqs.txt
Original file line number Diff line number Diff line change
Expand Up @@ -8,3 +8,4 @@ pytest-cov==3.0.0
pytest==7.0.1
ruamel.yaml==0.17.21
tqdm==4.60.0
GitPython==3.1.37
81 changes: 57 additions & 24 deletions flow/hooks/track_operations.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,38 +8,59 @@

try:
from .git_util import collect_git_metadata
except ImportError:
GIT = False

Check warning on line 12 in flow/hooks/track_operations.py

View check run for this annotation

Codecov / codecov/patch

flow/hooks/track_operations.py#L11-L12

Added lines #L11 - L12 were not covered by tests
else:
GIT = True


_DEFAULT_FILENAME = "signac-execution-history.log"
b-butler marked this conversation as resolved.
Show resolved Hide resolved


class TrackOperations:
b-butler marked this conversation as resolved.
Show resolved Hide resolved
""":class:`~.TrackOperations` tracks information about the execution of operations to a logfile.

This hooks can provides information on the start, successful completion, and/or erroring of
one or more operations in a `flow.FlowProject` instance. The logs are stored either in the
job document or the file given by ``log_filename``. If stored in the job document, it uses
the key ``execution_history``. The file or document list will be appended to if it already
This hook can provides information on the start, successful completion, and/or error of
one or more operations in a :class:`~.FlowProject` instance. The logs are stored in the file
given by ``log_filename`` within the job's path. The file will be appended to if it already
b-butler marked this conversation as resolved.
Show resolved Hide resolved
exists.

The hooks stores metadata regarding the execution of the operation and the state of the
project at the time of execution, error, and/or completion. The data will also include
information about the git status if the project is detected as a git repo and
``GitPython`` is installed in the environment.

Each call to the hook adds a single JSON line to the log file. These can be
read using the `json` builtin package or :meth:`~.TrackOperations.read_log`.

The current schema has the following structure:

- ``time``: The time of querying the metadata.
- ``stage``: Whether the hook was executed either "prior" or "after" the associated
operation's execution.
- ``error``: The error message on executing the operation if any.
- ``project``
- ``path``: Filepath to the project
- ``schema_version``: The project's schema version
- ``operation``: The operation name
- ``job_id``: The job id
- ``git``
b-butler marked this conversation as resolved.
Show resolved Hide resolved
- ``commit_id``: The current commit of the project's git repo.
- ``dirty``: Whether the project's repo has uncommitted changes or not.
- ``_schema_version``: The metadata storage's schema version. Schema is currently in version 1.
b-butler marked this conversation as resolved.
Show resolved Hide resolved


Warning
-------
This class will error on construction if GitPython is not available and ``strict_git`` is set
to ``True``. If ``strict_git`` is ``True`` trying to execute an operation with uncommitted
changes.
This class will raise an exception when strict_git is set to ``True`` and either GitPython is
b-butler marked this conversation as resolved.
Show resolved Hide resolved
b-butler marked this conversation as resolved.
Show resolved Hide resolved
not available or the repository contains uncommitted changes (i.e. is "dirty").

Examples
--------
The following example will install :class:`~.TrackOperations` at the operation level.
Where the log will be stored in the job document.

.. code-block:: python
b-butler marked this conversation as resolved.
Show resolved Hide resolved

from flow import FlowProject
from flow.hooks import TrackOperations

Expand All @@ -61,6 +82,7 @@
instance of :class:`~.FlowProject`

.. code-block:: python
b-butler marked this conversation as resolved.
Show resolved Hide resolved

from flow import FlowProject
from flow.hooks import TrackOperations

Expand All @@ -77,19 +99,17 @@

Parameters
----------
log_filename: str, optional
The name of the log file in the job workspace. If ``None`` store in a list in the job
document in a key labeled ``f"{operation_name}"`` under the ``"execution_history"`` key.
Defaults to ``None``.
strict_git: bool, optional
log_filename : str, optional
The name of the log file in the job workspace. Defaults to "signac-execution-history.log".
b-butler marked this conversation as resolved.
Show resolved Hide resolved
strict_git : bool, optional
Whether to fail if ``GitPython`` cannot be imported or if there are uncommitted changes
to the project's git repo. Defaults to ``True``.
"""

def __init__(self, log_filename=None, strict_git=True):
def __init__(self, log_filename=_DEFAULT_FILENAME, strict_git=True):
self.log_filename = log_filename
if strict_git and not GIT:
raise RuntimeError(

Check warning on line 112 in flow/hooks/track_operations.py

View check run for this annotation

Codecov / codecov/patch

flow/hooks/track_operations.py#L112

Added line #L112 was not covered by tests
"Unable to collect git metadata from the repository, "
"because the GitPython package is not installed.\n\n"
"You can use '{}(strict_git=False)' to ignore this "
Expand All @@ -98,13 +118,6 @@
self.strict_git = strict_git

def _write_metadata(self, job, metadata):
print(f"Writing... {metadata}.", flush=True)
if self.log_filename is None:
history = job.doc.setdefault("execution_history", {})
# No need to store job id or operation name.
operation_name = metadata.pop("job-operation")["name"]
history.setdefault(operation_name, []).append(metadata)
return
with open(job.fn(self.log_filename), "a") as logfile:
logfile.write(json.dumps(metadata) + "\n")

Expand Down Expand Up @@ -146,10 +159,10 @@
Parameters
----------
op : function or type
An operation function to log or a subclass of `flow.FlowProject` if ``project_cls`` is
``None``.
An operation function to log or a subclass of :class:`~.FlowProject` if
``project_cls`` is ``None``.
project_cls : type
A subclass of `flow.FlowProject`.
A subclass of :class:`~.FlowProject`.
"""
if project_cls is None:
return lambda func: self.install_operation_hooks(func, op)
Expand All @@ -170,3 +183,23 @@
project.project_hooks.on_success.append(self.on_success)
project.project_hooks.on_exception.append(self.on_exception)
return project

@classmethod
def read_log(cls, job, log_filename=_DEFAULT_FILENAME):
"""Return the execution log data as a list of dictionaries.

Parameters
----------
job : signac.job.Job
The job to read the execution history of.
log_filename : str, optional
The name of the log file in the job workspace. Defaults to
"signac-execution-history.log".
b-butler marked this conversation as resolved.
Show resolved Hide resolved

Returns
-------
log : list[dict[str, any]]
Returns the job's current execution history for logged operations.
"""
with open(job.fn(log_filename)) as fh:
return [json.loads(line) for line in fh.readlines() if line != ""]
12 changes: 2 additions & 10 deletions flow/hooks/util.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,12 +10,6 @@ def collect_metadata(operation, job):

Returns a directory including schema version, time, project, and job-operation.

We can no longer track the following because we take in the operation name as a string rather
than as an object, but they provide useful information. We could try to perform introspection
of the project through the job later if desired to get these values.

- "cmd": operation.cmd,
- "directives": operation.directives,
"""
return {
# the metadata schema version:
Expand All @@ -26,8 +20,6 @@ def collect_metadata(operation, job):
# the project schema version:
"schema_version": job.project.config.get("schema_version"),
},
"job-operation": {
"name": operation,
"job_id": job.id,
},
"operation": operation,
"job_id": job.id,
}
5 changes: 1 addition & 4 deletions tests/define_hooks_track_operations_project.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,22 +8,19 @@ class _HooksTrackOperations(FlowProject):
pass


LOG_FILENAME = "operations.log"
LOG_FILENAME = "signac-execution-history.log"
b-butler marked this conversation as resolved.
Show resolved Hide resolved


track_operations = TrackOperations(strict_git=False)
track_operations_with_file = TrackOperations(LOG_FILENAME, strict_git=False)


@track_operations_with_file.install_operation_hooks(_HooksTrackOperations)
@track_operations.install_operation_hooks(_HooksTrackOperations)
@_HooksTrackOperations.operation
def base(job):
if job.sp.raise_exception:
raise RuntimeError(HOOKS_ERROR_MESSAGE)


@track_operations_with_file.install_operation_hooks(_HooksTrackOperations)
@track_operations.install_operation_hooks(_HooksTrackOperations)
@_HooksTrackOperations.operation(cmd=True, with_job=True)
def cmd(job):
Expand Down
3 changes: 0 additions & 3 deletions tests/define_hooks_track_operations_strict_project.py
Original file line number Diff line number Diff line change
Expand Up @@ -12,19 +12,16 @@ class _HooksTrackOperations(FlowProject):


track_operations = TrackOperations()
track_operations_with_file = TrackOperations(LOG_FILENAME)


@track_operations.install_operation_hooks(_HooksTrackOperations)
@track_operations_with_file.install_operation_hooks(_HooksTrackOperations)
@_HooksTrackOperations.operation
def base(job):
if job.sp.raise_exception:
raise RuntimeError(HOOKS_ERROR_MESSAGE)


@track_operations.install_operation_hooks(_HooksTrackOperations)
@track_operations_with_file.install_operation_hooks(_HooksTrackOperations)
@_HooksTrackOperations.operation(cmd=True, with_job=True)
def cmd(job):
if job.sp.raise_exception:
Expand Down
104 changes: 34 additions & 70 deletions tests/test_project.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,20 +2,21 @@
# All rights reserved.
# This software is licensed under the BSD 3-Clause License.
import datetime
import json
import logging
import os
import platform
import subprocess
import sys
from contextlib import contextmanager, redirect_stderr, redirect_stdout
from functools import partial
from io import StringIO
from itertools import groupby
from pathlib import Path

import define_hooks_test_project
import define_hooks_track_operations_project
import define_hooks_track_operations_strict_project
import define_status_test_project
import git
import pytest
import signac
from conftest import MockScheduler, TestProjectBase
Expand All @@ -42,17 +43,6 @@
_switch_to_directory,
)

try:
import define_hooks_track_operations_strict_project
import git

skip_git = False
except RuntimeError:
skip_git = True


git_mark_skipif = pytest.mark.skipif(skip_git, reason="git could not be imported")


@contextmanager
def suspend_logging():
Expand Down Expand Up @@ -2600,10 +2590,7 @@ class TestHooksTrackOperationsNotStrict(TestHooksSetUp):
def operation_info(self, request):
return request.param

@pytest.fixture(
params=None if skip_git else (True, False),
ids=None if skip_git else ("git-dirty", "git-clean"),
)
@pytest.fixture(params=(True, False), ids=("git-dirty", "git-clean"))
def git_repo(self, project, request):
# params=None is equivalent to not passing a parameter which results in
# result.param being unset.
Expand All @@ -2620,25 +2607,11 @@ def git_repo(self, project, request):
repo.index.add(["dirty.txt"])
return repo

def split_log(self, job, fn):
with open(job.fn(fn)) as f:
values = f.read().split("\n")
if values[-1] == "":
values = values[:-1]
return values

def get_log(self, job, fn=None):
if fn is None:
execution_history = job.doc["execution_history"]()
# Add back in redundent information to make check_metadata universal
for key, item in execution_history.items():
for entry in item:
entry["job-operation"] = {"name": key, "job_id": job.id}
return execution_history
log_entries = [json.loads(entry) for entry in self.split_log(job, fn)]
def get_log(self, job):
log_entries = flow.hooks.TrackOperations.read_log(job)
execution_history = {}
for entry in log_entries:
operation_name = entry["job-operation"]["name"]
operation_name = entry["operation"]
execution_history.setdefault(operation_name, []).append(entry)
return execution_history

Expand All @@ -2653,40 +2626,35 @@ def check_metadata(
repo,
):
assert metadata["stage"] == expected_stage
# When running on MacOS CI a private/ is prepended to job.project.path
# seemingly indeterminetly. If metadata["project"]["path"] is checked
# then job.project.path will have "private" in it. If I check
# job.project.path then metadata["project"]["path"] will have "private"
# in it. If I check neither, one of them will have "private" in it.
test_path = os.path.join(
*filter(lambda x: x != "private", Path(metadata["project"]["path"]).parts)
)
current_path = os.path.join(
*filter(lambda x: x != "private", Path(job.project.path).parts)
)
assert current_path == test_path

test_path = metadata["project"]["path"]
current_path = job.project.path
# Still allow for local MacOS check but skip in CI MacOS runners. Rather
# than xfail or skip this ensures we test everything else.
if (
"private" in test_path
or "private" in current_path
and platform.system == "Darwin"
):
pass
b-butler marked this conversation as resolved.
Show resolved Hide resolved
else:
assert current_path == test_path
assert metadata["project"]["schema_version"] == job.project.config.get(
"schema_version"
)
start_time = datetime.datetime.fromisoformat(metadata["time"])
assert before_execution < start_time
job_op_metadata = metadata["job-operation"]
assert job_op_metadata["job_id"] == job.id
assert job_op_metadata["name"] == operation_name
assert metadata["job_id"] == job.id
assert metadata["operation"] == operation_name

if expected_stage != "prior" and job.sp.raise_exception:
assert error_message in metadata["error"]

else:
assert metadata["error"] is None

if repo is not None:
assert metadata["project"]["git"]["commit_id"] == str(repo.commit())
assert metadata["project"]["git"]["dirty"] == repo.is_dirty()
else:
if self.error_on_no_git:
# Should not happen in actual testing, here for developers.
assert False, "Expected gitpython for test."
assert metadata["project"]["git"]["commit_id"] == str(repo.commit())
assert metadata["project"]["git"]["dirty"] == repo.is_dirty()

def test_metadata(self, project, job, operation_info, git_repo):
operation_name, error_message = operation_info
Expand All @@ -2705,7 +2673,7 @@ def test_metadata(self, project, job, operation_info, git_repo):

assert job.isfile(self.log_fname)
for filename in (None, self.log_fname):
metadata = self.get_log(job, filename)
metadata = self.get_log(job)
# For the single operation
# Each metadata whether file or document is one since they both
# write to different places.
Expand All @@ -2732,21 +2700,17 @@ def test_metadata(self, project, job, operation_info, git_repo):
)


if not skip_git:

class TestHooksTrackOperationsStrict(TestHooksTrackOperationsNotStrict):
project_class = (
define_hooks_track_operations_strict_project._HooksTrackOperations
)
entrypoint = dict(
path=os.path.realpath(
os.path.join(
os.path.dirname(__file__),
"define_hooks_track_operations_strict_project.py",
)
class TestHooksTrackOperationsStrict(TestHooksTrackOperationsNotStrict):
project_class = define_hooks_track_operations_strict_project._HooksTrackOperations
entrypoint = dict(
path=os.path.realpath(
os.path.join(
os.path.dirname(__file__),
"define_hooks_track_operations_strict_project.py",
)
)
error_on_no_git = True
)
error_on_no_git = True


class TestIgnoreConditions:
Expand Down