-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Release v0.3 #103
Release v0.3 #103
Conversation
Reorganize and improve documentation
Co-authored-by: Julia Damerow <[email protected]>
Co-authored-by: Julia Damerow <[email protected]>
Add example usage to the readme #75
Check notebooks in CI `check` workflow #76
Rename formatters submodule and classes to converters
EDTF demo/validation notebook
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
Caution Review failedThe pull request is closed. WalkthroughThe pull request includes comprehensive updates across multiple files, primarily focusing on enhancing the contributor recognition system, updating testing workflows, and restructuring documentation. Key changes involve modifying the Changes
Possibly related issues
Possibly related PRs
Suggested reviewers
Poem
Warning Rate limit exceeded@rlskoeser has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 16 minutes and 12 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 26
🧹 Outside diff range and nitpick comments (52)
.pre-commit-config.yaml (1)
18-22
: Enhance MyPy's type checking configurationWhile MyPy is correctly set up, consider enabling stricter type checking for better code quality.
- id: mypy + args: [--strict, --ignore-missing-imports] additional_dependencies: [numpy]
Also, consider creating a
mypy.ini
or updatingpyproject.toml
with MyPy configurations for more granular control..github/workflows/check.yml (3)
26-28
: Consider enhancing Ruff configurationWhile Ruff is a good choice, consider adding explicit formatting checks and specifying rules.
Consider enhancing the configuration:
- name: Run ruff - run: ruff check + run: | + ruff check . + ruff format --check .
35-36
: Consider expanding type checking coverageThe current configuration only checks the src directory. Consider including tests and using an explicit config file.
Consider enhancing the configuration:
- name: Check types in python src directory; install needed types - run: mypy --install-types --non-interactive src + run: mypy --install-types --non-interactive src tests
38-40
: Enhance notebook testing configurationWhile adding notebook testing is great, consider making it more robust with timeouts and explicit directory specification.
Consider enhancing the configuration:
- name: Check jupyter notebooks with treon - run: treon + run: treon examples/ --timeout=300tests/test_converters/edtf/test_edtf_parser.py (2)
15-15
: Consider adding boundary test cases for negative yearsWhile the added test cases cover basic negative years and scientific notation, consider adding tests for:
- Minimum supported year value
- Transition cases around year 0
- Large negative years without scientific notation
Also applies to: 17-18
Line range hint
43-54
: Enhance test documentation and error coverageThe test file would benefit from:
- Adding docstrings explaining the test strategy and EDTF level coverage
- Expanding
error_cases
to include:
- Invalid date combinations (e.g., "1984-04-31")
- Malformed scientific notation
- Invalid qualifiers
- Incorrect interval formats
Would you like me to help generate additional error test cases and documentation?
.github/workflows/unit_tests.yml (1)
41-41
: Consider adding pytest configurationThe switch from tox to direct pytest usage is good for simplicity, but consider adding a pytest configuration file to ensure consistent test behavior:
- Test discovery patterns
- Test ordering
- Coverage settings
Would you like me to help create a
pytest.ini
or add the configuration topyproject.toml
?Also applies to: 46-46
tests/test_converters/edtf/test_edtf_transformer.py (1)
36-40
: Consider adding more edge cases for unspecified digitsThe current test cases provide good coverage for unspecified digits, but consider adding these additional patterns for completeness:
- Mixed 'X' and digit in day:
"1984-12-X5"
- All possible positions in month:
"1984-X2"
,"1984-2X"
# level 2 unspecified digits ("156X-12-25", Undate("156X", 12, 25)), ("XXXX-12-XX", Undate("XXXX", 12, "XX")), ("1XXX-XX", Undate("1XXX", "XX")), ("1984-1X", Undate(1984, "1X")), +("1984-12-X5", Undate(1984, 12, "X5")), +("1984-X2", Undate(1984, "X2")), +("1984-2X", Undate(1984, "2X")),.all-contributorsrc (1)
79-80
: Consider adding alt text for accessibilityWhile the badge template is well-configured, consider adding alt text to improve accessibility. The current implementation might not be screen-reader friendly.
- "badgeTemplate": "![All Contributors](https://img.shields.io/github/all-contributors/dh-tech/undate-python?color=ee8449&style=flat-square)" + "badgeTemplate": "![All Contributors - See the full list of project contributors](https://img.shields.io/github/all-contributors/dh-tech/undate-python?color=ee8449&style=flat-square)"CHANGELOG.md (1)
21-21
: Fix typo and consider adding contextThere's a typo: "payloud" should be "payload". Consider adding context about the benefits that justify the increased size.
-Performance differences seem to be negligible, but it does increase payloud size. The virtualenv for installing version 0.2 was 14MB; when installing the newer version with numpy, the virtualenv is 46MB (the numpy folder in site packages is 31MB on its own). +Performance differences seem to be negligible, but it does increase payload size. The virtualenv for installing version 0.2 was 14MB; when installing the newer version with numpy, the virtualenv is 46MB (the numpy folder in site packages is 31MB on its own). This increase is justified by the extended year range support and improved date handling capabilities provided by numpy.tests/test_converters/test_base.py (5)
1-7
: Remove extra empty lineThere's an unnecessary empty line at line 6. One empty line between imports and class definition is sufficient.
import pytest from undate.converters.base import BaseDateConverter - class TestBaseDateConverter:
8-18
: Enhance test assertions with descriptive messagesAdd descriptive messages to assertions to make test failures more informative.
- assert isinstance(available_converters, dict) + assert isinstance(available_converters, dict), "available_converters should return a dictionary" # NOTE: import _after_ generating available formatters # so we can confirm it gets loaded from undate.converters.iso8601 import ISO8601DateFormat - assert ISO8601DateFormat.name in available_converters - assert available_converters[ISO8601DateFormat.name] == ISO8601DateFormat + assert ISO8601DateFormat.name in available_converters, f"Expected {ISO8601DateFormat.name} in available converters" + assert available_converters[ISO8601DateFormat.name] == ISO8601DateFormat, "Converter class mismatch"
24-31
: Verify error messages in NotImplementedError testsThe tests should verify the specific error messages to ensure they provide helpful guidance to implementers.
def test_parse_not_implemented(self): - with pytest.raises(NotImplementedError): + with pytest.raises(NotImplementedError, match="Subclasses must implement parse method"): BaseDateConverter().parse("foo bar baz") def test_parse_to_string(self): - with pytest.raises(NotImplementedError): + with pytest.raises(NotImplementedError, match="Subclasses must implement to_string method"): BaseDateConverter().to_string(1991)
33-51
: Add docstring and strengthen assertionsThe test would benefit from a docstring explaining its purpose and more specific assertions.
def test_import_converters_import_only_once(caplog): + """ + Test that the import_converters method: + 1. Successfully imports converters on first call + 2. Uses cache on subsequent calls + 3. Logs appropriate messages + """ # clear the cache, since any instantiation of an Undate # object anywhere in the test suite will populate it BaseDateConverter.import_converters.cache_clear() # run first, and confirm it runs and loads formatters with caplog.at_level(logging.DEBUG): import_count = BaseDateConverter.import_converters() # should import at least one thing (iso8601) - assert import_count >= 1 + assert import_count >= 1, f"Expected at least one converter to be imported, got {import_count}" # should have log entry - assert "Loading converters" in caplog.text + assert "Loading converters" in caplog.text, "Expected 'Loading converters' message in logs"
1-64
: Consider adding extensibility testsThe test suite would benefit from examples demonstrating how to properly extend
BaseDateConverter
. Consider adding a test case that implements a valid custom converter to serve as documentation for other developers.Would you like me to provide an example test case that demonstrates implementing a custom converter?
examples/use-cases/shakespeare-and-company-project/README.md (3)
9-12
: Add hyphens to compound adjectives for clarity.The content is excellent, but for better readability, compound adjectives should be hyphenated.
-The dataset includes durations for subscriptions and borrow events in human readable format and number of days. +The dataset includes durations for subscriptions and borrow events in human-readable format and number of days.🧰 Tools
🪛 LanguageTool
[misspelling] ~11-~11: This word is normally spelled with a hyphen.
Context: ... for subscriptions and borrow events in human readable format and number of days. The [Shakes...(EN_COMPOUNDS_HUMAN_READABLE)
13-15
: Improve readability and conciseness.The technical explanation is excellent, but consider these improvements:
- Add hyphens to compound adjectives
- Make the language more concise
-subscriptions in both human readable and machine readable form +subscriptions in both human-readable and machine-readable form -Our hope is that in the future, `undate` will make it possible +In the future, `undate` will make it possible🧰 Tools
🪛 LanguageTool
[misspelling] ~13-~13: This word is normally spelled with a hyphen.
Context: ...rowing events and subscriptions in both human readable and machine readable form (text and cou...(EN_COMPOUNDS_HUMAN_READABLE)
[misspelling] ~13-~13: This word is normally spelled with a hyphen.
Context: ...ubscriptions in both human readable and machine readable form (text and count by days). Data a...(EN_COMPOUNDS_MACHINE_READABLE)
[style] ~15-~15: This phrase might be unnecessarily wordy.
Context: ...koeser/shxco-missingdata-specreading)). Our hope is that in the future,undate
will make ...(MY_HOPE_IS)
22-22
: Format DOI as a proper markdown link.Convert the bare URL to a proper markdown link for better documentation practices.
-https://doi.org/10.34770/nz90-ym25 +[DOI: 10.34770/nz90-ym25](https://doi.org/10.34770/nz90-ym25)🧰 Tools
🪛 Markdownlint
22-22: null
Bare URL used(MD034, no-bare-urls)
tests/test_date.py (2)
54-63
: Add edge cases to subtraction testsWhile the basic subtraction cases are covered, consider adding tests for:
- Subtracting dates with different precisions
- Error cases (e.g., subtracting a Date from a Timedelta)
- Boundary conditions (e.g., dates across year boundaries)
def test_subtract_edge_cases(self): # Different precisions date1 = Date(2024, 1, 1) date2 = Date(2024) with pytest.raises(ValueError): _ = date1 - date2 # Invalid operation with pytest.raises(TypeError): _ = ONE_YEAR - date1 # Year boundary assert (Date(2024, 1, 1) - Date(2023, 12, 31)).days == 1
1-2
: Add pytest import and consider adding type hints
- Add explicit pytest import since this is a test file
- Consider adding type hints to improve code clarity and enable static type checking
import numpy as np +import pytest from undate.date import ONE_YEAR, Date, DatePrecision, Timedelta
Example with type hints:
def test_init_year_month(self) -> None: d: Date = Date(2010, 5) assert isinstance(d, Date)pyproject.toml (2)
12-12
: Consider adding version constraints for dependenciesWhile the dependencies are correctly specified, it's recommended to add version constraints for
lark
andnumpy
to ensure reproducible builds and avoid potential compatibility issues with future releases.Example:
-dependencies = ["lark", "numpy"] +dependencies = ["lark>=1.0.0", "numpy>=1.20.0"]
85-85
: Fix typo in pytest marker descriptionThere's a typo in the "first" marker description: "efore" should be "before".
- "first : run marked tests efore all others", + "first : run marked tests before all others",docs/conf.py (1)
88-89
: Consider using alternative MyST extensions for linksThe comment indicates that the "linkify" extension was not found. Consider using other MyST extensions for handling links, such as "colon_fence" or "dollarmath" if needed.
-# myst_enable_extensions = ["linkify"] # disabling because not found +myst_enable_extensions = [ + "colon_fence", + "smartquotes", + "replacements" +]src/undate/converters/edtf/transformer.py (2)
67-74
: Add input validation to prevent potential errors.While the logic is correct, the method could be more robust:
- Add bounds checking for the year value
- Add type validation for the year token
- Handle potential IndexError when accessing items[1]
Consider this safer implementation:
def year(self, items): # when the year is negative, there are two tokens if len(items) > 1 and items[0] == "-": # an anonymous token for the - and the integer year + if len(items) < 2: + raise ValueError("Expected year value after negative sign") year = items[1] + if not isinstance(year, (int, str)) or not str(year).isdigit(): + raise ValueError(f"Invalid year value: {year}") + year = int(year) + if not (-9999 <= -year <= 9999): # Adjust bounds as needed + raise ValueError(f"Year {-year} is out of valid range") return Tree(data="year", children=[-year]) + if not items: + raise ValueError("Expected year value") + year = items[0] + if not isinstance(year, (int, str)) or not str(year).isdigit(): + raise ValueError(f"Invalid year value: {year}") + year = int(year) + if not (-9999 <= year <= 9999): # Adjust bounds as needed + raise ValueError(f"Year {year} is out of valid range") return Tree(data="year", children=[items[0]])
76-76
: Fix indentation inconsistency.The indentation on line 78 is misaligned with the method definition.
def year_fivedigitsplus(self, items): - # strip off the leading Y and convert to integer + # strip off the leading Y and convert to integerAlso applies to: 78-78
tests/test_converters/test_edtf.py (3)
8-18
: Consider parameterizing the test cases for better maintainability.The test cases in
test_parse_singledate
could be parameterized using@pytest.mark.parametrize
to make the tests more maintainable and easier to extend.Here's a suggested refactor:
- def test_parse_singledate(self): - assert EDTFDateConverter().parse("2002") == Undate(2002) - assert EDTFDateConverter().parse("1991-05") == Undate(1991, 5) - assert EDTFDateConverter().parse("1991-05-03") == Undate(1991, 5, 3) - # unknown dates are not strictly equal, but string comparison should match - assert str(EDTFDateConverter().parse("201X")) == str(Undate("201X")) - assert str(EDTFDateConverter().parse("2004-XX")) == str(Undate(2004, "XX")) - assert str(EDTFDateConverter().parse("XXXX-05-03")) == Undate( - month=5, day=3 - ).format("EDTF") + @pytest.mark.parametrize( + "input_date,expected", + [ + ("2002", Undate(2002)), + ("1991-05", Undate(1991, 5)), + ("1991-05-03", Undate(1991, 5, 3)), + ("201X", Undate("201X")), + ("2004-XX", Undate(2004, "XX")), + ("XXXX-05-03", Undate(month=5, day=3)), + ], + ) + def test_parse_singledate(self, input_date, expected): + if "X" in input_date: + assert str(EDTFDateConverter().parse(input_date)) == str(expected) + else: + assert EDTFDateConverter().parse(input_date) == expected
32-44
: Consider adding more edge cases for invalid inputs.The current test cases cover basic invalid scenarios. Consider adding these additional edge cases:
- Invalid separators (e.g., "1991/05-03")
- Invalid characters in year/month/day positions
- Malformed ranges
- Leading/trailing whitespace
Here's a suggested addition:
invalid_inputs = [ "1991-13", # invalid month "1991-12-32", # invalid day "199A", # invalid year format "", # empty string None, # None input + "1991/05-03", # invalid separator + "199Z-05-03", # invalid year character + "1991-0Z-03", # invalid month character + "1991-05-0Z", # invalid day character + "1991-05-03/", # malformed range + " 1991-05-03", # leading whitespace + "1991-05-03 ", # trailing whitespace ]
70-70
: Address the TODO comment about digit replacement.The TODO comment suggests testing digit replacement functionality, but the requirement is unclear.
Would you like me to help:
- Implement the test case for digit replacement?
- Open a GitHub issue to track this requirement?
src/undate/converters/base.py (3)
33-39
: Consider adding name uniqueness validationWhile the docstring states that subclasses must define a unique name, there's no mechanism to enforce this uniqueness. Consider adding validation in the
available_converters
method to detect duplicate names.@classmethod def available_converters(cls) -> Dict[str, Type["BaseDateConverter"]]: # ensure undate converters are imported cls.import_converters() - return {c.name: c for c in cls.__subclasses__()} # type: ignore + converters = {} + for converter in cls.__subclasses__(): + if converter.name in converters: + raise ValueError(f"Duplicate converter name '{converter.name}' found") + converters[converter.name] = converter + return converters
78-85
: Address unused variables and add error handling
- Rename unused variables as suggested by static analysis
- Add error handling for import failures to prevent silent failures
- for importer, modname, ispkg in pkgutil.iter_modules( + for _importer, modname, _ispkg in pkgutil.iter_modules( converter_path, converter_prefix ): # import everything except the current file if not modname.endswith(".base"): - importlib.import_module(modname) - import_count += 1 + try: + importlib.import_module(modname) + import_count += 1 + except ImportError as e: + logger.error(f"Failed to import converter module {modname}: {e}")🧰 Tools
🪛 Ruff
78-78: Loop control variable
importer
not used within loop bodyRename unused
importer
to_importer
(B007)
78-78: Loop control variable
ispkg
not used within loop bodyRename unused
ispkg
to_ispkg
(B007)
88-95
: Improve type safetyConsider improving type hints and removing the type ignore comment:
+from typing import TYPE_CHECKING +if TYPE_CHECKING: + from typing import Type + from .types import BaseDateConverterType @classmethod - def available_converters(cls) -> Dict[str, Type["BaseDateConverter"]]: + def available_converters(cls) -> Dict[str, BaseDateConverterType]: """ Dictionary of available converters keyed on name. """ # ensure undate converters are imported cls.import_converters() - return {c.name: c for c in cls.__subclasses__()} # type: ignore + return {c.name: c for c in cls.__subclasses__()}Add a new file
types.py
:from typing import Type, TypeVar BaseDateConverterType = TypeVar('BaseDateConverterType', bound='BaseDateConverter')CONTRIBUTING.md (3)
26-29
: Fix typo in git flow instructionsThere's a redundant phrase in the git flow instructions.
-If you have `git flow` installed, you can start, you can use: +If you have `git flow` installed, you can use:
14-19
: Enhance branching strategy documentationThe git flow explanation is good, but consider adding:
- A note about hotfix branches for critical bug fixes
- Guidelines for commit message format
Would you like me to propose additional content for these sections?
🧰 Tools
🪛 LanguageTool
[formatting] ~19-~19: Insert a comma after ‘cases’: “In most cases,”?
Context: ...that relates to the changes or feature In most cases branches should be created from the mos...(IN_MOST_CASES_COMMA)
43-46
: Improve grammar in feature request descriptionThe sentence structure could be improved for better clarity.
-If you find a bug or can think a feature you would really like to see being implemented +If you find a bug or can think of a feature you would really like to see implemented🧰 Tools
🪛 LanguageTool
[uncategorized] ~44-~44: Possible missing preposition found.
Context: ...equests If you find a bug or can think a feature you would really like to see be...(AI_EN_LECTOR_MISSING_PREPOSITION)
src/undate/converters/iso8601.py (6)
7-13
: Consider formalizing the validation strategyThe comment raises important questions about validation approach. Consider implementing a strict validation mode (using regex) that can be toggled, allowing both permissive and strict parsing based on use case.
Would you like me to propose a validation implementation that supports both modes?
29-30
: Handle edge case with ISO datetime stringsThe TODO comments indicate missing handling for ISO dates with time components.
Would you like me to propose an implementation for handling ISO datetime strings? We could either:
- Strip the time component
- Raise a specific exception
- Add support for time precision
Line range hint
47-54
: Fix type checking issue in _parse_single_dateThe type ignore comment masks an underlying type compatibility issue that should be addressed.
- return Undate(*date_parts) # type: ignore + # Ensure date_parts has exactly 3 elements (year, month, day) + while len(date_parts) < 3: + date_parts.append(None) + return Undate( + year=date_parts[0], + month=date_parts[1], + day=date_parts[2] + )🧰 Tools
🪛 Ruff
79-79: Loop control variable
iso_format
not used within loop bodyRename unused
iso_format
to_iso_format
(B007)
64-66
: Document ISO8601 range representation limitationsThe comment raises valid concerns about ISO8601 compliance for ranges. Consider:
- Documenting these limitations in the docstring
- Adding a strict mode that raises exceptions for non-compliant formats
77-78
: Consider refactoring to use propertiesThe TODO suggests using year/month/day properties which would make the code cleaner and more maintainable.
Would you like me to propose a refactored implementation using properties?
Line range hint
95-96
: Fix type checking in join operationThe type ignore comment masks an underlying type issue that should be addressed.
- return "-".join(date_parts) # type: ignore + # Filter out None values and convert to strings + valid_parts = [str(part) for part in date_parts if part is not None] + return "-".join(valid_parts)README.md (2)
7-10
: Consider using platform-agnostic warning formatThe current warning uses GitHub-specific alert syntax (
> [!WARNING]
). Consider using a more universal format to ensure consistent rendering across different platforms:-> [!WARNING] -> This is pre-alpha software and is **NOT** feature complete! Use with caution. -> Currently it only supports parsing and formatting dates in ISO8601 format and -> some portions of EDTF (Extended Date Time Format). +⚠️ **WARNING** + +This is pre-alpha software and is **NOT** feature complete! Use with caution. +Currently it only supports parsing and formatting dates in ISO8601 format and +some portions of EDTF (Extended Date Time Format).
34-45
: Add type hints to code examplesConsider adding type hints to make the examples more educational and align with modern Python practices:
-from undate.undate import Undate +from undate.undate import Undate -november7 = Undate(2000, 11, 7) +november7: Undate = Undate(2000, 11, 7)examples/notebooks/edtf-support.ipynb (2)
63-81
: Consider adding error handling test cases.While the happy path tests are comprehensive, consider adding test cases for:
- Invalid date strings
- Out of range values
- Malformed EDTF strings
Example test cases to add:
# Invalid date with pytest.raises(ValueError): Undate.parse("1985-13-01", "EDTF") # Invalid month # Malformed EDTF with pytest.raises(ValueError): Undate.parse("1985/04/12", "EDTF") # Wrong separator
727-729
: Track TODO comment for implementation.The TODO comment indicates missing functionality for handling minimum and maximum year values.
Would you like me to:
- Generate the implementation for
MIN_YEAR
andMAX_YEAR
constants?- Create a GitHub issue to track this enhancement?
examples/notebooks/shxco_partial_date_durations.ipynb (1)
299-300
: Consider adding error handling for file loading.The dataset loading could benefit from error handling to gracefully handle cases where the file doesn't exist or is corrupted.
-events_df = pd.read_csv("../use-cases/shakespeare-and-company-project/SCoData_events_v1.2_2022-01.csv", low_memory=False) +try: + events_df = pd.read_csv("../use-cases/shakespeare-and-company-project/SCoData_events_v1.2_2022-01.csv", low_memory=False) +except FileNotFoundError: + raise FileNotFoundError("Shakespeare and Company dataset file not found. Please ensure it exists in the use-cases directory.") +except pd.errors.EmptyDataError: + raise ValueError("The dataset file is empty.") +except Exception as e: + raise Exception(f"Error loading dataset: {str(e)}")src/undate/converters/edtf/converter.py (1)
84-84
: Address the TODO comments for handling uncertain or approximate datesThere are TODO comments indicating that handling for uncertain or approximate dates needs to be implemented in the sections dealing with years, months, and days. Implementing this will ensure full support for EDTF specifications regarding uncertain or approximate dates.
Would you like assistance in implementing the handling for uncertain or approximate dates? I can help generate the necessary code changes.
Also applies to: 91-91, 98-98
src/undate/date.py (2)
35-35
: Typo in comment: Remove extra spaceThere's an extra space between "single" and "year" in the comment on line 35.
Apply this diff to fix the typo:
-#: timedelta for a single year (non-leap year) +#: timedelta for a single year (non-leap year)
107-118
: Generalize handling of timedelta units in__sub__
methodCurrently, the
__sub__
method only handles results with dtype"timedelta64[D]"
. If future code introduces dates with different time units (e.g., weeks or months), this method may not handle them correctly. Consider generalizing the condition to handle other possible timedelta units.Apply this diff to generalize handling of timedelta units:
def __sub__(self, other): # modify to conditionally return a timedelta object instead of a # Date object with dtype timedelta64[D] (default behavior) result = super().__sub__(other) # if the result has a timedelta type (i.e., date minus date = timedelta), # cast to local Timedelta object; otherwise, leave as is # (i.e., date minus timedelta = date) - if result.dtype == "timedelta64[D]": + if np.issubdtype(result.dtype, np.timedelta64): result = Timedelta(result) return resulttests/test_undate.py (1)
366-367
: Remove redundant commented-out codeThe commented-out assertion is redundant since the same assertion is active in the next line. Removing it can improve code readability.
src/undate/undate.py (3)
36-37
: Consider limiting the allowable year range to practical values.Using extremely large integers for
MIN_ALLOWABLE_YEAR
andMAX_ALLOWABLE_YEAR
may lead to computational issues or unexpected behavior. Consider setting logical limits within the range of acceptable dates.
48-48
: Address TODO: Add validation for string inputs.The TODO comment suggests adding validation to ensure that if
year
,month
, orday
are strings, they must be of expected length. Implementing this validation will improve data integrity.Would you like assistance in implementing this validation?
133-135
: Address TODO: Handle February 29 with unknown year.The TODO comment indicates that if the date is February 29 with an unknown year, the min/max years should be adjusted to known leap years. Implementing this special case will ensure accurate date ranges.
Would you like assistance in implementing this logic?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (6)
_static/undate_logo.png
is excluded by!**/*.png
docs/_static/dhtech-logo.png
is excluded by!**/*.png
docs/_static/dhtech-logo.svg
is excluded by!**/*.svg
docs/_static/logo.png
is excluded by!**/*.png
docs/_static/undate_logo.png
is excluded by!**/*.png
examples/use-cases/shakespeare-and-company-project/SCoData_events_v1.2_2022-01.csv
is excluded by!**/*.csv
📒 Files selected for processing (57)
.all-contributorsrc
(5 hunks).github/workflows/check.yml
(1 hunks).github/workflows/unit_tests.yml
(2 hunks).gitignore
(1 hunks).pre-commit-config.yaml
(2 hunks).pythonversion
(1 hunks)CHANGELOG.md
(1 hunks)CONTRIBUTING.md
(1 hunks)CONTRIBUTORS.md
(1 hunks)DEVELOPER_NOTES.md
(1 hunks)README.md
(2 hunks)docs/CONTRIBUTING.md
(1 hunks)docs/CONTRIBUTORS.md
(1 hunks)docs/DEVELOPER_NOTES.md
(1 hunks)docs/LICENSE.md
(1 hunks)docs/_static/custom.css
(1 hunks)docs/_templates/sidebar_dhtech.html
(1 hunks)docs/authors.rst
(0 hunks)docs/conf.py
(3 hunks)docs/index.rst
(1 hunks)docs/license.rst
(0 hunks)docs/readme.md
(1 hunks)docs/readme.rst
(0 hunks)docs/requirements.txt
(0 hunks)docs/undate.rst
(0 hunks)docs/undate/converters.rst
(1 hunks)docs/undate/core.rst
(1 hunks)docs/undate/index.rst
(1 hunks)examples/README.md
(1 hunks)examples/notebooks/edtf-support.ipynb
(1 hunks)examples/notebooks/shxco_partial_date_durations.ipynb
(112 hunks)examples/use-cases/shakespeare-and-company-project/README.md
(1 hunks)pyproject.toml
(1 hunks)pytest.ini
(0 hunks)setup.cfg
(0 hunks)setup.py
(0 hunks)src/undate/__init__.py
(1 hunks)src/undate/converters/__init__.py
(1 hunks)src/undate/converters/base.py
(1 hunks)src/undate/converters/edtf/__init__.py
(1 hunks)src/undate/converters/edtf/converter.py
(1 hunks)src/undate/converters/edtf/edtf.lark
(2 hunks)src/undate/converters/edtf/parser.py
(0 hunks)src/undate/converters/edtf/transformer.py
(2 hunks)src/undate/converters/iso8601.py
(3 hunks)src/undate/date.py
(1 hunks)src/undate/dateformat/__init__.py
(0 hunks)src/undate/dateformat/base.py
(0 hunks)src/undate/undate.py
(14 hunks)tests/test_converters/edtf/test_edtf_parser.py
(3 hunks)tests/test_converters/edtf/test_edtf_transformer.py
(3 hunks)tests/test_converters/test_base.py
(1 hunks)tests/test_converters/test_edtf.py
(1 hunks)tests/test_converters/test_iso8601.py
(1 hunks)tests/test_date.py
(1 hunks)tests/test_dateformat/test_base.py
(0 hunks)tests/test_undate.py
(9 hunks)
💤 Files with no reviewable changes (12)
- docs/authors.rst
- docs/license.rst
- docs/readme.rst
- docs/requirements.txt
- docs/undate.rst
- pytest.ini
- setup.cfg
- setup.py
- src/undate/converters/edtf/parser.py
- src/undate/dateformat/init.py
- src/undate/dateformat/base.py
- tests/test_dateformat/test_base.py
✅ Files skipped from review due to trivial changes (17)
- .gitignore
- .pythonversion
- docs/CONTRIBUTING.md
- docs/CONTRIBUTORS.md
- docs/DEVELOPER_NOTES.md
- docs/LICENSE.md
- docs/_static/custom.css
- docs/_templates/sidebar_dhtech.html
- docs/index.rst
- docs/readme.md
- docs/undate/converters.rst
- docs/undate/core.rst
- docs/undate/index.rst
- examples/README.md
- src/undate/converters/init.py
- src/undate/converters/edtf/init.py
- tests/test_converters/test_iso8601.py
🧰 Additional context used
🪛 actionlint
.github/workflows/check.yml
24-24: property "python-cache" is not defined in object type {}
(expression)
.github/workflows/unit_tests.yml
37-37: property "python-cache" is not defined in object type {}
(expression)
🪛 LanguageTool
CONTRIBUTING.md
[style] ~5-~5: Consider a different adverb to strengthen your wording.
Context: ... to this project. This is great! We are always looking for new contributors. If you ca...
(ALWAYS_CONSTANTLY)
[formatting] ~19-~19: Insert a comma after ‘cases’: “In most cases,”?
Context: ...that relates to the changes or feature In most cases branches should be created from the mos...
(IN_MOST_CASES_COMMA)
[uncategorized] ~37-~37: A comma might be missing here.
Context: ...b issue in your pull request comments. Ideally contributions should include documentat...
(AI_EN_LECTOR_MISSING_PUNCTUATION_COMMA)
[typographical] ~38-~38: It seems that a comma is missing.
Context: ...oposed changes, but if that is a barrier please let us know when you submit a pull requ...
(IF_PLEASE_COMMA)
[uncategorized] ~44-~44: Possible missing preposition found.
Context: ...equests If you find a bug or can think a feature you would really like to see be...
(AI_EN_LECTOR_MISSING_PREPOSITION)
DEVELOPER_NOTES.md
[grammar] ~7-~7: The word “checkout” is a noun. The verb is spelled with a white space.
Context: ... you started. ## Setup ### Use git to checkout a copy of the repository ```sh git clon...
(NOUN_VERB_CONFUSION)
[style] ~50-~50: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... sh pip install -e ".[dev]"
### Install pre-commit hooks We use [pre-commit](h...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
examples/use-cases/shakespeare-and-company-project/README.md
[misspelling] ~11-~11: This word is normally spelled with a hyphen.
Context: ... for subscriptions and borrow events in human readable format and number of days. The [Shakes...
(EN_COMPOUNDS_HUMAN_READABLE)
[misspelling] ~13-~13: This word is normally spelled with a hyphen.
Context: ...rowing events and subscriptions in both human readable and machine readable form (text and cou...
(EN_COMPOUNDS_HUMAN_READABLE)
[misspelling] ~13-~13: This word is normally spelled with a hyphen.
Context: ...ubscriptions in both human readable and machine readable form (text and count by days). Data a...
(EN_COMPOUNDS_MACHINE_READABLE)
[style] ~15-~15: This phrase might be unnecessarily wordy.
Context: ...koeser/shxco-missingdata-specreading)). Our hope is that in the future, undate
will make ...
(MY_HOPE_IS)
🪛 Markdownlint
CONTRIBUTORS.md
51-51: Expected: 0; Actual: 2
Unordered list indentation
(MD007, ul-indent)
50-50: null
Link fragments should be valid
(MD051, link-fragments)
DEVELOPER_NOTES.md
37-37: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
examples/use-cases/shakespeare-and-company-project/README.md
22-22: null
Bare URL used
(MD034, no-bare-urls)
🪛 Ruff
src/undate/converters/base.py
78-78: Loop control variable importer
not used within loop body
Rename unused importer
to _importer
(B007)
78-78: Loop control variable ispkg
not used within loop body
Rename unused ispkg
to _ispkg
(B007)
src/undate/converters/edtf/converter.py
43-45: Within an except
clause, raise exceptions with raise ... from err
or raise ... from None
to distinguish them from errors in exception handling
(B904)
src/undate/converters/iso8601.py
79-79: Loop control variable iso_format
not used within loop body
Rename unused iso_format
to _iso_format
(B007)
🔇 Additional comments (53)
src/undate/__init__.py (2)
6-6
: LGTM! Well-structured public API declaration
The __all__
declaration correctly matches the imported symbols and explicitly defines the public API, following Python best practices.
3-4
: Verify module structure and imports
The imports look correct based on the module structure described in the summary.
✅ Verification successful
Let me verify the class definitions with a different approach since ast-grep might have limitations with Python files.
Import statements are correctly structured and verified
All imported classes exist in their respective modules:
DatePrecision
class is defined insrc/undate/date.py
Undate
andUndateInterval
classes are defined insrc/undate/undate.py
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify that the imported modules exist and contain the referenced classes
# Check if the modules exist
echo "Checking for module files..."
fd -t f "^(date|undate)\.py$" src/undate/
# Verify class definitions in the modules
echo "Checking for class definitions..."
ast-grep --pattern 'class DatePrecision' src/undate/date.py
ast-grep --pattern 'class Undate' src/undate/undate.py
ast-grep --pattern 'class UndateInterval' src/undate/undate.py
Length of output: 411
Script:
#!/bin/bash
# Check class definitions using ripgrep for better Python support
echo "Checking DatePrecision in date.py:"
rg "^class DatePrecision" -A 1 src/undate/date.py
echo -e "\nChecking Undate in undate.py:"
rg "^class Undate[^I]" -A 1 src/undate/undate.py
echo -e "\nChecking UndateInterval in undate.py:"
rg "^class UndateInterval" -A 1 src/undate/undate.py
Length of output: 671
.pre-commit-config.yaml (1)
Line range hint 1-22
: Verify tool version consistency across configuration files
Let's ensure the tool versions and configurations are consistent across the project.
✅ Verification successful
Tool configurations are consistent and well-structured
The pre-commit configuration is properly aligned with the project's development dependencies and configurations:
- Ruff and MyPy are correctly listed as dev dependencies in pyproject.toml
- MyPy configuration in pre-commit matches the plugin configuration in pyproject.toml
- The pre-commit hooks provide comprehensive linting and type checking coverage
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for version consistency across configuration files
# Check for Ruff and MyPy configurations in pyproject.toml
echo "Checking pyproject.toml for tool configurations..."
if [ -f "pyproject.toml" ]; then
echo "=== Ruff Configuration ==="
rg -A 5 '^\[tool\.ruff\]' pyproject.toml
echo "=== MyPy Configuration ==="
rg -A 5 '^\[tool\.mypy\]' pyproject.toml
fi
# Check for version specifications in requirements files
echo "Checking requirements files for tool versions..."
fd -e txt -e in requirements | while read -r file; do
echo "=== $file ==="
rg -i "ruff|mypy" "$file"
done
Length of output: 554
Script:
#!/bin/bash
# Let's check for any additional configuration files and requirements
# that might contain version specifications
# Check for any requirements files in the root or common locations
echo "=== Requirements files ==="
fd -e txt -e in '(requirements|deps|dev)' -d 2
# Look for pyproject.toml content
echo -e "\n=== pyproject.toml content ==="
if [ -f "pyproject.toml" ]; then
cat pyproject.toml
fi
# Look for setup.py/setup.cfg for any version specifications
echo -e "\n=== Setup files ==="
if [ -f "setup.py" ]; then
echo "setup.py content:"
cat setup.py
fi
if [ -f "setup.cfg" ]; then
echo "setup.cfg content:"
cat setup.cfg
fi
Length of output: 2794
.github/workflows/check.yml (1)
16-21
: LGTM: Python setup is well configured
The setup uses the latest stable Python version and action version with proper caching configuration.
tests/test_converters/edtf/test_edtf_parser.py (1)
2-2
: LGTM: Import path updated correctly
The import path change from dateformat
to converters
aligns with the module restructuring.
.github/workflows/unit_tests.yml (2)
14-14
: LGTM: Python 3.12 is a good choice for coverage reporting
Using Python 3.12 for coverage reporting is appropriate as it's the latest stable version.
33-33
: LGTM: Updated cache dependency path
The change from setup.cfg
to pyproject.toml
aligns with modern Python packaging standards.
tests/test_converters/edtf/test_edtf_transformer.py (3)
2-3
: LGTM: Import statements are correctly structured
The imports are appropriately organized and provide the necessary components for testing.
15-19
: LGTM: Good coverage for negative year scenarios
The test cases appropriately cover both extended format (-Y17000002
) and simple format (-1985
) for negative years.
20-20
: Verify the commented-out season test case
The commented test case "2001-21"
for spring 2001 suggests that season support might be planned.
Let's check if there are any related issues or implementations:
✅ Verification successful
Season support is planned but not yet implemented
The commented test case is related to EDTF Level 1 season support (where 21 represents Spring), which is currently tracked in issue #73 but not yet implemented. This is an intentionally commented-out test case awaiting implementation.
- Issue EDTF support - level 0, 1 and parts of level 2 #73 explicitly lists seasons support as an unfinished task: "values 21, 22, 23, 24 may be used used to signify 'Spring', 'Summer', 'Autumn', 'Winter'"
- The test case follows EDTF specification format where "2001-21" represents Spring 2001
- Other Level 1 features like negative years and unspecified digits are already implemented, but seasons remain a TODO item
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for season-related code or issues
echo "Searching for season-related code..."
rg -l "season|spring|summer|fall|winter|21|22|23|24" --type py
echo "Searching for related issues..."
gh issue list --search "season date"
Length of output: 565
Script:
#!/bin/bash
# Let's check the actual implementation and tests for season handling
echo "Checking EDTF parser implementation for seasons..."
rg -A 5 "season|21|22|23|24" src/undate/converters/edtf/parser.py
echo "\nChecking EDTF transformer tests..."
cat tests/test_converters/edtf/test_edtf_transformer.py
echo "\nChecking related issue details..."
gh issue view 73
Length of output: 5234
src/undate/converters/edtf/edtf.lark (3)
45-45
: LGTM! Correct implementation of extended year format.
The pattern correctly implements the EDTF specification for years exceeding 4 digits, including negative years.
Let's verify the handling of edge cases:
#!/bin/bash
# Description: Check test coverage for extended year formats
# Look for test cases covering edge cases in extended year formats
rg -A 3 "test.*Y\d{5}" "tests/test_converters/edtf/"
37-42
: Clarify EDTF level separation in grammar rules.
The comment mentions Level 2 support, but the grammar structure doesn't clearly separate different EDTF levels. Consider restructuring to make the level separation more explicit.
Let's verify the EDTF specification compliance for unspecified components:
#!/bin/bash
# Description: Verify EDTF level separation and unspecified component handling
# Look for level-specific test organization
rg -A 5 "test.*unspecified" "tests/test_converters/edtf/"
17-17
: Consider optimizing the year rule pattern.
The current rule allows whitespace between the negative sign and the year number. Consider using a more precise pattern.
-year: INT | /\-/ INT
+year: /\-?\d+/
Let's verify the EDTF specification compliance:
.all-contributorsrc (2)
17-18
: LGTM: Contributor role updates are well-structured
The additions of "ideas" and other roles for multiple contributors are well-organized and follow the all-contributors specification. These updates appropriately reflect the evolving contributions to the project.
Also applies to: 42-44, 56-57
3-3
: Verify CONTRIBUTORS.md setup
The change to use CONTRIBUTORS.md as the target file is a good practice for maintainability. However, let's verify its implementation.
✅ Verification successful
Let me check the content of both files to ensure proper setup and references.
✓ CONTRIBUTORS.md setup is correctly implemented
The change to use CONTRIBUTORS.md as the target file is properly implemented:
- CONTRIBUTORS.md exists and is properly set up with the expected content
- README.md correctly references CONTRIBUTORS.md in the contributors badge
- Both files maintain consistent badge sections
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify CONTRIBUTORS.md exists and is properly referenced
# Check if CONTRIBUTORS.md exists
if [ -f "CONTRIBUTORS.md" ]; then
echo "✓ CONTRIBUTORS.md exists"
else
echo "✗ CONTRIBUTORS.md is missing"
fi
# Check for any remaining references to the old location in README.md
rg -l "contributors" README.md
Length of output: 128
Script:
#!/bin/bash
# Check content of both files for contributor references
echo "=== README.md contributor references ==="
rg "contributor" README.md -A 2 -B 2
echo -e "\n=== CONTRIBUTORS.md content preview ==="
head -n 10 CONTRIBUTORS.md
Length of output: 1050
CHANGELOG.md (4)
9-11
: Well documented EDTF improvements
Good practice to include a validation notebook for demonstrating the new EDTF support. The attribution to numpy.datetime64 for enabling the feature is clear.
12-16
: Verify documentation structure and completeness
The documentation improvements span multiple areas, which is great. Let's verify the structure and cross-linking.
#!/bin/bash
# Description: Check documentation structure and cross-references
# Check if converter documentation exists
fd -e md -e rst "converter"
# Check if branching guidelines are properly linked
rg -i "branch.*guideline" "CONTRIBUTING.md"
# Verify Sphinx documentation structure
fd . "docs/" -e rst -e md
17-17
: Verify tooling change documentation
The switch from black to ruff is a significant change in the development workflow.
#!/bin/bash
# Description: Verify tooling change documentation
# Check for ruff configuration
fd "ruff" -e toml -e ini
# Check if the change is documented in developer notes
rg -i "ruff.*black" "DEVELOPER_NOTES.md"
# Verify pre-commit configuration
rg "ruff" ".pre-commit-config.yaml"
5-8
: Verify breaking changes documentation
The renaming of "formatters" to "converters" represents a breaking change that could impact existing code. While the change is documented here, we should ensure it's prominently highlighted in the upgrade guide.
✅ Verification successful
Let's search for documentation and code changes to verify the breaking changes and their documentation.
Let's examine the documentation and code references to understand how this change is documented.
Breaking changes are adequately documented
The renaming of "formatters" to "converters" is well documented and implemented consistently:
- The change is clearly documented in CHANGELOG.md
- README.md has been updated to use the new "converter" terminology
- The change is reflected in the codebase structure with a dedicated
converters
module and documentation atdocs/undate/converters.rst
- The API provides backward compatibility through clear method names (
format
andparse
) - Implementation shows a clean transition with no remnants of old formatter terminology in the core code
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check if breaking changes are documented in upgrade guide
# Look for upgrade guide or migration documentation
fd -e md -e rst "upgrade|migration|breaking.changes"
# Check if the formatter to converter change is documented in README
rg -i "formatter.*converter" "README.md"
Length of output: 300
Script:
#!/bin/bash
# Search for documentation files in the root and docs directory
fd -e md -e rst .
# Search for formatter/converter related changes in Python files
rg -i "formatter.*converter|converter.*formatter" -g "*.py"
# Search for any migration or upgrade related content
rg -i "migration|upgrade|breaking.change" -g "*.md" -g "*.rst"
# Look for the old formatter implementation
ast-grep --pattern 'class $_(Formatter|formatter)'
# Look for the new converter implementation
ast-grep --pattern 'class $_(Converter|converter)'
Length of output: 807
Script:
#!/bin/bash
# Check the content of the converter documentation
cat docs/undate/converters.rst
# Look for any mentions of formatters in the codebase
rg -i "formatter" -g "*.py" -g "*.rst" -g "*.md" -A 2
# Look for any mentions of converters in the codebase
rg -i "converter" -g "*.py" -g "*.rst" -g "*.md" -A 2
# Check the content of core documentation
cat docs/undate/core.rst
Length of output: 21963
examples/use-cases/shakespeare-and-company-project/README.md (2)
1-4
: LGTM!
The header section is well-structured and provides clear attribution.
5-8
: LGTM!
Clear explanation of the dataset scope and standardized date formats.
tests/test_date.py (1)
15-38
: LGTM! Well-structured initialization tests
The initialization tests provide good coverage across different precision levels and data types.
pyproject.toml (1)
2-3
: LGTM: Build system configuration is appropriate
The transition to hatchling as the build backend is a good choice, as it's a modern, maintainable build system with good support for Python packaging standards.
docs/conf.py (4)
35-39
: LGTM! Good choice using myst_parser
The switch from m2r2 to myst_parser is a good modernization. MyST Parser is the recommended choice for Markdown support in Sphinx documentation.
49-52
: LGTM! Clear and explicit file type mapping
The new dictionary format for source_suffix provides a clearer mapping between file extensions and their processors. This configuration properly supports both reStructuredText and Markdown files.
78-86
: Verify custom sidebar template existence
The configuration includes a custom sidebar template "sidebar_dhtech.html". Please ensure this template file exists in the appropriate location.
#!/bin/bash
# Check if the custom sidebar template exists
fd "sidebar_dhtech.html" docs/_templates/
20-21
:
Update release version to match v0.3
The release version is still set to "0.0.1.dev" while this PR is for Release v0.3. Please update the version number to maintain consistency.
-release = "0.0.1.dev"
+release = "0.3"
Likely invalid or redundant comment.
src/undate/converters/edtf/transformer.py (2)
1-1
: LGTM! Import statement is well organized.
The imports are properly grouped and ordered.
67-89
: Verify test coverage for negative year handling.
The changes to year handling are significant. Let's verify the test coverage:
tests/test_converters/test_edtf.py (3)
1-7
: LGTM! Clean imports and class setup.
The imports are appropriate and the test class follows Python naming conventions.
50-69
: LGTM! Comprehensive string conversion tests.
The test cases cover various scenarios including:
- Year padding
- Unknown components
- Negative years
- Error handling for empty dates
29-30
: Address the commented question about EDTF support.
The comment raises an important question about EDTF support for dates with missing years. This should be investigated and either:
- Uncomment and fix the test if EDTF supports this format
- Remove the comment if confirmed that EDTF doesn't support this format
- Create a GitHub issue to track this investigation
Let's verify EDTF specification support:
src/undate/converters/base.py (1)
1-31
: LGTM! Well-documented module with clear extension instructions.
The module documentation provides comprehensive guidance for implementing new converters, and the imports are well-organized.
CONTRIBUTING.md (3)
55-58
: LGTM! Clear help and community information
The help section provides clear guidance and good community references.
8-8
: Verify the Developer Notes reference
The documentation now refers to "Developer Notes" for setup instructions. Let's verify this file exists and contains the necessary information.
✅ Verification successful
The Developer Notes reference is valid and accurate
The DEVELOPER_NOTES.md file exists and contains comprehensive setup instructions, including:
- Development environment setup with git-flow
- Python installation and configuration
- Package installation steps
- Pre-commit hooks setup
- Testing instructions
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check if DEVELOPER_NOTES.md exists and contains setup instructions
if [ -f "DEVELOPER_NOTES.md" ]; then
echo "DEVELOPER_NOTES.md exists"
rg -i "setup|install|test" "DEVELOPER_NOTES.md"
else
echo "DEVELOPER_NOTES.md not found"
fi
Length of output: 1610
52-52
: Verify examples directory structure
Let's confirm the examples directory exists and is properly structured.
✅ Verification successful
Examples directory structure is valid and well-organized
The examples directory exists at the correct path and is properly structured with:
- A clear README.md explaining the purpose
- Jupyter notebooks with example code
- Use cases from real projects
The path reference in the CONTRIBUTING.md is accurate and points to an active directory.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check if examples directory exists and its structure
if [ -d "examples" ]; then
echo "Examples directory exists"
ls -la examples/
else
echo "Examples directory not found"
fi
Length of output: 384
Script:
#!/bin/bash
# Let's check the content of examples/README.md to understand the directory structure
cat examples/README.md
# Also check if there are any example files in the notebooks and use-cases directories
echo -e "\nContents of notebooks directory:"
ls -la examples/notebooks/
echo -e "\nContents of use-cases directory:"
ls -la examples/use-cases/
Length of output: 1097
src/undate/converters/iso8601.py (1)
23-28
: LGTM! Documentation improvements
The docstring clearly describes the supported formats and return types.
README.md (4)
21-23
: LGTM!
The badges are well-organized and provide clear project status information.
28-32
: LGTM!
The introduction effectively explains the library's purpose and target use cases.
48-155
: LGTM!
The code examples are comprehensive and effectively demonstrate the library's features:
- Initialization with different date precisions
- String formatting
- Duration calculations
- Date comparisons
- Interval handling
- Parsing and formatting
159-167
: LGTM!
The documentation section provides clear links to additional resources and properly references the license.
examples/notebooks/edtf-support.ipynb (2)
1-18
: Well-structured documentation and clear introduction!
The notebook provides excellent documentation with:
- Clear purpose and scope
- Reference to the EDTF specification
- Explicit mention of supported and unsupported features
696-767
: Excellent test coverage for Level 2 EDTF features!
The implementation thoroughly covers:
- Unspecified digits in various positions
- Edge cases with multiple unspecified digits
- Proper handling of earliest/latest calculations
examples/notebooks/shxco_partial_date_durations.ipynb (4)
Line range hint 325-338
: LGTM! Clean and well-documented implementation.
The duration calculation implementation is clear and includes helpful comments explaining the off-by-one adjustment. The imports are correctly updated to use the new ISO8601DateFormat from the converters package.
2595-2595
: LGTM! Python version update.
The Python version has been updated to 3.12.7, which is appropriate for using the latest features and security updates.
2540-2549
: Well-documented historical issues and fixes.
The markdown comments effectively document previous issues and their resolutions:
- Edge case with unknown years and date wrapping
- Leap year calculation adjustment
- Data validation against source material
This documentation is valuable for future maintenance.
Line range hint 777-2566
: Verify test coverage for edge cases.
The duration comparison tests cover a good range of cases, but consider adding explicit tests for:
- Leap year transitions
- Century transitions
- Invalid date combinations
src/undate/date.py (1)
40-121
: Implementation of Date
class is well-structured
The Date
class effectively extends numpy.datetime64
to behave more like Python's built-in datetime.date
. The methods and properties are well-defined, and the use of type hints enhances code readability. The class provides a convenient interface for date manipulation.
tests/test_undate.py (5)
134-150
: Well-structured tests for the year
property
The test cases effectively cover various scenarios, including numeric and string years, partially known years, and unset years. This enhances the coverage and reliability of the year
property.
151-164
: Comprehensive tests for the month
property
The tests handle different scenarios such as known, partially known, and unknown months, ensuring robustness in the month
property functionality.
165-184
: Extensive tests for the day
property
The test cases adequately cover various situations for the day
property, including known days, partially unknown days, fully unknown days, and edge cases like months with fewer days. This improves the reliability of date handling.
191-192
: LGTM
The test correctly asserts that comparing an Undate
object with an unrelated type returns NotImplemented
, which is appropriate behavior.
535-540
:
Verify the duration calculation for intervals spanning unknown years
The test calculates the duration between Undate(None, 6, 7)
and Undate(None, 6, 6)
and asserts it to be 365 days. However, the duration from June 7 to June 6 of the following year is typically 364 days, unless it's a leap year. Please verify the correctness of this test case.
src/undate/__init__.py
Outdated
@@ -1 +1,6 @@ | |||
__version__ = "0.3.0.dev0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update version for release
The version string indicates a development version (0.3.0.dev0
), but this PR is for Release v0.3. The version should be updated to 0.3.0
for the release.
-__version__ = "0.3.0.dev0"
+__version__ = "0.3.0"
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
__version__ = "0.3.0.dev0" | |
__version__ = "0.3.0" |
.pre-commit-config.yaml
Outdated
- repo: https://github.com/astral-sh/ruff-pre-commit | ||
rev: v0.3.4 | ||
hooks: | ||
- id: black | ||
- id: ruff | ||
args: [ --select, I, --fix, --exit-non-zero-on-fix ] | ||
- id: ruff-format |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider expanding Ruff's rule selection for better code quality coverage
The current configuration only enables import sorting rules (--select I
). Consider enabling additional essential rule sets for comprehensive code quality checks.
- id: ruff
- args: [ --select, I, --fix, --exit-non-zero-on-fix ]
+ args: [ --select, E,F,W,I, --fix, --exit-non-zero-on-fix ]
This change would enable:
- E/F rules: Python errors and syntax errors
- W rules: Python warnings
- I rules: Import sorting (currently enabled)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- repo: https://github.com/astral-sh/ruff-pre-commit | |
rev: v0.3.4 | |
hooks: | |
- id: black | |
- id: ruff | |
args: [ --select, I, --fix, --exit-non-zero-on-fix ] | |
- id: ruff-format | |
- repo: https://github.com/astral-sh/ruff-pre-commit | |
rev: v0.3.4 | |
hooks: | |
- id: ruff | |
args: [ --select, E,F,W,I, --fix, --exit-non-zero-on-fix ] | |
- id: ruff-format |
.github/workflows/check.yml
Outdated
- name: Install package with development dependencies | ||
run: pip install -e ".[dev]" | ||
if: steps.python-cache.outputs.cache-hit != 'true' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix the cache hit condition
The condition references an undefined cache output variable. The correct output variable should be from the setup-python action.
Apply this fix:
- name: Install package with development dependencies
run: pip install -e ".[dev]"
- if: steps.python-cache.outputs.cache-hit != 'true'
+ if: steps.setup-python.outputs.cache-hit != 'true'
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- name: Install package with development dependencies | |
run: pip install -e ".[dev]" | |
if: steps.python-cache.outputs.cache-hit != 'true' | |
- name: Install package with development dependencies | |
run: pip install -e ".[dev]" | |
if: steps.setup-python.outputs.cache-hit != 'true' |
🧰 Tools
🪛 actionlint
24-24: property "python-cache" is not defined in object type {}
(expression)
.github/workflows/check.yml
Outdated
- name: Check that documentation can be built | ||
run: tox -e docs | ||
run: sphinx-build docs docs/_build # --fail-on-warning |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Remove the commented --fail-on-warning
flag
The documentation setup is standard with no special warning configurations in either conf.py
or Makefile
. The git history shows this flag was added as a comment in a recent change from tox -e docs
to direct sphinx-build
usage. Since there's no indication of a need for strict warning handling in the configuration files, and the flag was deliberately commented out rather than enabled, it should be removed entirely to avoid confusion.
- Remove the commented
--fail-on-warning
flag from.github/workflows/check.yml
line 32
🔗 Analysis chain
Decide on documentation warning handling
The --fail-on-warning
flag is commented out, suggesting uncertainty about the desired strictness level.
Let's check if there are any current documentation warnings:
Consider either:
- Removing the comment if warnings should be allowed
- Uncommenting the flag if warnings should fail the build
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for documentation warnings
sphinx-build -b html docs docs/_build 2>&1 | grep -i "warning:"
Length of output: 58
Script:
#!/bin/bash
# Let's first check if sphinx-build is available in the environment
which sphinx-build
# Let's also check the content of docs directory to understand the documentation setup
ls -la docs/
# Check if there's any configuration for sphinx
fd -t f "conf.py" docs/
# Look for any historical changes to the sphinx build command
git log -p -- .github/workflows/check.yml | grep -A 5 -B 5 "sphinx-build"
Length of output: 1777
Script:
#!/bin/bash
# Let's check the content of conf.py to see if there are any warning configurations
cat docs/conf.py
# Let's also check the Makefile to see if there are any warning-related configurations
cat docs/Makefile
Length of output: 3351
# level 2 unspecified digits | ||
"156X-12-25", | ||
"XXXX-12-XX", | ||
"1XXX-12", | ||
"1XXX-XX", | ||
"1984-1X", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add error cases for invalid unspecified digit patterns
The new test cases cover valid patterns well, but consider adding error cases to error_cases
for:
- Invalid mixing of specified and unspecified digits (e.g., "19X4")
- Incorrect position of X in dates
- Invalid month/day combinations with X
# TODO: handle leap day in an unknown year | ||
# (currently causes an exception because min/max years are not leap years) | ||
# Undate(None, 2, 29) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Address the TODO for handling leap day in unknown years
The TODO comment indicates that initializing Undate
with day 29 in February and an unknown year causes an exception because the min/max years are not leap years. Implementing support for this scenario will enhance the robustness of date handling.
Would you like assistance in implementing this functionality or opening a GitHub issue to track this task?
|
||
# duration is not supported for open-ended intervals | ||
assert UndateInterval(Undate(2000), None).duration() == NotImplemented | ||
|
||
# one year set and the other not currently raises not implemented error | ||
with pytest.raises(NotImplementedError): | ||
UndateInterval(Undate(2000), Undate()).duration() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Ensure consistent handling of unsupported durations in open-ended intervals
Currently, UndateInterval(Undate(2000), None).duration()
returns NotImplemented
, while UndateInterval(Undate(2000), Undate()).duration()
raises NotImplementedError
. Consider standardizing the behavior to either always raise an exception or always return NotImplemented
to maintain consistency.
src/undate/undate.py
Outdated
return f"{year:>04}" | ||
# if value is unset but date precision is month or greater, return unknown month |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix incorrect zero-padding in date component formatting.
The current format specifiers in the year
, month
, and day
properties use '>
' alignment, which does not zero-pad numbers. Use :0>4
for year
and :0>2
for month
and day
to correctly zero-pad on the left.
Apply this diff to fix the formatting:
- return f"{year:>04}"
+ return f"{year:0>4}"
...
- return f"{month:>02}"
+ return f"{month:0>2}"
...
- return f"{day:>02}"
+ return f"{day:0>2}"
Also applies to: 343-344, 354-355
converter: Optional[BaseDateConverter] = None, | ||
label: Optional[str] = None, | ||
): | ||
# keep track of initial values and which values are known |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Changing formatter
to converter
may break backward compatibility.
Changing the parameter name from formatter
to converter
in the Undate
constructor might break existing code that uses the formatter
parameter. Consider maintaining backward compatibility by accepting both parameters, using formatter
if converter
is not provided, and issuing a deprecation warning.
You can apply this diff to handle both parameters and maintain backward compatibility:
def __init__(
self,
year: Optional[Union[int, str]] = None,
month: Optional[Union[int, str]] = None,
day: Optional[Union[int, str]] = None,
- converter: Optional[BaseDateConverter] = None,
+ converter: Optional[BaseDateConverter] = None,
+ formatter: Optional[BaseDateConverter] = None,
label: Optional[str] = None,
):
+ if converter is None and formatter is not None:
+ converter = formatter
+ # Issue a deprecation warning
+ warnings.warn(
+ "The 'formatter' parameter is deprecated and will be removed in a future version. Please use 'converter' instead.",
+ DeprecationWarning,
+ )
+ elif converter is None:
+ # Existing logic to initialize converter
Committable suggestion skipped: line range outside the PR's diff.
return all( | ||
[ | ||
self.earliest <= other.earliest, | ||
self.latest >= other.latest, | ||
# is precision sufficient for comparing partially known dates? | ||
# checking based on less precise /less granular time unit, | ||
# e.g. a day or month could be contained in a year | ||
# but not the reverse | ||
self.precision < other.precision, | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Verify precision comparison logic in __contains__
method.
The current logic uses self.precision < other.precision
, which may not correctly represent containment when comparing date precisions. Since a higher DatePrecision
value indicates finer granularity (e.g., DAY > MONTH > YEAR), consider revising the comparison to self.precision > other.precision
to ensure accurate containment checks.
Apply this diff to fix the precision comparison:
- self.precision < other.precision,
+ self.precision > other.precision,
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
return all( | |
[ | |
self.earliest <= other.earliest, | |
self.latest >= other.latest, | |
# is precision sufficient for comparing partially known dates? | |
# checking based on less precise /less granular time unit, | |
# e.g. a day or month could be contained in a year | |
# but not the reverse | |
self.precision < other.precision, | |
] | |
return all( | |
[ | |
self.earliest <= other.earliest, | |
self.latest >= other.latest, | |
# is precision sufficient for comparing partially known dates? | |
# checking based on less precise /less granular time unit, | |
# e.g. a day or month could be contained in a year | |
# but not the reverse | |
self.precision > other.precision, | |
] |
Includes several items suggested or flagged by @coderabbitai
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (15)
.github/workflows/check.yml (2)
26-28
: Consider enhancing ruff configurationWhile using ruff is a good choice, consider specifying explicit rules or configuration to ensure consistent checking across different environments.
- - name: Run ruff - run: ruff check + - name: Run ruff + run: ruff check --select E,F,B,I,N,RUF --target-version py312
38-40
: Consider enhancing notebook testing configurationWhile adding treon for notebook testing is valuable, consider adding more specific configuration for better control and reliability.
- - name: Check jupyter notebooks with treon - run: treon + - name: Check jupyter notebooks with treon + run: treon examples/ --timeout=300src/undate/converters/edtf/edtf.lark (1)
44-48
: Consider adding example patterns in comments.The implementation correctly handles Y-prefixed years with optional negative signs. To improve maintainability, consider adding example patterns in the comments.
// 'Y' may be used at the beginning of the date string to signify that // the date is a year, when (and only when) the year exceeds four digits, // i.e. for years later than 9999 or earlier than -9999. // For negative years, the minus comes after the Y. +// Examples: Y170000, Y-170000, Y10000, Y-10000 year_fivedigitsplus: /Y-?\d{5,}/
pyproject.toml (2)
20-20
: Consider removing or formalizing the maintainers commentThe comment about maintainers could be removed as it doesn't provide value, or better yet, formalized using the proper
maintainers
field if you want to explicitly document this information.
82-85
: Fix typo in pytest marker descriptionThere's a typo in the 'first' marker description:
- "first : run marked tests efore all others", + "first : run marked tests before all others",src/undate/converters/edtf/converter.py (2)
54-67
: Consider code organization improvementsWhile the implementation is correct, consider these improvements:
- Move the note about open intervals vs unknown start/end to the method's docstring.
- Consider extracting the interval handling logic to a separate helper method for better maintainability.
Example refactor:
def to_string(self, undate: Union[Undate, UndateInterval]) -> str: """ Convert an :class:`~undate.undate.Undate` or :class:`~undate.undate.UndateInterval` to EDTF format. + + Note: + The spec distinguishes between open intervals ("..") and + unknown start/end (""). """ if isinstance(undate, Undate): return self._undate_to_string(undate) elif isinstance(undate, UndateInterval): - # NOTE: what is the difference between an open interval and unknown start/end? - # spec distinguishes between these, open is ".." but unknown is "" - start = self._undate_to_string(undate.earliest) if undate.earliest else ".." - end = self._undate_to_string(undate.latest) if undate.latest else ".." - return f"{start}/{end}" + return self._interval_to_string(undate) + + def _interval_to_string(self, interval: UndateInterval) -> str: + start = self._undate_to_string(interval.earliest) if interval.earliest else ".." + end = self._undate_to_string(interval.latest) if interval.latest else ".." + return f"{start}/{end}"
68-99
: Consider breaking down the method for better maintainabilityThe method handles multiple date precisions with similar patterns. Consider extracting each precision handling into separate helper methods to improve readability and maintainability.
Example refactor:
def _undate_to_string(self, undate: Undate) -> str: - parts = [] + return "-".join(filter(None, [ + self._format_year(undate), + self._format_month(undate), + self._format_day(undate) + ])) - if undate.precision >= DatePrecision.YEAR: - year = self._convert_missing_digits(undate.year, undate.MISSING_DIGIT) - if year and len(year.lstrip("-")) > 4: - year = f"Y{year}" - parts.append(year or EDTF_UNSPECIFIED_DIGIT * 4) + def _format_year(self, undate: Undate) -> Optional[str]: + if undate.precision < DatePrecision.YEAR: + return None + year = self._convert_missing_digits(undate.year, undate.MISSING_DIGIT) + if year and len(year.lstrip("-")) > 4: + year = f"Y{year}" + return year or EDTF_UNSPECIFIED_DIGIT * 4 - if undate.precision >= DatePrecision.MONTH: - parts.append( - self._convert_missing_digits(undate.month, undate.MISSING_DIGIT) - or EDTF_UNSPECIFIED_DIGIT * 2 - ) + def _format_month(self, undate: Undate) -> Optional[str]: + if undate.precision < DatePrecision.MONTH: + return None + return (self._convert_missing_digits(undate.month, undate.MISSING_DIGIT) + or EDTF_UNSPECIFIED_DIGIT * 2) - if undate.precision >= DatePrecision.DAY: - parts.append( - self._convert_missing_digits(undate.day, undate.MISSING_DIGIT) - or EDTF_UNSPECIFIED_DIGIT * 2 - ) + def _format_day(self, undate: Undate) -> Optional[str]: + if undate.precision < DatePrecision.DAY: + return None + return (self._convert_missing_digits(undate.day, undate.MISSING_DIGIT) + or EDTF_UNSPECIFIED_DIGIT * 2)README.md (3)
50-51
: Consider consistent formatting for code outputsFor better readability, consider using consistent formatting for all code outputs. Currently, some outputs use list representation while others use direct printing.
->>> [str(d) for d in [november7, november, year2k, november7_some_year]] -['2000-11-07', '2000-11', '2000', '--11-07'] +>>> for d in [november7, november, year2k, november7_some_year]: +... print(str(d)) +... +2000-11-07 +2000-11 +2000 +--11-07Also applies to: 62-68
34-45
: Consider adding type hints in examplesTo improve documentation and IDE support, consider adding type hints to the example code:
-from undate.undate import Undate +from undate.undate import Undate +from typing import List -november7 = Undate(2000, 11, 7) +november7: Undate = Undate(2000, 11, 7)
29-30
: Fix punctuation in the introductionAdd a comma after "Often" to improve readability.
-Often humanities and cultural data include imprecise or uncertain temporal information. +Often, humanities and cultural data include imprecise or uncertain temporal information.🧰 Tools
🪛 LanguageTool
[uncategorized] ~29-~29: Possible missing comma found.
Context: ...ibution information. ## Example Usage Often humanities and cultural data include im...(AI_HYDRA_LEO_MISSING_COMMA)
src/undate/undate.py (5)
31-37
: Consider documenting the rationale for MIN_ALLOWABLE_YEAR adjustmentThe code adds 1 year to avoid a leap year, but this might be confusing for future maintainers. Consider adding more detailed documentation about why this specific adjustment was chosen and its implications.
# numpy datetime is stored as 64-bit integer, so min/max # depends on the time unit; assume days for now # See https://numpy.org/doc/stable/reference/arrays.datetime.html#datetime-units # It just so happens that int(2.5e16) is a leap year, which is a weird default, # so let's increase our lower bound by one year. +# Note: Adding 1 year ensures we start with a non-leap year for consistent date calculations MIN_ALLOWABLE_YEAR = int(-2.5e16) + 1 MAX_ALLOWABLE_YEAR = int(2.5e16)
177-196
: Reduce code duplication in parse and format methodsThe converter lookup logic is duplicated. Consider extracting it to a private helper method.
+ @classmethod + def _get_converter(cls, format): + """Get a converter instance for the specified format""" + converter_cls = BaseDateConverter.available_converters().get(format, None) + if not converter_cls: + raise ValueError(f"Unsupported format '{format}'") + return converter_cls() + @classmethod def parse(cls, date_string, format) -> Union["Undate", "UndateInterval"]: """parse a string to an undate or undate interval using the specified format; for now, only supports named converters""" - converter_cls = BaseDateConverter.available_converters().get(format, None) - if converter_cls: - # NOTE: some parsers may return intervals; is that ok here? - return converter_cls().parse(date_string) - - raise ValueError(f"Unsupported format '{format}'") + converter = cls._get_converter(format) + return converter.parse(date_string) def format(self, format) -> str: """format this undate as a string using the specified format; for now, only supports named converters""" - converter_cls = BaseDateConverter.available_converters().get(format, None) - if converter_cls: - # NOTE: some parsers may return intervals; is that ok here? - return converter_cls().to_string(self) - - raise ValueError(f"Unsupported format '{format}'") + converter = self._get_converter(format) + return converter.to_string(self)
325-364
: Add more documentation for date part propertiesThe properties would benefit from more detailed documentation, including examples of return values for different scenarios (known, unknown, partially known).
@property def year(self) -> Optional[str]: - "year as string (minimum 4 characters), if year is known" + """Year as string (minimum 4 characters). + + Returns: + - Known year: "2024" + - Unknown year with year precision: "XXXX" + - No year component: None + """
385-387
: Remove commented out codeRemove the commented out datetime.date code as it's no longer needed after the migration to the new Date class.
- # latest = datetime.date( - # self.earliest.year, self.latest.month, self.latest.day - # )
Line range hint
1-520
: Consider splitting the file into smaller modulesThe file has grown quite large and handles multiple concerns (date parsing, formatting, comparison, duration calculation). Consider splitting it into smaller, focused modules:
undate/core.py
- Core Undate classundate/interval.py
- UndateInterval classundate/formatting.py
- Parsing and formatting logicThis would improve maintainability and make the code easier to test.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (18)
.github/workflows/check.yml
(1 hunks).github/workflows/unit_tests.yml
(2 hunks).pre-commit-config.yaml
(2 hunks)CHANGELOG.md
(1 hunks)CITATION.cff
(1 hunks)README.md
(2 hunks)docs/conf.py
(3 hunks)examples/notebooks/README.md
(1 hunks)examples/notebooks/edtf-support.ipynb
(1 hunks)pyproject.toml
(1 hunks)src/undate/__init__.py
(1 hunks)src/undate/converters/edtf/converter.py
(1 hunks)src/undate/converters/edtf/edtf.lark
(2 hunks)src/undate/converters/edtf/transformer.py
(2 hunks)src/undate/undate.py
(14 hunks)tests/test_converters/edtf/test_edtf_parser.py
(3 hunks)tests/test_converters/edtf/test_edtf_transformer.py
(3 hunks)tests/test_converters/test_edtf.py
(1 hunks)
✅ Files skipped from review due to trivial changes (2)
- CITATION.cff
- examples/notebooks/README.md
🚧 Files skipped from review as they are similar to previous changes (9)
- .pre-commit-config.yaml
- CHANGELOG.md
- docs/conf.py
- examples/notebooks/edtf-support.ipynb
- src/undate/init.py
- src/undate/converters/edtf/transformer.py
- tests/test_converters/edtf/test_edtf_parser.py
- tests/test_converters/edtf/test_edtf_transformer.py
- tests/test_converters/test_edtf.py
🧰 Additional context used
🪛 actionlint
.github/workflows/check.yml
24-24: property "setup-python" is not defined in object type {}
(expression)
.github/workflows/unit_tests.yml
37-37: property "setup-python" is not defined in object type {}
(expression)
🪛 LanguageTool
README.md
[uncategorized] ~29-~29: Possible missing comma found.
Context: ...ibution information. ## Example Usage Often humanities and cultural data include im...
(AI_HYDRA_LEO_MISSING_COMMA)
🪛 Ruff
src/undate/converters/edtf/converter.py
43-45: Within an except
clause, raise exceptions with raise ... from err
or raise ... from None
to distinguish them from errors in exception handling
(B904)
🔇 Additional comments (23)
.github/workflows/check.yml (4)
16-21
: LGTM! Modern Python setup with appropriate caching
The setup is well-configured with the latest stable Python version and proper caching using pyproject.toml.
22-24
: LGTM! Cache condition is correctly configured
The installation step is properly configured with the correct cache hit condition referencing the setup-python step.
🧰 Tools
🪛 actionlint
24-24: property "setup-python" is not defined in object type {}
(expression)
34-36
: LGTM! Type checking is well configured
The mypy configuration is appropriate with non-interactive type installation and focused src directory checking.
30-32
: Verify the need for strict documentation warnings
The --fail-on-warning
flag is now enabled, which will cause the workflow to fail on any documentation warnings. Please verify if this strict checking aligns with the project's documentation standards.
✅ Verification successful
Keep the --fail-on-warning
flag for documentation quality
Based on the evidence gathered, keeping the --fail-on-warning
flag is appropriate for this project:
- The project maintains high quality standards with multiple quality checks (Ruff, codecov, unit tests) as shown in README.md
- Documentation is prominently featured with ReadTheDocs integration and badges
- Recent commit history shows active documentation maintenance and improvements
- The CONTRIBUTING.md explicitly states documentation is an expected part of contributions
- No issues were found regarding documentation warnings being too strict
The strict documentation checking aligns with the project's demonstrated commitment to code and documentation quality.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check documentation configuration and current warning state
# Check sphinx configuration for warning handling
echo "Checking sphinx configuration..."
fd -t f "conf.py" docs/ -x cat {}
# Check if there are any current documentation warnings
echo "Checking for current warnings..."
sphinx-build -b html docs docs/_build 2>&1 | grep -i "warning:"
Length of output: 2856
Script:
#!/bin/bash
# Let's check for any documentation-related discussions or standards in project files
# Check README and other documentation files for any mentions of documentation standards
echo "Checking documentation standards..."
rg -i "documentation|sphinx|docs?" -A 3 -B 3 "README.md" "CONTRIBUTING.md" "docs/README.md" || true
# Check commit history for any relevant discussions about documentation warnings
echo "Checking commit history..."
git log --grep="sphinx\|documentation\|warning" --pretty=format:"%h %s" -n 10
# Check if there are any existing documentation issues
echo "Checking documentation issues..."
gh issue list --label "documentation" || true
Length of output: 4689
.github/workflows/unit_tests.yml (4)
14-14
: LGTM! Good choice updating to Python 3.12
Using the latest stable Python version (3.12) for coverage reporting is a good practice.
21-21
: Consider removing Python 3.13 from the test matrix
Python 3.13 is currently in alpha, and including it in the test matrix might lead to unstable builds. Key dependencies like numpy typically don't guarantee support for alpha Python versions.
Let's verify the current Python 3.13 compatibility status:
#!/bin/bash
# Check if the project's dependencies support Python 3.13
curl -s https://pypi.org/pypi/numpy/json | jq -r '.info.requires_python'
curl -s https://pypi.org/pypi/lark/json | jq -r '.info.requires_python'
# Check if there are any known issues with Python 3.13
gh api graphql -f query='
{
repository(owner: "numpy", name: "numpy") {
issues(first: 5, states: OPEN, labels: ["Python 3.13"]) {
nodes {
title
url
}
}
}
}'
33-33
: LGTM! Good update to use pyproject.toml
Using pyproject.toml for dependency caching aligns with modern Python packaging standards.
41-46
: Verify pytest configuration maintains all test scenarios
The switch from tox to direct pytest commands simplifies the workflow, but please ensure that all test scenarios from the tox configuration are preserved in pytest configuration.
Let's check for any tox-specific configurations that need to be migrated:
✅ Verification successful
Let me gather more information about the previous test configuration to ensure we haven't lost any important settings.
Based on the git history and test files, I can now provide a conclusive response about the pytest configuration changes.
Test configuration has been properly migrated from tox
The switch from tox to direct pytest commands maintains all essential test functionality:
- The pytest configuration in
pyproject.toml
preserves the important settings:- Correct Python path (
pythonpath = "src/"
) - Test markers for execution order
- Correct Python path (
- The workflow changes maintain the same test coverage by:
- Running basic tests across all Python versions
- Running coverage report specifically for one version
- No test scenarios were lost in the migration as tox was primarily used as a test runner without additional test configurations (the removed tox config only specified basic pytest execution)
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Look for tox configuration files and their content
fd "tox.ini|setup.cfg" --exec cat {}
# Look for pytest configuration in pyproject.toml
fd "pyproject.toml" --exec cat {} | grep -A 20 "\[tool.pytest"
Length of output: 314
Script:
#!/bin/bash
# Check git history for removed tox configuration
git log -p -- tox.ini setup.cfg
# Look for any pytest-related files
fd "conftest.py|pytest.ini" --exec cat {}
# Check test directory structure and any test configuration files
fd "test" --type d
fd "test" --type f -e py -e ini -e cfg --exec cat {}
Length of output: 42999
src/undate/converters/edtf/edtf.lark (1)
37-39
: Implementation aligns with EDTF specification.
The addition of unspecified ~ 4
correctly implements fully unspecified years (XXXX) as per Level 2 of the EDTF specification.
Let's verify the grammar handles all unspecified year cases correctly:
✅ Verification successful
Grammar correctly handles all unspecified year patterns
The test coverage shows comprehensive verification of the grammar implementation:
- Full year unspecification:
XXXX
(e.g.XXXX-12-31
) - Partial year unspecification patterns:
- Decade unspecified:
19XX
,20XX
- Single digit unspecified:
201X
,156X
- Multiple digits:
1XXX
- Decade unspecified:
- All patterns are tested with various month/day combinations
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check test coverage for unspecified year patterns
# Look for test cases covering both partial and fully unspecified years
rg -A 2 'XXXX|[0-9]XXX|[0-9]{2}XX|[0-9]{3}X' tests/
Length of output: 4707
pyproject.toml (4)
2-3
: LGTM! Clean build system configuration.
The transition to hatchling as the build backend is a good choice, providing a modern and maintainable build system.
38-38
: Remove Python 3.13 classifier
As noted in the previous review, Python 3.13 is not yet released and the project hasn't been tested against it. This classifier should be removed until compatibility is confirmed.
61-64
: Remove circular dependencies
The following circular dependencies should be removed:
undate[docs]
andundate[test]
in dev dependenciesundate
in all dependencies
These are redundant and could cause issues during development.
66-71
: LGTM! Well-structured project URLs.
All essential URLs are provided and consistently use the 'main' branch.
src/undate/converters/edtf/converter.py (5)
1-13
: LGTM! Well-organized imports and clear constant definition.
The imports are logically grouped and the constant is well-documented with proper type annotation.
15-28
: LGTM! Well-documented class with clear scope.
The class is well-structured with comprehensive documentation that clearly defines its capabilities and limitations.
42-45
: Enhance exception handling by preserving the original exception
When re-raising an exception within an except
block, it's best practice to include the original exception to maintain the exception chain.
Apply this diff to improve the exception handling:
- except UnexpectedCharacters:
+ except UnexpectedCharacters as err:
raise ValueError(
"Parsing failed: '%s' is not a supported EDTF date format" % value
- )
+ ) from err
🧰 Tools
🪛 Ruff
43-45: Within an except
clause, raise exceptions with raise ... from err
or raise ... from None
to distinguish them from errors in exception handling
(B904)
47-53
: LGTM! Clean and focused helper method.
The method is well-typed and follows the single responsibility principle.
104-104
: Correct exception message formatting
The ValueError
raised should have a properly formatted message.
Apply this diff to fix the exception message:
- raise ValueError("Failed to generate an EDTF string from %r", undate)
+ raise ValueError(f"Failed to generate an EDTF string from {undate!r}")
README.md (2)
21-23
: LGTM! Comprehensive set of badges
The badges effectively communicate the project's status, quality metrics, and community engagement.
Line range hint 1-169
: Excellent documentation update!
The README now provides:
- Clear project status and limitations
- Comprehensive examples covering major functionality
- Well-structured sections for easy navigation
- Proper references to external documentation
The changes significantly improve the documentation's quality and usability.
🧰 Tools
🪛 LanguageTool
[uncategorized] ~29-~29: Possible missing comma found.
Context: ...ibution information. ## Example Usage Often humanities and cultural data include im...
(AI_HYDRA_LEO_MISSING_COMMA)
src/undate/undate.py (3)
44-44
: Previous comment about backward compatibility is still valid
The change from formatter
to converter
parameter could break existing code.
Line range hint 151-168
: LGTM: String representation logic is well-implemented
The implementation correctly handles partially known dates and uses proper zero-padding.
292-301
: Previous comment about precision comparison logic is still valid
The precision comparison in __contains__
method needs to be reversed.
.github/workflows/unit_tests.yml
Outdated
pip install -e ".[test]" | ||
if: steps.setup-python.outputs.cache-hit != 'true' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix undefined step ID in cache hit check
The condition uses steps.setup-python
but this step ID is not defined in the workflow. The Python setup step doesn't have an id
field defined.
Apply this fix:
- if: steps.setup-python.outputs.cache-hit != 'true'
+ if: steps.cache.outputs.cache-hit != 'true'
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
pip install -e ".[test]" | |
if: steps.setup-python.outputs.cache-hit != 'true' | |
pip install -e ".[test]" | |
if: steps.cache.outputs.cache-hit != 'true' |
🧰 Tools
🪛 actionlint
37-37: property "setup-python" is not defined in object type {}
(expression)
src/undate/converters/edtf/edtf.lark
Outdated
@@ -14,7 +14,7 @@ | |||
|
|||
date: year | year "-" month | year "-" month "-" day | |||
|
|||
year: INT | |||
year: INT | /\-/ INT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Simplify the negative year pattern and prevent unintended spaces.
The current pattern INT | /\-/ INT
could allow spaces between the minus sign and the number. Consider using a single regex pattern for better precision.
-year: INT | /\-/ INT
+year: /-?\d+/
Committable suggestion skipped: line range outside the PR's diff.
|
||
**undate** is a python library for working with uncertain or partially known dates. | ||
![undate](_static/undate_logo.png) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Logo image path needs to be updated
The logo image exists but the path in the README needs to be corrected. The image is present in both _static/undate_logo.png
and docs/_static/undate_logo.png
, but for consistency with other documentation, it should reference the one in the docs/_static/
directory.
- Update the image path in README.md from
![undate](_static/undate_logo.png)
to![undate](docs/_static/undate_logo.png)
🔗 Analysis chain
Verify the logo image path and accessibility
Please ensure that:
- The
_static
directory exists in the repository root - The logo image file is present and correctly named
- The image is visible in the rendered README
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check if the logo image exists and is accessible
if [ -f "_static/undate_logo.png" ]; then
echo "Logo image exists"
file "_static/undate_logo.png"
else
echo "Logo image not found at _static/undate_logo.png"
# Check if it exists elsewhere
fd -e png "undate_logo"
fi
Length of output: 184
Script:
#!/bin/bash
# Let's verify the existence and structure of _static directory
ls -la _static/ 2>/dev/null || echo "Directory _static does not exist"
# Check if the image exists in any other location
find . -type f -name "undate_logo.png"
# List all png files in the repository to see if the logo might be named differently
find . -type f -name "*.png"
Length of output: 412
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (2)
src/undate/converters/edtf/edtf.lark (1)
44-48
: LGTM! Clear documentation for extended year handling.The documentation and implementation correctly handle extended year ranges with the Y prefix. The clarification about negative year placement is particularly helpful.
Consider adding an example in the comments:
// For negative years, the minus comes after the Y. +// Example: "Y-170000" represents the year -170000
README.md (1)
89-119
: Consider adding explanatory comments for comparison behaviorThe comparison examples would benefit from brief comments explaining why certain comparisons raise
NotImplementedError
. This would help users better understand the limitations and behavior of date comparisons.Example enhancement:
# When dates cannot be compared due to ambiguity or precision, comparison methods raise a NotImplementedError. +# For example, when comparing a specific day (2020-11-07) with its containing month (2020-11), +# the comparison is ambiguous since the day falls within the month's range. >>> november_2020 = Undate(2020, 11) >>> november7_2020 > november_2020
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
README.md
(2 hunks)pyproject.toml
(1 hunks)src/undate/converters/edtf/edtf.lark
(2 hunks)
🧰 Additional context used
🪛 LanguageTool
README.md
[uncategorized] ~29-~29: Possible missing comma found.
Context: ...ibution information. ## Example Usage Often humanities and cultural data include im...
(AI_HYDRA_LEO_MISSING_COMMA)
🔇 Additional comments (8)
src/undate/converters/edtf/edtf.lark (1)
17-17
: LGTM! Improved year pattern implementation.
The simplified regex pattern properly handles negative years while preventing unintended spaces, as previously suggested.
pyproject.toml (4)
2-3
: LGTM: Clean build system configuration
The transition to hatchling
as the build backend is well-structured and follows modern Python packaging practices.
38-38
: Remove Python 3.13 classifier
Including Python 3.13 in classifiers is premature as it's not yet released and there's no testing infrastructure for this version.
61-64
: Remove circular dependencies
The following dependencies create circular references:
undate[docs]
andundate[test]
in dev dependenciesundate
in all dependencies
66-71
: LGTM: Well-structured project URLs
All essential repository links are provided with correct branch references.
README.md (3)
7-10
: LGTM: Well-formatted warning section
The warning section effectively uses GitHub's alert syntax to highlight the pre-alpha status and feature limitations.
21-26
: LGTM: Comprehensive badge coverage
The badges section effectively communicates project status, including code quality (Ruff), test coverage, documentation status, and contributor recognition.
161-165
: LGTM: Clear documentation references
The documentation section effectively directs users to both the ReadTheDocs site and developer notes.
// In Level 2, year may be completely unspecified. | ||
unspecified: /X/ | ||
?year_unspecified: /\d+/ unspecified+ | ||
?year_unspecified: /\d+/ unspecified+ | unspecified ~ 4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Implementation needs adjustment to match EDTF Level 2 specification
The current implementation allowing XXXX
(completely unspecified year) appears to exceed the documented Level 2 support. Based on the evidence:
- The notebook explicitly states: "The only part of L2 that
undate
currently supports is allowing an unspecified digit anywhere in the date" - Test cases show
XXXX
pattern being used, but this isn't documented as a supported Level 2 feature - The implementation should focus on the specified Level 2 feature of allowing 'X' anywhere within a component, as shown in example '156X-12-25'
Suggested actions:
- Remove or document the rationale for supporting completely unspecified years (
XXXX
) - Update tests to focus on the documented Level 2 feature of allowing 'X' anywhere within components
🔗 Analysis chain
Verify EDTF Level 2 specification compliance.
The addition of unspecified ~ 4
for completely unspecified years (XXXX) appears to be a Level 2 feature. Please ensure this aligns with the EDTF specification and that appropriate tests are in place.
Let me check the test files to verify the implementation details and specification compliance.
Let me check the EDTF specification reference in the codebase to ensure complete verification.
Let me check the notebook that demonstrates EDTF support to verify Level 2 compliance.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for related test cases and Level 2 implementation
# Look for test files that verify XXXX pattern handling
rg -l "XXXX" tests/
# Check if we're implementing other Level 2 features
rg "Level 2" src/
Length of output: 282
Script:
#!/bin/bash
# Check test cases for XXXX pattern
rg -B2 -A2 "XXXX" tests/test_converters/test_edtf.py tests/test_converters/edtf/test_edtf_parser.py tests/test_converters/edtf/test_edtf_transformer.py
# Check for any Level 2 documentation or specification references
rg -l "edtf.*level.*2|level.*2.*edtf" -i .
Length of output: 1852
Script:
#!/bin/bash
# Look for EDTF specification references in documentation or comments
rg -i "edtf.*specification|specification.*edtf" .
# Check for README or documentation files that might contain EDTF spec details
fd -e md -e rst -e txt | xargs rg -i "edtf"
Length of output: 1719
Script:
#!/bin/bash
# Extract content from the EDTF support notebook
cat examples/notebooks/edtf-support.ipynb | jq -r '.cells[] | select(.source != null) | .source[]' | grep -A5 -B5 -i "level.*2\|unspecified.*year"
Length of output: 1253
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Summary by CodeRabbit
Release Notes
New Features
CONTRIBUTORS.md
section, showcasing individual contributions and a badge.BaseDateConverter
class for flexible date conversions.Date
andTimedelta
classes for improved date and time handling.README.md
with expanded usage examples and clearer descriptions.Improvements
hatchling
for package management, improving build and dependency management.pytest
and updating testing commands.Bug Fixes
Tests
Date
,Timedelta
, andBaseDateConverter
classes.