Add "desire paths" to API #1192

Merged
merged 17 commits into from
Feb 22, 2022

Yoshanuikabundi
Collaborator

@Yoshanuikabundi Yoshanuikabundi commented Feb 9, 2022

This PR re-exports select classes and functions to improve their accessibility:

  • openff.toolkit.ForceField -> openff.toolkit.typing.engines.smirnoff.forcefield.ForceField (Should this be renamed SmirnoffFF or something to account for future forcefield types? I think having a unified ForceField type is very valuable)
  • openff.toolkit.get_available_force_fields -> openff.toolkit.typing.engines.smirnoff.forcefield.get_available_force_fields
  • openff.toolkit.Molecule -> openff.toolkit.topology.molecule.Molecule
  • openff.toolkit.Topology -> openff.toolkit.topology.topology.Topology
  • A new openff.toolkit.typing.engines.smirnoff.parametertypes module that re-exports the ParameterType classes from the class attributes in openff.toolkit.typing.engines.smirnoff.parameters (and the docs have been redirected to point to these re-exports rather than the class attributes, which fixes a series of sphinx warnings)

It also fixes some documentation warnings, such as by removing API references to the removed TopologyAtom, TopologyBond, TopologyVirtualSite, TopologyVirtualParticle and TopologyMolecule classes, and adds the ConstraintType and ConstraintHandler classes to the API reference.

@Yoshanuikabundi Yoshanuikabundi changed the base branch from master to topology-biopolymer-refactor February 9, 2022 05:08
@codecov

codecov bot commented Feb 9, 2022

Codecov Report

❗ No coverage uploaded for pull request base (topology-biopolymer-refactor@1115d25). Click here to learn what that means.
The diff coverage is n/a.

@mattwthompson
Member

I have to raise a point of concern that exposing everything at the top level like this will drastically slow import times for all imports:

$ git checkout upstream/desire_paths
Previous HEAD position was a7a9f186 Bump actions/setup-python from 2.3.1 to 2.3.2 (#1189)
HEAD is now at 2c36a0cf Update changelog
$ time python -c "from openff.toolkit import __file__"
python -c "from openff.toolkit import __file__"  3.03s user 0.94s system 78% cpu 5.025 total
$ git checkout upstream/topology-biopolymer-refactor
Previous HEAD position was 2c36a0cf Update changelog
HEAD is now at 0beef754 Add `use_interchange` argument to `ForceField.create_openmm_system` (#1165)
$ time python -c "from openff.toolkit import __file__"
python -c "from openff.toolkit import __file__"  0.06s user 0.06s system 83% cpu 0.136 total
$ git checkout upstream/master
Previous HEAD position was 0beef754 Add `use_interchange` argument to `ForceField.create_openmm_system` (#1165)
HEAD is now at a7a9f186 Bump actions/setup-python from 2.3.1 to 2.3.2 (#1189)
$ time python -c "from openff.toolkit import __file__"
python -c "from openff.toolkit import __file__"  0.07s user 0.06s system 79% cpu 0.168 total

My hardware is aging so it may be closer to 2 seconds on a faster CPU & disk. This isn't so impactful for workflows that are already importing the major classes in the toolkit, but unfortunately it does mean that any downstream library built off of any individual component of the toolkit will need to import all of it, i.e. something just wanting to use the Molecule class for file parsing will need to pull in all of the typing machinery.

Waiting 2-3 seconds for an interpreter or script to start up isn't the end of the world, but these can easily add up (hgrecco/pint#1460) and grow to 5+ seconds, which IMO is not a good user experience.
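For anyone who wants to reproduce this kind of measurement without the shell's `time`, here is a minimal Python sketch. It times a fresh interpreter running one import statement, so interpreter startup is included, just as with `time python -c ...`. The helper name `time_cold_import` is made up for illustration, and `import json` is only a stand-in; swap in a toolkit import if the toolkit is installed.

```python
import statistics
import subprocess
import sys
import time


def time_cold_import(statement, runs=5):
    """Run `statement` in a fresh interpreter `runs` times.

    Returns (mean, standard deviation) of the wall-clock time in seconds.
    """
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # A new process each time, so nothing is cached in sys.modules.
        subprocess.run([sys.executable, "-c", statement], check=True)
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), statistics.stdev(samples)


# Stand-in statement; e.g. "from openff.toolkit import __file__" if installed.
mean_s, stdev_s = time_cold_import("import json")
print(f"{mean_s * 1000:.1f} ms ± {stdev_s * 1000:.1f} ms")
```

Disk caches and CPU boost still affect the numbers, so the first run after a checkout is usually the slowest.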

@j-wags
Member

j-wags commented Feb 10, 2022

Hm, darn. This would be a really convenient thing for users. Are we sure that @mattwthompson's tests reflect real use cases? I don't know if we actually expose anything at the openff.toolkit level, so I'm not sure that people would be importing from there in the first place.

@Yoshanuikabundi
Collaborator Author

Yoshanuikabundi commented Feb 10, 2022

Thanks for pointing that out Matt, I hadn't considered it! I agree that this might be a net loss if it dramatically increases import times for typical users. I think we're in luck though.

Without this PR, import openff.toolkit is more or less useless. The only useful objects you get out of it are openff.toolkit.__version__ and the builtin stuff. Since it doesn't import other modules, you can't even use it to get deeper into the toolkit:

>>> import openff.toolkit
>>> openff.toolkit.topology.molecule.Molecule
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'openff.toolkit' has no attribute 'topology'

This is changed by this PR, which I think is more in line with people's expectation. It takes more time because it actually imports the whole toolkit. (or at least, all of the toolkit that is used by Molecule or ForceField... which seems to be most of it. I have learnt a lot about the minutiae of Python importing today!)

I didn't realise this, but Python runs all of the __init__.py files for parent packages when it loads a module, so this PR does mean that the toolkit's entire public API is loaded when any module is imported.
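A minimal sketch of that behaviour, using a throwaway package written to a temporary directory. The names `pkg_demo`, `heavy`, `Big` and `inner.leaf` are hypothetical stand-ins, not toolkit modules. The top-level `__init__.py` re-exports one class, and importing only the small leaf module still loads the re-exported module.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "pkg_demo"
    (root / "inner").mkdir(parents=True)
    # The top-level __init__.py re-exports a class from a sibling module.
    (root / "__init__.py").write_text("from pkg_demo.heavy import Big\n")
    (root / "heavy.py").write_text("class Big:\n    pass\n")
    (root / "inner" / "__init__.py").write_text("")
    (root / "inner" / "leaf.py").write_text("VALUE = 1\n")

    sys.path.insert(0, tmp)
    try:
        # Ask only for the small leaf module.
        importlib.import_module("pkg_demo.inner.leaf")
    finally:
        sys.path.remove(tmp)

# Every parent __init__.py ran first, which pulled in pkg_demo.heavy as well.
loaded = sorted(name for name in sys.modules if name.startswith("pkg_demo"))
print(loaded)
```

The printed list contains `pkg_demo.heavy` even though nothing asked for it directly, which is the cost being discussed for small modules.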

However, this PR barely affects the time taken to import the main classes.

I think this is because all the main modules are complex enough to incidentally import most of the rest of the toolkit. I've done some benchmarking with hyperfine, which runs the provided command repeatedly to get stats on the time it takes. These times are probably close to the fastest modern hardware can go, and they're definitely long enough to be problematic, but I don't think they support holding back this PR. Here are my benchmarks:

$ git checkout topology-biopolymer-refactor
Already on 'topology-biopolymer-refactor'
Your branch is up to date with 'origin/topology-biopolymer-refactor'.

$ hyperfine 'python -c "from openff.toolkit.topology.molecule import Molecule"'
Benchmark 1: python -c "from openff.toolkit.topology.molecule import Molecule"
  Time (mean ± σ):     814.2 ms ±  12.7 ms    [User: 1068.9 ms, System: 1388.2 ms]
  Range (min … max):   789.0 ms … 832.1 ms    10 runs
 
$ hyperfine 'python -c "from openff.toolkit.typing.engines.smirnoff.forcefield import ForceField"'
Benchmark 1: python -c "from openff.toolkit.typing.engines.smirnoff.forcefield import ForceField"
  Time (mean ± σ):     825.3 ms ±  13.2 ms    [User: 1062.0 ms, System: 1408.6 ms]
  Range (min … max):   800.3 ms … 843.7 ms    10 runs
 
$ git checkout desire_paths                                                                       
Switched to branch 'desire_paths'

$ hyperfine 'python -c "from openff.toolkit import ForceField"' 
Benchmark 1: python -c "from openff.toolkit import ForceField"
  Time (mean ± σ):     839.0 ms ±   6.1 ms    [User: 1091.1 ms, System: 1391.4 ms]
  Range (min … max):   829.5 ms … 847.2 ms    10 runs
 
$ hyperfine 'python -c "from openff.toolkit import Molecule"'  
Benchmark 1: python -c "from openff.toolkit import Molecule"
  Time (mean ± σ):     839.0 ms ±   9.9 ms    [User: 1074.8 ms, System: 1402.9 ms]
  Range (min … max):   823.5 ms … 854.8 ms    10 runs

So the new desire paths are, like, a few per cent slower. Probably an imperceptible amount of time for any computer recent enough to run Python 3.7. At these levels, there can be variance between runs even if you repeat them due to caching and clock boost and CPU/disk temps and stuff like that, so even this 2 or 3 per cent might not be real (I can't reliably reproduce it).

In particular, importing Molecule apparently imports so much of the typing machinery that importing ForceField straight after is basically free:

$ git checkout topology-biopolymer-refactor                                                       
Switched to branch 'topology-biopolymer-refactor'
Your branch is up to date with 'origin/topology-biopolymer-refactor'.
 
$ hyperfine 'python -c "from openff.toolkit.topology.molecule import Molecule"'                   
Benchmark 1: python -c "from openff.toolkit.topology.molecule import Molecule"
  Time (mean ± σ):     822.1 ms ±  11.3 ms    [User: 1059.9 ms, System: 1400.8 ms]
  Range (min … max):   802.0 ms … 837.2 ms    10 runs
 
$ hyperfine 'python -c "from openff.toolkit.topology.molecule import Molecule; from openff.toolkit.typing.engines.smirnoff.forcefield import ForceField"'
Benchmark 1: python -c "from openff.toolkit.topology.molecule import Molecule; from openff.toolkit.typing.engines.smirnoff.forcefield import ForceField"
  Time (mean ± σ):     826.0 ms ±  15.3 ms    [User: 1080.0 ms, System: 1389.2 ms]
  Range (min … max):   789.3 ms … 841.2 ms    10 runs

Some cases that do get slower with this PR are checking the version:

$ git checkout topology-biopolymer-refactor
Already on 'topology-biopolymer-refactor'
Your branch is up to date with 'origin/topology-biopolymer-refactor'.

$ hyperfine 'python -c "from openff.toolkit import __version__"'
Benchmark 1: python -c "from openff.toolkit import __version__"
  Time (mean ± σ):      30.7 ms ±   4.7 ms    [User: 24.1 ms, System: 7.1 ms]
  Range (min … max):    25.2 ms …  42.5 ms    105 runs
 
$ git checkout desire_paths                                                    
Switched to branch 'desire_paths'

$ hyperfine 'python -c "from openff.toolkit import __version__"'
Benchmark 1: python -c "from openff.toolkit import __version__"
  Time (mean ± σ):     826.5 ms ±  14.4 ms    [User: 1070.6 ms, System: 1390.3 ms]
  Range (min … max):   808.9 ms … 855.4 ms    10 runs

and importing small modules that don't depend on other modules:

$  git checkout topology-biopolymer-refactor                                    

$ hyperfine 'python -c "import openff.toolkit.utils.constants"'
Benchmark 1: python -c "import openff.toolkit.utils.constants"
  Time (mean ± σ):     452.5 ms ±   7.1 ms    [User: 470.4 ms, System: 468.0 ms]
  Range (min … max):   439.5 ms … 461.7 ms    10 runs

$  git checkout desire_paths         

$ hyperfine 'python -c "import openff.toolkit.utils.constants"'        
Benchmark 1: python -c "import openff.toolkit.utils.constants"
  Time (mean ± σ):     843.5 ms ±  10.2 ms    [User: 1081.1 ms, System: 1402.5 ms]
  Range (min … max):   827.8 ms … 862.7 ms    10 runs

TL;DR Importing anything in the toolkit now imports most of the toolkit. For the main classes, this doesn't change the time it takes because they're complex enough that the whole toolkit gets imported anyway. For small modules with few imports, this can increase import times. I think this is acceptable for the usability wins with IPython and in notebooks, where tab completion can now take you through the entire toolkit, and for saving users from memorizing some pretty obscure paths. It also lets us explicitly declare our public API, which is useful for any future automated documentation.

@Yoshanuikabundi
Collaborator Author

Don't think this is useful but Hyperfine is cool

hyperfine -L branch desire_paths,topology-biopolymer-refactor -L script "from openff.toolkit.topology.molecule import Molecule","from openff.toolkit.typing.engines.smirnoff.forcefield import ForceField" -p "git checkout {branch}" -n '{branch}:{script}' 'python -c "{script}"' --export-markdown desire_path_benchmarks.md
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `desire_paths:from openff.toolkit.topology.molecule import Molecule` | 844.6 ± 14.5 | 824.5 | 873.6 | 1.02 ± 0.02 |
| `topology-biopolymer-refactor:from openff.toolkit.topology.molecule import Molecule` | 838.6 ± 23.8 | 796.2 | 877.3 | 1.01 ± 0.03 |
| `desire_paths:from openff.toolkit.typing.engines.smirnoff.forcefield import ForceField` | 839.7 ± 17.8 | 814.3 | 870.7 | 1.01 ± 0.03 |
| `topology-biopolymer-refactor:from openff.toolkit.typing.engines.smirnoff.forcefield import ForceField` | 827.8 ± 13.3 | 805.5 | 849.3 | 1.00 |

@j-wags
Member

j-wags commented Feb 10, 2022

Wow, hyperfine is really cool - Thanks for sharing your timing commands, @Yoshanuikabundi!

I ran the same on my computer (with a billion tabs open, pycharm, slack, etc) and I saw the same relative timings as you, though another important result is that your computer is 2-3x faster than my 2018 MBP, and my computer is probably faster than many of our users'. I've also included a few "lighter" imports that users might reasonably call and saw a significant performance degradation there.

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `desire_paths:from openff.toolkit.topology.molecule import Molecule` | 2.722 ± 0.200 | 2.596 | 3.237 | 1.73 ± 0.13 |
| `topology-biopolymer-refactor:from openff.toolkit.topology.molecule import Molecule` | 2.654 ± 0.143 | 2.561 | 3.023 | 1.69 ± 0.09 |
| `desire_paths:from openff.toolkit.typing.engines.smirnoff.forcefield import ForceField` | 2.737 ± 0.223 | 2.555 | 3.272 | 1.74 ± 0.14 |
| `topology-biopolymer-refactor:from openff.toolkit.typing.engines.smirnoff.forcefield import ForceField` | 2.675 ± 0.081 | 2.584 | 2.788 | 1.70 ± 0.05 |
| `desire_paths:from openff.toolkit.utils import constants` | 2.658 ± 0.106 | 2.580 | 2.931 | 1.69 ± 0.07 |
| `topology-biopolymer-refactor:from openff.toolkit.utils import constants` | 1.597 ± 0.028 | 1.554 | 1.661 | 1.02 ± 0.02 |
| `desire_paths:from openff.toolkit.utils import RDKitToolkitWrapper` | 2.693 ± 0.183 | 2.541 | 3.175 | 1.71 ± 0.12 |
| `topology-biopolymer-refactor:from openff.toolkit.utils import RDKitToolkitWrapper` | 1.573 ± 0.017 | 1.555 | 1.602 | 1.00 |

There's a fine balance to this tradeoff -

  • The negatives are that
    • we significantly slow down some use cases
    • we increase the API's surface area
    • we make it impossible to undo this slowdown without breaking the API
  • The positives are that
    • most users get to use simpler import paths

I think the negatives here outweigh the positives.

So, to move forward, let's sink these quick imports behind something like openff.toolkit.app. That would gain most of the benefit of these changes and shake off most of the negatives. So, imports like

from openff.toolkit.topology import Molecule
from openff.toolkit.typing.engines.smirnoff import ForceField
from openff.toolkit.utils import RDKitToolkitWrapper

become

from openff.toolkit.app import (Molecule, ForceField, RDKitToolkitWrapper)

I'm happy to discuss alternative names than app, that's just the first one that comes to mind since openmm uses it.

@mattwthompson
Member

> Without this PR, import openff.toolkit is more or less useless.

Agree, I use it for __version__ and __file__. My understanding, though, is that putting everything in the top-level __init__.py has the effect of importing everything whenever any import is hit, whereas currently importing just one class might not bring in everything. (Though, as you accurately point out, things are intertwined here in ways that make importing one thing already similar to importing everything.) I grant that ~90% of the use cases are going to pull in Molecule and ForceField. I agree with your assessment that import times of the other modules aren't so important. I actually didn't even remember there was a .constants module, but my memory is neither here nor there ...

> importing Molecule apparently imports so much of the typing machinery that importing ForceField straight after is basically free

This is counter-intuitive to me, and looking around in the files openff/toolkit/topology/*/*.py gives me no clue as to how this happens. Personally I'd like to find a way to untangle these/lazy-load where possible, but I won't put effort into that if we go with an approach that puts everything in openff/toolkit/__init__.py. If I understand import paths accurately, this would be worth exploring after something like Jeff's openff.toolkit.app idea, if that's what is chosen.

In the interest of fairness, I used tuna to see why the imports are so incredibly slow, and was reminded it's because of Mendeleev bringing in Pandas and some other things:

image

This will be fixed when #1182 is merged, though I doubt it will affect the numbers enough to alter any conclusions. The hardware I use most often for work is pretty old, also:

image
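For readers without the screenshots, a per-module breakdown can also be collected from Python itself. This is a rough sketch that assumes CPython 3.7 or newer, where `python -X importtime` writes one line per imported module to stderr; the helper name `import_breakdown` is hypothetical and `import json` is a stand-in for a toolkit import. It is not necessarily how the profiles above were produced.

```python
import subprocess
import sys


def import_breakdown(statement):
    """Return (module, self_us, cumulative_us) rows, slowest cumulative first."""
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", statement],
        capture_output=True,
        text=True,
        check=True,
    )
    rows = []
    prefix = "import time:"
    for line in proc.stderr.splitlines():
        if not line.startswith(prefix):
            continue
        fields = line[len(prefix):].split("|")
        if len(fields) != 3:
            continue
        try:
            self_us, cumulative_us = int(fields[0]), int(fields[1])
        except ValueError:
            continue  # the header row has column names instead of numbers
        rows.append((fields[2].strip(), self_us, cumulative_us))
    return sorted(rows, key=lambda row: row[2], reverse=True)


rows = import_breakdown("import json")
for name, self_us, cumulative_us in rows[:5]:
    print(f"{cumulative_us:>8} us cumulative  {self_us:>8} us self  {name}")
```

The cumulative column includes everything a module imports in turn, so a heavy dependency shows up both on its own row and in the totals of whatever imported it.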

I'm not the ultimate reviewer of changes to the toolkit. I did, however, want to voice the general concern of import times here given that

  • This toolkit is likely to increase in size and complexity (LOC, functionality, dependencies, ...) over time
  • Most of the OpenFF stack will import something from the toolkit
  • Most workflows are going to import multiple other libraries, some of which are probably heavy themselves
  • Many users do star imports

As a matter of preference and sharing my biases, I'll add that

  • I prefer, all other things being equal, the ability to import a portion of a library without pulling in everything
  • I hate star imports
  • I think really long import times (i.e. 5+ seconds) are unsightly and I'm concerned projects like Interchange will have a hard time avoiding this on mid-range hardware
  • I am impatient and prone to context switching when a command like interpreter startup takes several seconds to run (okay, this is maybe a wetware issue)

@j-wags
Member

j-wags commented Feb 10, 2022

@mattwthompson Would the openff.toolkit.app solution remove all the negatives from your point of view?

@mattwthompson
Member

It does seem worth it. Provided we want a single, short-ish path that provides the key classes and provided we avoid using it internally, I don't see substantial new negatives introduced and it leaves the door open to speeding up imports whenever we do tackle that.

I have no preference on app naming; I don't think it's a common pattern in Python libraries, or at least it's not something I see commonly enough to have other good points of reference.

I did some profiling after merging #1182, which shows some new issues. But they're out of scope for this PR and I split them out here: openforcefield/openff-units#17

@lgtm-com

lgtm-com bot commented Feb 11, 2022

This pull request introduces 4 alerts when merging 4f4844f into 1115d25 - view on LGTM.com

new alerts:

  • 4 for Explicit export is not defined

@Yoshanuikabundi
Collaborator Author

Yoshanuikabundi commented Feb 11, 2022

Cool! I've moved the new imports to the new openff.toolkit.app module. I think this is a good compromise. Should I add any toolkit wrappers to it?

I've also implemented a lazy importing system for openff.toolkit. It delays the actual import to when an object is asked for, but still shows it in tab completion and so forth. It uses a new mechanism added in Python 3.7 (PEP 562, note that lazy imports are one of the suggested use cases). I don't think it's worth the added complexity and magic, but it only took a few minutes to whip up so I thought I'd point it out as a possibility. I'm expecting to remove it before merging (leaving openff/toolkit/__init__.py in the same state it was before this PR)
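For readers unfamiliar with PEP 562, this is roughly what such a lazy `__init__.py` can look like. It is a minimal sketch, not the code in this PR: the package `lazy_demo`, the module `heavy`, the class `Thing` and the `_lazy_imports` table are all made-up names, and the package is written to a temporary directory so the snippet runs on its own.

```python
import sys
import tempfile
import textwrap
from pathlib import Path

# Source of a hypothetical package __init__.py using module-level
# __getattr__ and __dir__ (PEP 562, Python 3.7+).
INIT_SOURCE = textwrap.dedent(
    '''
    import importlib

    __all__ = ["Thing"]

    # Public name -> module that really defines it.
    _lazy_imports = {"Thing": "lazy_demo.heavy"}

    def __getattr__(name):
        # Only called when normal attribute lookup on the module fails.
        module_path = _lazy_imports.get(name)
        if module_path is None:
            raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
        obj = getattr(importlib.import_module(module_path), name)
        globals()[name] = obj  # cache, so later lookups skip __getattr__
        return obj

    def __dir__():
        # Advertise the lazy names to dir() and tab completion.
        return sorted(set(globals()) | set(__all__))
    '''
)

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "lazy_demo"
    pkg.mkdir()
    (pkg / "__init__.py").write_text(INIT_SOURCE)
    (pkg / "heavy.py").write_text("class Thing:\n    pass\n")

    sys.path.insert(0, tmp)
    try:
        import lazy_demo

        loaded_before = "lazy_demo.heavy" in sys.modules  # nothing heavy yet
        listed = "Thing" in dir(lazy_demo)  # still visible to completion

        from lazy_demo import Thing  # triggers the real import

        loaded_after = "lazy_demo.heavy" in sys.modules

        try:
            from lazy_demo import Foo  # unknown names still fail normally
        except ImportError as err:
            missing_error = type(err).__name__
    finally:
        sys.path.remove(tmp)

print(loaded_before, listed, loaded_after, missing_error)
```

The `AttributeError` raised for unknown names is what lets `from package import Foo` surface as an ordinary `ImportError`.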

@lilyminium
Collaborator

> Should I add any toolkit wrappers to it?

If it doesn't make anything harder, I'd (as a user) love that. Thanks for this PR, it'll take away one of the serious pain points of doing anything with a ForceField!

> I've also implemented a lazy importing system for openff.toolkit.

This is cool!

@Yoshanuikabundi
Collaborator Author

I've added all 4 toolkit wrappers, the ToolkitRegistry class, and the GLOBAL_TOOLKIT_REGISTRY constant to the app module, as requested by @lilyminium

@lgtm-com

lgtm-com bot commented Feb 15, 2022

This pull request introduces 4 alerts when merging ed96e05 into 1115d25 - view on LGTM.com

new alerts:

  • 4 for Explicit export is not defined

Member

@j-wags j-wags left a comment


Wow! I REALLY like the design for lazy loading in __init__.py, and it gives us everything that the openff.toolkit.app module would, with only a moderate tradeoff in terms of complexity.

@mattwthompson and @lilyminium - I've never futzed with import paths before like this. Does the __init__.py design look legit to you? @Yoshanuikabundi's code follows closely from the linked PEP so I have some confidence that it's pretty mainstream, but I'd like to get another set of eyes on it. If either of you can vouch for the correct-ish-ness of it, could you give an approving review?

Comment on lines 1 to 5
"""Re-exports of concrete ParameterTypes

openff.toolkit.typing.engines.smirnoff.parameters defines a number of parameter
types within class definitions of the corresponding ParameterHandler. This module
re-exports them for discoverability and ease of use."""
Member


(blocking) Per our discussion today, let's

  • delete this file
  • move the exporting of the ParameterTypes to parameters.py itself
  • include the exported ParameterTypes in the __all__ for parameters.py
  • Add a unit test to test_parameters that ensures that all ParameterType subclasses owned by ParameterHandler subclasses are exposed via re-export and are in __all__

VirtualSiteHandler.VirtualSiteMonovalentLonePairType
VirtualSiteHandler.VirtualSiteDivalentLonePairType
VirtualSiteHandler.VirtualSiteTrivalentLonePairType
ConstraintType
Member


🤦 Thanks for catching this!

Comment on lines -44 to -46
TopologyAtom
TopologyBond
TopologyVirtualSite
Member


👍

@mattwthompson
Member

My glances through earlier versions of this PR didn't expose any obvious red flags but I will give this a more thorough review later today.

@j-wags
Member

j-wags commented Feb 17, 2022

Thanks, @mattwthompson! And reading this morning, I realized that it may not have been clear what I'm asking about - The path forward I'm considering would be completely deleting openff/toolkit/app.py and instead letting people do from openff.toolkit import Molecule using the "lazy-loading" functionality from __init__.py. So I've reviewed the rest of the PR, I'd just like to see if anyone can provide a more certain vote of confidence/reliability for the new code in __init__.py.

Member

@mattwthompson mattwthompson left a comment


This gets an approval from me! Assuming app.py is removed as per the proposal, though it's not a file that's imported anywhere so it should not have an impact on what I tinkered with. I also noticed there's a difference between what's in app.py and what's currently lazy-loaded in openff/toolkit/__init__.py. I have no preferences but wanted to point this out in case from openff.toolkit import RDKitToolkitWrapper was expected to work.

The import machinery looks good to me. We did the same thing in #1021 (and removed it in #1156) when we wanted to have module-level exceptions but have them loaded only when explicitly imported. We didn't call this lazy-loading at the time, but I think that's the definition.

The behavior does what I'd expect:

In [1]: from openff.toolkit import Molecule, Topology

In [2]: locals().keys()
Out[2]: dict_keys(['__name__', '__doc__', '__package__', '__loader__', '__spec__', '__builtin__', '__builtins__', '_ih', '_oh', '_dh', 'In', 'Out', 'get_ipython', 'exit', 'quit', '_', '__', '___', '_i', '_ii', '_iii', '_i1', 'OE_1996539322038294600', 'OE_3449561020318279929', 'OE_3534227101456629665', 'OE_7544655256171905223', 'OE_14481470454631586240', 'OE_2886910160464464284', 'OE_12830519133454273888', 'OE_7146837923505175476', 'OE_1333624159125614004', 'OE_7190599916362654688', 'OE_18194288045002224399', 'OE_2920715238402669782', 'OE_10862347053494700408', 'OE_1152104254218024490', 'OE_12407415621277732493', 'OE_2699086864785953591', 'OE_13975590391927931629', 'Molecule', 'Topology', '_i2'])

In [3]: from openff.toolkit import ForceField

In [4]: locals().keys()
Out[4]: dict_keys(['__name__', '__doc__', '__package__', '__loader__', '__spec__', '__builtin__', '__builtins__', '_ih', '_oh', '_dh', 'In', 'Out', 'get_ipython', 'exit', 'quit', '_', '__', '___', '_i', '_ii', '_iii', '_i1', 'OE_1996539322038294600', 'OE_3449561020318279929', 'OE_3534227101456629665', 'OE_7544655256171905223', 'OE_14481470454631586240', 'OE_2886910160464464284', 'OE_12830519133454273888', 'OE_7146837923505175476', 'OE_1333624159125614004', 'OE_7190599916362654688', 'OE_18194288045002224399', 'OE_2920715238402669782', 'OE_10862347053494700408', 'OE_1152104254218024490', 'OE_12407415621277732493', 'OE_2699086864785953591', 'OE_13975590391927931629', 'Molecule', 'Topology', '_i2', '_2', '_i3', 'ForceField', '_i4'])

In [5]: from openff.toolkit import Foo
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-5-84c3b7d3429c> in <module>
----> 1 from openff.toolkit import Foo

ImportError: cannot import name 'Foo' from 'openff.toolkit' (/Users/mwt/software/openforcefield/openff/toolkit/__init__.py)

The timings are ultimately not good, but they're not worse, so it's a fair tradeoff. My SSD is in poor shape (will be replaced soon!), so these are slower times than I'd normally expect.

git checkout upstream/topology-biopolymer-refactor
hyperfine --min-runs 20 'python -c "from openff.toolkit.typing.engines.smirnoff.forcefield import ForceField"'
hyperfine --min-runs 20 'python -c "from openff.toolkit.topology import Molecule"'
hyperfine --min-runs 20 'python -c "from openff.toolkit.topology import Topology"'
git checkout upstream/desire_paths
hyperfine --min-runs 20 'python -c "from openff.toolkit import ForceField"'
hyperfine --min-runs 20 'python -c "from openff.toolkit import Molecule"'
hyperfine --min-runs 20 'python -c "from openff.toolkit import Topology"'
Previous HEAD position was ed96e05c Add toolkits to app module
HEAD is now at 1115d256 Refactor to openff.units.elements (#1182)
Benchmark 1: python -c "from openff.toolkit.typing.engines.smirnoff.forcefield import ForceField"
  Time (mean ± σ):      2.506 s ±  0.197 s    [User: 1.948 s, System: 0.628 s]
  Range (min … max):    2.294 s …  2.938 s    20 runs

Benchmark 1: python -c "from openff.toolkit.topology import Molecule"
  Time (mean ± σ):      2.335 s ±  0.067 s    [User: 1.938 s, System: 0.609 s]
  Range (min … max):    2.237 s …  2.484 s    20 runs

Benchmark 1: python -c "from openff.toolkit.topology import Topology"
  Time (mean ± σ):      2.344 s ±  0.060 s    [User: 1.953 s, System: 0.615 s]
  Range (min … max):    2.257 s …  2.446 s    20 runs

Previous HEAD position was 1115d256 Refactor to openff.units.elements (#1182)
HEAD is now at ed96e05c Add toolkits to app module
Benchmark 1: python -c "from openff.toolkit import ForceField"
  Time (mean ± σ):      2.494 s ±  0.183 s    [User: 2.015 s, System: 0.646 s]
  Range (min … max):    2.288 s …  2.873 s    20 runs

Benchmark 1: python -c "from openff.toolkit import Molecule"
  Time (mean ± σ):      2.543 s ±  0.256 s    [User: 2.013 s, System: 0.645 s]
  Range (min … max):    2.251 s …  3.181 s    20 runs

Benchmark 1: python -c "from openff.toolkit import Topology"
  Time (mean ± σ):      2.637 s ±  0.432 s    [User: 2.017 s, System: 0.658 s]
  Range (min … max):    2.282 s …  3.894 s    20 runs

This is probably due to the Pandas issue I linked, so I removed it from my environment and got slightly better times:

Benchmark 1: python -c "from openff.toolkit.typing.engines.smirnoff.forcefield import ForceField"
  Time (mean ± σ):      2.403 s ±  0.382 s    [User: 1.648 s, System: 0.606 s]
  Range (min … max):    1.907 s …  3.740 s    20 runs

Benchmark 1: python -c "from openff.toolkit.topology import Molecule"
  Time (mean ± σ):      1.992 s ±  0.135 s    [User: 1.552 s, System: 0.520 s]
  Range (min … max):    1.842 s …  2.385 s    20 runs

Benchmark 1: python -c "from openff.toolkit.topology import Topology"
  Time (mean ± σ):      1.904 s ±  0.055 s    [User: 1.543 s, System: 0.514 s]
  Range (min … max):    1.821 s …  1.992 s    20 runs

Previous HEAD position was 1115d256 Refactor to openff.units.elements (#1182)
HEAD is now at ed96e05c Add toolkits to app module
Benchmark 1: python -c "from openff.toolkit import ForceField"
  Time (mean ± σ):      1.979 s ±  0.109 s    [User: 1.549 s, System: 0.515 s]
  Range (min … max):    1.867 s …  2.166 s    20 runs

Benchmark 1: python -c "from openff.toolkit import Molecule"
  Time (mean ± σ):      1.941 s ±  0.110 s    [User: 1.546 s, System: 0.518 s]
  Range (min … max):    1.838 s …  2.203 s    20 runs

Benchmark 1: python -c "from openff.toolkit import Topology"
  Time (mean ± σ):      1.908 s ±  0.077 s    [User: 1.540 s, System: 0.514 s]
  Range (min … max):    1.804 s …  2.128 s    20 runs

Basically Molecule and Topology take identical amounts of time but ForceField is faster, which is great. I dug around a bit and wasn't able to figure out why. I think it has something to do with the other things snuck along the way in the long import paths (star imports and/or __all__ definitions) but I can't say for sure. There's some more work to do here but I'm getting way off track here and digging into non-blockers.

@lgtm-com

lgtm-com bot commented Feb 21, 2022

This pull request introduces 10 alerts when merging 88bbb7e into 0e18422 - view on LGTM.com

new alerts:

  • 10 for Explicit export is not defined

@lgtm-com

lgtm-com bot commented Feb 21, 2022

This pull request introduces 10 alerts when merging 9080c6f into 0e18422 - view on LGTM.com

new alerts:

  • 10 for Explicit export is not defined

@Yoshanuikabundi
Collaborator Author

Yoshanuikabundi commented Feb 21, 2022

I have removed the app module and added the toolkit re-exports to openff.toolkit (thanks for the reminder Matt!)

I've also moved the ParameterType re-exports to the bottom of parameters.py and written a test that any new ParameterTypes are re-exported in the same way. This test would fail, as there is an undocumented and un-re-exported ParameterType called VirtualSiteType. This class is marked as an abstract base class (though it has no abstract methods) and has a number of subclasses (VirtualSiteMonovalentLonePairType, VirtualSiteDivalentLonePairType etc), all of which are re-exported.

I didn't re-export this class because I think it's essentially an implementation detail, but it's not clear to me how to exclude it from the test. I see a number of options:

  1. Have the test exclude abstract base classes, and make it an ABC by adding an abstract method (I've implemented this one)
  2. Make it private by renaming it _VirtualSiteType (the test must exclude private classes to avoid complaining about _INFOTYPE a lot)
  3. Re-export it
  4. Add a specific exception to the test (like if paramtype is VirtualSiteType: continue)

I chose 1 because the class is currently in a weird place, being marked as an ABC but not having any abstract methods. Without any abstract methods, Python considers it to be a concrete class, so inheriting from abc.ABC doesn't really do anything productive. It looks as if VirtualSiteType._add_virtual_site was once an abstract method called add_virtual_site that was later converted to a private method, but the git blame shows this is not the case. All the subclasses implement add_virtual_site with the same signature and docstring, which seems like a clear case of a missing abstract method declaration, so I've added the declaration.

Is VirtualSiteType intended to be used by users for implementing their own VirtualSites? If so (3) is probably the best solution. If not, (1) or (2) probably is. In any case, I would like to keep the abstract method (and possibly remove the exception for ABCs from the test), assuming it doesn't break anything in tests.

Background: VirtualSiteType was made an ABC here, but no methods were marked as abstract. This happened after the base class _?add_virtual_site was made private.
ABCs are supposed to allow base classes to be declared that cannot be instantiated on their own, but that can be used for checking that subclasses inherit from them. An ABC can then declare abstract methods that subclasses must implement, providing a way of declaring a generic interface as an alternative to duck typing. I love this because I love type systems, but Python's implementation is a mess. The docs for the abc.ABC class say:

A helper class that has ABCMeta as its metaclass. With this class, an abstract base class can be created by simply deriving from ABC avoiding sometimes confusing metaclass usage, for example:

And then gives an example that declares no abstract methods. However, the PEP says

Implementation: The @abstractmethod decorator sets the function attribute __isabstractmethod__ to the value True. The ABCMeta.__new__ method computes the type attribute __abstractmethods__ as the set of all method names that have an __isabstractmethod__ attribute whose value is true. It does this by combining the __abstractmethods__ attributes of the base classes, adding the names of all methods in the new class dict that have a true __isabstractmethod__ attribute, and removing the names of all methods in the new class dict that don't have a true __isabstractmethod__ attribute. If the resulting __abstractmethods__ set is non-empty, the class is considered abstract, and attempts to instantiate it will raise TypeError.

In other words, simply inheriting from ABC or using the ABCMeta metaclass is not sufficient to create an ABC; you must also declare an abstract method. I think it's possible for someone to follow the Python docs and add the ABC parent class but no abstract methods expecting this to make the class abstract.
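The behaviour described above can be checked with a short, self-contained sketch. The class names here (MarkedOnly, TrulyAbstract, Concrete) are made up for illustration and are not toolkit classes; add_virtual_site is borrowed only as a method name.

```python
import inspect
from abc import ABC, abstractmethod


class MarkedOnly(ABC):
    """Inherits from ABC but declares no abstract methods."""


class TrulyAbstract(ABC):
    """Declares an abstract method, so instantiation is blocked."""

    @abstractmethod
    def add_virtual_site(self):
        ...


class Concrete(TrulyAbstract):
    """Implements the abstract method, so it can be instantiated."""

    def add_virtual_site(self):
        return "added"


# Instantiating the "ABC" without abstract methods succeeds silently
marked = MarkedOnly()

# Instantiating the class with an abstract method raises TypeError
try:
    TrulyAbstract()
    instantiable = True
except TypeError:
    instantiable = False

print(inspect.isabstract(MarkedOnly))     # False
print(inspect.isabstract(TrulyAbstract))  # True
print(instantiable)                       # False
```

inspect.isabstract is what a test would likely use to tell the two cases apart, which is why adding the abstract method matters for option 1.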

@lgtm-com

lgtm-com bot commented Feb 21, 2022

This pull request introduces 10 alerts when merging 58cde36 into 0e18422 - view on LGTM.com

new alerts:

  • 10 for Explicit export is not defined

@mattwthompson
Member

Is VirtualSiteType intended to be used by users for implementing their own VirtualSites?

Not really, the unspoken recommendation here is for users to specify their parameters in OFFXML and let the toolkit handle everything. Unless you're asking about implementing an entire new type (as in variety/flavor/version) of virtual sites with a new handler. That would be more or less uncharted territory and I'm personally fine with the user experience there being ambiguous and not well-supported.

I think the answer is "no" and making them proper abstract classes is the way to go. To be honest I also think that this corner of the codebase is so rough that any change that doesn't make it substantially worse is good, but fixing it might be best split out into another PR so the original focus of this PR can be implemented without needing to make decisions on how the virtual site classes should be improved.

@Yoshanuikabundi
Collaborator Author

Sounds good. Shall I merge this as-is?

@mattwthompson
Member

I didn't see that Jeff's earlier comment defers to me as a sufficient reviewer - now I do, and yes, go ahead and merge.

I know I put up some resistance here but I am looking forward to typing fewer characters at the top of every script!

@Yoshanuikabundi
Collaborator Author

Yoshanuikabundi commented Feb 22, 2022

No worries, your input definitely made this PR better!!

@Yoshanuikabundi Yoshanuikabundi merged commit c95642b into topology-biopolymer-refactor Feb 22, 2022
@Yoshanuikabundi Yoshanuikabundi deleted the desire_paths branch February 22, 2022 08:42

4 participants