-
-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cython perf simplification #861
Conversation
@pfebrer you won't like this... Sadly, pure-python mode does not allow cimport statements. This code-restructuring is a first step to reduce the amount of Cython code. |
@@ -88,13 +88,16 @@ | |||
# import the common options used | |||
from ._common import * | |||
|
|||
from ._core import * |
Check notice
Code scanning / CodeQL
'import *' may pollute namespace Note
sisl._core
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #861 +/- ##
==========================================
- Coverage 86.98% 86.97% -0.01%
==========================================
Files 404 405 +1
Lines 52638 52792 +154
==========================================
+ Hits 45785 45916 +131
- Misses 6853 6876 +23 ☔ View full report in Codecov by Sentry. |
23ab3b7
to
2a5ec18
Compare
It does, see here: sisl/src/sisl/geom/_neighbors/_operations.py Lines 8 to 11 in 6bd1395
Do you mean that it doesn't allow to |
True, I misunderstood this here... You're right, but for this to work for sisl intrinsics, you would have to duplicate interfaces in See here: http://docs.cython.org/en/latest/src/tutorial/pure.html#cimports |
Well, of course it depends on how complex things are, but you could always mock the But apart from files running out of the box in normal python (which is nice), for me it is a question of readability and the feeling that the code can be ported to something else much more easily if there is a better accelerating framework than Not saying we should translate everything to pure python, just giving some arguments in favour of the pure python syntax :) |
5014eac
to
295c413
Compare
@@ -98,45 +99,177 @@ | |||
csr.translate_columns(col_from, col_to) | |||
|
|||
|
|||
def _mat_spin_convert(M, spin=None): | |||
def _mat2dtype(M, dtype: np.dtype) -> None: |
Check notice
Code scanning / CodeQL
Explicit returns mixed with implicit (fall through) returns Note
f16ea89
to
a9a5f0f
Compare
a9a5f0f
to
60f3e22
Compare
This was not really intended. But the amount of different codes that were not using fused tyes was annoying and prohibited further development. I initially tried to implement pure-python mode codes. However, it seems to be impossible to do pure-python mode and cimport it into another pure-python mode code. Before this will be merged, I need to benchmark the changes made. I *hope* it is faster, but I am not sure anything happened. The main change is that the fused datatypes are omnipresent and that the ndarray data type has been removed almost completely. So now, the memoryviews are prevalent. Signed-off-by: Nick Papior <[email protected]>
This means that it is much simpler to track problems. There is no code duplication in the spin-box calculations. I am not sure whether the python-yellow annotations are only for the int/long variables, and when down-casting. In any case, I think this is more easy to manage. Signed-off-by: Nick Papior <[email protected]>
Simple benchmarks showed that siesta Hamiltonians to Hk can be much faster by changing how the folding of the matrices are done. Instead of incrementally adding elements, and searching for duplicates before each addition of elements, we know built the entire array, and use numpy.unique to reduce the actual array. This leverages the numpy unique function which already returns a sorted array. It marginally slows down csr creation of matrices with few edges per orbital (TB models). But will be much faster for larger models stemming from DFT or the likes. Tests for this commit: %timeit H.Hk() %timeit H.Hk([0.1] * 3) %timeit H.Hk(format="array") %timeit H.Hk([0.1] * 3, format="array") For a *many* edge system, we get: 67.2 ms ± 1.51 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) 85.4 ms ± 8.81 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) 5.59 ms ± 426 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 11.3 ms ± 39.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) While for a *few* edge system, we get: 9.1 ms ± 52.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 9.25 ms ± 65.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 5.75 ms ± 397 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 6.17 ms ± 394 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) For commit v0.15.1-57-g6bbbde39 we get: 196 ms ± 3.01 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) 214 ms ± 1.87 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 6.58 ms ± 139 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 12.8 ms ± 58.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) and 7.41 ms ± 77.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 7.37 ms ± 73.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 6.04 ms ± 383 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 5.81 ms ± 37 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) Signed-off-by: Nick Papior <[email protected]>
Enabling fortran is necessary for it to populate details of the fortran world. Signed-off-by: Nick Papior <[email protected]>
Signed-off-by: Nick Papior <[email protected]>
This should enable simpler access to data Signed-off-by: Nick Papior <[email protected]>
Lots of the code were built for floats. This now fixes the issue of reading/writing sparse matrices in specific data-formats. It allows a more natural way of handling SOC matrices, with complex interplay, say: H[0, 0, 2] = Hud as a complex variable, when dealing with floats one needs to do this: H[0, 0, 2] = Hud.real H[0, 0, 3] = Hud.imag which is not super-intuitive. Currently there are still *many* hardcodings of the indices. And we should strive to move these into a common framework to limit the problems it creates. Tests has been added that checks Hamiltonian eigenvalues and Density matrices mulliken charges. So it seems it works as intended, but not everything has been fully tested. Signed-off-by: Nick Papior <[email protected]>
Signed-off-by: Nick Papior <[email protected]>
Signed-off-by: Nick Papior <[email protected]>
Signed-off-by: Nick Papior <[email protected]>
Signed-off-by: Nick Papior <[email protected]>
Signed-off-by: Nick Papior <[email protected]>
e23989a
to
f6df3fb
Compare
fixed errors in col creation of dtypes when calling csr.diags() Ensured that sparsegeometry.finalize accepts any arguments the csr.finalize accepts. Removed casting errors in mat2dtype. This may *hide* potential problems when there are non-zero imaginary parts. We should perhaps later revisit the problem considering that TB + Peierls has complex overlap matrices. Fixed issue when a construct was called with Python intrinsic complex values. Signed-off-by: Nick Papior <[email protected]>
Signed-off-by: Nick Papior <[email protected]>
docs/
CHANGELOG.md
Here are some details that needs to be sorted out before merging:
The idea is that instead of 8 numbers, we can now do 4,
[uu, dd, ud, du]
, this has the problem ofoverlap being stored as a complex number, but that might be ok.