Commit c6253c8

Cython perf simplification (#861)
* complete rewrite of cython files

  This was not really intended. But the number of different codes that were not using fused types was annoying and prohibited further development.

  I initially tried to implement pure-python mode codes. However, it seems to be impossible to write a pure-python mode code and cimport it into another pure-python mode code.

  Before this is merged, I need to benchmark the changes made.

  The main change is that the fused datatypes are omnipresent and that the ndarray data type has been removed almost completely. So now the memoryviews are prevalent. This means that it is much simpler to track problems. There is no code duplication in the spin-box calculations.

  I am not sure whether the python-yellow annotations are only for the int/long variables, and when down-casting. In any case, I think this is easier to manage.

  Signed-off-by: Nick Papior <[email protected]>

* redid fold_csr_matrix for much better perf

  Simple benchmarks showed that converting siesta Hamiltonians to Hk can be much faster by changing how the folding of the matrices is done. Instead of incrementally adding elements, and searching for duplicates before each addition, we now build the entire array and use numpy.unique to reduce it. This leverages numpy.unique, which already returns a sorted array.

  It marginally slows down csr creation of matrices with few edges per orbital (TB models), but will be much faster for larger models stemming from DFT or the like.

  Tests for this commit:

      %timeit H.Hk()
      %timeit H.Hk([0.1] * 3)
      %timeit H.Hk(format="array")
      %timeit H.Hk([0.1] * 3, format="array")

  For a *many* edge system, we get:

      67.2 ms ± 1.51 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
      85.4 ms ± 8.81 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
      5.59 ms ± 426 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
      11.3 ms ± 39.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

  While for a *few* edge system, we get:

      9.1 ms ± 52.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
      9.25 ms ± 65.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
      5.75 ms ± 397 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
      6.17 ms ± 394 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

  For commit v0.15.1-57-g6bbbde39 we get, for the *many* edge system:

      196 ms ± 3.01 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
      214 ms ± 1.87 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
      6.58 ms ± 139 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
      12.8 ms ± 58.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

  and for the *few* edge system:

      7.41 ms ± 77.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
      7.37 ms ± 73.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
      6.04 ms ± 383 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
      5.81 ms ± 37 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

  Enabling fortran is necessary for it to populate details of the fortran world. This should enable simpler access to data.

  Signed-off-by: Nick Papior <[email protected]>

* fixed handling complex matrices in sisl

  Lots of the code was built for floats. This now fixes the issue of reading/writing sparse matrices in specific data-formats.

  It allows a more natural way of handling SOC matrices, with complex interplay, say:

      H[0, 0, 2] = Hud

  as a complex variable. When dealing with floats one needs to do this:

      H[0, 0, 2] = Hud.real
      H[0, 0, 3] = Hud.imag

  which is not super-intuitive.

  Currently there are still *many* hardcodings of the indices. We should strive to move these into a common framework to limit the problems they create.

  Tests have been added that check Hamiltonian eigenvalues and density-matrix Mulliken charges. So it seems to work as intended, but not everything has been fully tested.

* added explanation of how transform works

* added more tests for dtype conversion

  Fixed errors in col creation of dtypes when calling csr.diags().

  Ensured that sparsegeometry.finalize accepts any arguments that csr.finalize accepts.

  Removed casting errors in mat2dtype. This may *hide* potential problems when there are non-zero imaginary parts. We should perhaps revisit the problem later, considering that TB + Peierls has complex overlap matrices.

  Fixed issue when a construct was called with Python intrinsic complex values.

---------

Signed-off-by: Nick Papior <[email protected]>
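The complex-matrix item in the commit message contrasts two ways of assigning a spin-box element. The sketch below illustrates that contrast with plain NumPy arrays standing in for the sisl sparse matrix; the value of `Hud` and the array shapes are hypothetical, and only the index layout (2 for the complex value, 2 and 3 for its real and imaginary parts) is taken from the message.

```python
import numpy as np

# Hypothetical coupling value; not taken from any real Hamiltonian.
Hud = 0.3 + 0.2j

# Float storage: real and imaginary parts live in separate components.
# The last axis is only made large enough for the indices used here.
H_float = np.zeros((1, 1, 4), dtype=np.float64)
H_float[0, 0, 2] = Hud.real
H_float[0, 0, 3] = Hud.imag

# Complex storage: a single assignment carries both parts.
H_complex = np.zeros((1, 1, 3), dtype=np.complex128)
H_complex[0, 0, 2] = Hud

# Both layouts hold the same information.
recovered = H_float[0, 0, 2] + 1j * H_float[0, 0, 3]
print(np.isclose(recovered, H_complex[0, 0, 2]))
```

The float layout needs the caller to remember which component holds which part, which is the hardcoded-index problem the message mentions.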
1 parent 8559eda commit c6253c8

51 files changed: +3513 -4533 lines changed

CHANGELOG.md

Lines changed: 6 additions & 1 deletion
@@ -16,9 +16,14 @@ we hit release version 1.0.0.
 sisl.geom.graphene
 
 ### Fixed
-
 - `projection` arguments of several functions has been streamlined
 
+### Changed
+- internal Cython code for performance improvements.
+This yield significant perf. improvements for DFT sparse matrices
+with *many* edges in the sparse matrix, but a perf. hit for very
+small TB matrices.
+
 
 ## [0.15.2] - 2024-11-06
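The trade-off in the changelog entry comes from the folding strategy described in the commit message: build the whole array first, then reduce it with `numpy.unique`, which returns sorted output. The following is a minimal sketch of that idea only. The function name `fold_duplicates` and the key encoding are assumptions for illustration; the actual `fold_csr_matrix` routine is not shown in this excerpt, and whether it sums values or only reduces column indices is not visible here.

```python
import numpy as np


def fold_duplicates(rows, cols, vals):
    """Sum entries sharing a (row, col) pair; return them sorted by (row, col)."""
    rows = np.asarray(rows)
    cols = np.asarray(cols)
    vals = np.asarray(vals)
    n_cols = int(cols.max()) + 1 if cols.size else 1

    # Encode every (row, col) pair as one integer key.
    keys = rows.astype(np.int64) * n_cols + cols

    # One call finds the sorted unique keys and maps each entry to its slot.
    uniq, inverse = np.unique(keys, return_inverse=True)

    # Accumulate duplicates into their slot (unbuffered addition).
    summed = np.zeros(uniq.size, dtype=vals.dtype)
    np.add.at(summed, inverse, vals)

    return uniq // n_cols, uniq % n_cols, summed


rows_out, cols_out, vals_out = fold_duplicates(
    [0, 0, 1, 0], [1, 1, 0, 0], [1.0, 2.0, 3.0, 4.0]
)
print(rows_out, cols_out, vals_out)
```

A single sort over all entries has a fixed overhead, which is consistent with the small slowdown reported for matrices with few edges and the large gain for matrices with many.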

CMakeLists.txt

Lines changed: 14 additions & 13 deletions
@@ -65,7 +65,6 @@ add_compile_definitions(CYTHON_NO_PYINIT_EXPORT=1)
 #: lib, perhaps we should change this
 set(CMAKE_SHARED_MODULE_PREFIX "")
 
-
 # Determine whether we are in CIBUILDWHEEL
 # and whether we are building for the universal target
 set(_def_fortran TRUE)
@@ -81,6 +80,8 @@ option(WITH_FORTRAN
 
 # Define all options for the user
 if( WITH_FORTRAN )
+  enable_language(Fortran)
+
   set(F2PY_REPORT_ON_ARRAY_COPY 10
     CACHE STRING
     "The minimum (element) size of arrays before warning about copies")
@@ -209,6 +210,18 @@ if(WITH_FORTRAN)
 endif(WITH_FORTRAN)
 
 
+message(STATUS "Python variables:")
+list(APPEND CMAKE_MESSAGE_INDENT " ")
+
+cmake_print_variables(Python_INCLUDE_DIRS)
+cmake_print_variables(Python_NumPy_INCLUDE_DIRS)
+if(WITH_FORTRAN)
+  cmake_print_variables(Python_NumPy_F2Py_INCLUDE_DIR)
+endif()
+
+list(POP_BACK CMAKE_MESSAGE_INDENT)
+
+
 message(STATUS "sisl options")
 list(APPEND CMAKE_MESSAGE_INDENT " ")
 
@@ -230,18 +243,6 @@ endif()
 list(POP_BACK CMAKE_MESSAGE_INDENT)
 
 
-message(STATUS "Python variables:")
-list(APPEND CMAKE_MESSAGE_INDENT " ")
-
-cmake_print_variables(Python_INCLUDE_DIRS)
-cmake_print_variables(Python_NumPy_INCLUDE_DIRS)
-if(WITH_FORTRAN)
-  cmake_print_variables(Python_NumPy_F2Py_INCLUDE_DIR)
-endif()
-
-list(POP_BACK CMAKE_MESSAGE_INDENT)
-
-
 
 # Return in _result whether the _file should be built, or not
 # It checks whether the file is present in the NO_COMPILATION

Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Here we test and check the performance of the `Hk` implementation."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "from pathlib import Path\n",
+    "import numpy as np\n",
+    "import sisl as si\n",
+    "\n",
+    "files = Path(os.environ[\"SISL_FILES_TESTS\"])\n",
+    "siesta = files / \"siesta\"\n",
+    "\n",
+    "N = 10"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "H = si.Hamiltonian.read(siesta / \"Si_pdos_k\" / \"Si_pdos.TSHS\").tile(N, 0).tile(N, 1)\n",
+    "\n",
+    "%timeit H.Hk()\n",
+    "%timeit H.Hk([0.1] * 3)\n",
+    "%timeit H.Hk(format=\"array\")\n",
+    "%timeit H.Hk([0.1] * 3, format=\"array\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "H = si.Hamiltonian.read(siesta / \"Pt2_soc\" / \"Pt2_xx.TSHS\").tile(N, 0).tile(N // 2, 1)\n",
+    "\n",
+    "%timeit H.Hk()\n",
+    "%timeit H.Hk([0.1] * 3)\n",
+    "%timeit H.Hk(format=\"array\")\n",
+    "%timeit H.Hk([0.1] * 3, format=\"array\")"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.7"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
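The notebook relies on the IPython `%timeit` magic, which is unavailable in a plain Python script. As a rough alternative, the sketch below uses the standard-library `timeit` module. The helper name `bench` and its defaults are assumptions for illustration, and the workload is a stand-in so the block runs without sisl or the `SISL_FILES_TESTS` data.

```python
import statistics
import timeit


def bench(func, repeat=7, number=10):
    """Return (mean, std) in seconds per call, similar in spirit to %timeit."""
    # Each entry is the total time for `number` calls of func.
    totals = timeit.repeat(func, repeat=repeat, number=number)
    per_call = [total / number for total in totals]
    return statistics.mean(per_call), statistics.pstdev(per_call)


# Stand-in workload; replace with the call to be measured.
mean, std = bench(lambda: sum(range(1000)))
print(f"{mean * 1e6:.1f} µs ± {std * 1e6:.1f} µs per loop")
```

With a Hamiltonian loaded as in the notebook, the same helper could be called as `bench(lambda: H.Hk([0.1] * 3, format="array"))`.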

benchmarks/run.sh

Lines changed: 2 additions & 3 deletions
@@ -15,6 +15,5 @@ profile=$base.profile
 # Stats
 stats=$base.stats
 
-python -m cProfile -o $profile $script $@
-python stats.py $profile > $stats
-
+python3 -m cProfile -o $profile $script $@
+python3 stats.py $profile > $stats
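The script writes a cProfile dump and then turns it into a text report. The contents of `stats.py` are not shown in this excerpt, so the following is only a guess at an equivalent two-step flow using the standard-library `cProfile` and `pstats` modules. The function name `profile_to_text` and the demo workload are hypothetical.

```python
import cProfile
import io
import os
import pstats
import tempfile


def profile_to_text(func, profile_path, top=10):
    """Profile func, dump the raw stats to profile_path, return a text report."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    # Step 1: write the binary profile (like `python3 -m cProfile -o ...`).
    profiler.dump_stats(profile_path)

    # Step 2: load the dump and format the most expensive entries.
    buffer = io.StringIO()
    stats = pstats.Stats(profile_path, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


with tempfile.TemporaryDirectory() as tmp:
    report = profile_to_text(
        lambda: sorted(range(10000)), os.path.join(tmp, "demo.profile")
    )
print(report)
```

Keeping the dump and the report as separate steps, as the shell script does, lets the same profile be re-sorted later without re-running the benchmark.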

benchmarks/run3.sh

Lines changed: 0 additions & 20 deletions
This file was deleted.

src/sisl/CMakeLists.txt

Lines changed: 7 additions & 0 deletions
@@ -1,3 +1,9 @@
+set_property(DIRECTORY
+  APPEND
+  PROPERTY INCLUDE_DIRECTORIES
+  ${CMAKE_CURRENT_SOURCE_DIR}/_core
+  )
+
 foreach(source _indices _math_small)
   add_cython_library(
     SOURCE ${source}.pyx
@@ -29,6 +35,7 @@ endforeach()
 get_directory_property( SISL_DEFINITIONS DIRECTORY
   ${CMAKE_CURRENT_SOURCE_DIR}
   COMPILE_DEFINITIONS )
+
 # Join to stringify list
 list(JOIN SISL_DEFINITIONS " " SISL_DEFINITIONS)

src/sisl/__init__.py

Lines changed: 2 additions & 2 deletions
@@ -88,6 +88,8 @@
 # import the common options used
 from ._common import *
 
+from ._core import *
+
 # Import warning classes
 # We currently do not import warn and info
 # as they are too generic names in case one does from sisl import *
@@ -106,8 +108,6 @@
 # Below are sisl-specific imports
 from .shape import *
 
-from ._core import *
-
 # Physical quantities and required classes
 from .physics import *

src/sisl/_core/CMakeLists.txt

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-foreach(source _lattice _sparse)
+foreach(source _lattice _dtypes _sparse)
   add_cython_library(
     SOURCE ${source}.pyx
     LIBRARY ${source}

src/sisl/_core/_dtypes.pxd

Lines changed: 102 additions & 0 deletions
@@ -0,0 +1,102 @@
+"""
+Shared header for fused dtypes
+"""
+cimport cython
+
+import numpy as np
+
+cimport numpy as cnp
+from numpy cimport (
+    complex64_t,
+    complex128_t,
+    float32_t,
+    float64_t,
+    int8_t,
+    int16_t,
+    int32_t,
+    int64_t,
+    uint8_t,
+    uint16_t,
+    uint32_t,
+    uint64_t,
+)
+
+# Generic typedefs for sisl internal naming convention
+ctypedef size_t size_st
+ctypedef Py_ssize_t ssize_st
+
+
+ctypedef fused ints_st:
+    int
+    long
+
+
+ctypedef fused floats_st:
+    float
+    double
+
+
+ctypedef fused complexs_st:
+    float complex
+    double complex
+
+
+ctypedef fused floatcomplexs_st:
+    float
+    double
+    float complex
+    double complex
+
+
+# We need this fused data-type to omit complex data-types
+ctypedef fused reals_st:
+    int
+    long
+    float
+    double
+
+ctypedef fused numerics_st:
+    int
+    long
+    float
+    double
+    float complex
+    double complex
+
+ctypedef fused _type2dtype_types_st:
+    short
+    int
+    long
+    float
+    double
+    float complex
+    double complex
+    float32_t
+    float64_t
+    #complex64_t # not usable...
+    #complex128_t
+    int8_t
+    int16_t
+    int32_t
+    int64_t
+    uint8_t
+    uint16_t
+    uint32_t
+    uint64_t
+
+
+cdef object type2dtype(const _type2dtype_types_st v)
+
+
+ctypedef fused _inline_sum_st:
+    short
+    int
+    long
+    int16_t
+    int32_t
+    int64_t
+    uint16_t
+    uint32_t
+    uint64_t
+
+cdef ssize_st inline_sum(const _inline_sum_st[::1] array) noexcept nogil
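Each fused type in this header names a set of C types for which Cython generates one specialization. As a rough run-time analogue in plain Python (not how Cython dispatches internally), the sketch below mirrors the main groups with NumPy dtype character codes and checks whether an array's dtype belongs to a group. The `FUSED_GROUPS` table and the `accepts` helper are hypothetical names introduced for illustration.

```python
import numpy as np

# NumPy character codes for the C types named in the header:
# "i" = int, "l" = long, "f" = float, "d" = double,
# "F" = float complex, "D" = double complex.
FUSED_GROUPS = {
    "ints_st": "il",
    "floats_st": "fd",
    "complexs_st": "FD",
    "floatcomplexs_st": "fdFD",
    "reals_st": "ilfd",
    "numerics_st": "ilfdFD",
}


def accepts(group, array):
    """Return True if the array's dtype equals one member of the group."""
    dtype = np.asarray(array).dtype
    return any(dtype == np.dtype(code) for code in FUSED_GROUPS[group])


print(accepts("floats_st", np.zeros(3, dtype=np.float32)))
print(accepts("reals_st", np.zeros(3, dtype=np.complex128)))
```

The width of C `long` depends on the platform, so which NumPy integer dtype matches `"l"` differs between, for example, 64-bit Linux and Windows.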
