ENH: use limited C API to produce abi3 wheels for with-GIL interpreters in release builds #828
base: main
Conversation
After fixing the benchmarks in gh-835, I got around to running some benchmarks. It looks like for very small image sizes there is quite a lot of overhead; for operations that take about 1 ms or more, the difference seems very small. Example: [...]

The benchmarks are old and seem to be written for fast runtime rather than for realism - I wouldn't expect a size-16 or 16x16 transform to be useful. For images, I'd say 256x256 to 4096x4096 would be most relevant, maybe 128x128 too.
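The original example output isn't preserved here, but as a rough sketch of the kind of 2-D timing being discussed, an asv-style suite could look like the block below. The class name, sizes, and wavelet choices are assumptions, not the project's actual benchmark code.

```python
import numpy as np
import pywt


class Dwt2SizeTimeSuite:
    # Hypothetical 2-D timing sketch over the "realistic" sizes mentioned above
    # (128x128 up to 4096x4096).
    params = ([128, 256, 1024, 4096], ['haar', 'sym8'])
    param_names = ('n', 'wavelet')

    def setup(self, n, wavelet):
        self.data = np.ones((n, n), dtype=np.float64)

    def time_dwt2(self, n, wavelet):
        pywt.dwt2(self.data, wavelet, mode='symmetric')
```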
More worryingly, the new [...] That should be in the first test in [...]. Error annotation: [...]
This will build an `abi3` wheel per platform, which can be used for multiple (with-GIL) Python interpreter versions.
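For background on the mechanism, the snippet below is a generic setuptools-style sketch of how a C extension opts into the Limited API. It is only an illustration of the technique: PyWavelets builds with Meson, and the macro value and flags shown are assumptions, not this PR's configuration.

```python
# Generic sketch of opting into the Limited API / stable ABI with setuptools.
# Illustration only: not the Meson-based configuration used by this project.
from setuptools import Extension, setup

ext = Extension(
    "example._ext",
    sources=["example/_ext.c"],
    # Restrict the extension to the Limited API; 0x030B0000 assumes a
    # CPython 3.11 floor (use the oldest version the wheel should support).
    define_macros=[("Py_LIMITED_API", "0x030B0000")],
    py_limited_api=True,
)

setup(name="example", version="0.0.0", ext_modules=[ext])
```

The wheel itself additionally needs the matching `abi3` tag (e.g. bdist_wheel's `--py-limited-api` option); a single such wheel then installs on every with-GIL CPython at or above the chosen floor.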
I haven't been able to reproduce that; however, it would be explained by the single issue that UBSan flagged (now fixed), see gh-836.
The 1-D benchmark:

```python
import numpy as np
import pywt
from pywt import Modes


class DwtSpanLengthsTimeSuite:
    params = ([16, 101, 256, 1024, 4096, 16384, 65536],
              ['haar', 'sym8'])
    param_names = ('n', 'wavelet')

    def setup(self, n, wavelet):
        self.data = np.ones(n, dtype=np.float64)

    def time_dwt(self, n, wavelet):
        pywt.dwt(self.data, wavelet, Modes.symmetric)
```

Result: [...]

The conclusion is similar though: really small data sees larger regressions, while from 500 µs or so it doesn't matter too much. 1-D transforms are just faster, so the relative decrease in performance is larger. We may just want to keep this for testing, or decide that the reduction in wheel builds is worth it and do PyPI releases with abi3 wheels.

@grlee77 do you have any thoughts here on how much performance matters for small data and operations that are <1 ms?
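For a quick sanity check outside the asv suite, the per-call cost can also be eyeballed directly with `timeit`; this is a hypothetical snippet for comparing a regular build against an abi3 build, not something included in the PR.

```python
# Hypothetical quick timing of pywt.dwt across sizes; run once per build
# (regular cp3xx wheel vs. abi3 wheel) and compare the per-call times.
import timeit

import numpy as np
import pywt

for n in (16, 256, 4096, 65536):
    data = np.ones(n, dtype=np.float64)
    per_call = timeit.timeit(lambda: pywt.dwt(data, 'sym8'), number=1000) / 1000
    print(f"n={n:6d}: {per_call * 1e6:8.2f} us per call")
```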
The default for a local build is still to produce `cp3xx` wheels; when passing a build flag it is now possible to opt into producing wheels for the Stable ABI. This reduces the number of wheels to build from 5 to 3 per platform.

TODO: benchmarking to ensure we don't lose a significant amount of performance.
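As a small, hypothetical way to double-check what a build produced (the `dist/` path and the use of the `packaging` library are assumptions, not part of this PR):

```python
# Hypothetical check: list the ABI tags of built wheels, e.g. to confirm
# that an opt-in build produced abi3 wheels rather than cp3xx ones.
from pathlib import Path

from packaging.utils import parse_wheel_filename

for whl in sorted(Path("dist").glob("*.whl")):
    _name, _version, _build, tags = parse_wheel_filename(whl.name)
    abi_tags = sorted({t.abi for t in tags})
    print(f"{whl.name}: abi tags = {abi_tags}")
```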