Add P7/8/9/10 divide/modulo doubleword operations to libraries. #188

Merged (1 commit) on Feb 16, 2024

munroesj52
Contributor

POWER10 added vector divide, divide-extended, and modulo instructions. This patch adds static and IFUNC-enabled dynamically callable functions to the PVECLIB executable library. The additions are restricted to the subset of operations that have significant code size or runtime, which is basically the divide/modulo operations: vec_divdud, vec_diveud, vec_divqud, vec_divud, vec_moddud, vec_modud.
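To make the operation names above concrete, here is a minimal scalar model in C of what the doubleword divide/modulo functions are assumed to compute per element. It is a sketch, not PVECLIB code: the `model_` names and the two-element struct are hypothetical stand-ins for the library's vector types, the argument order is an assumption, and `unsigned __int128` is a GCC/Clang extension. The three-operand forms are assumed to take the 128-bit dividend as the concatenation of corresponding doublewords of the first two arguments, which matches their placement under VEC_RESOLVER_3. vec_divqud is left out because its result layout is not described here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a vector of two unsigned doublewords. */
typedef struct { uint64_t dw[2]; } model_vui64_t;

/* Assumed model of vec_divud: element-wise 64-bit / 64-bit quotient. */
static model_vui64_t model_divud(model_vui64_t y, model_vui64_t z)
{
  model_vui64_t q;
  for (int i = 0; i < 2; i++) {
    assert(z.dw[i] != 0);            /* divide by zero is not modelled */
    q.dw[i] = y.dw[i] / z.dw[i];
  }
  return q;
}

/* Assumed model of vec_modud: element-wise 64-bit % 64-bit remainder. */
static model_vui64_t model_modud(model_vui64_t y, model_vui64_t z)
{
  model_vui64_t r;
  for (int i = 0; i < 2; i++) {
    assert(z.dw[i] != 0);
    r.dw[i] = y.dw[i] % z.dw[i];
  }
  return r;
}

/* Assumed model of vec_divdud: (x:y) / z, a 128-bit dividend per element.
   The quotient only fits in 64 bits when x < z. */
static model_vui64_t model_divdud(model_vui64_t x, model_vui64_t y,
                                  model_vui64_t z)
{
  model_vui64_t q;
  for (int i = 0; i < 2; i++) {
    assert(x.dw[i] < z.dw[i]);       /* also rejects z == 0 */
    unsigned __int128 n = ((unsigned __int128) x.dw[i] << 64) | y.dw[i];
    q.dw[i] = (uint64_t) (n / z.dw[i]);
  }
  return q;
}

/* Assumed model of vec_moddud: (x:y) % z, the matching remainder. */
static model_vui64_t model_moddud(model_vui64_t x, model_vui64_t y,
                                  model_vui64_t z)
{
  model_vui64_t r;
  for (int i = 0; i < 2; i++) {
    assert(x.dw[i] < z.dw[i]);
    unsigned __int128 n = ((unsigned __int128) x.dw[i] << 64) | y.dw[i];
    r.dw[i] = (uint64_t) (n % z.dw[i]);
  }
  return r;
}

/* Assumed model of vec_diveud: divide extended, (x << 64) / z. */
static model_vui64_t model_diveud(model_vui64_t x, model_vui64_t z)
{
  model_vui64_t zero = { { 0, 0 } };
  return model_divdud(x, zero, z);
}
```

Under these assumptions, dividing the per-element pairs (1:0) and (0:100) by 2 and 7 gives quotients 0x8000000000000000 and 14 with remainders 0 and 2. The 128-bit loop bodies illustrate why these operations have enough code size to be worth building once per target in a library rather than inlining everywhere.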

* configure: Autoreconf regenerated.
* Makefile.in: Automake regenerated.
* aclocal.m4: Automake regenerated.
* src/Makefile.am: Add vec_int64_runtime.c to dependency for PWR library builds. Add Automake regenerated.
* src/Makefile.in: Automake regenerated.

* src/pveclib/vec_int64_ppc.h: Update copyright dates. Various doxygen text updates for POWER10.
  (vec_mrgahd, vec_mrgald): New Forward function prototypes.
  (vec_divdud, vec_diveud, vec_divqud): New extern for dynamic library functions.
  (vec_divqud_inline): Doxygen text note about usage in quadword long division implementations.
  (vec_divqud_inline [(_ARCH_PWR10) && (__GNUC__ >= 12)]): POWER10 specific implementation using intrinsics.
  (vec_divud, vec_moddud, vec_modud): New extern for dynamic library functions.

* src/testsuite/arith128_test_i64.c [__VEC_PWR_IMP()]: Define extern and map unit tests to library implementations.
* src/testsuite/vec_int64_dummy.c: Update copyright dates. (test_vec_divqud [(_ARCH_PWR10) && (__GNUC__ >= 12)]): POWER10 specific implementation using intrinsics.
* src/testsuite/vec_perf_i128.c [__VEC_PWR_IMP()]: Define extern and map timed tests to library implementations.
* src/testsuite/vec_pwr10_dummy.c
  (test_vec_rlqi_64_PWR10, test_vec_slqi_64_PWR10, test_vec_sraqi_64_PWR10, test_vec_srqi_64_PWR10): New compile tests for doubleword shift/rotates special case.
  (test_vec_divqud_PWR10): New compile test. POWER10 specific implementation using intrinsics.

* src/vec_int64_runtime.c: New file.
* src/vec_runtime_DYN.c: Include <pveclib/vec_int64_ppc.h>.
  [VEC_INT64_LIB_LIST(_TARGET)]: Define list of int64 target specific functions.
  [PVECLIB_DISABLE_POWER7]: Add VEC_INT64_LIB_LIST (_PWR7). Add VEC_INT64_LIB_LIST (_PWR8).
  [PVECLIB_DISABLE_POWER9]: Add VEC_INT64_LIB_LIST (_PWR9).
  [PVECLIB_DISABLE_POWER10]: Add VEC_INT64_LIB_LIST (_PWR10).
  [VEC_RESOLVER_2]: vec_diveud, vec_divqud, vec_divud, vec_modud.
  [VEC_RESOLVER_3]: vec_divdud, vec_moddud.
* src/vec_runtime_PWR10.c: Include vec_int64_runtime.c.
* src/vec_runtime_PWR7.c: Include vec_int64_runtime.c.
* src/vec_runtime_PWR8.c: Include vec_int64_runtime.c.
* src/vec_runtime_PWR9.c: Include vec_int64_runtime.c.
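
The entries above describe one source file compiled once per target (each vec_runtime_PWRn.c includes vec_int64_runtime.c) and a resolver that binds each public name to one of those builds. The following is a portable sketch of that pattern, using a function pointer bound on first call as a stand-in for IFUNC. Every `model_`/`MODEL_` name is hypothetical, the ISA-level probe is a placeholder, and the library's real VEC_INT64_LIB_LIST and VEC_RESOLVER_2/3 macros are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t (*model_div_fn)(uint64_t y, uint64_t z);

/* Stamp out one copy of the operation per target by pasting a suffix onto
   the name. In a real build each copy would be compiled with a different
   -mcpu= setting; here the bodies are identical. */
#define MODEL_DIVUD_IMPL(_TARGET)                                   \
  static uint64_t model_divud##_TARGET(uint64_t y, uint64_t z)      \
  {                                                                 \
    assert(z != 0);                                                 \
    return y / z;                                                   \
  }

MODEL_DIVUD_IMPL(_PWR7)
MODEL_DIVUD_IMPL(_PWR8)
MODEL_DIVUD_IMPL(_PWR9)
MODEL_DIVUD_IMPL(_PWR10)

/* Hypothetical resolver: choose the newest build the CPU can run. */
static model_div_fn model_resolve_divud(int isa_level)
{
  if (isa_level >= 10) return model_divud_PWR10;
  if (isa_level >= 9)  return model_divud_PWR9;
  if (isa_level >= 8)  return model_divud_PWR8;
  return model_divud_PWR7;
}

/* Placeholder probe; a real resolver would query the hardware. */
static int model_detect_isa_level(void)
{
  return 7;
}

/* Public entry point: bind once on first use, then call through. */
static uint64_t model_divud(uint64_t y, uint64_t z)
{
  static model_div_fn bound = NULL;
  if (bound == NULL)
    bound = model_resolve_divud(model_detect_isa_level());
  return bound(y, z);
}
```

With a true IFUNC the binding is done by the dynamic linker at symbol resolution time, so callers pay no per-call check. The function-pointer form shown here is only meant to make the selection logic visible.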

Signed-off-by: Steven Munroe <[email protected]>
@munroesj52 munroesj52 requested a review from tuliom January 31, 2024 22:12
@munroesj52 munroesj52 self-assigned this Jan 31, 2024
@munroesj52
Contributor Author

Tulio, please take a look at this and issue #189. This patch is not bad (size-wise), but the int128 equivalent changes will be more complicated.

@munroesj52
Contributor Author

Seems Tulio is busy. I'll merge this so I can push quadword divide and work on Float128 divide round to odd.

@munroesj52 munroesj52 merged commit 7cddb17 into open-power-sdk:master Feb 16, 2024