Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Require C11 #12660

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Require C11 #12660

wants to merge 2 commits into from

Conversation

devreal
Copy link
Contributor

@devreal devreal commented Jul 9, 2024

Error out if the compiler does not support C11 and limit the standard to C11 if the compiler accepts a standard flag. This limit prevents us from using features of newer standard versions.

Also removes support for __thread since we now rely on C11 _Thread_local.

Please check my m4 code, not an expert on this.

See #12658 (review) for notes from the RM discussion and the PR that initiated this.

There will likely be more changes over time replacing pre-C11 features with C11 features where possible but I wanted to keep this manageable.

@jsquyres where should I put a note about this for users?

@devreal devreal requested review from jsquyres and bosilca July 9, 2024 21:12
Copy link
Member

@bosilca bosilca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for 5.x

@devreal
Copy link
Contributor Author

devreal commented Jul 9, 2024

Mhh, CI fails with this error:

  LEX      keyval_lex.c
  CC       keyval_lex.lo
keyval_lex.c: In function ‘opal_util_keyval_yy_init_buffer’:
keyval_lex.c:1848:48: error: implicit declaration of function ‘fileno’ [-Werror=implicit-function-declaration]
 1848 |         b->yy_is_interactive = file ? (isatty( fileno(file) ) > 0) : 0;
      |                                                ^~~~~~
cc1: some warnings being treated as errors

I tried adding POSIX feature test macros but that doesn't seem to help. There seems to be a fix in flex from 7 years ago: westes/flex#263 Maybe that just hasn't trickled down to the distributions yet... I cannot reproduce this problem on my Mac, unfortunately.

@bosilca
Copy link
Member

bosilca commented Jul 9, 2024

Add -D _POSIX_C_SOURCE=20240101L" to libopalutilkeyval_la_CFLAGS` in opal/util/keyval/Makefile.am ?

I guess you will need to so the same for all lex/yacc outputs

config/opal_setup_cc.m4 Outdated Show resolved Hide resolved
@devreal devreal force-pushed the require-c11 branch 3 times, most recently from 8714b5e to c6ba95b Compare July 10, 2024 14:32
AC_MSG_NOTICE([using $flag to ensure C11 support])
break
fi
done
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need to put a check here to ensure that some flag was found and that C11 is, indeed, supported?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do error out if C11 is not supported (the block above). This is just adding the C11 flag as a restriction (GCC will eventually default to C23 so we could accidentally sneak in C23 code that). If we don't find a flag for some reason we can just move on.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah -- so there's no case where an additional flag is needed, but we can't find that flag? If that's true, then this is good enough.

I ask because $opal_cv_c11_flag_required is set to true if we try to compile a C11 program (i.e., checking __STDC_VERSION__) with no additional flags and it fails.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The OPAL_PROG_CC_C11 macro sets the -std=c11 parameter if it's needed for C11 support and then tries to pick the right flag. If we don't find a suitable flag we error out. If C11 support is available without a flag then we get here and try to find one to restrict ourselves. So this block is somewhat optional and we I'm ok with removing it if we don't want to restrict ourselves to C11.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, gotcha. Yes, if this is redundant code, let's ditch it.

Per your other point: I can't think of a reason to restrict ourselves to C11 -- can you? If neither of us can think of one, I think we're clear to remove this block.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mhh, I was thinking we should protect ourselves from unintentionally introducing C23 code once compilers default to that but maybe that risk is low. I removed that part of configure.

@devreal devreal force-pushed the require-c11 branch 3 times, most recently from 9a54aae to 4ec6936 Compare July 15, 2024 14:52
@devreal
Copy link
Contributor Author

devreal commented Jul 16, 2024

Not sure why CI fails. All tests seem green but at the end it's marked as failed. Can someone please point me to the error?

@devreal devreal changed the title Require and limit features to C11 Require C11 Jul 17, 2024
@devreal
Copy link
Contributor Author

devreal commented Jul 17, 2024

Got it. Tests show green even if they failed in Jenkins... The culprit is our tests on RHEL 7.9, which still ships GCC 4.8. GCC added support for _Atomic, _Thread_local, and _Generic in 4.9 [1] and we are currently testing for that as part of our C11 test. We are not using _Atomic by default and _Generic is used in one place, optionally (the displacement/size wrappers for collectives). This PR makes _Thread_local the only option for TLS and removes the detection of __thread. All other features of C11 appear to be available in GCC 4.8, including __STDC_VERSION__ == 201112L.

I see three options:

  1. Drop support for RHEL 7.9 and its prehistoric default compiler. According to [2] RHEL 7.9 is on extended support until 2028.
  2. Separate out detection of _Atomic, _Generic, and _Thread_local, bring back the fallback to __thread, and require all other features of C11 by checking for __STDC_VERSION__ == 201112L (since GCC 4.7). That will give us access to some of the things C11 introduced without checking for all features explicitly, including _Static_assert, _Noreturn, stdalign.h, and anonymous structs.
  3. Drop this PR and bring it back post 2028.

[1] https://gcc.gnu.org/wiki/C11Status
[2] https://en.wikipedia.org/wiki/Red_Hat_Enterprise_Linux

@wenduwan
Copy link
Contributor

Drop support for RHEL 7.9 and its prehistoric default compiler.

The extended supported is a paid subscription IIUC so I don't think we can strictly secure the test environment. FYI EFA is dropping support for CentOS 7 and RHEL 7.

@devreal
Copy link
Contributor Author

devreal commented Jul 17, 2024

That's an important data point, thanks @wenduwan. I put this topic on the list for the next devel call.

@devreal
Copy link
Contributor Author

devreal commented Sep 10, 2024

It looks like the Mac OS X environment is borked. It fails to find strsep (even though <string.h> is included) and fails on some internal header:

/Applications/Xcode_15.4.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/signal.h:69:42: error: use of undeclared identifier 'NSIG'
extern __const char *__const sys_signame[NSIG];
                                         ^

@rhc54
Copy link
Contributor

rhc54 commented Sep 10, 2024

Take a gander at @wenduwan comment in the slack "general" thread - looks like GitHub is making some significant changes to the Mac "actions" environment.

@devreal
Copy link
Contributor Author

devreal commented Sep 10, 2024

@rhc54 Thanks for the pointer. According to https://github.blog/changelog/2024-08-19-notice-of-upcoming-deprecations-and-breaking-changes-in-github-actions-runners/ the macos-12 are phased out but we're using macos-14-arm64 so I don't think we're affected here.

@hjelmn
Copy link
Member

hjelmn commented Oct 8, 2024

Awesome! +1000

Copy link

github-actions bot commented Oct 8, 2024

Hello! The Git Commit Checker CI bot found a few problems with this PR:

1a86e0b: Disable silent-rules on CI

  • check_cherry_pick: contains a cherry pick message that refers to a commit that exists, but is in an as-yet unmerged pull request: d4410ad

Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!

1 similar comment
Copy link

github-actions bot commented Oct 9, 2024

Hello! The Git Commit Checker CI bot found a few problems with this PR:

1a86e0b: Disable silent-rules on CI

  • check_cherry_pick: contains a cherry pick message that refers to a commit that exists, but is in an as-yet unmerged pull request: d4410ad

Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!

@devreal
Copy link
Contributor Author

devreal commented Oct 9, 2024

I removed the verbose output and rebased onto current main. All tests pass now. I think this is good to go so please re-review.

@bwbarrett
Copy link
Member

@devreal Do you still want to pull these changes into main for the next major release? I think we're all on the same page that we want to do this.

@devreal
Copy link
Contributor Author

devreal commented Dec 16, 2024

Thanks for the nudge @bwbarrett. I rebased, hope all tests pass this time.

@devreal
Copy link
Contributor Author

devreal commented Dec 17, 2024

All tests passed. Please rereview and merge if happy with it.

@hppritcha
Copy link
Member

could you squash down to fewer commits - maybe one?

Error out if the compiler does not support C11.

Removes support for `__thread` since we now rely on C11 `_Thread_local`.

*Auxilliary fixes required to build with C11:*

Replace typeof() with size_t in loop header
All uses of the `PREPARE_REQUESTS_WITH_NO_FREE` macro map to size_t.

Add feature test macro _POSIX_C_SOURCE to enable fileno()
Flex prior to 2.6.6 uses fileno() but does not define the needed
feature test macros.
See westes/flex#263 for details.

Add feature test macro to enable pthread_mutexattr_settype()

Enable nanosleep with _POSIX_C_SOURCE
Move calls to nanosleep out of headers and add the feature test macro
_POSIX_C_SOURCE to enable its use. This should not cause
any significant overheads.

Reduce_local test: Add feature test macros
Needed for strsep(), posix_memalign(), and getopt().
Also fix the min/max macros. There is no need to cache the variables
as these macros are used without producing side-effects.

Include ompi_config.h in reduce_local.c

Signed-off-by: Joseph Schuchart <[email protected]>
@devreal
Copy link
Contributor Author

devreal commented Dec 17, 2024

Sorry, forgot about that. Squashed down to one commit and kept the details of the changes in the commit message.

@bwbarrett
Copy link
Member

I actually disagree with Howard on the squash, but what's done is done.

__typeof__(b) _b = (b); \
_a < _b ? _a : _b; \
})
#define max(a, b) (a) > (b) ? (a) : (b)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

max() isn't used, and min() looks like it isn't used in a way where this implementation breaks, so I'm ok with it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But why ? The original code was correct, compliant with C11. Why changing it to a worst version ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typeof is not C11 and the generic types aren't needed here.

Copy link
Member

@jsquyres jsquyres left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I totally missed that you asked where we should notify users.

I took the liberty of pushing a commit to this PR with the corresponding docs changes (including a new v6.0.x changelog which features a single line so far: OMPI now requires a C11-compliant compiler).

@devreal
Copy link
Contributor Author

devreal commented Dec 17, 2024

Thanks @jsquyres!

@janjust The Nvidia CI seems to be running low on disk space.

Copy link
Member

@bosilca bosilca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too many unnecessary things not related to the scope of this PR were added inside.

__typeof__(b) _b = (b); \
_a < _b ? _a : _b; \
})
#define max(a, b) (a) > (b) ? (a) : (b)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But why ? The original code was correct, compliant with C11. Why changing it to a worst version ?

@@ -20,7 +20,7 @@ static int vprotocol_pessimist_request_no_free(ompi_request_t **req) {
}

#define PREPARE_REQUESTS_WITH_NO_FREE(count, requests) do { \
for (typeof(count) i = 0; i < count; i++) \
for (size_t i = 0; i < count; i++) \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this related to the scope of this PR (aka. require C11) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typeof is not C11.

@@ -224,7 +224,7 @@ static inline opal_list_item_t *opal_fifo_pop_atomic(opal_fifo_t *fifo)
if (++attempt == 5) {
/* deliberately suspend this thread to allow other threads to run. this should
* only occur during periods of contention on the lifo. */
_opal_lifo_release_cpu();
opal_lifo_release_cpu();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this related to the scope of this PR (aka. require C11) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This had to be outlined to limit the scope of the _POSIX_C_SOURCE feature test macro needed for nanosleep.

@@ -31,3 +35,16 @@ static void opal_lifo_construct(opal_lifo_t *lifo)
}

OBJ_CLASS_INSTANCE(opal_lifo_t, opal_object_t, opal_lifo_construct, NULL);


void opal_lifo_release_cpu(void)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this related to the scope of this PR (aka. require C11) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above, outlined for the _POSIX_C_SOURCE feature test macro.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants