Releases/gcc 12 #65

Open
wants to merge 2,900 commits into
base: master

jacopobrusini

Support for Apple Silicon!!!

@jwakely
Contributor

jwakely commented Feb 21, 2024

This is an unofficial mirror that has nothing to do with the GCC project, so submitting pull requests here is a waste of time.

Also, I have no idea what this pull request is trying to do but it would never be accepted even if it was submitted to the right place.

NinaRanns pushed a commit to NinaRanns/gcc that referenced this pull request Jan 28, 2025
…on-r15-7214-g0710024b5bd861

Contracts nonattr rebase on r15 7214 g0710024b5bd861
GCC Administrator and others added 27 commits February 23, 2025 00:19
During combine we may end up with

(set (reg:DI 66 [ _6 ])
     (ashift:DI (reg:DI 72 [ x ])
                (subreg:QI (and:TI (reg:TI 67 [ _1 ])
                                   (const_wide_int 0x0aaaaaaaaaaaaaabf))
                           15)))

where the shift count operand does not trivially fit the scheme of
address operands.  Reject those operands, especially since
strip_address_mutations() expects expressions of the form
(and ... (const_int ...)) and fails for (and ... (const_wide_int ...)).

Thus, be more strict here and accept only CONST_INT operands.  Done by
replacing immediate_operand() with const_int_operand() which is enough
since the former only additionally checks for LEGITIMATE_PIC_OPERAND_P
and targetm.legitimate_constant_p which are always true for CONST_INT
operands.

While on it, fix indentation of the if block.

gcc/ChangeLog:

	PR target/118835
	* config/s390/s390.cc (s390_valid_shift_count): Reject shift
	count operands which do not trivially fit the scheme of
	address operands.

gcc/testsuite/ChangeLog:

	* gcc.target/s390/pr118835.c: New test.

(cherry picked from commit ac9806d)
Floating-point emulation in the D front-end is done via a type named
`struct longdouble`, which in GDC is a small interface around the
real_value type. Because the D code cannot include gcc/real.h directly,
a big enough buffer is used for the data instead.

On x86_64, this buffer is actually bigger than real_value itself, so
when a new longdouble object is created with

    longdouble r;
    real_from_string3 (&r.rv (), buffer, mode);
    return r;

there is uninitialized padding at the end of `r`.  This was never a
problem when D was implemented in C++ (until GCC 12) as comparing two
longdouble objects with `==' would be forwarded to the relevant
operator== overload that extracted the underlying real_value.

However when the front-end was translated to D, such conditions were
instead rewritten into identity comparisons

    return exp.toReal() is CTFloat.zero

The `is` operator gets lowered as a call to `memcmp() == 0', which is
where the read of uninitialized memory occurs, as seen by valgrind.

==26778== Conditional jump or move depends on uninitialised value(s)
==26778==    at 0x911F41: dmd.dstruct._isZeroInit(dmd.expression.Expression) (dstruct.d:635)
==26778==    by 0x9123BE: StructDeclaration::finalizeSize() (dstruct.d:373)
==26778==    by 0x86747C: dmd.aggregate.AggregateDeclaration.determineSize(ref const(dmd.location.Loc)) (aggregate.d:226)
[...]

To avoid accidentally reading uninitialized data, explicitly initialize
all `longdouble` variables with an empty constructor on the C++ side of the
implementation before initializing the underlying real_value they hold.
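
A minimal sketch of the idea, not the GDC code: Payload and Wrapper are
hypothetical stand-ins for real_value and longdouble, and the buffer is
made 8 bytes larger than the payload on purpose.  Zeroing the whole
buffer in the constructor makes a bytewise comparison depend only on
the payload.

```cpp
#include <cstring>

// Hypothetical stand-in for real_value; two equally sized members, so
// the struct itself has no internal padding.
struct Payload
{
  long long sign;
  long long mantissa;
};

// Hypothetical stand-in for longdouble: a byte buffer that is larger
// than the payload it stores.
struct Wrapper
{
  unsigned char buf[sizeof (Payload) + 8];

  // The "empty constructor": zero the whole buffer, tail included.
  Wrapper () { std::memset (buf, 0, sizeof buf); }

  void set (const Payload &p) { std::memcpy (buf, &p, sizeof p); }
};

// Models a bytewise identity comparison, i.e. memcmp () == 0.
bool identical (const Wrapper &a, const Wrapper &b)
{
  return std::memcmp (a.buf, b.buf, sizeof a.buf) == 0;
}

Wrapper make_wrapper (long long sign, long long mantissa)
{
  Wrapper w;                          // tail bytes are zeroed here
  w.set (Payload{sign, mantissa});    // then the payload is written
  return w;
}
```

With the zeroing constructor, two wrappers built from the same payload
always compare identical; without it the trailing 8 bytes would be
indeterminate.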

	PR d/116961

gcc/d/ChangeLog:

	* d-codegen.cc (build_float_cst): Change new_value type from real_t to
	real_value.
	* d-ctfloat.cc (CTFloat::fabs): Default initialize the return value.
	(CTFloat::ldexp): Likewise.
	(CTFloat::parse): Likewise.
	* d-longdouble.cc (longdouble::add): Likewise.
	(longdouble::sub): Likewise.
	(longdouble::mul): Likewise.
	(longdouble::div): Likewise.
	(longdouble::mod): Likewise.
	(longdouble::neg): Likewise.
	* d-port.cc (Port::isFloat32LiteralOutOfRange): Likewise.
	(Port::isFloat64LiteralOutOfRange): Likewise.

gcc/testsuite/ChangeLog:

	* gdc.dg/pr116961.d: New test.

(cherry picked from commit f7bc17e)
…ed in i3 [PR118739]

The combine pass is trying to combine:

Trying 16, 22, 21 -> 23:
   16: r104:QI=flags:CCNO>0
   22: {r120:QI=r104:QI^0x1;clobber flags:CC;}
      REG_UNUSED flags:CC
   21: r119:QI=flags:CCNO<=0
      REG_DEAD flags:CCNO
   23: {r110:QI=r119:QI|r120:QI;clobber flags:CC;}
      REG_DEAD r120:QI
      REG_DEAD r119:QI
      REG_UNUSED flags:CC

and creates the following two insn sequence:

modifying insn i2    22: r104:QI=flags:CCNO>0
      REG_DEAD flags:CC
deferring rescan insn with uid = 22.
modifying insn i3    23: r110:QI=flags:CCNO<=0
      REG_DEAD flags:CC
deferring rescan insn with uid = 23.

where the REG_DEAD note in i2 is not correct, because the flags
register is still referenced in i3.  In try_combine() megafunction,
we have this part:

--cut here--
    /* Distribute all the LOG_LINKS and REG_NOTES from I1, I2, and I3.  */
    if (i3notes)
      distribute_notes (i3notes, i3, i3, newi2pat ? i2 : NULL,
			elim_i2, elim_i1, elim_i0);
    if (i2notes)
      distribute_notes (i2notes, i2, i3, newi2pat ? i2 : NULL,
			elim_i2, elim_i1, elim_i0);
    if (i1notes)
      distribute_notes (i1notes, i1, i3, newi2pat ? i2 : NULL,
			elim_i2, local_elim_i1, local_elim_i0);
    if (i0notes)
      distribute_notes (i0notes, i0, i3, newi2pat ? i2 : NULL,
			elim_i2, elim_i1, local_elim_i0);
    if (midnotes)
      distribute_notes (midnotes, NULL, i3, newi2pat ? i2 : NULL,
			elim_i2, elim_i1, elim_i0);
--cut here--

where the compiler distributes REG_UNUSED note from i2:

   22: {r120:QI=r104:QI^0x1;clobber flags:CC;}
      REG_UNUSED flags:CC

via distribute_notes() using the following:

--cut here--
	  /* Otherwise, if this register is used by I3, then this register
	     now dies here, so we must put a REG_DEAD note here unless there
	     is one already.  */
	  else if (reg_referenced_p (XEXP (note, 0), PATTERN (i3))
		   && ! (REG_P (XEXP (note, 0))
			 ? find_regno_note (i3, REG_DEAD,
					    REGNO (XEXP (note, 0)))
			 : find_reg_note (i3, REG_DEAD, XEXP (note, 0))))
	    {
	      PUT_REG_NOTE_KIND (note, REG_DEAD);
	      place = i3;
	    }
--cut here--

The flags register is used in I3, but there is already a REG_DEAD note
in I3.  The above condition therefore doesn't trigger and control
continues in the "else" part, where the REG_DEAD note is put on I2.  The
proposed solution corrects the above logic to trigger every time the
register is referenced in I3, avoiding the "else" part.

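A toy model of the decision described above, under the assumption that
only two facts matter: whether the register is referenced in I3 and
whether I3 already carries a REG_DEAD note.  The function names and the
Place enum are hypothetical, and all other branches of
distribute_notes () are left out.

```cpp
// Where the note ends up in this simplified model.
enum class Place { None, I2, I3 };

// Old logic as described: the I3 branch is taken only when I3 has no
// REG_DEAD note yet; otherwise control falls into the "else" part.
Place place_note_old (bool referenced_in_i3, bool i3_has_reg_dead)
{
  if (referenced_in_i3 && !i3_has_reg_dead)
    return Place::I3;
  return Place::I2;
}

// Corrected logic in this model: a reference in I3 always takes the I3
// branch; a new note is only added if I3 does not have one already.
Place place_note_new (bool referenced_in_i3, bool i3_has_reg_dead)
{
  if (referenced_in_i3)
    return i3_has_reg_dead ? Place::None : Place::I3;
  return Place::I2;
}
```

For the case from the PR (referenced in I3, note already present) the
old model returns Place::I2, which is the bogus REG_DEAD on I2, while
the new model places nothing.
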
	PR rtl-optimization/118739

gcc/ChangeLog:

	* combine.cc (distribute_notes) <case REG_UNUSED>: Correct the
	logic when the register is used by I3.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr118739.c: New test.

(cherry picked from commit a92dc3f)
Uros' r15-7793 fixed this PR as well, I'm just committing tests
from the PR so that it can be closed.

2025-03-04  Jakub Jelinek  <[email protected]>

	PR rtl-optimization/119071
	* gcc.dg/pr119071.c: New test.
	* gcc.c-torture/execute/pr119071.c: New test.

(cherry picked from commit ccf9db9)
…485)

Commit r9-4307-g89d7557202d25a forgot to accept a fixed PIC register
when extending the assert in require_pic_register.

arm_pic_register can be set explicitly by the user
(e.g. -mpic-register=r9) or implicitly as the default value with
-fpic/-fPIC/-fPIE and -mno-pic-data-is-text-relative -mlong-calls, and
we want to use/accept it when recording cfun->machine->pic_reg as used
to be the case.

	PR target/115485
	gcc/
	* config/arm/arm.cc (require_pic_register): Fix typos in
	comment. Handle fixed arm_pic_register.

	gcc/testsuite/
	* g++.target/arm/pr115485.C: New test.

(cherry picked from commit b1d0ac2)
The testcase contains a VNx2QImode pseudo that is live across a call
and that cannot be allocated a call-preserved register.  LRA quite
reasonably tried to save it before the call and restore it afterwards.
Unfortunately, the target told it to do that in SImode, even though
punning between SImode and VNx2QImode is disallowed by both
TARGET_CAN_CHANGE_MODE_CLASS and TARGET_MODES_TIEABLE_P.

The natural class to use for SImode is GENERAL_REGS, so this led
to an unsalvageable situation in which we had:

  (set (subreg:VNx2QI (reg:SI A) 0) (reg:VNx2QI B))

where A needed GENERAL_REGS and B needed FP_REGS.  We therefore ended
up in a reload loop.

The hooks above should ensure that this situation can never occur
for incoming subregs.  It only happened here because the target
explicitly forced it.

The decision to use SImode for modes smaller than 4 bytes dates
back to the beginning of the port, before 16-bit floating-point
modes existed.  I'm not sure whether promoting to SImode really
makes sense for any FPR, but that's a separate performance/QoI
discussion.  For now, this patch just disallows using SImode
when it is wrong for correctness reasons, since that should be
safer to backport.

gcc/
	PR testsuite/116238
	* config/aarch64/aarch64.cc (aarch64_hard_regno_caller_save_mode):
	Only return SImode if we can convert to and from it.

gcc/testsuite/
	PR testsuite/116238
	* gcc.target/aarch64/sve/pr116238.c: New test.

(cherry picked from commit ec9d6d4)
The svwhilele folder mishandled the degenerate case in which
the second argument is the maximum integer.  In that case,
the result is all-true regardless of the first parameter:

  If the second scalar operand is equal to the maximum signed integer
  value then a condition which includes an equality test can never fail
  and the result will be an all-true predicate.

This is because the conceptual "increment the first operand
by 1 after each element" is done modulo the range of the operand.
The GCC code was instead treating it as infinite precision.
whilele_5.c even had a test for the incorrect behaviour.

The easiest fix seemed to be to handle that case specially before
doing constant folding.  This also copes with variable first operands.
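
The wrap-around can be illustrated with a scalar model.  This is only a
sketch for 32-bit signed operands; whilele_model and its lane count
parameter are hypothetical and are not part of the ACLE or of GCC.

```cpp
#include <cstdint>
#include <vector>

// Lane i is true while (first + i) <= second, where the increment is
// done modulo 2^32 rather than in infinite precision.
std::vector<bool> whilele_model (std::int32_t first, std::int32_t second,
                                 unsigned lanes)
{
  std::vector<bool> pred (lanes, false);
  std::uint32_t cur = static_cast<std::uint32_t> (first);
  for (unsigned i = 0; i < lanes; ++i)
    {
      // Compare as signed.  The conversion back to int32_t is modular
      // in C++20 and behaves that way on GCC for older standards too.
      if (static_cast<std::int32_t> (cur) > second)
        break;          // this lane and all later lanes stay false
      pred[i] = true;
      ++cur;            // unsigned increment wraps instead of overflowing
    }
  return pred;
}
```

With second equal to INT32_MAX the comparison can never fail, so every
lane is true whatever the first operand is.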

gcc/
	PR target/116999
	PR target/117045
	* config/aarch64/aarch64-sve-builtins-base.cc
	(svwhilelx_impl::fold): Check for WHILELTs of the minimum value
	and WHILELEs of the maximum value.  Fold them to all-false and
	all-true respectively.

gcc/testsuite/
	PR target/116999
	PR target/117045
	* gcc.target/aarch64/sve/acle/general/whilele_5.c: Fix bogus
	expected result.
	* gcc.target/aarch64/sve/acle/general/whilele_11.c: New test.
	* gcc.target/aarch64/sve/acle/general/whilele_12.c: Likewise.

(cherry picked from commit 50e7c51)
There was an embarrassing typo in the folding of BIT_NOT_EXPR for
POLY_INT_CSTs: it used - rather than ~ on the poly_int.  Not sure
how that happened, but it might have been due to the way that
~x is implemented as -1 - x internally.
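
To make the difference concrete, here is a small sketch with a
two-coefficient polynomial c0 + c1 * X.  Poly2 and the helper names are
hypothetical; the code only relies on the identity ~x == -1 - x.

```cpp
// Hypothetical two-coefficient value: c0 + c1 * X.
struct Poly2
{
  long c0;
  long c1;
};

// Negation: -(c0 + c1 * X).
Poly2 poly_neg (Poly2 p) { return { -p.c0, -p.c1 }; }

// Bitwise not via ~x == -1 - x: (-1 - c0) + (-c1) * X.
Poly2 poly_not (Poly2 p) { return { -1 - p.c0, -p.c1 }; }

// Evaluate the polynomial for a concrete X.
long poly_eval (Poly2 p, long x) { return p.c0 + p.c1 * x; }
```

For any X, evaluating poly_not (p) gives ~poly_eval (p, X), whereas
poly_neg (p) is off by one, which is the kind of wrong result the typo
produced.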

gcc/
	PR tree-optimization/118976
	* fold-const.cc (const_unop): Use ~ rather than - for BIT_NOT_EXPR.
	* config/aarch64/aarch64.cc (aarch64_test_sve_folding): New function.
	(aarch64_run_selftests): Run it.

(cherry picked from commit 78380fd)
An optimization was added in GDC-12 which sets the TREE_READONLY flag on
all local variables with the storage class `const' assigned.  For some
reason, const is also being added by the front-end to `__result'
variables in non-virtual functions, which ends up generating wrong code
when the gimplify pass promotes the local to static storage.

A bug has been raised upstream, as this looks like an error in the AST.
For now, turn off setting TREE_READONLY on all result variables.

	PR d/119139

gcc/d/ChangeLog:

	* decl.cc (get_symbol_decl): Don't set TREE_READONLY for __result
	declarations.

gcc/testsuite/ChangeLog:

	* gdc.dg/pr119139.d: New test.

(cherry picked from commit 81582ca)
jakubjelinek and others added 30 commits June 13, 2025 13:10
…decisions [PR119327]

The following testcase FAILs because the always_inline function can't
be inlined.
Like other targets, the rs6000 backend has a hook which rejects
inlining that would bring in new ISAs which aren't available in the caller.
And this hook rejects this because of OPTION_MASK_SAVE_TOC_INDIRECT
differences.
This flag is set if explicitly requested or by default depending on
whether the current function looks hot (or at least not cold):
  if ((rs6000_isa_flags_explicit & OPTION_MASK_SAVE_TOC_INDIRECT) == 0
      && flag_shrink_wrap_separate
      && optimize_function_for_speed_p (cfun))
    rs6000_isa_flags |= OPTION_MASK_SAVE_TOC_INDIRECT;
The target nodes that are being compared here are actually the default
target node (which was created when cfun was NULL) vs. one that was
created for the always_inline function when it wasn't NULL, so one
doesn't have it, the other does.
In any case, this flag feels like a tuning decision rather than a hard
ISA requirement and I see no reason why we couldn't inline
even an explicit -msave-toc-indirect function into a -mno-save-toc-indirect
one or vice versa.
We already ignore OPTION_MASK_P{8,10}_FUSION which are also more
like tuning flags.
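
A sketch of the general shape of such a check, assuming the hook boils
down to "the callee must not need anything the caller lacks".  The flag
names and bit values below are hypothetical and are not the real
OPTION_MASK_* definitions.

```cpp
#include <cstdint>

// Hypothetical flag bits for illustration only.
constexpr std::uint64_t FLAG_ISA_A = 1u << 0;
constexpr std::uint64_t FLAG_ISA_B = 1u << 1;
constexpr std::uint64_t FLAG_TUNING = 1u << 2;

// Inlining is allowed when every callee flag, after dropping the ones
// treated as tuning decisions, is also set in the caller.
bool can_inline_model (std::uint64_t caller_flags,
                       std::uint64_t callee_flags,
                       std::uint64_t ignored_flags)
{
  std::uint64_t needed = callee_flags & ~ignored_flags;
  return (needed & ~caller_flags) == 0;
}
```

Adding a flag to ignored_flags only ever permits more inlining, which
is why it is appropriate for flags that do not change which
instructions may be used.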

2025-04-22  Jakub Jelinek  <[email protected]>

	PR target/119327
	* config/rs6000/rs6000.cc (rs6000_can_inline_p): Ignore also
	OPTION_MASK_SAVE_TOC_INDIRECT differences.

	* g++.dg/opt/pr119327.C: New test.

(cherry picked from commit 4b62cf5)
We need to drop the kind argument from what is passed to the
library, but need to do it not only when one uses the argument name
for it (so kind=4 etc.) but also when one passes all the arguments
to the intrinsics.

The following patch uses what gfc_conv_intrinsic_findloc uses,
which looks more efficient and cleaner, we already set automatic
vars to point to the kind and back actual arguments, so we can just
free/clear expr on the former and set name to "%VAL" on the latter.

And similarly clears dim argument for the BT_CHARACTER case when using
maxloc2/minloc2, again regardless of whether it was named or not.

2025-05-13  Jakub Jelinek  <[email protected]>
	    Daniil Kochergin  <[email protected]>
	    Tobias Burnus  <[email protected]>

	PR fortran/120191
	* trans-intrinsic.cc (strip_kind_from_actual): Remove.
	(gfc_conv_intrinsic_minmaxloc): Don't call strip_kind_from_actual.
	Free and clear kind_arg->expr if non-NULL.  Set back_arg->name to
	"%VAL" instead of a loop looking for last argument.  Remove actual
	variable, use array_arg instead.  Free and clear dim_arg->expr if
	non-NULL for BT_CHARACTER cases instead of using a loop.

	* gfortran.dg/pr120191_1.f90: New test.

(cherry picked from commit ec249be)
I've tried to write a testcase for the BT_CHARACTER maxloc/minloc with named
or unnamed arguments and indeed the just posted patch fixed the arguments
in there in multiple cases to match what the library expects.
But the testcase still fails, due to library problems.

One dealt with in this patch are _gfortran_s{max,min}loc2_{4,8,16}_s{1,4}
functions.  Those are trivial wrappers around
_gfortrani_{max,min}loc2_{4,8,16}_s{1,4} which should call those functions
if the scalar mask is true and just return 0 otherwise.
I see two bugs there.  The first is that the back and len arguments are
swapped, which means that it always acts as back=.true. and for len will
use a character length of 1 or 0 instead of the desired one.
The _gfortrani_{max,min}loc2_{4,8,16}_s{1,4} functions have prototypes like
GFC_INTEGER_4
maxloc2_4_s1 (gfc_array_s1 * const restrict array, GFC_LOGICAL_4 back, gfc_charlen_type len)
so back comes before len, ditto for the
GFC_INTEGER_4
smaxloc2_4_s1 (gfc_array_s1 * const restrict array,
               GFC_LOGICAL_4 *mask, GFC_LOGICAL_4 back, gfc_charlen_type len)
The other problem is that it was just testing if (mask).  In my limited
Fortran understanding that means that the optional argument mask was
supplied but nothing about its actual value.  Other scalar mask generated
routines use if (mask == NULL || *mask) as the condition when to call the
non-masked function, i.e. when mask is not supplied (then it should act like
a .true. mask) or when it is supplied and evaluates to .true.
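
A simplified model of the wrapper and the function it forwards to.  The
*_model names, the flat character buffer and the count parameter are
hypothetical; only the argument order (back before len) and the mask
condition mirror what is described above.

```cpp
#include <cstddef>
#include <cstring>

// Returns the 1-based index of the greatest of `count` strings of
// length `len` stored back to back, or 0 if there are none.  With
// back set, the last of several equal maxima wins.
long maxloc2_model (const char *data, std::size_t count,
                    bool back, std::size_t len)
{
  if (count == 0)
    return 0;
  std::size_t best = 0;
  for (std::size_t i = 1; i < count; ++i)
    {
      int c = std::memcmp (data + i * len, data + best * len, len);
      if (back ? c >= 0 : c > 0)
        best = i;
    }
  return static_cast<long> (best) + 1;
}

// Scalar-mask wrapper: forward when the mask is absent or true, and
// pass back and len in the order the callee expects.
long smaxloc2_model (const char *data, std::size_t count,
                     const bool *mask, bool back, std::size_t len)
{
  if (mask == nullptr || *mask)
    return maxloc2_model (data, count, back, len);
  return 0;
}
```

Testing only `if (mask)` would forward the call whenever a mask was
supplied, even when the supplied mask is false.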

2025-05-13  Jakub Jelinek  <[email protected]>

	PR fortran/120191
	* m4/maxloc2s.m4: For smaxloc2 call maxloc2 if mask is NULL or *mask.
	Swap back and len arguments.
	* m4/minloc2s.m4: Likewise.
	* generated/maxloc2_4_s1.c: Regenerate.
	* generated/maxloc2_4_s4.c: Regenerate.
	* generated/maxloc2_8_s1.c: Regenerate.
	* generated/maxloc2_8_s4.c: Regenerate.
	* generated/maxloc2_16_s1.c: Regenerate.
	* generated/maxloc2_16_s4.c: Regenerate.
	* generated/minloc2_4_s1.c: Regenerate.
	* generated/minloc2_4_s4.c: Regenerate.
	* generated/minloc2_8_s1.c: Regenerate.
	* generated/minloc2_8_s4.c: Regenerate.
	* generated/minloc2_16_s1.c: Regenerate.
	* generated/minloc2_16_s4.c: Regenerate.

	* gfortran.dg/pr120191_2.f90: New test.

(cherry picked from commit 482f219)
There is a bug in _gfortran_s{max,min}loc1_{4,8,16}_s{1,4} which the
following testcase shows.
The functions return but then crash in the caller.
That seems to be caused by a buffer overflow.  I believe that when the
if (mask == NULL || *mask) condition is false, those functions are supposed
to fill in the result array with all zeros (or allocate it and fill it with zeros).
My understanding is the result array in that case is integer(kind={4,8,16})
and should have the extents the character input array has.
The problem is that it uses * string_len in the extent multiplication:
      extent[n] = GFC_DESCRIPTOR_EXTENT(array,n) * string_len;
and
      extent[n] =
        GFC_DESCRIPTOR_EXTENT(array,n + 1) * string_len;
which is I guess fine and desirable for the extents of the character array,
but not for the extents of the destination array.  Yet the code uses
that extent array for that purpose (and no other purposes).
Here it uses it to set the dimensions for the case where it needs to
allocate (as well as size):
      for (n = 0; n < rank; n++)
        {
          if (n == 0)
            str = 1;
          else
            str = GFC_DESCRIPTOR_STRIDE(retarray,n-1) * extent[n-1];
          GFC_DIMENSION_SET(retarray->dim[n], 0, extent[n] - 1, str);
        }
Here it uses it for bounds checking of the destination:
      if (unlikely (compile_options.bounds_check))
        {
          for (n=0; n < rank; n++)
            {
              index_type ret_extent;

              ret_extent = GFC_DESCRIPTOR_EXTENT(retarray,n);
              if (extent[n] != ret_extent)
                runtime_error ("Incorrect extent in return value of"
                               " MAXLOC intrinsic in dimension %ld:"
                               " is %ld, should be %ld", (long int) n + 1,
                               (long int) ret_extent, (long int) extent[n]);
            }
        }
and here to find out how many retarray elements to actually fill in each
dimension:
  while(1)
    {
      *dest = 0;
      count[0]++;
      dest += dstride[0];
      n = 0;
      while (count[n] == extent[n])
        {
          /* When we get to the end of a dimension, reset it and increment
             the next dimension.  */
          count[n] = 0;
          /* We could precalculate these products, but this is a less
             frequently used path so probably not worth it.  */
          dest -= dstride[n] * extent[n];
Seems maxloc1s.m4 and minloc1s.m4 are the only users of ifunction-s.m4,
so we can change SCALAR_ARRAY_FUNCTION in there without breaking anything
else.
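
To see the size of the overrun, the number of result elements that the
zero-fill loop touches can be modelled as the product of the extents.
This is only a sketch; zero_fill_count and its scale parameter are
hypothetical, with scale 1 standing for the fixed code and scale equal
to string_len for the old behaviour.

```cpp
#include <cstddef>
#include <vector>

// Number of integer result elements written when each extent is
// multiplied by `scale` before use.
std::size_t zero_fill_count (const std::vector<std::size_t> &extents,
                             std::size_t scale)
{
  std::size_t total = 1;
  for (std::size_t e : extents)
    total *= e * scale;
  return total;
}
```

For a 3 x 2 result and a string length of 4, the fixed code writes 6
elements while the old scaling writes 96, far past the end of the
destination.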

2025-05-13  Jakub Jelinek  <[email protected]>

	PR fortran/120191
	* m4/ifunction-s.m4 (SCALAR_ARRAY_FUNCTION): Don't multiply
	GFC_DESCRIPTOR_EXTENT(array,) by string_len.
	* generated/maxloc1_4_s1.c: Regenerate.
	* generated/maxloc1_4_s4.c: Regenerate.
	* generated/maxloc1_8_s1.c: Regenerate.
	* generated/maxloc1_8_s4.c: Regenerate.
	* generated/maxloc1_16_s1.c: Regenerate.
	* generated/maxloc1_16_s4.c: Regenerate.
	* generated/minloc1_4_s1.c: Regenerate.
	* generated/minloc1_4_s4.c: Regenerate.
	* generated/minloc1_8_s1.c: Regenerate.
	* generated/minloc1_8_s4.c: Regenerate.
	* generated/minloc1_16_s1.c: Regenerate.
	* generated/minloc1_16_s4.c: Regenerate.

	* gfortran.dg/pr120191_3.f90: New test.

(cherry picked from commit 781cfc4)
As mentioned in the PR, _gfortran_{,m,s}findloc2_s{1,4} iterate too many
times in the back case if nothing is found.
For !back, the loops are for (i = 1; i <= extent; i++) so i is in the
body [1, extent] if nothing is found, but for back it is
for (i = extent; i >= 0; i--) so i is in the body [0, extent] and compares
one element before the start of the array.
Note, findloc1_s{1,4} uses
          for (n = len; n > 0; n--, src -= delta * len_array)
for the back loop and
          for (n = 1; n <= len; n++, src += delta * len_array)
for !back.  This patch fixes that.
The testcase fails under valgrind without the libgfortran changes and
succeeds with those.
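
The loop bounds can be sketched on a plain integer array.
findloc2_model is a hypothetical stand-in, not the libgfortran routine;
it keeps the 1-based index convention so that 0 means "not found".

```cpp
#include <cassert>

// Returns the 1-based position of `value` in a[0 .. extent-1], or 0.
int findloc2_model (const int *a, int extent, int value, bool back)
{
  assert (extent >= 0);
  if (back)
    {
      // i > 0, not i >= 0: i == 0 would read a[-1].
      for (int i = extent; i > 0; i--)
        if (a[i - 1] == value)
          return i;
    }
  else
    {
      for (int i = 1; i <= extent; i++)
        if (a[i - 1] == value)
          return i;
    }
  return 0;
}
```

In both directions the body now sees i in [1, extent] only, so a failed
search never touches the element before the start of the array.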

2025-05-13  Jakub Jelinek  <[email protected]>

	PR libfortran/120196
	* m4/ifindloc2.m4 (header1, header2): For back use i > 0 rather than
	i >= 0 as for condition.
	* generated/findloc2_s1.c: Regenerate.
	* generated/findloc2_s4.c: Regenerate.

	* gfortran.dg/pr120196.f90: New test.

(cherry picked from commit 748a7bc)
The UB on the following testcase isn't diagnosed by -fsanitize=address,
because we see that the array has a single element and optimize the
strlen to 0.  I think it is fine to assume e.g. for range purposes the
lower bound for the strlen as long as we don't try to optimize
strlen (str)
where we know that it returns [26, 42] to
26 + strlen (str + 26), but for the upper bound we really want to punt
on optimizing that for -fsanitize=address to read all the bytes of the
string and diagnose if we run to object end etc.

2024-02-06  Jakub Jelinek  <[email protected]>

	PR sanitizer/110676
	* gimple-fold.cc (gimple_fold_builtin_strlen): For -fsanitize=address
	reset maxlen to sizetype maximum.

	* gcc.dg/asan/pr110676.c: New test.

(cherry picked from commit d3eac7d)
mark_vtable_entries already has

   /* It's OK for the vtable to refer to deprecated virtual functions.  */
   warning_sentinel w(warn_deprecated_decl);

but that doesn't cover __attribute__((unavailable)).  We can use the
following override to cover both.
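
The general pattern is a scoped override of a global state variable.
The sketch below is generic C++ and does not reproduce GCC's own
helpers; ScopedOverride, DiagState and diag_state are hypothetical
names.

```cpp
// Sets a variable to a new value and restores the old one on scope exit.
template <typename T>
class ScopedOverride
{
  T &ref_;
  T saved_;

public:
  ScopedOverride (T &ref, T value) : ref_ (ref), saved_ (ref)
  {
    ref_ = value;
  }
  ~ScopedOverride () { ref_ = saved_; }
  ScopedOverride (const ScopedOverride &) = delete;
  ScopedOverride &operator= (const ScopedOverride &) = delete;
};

// Hypothetical diagnostic state consulted before warning.
enum class DiagState { Normal, Suppress };
DiagState diag_state = DiagState::Normal;

bool would_warn () { return diag_state == DiagState::Normal; }

// Models a region, such as vtable marking, in which no deprecated or
// unavailable diagnostics should be issued.
bool warns_while_marking_vtable ()
{
  ScopedOverride<DiagState> guard (diag_state, DiagState::Suppress);
  return would_warn ();
}
```

Because the restore happens in the destructor, the state is reset on
every exit path from the region.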

	PR c++/116606

gcc/cp/ChangeLog:

	* decl2.cc (mark_vtable_entries): Temporarily override deprecated_state to
	UNAVAILABLE_DEPRECATED_SUPPRESS.  Remove a warning_sentinel.

gcc/testsuite/ChangeLog:

	* g++.dg/ext/attr-unavailable-13.C: New test.

(cherry picked from commit d9d34f9)
…for methods with such attributes [PR116636]

On the following testcase, we emit false positive warnings/errors about using
the deprecated or unavailable methods when creating thunks for them, even
when nothing (in the testcase so far) actually used those.

The following patch temporarily disables those diagnostics when creating
the thunks.

2024-09-12  Jakub Jelinek  <[email protected]>

	PR c++/116636
	* method.cc: Include decl.h.
	(use_thunk): Temporarily change deprecated_state to
	UNAVAILABLE_DEPRECATED_SUPPRESS.

	* g++.dg/warn/deprecated-19.C: New test.

(cherry picked from commit 4026d89)
The following testcases are miscompiled on s390x-linux, because the
doloop_optimize
  /* Ensure that the new sequence doesn't clobber a register that
     is live at the end of the block.  */
  {
    bitmap modified = BITMAP_ALLOC (NULL);

    for (rtx_insn *i = doloop_seq; i != NULL; i = NEXT_INSN (i))
      note_stores (i, record_reg_sets, modified);

    basic_block loop_end = desc->out_edge->src;
    bool fail = bitmap_intersect_p (df_get_live_out (loop_end), modified);
check doesn't work as intended.
The problem is that it uses df, but the df analysis was only done using
  iv_analysis_loop_init (loop);
->
  df_analyze_loop (loop);
which computes df only on the bbs inside the loop.
While loop_end bb is inside of the loop, df_get_live_out computed that
way includes registers set in the loop and used at the start of the next
iteration, but doesn't include registers set in the loop (or before the
loop) and used after the loop.

The following patch fixes that by doing a whole-function df_analyze first,
changing the loop iteration mode from 0 to LI_ONLY_INNERMOST (on many
targets which use the can_use_doloop_if_innermost target hook and so are
known to only handle innermost loops) or LI_FROM_INNERMOST (I think only bfin
actually allows non-innermost loops) and checking not just
df_get_live_out (loop_end) (that is needed for something used by the
next iteration), but also df_get_live_in (desc->out_edge->dest),
i.e. what will be used after the loop.  df of such a bb shouldn't
be affected by the df_analyze_loop and so should be from df_analyze
of the whole function.
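
The extended check can be modelled with register sets as bitsets.  This
is a sketch only; RegSet and doloop_must_punt are hypothetical names
and the 64-register width is arbitrary.

```cpp
#include <bitset>

using RegSet = std::bitset<64>;

// The new sequence must not clobber anything that is live out of the
// loop end (used by the next iteration) or live into the exit
// destination (used after the loop).
bool doloop_must_punt (const RegSet &modified,
                       const RegSet &live_out_loop_end,
                       const RegSet &live_in_exit_dest)
{
  return (modified & live_out_loop_end).any ()
         || (modified & live_in_exit_dest).any ();
}
```

A register that is only used after the loop shows up in the second set
alone, which is the case the original single check missed.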

2024-12-05  Jakub Jelinek  <[email protected]>

	PR rtl-optimization/113994
	PR rtl-optimization/116799
	* loop-doloop.cc: Include targhooks.h.
	(doloop_optimize): Also punt on intersection of modified
	with df_get_live_in (desc->out_edge->dest).
	(doloop_optimize_loops): Call df_analyze.  Use
	LI_ONLY_INNERMOST or LI_FROM_INNERMOST instead of 0 as
	second loops_list argument.

	* gcc.c-torture/execute/pr116799.c: New test.
	* g++.dg/torture/pr113994.C: New test.

(cherry picked from commit 0eed816)
The following testcase is miscompiled because the RTL representation
of a bt{l,q} insn followed by e.g. j{c,nc} is misleading about what it
actually does.
Let's look e.g. at
(define_insn_and_split "*jcc_bt<mode>"
  [(set (pc)
        (if_then_else (match_operator 0 "bt_comparison_operator"
                        [(zero_extract:SWI48
                           (match_operand:SWI48 1 "nonimmediate_operand")
                           (const_int 1)
                           (match_operand:QI 2 "nonmemory_operand"))
                         (const_int 0)])
                      (label_ref (match_operand 3))
                      (pc)))
   (clobber (reg:CC FLAGS_REG))]
  "(TARGET_USE_BT || optimize_function_for_size_p (cfun))
   && (CONST_INT_P (operands[2])
       ? (INTVAL (operands[2]) < GET_MODE_BITSIZE (<MODE>mode)
          && INTVAL (operands[2])
               >= (optimize_function_for_size_p (cfun) ? 8 : 32))
       : !memory_operand (operands[1], <MODE>mode))
   && ix86_pre_reload_split ()"
  "#"
  "&& 1"
  [(set (reg:CCC FLAGS_REG)
        (compare:CCC
          (zero_extract:SWI48
            (match_dup 1)
            (const_int 1)
            (match_dup 2))
          (const_int 0)))
   (set (pc)
        (if_then_else (match_op_dup 0 [(reg:CCC FLAGS_REG) (const_int 0)])
                      (label_ref (match_dup 3))
                      (pc)))]
{
  operands[0] = shallow_copy_rtx (operands[0]);
  PUT_CODE (operands[0], reverse_condition (GET_CODE (operands[0])));
})
The define_insn part in RTL describes exactly what it does,
jumps to op3 if bit op2 in op1 is set (for op0 NE) or not set (for op0 EQ).
The problem is with what it splits into.
put_condition_code %C1 for CCCmode comparisons emits c for EQ and LTU,
nc for NE and GEU and ICEs otherwise.
CCCmode is used mainly for carry out of add/adc, borrow out of sub/sbb,
in those cases e.g. for add we have
(set (reg:CCC flags) (compare:CCC (plus:M x y) x))
and use (ltu (reg:CCC flags) (const_int 0)) for carry set and
(geu (reg:CCC flags) (const_int 0)) for carry not set.  These cases
model in RTL what is actually happening, compare in infinite precision
x from the result of finite precision addition in M mode and if it is
less than unsigned (i.e. overflow happened), carry is set.
Another use of CCCmode is in UNSPEC_* patterns, those are used with
(eq (reg:CCC flags) (const_int 0)) for carry set and ne for unset,
given the UNSPEC this is no big deal, since the middle-end doesn't know
what set or unset means.
But for the bt{l,q}; j{c,nc} case the above splits it into
(set (reg:CCC flags) (compare:CCC (zero_extract) (const_int 0)))
for bt and
(set (pc) (if_then_else (eq (reg:CCC flags) (const_int 0)) (label_ref) (pc)))
for the bit set case (so that the jump expands to jc) and ne for
the bit not set case (so that the jump expands to jnc).
Similarly for the different splitters for cmov and set{c,nc} etc.
The problem is that when the middle-end reads this RTL, it reads it as
the exact opposite.  If zero_extract is 1, flags is set
to the comparison of 1 and 0 and that would mean using ne in the
if_then_else, and vice versa.

So, in order to better describe in RTL what is actually happening,
one possibility would be to swap the behavior of put_condition_code
and use NE + LTU -> c and EQ + GEU -> nc rather than the current
EQ + LTU -> c and NE + GEU -> nc; and adjust everything.  The
following patch uses a more limited approach, instead of representing
bt{l,q}; j{c,nc} case as written above it uses
(set (reg:CCC flags) (compare:CCC (const_int 0) (zero_extract)))
and
(set (pc) (if_then_else (ltu (reg:CCC flags) (const_int 0)) (label_ref) (pc)))
which uses the existing put_condition_code but describes what the
insns actually do in RTL clearly.  If zero_extract is 1,
then flags are LTU, 0U < 1U, if zero_extract is 0, then flags are GEU,
0U >= 0U.  The patch adjusts the *bt<mode> define_insn and all the
splitters to it and its comparisons/conditional moves/setXX.
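
The arithmetic behind the new representation can be checked with plain
unsigned comparisons.  The helper names below are hypothetical; bt_bit
models the zero_extract of one bit and the other two functions model
how each RTL form reads when taken literally.

```cpp
#include <cstdint>

// The bit that bt copies into the carry flag.
unsigned bt_bit (std::uint64_t value, unsigned pos)
{
  return static_cast<unsigned> ((value >> (pos & 63)) & 1u);
}

// New form: compare 0 against the extracted bit and test LTU.
// 0U < 1U when the bit is set, 0U >= 0U when it is clear.
bool new_form_ltu (std::uint64_t value, unsigned pos)
{
  return 0u < bt_bit (value, pos);
}

// Old form read literally: compare the bit against 0 and test EQ,
// which is true exactly when the bit is clear.
bool old_form_eq (std::uint64_t value, unsigned pos)
{
  return bt_bit (value, pos) == 0u;
}
```

new_form_ltu agrees with the carry flag for every input, while
old_form_eq is its exact opposite, which is the mismatch described
above.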

2025-02-10  Jakub Jelinek  <[email protected]>

	PR target/118623
	* config/i386/i386.md (*bt<mode>): Represent bt as
	compare:CCC of const0_rtx and zero_extract rather than
	zero_extract and const0_rtx.
	(*jcc_bt<mode>): Likewise.  Use LTU and GEU as flags test
	instead of EQ and NE.
	(*jcc_bt<mode>_1): Likewise.
	(*jcc_bt<mode>_mask): Likewise.
	(Help combine recognize bt followed by cmov splitter): Likewise.
	(*bt<mode>_setcqi): Likewise.
	(*bt<mode>_setncqi): Likewise.
	(*bt<mode>_setnc<mode>): Likewise.

	* gcc.c-torture/execute/pr118623.c: New test.

(cherry picked from commit 9214201)
…lier implicit instantiation [PR113976]

Already previously instantiated const variable templates had
cp_apply_type_quals_to_decl called when they were instantiated,
but if they need runtime initialization, their TREE_READONLY flag
has been subsequently cleared.
Explicit variable template instantiation calls grokdeclarator which
calls cp_apply_type_quals_to_decl on them again, setting TREE_READONLY
flag again, but nothing clears it afterwards, so we emit such
instantiations into rodata sections and segfault when the dynamic
initialization attempts to initialize them.

The following patch fixes that by not calling cp_apply_type_quals_to_decl
on already instantiated variable declarations.

2024-02-28  Jakub Jelinek  <[email protected]>
	    Patrick Palka  <[email protected]>

	PR c++/113976
	* decl.cc (grokdeclarator): Don't call cp_apply_type_quals_to_decl
	on DECL_TEMPLATE_INSTANTIATED VAR_DECLs.

	* g++.dg/cpp1y/var-templ87.C: New test.

(cherry picked from commit 29ac924)
The following testcase ICEs since r15-1579 (addition of late combiner),
because *clrmem_short can't be split.
The problem is that the define_insn uses
   (use (match_operand 1 "nonmemory_operand" "n,a,a,a"))
   (use (match_operand 2 "immediate_operand" "X,R,X,X"))
   (clobber (match_scratch:P 3 "=X,X,X,&a"))
and define_split assumed that if operands[1] is const_int_operand,
match_scratch will be always scratch, and it will be reg only if
it was the last alternative where operands[1] is a reg.
The pattern doesn't guarantee that, though.  Of course RA will not
uselessly assign a reg there if it is not needed, but during RA
on the testcase below we match the last alternative, and then the
late combiner comes and propagates const_int 3 into operands[1].  That
matches fine: match_scratch matches either scratch or reg and the constraint
in that case is X for the first variant, so it is still just fine.  But we
won't split that because the splitters only expect scratch.

The following patch fixes it by using match_scratch instead of scratch,
so that it accepts either.

2025-04-17  Jakub Jelinek  <[email protected]>

	PR target/119834
	* config/s390/s390.md (define_split after *cpymem_short): Use
	(clobber (match_scratch N)) instead of (clobber (scratch)).  Use
	(match_dup 4) and operands[4] instead of (match_dup 3) and operands[3]
	in the last of those.
	(define_split after *clrmem_short): Use (clobber (match_scratch N))
	instead of (clobber (scratch)).
	(define_split after *cmpmem_short): Likewise.

	* g++.target/s390/pr119834.C: New test.

(cherry picked from commit 22fe83d)
This got broken by r13-9727 and was fixed by either r13-9729 or
r13-9728.

2025-05-30  Jakub Jelinek  <[email protected]>

	PR target/120480
	* gcc.dg/pr120480.c: New test.

(cherry picked from commit c13d5b9)
If expand_binop_directly fails to add a REG_EQUAL note it tries to
unwind and restart.  But it can unwind too far if expand_binop changed
some of the operands before calling it.  We don't need to unwind that
far anyway since we should end up taking exactly the same route next
time, just without a target rtx.

To fix this we remove LAST from the argument list and let the callers
(all in expand_binop) do their own unwinding if the call fails.
Instead we unwind just as far as the entry to expand_binop_directly
and recurse within this function instead of all the way back up.
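
The control flow described above can be sketched with a toy model.
This is not GCC code: InsnStream, expand_directly, expand and the
note_ok flag are hypothetical stand-ins for the insn chain,
expand_binop_directly, expand_binop and the REG_EQUAL note failure.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the insn stream.
using InsnStream = std::vector<std::string>;

// Toy analogue of expand_binop_directly.  It records the stream position
// on entry; if attaching the note fails (note_ok == false) it deletes only
// what it emitted itself and retries without a target.
bool expand_directly(InsnStream &insns, const std::string &op,
                     bool want_target, bool note_ok) {
    if (op.empty())
        return false;                       // no pattern: fail, emit nothing
    const std::size_t entry = insns.size(); // "last insn" on entry
    insns.push_back(op + (want_target ? " -> target" : " -> temp"));
    if (want_target && !note_ok) {
        insns.resize(entry);                // unwind only to our own entry
        return expand_directly(insns, op, /*want_target=*/false, note_ok);
    }
    return true;
}

// Toy analogue of expand_binop.  It may emit operand setup before calling
// the helper, and does its own unwinding if the helper fails outright.
bool expand(InsnStream &insns, const std::string &op, bool note_ok) {
    const std::size_t last = insns.size();
    insns.push_back("prepare operands");    // must survive the helper's retry
    if (expand_directly(insns, op, /*want_target=*/true, note_ok))
        return true;
    insns.resize(last);                     // delete back to LAST
    return false;
}
```

In this model the "prepare operands" entry survives a retry, which is
the property the old, further-reaching unwind did not preserve.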

gcc/ChangeLog:

	PR middle-end/117811
	* optabs.cc (expand_binop_directly): Remove LAST as an argument,
	instead record the last insn on entry.  Only delete insns if
	we need to restart and restart by calling ourself, not expand_binop.
	(expand_binop): Update callers to expand_binop_directly.  If it
	fails to expand the operation, delete back to LAST.

gcc/testsuite:

	PR middle-end/117811
	* gcc.dg/torture/pr117811.c: New test.

(cherry picked from commit 7679b82)
	PR middle-end/117811
	PR testsuite/52641
gcc/testsuite/
	* gcc.dg/torture/pr117811.c: Fix for int < 32 bit.

(cherry picked from commit 07f229c)
r8-7538 for PR84968 made strip_typedefs_expr diagnose STATEMENT_LIST
so that we reject statement-expressions in noexcept-specifiers to
match our behavior in template arguments (which the parser diagnoses
directly).

Later r11-7452 made decltype(auto) deduction canonicalize the expression
(as an implementation detail) which in turn calls strip_typedefs_expr,
and so ever since we inadvertently reject decltype(auto) deduction of a
statement-expression.

This patch just removes the diagnostic in strip_typedefs_expr and instead
treats statement-expressions similarly to lambda-expressions.  The function
doesn't seem like the right place for such a diagnostic, so it seems
easier to just accept them rather than try to reject them in a suitable place.

	PR c++/116418

gcc/cp/ChangeLog:

	* tree.cc (strip_typedefs_expr) <case STATEMENT_LIST>: Replace
	this error path with ...
	<case STMT_EXPR>: ... this, returning the original tree.

gcc/testsuite/ChangeLog:

	* g++.dg/eh/pr84968.C: No longer expect an ahead of time diagnostic
	for the statement-expression.  Instantiate the template and expect
	an incomplete type error instead.
	* g++.dg/ext/stmtexpr26.C: New test.

Reviewed-by: Jason Merrill <[email protected]>
(cherry picked from commit 12bdcc3)
r13-6452-g341e6cd8d603a3 made build_extra_args walk evaluated contexts
first so that we prefer processing a local specialization in an evaluated
context even if its first use is in an unevaluated context.  But this
means we need to avoid walking a tree that already has extra args/specs
saved because the list of saved specs appears to be an evaluated
context which we'll now walk first.  It seems then that we should be
calculating the saved specs from scratch each time, rather than
potentially walking the saved specs list from an earlier partial
instantiation when calling build_extra_args a second time around.

	PR c++/114303

gcc/cp/ChangeLog:

	* constraint.cc (tsubst_requires_expr): Clear
	REQUIRES_EXPR_EXTRA_ARGS before calling build_extra_args.
	* pt.cc (tree_extra_args): Define.
	(extract_locals_r): Assert *_EXTRA_ARGS is empty.
	(tsubst_stmt) <case IF_STMT>: Clear IF_SCOPE on the new
	IF_STMT.  Call build_extra_args on the new IF_STMT instead
	of t which might already have IF_STMT_EXTRA_ARGS.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp1z/constexpr-if-lambda6.C: New test.

Reviewed-by: Jason Merrill <[email protected]>
(cherry picked from commit b262b17)
Here when checking the access of (the injected-class-name) B in c->B::m
at parse time, we notice its context B (now the type) is a base of the
object type C<T>, so we proceed to use C<T> as the effective qualifying
type.  But this C<T> is the dependent specialization, not the primary
template type, so it has empty TYPE_BINFO, which leads to a segfault later
from perform_or_defer_access_check.

The DERIVED_FROM_P (B, C<T>) test guarding this code path works despite
C<T> having empty TYPE_BINFO because of its currently_open_class
logic (added in r9-713-gd9338471b91bbe), which replaces a dependent
specialization with the primary template type if we're inside it.  So the
safest fix seems to be to call currently_open_class in the caller as well.

	PR c++/116320

gcc/cp/ChangeLog:

	* semantics.cc (check_accessibility_of_qualified_id): Try
	currently_open_class when using the object type as the
	effective qualifying type.

gcc/testsuite/ChangeLog:

	* g++.dg/template/access42.C: New test.

Reviewed-by: Jason Merrill <[email protected]>
(cherry picked from commit 484f139)
Here we end up ICEing at instantiation time for the call to
f<local_static> ultimately because we wrongly consider the call to be
non-dependent, and so we specialize f ahead of time and then get
confused when fully substituting this specialization.

The call is dependent due to [temp.dep.temp]/3 and we miss that because
function template-id arguments aren't coerced until overload resolution,
and so the local static template argument lacks an implicit cast to
reference type that value_dependent_expression_p looks for before
considering dependence of the address.  Other kinds of template-ids aren't
affected since they're coerced ahead of time.

So when considering dependence of a function template-id, we need to
conservatively consider dependence of the address of each argument (if
applicable).
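
One way to see why the address matters, assuming the local static lives
in a templated function as the description implies: each specialization
owns a distinct local static object, so its address cannot be known
until the enclosing template's arguments are.  The sketch below uses a
hypothetical function template, local_static_address, and deliberately
avoids the f<local_static> call itself so that it also builds with
compilers lacking the fix.

```cpp
#include <cassert>

// Hypothetical function template.  The local static below is a separate
// object in every specialization, so its address depends on T.
template <typename T>
const int *local_static_address() {
    static int local_static = 0;
    return &local_static;
}
```

Comparing local_static_address<int>() with local_static_address<long>()
yields two different pointers, while repeated calls for the same T yield
the same one.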

	PR c++/117792

gcc/cp/ChangeLog:

	* pt.cc (type_dependent_expression_p): Consider the dependence
	of the address of each template argument of a function
	template-id.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp1z/nontype7.C: New test.

Reviewed-by: Jason Merrill <[email protected]>
(cherry picked from commit 40f0f6a)