-
Notifications
You must be signed in to change notification settings - Fork 132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TrueColor: implement optional tablified translucency #1233
base: master
Are you sure you want to change the base?
Conversation
For nicer look and better alphabetical sorting.
src/i_truecolor.c
Outdated
@@ -37,6 +37,100 @@ typedef union | |||
}; | |||
} tcpixel_t; | |||
|
|||
// [JN] LUTs to store precomputed values for all possible 512 color combinations | |||
static uint32_t blendAddLUT[512][512]; // Additive blending |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good already, thank you! Do you think we could get away with only two arrays, a "main" one and an "alt" one (which would be additive in Doom)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like this:
diff --git a/src/i_truecolor.c b/src/i_truecolor.c
index 21f9128b..b3b4b95a 100644
--- a/src/i_truecolor.c
+++ b/src/i_truecolor.c
@@ -38,9 +38,8 @@ typedef union
} tcpixel_t;
// [JN] LUTs to store precomputed values for all possible 512 color combinations
-static uint32_t blendAddLUT[512][512]; // Additive blending
static uint32_t blendOverLUT[512][512]; // Overlay blending
-static uint32_t blendOverAltLUT[512][512]; // Overlay "alt" blending
+static uint32_t blendAltLUT[512][512]; // Additive blending (Doom), Overlay "alt" blending
const uint32_t (*I_BlendAddFunc) (const uint32_t bg_i, const uint32_t fg_i);
const uint32_t (*I_BlendOverFunc) (const uint32_t bg_i, const uint32_t fg_i, const int amount);
@@ -104,23 +103,28 @@ void R_InitBlendMaps (GameMission_t mission)
fg.g = ((fg_index >> 3) & 0x07) << 5;
fg.b = (fg_index & 0x07) << 5;
- // Additive blending calculation
- retAdd.r = MIN(bg.r + fg.r, 0xFFU);
- retAdd.g = MIN(bg.g + fg.g, 0xFFU);
- retAdd.b = MIN(bg.b + fg.b, 0xFFU);
- blendAddLUT[bg_index][fg_index] = retAdd.i;
-
// Overlay blending calculation
retOver.r = (overlay_alpha * fg.r + (0xFFU - overlay_alpha) * bg.r) >> 8;
retOver.g = (overlay_alpha * fg.g + (0xFFU - overlay_alpha) * bg.g) >> 8;
retOver.b = (overlay_alpha * fg.b + (0xFFU - overlay_alpha) * bg.b) >> 8;
blendOverLUT[bg_index][fg_index] = retOver.i;
- // Overlay "alt" blending calculation
- retOver.r = (overlay_alt_alpha * fg.r + (0xFFU - overlay_alt_alpha) * bg.r) >> 8;
- retOver.g = (overlay_alt_alpha * fg.g + (0xFFU - overlay_alt_alpha) * bg.g) >> 8;
- retOver.b = (overlay_alt_alpha * fg.b + (0xFFU - overlay_alt_alpha) * bg.b) >> 8;
- blendOverAltLUT[bg_index][fg_index] = retOver.i;
+ if (overlay_alt_alpha == 0)
+ {
+ // Additive blending calculation
+ retAdd.r = MIN(bg.r + fg.r, 0xFFU);
+ retAdd.g = MIN(bg.g + fg.g, 0xFFU);
+ retAdd.b = MIN(bg.b + fg.b, 0xFFU);
+ blendAltLUT[bg_index][fg_index] = retAdd.i;
+ }
+ else
+ {
+ // Overlay "alt" blending calculation
+ retOver.r = (overlay_alt_alpha * fg.r + (0xFFU - overlay_alt_alpha) * bg.r) >> 8;
+ retOver.g = (overlay_alt_alpha * fg.g + (0xFFU - overlay_alt_alpha) * bg.g) >> 8;
+ retOver.b = (overlay_alt_alpha * fg.b + (0xFFU - overlay_alt_alpha) * bg.b) >> 8;
+ blendAltLUT[bg_index][fg_index] = retOver.i;
+ }
}
}
}
@@ -157,7 +161,7 @@ const uint32_t I_BlendAddLow (const uint32_t bg_i, const uint32_t fg_i)
bg_index = PixelToLUTIndex(bg);
fg_index = PixelToLUTIndex(fg);
- return blendAddLUT[bg_index][fg_index];
+ return blendAltLUT[bg_index][fg_index];
}
const uint32_t I_BlendDark (const uint32_t bg_i, const int d)
@@ -214,7 +218,7 @@ const uint32_t I_BlendOverAltLow (const uint32_t bg_i, const uint32_t fg_i, cons
bg_index = PixelToLUTIndex(bg);
fg_index = PixelToLUTIndex(fg);
- return blendOverAltLUT[bg_index][fg_index];
+ return blendAltLUT[bg_index][fg_index];
}
// [crispy] TRANMAP blending emulation, used for Doom
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I was thinking how to divide tablifying per game, but in the end chickened out because of:
- Possible hit 512x512 times of checking
if/else
conditions. - To avoid bloating
for
loop.
But looks likes it's not that scary, so sure, let's go this way. Most important question for this PR: how should we name menu item? Maybe something like:
- Translucency quality: 8bit / 32bit (or 8bpp / 32bpp? or low/high?)
- Blending mode: speed / quality
Hot-switch is fairly simple, we just have to kick R_InitBlendQuality
, even post-rendering doesn't seems to be not needed.
Finally, I have an itch that something else could be done. Current implementation helps to improve performance and get higher average framerate in -timedemo
runs, but results are not as high as expected. I'm thinking to play around with R_DrawTLColumn
by unrolling by four, @rfomin recommending to go farther by turning blending functions to inline
's, but it's tricky since we are using function pointers, except for BlendDark for fuzz drawing.
So maybe you can give few recommendations regarding where to think or dig? 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Translucency quality: 8bit / 32bit (or 8bpp / 32bpp? or low/high?)
That'd be 9bit, strictly speaking. 😉
- Blending mode: speed / quality
Translucency mode: speed / quality
I'd avoid using the term "blending" if we only call it "translucency" just one line above.
Finally, I have an itch that something else could be done. Current implementation helps to improve performance and get higher average framerate in
-timedemo
runs, but results are not as high as expected. I'm thinking to play around withR_DrawTLColumn
by unrolling by four,
Sure, there's so much left for optimization in Doom's source code.
@rfomin recommending to go farther by turning blending functions to
inline
's, but it's tricky since we are using function pointers, except for BlendDark for fuzz drawing.
Nope, we cannot inline function pointers, obviously.
I'm fine with the current implementation. Heck, I was even too lazy for the last ~10 years to write the code that you just contributed in this PR. 😁
If you find some code path that could use some further optimization, sure, please feel free to investigate it. But this is probably topic for another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rfomin recommending to go farther by turning blending functions to
inline
's, but it's tricky since we are using function pointers, except for BlendDark for fuzz drawing.Nope, we cannot inline function pointers, obviously.
I suggest making more functions (e.g. R_DrawTLColumnAdd
, R_DrawTLColumnOver
). Function calls on every pixel are expensive, I checked when implementing Woof's "function factory" (about ~30% performance hit).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dammit... There is just a small difference in performance comparing between 512x512 and 256x256 LUTs. Something really have to be done with column drawing functions.
Just in case, 256x256 LUT code, still much better than just 256 colors:
--- R:/GITHUB/crispy-doom/src/i_truecolor.c Thu Oct 31 22:15:20 2024
+++ R:/MSYS/home/Julia/crispy-doom/src/i_truecolor.c Fri Nov 1 23:33:47 2024
@@ -38,8 +38,8 @@
} tcpixel_t;
// [crispy] LUTs to store precomputed values for all possible 512x512 color combinations
-static uint32_t blendOverLUT[512][512]; // Overlay blending
-static uint32_t blendAltLUT[512][512]; // Additive blending (Doom), Overlay "alt" blending
+static uint32_t blendOverLUT[256][256]; // Overlay blending
+static uint32_t blendAltLUT[256][256]; // Additive blending (Doom), Overlay "alt" blending
const uint32_t (*I_BlendAddFunc) (const uint32_t bg_i, const uint32_t fg_i);
const uint32_t (*I_BlendOverFunc) (const uint32_t bg_i, const uint32_t fg_i, const int amount);
@@ -90,18 +90,18 @@
retAdd.a = 0xFFU;
retOver.a = 0xFFU;
- for (int bg_index = 0; bg_index < 512; ++bg_index)
+ for (int bg_index = 0; bg_index < 256; ++bg_index)
{
- for (int fg_index = 0; fg_index < 512; ++fg_index)
+ for (int fg_index = 0; fg_index < 256; ++fg_index)
{
// Convert LUT indices back to RGB values with reduced bit depth
- bg.r = (bg_index >> 6) << 5;
- bg.g = ((bg_index >> 3) & 0x07) << 5;
- bg.b = (bg_index & 0x07) << 5;
-
- fg.r = (fg_index >> 6) << 5;
- fg.g = ((fg_index >> 3) & 0x07) << 5;
- fg.b = (fg_index & 0x07) << 5;
+ bg.r = (bg_index >> 5) << 5;
+ bg.g = ((bg_index >> 2) & 0x07) << 5;
+ bg.b = (bg_index & 0x03) << 6;
+
+ fg.r = (fg_index >> 5) << 5;
+ fg.g = ((fg_index >> 2) & 0x07) << 5;
+ fg.b = (fg_index & 0x03) << 6;
// Overlay blending calculation
retOver.r = (overlay_alpha * fg.r + (0xFFU - overlay_alpha) * bg.r) >> 8;
@@ -130,9 +130,9 @@
}
// [crispy] Helper function to convert a pixel color to a LUT index
-static inline uint16_t PixelToLUTIndex (const tcpixel_t color)
+static inline uint8_t PixelToLUTIndex (const tcpixel_t color)
{
- return ((color.r & 0xE0) << 1) | ((color.g & 0xE0) >> 2) | (color.b >> 5);
+ return ((color.r & 0xE0) | ((color.g & 0xE0) >> 3) | (color.b >> 6));
}
const uint32_t I_BlendAdd (const uint32_t bg_i, const uint32_t fg_i)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dammit... There is just a small difference in performance comparing between 512x512 and 256x256 LUTs.
Array access time doesn't depend on array size. 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, thanks, good to know - that's important. I'm playing around with inline
-ing, at least for additive blending in R_DrawTLColumn
and it actually helps, -timedemo
differences before and after inlining are about ~466 w/o inlining and ~522 with inlining.
Looks likes it is have to be done, maybe really in another PR, since we have to divide R_DrawTLColumn
to blend/alt functions (and to high/low detail + high/low quality, oh God!), and unlikely I will handle Format Factory approach. Otherwise, this whole idea and implementation is nothing more than cosmetical change with tiny performance improvement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, tongue was ahead of thoughts. So the idea is put TrueColor TL drawing functions as macroses to i_truecolor.h
where we can have inlining without overbloating existing code in r_draw.c
of all four games and we don't need to care about tranmap[]
, tinttab[]
and xlatab[]
there. This will still require small separating of colfunc
pointers for TrueColor only, but in theory, this will do the trick.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't overload this PR, though. Let's do one step after the other...
Co-Authored-By: Fabian Greffrath <[email protected]>
Also, fix nasty typo: it's 512x512, not 512 colors, i.e. 262144 in total. Co-Authored-By: Fabian Greffrath <[email protected]>
This is slightly faster than fully dynamic and may look more consistent when TrueColor mode is off. Few remarks:
malloc
needed.PLAYPAL
colors.