Microcode: Fix TSS task switching (fixes e.g. DOS4GW, DJGPP, CWSDPMI) #172

Open: wants to merge 1 commit into master

@humply (Contributor) commented Dec 18, 2024

During a TSS task switch, when the microcode reaches step CMDEX_task_switch_3_STEP_15, it does not progress to CMD_task_switch_4; instead it stays in CMD_task_switch_3 (cond_204 stays active).

As a result, the core keeps reading memory until a page fault occurs, and the task switch never happens.

This fixes hangs and crashes with e.g. DOS4GW, DJGPP, CWSDPMI and SoftICE, and should also fix at least #128 and #39.

It also seems to reveal new issues with Windows 95.

@sorgelig (Member)

What issues does it create in Win95?
It's great that you dug so deep, but once it introduces a new issue, I'm afraid no one will fix it (maybe only you).

@Breiztiger commented Dec 19, 2024

Is it possible to test these fixes with a beta build, please?

@gtaylormb (Contributor) commented Dec 19, 2024

There is one program, Adlib Tracker II, which previously didn't work because of CWSDPMI, so I'm excited to test it soon.

@Breiztiger commented Dec 19, 2024

I've just compiled a build, and AT2 2.3.57 works with it (with Q87, because it seems to need an FPU).
It doesn't work with the latest unstable build.

@humply (Contributor, Author) commented Dec 19, 2024

Yes, it took me ages to find and fix these AO486 bugs (learning Verilog in the process). This issue in particular, because I had to write a COM program that switches into protected mode and performs TSS task switches. It then took further changes to be able to simulate it and find the culprit in the Verilog.

With this fix I noticed the following issues with Windows 95:

  • BSOD Exception 0xE Page Fault
  • EXPLORER caused a general protection error (KRNL386.EXE 001:00000c9b)
  • EXPLORER caused a KERNEL.DLL segment not present

Unfortunately I haven't found the cause yet. It could be related to task gates, interrupt gates or trap gates (perhaps Windows 95 relies on these more than Windows 98?), or it could be something totally different. I'm not sure when or if I will find the source of these issues, but I wanted to get my current fixes in as a Christmas gift.

However, this fix now makes it possible to use SoftICE in Windows 95, so maybe someone can use it to find the cause of the new issues? It also fixes a lot of DPMI issues, because DPMI often relies on TSS task switching (e.g. QEMM dosdata.sys will also start working under MS-DOS 6.22).

Watcom Debugger 1.9 (MS-DOS 6.22) still hangs with this fix, so maybe that's a hint?

I understand you might not want to merge this, but if it helps, Windows 98 does not seem to have any new issues.

Most software using DOS4GW, DJGPP or CWSDPMI starts working with this fix. Adlib Tracker II also works with it when software FPU emulation is used.

I would also like to take this opportunity to thank all the MiSTer developers. Especially sorgelig, thank you for this amazing project!

Happy holidays!

@gtaylormb (Contributor)

This is awesome, thanks for the work. I am excited to check out AT2.

To be clear, are these issues new regressions in Win95, or did they potentially already exist and are only now being exposed?

@humply (Contributor, Author) commented Dec 19, 2024

Thank you, I'm glad to hear you're happy with the bugfixes.

Regarding the Windows 95 issues, my best guess is that they already existed. Without this fix, hardware TSS task switching does not happen at all. This fix makes it work again, so new code paths can be reached, potentially revealing previously hidden bugs. Running hardware tasks brings a whole new can of intricacies! But it could also be something totally different.

Regarding Adlib Tracker II, I remember it running with Q87, but not with Q87X.

@luishg commented Dec 20, 2024

@humply,

I support Sorgelig primarily because of this core. This project is an incredible way to preserve the content from the golden age of computing: the entire 286/386/486 era, reproduced faithfully on modern, easy-to-maintain hardware.

Contributing is no easy task, but your efforts have been outstanding. They give us hope for something more stable and compatible with the original PC hardware. A million thanks to you!

@sorgelig (Member)

OK. I'm going to test this in a couple of days to see if it's OK to merge.
I understand it fixes a serious problem, but if it prevents using Win95, that's not a good thing.
Sometimes a fix needs more work before it's done. Are you going to work on it more?

@humply (Contributor, Author) commented Dec 23, 2024

Thank you for looking into this. I think you are right to be cautious with this fix, because it will break Win95 and end users will complain. On the other hand, it does fix DPMI issues under DOS. So it's a difficult decision: fix something here, break something there.

Yes, this fix needs more work, but I wanted to share what I had so far. I was also hoping someone else could help look into this, as I'm no Win95 kernel expert. I read that Win95 relies more on hardware task switching than Win98, so could that explain why this fix doesn't break Win98?

I have done some more research on the Win95 issue. DPMI hangs in an MS-DOS Prompt window under Win95. Closing an MS-DOS Prompt window can trigger a crash, but not every time.

When it crashes I get a general protection error in KRNL386.EXE at code_seg:C9B (offset 0x2C3B into KRNL386.EXE for Win95 OSR2) because it wants to check ES:0 for a 'NE' module signature, but ES = 0000h. Above that is the entry point to the function FlushCachedFileHandle, but I'm not sure if this is related to this issue. Setting a breakpoint on VmmTerminateThread seems to break before and close to the issue, but I haven't been able to trace it fully yet.
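A rough model of why this particular check faults: in protected mode, ES may legally be loaded with a null selector (0000h to 0003h, since the RPL bits are ignored), but any memory access through it raises a general protection exception. The hypothetical C sketch below illustrates that; it is not KRNL386 or ao486 code, and the function names and the byte buffer standing in for the segment contents are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Outcome codes for this sketch (hypothetical). */
enum ne_check_result {
    NE_FOUND,      /* segment starts with the 'NE' signature */
    NE_NOT_FOUND,  /* segment readable, but no signature */
    NE_FAULT_GP    /* the read itself raises #GP(0) */
};

/* A selector is null when index and TI are zero; RPL (bits 1:0) is ignored. */
static bool is_null_selector(uint16_t sel) {
    return (sel & 0xFFFCu) == 0;
}

/* Models reading the word at ES:0 and comparing it with 'NE'.
   seg/seg_len stand in for the memory described by ES (assumption). */
static enum ne_check_result check_ne_signature(uint16_t es,
                                               const uint8_t *seg,
                                               size_t seg_len) {
    if (is_null_selector(es))
        return NE_FAULT_GP;  /* ES = 0000h: loading was fine, using it faults */
    if (seg_len < 2)
        return NE_FAULT_GP;  /* reading past the segment limit also faults */
    return (seg[0] == 'N' && seg[1] == 'E') ? NE_FOUND : NE_NOT_FOUND;
}
```

So an ES of 0000h at that point suggests the value was lost or never restored before this code ran, which fits the suspicion that a task or context switch leaves a segment register in the wrong state.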

I also checked the 32-bit TSS some more and I can confirm the general-purpose registers (EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP) and the segment registers (CS, DS, ES, FS, GS, SS) are all written correctly. I still need to check the LINK, LDTR, Trap flag and I/O Map Base Address fields. 16-bit TSS maybe also needs to be checked?
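For reference while checking the remaining fields, the 32-bit TSS layout from the Intel architecture manuals can be written down as a C struct, with static assertions pinning each offset so that a TSS image dumped from a simulation can be compared field by field. This sketch is based on the Intel layout, not on the ao486 sources; the struct, field and helper names are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit TSS, 104 bytes. Reserved halves are kept so offsets match memory. */
struct tss32 {
    uint16_t link, reserved0;     /* 0x00 previous task link (LINK) */
    uint32_t esp0;                /* 0x04 */
    uint16_t ss0, reserved1;      /* 0x08 */
    uint32_t esp1;                /* 0x0C */
    uint16_t ss1, reserved2;      /* 0x10 */
    uint32_t esp2;                /* 0x14 */
    uint16_t ss2, reserved3;      /* 0x18 */
    uint32_t cr3;                 /* 0x1C */
    uint32_t eip;                 /* 0x20 */
    uint32_t eflags;              /* 0x24 */
    uint32_t eax, ecx, edx, ebx;  /* 0x28, 0x2C, 0x30, 0x34 */
    uint32_t esp, ebp, esi, edi;  /* 0x38, 0x3C, 0x40, 0x44 */
    uint16_t es, reserved4;       /* 0x48 */
    uint16_t cs, reserved5;       /* 0x4C */
    uint16_t ss, reserved6;       /* 0x50 */
    uint16_t ds, reserved7;       /* 0x54 */
    uint16_t fs, reserved8;       /* 0x58 */
    uint16_t gs, reserved9;       /* 0x5C */
    uint16_t ldtr, reserved10;    /* 0x60 LDT segment selector */
    uint16_t trap;                /* 0x64 bit 0 = T (debug trap) flag */
    uint16_t iomap_base;          /* 0x66 I/O map base address */
};

_Static_assert(sizeof(struct tss32) == 104, "32-bit TSS is 104 bytes");
_Static_assert(offsetof(struct tss32, eax) == 0x28, "EAX at 0x28");
_Static_assert(offsetof(struct tss32, es) == 0x48, "ES at 0x48");
_Static_assert(offsetof(struct tss32, ldtr) == 0x60, "LDTR at 0x60");
_Static_assert(offsetof(struct tss32, iomap_base) == 0x66, "I/O map base at 0x66");

/* Only bit 0 of the word at 0x64 is defined (the T flag). */
static int tss32_trap_flag(const struct tss32 *t) {
    return t->trap & 1;
}
```

The fields still to be checked (LINK, LDTR, T flag, I/O map base) sit at 0x00, 0x60, 0x64 and 0x66.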

Looking into the 16-bit TSS, I saw in CMD_task_switch.txt that the high words of the general-purpose registers get set to 16'hFFFF, but I could not verify in the Intel documentation whether this is correct.
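To make the open question concrete: a 16-bit TSS stores only the low 16 bits of each general-purpose register, so when a task switch loads one, the core has to decide what goes into the upper halves. The hypothetical C sketch below models the behaviour described for CMD_task_switch.txt (high word forced to FFFFh). Whether real 386/486 hardware does the same is exactly what remains unverified, and all names here are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-bit (80286-style) TSS: 22 words, 44 bytes, every field 16 bits wide. */
struct tss16 {
    uint16_t link;            /* 0x00 previous task link */
    uint16_t sp0, ss0;        /* 0x02, 0x04 */
    uint16_t sp1, ss1;        /* 0x06, 0x08 */
    uint16_t sp2, ss2;        /* 0x0A, 0x0C */
    uint16_t ip, flags;       /* 0x0E, 0x10 */
    uint16_t ax, cx, dx, bx;  /* 0x12, 0x14, 0x16, 0x18 */
    uint16_t sp, bp, si, di;  /* 0x1A, 0x1C, 0x1E, 0x20 */
    uint16_t es, cs, ss, ds;  /* 0x22, 0x24, 0x26, 0x28 */
    uint16_t ldtr;            /* 0x2A LDT segment selector */
};

_Static_assert(sizeof(struct tss16) == 44, "16-bit TSS is 44 bytes");
_Static_assert(offsetof(struct tss16, ax) == 0x12, "AX at 0x12");
_Static_assert(offsetof(struct tss16, ldtr) == 0x2A, "LDT selector at 0x2A");

/* Behaviour as reported for ao486's CMD_task_switch.txt: the upper word
   of each 32-bit register becomes FFFFh. Unverified against Intel docs. */
static uint32_t widen_tss16_reg(uint16_t low) {
    return 0xFFFF0000u | low;
}

/* Loads the eight GPRs into a hypothetical register file ordered
   EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI (same order as in the TSS). */
static void load_gprs_from_tss16(const struct tss16 *t, uint32_t regs[8]) {
    const uint16_t src[8] = { t->ax, t->cx, t->dx, t->bx,
                              t->sp, t->bp, t->si, t->di };
    for (int i = 0; i < 8; i++)
        regs[i] = widen_tss16_reg(src[i]);
}
```

A 16-bit task should only ever use the low halves, which may explain why a wrong upper word would rarely be visible; it would matter mainly if 32-bit code later runs with those register values.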

If you (or any other dev) want to help find the issue, any help is very much appreciated, because I don't know if I will be able to find it myself.

Successfully merging this pull request may close these issues.

BOOM and Allegro freeze
5 participants