Skip to content

Intel 430FX L2 Cache auto detection investigation.

Notifications You must be signed in to change notification settings

raszpl/430FXL2Cache

Repository files navigation

Intel 430FX L2 Cache auto detection investigation based on code extracted from Asus PCI/I-P54TP4 bios

Resources and Tools

Asus PCI/I-P54TP4 bios T15I0302.AWD

Award BIOS (4-5x PnP) post code list

Bios sections and their associated Post codes relevant are:

C1 Auto detection of onboard DRAM & Cache
3E Try to turn on level 2 cache
Note: Some chipset may need to turn on the L2 cache in this stage. But usually, the cache is turn on later in Post 61h

430FX chipset Datasheet

image

Intel Engineer wanted to be funny and the last table has FLCE and SCFMI bit order reversed, got me more than once. Correct bit order:

SCFMI FLCE L2 Cache Result
0 0 Disabled
1 0 Disabled; tag invalidate on reads
0 1 Normal L2 cache operation (dependent on SGS)
1 1 Enabled; miss forced on reads/writes

Online Assembly x86 Emulator

Online Assembler and Disassembler with x86 16bit mode support

Online-x86-assembler

HxD - Freeware Hex Editor

Looking for the code

I started by dissasembling bios file using IDA. You begin by loading bios file and selecting "Intel 80x86 processors:metapc", hitting OK, then clicking 16bit. Next you need to relocate code to the address actual BIOS is mapped at in a real computer. Open "ICD Command..." (Shift+F2) and paste:

SegCreate(0x000f0000, 0x00100000, 0xF000, 0, 0, 0);   
SegRename(0x000f0000, "_F000");       
                
auto src = [0,0x10000], dest = [0xF000, 0];   
auto ea_src, ea_dest, hi_limit; 
hi_limit = src + 0x10000;       
ea_dest = dest; 
        
for(ea_src = src; ea_src < hi_limit ; ea_src = ea_src + 4 )             
{       
PatchDword( ea_dest, Dword(ea_src));    
ea_dest = ea_dest + 4;  
}       

Ugly, but it works. Now you have secont segment called _f000. Press G to "Jump to Address" and paste _F000:fff0. Press C to interpret data under cursor as Code and voila, you are looking at x86 legacy BIOS entry point.

My next step was finding segment responsible for Post code C1. Easiest way is locating instruction that outputs actual POST code to port 80h. Here it is:

_F000:E4E7 POST_C1:
_F000:E4E7                 mov     al, 0C1h
_F000:E4E9                 mov     dx, 80h
_F000:E4EC                 out     dx, al          ; manufacture's diagnostic checkpoint
_F000:E4ED                 mov     sp, 0E4F3h
_F000:E4F0                 jmp     ram_cache

First part of C1 is responsible for detecting ram, lets skip that. Next we switch back to real mode.

_F000:E1DD torealmode:
_F000:E1DD                 mov     eax, cr0
_F000:E1E0                 and     al, 0FEh
_F000:E1E2                 mov     cr0, eax
_F000:E1E5                 jmp     far ptr loc_FE1EA
_F000:E1EA loc_FE1EA:

That jump at the end is very important to properly switch CPU mode. Next fragment disables L1 cache.

_F000:E1EA                 mov     al, 0FFh        ; Enable L1 cache
_F000:E1EC                 mov     sp, 0E1F2h
_F000:E1EF                 jmp     CMOS_L1cache

CMOS_L1cache enables/disables L1 cache depending on accumulator, but only if variable stored in CMOS under address 3Dh (0BDh but actual CMOS addresses are only 7 bit long) is not ffh but has 8th bit set. Speculation: is this where Award Bios stores L1 cache disable variable?

_F000:F4FC CMOS_L1cache    proc near
_F000:F4FC                 mov     ah, al
_F000:F4FE                 mov     al, 0BDh
_F000:F500                 out     70h, al         ; CMOS Memory:
_F000:F500                                         ;
_F000:F502                 out     0E1h, al
_F000:F504                 in      al, 71h         ; CMOS Memory
_F000:F506                 cmp     al, 0FFh
_F000:F508                 jz      short locret_FF534
_F000:F50A                 test    al, 80h
_F000:F50C                 jz      short locret_FF534
_F000:F50E                 and     al, 7Eh
_F000:F510                 nop
_F000:F511                 nop
_F000:F512                 or      ah, ah
_F000:F514                 jnz     short L1cache_enable
_F000:F516                 mov     eax, cr0
_F000:F519                 or      eax, 60000000h
_F000:F51F                 mov     cr0, eax
_F000:F522                 wbinvd
_F000:F524                 jmp     short locret_FF534
_F000:F526 ; ---------------------------------------------------------------------------
_F000:F526 L1cache_enable:
_F000:F526                 mov     eax, cr0
_F000:F529                 and     eax, 9FFFFFFFh
_F000:F52F                 mov     cr0, eax
_F000:F532                 wbinvd
_F000:F534
_F000:F534 locret_FF534:
_F000:F534                 retn
_F000:F534 CMOS_L1cache    endp

After that its finally off to the races.

Looking at the code

Actual L2 Cache detection procedure lies before us. First order of business seems to be setting maximum potential cache size supported (512KB), Cache type compatible with all possible choices (Async), and mode of operation resetting TAG ram Valid bits on access (Disabled; tag invalidate on reads).

_F000:E1F4 cache_detect:
_F000:E1F4                 mov     cx, 52h
_F000:E1F7                 mov     al, 0A2h        ; 512KB Async Disabled; tag invalidate on reads
_F000:E1F9                 mov     sp, 0E1FFh
_F000:E1FC                 jmp     pci_write_dev
_F000:E1FC ; ---------------------------------------------------------------------------
_F000:E1FF                 dw offset cache_invalidate
_F000:E201 ; ---------------------------------------------------------------------------
_F000:E201

Reads 512KB in order to ensure whole TAG ram is invalidated? This seems to load 4 bytes x 4000h x 8 = 512KB between 64KB and ~590KB. Why so weird I dont know.

_F000:E201 cache_invalidate:
_F000:E201                 cld
_F000:E202                 mov     dx, 8000h
_F000:E205
_F000:E205 loc_FE205:
_F000:E205                 mov     ds, dx
_F000:E207                 assume ds:nothing
_F000:E207                 xor     si, si
_F000:E209                 mov     cx, 4000h
_F000:E20C                 rep lodsd
_F000:E20F                 sub     dx, 1000h
_F000:E213                 jnz     short loc_FE205

Eagle eyed among you might notice unusual x86 instruction combination REP LODS, often taught as useless and never used or even non existing at all! yet here it is doing heavy lifting :) First to try is the most common L2 256KB Async setup.

_F000:E215                 mov     cx, 52h
_F000:E218                 mov     al, 61h         ; 256KB Async Normal L2 cache operation (dependent on SGS)
_F000:E21A                 mov     sp, 0E220h
_F000:E21D                 jmp     pci_write_dev
_F000:E21D ; ---------------------------------------------------------------------------
_F000:E220                 dw offset loc_FE222
_F000:E222 ; ---------------------------------------------------------------------------
_F000:E222
_F000:E222 loc_FE222:
_F000:E222                 mov     sp, 0E228h
_F000:E225                 jmp     cache_test

cache_test is at the bottom. TLDR is
ok - clc, take next jnb
bad - stc, ignore next jnb

_F000:E225 ; ---------------------------------------------------------------------------
_F000:E228                 dw offset loc_FE22A
_F000:E22A ; ---------------------------------------------------------------------------
_F000:E22A
_F000:E22A loc_FE22A:
_F000:E22A                 jnb     short loc_FE26A
_F000:E22C                 mov     cx, 52h
_F000:E22F                 mov     al, 0B1h        ; 512KB PB Normal L2 cache operation (dependent on SGS)
_F000:E231                 mov     sp, 0E237h
_F000:E234                 jmp     pci_write_dev
_F000:E234 ; ---------------------------------------------------------------------------
_F000:E237                 dw offset loc_FE239
_F000:E239 ; ---------------------------------------------------------------------------
_F000:E239
_F000:E239 loc_FE239:
_F000:E239                 mov     sp, 0E23Fh
_F000:E23C                 jmp     cache_test
_F000:E23C ; ---------------------------------------------------------------------------
_F000:E23F                 dw offset loc_FE241
_F000:E241 ; ---------------------------------------------------------------------------
_F000:E241
_F000:E241 loc_FE241:
_F000:E241                 jnb     short loc_FE296
_F000:E243                 mov     cx, 52h
_F000:E246                 mov     al, 51h         ; 256KB Burst Normal L2 cache operation (dependent on SGS)
_F000:E248                 mov     sp, 0E24Eh
_F000:E24B                 jmp     pci_write_dev
_F000:E24B ; ---------------------------------------------------------------------------
_F000:E24E                 dw offset loc_FE250
_F000:E250 ; ---------------------------------------------------------------------------
_F000:E250
_F000:E250 loc_FE250:
_F000:E250                 mov     sp, 0E256h
_F000:E253                 jmp     cache_test
_F000:E253 ; ---------------------------------------------------------------------------
_F000:E256                 dw offset loc_FE258
_F000:E258 ; ---------------------------------------------------------------------------
_F000:E258
_F000:E258 loc_FE258:
_F000:E258                 jnb     short loc_FE296
_F000:E25A
_F000:E25A loc_FE25A:
_F000:E25A                 mov     cx, 52h
_F000:E25D                 mov     al, 22h         ; 0KB Async Disabled; tag invalidate on reads
_F000:E25F                 mov     sp, 0E265h
_F000:E262                 jmp     pci_write_dev
_F000:E262 ; ---------------------------------------------------------------------------
_F000:E265                 dw offset loc_FE267
_F000:E267 ; ---------------------------------------------------------------------------
_F000:E267
_F000:E267 loc_FE267:
_F000:E267                 jmp     loc_FE30E
_F000:E26A ; ---------------------------------------------------------------------------
_F000:E26A
_F000:E26A loc_FE26A:
_F000:E26A                 mov     cx, 52h
_F000:E26D                 mov     al, 41h         ; 256KB PB Normal L2 cache operation (dependent on SGS)
_F000:E26F                 mov     sp, 0E275h
_F000:E272                 jmp     pci_write_dev
_F000:E272 ; ---------------------------------------------------------------------------
_F000:E275                 dw offset loc_FE277
_F000:E277 ; ---------------------------------------------------------------------------
_F000:E277
_F000:E277 loc_FE277:
_F000:E277                 mov     sp, 0E27Dh
_F000:E27A                 jmp     cache_test
_F000:E27A ; ---------------------------------------------------------------------------
_F000:E27D                 dw offset loc_FE27F
_F000:E27F ; ---------------------------------------------------------------------------
_F000:E27F
_F000:E27F loc_FE27F:
_F000:E27F                 jnb     short loc_FE25A
_F000:E281                 mov     cx, 52h
_F000:E284                 mov     al, 61h         ; 256KB Async Normal L2 cache operation (dependent on SGS)
_F000:E286                 mov     sp, 0E28Ch
_F000:E289                 jmp     pci_write_dev
_F000:E289 ; ---------------------------------------------------------------------------
_F000:E28C                 dw offset loc_FE28E
_F000:E28E ; ---------------------------------------------------------------------------
_F000:E28E
_F000:E28E loc_FE28E:
_F000:E28E                 mov     sp, 0E294h
_F000:E291                 jmp     cache_test
_F000:E291 ; ---------------------------------------------------------------------------
_F000:E294                 dw offset loc_FE296
_F000:E296 ; ---------------------------------------------------------------------------
_F000:E296
_F000:E296 loc_FE296:
_F000:E296                 mov     cx, 52h
_F000:E299                 mov     sp, 0E29Fh
_F000:E29C                 jmp     pci_read_dev
_F000:E29C ; ---------------------------------------------------------------------------
_F000:E29F                 dw offset loc_FE2A1
_F000:E2A1 ; ---------------------------------------------------------------------------
_F000:E2A1
_F000:E2A1 loc_FE2A1:
_F000:E2A1                 and     al, 3Fh
_F000:E2A3                 or      al, 80h         ; set 512KB, leave type and mode as is
_F000:E2A5                 mov     sp, 0E2ABh
_F000:E2A8                 jmp     pci_write_dev
_F000:E2A8 ; ---------------------------------------------------------------------------
_F000:E2AB                 dw offset loc_FE2AD
_F000:E2AD ; ---------------------------------------------------------------------------

This next part is interesting. This code tries to make sure we have 512KB cache installed by writing 8 byte magic number to address 0000, flushing and invalidating both L1 and L2 cache??, then another different magic number at 256KB, another cache flush, and finally checking if first magic number is still there. Is cache working in cache_as_ram mode? At this moment I dont understand how this works, wbinvd is supposed to drop all cache and we barely initialized ram controller at this point. Im lost here.

_F000:E2AD loc_FE2AD:
_F000:E2AD                 xor     si, si
_F000:E2AF                 xor     ax, ax
_F000:E2B1                 mov     ds, ax
_F000:E2B3                 assume ds:nothing
_F000:E2B3                 mov     eax, [si]
_F000:E2B6                 mov     dword ptr [si], 0A55A55AAh
_F000:E2BD                 mov     dword ptr [si+4], 5AA5AA55h
_F000:E2C5                 wbinvd
_F000:E2C7                 mov     ax, 4000h
_F000:E2CA                 mov     ds, ax
_F000:E2CC                 assume ds:nothing
_F000:E2CC                 mov     eax, [si]
_F000:E2CF                 mov     dword ptr [si], 5AA5AA55h
_F000:E2D6                 mov     dword ptr [si+4], 0A55A55AAh
_F000:E2DE                 wbinvd
_F000:E2E0                 xor     ax, ax
_F000:E2E2                 mov     ds, ax
_F000:E2E4                 assume ds:nothing
_F000:E2E4                 cmp     dword ptr [si], 0A55A55AAh
_F000:E2EB                 jnz     short loc_FE2F7
_F000:E2ED                 cmp     dword ptr [si+4], 5AA5AA55h
_F000:E2F5                 jz      short loc_FE30E
_F000:E2F7
_F000:E2F7 loc_FE2F7:
_F000:E2F7                 mov     cx, 52h
_F000:E2FA                 mov     sp, 0E300h
_F000:E2FD                 jmp     pci_read_dev
_F000:E2FD ; ---------------------------------------------------------------------------
_F000:E300                 dw offset loc_FE302
_F000:E302 ; ---------------------------------------------------------------------------
_F000:E302
_F000:E302 loc_FE302:
_F000:E302                 and     al, 3Fh
_F000:E304                 or      al, 40h         ; set 256KB, leave type and mode as is
_F000:E306                 mov     sp, 0E30Ch
_F000:E309                 jmp     pci_write_dev
_F000:E309 ; ---------------------------------------------------------------------------
_F000:E30C                 dw offset loc_FE30E
_F000:E30E ; ---------------------------------------------------------------------------
_F000:E30E
_F000:E30E loc_FE30E:
_F000:E30E                 mov     cx, 52h
_F000:E311                 mov     sp, 0E317h
_F000:E314                 jmp     pci_read_dev
_F000:E314 ; ---------------------------------------------------------------------------
_F000:E317                 dw offset loc_FE319
_F000:E319 ; ---------------------------------------------------------------------------
_F000:E319
_F000:E319 loc_FE319:
_F000:E319                 mov     ah, al
_F000:E31B                 and     ah, 0F0h
_F000:E31E                 cmp     ah, 80h         ; test if 512KB was selected?
_F000:E321                 jnz     short loc_FE325
_F000:E323                 or      al, 30h         ; if 512KB then set PB
_F000:E325
_F000:E325 loc_FE325:
_F000:E325                 or      al, 3           ; set Enabled; miss forced on reads/writes
_F000:E327                 mov     sp, 0E32Dh
_F000:E32A                 jmp     pci_write_dev
_F000:E32A ; ---------------------------------------------------------------------------
_F000:E32D                 dw offset cache_prefill
_F000:E32F ; ---------------------------------------------------------------------------

Again this weird 4 bytes x 4000h x 8 = 512KB between 64KB and ~590KB. Why not 0-512KB range?

_F000:E32F cache_prefill:
_F000:E32F                 mov     dx, 8000h
_F000:E332
_F000:E332 loc_FE332:
_F000:E332                 mov     ds, dx
_F000:E334                 assume ds:nothing
_F000:E334                 mov     cx, 4000h
_F000:E337                 xor     si, si
_F000:E339                 rep lodsd
_F000:E33C                 sub     dx, 1000h
_F000:E340                 jnz     short loc_FE332
_F000:E342                 mov     al, 0           ; Disable L1 cache
_F000:E344                 mov     sp, 0E34Ah
_F000:E347                 jmp     CMOS_L1cache
_F000:E347 ; ---------------------------------------------------------------------------
_F000:E34A                 dw offset loc_FE34C
_F000:E34C ; ---------------------------------------------------------------------------
_F000:E34C
_F000:E34C loc_FE34C:
_F000:E34C                 shr     esp, 10h
_F000:E350                 clc
_F000:E351                 retn
_F000:E351 ram_cache       endp
_F000:E351
_F000:E352 ; =============== S U B R O U T I N E =======================================

Load 16KB from between 64-80KB. Store 2KB of magic numbers at 64KB, flush cache, check if 2KB of magic numbers is still there.

_F000:E352 cache_test      proc near
_F000:E352                 cld
_F000:E353                 mov     ax, 1000h
_F000:E356                 mov     ds, ax
_F000:E358                 assume ds:nothing
_F000:E358                 xor     si, si
_F000:E35A                 mov     cx, 1000h
_F000:E35D                 rep lodsd
_F000:E360                 mov     ax, 1000h
_F000:E363                 mov     es, ax
_F000:E365                 assume es:nothing
_F000:E365                 xor     di, di
_F000:E367                 mov     eax, 0A55A55AAh
_F000:E36D                 mov     edx, 3C3C33CCh
_F000:E373                 mov     cx, 100h
_F000:E376
_F000:E376 loc_FE376:
_F000:E376                 mov     es:[di], eax
_F000:E37A                 add     di, 4
_F000:E37D                 mov     es:[di], edx
_F000:E381                 add     di, 4
_F000:E384                 not     eax
_F000:E387                 not     edx
_F000:E38A                 loop    loc_FE376
_F000:E38C                 wbinvd
_F000:E38E                 mov     cx, 100h
_F000:E391                 xor     di, di
_F000:E393                 mov     eax, 0A55A55AAh
_F000:E399                 mov     edx, 3C3C33CCh
_F000:E39F
_F000:E39F loc_FE39F:
_F000:E39F                 cmp     es:[di], eax
_F000:E3A3                 jnz     short loc_FE3BB
_F000:E3A5                 add     di, 4
_F000:E3A8                 cmp     es:[di], edx
_F000:E3AC                 jnz     short loc_FE3BB
_F000:E3AE                 add     di, 4
_F000:E3B1                 not     eax
_F000:E3B4                 not     edx
_F000:E3B7                 loop    loc_FE39F
_F000:E3B9                 clc
_F000:E3BA                 retn
_F000:E3BB ; ---------------------------------------------------------------------------
_F000:E3BB loc_FE3BB:
_F000:E3BB                 stc
_F000:E3BC                 retn

Helper functions to read/write PCI space registers:

_F000:F688 pci_read_dev    proc near
_F000:F688                 mov     ax, 8000h
_F000:F68B                 shl     eax, 10h
_F000:F68F                 mov     ax, cx
_F000:F691                 and     al, 0FCh
_F000:F693                 mov     dx, 0CF8h
_F000:F696                 out     dx, eax         ; PCI Configuration Space Address Register
_F000:F696                                         ; bits   7..0: configuration space offset
_F000:F696                                         ; bits  10..8: function number
_F000:F696                                         ; bits 15..11: device number
_F000:F696                                         ; bits 23..16: bus number
_F000:F698                 add     dl, 4
_F000:F69B                 mov     al, cl
_F000:F69D                 and     al, 3
_F000:F69F                 add     dl, al
_F000:F6A1                 in      al, dx
_F000:F6A2                 retn
_F000:F6A2 pci_read_dev    endp

_F000:F6A4 pci_write_dev   proc near
_F000:F6A4                 xchg    ax, cx
_F000:F6A5                 and     eax, 8000FFFFh
_F000:F6AB                 or      eax, 80000000h
_F000:F6B1                 mov     ch, al
_F000:F6B3                 and     al, 0FCh
_F000:F6B5                 mov     dx, 0CF8h
_F000:F6B8                 out     dx, eax         ; PCI Configuration Space Address Register
_F000:F6B8                                         ; bits   7..0: configuration space offset
_F000:F6B8                                         ; bits  10..8: function number
_F000:F6B8                                         ; bits 15..11: device number
_F000:F6B8                                         ; bits 23..16: bus number
_F000:F6BA                 mov     al, ch
_F000:F6BC                 add     dl, 4
_F000:F6BF                 and     ch, 3
_F000:F6C2                 add     dl, ch
_F000:F6C4                 xchg    ax, cx
_F000:F6C5                 out     dx, al
_F000:F6C6                 retn
_F000:F6C6 pci_write_dev   endp

Comments

Please feel free to correct any mistakes I made and expand explanations. You can use Github Issues of just send pull requests. All contributions are welcome.

About

Intel 430FX L2 Cache auto detection investigation.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published