Background
MM and OOT have assets that load a DL from a segment address provided by code. For OOT, this is always a segment with an offset of 0. MM, however, is unique in that some assets also provide an offset value. This lets a single DList be stored in a segment while assets "index" into specific parts of it through creative placement of gsSPEndDisplayList.
An example of this can be seen here:
```c
static Gfx renderModeSetXluSingleCycleDL[] = {
    gsDPSetRenderMode(AA_EN | Z_CMP | IM_RD | CLR_ON_CVG | CVG_DST_WRAP | ZMODE_XLU | FORCE_BL |
                          GBL_c1(G_BL_CLR_IN, G_BL_0, G_BL_CLR_IN, G_BL_1),
                      G_RM_AA_ZB_XLU_SURF2),
    gsSPEndDisplayList(),
    // These instructions will never get executed
    gsDPSetRenderMode(AA_EN | Z_CMP | IM_RD | CLR_ON_CVG | CVG_DST_WRAP | ZMODE_XLU | FORCE_BL |
                          GBL_c1(G_BL_CLR_FOG, G_BL_A_SHADE, G_BL_CLR_IN, G_BL_1MA),
                      G_RM_AA_ZB_XLU_SURF2),
    gsSPEndDisplayList(),
};
```
This DList is 4 instructions long, but it is effectively used as two DLists because of the gsSPEndDisplayList in the middle. The DL is synced to segment 0x0C, and assets then control which "DL" they get by setting the segment offset: 0x0C000000 would execute the first half, whereas 0x0C000010 would execute the second half.
The Problem
This becomes a problem because our definition of Gfx words uses uintptr_t instead of uint32_t. On 64-bit machines, Gfx is twice as large as on 32-bit/N64 hardware, so segment offset values are invalid or index to the wrong location.
With the example above, 0x0C000010 has an offset of 0x10. Gfx has a size of 0x8 on 32-bit and 0x10 on 64-bit. The original offset of 0x10 is meant to index the segment by 2 commands, but on a 64-bit machine it translates to an index of only 1.
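The arithmetic can be sketched in C as follows. This is only an illustration: Gfx32, Gfx64, segment_offset and command_index are hypothetical names, and Gfx64 uses explicit 64-bit words to stand in for the uintptr_t layout on a 64-bit build so the numbers are the same on any host.

```c
#include <assert.h>
#include <stdint.h>

/* Gfx as laid out on N64 / 32-bit builds: two 32-bit words (0x8 bytes). */
typedef struct { uint32_t w0, w1; } Gfx32;

/* Hypothetical stand-in for a 64-bit build, where each word is
 * pointer sized (0x10 bytes in total). */
typedef struct { uint64_t w0, w1; } Gfx64;

/* The low 24 bits of a segmented address are the byte offset
 * into the segment; the top byte selects the segment. */
static uint32_t segment_offset(uint32_t seg_addr) {
    return seg_addr & 0x00FFFFFFu;
}

/* Which command a segmented address lands on, for a given command size. */
static uint32_t command_index(uint32_t seg_addr, uint32_t gfx_size) {
    uint32_t offset = segment_offset(seg_addr);
    assert(offset % gfx_size == 0); /* offsets must fall on a command boundary */
    return offset / gfx_size;
}
```

Calling command_index(0x0C000010, sizeof(Gfx32)) gives the intended index of 2, while command_index(0x0C000010, sizeof(Gfx64)) gives 1, which in the example above lands on the first gsSPEndDisplayList instead of the second render mode command.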
Possible solutions / Proposals
Option 1: Exporter Fix
We could update the exporter to adjust the segment offset for DList lookups based on the system performing the export. This would allow everything to work as expected without any changes in 2ship/Fast3D.
Example of proposed changes: Archez/OTRExporter@8ac66fe
Pros:
- Keeps Fast3D and 2ship ignorant of the Gfx size difference
Cons:
- Prevents exported OTRs from being portable across architectures (OTRs generated on a 64-bit machine will only work on a 64-bit machine). We would also probably need to track in the OTR which architecture was used to create it, so we can warn/notify when it is used on an incompatible machine.
Option 2: 2ship/Fast3D fix
We could change Fast3D to handle adjusting the segment value for DList lookups at render time. This would require forcing the existing 64-bit-adjusted segment values used by the master gfx DLs back to 32-bit-sized offsets.
Example of proposed changes (also look at the LUS submodule change): Archez@6d479a8
Pros:
- Preserves portability of generated OTRs
Cons:
- Requires Fast3D to handle and be aware of the Gfx size difference
- Requires 2ship and other ports using LUS to be aware that DList segment values must be in the "32-bit size"
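A render-time adjustment along these lines could look roughly like the sketch below. It is not taken from the linked commits: HostGfx, N64_GFX_SIZE and dlist_in_segment are hypothetical names, and the real Fast3D segment bookkeeping may look quite different. The idea is to treat the offset as an N64-sized byte offset, turn it into a command index, and let pointer arithmetic scale it to the host's Gfx size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-side command: both words are pointer sized, as described above. */
typedef struct { uintptr_t w0, w1; } HostGfx;

/* Size of one Gfx command on original hardware. */
#define N64_GFX_SIZE 8u

/* Hypothetical helper: given the host pointer a segment is bound to and a
 * segmented address whose offset is expressed in N64-sized bytes, return
 * the host command that offset refers to. */
static HostGfx *dlist_in_segment(HostGfx *segment_base, uint32_t seg_addr) {
    uint32_t n64_offset = seg_addr & 0x00FFFFFFu;
    assert(n64_offset % N64_GFX_SIZE == 0); /* must sit on a command boundary */

    /* Convert the byte offset to a command index first; pointer arithmetic
     * then scales it by sizeof(HostGfx) on whatever host this runs on. */
    size_t index = n64_offset / N64_GFX_SIZE;
    return segment_base + index;
}
```

With this approach 0x0C000010 resolves to the third command of the segment on both 32-bit and 64-bit hosts, which is why every caller would have to supply offsets in the "32-bit size".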
Option 3: New custom opcode
We could add a new custom DL opcode for the exporter to use when it encounters DList segment addresses. This opcode can then signal Fast3D to perform the offset adjustment strictly for addresses coming from the OTR.
Example of proposed changes: Archez/libultraship@552e192 + Archez/OTRExporter@4c1a36b
Pros:
- Preserves OTR portability
- Keeps segment values coming from 2ship code at their true, unmodified offsets
Cons:
- Yet another custom opcode to manage
Option 4: Same opcode with extra custom flag
Similar to option 3, but instead of adding a new opcode we can leverage unused space in the original G_DL opcode to set a flag (16 bits of free space through the l argument of gsDma1p). Fast3D can then use this flag to perform the offset adjustment. A new macro can be used to set the flag in the command.
Example of proposed changes: Archez/libultraship@057d374 + Archez/OTRExporter@4c1a36b
Pros:
- All from option 3
- No new opcodes
Cons:
- Technically squeezes custom flags into an existing opcode, but this should be safe assuming the bit range is truly unused on N64 hardware (it is at least unused in LUS)
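To illustrate option 4, here is a minimal sketch of how such a flag could ride in the low 16 bits of the first command word. The flag name and value (DL_FLAG_N64_OFFSET) and both helper functions are hypothetical and are not taken from the linked commits. The bit layout follows the gsDma1p macro from the F3DEX2 gbi.h (opcode in bits 24-31, push/no-push parameter in bits 16-23, l in bits 0-15), with G_DL assumed to be 0xDE.

```c
#include <assert.h>
#include <stdint.h>

#define G_DL 0xDEu                 /* F3DEX2 display list opcode (assumed) */
#define DL_FLAG_N64_OFFSET 0x0001u /* hypothetical flag stored in the `l` field */

typedef struct { uintptr_t w0, w1; } Gfx;

/* Build a G_DL command following the gsDma1p layout:
 * opcode in bits 24-31, push parameter in bits 16-23, `l` in bits 0-15. */
static Gfx make_dl_with_flags(uint32_t seg_addr, uint8_t push, uint16_t flags) {
    Gfx cmd;
    cmd.w0 = ((uintptr_t)G_DL << 24) | ((uintptr_t)push << 16) | (uintptr_t)flags;
    cmd.w1 = (uintptr_t)seg_addr;
    return cmd;
}

/* Interpreter side: should this G_DL have its offset treated as N64 sized? */
static int dl_needs_offset_fixup(const Gfx *cmd) {
    assert(((cmd->w0 >> 24) & 0xFFu) == G_DL); /* only meaningful for G_DL */
    return (cmd->w0 & DL_FLAG_N64_OFFSET) != 0;
}
```

In this sketch the exporter would emit flagged commands for DList segment addresses found in assets, while commands built by 2ship code keep l at 0 and go through the existing path untouched.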