Comments (9)
There is also the strange pushw %ds. Looking at the -O1 code, this replaces subw $2, %sp. It looks like space on the stack is created for local variables, but there are none.
from gcc-ia16.
Hello @bartoldeman ,
Thanks for the reports! I managed to hack together a peephole optimization that can generate addw $2, (%bx). I believe the i386 back-end uses a different approach to achieve the same thing, since it can combine the instructions even when I disable peepholes.
Now I "just" need to figure out why the code for incfar (.) is reloading %es, and what exactly is up with the unused stack variable. :-|
from gcc-ia16.
Hello @bartoldeman ,
I am making some progress in improving the output code for cases like incfar (.).
It seems that the problem is not with my far pointer patches, but with some weirdness in how GCC itself was handling multi-shortword values when -fsplit-wide-types was in effect. One of the optimization passes for splitting multi-shortwords into separate shortword variables (subreg1) was apparently throwing away information about values stored in registers, and this was leading to the extraneous reloads and spills (the unused stack variable was for spilling and reloading %bx; these later got elided).
After patching GCC to disable this pass (but leaving a later subreg expansion pass, subreg2, in place), I can now obtain this output, even without any peephole optimization:
incfar:
        pushw %es
        pushw %bp
        movw %sp, %bp
        movw 6(%bp), %bx
        movw 8(%bp), %ax
        movw %ax, %es
        movw %es:(%bx), %dx
        addw $2, %dx
        movw %dx, %es:(%bx)
        popw %bp
        popw %es
        ret
And with an additional peephole rule to handle the loads into %es:%bx:
incfar:
        pushw %es
        pushw %bp
        movw %sp, %bp
        lesw 6(%bp), %bx
        addw $2, %es:(%bx)
        popw %bp
        popw %es
        ret
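For readers less used to 16-bit far pointers: lesw 6(%bp), %bx loads the word at 6(%bp) into %bx and the word at 8(%bp) into %es in one instruction, which is why it can stand in for the three movw instructions of the previous listing. Below is a minimal sketch in portable C of the memory layout this relies on (offset word first, then segment word). The struct and function names are hypothetical and only for illustration, and the linear-address helper assumes real-mode addressing.

```c
#include <stdint.h>

/* Hypothetical illustration type: a 16-bit far pointer as it sits in
   memory, offset word first, then segment word. */
struct far_ptr {
    uint16_t off;
    uint16_t seg;
};

/* Read a far pointer from four little-endian bytes, in the order an
   les instruction reads them: offset at +0, segment at +2. */
struct far_ptr load_far_ptr(const uint8_t *slot)
{
    struct far_ptr p;
    p.off = (uint16_t)(slot[0] | (slot[1] << 8));
    p.seg = (uint16_t)(slot[2] | (slot[3] << 8));
    return p;
}

/* Real-mode linear address (assumption: real mode): segment * 16 + offset. */
uint32_t linear_address(struct far_ptr p)
{
    return ((uint32_t)p.seg << 4) + p.off;
}
```

In the listing above, the four bytes starting at 6(%bp) play the role of slot: the offset ends up in %bx and the segment in %es.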
The only problem now is that disabling the subreg1 pass also causes some previously latent bugs to show up (specifically, there is a regression in gcc/testsuite/gcc.c-torture/execute/pr60960.c), so now I need to find these bugs and deal with them...
Thank you!
from gcc-ia16.
Thanks for the good work. However, overall the compiler is now worse for me in terms of size optimizations. I tried to get an example with -Os that shows this. It seems to be mostly that the stack is used more:
struct ab {
    unsigned long a, b;
};
extern int test(struct ab __far *p, unsigned long b);
unsigned long find_b(struct ab __far *p)
{
    unsigned long a = 0, b = 0;
    if (p->a > 64000)
    {
        if (p->b != 0xffffful)
            b = p->b;
        a = p->a;
    }
    while (!test(p, b))
    {
        b++;
        if (b > a) b = 2;
    }
    return b;
}
Before: 0x7b bytes; after: 0x92 bytes (+19%).
from gcc-ia16.
Hello @bartoldeman ,
Thanks for the report. So it looks like turning off the subreg1 optimization pass wins for some routines but loses for others (argh!).
It seems that, as long as I manage to get the far pointer into a register pair, then GCC can easily generate good code. E.g.:
void incfar(int __far *a)
{
    __asm __volatile("" : "=k" (a) : "0" (a));
    *a += 2;
}
but I am not sure yet how to get the compiler itself to actually do this. I guess I need to really rethink some other ways to get rid of the reloading of %es.
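The empty __asm statement in the snippet above is a common GCC idiom: the asm body emits no instructions, but the output constraint tells the compiler which register class the value must sit in afterwards, and the "0" input constraint ties the input to that same operand. The "k" constraint is specific to gcc-ia16 and its exact meaning is not spelled out in this thread. As a rough, portable sketch of the same idiom, here is a version using the generic "r" (any general register) constraint on a plain int; the function names are hypothetical.

```c
/* Hypothetical helper: pass v through an empty asm statement.
   "=r"(v) declares that the asm leaves v in a general register;
   "0"(v) makes the input use the same operand as output 0.
   No instructions are emitted, but the compiler must materialize
   v in a register at this point. */
static inline int through_register(int v)
{
    __asm__ __volatile__("" : "=r"(v) : "0"(v));
    return v;
}

/* Same shape as incfar above, minus the far pointer: force the value
   into a register first, then add 2. */
int add_two_via_register(int v)
{
    v = through_register(v);
    return v + 2;
}
```

This only demonstrates the constraint mechanics with GCC-compatible compilers; it says nothing about how the ia16 back-end allocates %es:%bx.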
By the way, may I take a look at your setup for compiling the FreeDOS kernel using ia16-elf-gcc, if it is convenient for you? I think it will be good for me to be able to see how any changes to GCC may affect code generation for the kernel as a whole.
Thank you!
from gcc-ia16.
I finally uploaded my fdkernel draft. It's in a separate branch because the patch is still one big chunk.
https://github.com/bartoldeman/fdkernel/tree/ia16-elf-gcc-draft
You should be able to compile it using "make all COMPILER=gcc".
from gcc-ia16.
Hello @bartoldeman ,
Thank you very much! I am able to compile the code using your setup; I will see how I can improve GCC's code generation.
from gcc-ia16.
Hello @bartoldeman ,
I added a patch that now makes -Os really optimize for size (for some reason, previously the effect of -Os was limited to just a few peephole rules, such as the pushw %ds thing).
A result is that the routine
ddt *getddt(int dev)
{
    return &(((ddt *) Dyn.Buffer)[dev]);
}
in FreeDOS kernel/dsk.c now compiles to
_getddt:
        pushw %bp
        movw %sp, %bp
        movw $136, %ax
        mulw 4(%bp)
        addw $_Dyn+2, %ax
        popw %bp
        ret $2
whereas before, GCC would do the multiplication using shifts and adds (!).
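To make the shifts-and-adds remark concrete: 136 = 128 + 8, so a multiply by 136 can be written as (dev << 7) + (dev << 3). The exact sequence GCC used before the patch is not shown in this thread, so the sketch below is only one possible decomposition, with hypothetical function names, compared against the direct multiply that mulw now performs. Results are truncated to 16 bits, as they would be in %ax.

```c
#include <stdint.h>

/* Direct multiply, as the single mulw instruction does; only the low
   16 bits of the product are kept. */
uint16_t mul136_direct(uint16_t dev)
{
    return (uint16_t)(dev * 136u);
}

/* One possible shift-and-add form (an assumption, not GCC's actual
   output): 136 = 128 + 8 = (1 << 7) + (1 << 3). */
uint16_t mul136_shift_add(uint16_t dev)
{
    return (uint16_t)(((uint32_t)dev << 7) + ((uint32_t)dev << 3));
}
```

On an 8086, each multi-bit shift costs either several one-bit shifts or a load of %cl, which is presumably why the single mulw comes out smaller under -Os.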
The FreeDOS kernel code now compiles to 78,284 bytes (uncompressed), whereas before it was 79,052 bytes. This is still some way from the 69,828 bytes possible under the Watcom compiler, though, so I will look into ways to further shrink the output.
The incfar (.) routine you laid out above now also results in good code under -Os:
incfar:
        pushw %es
        pushw %bp
        movw %sp, %bp
        lesw 6(%bp), %bx
        addw $2, %es:(%bx)
        popw %bp
        popw %es
        ret
(However, it still does the extra reloads if I use -O3, so possibly something is still out of whack with the speed optimizations.)
Thank you!
from gcc-ia16.
I see we are now at 77,596 bytes so even better! Actually, as for OW, the kernel resident and init code segments can now be merged into a single segment which makes it even smaller. I'm working on an update.
from gcc-ia16.