Giter Club home page Giter Club logo

Comments (9)

bartoldeman avatar bartoldeman commented on September 18, 2024

There is also the strange pushw %ds. Looking at -O1 code this replaces subw $2, %sp. It looks like space on the stack is created for local variables but there are none.

from gcc-ia16.

tkchia avatar tkchia commented on September 18, 2024

Hello @bartoldeman ,

Thanks for the reports! I managed to hack together a peephole optimization that can generate addw $2, (%bx). I believe the i386 back-end uses a different approach to achieve the same thing, since it can combine the instructions even when I disable peepholes.

Now I "just" need to figure out why the code for incfar (.) is reloading %es, and what exactly is up with the unused stack variable. :-|

from gcc-ia16.

tkchia avatar tkchia commented on September 18, 2024

Hello @bartoldeman ,

I am making some progress in improving the output code for cases like incfar (.).

It seems that the problem is not with my far pointer patches, but due to some weirdness in how GCC itself was handling multi-shortword values when -fsplit-wide-types was in effect. One of the optimization passes for splitting multi-shortwords into separate shortword variables (subreg1) was apparently throwing away information about values stored in registers, and this was leading to the extraneous reloads and spills (the unused stack variable was for spilling and reloading %bx; these later got elided).

After patching GCC to disable this pass (but leaving a later subreg expansion pass subreg2 in place), I can now obtain this output, even without any peephole optimization:

incfar:
	pushw	%es
	pushw	%bp
	movw	%sp,	%bp
	movw	6(%bp),	%bx
	movw	8(%bp),	%ax
	movw	%ax,	%es
	movw	%es:(%bx),	%dx
	addw	$2,	%dx
	movw	%dx,	%es:(%bx)
	popw	%bp
	popw	%es
	ret

And with an additional peephole rule to handle the loads into %es:%bx:

incfar:
        pushw   %es
        pushw   %bp
        movw    %sp,    %bp
        lesw    6(%bp), %bx
        addw    $2,     %es:(%bx)
        popw    %bp
        popw    %es
        ret

The only problem now is that disabling the subreg1 pass also causes some previously latent bugs to show up (specifically, there is a regression in gcc/testsuite/gcc.c-torture/execute/pr60960.c), so now I need to find these bugs and deal with them...

Thank you!

from gcc-ia16.

bartoldeman avatar bartoldeman commented on September 18, 2024

thanks for the good work. However overall the compiler is now worse for me in terms of size optimizations. I tried to get an example with -Os that shows this. It seems to be mostly that the stack is used more:

struct ab {
  unsigned long a, b;
};
extern int test(struct ab __far *p, unsigned long b);

unsigned long find_b(struct ab __far *p)
{
  unsigned long a = 0, b = 0;
  if (p->a>64000)
  {
    if (p->b != 0xffffful)
      b = p->b;
    a = p->a;
  }
  while (!test(p, b))
  {
    b++;
    if (b > a) b = 2;
  }
  return b;
}

before: 0x7b bytes, after 0x92 bytes (+19%)

from gcc-ia16.

tkchia avatar tkchia commented on September 18, 2024

Hello @bartoldeman ,

Thanks for the report. So it looks like turning off the subreg1 optimization pass wins for some routines but loses for others (argh!).

It seems that, as long as I manage to get the far pointer into a register pair, then GCC can easily generate good code. E.g.:

void incfar(int __far *a)
{
  __asm __volatile("" : "=k" (a) : "0" (a));
  *a+=2;
}

but I am not sure yet how to get the compiler itself to actually do this. I guess I need to really rethink some other ways to get rid of the reloading of %es.

By the way, may I take a look at your setup for compiling the FreeDOS kernel using ia16-elf-gcc, if it is convenient for you? I think it will be good for me to be able to see how any changes to GCC may affect code generation for the kernel as a whole.

Thank you!

from gcc-ia16.

bartoldeman avatar bartoldeman commented on September 18, 2024

I finally uploaded my fdkernel draft. It's in a separate branch because the patch is still one big chunk.
https://github.com/bartoldeman/fdkernel/tree/ia16-elf-gcc-draft

you should be able to compile it using "make all COMPILER=gcc"

from gcc-ia16.

tkchia avatar tkchia commented on September 18, 2024

Hello @bartoldeman ,

Thank you very much! I am able to compile the code using your setup; I will see how I can improve GCC's code generation.

from gcc-ia16.

tkchia avatar tkchia commented on September 18, 2024

Hello @bartoldeman ,

I added a patch that now makes -Os really optimize for size (for some reason, previously the effect of -Os was limited to just a few peephole rules, such as the pushw %ds thing).

A result is that the routine

ddt *getddt(int dev)
{
  return &(((ddt *) Dyn.Buffer)[dev]);
}

in FreeDOS kernel/dsk.c now compiles to

_getddt:
        pushw   %bp
        movw    %sp,    %bp
        movw    $136,   %ax
        mulw    4(%bp)
        addw    $_Dyn+2,        %ax
        popw    %bp
        ret     $2

whereas before, GCC would do the multiplication using shifts and adds (!).

The FreeDOS kernel code now compiles to 78,284 bytes (uncompressed), whereas before it was 79,052 bytes. Though this is still some way from the 69,828 bytes possible under the Watcom compiler. I will look into ways to further shrink the output.

The incfar (.) routine you laid out above now also results in good code under -Os:

incfar:
        pushw   %es
        pushw   %bp
        movw    %sp,    %bp
        lesw    6(%bp), %bx
        addw    $2,     %es:(%bx)
        popw    %bp
        popw    %es
        ret

(However, it still does the extra reloads if I use -O3, so possibly something is still out of whack with the speed optimizations.)

Thank you!

from gcc-ia16.

bartoldeman avatar bartoldeman commented on September 18, 2024

I see we are now at 77,596 bytes so even better! Actually, as for OW, the kernel resident and init code segments can now be merged into a single segment which makes it even smaller. I'm working on an update.

from gcc-ia16.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.