NWClient VLM.EXE causes FDPP crash, about dosemu2/comcom64

Comments (107)

stsp commented on May 26, 2024

Did you ever enable UMB at b0
by any chance?
Please see if it's not crashing with
FreeDOS.

from comcom64.

andrewbird commented on May 26, 2024

Nope, rest of dosemu.conf is pretty standard stuff

$ cat test-imagedir/dosemu.conf

$_lpt1 = ""
$_hdimage = "dXXXXs/c:hdtype1 +1"
$_floppy_a = ""
$_com1 = "/tmp/ttyV0"

$_pktdriver=(on)                                                                
$_vnet = "tap"                                                                  
$_tapdev = "tap0"

FreeDOS 1.20 seems okay.


stsp commented on May 26, 2024

Can't reproduce...


stsp commented on May 26, 2024

Though I didn't configure any bridges.
Could you confirm that the crash happens not
when vlm initializes, but later? In which case I
should configure the network too.


stsp commented on May 26, 2024

Configuring and activating bridge does not
help, still no crash. I wonder if it crashes for
you on init stage or when handling some
packet? -D9+P


andrewbird commented on May 26, 2024

I don't think it got as far as looking for the netware server as the only traffic I see on tap0 are spanning tree for the bridge.

root@polly:~# tshark -i tap0
Running as user "root" and group "root". This could be dangerous.
Capturing on 'tap0'
    1 0.000000000 fe80::a4e2:a1ff:fe19:ec0 → ff02::16     ICMPv6 110 Multicast Listener Report Message v2
    2 0.179970211 fe80::a4e2:a1ff:fe19:ec0 → ff02::16     ICMPv6 110 Multicast Listener Report Message v2
    3 0.619964140 a6:e2:a1:19:0e:c0 → Spanning-tree-(for-bridges)_00 STP 52 Conf. Root = 32768/0/52:54:00:e8:3d:3c  Cost = 0  Port = 0x8002
    4 2.603962331 a6:e2:a1:19:0e:c0 → Spanning-tree-(for-bridges)_00 STP 52 Conf. Root = 32768/0/52:54:00:e8:3d:3c  Cost = 0  Port = 0x8002
    5 4.619960918 a6:e2:a1:19:0e:c0 → Spanning-tree-(for-bridges)_00 STP 52 Conf. TC + Root = 32768/0/52:54:00:e8:3d:3c  Cost = 0  Port = 0x8002
    6 6.603963806 a6:e2:a1:19:0e:c0 → Spanning-tree-(for-bridges)_00 STP 52 Conf. TC + Root = 32768/0/52:54:00:e8:3d:3c  Cost = 0  Port = 0x8002
    7 8.619967041 a6:e2:a1:19:0e:c0 → Spanning-tree-(for-bridges)_00 STP 52 Conf. TC + Root = 32768/0/52:54:00:e8:3d:3c  Cost = 0  Port = 0x8002
    8 10.603963211 a6:e2:a1:19:0e:c0 → Spanning-tree-(for-bridges)_00 STP 52 Conf. TC + Root = 32768/0/52:54:00:e8:3d:3c  Cost = 0  Port = 0x8002
    9 12.619963403 a6:e2:a1:19:0e:c0 → Spanning-tree-(for-bridges)_00 STP 52 Conf. TC + Root = 32768/0/52:54:00:e8:3d:3c  Cost = 0  Port = 0x8002
   10 14.603966118 a6:e2:a1:19:0e:c0 → Spanning-tree-(for-bridges)_00 STP 52 Conf. TC + Root = 32768/0/52:54:00:e8:3d:3c  Cost = 0  Port = 0x8002
   11 16.619962525 a6:e2:a1:19:0e:c0 → Spanning-tree-(for-bridges)_00 STP 52 Conf. TC + Root = 32768/0/52:54:00:e8:3d:3c  Cost = 0  Port = 0x8002
   12 18.603961349 a6:e2:a1:19:0e:c0 → Spanning-tree-(for-bridges)_00 STP 52 Conf. TC + Root = 32768/0/52:54:00:e8:3d:3c  Cost = 0  Port = 0x8002
   13 20.619963745 a6:e2:a1:19:0e:c0 → Spanning-tree-(for-bridges)_00 STP 52 Conf. TC + Root = 32768/0/52:54:00:e8:3d:3c  Cost = 0  Port = 0x8002
   14 22.603963868 a6:e2:a1:19:0e:c0 → Spanning-tree-(for-bridges)_00 STP 52 Conf. TC + Root = 32768/0/52:54:00:e8:3d:3c  Cost = 0  Port = 0x8002
   15 24.619963950 a6:e2:a1:19:0e:c0 → Spanning-tree-(for-bridges)_00 STP 52 Conf. TC + Root = 32768/0/52:54:00:e8:3d:3c  Cost = 0  Port = 0x8002
   16 26.603980642 a6:e2:a1:19:0e:c0 → Spanning-tree-(for-bridges)_00 STP 52 Conf. TC + Root = 32768/0/52:54:00:e8:3d:3c  Cost = 0  Port = 0x8002
   17 28.619970755 a6:e2:a1:19:0e:c0 → Spanning-tree-(for-bridges)_00 STP 52 Conf. Root = 32768/0/52:54:00:e8:3d:3c  Cost = 0  Port = 0x8002
   18 30.603965086 a6:e2:a1:19:0e:c0 → Spanning-tree-(for-bridges)_00 STP 52 Conf. Root = 32768/0/52:54:00:e8:3d:3c  Cost = 0  Port = 0x8002
   19 32.619966633 a6:e2:a1:19:0e:c0 → Spanning-tree-(for-bridges)_00 STP 52 Conf. Root = 32768/0/52:54:00:e8:3d:3c  Cost = 0  Port = 0x8002

Here's the 9+P log
packet-log.zip


stsp commented on May 26, 2024

So please verify that it's not related to the
network packets (by disabling the bridge). Also,
we have different versions of some software;
can you try my versions? (from the
screenshot)


andrewbird commented on May 26, 2024

There's nothing on the other side of the bridge yet, output packets should just be disappearing.

Have you a link to the different client?


andrewbird commented on May 26, 2024

But anyway, I removed tap0 from the bridge.

root@polly:~# brctl delif virbr0 tap0
root@polly:~# brctl show
bridge name	bridge id		STP enabled	interfaces
virbr0		8000.525400e83d3c	yes		virbr0-nic

But still crashes.

Here's the client I'm using
nwclient.zip


stsp commented on May 26, 2024

pdether.exe.gz
New pdether.


stsp commented on May 26, 2024

vlm.exe.gz
Older vlm.


andrewbird commented on May 26, 2024

Crashed with your pdether.exe
Crashed with your vlm.exe


andrewbird commented on May 26, 2024

I am on 32bit with hardware vm86()


stsp commented on May 26, 2024

Try cpu_emu, though I don't suppose that
would help...


stsp commented on May 26, 2024

net.cfg.gz
net.cfg


andrewbird commented on May 26, 2024

Crashed with '-I cpuemu fullsim'


stsp commented on May 26, 2024

> Crashed with '-I cpuemu fullsim'

As always, haven't switched $_cpu_vm?


andrewbird commented on May 26, 2024

Crashed with your net.cfg


andrewbird commented on May 26, 2024

With -I 'cpuemu fullsim' -I 'cpu_vm emulated' I get a dosemu crash immediately; I don't even see the boot.
Log from gdb:

Thread 1 "dosemu.bin" received signal SIGSEGV, Segmentation fault.
0xb39fd253 in ?? ()
(gdb) bt
#0  0xb39fd253 in ?? ()
#1  0x081412fb in CloseAndExec_x86 (PC=652327, mode=3, ln=623) at codegen-x86.c:3201
#2  0x0811eaf4 in _Interp86 (PC=652327, basemode=3) at interp.c:623
#3  0x0811d2ce in Interp86 (PC=11003, mod0=3) at interp.c:395
#4  0x08131b7f in e_vm86 () at cpu-emu.c:1144
#5  0x0810f851 in do_vm86 (x=0x88db060 <vm86u>) at do_vm86.c:433
#6  0x0810f8be in _do_vm86 () at do_vm86.c:455
#7  0x0811006b in run_vm86 () at do_vm86.c:590
#8  0x081100e7 in loopstep_run_vm86 () at do_vm86.c:614
#9  0x080affe7 in main (argc=12, argv=0xbffff6e4) at emu.c:422


andrewbird commented on May 26, 2024

That's another ticket I think!


andrewbird commented on May 26, 2024

Oh I do see a warning intermittently from codegen-x86.c about not being able to represent 'large number' in an int, so perhaps it's a 32bit thing there.


stsp commented on May 26, 2024

Don't use fullsim...


andrewbird commented on May 26, 2024

What then, vm86sim?


stsp commented on May 26, 2024

Yes!


andrewbird commented on May 26, 2024

Appears the same

[New Thread 0x80cc2b40 (LWP 28467)]
ERROR: fdpp booting, this is very experimental!

Thread 1 "dosemu.bin" received signal SIGSEGV, Segmentation fault.
0xb39fd253 in ?? ()
(gdb) bt
#0  0xb39fd253 in ?? ()
#1  0x081412fb in CloseAndExec_x86 (PC=652327, mode=3, ln=623) at codegen-x86.c:3201
#2  0x0811eaf4 in _Interp86 (PC=652327, basemode=3) at interp.c:623
#3  0x0811d2ce in Interp86 (PC=11003, mod0=3) at interp.c:395
#4  0x08131b7f in e_vm86 () at cpu-emu.c:1144
#5  0x0810f851 in do_vm86 (x=0x88db060 <vm86u>) at do_vm86.c:433
#6  0x0810f8be in _do_vm86 () at do_vm86.c:455
#7  0x0811006b in run_vm86 () at do_vm86.c:590
#8  0x081100e7 in loopstep_run_vm86 () at do_vm86.c:614
#9  0x080affe7 in main (argc=12, argv=0xbffff6e4) at emu.c:422

I think this is a tangent to the original problem.


stsp commented on May 26, 2024

CloseAndExec_x86 should not be called in vm86sim mode...


stsp commented on May 26, 2024

Please see why it is; it's very simple
(it just shouldn't be registered as a codegen callback)


andrewbird commented on May 26, 2024

Seems to be switched on config.cpusim value, but that doesn't seem to be modified in global.conf?


stsp commented on May 26, 2024

#ifdef HOST_ARCH_X86
  if (config.cpusim)
    InitGen_sim();
  else
    InitGen_x86();
#else
  InitGen_sim();
#endif

InitGen_sim() should be in effect for you.
_x86 not.


stsp commented on May 26, 2024

                | CPUEMU cpuemu
                        {
#ifdef X86_EMULATOR
                        config.cpuemu = $2;
                        if (config.cpuemu > 4) {
                                config.cpuemu -= 2;
#ifdef HOST_ARCH_X86
                                config.cpusim = 1;
#endif
                        }

So it's not modified directly from global.conf


stsp commented on May 26, 2024

cpuemu          : L_OFF         { $$ = 0; }
                | VM86          { $$ = 3; }
                | FULL          { $$ = 4; }
                | VM86SIM       { $$ = 5; }
                | FULLSIM       { $$ = 6; }

Not sure why there is a gap 0...3, but it should work.


andrewbird commented on May 26, 2024

Looks like it silently accepts two -I options, but one overrides the other. This seems to work
-I 'cpuemu vm86sim cpu_vm emulated'

CONF: config variable c_comline set                                             
Parsing commandline statements.                                                 
CONF: Parsing commandline file.                                                 
CONF: simulated CPUEMU set to 3 for 586                                         
CONF: CPU VM set to 2                                                           
CONF: config variable c_comline unset


andrewbird commented on May 26, 2024

So finally a new log with -D9+P
log2.zip


andrewbird commented on May 26, 2024

and yes still crashes with tap0 removed from the bridge.


andrewbird commented on May 26, 2024

> Not sure why there is a gap 0...3, but it should work.

Pretty sure it's for this horrid -= 2 fixup

#ifdef X86_EMULATOR                                                             
                        config.cpuemu = $2;                                     
                        if (config.cpuemu > 4) {                                
                                config.cpuemu -= 2;                             
#ifdef HOST_ARCH_X86                                                            
                                config.cpusim = 1;                              
#endif                                                                          
                        }                                                       
                        c_printf("CONF: %s CPUEMU set to %d for %d86\n",        
                                CONFIG_CPUSIM ? "simulated" : "JIT",            
                                config.cpuemu, (int)vm86s.cpu_type);            
#endif
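
The parser fragments quoted in this thread can be modelled in a few lines. This is only a Python sketch of the logic as quoted here, not the actual dosemu2 code: the names `CPUEMU_TOKENS`, `apply_cpuemu` and `codegen_backend` are made up for illustration, the dictionary keys are assumed spellings of the keywords, and the sketch assumes a build where both `X86_EMULATOR` and `HOST_ARCH_X86` are defined.

```python
# Token values as quoted from the parser grammar in this thread.
# The key spellings are assumptions for illustration.
CPUEMU_TOKENS = {"off": 0, "vm86": 3, "full": 4, "vm86sim": 5, "fullsim": 6}


def apply_cpuemu(token):
    """Return (cpuemu, cpusim) the way the quoted fixup computes them.

    Hypothetical helper: assumes X86_EMULATOR and HOST_ARCH_X86 are defined.
    """
    cpuemu = CPUEMU_TOKENS[token]
    cpusim = 0
    if cpuemu > 4:
        # The "-= 2" fixup folds the *sim variants back onto 3 and 4
        # and records the difference in a separate flag.
        cpuemu -= 2
        cpusim = 1
    return cpuemu, cpusim


def codegen_backend(cpusim):
    """Mirror the quoted InitGen_sim()/InitGen_x86() selection."""
    return "InitGen_sim" if cpusim else "InitGen_x86"


for name in CPUEMU_TOKENS:
    emu, sim = apply_cpuemu(name)
    # With cpuemu off no code generator is relevant, so print a dash.
    backend = codegen_backend(sim) if emu else "-"
    print(f"{name:8} -> cpuemu={emu} cpusim={sim} {backend}")
```

For `vm86sim` this yields cpuemu=3 with cpusim=1, which agrees with the "CONF: simulated CPUEMU set to 3" line in the log earlier in the thread.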


stsp commented on May 26, 2024

Please file bugs about -I, the segfaults
and all that.


andrewbird commented on May 26, 2024

-I is probably okay, dosemu start script adds multiple ones together, but I was calling the binary directly. Incidentally I noticed that the -valgrind option will have a similar problem, I think.
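
As a rough illustration of that joining step (not the actual start script, whose details are not shown in this thread), separate -I statements can be collapsed into a single argument before calling the binary directly. The sketch assumes that whitespace-separated statements are accepted in one -I string, as the working command line earlier in the thread suggests; `merge_inline_config` is a hypothetical helper, not part of dosemu.

```python
import shlex


def merge_inline_config(argv):
    """Collapse every '-I <stmt>' pair in argv into one trailing '-I'.

    Hypothetical helper for illustration; other arguments keep their order.
    """
    statements = []
    rest = []
    it = iter(argv)
    for arg in it:
        if arg == "-I":
            stmt = next(it, None)
            if stmt is None:
                raise ValueError("-I without a statement")
            statements.append(stmt.strip())
        else:
            rest.append(arg)
    if statements:
        rest += ["-I", " ".join(statements)]
    return rest


args = ["dosemu.bin", "-I", "cpuemu vm86sim", "-I", "cpu_vm emulated"]
print(shlex.join(merge_inline_config(args)))
```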


andrewbird commented on May 26, 2024

So in the meantime I switched back to my versions and FreeDOS. Do you know where Novell has the command-line tools for querying shares etc. to download? Is it 'net'?


stsp commented on May 26, 2024

I created the branch "vg" (in both projects)
with the valgrind changes. Unfortunately I
can't test it right now as dosemu -valgrind
seems to be broken for me even w/o this patch...
Could you check what works for you?


andrewbird commented on May 26, 2024

Using the vg branch of both, valgrind seems to work. Here's the last section before the problem

==26349== 
==26349== Use of uninitialised value of size 4
==26349==    at 0x815152A: Gen_sim (codegen-sim.c:2210)
==26349==    by 0x81201CD: _Interp86 (interp.c:917)
==26349==    by 0x811D2CD: Interp86 (interp.c:395)
==26349==    by 0x8131B7E: e_vm86 (cpu-emu.c:1144)
==26349==    by 0x810F850: do_vm86 (do_vm86.c:433)
==26349==    by 0x810F8BD: _do_vm86 (do_vm86.c:455)
==26349==    by 0x811006A: run_vm86 (do_vm86.c:590)
==26349==    by 0x81100E6: loopstep_run_vm86 (do_vm86.c:614)
==26349==    by 0x80AFFE6: main (emu.c:422)
==26349== 
==26349== Invalid read of size 1
==26349==    at 0x81450DB: Gen_sim (codegen-sim.c:642)
==26349==    by 0x8120A16: _Interp86 (interp.c:983)
==26349==    by 0x811D2CD: Interp86 (interp.c:395)
==26349==    by 0x8131B7E: e_vm86 (cpu-emu.c:1144)
==26349==    by 0x810F850: do_vm86 (do_vm86.c:433)
==26349==    by 0x810F8BD: _do_vm86 (do_vm86.c:455)
==26349==    by 0x811006A: run_vm86 (do_vm86.c:590)
==26349==    by 0x81100E6: loopstep_run_vm86 (do_vm86.c:614)
==26349==    by 0x80AFFE6: main (emu.c:422)
==26349==  Address 0xda1f930 is in a rwx mapped file /dev/shm/dosemu_26349 (deleted) segment
==26349== 
MCB corruption
ERROR: fdpp: abort at memmgr.cc:332
==26381== 
==26381== HEAP SUMMARY:
==26381==     in use at exit: 19,488,680 bytes in 7,864 blocks
==26381==   total heap usage: 68,227 allocs, 60,363 frees, 29,697,750 bytes allocated
==26381== 
==26381== LEAK SUMMARY:
==26381==    definitely lost: 17,533 bytes in 8 blocks
==26381==    indirectly lost: 776 bytes in 1 blocks
==26381==      possibly lost: 41,433 bytes in 1,776 blocks
==26381==    still reachable: 19,428,938 bytes in 6,079 blocks
==26381==         suppressed: 0 bytes in 0 blocks
==26381== Rerun with --leak-check=full to see details of leaked memory
==26381== 
==26381== For counts of detected and suppressed errors, rerun with: -v
==26381== Use --track-origins=yes to see where uninitialised values come from
==26381== ERROR SUMMARY: 3641 errors from 446 contexts (suppressed: 3 from 1)

Are there any options you'd like me to add to valgrind?


stsp commented on May 26, 2024

OK, -valgrind didn't work for me because of the silly
-pg build. So I merged the valgrind support now, and
I am sure it will help to debug this.
You need to ignore the uninitialized errors because
valgrind doesn't have the notion of r/o memory.
But the problem is, vlm crashes for me under
valgrind w/o "MCB corruption" msg.


stsp commented on May 26, 2024

Ignore invalid reads.
Hunt for invalid writes.
And update.
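
A quick way to follow that advice on a large valgrind log is to filter for the "Invalid write" report headers only. The following is a minimal sketch, assuming the usual memcheck layout where each report starts with `==PID== Invalid ... of size N` followed by an `at ...` frame; the `invalid_writes` helper and the write entry in the sample are made up for illustration (only the read entry is taken from the log in this thread).

```python
import re

# Matches the header line of a memcheck error report, e.g.
# "==26349== Invalid read of size 1".
ERROR_RE = re.compile(r"^==(\d+)== (Invalid (?:read|write) of size \d+)")


def invalid_writes(lines):
    """Yield (pid, message, first_frame) for each 'Invalid write' report."""
    lines = list(lines)
    for i, line in enumerate(lines):
        m = ERROR_RE.match(line)
        if m and m.group(2).startswith("Invalid write"):
            # The line after the header normally carries the top frame.
            frame = ""
            if i + 1 < len(lines):
                frame = lines[i + 1].split("== ", 1)[-1].strip()
            yield m.group(1), m.group(2), frame


# The read entry comes from the thread; the write entry is hypothetical.
sample = """\
==26349== Invalid read of size 1
==26349==    at 0x81450DB: Gen_sim (codegen-sim.c:642)
==26349== Invalid write of size 2
==26349==    at 0x8140000: example_frame (example.c:1)
""".splitlines()

for pid, msg, frame in invalid_writes(sample):
    print(pid, msg, "|", frame)
```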


stsp commented on May 26, 2024

Use dosemu -valgrind please.


andrewbird commented on May 26, 2024

I didn't use the dosemu script, but I harvested your -valgrind options from it

  VLG="valgrind --log-file=valgrind.log"                                        
  VLG_DOSEMU_ARGS="-I 'cpuemu vm86sim cpu_vm emulated cpu_vm_dpmi kvm'"

I hope that's alright?
Here are the log files
t1.zip

Edit: This is back on fdpp/master and dosemu2/devel


stsp commented on May 26, 2024

No, that's not right.
Where is VG="valgrind --trace-children=yes --track-origins=yes"?
I think you are complicating things w/o any need
for this. Script works.


stsp commented on May 26, 2024

Also, the valgrind log doesn't show the point
where "MCB corruption" was printed.


stsp commented on May 26, 2024

With the script that would be visible, as both print
to the console...


andrewbird commented on May 26, 2024

I'm using your script now. The only way I know of to capture it is '> valgrind.log 2>&1', but that mixes the stdout and stderr streams, so synchronisation is dubious.
t2.zip


stsp commented on May 26, 2024

It has only a couple of "Invalid write of size" msgs,
and I've seen that with freecom. Please try comcom32 -
it doesn't produce any errors (not because it's that
good, but just because valgrind can't catch kvm).

If the writes are indeed related to freecom, try to
disable dos=high,umb.


stsp commented on May 26, 2024

I removed the "read" errors.
Not sure why they were there, but they
definitely do not help us fix this bug.
So please update.


stsp commented on May 26, 2024

I added more patches, and now, loading
the entire ipx stack, there is only 1 read error
on my screen. If you have more than that,
then we are progressing. :)


stsp commented on May 26, 2024

And that read error comes from lsl, which
probably traverses the MCB chain by hand.


andrewbird commented on May 26, 2024

Tested latest git as before with both freecom and comcom32, although comcom seems to crash dosemu itself.
t4-comcom32.zip
t3-freecom.zip

I don't think the comcom crash is anything to do with valgrind as it seems to occur without it too.

C:\NWCLIENT>startnet

C:\NWCLIENT>SET NWLANGUAGE=ENGLISH

C:\NWCLIENT>C:\NWCLIENT\LSL.COM
Novell Link Support Layer for DOS ODI  v2.20 (960401)
(c) Copyright 1990 - 1996, by Novell, Inc. All rights reserved.

BUFFERS 4 1514
The configuration file used was "C:\NWCLIENT\NET.CFG".
Max Boards 4, Max Stacks 4
Buffers 4, Buffer size 1514 bytes, Memory pool 0 bytes.


C:\NWCLIENT>
C:\NWCLIENT>rem C:\NWCLIENT\NE2000.COM

C:\NWCLIENT>C:\pdether\pdether
Ethernet Packet Driver MLID, v1.01, built Nov 01 1991 at 16:30:31
PDEther installed successfully.

C:\NWCLIENT>
C:\NWCLIENT>C:\NWCLIENT\IPXODI.COM

NetWare IPX/SPX Protocol  v3.03  (960611)
(C) Copyright 1990-1995 Novell, Inc.  All Rights Reserved.

Bound to logical board 1 (PDETHER) : Protocol ID 8137

C:\NWCLIENT>C:\NWCLIENT\VLM.EXE
VLM.EXE      - NetWare virtual loadable module manager  v1.21 (960514)
(C) Copyright 1993 - 1996 Novell, Inc.  All Rights Reserved.
Patent pending.
Patent No. 5,349,642.

The VLM.EXE file is pre-initializing the VLMs.............
The VLM.EXE file is using extended memory (XMS).
ERROR: general protection at 0xb6052c10: 50


andrewbird commented on May 26, 2024

Even though the comcom test was inconclusive I switched back to freecom and tried dos=low first, then dos=low,noumb and the fdpp crash/mcb corruption occurred with both.


stsp commented on May 26, 2024

Fixed the error from the log.
Please re-do the log and insert manually
the separators so that it is clear what prog
produces what messages.


stsp commented on May 26, 2024

Improved valgrinding a bit more.


andrewbird commented on May 26, 2024

Here's the latest log, but I didn't manage to annotate it yet. Will keep trying but really need a DOS builtin command to write to unix stderr or stdout. I tried sprinkling 'unix echo.sh' in the startnet.bat where echo was a shell script that echoed command line to append to the log file, but strangely I got the echoes in the log, and the valgrind output in the DOS window!
t5.zip


stsp commented on May 26, 2024

Why can't you just annotate it by hand?


stsp commented on May 26, 2024

Please try again.


stsp commented on May 26, 2024

I am currently running dosemu -valgrind -D9+ge.
It has been running for a few hours already, and is currently in
the process of starting ipxodi. :) The next thing to start is
vlm, and I'll have the full execution trace of the IPX
stack.


andrewbird commented on May 26, 2024

Log annotated by hand; the positions were found by inserting 'PAUSE' into the batch file at each point. Seems like only VLM has problems.
t6.zip


stsp commented on May 26, 2024

Cool, thanks.
So most errors are fixed, but the
crash is still there, and valgrind
shows no write errors, so it misses
the problem. This may mean that
the corruption comes from the fdpp side
and not from the DOS side. If this is true,
we are very lucky. I'll work on checking
the fdpp side too, and we'll see.
Current code only checks the DOS side,
and this gave nothing, so staying
optimistic.


stsp commented on May 26, 2024

Added code to check for MCB corruption from fdpp side.
Please see the new log.


andrewbird commented on May 26, 2024

new annotated logs t7.zip


stsp commented on May 26, 2024

Added more annotations, but this is unlikely
to help. For some reason valgrind doesn't see the
corruption for you, but in all my tests it does...
So please update the log, and I'll get back to
this after processing another backlog of
regressions.


andrewbird commented on May 26, 2024

Here's the latest (unfortunately I didn't have 'f' debug flag set, but the corruption event is still in test.log)
Strange how the valgrind memchk report ends up in the log, I didn't expect that.
t8.zip


andrewbird commented on May 26, 2024

I see this in the log

==31987== More than 100 errors detected.  Subsequent errors                     
==31987== will still be recorded, but in less detail than before.

Perhaps that's why you don't see the corruption in my logs?


andrewbird commented on May 26, 2024

I also repeated the test and used dosdebug 'mcbs' command at each point

before LSL

dosdebug> mcbs
dosdebug> 

ADDR(LOW) PARAS  OWNER
0290:0000 0x0536 [DOS]
  => ADDR      PARAS TYPE USAGE
     0291:0000 0x000c [F] Files
     029e:0000 0x0008 [D] Driver (EMUFS)
     02a7:0000 0x0009 [D] Driver (EMS)
     02b1:0000 0x029e [B] Buffers
     0550:0000 0x0154 [F] Files
     06a5:0000 0x008f [L] CDS Array
     0735:0000 0x0080 [S] Stacks
     07b6:0000 0x0010 [B] Buffers
07c7:0000 ------ [LINK]
0866:0000 0x0006 [FREE]
086d:0000 0x12c1 [COMMAND]
1b2f:0000 0x7e14 [FREE]
9944:0000 0x0619 [COMMAND]
9f5e:0000 0x0090 [COMMAND]
9fef:0000 0x0010 [COMMAND] (END)

before PDETHER

dosdebug> mcbs
dosdebug> 

ADDR(LOW) PARAS  OWNER
0290:0000 0x0536 [DOS]
  => ADDR      PARAS TYPE USAGE
     0291:0000 0x000c [F] Files
     029e:0000 0x0008 [D] Driver (EMUFS)
     02a7:0000 0x0009 [D] Driver (EMS)
     02b1:0000 0x029e [B] Buffers
     0550:0000 0x0154 [F] Files
     06a5:0000 0x008f [L] CDS Array
     0735:0000 0x0080 [S] Stacks
     07b6:0000 0x0010 [B] Buffers
07c7:0000 ------ [LINK]
0866:0000 0x0006 [FREE]
086d:0000 0x00bc [COMMAND]
092a:0000 0x000f [FREE]
093a:0000 0x0306 [LSL]
0c41:0000 0x12c1 [02158]
1f03:0000 0x805a [FREE]
9f5e:0000 0x0090 [COMMAND]
9fef:0000 0x0010 [COMMAND] (END)

before IPXODI

dosdebug> mcbs
dosdebug> 

ADDR(LOW) PARAS  OWNER
0290:0000 0x0536 [DOS]
  => ADDR      PARAS TYPE USAGE
     0291:0000 0x000c [F] Files
     029e:0000 0x0008 [D] Driver (EMUFS)
     02a7:0000 0x0009 [D] Driver (EMS)
     02b1:0000 0x029e [B] Buffers
     0550:0000 0x0154 [F] Files
     06a5:0000 0x008f [L] CDS Array
     0735:0000 0x0080 [S] Stacks
     07b6:0000 0x0010 [B] Buffers
07c7:0000 ------ [LINK]
0866:0000 0x0006 [FREE]
086d:0000 0x00bc [COMMAND]
092a:0000 0x000f [FREE]
093a:0000 0x0306 [LSL]
0c41:0000 0x00b5 [PDETHER]
0cf7:0000 0x12c1 [02158]
1fb9:0000 0x7fa4 [FREE]
9f5e:0000 0x0090 [COMMAND]
9fef:0000 0x0010 [COMMAND] (END)

before VLM

dosdebug> mcbs
dosdebug> 

ADDR(LOW) PARAS  OWNER
0290:0000 0x0536 [DOS]
  => ADDR      PARAS TYPE USAGE
     0291:0000 0x000c [F] Files
     029e:0000 0x0008 [D] Driver (EMUFS)
     02a7:0000 0x0009 [D] Driver (EMS)
     02b1:0000 0x029e [B] Buffers
     0550:0000 0x0154 [F] Files
     06a5:0000 0x008f [L] CDS Array
     0735:0000 0x0080 [S] Stacks
     07b6:0000 0x0010 [B] Buffers
07c7:0000 ------ [LINK]
0866:0000 0x0006 [FREE]
086d:0000 0x00bc [COMMAND]
092a:0000 0x000f [FREE]
093a:0000 0x0306 [LSL]
0c41:0000 0x00b5 [PDETHER]
0cf7:0000 0x040f [IPXODI]
1107:0000 0x12c1 [02158]
23c9:0000 0x7b94 [FREE]
9f5e:0000 0x0090 [COMMAND]
9fef:0000 0x0010 [COMMAND] (END)

after crash

dosdebug> mcbs
dosdebug> 

ADDR(LOW) PARAS  OWNER
0290:0000 0x0536 [DOS]
  => ADDR      PARAS TYPE USAGE
     0291:0000 0x000c [F] Files
     029e:0000 0x0008 [D] Driver (EMUFS)
     02a7:0000 0x0009 [D] Driver (EMS)
     02b1:0000 0x029e [B] Buffers
     0550:0000 0x0154 [F] Files
     06a5:0000 0x008f [L] CDS Array
     0735:0000 0x0080 [S] Stacks
     07b6:0000 0x0010 [B] Buffers
07c7:0000 ------ [LINK]
0866:0000 0x0006 [FREE]
086d:0000 0x00bc [COMMAND]
092a:0000 0x000f [04360]
093a:0000 0x0306 [LSL]
0c41:0000 0x00b5 [PDETHER]
0cf7:0000 0x040f [IPXODI]
1107:0000 0x0155 [VLM]
125d:0000 0x7804 [FREE]
8a62:0000 0x058c [04360]
8fef:0000 0x0fff [FREE]
9fef:0000 0x0010 [COMMAND] (END)
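
The listings above can also be checked mechanically. In a DOS MCB chain each header occupies one paragraph in front of its block, so the next header is expected at segment + paras + 1. The sketch below parses the top-level rows of the 'mcbs' output and reports any row whose successor is not where that rule predicts. It is only an illustration: `parse_mcbs` and `chain_breaks` are hypothetical helpers, the regular expression is fitted to the output format shown in this comment, and the [LINK] row is skipped because no size is printed for it.

```python
import re

# Top-level rows of the dosdebug 'mcbs' listing: "SEG:0000 0xPARAS [OWNER]".
ROW_RE = re.compile(r"^([0-9a-f]{4}):0000 (0x[0-9a-f]{4}|-+) \[([^\]]+)\]")


def parse_mcbs(text):
    """Return [(segment, paras or None, owner)] for the top-level rows."""
    rows = []
    for line in text.splitlines():
        m = ROW_RE.match(line)  # indented sub-rows do not match
        if m:
            paras = None if m.group(2).startswith("-") else int(m.group(2), 16)
            rows.append((int(m.group(1), 16), paras, m.group(3)))
    return rows


def chain_breaks(rows):
    """List rows whose successor is not at segment + paras + 1."""
    breaks = []
    for (seg, paras, owner), (nxt, _, _) in zip(rows, rows[1:]):
        if paras is not None and seg + paras + 1 != nxt:
            breaks.append((seg, owner, nxt))
    return breaks


# Rows copied from the "after crash" listing (one sub-row kept as a check).
AFTER_CRASH = """\
0290:0000 0x0536 [DOS]
     0291:0000 0x000c [F] Files
07c7:0000 ------ [LINK]
0866:0000 0x0006 [FREE]
086d:0000 0x00bc [COMMAND]
092a:0000 0x000f [04360]
093a:0000 0x0306 [LSL]
0c41:0000 0x00b5 [PDETHER]
0cf7:0000 0x040f [IPXODI]
1107:0000 0x0155 [VLM]
125d:0000 0x7804 [FREE]
8a62:0000 0x058c [04360]
8fef:0000 0x0fff [FREE]
9fef:0000 0x0010 [COMMAND] (END)
"""

print(chain_breaks(parse_mcbs(AFTER_CRASH)))
```

On the "after crash" rows this finds no break, so the low-memory part of the chain shown by 'mcbs' still adds up; that would suggest the damaged header lies outside this listing.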


andrewbird commented on May 26, 2024

that bad memory was always zero until sometime during VLM load

dosdebug> d cc0d:0000
dosdebug> 

cc0d:0000 06 CF 00 00 00 00 00 00 00 00 00 00 00 00 00 00  .O..............
cc0d:0010 00 00 54 CF 74 02 0E CC E1 02 0E CC 6B 03 0E CC  ..TOt..La..Lk..L
cc0d:0020 8C 03 0E CC 79 04 0E CC 02 05 0E CC F4 00 0E CC  ...Ly..L...Lt..L
cc0d:0030 14 01 0E CC 56 05 0E CC 0D 05 0E CC 00 00 00 00  ...LV..L...L....
cc0d:0040 4E 56 6C 6D 40 00 BB 00 00 55 BD 40 00 55 BD 01  NVlm@.;..U=@.U=.
cc0d:0050 00 55 BD 04 00 55 2E FF 1E C8 29 5D C3 55 BD 40  .U=..U..H)]CU=@
cc0d:0060 00 55 BD 01 00 55 BD 01 00 55 2E FF 1E C8 29 5D  .U=..U=..U..H)]
cc0d:0070 C3 55 BD 40 00 55 BD 43 00 55 BD 06 00 55 2E FF  CU=@.U=C.U=..U.

and it looks like code, not data or a minor corruption of an MCB

dosdebug> u  cc0b:0000
dosdebug> 

cc0b:0000 9C               pushf 
cc0b:0001 3D0516           cmp  ax,1605
cc0b:0004 7406             je   000C ($+6)
cc0b:0006 9D               popf 
cc0b:0007 2EFF2E2706       jmp  far word cs:[0627]
cc0b:000c 2EFF1E2706       call far word cs:[0627]
cc0b:0011 06               push es
cc0b:0012 0E               push cs
cc0b:0013 07               pop  es
cc0b:0014 58               pop  ax
cc0b:0015 26891E2E06       mov  es:[062E],bx
cc0b:001a 26A33006         mov  es:[0630],ax
cc0b:001e BB2C06           mov  bx,062C
dosdebug> u
dosdebug> 

cc0b:0021 CF               iret

I guess we need to see the previous entry in the MCB chain that led us here.


stsp commented on May 26, 2024

Please update the logs.


andrewbird commented on May 26, 2024

New logs t9.zip


andrewbird commented on May 26, 2024

> I am currently running dosemu -valgrind -D9+ge.
> It has been running for a few hours already, and is currently in
> the process of starting ipxodi. :)

I notice that after every run there are two orphaned valgrind processes left behind. I kill them (-9) but the system never feels responsive and the following valgrind runs take longer until I've rebooted the machine.


stsp commented on May 26, 2024

If you know exactly which MCB gets
corrupted (it seems you do), maybe
you can check that the appropriate
fd_prot_mem() is called, and then
filter out all other addresses from that
call, to reduce the logging?


stsp commented on May 26, 2024

> the system never feels responsive and the following valgrind runs
> take longer until I've rebooted the machine.

Maybe things got swapped out?
I've got 20gigorama and 8core CPU for
properly debugging dosemu2.


stsp commented on May 26, 2024

I guess to properly run it under valgrind
with -D9+e, we'd need to wait for 20THz
cpus. :)


andrewbird commented on May 26, 2024

If I wanted to tweak fdpp to log the binary at load, how would I convert the FP_DS_DX to char * for fdebug to handle?

diff --git a/kernel/inthndlr.c b/kernel/inthndlr.c
index d0f62c8..aaae7e7 100644
--- a/kernel/inthndlr.c
+++ b/kernel/inthndlr.c
@@ -1074,6 +1074,8 @@ dispatch:
     case 0x4b:
       break_flg = FALSE;
 
+      fdebug("#################### int21/4Bh Program exec\n");
+
       rc = DosExec(lr.AL, MK_FP(lr.ES, lr.BX), FP_DS_DX);
       goto short_check;

I injured my back again, so I can't sit for long periods without 'paying the price' when I get back up again!


andrewbird commented on May 26, 2024

New logs t10.zip


stsp commented on May 26, 2024

> If I wanted to tweak fdpp to log the binary at load, how would I convert the FP_DS_DX to char *

For example, with the GET_PTR() macro.

> I injured my back again, so I can't sit for long periods

fdpp is not worth the health!
So please get well. :)
I'll take care of it while you are away.


stsp commented on May 26, 2024

Would be good to get ssh for this btw.


stsp commented on May 26, 2024

It appears that if you remove all *.vlm files, there is no
crash. This is why I wasn't able to reproduce it.
Now I can, though not with MCB corruption; it just
crashes.

Some debugging.
If you produce the -D+e log and search for
call 02B5 in it, you'll see the following:

        esi=000c0000 edi=00000059 ebp=0000356e esp=0000355c
         vf=000b7216  cs=a28c      ds=a1d9      es=c217
         fs=0000      gs=0000      ss=8a73     flg=00030213
        stk=0ace 8a73 356e 8a73 113a 8a73 1399 f6f4 0000 091e
  
Fetch e80114e8 at 000a2a5e mode 3
  000a2a5e: e81401            a28c:019e call 02B5 ($+114)
CALL: ret=000001a1
** Jump taken to 000a2b75
(R) DR1=0000070a DR2=0000040a AR1=404ed462 AR2=404d5730
(R) SR1=00003556 TR1=000006d2
(R) RFL m=[BDA] v=0 cout=00000000 RES=00000000
== (1454) == Closing sequence at 000a2a61
(R) DR1=0000070a DR2=0000040a AR1=404ed462 AR2=404d5730
(R) SR1=00003556 TR1=000006d2
(R) RFL m=[BDA] v=0 cout=00000000 RES=00000000
        

        eax=00000a07 ebx=0000070a ecx=00000000 edx=0000001a
        esi=000c0000 edi=00000059 ebp=0000356e esp=0000355a
         vf=000b7216  cs=a28c      ds=a1d9      es=c217
         fs=0000      gs=0000      ss=8a73     flg=00030213
        stk=01a1 0ace 8a73 356e 8a73 113a 8a73 1399 f6f4 0000

Now search further for the nearest pop es:

       eax=000002a9 ebx=00000024 ecx=00000000 edx=000002aa
        esi=000c0000 edi=00000059 ebp=0000356e esp=0000355a
         vf=000b7212  cs=a28c      ds=a1d9      es=5252
         fs=0000      gs=0000      ss=8a73     flg=00030246
        stk=01a1 0ace 8a73 356e 8a73 113a 8a73 1399 f6f4 0000

Fetch 20b85a07 at 000a2c07 mode 3
  000a2c07: 07                a28c:0347 pop  es
(G) O_POP        MODE+1 [DA]
(V) 000001a1
(R) DR1=000001a1 DR2=0000040a AR1=404ed43e AR2=404d5730
(R) SR1=0000355c TR1=000006ae
(R) RFL m=[DA] v=2 cout=00000000 RES=000002a9
(G) S_REG_WL       ES [DA]
(R) DR1=000001a1 DR2=0000040a AR1=404ed43e AR2=404d5730
(R) SR1=0000355c TR1=000006ae
(R) RFL m=[DA] v=2 cout=00000000 RES=000002a9
(G) A_SR_SH4       ES [DA]
SetSeg REAL ES:01a1

        eax=000002a9 ebx=00000024 ecx=00000000 edx=000002aa
        esi=000c0000 edi=00000059 ebp=0000356e esp=0000355c
         vf=000b7212  cs=a28c      ds=a1d9      es=01a1
         fs=0000      gs=0000      ss=8a73     flg=00030246
        stk=0ace 8a73 356e 8a73 113a 8a73 1399 f6f4 0000 091e

Fetch 7a20b85a at 000a2c08 mode 3
  000a2c08: 5a                a28c:0348 pop  dx
(G) O_POP1       [DA]
(R) DR1=000001a1 DR2=0000040a AR1=404ed43e AR2=404d5730
(R) SR1=0000355c TR1=000006ae
(R) RFL m=[DA] v=2 cout=00000000 RES=000002a9
(G) O_POP2        EDX [DA]
(V) 00000ace
(R) DR1=00000ace DR2=0000040a AR1=404ed43e AR2=404d5730
(R) SR1=0000355e TR1=000006ae
(R) RFL m=[DA] v=2 cout=00000000 RES=000002a9
(G) O_POP3       [DA]
(R) DR1=00000ace DR2=0000040a AR1=404ed43e AR2=404d5730
(R) SR1=0000355e TR1=000006ae
(R) RFL m=[DA] v=2 cout=00000000 RES=000002a9

        eax=000002a9 ebx=00000024 ecx=00000000 edx=00000ace
        esi=000c0000 edi=00000059 ebp=0000356e esp=0000355e
         vf=000b7212  cs=a28c      ds=a1d9      es=01a1
         fs=0000      gs=0000      ss=8a73     flg=00030246
        stk=8a73 356e 8a73 113a 8a73 1399 f6f4 0000 091e 051c

So we can see that at that point ES contains the return
address and DX got the subsequent stack word, so the state
is corrupted right here. The actual crash happens much,
much later, when a ret goes to a junk address.
So in this debugging session I was able to trace from the
actual crash back to the state corruption. But more debugging
is needed to find out why exactly it pops the return address by
mistake.

stsp commented on May 26, 2024

logs.tar.gz

I distilled the ~500Gb logs into smaller ones
that contain just the problematic function call
under freedos (good) and fdpp (bad).
They can be made even smaller by cutting them off
at the pop es point:
logs_trim.tar.gz

One has esp=0000355c (bad) and the other esp=00003558 (good).
In the good log you can search for the push:

  000c5bac: 52                c58b:02fc push dx
(G) O_PUSH1      [DA]
(R) DR1=0000c41b DR2=0000080a AR1=41dd41a1 AR2=41d9a730
(R) SR1=0000355a TR1=00000001
(R) RFL m=[DA] v=2 cout=00000000 RES=00000000
(G) O_PUSH2       EDX [DA]
(V) 0000c41b
(R) DR1=0000c41b DR2=0000080a AR1=41dd41a1 AR2=41d9a730
(R) SR1=00003558 TR1=00000001
(R) RFL m=[DA] v=2 cout=00000000 RES=00000000
(G) O_PUSH2        ES [DA]
(V) 0000c41a
(R) DR1=0000c41a DR2=0000080a AR1=41dd41a1 AR2=41d9a730
(R) SR1=00003556 TR1=00000001
(R) RFL m=[DA] v=2 cout=00000000 RES=00000000
(G) O_PUSH3      [DA]
(R) DR1=0000c41a DR2=0000080a AR1=41dd41a1 AR2=41d9a730
(R) SR1=00003556 TR1=00000001
(R) RFL m=[DA] v=2 cout=00000000 RES=00000000

        eax=00000a07 ebx=00000024 ecx=00000000 edx=0000c41b
        esi=00000000 edi=00000059 ebp=0000356e esp=00003556
         vf=00093246  cs=c58b      ds=c4d8      es=c41a
         fs=0c07      gs=c42b      ss=8a73     flg=00030212
        stk=c41a c41b 01a1 0ace 8a73 356e 8a73 0c17 8a73 1399

This means that dx and es were pushed.
The bad log has no such part.
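
The effect of the missing pushes can be replayed with a small model. This is only a sketch: the word values come from the stk= lines in the logs above, and the function name and the list-based stack are made up for illustration. It assumes both paths reach the same pop es ; pop dx exit with esp=0000355a before the optional pushes.

```python
# Hypothetical model of the 16-bit stack around the traced routine.
# Word values are copied from the "stk=" lines quoted in the logs.

def run(pushes_executed):
    # Stack at esp=0x355a before the optional pushes, top of stack first.
    stack = [0x01A1, 0x0ACE, 0x8A73, 0x356E]
    sp = 0x355A
    dx, es = 0xC41B, 0xC41A

    if pushes_executed:              # good path: push dx ; push es
        for value in (dx, es):
            stack.insert(0, value)
            sp -= 2

    # Common exit path: pop es ; pop dx
    es = stack.pop(0)
    sp += 2
    sp_after_pop_es = sp
    dx = stack.pop(0)
    sp += 2
    return {"es": es, "dx": dx, "sp": sp,
            "sp_after_pop_es": sp_after_pop_es, "stack": stack}

good = run(pushes_executed=True)
bad = run(pushes_executed=False)

for name, state in (("good", good), ("bad", bad)):
    print(name, "es=%04x dx=%04x esp=%04x" % (state["es"], state["dx"], state["sp"]))
```

In the bad run the two pops consume 01a1 and 0ace, words that the caller still needs, and esp ends up 4 bytes higher than in the good run. That matches the esp=0000355c versus esp=00003558 difference at the pop es point.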

stsp commented on May 26, 2024

Made a disasm diff:
asm.diff.txt.gz
And the problematic hunk is:

 02ee cmp  dx,es:[0026]
 02f3 jne  02C2 ($-33)
-02f5 cmp  dx,es:[0001]
-02fa jne  02C2 ($-3a)
-02fc push dx
-02fe mov  bx,es:[0003]
-0303 mov  [06B2],bx
-0307 mov  bx,es:[003C]
-030c test bx,bx
-030e je   0318 ($+8)
-0310 cmp  word [06D2],031E
-0316 jnc  032E ($+16)
+02c2 add  dx,es:[0003]
+02c7 mov  es,dx
+02c9 inc  dx
+02ca cmp  byte es:[0000],4D
+02d0 je   02E5 ($+13)
+02e5 cmp  word es:[0010],20CD
+02ec jne  02C2 ($-2c)
+02c2 add  dx,es:[0003]
+02c7 mov  es,dx
+02c9 inc  dx
+02ca cmp  byte es:[0000],4D
+02d0 je   02E5 ($+13)
+02e5 cmp  word es:[0010],20CD
+02ec jne  02C2 ($-2c)
+02c2 add  dx,es:[0003]
+02c7 mov  es,dx
+02c9 inc  dx
+02ca cmp  byte es:[0000],4D
+02d0 je   02E5 ($+13)
+02e5 cmp  word es:[0010],20CD
+02ec jne  02C2 ($-2c)
+02c2 add  dx,es:[0003]
+02c7 mov  es,dx
+02c9 inc  dx
+02ca cmp  byte es:[0000],4D
+02d0 je   02E5 ($+13)
+02d2 cmp  byte cs:[0510],FF
+02d8 je   0325 ($+4b)
+0325 xor  ax,ax
+0327 mov  es,ax
+0329 mov  dx,es:[00BA]
 032e mov  es,dx
 0330 mov  dx,es:[002C]

It searches for some signatures, seemingly a PSP (20CD),
and in the freedos case it finds whatever it needs. In the fdpp
case it finds nothing, and that triggers a crash bug in
vlm itself: it is simply not prepared to not find what it was
looking for.
The search procedure needs to be studied a bit more.
Full disasm traces, distilled:
disasms.tar.gz
They are quite small, only 130 and 225 lines.
I wonder if you can do the rest. :)

stsp commented on May 26, 2024

Note that push dx above is actually
push dx ; push es, because simx86
optimized them into a single push of
2 registers. I think it's kind of wrong to
do such an optimization at the interpreter
level (it could be done in the codegen backend
instead), but that is how it is.
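
To illustrate what such a fusion looks like in a trace, here is a toy sketch. It is not simx86 code: the function name and the tuple format are invented for the example. It only shows why a disassembly listing shows a single push where the binary has two one-byte instructions (the addresses in the diff go from 02fc straight to 02fe).

```python
def fuse_pushes(instructions):
    """Merge runs of consecutive ('push', reg) entries into ('push', [regs])."""
    fused = []
    for op, arg in instructions:
        if op == "push" and fused and fused[-1][0] == "push":
            # Extend the register list of the push we are already building.
            fused[-1][1].append(arg)
        elif op == "push":
            fused.append(("push", [arg]))
        else:
            fused.append((op, arg))
    return fused

# Hypothetical decoded stream around offset 02fc.
trace = [("push", "dx"), ("push", "es"), ("mov", "bx,es:[0003]")]
fused_trace = fuse_pushes(trace)
print(fused_trace)
```

When reading the distilled logs, a lone push followed by O_PUSH2 lines for several registers should be read as that many separate pushes.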

stsp commented on May 26, 2024

Ok, so it walks the MCB chain
searching for a self-owned PSP
(parent_psp==self), then it also
checks that the MCB is owned by that
PSP. I.e. it just searches for the
master PSP of some process, but
finds nothing. So either fdpp produces
a corrupted PSP, or the initial
search address is wrong.
Problem deciphered.
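
The search described above can be sketched roughly as follows. This is a simplified reimplementation for illustration, not a transcription of the VLM code: the function names and the demo memory image are hypothetical, and it uses the standard DOS layout (MCB type at offset 0, owner at 1, size in paragraphs at 3; PSP parent at offset 16h, which is es:[0026] when ES points at the MCB).

```python
import struct

PARA = 16  # bytes per real-mode paragraph

def u8(mem, seg, off):
    return mem[seg * PARA + off]

def u16(mem, seg, off):
    return struct.unpack_from("<H", mem, seg * PARA + off)[0]

def find_self_owned_psp(mem, first_mcb):
    """Return the segment of the first self-owned PSP, or None."""
    mcb = first_mcb
    while u8(mem, mcb, 0) in (0x4D, 0x5A):       # 'M' = more follow, 'Z' = last
        psp = mcb + 1                             # block data follows the header
        if (u16(mem, psp, 0x00) == 0x20CD         # block starts with "int 20h"
                and u16(mem, psp, 0x16) == psp    # parent_psp == self
                and u16(mem, mcb, 0x01) == psp):  # MCB owner == that PSP
            return psp
        if u8(mem, mcb, 0) == 0x5A:
            return None                           # end of chain, nothing found
        mcb = psp + u16(mem, mcb, 0x03)           # advance to the next MCB
    return None                                   # chain looks corrupted

def put_mcb(mem, seg, kind, owner, paras):
    struct.pack_into("<BHH", mem, seg * PARA, kind, owner, paras)

def demo_memory(parent_psp):
    """Tiny made-up memory image with one PSP at segment 0879."""
    mem = bytearray(0x1000 * PARA)
    put_mcb(mem, 0x0871, 0x4D, 0x0000, 0x0006)    # free block
    put_mcb(mem, 0x0878, 0x4D, 0x0879, 0x00BC)    # shell block
    put_mcb(mem, 0x0935, 0x5A, 0x0000, 0x0000)    # last block
    struct.pack_into("<H", mem, 0x0879 * PARA, 0x20CD)
    struct.pack_into("<H", mem, 0x0879 * PARA + 0x16, parent_psp)
    return mem

print(find_self_owned_psp(demo_memory(0x0879), 0x0871))
print(find_self_owned_psp(demo_memory(0x0001), 0x0871))
```

With a self-owned PSP in the image the walk returns its segment. When the parent field points elsewhere the walk falls off the end of the chain, which corresponds to the path in the fdpp trace where the pushes are skipped.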

stsp commented on May 26, 2024

Mm, it is looking for a shell, because only
a shell seems to have a self-owned PSP.
Why would it look for a shell...

stsp commented on May 26, 2024

Damn, comcom32 bug...

andrewbird commented on May 26, 2024

Thing is, I'm seeing this with FreeCOM; perhaps there are two bugs?

stsp commented on May 26, 2024

Very likely...
But this is all I could reproduce.

stsp commented on May 26, 2024

Works very well for me now.
Please open another ticket if this is still a problem.

stsp commented on May 26, 2024

And if you do, please make sure the MCB
of command.com is self-owned.

stsp commented on May 26, 2024

I mean, with comcom32 it crashed for me
even under freedos. So this definitely had
to be fixed before anything else.

andrewbird commented on May 26, 2024

Here's FreeCOM's MCB. You can see the owner is itself, so I think that's all good.

dosdebug> mcbs
dosdebug> 

ADDR(LOW) PARAS  OWNER
0291:0000 0x0536 [DOS]
  => ADDR      PARAS TYPE USAGE
     0292:0000 0x000c [F] Files
     029f:0000 0x0008 [D] Driver (EMUFS)
     02a8:0000 0x0009 [D] Driver (EMS)
     02b2:0000 0x029e [B] Buffers
     0551:0000 0x0154 [F] Files
     06a6:0000 0x008f [L] CDS Array
     0736:0000 0x0080 [S] Stacks
     07b7:0000 0x0010 [B] Buffers
07c8:0000 ------ [LINK]
0871:0000 0x0006 [FREE]
0878:0000 0x00bc [COMMAND]
0935:0000 0x000f [04371]
0945:0000 0x0306 [LSL]
0c4c:0000 0x00b5 [PDETHER]
0d02:0000 0x040f [IPXODI]
1112:0000 0x015f [VLM]
1272:0000 0x77ef [04371]
8a62:0000 0x058c [04371]
8fef:0000 0x0fff [FREE]
9fef:0000 0x0010 [COMMAND] (END)
dosdebug> d 0878:0000
dosdebug> 

0878:0000 4D 79 08 BC 00 00 00 00 43 4F 4D 4D 41 4E 44 00  My.<....COMMAND.

0878:0010 CD 20 06 1C 00 9A C0 00 00 00 0C 0A 89 08 25 F1  M ....@.......%q
0878:0020 00 F0 EF 0F D9 00 79 08 01 01 01 00 02 FF FF FF  .po.Y.y......
0878:0030 FF FF FF FF FF FF FF FF FF FF FF FF F0 9F 7E 08  p.~.
0878:0040 89 08 14 00 18 00 79 08 00 00 60 00 00 00 00 00  ......y...`.....
0878:0050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0878:0060 CD 21 CB 00 00 00 00 00 00 00 00 00 00 20 20 20  M!K..........   
0878:0070 20 20 20 20 20 20 20 20 00 00 00 00 00 20 20 20          .....
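
The dump above can be checked mechanically. The following sketch decodes the first three rows, assuming the standard MCB and PSP field offsets; the variable names are made up for the example and the hex strings are copied from the dump.

```python
import struct

# Rows copied from the dosdebug dump at 0878:0000, 0878:0010 and 0878:0020.
mcb_row = bytes.fromhex("4D 79 08 BC 00 00 00 00 43 4F 4D 4D 41 4E 44 00")
psp_row0 = bytes.fromhex("CD 20 06 1C 00 9A C0 00 00 00 0C 0A 89 08 25 F1")
psp_row1 = bytes.fromhex("00 F0 EF 0F D9 00 79 08 01 01 01 00 02 FF FF FF")

mcb_seg = 0x0878
psp_seg = mcb_seg + 1                 # the PSP starts one paragraph later

# MCB header: type byte, owner segment, size in paragraphs, then the name.
kind, owner, paras = struct.unpack_from("<BHH", mcb_row, 0)
name = mcb_row[8:16].split(b"\0")[0].decode("ascii")

# PSP: "int 20h" at offset 0, parent PSP segment at offset 16h.
psp = psp_row0 + psp_row1
signature = struct.unpack_from("<H", psp, 0x00)[0]
parent = struct.unpack_from("<H", psp, 0x16)[0]

self_owned = (signature == 0x20CD and owner == psp_seg and parent == psp_seg)
print("name=%s owner=%04x paras=%04x parent=%04x self_owned=%s"
      % (name, owner, paras, parent, self_owned))
```

Both the MCB owner and the PSP parent field decode to 0879, the segment right after the MCB, which agrees with the conclusion that FreeCOM's block is self-owned.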

I'm going to stop pursuing this as I've pretty much forgotten why I was trying this network client anyway. Unfortunately I use this machine for private stuff so I can't open it to the Internet, so let's leave it unresolved and perhaps a better test case will present itself sometime.

stsp commented on May 26, 2024

Can anyone else still reproduce that, even
with the fixed comcom32?

stsp commented on May 26, 2024

Built with -m32 and still no crash.
So this is not 32-bit specific.
Could you please upload a self-contained test case?

andrewbird commented on May 26, 2024

Here's what I have: nwtest.tar.gz

1/ cd nwclient
2/ startnet
3/ Crash on VLM load of NETX.VLM

stsp commented on May 26, 2024

Reproduced.

stsp commented on May 26, 2024

So the crash happens because of your
dos=low,noumb setting, which is not in the default
config. I am shocked you didn't tell me
that for over a week...

andrewbird commented on May 26, 2024

The dos=low,noumb is an artefact from the experiment you asked me to do in #27 (comment). Before I had no dos=, would that equate to noumb?

stsp commented on May 26, 2024

I had no dos=, would that equate to noumb?

Of course!

stsp commented on May 26, 2024

Please test with the default setup first;
otherwise this gets ridiculously time-consuming
every now and then.

NWClient VLM.EXE causes FDPP crash about comcom64 HOT 107 CLOSED
