att / ast

AST - AT&T Software Technology

License: Eclipse Public License 1.0

HTML 3.25% Shell 4.68% Makefile 1.31% C 66.32% Scilab 14.99% Roff 6.04% C++ 0.29% Objective-C++ 0.01% Rich Text Format 0.59% Awk 0.16% Scheme 0.01% Objective-C 0.18% Yacc 0.09% PostScript 0.45% Tcl 1.63%

ksh ksh93 kornshell unix linux shell

ast's Introduction

AST

This is the AT&T Software Technology (AST) toolkit from AT&T Research. It includes many tools and libraries, like KSH, NMAKE, SFIO, VMALLOC, VCODEX, etc. It also includes more efficient replacements for many of the POSIX tools. It was designed to be portable across many UNIX systems and also works under UWIN on Microsoft Windows (see the att/uwin repository on GitHub).

ksh93u+ and v-

This repo contains the ksh93u+ and ksh93v- versions of KSH.

ksh93u+, the master branch, was the last version released by the main AST authors in 2012, while they were at AT&T. It also has some later build fixes but it is not actively maintained.
ksh93v-, available under the ksh93v tag, contains contributions from the main authors through 2014 (after they left AT&T) and is considered less stable.

Please search the web for forks of this repo (or check the Network graph on GitHub) if you are looking for an actively maintained version of ksh.

Build

This software is used to build itself, using NMAKE. After cloning this repo, cd to the top directory of it and run:

./bin/package make

Almost all the tools in this package (including the bin/package script) are self-documenting; run a tool with --man (or --html) to get its man page.

(If you are used to the old AST packaging mechanism on www.research.att.com: this repo is equivalent to downloading the INIT and ast-open packages and running ./bin/package read on them.)


ast's Issues

Wrong syntax for the "suspend" alias in ksh93

In ksh93 (and mksh and ksh88), suspend is defined as:

suspend='kill -s STOP $$'

An unquoted parameter expansion asks the shell to perform word splitting and globbing on the result. For example, if $IFS contains a digit of the PID:

$ echo "$$"
24725
$ IFS=7
$ suspend
kill: 24: permission denied
kill: 25: permission denied

It should be:

alias suspend='kill -s STOP "$$"'

bogus NLSPATH handling

NLSPATH resolver bug needs to be fixed: see jelmd#1

nmake needs to be built with -D_FORTIFY_SOURCE=0 on Mac OS X, or buffer overlap protection kills it

While trying to compile ksh on OS X El Capitan using the AST build environment, I get "Abort trap: 6" messages as soon as nmake runs on the ksh sources:

$ bin/package make ksh93 SHELL=sh

package: initialize the /Users/maus/Documents/projekte/ksh-beta/arch/darwin.i386 view
package: update /Users/maus/Documents/projekte/ksh-beta/arch/darwin.i386/bin/cc
package: update /Users/maus/Documents/projekte/ksh-beta/arch/darwin.i386/bin/ldd
package: update /Users/maus/Documents/projekte/ksh-beta/arch/darwin.i386/lib/probe/C/make/probe
package: update /Users/maus/Documents/projekte/ksh-beta/arch/darwin.i386/bin/mamake
[...]
probing C language processor /Users/maus/Documents/projekte/ksh-beta/arch/darwin.i386/bin/cc for make information
++ set -
cmd/INIT:
sh: line 114: 68354 Abort trap: 6           /Users/maus/Documents/projekte/ksh-beta/arch/darwin.i386/bin/nmake --ignorelock --keepgoing --errorid=cmd/INIT .RWD.=cmd/INIT RECURSEROOT=.. believe
make: *** termination code 6 making cmd/INIT

The Mac OS X system log reports a crash within nmake:

Process:               nmake [68354]
Path:                  /Users/USER/Documents/*/nmake
Identifier:            nmake
Version:               0
Code Type:             X86 (Native)
Parent Process:        ??? [68353]
Responsible:           nmake [68354]
User ID:               501

Date/Time:             2016-03-09 18:24:05.117 +0100
OS Version:            Mac OS X 10.11.4 (15E49a)
Report Version:        11
Anonymous UUID:        EDEE8ECF-E07E-787D-E6DF-2B5B6B158D92


Time Awake Since Boot: 1200000 seconds

System Integrity Protection: disabled

Crashed Thread:        0  Dispatch queue: com.apple.main-thread

Exception Type:        EXC_CRASH (SIGABRT)
Exception Codes:       0x0000000000000000, 0x0000000000000000
Exception Note:        EXC_CORPSE_NOTIFY

Application Specific Information:
detected source and destination buffer overlap

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0   libsystem_kernel.dylib          0x9e3dd572 __pthread_kill + 10
1   libsystem_pthread.dylib         0x92438654 pthread_kill + 101
2   libsystem_c.dylib               0x9a5dbd00 __abort + 187
3   libsystem_c.dylib               0x9a5dbc45 abort + 173
4   libsystem_c.dylib               0x9a5dbd7f abort_report_np + 82
5   libsystem_c.dylib               0x9a60aad1 __chk_fail + 54
6   libsystem_c.dylib               0x9a60aae8 __chk_fail_overlap + 23
7   libsystem_c.dylib               0x9a60ab23 __chk_overlap + 59
8   libsystem_c.dylib               0x9a60ad29 __strcpy_chk + 72
9   nmake                           0x000d5515 resetvar + 341
10  nmake                           0x000d4ca8 setvar + 1576
11  nmake                           0x000b16cf assignment + 1295
12  nmake                           0x000ab8e9 parse + 1433
13  nmake                           0x00064432 apply + 706
14  nmake                           0x000ade10 assertion + 224
15  nmake                           0x000ab8cb parse + 1403
16  nmake                           0x000baf2e readfp + 6590
17  nmake                           0x000b9197 readfile + 1335
18  nmake                           0x000884b2 main + 7938
19  libdyld.dylib                   0x95e2c6ad start + 1

This crash log suggests that the source and destination buffers of the strcpy() call in resetvar() overlap, so Mac OS X terminates the nmake process, which produces the "Abort trap: 6" messages above.

Passing -D_FORTIFY_SOURCE=0 to bin/package make seems to work around this problem. However, the build then fails at a later point, probably because the buffer overlap still exists and is simply no longer reported:

$ bin/package make ksh93 SHELL=sh CCFLAGS=-D_FORTIFY_SOURCE=0

[...]
cpp: "/Users/maus/Documents/projekte/ksh-beta/src/cmd/ksh93/include/shell.h", line 172: cmd.h: cannot find include file
cpp: "FEATURE/dynamic", line 10: dlldefs.h: cannot find include file
cpp: "/Users/maus/Documents/projekte/ksh-beta/src/cmd/ksh93/include/shell.h", line 172: cmd.h: cannot find include file
cpp: "FEATURE/dynamic", line 10: dlldefs.h: cannot find include file
cpp: "/Users/maus/Documents/projekte/ksh-beta/src/cmd/ksh93/include/shell.h", line 172: cmd.h: cannot find include file
make [cmd/ksh93]: *** exit code 2 making cd_pwd.o
make [cmd/ksh93]: *** exit code 2 making cflow.o
make [cmd/ksh93]: *** exit code 1 making deparse.o
cpp: "/Users/maus/Documents/projekte/ksh-beta/src/cmd/ksh93/include/shell.h", line 172: cmd.h: cannot find include file
cpp: "/Users/maus/Documents/projekte/ksh-beta/src/cmd/ksh93/include/shell.h", line 172: cmd.h: cannot find include file
cpp: "FEATURE/dynamic", line 10: dlldefs.h: cannot find include file
[...]

I am using Mac OS X 10.11.4 with Xcode 7.2.1 (7C1002), so the compiler is Apple LLVM version 7.0.2 (clang-700.1.81).

ksh93: "$*" joins positional parameters on the first byte of $IFS instead of first character

$ ksh -c 'IFS=é; set : :; echo "$*"' | hd
00000000  3a c3 3a 0a                                       |:.:.|
00000004
$ echo é | hd
00000000  c3 a9 0a                                          |...|
00000003
$ locale charmap
UTF-8

Expected :é: (3a c3 a9 3a 0a)

POSIX says the separator must be the first character of IFS, not its first byte: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_05_02

bash and zsh use the first character.

variable exported in function in a subshell is also visible in a different subshell

I've observed this issue on a Solaris 11 system with all ksh93 versions (alpha, beta and master), as well as on Ubuntu 14 (ksh 2012).

Here's a test case that reproduces the issue:

# cat test1.ksh
#!/usr/bin/ksh

function proxy {
        export MYVAR="bla"
        child
        unset MYVAR
}

function child {
        echo "MYVAR=$MYVAR" >> /var/tmp/debug.log
}

function test {
        $(child)
        $(proxy)
        $(child)
}

rm /var/tmp/debug.log
test
cat /var/tmp/debug.log
# ./test1.ksh

MYVAR=
MYVAR=bla
MYVAR=bla <------------------------------ this should not happen
#

The following patch, which removes an optimization, fixes the issue. If there is a different fix, please let me know.

--- INIT.2012-08-01.old/src/cmd/ksh93/sh/subshell.c     2016-03-01 04:01:06.513890578 -0800
+++ INIT.2012-08-01/src/cmd/ksh93/sh/subshell.c  2016-03-01 04:02:43.617872391 -0800
@@ -260,9 +260,6 @@
        shp = sp->shp;
        dp = shp->var_tree;
-       /* don't bother to save if in newer scope */
-       if(sp->var!=shp->var_tree && sp->var!=shp->var_base && shp->last_root==shp->var_tree)
-               return(np);
      if((ap=nv_arrayptr(np)) && (mp=nv_opensub(np)))
      {
              shp->last_root = ap->table;

The issue seems to be caused by the following optimization in the sh_assignok() function in sh/subshell.c:

 /* don't bother to save if in newer scope */
 if(sp->var!=shp->var_tree && shp->last_root==shp->var_tree)
        return(np);

This optimization skips saving variables in situations where they don't need to be restored. Removing it makes the issue go away.

ksh93 echoing wrong output due to missing EIO handling during logout

Here's a reproducible test case on a Solaris 11 host running ksh93u+ (2012-08-01):
$ cat a.sh
#!/bin/sh

AAA="aaa"
echo 'insert character'
BBB=`echo ${AAA} | sed "s/aaa/bbb/g"`
logger "variable BBB = ${BBB}"

$ cat t.sh
#!/bin/ksh

sleep 10
/bin/ksh ./a.sh
exit 0

$ ./t.sh

The expected result is:

Apr 9 12:43:34 lab user: [ID 702911 user.notice] variable BBB = bbb

because variable "BBB" is supposed to be set to 'bbb' in a.sh.

But if the parent shell is terminated, the variable is wrongly set.

user@xxxxx$ telnet lab
...
$ ./t.sh & <--- Run t.sh in background.
[1] 2067
$ logout <--- CTRL + D to exit while t.sh is running.
Connection to lab closed by foreign host.

Log in to the system again and check the output:

user@xxxxx$ telnet lab
...
$ tail -f /var/adm/messages
:
Apr 9 12:47:47 lab user: [ID 702911 user.notice] variable BBB = insert
character <--- !!!
Apr 9 12:47:47 lab bbb
<--- !!!

Thus the variable is wrongly set. (The previous echo string was not cleared.)

The issue happens because the EIO error raised during logout is not handled properly.
The following patch fixes it:

--- INIT.2012-08-01.old/src/cmd/ksh93/sh/io.c 2017-01-04 14:41:25.199402375 +0000
+++ INIT.2012-08-01/src/cmd/ksh93/sh/io.c 2017-01-04 14:32:20.279449987 +0000
@@ -64,9 +64,9 @@

#ifndef ERROR_PIPE
#ifdef ECONNRESET
-#define ERROR_PIPE(e) ((e)==EPIPE||(e)==ECONNRESET)
+#define ERROR_PIPE(e) ((e)==EPIPE||(e)==ECONNRESET||(e)==EIO)
#else
-#define ERROR_PIPE(e) ((e)==EPIPE)
+#define ERROR_PIPE(e) ((e)==EPIPE||(e)==EIO)
#endif
#endif

ksh93 dumps core in emacs mode when entering characters in a different locale

I observed this issue on a Solaris 11 system with ksh 2012-08-01, i.e. the master version. I suspect it is present in later versions too, since the relevant code has not changed.

The issue can be reproduced by adding an Asian locale (such as Korean) to ibus.
At the ksh93 shell prompt, type an Asian character. ksh immediately dumps core with the following stack trace:

bash-4.2$ pstack core
core 'core' of 1134: ksh
00000000004f1cf4 ed_emacsread () + 404
00000000004a5096 slowread () + 116
0000000000592d12 sfrd () + 482
000000000058b707 _sffilbuf () + 297
00000000005936ac sfreserve () + 2ac
0000000000477ae2 exfile () + 6e2
0000000000477393 sh_main () + af3
00000000004767dd main () + 4d
0000000000476614 ???????? ()

The core dump happens at line 320 of src/cmd/ksh93/edit/emacs.c,
i.e. if(c!='\t' && c!=ESC && !isdigit(c)).

Following the vi.c code, I added a digit(c) macro, i.e.
((c&~STRIP)==0 && isdigit(c)), and replaced the isdigit(c) calls with digit(c). Here's the patch that fixes the issue for me:

--- INIT.2012-08-01.old/src/cmd/ksh93/edit/emacs.c 2016-01-18 03:52:58.380801240 -0800
+++ INIT.2012-08-01/src/cmd/ksh93/edit/emacs.c 2016-02-05 01:39:08.350312914 -0800
@@ -90,6 +90,7 @@
 static int print(int);
 static int _isword(int);
 #   define isword(c)	_isword(out[c])
+#   define digit(c)	((c&~STRIP)==0 && isdigit(c))
 
 #else
 #   define gencpy(a,b)	strcpy((char*)(a),(char*)(b))
@@ -97,6 +98,7 @@
 #   define genlen(str)	strlen(str)
 #   define print(c)	isprint(c)
 #   define isword(c)	(isalnum(out[c]) || (out[c]=='_'))
+#   define digit(c)	isdigit(c)
 #endif /*SHOPT_MULTIBYTE */
 
 typedef struct emacs
@@ -317,7 +319,7 @@
 		count = 1;
 		adjust = -1;
 		i = cur;
-		if(c!='\t' && c!=ESC && !isdigit(c))
+		if(c!='\t' && c!=ESC && !digit(c))
 			ep->ed->e_tabcount = 0;
 		switch(c)
 		{
@@ -775,7 +777,7 @@
 	int digit,ch;
 	digit = 0;
 	value = 0;
-	while ((i=ed_getchar(ep->ed,0)),isdigit(i))
+	while ((i=ed_getchar(ep->ed,0)),digit(i))
 	{
 		value *= 10;
 		value += (i - '0');
@@ -1013,7 +1015,7 @@
 				{
 					i=ed_getchar(ep->ed,0);
 					ed_ungetchar(ep->ed,i);
-					if(isdigit(i))
+					if(digit(i))
 						ed_ungetchar(ep->ed,ESC);
 				}
 			}

Please create tag(s)

In order to adopt the new GitHub repository as its upstream, the Homebrew formula will need a release tag that can be assigned to the stable spec. This allows a tarball of the tag to be downloaded from GitHub and the sha256 of the tarball to be recorded in the formula to ensure integrity.

Please see Homebrew/legacy-homebrew#49653 (comment)

cc1: note: obsolete option -I- used, please use -iquote instead

This is an nmake issue: when nmake probes the gcc compiler, it still uses the -I- notation. Is there a workaround for this?

JIT for ksh

plz

ksh regression: with FIGNORE, . and .. are no longer automatically excluded from glob expansions

From the manual:

If FIGNORE is set, then each file name component that matches the pattern defined by the value of FIGNORE is ignored when generating the matching filenames. The names . and .. are also ignored.

That used to be true: it works as documented in ksh93k+, for instance, but not in modern versions (nor in ksh93m; I couldn't find ksh93l to test):

$ FIGNORE=x ksh93v- -c 'echo *'
. .. a .a
$ FIGNORE=x ksh93k+ -c 'echo *'
a .a

zrep keeps 3 additional snapshots on remote

After zrep-ing some local ZFS filesystems, I found that the zrep remote/destination filesystem still has snapshots that were already removed on the source filesystem (by zfs-auto-snapshot, which purged the 3 oldest ones), although I set savecount to "1".

I would like zrep to make an identical copy of the ZFS filesystem, including all snapshots.

Any clue where to look, or why this is happening?

[root@backupvm1 ~]# zfs list -r -H -t snapshot -o name -s name zfspool/backup/adminstation.local|grep -v zrep|wc -l
45
[root@backupvm1 ~]# zfs list -r -H -t snapshot -o name -s name zfsiscsipool/backup-repl/adminstation.local|grep -v zrep|wc -l
48

VAR=value followed by nested function call fails to put VAR into environment

The following code should print V=1, but doesn't:

function f2 { env | grep '^V='; }
function f1 { f2; }
V=1 f1

However, V=1 f2 behaves correctly.

Every ksh93 I tried has the same behavior, including the following:

Version M-12/28/93e (AIX 6.1 /bin/ksh93)
Version AJM 93u+ 2012-08-01 (Red Hat Enterprise Linux Server release 7.2 /bin/ksh93)

ksh93: When "read -m json" is used to read a single-line JSON object, text fields following the first numeric field are re-interpreted as numeric variable names

I realize the JSON functionality in KornShell isn't fully mature yet, but I built myself a shell from the beta branch on git (Version ABIJM 93v- 2014-12-24) to try it out. I found that when I feed "read -m json" a JSON object with no newlines in it, a numeric field causes all following fields to come out wrong:

$ foo=9
$ read -m json json_test <<<$'{ "squanchy" : "cromulent", "num" : 1, "text": "foo", "notbool": "true", "bool": true }'
$ print -j json_test
{
	"bool": 0,
	"notbool": 0,
	"num": 1,
	"squanchy": "cromulent",
	"text": 9
}
# "squanchy" comes through just fine, because it precedes "num".
# All variables following "num" : 1 were replaced with numeric variable lookup, as if they'd appeared in $(())
$ echo "$json_test"
(
	typeset -l -E bool=0
	typeset -l -E notbool=0
	typeset -l -E num=1
	squanchy=cromulent
	typeset -l -E text=9
)
# typeset -E is a floating-point numeric type. "bool", "notbool", and "text" have all inherited this type from "num"

However, if I insert newlines into the JSON string, this doesn't happen:

$ read -m json json_test <<<$'{ "squanchy" : "cromulent", "num" : 1,\n "text": "bar", "notbool": "true", "bool": true }'
$ print -j json_test
{
	"bool": true,
	"notbool": "true",
	"num": 1,
	"squanchy": "cromulent",
	"text": "bar"
}

I haven't looked at the source code for the JSON support yet. It does appear to be in pretty rough shape overall... When I get some time I'll see what I can do with it.

build process -- Food for thought

Should we change/adapt the build process?

(with the objective to make it easily maintainable and accessible to a greater number)

Small knowledgeable community

None of my fellow developers has inside knowledge of the build process beyond bin/package make: not enough to customise it, let alone maintain it. Neither do I.

That said, they know how to maintain and tweak GNU Autotools and CMake toolchains.

Note: Do not jump to conclusions here. I am not trying to promote Autotools or CMake; quite the contrary. I simply want to emphasise how closed the logic and toolchain of the build process are.

Scarce documentation

Embedded usage documentation in any C or Korn shell script is a fantastic asset of the AST developments. It is certainly immensely underused by users of AST packages, except for the AST developers, who have consistently added usage information to all their utilities.

Nonetheless, that documentation is not enough for a newcomer to get their head round the build toolchain and gain enough insight to act alone without calling out for help.

Today I am confronted with a failing build on a platform, which is certainly not exotic (MacOS), and I find myself spending hours trying to understand where the errors occur.

Unless told otherwise, I have no supporting information to help me get through my build failures. Calling out for help won't help much either because, I presume, only a few people have invested significant time in understanding the guts of the toolchain. Questions will take time to be answered, if they are answered at all.

Build tool

When all goes well, the AST build toolchain seems to flat out beat the other tools mentioned above. It (apparently) has no dependencies, allows for all the GNU Autoconf-style probing without the M4 hell, and lays out its build products nicely.

Opinion: GNU Autotools are a fantastic suite, but they have one major inconvenience: M4, which is opaque and, to a certain extent, clumsy. It was probably a good compromise for portability 30 to 40 years ago, but it is no longer the right tool today; preprocessing could be done the AST way :-)

Could the AST build toolchain be system agnostic and a possible replacement for GNU Autotools or CMake on other projects? A toolchain written in portable POSIX shell targeting any raw (POSIX) UNIX or Linux.

Whilst this was probably a driver in its conception, going through the source files shows that it depends on bash here and lynx or wget there, etc. So it is not agnostic and doesn't build on a raw system; it requires GNU-ish capabilities. Hence it targets UNIX/GNU or Linux/GNU platforms.

Note: for the reasoning let us ignore for now that we probably need gcc to avoid proprietary compilers (where such compilers still exist).

Logically one can ask, why then maintain a distinct build toolchain? Why not use GNU Autotools or CMake?

Preliminary thoughts

The breakup of the AST development team has (luckily) brought the AST developments to the open source community. But the community is small (and probably fragile).

If the AST packages and the Korn shell are here to stay, the community needs to be enlarged.

Enlarging the community means making the build process accessible to many.

Migrating to GNU Autotools or CMake is an enormous effort which would require such time investment that it is almost guaranteed to stall.

Documentation and HOWTOs seem to be the only realistic approach. This also requires time, and reverse engineering.

Request for comments

In the 90s, shell portability was a big concern, and scripting had to focus on POSIX shells only (the Korn shell wasn't a POSIX shell at the time; it now is).

Today, thanks to AT&T opening up the source code, a Korn shell exists on (almost) every platform. Not PDKSH or old versions, but a ksh93 executable (whatever its release).

Consequently, from 2017 onwards, we can assume that we have a Korn shell executable that supports the 93 syntax and features.

Converting the AST build toolchain scripts from universal shell syntax to Korn shell 93 syntax can:
a) greatly reduce the LOC (e.g. iffe could be reduced by 50%)
b) allow for clean environments using the function keyword, limiting globals
c) break down the code into smaller and more maintainable chunks using FPATH
d) allow usage information to be added to all functions

This doesn't require a full reverse engineering effort, nor does it require a full rewrite of the code.

At the same time, this creates a learning curve that can be captured in HOWTOs and central documentation.

By doing this we can (re)gain knowledge of the AST build toolchain, document it properly for the community to get involved, and lead the way for a ksh2023 rather than a ksh93+z2023 :-)

Missing license

The software is missing a license. Please add one by committing a file to the root of your repo titled LICENSE. See https://github.com/blog/1530-choosing-an-open-source-license if you need help choosing a license.

ksh93: random behaviour of `read -n <nchar>` for multi-byte characters.

Reproduced with version sh (AT&T Research) 93u+ 2012-08-01 and version sh (AT&T Research) 93v- 2014-12-24 on Debian GNU/Linux amd64:

According to the man page, read -n reads a number of bytes, while read --help says characters.

The results are inconsistent. Here is a test in a UTF-8 locale with the 3-byte € character (EURO SIGN, U+20AC):

$ ksh -c 'for ((i=1;i<=6;i++)); do echo €€€€€€€€ | IFS= read -rn "$i" a; printf "$i %q\n" "$a"; done'
1 $'\u[20ac]'
2 $'\u[20ac]\u[20ac]'
3 $'\u[20ac]'
4 $'\u[20ac]\u[20ac]\u[20ac]'
5 $'\u[20ac]\u[20ac]\u[20ac]'
6 $'\u[20ac]\u[20ac]'

The 1 case suggests a count of characters and the 3 case a count of bytes; the rest doesn't seem to make any sense.

read -N doesn't have the issue (and seems to take a number of characters):

$ ksh -c 'for ((i=1;i<=6;i++)); do echo €€€€€€€€ | IFS= read -rN "$i" a; printf "$i %q\n" "$a"; done'
1 $'\u[20ac]'
2 $'\u[20ac]\u[20ac]'
3 $'\u[20ac]\u[20ac]\u[20ac]'
4 $'\u[20ac]\u[20ac]\u[20ac]\u[20ac]'
5 $'\u[20ac]\u[20ac]\u[20ac]\u[20ac]\u[20ac]'
6 $'\u[20ac]\u[20ac]\u[20ac]\u[20ac]\u[20ac]\u[20ac]'

'times' should be a special builtin

In ksh93, times is implemented as an alias to "{ { time;} 2>&1;}" and "command" as an alias to "command " (which means aliases are expanded after it).

That means that things like:

$ ksh93 -c 'LC_ALL=C times'
ksh93: syntax error at line 1: `{' unexpected
$ ksh93 -c 'command times'
ksh93: syntax error at line 1: `{' unexpected

don't work, so the times utility is not POSIX-compliant.

"times" should be implemented as a special builtin.

More generally, even though POSIX allows it, implementing builtin utilities as aliases is not a good idea IMO, if only for the reasons detailed at:
http://thread.gmane.org/gmane.comp.standards.posix.austin.general/12485/focus=12568

ksh does not detect invalid array declarations

ksh -n does not detect bad array declarations in the following code:

Bad ksh array:

#!/usr/bin/ksh

typeset -A fn
fn=([foo_key]=foo_val [bar_key])

printf %s\\n ${fn[foo_key]}

zsh arrays:

#!/usr/bin/zsh

typeset -A fn
fn=(foo_key foo_val bar_key bar_val)

printf %s\\n ${fn[foo_key]} ${fn[bar_key]}

ksh93 tests failing

Several ksh93 tests currently fail. I think they are a good starting point to fix (or exclude), so that we can make the tests part of the automated Travis CI build and ensure the integrity of any changes.

Then we can add this to the .travis.yml file:

bin/package test ksh93
if bin/package results test | grep '\*\*\*'; then false; fi

This is the output of bin/package results test after testing just ksh93:

dannyw@dannyw-ubuntu:~/wrk/att/ast$ bin/package results test | grep '\*\*\*' && false
INIT iffe ...................................  162 tests    1 error  ***
ksh93 io(shcomp) ............................   99 tests  141 errors ***
ksh93 namespace(shcomp) .....................   26 tests    1 error  ***
ksh93 treemove(shcomp) ......................   22 tests    1 error  ***
ksh93 wchar .................................    4 tests    4 errors ***
ksh93 wchar(C.UTF-8) ........................    4 tests    4 errors ***
ksh93 wchar(shcomp) .........................    4 tests    4 errors ***

Ideally, once all AST tests pass, we can widen the testing to simply:

bin/package test

However, there are lots of other failures and there seems to be a bug with bin/package test that causes tee to sit waiting for input.

Here is the output of the ksh93 test failures I am getting on Ubuntu 16.04:

test io(shcomp) begins at 2017-07-21+20:14:53
test io(shcomp) failed at 2017-07-21+20:14:53 with exit code 141 [ 99 tests 141 errors ]

test namespace(shcomp) begins at 2017-07-21+20:15:00
/tmp/tmpjogoX8m.B0o/shcomp-namespace.ksh[127]: .a.b.x_t: not found [No such file or directory]
/tmp/tmpjogoX8m.B0o/shcomp-namespace.ksh[128]: var.pi: not found [No such file or directory]
discipline functions for types in namespace not working
/tmp/tmpjogoX8m.B0o/shcomp-namespace.ksh[138]: .com.foo.test1.y_t: not found [No such file or directory]
/tmp/tmpjogoX8m.B0o/shcomp-namespace.ksh[139]: v.x.pr: not found [No such file or directory]
	shcomp-namespace.ksh[139]: _.__ not working with nested types in a namespace
test namespace(shcomp) failed at 2017-07-21+20:15:00 with exit code 1 [ 26 tests 1 error ]

test treemove(shcomp) begins at 2017-07-21+20:22:18
/home/dannyw/wrk/att/ast/arch/linux.i386-64/bin/ksh: line 2: syntax error at line 6: `function' unexpected
	[78]: typeset -C c=(objstack_t ost=(typeset -l -i st_n=1;st[0]=(obj=(typeset -l -i val=5)))) is not idempotent
test treemove(shcomp) failed at 2017-07-21+20:22:18 with exit code 1 [ 22 tests 1 error ]

test wchar begins at 2017-07-21+20:22:35
	wchar.sh[60]: en_US.ISO-8859-15 nounicodeliterals FAILED -- expected '0000000 24 27 e2 82 ac 27 0a', got '0000000 27 5c 75 5b 32 30 61 63 5d 27 0a'
	wchar.sh[63]: en_US.ISO-8859-15 (nounicodeliterals) FAILED -- expected '0000000 24 27 e2 82 ac 27 0a', got '0000000 27 5c 75 5b 32 30 61 63 5d 27 0a'
	wchar.sh[60]: zh_CN.GB18030 nounicodeliterals FAILED -- expected '0000000 24 27 e2 82 ac 27 0a', got '0000000 27 5c 75 5b 32 30 61 63 5d 27 0a'
	wchar.sh[63]: zh_CN.GB18030 (nounicodeliterals) FAILED -- expected '0000000 24 27 e2 82 ac 27 0a', got '0000000 27 5c 75 5b 32 30 61 63 5d 27 0a'
test wchar failed at 2017-07-21+20:22:35 with exit code 4 [ 4 tests 4 errors ]
test wchar(C.UTF-8) begins at 2017-07-21+20:22:35
	wchar.sh[60]: en_US.ISO-8859-15 nounicodeliterals FAILED -- expected '0000000 24 27 e2 82 ac 27 0a', got '0000000 27 5c 75 5b 32 30 61 63 5d 27 0a'
	wchar.sh[63]: en_US.ISO-8859-15 (nounicodeliterals) FAILED -- expected '0000000 24 27 e2 82 ac 27 0a', got '0000000 27 5c 75 5b 32 30 61 63 5d 27 0a'
	wchar.sh[60]: zh_CN.GB18030 nounicodeliterals FAILED -- expected '0000000 24 27 e2 82 ac 27 0a', got '0000000 27 5c 75 5b 32 30 61 63 5d 27 0a'
	wchar.sh[63]: zh_CN.GB18030 (nounicodeliterals) FAILED -- expected '0000000 24 27 e2 82 ac 27 0a', got '0000000 27 5c 75 5b 32 30 61 63 5d 27 0a'
test wchar(C.UTF-8) failed at 2017-07-21+20:22:35 with exit code 4 [ 4 tests 4 errors ]
test wchar(shcomp) begins at 2017-07-21+20:22:35
	shcomp-wchar.ksh[60]: en_US.ISO-8859-15 nounicodeliterals FAILED -- expected '0000000 24 27 e2 82 ac 27 0a', got '0000000 27 5c 75 5b 32 30 61 63 5d 27 0a'
	shcomp-wchar.ksh[63]: en_US.ISO-8859-15 (nounicodeliterals) FAILED -- expected '0000000 24 27 e2 82 ac 27 0a', got '0000000 27 5c 75 5b 32 30 61 63 5d 27 0a'
	shcomp-wchar.ksh[60]: zh_CN.GB18030 nounicodeliterals FAILED -- expected '0000000 24 27 e2 82 ac 27 0a', got '0000000 27 5c 75 5b 32 30 61 63 5d 27 0a'
	shcomp-wchar.ksh[63]: zh_CN.GB18030 (nounicodeliterals) FAILED -- expected '0000000 24 27 e2 82 ac 27 0a', got '0000000 27 5c 75 5b 32 30 61 63 5d 27 0a'
test wchar(shcomp) failed at 2017-07-21+20:22:35 with exit code 4 [ 4 tests 4 errors ]

%L NLSPATH handling: just use locale name, skip language code

NLSPATH resolver fix: see jelmd#2

ksh93: random behaviour of += on multi-dimensional arrays

$ ksh93 -c 'a=((a b c) (1 2 3)); a+=( (X Y Z)); typeset -p a'
typeset -a a=((X Y Z) )
$ ksh93u+ -c 'a[3]=(1 2 3); a+=( (x y)); typeset -p a'
typeset -a a=([3]=(1 2 3) [4]=$'\xf8\u[784]D\x7f')
$ ksh93v- -c 'a[3]=(1 2 3); a+=( (x y)); typeset -p a'
typeset -a a=([1]=$'\n-\xbc\xb8%' [3]=(1 2 3) )

(On Debian GNU/Linux amd64)

ksh93: Local variables are passed on to called functions when they share their name with an invocation-level binding

When a function defines a local variable, normally that variable is not accessible to or modifiable by other functions called by the function:

$ function f1 { typeset var VF19; VF19=excalibur; echo "f1: VF1=$VF1, VF19=$VF19"; f2; echo "f1: VF1=$VF1, VF19=$VF19"; }
$ function f2 { echo "f2: VF1=$VF1, VF19=$VF19"; VF1=vf1a; VF19=VF19a; echo "f2: VF1=$VF1, VF19=$VF19"; }      
$ VF1=valkyrie f1
f1: VF1=valkyrie, VF19=excalibur
f2: VF1=valkyrie, VF19=
f2: VF1=vf1a, VF19=VF19a
f1: VF1=valkyrie, VF19=excalibur
# VF19 is local to f1, so it isn't visible to or modified by f2.
# VF1 is an invocation-level binding on f1, so it is visible to f2 but not modified by it.

However, if there's an "invocation-level" binding of the same name as a local variable, the local variable takes on the characteristics of the "invocation-level" binding:

$ unset VF1
$ unset VF19
$ VF19=YF19 f1      # Binding VF19 on the invocation of f1 prevents f1 from using it as a local variable
f1: VF1=, VF19=excalibur
f2: VF1=, VF19=excalibur
f2: VF1=vf1a, VF19=VF19a
f1: VF1=vf1a, VF19=excalibur
$ echo ${VF19-'{unset}'}   # f2's assignment of VF19 no longer reaches global scope
{unset}

$VF19 is no longer local to f1: the value f1 assigns to it becomes visible to f2 (although f2 still can't modify it in a way that's visible to f1).

I think the proper result in this case would be like this:
$ unset VF1
$ unset VF19
$ VF19=YF19 f1 # Binding VF19 on the invocation of f1 prevents f1 from using it as a local variable
f1: VF1=, VF19=excalibur
f2: VF1=, VF19=YF19
f2: VF1=vf1a, VF19=VF19a
f1: VF1=vf1a, VF19=excalibur
$ echo ${VF19-'{unset}'} # f2's assignment of VF19 no longer reaches global scope
{unset}


That is, the invocation-level binding of VF19 should be shadowed by the local-variable definition of VF19, and then on the call to f2, the local-variable definition should be discarded and the invocation-level binding should be visible again.

(My tests are on version 93u+ 2012-08-01, RHEL 7)

man sh.1 does not document JSON print and printf; the tests also do not fully cover JSON

sh.1 does not document the print -j and printf %(json)B extensions, which the RELEASE notes describe:

RELEASE:14-07-15 Fixed a bug in which json format output with 'print -j' had a comma
RELEASE:13-05-28 +Added -j option to print (and %(json)B format specifier to printf)
RELEASE: which will print a compound shell variable in JSON format.

My example reads and prints JSON.

I think the comvar.sh test is not sufficient for print -j: arrays should be part of the compound variable, and printf should also be covered by JSON tests.

ksh: ${var#"$*"} does pattern matching if the first char of `$IFS` is a wildcard

A very minor bug, as it is unlikely to hit anyone:

$ ksh -c 'IFS=?; a=abcd; set a c; echo ${a#"$*"}'
d

Even though $* was quoted, the ? in it was treated as a wildcard.

Same for:

$ ksh -c 'IFS=?; set a c; case abc in "$*") echo yes; esac'
yes

or:

$ ksh -c 'IFS=?; set a c; [[ abc = "$*" ]] && echo yes'
yes

KSH hang in |spawnvex(3ast)|

#undef _lib_vfork
#undef _real_vfork

Enjoy.

ksh93: read -r doesn't work if -d is also specified

(This is using ksh 93u+ 2012-08-01 on RHEL7)

Hi, I'm really hoping Korn Shell development will continue. Right now it kind of looks like a dead project - which is a shame because it's probably still the best of the Unix shells.

Anyway, I'm writing a shell library called "shell-pepper". While thinking about how to write a version of "read" that would read a single JSON value from the input (stopping at its end, and without writing it as a loadable built-in), I ended up experimenting with the built-in "read":

$ a='foo\ bar }'
$ echo "'$a'"      # sanity check that $a contains what I expect
'foo\ bar }'
$ IFS='' read -r -d '}' x <<<"$a"     # Read to the next '}' or EOF.  $? tells us whether it was a delimiter or EOF.
$ echo "'$x'"                       # Thanks to IFS the space at the end is preserved, but we lose the backslash.
'foo bar '
$ IFS='' read -r y <<<"$a"    # If I remove -d, then the backslash is retained, but I lose the ability to stop the read at the next curly brace
$ echo "'$y'"
'foo\ bar }'

Basically, the idea here is that if I've started reading a JSON object, reading to the next '}' may not get me to the end of the object, but it certainly won't take me past it. I need backslashes intact (hence the -r), but when I also use -d, backslashes in the input are lost (as if -r weren't specified).

BASH gets this one right:

$ a='foo\ bar }'
$ echo "'$a'"
'foo\ bar }'
$ IFS='' read -r -d '}' x <<<"$a"
$ echo "'$x'"
'foo\ bar '

As a side note - I had heard that JSON read and write were to be added to Korn Shell in upcoming versions. I am mostly using 93u but with a couple 93v builds kicking around. At the time I wrote this I was unaware that the JSON functionality is already present in 93v. Nice!

string match on ERE quantifiers fails

brace quantifiers in extended regular expression string match test cause syntax error

% ksh -c '[[ abc =~ a{2,} ]] && echo z'
ksh: syntax error at line 1: `~(E)a{2,} ]] && echo z' unexpected

% ksh --version
  version         sh (AT&T Research) 93u+ 2012-08-01

ksh93: Feature Request: Redirection syntax that allows built-in commands to provide file descriptors for redirection

On the GNU Bash patches page there is a feature request/code patch which expands upon the /dev/tcp special redirection syntax to add listening on a socket:

$ cmd <>/dev/tcp-listen/localhost/$port
$ # Or, to make the file descriptor persist, use "exec {fd_var}<>/dev/tcp-listen/localhost/$port"

I bring this up not to advocate for this feature, (on the contrary, I think having the shell spoof one thing in /dev/ is one thing too many), but rather because it got me thinking of how to add similar features without creating "special" filenames.

In the case of TCP connections, for instance, one could (almost) replace /dev/tcp with a built-in. (There's not much point replacing /dev/tcp now, of course, this is more an example of how similar features might be implemented without similar "magic") For the sake of this discussion assume this built-in is called open_tcp and takes the destination host address and port number as its arguments, opens the connection, and writes the number of the newly-opened file descriptor to its stdout:

$ open_tcp localhost 80
10
$ cmd <>&10
$ fd=10; exec {fd}<&-    # close the file descriptor

This exposes one of the disadvantages of creating such a built-in: unlike a redirection to /dev/tcp, the lifetime of the file descriptor opened by open_tcp cannot be managed automatically by the shell. A redirection can be limited to the lifetime of a command or group of commands, but a descriptor opened by open_tcp cannot.

One could imagine trying to get around the issue like this:

$ cmd >&$(open_tcp localhost 49152)-   # Evaluates as "cmd >&10-" for instance: Run "open_tcp" to open a file descriptor, redirect it to the output of "cmd", and close the original FD afterward

This wouldn't work as things stand, for a couple of reasons:

The command substitution happens in a subshell, so the file descriptor opened by open_tcp is not accessible to the parent shell
The effect of moving the file descriptor (with >&fdnum-), rather than just duplicating it (as with >&fdnum) is localized to the command redirection, and doesn't affect the shell's state once the command has ended.

So I propose introducing a syntax that would create the necessary sequence of operations to make this work:

Run a set of commands that is embedded in the redirection in the environment of the current shell and capture its output.
Attempt to interpret the output of those commands as a numeric file descriptor that will be duplicated by the redirection
Once the file descriptor is cloned by the redirection, close the original. (Do not retain it or restore it when the command being redirected ends.)

One option would be to provide this behavior with the syntax I described above:
$ cmd1 >&$(open_fd_cmd)

That is, recognize that $() is being used to provide the numeric argument to >&, and treat any file descriptors opened in the shell process by open_fd_cmd as local to the redirection rather than local to the command substitution. (Though this means that command substitution in this context can't be forked - it must be evaluated as part of the main shell process. But that's the case with ksh anyway, right?)

Missing `alarm` man page

Unable to locate any documentation on the alarm builtin.
According to this thread, circa 2006, David Korn was not ready to publicise -- he had concerns about possible conflicts with new functionality. What's the status today?

In summary:
a) Is the alarm builtin usable?
b) Can we trace sufficient usage information to build a man page?

Cheers, don

Incorrect exit message when exiting

ksh prints an incorrect message when the exit code is greater than 256. ksh uses 256 + signal number to represent termination by a signal, but the command below should not print a signal message:

$ exit 257
Hangup

ksh93: stdout is not handled properly when an EXIT/ERR trap handler is defined in -c (command-line) mode

The following is a reproducible test case:

$ ksh -ec 'rm -f /tmp/$USER.test.log; function log { echo $* to stdout; echo $* to file >> /tmp/$USER.test.log; }; function test_exit { log trap; }; trap test_exit EXIT; log exit'
exit to stdout

$ cat /tmp/$USER.test.log
exit to file
trap to stdout
trap to file

The "trap to stdout" line written by the EXIT trap ends up in the log file instead of on standard output.

There is another case that fails:

$ ksh -ec 'rm -f /tmp/$USER.test.log; function log { echo $* to stdout; echo $* to file >> /tmp/$USER.test.log; }; function test_exit { trap test_exittrap EXIT; log trap; }; function test_exittrap { log exit; }; test_exit'
trap to stdout

$ cat /tmp/root.test.log
trap to file
exit to stdout
exit to file

This works if the same code is run as a script, though. It seems to be a side effect of the ksh optimization of running the last command without forking.

Here are some details of why this is happening.

I traced the code flow for both cases (command line and script). It looks like the script case works, and the command-line case does not, because of the ksh93 optimization of running the last command without forking.

After the "log exit" function has run, the script case restores the file descriptors by calling sh_iorestore() at src/cmd/ksh93/sh/xec.c#1471:

1471 if((shp->topfd>topfd) && !(shp->subshell && np==SYSEXEC))
1472     sh_iorestore(shp,topfd,jmpval);

whereas the command-line case does not restore the fds. For the script case shp->topfd=1, whereas for the command-line case it is 0.

In src/cmd/ksh93/sh/xec.c#1347:

1347 else
1348     type = (execflg && !shp->subshell && !shp->st.trapcom[0]);
1349 shp->redir0 = 1;
1350 sh_redirect(shp,io,type);

Here the "type" parameter of sh_redirect() is 0 for the script, whereas it is 1 for the command line. execflg is set as part of the optimization of running the last command without a fork, in src/cmd/ksh93/sh/xec.c#984:

984 int execflg = (type&sh_state(SH_NOFORK));

which gets its type input from src/cmd/ksh93/sh/main.c#581:

581 if(!sh_isstate(SH_PROFILE) && sh_isoption(SH_CFLAG) &&
582     (fno<0 || !(shp->fdstatus[fno]&(IOTTY|IONOSEEK)))
583     && !sfreserve(iop,0,0))
584 {
585     execflags |= sh_state(SH_NOFORK);
586 }
587 shp->st.execbrk = 0;
588 sh_exec(t,execflags);

The following code in sh_iosave() sets shp->topfd to 1 (src/cmd/ksh93/sh/io.c#1728):

1728 filemap[shp->topfd++].save_fd = savefd;

sh_iosave() is called from sh_redirect() as shown below. Here flag corresponds to the "type" parameter of sh_redirect() seen earlier, so in the command-line case flag = 1 and the flag==0 test no longer selects the branch that calls sh_iosave():

src/cmd/ksh93/sh/io.c#1501
1503 if(flag==0 || tname || (flag==1 && fn==1 && (shp->fdstatus[fn]&IONOSEEK) && shp->outpipepid && shp->outpipepid==getpid()))
1504 {
1505     if(fd==fn)
1506     {
1507         if((r=sh_fcntl(fd,F_DUPFD,10)) > 0)
1508         {
1509             fd = r;
1510             sh_close(fn);
1511         }
1512     }
1513     sh_iosave(shp,fn,indx,tname?fname:(trunc?Empty:0));
1514 }
1515 else if(sh_subsavefd(fn))
1516     sh_iosave(shp,fn,indx|IOSUBSHELL,tname?fname:0);
1517 }

I've created a patch that fixes this issue, but I'm not sure it is the best solution. It basically flags these specific cases and disables the optimization for them.
Here are some test results:

$ ./ksh -ec 'rm -f /tmp/$USER.test.log; function log { echo $* to stdout; echo $* to file >> /tmp/$USER.test.log; }; function test_exit { log trap; }; trap test_exit EXIT; log exit'
exit to stdout
trap to stdout

$ cat /tmp/root.test.log
exit to file
trap to file

Last statement is a function with its own EXIT handler:

$ ./ksh -ec 'rm -f /tmp/$USER.test.log; function log { echo $* to stdout; echo $* to file >> /tmp/$USER.test.log; }; function test_exit { trap test_exittrap EXIT; log trap; }; function test_exittrap { log exit; }; test_exit'
trap to stdout
exit to stdout

$ cat /tmp/root.test.log
trap to file
exit to file

ERR trap only:

$ ./ksh -ec 'function error { echo "ERROR trap to stdout"; return 1; }; trap error ERR; false > /tmp/test.log'
ERROR trap to stdout

$ cat /tmp/test.log

Non-builtin echo function:

$ ./ksh -c '
rm -f /tmp/test.log
function my_echo
{
echo $* to file
}
function log
{
echo $* to stdout
my_echo $* >> /tmp/test.log
}
function test_exit
{
log trap
}
trap test_exit EXIT
log exit'
exit to stdout
trap to stdout

$ cat /tmp/test.log
exit to file
trap to file

Original test case, but run as a script:

$ cat test-script.sh
rm -f /tmp/$USER.test.log
function log {
echo $* to stdout
echo $* to file >> /tmp/$USER.test.log
}
function test_exit {
log trap
}
trap test_exit EXIT
log exit

$ ./ksh test-script.sh
exit to stdout
trap to stdout

$ cat /tmp/root.test.log
exit to file
trap to file

ksh93: "OLDPWD=/dir cd -" doesn't take you to /dir

$ OLDPWD=/bin ksh93 -c 'OLDPWD=/tmp cd -'
/bin

Expected:

/tmp

Things like CDPATH=/usr cd bin are OK though.

(on ksh93u on Debian)

optimisation done by ksh93 file-reading builtins breaks functionality and causes a memory leak

(tested with ksh93u from package on Debian amd64)

$ seq 5 > a; ksh93 -c 'read a; echo test > a; read b; echo "$a $b"' < a
1 2

The second "read" could not possibly have read "2" because we have replaced the content of the "a" file with "test\n". What happens (from examining strace output) is that on the first "read", ksh93 has read up to 64kB worth of data, put the first line in $a, lseek()ed back to the end of that first line and remembered the data that was after that. Upon the second read, it optimizes out the read() system call and uses the remembered data instead.

It does check that the position of the stdin cursor within the file has not changed, to confirm that the optimisation is valid. Here stdin has not moved, but the optimisation is invalid for a different reason: the content of the file has changed.

That optimisation will probably not gain you much in most cases on most systems because the OS will keep the read data in cache already (and knows better how and when to invalidate the cache), so that ksh93 behaviour could be seen as wasting resources by keeping another copy of that data in memory.

There also seems to be a memory leak: the "remembered" data is apparently never freed, even after a file descriptor has been reused for a different file:

$ ksh93 -c 'ps -o rss,comm -p "$$"; for f in /usr/*/*; do read -n1 a < $f; done; ps -o rss,comm -p "$$"; :'
  RSS COMMAND
  1472 ksh93
  RSS COMMAND
134860 ksh93

It affects "read" and other builtin utilities that read data (which seem to share that same "remembered" data). I've verified it with cat and head.

Active locale/character set is not properly applied when parsing C-style strings

This issue affects the use of Korn Shell with variable-width character encodings that are not as well-behaved as UTF-8. In this case I am using GB-18030, an extended version of the Chinese national standard character encoding that covers all Unicode code points as well. When I say it is "not as well-behaved as UTF-8", specifically I mean that it is not self-synchronizing, and bytes from multi-byte characters, if taken out of context, can appear identical to other characters.

Take, for example, U+4E57, a Chinese character which is encoded in GB18030 and GBK as 0x81 0x5C:

$ LANG=zh_CN.GBK printf "echo \$'\\u4e57'" | LANG=zh_CN.GBK ksh | od -t x1z
0000000 81 0a                                            >..<

$ LANG=zh_CN.GBK printf "echo \$'\\u4e57n'" | LANG=zh_CN.GBK ksh | od -t x1z      
0000000 81 0a 0a                                         >...<

Basically, the second byte of the character, 0x5C, is apparently interpreted as a backslash. This also occurs with "printf":

$ # 0x5C is interpreted as backslash and combined with "n"
$ LANG=zh_CN.GBK printf 'printf "\u4e57n"' | LANG=zh_CN.GBK ksh | od -t x1z       
0000000 81 0a                                            >..<

As far as I am aware this doesn't occur in other syntax:

$ # This turns out like "echo �\  x" - If the 0x5C byte is interpreted as "backslash" then it'd combine with a space - but it doesn't.
$ LANG=zh_CN.GBK printf 'echo \u4e57  x' | LANG=zh_CN.GBK ksh | od -t x1z         
0000000 81 5c 20 78 0a                                   >.\ x.<

With the locale set to zh_CN.GBK, the shell should interpret its input according to the GBK character encoding. As far as this encoding is concerned, there is no backslash in these examples.

<>; redirection operator doesn't work for the last command of a `ksh -c inline-script`

$ echo test > a; ksh -c 'echo x 1<>; a'; cat a
x
st

It works if we insert another command after it:

$ echo test > a; ksh -c 'echo x 1<>; a; exit'; cat a
x

Regression appending to an indexed array overwrites arr[-1] if arr[0] is unset

This appears to be a regression since the last release.

$ ksh /dev/fd/9 9<<\EOF
set -x
typeset -a a=(w x) b=(a b c)
a+=("${b[@]}")           # Correct behavior with a[0] set
typeset -p a
typeset -a a=([1]=w [2]=x)
a+=("${b[@]}")           # Incorrectly overwrites a[-1] when a[0] is unset
typeset -p a
a[${#__[@]}+1].__+=(y z) # Hack to get a reference to the correct element.
typeset -p a .sh.version
EOF

+ a=( w x )
+ b=( a b c )
+ typeset -a a b
+ a+=( a b c )
+ typeset -p a
typeset -a a=(w x a b c)
+ a[1]=w
+ a[2]=x
+ typeset -a a
+ a+=( a b c )
+ typeset -p a
typeset -a a=([1]=w [2]=a [3]=b [4]=c)
+ a+=( y z )
+ typeset -p a .sh.version
typeset -a a=([1]=w [2]=a [3]=b [4]=c [5]=y [6]=z)
.sh.version='Version ABIJM 93v- 2014-12-24'

nmake fails to build on FreeBSD 11.0 and 11.1

This is probably the wrong place to get help. I'm trying to get a recent version of ksh going so I can install CDE. When I run ./bin/package make, it gives me this error and fails compiling ast.

mamake [cmd/nmake]: *** exit code 1 making expand.o

nmake --base --compile '--file=/home/wfisher/ast/src/cmd/nmake/Makerules.mk'
/bin/sh: nmake: not found
mamake [cmd/nmake]: *** exit code 127 making Makerules.mo
mamake: *** exit code 1 making cmd/nmake
package: make: errors making /home/wfisher/ast/arch/freebsd11.amd64/bin/nmake

I don't know if I'm compiling things wrong or missing a dependency but the Googles hasn't turned up any help.

Thanks!!

date: wrong timezone for 1970 -> 1971 in British timezone

https://en.wikipedia.org/wiki/British_Summer_Time#Periods_of_deviation

Between 27 October 1968 and 31 October 1971, there was no daylight saving time in mainland Britain: it was GMT+1 all year round. On both GNU and Solaris systems, the system's strftime/localtime is correct:

$ TZ=Europe/London perl -MPOSIX -le 'print strftime "%F %T %Z %z", localtime 0'
1970-01-01 01:00:00 BST +0100

(BST being then British Standard Time, not Summer this time).

But ksh93's "printf %T" or the ast date utility seem to get it wrong:

$ ksh93 -c 'printf "%(%F %T %Z %z)T\n" "#0"'
1970-01-01 00:00:00 GMT -0000

(ksh93u on Solaris 10)

$ arch/linux.i386-64/bin/date -d "#0"
Thu Jan  1 00:00:00 GMT 1970

(from the beta branch on Debian).

typeset -f output truncated for functions within functions

With ksh93u+ and v- on Debian amd64:

$ ksh -c 'function f { g() uname; g; }; typeset -f f'
function f { g() uname;

Or using POSIX function declaration syntax only:

$ ksh -c 'f() { g() { uname; }; g; }; typeset -f f'
f() { g() { uname; };

See how the definition of the f function is truncated just after the end of the g definition. Note that the output doesn't even include a newline.

If I pipe that output to ksh, I also get a SEGV (with ksh93u+, not ksh93v-):

$ ksh -c 'f() { g() { uname; }; g; }; typeset -f f' | ksh
ksh: syntax error at line 1: `{' unmatched
zsh: done                ksh -c 'f() { g() { uname; }; g; }; typeset -f f' |
zsh: segmentation fault  ksh

(not when passed as ksh -c or ksh file-that-contains-that-output)

ksh `..` vs $(..) differences when results are massive

If you assign a variable from a command substitution that produces a large amount of output, the backtick version cuts the result off, while the $(...) version does not. For example:

a=`find . -type f`
b=$(find . -type f)

[ "$a" == "$b" ]; echo $?
> 1

echo "$a" | wc -c
> 1388545
echo "$b" | wc -c
> 1881923

It appears the backticks method has a hard character limit of 1388545 (at least on the system I am testing this on).

ksh -u or 'set -o nounset' behaviour for undefined positional parameter like $1.

This issue was reported earlier in the ast-developers forum; details can be found at
https://www.mail-archive.com/[email protected]/msg01906.html

I've observed the issue in the beta, alpha and master versions.
I've applied the following patch (for the alpha and master versions) to fix the issue:

--- INIT.2013-10-10/src/cmd/ksh93/sh/macro.c 2015-11-12 03:05:54.008417740 -0800
+++ INIT.2013-10-10/src/cmd/ksh93/sh/macro.c 2016-03-14 11:15:32.158386840 -0700
@@ -1220,7 +1220,7 @@
         {
                 d=fcget();
                 fcseek(-1);
-                if(!strchr(":+-?=",d))
+                if(d=='\0' || !strchr(":+-?=",d))
                         errormsg(SH_DICT,ERROR_exit(1),e_notset,ltos(c));
         }
         break;

ksh should show a proper error message if the command inside backquotes ends in an unmatched quote

for f in `ls -d /home/*"`; do echo $f; done
ls: cannot access /home/*
: No such file or directory

should show the same error as when using $():

for f in $(ls -d /home/*"); do echo $f; done
ksh: syntax error: `"' unmatched

ksh93: <>; combined with <#pattern with some builtins or no command fails to truncate

(tested with ksh93u+ and ksh93v- 2014-12-24 on Ubuntu 16.04 amd64)

In:

$ seq 10 > a; strace -e read,write,lseek,ftruncate ksh -c 'printf "" <>; a >#5; cat a'
lseek(1, 0, SEEK_CUR)                   = 0
read(1, "1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n", 65536) = 21
lseek(1, 0, SEEK_CUR)                   = 21
read(1, "", 65515)                      = 0
lseek(1, 8, SEEK_SET)                   = 8
lseek(1, 0, SEEK_CUR)                   = 8
ftruncate(1, 8)                         = 0
1
2
3
4

The file is properly truncated to the start of the line matching the pattern (5).

But if we remove printf '' or replace it with many other builtins (I tried :, true, eval, alias x=x...), then we see:

$ seq 10 > a; strace -e read,write,lseek,ftruncate ksh -c '<>; a >#5; cat a'
lseek(1, 0, SEEK_CUR)                   = 0
read(1, "1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n", 65536) = 21
lseek(1, 0, SEEK_CUR)                   = 21
read(1, "", 65515)                      = 0
lseek(1, 0, SEEK_CUR)                   = 21
ftruncate(1, 21)                        = 0
lseek(1, 8, SEEK_SET)                   = 8
1
2
3
4
5
6
7
8
9
10

We see a truncation attempt but at the end of the file as if the <#5 failed to find a match.

It's the same if we use a fd other than 1 with printf "" like:

$ seq 10 > a; strace -e read,write,lseek,ftruncate ksh -c 'printf "" 3<>; a 3<#5; cat a'
lseek(3, 0, SEEK_CUR)                   = 0
lseek(3, 0, SEEK_CUR)                   = 0
read(3, "1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n", 65536) = 21
read(3, "", 65515)                      = 0
lseek(3, 0, SEEK_CUR)                   = 21
ftruncate(3, 21)                        = 0
lseek(3, 8, SEEK_SET)                   = 8
1
2
3
4
5
6
7
8
9
10

That does smell like an incorrect optimisation.

Note that the <#((expr)) operator doesn't seem to have a similar issue:

$ seq 10 > a; strace -e read,write,lseek,ftruncate ksh -c '<>; a >#((10)); cat a'
lseek(1, 0, SEEK_SET)                   = 0
lseek(1, 0, SEEK_CUR)                   = 0
read(1, "1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n", 65536) = 21
lseek(1, 10, SEEK_SET)                  = 10
lseek(1, 0, SEEK_CUR)                   = 10
ftruncate(1, 10)                        = 0
1
2
3
4
5

parameter expansion pattern replacement does not match empty string at beginning/end

ksh ignores an empty pattern and does not perform the substitution:

% ksh -c 'x=a; echo ${x/#/.}'
a

By contrast, bash matches and substitutes as expected:

% bash -c 'x=a; echo ${x/#/.}'
.a

% ksh --version
  version         sh (AT&T Research) 93u+ 2012-08-01

beta branch: Building ksh93 fails at libast/comp/tmpnam.c

When trying to build the beta branch under Ubuntu/Linux 15.10 using bin/package make, the build fails at tmpnam.c with the following error message:

+ cc -D_BLD_DLL -fPIC -D_BLD_ast -O -I. -I/home/maus/projekte/ksh/src/lib/libast -Icomp -I/home/maus/projekte/ksh/src/lib/libast/comp -Iinclude -I/home/maus/projekte/ksh/src/lib/libast/include -Istd -I/home/maus/projekte/ksh/src/lib/libast/std -D_PACKAGE_ast -c /home/maus/projekte/ksh/src/lib/libast/comp/tmpnam.c
/home/maus/projekte/ksh/src/lib/libast/comp/tmpnam.c: In function '_ast_tmpnam':
/home/maus/projekte/ksh/src/lib/libast/comp/tmpnam.c:48:14: error: storage size of 'buf' isn't known
  static char buf[L_tmpnam];
              ^
/home/maus/projekte/ksh/src/lib/libast/comp/tmpnam.c:50:39: error: expected expression before ',' token
  return pathtemp(s ? s : buf, L_tmpnam, NiL, "tn", NiL);
                                       ^
mamake [lib/libast]: *** exit code 1 making tmpnam.o

On analysis, L_tmpnam, P_tmpdir and L_ctermid turn out to be defined as empty, as can be seen in arch/linux.i386-64/src/lib/libast/FEATURE/stdio:

#if defined(__STDPP__directive) && defined(__STDPP__initial)
__STDPP__directive pragma pp:initial
#endif
#ifndef P_tmpdir
#define P_tmpdir
#endif
#ifndef L_ctermid
#define L_ctermid
#endif
#ifndef L_tmpnam
#define L_tmpnam
#endif
#if defined(__STDPP__directive) && defined(__STDPP__initial)
__STDPP__directive pragma pp:noinitial
#endif

After manually patching them to sensible values, as follows, the build succeeds:

#if defined(__STDPP__directive) && defined(__STDPP__initial)
__STDPP__directive pragma pp:initial
#endif
#ifndef P_tmpdir
#define P_tmpdir "/tmp"
#endif
#ifndef L_ctermid
#define L_ctermid 1024
#endif
#ifndef L_tmpnam
#define L_tmpnam 1024
#endif
#if defined(__STDPP__directive) && defined(__STDPP__initial)
__STDPP__directive pragma pp:noinitial
#endif

With these changes applied, ksh93 builds correctly.

Issue with printf %Lb "\0200" in UTF-8 locales

printf %Lb "\0200"

In UTF-8 locales, this seems to print random areas of memory.

In ksh93u on Debian amd64 (from package):

$ ksh -c 'printf %Lb "\0200"' | wc -c
18564
$ ksh -c 'printf %Lb "\0200"' | wc -c
18972

With ksh93v- (built from beta git branch), it seems to enter some infinite loop in:

#0  ast_mbrchar (w=0x7ffc6404c664 L"", s=0x25b781d23d41 "", n=16, q=0x7ffc6404c7d0) at src/lib/libast/comp/setlocale.c:2188
#1  0x0000000000574583 in sfvprintf (f=0x841ec0 <_Sfstdout>, form=0x25b781d23d33 "", args=0x7ffc64051838) at src/lib/libast/sfio/sfvprintf.c:744
#2  0x0000000000566b67 in sfprintf (f=0x841ec0 <_Sfstdout>, form=0x5bddef "%!") at src/lib/libast/sfio/sfprintf.c:48
#3  0x0000000000492ec3 in b_print (argc=-1, argv=0x25b781d23c10, context=0x7ffc64051ab0) at src/cmd/ksh93/bltins/print.c:350
#4  0x00000000004925ea in b_printf (argc=3, argv=0x25b781d23c00, context=0x8433f0 <sh+1392>) at src/cmd/ksh93/bltins/print.c:150
#5  0x0000000000472692 in sh_exec (shp=0x7ffc6404c664, t=0x25b781d23d41, flags=5) at src/cmd/ksh93/sh/xec.c:1387
#6  0x0000000000416cad in exfile (shp=0x7ffc6404c664, iop=0x25b781d23d41, fno=16) at src/cmd/ksh93/sh/main.c:610
#7  0x0000000000416065 in sh_main (ac=3, av=0x7ffc640522e8, userinit=0x0) at src/cmd/ksh93/sh/main.c:382
#8  0x0000000000415192 in main (argc=3, argv=0x7ffc640522e8) at src/cmd/ksh93/sh/pmain.c:45

typeset -S within functions

Weird behaviour when using static function variables (typeset -S)

Consider the following script:

function alpha {
    integer -S count=0
    (( ++ count ))
    print -n " $count"
}

function beta {
    integer count=0
    (( ++ count ))
    print -n " $count"
}

print -n "Alpha:"; alpha; alpha; alpha; alpha; alpha; print
print -n "Beta: "; beta;  beta;  beta;  beta;  beta;  print

My understanding is that since the static variable is declared within a non-POSIX function its scope is that function's scope. Consequently the expected output should be:

Alpha: 1 2 3 4 5
Beta:  1 1 1 1 1

While the output of alpha() is consistent across tests, the behaviour of beta() is not: I get various outputs, sometimes correct but mostly incorrect. This has been tested with 93u+ on Linux and macOS.

Sample outputs:

Beta:  0 0 0 0 0
Beta:  0 0 0 1 0
Beta:  0 0 1 0 0
Beta:  1 1 0 1 1

I have not found a pattern: I can reproduce the error almost systematically, but not the exact output.

I tried the following without success:

Change the loading order of functions
Declare a variable with same name in the global scope (to eventually force a local scoping)
Replace the integer alias by its typeset equivalent

iffe has trouble detecting dynamic linking on FreeBSD 11

On FreeBSD 10 it works:

+ mamake -C lib/libdll -k install
+ set -
+ iffe -v -c 'cc -D_BLD_DLL -fPIC -Wno-unused-value -Wno-parentheses -Wno-logical-op-parentheses -O2 -pipe  -fstack-protector -fno-strict-aliasing    -lm -fstack-protector ' ref -L/wrkdirs/usr/ports/shells/ksh93/work/ksh93-20160716/arch/freebsd11.amd64/lib -I/wrkdirs/usr/ports/shells/ksh93/work/ksh93-20160716/arch/freebsd11.amd64/include/ast -I/wrkdirs/usr/ports/shells/ksh93/work/ksh93-20160716/arch/freebsd11.amd64/include /wrkdirs/usr/ports/shells/ksh93/work/ksh93-20160716/arch/freebsd11.amd64/lib/libast.a -lm : run /wrkdirs/usr/ports/shells/ksh93/work/ksh93-20160716/src/lib/libdll/features/dll
iffe: test: is sys/types.h a header ... yes
iffe: test: is -lm a library ... yes
iffe: test: is /wrkdirs/usr/ports/shells/ksh93/work/ksh93-20160716/arch/freebsd11.amd64/lib/libast.a a library ... yes
iffe: test: is dl.h a header ... no
iffe: test: is dlfcn.h a header ... yes
iffe: test: is dll.h a header ... no
iffe: test: is rld_interface.h a header ... no
iffe: test: is mach-o/dyld.h a header ... no
iffe: test: is sys/ldr.h a header ... no
iffe: test: is -ldl a library ... no
iffe: test: is dlopen a library function ... yes
iffe: test: is dllload a library function ... no
iffe: test: is loadbind a library function ... no
iffe: test: is shl_load a library function ... no
iffe: test: link{ ... }end ... no
iffe: test: run{ ... }end ... yes
iffe: test: output{ ... }end ... yes

On FreeBSD 11, however, it does not:

+ iffe -v -c 'cc -D_BLD_DLL -fPIC -Wno-unused-value -Wno-parentheses -Wno-logical-op-parentheses -O2 -pipe  -fstack-protector -fno-strict-aliasing    -lm -fstack-protector ' ref -L/wrkdirs/usr/ports/shells/ksh93/work/ksh93-20160716/arch/freebsd11.amd64/lib -I/wrkdirs/usr/ports/shells/ksh93/work/ksh93-20160716/arch/freebsd11.amd64/include/ast -I/wrkdirs/usr/ports/shells/ksh93/work/ksh93-20160716/arch/freebsd11.amd64/include -last -lm : run /wrkdirs/usr/ports/shells/ksh93/work/ksh93-20160716/src/lib/libdll/features/dll
iffe: test: is sys/types.h a header ... yes
iffe: test: is -lm a library ... yes
iffe: test: is -last a library ... no
iffe: test: is dl.h a header ... no
iffe: test: is dlfcn.h a header ... yes
iffe: test: is dll.h a header ... no
iffe: test: is rld_interface.h a header ... no
iffe: test: is mach-o/dyld.h a header ... no
iffe: test: is sys/ldr.h a header ... no
iffe: test: is -ldl a library ... no
iffe: test: is dlopen a library function ... yes
iffe: test: is dllload a library function ... no
iffe: test: is loadbind a library function ... no
iffe: test: is shl_load a library function ... no
iffe: test: link{ ... }end ... no
iffe: test: run{ ... }end ... yes
iffe: test: output{ ... }end ... no

I wonder why FreeBSD 11 gets the -last parameter, while the working FreeBSD 10 build uses /wrkdirs/usr/ports/shells/ksh93/work/ksh93-20160716/arch/freebsd11.amd64/lib/libast.a. Where does this difference come from?

Additional info:

FreeBSD 10 uses clang 3.4.1
FreeBSD 11 uses clang 4.0.0
I have 477c024 already applied to iffe

Since iffe reports "test: output{ ... }end ... no", this all leads to failures in recognizing dlopen(), and therefore -last does not get built.

nv_open("tcl_library", 0, 0) will crash tksh on startup

tksh crashes on startup with the following backtrace:

#0  0x0000000000613d44 in dtuserdata (dt=0x0, data=0x0, set=0)
    at /home/saper/sw/ast/src/lib/libast/cdt/dtuser.c:45
#1  0x0000000000519d26 in nv_open (name=0x676c84 "tcl_library", root=0x0, flags=0)
    at /home/saper/sw/ast/src/cmd/ksh93/sh/name.c:1427
#2  0x00000000004a54e5 in TkshOpenVar (interp=0x1e06422c41c0, name1=0x7fffffffde10, name2=0x7fffffffde08, 
    flags=65537, options=8, msg=0x676e18 "set") at /home/saper/sw/ast/src/lib/libtksh/src/var.c:137
#3  0x00000000004a5cd8 in Tcl_SetVar2 (interp=0x1e06422c41c0, part1=0x676c84 "tcl_library", part2=0x0, 
    newValue=0x1e0642273860 "lib/tksh7.6", flags=65537) at /home/saper/sw/ast/src/lib/libtksh/src/var.c:338
#4  0x00000000004a7167 in Tcl_SetVar (interp=0x1e06422c41c0, varName=0x676c84 "tcl_library", 
    newValue=0x1e0642273860 "lib/tksh7.6", flags=1) at /home/saper/sw/ast/src/lib/libtksh/src/var.c:865
#5  0x00000000004a4f27 in TkshCreateInterp (interp=0x1e06422c41c0, data=0xa19880 <builtInCmds>)
    at /home/saper/sw/ast/src/lib/libtksh/src/init.c:108
#6  0x00000000004acfd1 in Tcl_CreateInterp () at /home/saper/sw/ast/src/lib/libtksh/src/basic.c:210
#7  0x000000000040a03a in Tksh_TkMain (argc=1, argv=0x7fffffffe140, appInitProc=0x40a788 <Tksh_AppInit>)
    at /home/saper/sw/ast/src/cmd/tksh/tkMain.c:108
#8  0x000000000040ab09 in b_tkinit (argc=1, argv=0x7fffffffe140, context=0x0)
    at /home/saper/sw/ast/src/cmd/tksh/tkMain.c:676
#9  0x0000000000409f58 in tksh_userinit (shp=0xa34d80 <sh>, subshell=0)
    at /home/saper/sw/ast/src/cmd/tksh/uinit.c:57
#10 0x00000000004f48f4 in sh_init (argc=1, argv=0xa1d840 <_error_info_>, userinit=0x409e36 <tksh_userinit>)
    at /home/saper/sw/ast/src/cmd/ksh93/sh/init.c:1787
#11 0x00000000004d16e6 in sh_main (ac=1, av=0x7fffffffe778, userinit=0x409e36 <tksh_userinit>)
    at /home/saper/sw/ast/src/cmd/ksh93/sh/main.c:146
#12 0x000000000040a002 in main (argc=1, argv=0x7fffffffe778) at /home/saper/sw/ast/src/cmd/tksh/uinit.c:77

According to nvl(3):

SYNOPSIS

       Namval_t        *nv_open(const char *name, Dt_t *dict, int flags);

DESCRIPTION

       The function nv_open() returns a pointer to a name-value pair corresponding to the given name. It can also assign a value and give attributes to a name-value pair. The argument dict defines the dictionary to search. A NULL value causes the shell global variable dictionary to be searched.

TkshOpenVar() may intentionally call nv_open() with dict set to NULL, which now causes a crash.

nv_open() now just calls dtuserdata(root, 0, 0). This was changed in the 2013-10-10 alpha release; previously, sh_getinterp() was called to determine the default value of root when none was specified.
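The documented contract could be restored by resolving a NULL dictionary to the global one before anything dereferences it. This is a minimal sketch in plain C using hypothetical stand-ins (Dict_t, dict_userdata, resolve_dict, open_name_userdata, global_vars), not the real libast/ksh93 types; in ksh93 itself the fallback would presumably come from the interpreter state, as the sh_getinterp() call did before the 2013-10-10 change.

```c
#include <stddef.h>

/* Hypothetical stand-in for Dt_t: only the user-data slot matters here. */
typedef struct Dict {
    void *userdata;
} Dict_t;

/* Hypothetical stand-in for the shell global variable dictionary. */
Dict_t global_vars = { NULL };

/* Hypothetical analogue of dtuserdata(dt, 0, 0): it reads dt->userdata,
 * so a NULL dt is dereferenced, as in frame #0 of the backtrace. */
void *dict_userdata(Dict_t *dt)
{
    return dt->userdata;
}

/* nvl(3) contract: a NULL dict selects the global dictionary. */
Dict_t *resolve_dict(Dict_t *dict)
{
    return dict != NULL ? dict : &global_vars;
}

/* Guarded lookup: resolve first, then touch the dictionary. */
void *open_name_userdata(Dict_t *dict)
{
    return dict_userdata(resolve_dict(dict));
}
```

With the guard in place, a caller such as TkshOpenVar() passing NULL gets the global dictionary instead of a segfault.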

ast's Issues

#define isword(c) _isword(out[c])

#define gencpy(a,b) strcpy((char*)(a),(char*)(b))

#define genlen(str) strlen(str)

#define print(c) isprint(c)

#define isword(c) (isalnum(out[c]) || (out[c]=='_'))
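The isalnum()-based isword() definition treats a character as part of a word if it is a letter, a digit, or an underscore. A function form of that test might look like the sketch below; is_word_char is a hypothetical name, and it deliberately omits the out[] mapping those macros apply first, since the contents of out[] are not shown here.

```c
#include <ctype.h>

/* Hypothetical function form of the isalnum()-based isword() test.
 * The cast to unsigned char keeps isalnum() defined for negative
 * char values, which plain char may produce on some platforms. */
int is_word_char(int c)
{
    unsigned char uc = (unsigned char)c;
    return isalnum(uc) || uc == '_';
}
```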

Should we change/adapt the build process?

Small knowledgeable community

Scarce documentation

Build tool

Preliminary thoughts

Request for comments

ksh -ec 'rm -f /tmp/$USER.test.log; function log { echo $* to

cat /tmp/root.test.log

./ksh -ec 'rm -f /tmp/$USER.test.log; function log { echo $*

cat /tmp/root.test.log

./ksh -ec 'rm -f /tmp/$USER.test.log; function log { echo $*

cat /tmp/root.test.log

./ksh -ec 'function error { echo "ERROR trap to stdout";

cat /tmp/test.log

./ksh -c '

cat /tmp/test.log

cat test-script.sh

./ksh test-script.sh

cat /tmp/root.test.log
