cl-ana's Introduction

cl-ana is a free (GPL) library of Common Lisp code for doing data
analysis via either straightforward programming or dependency oriented
programming.  It aims to be a general purpose framework for analyzing
small and large scale datasets, including binned data analysis and
visualization.  Much effort has been made to ensure modularity so that
individual components may be used/re-used for a new purpose.

cl-ana is available via quicklisp (http://www.quicklisp.org/beta/);
for other dependencies see below.

Example code for using some of the functionality is contained in
various test.lisp files throughout the project; the full documentation
is located on the wiki page: http://github.com/ghollisjr/cl-ana/wiki

There is a Matrix live chat for cl-ana located here:
https://matrix.to/#/!cANztuGawRmRSdyLhu:matrix.org?via=matrix.org
Public address: #cl-ana:matrix.org

Whenever possible, features are implemented via generic functions so
that users can extend cl-ana to whatever they want to do.

The functionality of this framework is divided into two layers.  The
lower layer provides basic libraries for the following:

* Tabulated data: Supports data tables read-from and written-to HDF5
  files (buffered read-write), ntuples (like CERN's PAW uses), comma
  separated value (CSV) files, and plists for all-in-memory operation.
  Adding a new table type is as easy as extending the table class and
  defining 4 functions for the table type.  (The libraries cl-csv and
  GSLL provide the backbone for the CSV and ntuple tables; the HDF5
  table access is completely new.)

* Histograms: Supports categorical, contiguous, and sparse histograms
  of arbitrary dimensions.  Provides functional access to histograms
  via mapping (which allows reducing) and filtering.

* Nonlinear least squares fitting: Allows plain-old lisp functions to
  be fitted to data using the GNU Scientific Library (GSL); infers the
  number of fit parameters the function takes from the initial
  parameter guess.  Can fit against alists of data & histograms and is
  easily extended to allow fitting against other types by defining a
  single function for the new type.

* Plotting: Uses gnuplot to plot histograms, data samples, plain-old
  lisp functions, and strings interpreted as formulae.

* Generic math: Common Lisp doesn't provide user-extendable math
  functions; cl-ana provides its own versions of the basic math
  functions CL gives you but with the ability to extend them for
  whatever types you want.  Also provides use-gmath which easily adds
  generic-math's symbols to a package even if you already use the
  common-lisp package.  Already provided are extensions to the generic
  math functions for error propagation, quantities (values with
  units), and treating CL sequences as tensors with all the usual math
  functions being applied element-by-element in a MATLAB/GNU Octave
  fashion.
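
The generic math idea in the last bullet can be illustrated with a minimal sketch in plain Common Lisp. This is not cl-ana's code: the package `gmath-sketch`, the generic function `add`, and the `err-num` structure are hypothetical names chosen for the example. It only shows the technique of shadowing `+` with a function built on a generic function, so that new types (here sequences and values with errors) can be added by defining methods.

```lisp
;; Hypothetical sketch; none of these names are taken from cl-ana.
(defpackage :gmath-sketch
  (:use :common-lisp)
  (:shadow #:+)                         ; replace CL's + inside this package
  (:export #:+ #:add #:err-num #:err-num-value #:err-num-sigma))

(in-package :gmath-sketch)

(defgeneric add (x y)
  (:documentation "Binary addition; define methods to support new types."))

;; Plain numbers fall through to the standard function.
(defmethod add ((x number) (y number))
  (cl:+ x y))

(defun same-kind (seq)
  "Result type for MAP matching SEQ: a list stays a list, else a vector."
  (if (listp seq) 'list 'vector))

;; Sequences are combined element by element.
(defmethod add ((x sequence) (y sequence))
  (map (same-kind x) #'add x y))

;; A scalar is applied to every element of a sequence.
(defmethod add ((x number) (y sequence))
  (map (same-kind y) (lambda (e) (add x e)) y))

(defmethod add ((x sequence) (y number))
  (add y x))

(defun + (&rest args)
  "Variadic front end: folds ADD over ARGS, returning 0 for no arguments."
  (if args (reduce #'add args) 0))

;; A value with an uncertainty; independent errors add in quadrature.
(defstruct (err-num (:constructor err-num (value sigma)))
  value sigma)

(defmethod add ((x err-num) (y err-num))
  (err-num (add (err-num-value x) (err-num-value y))
           (sqrt (cl:+ (expt (err-num-sigma x) 2)
                       (expt (err-num-sigma y) 2)))))
```

With these definitions, `(+ '(1 2 3) '(10 20 30))` adds the two lists element by element, and `(+ (err-num 1 3) (err-num 2 4))` yields a new `err-num` whose error combines both inputs. Supporting another type only requires further `add` methods.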

The higher layer provides dependency oriented programming.  Dependency
oriented programming is my own term for defining your program in terms
of targets needing execution as opposed to an explicit computation.
It is a hybrid of imperative and declarative programming.  The target
table can be transformed to allow for optimizations.  Provided
optimizations include table pass merge and collapse which minimize the
number of passes over source datasets.
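
To make the idea of "targets needing execution" concrete, here is a minimal sketch in plain Common Lisp. It is an illustration under stated assumptions, not cl-ana's implementation or API: `deftarget`, `target-value`, `*targets*`, `*results*`, and `*passes*` are hypothetical names, and the sketch does not perform table pass merging or detect circular dependencies. It only shows targets declared with their dependencies, computed on demand, and cached so that a shared source is read once.

```lisp
;; Hypothetical sketch; these names are not cl-ana's API.
(defpackage :dop-sketch (:use :common-lisp))
(in-package :dop-sketch)

(defvar *targets* (make-hash-table :test #'eq)
  "Maps a target id to a list (dependency-ids function).")

(defvar *results* (make-hash-table :test #'eq)
  "Caches the value of every target that has already been computed.")

(defvar *passes* 0
  "Counts how often the source data is read.")

(defmacro deftarget (id (&rest deps) &body body)
  "Declare target ID. BODY may refer to each dependency by its id."
  `(setf (gethash ',id *targets*)
         (list ',deps (lambda ,deps ,@body))))

(defun target-value (id)
  "Return the value of target ID, computing its dependencies first."
  (multiple-value-bind (value found) (gethash id *results*)
    (if found
        value
        (destructuring-bind (deps fn)
            (or (gethash id *targets*)
                (error "Unknown target ~s" id))
          (setf (gethash id *results*)
                (apply fn (mapcar #'target-value deps)))))))

;; The program is a set of targets rather than an explicit computation.
(deftarget data ()
  (incf *passes*)                       ; stands in for reading a dataset
  (list 1 2 3 4))

(deftarget total (data)
  (reduce #'+ data))

(deftarget mean (data total)
  (/ total (length data)))
```

Asking for `(target-value 'mean)` computes `data` and `total` as needed. Both `total` and `mean` depend on `data`, but because results are cached, `*passes*` is incremented only once.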

Also included are various utilities which have use in a variety of
places.

The main principles of the project are:

1. Conceptual clarity and documentation.  These are often neglected in
   software development, to the point where reading code can cause one
   to drink.  Conceptual clarity refers to the way in which code is
   written and the way in which algorithms are implemented: A slightly
   slower but easier to understand implementation is favored above a
   labyrinth of bit shifts.  Documentation should always be provided
   for any feature along with example usages--ESPECIALLY with example
   usages, as these are sometimes more helpful than the actual
   documentation.

2. Modularity/Bottom-up design.  Whenever two components have a common
   feature/function/dependency, this commonality should be placed in a
   separate sublibrary.  To limit sublibrary number explosion, this
   should be done in conjunction with point 1 preserving conceptual
   clarity.  For example list utilities should be a sublibrary for
   general purpose list functions.  Further: If a feature can be
   provided by either a set of utility functions or a type hierarchy,
   strong preference should be given to the utility functions
   approach; i.e. one should have to argue long and hard before
   stratifying things into classes.

3. Lispyness.  Whenever possible, already established motifs from Lisp
   programming practices should be used.  This goes for naming
   conventions, access macros, and the general desire to provide at
   least functional access to things.

Each sublibrary should go in its own directory and come with its own
.asd file so that one can choose any subset of functionality to use
from the library.

As you will see in reading the code, I've tried to keep everything
well documented.  I place a high emphasis on documentation since I
know how easy it is to fall out of practice.  The last thing I want is
for the usual cargo-cult around old code to emerge.

Disclaimer: much of the code I've written has been part of my own
personal development as a Lisp programmer; this is my first
non-trivial project with Lisp, and coming from a C++ background I've
had to learn quite a few things along the way.  This means that there
may be some dark corners of the code which need help from more
experienced coders/myself at a later time.  In addition, I haven't
used any general testing framework.  (To be honest I haven't needed
one either as I've done the development in a highly bottom-up way,
testing everything as I write it.)  In short this is a work in
progress.

The code tries to be self documented, but I'm working on a
tutorial/user's guide on the github wiki page to explain how to use
the software to best effect.

The dependencies for this project are:

* HDF5 (http://www.hdfgroup.org/HDF5/)
* GSL (http://www.gnu.org/software/gsl/)
* CFFI (http://common-lisp.net/project/cffi/)
* GSLL (http://common-lisp.net/project/gsll/)
* Alexandria (http://common-lisp.net/project/alexandria/)
* iterate (http://common-lisp.net/project/iterate/)
* antik (http://www.common-lisp.net/project/antik/)
* closer-mop (http://common-lisp.net/project/closer/closer-mop.html)
* cl-csv (https://github.com/AccelerationNet/cl-csv)
* gnuplot (http://www.gnuplot.info/)
* cl-fad (http://weitz.de/cl-fad/)
* external-program (http://github.com/sellout/external-program)

All of the Lisp dependencies can be installed via quicklisp
(http://www.quicklisp.org/).

I copied the API for using gnuplot from gnuplot_i
(http://ndevilla.free.fr/gnuplot/).  gnuplot_i was written by
N. Devillard, released to the public domain, and is
a no-nonsense gnuplot session manager written in C.

I use SBCL (http://www.sbcl.org/) almost exclusively; however, I also
intentionally try to ensure that all the code only assumes what the CL
standard provides.  Anytime implementation-specific functionality is
needed I try to use third party libraries for this.

cl-ana's People

Contributors

djeis97, ghollisjr, hellseher, jkordani, jzumer, kat-co, khinsen, zodmaner

cl-ana's Issues

Some systems failed to build for Quicklisp dist

Building with SBCL 2.0.5 / ASDF 3.3.1 for quicklisp dist creation.

Trying to build commit id 2d02056

cl-ana.hdf-typespec fails to build with the following error:

Unhandled PACKAGE-DOES-NOT-EXIST in thread #<SB-THREAD:THREAD "main thread" RUNNING {1000A10083}>: The name "HDF5" does not designate any package.

Full log here

Issue installing this package

Hello,

I am trying to install cl-ana, but it keeps giving me the following errors that render me unable to get it working:

This is SBCL 2.3.5, an implementation of ANSI Common Lisp.
More information about SBCL is available at http://www.sbcl.org/.

SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses. See the CREDITS and COPYING files in the
distribution for more information.

* (ql:quickload 'cl-ana)
    To load "cl-ana":
    Load 1 ASDF system:
    cl-ana
    ; Loading "cl-ana"
    ................................................; pkg-config hdf5 --cflags
    .
    ; cc -o /home/user/.cache/common-lisp/sbcl-2.3.5-linux-x64/home/user/quicklisp/local-projects/cl-ana/hdf-cffi/src/h5f-grovel__grovel-tmpGHU3ALSV.o -c -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security -fstack-clash-protection -fcf-protection -g -ffile-prefix-map=/build/sbcl/src=/usr/src/debug/sbcl -flto=auto -D_GNU_SOURCE -fno-omit-frame-pointer -DSBCL_HOME=/usr/lib/sbcl -g -Wall -Wundef -Wsign-compare -Wpointer-arith -O3 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -Wunused-parameter -fno-omit-frame-pointer -momit-leaf-frame-pointer -fPIC -I/home/user/quicklisp/dists/quicklisp/software/cffi-20230618-git/ /home/user/.cache/common-lisp/sbcl-2.3.5-linux-x64/home/user/quicklisp/local-projects/cl-ana/hdf-cffi/src/h5f-grovel__grovel.c
    /home/user/.cache/common-lisp/sbcl-2.3.5-linux-x64/home/user/quicklisp/local-projects/cl-ana/hdf-cffi/src/h5f-grovel__grovel.c: In function 'main':
    /home/user/.cache/common-lisp/sbcl-2.3.5-linux-x64/home/user/quicklisp/local-projects/cl-ana/hdf-cffi/src/h5f-grovel__grovel.c:12:7: warning: unused variable 'autotype_tmp' [-Wunused-variable]
    12 | int autotype_tmp;
    | ^~~~~~~~~~~~
    ; cc -o /home/user/.cache/common-lisp/sbcl-2.3.5-linux-x64/home/user/quicklisp/local-projects/cl-ana/hdf-cffi/src/h5f-grovel__grovel-tmpAAURSO1 -Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -flto=auto -g -Wl,--export-dynamic /home/user/.cache/common-lisp/sbcl-2.3.5-linux-x64/home/user/quicklisp/local-projects/cl-ana/hdf-cffi/src/h5f-grovel__grovel.o
    /usr/bin/ld: /tmp/cc8OVkhh.ltrans0.ltrans.o: in function `main':
    /home/user/.cache/common-lisp/sbcl-2.3.5-linux-x64/home/user/quicklisp/local-projects/cl-ana/hdf-cffi/src/h5f-grovel__grovel.c:23: undefined reference to `H5check_version'
    /usr/bin/ld: /home/user/.cache/common-lisp/sbcl-2.3.5-linux-x64/home/user/quicklisp/local-projects/cl-ana/hdf-cffi/src/h5f-grovel__grovel.c:23: undefined reference to `H5open'
    /usr/bin/ld: /home/user/.cache/common-lisp/sbcl-2.3.5-linux-x64/home/user/quicklisp/local-projects/cl-ana/hdf-cffi/src/h5f-grovel__grovel.c:24: undefined reference to `H5check_version'
    /usr/bin/ld: /home/user/.cache/common-lisp/sbcl-2.3.5-linux-x64/home/user/quicklisp/local-projects/cl-ana/hdf-cffi/src/h5f-grovel__grovel.c:24: undefined reference to `H5open'
    /usr/bin/ld: /home/user/.cache/common-lisp/sbcl-2.3.5-linux-x64/home/user/quicklisp/local-projects/cl-ana/hdf-cffi/src/h5f-grovel__grovel.c:37: undefined reference to `H5check_version'
    /usr/bin/ld: /home/user/.cache/common-lisp/sbcl-2.3.5-linux-x64/home/user/quicklisp/local-projects/cl-ana/hdf-cffi/src/h5f-grovel__grovel.c:37: undefined reference to `H5open'
    /usr/bin/ld: /home/user/.cache/common-lisp/sbcl-2.3.5-linux-x64/home/user/quicklisp/local-projects/cl-ana/hdf-cffi/src/h5f-grovel__grovel.c:38: undefined reference to `H5check_version'
    /usr/bin/ld: /home/user/.cache/common-lisp/sbcl-2.3.5-linux-x64/home/user/quicklisp/local-projects/cl-ana/hdf-cffi/src/h5f-grovel__grovel.c:38: undefined reference to `H5open'
    /usr/bin/ld: /home/user/.cache/common-lisp/sbcl-2.3.5-linux-x64/home/user/quicklisp/local-projects/cl-ana/hdf-cffi/src/h5f-grovel__grovel.c:51: undefined reference to `H5check_version'
    /usr/bin/ld: /home/user/.cache/common-lisp/sbcl-2.3.5-linux-x64/home/user/quicklisp/local-projects/cl-ana/hdf-cffi/src/h5f-grovel__grovel.c:51: undefined reference to `H5open'
    /usr/bin/ld: /home/user/.cache/common-lisp/sbcl-2.3.5-linux-x64/home/user/quicklisp/local-projects/cl-ana/hdf-cffi/src/h5f-grovel__grovel.c:52: undefined reference to `H5check_version'
    /usr/bin/ld: /home/user/.cache/common-lisp/sbcl-2.3.5-linux-x64/home/user/quicklisp/local-projects/cl-ana/hdf-cffi/src/h5f-grovel__grovel.c:52: undefined reference to `H5open'
    /usr/bin/ld: /home/user/.cache/common-lisp/sbcl-2.3.5-linux-x64/home/user/quicklisp/local-projects/cl-ana/hdf-cffi/src/h5f-grovel__grovel.c:65: undefined reference to `H5check_version'
    /usr/bin/ld: /home/user/.cache/common-lisp/sbcl-2.3.5-linux-x64/home/user/quicklisp/local-projects/cl-ana/hdf-cffi/src/h5f-grovel__grovel.c:65: undefined reference to `H5open'
    /usr/bin/ld: /home/user/.cache/common-lisp/sbcl-2.3.5-linux-x64/home/user/quicklisp/local-projects/cl-ana/hdf-cffi/src/h5f-grovel__grovel.c:66: undefined reference to `H5check_version'
    /usr/bin/ld: /home/user/.cache/common-lisp/sbcl-2.3.5-linux-x64/home/user/quicklisp/local-projects/cl-ana/hdf-cffi/src/h5f-grovel__grovel.c:66: undefined reference to `H5open'
    /usr/bin/ld: /home/user/.cache/common-lisp/sbcl-2.3.5-linux-x64/home/user/quicklisp/local-projects/cl-ana/hdf-cffi/src/h5f-grovel__grovel.c:79: undefined reference to `H5check_version'
    /usr/bin/ld: /home/user/.cache/common-lisp/sbcl-2.3.5-linux-x64/home/user/quicklisp/local-projects/cl-ana/hdf-cffi/src/h5f-grovel__grovel.c:79: undefined reference to `H5open'
    /usr/bin/ld: /home/user/.cache/common-lisp/sbcl-2.3.5-linux-x64/home/user/quicklisp/local-projects/cl-ana/hdf-cffi/src/h5f-grovel__grovel.c:80: undefined reference to `H5check_version'
    /usr/bin/ld: /home/user/.cache/common-lisp/sbcl-2.3.5-linux-x64/home/user/quicklisp/local-projects/cl-ana/hdf-cffi/src/h5f-grovel__grovel.c:80: undefined reference to `H5open'
    /usr/bin/ld: /home/user/.cache/common-lisp/sbcl-2.3.5-linux-x64/home/user/quicklisp/local-projects/cl-ana/hdf-cffi/src/h5f-grovel__grovel.c:93: undefined reference to `H5check_version'
    /usr/bin/ld: /home/user/.cache/common-lisp/sbcl-2.3.5-linux-x64/home/user/quicklisp/local-projects/cl-ana/hdf-cffi/src/h5f-grovel__grovel.c:93: undefined reference to `H5open'
    /usr/bin/ld: /home/user/.cache/common-lisp/sbcl-2.3.5-linux-x64/home/user/quicklisp/local-projects/cl-ana/hdf-cffi/src/h5f-grovel__grovel.c:94: undefined reference to `H5check_version'
    /usr/bin/ld: /home/user/.cache/common-lisp/sbcl-2.3.5-linux-x64/home/user/quicklisp/local-projects/cl-ana/hdf-cffi/src/h5f-grovel__grovel.c:94: undefined reference to `H5open'
    collect2: error: ld returned 1 exit status

debugger invoked on a CFFI-GROVEL:GROVEL-ERROR in thread #<THREAD tid=10782 "main thread" RUNNING {10013C8073}>: Subprocess #<UIOP/LAUNCH-PROGRAM::PROCESS-INFO {100562CA93}>
with command ("cc" "-o" "/home/user/.cache/common-lisp/sbcl-2.3.5-linux-x64/home/user/quicklisp/local-projects/cl-ana/hdf-cffi/src/h5f-grovel__grovel-tmpAAURSO1" "-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now" "-flto=auto" "-g" "-Wl,--export-dynamic" "/home/user/.cache/common-lisp/sbcl-2.3.5-linux-x64/home/user/quicklisp/local-projects/cl-ana/hdf-cffi/src/h5f-grovel__grovel.o")
exited with error code 1

cl-ana, meet numcl

I have just stumbled upon numcl, a clone of numpy, which -- as I understand it -- is the de facto standard for scientific computing, possibly in any language. It looks like this package was released publicly last week.

I haven't yet looked into this, but numpy is the library whose behaviour I have been emulating and attempting to get into cl-ana. I am wondering if there should be a relationship between cl-ana and numcl?

cc numcl/numcl#1

`cl-ana.reusable-table:reusable-table` can be misleading and lead to surprising bugs

I realize I'm hitting all kinds of edge cases because I'm currently only working with in-memory data sets.

I have introduced bugs into my code a couple times now because I'm forgetting how reusable-table works. To restate how it works here for benefit of readers: make-reusable-table takes in a creation-fn and optionally an opener-form which tell the table how to recreate the underlying table when it needs to. It then uses a needs-reloading slot variable to determine when it needs to call these two functions.

The bug I've introduced in my code is when I partially loop through a table, but logically I'm finished with that pass of the table. I think what I should be doing is calling (setf (needs-reloading my-table) t), but I am obviously expecting forms like do-table to do that for me once they exit.

I find myself wondering if a "cursor" concept wouldn't be clearer. E.g. you only ever have 1 representation of the table, but you might create multiple cursors on that table. And do-table could take in &key (cursor (make-cursor table)).
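
A rough sketch of what such a cursor could look like, in plain Common Lisp. Everything here is hypothetical and independent of cl-ana: the table is stood in for by a vector of rows, and `make-cursor`, `next-row`, `do-rows`, and `sum-until` are names invented for the illustration. The point is only that iteration state lives in the cursor, so leaving a loop early cannot affect the next pass.

```lisp
;; Hypothetical sketch; a vector of rows stands in for a real table.
(defpackage :cursor-sketch (:use :common-lisp))
(in-package :cursor-sketch)

(defstruct (cursor (:constructor make-cursor (rows)))
  rows        ; the shared row storage, never modified by iteration
  (index 0))  ; position of the next row this cursor will yield

(defun next-row (cursor)
  "Return the next row and T, or NIL and NIL once the cursor is exhausted."
  (let ((rows (cursor-rows cursor))
        (i (cursor-index cursor)))
    (if (< i (length rows))
        (progn (incf (cursor-index cursor))
               (values (aref rows i) t))
        (values nil nil))))

(defmacro do-rows ((var rows &key (cursor `(make-cursor ,rows))) &body body)
  "Bind VAR to each remaining row. A fresh cursor is made unless one is given."
  (let ((c (gensym "CURSOR"))
        (more (gensym "MORE")))
    `(let ((,c ,cursor))
       (loop
         (multiple-value-bind (,var ,more) (next-row ,c)
           (unless ,more (return))
           ,@body)))))

(defun sum-until (rows limit)
  "Sum rows, stopping early once the running total reaches LIMIT."
  (let ((total 0))
    (do-rows (row rows)
      (incf total row)
      (when (>= total limit) (return)))
    total))
```

Calling `sum-until` twice on the same rows gives the same answer both times, because each call iterates with its own cursor; passing `:cursor` explicitly lets a caller resume a partial pass on purpose.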

Thoughts?

Symbol Clashes

Hi,

I am getting a number of clashes when trying to load cl-ana: the packages MAP and LIST-UTILS clash both with themselves (e.g. LIST-transpose) and with alexandria (MEAN, Standard-deviation). Is this expected behaviour?

Also, I think some thought needs to be given to package names. I already have a package called gnuplot-interface. Trying to load cl-ana picks up my gnuplot-interface, not yours. Using a prefix (cl-ana?) for internal packages would leave them less exposed to environmental issues.
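
For readers hitting the same kind of clash, this is how such a conflict arises and one standard way to resolve it on the user side, as a self-contained sketch. The packages `lib-a`, `lib-b`, and `my-analysis` are hypothetical stand-ins; they are not the actual cl-ana or alexandria packages.

```lisp
;; Hypothetical packages that both export a symbol named MEAN.
(defpackage :lib-a (:use :common-lisp) (:export #:mean))
(defpackage :lib-b (:use :common-lisp) (:export #:mean))

(in-package :lib-a)
(defun mean (xs)
  "Arithmetic mean of a list of numbers."
  (/ (reduce #'+ xs) (length xs)))

(in-package :lib-b)
(defun mean (x y)
  "Midpoint of two numbers."
  (/ (+ x y) 2))

;; Using both packages without further options would signal a name
;; conflict, because LIB-A:MEAN and LIB-B:MEAN are different symbols.
;; :SHADOWING-IMPORT-FROM states which one the unqualified name means.
(defpackage :my-analysis
  (:use :common-lisp :lib-a :lib-b)
  (:shadowing-import-from :lib-a #:mean))

(in-package :my-analysis)

(defun demo ()
  "Call both functions: LIB-A's unqualified, LIB-B's with a prefix."
  (list (mean '(1 2 3))
        (lib-b:mean 1 3)))
```

Unique package names are the other half of the problem: a prefixed name such as the `cl-ana.list-utils` style seen in load logs makes it much less likely that two unrelated libraries define the same package.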

Cheers

local hdf5-cffi

I can't use cl-ana's version of hdf5-cffi - what's the current issue with upstream? Are you considering forking and maintaining it?

Cannot load

Hi, when I try to load cl-ana on Arch Linux, using SBCL 1.3.1 and SLIME, I get this error:

on the REPL

(ql:quickload 'cl-ana)
To load "cl-ana":
  Load 6 ASDF systems:
    alexandria antik cffi gsll iterate split-sequence
  Install 8 Quicklisp releases:
    bordeaux-threads cl-ana cl-csv cl-fad cl-interpol
    cl-unicode closer-mop external-program
; Fetching #<URL "http://beta.quicklisp.org/archive/bordeaux-threads/2016-03-18/bordeaux-threads-v0.8.5.tgz">
; 19.63KB
==================================================
20,105 bytes in 0.05 seconds (417.74KB/sec)
; Fetching #<URL "http://beta.quicklisp.org/archive/cl-fad/2014-12-17/cl-fad-0.7.3.tgz">
; 24.08KB
==================================================
24,658 bytes in 0.04 seconds (573.34KB/sec)
; Fetching #<URL "http://beta.quicklisp.org/archive/external-program/2016-03-18/external-program-20160318-git.tgz">
; 10.02KB
==================================================
10,260 bytes in 0.00 seconds (10019.53KB/sec)
; Fetching #<URL "http://beta.quicklisp.org/archive/cl-unicode/2014-12-17/cl-unicode-0.1.5.tgz">
; 474.62KB
==================================================
486,011 bytes in 0.94 seconds (506.53KB/sec)
; Fetching #<URL "http://beta.quicklisp.org/archive/cl-interpol/2015-12-18/cl-interpol-0.2.5.tgz">
; 44.28KB
==================================================
45,343 bytes in 0.19 seconds (233.05KB/sec)
; Fetching #<URL "http://beta.quicklisp.org/archive/cl-csv/2015-06-08/cl-csv-20150608-git.tgz">
; 19.80KB
==================================================
20,271 bytes in 0.10 seconds (190.35KB/sec)
; Fetching #<URL "http://beta.quicklisp.org/archive/closer-mop/2016-03-18/closer-mop-20160318-git.tgz">
; 21.90KB
==================================================
22,427 bytes in 0.05 seconds (486.70KB/sec)
; Fetching #<URL "http://beta.quicklisp.org/archive/cl-ana/2016-03-18/cl-ana-20160318-git.tgz">
; 479.79KB
==================================================
491,305 bytes in 3.55 seconds (135.04KB/sec)
; Loading "cl-ana"
To load "fare-utils":
  Load 1 ASDF system:
    asdf
  Install 1 Quicklisp release:
    fare-utils
; Fetching #<URL "http://beta.quicklisp.org/archive/fare-utils/2015-12-18/fare-utils-20151218-git.tgz">
; 31.51KB
==================================================
32,270 bytes in 0.05 seconds (685.08KB/sec)
; Loading "fare-utils"
..................................................
[package fare-stateful].....
; Loading "cl-ana"
To load "trivial-utf-8":
  Install 1 Quicklisp release:
    trivial-utf-8
; Fetching #<URL "http://beta.quicklisp.org/archive/trivial-utf-8/2011-10-01/trivial-utf-8-20111001-darcs.tgz">
; 5.91KB
==================================================
6,055 bytes in 0.00 seconds (0.00KB/sec)
; Loading "trivial-utf-8"
[package trivial-utf-8].
; Loading "cl-ana"
[package cl-ana.pathname-utils]...................
[package cl-ana.package-utils]....................
[package cl-ana.functional-utils].................
[package cl-ana.string-utils].....................
[package cl-ana.list-utils].......................
[package cl-ana.generic-math]...; cc -m64 -I/usr/lib/libffi-3.2.1/include -o /home/anquegi/.cache/common-lisp/sbcl-1.3.1-linux-x64/home/anquegi/quicklisp/dists/quicklisp/software/gsll-master-0f785ddd-git/solve-minimize-fit/solver-struct__grovel-tmp2OWI3Q7U -I/home/anquegi/quicklisp/dists/quicklisp/software/cffi_0.17.1/ /home/anquegi/.cache/common-lisp/sbcl-1.3.1-linux-x64/home/anquegi/quicklisp/dists/quicklisp/software/gsll-master-0f785ddd-git/solve-minimize-fit/solver-struct__grovel.c

on the debugger:

Subprocess (:PROCESS #<SB-IMPL::PROCESS :EXITED 1>)
 with command ("cc" "-m64" "-I/usr/lib/libffi-3.2.1/include"
               "-o"
               "/home/anquegi/.cache/common-lisp/sbcl-1.3.1-linux-x64/home/anquegi/quicklisp/dists/quicklisp/software/gsll-master-0f785ddd-git/solve-minimize-fit/solver-struct__grovel-tmp2OWI3Q7U"
               "-I/home/anquegi/quicklisp/dists/quicklisp/software/cffi_0.17.1/"
               "/home/anquegi/.cache/common-lisp/sbcl-1.3.1-linux-x64/home/anquegi/quicklisp/dists/quicklisp/software/gsll-master-0f785ddd-git/solve-minimize-fit/solver-struct__grovel.c")
 exited with error code 1
   [Condition of type CFFI-GROVEL:GROVEL-ERROR]

Restarts:
 0: [RETRY] Retry PROCESS-OP on #<GROVEL-FILE "gsll" "solve-minimize-fit" "solver-struct">.
 1: [ACCEPT] Continue, treating PROCESS-OP on #<GROVEL-FILE "gsll" "solve-minimize-fit" "solver-struct"> as having been successful.
 2: [RETRY] Retry ASDF operation.
 3: [CLEAR-CONFIGURATION-AND-RETRY] Retry ASDF operation after resetting the configuration.
 4: [ABORT] Give up on "cl-ana"
 5: [*ABORT] Return to SLIME's top level.
 6: [ABORT] abort thread (#<THREAD "repl-thread" RUNNING {1005168033}>)

Backtrace:
  0: (CFFI-GROVEL:GROVEL-ERROR "~a" #<UIOP/RUN-PROGRAM:SUBPROCESS-ERROR {1008CC3B93}>)
  1: ((LAMBDA NIL :IN CFFI-GROVEL:PROCESS-GROVEL-FILE))
  2: (SB-IMPL::%WITH-STANDARD-IO-SYNTAX #<CLOSURE (LAMBDA NIL :IN CFFI-GROVEL:PROCESS-GROVEL-FILE) {1008C9896B}>)
  3: ((:METHOD ASDF/ACTION:PERFORM (CFFI-GROVEL::PROCESS-OP CFFI-GROVEL:GROVEL-FILE)) #<CFFI-GROVEL::PROCESS-OP > #<CFFI-GROVEL:GROVEL-FILE "gsll" "solve-minimize-fit" "solver-struct">) [fast-method]
  4: ((SB-PCL::EMF ASDF/ACTION:PERFORM) #<unavailable argument> #<unavailable argument> #<CFFI-GROVEL::PROCESS-OP > #<CFFI-GROVEL:GROVEL-FILE "gsll" "solve-minimize-fit" "solver-struct">)
  5: ((:METHOD ASDF/ACTION:PERFORM-WITH-RESTARTS :AROUND (T T)) #<CFFI-GROVEL::PROCESS-OP > #<CFFI-GROVEL:GROVEL-FILE "gsll" "solve-minimize-fit" "solver-struct">) [fast-method]
  6: ((:METHOD ASDF/PLAN:PERFORM-PLAN (LIST)) ((#1=#<ASDF/LISP-ACTION:PREPARE-OP > . #<ASDF/SYSTEM:SYSTEM #2="cl-ana.pathname-utils">) (#1# . #3=#<ASDF/LISP-ACTION:CL-SOURCE-FILE #2# "package">) (#4=#<ASDF..
  7: ((FLET SB-C::WITH-IT :IN SB-C::%WITH-COMPILATION-UNIT))
  8: ((:METHOD ASDF/PLAN:PERFORM-PLAN :AROUND (T)) ((#1=#<ASDF/LISP-ACTION:PREPARE-OP > . #<ASDF/SYSTEM:SYSTEM #2="cl-ana.pathname-utils">) (#1# . #3=#<ASDF/LISP-ACTION:CL-SOURCE-FILE #2# "package">) (#4=#..
  9: ((FLET SB-C::WITH-IT :IN SB-C::%WITH-COMPILATION-UNIT))
 10: ((:METHOD ASDF/PLAN:PERFORM-PLAN :AROUND (T)) #<ASDF/PLAN:SEQUENTIAL-PLAN {100899E013}> :VERBOSE NIL) [fast-method]
 11: ((:METHOD ASDF/OPERATE:OPERATE (ASDF/OPERATION:OPERATION ASDF/COMPONENT:COMPONENT)) #<ASDF/LISP-ACTION:LOAD-OP :VERBOSE NIL> #<ASDF/SYSTEM:SYSTEM "cl-ana"> :VERBOSE NIL) [fast-method]
 12: ((SB-PCL::EMF ASDF/OPERATE:OPERATE) #<unused argument> #<unused argument> #<ASDF/LISP-ACTION:LOAD-OP :VERBOSE NIL> #<ASDF/SYSTEM:SYSTEM "cl-ana"> :VERBOSE NIL)
 13: ((LAMBDA NIL :IN ASDF/OPERATE:OPERATE))
 14: ((:METHOD ASDF/OPERATE:OPERATE :AROUND (T T)) #<ASDF/LISP-ACTION:LOAD-OP :VERBOSE NIL> #<ASDF/SYSTEM:SYSTEM "cl-ana"> :VERBOSE NIL) [fast-method]
 15: ((SB-PCL::EMF ASDF/OPERATE:OPERATE) #<unused argument> #<unused argument> ASDF/LISP-ACTION:LOAD-OP "cl-ana" :VERBOSE NIL)
 16: ((LAMBDA NIL :IN ASDF/OPERATE:OPERATE))
 17: (ASDF/CACHE:CALL-WITH-ASDF-CACHE #<CLOSURE (LAMBDA NIL :IN ASDF/OPERATE:OPERATE) {100898D65B}> :OVERRIDE NIL :KEY NIL)
 18: ((:METHOD ASDF/OPERATE:OPERATE :AROUND (T T)) ASDF/LISP-ACTION:LOAD-OP "cl-ana" :VERBOSE NIL) [fast-method]
 19: ((:METHOD ASDF/OPERATE:OPERATE :AROUND (T T)) ASDF/LISP-ACTION:LOAD-OP "cl-ana" :VERBOSE NIL) [fast-method]
 20: (ASDF/OPERATE:LOAD-SYSTEM "cl-ana" :VERBOSE NIL)
 21: (QUICKLISP-CLIENT::CALL-WITH-MACROEXPAND-PROGRESS #<CLOSURE (LAMBDA NIL :IN QUICKLISP-CLIENT::APPLY-LOAD-STRATEGY) {100897D58B}>)
 22: (QUICKLISP-CLIENT::AUTOLOAD-SYSTEM-AND-DEPENDENCIES "cl-ana" :PROMPT NIL)
 23: ((:METHOD QL-IMPL-UTIL::%CALL-WITH-QUIET-COMPILATION (T T)) #<unavailable argument> #<CLOSURE (FLET QUICKLISP-CLIENT::QL :IN QUICKLISP-CLIENT:QUICKLOAD) {10090A0B5B}>) [fast-method]
 24: ((:METHOD QL-IMPL-UTIL::%CALL-WITH-QUIET-COMPILATION :AROUND (QL-IMPL:SBCL T)) #<QL-IMPL:SBCL {100676FED3}> #<CLOSURE (FLET QUICKLISP-CLIENT::QL :IN QUICKLISP-CLIENT:QUICKLOAD) {10090A0B5B}>) [fast-me..
 25: ((:METHOD QUICKLISP-CLIENT:QUICKLOAD (T)) #<unavailable argument> :PROMPT NIL :SILENT NIL :VERBOSE NIL) [fast-method]
 26: (QL-DIST::CALL-WITH-CONSISTENT-DISTS #<CLOSURE (LAMBDA NIL :IN QUICKLISP-CLIENT:QUICKLOAD) {1009075B2B}>)
 27: (SB-INT:SIMPLE-EVAL-IN-LEXENV (QUICKLISP-CLIENT:QUICKLOAD (QUOTE CL-ANA)) #<NULL-LEXENV>)
 28: (EVAL (QUICKLISP-CLIENT:QUICKLOAD (QUOTE CL-ANA)))
 29: (SWANK::%EVAL-REGION "(ql:quickload 'cl-ana) ..)
 30: ((LAMBDA NIL :IN SWANK::%LISTENER-EVAL))
 31: (SWANK-REPL::TRACK-PACKAGE #<CLOSURE (LAMBDA NIL :IN SWANK::%LISTENER-EVAL) {100907595B}>)
 32: (SWANK::CALL-WITH-BUFFER-SYNTAX NIL #<CLOSURE (LAMBDA NIL :IN SWANK::%LISTENER-EVAL) {100907593B}>)
 33: (SWANK::%LISTENER-EVAL "(ql:quickload 'cl-ana) ..)
 34: (SB-INT:SIMPLE-EVAL-IN-LEXENV (SWANK-REPL:LISTENER-EVAL "(ql:quickload 'cl-ana) ..)
 35: (EVAL (SWANK-REPL:LISTENER-EVAL "(ql:quickload 'cl-ana) ..)
 36: (SWANK:EVAL-FOR-EMACS (SWANK-REPL:LISTENER-EVAL "(ql:quickload 'cl-ana) ..)
 37: (SWANK::PROCESS-REQUESTS NIL)
 38: ((LAMBDA NIL :IN SWANK::HANDLE-REQUESTS))
 39: ((LAMBDA NIL :IN SWANK::HANDLE-REQUESTS))
 40: (SWANK/SBCL::CALL-WITH-BREAK-HOOK #<FUNCTION SWANK:SWANK-DEBUGGER-HOOK> #<CLOSURE (LAMBDA NIL :IN SWANK::HANDLE-REQUESTS) {100517000B}>)
 41: ((FLET SWANK/BACKEND:CALL-WITH-DEBUGGER-HOOK :IN "/home/anquegi/.roswell/impls/ALL/ALL/quicklisp/dists/quicklisp/software/slime-v2.17/swank/sbcl.lisp") #<FUNCTION SWANK:SWANK-DEBUGGER-HOOK> #<CLOSURE ..
 42: (SWANK::CALL-WITH-BINDINGS ((*STANDARD-INPUT* . #1=#<SWANK/GRAY::SLIME-INPUT-STREAM {100507E493}>) (*STANDARD-OUTPUT* . #2=#<SWANK/GRAY::SLIME-OUTPUT-STREAM {100514E423}>) (*TRACE-OUTPUT* . #2#) (*ERR..
 43: (SWANK::HANDLE-REQUESTS #<SWANK::MULTITHREADED-CONNECTION {1004148F83}> NIL)
 44: ((FLET #:WITHOUT-INTERRUPTS-BODY-1156 :IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE))
 45: ((FLET SB-THREAD::WITH-MUTEX-THUNK :IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE))
 46: ((FLET #:WITHOUT-INTERRUPTS-BODY-359 :IN SB-THREAD::CALL-WITH-MUTEX))
 47: (SB-THREAD::CALL-WITH-MUTEX #<CLOSURE (FLET SB-THREAD::WITH-MUTEX-THUNK :IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE) {7FFFF0A8ED5B}> #<SB-THREAD:MUTEX "thread result lock" owner: #<SB-THREAD:THR..
 48: (SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE #<SB-THREAD:THREAD "repl-thread" RUNNING {1005168033}> NIL #<CLOSURE (LAMBDA NIL :IN SWANK-REPL::SPAWN-REPL-THREAD) {1005167F9B}> (#<SB-THREAD:THREAD "re..
 49: ("foreign function: call_into_lisp")
 50: ("foreign function: new_thread_trampoline")

I also get this condition with gsll
[Condition of type CFFI-GROVEL:GROVEL-ERROR]

and when I run the C compiler command by hand, I get:

 2016-03-25 13:52:53 ☆  manjaro-pc in ~
○ → cc -m64 -I/usr/lib/libffi-3.2.1/include -o /home/anquegi/.cache/common-lisp/sbcl-1.3.1-linux-x64/home/anquegi/quicklisp/dists/quicklisp/software/gsll-master-0f785ddd-git/solve-minimize-fit/solver-struct__grovel-tmp2OWI3Q7U -I/home/anquegi/quicklisp/dists/quicklisp/software/cffi_0.17.1/ /home/anquegi/.cache/common-lisp/sbcl-1.3.1-linux-x64/home/anquegi/quicklisp/dists/quicklisp/software/gsll-master-0f785ddd-git/solve-minimize-fit/solver-struct__grovel.c

In file included from /home/anquegi/.cache/common-lisp/sbcl-1.3.1-linux-x64/home/anquegi/quicklisp/dists/quicklisp/software/gsll-master-0f785ddd-git/solve-minimize-fit/solver-struct__grovel.c:10:0:
/home/anquegi/.cache/common-lisp/sbcl-1.3.1-linux-x64/home/anquegi/quicklisp/dists/quicklisp/software/gsll-master-0f785ddd-git/solve-minimize-fit/solver-struct__grovel.c: In function ‘main’:
/home/anquegi/quicklisp/dists/quicklisp/software/cffi_0.17.1/grovel/common.h:8:62: error: ‘gsl_multifit_fdfsolver {aka struct <anonymous>}’ has no member named ‘J’
 #define offsetof(type, slot) ((long) ((char *) &(((type *) 0)->slot)))
                                                              ^
/home/anquegi/.cache/common-lisp/sbcl-1.3.1-linux-x64/home/anquegi/quicklisp/dists/quicklisp/software/gsll-master-0f785ddd-git/solve-minimize-fit/solver-struct__grovel.c:32:56: note: in expansion of macro ‘offsetof’
   fprintf(output, " :offset %lli)", (long long signed) offsetof(gsl_multifit_fdfsolver, J));

I have no idea what is happening here.

Inaccuracies in the "plotting" documentation on the Wiki

The Wiki doesn't seem up to date with the code, and since GitHub doesn't support pull requests on Wiki pages, all I can do is point out what's not working.

  1. The default terminal type for gnuplot is qt, not wxt as claimed. If you use a gnuplot without qt support compiled in, you get neither a plot nor an error message. That took me a lot of time to figure out.

  2. The second example, (draw (plot2d (lines #'sin #'cos))), fails because lines is cl-ana.string-utils:lines, which expects string arguments. I haven't found anything shorter than (draw (plot2d (list (line #'sin) (line #'cos)))) to get the desired result. The lines function is also mentioned under "Plotting structure", but doesn't seem to exist any more.

How to control cl-ana.table-viewing gnuplot behavior?

I am unfamiliar with gnuplot. I was able to get plots by specifying the :file keyword in the cl-ana.plotting package, but the cl-ana.table-viewing package doesn't allow users to pass this through. So:

  1. Does gnuplot somehow display graphs from within the terminal? I am running these commands from within SLIME inside graphical Emacs. Without an output file, I simply get back a T, so I think something is happening, but I'm unsure what.
  2. Do we want the table-viewing package to pass through plotting keywords? This seems like a high amount of coupling. Instead, perhaps we could pass in a closure which is responsible for the plotting and takes a limited set of arguments. This way, the caller may create the closure in the stack frame above, using all the plotting arguments they need, and then pass it into the table-viewing functions. This keeps table-viewing and plotting decoupled, but still working together.
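The closure idea in point 2 can be sketched in plain Common Lisp. This is only an illustration: view-rows and make-file-plotter are hypothetical names, not cl-ana.table-viewing or cl-ana.plotting functions, and the "plot" step merely reports what it would do.

```lisp
;; Hypothetical table-viewing entry point: it knows how to extract
;; (x . y) pairs from rows but nothing about plotting options.
(defun view-rows (rows plot-fn &key (x-key :x) (y-key :y))
  "Collect (x . y) pairs from plist ROWS and hand them to PLOT-FN."
  (funcall plot-fn
           (mapcar (lambda (row)
                     (cons (getf row x-key) (getf row y-key)))
                   rows)))

;; The caller closes over whatever plotting options it needs.
(defun make-file-plotter (path)
  "Return a closure that 'plots' points to PATH (here it only reports)."
  (lambda (points)
    (format nil "would plot ~D points to ~A" (length points) path)))

(view-rows '((:x 1 :y 2) (:x 2 :y 4))
           (make-file-plotter "/tmp/plot.png"))
;; => "would plot 2 points to /tmp/plot.png"
```

With this shape, the viewing code never sees :file, terminal, or range options; they live entirely in the closure the caller builds.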

We would like to name an official chatroom for cl-ana

Hey all! I've spoken with @ghollisjr, and we would like to name a place users can go for live help/discussion of cl-ana. I thought I would open an issue to solicit feedback so we can get something going.

Here are the options I'm aware of:

  • #lisp on Freenode: As far as I can tell, this is where most Common Lisp users chat. A simple "official" pointer here would direct anyone who wants to get help to the room.
  • #cl-ana on Freenode: IRC is tried and true, and as I mentioned, it's where #lisp is. On the other hand, it's a little anachronistic, and it looks as though it might be getting edged out by newer technologies (e.g. Mozilla is retiring their own IRC network).
  • Matrix: This is an open protocol which supports federation. There is a free, reference instance in https://matrix.org. The protocol and the client support a lot of the new features expected of newer chat solutions. I personally use this and like it a lot.
  • Discord: Rust has decided to go with this. It looks like it has all the same benefits as Matrix, but it is proprietary and closed source. There is also an existing Lisp Discord Server. I don't know how popular that is.
  • https://gitter.im/: A lot of projects use this because it's tied to your GitHub path. I don't recommend it because it's a closed source, proprietary system. I also don't know how well things can be administered.
  • Slack: Tons of people use this, and it has a lot of mind-share. I also don't recommend this because it's proprietary, closed source, and you have to pay to keep chat history.

My personal recommendation and preference would be Matrix. I used to be a die-hard IRC user (and I also still idle there using Matrix's IRC bridge), but Matrix is really nice for a couple reasons:

  • Users don't have to set up a bouncer just to idle
  • Markdown formatting, media, and voice chat are supported out of the box
  • Actual clients for every platform: Android, iOS, Gnome, KDE, Windows, Mac, emacs
  • Better administration? I don't have a history of administering chatrooms, but the person making this same decision for Mozilla listed difficulties in moderating IRC as one of their pain points. Matrix does have admin levels and what look to be all the usual bells and whistles.

EDIT: I forgot about Discord.

System failures - CL-ANA.PATHNAME-UTILS does not exist

http://report.quicklisp.org/2015-09-23/failure-report/cl-ana.html has a log. I'm getting this:

Unhandled SB-KERNEL:SIMPLE-PACKAGE-ERROR in thread #<SB-THREAD:THREAD "main thread" RUNNING {10040956C3}>: The name "CL-ANA.PATHNAME-UTILS" does not designate any package.

That's affecting:

  • cl-ana.table-viewing
  • cl-ana.plotting
  • cl-ana.makeres
  • cl-ana.makeres-table
  • cl-ana.makeres-progress
  • cl-ana.makeres-macro
  • cl-ana.makeres-graphviz
  • cl-ana.makeres-branch
  • cl-ana.makeres-block

Readability Suggestion: predicates should end in `p` or `-p`.

It is a common idiom in Common Lisp for predicates to end in p, as in listp, or -p, as in hash-table-p. I was reviewing some of the makeres code and noticed a bunch of predicates ending in ?, e.g. ltab?. This is more of a Scheme idiom than a Common Lisp idiom.

Someone once gave this advice to me, and now I pass it on to you. Please consider changing this code to fit Common Lisp idioms.

`cl-ana.table:open-plist-table` does not respect lower-case symbols

The root cause is that this does not round-trip properly: (cl-ana.symbol-utils:keywordify (cl-ana.string-utils:lispify :|foo-bar|)). When open-plist-table generates the field-names, it calls lispify, which upper-cases the field name. Later, when calling table-field-symbols, the upper-case symbol is produced rather than the case-sensitive symbol.

This causes issues when working with plists with case-sensitive field names since after you create the plist table, if you are looping over the field symbols and calling (field-values table field-symbol), it won't be able to find the field.
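The lost round-trip can be reproduced with standard Common Lisp alone. The sketch below does not call lispify or keywordify; it assumes, per the description above, that the name is upper-cased along the way, and uses string-upcase as a stand-in. The helper name upcased-keyword is hypothetical.

```lisp
;; Stand-in for the upper-casing step described above (an assumption;
;; this is not cl-ana's actual lispify/keywordify code).
(defun upcased-keyword (symbol)
  "Return the keyword whose name is SYMBOL's name upper-cased."
  (intern (string-upcase (symbol-name symbol)) :keyword))

(let* ((original :|foo-bar|)
       (round-tripped (upcased-keyword original)))
  (list (symbol-name original)        ; "foo-bar"
        (symbol-name round-tripped)   ; "FOO-BAR"
        (eq original round-tripped))) ; NIL, so a lookup keyed on the original fails
```

Because :|foo-bar| and :FOO-BAR are distinct symbols, any field lookup keyed on one will miss data stored under the other.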

Use closer-mop when appropriate

In slot-names you implement the same function twice using implementation-specific packages with feature flags. You could instead use the implementation-compatibility layer closer-mop, for example:

(defun slot-names (obj)
  "Returns the list of slot symbols for a structure/CLOS class
instance."
  (loop
     for slot in (c2mop:class-slots (class-of obj))
     collect (c2mop:slot-definition-name slot)))

cl-ana.tensor:tensor-map raises condition when used with vectors with the fill-pointer different than capacity

I am using cl-ana.generic-math:/ with two vectors whose capacity is larger than their fill-pointer, i.e.: (make-array 20700 :fill-pointer 0). When type-of is called here to be passed along as type, it returns something like (vector t 20700). Then, further down the call chain, make-sequence is called with a type restriction and a length that disagree with one another: (make-sequence (vector t 20700) 20640 :initial-element 0.0d0). This raises the condition:

The length requested (20640) does not match the type restriction in (VECTOR
                                                                     T
                                                                     20700).

In my case, this arises because I'm increasing a vector's capacity in increments of 100 as I populate it. I can correct this by only monotonically incrementing the capacity, but that will be less performant.
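The mismatch between capacity and active length, and one possible way around it, can be sketched with standard Common Lisp only. This is not cl-ana's code or its fix; fresh-result-vector is a hypothetical helper, and the exact result of type-of is implementation-dependent.

```lisp
;; A vector with spare capacity: 20 slots allocated, 3 in use.
(defparameter *v* (make-array 20 :fill-pointer 0 :adjustable t))
(dotimes (i 3) (vector-push (float i 1d0) *v*))

(length *v*)            ; 3, the fill pointer
(array-dimension *v* 0) ; 20, the allocated capacity

;; TYPE-OF may report the capacity, e.g. (VECTOR T 20) on SBCL, so
;; (make-sequence (type-of *v*) (length *v*)) can signal an error.
;; Requesting a plain VECTOR of the active length avoids the mismatch.
(defun fresh-result-vector (v &optional (initial-element 0d0))
  "Return a new simple vector as long as the active part of V."
  (make-sequence 'vector (length v) :initial-element initial-element))

(length (fresh-result-vector *v*)) ; 3
```

The trade-off is that the result no longer carries the source vector's element type or fill pointer, which may or may not matter to the caller.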

What is the proper way to work in a columnar fashion?

Say I have the following table:

A  B  C   D
1  2  3   4
2  4  6   8
4  8  12  16

There are several operations we may want to perform on this table in a columnar fashion:

  • Dropping a column from the table
  • Calculating the standard deviation of a column, e.g. possibly (cl-ana.statistics:standard-deviation (cl-ana.table:table-column "A"))
  • Transforming all values in a column.

I'm still learning cl-ana, so I'm unsure if there's already an established way to perform these types of operations. Is there?

If not, what might we want this type of access pattern to look like? Currently, I'm thinking a (defun pivot (table create-table-fn) ...) function. This would transform the table into a columnar table (in-memory or not, depending on what type of table the closure is creating) which would make it easy to get all the values in a column by getting a single row (which, while we're here, is there already a way to get a specific row of data?).

So this would make the above table look something like this:

col_name  row_1  row_2  row_3
A         1      2      4
B         2      4      8
C         3      6      12
D         4      8      16
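As a rough illustration of the pivot, here is a sketch over plain lists rather than cl-ana tables. The names pivot-rows and column-values are hypothetical, and no cl-ana table functions are used.

```lisp
;; Rows as lists, with the header row first (a plain-list stand-in
;; for a table; no cl-ana table functions are used here).
(defparameter *rows*
  '(("A" "B" "C" "D")
    (1 2 3 4)
    (2 4 6 8)
    (4 8 12 16)))

(defun pivot-rows (rows)
  "Transpose ROWS so each result list is a column name followed by its values."
  (apply #'mapcar #'list rows))

(defun column-values (name rows)
  "Return the values of the column called NAME, or NIL if there is none."
  (rest (assoc name (pivot-rows rows) :test #'string=)))

(pivot-rows *rows*)
;; => (("A" 1 2 4) ("B" 2 4 8) ("C" 3 6 12) ("D" 4 8 16))
(column-values "C" *rows*)
;; => (3 6 12)
```

Since apply is bounded by call-arguments-limit, this form only suits small in-memory tables; a real implementation would need to stream rows into per-column storage.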

What does this mean for cl-ana's underlying table representation?

Cannot load due to an error in antik caused by the lisp-unit test library

I know that this is not a problem with this library itself, but I cannot load it, and I have not received any response from here and here. Since I got good help from you last time, I am also asking here for a solution to this problem:

I have updated quicklisp:

CL-USER> (ql:update-all-dists)
1 dist to check.
Downloading http://beta.quicklisp.org/dist/quicklisp.txt
##########################################################################
You already have the latest version of "quicklisp": 2017-01-24.
NIL

Then, when I try to load cl-ana:

CL-USER> (ql:quickload :cl-ana)
To load "cl-ana":
  Load 1 ASDF system:
    cl-ana
; Loading "cl-ana"
..................................................
[package cl-ana.pathname-utils]...................
[package cl-ana.package-utils]....................
[package cl-ana.functional-utils].................
[package cl-ana.string-utils].....................
[package cl-ana.list-utils].......................
[package cl-ana.generic-math].....................
[package metabang.bind]...........................
[package metabang.bind.developer].................
[package editor-hints.named-readtables]...........
[package editor-hints.named-readtables]...........
[package antik]...................................
[package grid]....................................
[package affi].....................
; 
; caught ERROR:
;   READ error during COMPILE-FILE:
;   
;     Symbol "NUMBER-EQUAL" not found in the LISP-UNIT package.
;   
;       Line: 25, Column: 40, File-Position: 993
;   
;       Stream: #<SB-INT:FORM-TRACKING-STREAM for "file /home/anquegi/.roswell/lisp/quicklisp/dists/quicklisp/software/antik-master-ad6432e3-git/grid/tests/augment.lisp" {1001D24143}>
; 
; compilation unit aborted
;   caught 3 fatal ERROR conditions
;   caught 1 ERROR condition
; Evaluation aborted on #<UIOP/LISP-BUILD:COMPILE-FILE-ERROR {1001D44793}>.

I can load lisp-unit, but the symbol number-equal does not exist:

CL-USER> (ql:quickload :lisp-unit)
To load "lisp-unit":
Load 1 ASDF system:
lisp-unit
; Loading "lisp-unit"

(:LISP-UNIT)
CL-USER> (describe 'lisp-unit:number-equal)
; Evaluation aborted on #<SB-INT:SIMPLE-READER-PACKAGE-ERROR "Symbol ~S not found in the ~A package." {100F1FBCE3}>.

I do not know how to proceed.

Thanks in advance for your help

questions about Installation of HDF5 and GSL

Thank you for developing cl-ana package.

I am on Debian GNU/Linux. For the required libraries, HDF5 and GSL, could I just install the following packages from the Debian package manager:

hdf5-tools
gsl-bin

Are they enough for using cl-ana?

blank terminals after plots

Hello
I have been using cl-ana to create histograms and barcharts. With the last update from quicklisp, the resulting terminal (wxt, qt) or file (pdf, png, or svg) is blank. The behaviour is similar to what happens with a gnuplot session that is expecting more input.

I am using gnuplot 4.6 patchlevel 4 on ubuntu 14.04

For example:

(cl-ana.plotting:draw #'sin)

creates a blank qt terminal

Thanks very much.
Matthew

Functions in cl-ana.generic-math can enter infinite loops with sequences with nils

Here is the smallest reproducible test case: (cl-ana.generic-math:add 1 nil). This will never return. My guess is that cl-ana is swallowing a condition somewhere in a loop.

A higher-order issue may be how this edge case is even reached. I was calling cl-ana.statistics:mean, passing a vector with nils interspersed throughout.
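Until the underlying behaviour is understood, a caller can strip the nils before computing a statistic. The sketch below uses only standard Common Lisp arithmetic; mean-ignoring-nils is a hypothetical helper, not part of cl-ana.statistics.

```lisp
(defun mean-ignoring-nils (sequence)
  "Arithmetic mean of the non-NIL elements of SEQUENCE, or NIL if none remain."
  (let ((present (remove nil sequence)))
    (when (plusp (length present))
      (/ (reduce #'+ present) (length present)))))

(mean-ignoring-nils #(1 nil 2 nil 3)) ; => 2
(mean-ignoring-nils '(nil nil))       ; => NIL
```

Whether missing values should be skipped or should signal an error is a design decision for the library; this only shows the skipping variant.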

xticlabels option for boxes

More of a question than an issue. I would like to create categorical barcharts using the cl-ana gnuplot interface. I have tried a few approaches to get xticlabels to work in the :line-args slots (see below). I wanted to check whether there is already a way to do this before trying to add the option myself.

I would like to replicate this gnuplot code

plot 'barplot.dat' using 2:xticlabels(1) with boxes

with cl-ana.plotting:draw

(cl-ana.plotting:draw '(("blue" . 3)
                        ("red" . 2)
                        ("green" . 1))
                      :line-args '(:style "boxes"
                                   :line-options "2:xticlabels(1)")
                      :plot-args '(:x-range (0 . 4)
                                   :y-range (0 . 4))
                      :page-args (list :terminal (cl-ana.plotting:wxt-term :size (cons 800 600))))

Thanks for a great package

No applicable method for the generic function cl-ana.table:table-load-next-row

Hi,

I'm trying to create a simple project that uses the ltab functionality. I'm following the sample code from cl-ana/makeres-table/tests/tabletrans-test.lisp, which I simplified like so:

(defpackage satur
  (:use :cl :cl-ana :cl-ana.makeres :cl-ana.plotting :cl-ana.makeres-progress :cl-ana.table-utils))

(in-package :cl-ana)

(defproject example "/home/example/Sources/exampleproj/"
  ;; progresstrans prints progress so you can see how a computation unfolds
  (list #'progresstrans #'macrotrans #'tabletrans)
  (fixed-cache 5))


(defres source
  (srctab (plist-opener '((:x 1)
                          (:x 2)
                          (:x 3)))))

(defres filtered
  (ltab (res source)
      ()
    (when (< (field x) 4)
      ;; you only have to add new fields, all source
      ;; fields not shadowed are still available:
      (push-fields
       ;; new field y, x is still accessible, unshadowed
       (y (* 2 (field x)))))))

(defres canon
  (tab (res filtered)
      ()
      (hdf-opener "/tmp/canon.h5"
                  '(("X" . :int)
                    ("Y" . :float)
                    ("Z" . :float)))
    (push-fields (x (field x))
                 (y (sqrt (field y)))
                 (z (float
                     (expt (field y)
                           2))))))

When I compile/load the project and invoke (makeres) in the REPL, I get the following error, which I'm unable (or don't know how) to debug any further.

There is no applicable method for the generic function
  #<STANDARD-GENERIC-FUNCTION CL-ANA.TABLE:TABLE-LOAD-NEXT-ROW (7)>
when called with arguments
  (NIL).
   [Condition of type SB-PCL::NO-APPLICABLE-METHOD-ERROR]

Restarts:
 0: [RETRY] Retry calling the generic function.
 1: [RETRY] Retry SLY mREPL evaluation request.
 2: [*ABORT] Return to SLY's top level.
 3: [ABORT] abort thread (#<THREAD "sly-channel-1-mrepl-remote-1" RUNNING {1007D16073}>)

Backtrace:
 0: ((:METHOD NO-APPLICABLE-METHOD (T)) #<STANDARD-GENERIC-FUNCTION CL-ANA.TABLE:TABLE-LOAD-NEXT-ROW (7)> NIL) [fast-method]
 1: (SB-PCL::CALL-NO-APPLICABLE-METHOD #<STANDARD-GENERIC-FUNCTION CL-ANA.TABLE:TABLE-LOAD-NEXT-ROW (7)> (NIL))
 2: (CL-ANA.MAKERES::COMPSEQFN
      [No Locals])
 3: ((LAMBDA (&REST CL-ANA.MAKERES::ARGS) :IN CL-ANA.MAKERES::COMPRES
      Locals:
        CL-ANA.MAKERES::FN = #<FUNCTION CL-ANA.MAKERES::COMPSEQFN>
        #:LOOP-LIST-2 = NIL
        CL-ANA.MAKERES::TARGET-FNS = (#<FUNCTION CL-ANA.MAKERES::COMPSEQFN {531EE45B}> #<FUNCTION CL-ANA.MAKERES::COMPSEQFN {531EE64B}> #<FUNCTION CL-ANA.MAKERES::COMPSEQFN>)))
 4: (MAKERES
      Locals:
        ARGS = NIL
        SB-C::THING = #<CLOSURE (LAMBDA (&REST CL-ANA.MAKERES::ARGS) :IN CL-ANA.MAKERES::COMPRES) {10058CAC9B}>)
 5: (SB-INT:SIMPLE-EVAL-IN-LEXENV (MAKERES) #<NULL-LEXENV>)
 6: (EVAL (MAKERES))

The same error happens when I try to use the result of ltab in the dotab operator, like so:

(defres (filtered sum)
  (dotab (res filtered)
      ((sum 0))
      sum
    (incf sum (field y))))

The issue manifests both on the current master (f616c5c) and on fa7cee4. I use the Guix-packaged distribution of the cl-ana sources and SBCL 2.0.3.

Any help would be greatly appreciated.

New areas in cl-ana: in-memory analysis/summarizing, and data pre-processing

I am preparing to propose some functionality I have copied from numpy and pandas, but I'm unsure which packages it belongs in. I believe these may be new areas of functionality for cl-ana, and I need the advice of someone much more familiar with cl-ana, data science, and machine learning than I currently am.

New Area: Analysis & summarization

From afar (I have yet to sit down and fully process it), DOP appears to be great at minimizing the passes over data to get the results the user has declared. However, in theory, a user could be exploring data in such a way that a columnar view of the data might minimize the number of passes. In such a case, a user may not know enough about the data to write all the declarative cases up front, and so DOP might do a minimal number of passes, locally, but not globally, as the user adds declarations. A columnar table would transpose the table so that any column-oriented operations would be globally minimal as the user thinks of new ways to poke at the data.

Some such operations I've brought over from pandas are: summarize (populated counts, and types, for all fields), value-counts (counts distinct values for a field), correlation-matrix (creates a matrix of the correlation coefficient between all columns). There are other, useful, summarizing functions we can take from pandas.
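To make the intent of one of these operations concrete, here is a minimal sketch of a value-counts-style function over a plain sequence. It is not the implementation being proposed and uses no cl-ana table access; the signature is an assumption for illustration.

```lisp
(defun value-counts (items &key (test #'equal))
  "Return an alist of (value . count) for ITEMS, most frequent first."
  (let ((counts (make-hash-table :test test)))
    ;; Tally every element of the sequence.
    (map nil (lambda (item) (incf (gethash item counts 0))) items)
    ;; Convert the tallies to an alist and sort by descending count.
    (let ((pairs '()))
      (maphash (lambda (value n) (push (cons value n) pairs)) counts)
      (sort pairs #'> :key #'cdr))))

(value-counts '("red" "blue" "red" "green" "red" "blue"))
;; => (("red" . 3) ("blue" . 2) ("green" . 1))
```

The order among values with equal counts is unspecified here, since it depends on hash-table iteration order.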

The thing these functions seem to have in common is that they summarize all fields at a high level to allow users to get a "feel" for the data before doing proper analysis. From what I gather, most users expect these operations to be very fast.

Should these live in cl-ana.summarization?

A more performant in-memory representation?

Several of the summarization operations would best be done on tables that are in-memory when possible. I think there are probably quite a few data sets out there that could be held in memory. We currently have plist-table, and that might be good enough. However, we might be able to come up with a more performant version based on multi-dimensional arrays, and have the customary current-row accessor simply be a tuple of (row col). This might be much faster, and still easy to understand. I'm not sure if this is warranted yet, but it's an idea.

New Area: Preprocessing & Data Munging

I haven't yet written any functions to mirror pandas functions. From what I understand, the step prior to training ML models is coercing the data into a shape, and corpus, that is conducive to the ML model you'd like to use. This involves dropping columns, transforming values of columns from strings to numeric values, etc. I don't think this is actual machine learning, so I think it might be a good fit for living in cl-ana.

Should these live in cl-ana.transform?

Mirroring popular python data science namespaces?

Tensorflow has a namespace for Keras which mirrors the Keras project's API, but all of the operations are in terms of Tensorflow primitives. This allows communities which were very familiar with Keras to work seamlessly with Tensorflow. It would be a bit strange to follow suit since we would be transcending the language barrier as well, but would we want a facade on cl-ana operations in terms of popular data science libraries? E.g. cl-ana.pandas?

What are the boundaries of cl-ana?

cl-ana has cl-ana.statistical-learning, but it is not my impression that it is trying to be a machine learning library unto itself. But what are its boundaries? And how could it best interoperate with other ecosystems?

My current understanding is that using machine learning involves ~5 stages:

  1. Data Retrieval
  2. Data Exploration
  3. Data Preprocessing/Munging
  4. Model Training
  5. Operationalizing the Model

I was planning on using Common Lisp, and cl-ana for steps 1-3, and then feeding the data to other ecosystems after that, e.g. Tensorflow.

When planning out the packages and functionality of cl-ana, it might be helpful to have a clear idea of how cl-ana might interoperate with other tooling.

EDIT:

I forgot this! I did some analysis of some popular machine learning packages to see what types of namespaces they expose. It may be helpful to consider the shape of these other packages when considering where to place things in cl-ana.

  • scikit-learn

    • base
    • calibration
    • cluster
      • bicluster
    • compose
    • covariance
    • cross_decomposition
    • datasets
    • decomposition
    • discriminant_analysis
    • dummy
    • ensemble
    • exceptions
    • feature_extraction
    • feature_selection
    • gaussian_process
    • isotonic
    • impute
    • kernel_approximation
    • kernel_ridge
    • linear_model
    • manifold
    • metrics
    • mixture
    • model_selection
    • multiclass
    • multioutput
    • naive_bayes
    • neighbors
    • neural_network
    • pipeline
    • preprocessing
    • random_projection
    • semi_supervised
    • svm
    • tree
    • utils
  • keras

    • activations
    • applications
    • backend
    • callbacks
    • constraints
    • datasets
    • estimator
    • experimental
    • initializers
    • layers
    • losses
    • metrics
    • models
    • optimizers
    • preprocessing
      • sequence
      • text
      • image
    • regularizers
    • utils
    • wrappers
  • tensorflow

    • app
    • autograph
    • bitwise
    • compat
    • contrib
    • data
    • debugging
    • distribute
    • distributions
    • dtypes
    • errors
    • estimator
    • experimental
    • feature_column
    • gfile
    • graph_util
    • image
    • initializers
    • io
    • keras
    • layers
    • linalg
    • lite
    • logging
    • losses
    • manip
    • math
    • metrics
    • nn
    • profiler
    • python_io
    • quantization
    • queue
    • ragged
    • random
    • resource_loader
    • saved_model
    • sets
    • signal
    • sparse
    • spectral
    • strings
    • summary
    • sysconfig
    • test
    • train
    • version
  • pytorch

    • Tensor
    • sparse
    • cuda
    • Storage
    • nn
      • functional
      • init
    • optim
    • autograd
    • distributed
    • distributions
    • hub
    • jit
    • multiprocessing
    • utils
      • bottleneck
      • checkpoint
      • cpp_extension
      • data
      • dlpack
      • model_zoo
      • tensorboard
    • onnx

usage: How to read existing hdf5 files

Hi Gary,

I have existing netcdf4 files that I am trying to read into cl-ana using an HDF5 table. These files contain climatology data.

It's not clear to me what I am doing wrong, so hopefully you can comment on this.

As an example here is my code

(defparameter hdf5-file (open-hdf-file "/Users/dbh/lisp/cl-netcdf/test-data/gis4.nc" :direction :input))

;; returns handle 16777216

(defparameter temp-table (open-hdf-table hdf5-file "/tempanomaly"))

; Evaluation aborted on #<SIMPLE-ERROR "Non compound type given to typespec->field-names" {1011EABB73}>.

I can use h5dump to inspect the contents of the file, and it's valid. So I am missing some step; I'm just not sure what.

Any advice appreciated

HDF5 String support

The HDF5 functions don't seem to currently support string datatypes.

  • The +H5T-STRING class is not being checked in hdf-type->typespec in hdf-typespec, preventing loading anything with a string field at all.
  • I'm having trouble locating useful documentation about how to extract the contents of a string field with the C API. It seems to work differently depending on if it's static or not and it doesn't seem to truly map to a C string even if not using the other encoding options, but that's the best I managed to find out. I can try to look at this if this is useful and if someone has an idea of where to look, though.
