ghollisjr / cl-ana
Free (GPL) Common Lisp data analysis library with emphasis on modularity and conceptual clarity.
License: GNU General Public License v3.0
cl-ana is a free (GPL) library of Common Lisp code for doing data analysis via either straightforward programming or dependency oriented programming. It aims to be a general purpose framework for analyzing small and large scale datasets, including binned data analysis and visualization. Much effort has been made to ensure modularity so that individual components may be used/re-used for a new purpose.

cl-ana is available via quicklisp (http://www.quicklisp.org/beta/); for other dependencies see below.

Example code for using some of the functionality is contained in various test.lisp files throughout the project; the full documentation is located on the wiki page: http://github.com/ghollisjr/cl-ana/wiki

There is a Matrix live chat for cl-ana located here: https://matrix.to/#/!cANztuGawRmRSdyLhu:matrix.org?via=matrix.org
Public address: #cl-ana:matrix.org

Whenever possible, features are implemented via generic functions so that users can extend cl-ana to whatever they want to do.

The functionality of this framework is divided into two layers. The lower layer provides basic libraries for the following:

* Tabulated data: Supports data tables read-from and written-to HDF5 files (buffered read-write), ntuples (like CERN's PAW uses), comma separated value (CSV) files, and plists for all-in-memory operation. Adding a new table type is as easy as extending the table class and defining 4 functions for the table type. (The libraries cl-csv and GSLL provide the backbone for the CSV and ntuple tables; the HDF5 table access is completely new.)
* Histograms: Supports categorical, contiguous, and sparse histograms of arbitrary dimensions. Provides functional access to histograms via mapping (which allows reducing) and filtering.
* Nonlinear least squares fitting: Allows plain-old lisp functions to be fitted to data using the GNU Scientific Library (GSL); infers the number of fit parameters the function takes from the initial parameter guess. Can fit against alists of data & histograms and is easily extended to allow fitting against other types by defining a single function for the new type.
* Plotting: Uses gnuplot to plot histograms, data samples, plain-old lisp functions, and strings interpreted as formulae.
* Generic math: Common Lisp doesn't provide user-extendable math functions; cl-ana provides its own versions of the basic math functions CL gives you but with the ability to extend them for whatever types you want. Also provides use-gmath which easily adds generic-math's symbols to a package even if you already use the common-lisp package. Already provided are extensions to the generic math functions for error propagation, quantities (values with units), and treating CL sequences as tensors with all the usual math functions being applied element-by-element in a MATLAB/GNU Octave fashion.

The higher layer provides dependency oriented programming. Dependency oriented programming is my own term for defining your program in terms of targets needing execution as opposed to an explicit computation. It is a hybrid of imperative and declarative programming. The target table can be transformed to allow for optimizations. Provided optimizations include table pass merge and collapse which minimize the number of passes over source datasets.

Also included are various utilities which have use in a variety of places.

The main principles of the project are:

1. Conceptual clarity and documentation. These are often neglected in software development, to the point where reading code can cause one to drink. Conceptual clarity refers to the way in which code is written and the way in which algorithms are implemented: A slightly slower but easier to understand implementation is favored above a labyrinth of bit shifts. Documentation should always be provided for any feature along with example usages--ESPECIALLY with example usages, as these are sometimes more helpful than the actual documentation.
2. Modularity/Bottom-up design. Whenever two components have a common feature/function/dependency, this commonality should be placed in a separate sublibrary. To limit sublibrary number explosion, this should be done in conjunction with point 1 preserving conceptual clarity. For example list utilities should be a sublibrary for general purpose list functions. Further: If a feature can be provided by either a set of utility functions or a type hierarchy, strong preference should be given to the utility functions approach; i.e. one should have to argue long and hard before stratifying things into classes.
3. Lispyness. Whenever possible, already established motifs from Lisp programming practices should be used. This goes for naming conventions, access macros, and the general desire to provide at least functional access to things.

Each sublibrary should go in its own directory and come with its own .asd file so that one can choose any subset of functionality to use from the library.

As you will see in reading the code, I've tried to keep everything well documented. I place a high emphasis on documentation since I know how easy it is to fall out of practice. The last thing I want is for the usual cargo-cult around old code to emerge.

Disclaimer: much of the code I've written has been part of my own personal development as a Lisp programmer; this is my first non-trivial project with Lisp, and coming from a C++ background I've had to learn quite a few things along the way. This means that there may be some dark corners of the code which need help from more experienced coders/myself at a later time. In addition, I haven't used any general testing framework. (To be honest I haven't needed one either as I've done the development in a highly bottom-up way, testing everything as I write it.) In short this is a work in progress.

The code tries to be self documented, but I'm working on a tutorial/user's guide on the github wiki page to explain how to use the software to best effect.

The dependencies for this project are:

* HDF5 (http://www.hdfgroup.org/HDF5/)
* GSL (http://www.gnu.org/software/gsl/)
* CFFI (http://common-lisp.net/project/cffi/)
* GSLL (http://common-lisp.net/project/gsll/)
* Alexandria (http://common-lisp.net/project/alexandria/)
* iterate (http://common-lisp.net/project/iterate/)
* antik (http://www.common-lisp.net/project/antik/)
* closer-mop (http://common-lisp.net/project/closer/closer-mop.html)
* cl-csv (https://github.com/AccelerationNet/cl-csv)
* gnuplot (http://www.gnuplot.info/)
* cl-fad (http://weitz.de/cl-fad/)
* external-program (http://github.com/sellout/external-program)

All of the Lisp dependencies can be installed via quicklisp (http://www.quicklisp.org/).

I copied the API for using gnuplot from gnuplot_i (http://ndevilla.free.fr/gnuplot/). gnuplot_i was written by N. Devillard <[email protected]>, released to the public domain, and is a no-nonsense gnuplot session manager written in C.

I use SBCL (http://www.sbcl.org/) almost exclusively; however, I also intentionally try to ensure that all the code only assumes what the CL standard provides. Anytime implementation-specific functionality is needed I try to use third party libraries for this.
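To make the "Generic math" idea from the feature list more concrete, here is a minimal sketch in plain Common Lisp of how user-extendable arithmetic can be built on generic functions. The names gadd and measurement are hypothetical and chosen for illustration only; they are not taken from cl-ana's generic-math package, and the quadrature rule assumes uncorrelated uncertainties.

```lisp
;; Hypothetical names: GADD and MEASUREMENT are illustrative only and are
;; not part of cl-ana.
(defgeneric gadd (a b)
  (:documentation "Extensible two-argument addition."))

;; Fall back to the built-in + for ordinary numbers.
(defmethod gadd ((a number) (b number))
  (+ a b))

;; Element-by-element addition for sequences, MATLAB/Octave style.
(defmethod gadd ((a sequence) (b sequence))
  (map (if (listp a) 'list 'vector) #'gadd a b))

;; A user-defined type: a value with a symmetric uncertainty.
(defstruct measurement value err)

;; Assumption: the uncertainties are uncorrelated, so they add in quadrature.
(defmethod gadd ((a measurement) (b measurement))
  (make-measurement
   :value (+ (measurement-value a) (measurement-value b))
   :err (sqrt (+ (expt (measurement-err a) 2)
                 (expt (measurement-err b) 2)))))
```

Because dispatch is on the argument classes, a new type only needs its own method; existing callers of gadd pick it up without modification.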
Hi,
It seems that the variance method is defined twice for the histogram argument type: once in the defgeneric and once as a defmethod.
Regards, Frank
Building with SBCL 2.0.5 / ASDF 3.3.1 for quicklisp dist creation.
Trying to build commit id 2d02056
cl-ana.hdf-typespec fails to build with the following error:
Unhandled PACKAGE-DOES-NOT-EXIST in thread #<SB-THREAD:THREAD "main thread" RUNNING {1000A10083}>: The name "HDF5" does not designate any package.
Hello,
I am trying to install cl-ana, but it keeps giving me the following errors that render me unable to get it working:
This is SBCL 2.3.5, an implementation of ANSI Common Lisp.
More information about SBCL is available at http://www.sbcl.org/.
SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses. See the CREDITS and COPYING files in the
distribution for more information.
main': /home/user/.cache/common-lisp/sbcl-2.3.5-linux-x64/home/user/quicklisp/local-projects/cl-ana/hdf-cffi/src/h5f-grovel__grovel.c:23: undefined reference to
H5check_version'H5open' /usr/bin/ld: /home/user/.cache/common-lisp/sbcl-2.3.5-linux-x64/home/user/quicklisp/local-projects/cl-ana/hdf-cffi/src/h5f-grovel__grovel.c:24: undefined reference to
H5check_version'H5open' /usr/bin/ld: /home/user/.cache/common-lisp/sbcl-2.3.5-linux-x64/home/user/quicklisp/local-projects/cl-ana/hdf-cffi/src/h5f-grovel__grovel.c:37: undefined reference to
H5check_version'H5open' /usr/bin/ld: /home/user/.cache/common-lisp/sbcl-2.3.5-linux-x64/home/user/quicklisp/local-projects/cl-ana/hdf-cffi/src/h5f-grovel__grovel.c:38: undefined reference to
H5check_version'H5open' /usr/bin/ld: /home/user/.cache/common-lisp/sbcl-2.3.5-linux-x64/home/user/quicklisp/local-projects/cl-ana/hdf-cffi/src/h5f-grovel__grovel.c:51: undefined reference to
H5check_version'H5open' /usr/bin/ld: /home/user/.cache/common-lisp/sbcl-2.3.5-linux-x64/home/user/quicklisp/local-projects/cl-ana/hdf-cffi/src/h5f-grovel__grovel.c:52: undefined reference to
H5check_version'H5open' /usr/bin/ld: /home/user/.cache/common-lisp/sbcl-2.3.5-linux-x64/home/user/quicklisp/local-projects/cl-ana/hdf-cffi/src/h5f-grovel__grovel.c:65: undefined reference to
H5check_version'H5open' /usr/bin/ld: /home/user/.cache/common-lisp/sbcl-2.3.5-linux-x64/home/user/quicklisp/local-projects/cl-ana/hdf-cffi/src/h5f-grovel__grovel.c:66: undefined reference to
H5check_version'H5open' /usr/bin/ld: /home/user/.cache/common-lisp/sbcl-2.3.5-linux-x64/home/user/quicklisp/local-projects/cl-ana/hdf-cffi/src/h5f-grovel__grovel.c:79: undefined reference to
H5check_version'H5open' /usr/bin/ld: /home/user/.cache/common-lisp/sbcl-2.3.5-linux-x64/home/user/quicklisp/local-projects/cl-ana/hdf-cffi/src/h5f-grovel__grovel.c:80: undefined reference to
H5check_version'H5open' /usr/bin/ld: /home/user/.cache/common-lisp/sbcl-2.3.5-linux-x64/home/user/quicklisp/local-projects/cl-ana/hdf-cffi/src/h5f-grovel__grovel.c:93: undefined reference to
H5check_version'H5open' /usr/bin/ld: /home/user/.cache/common-lisp/sbcl-2.3.5-linux-x64/home/user/quicklisp/local-projects/cl-ana/hdf-cffi/src/h5f-grovel__grovel.c:94: undefined reference to
H5check_version'
debugger invoked on a CFFI-GROVEL:GROVEL-ERROR in thread #<THREAD tid=10782 "main thread" RUNNING {10013C8073}>: Subprocess #<UIOP/LAUNCH-PROGRAM::PROCESS-INFO {100562CA93}>
with command ("cc" "-o" "/home/user/.cache/common-lisp/sbcl-2.3.5-linux-x64/home/user/quicklisp/local-projects/cl-ana/hdf-cffi/src/h5f-grovel__grovel-tmpAAURSO1" "-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now" "-flto=auto" "-g" "-Wl,--export-dynamic" "/home/user/.cache/common-lisp/sbcl-2.3.5-linux-x64/home/user/quicklisp/local-projects/cl-ana/hdf-cffi/src/h5f-grovel__grovel.o")
exited with error code 1
I have just stumbled upon numcl, a clone of numpy, which -- as I understand it -- is the de facto standard for scientific computing, possibly in any language. It looks like this package was released publicly last week.
I haven't yet looked into this, but numpy is the library whose behaviour I have been emulating and attempting to get into cl-ana. I am wondering if there should be a relationship between cl-ana and numcl?
I realize I'm hitting all kinds of edge cases because I'm currently only working with in-memory data sets.
I have introduced bugs into my code a couple of times now because I keep forgetting how reusable-table works. To restate how it works here for the benefit of readers: make-reusable-table takes in a creation-fn and optionally an opener-form, which tell the table how to recreate the underlying table when it needs to. It then uses a needs-reloading slot to determine when it needs to call these two functions.
The bug I've introduced in my code occurs when I partially loop through a table, but logically I'm finished with that pass of the table. I think what I should be doing is calling (setf (needs-reloading my-table) t), but I am obviously expecting forms like do-table to do that for me once they exit.
I find myself wondering if a "cursor" concept wouldn't be clearer. E.g. you only ever have 1 representation of the table, but you might create multiple cursors on that table. And do-table could take in &key (cursor (make-cursor table)).
Thoughts?
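A rough sketch of what the cursor idea could look like, using plain Common Lisp and a list of rows as a stand-in for a table. Everything here is hypothetical: the cursor struct, cursor-next, and do-rows are illustrative names rather than existing cl-ana API, and a real table would read from its backing store instead of a list.

```lisp
;; Hypothetical sketch: CURSOR, CURSOR-NEXT and DO-ROWS are illustrative
;; names, not part of cl-ana. The "table" is just a list of rows in memory.
(defstruct (cursor (:constructor make-cursor (rows)))
  rows)

(defun cursor-next (cursor)
  "Return the next row and T, or NIL and NIL once the cursor is exhausted."
  (if (cursor-rows cursor)
      (values (pop (cursor-rows cursor)) t)
      (values nil nil)))

(defmacro do-rows ((row table &key cursor) &body body)
  "Run BODY for each remaining row; a fresh cursor is made unless one is given."
  (let ((c (gensym "CURSOR"))
        (more (gensym "MORE")))
    `(let ((,c (or ,cursor (make-cursor ,table))))
       (loop
         (multiple-value-bind (,row ,more) (cursor-next ,c)
           (unless ,more (return))
           ,@body)))))
```

With this shape, leaving a loop early only abandons that loop's cursor. The next do-rows starts from the first row again, while passing the same cursor explicitly resumes where the previous loop stopped.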
Hi,
Thanks for making this tool.
I find that (cl-ana::draw #'log) outputs a straight line, which is not correct.
Please check the attached pdf.
draw-log.pdf
The #q reader macro, while convenient for the base units, doesn't have support for the derived units.
Hi,
I am getting a number of clashes when trying to load cl-ana: the packages MAP and LIST-UTILS clash both with themselves (e.g. LIST-transpose) and with alexandria (MEAN, Standard-deviation). Is this expected behaviour?
Also, I think some thought needs to be given to package names. I already have a package called gnuplot-interface. Trying to load cl-ana picks up my gnuplot-interface, not yours. Using a prefix (cl-ana?) for internal packages would leave them less exposed to environmental issues.
Cheers
I can't use cl-ana's version of hdf5-cffi - what's the current issue with upstream? Are you considering forking and maintaining it?
Hi, when I try to load cl-ana on Arch Linux, using SBCL 1.3.1 and SLIME, I get this error.
On the REPL:
(ql:quickload 'cl-ana)
To load "cl-ana":
Load 6 ASDF systems:
alexandria antik cffi gsll iterate split-sequence
Install 8 Quicklisp releases:
bordeaux-threads cl-ana cl-csv cl-fad cl-interpol
cl-unicode closer-mop external-program
; Fetching #<URL "http://beta.quicklisp.org/archive/bordeaux-threads/2016-03-18/bordeaux-threads-v0.8.5.tgz">
; 19.63KB
==================================================
20,105 bytes in 0.05 seconds (417.74KB/sec)
; Fetching #<URL "http://beta.quicklisp.org/archive/cl-fad/2014-12-17/cl-fad-0.7.3.tgz">
; 24.08KB
==================================================
24,658 bytes in 0.04 seconds (573.34KB/sec)
; Fetching #<URL "http://beta.quicklisp.org/archive/external-program/2016-03-18/external-program-20160318-git.tgz">
; 10.02KB
==================================================
10,260 bytes in 0.00 seconds (10019.53KB/sec)
; Fetching #<URL "http://beta.quicklisp.org/archive/cl-unicode/2014-12-17/cl-unicode-0.1.5.tgz">
; 474.62KB
==================================================
486,011 bytes in 0.94 seconds (506.53KB/sec)
; Fetching #<URL "http://beta.quicklisp.org/archive/cl-interpol/2015-12-18/cl-interpol-0.2.5.tgz">
; 44.28KB
==================================================
45,343 bytes in 0.19 seconds (233.05KB/sec)
; Fetching #<URL "http://beta.quicklisp.org/archive/cl-csv/2015-06-08/cl-csv-20150608-git.tgz">
; 19.80KB
==================================================
20,271 bytes in 0.10 seconds (190.35KB/sec)
; Fetching #<URL "http://beta.quicklisp.org/archive/closer-mop/2016-03-18/closer-mop-20160318-git.tgz">
; 21.90KB
==================================================
22,427 bytes in 0.05 seconds (486.70KB/sec)
; Fetching #<URL "http://beta.quicklisp.org/archive/cl-ana/2016-03-18/cl-ana-20160318-git.tgz">
; 479.79KB
==================================================
491,305 bytes in 3.55 seconds (135.04KB/sec)
; Loading "cl-ana"
To load "fare-utils":
Load 1 ASDF system:
asdf
Install 1 Quicklisp release:
fare-utils
; Fetching #<URL "http://beta.quicklisp.org/archive/fare-utils/2015-12-18/fare-utils-20151218-git.tgz">
; 31.51KB
==================================================
32,270 bytes in 0.05 seconds (685.08KB/sec)
; Loading "fare-utils"
..................................................
[package fare-stateful].....
; Loading "cl-ana"
To load "trivial-utf-8":
Install 1 Quicklisp release:
trivial-utf-8
; Fetching #<URL "http://beta.quicklisp.org/archive/trivial-utf-8/2011-10-01/trivial-utf-8-20111001-darcs.tgz">
; 5.91KB
==================================================
6,055 bytes in 0.00 seconds (0.00KB/sec)
; Loading "trivial-utf-8"
[package trivial-utf-8].
; Loading "cl-ana"
[package cl-ana.pathname-utils]...................
[package cl-ana.package-utils]....................
[package cl-ana.functional-utils].................
[package cl-ana.string-utils].....................
[package cl-ana.list-utils].......................
[package cl-ana.generic-math]...; cc -m64 -I/usr/lib/libffi-3.2.1/include -o /home/anquegi/.cache/common-lisp/sbcl-1.3.1-linux-x64/home/anquegi/quicklisp/dists/quicklisp/software/gsll-master-0f785ddd-git/solve-minimize-fit/solver-struct__grovel-tmp2OWI3Q7U -I/home/anquegi/quicklisp/dists/quicklisp/software/cffi_0.17.1/ /home/anquegi/.cache/common-lisp/sbcl-1.3.1-linux-x64/home/anquegi/quicklisp/dists/quicklisp/software/gsll-master-0f785ddd-git/solve-minimize-fit/solver-struct__grovel.c
on the debugger:
Subprocess (:PROCESS #<SB-IMPL::PROCESS :EXITED 1>)
with command ("cc" "-m64" "-I/usr/lib/libffi-3.2.1/include"
"-o"
"/home/anquegi/.cache/common-lisp/sbcl-1.3.1-linux-x64/home/anquegi/quicklisp/dists/quicklisp/software/gsll-master-0f785ddd-git/solve-minimize-fit/solver-struct__grovel-tmp2OWI3Q7U"
"-I/home/anquegi/quicklisp/dists/quicklisp/software/cffi_0.17.1/"
"/home/anquegi/.cache/common-lisp/sbcl-1.3.1-linux-x64/home/anquegi/quicklisp/dists/quicklisp/software/gsll-master-0f785ddd-git/solve-minimize-fit/solver-struct__grovel.c")
exited with error code 1
[Condition of type CFFI-GROVEL:GROVEL-ERROR]
Restarts:
0: [RETRY] Retry PROCESS-OP on #<GROVEL-FILE "gsll" "solve-minimize-fit" "solver-struct">.
1: [ACCEPT] Continue, treating PROCESS-OP on #<GROVEL-FILE "gsll" "solve-minimize-fit" "solver-struct"> as having been successful.
2: [RETRY] Retry ASDF operation.
3: [CLEAR-CONFIGURATION-AND-RETRY] Retry ASDF operation after resetting the configuration.
4: [ABORT] Give up on "cl-ana"
5: [*ABORT] Return to SLIME's top level.
6: [ABORT] abort thread (#<THREAD "repl-thread" RUNNING {1005168033}>)
Backtrace:
0: (CFFI-GROVEL:GROVEL-ERROR "~a" #<UIOP/RUN-PROGRAM:SUBPROCESS-ERROR {1008CC3B93}>)
1: ((LAMBDA NIL :IN CFFI-GROVEL:PROCESS-GROVEL-FILE))
2: (SB-IMPL::%WITH-STANDARD-IO-SYNTAX #<CLOSURE (LAMBDA NIL :IN CFFI-GROVEL:PROCESS-GROVEL-FILE) {1008C9896B}>)
3: ((:METHOD ASDF/ACTION:PERFORM (CFFI-GROVEL::PROCESS-OP CFFI-GROVEL:GROVEL-FILE)) #<CFFI-GROVEL::PROCESS-OP > #<CFFI-GROVEL:GROVEL-FILE "gsll" "solve-minimize-fit" "solver-struct">) [fast-method]
4: ((SB-PCL::EMF ASDF/ACTION:PERFORM) #<unavailable argument> #<unavailable argument> #<CFFI-GROVEL::PROCESS-OP > #<CFFI-GROVEL:GROVEL-FILE "gsll" "solve-minimize-fit" "solver-struct">)
5: ((:METHOD ASDF/ACTION:PERFORM-WITH-RESTARTS :AROUND (T T)) #<CFFI-GROVEL::PROCESS-OP > #<CFFI-GROVEL:GROVEL-FILE "gsll" "solve-minimize-fit" "solver-struct">) [fast-method]
6: ((:METHOD ASDF/PLAN:PERFORM-PLAN (LIST)) ((#1=#<ASDF/LISP-ACTION:PREPARE-OP > . #<ASDF/SYSTEM:SYSTEM #2="cl-ana.pathname-utils">) (#1# . #3=#<ASDF/LISP-ACTION:CL-SOURCE-FILE #2# "package">) (#4=#<ASDF..
7: ((FLET SB-C::WITH-IT :IN SB-C::%WITH-COMPILATION-UNIT))
8: ((:METHOD ASDF/PLAN:PERFORM-PLAN :AROUND (T)) ((#1=#<ASDF/LISP-ACTION:PREPARE-OP > . #<ASDF/SYSTEM:SYSTEM #2="cl-ana.pathname-utils">) (#1# . #3=#<ASDF/LISP-ACTION:CL-SOURCE-FILE #2# "package">) (#4=#..
9: ((FLET SB-C::WITH-IT :IN SB-C::%WITH-COMPILATION-UNIT))
10: ((:METHOD ASDF/PLAN:PERFORM-PLAN :AROUND (T)) #<ASDF/PLAN:SEQUENTIAL-PLAN {100899E013}> :VERBOSE NIL) [fast-method]
11: ((:METHOD ASDF/OPERATE:OPERATE (ASDF/OPERATION:OPERATION ASDF/COMPONENT:COMPONENT)) #<ASDF/LISP-ACTION:LOAD-OP :VERBOSE NIL> #<ASDF/SYSTEM:SYSTEM "cl-ana"> :VERBOSE NIL) [fast-method]
12: ((SB-PCL::EMF ASDF/OPERATE:OPERATE) #<unused argument> #<unused argument> #<ASDF/LISP-ACTION:LOAD-OP :VERBOSE NIL> #<ASDF/SYSTEM:SYSTEM "cl-ana"> :VERBOSE NIL)
13: ((LAMBDA NIL :IN ASDF/OPERATE:OPERATE))
14: ((:METHOD ASDF/OPERATE:OPERATE :AROUND (T T)) #<ASDF/LISP-ACTION:LOAD-OP :VERBOSE NIL> #<ASDF/SYSTEM:SYSTEM "cl-ana"> :VERBOSE NIL) [fast-method]
15: ((SB-PCL::EMF ASDF/OPERATE:OPERATE) #<unused argument> #<unused argument> ASDF/LISP-ACTION:LOAD-OP "cl-ana" :VERBOSE NIL)
16: ((LAMBDA NIL :IN ASDF/OPERATE:OPERATE))
17: (ASDF/CACHE:CALL-WITH-ASDF-CACHE #<CLOSURE (LAMBDA NIL :IN ASDF/OPERATE:OPERATE) {100898D65B}> :OVERRIDE NIL :KEY NIL)
18: ((:METHOD ASDF/OPERATE:OPERATE :AROUND (T T)) ASDF/LISP-ACTION:LOAD-OP "cl-ana" :VERBOSE NIL) [fast-method]
19: ((:METHOD ASDF/OPERATE:OPERATE :AROUND (T T)) ASDF/LISP-ACTION:LOAD-OP "cl-ana" :VERBOSE NIL) [fast-method]
20: (ASDF/OPERATE:LOAD-SYSTEM "cl-ana" :VERBOSE NIL)
21: (QUICKLISP-CLIENT::CALL-WITH-MACROEXPAND-PROGRESS #<CLOSURE (LAMBDA NIL :IN QUICKLISP-CLIENT::APPLY-LOAD-STRATEGY) {100897D58B}>)
22: (QUICKLISP-CLIENT::AUTOLOAD-SYSTEM-AND-DEPENDENCIES "cl-ana" :PROMPT NIL)
23: ((:METHOD QL-IMPL-UTIL::%CALL-WITH-QUIET-COMPILATION (T T)) #<unavailable argument> #<CLOSURE (FLET QUICKLISP-CLIENT::QL :IN QUICKLISP-CLIENT:QUICKLOAD) {10090A0B5B}>) [fast-method]
24: ((:METHOD QL-IMPL-UTIL::%CALL-WITH-QUIET-COMPILATION :AROUND (QL-IMPL:SBCL T)) #<QL-IMPL:SBCL {100676FED3}> #<CLOSURE (FLET QUICKLISP-CLIENT::QL :IN QUICKLISP-CLIENT:QUICKLOAD) {10090A0B5B}>) [fast-me..
25: ((:METHOD QUICKLISP-CLIENT:QUICKLOAD (T)) #<unavailable argument> :PROMPT NIL :SILENT NIL :VERBOSE NIL) [fast-method]
26: (QL-DIST::CALL-WITH-CONSISTENT-DISTS #<CLOSURE (LAMBDA NIL :IN QUICKLISP-CLIENT:QUICKLOAD) {1009075B2B}>)
27: (SB-INT:SIMPLE-EVAL-IN-LEXENV (QUICKLISP-CLIENT:QUICKLOAD (QUOTE CL-ANA)) #<NULL-LEXENV>)
28: (EVAL (QUICKLISP-CLIENT:QUICKLOAD (QUOTE CL-ANA)))
29: (SWANK::%EVAL-REGION "(ql:quickload 'cl-ana) ..)
30: ((LAMBDA NIL :IN SWANK::%LISTENER-EVAL))
31: (SWANK-REPL::TRACK-PACKAGE #<CLOSURE (LAMBDA NIL :IN SWANK::%LISTENER-EVAL) {100907595B}>)
32: (SWANK::CALL-WITH-BUFFER-SYNTAX NIL #<CLOSURE (LAMBDA NIL :IN SWANK::%LISTENER-EVAL) {100907593B}>)
33: (SWANK::%LISTENER-EVAL "(ql:quickload 'cl-ana) ..)
34: (SB-INT:SIMPLE-EVAL-IN-LEXENV (SWANK-REPL:LISTENER-EVAL "(ql:quickload 'cl-ana) ..)
35: (EVAL (SWANK-REPL:LISTENER-EVAL "(ql:quickload 'cl-ana) ..)
36: (SWANK:EVAL-FOR-EMACS (SWANK-REPL:LISTENER-EVAL "(ql:quickload 'cl-ana) ..)
37: (SWANK::PROCESS-REQUESTS NIL)
38: ((LAMBDA NIL :IN SWANK::HANDLE-REQUESTS))
39: ((LAMBDA NIL :IN SWANK::HANDLE-REQUESTS))
40: (SWANK/SBCL::CALL-WITH-BREAK-HOOK #<FUNCTION SWANK:SWANK-DEBUGGER-HOOK> #<CLOSURE (LAMBDA NIL :IN SWANK::HANDLE-REQUESTS) {100517000B}>)
41: ((FLET SWANK/BACKEND:CALL-WITH-DEBUGGER-HOOK :IN "/home/anquegi/.roswell/impls/ALL/ALL/quicklisp/dists/quicklisp/software/slime-v2.17/swank/sbcl.lisp") #<FUNCTION SWANK:SWANK-DEBUGGER-HOOK> #<CLOSURE ..
42: (SWANK::CALL-WITH-BINDINGS ((*STANDARD-INPUT* . #1=#<SWANK/GRAY::SLIME-INPUT-STREAM {100507E493}>) (*STANDARD-OUTPUT* . #2=#<SWANK/GRAY::SLIME-OUTPUT-STREAM {100514E423}>) (*TRACE-OUTPUT* . #2#) (*ERR..
43: (SWANK::HANDLE-REQUESTS #<SWANK::MULTITHREADED-CONNECTION {1004148F83}> NIL)
44: ((FLET #:WITHOUT-INTERRUPTS-BODY-1156 :IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE))
45: ((FLET SB-THREAD::WITH-MUTEX-THUNK :IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE))
46: ((FLET #:WITHOUT-INTERRUPTS-BODY-359 :IN SB-THREAD::CALL-WITH-MUTEX))
47: (SB-THREAD::CALL-WITH-MUTEX #<CLOSURE (FLET SB-THREAD::WITH-MUTEX-THUNK :IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE) {7FFFF0A8ED5B}> #<SB-THREAD:MUTEX "thread result lock" owner: #<SB-THREAD:THR..
48: (SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE #<SB-THREAD:THREAD "repl-thread" RUNNING {1005168033}> NIL #<CLOSURE (LAMBDA NIL :IN SWANK-REPL::SPAWN-REPL-THREAD) {1005167F9B}> (#<SB-THREAD:THREAD "re..
49: ("foreign function: call_into_lisp")
50: ("foreign function: new_thread_trampoline")
I also get this condition with gsll
[Condition of type CFFI-GROVEL:GROVEL-ERROR]
and when I try to execute the c compiler on
2016-03-25 13:52:53 ☆ manjaro-pc in ~
○ → cc -m64 -I/usr/lib/libffi-3.2.1/include -o /home/anquegi/.cache/common-lisp/sbcl-1.3.1-linux-x64/home/anquegi/quicklisp/dists/quicklisp/software/gsll-master-0f785ddd-git/solve-minimize-fit/solver-struct__grovel-tmp2OWI3Q7U -I/home/anquegi/quicklisp/dists/quicklisp/software/cffi_0.17.1/ /home/anquegi/.cache/common-lisp/sbcl-1.3.1-linux-x64/home/anquegi/quicklisp/dists/quicklisp/software/gsll-master-0f785ddd-git/solve-minimize-fit/solver-struct__grovel.c
In file included from /home/anquegi/.cache/common-lisp/sbcl-1.3.1-linux-x64/home/anquegi/quicklisp/dists/quicklisp/software/gsll-master-0f785ddd-git/solve-minimize-fit/solver-struct__grovel.c:10:0:
/home/anquegi/.cache/common-lisp/sbcl-1.3.1-linux-x64/home/anquegi/quicklisp/dists/quicklisp/software/gsll-master-0f785ddd-git/solve-minimize-fit/solver-struct__grovel.c: In function ‘main’:
/home/anquegi/quicklisp/dists/quicklisp/software/cffi_0.17.1/grovel/common.h:8:62: error: ‘gsl_multifit_fdfsolver {aka struct <anonymous>}’ has no member named ‘J’
#define offsetof(type, slot) ((long) ((char *) &(((type *) 0)->slot)))
^
/home/anquegi/.cache/common-lisp/sbcl-1.3.1-linux-x64/home/anquegi/quicklisp/dists/quicklisp/software/gsll-master-0f785ddd-git/solve-minimize-fit/solver-struct__grovel.c:32:56: note: in expansion of macro ‘offsetof’
fprintf(output, " :offset %lli)", (long long signed) offsetof(gsl_multifit_fdfsolver, J));
I have no idea of what happens ^
The Wiki doesn't seem up to date with the code, and since GitHub doesn't support pull requests on Wiki pages, all I can do is point out what's not working.
The default terminal type for gnuplot is qt, not wxt as claimed. If you use a gnuplot without qt support compiled in, you get neither a plot nor an error message. That took me a lot of time to figure out.
The second example, (draw (plot2d (lines #'sin #'cos))), fails because lines is cl-ana.string-utils:lines, which expects string arguments. I haven't found anything shorter than (draw (plot2d (list (line #'sin) (line #'cos)))) to get the desired result. The lines function is also mentioned under "Plotting structure", but doesn't seem to exist any more.
Basic file closing operations no longer work with version 1.10, while 1.8 and below still seem to work.
Possibly replace cl-ana's HDF5 code with https://github.com/HDFGroup/hdf5-cffi in the future
I am unfamiliar with gnuplot. I was able to get plots by specifying the :file keyword in the cl-ana.plotting package, but the cl-ana.table-viewing package doesn't allow users to pass this through. So:
T, so I think something is happening, but I'm unsure what.
table-viewing package to pass through plotting keywords? This seems like a high amount of coupling. Instead, perhaps we could pass in a closure which is responsible for the plotting and takes a limited set of arguments. This way, the caller may create the closure in the stack frame above, utilizing all the plotting arguments they need, and then pass it into table-viewing functions. This keeps table-viewing and plotting decoupled, but still working together.
I am hoping to use CL-ANA on CCL on Windows. I replaced several calls of SB-EXT:DELETE-DIRECTORY with UIOP:DELETE-DIRECTORY-TREE
Hey all! I've spoken with @ghollisjr, and we would like to name a place users can go for live help/discussion of cl-ana. I thought I would open an issue to solicit feedback so we can get something going.
Here are the options I'm aware of:
My personal recommendation and preference would be Matrix. I used to be a die-hard IRC user (and I also still idle there using Matrix's IRC bridge), but Matrix is really nice for a couple reasons:
EDIT: I forgot about Discord.
Currently, if an empty string is passed in, an END-OF-FILE condition will be raised. Instead, I think this function should return NIL for empty strings, to facilitate working with data with nullable fields.
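One way to get the requested behaviour with the standard reader is to turn off the end-of-file error when calling read-from-string. The sketch below is only an illustration: read-field is a hypothetical name and is not the cl-ana function this report refers to.

```lisp
;; Hypothetical helper: READ-FIELD is an illustrative name, not the
;; cl-ana function under discussion.
(defun read-field (string)
  "Read one Lisp object from STRING, returning NIL for empty or blank input."
  (let ((*read-eval* nil))            ; never evaluate #. forms found in data
    ;; EOF-ERROR-P = NIL and EOF-VALUE = NIL suppress the END-OF-FILE error.
    (values (read-from-string string nil nil))))
```

A caller can then treat NIL as "no value" for nullable fields, at the cost of not being able to distinguish an empty field from a field containing the literal text NIL.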
http://report.quicklisp.org/2015-09-23/failure-report/cl-ana.html has a log. I'm getting this:
Unhandled SB-KERNEL:SIMPLE-PACKAGE-ERROR in thread #<SB-THREAD:THREAD "main thread" RUNNING {10040956C3}>: The name "CL-ANA.PATHNAME-UTILS" does not designate any package.
That's affecting:
It is a common idiom in Common Lisp for predicates to end in p as in listp or -p as in hash-table-p. I was reviewing some of the makeres code, and noticed a bunch of predicates ending in ?, e.g. ltab?. This is more of a Scheme idiom than a Common Lisp idiom.
Someone once gave this advice to me, and now I pass it on to you. Please consider changing this code to fit Common Lisp idioms.
http://report.quicklisp.org/cl-ana/2014-10-16/failtail.txt has a log.
I'm getting several failures of the form:
Error finding package for symbol "USE-GMATH":
The name "GENERIC-MATH" does not designate any package.
The root cause is that this does not round-trip properly: (cl-ana.symbol-utils:keywordify (cl-ana.string-utils:lispify :|foo-bar|)). When open-plist-table generates the field-names, it calls lispify, which upper-cases the field name. Later, when calling table-field-symbols, the upper-case symbol is produced rather than the case-sensitive symbol.
This causes issues when working with plists with case-sensitive field names: after you create the plist table, if you are looping over the field symbols and calling (field-values table field-symbol), it won't be able to find the field.
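The failure mode can be reproduced without cl-ana. In the sketch below, upcasing-round-trip is a hypothetical stand-in that only mimics the upper-casing behaviour described above; it is not cl-ana's lispify or keywordify. The default reader settings (readtable case :upcase) are assumed.

```lisp
;; Stand-ins for illustration only; these are NOT cl-ana's LISPIFY and
;; KEYWORDIFY, just the upper-casing behaviour described in the report.
(defun to-keyword (name)
  "Intern NAME (a string or symbol) in the KEYWORD package, case preserved."
  (intern (string name) :keyword))

(defun upcasing-round-trip (field)
  "Upper-case the name of FIELD before interning it as a keyword."
  (to-keyword (string-upcase (string field))))

(defun lookup-both-ways (row field)
  "Return the values found via the upcased key and via the preserved key."
  (list (getf row (upcasing-round-trip field))
        (getf row (to-keyword field))))
```

Calling (lookup-both-ways (list :|foo-bar| 1) :|foo-bar|) finds nothing through the upcased key, because :|foo-bar| and :FOO-BAR are distinct symbols, while the case-preserving key finds the value.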
In slot-names you implement the same function twice using implementation-specific packages with feature flags. You could instead use the implementation-compatibility layer closer-mop, for example:
(defun slot-names (obj)
  "Returns the list of slot symbols for a structure/CLOS class
instance."
  (loop
    for slot in (c2mop:class-slots (class-of obj))
    collect (c2mop:slot-definition-name slot)))
I am using cl-ana.generic-math:/ with two vectors which have a capacity which is larger than their fill-pointer, i.e.: (make-array 20700 :fill-pointer 0). When type-of is called here to be passed along as type, it returns something like (vector t 20700). Then, further in the call chain, make-sequence is called with a type restriction and a length that disagree with one another: (make-sequence '(vector t 20700) 20640 :initial-element 0.0d0). This raises the condition:
The length requested (20640) does not match the type restriction in (VECTOR T 20700).
In my case, this arises because I'm increasing a vector's capacity in increments of 100 as I populate it. I can correct this by only monotonically incrementing the capacity, but that will be less performant.
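One possible way around the mismatch is to size the result from length, which honours the fill pointer, instead of from the type specifier that type-of returns. The sketch below is an assumption about how such a fix might look; elementwise-divide is a hypothetical helper and not how cl-ana.generic-math is implemented.

```lisp
;; Hypothetical helper, not cl-ana's implementation: divide two vectors
;; element by element, sizing the result by LENGTH (which honours the fill
;; pointer) rather than by the type specifier returned from TYPE-OF.
(defun elementwise-divide (a b)
  (let ((result (make-array (min (length a) (length b))
                            :initial-element 0.0d0)))
    ;; MAP-INTO stops at the shortest sequence, so only active elements
    ;; of A and B are read.
    (map-into result #'/ a b)))
```

The result is a simple vector with exactly as many elements as the inputs have active elements, so spare capacity in the inputs never reaches make-sequence or make-array.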
In order to ensure that units are always expanded properly, all quantity methods have to call quantity on their arguments explicitly.
http://report.quicklisp.org/2015-12-07/failure-report/cl-ana.html#cl-ana has a full log.
It looks like cl-ana references SB-POSIX symbols without explicitly depending on sb-posix.
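ASDF 3 lets a system declare an implementation-specific module dependency with the :feature and :require forms of :depends-on. The fragment below is a sketch of that mechanism only: the system name and the component are placeholders, not the contents of any cl-ana .asd file.

```lisp
;; Hypothetical system definition for illustration; the system name and
;; component are placeholders, not cl-ana's actual .asd contents.
(asdf:defsystem "example-posix-user"
  ;; On SBCL, REQUIRE the sb-posix contrib before compiling; other
  ;; implementations skip this dependency entirely.
  :depends-on ((:feature :sbcl (:require :sb-posix)))
  :components ((:file "package")))
```

Any code that uses SB-POSIX symbols would still need to be guarded with #+sbcl so that other implementations can read the file.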
Say I have the following table:
A | B | C | D |
---|---|---|---|
1 | 2 | 3 | 4 |
2 | 4 | 6 | 8 |
4 | 8 | 12 | 16 |
There are several operations we may want to perform on this table in a columnar fashion:
(cl-ana.statistics:standard-deviation (cl-ana.table:table-column "A"))
I'm still learning cl-ana, so I'm unsure if there's already an established way to perform these types of operations. Is there?
If not, what might we want this type of access pattern to look like? Currently, I'm thinking a (defun pivot (table create-table-fn) ...)
function. This would transform the table into a columnar table (in-memory or not, depending on what type of table the closure is creating) which would make it easy to get all the values in a column by getting a single row (which, while we're here, is there already a way to get a specific row of data?).
So this would make the above table look something like this:
col_name | row_1 | row_2 | row_3 |
---|---|---|---|
A | 1 | 2 | 4 |
B | 2 | 4 | 8 |
C | 3 | 6 | 12 |
D | 4 | 8 | 16 |
What does this mean for cl-ana's underlying table representation?
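As a starting point for discussion, the pivot described above can be sketched for purely in-memory data. Rows are modelled here as plain plists rather than cl-ana table objects, and pivot-rows is a hypothetical name; a real version would take the create-table-fn argument proposed above and write into whatever table type it produces.

```lisp
;; Hypothetical helper: PIVOT-ROWS is an illustrative name, and rows are
;; modelled as plain plists rather than cl-ana table objects.
(defun pivot-rows (rows)
  "Turn a list of plist rows into an alist of (field . column-values)."
  (let ((fields (loop for (field nil) on (first rows) by #'cddr
                      collect field)))
    (loop for field in fields
          collect (cons field
                        (loop for row in rows
                              collect (getf row field))))))
```

For the example table, (cdr (assoc :a (pivot-rows rows))) yields the values of column A as a list, which can then be handed to any function that works on sequences.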
I know that this is not a problem with this library itself, but I cannot load it, and I have not received any response from here and here. Since I got good help from you last time, I am also asking here for a solution to this problem:
I have updated quicklisp:
CL-USER> (ql:update-all-dists)
1 dist to check.
Downloading http://beta.quicklisp.org/dist/quicklisp.txt
##########################################################################
You already have the latest version of "quicklisp": 2017-01-24.
NIL
Then when I try to load cl-ana
CL-USER> (ql:quickload :cl-ana)
To load "cl-ana":
Load 1 ASDF system:
cl-ana
; Loading "cl-ana"
..................................................
[package cl-ana.pathname-utils]...................
[package cl-ana.package-utils]....................
[package cl-ana.functional-utils].................
[package cl-ana.string-utils].....................
[package cl-ana.list-utils].......................
[package cl-ana.generic-math].....................
[package metabang.bind]...........................
[package metabang.bind.developer].................
[package editor-hints.named-readtables]...........
[package editor-hints.named-readtables]...........
[package antik]...................................
[package grid]....................................
[package affi].....................
;
; caught ERROR:
; READ error during COMPILE-FILE:
;
; Symbol "NUMBER-EQUAL" not found in the LISP-UNIT package.
;
; Line: 25, Column: 40, File-Position: 993
;
; Stream: #<SB-INT:FORM-TRACKING-STREAM for "file /home/anquegi/.roswell/lisp/quicklisp/dists/quicklisp/software/antik-master-ad6432e3-git/grid/tests/augment.lisp" {1001D24143}>
;
; compilation unit aborted
; caught 3 fatal ERROR conditions
; caught 1 ERROR condition
; Evaluation aborted on #<UIOP/LISP-BUILD:COMPILE-FILE-ERROR {1001D44793}>.
I can load lisp-unit, but the symbol number-equal does not exist:
CL-USER> (ql:quickload :lisp-unit)
To load "lisp-unit":
Load 1 ASDF system:
lisp-unit
; Loading "lisp-unit"
(:LISP-UNIT)
CL-USER> (describe 'lisp-unit:number-equal)
; Evaluation aborted on #<SB-INT:SIMPLE-READER-PACKAGE-ERROR "Symbol ~S not found in the ~A package." {100F1FBCE3}>.
I do not know how to proceed.
Thanks in advance for your help
Thank you for developing the cl-ana package.
I am on Debian GNU/Linux. For the required packages, HDF5 and GSL, could I just install the following packages from the Debian package manager:
hdf5-tools
gsl-bin
Are they enough for using cl-ana?
Hello
I have been using cl-ana to create histograms and barcharts. Since the last update from quicklisp, the resulting terminal (wxt, qt) or file (pdf, png, or svg) is blank. The behaviour is similar to what happens with a gnuplot session that is expecting more input.
I am using gnuplot 4.6 patchlevel 4 on Ubuntu 14.04.
For example:
(cl-ana.plotting:draw #'sin)
creates a blank qt terminal
Thanks very much.
Matthew
Here is the smallest reproducible test case: (cl-ana.generic-math:add 1 nil). This will never return. My guess is that cl-ana is swallowing a condition somewhere in a loop.
A higher order issue may be how this edge case is even entered. I was calling cl-ana.statistics:mean, passing a vector with nils interspersed throughout.
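Until the underlying behaviour is understood, one caller-side workaround is to strip the nil entries before averaging. The following is a minimal sketch in plain Common Lisp. It deliberately does not call cl-ana, and mean-ignoring-nil is a hypothetical name:

```lisp
;; Hypothetical helper, independent of cl-ana.statistics:mean.
(defun mean-ignoring-nil (values)
  "Arithmetic mean of the non-NIL elements of the sequence VALUES."
  (let ((present (remove nil values)))
    (if (zerop (length present))
        ;; Signal explicitly instead of dividing by zero.
        (error "No non-NIL values to average.")
        (/ (reduce #'+ present) (length present)))))

;; A vector with NILs interspersed, as in the report above.
(defparameter *sparse* (vector 1 nil 2 nil 3))
```

Calling (mean-ignoring-nil *sparse*) averages only 1, 2, and 3.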
More of a question than an issue. I would like to create categorical barcharts using the cl-ana gnuplot interface. I have tried a few approaches to get xticlabels to work in the :line-args slots (see below). I wanted to check whether there is a way to do this before trying to find my way towards adding this option.
I would like to replicate this gnuplot code
plot 'barplot.dat' using 2:xticlabels(1) with boxes
with cl-ana.plotting:draw
(cl-ana.plotting:draw '(("blue" . 3)
                        ("red" . 2)
                        ("green" . 1))
                      :line-args '(:style "boxes"
                                   :line-options "2:xticlabels(1)")
                      :plot-args '(:x-range (0 . 4)
                                   :y-range (0 . 4))
                      :page-args (list :terminal (cl-ana.plotting:wxt-term :size (cons 800 600))))
Thanks for a great package
Hi,
I'm trying to create a simple project that uses the ltab functionality. I'm following sample code from cl-ana/makeres-table/tests/tabletrans-test.lisp, which I simplified like so:
(defpackage satur
  (:use :cl :cl-ana :cl-ana.makeres :cl-ana.plotting :cl-ana.makeres-progress :cl-ana.table-utils))
(in-package :cl-ana)
(defproject example "/home/example/Sources/exampleproj/"
  ;; progresstrans prints progress so you can see how a computation unfolds
  (list #'progresstrans #'macrotrans #'tabletrans)
  (fixed-cache 5))
(defres source
  (srctab (plist-opener '((:x 1)
                          (:x 2)
                          (:x 3)))))
(defres filtered
  (ltab (res source)
      ()
    (when (< (field x) 4)
      ;; you only have to add new fields, all source
      ;; fields not shadowed are still available:
      (push-fields
       ;; new field y, x is still accessible, unshadowed
       (y (* 2 (field x)))))))
(defres canon
  (tab (res filtered)
      ()
      (hdf-opener "/tmp/canon.h5"
                  '(("X" . :int)
                    ("Y" . :float)
                    ("Z" . :float)))
    (push-fields (x (field x))
                 (y (sqrt (field y)))
                 (z (float
                     (expt (field y)
                           2))))))
When I compile/load the project and invoke (makeres) in the REPL, I get the following error, which I'm unable (or don't know how) to debug any further.
There is no applicable method for the generic function
#<STANDARD-GENERIC-FUNCTION CL-ANA.TABLE:TABLE-LOAD-NEXT-ROW (7)>
when called with arguments
(NIL).
[Condition of type SB-PCL::NO-APPLICABLE-METHOD-ERROR]
Restarts:
0: [RETRY] Retry calling the generic function.
1: [RETRY] Retry SLY mREPL evaluation request.
2: [*ABORT] Return to SLY's top level.
3: [ABORT] abort thread (#<THREAD "sly-channel-1-mrepl-remote-1" RUNNING {1007D16073}>)
Backtrace:
0: ((:METHOD NO-APPLICABLE-METHOD (T)) #<STANDARD-GENERIC-FUNCTION CL-ANA.TABLE:TABLE-LOAD-NEXT-ROW (7)> NIL) [fast-method]
1: (SB-PCL::CALL-NO-APPLICABLE-METHOD #<STANDARD-GENERIC-FUNCTION CL-ANA.TABLE:TABLE-LOAD-NEXT-ROW (7)> (NIL))
2: (CL-ANA.MAKERES::COMPSEQFN
[No Locals])
3: ((LAMBDA (&REST CL-ANA.MAKERES::ARGS) :IN CL-ANA.MAKERES::COMPRES
Locals:
CL-ANA.MAKERES::FN = #<FUNCTION CL-ANA.MAKERES::COMPSEQFN>
#:LOOP-LIST-2 = NIL
CL-ANA.MAKERES::TARGET-FNS = (#<FUNCTION CL-ANA.MAKERES::COMPSEQFN {531EE45B}> #<FUNCTION CL-ANA.MAKERES::COMPSEQFN {531EE64B}> #<FUNCTION CL-ANA.MAKERES::COMPSEQFN>)))
4: (MAKERES
Locals:
ARGS = NIL
SB-C::THING = #<CLOSURE (LAMBDA (&REST CL-ANA.MAKERES::ARGS) :IN CL-ANA.MAKERES::COMPRES) {10058CAC9B}>)
5: (SB-INT:SIMPLE-EVAL-IN-LEXENV (MAKERES) #<NULL-LEXENV>)
6: (EVAL (MAKERES))
The same error happens when I try to use the result of ltab in the dotab operator like so:
(defres (filtered sum)
  (dotab (res filtered)
      ((sum 0))
      sum
    (incf sum (field y))))
The issue manifests both on the current master (f616c5c) and on fa7cee4. I use a Guix-packaged distribution of the cl-ana sources and SBCL 2.0.3.
Any help would be greatly appreciated.
I am preparing to propose some functionality I have copied from numpy and pandas, but I'm unsure which packages it belongs in. I believe these may be new areas of functionality for cl-ana, and I need the advice of someone much more familiar with cl-ana, data science, and machine learning than I currently am.
From afar (I have yet to sit down and fully process it), DOP appears to be great at minimizing the passes over data to get the results the user has declared. However, in theory, a user could be exploring data in such a way that a columnar view of the data might minimize the number of passes. In such a case, a user may not know enough about the data to write all the declarative cases up front, and so DOP might do a minimal number of passes, locally, but not globally, as the user adds declarations. A columnar table would transpose the table so that any column-oriented operations would be globally minimal as the user thinks of new ways to poke at the data.
Some such operations I've brought over from pandas are: summarize (populated counts, and types, for all fields), value-counts (counts distinct values for a field), and correlation-matrix (creates a matrix of the correlation coefficient between all columns). There are other useful summarizing functions we can take from pandas.
The thing these functions seem to have in common is that they summarize all fields at a high level to allow users to get a "feel" for the data before doing proper analysis. From what I gather, most users expect these operations to be very fast.
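To make the value-counts idea concrete, here is a minimal sketch in plain Common Lisp that counts distinct values in a single pass over one column. It is not the author's implementation and does not use cl-ana tables. The signature and the choice to return an alist sorted by descending count are assumptions:

```lisp
;; Hypothetical sketch of a pandas-style value-counts over one column.
(defun value-counts (values &key (test #'equal))
  "Return an alist of (value . count) for the sequence VALUES,
sorted by descending count."
  (let ((counts (make-hash-table :test test)))
    ;; Single pass: bump the counter for each value seen.
    (map nil (lambda (v) (incf (gethash v counts 0))) values)
    (sort (loop for v being the hash-keys of counts
                  using (hash-value n)
                collect (cons v n))
          #'> :key #'cdr)))

(defparameter *colors* '("a" "b" "a" "c" "a" "b"))
(defparameter *color-counts* (value-counts *colors*))
```

The order among values with equal counts is unspecified here, since sort is not guaranteed to be stable.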
Should these live in cl-ana.summarization?
Several of the summarization operations would best be done on tables that are in-memory when possible. I think there are probably quite a few data sets out there that could be held in memory. We currently have plist-table, and that might be good enough. However, we might be able to come up with a more performant version based on multi-dimensional arrays, and have the customary current-row accessor simply be a tuple of (row col). This might be much faster, and still easy to understand. I'm not sure if this is warranted yet, but it's an idea.
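The array-backed idea could look roughly like the following sketch in plain Common Lisp. Everything here is hypothetical: array-table, make-array-table, table-ref, and column-values are illustrative names, and the struct does not extend cl-ana's table class:

```lisp
;; Hypothetical in-memory table backed by a 2D array.
(defstruct (array-table (:constructor %make-array-table))
  columns   ; list of field names, in column order
  data)     ; 2D array indexed as (row col)

(defun make-array-table (columns rows)
  "Build a table from COLUMNS (field names) and ROWS (a list of lists)."
  (%make-array-table
   :columns columns
   :data (make-array (list (length rows) (length columns))
                     :initial-contents rows)))

(defun table-ref (table row column)
  "Value at ROW (an index) in the field named COLUMN."
  (aref (array-table-data table)
        row
        (position column (array-table-columns table) :test #'equal)))

(defun column-values (table column)
  "All values of the field named COLUMN, in row order."
  (let ((col (position column (array-table-columns table) :test #'equal))
        (data (array-table-data table)))
    (loop for row below (array-dimension data 0)
          collect (aref data row col))))

(defparameter *table*
  (make-array-table '("A" "B" "C" "D")
                    '((1 2 3 4) (2 4 6 8) (4 8 12 16))))
```

With this layout both row-wise and column-wise access reduce to index arithmetic on one array, which is the property the (row col) accessor idea relies on.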
I haven't yet written any functions to mirror pandas functions. From what I understand, the step prior to training ML models is coercing the data into a shape, and corpus, that is conducive to the ML model you'd like to use. This involves dropping columns, transforming values of columns from strings to numeric values, etc. I don't think this is actual machine learning, so I think it might be a good fit for living in cl-ana.
Should these live in cl-ana.transform?
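For a sense of what such transforms might look like, here is a sketch in plain Common Lisp over rows stored as plists. The function names drop-columns and encode-column are hypothetical, and assigning integer codes in order of first appearance is just one possible encoding:

```lisp
;; Hypothetical transform helpers over a list of plists.
(defun drop-columns (rows keys)
  "Return copies of ROWS without the fields named in KEYS."
  (mapcar (lambda (row)
            (loop for (k v) on row by #'cddr
                  unless (member k keys)
                    append (list k v)))
          rows))

(defun encode-column (rows key)
  "Replace the values of field KEY with integer codes assigned in order
of first appearance. Returns the new rows and the code alist."
  (let ((codes '()))
    (values
     (mapcar (lambda (row)
               (let* ((value (getf row key))
                      (code (or (cdr (assoc value codes :test #'equal))
                                (let ((new (length codes)))
                                  (push (cons value new) codes)
                                  new)))
                      (copy (copy-list row)))
                 ;; Overwrite the field in the copy, not the input row.
                 (setf (getf copy key) code)
                 copy))
             rows)
     (reverse codes))))

(defparameter *raw*
  (list (list :id 1 :color "red" :note "x")
        (list :id 2 :color "blue" :note "y")
        (list :id 3 :color "red" :note "z")))

(defparameter *trimmed* (drop-columns *raw* '(:note)))
```

Both helpers leave the input rows untouched, so the transforms can be chained without side effects on the source data.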
Tensorflow has a namespace for Keras which mirrors the Keras project's API, but all of the operations are in terms of Tensorflow primitives. This allows communities which were very familiar with Keras to work seamlessly with Tensorflow. It would be a bit strange to follow suit since we would be transcending the language barrier as well, but would we want a facade on cl-ana operations in terms of popular data science libraries? E.g. cl-ana.pandas?
cl-ana has cl-ana.statistical-learning, but it is not my impression that it is trying to be a machine learning library unto itself. But what are its boundaries? And how could it best interoperate with other ecosystems?
My current understanding is that using machine learning involves ~5 stages:
I was planning on using Common Lisp, and cl-ana for steps 1-3, and then feeding the data to other ecosystems after that, e.g. Tensorflow.
When planning out the packages and functionality of cl-ana, it might be helpful to have a clear idea of how cl-ana might interoperate with other tooling.
EDIT:
I forgot this! I did some analysis of some popular machine learning packages to see what types of namespaces they expose. It may be helpful to consider the shape of these other packages when considering where to place things in cl-ana.
scikit-learn
keras
tensorflow
pytorch
Hi Gary,
I have existing netcdf4 files that I am trying to read into cl-ana using the hdf5 table. These files are climatology data.
It's not clear to me what I am doing wrong, so hopefully you can comment on this.
As an example, here is my code:
(defparameter hdf5-file (open-hdf-file "/Users/dbh/lisp/cl-netcdf/test-data/gis4.nc" :direction :input))
;; returns handle 16777216
(defparameter temp-table (open-hdf-table hdf5 "/tempanomaly"))
; Evaluation aborted on #<SIMPLE-ERROR "Non compound type given to typespec->field-names" {1011EABB73}>.
I can use h5dump to inspect the contents of the file, and it's valid. So I am missing some step, just not sure what.
Any advice appreciated
The HDF5 functions don't seem to currently support string datatypes. The +H5T-STRING class is not being checked in hdf-type->typespec in hdf-typespec, preventing loading anything with a string field at all.