philipce / nifty

Numerical computing in Swift – for Linux and macOS

License: Apache License 2.0

Swift 100.00%

algorithms computational-mathematics linear-algebra numerical-calculations swift

nifty's People

adamduracz nbertagnolli felix91gr jasonm128 wme7 macbellingrath zachaysan crcrpar matrixy sbooker audonegianluca91 computational-mathematics-research porterchild luckyclan mingchungx

nifty's Issues

randn is now failing

It appears that commit 5574e0b makes Travis think that randn is broken, even though that test is unimplemented.

Use Float32, 64 and 80’s “ulp” Property for Comparison

In the “isEqual” function, tolerance is a parameter of the function. We can actually do better, using “ulp”.

They talked a bit about this in the 34th episode of Swift Unwrapped. In the show notes there are two articles written by Jesse Squires where he talks about this in more detail.

In a nutshell, a value's “ulp” (unit in the last place) is the gap between that value and the next representable number at that level of precision. This will allow us to fine-tune any check for floating point equality: instead of requiring exact equality, we can accept anything within +- one ulp (or wider, as needed) of the numbers.
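
A minimal sketch of what an ulp-based comparison could look like. The function name, the `withinUlps` parameter and its default of 4 are assumptions made for illustration; they are not Nifty's existing `isEqual` signature. Only the standard library's `ulp` property is relied on.

```swift
/// Hypothetical helper (not part of Nifty): treats two values as equal when
/// they differ by at most `ulps` units in the last place, measured at the
/// larger of the two magnitudes.
func isEqual<T: BinaryFloatingPoint>(_ a: T, _ b: T, withinUlps ulps: T = 4) -> Bool {
    // NaN never compares equal; an infinity only equals itself.
    guard a.isFinite, b.isFinite else { return a == b }
    // `ulp` is the spacing between a value and the next representable one.
    let spacing = max(a.magnitude, b.magnitude).ulp
    return (a - b).magnitude <= ulps * spacing
}

let sum = 0.1 + 0.2
print(sum == 0.3)          // false: the two doubles differ in the last bit
print(isEqual(sum, 0.3))   // true: they are one ulp apart
```

Because the tolerance scales with the operands, the same call works for very large and very small values, which a fixed absolute tolerance does not.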

Unable to build for Xcode

The installation instructions tell me to use the included project file if I'm using Xcode, but it doesn't say which project file, so I'm assuming I should clone the project, build and use the Nifty.framework.

So I tried that, but I get the following errors:

<unknown>:0: error: no such file or directory: '/Users/rodrigoruiz/Downloads/Nifty/Sources/_fft.swift'
<unknown>:0: error: no such file or directory: '/Users/rodrigoruiz/Downloads/Nifty/Sources/_cumsum.swift'
<unknown>:0: error: no such file or directory: '/Users/rodrigoruiz/Downloads/Nifty/Sources/_sort.swift'
<unknown>:0: error: no such file or directory: '/Users/rodrigoruiz/Downloads/Nifty/Sources/find.swift'
<unknown>:0: error: no such file or directory: '/Users/rodrigoruiz/Downloads/Nifty/Sources/interp1.swift'
<unknown>:0: error: no such file or directory: '/Users/rodrigoruiz/Downloads/Nifty/Sources/_poly.swift'
<unknown>:0: error: no such file or directory: '/Users/rodrigoruiz/Downloads/Nifty/Sources/_cumprod.swift'
<unknown>:0: error: no such file or directory: '/Users/rodrigoruiz/Downloads/Nifty/Sources/_repmat.swift'
<unknown>:0: error: no such file or directory: '/Users/rodrigoruiz/Downloads/Nifty/Sources/TimeFrame.swift'
<unknown>:0: error: no such file or directory: '/Users/rodrigoruiz/Downloads/Nifty/Sources/_linspace.swift'
<unknown>:0: error: no such file or directory: '/Users/rodrigoruiz/Downloads/Nifty/Sources/_shuffle.swift'
<unknown>:0: error: no such file or directory: '/Users/rodrigoruiz/Downloads/Nifty/Sources/_ifft.swift'
<unknown>:0: error: no such file or directory: '/Users/rodrigoruiz/Downloads/Nifty/Sources/TimeSeries.swift'
<unknown>:0: error: no such file or directory: '/Users/rodrigoruiz/Downloads/Nifty/Sources/DataFrame.swift'
<unknown>:0: error: no such file or directory: '/Users/rodrigoruiz/Downloads/Nifty/Sources/DataSeries.swift'
<unknown>:0: error: no such file or directory: '/Users/rodrigoruiz/Downloads/Nifty/Sources/_meshgrid.swift'
Command /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/swiftc failed with exit code 1

Homebrew dupes/science deprecated

In this repository and in the documentation
brew install homebrew/dupes/lapack homebrew/science/openblas
should now be
brew install homebrew/homebrew-core/lapack homebrew/homebrew-core/openblas

Demo and README do not include DataSeries/Frame functions

Error logging

Thinking back to our discussion about precondition vs throwing... maybe there's a better option than both. What if nifty provides a global logging object (similar to Python logging) that allows for logging levels debug, info, warning, error, critical. These print the message to the console, to a file, etc. depending on what the user has set. They could also have a handler, e.g. the user can pass a closure to execute on occurrence of a particular event. By default, the critical handler is to print the message, log to file, then fatalError(). Then, instead of precondition, our errors just use the logger, e.g. log.critical("Matrix is not invertible"). In our tests then we can override the default handler to set a flag so we can effectively test for precondition... Thoughts?
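
To make the proposal concrete, here is a rough sketch of such a logger. Every name in it (`LogLevel`, `Logger`, `handlers`, `minimumLevel`) is hypothetical, and file output is left out to keep it short; only console printing and the overridable handler are shown.

```swift
/// Hypothetical sketch of the proposed logger; none of these names exist in Nifty.
enum LogLevel: Int, Comparable {
    case debug, info, warning, error, critical
    static func < (lhs: LogLevel, rhs: LogLevel) -> Bool { lhs.rawValue < rhs.rawValue }
}

final class Logger {
    /// Messages below this level are ignored.
    var minimumLevel: LogLevel = .warning
    /// Per-level handlers; when one is set it replaces the default behaviour.
    var handlers: [LogLevel: (String) -> Void] = [:]

    func log(_ level: LogLevel, _ message: String) {
        guard level >= minimumLevel else { return }
        if let handler = handlers[level] {
            handler(message)
        } else if level == .critical {
            // Default critical behaviour: report the message, then stop the program.
            fatalError(message)
        } else {
            print("[\(level)] \(message)")
        }
    }

    func critical(_ message: String) { log(.critical, message) }
}

let log = Logger()

// In a test, override the critical handler to record the failure instead of trapping.
var criticalMessages: [String] = []
log.handlers[.critical] = { criticalMessages.append($0) }
log.critical("Matrix is not invertible")
print(criticalMessages)   // ["Matrix is not invertible"]
```

One caveat with this design: once the critical handler is overridden, execution continues past the failure point, so library code after a `log.critical` call would need to return early or otherwise tolerate the invalid state.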

I found a nice testing framework for upgrading our unit tests :D

I recently listened to episode 54 of the Swift Coders Podcast, with Robert Widmann, a Core Team intern. Robert (aka CodaFi) recommended Property Testing, which is like an upgraded form of Unit Testing.
The idea is to automatically create random instances for testing, instead of making them by hand. You can also restrict how they are made, of course. He also recommended SwiftCheck, which is basically Property Testing for Swift... and maintained by him 😄

What do you think of it? After playing with the refactoring of the Vector, Matrix and Tensor structs, I'll make some tests with this and see how it goes. I think it could be marvelous for finding small bugs in the implementation that would usually go under the radar of the unit tests :)

Logo Design

Hi. I am a graphic designer. I volunteer to design logos for open source projects. I can design one for you to use in the readme file. What do you say?

Mean Vector Should Use Apple's Accelerate Framework when applicable

Apple has mean calculation for vectors in Accelerate. We should use this for a potential speedup. https://developer.apple.com/reference/accelerate/1449980-vdsp_meanv?language=objc
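
A sketch of how this could look, assuming the data is stored as `[Double]`. It uses `vDSP_meanvD`, the double-precision sibling of the linked `vDSP_meanv`, and falls back to a plain reduction where Accelerate is not available (e.g. on Linux). The function name `mean` and its precondition are illustrative, not Nifty's actual implementation.

```swift
#if canImport(Accelerate)
import Accelerate
#endif

/// Hypothetical helper: mean of a Double array, using vDSP where available.
func mean(_ values: [Double]) -> Double {
    precondition(!values.isEmpty, "mean of an empty array is undefined")
    #if canImport(Accelerate)
    var result = 0.0
    // Stride 1 walks every element of the contiguous buffer.
    vDSP_meanvD(values, 1, &result, vDSP_Length(values.count))
    return result
    #else
    // Portable fallback for platforms without Accelerate.
    return values.reduce(0, +) / Double(values.count)
    #endif
}

print(mean([1, 2, 3, 4]))   // 2.5
```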

Cap or Finish the Tensor Refactor for 1.0

To finish the idea we started on #24.

I tried to make that Refactor work, and I think it may be close to finished. That said, it has some weird behavior and introduces strange complexities with the Associated Types that come in the Tensor Protocol.

Maybe the Generics on Swift aren’t ready for such a refactor to be worth its weight yet. Maybe this code duplication is better than pure DRY since it’s much easier to read and to predict its behavior.

I don’t know. We can always refactor it later, especially if we have good tests (which I’m formalizing in another issue).

What do you think?

pandas clone features

This is the start of a list of the basic, most essential features from pandas that we want to include in the nifty series/frame:

Near-term:

full set of getters on series: get n before/after (series position), n greater/less (index magnitude), by loc (spot in underlying array)
make present() that returns locs public
allow setting (by both index and loc--make sure to verify)
missing() function (returns indices of all that are missing)
dropMissing() (remove rows with nil elements)
dropInvalid() (specify conditions to drop rows, e.g. row values out of bounds. Maybe this is a dictionary of column names and closures (Any) -> Bool that say whether to drop.)
return series as list of tuples: [(index: Double, value: T)], e.g. s.asTuples
return series as lists: (index: [Double], values: [T]), e.g. s.asLists
return series as dict: [Double: T], e.g. s.asDict
return frame as list of series: [Series], e.g. f.asSeries
return frame as tuples: [(index: Double, values: [Any])], e.g. f.asTuples
return frame as list: (index: [Double], values: [[Any]]), e.g. f.asLists
return frame as dict: [String: [Any]], e.g. f.asDict
copy for frame and series (is this useful since they're structs?)
have frame keep a list of types, one for each series; provide public access to types
read from csv
write to csv
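
A few of the near-term items above can be sketched as follows. This is not Nifty's actual Series type: the struct layout (a `Double` index array next to an array of optional values, with `nil` marking a missing element) is an assumption made for the example.

```swift
/// Hypothetical sketch, not Nifty's actual Series type.
struct Series<T> {
    var index: [Double]
    var values: [T?]          // nil marks a missing element

    /// Indices whose value is missing.
    func missing() -> [Double] {
        zip(index, values).filter { $0.1 == nil }.map { $0.0 }
    }

    /// Copy of the series without the missing rows.
    func dropMissing() -> Series<T> {
        let kept = zip(index, values).filter { $0.1 != nil }
        return Series(index: kept.map { $0.0 }, values: kept.map { $0.1 })
    }

    /// Present rows as (index, value) tuples.
    var asTuples: [(index: Double, value: T)] {
        var rows: [(index: Double, value: T)] = []
        for (i, v) in zip(index, values) {
            if let v = v { rows.append((index: i, value: v)) }
        }
        return rows
    }

    /// Present rows as a dictionary keyed by index (a repeated index keeps its last value).
    var asDict: [Double: T] {
        Dictionary(asTuples.map { ($0.index, $0.value) }, uniquingKeysWith: { _, last in last })
    }
}

let s = Series(index: [0, 1, 2], values: [1.5, nil, 3.0])
print(s.missing())             // [1.0]
print(s.dropMissing().index)   // [0.0, 2.0]
```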

Long-term:

time as well as double index types
decimation
standard, hardcoded rule sets for how data is summarized (e.g. if there are units attached, what gets summed, what gets averaged)
init/add from array(s) of values and indexes
init/add to frame from dict
head and tail
get frame index/data as matrix (for homogeneous frame or selected homogeneous rows)
init frame/series from nifty matrix, vector
describe()
sort by axis/columns
select a single/multiple (e.g. range of, enumerated) columns
slice rows (basic done for series, need to extend and add to frame)
boolean indexing
mean, median, mode, std, var, min, max, sum, prod, cumsum, cumprod for each column
histogram?
combining, removing from, adding to series/frames
merge/join
write to vcd

General Discussion

@felix91gr

Glad to see you made it through exams and had some time to travel!

Sorry I've been absent. I've been making some big life changes--moving house, jobs, and other great things. I'll be getting back to Nifty in the next week or two for real. Here are the things I'm planning to work on (in no particular order):

Evaluate and possibly adopt some of the things added in Swift 4. Things that come to mind as potentially sweet: one sided ranges, Codable (as you noted earlier), generic subscripts, package manager update, numeric protocol, etc...
Finish the functions slated for v1.0 release. I think these are mostly minor things left, like some LAPACK wrappers and linspace and such. This will also involve ensuring that each function has a complete implementation (e.g. works on matrix, tensor, and vector--which in many cases will leverage your work on the tensor protocol).
Finish the basic Series implementation
Figure out the best interface to the optimization module and clean it up (there's an optimizer called SOMA that I'm using for some stuff, so I need it. I pushed a bunch of genetic algorithm stuff that is mostly junk so I won't bother merging that in v1)
Work on reading/writing matrices and variables to file. This ties into our previous discussion of codable and stuff. I'd also like to make a convenient interface for writing an entire environment to a file to make it easy to export/share variables and stuff (something like a .mat file in matlab). This probably won't be in v1.
I'd like to put together and publish a docker image that makes Nifty dead simple to get going on
Documentation -- I'd like to revisit the jazzy docs. See if they've fixed some of the bugs. If not, I may write some scripts to reformat our comments and stuff. Overall, this is just a fleshing out of the docs and clean up.
Sample projects. There's a demo project that uses some of the stuff, but not in a really compelling way. What I think I'd like to do is leave the demo code as snippets in a README, but then get rid of the lame demo project and replace it with something small but meaningful--some kind of simple machine learning thing springs to mind
I'll continue to add dumb tests like I have been for the above. But once we get a better framework figured out I'd like to start making better tests.

When all that's done, I'd like to make a real v1.0 release (I know, I know, there's already a 1.0--I have to delete that and do like a 0.9 or something while we finish up) and then I'd like to solicit some support for the project--post to the swift-users list, on reddit, etc. But before that, I'd like to get it to a more consumable state, which I think is covered between what you've talked about and the stuff I just mentioned.

Anyway, happy coding! Anything I can do to help you, just ask! I'm by no means an expert on tensors and such but I'm happy to chat/skype/whatever and discuss whatever--this is a learning experience for me too!

Tensor CSV representation

Not a pressing issue but I was just thinking that my original thoughts on how to save these data structures to file are a little off. Instead of saving values to file such that shape can be inferred (e.g. columns are separated by commas, rows by newlines, pages by semicolons, etc.), we should just be explicit about the shape, then store everything in row major order. So some format like:

tensorName:3,4,6,2:0.456,234.4,234,567,143, ...

We can have the name (and any other meta data), the shape, and then the comma separated values can easily be shoved into the tensor's array for easy creation.

Then, perhaps each object is separated by semicolons. So the whole file may encode a bunch of variables. Then importing a file would put all the variables in a dict, for easy access.

We lose the readability of a nice csv... but I don't think a 3+ dimensional tensor is readable anyway... plus, the description computed property is still readable.
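
A small sketch of reading and writing the `name:shape:values` format described above. `TensorRecord`, `encode`, `decode` and `decodeFile` are hypothetical names, the element type is fixed to `Double` for brevity, and no extra metadata or whitespace handling is attempted.

```swift
/// Hypothetical record type for the proposed `name:shape:values` format.
struct TensorRecord: Equatable {
    var name: String
    var shape: [Int]
    var data: [Double]   // row-major order
}

func encode(_ t: TensorRecord) -> String {
    let shape = t.shape.map { String($0) }.joined(separator: ",")
    let data = t.data.map { String($0) }.joined(separator: ",")
    return "\(t.name):\(shape):\(data)"
}

func decode(_ text: String) -> TensorRecord? {
    let fields = text.split(separator: ":", omittingEmptySubsequences: false)
    guard fields.count == 3 else { return nil }
    // Entries that fail to parse are dropped here and caught by the count check below.
    let shape = fields[1].split(separator: ",").compactMap { Int($0) }
    let data = fields[2].split(separator: ",").compactMap { Double($0) }
    // The element count must match the product of the dimensions.
    guard data.count == shape.reduce(1, *) else { return nil }
    return TensorRecord(name: String(fields[0]), shape: shape, data: data)
}

/// A file holds several records separated by semicolons; return them keyed by name.
func decodeFile(_ text: String) -> [String: TensorRecord] {
    var variables: [String: TensorRecord] = [:]
    for record in text.split(separator: ";") {
        if let t = decode(String(record)) { variables[t.name] = t }
    }
    return variables
}

let t = TensorRecord(name: "a", shape: [2, 2], data: [1, 2.5, 3, 4])
print(encode(t))   // a:2,2:1.0,2.5,3.0,4.0
```

Since the shape is explicit, decoding only has to check that the number of values equals the product of the dimensions before handing the flat array to the tensor.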

Allow specifying matrix layout

It would be cool to allow specifying whether your matrix should be stored row major or column major order (right now we just do row major).

This could be useful for a number of reasons. Right now though, I'm thinking it would be cool to default to column-major order on Xcode builds. The LAPACK interface in the Accelerate framework doesn't give an option to have row major, so we end up transposing before and after certain operations, which is wasteful. If it were column major by default, we'd only have to do the extra transposes if the user specifically requested row major.
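
The difference between the two layouts comes down to how (row, column) maps to a position in the flat data array. A sketch, with a hypothetical `Layout` enum and `LayoutMatrix` struct standing in for Nifty's Matrix:

```swift
/// Hypothetical layout flag; per the issue, Nifty currently stores row-major only.
enum Layout { case rowMajor, columnMajor }

struct LayoutMatrix {
    let rows: Int, columns: Int
    let layout: Layout
    var data: [Double]

    init(rows: Int, columns: Int, layout: Layout = .rowMajor) {
        self.rows = rows; self.columns = columns; self.layout = layout
        self.data = Array(repeating: 0, count: rows * columns)
    }

    /// Position of element (r, c) in the flat storage array.
    func offset(_ r: Int, _ c: Int) -> Int {
        switch layout {
        case .rowMajor:    return r * columns + c
        case .columnMajor: return c * rows + r
        }
    }

    subscript(r: Int, c: Int) -> Double {
        get { data[offset(r, c)] }
        set { data[offset(r, c)] = newValue }
    }
}

var m = LayoutMatrix(rows: 2, columns: 3, layout: .columnMajor)
m[0, 1] = 7
print(m.data)   // [0.0, 0.0, 7.0, 0.0, 0.0, 0.0]
```

With the layout hidden behind the subscript, user code stays the same either way, and a column-major `data` array could be handed to a column-major routine without a transpose.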

Matrix slices as views

I think right now, slicing a Matrix returns a new Matrix containing the sliced elements. I think we want to return a view so that I can slice a Matrix, do the operation on the slice, and have that reflected in the original matrix (in the same way that numpy does).
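
Since Swift arrays are value types, a view has to share storage with its parent through a reference. The following is one possible sketch under that assumption; `Storage`, `MatrixView` and `SharedMatrix` are hypothetical names and not how Nifty's Matrix is implemented.

```swift
/// Hypothetical sketch: storage lives in a class so that views can share it.
final class Storage {
    var data: [Double]
    init(_ data: [Double]) { self.data = data }
}

struct MatrixView {
    let storage: Storage
    let parentColumns: Int          // row stride of the underlying matrix
    let rowRange: Range<Int>
    let columnRange: Range<Int>

    /// Flat position in the parent's row-major storage.
    private func offset(_ r: Int, _ c: Int) -> Int {
        (rowRange.lowerBound + r) * parentColumns + columnRange.lowerBound + c
    }

    subscript(r: Int, c: Int) -> Double {
        get { storage.data[offset(r, c)] }
        nonmutating set { storage.data[offset(r, c)] = newValue }
    }
}

struct SharedMatrix {
    let rows: Int, columns: Int
    let storage: Storage

    init(rows: Int, columns: Int, data: [Double]) {
        precondition(data.count == rows * columns)
        self.rows = rows; self.columns = columns
        self.storage = Storage(data)
    }

    /// Returns a view onto a rectangular block; no elements are copied.
    func view(rows r: Range<Int>, columns c: Range<Int>) -> MatrixView {
        MatrixView(storage: storage, parentColumns: columns, rowRange: r, columnRange: c)
    }
}

let m = SharedMatrix(rows: 2, columns: 3, data: [1, 2, 3, 4, 5, 6])
let block = m.view(rows: 0..<2, columns: 1..<3)
block[1, 0] = 50            // writes through to m's storage
print(m.storage.data)       // [1.0, 2.0, 3.0, 4.0, 50.0, 6.0]
```

The trade-off is that copies of such a matrix also share storage, so the struct no longer has plain value semantics unless copy-on-write is added on top.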

Indirect Indexing in Matrices

Reproducible random programs

As far as I can see, there is no way to initialize the random number generators that underlie the rand(), randn() and randi() functions with a seed. It is possible to pass a seed as a parameter, but this means that repeated calls to these functions will always yield the same result.
What I would like to be able to do is to initialize the RNG once with a seed, which would make subsequent calls to any of the rand() functions deterministic.
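
A sketch of the requested behaviour, assuming a single module-wide generator that is seeded once and then advanced by every call. SplitMix64 is used purely as a small deterministic stand-in; the issue does not say which generator Nifty uses. `SeededGenerator`, `seedGenerator` and `randUniform` are hypothetical names.

```swift
/// SplitMix64, a small deterministic generator used here only for illustration.
struct SeededGenerator: RandomNumberGenerator {
    private var state: UInt64
    init(seed: UInt64) { state = seed }

    mutating func next() -> UInt64 {
        state &+= 0x9E3779B97F4A7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58476D1CE4E5B9
        z = (z ^ (z >> 27)) &* 0x94D049BB133111EB
        return z ^ (z >> 31)
    }
}

/// Hypothetical module-wide generator, seeded once and advanced by every call.
var sharedGenerator = SeededGenerator(seed: 0)

func seedGenerator(_ value: UInt64) { sharedGenerator = SeededGenerator(seed: value) }

/// Uniform sample in [0, 1), drawn from the shared generator.
func randUniform() -> Double { Double.random(in: 0..<1, using: &sharedGenerator) }

seedGenerator(42)
let first = [randUniform(), randUniform(), randUniform()]
seedGenerator(42)
let second = [randUniform(), randUniform(), randUniform()]
print(first == second)   // true
```

Repeated calls yield different values within one run, while re-seeding reproduces the whole sequence. Note that a shared mutable generator like this is not thread-safe as written.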

Support sparse matrices

Shouldn't Tensor, Matrix and Vector share more code?

When you see T, M and V, they appear to be almost the same thing, if you judge them by their properties:

@Tensor
    /// Number of elements in the tensor.
    public let count: Int

    /// Number of elements in each dimension of the tensor.
    public var size: [Int]

    /// Data contained in tensor in row-major order.
    public var data: [T]

    /// Optional name of tensor (e.g., for use in display).
    public var name: String?

    /// Determine whether to show name when displaying tensor.
    public var showName: Bool

    /// Formatter to be used in displaying tensor elements.
    public var format: NumberFormatter
@Matrix
    /// Number of elements in the matrix.
    public let count: Int

    /// Number of [rows, columns] in the matrix.
    public var size: [Int]
    public var rows: Int { return self.size[0] }
    public var columns: Int { return self.size[1] }

    /// Data contained in matrix in row-major order.
    public var data: [T]

    /// Optional name of matrix (e.g., for use in display).
    public var name: String?

    /// Determine whether to show name when displaying matrix.
    public var showName: Bool

    /// Formatter to be used in displaying matrix elements.
    public var format: NumberFormatter    

@Vector
    /// Number of elements in vector.
    public let count: Int

    /// Data contained in vector.
    public var data: [T]

    /// Optional name of vector for use in display
    public var name: String?

    /// Determine whether to show name when displaying vector.
    public var showName: Bool

    /// Formatter to be used in displaying vector elements.
    public var format: NumberFormatter

For better maintainability, shouldn't they share more code? By means of a protocol, for example. This would help with not only maintenance, it would also help with testing and functions by joining together test cases and functions that are basically the same across types (see the mean function for example).
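
As a rough illustration of the protocol idea, the shared properties could be declared once and functions such as mean written against the protocol. `TensorProtocol`, `SimpleVector` and `SimpleMatrix` are hypothetical names, and the display-related properties (showName, format) are left out to keep the sketch short.

```swift
/// Hypothetical protocol capturing properties the three types have in common.
protocol TensorProtocol {
    associatedtype Element
    var count: Int { get }
    var size: [Int] { get }
    var data: [Element] { get set }
    var name: String? { get set }
}

/// Written once, available to every conforming type with Double elements.
extension TensorProtocol where Element == Double {
    func mean() -> Double {
        precondition(count > 0, "mean of an empty container is undefined")
        return data.reduce(0, +) / Double(count)
    }
}

struct SimpleVector: TensorProtocol {
    var data: [Double]
    var name: String? = nil
    var count: Int { data.count }
    var size: [Int] { [data.count] }
}

struct SimpleMatrix: TensorProtocol {
    var size: [Int]          // [rows, columns]
    var data: [Double]       // row-major order
    var name: String? = nil
    var count: Int { data.count }
}

print(SimpleVector(data: [1, 2, 3]).mean())                    // 2.0
print(SimpleMatrix(size: [2, 2], data: [1, 2, 3, 4]).mean())   // 2.5
```

A test written against `TensorProtocol` would then cover all conforming types at once.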

Btw, I like this project because it actually works in Linux 😄 I wanna help with this in my spare time

Enhance the Test Suite

Since Nifty deals with simple structs with many functions and properties, it’s close to being perfect for Property Testing.

This issue consists of enhancing some of the already-in-place coverage with Properties by using SwiftCheck. This library allows us to express properties using First Order Logic, which is quite powerful.
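
To show the idea without pulling in a dependency, here is a hand-rolled property check in plain Swift. It is not SwiftCheck's API: `forAllVectors` is a hypothetical helper, and SwiftCheck adds proper generators and shrinking of counterexamples on top of this basic loop.

```swift
/// Minimal hand-rolled property check: runs `property` on random vectors and
/// returns the first counterexample, or nil if every trial passed.
func forAllVectors(trials: Int = 100, maxLength: Int = 20,
                   _ property: ([Double]) -> Bool) -> [Double]? {
    for _ in 0..<trials {
        let n = Int.random(in: 1...maxLength)
        let v = (0..<n).map { _ in Double.random(in: -1e3...1e3) }
        if !property(v) { return v }   // first counterexample found
    }
    return nil                          // property held for every trial
}

// Property: reversing a vector twice gives back the original.
let counterexample = forAllVectors { v in Array(v.reversed().reversed()) == v }
print(counterexample == nil)   // true
```

The same loop could exercise Nifty functions, e.g. checking that transposing a matrix twice returns the original matrix.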

Roadmap to 1.0

I was wondering about this. Do we have one? We should have one to start making tasks and crossing them out :)