lisitsyn / tapkee

A flexible and efficient C++ template library for dimension reduction

License: BSD 3-Clause "New" or "Revised" License

CMake 3.84% Makefile 0.89% Python 0.28% C++ 93.90% Shell 0.74% XSLT 0.15% Dockerfile 0.20%

tapkee's Introduction

Tapkee is a C++ template library for dimensionality reduction with some bias towards spectral methods. Tapkee originated from code developed during GSoC 2011 as part of the Shogun machine learning toolbox. The project aims to provide an efficient and flexible standalone library for dimensionality reduction that can be easily integrated into existing codebases. Tapkee leverages the capabilities of the efficient Eigen3 linear algebra library and optionally makes use of the ARPACK eigensolver. The library uses CoverTree and VP-tree data structures to compute nearest neighbors. To achieve greater flexibility, we provide a callback interface that decouples dimension reduction algorithms from data representation and storage schemes.

The library is distributed under the permissive BSD 3-clause license (except for a few optional parts that are distributed under other open-source licenses; see the Licensing section of this document). If you use this software in any publication, we would be happy if you cite the following paper:

Sergey Lisitsyn and Christian Widmer and Fernando J. Iglesias Garcia. Tapkee: An Efficient Dimension Reduction Library. Journal of Machine Learning Research, 14: 2355-2359, 2013.

To get started with dimension reduction, you may try the go.py script, which embeds common datasets (swissroll, helix, scurve) using the Tapkee library and plots the result with Matplotlib. To use the script, build the sample application (see the Command line application section for more details) and call go.py with the following command:

./examples/go.py [swissroll|helix|scurve|...] [lle|isomap|...]

You may also try out a minimal example using make minimal (examples/minimal) and the RNA example using make rna (examples/rna). There are also a few graphical examples. To run the MNIST digits embedding example, use make mnist (examples/mnist); to run the promoters embedding example, use make promoters (examples/promoters); and to run the embedding of the faces dataset, use make faces (examples/faces). All graphical examples require Matplotlib, which can usually be installed with a package manager. The promoters example also has a non-trivial dependency on the Shogun machine learning toolbox (minimal version 2.1.0). We also provide an example of using Tapkee in Shogun as make langs (examples/langs).

API

We provide an interface based on the method chaining technique. The chain starts with a call to the with(const ParametersSet&) method, which supplies parameters such as the method to use and its settings. The argument is formed with the following syntax:

(keyword1=value1, keyword2=value2)

This syntax is possible because the comma operator is overloaded to group all assigned keywords into a single comma-separated list.
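To make the mechanism concrete, here is a heavily simplified, hypothetical model of it in plain C++. It is not Tapkee's implementation (which also handles non-numeric values such as methods); it only shows how `keyword = value` can yield a parameter object and how an overloaded `operator,` can fold parameters into one set.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical, simplified model of the keyword syntax (not Tapkee's code):
// numeric values only, keywords identified by name.
struct Parameter {
    std::string name;
    double value;
};

struct ParametersSet {
    std::map<std::string, double> values;
};

struct Keyword {
    std::string name;
    // keyword = value builds a single Parameter instead of assigning.
    Parameter operator=(double value) const { return Parameter{name, value}; }
};

// The overloaded comma turns two parameters into a set...
ParametersSet operator,(const Parameter& a, const Parameter& b) {
    ParametersSet set;
    set.values[a.name] = a.value;
    set.values[b.name] = b.value;
    return set;
}

// ...and appends every further parameter to that set.
ParametersSet operator,(ParametersSet set, const Parameter& p) {
    set.values[p.name] = p.value;
    return set;
}

const Keyword num_neighbors{"num_neighbors"};
const Keyword target_dimension{"target_dimension"};
const Keyword gaussian_kernel_width{"gaussian_kernel_width"};

// Stand-in for with(): accepts exactly one ParametersSet and reports its size.
std::size_t with(const ParametersSet& parameters) { return parameters.values.size(); }
```

Usage: `with((num_neighbors = 15, target_dimension = 2))` returns 2. Since `=` binds tighter than `,`, each assignment is evaluated first and the commas then combine the results. Without the inner parentheses, `with(num_neighbors = 15, target_dimension = 2)` is parsed as a two-argument call and does not compile in this model.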

Keywords are defined in the tapkee namespace. Currently, the following keywords are defined: method, eigen_method, neighbors_method, num_neighbors, target_dimension, diffusion_map_timesteps, gaussian_kernel_width, max_iteration, spe_global_strategy, spe_num_updates, spe_tolerance, landmark_ratio, nullspace_shift, klle_shift, check_connectivity, fa_epsilon, progress_function, cancel_function, sne_perplexity, sne_theta. See the documentation for their detailed meaning.

As an example of parameters setting, if you want to use the Isomap algorithm with the number of neighbors set to 15:

tapkee::with((method=Isomap,num_neighbors=15))

Please note that the inner parentheses are necessary: without them, the commas would be parsed as separators between function arguments rather than as the overloaded comma operator.

Next, you may either embed the provided matrix with:

tapkee::with((method=Isomap,num_neighbors=15)).embedUsing(matrix);

Or provide callbacks (kernel, distance and features) using any combination of the withKernel(KernelCallback), withDistance(DistanceCallback) and withFeatures(FeaturesCallback) member functions:

tapkee::with((method=Isomap,num_neighbors=15))
       .withKernel(kernel_callback)
       .withDistance(distance_callback)
       .withFeatures(features_callback)

Once callbacks are set, you may either embed data given as an STL-compatible sequence of indices or objects (one that supports the begin() and end() methods to obtain the corresponding iterators) with the embedUsing(Sequence) member function, or embed a range with the embedRange(RandomAccessIterator, RandomAccessIterator) member function.
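The point of the callback interface is that an algorithm only sees a range of elements plus callbacks that answer questions about them. Here is a minimal sketch of that idea, assuming a callback with a `distance(a, b)` member as in the minimal example below. The `distance_matrix` helper and both callback structs are hypothetical and not part of Tapkee.

```cpp
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Generic routine that only sees a range and a distance callback. It works the
// same for indices, strings or any other element type.
template <class RandomAccessIterator, class DistanceCallback>
std::vector<std::vector<double>> distance_matrix(RandomAccessIterator begin,
                                                 RandomAccessIterator end,
                                                 DistanceCallback callback) {
    const std::size_t n = static_cast<std::size_t>(std::distance(begin, end));
    std::vector<std::vector<double>> d(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i + 1; j < n; ++j)
            d[i][j] = d[j][i] = callback.distance(begin[i], begin[j]);
    return d;
}

// Callback over plain indices, in the spirit of the minimal example.
struct IndexDistance {
    double distance(int l, int r) const { return l > r ? l - r : r - l; }
};

// Callback over strings: absolute difference of their lengths.
struct LengthDistance {
    double distance(const std::string& a, const std::string& b) const {
        return a.size() > b.size() ? double(a.size() - b.size()) : double(b.size() - a.size());
    }
};
```

The same `distance_matrix` template serves both a `std::vector<int>` of indices and a `std::vector<std::string>`, which mirrors how the sequence and the callbacks are supplied separately in the chain above.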

In summary, a few examples:

TapkeeOutput output = with((method=Isomap,num_neighbors=15))
    .embedUsing(matrix);

TapkeeOutput output = with((method=Isomap,num_neighbors=15))
    .withDistance(distance_callback)
    .embedUsing(indices);

TapkeeOutput output = with((method=Isomap,num_neighbors=15))
    .withDistance(distance_callback)
    .embedRange(indices.begin(),indices.end());

Minimal example

A minimal working example of a program that uses the library is:

#include <cstdlib>
#include <iostream>
#include <vector>

#include <tapkee/tapkee.hpp>
#include <tapkee/callbacks/dummy_callbacks.hpp>

using namespace std;
using namespace tapkee;

struct MyDistanceCallback
{
	ScalarType distance(IndexType l, IndexType r) { return abs(l-r); }
};

int main(int argc, const char** argv)
{
	const int N = 100;
	vector<IndexType> indices(N);
	for (int i=0; i<N; i++) indices[i] = i;

	MyDistanceCallback d;

	TapkeeOutput output = tapkee::with((method=MultidimensionalScaling,target_dimension=1))
	   .withDistance(d)
	   .embedUsing(indices);

	cout << output.embedding.transpose() << endl;
	return 0;
}

This example requires Tapkee to be on the include path. With Linux compilers you can add it with the -I/path/to/tapkee/headers/folder flag.

Integration

There are a few issues related to including the Tapkee library in your code. First, if (and only if) your project already includes Eigen3, you might need to let Tapkee know about that with the following define:

#define TAPKEE_EIGEN_INCLUDE_FILE <path/to/your/eigen/include/file.h>

Please note that if you don't include Eigen3 in your project, there is no need to define this macro; in that case Tapkee will include Eigen3 itself. The issue stems from the need to include the Eigen3 library only once when using some specific settings (such as debug and extensions).

If you are able to use more restrictive licenses (such as LGPLv3), you may define the following variable:

TAPKEE_USE_LGPL_COVERTREE to use the CoverTree code by John Langford.

When compiling software that includes Tapkee, make sure the Eigen3 headers are on the include path and that your code is linked against the ARPACK library (the -larpack flag for g++ and clang++).

For an example of integration you may check Tapkee adapter in Shogun.

When working with installed headers, you can check which version of the library you have by checking the values of the TAPKEE_WORLD_VERSION, TAPKEE_MAJOR_VERSION and TAPKEE_MINOR_VERSION defines.
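If your code has to build against several Tapkee versions, these three defines can drive a compile-time check. The sketch below assumes they expand to integer literals, which is not confirmed here, and uses a hypothetical `version_code` helper. When no Tapkee header has been included, the check reports that no version is known.

```cpp
// Packs a version triple into one comparable integer (assumes each part < 1000).
constexpr long version_code(long world, long major, long minor) {
    return world * 1000000L + major * 1000L + minor;
}

#if defined(TAPKEE_WORLD_VERSION) && defined(TAPKEE_MAJOR_VERSION) && defined(TAPKEE_MINOR_VERSION)
constexpr long kTapkeeVersion =
    version_code(TAPKEE_WORLD_VERSION, TAPKEE_MAJOR_VERSION, TAPKEE_MINOR_VERSION);
#else
constexpr long kTapkeeVersion = -1;  // Tapkee headers not included: no version known
#endif

// True if the included headers are at least the given version.
constexpr bool tapkee_at_least(long world, long major, long minor) {
    return kTapkeeVersion >= version_code(world, major, minor);
}
```

Because everything is `constexpr`, the check can be used in `static_assert` or `if constexpr` to select code paths for different releases.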

We welcome any integration, so please contact the authors if you have any questions. If you have successfully used the library, please also let the authors know; mentions of any applications are much appreciated.

Customization

Tapkee is designed to be highly customizable with preprocessor definitions.

If you want to use float as the internal numeric type (the default is double), you can do so with #define TAPKEE_CUSTOM_NUMTYPE float before including the defines header.

If you use a non-standard STL-compatible implementation of vector, map and pair, you may substitute it with TAPKEE_INTERNAL_VECTOR, TAPKEE_INTERNAL_PAIR and TAPKEE_INTERNAL_MAP (otherwise these default to std::vector, std::pair and std::map).
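These overrides follow the usual header pattern in which a macro defined before inclusion replaces a default type. Below is a self-contained sketch of that pattern. It uses hypothetical `MYLIB_` names and does not reproduce Tapkee's own headers.

```cpp
#include <map>
#include <utility>
#include <vector>

// Numeric type: overridable, double by default.
#ifdef MYLIB_CUSTOM_NUMTYPE
typedef MYLIB_CUSTOM_NUMTYPE ScalarType;
#else
typedef double ScalarType;
#endif

// Container templates: overridable, standard library by default.
#ifndef MYLIB_INTERNAL_VECTOR
#define MYLIB_INTERNAL_VECTOR std::vector
#endif
#ifndef MYLIB_INTERNAL_PAIR
#define MYLIB_INTERNAL_PAIR std::pair
#endif
#ifndef MYLIB_INTERNAL_MAP
#define MYLIB_INTERNAL_MAP std::map
#endif

// The rest of the library only ever uses these aliases.
typedef MYLIB_INTERNAL_VECTOR<ScalarType> ScalarVector;
typedef MYLIB_INTERNAL_PAIR<int, ScalarType> IndexedScalar;
typedef MYLIB_INTERNAL_MAP<int, ScalarType> ScalarMap;
```

Compiling with -DMYLIB_CUSTOM_NUMTYPE=float switches ScalarType, and with it every alias built on it, to float. The container macros work the same way for a drop-in replacement of std::vector, std::pair or std::map.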

You may define TAPKEE_USE_FIBONACCI_HEAP or TAPKEE_USE_PRIORITY_QUEUE to select which data structure is used in the shortest-path computation. By default a priority queue is used.
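For reference, the sketch below shows a textbook Dijkstra shortest-path routine built on std::priority_queue. We assume, but have not checked here, that Tapkee's shortest-path code (used for geodesic distances in Isomap) is of this kind. The graph representation and the function name are hypothetical.

```cpp
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// Weighted adjacency list: graph[u] holds (neighbor, edge length) pairs.
using Graph = std::vector<std::vector<std::pair<int, double>>>;

// Single-source shortest paths (Dijkstra) with a binary-heap priority queue.
// Vertices that cannot be reached keep an infinite distance.
std::vector<double> shortest_paths(const Graph& graph, int source) {
    const double inf = std::numeric_limits<double>::infinity();
    std::vector<double> dist(graph.size(), inf);
    using Item = std::pair<double, int>;  // (tentative distance, vertex)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> queue;
    dist[source] = 0.0;
    queue.push({0.0, source});
    while (!queue.empty()) {
        const auto [d, u] = queue.top();
        queue.pop();
        if (d > dist[u]) continue;  // stale entry: u was already settled with a shorter path
        for (const auto& [v, w] : graph[u]) {
            if (dist[u] + w < dist[v]) {
                dist[v] = dist[u] + w;
                queue.push({dist[v], v});  // no decrease-key: push a new entry instead
            }
        }
    }
    return dist;
}
```

std::priority_queue has no decrease-key operation, so improved distances are pushed again and stale entries are skipped when popped. A Fibonacci heap supports decrease-key directly, which is the usual argument for offering it as an alternative. Vertices left at infinity are unreachable from the source, which is what a disconnected neighborhood graph produces.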

Other properties can be loaded from a provided header file using #define TAPKEE_CUSTOM_PROPERTIES. Currently such a file should define only one variable, COVERTREE_BASE, which sets the base of the CoverTree (default is 1.3).

Command line application

Tapkee comes with a sample application which can be used to construct low-dimensional representations of dense feature matrices. For more information on its usage please run:

./bin/tapkee -h

The application takes a plain ASCII file containing a dense matrix (each vector is a column and each line contains the values of one feature). The output of the application is stored in the provided file in the same format (each line is a feature).
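For programs that need to exchange data with the application, the layout above (one feature per line, one vector per column) can be read and written as sketched below. The function names are hypothetical. Treating commas like whitespace is an assumption about accepted separators, not documented behaviour of the application.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Reads one feature per line; column j across all lines is vector j.
std::vector<std::vector<double>> read_feature_rows(std::istream& in) {
    std::vector<std::vector<double>> rows;
    std::string line;
    while (std::getline(in, line)) {
        for (char& c : line)
            if (c == ',') c = ' ';  // assumption: allow comma-separated input too
        std::istringstream fields(line);
        std::vector<double> row;
        double value;
        while (fields >> value) row.push_back(value);
        if (!row.empty()) rows.push_back(row);  // skip blank lines
    }
    return rows;
}

// Writes rows back in the same layout: one feature per line, space-separated.
void write_feature_rows(std::ostream& out, const std::vector<std::vector<double>>& rows) {
    for (const auto& row : rows) {
        for (std::size_t j = 0; j < row.size(); ++j) out << (j ? " " : "") << row[j];
        out << '\n';
    }
}
```

Vector j of the input is then the column rows[0][j], rows[1][j], and so on.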

To compile the application, please use CMake. Compiling Tapkee with CMake follows the usual workflow. On Unix-based systems you can use the following command to compile the Tapkee application:

mkdir build && cd build && cmake [definitions] .. && make

There are a few cases in which you may want to add definitions:

To enable compilation of unit tests, add -DBUILD_TESTS=1 to [definitions] when building. Please note that building the unit tests requires googletest. If you are running Ubuntu, you may install the libgtest-dev package for that. Otherwise, if you have gtest sources around, you may provide them as -DGTEST_SOURCE_DIR and -DGTEST_INCLUDES_DIR. You may also download gtest with the following command:

wget https://github.com/google/googletest/archive/release-1.8.0.tar.gz && tar xfv release-1.8.0.tar.gz

Tapkee will then use the downloaded sources. To run the tests, use the make test command (or, better, ctest -VV).

To let the build store test coverage information using GCOV and add a target that outputs an HTML test coverage report with LCOV, add the -DUSE_GCOV=1 flag to [definitions].
To enable precomputation of kernel/distance matrices, which can speed up the algorithms (but requires much more memory), add -DPRECOMPUTED=1 to [definitions] when building.
To build the application without the parts licensed under LGPLv3, use the -DGPL_FREE=1 definition.
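To see why precomputation costs memory, consider a hypothetical wrapper that evaluates a distance callback once for every pair of indices and answers later queries from a table. This only illustrates the trade-off; it is not how the PRECOMPUTED build option is implemented.

```cpp
#include <cstddef>
#include <vector>

// Example callback over indices: |l - r|, as in the minimal example.
struct AbsDifference {
    double distance(std::size_t l, std::size_t r) const {
        return l > r ? double(l - r) : double(r - l);
    }
};

// Evaluates the callback for all n * n pairs up front, then serves lookups.
template <class DistanceCallback>
class PrecomputedDistance {
public:
    PrecomputedDistance(DistanceCallback callback, std::size_t n) : n_(n), table_(n * n) {
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = 0; j < n; ++j)
                table_[i * n + j] = callback.distance(i, j);
    }
    double distance(std::size_t i, std::size_t j) const { return table_[i * n_ + j]; }
    std::size_t bytes() const { return table_.size() * sizeof(double); }

private:
    std::size_t n_;
    std::vector<double> table_;
};
```

With 8-byte doubles, the table for 100 points takes 80,000 bytes, but for 100,000 points it would already need 8 x 10^10 bytes (80 GB).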

The library requires Eigen3 to be available in your path. The ARPACK library is also highly recommended for the best performance. On Debian/Ubuntu these packages can be installed with

sudo apt-get install libeigen3-dev libarpack2-dev

If you are using Mac OS X and MacPorts, you can install these packages with

sudo port install eigen3 && sudo port install arpack

If you want to use a non-default compiler, run CC=your-C-compiler CXX=your-C++-compiler cmake [definitions] .. instead of plain cmake.

Directory contents

The repository of Tapkee contains the following directories:

src/ contains a simple command-line application (src/cli) and CMake module finders (src/cmake).
includes/ contains the library itself in the includes/tapkee subdirectory.
test/ contains unit tests in the test/unit subdirectory and a few helper scripts.
examples/ contains a few examples, including those already mentioned (they are meant to be run through make as described above, e.g. make minimal).
data/ is a git submodule that contains the data required by the examples. To initialize this submodule, use git submodule update --init.
doc/ contains the Doxygen configuration file used to generate the HTML documentation of the library. Calling doxygen doc/Doxyfile generates it in this folder.

Once built, the root will also contain the following directories:

bin contains the binaries (tapkee, the command-line application, and various tests named test_*).
lib contains the gtest shared libraries.

Need help?

If you need any help or advice, don't hesitate to send an email or open an issue on GitHub.

Supported platforms

Tapkee is tested to be fully functional on Linux (ICC, GCC and Clang compilers) and Mac OS X (GCC and Clang compilers). It also compiles natively under Windows (MSVS 2012 compiler), with a few known issues. In general, Tapkee uses no platform-specific code and should work on other systems as well. Please let us know if you have successfully compiled it, or run into any issues, on any other system not listed above.

Supported dimension reduction methods

Tapkee provides implementations of the following dimension reduction methods:

Locally Linear Embedding and Kernel Locally Linear Embedding (LLE/KLLE)
Neighborhood Preserving Embedding (NPE)
Local Tangent Space Alignment (LTSA)
Linear Local Tangent Space Alignment (LLTSA)
Hessian Locally Linear Embedding (HLLE)
Laplacian eigenmaps
Locality Preserving Projections
Diffusion map
Isomap and landmark Isomap
Multidimensional scaling and landmark Multidimensional scaling (MDS/lMDS)
Stochastic Proximity Embedding (SPE)
Principal Component Analysis (PCA)
Kernel Principal Component Analysis (kPCA)
Random projection
Factor analysis
t-SNE
Barnes-Hut-SNE

Licensing

The library is distributed under the BSD 3-clause license.

Exceptions are:

Barnes-Hut-SNE code by Laurens van der Maaten which is distributed under the BSD 4-clause license.
Covertree code by John Langford and Dinoj Surendran which is distributed under the LGPLv3 license.


tapkee's Issues

bh-SNE with custom distance callback

Using method=tDistributedStochasticNeighborEmbedding in combination with withDistance() is not supported.

Laurens van der Maaten says that to use a custom metric, the vantage-point tree needs to be changed (see here). Note that this only refers to the Barnes-Hut algorithm; the exact algorithm uses no VP-tree and has its own custom distance computation in tsne.hpp.

Interestingly, tapkee already comes with an alternative VPTree implementation that supports the use of a distance callback. It also looks quite compatible.

Could the method be altered to use the functionality of neighbors/vptree.hpp and enable withDistance()?

Fix crashes on windows

Covertree crashes

Projecting out of bag data

Hi,

I am new to Tapkee and I can't find out how to project new data that was not used to build the model.

The issue is that I have large data sets and I would like to estimate a dimensionality reduction model using a subset of my data, and apply the model afterwards to the complete data set.

I understand that this may not make sense for some of the methods, but it does for some others. What I would need to do is something similar to retrieving the projection matrix of a PCA in order to apply it to new data. The question is, how to do it in a generic way with Tapkee?

I think I should be able to do that using a ProjectingFunction, but I can't find an example doing that.

Thanks in advance for your help.

Aliasing error during eigen decomposition

Hey there,

I found your library really useful so far and I am really surprised by its efficiency. However, I am experiencing some aliasing issues. The following code fails during embedding. Note that I have defined TAPKEE_CUSTOM_INTERNAL_NUMTYPE as float. ¹

// Create a matrix with 512x512 32-dim features.
tapkee::DenseMatrix descriptors(32, 262144);

// Randomly fill the matrix.
for (int i(0); i < 32; ++i)
	for (int j(0); j < 262144; ++j)
		descriptors(i, j) = static_cast<float>(std::rand()) / static_cast<float>(RAND_MAX);

// Reduce to dim 8 using PCA.
tapkee::ParametersSet parameters = tapkee::kwargs[
	tapkee::method = tapkee::PCA,
	tapkee::target_dimension = 8
];

// Perform PCA
tapkee::TapkeeOutput result = tapkee::initialize()
	.withParameters(parameters)
	.embedUsing(descriptors);

Basically, I want to reduce a set of 512x512 32-dimensional features to 8D; here they are randomly initialized for demonstration purposes. When run in debug mode (under Win64), Eigen detects aliasing:

aliasing detected during transposition, use transposeInPlace() or evaluate the rhs into a temporary using .eval()

The error is raised during eigen decomposition (eigendecomposition.hpp):

//! Eigen library dense implementation of eigendecomposition-based embedding
template <class MatrixType, class MatrixOperationType>
EigendecompositionResult eigendecomposition_impl_dense(const MatrixType& wm, IndexType target_dimension, unsigned int skip)
{
	timed_context context("Eigen library dense eigendecomposition");

	DenseSymmetricMatrix dense_wm = wm;
	dense_wm += dense_wm.transpose();	// Invalid.
	dense_wm /= 2.0;
	DenseSelfAdjointEigenSolver solver(dense_wm);
	
	// ...
}

It relates to the issues described here. Changing this line to dense_wm += dense_wm.transpose().eval(); resolves the issue. I think transposeInPlace() should also work, however I was not able to get it to compile that way.
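The hazard Eigen reports can be reproduced without Eigen. In the sketch below (plain C++, hypothetical helper names), updating A in place with A + A^T, as `dense_wm += dense_wm.transpose()` does, reads entries that have already been overwritten. Evaluating the transpose into a temporary first, which is what `.eval()` does, gives the intended symmetric result. Eigen's real evaluation order differs, so this models the problem rather than Eigen's exact output.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Naive in-place A = (A + A^T) / 2: entries below the diagonal read values that
// were already overwritten above it, so the result is wrong and not symmetric.
void symmetrize_in_place_naive(Matrix& a) {
    const std::size_t n = a.size();
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            a[i][j] += a[j][i];
    for (auto& row : a)
        for (double& x : row) x /= 2.0;
}

// Safe version: copy the transpose into a temporary before updating A.
void symmetrize_with_temporary(Matrix& a) {
    const std::size_t n = a.size();
    Matrix t(n, std::vector<double>(n));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) t[i][j] = a[j][i];
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) a[i][j] = (a[i][j] + t[i][j]) / 2.0;
}
```

For A = [[0, 1], [2, 0]], the naive version yields [[0, 1.5], [2.5, 0]], while the version with a temporary yields the expected [[0, 1.5], [1.5, 0]].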

Did I miss something or is this an actual bug?!

¹ Also note that the documentation says TAPKEE_CUSTOM_NUMTYPE, which is actually incorrect.

"The neighbourhood graph is not connected."

Overview

Hello!

So I was trying to use your dimension reduction package on hyperspectral imagery, which you cite as an application in your benchmarks and paper. I cloned your benchmarks repository and ran your script to perform the benchmark, and I didn't run into any problems. However, when I downloaded the library to experiment with dimension reduction for hyperspectral imagery myself, I got this glaring error no matter which similar hyperspectral data set I used: [warning] The neighborhood graph is not connected. This leads to nan values in my results. This is consistent between the tapkee CLI inputs and the shogun-toolbox inputs.

Specific Issue

So let's say I run this command in the CLI:

./tapkee -i aviris.dat -o aviris_dimred.dat -m isomap -td 20 -k 25 --benchmark

The method runs fine except for the [warning] The neighborhood graph is not connected. error that pops up no matter which simple parameters I use (e.g. number of nearest neighbors, target dimension, eigensolver method). Upon closer inspection, I see that the output file is simply a list of nan for each column.

So I tried varying the dimension reduction technique (e.g. PCA, Laplacian eigenmaps, neighborhood preserving projection, locally linear embedding), and I found that the eigendecomposition would always fail with the error Some error occured: eigendecomposition failed. I varied k, the target dimension and the method, and found that the only methods that produced output files with actual values were the stochastic methods (e.g. t-distributed stochastic neighbour embedding and stochastic proximity embedding) and MDS. Sometimes the solution was nonsense (for example, if the input data lies between 0 and 1, the output data should as well, and sometimes these algorithms didn't produce that), but that is to be expected with stochastic methods. MDS also outputs some nan values, though not only nan.

I also tried my own hyperspectral data set, which is much bigger: 145x145 pixels with 200 dimensions. I flattened the image so that the dimensions were 21025x200. However, I still got the same errors; it just took longer to process.

I get the same issue when I use the Shogun toolbox to process the same data sets via Python: same error, except that it does not produce a result at all and stops the algorithm altogether.

Question

So, what did you do for your graph that produced actual results? I looked into your script and I didn't find any extra commands that would allow you overcome this error. Maybe I am missing something in your script for using Tapkee on the aviris dataset? What commands should I enter or vary so that I can get a sensible solution with your dataset and package? Is there some sort of preprocessing step that you (or I could) do to avoid this issue?

Thank you for your time!

Refactor resources after API stabilizes

Update website (maybe making a "v1" and "v2" separation), possibly the jmlr benchmarks repo.

Starting context: 63c72bf

Documentation/examples are incorrect regarding keyword namespace

It looks as though keywords are now defined in the tapkee namespace rather than tapkee::keywords. The documentation needs to be updated to reflect this, along with the examples. Including the line:

using namespace tapkee::keywords;

results in the "expected namespace name" error.

Implement random projection

The neighborhood graph is not connected

Hi there, I am currently using the tapkee executable as part of a pipeline in my Python project. I create a text file with all the values to reduce, which looks like this:

0.301961,0.376471,0.443137,0,0.992157,0
0.188235,0.235294,0.298039,0.847059,0.752941,0
0.192157,0.239216,0.294118,0.854902,0.72549,0
0.266667,0.282353,0.380392,0.901961,0.505882,0.8
0.215686,0.211765,0.215686,0.913725,0.498039,0.8
0.25098,0.262745,0.286275,0.917647,0.494118,0.8
0.309804,0.305882,0.333333,0.905882,0.419608,0.8
0.258824,0.321569,0.396078,0.831373,0.811765,0
0.239216,0.298039,0.376471,0.831373,0.807843,0
0.231373,0.290196,0.368627,0.819608,0.741176,0
0.290196,0.360784,0.443137,0.784314,0.780392,0
0.301961,0.376471,0.443137,0,0.992157,0
0.188235,0.235294,0.298039,0.847059,0.752941,0
0.192157,0.239216,0.294118,0.854902,0.72549,0
0.266667,0.282353,0.380392,0.901961,0.505882,0.8
0.215686,0.211765,0.215686,0.913725,0.498039,0.8
0.25098,0.262745,0.286275,0.917647,0.494118,0.8
0.309804,0.305882,0.333333,0.905882,0.419608,0.8
0.258824,0.321569,0.396078,0.831373,0.811765,0

This example works fine but if I add one more line like

0.1,0.2,0.3,0.4,0.5,0

I get the message "[warning] The neighborhood graph is not connected"

So I looked for a solution, and the only help I could find was this note in your documentation:

Please note that "[warning] The neighborhood graph is not connected" message in most cases means
that ’tapkee’ run was unsuccessful. As a result, Tapkee() might return the matrix of NaN’s. One of
possible workarounds is to specify the higher number of neigbors (’-k’ option, default is 10). See
below for the example.

Then I tried changing the value of k, and it works when k is greater than or equal to the number of rows divided by 2. So if I have 30 rows, I have to set the number of neighbours to 15 or higher.

Is this right, or is something else wrong? I am supposed to use this method with a lot of rows (4000 to a million); won't that take very long? I may not fully understand how it works.

Thanks for your help.
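One way to check whether a given k yields a connected graph before running tapkee is to build a brute-force k-nearest-neighbour graph and run a breadth-first search over it, as sketched below. The sketch assumes Euclidean distance and undirected edges; Tapkee's own neighbour search and connectivity check may differ in those details. Note also that the data above contains several exact duplicate rows. Duplicates are each other's nearest neighbours at distance zero, which may make it easier for small groups of points to split off when k is small.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <queue>
#include <vector>

using Point = std::vector<double>;

double euclidean(const Point& a, const Point& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += (a[i] - b[i]) * (a[i] - b[i]);
    return std::sqrt(s);
}

// Brute-force kNN graph with undirected edges, then BFS from point 0.
bool knn_graph_connected(const std::vector<Point>& points, std::size_t k) {
    const std::size_t n = points.size();
    if (n == 0) return true;
    std::vector<std::vector<std::size_t>> adj(n);
    for (std::size_t i = 0; i < n; ++i) {
        std::vector<std::size_t> order(n);
        std::iota(order.begin(), order.end(), std::size_t{0});
        order.erase(order.begin() + i);  // a point is not its own neighbour
        const std::size_t kk = std::min(k, order.size());
        std::partial_sort(order.begin(), order.begin() + kk, order.end(),
                          [&](std::size_t a, std::size_t b) {
                              return euclidean(points[i], points[a]) < euclidean(points[i], points[b]);
                          });
        for (std::size_t t = 0; t < kk; ++t) {
            adj[i].push_back(order[t]);
            adj[order[t]].push_back(i);
        }
    }
    std::vector<bool> seen(n, false);
    std::queue<std::size_t> frontier;
    frontier.push(0);
    seen[0] = true;
    std::size_t visited = 1;
    while (!frontier.empty()) {
        const std::size_t u = frontier.front();
        frontier.pop();
        for (std::size_t v : adj[u])
            if (!seen[v]) { seen[v] = true; ++visited; frontier.push(v); }
    }
    return visited == n;
}
```

For two well-separated clusters of three points each, k = 2 keeps every edge inside its cluster, so the graph is disconnected. With k = 3, every point needs one neighbour from the other cluster, and the graph becomes connected.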

Various PCA Results

I applied tapkee PCA and opencv PCA to the same data and I got different results.
So I wonder if the methods are different in detail.
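Eigenvectors are only defined up to sign, so two correct PCA implementations may return mirrored components and mirrored embeddings. The report does not tell us whether that explains the difference here. A common normalization before comparing outputs is to flip each component so that its largest-magnitude entry is positive. Here is a sketch with a hypothetical align_sign helper:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Flips the sign of a principal axis (or of one embedding coordinate) so that
// its largest-magnitude entry is positive. Applying the same rule to both
// outputs removes the arbitrary +/- choice made by each eigensolver.
std::vector<double> align_sign(std::vector<double> v) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < v.size(); ++i)
        if (std::fabs(v[i]) > std::fabs(v[best])) best = i;
    if (!v.empty() && v[best] < 0)
        for (double& x : v) x = -x;
    return v;
}
```

After aligning both results this way, any remaining difference points to something other than the sign convention.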

Running tapkee

Hi there,

Thanks for a wonderful program! For some reason, I could not run 'tapkee' on my computer (OS: Windows 7). When I open the downloaded file, it first asks me whether I want to run the program. When I hit 'run', a window appears for a millisecond and disappears instantly. I tried to run it in the Windows console, but it says that ''tapkee' is not recognized as an internal or external command, operable program or batch file'. I wonder what I am doing wrong.

Fire warning/error if any of parameters that are required to be set is not set

Reproduced by not setting nullspace shift in KLLE (in Shogun)

Implement streaming mode for landmark based algorithms

The idea is: with randomization, we can assume that the landmarks are the first N vectors, so we collect them into a buffer until it is full. Once buffered, we output the projections of all landmark vectors and process the following vectors in a streaming fashion.
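A minimal sketch of that buffering logic follows. All names are hypothetical, and the fitting and projection steps are passed in as functions; nothing here is existing Tapkee API.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Buffers the first num_landmarks inputs, fits a model on them once the buffer
// is full, emits their projections, then projects every later input directly.
template <class Input, class Output>
class StreamingLandmarkEmbedder {
public:
    using Model = std::function<Output(const Input&)>;
    using Fit = std::function<Model(const std::vector<Input>&)>;

    StreamingLandmarkEmbedder(std::size_t num_landmarks, Fit fit)
        : num_landmarks_(num_landmarks), fit_(std::move(fit)) {}

    // Returns the projections that become available after this input.
    std::vector<Output> push(const Input& x) {
        std::vector<Output> ready;
        if (model_) {  // streaming phase
            ready.push_back(model_(x));
            return ready;
        }
        buffer_.push_back(x);  // buffering phase
        if (buffer_.size() == num_landmarks_) {
            model_ = fit_(buffer_);
            for (const Input& landmark : buffer_) ready.push_back(model_(landmark));
            buffer_.clear();
        }
        return ready;
    }

private:
    std::size_t num_landmarks_;
    Fit fit_;
    Model model_;
    std::vector<Input> buffer_;
};
```

The first calls to push return nothing while the buffer fills. The call that fills it returns all landmark projections at once, and every later call returns exactly one projection.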

Examples

Is there an example in tapkee to do the following?
a.) Load a data file from a text file
b.) Call a dimension reduction method
c.) Output the mapped data back to a text file.

Compile failed with OpenMP on Windows

Hello

I tried to use LE for dimensionality reduction with OpenMP enabled, but compilation failed at Meta.h in the Eigen 3.3 library.
Error:
c:\users\cwang\documents\third library\eigen\eigen\src/Core/util/Meta.h(146): error C3052: 'ms_from' : variable doesn't appear in a data-sharing clause under a default(none) clause

Part of Meta.h
public: static From ms_from;
enum { value = sizeof(test(ms_from, 0))==sizeof(yes) }; <-- line 146

Part of My Code:
TapkeeOutput output = initialize()
.withParameters((method = LaplacianEigenmaps, target_dimension = 3))
.withDistance(distance) // distance function defined in other place.
.embedRange(a);

Environment: Windows 10. Visual studio 2013. 64bit (both debug & release)

Thank you very much.

Implement LOE (Local Ordinal Embedding)

Paper http://jmlr.org/proceedings/papers/v32/terada14.pdf

dimensionality reduction with MDS

Hello, are there more examples of MDS?

I don't know how to use MDS with just the minimal example.
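The minimal example above embeds the indices 0..N-1 with distance |l - r| into one dimension using MultidimensionalScaling. To show what classical MDS computes, here is a plain-C++ sketch. The helper is hypothetical and uses power iteration instead of a real eigensolver, so it is not Tapkee's implementation. It squares the distances, double-centers them into a Gram matrix B = -1/2 J D^2 J, and scales the leading eigenvector by the square root of its eigenvalue.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Classical MDS into one dimension; D holds pairwise distances.
std::vector<double> classical_mds_1d(const Matrix& D, int iterations = 200) {
    const std::size_t n = D.size();
    Matrix B(n, std::vector<double>(n));
    std::vector<double> row_mean(n, 0.0);
    double total_mean = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            const double d2 = D[i][j] * D[i][j];
            B[i][j] = d2;
            row_mean[i] += d2 / n;
            total_mean += d2 / (n * n);
        }
    // Double centering; D^2 is symmetric, so column means equal row means.
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            B[i][j] = -0.5 * (B[i][j] - row_mean[i] - row_mean[j] + total_mean);
    // Power iteration for the leading eigenpair of B.
    std::vector<double> v(n);
    for (std::size_t i = 0; i < n; ++i) v[i] = 1.0 + static_cast<double>(i);
    double lambda = 0.0;
    for (int it = 0; it < iterations; ++it) {
        std::vector<double> w(n, 0.0);
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = 0; j < n; ++j) w[i] += B[i][j] * v[j];
        double norm = 0.0;
        for (double x : w) norm += x * x;
        norm = std::sqrt(norm);
        if (norm == 0.0) break;
        for (std::size_t i = 0; i < n; ++i) v[i] = w[i] / norm;
        lambda = norm;  // ||B v|| for unit v approaches the leading eigenvalue
    }
    for (double& x : v) x *= std::sqrt(lambda);
    return v;
}
```

For four points with D[i][j] = |i - j|, this returns approximately -1.5, -0.5, 0.5, 1.5: the original positions, centered. Power iteration assumes the dominant eigenvalue is positive and well separated, which holds for Euclidean distances like these.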

search condition in the vp tree in tapkee

in the vptree.hpp file, the search() function contains this code block:

            if (distance < node->threshold)
            {
                    if ((distance - tau) <= node->threshold)
                            search(node->left, target, k, heap);

                    if ((distance + tau) >= node->threshold)
                            search(node->right, target, k, heap);
            }
            else
            {
                    if ((distance + tau) >= node->threshold)
                            search(node->right, target, k, heap);

                    if ((distance - tau) <= node->threshold)
                            search(node->left, target, k, heap);
            }

Are the second and fourth ifs necessary here?

The first if says "distance < node->threshold", so the second if "(distance - tau) <= node->threshold" should be satisfied automatically, right? The same goes for the fourth if.

So I think we may simplify this code block to:

            if (distance < node->threshold)
            {
                    search(node->left, target, k, heap);

                    if ((distance + tau) >= node->threshold)
                            search(node->right, target, k, heap);
            }
            else
            {
                    search(node->right, target, k, heap);

                    if ((distance - tau) <= node->threshold)
                            search(node->left, target, k, heap);
            }

Correct me if I am wrong. Thanks
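The reasoning holds as long as tau is never negative (it is a distance bound) and distances are not NaN. It also relies on the two redundant checks being evaluated before any recursive call, when tau has not shrunk yet. The brute-force check below compares the per-node decision of both versions and uses hypothetical names. It models the radius when the node is visited (tau_before) and after the first recursive call returns (tau_after).

```cpp
// Which subtrees a node visits: "first" is the near side, "second" the far side.
struct Visit {
    bool first;
    bool second;
};

// Original rule: the near side is guarded by a check made with tau_before.
Visit original_rule(double distance, double threshold, double tau_before, double tau_after) {
    Visit v{};
    if (distance < threshold) {
        v.first = (distance - tau_before) <= threshold;  // left
        const double tau = v.first ? tau_after : tau_before;
        v.second = (distance + tau) >= threshold;         // right
    } else {
        v.first = (distance + tau_before) >= threshold;  // right
        const double tau = v.first ? tau_after : tau_before;
        v.second = (distance - tau) <= threshold;         // left
    }
    return v;
}

// Simplified rule from the question: the near side is always visited.
Visit simplified_rule(double distance, double threshold, double /*tau_before*/, double tau_after) {
    Visit v{};
    v.first = true;
    if (distance < threshold)
        v.second = (distance + tau_after) >= threshold;
    else
        v.second = (distance - tau_after) <= threshold;
    return v;
}
```

Both rules agree on every grid point with 0 <= tau_after <= tau_before. With a NaN distance they would differ: the original visits neither child, while the simplified version visits the right one.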

arpack_wrapper.hpp handles non self-adjoint input ?

Hi,

When I read the file include/tapkee/utils/arpack_wrapper.hpp, I see that it only handles a self-adjoint matrix A when solving the eigenproblem A v = \lambda v. If A is not self-adjoint, it looks like we cannot use ARPACK in tapkee, or we would have to modify arpack_wrapper.hpp to handle non-self-adjoint input?

When I run 'examples/borsch swissroll isomap', it shows that arpack_wrapper.hpp is handling a non-symmetric matrix A, since I printed out
A[533][207] = -148.229072 and A[207][533] = -143.89085

Could you clarify this ?

Thanks.

Best,
Levi

Look into attached unit test logs

While working on #83, this popped up (summary below, complete logs attached):

5: Indirect leak of 40 byte(s) in 1 object(s) allocated from:
5:     #0 0x7a2c008e1359 in __interceptor_malloc /usr/src/debug/gcc/gcc/libsanitizer/asan/asan_malloc_linux.cpp:69
5:     #1 0x5c5933bf1fdc in Eigen::internal::handmade_aligned_malloc(unsigned long, unsigned long) /usr/include/eigen3/Eigen/src/Core/util/Memory.h:105
5:     #2 0x5c5933bf244b in Eigen::internal::aligned_malloc(unsigned long) /usr/include/eigen3/Eigen/src/Core/util/Memory.h:188
5:     #3 0x5c5933d85d8e in void* Eigen::internal::conditional_aligned_malloc<true>(unsigned long) /usr/include/eigen3/Eigen/src/Core/util/Memory.h:241
5:     #4 0x5c5933d1e568 in double* Eigen::internal::conditional_aligned_new_auto<double, true>(unsigned long) /usr/include/eigen3/Eigen/src/Core/util/Memory.h:404
5:     #5 0x5c5933cb5949 in Eigen::DenseStorage<double, -1, -1, 1, 0>::DenseStorage(Eigen::DenseStorage<double, -1, -1, 1, 0> const&) /usr/include/eigen3/Eigen/src/Core/DenseStorage.h:589
5:     #6 0x5c5933c7ce30 in Eigen::PlainObjectBase<Eigen::Matrix<double, -1, 1, 0, -1, 1> >::PlainObjectBase(Eigen::PlainObjectBase<Eigen::Matrix<double, -1, 1, 0, -1, 1> > const&) /usr/include/eigen3/Eigen/src/Core/PlainObjectBase.h:512
5:     #7 0x5c5933c46159 in Eigen::Matrix<double, -1, 1, 0, -1, 1>::Matrix(Eigen::Matrix<double, -1, 1, 0, -1, 1> const&) /usr/include/eigen3/Eigen/src/Core/Matrix.h:414
5:     #8 0x5c5933c002da in tapkee::MatrixProjectionImplementation::MatrixProjectionImplementation(Eigen::Matrix<double, -1, -1, 0, -1, -1>, Eigen::Matrix<double, -1, 1, 0, -1, 1>) tapkee/include/tapkee/projection.hpp:46
5:     #9 0x5c5933d0cf69 in tapkee::tapkee_internal::ImplementationBase<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, tapkee::eigen_kernel_callback, tapkee::eigen_distance_callback, tapkee::eigen_features_callback>::embedPCA() tapkee/include/tapkee/methods.hpp:341
5:     #10 0x5c5933caa55c in tapkee::tapkee_internal::ImplementationBase<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, tapkee::eigen_kernel_callback, tapkee::eigen_distance_callback, tapkee::eigen_features_callback>::embedUsing(tapkee::DimensionReductionMethod) tapkee/include/tapkee/methods.hpp:113
5:     #11 0x5c5933c735ea in tapkee::TapkeeOutput tapkee::embed<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, tapkee::eigen_kernel_callback, tapkee::eigen_distance_callback, tapkee::eigen_features_callback>(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, tapkee::eigen_kernel_callback, tapkee::eigen_distance_callback, tapkee::eigen_features_callback, stichwort::ParametersSet) tapkee/include/tapkee/embed.hpp:117
5:     #12 0x5c5933c3f289 in tapkee::tapkee_internal::ParametersInitializedState::embedUsing(Eigen::Matrix<double, -1, -1, 0, -1, -1> const&) const tapkee/tapkee/include/tapkee/chain_interface.hpp:479
5:     #13 0x5c5933bdf964 in Projecting_PCA_Test::TestBody() tapkee/tapkee/test/unit/projecting.cpp:17
5:     #14 0x7a2c0100465b  (/usr/lib/libgtest.so.1.14.0+0x5365b)
5:
5: SUMMARY: AddressSanitizer: 304 byte(s) leaked in 6 allocation(s).
5/5 Test #5: projecting .......................***Failed    0.17 sec

ctest.log

The hotfix below didn't improve it, unless I overlooked something in the results:

--- a/test/unit/projecting.cpp
+++ b/test/unit/projecting.cpp
@@@ -13,7 -13,7 +13,8 @@@ TEST(Projecting, PCA
  
      TapkeeOutput output;
  
--    ASSERT_NO_THROW(output = tapkee::initialize().withParameters((method = PCA, target_dimension = 2)).embedUsing(X));
++    const auto tapkee_obj = tapkee::initialize().withParameters((method = PCA, target_dimension = 2));
++    ASSERT_NO_THROW(output = tapkee_obj.embedUsing(X));
  
      auto projected = output.projection(X.col(0));
      ASSERT_EQ(2, projected.size())

I haven't gotten around to this as much as I would have liked, so I'm noting it here. From tomorrow on I'll have more time again, and I'm aiming to get it working this week. 🤓

Add Python wrapper module

The current borsch script simply writes the data to a file and invokes tapkee_cli to interface with the library. It should be possible (and much cleaner) to provide a simple wrapper module to call tapkee functions directly from Python.

Issue in PCA's projection function

Hi all,

Thanks for this library; I think it can be useful for my own research.

I am trying to use the projection function of PCA. However, the result of the projection is always a vector whose length equals the dimension of the high-dimensional input space. Also, checking the dimensions of the matrices and vectors used in the projection function suggests that something is off. Is this a bug, or am I missing something?

I suspect that a .transpose() is missing in the definition of the projection function here.

I tested this on my side, and with my modified projection function I managed to recover the embedding by projecting the training dataset.

Thanks
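The reported symptom is consistent with a missing transpose. Assuming the principal directions are stored as the columns of a D x d matrix P (D = input dimension, d = target dimension), projecting a point x should compute P^T (x - mean), which has length d; a product with P itself yields a length-D vector. The sketch below uses plain std::vector rather than Eigen, and pca_project is a hypothetical helper, not tapkee's projection function; it only illustrates the dimension bookkeeping.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // row-major, rows of equal length
using Vector = std::vector<double>;

// Project a D-dimensional point x onto d principal directions.
// `components` is assumed to be D x d (one principal direction per column),
// so the result is components^T * (x - mean), a d-dimensional vector.
Vector pca_project(const Matrix& components, const Vector& mean, const Vector& x) {
    const std::size_t D = components.size();
    if (D == 0 || x.size() != D || mean.size() != D)
        throw std::invalid_argument("dimension mismatch");
    const std::size_t d = components[0].size();
    Vector out(d, 0.0);
    for (std::size_t j = 0; j < d; ++j)          // one output value per component
        for (std::size_t i = 0; i < D; ++i)      // dot(column j, x - mean)
            out[j] += components[i][j] * (x[i] - mean[i]);
    return out;
}
```

With P = [[1, 0], [0, 1], [0, 0]], a zero mean and x = (2, 3, 4), the result has length 2 and equals (2, 3).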

bh-sne and MNIST example does not seem to work

After retrieving the mnist4000.dat file from the benchmarks repo, and scrutinizing the mnist.py code, I was able to get the example to start running. However, at the moment, only t-sne seems to be supported, and I get the following error:

python examples/mnist/mnist.py examples/mnist/mnist4000.dat
Traceback (most recent call last):
  File "examples/mnist/mnist.py", line 56, in <module>
    embedding, data = embed(sys.argv[1])
  File "examples/mnist/mnist.py", line 6, in embed
    json_data = json.load(file)
  File "/usr/local/lib/python2.7/json/__init__.py", line 278, in load
    **kw)
  File "/usr/local/lib/python2.7/json/__init__.py", line 326, in loads
    return _default_decoder.decode(s)
  File "/usr/local/lib/python2.7/json/decoder.py", line 369, in decode
    raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 1 column 5 - line 4001 column 1 (char 5 - 15680000)

arpack eigendecomposition and dense eigendecomposition gave different eigenvalues

I am using the isomap.

I tried arpack eigendecomposition and it gave the two largest eigenvalues of a shortest distance matrix as
-7.12559e+06 4.72011e+06

But when I tried the dense eigendecomposition, it gave the two largest eigenvalues of the same shortest distance matrix as
3.62429e+06 4.72011e+06

By the way, I have symmetrized the neighbors (see issue #30) and have checked that the input matrix of the eigendecomposition is symmetric.

arpack eigendecomposition and dense eigendecomposition agree on the second eigenvalue but not on the first one in this case.

In the embedIsomap() function, there is the following code block:

for (IndexType i=0; i < static_cast<IndexType>(p_target_dimension); i++)
    embedding.first.col(i).array() *= sqrt(embedding.second(i));

Since arpack eigendecomposition returned a negative eigenvalue, calling sqrt() here does not work in this case.

I tried increasing ncv from the current 20 to 100 in arpack_wrapper.hpp, but it did not help: it still gave the two eigenvalues -7.12559e+06 4.72011e+06.

By the way, the input symmetric matrix is 5000 by 5000.

I would very much appreciate any idea on the arpack eigendecomposition issue in my case.
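The numbers above fit one plausible explanation: -7.12559e+06 has a larger magnitude than 4.72011e+06, so if the ARPACK call requests the eigenvalues of largest magnitude (which = "LM" in ARPACK's symmetric driver) rather than the largest algebraic ones (which = "LA"), returning that negative eigenvalue is correct behavior, and raising ncv would not change it. Negative eigenvalues can occur because the matrix decomposed from geodesic distances need not be positive semidefinite. Whether tapkee's arpack_wrapper.hpp actually uses "LM" is an assumption here. The hypothetical helpers below only illustrate the two selection rules and a sqrt guard that zeroes a column instead of producing NaN.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Keep the k eigenvalues of largest magnitude |lambda|,
// the kind of selection ARPACK performs with which = "LM".
std::vector<double> top_k_by_magnitude(std::vector<double> ev, std::size_t k) {
    std::sort(ev.begin(), ev.end(),
              [](double a, double b) { return std::fabs(a) > std::fabs(b); });
    ev.resize(std::min(k, ev.size()));
    return ev;
}

// Keep the k algebraically largest eigenvalues, the selection ARPACK
// performs with which = "LA" and the one classical MDS / Isomap needs.
std::vector<double> top_k_algebraic(std::vector<double> ev, std::size_t k) {
    std::sort(ev.begin(), ev.end(), std::greater<double>());
    ev.resize(std::min(k, ev.size()));
    return ev;
}

// Scale factor for one embedding column, sqrt(lambda). Negative eigenvalues
// are clamped to 0, so the column is zeroed instead of filled with NaN.
double column_scale(double lambda) {
    return std::sqrt(std::max(lambda, 0.0));
}
```

Fed the reported values, the magnitude rule picks -7.12559e+06 and 4.72011e+06 (the ARPACK result), while the algebraic rule picks 4.72011e+06 and 3.62429e+06 (the dense result).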

Addition of a catch-all handler?

Would you like to add the construct "catch(...)" in the function "main"?
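Without any handler, an exception escaping main calls std::terminate, and whether the stack is unwound first is implementation-defined. A minimal sketch of what the requested catch-all could look like, with the body factored into a hypothetical guarded_main helper (not part of tapkee_cli) so that it can be exercised on its own; in the real program the same try/catch would sit directly inside main.

```cpp
#include <cassert>
#include <exception>
#include <iostream>
#include <stdexcept>

// Run `body` and turn any escaping exception into a non-zero exit code
// plus a message on stderr, instead of letting std::terminate fire.
template <typename Body>
int guarded_main(Body body) {
    try {
        return body();
    } catch (const std::exception& e) {
        std::cerr << "error: " << e.what() << '\n';
        return 1;
    } catch (...) {
        // Anything not derived from std::exception, e.g. `throw 42;`.
        std::cerr << "error: unknown exception\n";
        return 2;
    }
}
```

Catching std::exception first keeps the more informative what() message; the catch (...) clause is the fallback the issue asks for.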

Delete out of sample estimators if not used

Implement plain C interface

Crash on empty input

Currently, the command-line app crashes when you pass it an empty file.
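A minimal sketch of input validation that reports an error instead of crashing, assuming plain whitespace-separated numeric rows. read_rows is a hypothetical helper, not the reader tapkee_cli actually uses; the point is to check for "no data at all" before any matrix is sized from the input.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Read numeric rows from `in`, one row per line. Parsing of a line stops at
// the first non-numeric token. Throws instead of returning an empty matrix
// when the input holds no data, or when rows disagree in length.
std::vector<std::vector<double>> read_rows(std::istream& in) {
    std::vector<std::vector<double>> rows;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream ls(line);
        std::vector<double> row;
        double v;
        while (ls >> v) row.push_back(v);
        if (row.empty()) continue;  // skip blank lines
        if (!rows.empty() && row.size() != rows.front().size())
            throw std::runtime_error("inconsistent number of columns");
        rows.push_back(row);
    }
    if (rows.empty())
        throw std::runtime_error("input contains no data");
    return rows;
}
```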

Remove unnecessary null pointer checks

An extra null pointer check is not needed in functions like the following.
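The functions the issue refers to are not shown here, so this is a generic illustration of the pattern: applying delete to a null pointer is a no-op in C++, and so is std::free on a null pointer, which makes a guard such as `if (p) delete p;` redundant. release and release_raw are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdlib>

// No null check needed: `delete` on a null pointer does nothing.
template <typename T>
void release(T*& p) {
    delete p;
    p = nullptr;  // avoid leaving a dangling pointer behind
}

// Same for memory from malloc: std::free(nullptr) does nothing.
void release_raw(void*& p) {
    std::free(p);
    p = nullptr;
}
```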

Add a Homebrew cask

More in https://docs.brew.sh/Adding-Software-to-Homebrew#casks

Problem with make and eigen3

I tried to build tapkee with make, but got the error below.
I installed eigen3 with sudo apt-get install libeigen3-dev.
It is probably related to this: https://bugs.launchpad.net/ubuntu/+source/eigen3/+bug/1610265 ; I'm on Ubuntu 16.04 LTS.

============================================================================
[ 50%] Building CXX object CMakeFiles/tapkee.dir/src/cli/main.cpp.o
In file included from /home/mglowacki/tapkee-master/include/tapkee/methods.hpp:31:0,
                 from /home/mglowacki/tapkee-master/include/tapkee/embed.hpp:11,
                 from /home/mglowacki/tapkee-master/include/tapkee/tapkee.hpp:10,
                 from /home/mglowacki/tapkee-master/src/cli/main.cpp:6:
/home/mglowacki/tapkee-master/include/tapkee/external/barnes_hut_sne/tsne.hpp: In member function ‘void tsne::TSNE::run(tapkee::DenseMatrix&, int, int, tapkee::ScalarType*, int, tapkee::ScalarType, tapkee::ScalarType)’:
/home/mglowacki/tapkee-master/include/tapkee/external/barnes_hut_sne/tsne.hpp:70:9: warning: unused variable ‘total_time’ [-Wunused-variable]
   float total_time = .0;
         ^
In file included from /usr/include/eigen3/Eigen/Eigenvalues:38:0,
                 from /usr/include/eigen3/Eigen/Dense:7,
                 from /usr/include/eigen3/Eigen/Eigen:1,
                 from /home/mglowacki/tapkee-master/include/tapkee/defines/eigen3.hpp:12,
                 from /home/mglowacki/tapkee-master/include/tapkee/defines.hpp:23,
                 from /home/mglowacki/tapkee-master/include/tapkee/embed.hpp:10,
                 from /home/mglowacki/tapkee-master/include/tapkee/tapkee.hpp:10,
                 from /home/mglowacki/tapkee-master/src/cli/main.cpp:6:
/usr/include/eigen3/Eigen/src/Eigenvalues/SelfAdjointEigenSolver.h: In instantiation of ‘Eigen::SelfAdjointEigenSolver<MatrixType>& Eigen::SelfAdjointEigenSolver<_MatrixType>::compute(const Eigen::EigenBase<OtherDerived>&, int) [with InputType = Eigen::SelfAdjointView<Eigen::Matrix<double, -1, -1>, 2u>; _MatrixType = Eigen::Matrix<double, -1, -1>]’:
/usr/include/eigen3/Eigen/src/Eigenvalues/SelfAdjointEigenSolver.h:168:14:   required from ‘Eigen::SelfAdjointEigenSolver<_MatrixType>::SelfAdjointEigenSolver(const Eigen::EigenBase<OtherDerived>&, int) [with InputType = Eigen::SelfAdjointView<Eigen::Matrix<double, -1, -1>, 2u>; _MatrixType = Eigen::Matrix<double, -1, -1>]’
/home/mglowacki/tapkee-master/include/tapkee/routines/eigendecomposition.hpp:60:77:   required from ‘tapkee::tapkee_internal::EigendecompositionResult tapkee::tapkee_internal::eigendecomposition_impl_dense(const MatrixType&, tapkee::IndexType, unsigned int) [with MatrixType = Eigen::Matrix<double, -1, -1>; MatrixOperationType = tapkee::tapkee_internal::DenseMatrixOperation; tapkee::tapkee_internal::EigendecompositionResult = std::pair<Eigen::Matrix<double, -1, -1>, Eigen::Matrix<double, -1, 1> >; tapkee::IndexType = int]’
/home/mglowacki/tapkee-master/include/tapkee/routines/eigendecomposition.hpp:202:47:   required from here
/usr/include/eigen3/Eigen/src/Eigenvalues/SelfAdjointEigenSolver.h:428:7: error: ‘const class Eigen::SelfAdjointView<Eigen::Matrix<double, -1, -1>, 2u>’ has no member named ‘triangularView’
   mat = matrix.template triangularView<Lower>();
       ^
CMakeFiles/tapkee.dir/build.make:62: recipe for target 'CMakeFiles/tapkee.dir/src/cli/main.cpp.o' failed
make[2]: *** [CMakeFiles/tapkee.dir/src/cli/main.cpp.o] Error 1
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/tapkee.dir/all' failed
make[1]: *** [CMakeFiles/tapkee.dir/all] Error 2
Makefile:127: recipe for target 'all' failed
make: *** [all] Error 2

Out-of-sample projections

Implement factor analysis

ezoptionparser error

        case S1: CHECKRANGE(S1,char); break;
        case U1: CHECKRANGE(U1,unsigned char); break;
        case S2: CHECKRANGE(S2,short); break;
        case U2: CHECKRANGE(U2,unsigned short); break;
        case S4: CHECKRANGE(S4,int); break;
        case U4: CHECKRANGE(U4,unsigned int); break;

Error 34 error C2589: '(' : illegal token on right side of '::' ...
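The body of CHECKRANGE is not shown above, so this is an assumption: if it expands to std::numeric_limits<type>::max() or ::min(), C2589 is the classic symptom of <windows.h> defining function-like min and max macros. Two common fixes are defining NOMINMAX before including <windows.h>, or parenthesizing the qualified name so the macro cannot expand (a function-like macro only expands when its name is directly followed by "("). fits_in is a hypothetical range check written in that parenthesized style.

```cpp
#include <cassert>
#include <limits>

// Does `value` fit in integer type T? Valid for types whose whole range is
// representable in long long (e.g. char, short, int, unsigned int on
// common platforms).
template <typename T>
bool fits_in(long long value) {
    // The parentheses around std::numeric_limits<T>::min / ::max keep a
    // function-like `min`/`max` macro from expanding.
    const long long lo = static_cast<long long>((std::numeric_limits<T>::min)());
    const long long hi = static_cast<long long>((std::numeric_limits<T>::max)());
    return value >= lo && value <= hi;
}
```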

Failed to integrate and compile

In file included from /root/work/include/Eigen/Core:297:0,
                 from /root/work/include/Eigen/Dense:1,
                 from /root/work/include/Eigen/Eigen:1,
                 from /root/work/include/omni/frontalizer_cpu.h:4,
                 from /root/work/src/main.cpp:28:
/root/work/include/Eigen/src/Core/DenseBase.h: In instantiation of ‘void Eigen::DenseBase<Derived>::swap(const Eigen::DenseBase<OtherDerived>&) [with OtherDerived = Eigen::Matrix<double, -1, -1>; Derived = Eigen::Matrix<double, -1, -1>]’:
/root/work/include/Eigen/src/Core/PlainObjectBase.h:852:33:   required from ‘void Eigen::PlainObjectBase<Derived>::swap(const Eigen::DenseBase<OtherDerived>&) [with OtherDerived = Eigen::Matrix<double, -1, -1>; Derived = Eigen::Matrix<double, -1, -1>]’
/root/work/include/tapkee/defines.hpp:52:20:   required from here
/root/work/include/Eigen/src/Core/util/StaticAssert.h:32:40: error: static assertion failed: THIS_EXPRESSION_IS_NOT_A_LVALUE__IT_IS_READ_ONLY
     #define EIGEN_STATIC_ASSERT(X,MSG) static_assert(X,#MSG);
                                        ^
/root/work/include/Eigen/src/Core/DenseBase.h:427:7: note: in expansion of macro ‘EIGEN_STATIC_ASSERT’
       EIGEN_STATIC_ASSERT(!OtherDerived::IsPlainObjectBase,THIS_EXPRESSION_IS_NOT_A_LVALUE__IT_IS_READ_ONLY);
       ^
make[2]: *** [src/CMakeFiles/omni.dir/main.cpp.o] Error 1
make[1]: *** [src/CMakeFiles/omni.dir/all] Error 2
make: *** [all] Error 2

Complete quoting for parameters of some CMake commands

Some parameters (like "${CMAKE_CURRENT_SOURCE_DIR}/lib" and "${CMAKE_BINARY_DIR}") are passed to CMake commands in your build script without being enclosed in quotation marks. This will cause build problems if the values of these variables contain special characters such as spaces.

I would recommend applying the advice from a Wiki article.

Crashes when building with MinGW

When I enable the VexCL examples, the build fails: the compiler (cc1plus.exe) reports that it is out of memory and aborts.
Here is the build output:

10:58:30 *** Build of project Tapkee-Release@tapkee ****
"C:\TDM-GCC-64\bin\mingw32-make.exe" -C E:/Binaries_MinGW/tapkee tapkee_cli
mingw32-make: Entering directory 'E:/Binaries_MinGW/tapkee'
"C:\Program Files (x86)\CMake 2.8\bin\cmake.exe" -HE:\Sources\tapkee-master -BE:\Binaries_MinGW\tapkee --check-build-system CMakeFiles\Makefile.cmake 0
C:/TDM-GCC-64/bin/mingw32-make -f CMakeFiles\Makefile2 tapkee_cli
mingw32-make[1]: Entering directory 'E:/Binaries_MinGW/tapkee'
"C:\Program Files (x86)\CMake 2.8\bin\cmake.exe" -HE:\Sources\tapkee-master -BE:\Binaries_MinGW\tapkee --check-build-system CMakeFiles\Makefile.cmake 0
"C:\Program Files (x86)\CMake 2.8\bin\cmake.exe" -E cmake_progress_start E:\Binaries_MinGW\tapkee\CMakeFiles 1
C:/TDM-GCC-64/bin/mingw32-make -f CMakeFiles\Makefile2 CMakeFiles/tapkee_cli.dir/all
mingw32-make[2]: Entering directory 'E:/Binaries_MinGW/tapkee'
C:/TDM-GCC-64/bin/mingw32-make -f CMakeFiles\tapkee_cli.dir\build.make CMakeFiles/tapkee_cli.dir/depend
mingw32-make[3]: Entering directory 'E:/Binaries_MinGW/tapkee'
"C:\Program Files (x86)\CMake 2.8\bin\cmake.exe" -E cmake_depends "MinGW Makefiles" E:\Sources\tapkee-master E:\Sources\tapkee-master E:\Binaries_MinGW\tapkee E:\Binaries_MinGW\tapkee E:\Binaries_MinGW\tapkee\CMakeFiles\tapkee_cli.dir\DependInfo.cmake --color=
mingw32-make[3]: Leaving directory 'E:/Binaries_MinGW/tapkee'
C:/TDM-GCC-64/bin/mingw32-make -f CMakeFiles\tapkee_cli.dir\build.make CMakeFiles/tapkee_cli.dir/build
mingw32-make[3]: Entering directory 'E:/Binaries_MinGW/tapkee'
"C:\Program Files (x86)\CMake 2.8\bin\cmake.exe" -E cmake_progress_report E:\Binaries_MinGW\tapkee\CMakeFiles 4
[100%] Building CXX object CMakeFiles/tapkee_cli.dir/src/cli/main.cpp.obj
C:\TDM-GCC-64\bin\g++.exe -DTAPKEE_WITH_VIENNACL -fopenmp -Wall -Wextra -pedantic -Wno-long-long -Wshadow -O3 -DNDEBUG @CMakeFiles/tapkee_cli.dir/includes_CXX.rsp -o CMakeFiles\tapkee_cli.dir\src\cli\main.cpp.obj -c E:\Sources\tapkee-master\src\cli\main.cpp
In file included from E:/ThirdPartyLibraries_MinGW/64Bit/ViennaCL/include/viennacl/backend/mem_handle.hpp:28:0,
from E:/ThirdPartyLibraries_MinGW/64Bit/ViennaCL/include/viennacl/backend/memory.hpp:28,
from E:/ThirdPartyLibraries_MinGW/64Bit/ViennaCL/include/viennacl/scalar.hpp:28,
from E:/ThirdPartyLibraries_MinGW/64Bit/ViennaCL/include/viennacl/matrix.hpp:26,
from E:/Sources/tapkee-master/include/tapkee/routines/matrix_operations.hpp:16,
from E:/Sources/tapkee-master/include/tapkee/routines/eigendecomposition.hpp:19,
from E:/Sources/tapkee-master/include/tapkee/routines/locally_linear.hpp:10,
from E:/Sources/tapkee-master/include/tapkee/methods.hpp:18,
from E:/Sources/tapkee-master/include/tapkee/embed.hpp:11,
from E:/Sources/tapkee-master/include/tapkee/tapkee.hpp:10,
from E:\Sources\tapkee-master\src\cli\main.cpp:6:
E:/ThirdPartyLibraries_MinGW/64Bit/ViennaCL/include/viennacl/tools/shared_ptr.hpp: In constructor 'viennacl::tools::detail::count::count(unsigned int)':
E:/ThirdPartyLibraries_MinGW/64Bit/ViennaCL/include/viennacl/tools/shared_ptr.hpp:42:35: warning: declaration of 'val' shadows a member of 'this' [-Wshadow]
count(unsigned int val) : val_(val){ }
^
In file included from E:/ThirdPartyLibraries_MinGW/64Bit/ViennaCL/include/viennacl/matrix.hpp:26:0,
from E:/Sources/tapkee-master/include/tapkee/routines/matrix_operations.hpp:16,
from E:/Sources/tapkee-master/include/tapkee/routines/eigendecomposition.hpp:19,
from E:/Sources/tapkee-master/include/tapkee/routines/locally_linear.hpp:10,
from E:/Sources/tapkee-master/include/tapkee/methods.hpp:18,
from E:/Sources/tapkee-master/include/tapkee/embed.hpp:11,
from E:/Sources/tapkee-master/include/tapkee/tapkee.hpp:10,
from E:\Sources\tapkee-master\src\cli\main.cpp:6:
E:/ThirdPartyLibraries_MinGW/64Bit/ViennaCL/include/viennacl/scalar.hpp: In constructor 'viennacl::scalar_expression<LHS, RHS, OP>::scalar_expression(LHS&, RHS&)':
E:/ThirdPartyLibraries_MinGW/64Bit/ViennaCL/include/viennacl/scalar.hpp:53:49: warning: declaration of 'rhs' shadows a member of 'this' [-Wshadow]
scalar_expression(LHS & lhs, RHS & rhs) : lhs_(lhs), rhs_(rhs) {}
^
E:/ThirdPartyLibraries_MinGW/64Bit/ViennaCL/include/viennacl/scalar.hpp:53:49: warning: declaration of 'lhs' shadows a member of 'this' [-Wshadow]
E:/ThirdPartyLibraries_MinGW/64Bit/ViennaCL/include/viennacl/scalar.hpp: In constructor 'viennacl::scalar_expression<LHS, RHS, viennacl::op_inner_prod>::scalar_expression(LHS&, RHS&)':
E:/ThirdPartyLibraries_MinGW/64Bit/ViennaCL/include/viennacl/scalar.hpp:88:49: warning: declaration of 'rhs' shadows a member of 'this' [-Wshadow]
scalar_expression(LHS & lhs, RHS & rhs) : lhs_(lhs), rhs_(rhs) {}
^
E:/ThirdPartyLibraries_MinGW/64Bit/ViennaCL/include/viennacl/scalar.hpp:88:49: warning: declaration of 'lhs' shadows a member of 'this' [-Wshadow]
E:/ThirdPartyLibraries_MinGW/64Bit/ViennaCL/include/viennacl/scalar.hpp: In constructor 'viennacl::scalar_expression<LHS, RHS, viennacl::op_norm_1>::scalar_expression(LHS&, RHS&)':
E:/ThirdPartyLibraries_MinGW/64Bit/ViennaCL/include/viennacl/scalar.hpp:121:49: warning: declaration of 'rhs' shadows a member of 'this' [-Wshadow]
scalar_expression(LHS & lhs, RHS & rhs) : lhs_(lhs), rhs_(rhs) {}
^
E:/ThirdPartyLibraries_MinGW/64Bit/ViennaCL/include/viennacl/scalar.hpp:121:49: warning: declaration of 'lhs' shadows a member of 'this' [-Wshadow]
E:/ThirdPartyLibraries_MinGW/64Bit/ViennaCL/include/viennacl/scalar.hpp: In constructor 'viennacl::scalar_expression<LHS, RHS, viennacl::op_norm_2>::scalar_expression(LHS&, RHS&)':
E:/ThirdPartyLibraries_MinGW/64Bit/ViennaCL/include/viennacl/scalar.hpp:153:49: warning: declaration of 'rhs' shadows a member of 'this' [-Wshadow]
scalar_expression(LHS & lhs, RHS & rhs) : lhs_(lhs), rhs_(rhs) {}
^
E:/ThirdPartyLibraries_MinGW/64Bit/ViennaCL/include/viennacl/scalar.hpp:153:49: warning: declaration of 'lhs' shadows a member of 'this' [-Wshadow]
E:/ThirdPartyLibraries_MinGW/64Bit/ViennaCL/include/viennacl/scalar.hpp: In constructor 'viennacl::scalar_expression<LHS, RHS, viennacl::op_norm_inf>::scalar_expression(LHS&, RHS&)':
E:/ThirdPartyLibraries_MinGW/64Bit/ViennaCL/include/viennacl/scalar.hpp:186:49: warning: declaration of 'rhs' shadows a member of 'this' [-Wshadow]
scalar_expression(LHS & lhs, RHS & rhs) : lhs_(lhs), rhs_(rhs) {}
^
E:/ThirdPartyLibraries_MinGW/64Bit/ViennaCL/include/viennacl/scalar.hpp:186:49: warning: declaration of 'lhs' shadows a member of 'this' [-Wshadow]
E:/ThirdPartyLibraries_MinGW/64Bit/ViennaCL/include/viennacl/scalar.hpp: In constructor 'viennacl::scalar_expression<LHS, RHS, viennacl::op_norm_frobenius>::scalar_expression(LHS&, RHS&)':
E:/ThirdPartyLibraries_MinGW/64Bit/ViennaCL/include/viennacl/scalar.hpp:218:49: warning: declaration of 'rhs' shadows a member of 'this' [-Wshadow]
scalar_expression(LHS & lhs, RHS & rhs) : lhs_(lhs), rhs_(rhs) {}
^
E:/ThirdPartyLibraries_MinGW/64Bit/ViennaCL/include/viennacl/scalar.hpp:218:49: warning: declaration of 'lhs' shadows a member of 'this' [-Wshadow]
In file included from E:/ThirdPartyLibraries_MinGW/64Bit/ViennaCL/include/viennacl/matrix.hpp:27:0,
from E:/Sources/tapkee-master/include/tapkee/routines/matrix_operations.hpp:16,
from E:/Sources/tapkee-master/include/tapkee/routines/eigendecomposition.hpp:19,
from E:/Sources/tapkee-master/include/tapkee/routines/locally_linear.hpp:10,
from E:/Sources/tapkee-master/include/tapkee/methods.hpp:18,
from E:/Sources/tapkee-master/include/tapkee/embed.hpp:11,
from E:/Sources/tapkee-master/include/tapkee/tapkee.hpp:10,
from E:\Sources\tapkee-master\src\cli\main.cpp:6:
E:/ThirdPartyLibraries_MinGW/64Bit/ViennaCL/include/viennacl/vector.hpp: In constructor 'viennacl::const_vector_iterator<SCALARTYPE, ALIGNMENT>::const_vector_iterator(const viennacl::vector_base&, viennacl::vcl_size_t, viennacl::vcl_size_t, viennacl::vcl_ptrdiff_t)':
E:/ThirdPartyLibraries_MinGW/64Bit/ViennaCL/include/viennacl/vector.hpp:234:55: warning: declaration of 'stride' shadows a member of 'this' [-Wshadow]
vcl_ptrdiff_t stride = 1) : elements_(vec.handle()), index_(index), start_(start), stride_(stride) {}
^
E:/ThirdPartyLibraries_MinGW/64Bit/ViennaCL/include/viennacl/vector.hpp: In constructor 'viennacl::const_vector_iterator<SCALARTYPE, ALIGNMENT>::const_vector_iterator(const handle_type&, viennacl::vcl_size_t, viennacl::vcl_size_t, viennacl::vcl_ptrdiff_t)':
E:/ThirdPartyLibraries_MinGW/64Bit/ViennaCL/include/viennacl/vector.hpp:245:55: warning: declaration of 'stride' shadows a member of 'this' [-Wshadow]
vcl_ptrdiff_t stride = 1) : elements_(elements), index_(index), start_(start), stride_(stride) {}
^
E:/ThirdPartyLibraries_MinGW/64Bit/ViennaCL/include/viennacl/vector.hpp: In constructor 'viennacl::vector_base<SCALARTYPE, SizeType, DistanceType>::vector_base(SCALARTYPE, viennacl::memory_types, viennacl::vector_base<SCALARTYPE, SizeType, DistanceType>::size_type, viennacl::vcl_size_t, viennacl::vector_base<SCALARTYPE, SizeType, DistanceType>::difference_type)':
E:/ThirdPartyLibraries_MinGW/64Bit/ViennaCL/include/viennacl/vector.hpp:408:9: warning: declaration of 'stride' shadows a member of 'this' [-Wshadow]
: size_(vec_size), start_(start), stride_(stride), internal_size_(vec_size)
^
E:/ThirdPartyLibraries_MinGW/64Bit/ViennaCL/include/viennacl/vector.hpp:408:9: warning: declaration of 'start' shadows a member of 'this' [-Wshadow]
E:/ThirdPartyLibraries_MinGW/64Bit/ViennaCL/include/viennacl/vector.hpp: In member function 'void viennacl::vector_base<SCALARTYPE, SizeType, DistanceType>::pad()':
E:/ThirdPartyLibraries_MinGW/64Bit/ViennaCL/include/viennacl/vector.hpp:916:39: warning: declaration of 'pad' shadows a member of 'this' [-Wshadow]
std::vector pad(internal_size() - size());
^
In file included from E:/Sources/tapkee-master/include/tapkee/routines/matrix_operations.hpp:16:0,
from E:/Sources/tapkee-master/include/tapkee/routines/eigendecomposition.hpp:19,
from E:/Sources/tapkee-master/include/tapkee/routines/locally_linear.hpp:10,
from E:/Sources/tapkee-master/include/tapkee/methods.hpp:18,
from E:/Sources/tapkee-master/include/tapkee/embed.hpp:11,
from E:/Sources/tapkee-master/include/tapkee/tapkee.hpp:10,
from E:\Sources\tapkee-master\src\cli\main.cpp:6:
E:/ThirdPartyLibraries_MinGW/64Bit/ViennaCL/include/viennacl/matrix.hpp: In constructor 'viennacl::implicit_matrix_base::implicit_matrix_base(viennacl::implicit_matrix_base::size_type, viennacl::implicit_matrix_base::size_type, std::pair<SCALARTYPE, bool>, bool)':
E:/ThirdPartyLibraries_MinGW/64Bit/ViennaCL/include/viennacl/matrix.hpp:48:108: warning: declaration of 'diag' shadows a member of 'this' [-Wshadow]
implicit_matrix_base(size_type size1, size_type size2, std::pair<SCALARTYPE, bool> value, bool diag) : size1_(size1), size2_(size2), value_(value), diag_(diag){ }
^
E:/ThirdPartyLibraries_MinGW/64Bit/ViennaCL/include/viennacl/matrix.hpp:48:108: warning: declaration of 'value' shadows a member of 'this' [-Wshadow]
E:/ThirdPartyLibraries_MinGW/64Bit/ViennaCL/include/viennacl/matrix.hpp:48:108: warning: declaration of 'size2' shadows a member of 'this' [-Wshadow]
E:/ThirdPartyLibraries_MinGW/64Bit/ViennaCL/include/viennacl/matrix.hpp:48:108: warning: declaration of 'size1' shadows a member of 'this' [-Wshadow]
In file included from E:/Sources/tapkee-master/include/tapkee/routines/matrix_operations.hpp:16:0,
from E:/Sources/tapkee-master/include/tapkee/routines/eigendecomposition.hpp:19,
from E:/Sources/tapkee-master/include/tapkee/routines/locally_linear.hpp:10,
from E:/Sources/tapkee-master/include/tapkee/methods.hpp:18,
from E:/Sources/tapkee-master/include/tapkee/embed.hpp:11,
from E:/Sources/tapkee-master/include/tapkee/tapkee.hpp:10,
from E:\Sources\tapkee-master\src\cli\main.cpp:6:
E:/ThirdPartyLibraries_MinGW/64Bit/ViennaCL/include/viennacl/matrix.hpp: In constructor 'viennacl::matrix_expression<LHS, RHS, OP>::matrix_expression(LHS&, RHS&)':
E:/ThirdPartyLibraries_MinGW/64Bit/ViennaCL/include/viennacl/matrix.hpp:170:47: warning: declaration of 'rhs' shadows a member of 'this' [-Wshadow]
matrix_expression(LHS & lhs, RHS & rhs) : lhs_(lhs), rhs_(rhs) {}
^
E:/ThirdPartyLibraries_MinGW/64Bit/ViennaCL/include/viennacl/matrix.hpp:170:47: warning: declaration of 'lhs' shadows a member of 'this' [-Wshadow]

cc1plus.exe: out of memory allocating 359557 bytes
CMakeFiles\tapkee_cli.dir\build.make:57: recipe for target 'CMakeFiles/tapkee_cli.dir/src/cli/main.cpp.obj' failed
mingw32-make[3]: *** [CMakeFiles/tapkee_cli.dir/src/cli/main.cpp.obj] Error 1
mingw32-make[3]: Leaving directory 'E:/Binaries_MinGW/tapkee'
mingw32-make[2]: *** [CMakeFiles/tapkee_cli.dir/all] Error 2
CMakeFiles\Makefile2:62: recipe for target 'CMakeFiles/tapkee_cli.dir/all' failed
mingw32-make[1]: *** [CMakeFiles/tapkee_cli.dir/rule] Error 2
mingw32-make: *** [tapkee_cli] Error 2
mingw32-make[2]: Leaving directory 'E:/Binaries_MinGW/tapkee'
CMakeFiles\Makefile2:74: recipe for target 'CMakeFiles/tapkee_cli.dir/rule' failed
mingw32-make[1]: Leaving directory 'E:/Binaries_MinGW/tapkee'
Makefile:149: recipe for target 'tapkee_cli' failed
mingw32-make: Leaving directory 'E:/Binaries_MinGW/tapkee'

10:58:34 Build Finished (took 4s.220ms)

MultidimensionalScaling with input type of std::vector<DenseVector>

Instead of using vector<IndexType> indices(N) as the input, I use std::vector<DenseVector> with the MultidimensionalScaling method. The code crashes every time. Can you give me an example of how to run on this type of data? Thanks.
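One hedged way around this is to keep the index-based input (vector<IndexType> indices(N)) and let the callback look up the actual points in a container the caller owns, which is how the callback interface decouples algorithms from data storage. The struct below is a hypothetical, standalone sketch using plain std::vector instead of DenseVector; its distance(l, r) method is modeled loosely on the index-based callbacks in tapkee's examples, and the exact signature tapkee expects should be checked against its documentation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

using Point = std::vector<double>;

// Hypothetical index-based distance callback: the points live in a container
// owned by the caller; the algorithm only ever passes indices 0..N-1.
struct IndexedEuclideanDistance {
    const std::vector<Point>& points;

    double distance(std::size_t l, std::size_t r) const {
        const Point& a = points.at(l);  // .at() throws on a bad index instead of crashing
        const Point& b = points.at(r);
        if (a.size() != b.size())
            throw std::invalid_argument("points differ in dimension");
        double sum = 0.0;
        for (std::size_t i = 0; i < a.size(); ++i) {
            const double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return std::sqrt(sum);
    }
};
```

Using .at() is a deliberate choice here: an out-of-range index then surfaces as an exception rather than a silent crash, which helps narrow down where the indices and the data disagree.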

Let tapkee_cli handle standard input and standard output

Thanks a lot for this great library! Wouldn't it be great if tapkee_cli could be invoked like this:

$ < iris-original.csv tapkee_cli --method t-sne > iris-mapped.csv

Right now, /dev/stdin has to be specified explicitly, tapkee_cli does write to /dev/stdout, and I believe error messages are not written to /dev/stderr. Moreover, I think it would better match users' expectations if each line of the input data were a data point rather than a feature. Curious to hear your thoughts about this.

In the meantime, I use this Bash script: https://github.com/jeroenjanssens/command-line-tools-for-data-science/blob/master/tools/tapkee, as I discuss Tapkee in Chapter 9 of my upcoming book.
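A minimal sketch of the requested behavior, assuming the common Unix convention that a missing path or "-" means the standard stream. open_input and open_output are hypothetical helpers, not tapkee_cli's actual option handling; diagnostics would then go to std::cerr so they never mix with the data written to std::cout.

```cpp
#include <cassert>
#include <fstream>
#include <iostream>
#include <memory>
#include <stdexcept>
#include <string>

// "-" or an empty path selects std::cin; anything else opens a file whose
// lifetime is held by `holder`, so the returned reference stays valid.
std::istream& open_input(const std::string& path, std::unique_ptr<std::ifstream>& holder) {
    if (path.empty() || path == "-") return std::cin;
    holder = std::make_unique<std::ifstream>(path);
    if (!*holder) throw std::runtime_error("cannot open input file: " + path);
    return *holder;
}

// Same convention for output: "-" or an empty path selects std::cout.
std::ostream& open_output(const std::string& path, std::unique_ptr<std::ofstream>& holder) {
    if (path.empty() || path == "-") return std::cout;
    holder = std::make_unique<std::ofstream>(path);
    if (!*holder) throw std::runtime_error("cannot open output file: " + path);
    return *holder;
}
```

With this convention, `< iris-original.csv tapkee_cli --method t-sne > iris-mapped.csv` would work without naming /dev/stdin or /dev/stdout.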