Giter Club home page Giter Club logo

dkm's People

Contributors

eozd avatar genbattle avatar ianfhunter avatar kam3k avatar muecs avatar randuck-dev avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar

dkm's Issues

converting std::array<T,N> to std::vector<T>

@genbattle, don't you think it would be a better idea if std::vector<T> is used instead of std::array<T,N>. This way one can have a more dynamic design.

I understand std::array is probably a bit faster than std::vector but perhaps not too much.

I have changed std::array<T,N> to std::vector<T> in my code because I needed it post-compilation based on dataset.

Are you aware of any significant speed drops?

invalid probability vector for discrete_distribution

Hello,

Thanks for this useful project. I think I have found a suspiscious behavior, you may be interested:

image

I have this assert raised in Microsoft Visual Studio 2017 when running the following code in debug:

void ReproduceAssert()
{
    std::vector<std::array<double, 1>> data;
    data.push_back(std::array<double,1>({1181.43797999999310377461}));
    data.push_back(std::array<double,1>({35.40983650000003990499}));
    data.push_back(std::array<double,1>({15.00000000000000000000}));
    data.push_back(std::array<double,1>({15.00000000000000000000}));
    data.push_back(std::array<double,1>({264.58983650000004672620}));
    data.push_back(std::array<double,1>({18.56102000000697671567}));
    data.push_back(std::array<double,1>({96.22983649999991939694}));
    data.push_back(std::array<double,1>({15.00000000000000000000}));
    data.push_back(std::array<double,1>({10.34999999999968167685}));
    data.push_back(std::array<double,1>({185.00000000000000000000}));
    data.push_back(std::array<double,1>({10.34999999999968167685}));
    data.push_back(std::array<double,1>({89.65000000000031832315}));
    data.push_back(std::array<double,1>({89.65000000000031832315}));
    data.push_back(std::array<double,1>({185.00000000000000000000}));
    data.push_back(std::array<double,1>({10.34999999999968167685}));
    data.push_back(std::array<double,1>({185.00000000000000000000}));
    data.push_back(std::array<double,1>({89.65000000000031832315}));
    dkm::clustering_parameters<double> params(10);
    params.set_random_seed(0);
    params.set_max_iteration(1000);
    auto result = dkm::kmeans_lloyd<double, 1>(data, params);
}

Utility functions for commonly occurring problems

First of all, I would like to thank you for your work implementing dkm. I have used it in one of my projects and it was a breeze to set up and use.

During the time I was in the process of incorporating dkm into the project, I have written several utility functions that helped me use the library more efficiently. Currently, I have functions to

  • calculate distance of each point to cluster center
  • calculate sum of distances of each point to cluster center
  • return all points that belong to the given cluster
  • calculate the inertia of a given clustering using sum of distances method
  • return the best clustering based on inertia
  • calculate kmeans by running dkm::kmeans_lloyd function a given number of times and return the best clustering

Would you like me to send a pull request with these functions added to dkm.hpp file under a util namespace?

P.S.: All the functions are template functions; so the library would still be header-only.

[Feature Request] Please add dkm::kmeans_lloyd that take std::vector<std::vector<..>>

Hi,
thanks for your great library!
Would it be possible for you to implement an overloaded version of the dkm::kmeans_lloyd method that
accepts a std::vector<std::vector> for the data instead of a std::vector<std::array<>>. I think that would make it much easier to use this implementation for certain code bases that require a dynamic amount of dimension (or at least you don't need to pass that number as a template parameter which makes it more error prone because you need to change it at multiple locations in the code)

Using unsigned fixed width integer types for data point arrays causes compile-time error

Description

Using fixed width unsigned integer types like uint8_t, uint16_t, or uint32_t causes a compile-time error.

Example

For instance, using uint8_t as the array data type:

array<uint8_t, 625> points;

And then invoking:

auto cluster_data = dkm::kmeans_lloyd(data, K);

Will throw the following compile-time error:

dkm/include/dkm.hpp: In instantiation of ‘std::tuple<std::vector<std::array<_Tp, _Nm>, std::allocator<std::array<_Tp, _Nm> > >, std::vector<unsigned int, std::allocator<unsigned int> > > dkm::kmeans_lloyd(const std::vector<std::array<_Tp, _Nm> >&, const dkm::clustering_parameters<T>&) [with T = unsigned char; long unsigned int N = 625]’:
dkm/include/dkm.hpp:315:21:   required from ‘std::tuple<std::vector<std::array<_Tp, _Nm>, std::allocator<std::array<_Tp, _Nm> > >, std::vector<unsigned int, std::allocator<unsigned int> > > dkm::kmeans_lloyd(const std::vector<std::array<_Tp, _Nm> >&, uint32_t, uint64_t, T) [with T = unsigned char; long unsigned int N = 625; uint32_t = unsigned int; uint64_t = long unsigned int]’
src/main.cpp:148:76:   required from here
dkm/include/dkm.hpp:273:2: error: static assertion failed: kmeans_lloyd requires the template parameter T to be a signed arithmetic type (e.g. float, double, int)
  static_assert(std::is_arithmetic<T>::value && std::is_signed<T>::value,
  ^~~~~~~~~~~~~

Implement N-dimensional array class to replace `std::vector<std::array<T, N>>`

Issues (#8 and #11) have been raised about the flexibility of the current std::vector<std::array<T, N>> pattern used for passing data matrices in and out of this library.

A more flexible alternative is to have a dedicated N-dimensional array class (dkm::ndarray<T>) which has convenient constructors that take both the existing std::vector<std::array<T, N>> and std::vector<std::vector<T>>, which has been requested for some use cases. This class would also have methods to explicitly convert back to these data types after running the kmeans algorithm.

Make code citable with a Digital Object Identifiers (DOI)

Description

The source code should be made citable for academic references: https://guides.github.com/activities/citable-code/

Proposed Solution

Apply a DOI:

A DOI, or Digital Object Identifier, is a string of numbers, letters and symbols used to permanently identify an article or document and link to it on the web. A DOI will help your reader easily locate a document from your citation. - source.

See GitHub instructions on "making your code citable": https://guides.github.com/activities/citable-code/

Is all the source code in include?

Hi. it seems that you wrote a templatized version.

I am assuming that all the library source code (for Kmeans) is in include folder and opencv is only required for an example. Further, we don't need openMP if we don't intend to use parallel version. And I am thinking about building it with VS2015 so won't need CMAKE either.

Are my assumptions correct?

static_assert(__which < 0, /* needs to be dependent */

Hi,

I was building your library on a 64bit x86 machine with GCC, no issues. Cross-compiling for a 32bit ARM platform however gives a very confusing error:

In file included from /usr/arm-linux-gnueabihf/include/c++/8/random:49,
                 from /mnt/project_ws/src/pasta/camera/applications/tracking/src/dkm.hpp:9,
                 from /mnt/project_ws/src/pasta/camera/applications/tracking/src/pattern_extractor.cc:8:
/usr/arm-linux-gnueabihf/include/c++/8/bits/random.h: In instantiation of ‘struct std::__detail::_Select_uint_least_t<99, 1>’:
/usr/arm-linux-gnueabihf/include/c++/8/bits/random.h:114:40:   required from ‘struct std::__detail::_Mod<long long unsigned int, 18446744073709551615, 25214903917, 11, false, false>’
/usr/arm-linux-gnueabihf/include/c++/8/bits/random.h:147:48:   required from ‘_Tp std::__detail::__mod(_Tp) [with _Tp = long long unsigned int; _Tp __m = 18446744073709551615; _Tp __a = 25214903917; _Tp __c = 11]’
/usr/arm-linux-gnueabihf/include/c++/8/bits/random.h:325:50:   required from ‘std::linear_congruential_engine<_UIntType, __a, __c, __m>::result_type std::linear_congruential_engine<_UIntType, __a, __c, __m>::operator()() [with _UIntType = long long unsigned int; _UIntType __a = 25214903917; _UIntType __c = 11; _UIntType __m = 18446744073709551615; std::linear_congruential_engine<_UIntType, __a, __c, __m>::result_type = long long unsigned int]’
/usr/arm-linux-gnueabihf/include/c++/8/bits/uniform_int_dist.h:243:31:   required from ‘std::uniform_int_distribution<_IntType>::result_type std::uniform_int_distribution<_IntType>::operator()(_UniformRandomNumberGenerator&, const std::uniform_int_distribution<_IntType>::param_type&) [with _UniformRandomNumberGenerator = std::linear_congruential_engine<long long unsigned int, 25214903917, 11, 18446744073709551615>; _IntType = unsigned int; std::uniform_int_distribution<_IntType>::result_type = unsigned int]’
/usr/arm-linux-gnueabihf/include/c++/8/bits/uniform_int_dist.h:166:51:   required from ‘std::uniform_int_distribution<_IntType>::result_type std::uniform_int_distribution<_IntType>::operator()(_UniformRandomNumberGenerator&) [with _UniformRandomNumberGenerator = std::linear_congruential_engine<long long unsigned int, 25214903917, 11, 18446744073709551615>; _IntType = unsigned int; std::uniform_int_distribution<_IntType>::result_type = unsigned int]’
/mnt/project_ws/src/pasta/camera/applications/tracking/src/dkm.hpp:81:43:   required from ‘std::vector<std::array<_Tp, _Nm> > dkm::details::random_plusplus(const std::vector<std::array<_Tp, _Nm> >&, uint32_t, uint64_t) [with T = float; unsigned int N = 3; uint32_t = unsigned int; uint64_t = long long unsigned int]’
/mnt/project_ws/src/pasta/camera/applications/tracking/src/dkm.hpp:278:65:   required from ‘std::tuple<std::vector<std::array<_Tp, _Nm>, std::allocator<std::array<_Tp, _Nm> > >, std::vector<unsigned int, std::allocator<unsigned int> > > dkm::kmeans_lloyd(const std::vector<std::array<_Tp, _Nm> >&, const dkm::clustering_parameters<T>&) [with T = float; unsigned int N = 3]’
/mnt/project_ws/src/pasta/camera/applications/tracking/src/dkm.hpp:316:22:   required from ‘std::tuple<std::vector<std::array<_Tp, _Nm>, std::allocator<std::array<_Tp, _Nm> > >, std::vector<unsigned int, std::allocator<unsigned int> > > dkm::kmeans_lloyd(const std::vector<std::array<_Tp, _Nm> >&, uint32_t, uint64_t, T) [with T = float; unsigned int N = 3; uint32_t = unsigned int; uint64_t = long long unsigned int]’
/mnt/project_ws/src/pasta/camera/applications/tracking/src/pattern_extractor.cc:47:48:   required from here
/usr/arm-linux-gnueabihf/include/c++/8/bits/random.h:84:24: error: static assertion failed: sorry, would be too much trouble for a slow result
  static_assert(__which < 0, /* needs to be dependent */
                ~~~~~~~~^~~
/usr/arm-linux-gnueabihf/include/c++/8/bits/random.h: In instantiation of ‘struct std::__detail::_Mod<long long unsigned int, 18446744073709551615, 25214903917, 11, false, false>’:
/usr/arm-linux-gnueabihf/include/c++/8/bits/random.h:147:48:   required from ‘_Tp std::__detail::__mod(_Tp) [with _Tp = long long unsigned int; _Tp __m = 18446744073709551615; _Tp __a = 25214903917; _Tp __c = 11]’
/usr/arm-linux-gnueabihf/include/c++/8/bits/random.h:325:50:   required from ‘std::linear_congruential_engine<_UIntType, __a, __c, __m>::result_type std::linear_congruential_engine<_UIntType, __a, __c, __m>::operator()() [with _UIntType = long long unsigned int; _UIntType __a = 25214903917; _UIntType __c = 11; _UIntType __m = 18446744073709551615; std::linear_congruential_engine<_UIntType, __a, __c, __m>::result_type = long long unsigned int]’
/usr/arm-linux-gnueabihf/include/c++/8/bits/uniform_int_dist.h:243:31:   required from ‘std::uniform_int_distribution<_IntType>::result_type std::uniform_int_distribution<_IntType>::operator()(_UniformRandomNumberGenerator&, const std::uniform_int_distribution<_IntType>::param_type&) [with _UniformRandomNumberGenerator = std::linear_congruential_engine<long long unsigned int, 25214903917, 11, 18446744073709551615>; _IntType = unsigned int; std::uniform_int_distribution<_IntType>::result_type = unsigned int]’
/usr/arm-linux-gnueabihf/include/c++/8/bits/uniform_int_dist.h:166:51:   required from ‘std::uniform_int_distribution<_IntType>::result_type std::uniform_int_distribution<_IntType>::operator()(_UniformRandomNumberGenerator&) [with _UniformRandomNumberGenerator = std::linear_congruential_engine<long long unsigned int, 25214903917, 11, 18446744073709551615>; _IntType = unsigned int; std::uniform_int_distribution<_IntType>::result_type = unsigned int]’
/mnt/project_ws/src/pasta/camera/applications/tracking/src/dkm.hpp:81:43:   required from ‘std::vector<std::array<_Tp, _Nm> > dkm::details::random_plusplus(const std::vector<std::array<_Tp, _Nm> >&, uint32_t, uint64_t) [with T = float; unsigned int N = 3; uint32_t = unsigned int; uint64_t = long long unsigned int]’
/mnt/project_ws/src/pasta/camera/applications/tracking/src/dkm.hpp:278:65:   required from ‘std::tuple<std::vector<std::array<_Tp, _Nm>, std::allocator<std::array<_Tp, _Nm> > >, std::vector<unsigned int, std::allocator<unsigned int> > > dkm::kmeans_lloyd(const std::vector<std::array<_Tp, _Nm> >&, const dkm::clustering_parameters<T>&) [with T = float; unsigned int N = 3]’
/mnt/project_ws/src/pasta/camera/applications/tracking/src/dkm.hpp:316:22:   required from ‘std::tuple<std::vector<std::array<_Tp, _Nm>, std::allocator<std::array<_Tp, _Nm> > >, std::vector<unsigned int, std::allocator<unsigned int> > > dkm::kmeans_lloyd(const std::vector<std::array<_Tp, _Nm> >&, uint32_t, uint64_t, T) [with T = float; unsigned int N = 3; uint32_t = unsigned int; uint64_t = long long unsigned int]’
/mnt/project_ws/src/pasta/camera/applications/tracking/src/pattern_extractor.cc:47:48:   required from here
/usr/arm-linux-gnueabihf/include/c++/8/bits/random.h:114:40: error: no type named ‘type’ in ‘struct std::__detail::_Select_uint_least_t<99, 1>’
            + std::__lg(__m) + 2>::type _Tp2;
                                        ^~~~
make[2]: *** [CMakeFiles/tracking.dir/build.make:89: CMakeFiles/tracking.dir/src/pattern_extractor.cc.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:137: CMakeFiles/tracking.dir/all] Error 2
make: *** [Makefile:141: all] Error 2

Changing line 76 in dkm.hpp from

  // std::linear_congruential_engine<uint64_t, 6364136223846793005, 1442695040888963407, UINT64_MAX> rand_engine(seed);

to

  std::linear_congruential_engine<std::uint_fast32_t, 48271, 0, 2147483647> rand_engine(seed);

I can put a pull request if you like, but if you have a particular preference for some other random engine, I can also put that one in.

Building for visual studio 2017 community edition

JFYI:

  1. It won't be able to find opencv out-of-the-box. So you have to put is manually:
    mkdir build
    cd build
    cmake -G"Visual Studio 15 2017 Win64" -DOpenCV_DIR=D:\opencv\build ../
  2. The generated project won't build, you have to load dkm.sln and remove extra compilation options for each project.

readme wrong information for return type of kmeans_lloyd()

in section Usage of readme, the return type of kmeans_lloyd() should be
std::tuple<std::vector<std::array<T, N>>, std::vector<uint32_t>>
not
std::tuple<std::array<T, N>, std::vector<uint32_t>>
which make me confuse at the first look

Can you expand in the readme to give more instructions?

I just found your repo because I am trying to make clusters from x, y coordinates that I extract from frames from ffmpeg and a remote camera. I ran some live data through the example given in the readme and I am not sure what the returned data is exactly. I have virtually no C++ experience, so please excuse me. When accessing std::cout<<std::get<0>(means)<<std::endl; I get an array of 2 arrays that are structured like [[482, 212], [191, 393]]. Are these the centroids of a cluster? Can your code be used to create a bounding box around clusters of coordinates and if yes, can you please show me? Thanks for the code!

Fix CI build

Fix Travis integration or migrate to Github Actions.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.