wmd's People

Contributors

mkusner, renaud, taineleau

wmd's Issues

No module named multiarray

I have installed numpy, and typing "import numpy.core.multiarray" in the Python shell works fine.
I don't know why this error appears:

luoyj@luoyj-Lenovo-M490:~/wmd-master$ python wmd.py twitter_vec.pk twitter_wmd_d.pk
Traceback (most recent call last):
  File "wmd.py", line 11, in <module>
    [X, BOW_X, y, C, words] = pickle.load(f)
  File "/usr/lib/python2.7/pickle.py", line 1378, in load
    return Unpickler(file).load()
  File "/usr/lib/python2.7/pickle.py", line 858, in load
    dispatch[key](self)
  File "/usr/lib/python2.7/pickle.py", line 1090, in load_global
    klass = self.find_class(module, name)
  File "/usr/lib/python2.7/pickle.py", line 1124, in find_class
    __import__(module)
ImportError: No module named multiarray
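
The traceback shows the error being raised while pickle resolves a module name stored inside the data file. One way to narrow this down is to list the module and class names a pickle refers to without actually loading it. The following is a minimal diagnostic sketch, written for Python 3; the `referenced_globals` helper and the stand-in payload are illustrative assumptions, and it only covers pickles written with protocols 0 to 3. It does not by itself explain or fix the error.

```python
import pickle
import pickletools
from fractions import Fraction


def referenced_globals(data):
    """Return the 'module name' strings named by GLOBAL opcodes in a pickle."""
    found = []
    for opcode, arg, _pos in pickletools.genops(data):
        # Protocols 0-3 store class references as GLOBAL opcodes whose
        # argument is the string "module name".
        if opcode.name == "GLOBAL":
            found.append(arg)
    return found


# Stand-in payload: a real check would read the bytes of the .pk file instead.
payload = pickle.dumps(Fraction(1, 3), protocol=2, fix_imports=False)
names = referenced_globals(payload)
print(names)
```

Running the same helper on the bytes of the actual .pk file would show which module path the file expects, which can then be compared with what the local numpy installation provides.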

Makefile:39: recipe for target '_emd.so' failed

Interesting topic and paper. I tried to build with the Makefile on Ubuntu (15.04) using Python 2.7, with all of the required libraries installed, but I get an error that I could not solve. Here is the output of running make:
[image: wmd- error page]

I would be thankful if you could help me solve this. Thanks.

Makefile:51: recipe for target 'emd_wrap.c' failed

# git clone https://github.com/mkusner/wmd.git
Cloning into 'wmd'...
remote: Counting objects: 41, done.
remote: Total 41 (delta 0), reused 1 (delta 0), pack-reused 40
Unpacking objects: 100% (41/41), done.
Checking connectivity... done.
# cd wmd/
# pip install gensim numpy scipy
# cd python-emd-master/
# make
>>> Building object file 'emd.o'.
    cc -o emd.o -c emd.c -fPIC -I/usr/include/python2.7 -I/usr/include/x86_64-linux-gnu/python2.7 
In file included from emd.c:20:0:
emd.h:22:0: warning: "INFINITY" redefined
 #define INFINITY       1e20
 ^
In file included from /usr/include/math.h:41:0,
                 from emd.c:18:
/usr/include/x86_64-linux-gnu/bits/inf.h:26:0: note: this is the location of the previous definition
 # define INFINITY (__builtin_inff())
 ^
In file included from emd.c:20:0:
emd.h:32:20: warning: extra tokens at end of #include directive
 #include "Python.h";
                    ^

>>> Generating C interface
swig -python emd.i
make: swig: Command not found
Makefile:51: recipe for target 'emd_wrap.c' failed
make: *** [emd_wrap.c] Error 127
rm emd.o

Current wmd implementation does not match GenSim

This is not really a bug report, but a question about compatibility with the GenSim library.

Using the first twitter corpus texts, i.e.

now all apple has to do is get swype on the iphone and it will be crack iphone that is

and

apple will be adding more carrier support to the iphone 4s just announced,

I get a distance of 0.99 with the GenSim WMD implementation and 2.6625 with this implementation (the original one, from the paper's author).

At first sight, I thought the difference came from your stop word list. However, debugging your code, I see that the first and second texts become:

apple swype iphone iphone crack
apple adding carrier support iphone 4s announced

However, running GenSim on the words above, I still get a completely different result: with your stop words filtered out (as above), GenSim returns a WMD of 0.96.

Is this compatibility discussed anywhere?
Could anybody please confirm whether the different implementations are expected to return the same numbers?

This strongly affects how useful the GenSim implementation is for finding semantically close texts.
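
When comparing two implementations, it can help to first confirm that both receive the same inputs. In the WMD formulation each document is represented as a normalized bag-of-words, where a word's weight is its count divided by the total number of words in the document. The sketch below computes those weights for the two filtered texts quoted above. It is not code from either library; the `nbow` helper is a hypothetical name, and whether the two implementations tokenize or preprocess word vectors identically is not established here.

```python
from collections import Counter


def nbow(tokens):
    """Normalized bag-of-words: each word's count divided by the total count."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


doc_a = "apple swype iphone iphone crack".split()
doc_b = "apple adding carrier support iphone 4s announced".split()

weights_a = nbow(doc_a)
weights_b = nbow(doc_b)
print(weights_a)
print(weights_b)
```

If the weights or vocabularies differ between the two pipelines, the distances cannot be expected to match.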

memory leak

in emd.i


%typemap(freearg) signature_t * {
    if ($1 != NULL) {
        PyObject **features_array = (PyObject **) $1->Features;
        int weights_count = (int)$1->n;
        int i = 0;
        for (i = 0; i < weights_count; ++i) {
            Py_XDECREF(features_array[i]);
        }
        free((PyObject **) $1->Features);
        free((float *) $1->Weights);
        free((signature_t *) $1);
    }
}

The meaning of rows and columns in the distance matrix (WMD_D)

Dear sir,
Sorry to trouble you.
After I run wmd.py, I get a distance matrix between all documents.
But I am puzzled about the rows and columns of this distance matrix:
Does every row of the distance matrix represent one document, that is, the document vector?
Does every column of the distance matrix represent the same word in each document?

Thank you so much!
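
As a reading aid, here is a minimal sketch under the assumption that the output is a square pairwise matrix D in which D[i][j] is the distance between document i and document j, so that rows and columns both index documents rather than words. This assumption follows from the description "a distance matrix between all documents" and has not been checked against wmd.py. The values and the `nearest_document` helper are made up for illustration.

```python
# Hypothetical 3x3 pairwise distance matrix; the values are made up.
D = [
    [0.0, 1.2, 2.5],
    [1.2, 0.0, 0.7],
    [2.5, 0.7, 0.0],
]


def nearest_document(D, i):
    """Index of the document closest to document i, excluding i itself."""
    candidates = [j for j in range(len(D)) if j != i]
    return min(candidates, key=lambda j: D[i][j])


nearest = [nearest_document(D, i) for i in range(len(D))]
print(nearest)
```

Under this reading, row i lists the distances from document i to every other document, and the diagonal is zero.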

Parallel processing

Wow, great paper! Thank you for making the code OSS.

The documentation says that the Python wrapper is not suitable for parallel execution:

The wrapper is not suited for concurrent execution. It uses a global variable for the distance callback function, so calling emd from concurrent threads will result in undefined behavior.

However, the function get_wmd calls emd concurrently. Can you please explain?
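
A log in a later issue on this page shows wmd.py creating a multiprocessing.pool.Pool, so the workers appear to be separate processes, each with its own copy of any module-level global, rather than threads sharing one. For code that does call such a wrapper from threads, one common workaround is to serialize the calls with a lock. The sketch below illustrates that idea only: `emd_stub` and `locked_emd` are hypothetical stand-ins, not the wrapper's real functions.

```python
import threading

_callback = None  # global state, mimicking the design described in the docs
_emd_lock = threading.Lock()


def emd_stub(a, b):
    """Hypothetical stand-in for emd: uses whatever callback is set globally."""
    return _callback(a, b)


def locked_emd(a, b, distance):
    """Set the global callback and call emd_stub while holding the lock."""
    global _callback
    with _emd_lock:
        _callback = distance
        return emd_stub(a, b)


results = {}


def worker(name, distance):
    results[name] = locked_emd(3.0, 1.0, distance)


threads = [
    threading.Thread(target=worker, args=("abs", lambda x, y: abs(x - y))),
    threading.Thread(target=worker, args=("sq", lambda x, y: (x - y) ** 2)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)
```

The lock removes the race on the global at the cost of running the calls one at a time, which is why process-based parallelism is the more natural fit for this kind of wrapper.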

installation issues

swig is required, but not mentioned.

in emd.h, the include of Python.h has a ; that should be removed

problem with WCD

@mkusner I read your paper and want to use your WCD+RWMD method to calculate document similarity in my document recommendation project. I found the code for RWMD in MATLAB, but didn't find the code for WCD. Is it the file named distance.m?
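
For reference, the Word Centroid Distance described in the paper is the Euclidean distance between the two documents' weighted average word vectors. The following is a minimal pure-Python sketch of that definition, not the authors' MATLAB code, and it does not answer which file in the repository implements it. The function names and the toy two-dimensional "embeddings" are assumptions for illustration.

```python
import math


def centroid(vectors, weights):
    """Weighted average of word vectors; weights are nBOW values summing to 1."""
    dim = len(vectors[0])
    return [sum(w * v[k] for v, w in zip(vectors, weights)) for k in range(dim)]


def wcd(vectors_a, weights_a, vectors_b, weights_b):
    """Euclidean distance between the two weighted centroids."""
    ca = centroid(vectors_a, weights_a)
    cb = centroid(vectors_b, weights_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ca, cb)))


# Toy 2-dimensional "embeddings"; the values are made up.
doc_a = ([[0.0, 0.0], [2.0, 0.0]], [0.5, 0.5])
doc_b = ([[1.0, 3.0], [1.0, 5.0]], [0.5, 0.5])
distance = wcd(doc_a[0], doc_a[1], doc_b[0], doc_b[1])
print(distance)
```

Here the centroids are (1, 0) and (1, 4), so the distance is 4.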

install error=> ld: unknown option: -shared

macOS High Sierra 10.13
Install error:

Building object file 'emd.o'.
-n
cc -o emd.o -c emd.c -fPIC -I/usr/local/Cellar/python/2.7.12_2/Frameworks/Python.framework/Versions/2.7/include/python2.7 -I/usr/local/Cellar/python/2.7.12_2/Frameworks/Python.framework/Versions/2.7/include/python2.7
In file included from emd.c:20:
./emd.h:22:9: warning: 'INFINITY' macro redefined [-Wmacro-redefined]
#define INFINITY 1e20
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.13.sdk/usr/include/math.h:68:9: note: previous definition is here
#define INFINITY HUGE_VALF
^
1 warning generated.

Generating C interface
swig -python emd.i

Building object file 'emd_wrap.o'.
-n
cc -o emd_wrap.o -c emd_wrap.c -fPIC -I/usr/local/Cellar/python/2.7.12_2/Frameworks/Python.framework/Versions/2.7/include/python2.7 -I/usr/local/Cellar/python/2.7.12_2/Frameworks/Python.framework/Versions/2.7/include/python2.7
In file included from emd_wrap.c:3020:
./emd.h:22:9: warning: 'INFINITY' macro redefined [-Wmacro-redefined]
#define INFINITY 1e20
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.13.sdk/usr/include/math.h:68:9: note: previous definition is here
#define INFINITY HUGE_VALF
^
1 warning generated.

Linking wrapper library '_emd.so'.
-n
ld -shared -o _emd.so emd.o emd_wrap.o
ld: unknown option: -shared
make: *** [_emd.so] Error 1
rm emd_wrap.o emd.o emd_wrap.c
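
The log shows the link step invoking `ld -shared`, which the macOS linker rejects. One way to see which linker command and flags the local Python build uses for its own extension modules is to ask `sysconfig`, and then compare that with the Makefile's link line. This is only a diagnostic sketch; whether substituting the reported command into this project's Makefile produces a working `_emd.so` has not been tested here.

```python
import sysconfig

# Linker command this Python build uses for shared extension modules.
# May be None on platforms that do not define it (for example Windows).
ldshared = sysconfig.get_config_var("LDSHARED")
print(ldshared)
```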

Deadlock in Multiprocessing

Thank you for the implementation of your paper.

First, I tried your code with your data (all_twitter_by_line.txt). It worked well.

Second, I tried the 20newsgroup data used in your paper.

Then I got the error

"emd: Maximum number of iterations has been reached 1013"

because of the MAX_SIG_SIZE limit of 100.

So I changed it to more than the maximum number of unique keywords in the 20newsgroup dataset (= 5284).

Now the run blocks after some steps.

I think it is because of multiprocessing.

I checked the CPU: it was at 99% in a multi-CPU, multi-core environment.

Is there any solution for this?

I want to use WMD on Chinese data, but I get some errors; please help!

root@user-virtual-machine:/home/user/WMD# python wmd.py asd.pk asdwmd.pk
[pool :] <multiprocessing.pool.Pool object at 0x7f327f1cc150>
0 out of 3
1 out of 3
emd: Signature size is limited to 100
2 out of 3
emd: Signature size is limited to 100


stop.txt and the training data are both in Chinese. How can I solve this problem?
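
The message "Signature size is limited to 100" suggests that some documents exceed a fixed size limit. Assuming the signature holds one entry per distinct word in a document (an assumption, not confirmed against the C source), a quick pre-check can show which documents are over the limit before emd is called. The `oversized_documents` helper and the toy corpus below are hypothetical.

```python
MAX_SIG_SIZE = 100  # limit quoted in the error message above


def oversized_documents(documents, limit=MAX_SIG_SIZE):
    """Return (index, unique_word_count) for documents exceeding the limit."""
    oversized = []
    for index, tokens in enumerate(documents):
        unique = len(set(tokens))
        if unique > limit:
            oversized.append((index, unique))
    return oversized


# Hypothetical corpus: the second document has 150 distinct tokens.
docs = [
    ["a", "b", "a"],
    ["w%d" % i for i in range(150)],
    ["c"],
]
flagged = oversized_documents(docs)
print(flagged)
```

Documents flagged this way would need to be shortened or split, or the limit raised in the C code and the wrapper rebuilt.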

Obtaining flow information through the Python interface

Hello,

Thank you for the great work and nice implementation. It really helps me!
I know that I can obtain distances through emd( (X[i], BOW_X[i]), (X[j], BOW_X[j]), distance). But how can I get the flow information (the transportation matrix)? I don't see how to get it through the Python interface.

Zhe Zhao

How to work with result (.pk) file

Hi,

first of all thank you for the great work and nice implementation!

The tool works fine for me and I will use it for document comparison in a social media context. Can you please give me some advice on how to work with the resulting "...wmd_d.pk" file? At first I thought the result would be a text file with a readable matrix in it, but now I think I need some additional software?

Thank you very much!
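
A .pk file written with Python's pickle module is a binary file that is read back with `pickle.load`, so no extra software beyond Python should be needed. The sketch below demonstrates the round trip on a small stand-in matrix written to a temporary file. The real output file presumably holds a NumPy array (an assumption), in which case numpy must be installed in the Python that loads it, and pickles should only be loaded from sources you trust.

```python
import os
import pickle
import tempfile

# Stand-in for the real output: write a small matrix to a temporary .pk file.
matrix = [[0.0, 1.5], [1.5, 0.0]]
path = os.path.join(tempfile.mkdtemp(), "example_wmd_d.pk")
with open(path, "wb") as f:
    pickle.dump(matrix, f)

# Reading it back: open in binary mode and unpickle.
with open(path, "rb") as f:
    loaded = pickle.load(f)

for row in loaded:
    print(row)
```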

installation issues solved

I had installation issues similar to the ones mentioned before.

Running
sudo apt-get install python-dev # for python2.x installs
or
sudo apt-get install python3-dev # for python3.x installs
and removing the ";" from the Python.h include in emd.h

solved the problems.

technology independent output file

It would be very nice if the output distance matrix file were independent of Python-specific formats, so that we could use it in other languages as well.
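
Until such an option exists, one possible workaround is to convert the unpickled matrix to CSV, which most languages can read. The sketch below uses only the standard library; `matrix_to_csv` is a hypothetical helper and the values stand in for the real distances.

```python
import csv
import io


def matrix_to_csv(matrix):
    """Serialize a 2-D sequence of numbers to CSV text, one row per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in matrix:
        writer.writerow(row)
    return buffer.getvalue()


# Hypothetical distances standing in for the unpickled matrix.
text = matrix_to_csv([[0.0, 1.5], [1.5, 0.0]])
print(text)
```

Writing the returned text to a file gives a plain-text matrix with one document per row.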

ImportError: No module named _emd

Why do I keep getting the "ImportError: No module named _emd" error from emd.py? I use Python 2.7.

May I ask what '_emd' is? I assume it's not the same as pyemd?

Thanks in advance for your time!
