spec2vec_gnps_data_analysis's People

Contributors

florian-huber

spec2vec_gnps_data_analysis's Issues

Undefined MS_library.list_similars_ctr_idx, MS_library.list_similars_ctr and M_sim_mol in iomega-10-spectra-networking.ipynb

Hi,

I am trying to run iomega-10-spectra-networking.ipynb, but I get an error that MS_library is not defined, so the corresponding attributes (MS_library.list_similars_ctr_idx, MS_library.list_similars_ctr) are also undefined at cell 14. In addition, M_sim_mol is used in cell 16 before it is defined. Please see the attached image (the cell numbers may differ in the image; the ones above refer to the following link: https://github.com/iomega/spec2vec_gnps_data_analysis/blob/master/notebooks/iomega-10-spectra-networking.ipynb).

image

Thanks.

similarities_visualisation

Dear Florian,

When using a pre-trained spec2vec model for the visualization part (Figure 2, in-depth comparison ...), I am basically following the steps below, as agreed. I am using two mgf files: one from GNPS as a reference and one query file from my own data:

from matchms.Scores import Scores
import spec2vec
import os
import sys
import time
from matchms.filtering import add_losses
from matchms.filtering import add_parent_mass
from matchms.filtering import default_filters
from matchms.filtering import normalize_intensities
from matchms.filtering import reduce_to_number_of_peaks
from matchms.filtering import require_minimum_number_of_peaks
from matchms.filtering import select_by_mz
from matchms.importing import load_from_mgf #mgf to mzML
from matchms.importing import load_adducts
from spec2vec import SpectrumDocument
from spec2vec.model_building import train_new_word2vec_model

def apply_my_filters(s):
    s = normalize_intensities(s)
    s = default_filters(s)
    s = add_parent_mass(s)
    s = reduce_to_number_of_peaks(s, n_required=10, ratio_desired=None)
    s = select_by_mz(s, mz_from=0, mz_to=1000)
    s = add_losses(s, loss_mz_from=10.0, loss_mz_to=200.0)
    s = require_minimum_number_of_peaks(s, n_required=10)
    return s

spectrums = [apply_my_filters(s) for s in load_from_mgf("referenceXXX.mgf")] 
spectrums = [s for s in spectrums if s is not None]
reference_documents = [SpectrumDocument(s, n_decimals=2) for s in spectrums]
model_file = "spec2vec.model"

import gensim   
from matchms import calculate_scores
from spec2vec import Spec2Vec
import numpy as np  

query_spectrums = [apply_my_filters(s) for s in load_from_mgf("queryXXX.mgf")]
query_spectrums = [s for s in query_spectrums if s is not None]
query_documents = [SpectrumDocument(s, n_decimals=2) for s in query_spectrums]
model_file = "spec2vec.model"
model = gensim.models.Word2Vec.load(model_file)

spec2vec = Spec2Vec(model=model, intensity_weighting_power=0.5,
                    allowed_missing_percentage=5.0)
scores = list(calculate_scores(reference_documents, query_documents, spec2vec))
filtered = [(reference, query, score) for (reference, query, score) in scores if reference != query]

sorted_by_score = sorted(filtered, key=lambda elem: elem[2], reverse=True)
similarity_matrix = spec2vec.matrix(reference_documents, query_documents, is_symmetric=True)
filename = 'similarities_spec2vec_germicidins.npy'
np.save(filename, similarity_matrix)

However, I get a matrix of dimension (1, 2995), which unfortunately is not compatible with the (12797, 12797) matrices used in the spectra comparison when I follow your directions from iomega-in-depths-spectrum-comparions.ipynb:

from plotting_functions import plot_spectra_comparison

filename = 'similarities_daylight2048_jaccard.npy'
matrix_similarities_fingerprint_daylight = np.load(filename)
filename = 'similarities_cosine_tol0005_200708.npy'
matrix_similarities_cosine = np.load(filename)

filename = 'similarities_cosine_tol0005_200708_matches.npy'
matrix_matches_cosine = np.load(filename)

print("Matrix dimension", matrix_matches_cosine.shape)

matrix_similarities_cosine[matrix_matches_cosine < 6] = 0
filename = 'similarities_mod_cosine_tol0005_200727.npy'
matrix_similarities_mod_cosine = np.load(filename)

filename = 'similarities_mod_cosine_tol0005_200727_matches.npy'
matrix_matches_mod_cosine = np.load(filename)
matrix_similarities_mod_cosine[matrix_matches_mod_cosine < 10] = 0

print("Load spec2vec similarities")

filename = 'similarities_spec2vec_germicidins.npy'
matrix_similarities_spec2vec = np.load(filename)
print("Matrix dimension", matrix_similarities_spec2vec.shape)

pair_selection = np.where((matrix_similarities_cosine < 0.4)
                          & (matrix_similarities_mod_cosine < 0.4)
                          & (matrix_similarities_mod_cosine > 0)
                & (matrix_similarities_spec2vec > 0.8) 
                & (matrix_similarities_spec2vec < 0.98) 
                & (matrix_similarities_fingerprint_daylight > 0.8))

print("Found ", pair_selection[0].shape, " matching spectral pairs.")

possible_grid_points = np.arange(0, 2000, 50)
grid_points = possible_grid_points[(possible_grid_points > 370) & (possible_grid_points < 980)]
grid_points

ID1 = 1276 #pair_selection[0][pick]
ID2 = 1277 #pair_selection[1][pick]
print(ID1, ID2)
print(spectrums_postprocessed[ID1].get("spectrumid"), spectrums_postprocessed[ID2].get("spectrumid"))
print("Spec2Vec score: {:.4}".format(matrix_similarities_spec2vec[ID1, ID2]))
print("Cosine score: {:.4}".format(matrix_similarities_cosine[ID1, ID2]))
print("Modified cosine score: {:.4}".format(matrix_similarities_mod_cosine[ID1, ID2]))
print("Molecular similarity: {:.4}".format(matrix_similarities_fingerprint_daylight[ID1, ID2]))

csim = plot_spectra_comparison(spectrums_postprocessed[ID1], spectrums_postprocessed[ID2],
                                model,
                                intensity_weighting_power=0.5,
                                num_decimals=2,
                                min_mz=300,
                                max_mz=1000,
                                intensity_threshold=0.05,
                                method="cosine",#"modcos", #
                                tolerance=0.005,
                                wordsim_cutoff=0.05,
                                circle_size=5,
                                circle_scaling='wordsim',
                                padding=30,
                                display_molecules=True,
                                figsize=(12, 12),
                                filename="example_1276_1277_new.pdf")#None)#

The error turns out to be: pair_selection = np.where((matrix_similarities_cosine < 0.4)

ValueError: operands could not be broadcast together with shapes (12797,12797) (1,2995)

Am I doing any unnecessary steps here? Any thoughts? I will try using different files, but should I perhaps try a different pretrained model?
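
For readers who hit the same ValueError: NumPy combines arrays elementwise only if, comparing their shapes from the last axis backwards, each pair of dimensions is equal or one of them is 1. The following is a minimal sketch of that rule in plain Python. The helper name broadcast_compatible is hypothetical (it is not part of NumPy, matchms, or spec2vec), and the two shapes are the ones from the error message above.

```python
from itertools import zip_longest


def broadcast_compatible(shape_a, shape_b):
    """Return True if two array shapes can be broadcast together (NumPy rule)."""
    # Walk both shapes from the trailing axis backwards; a missing axis counts as 1.
    for dim_a, dim_b in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if dim_a != dim_b and dim_a != 1 and dim_b != 1:
            return False
    return True


library_shape = (12797, 12797)  # shape of the precomputed matrices, from the error message
query_shape = (1, 2995)         # shape of the newly computed spec2vec matrix

print(broadcast_compatible(library_shape, library_shape))
print(broadcast_compatible(library_shape, query_shape))
```

The second check fails on the last axis (12797 versus 2995), which matches the reported error. All matrices combined inside one np.where condition have to describe the same spectra in the same order, so the newly computed (1, 2995) matrix cannot be mixed with the precomputed (12797, 12797) ones; the selection would need score matrices that were all computed on the same set of spectra.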

AllPositive model cannot be loaded

Hi,

While trying to load the 'AllPositive' model using:

import gensim

model_fn = "data/spec2vec_models/spec2vec_AllPositive_ratio05_filtered_iter_15.model"  
model = gensim.models.Word2Vec.load(model_fn)

I get the following import error:

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-20-8fd5ace11800> in <module>
      3 # model_fn = "data/spec2vec_models/spec2vec_UniqueInchikeys_ratio05_filtered_iter_50.model"
      4 model_fn = "data/spec2vec_models/spec2vec_AllPositive_ratio05_filtered_iter_15.model"  # Cannot be loaded
----> 5 model = gensim.models.Word2Vec.load(model_fn)

/path/to/venv/lib/python3.8/site-packages/gensim/models/word2vec.py in load(cls, *args, **kwargs)
   1139         """
   1140         try:
-> 1141             model = super(Word2Vec, cls).load(*args, **kwargs)
   1142 
   1143             # for backward compatibility for `max_final_vocab` feature

/path/to/venv/lib/python3.8/site-packages/gensim/models/base_any2vec.py in load(cls, *args, **kwargs)
   1228 
   1229         """
-> 1230         model = super(BaseWordEmbeddingsModel, cls).load(*args, **kwargs)
   1231         if not hasattr(model, 'ns_exponent'):
   1232             model.ns_exponent = 0.75

/path/to/venv/lib/python3.8/site-packages/gensim/models/base_any2vec.py in load(cls, fname_or_handle, **kwargs)
    600 
    601         """
--> 602         return super(BaseAny2VecModel, cls).load(fname_or_handle, **kwargs)
    603 
    604     def save(self, fname_or_handle, **kwargs):

/path/to/venv/lib/python3.8/site-packages/gensim/utils.py in load(cls, fname, mmap)
    433         compress, subname = SaveLoad._adapt_by_suffix(fname)
    434 
--> 435         obj = unpickle(fname)
    436         obj._load_specials(fname, mmap, compress, subname)
    437         logger.info("loaded %s", fname)

/path/to/venv/lib/python3.8/site-packages/gensim/utils.py in unpickle(fname)
   1396         # Because of loading from S3 load can't be used (missing readline in smart_open)
   1397         if sys.version_info > (3, 0):
-> 1398             return _pickle.load(f, encoding='latin1')
   1399         else:
   1400             return _pickle.loads(f.read())

ModuleNotFoundError: No module named 'custom_functions'

When I add the "custom_functions" directory to the Python path:

import gensim

import sys
sys.path.append("/path/to/spec2vec_gnps_data_analysis")

model_fn = "data/spec2vec_models/spec2vec_AllPositive_ratio05_filtered_iter_15.model" 
model = gensim.models.Word2Vec.load(model_fn)

The import error gets "more specific":

...
/path/to/venv/lib/python3.8/site-packages/gensim/utils.py in unpickle(fname)
   1396         # Because of loading from S3 load can't be used (missing readline in smart_open)
   1397         if sys.version_info > (3, 0):
-> 1398             return _pickle.load(f, encoding='latin1')
   1399         else:
   1400             return _pickle.loads(f.read())

ModuleNotFoundError: No module named 'custom_functions.utils_spec2vec'

I could not find a file or module called "utils_spec2vec" anywhere. Nevertheless, loading the "UniqueInchikey" model works just fine. Could it be that something went wrong while pickling the larger model?

Best regards,

Eric
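
As background on the traceback above: pickle stores classes by module path and name, not by value, so unpickling has to import the module that defined each stored object. The ModuleNotFoundError therefore indicates that something saved inside the model file refers to custom_functions.utils_spec2vec. The sketch below reproduces that behaviour with the standard library only. The module name legacy_helpers and the class EpochLogger are hypothetical stand-ins; what the model file actually references is not known from this report.

```python
import pickle
import sys
import types

# Hypothetical stand-in for a module that existed when the object was saved.
legacy = types.ModuleType("legacy_helpers")
# Hypothetical helper class; its __module__ points at the stand-in module.
EpochLogger = type("EpochLogger", (), {"epoch": 0, "__module__": "legacy_helpers"})
legacy.EpochLogger = EpochLogger
sys.modules["legacy_helpers"] = legacy

# The pickle records only "legacy_helpers.EpochLogger" plus the instance state.
payload = pickle.dumps(EpochLogger())

# Simulate an environment in which the module can no longer be imported.
del sys.modules["legacy_helpers"]
try:
    pickle.loads(payload)
    error = None
except ModuleNotFoundError as exc:
    error = str(exc)
print(error)

# Once a module with that name and class is importable again, loading succeeds.
sys.modules["legacy_helpers"] = legacy
restored = pickle.loads(payload)
print(type(restored).__name__, restored.epoch)
```

If this is what happened here, the model can only be loaded when a module with the recorded name, containing the referenced class, is importable. Adding the repository root to sys.path resolves the custom_functions package but not the missing utils_spec2vec submodule, which is consistent with the second, "more specific" error.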
