
Comments (4)

lauri-codes avatar lauri-codes commented on May 28, 2024

Hi!

The basic tutorial should cover this to some extent. Its first step shows that you need to convert your structures into ase.Atoms objects. In your case, you just need to call ASE's read function on, for example, your sdf file. If the file contains multiple molecules, ase.io.read can return a list of all of them when called with the argument index=":". You can find more details about ASE's I/O capabilities on their homepage. Perhaps I could add a separate tutorial for working with larger datasets.

Hope this helps!

from dscribe.

dangthatsright avatar dangthatsright commented on May 28, 2024

Hi, thanks. I think part of the reason was that I had never heard of ASE before. Passing in index=":" still gives only the first molecule. Looking at ase.io.sdf.read_sdf (https://wiki.fysik.dtu.dk/ase/_modules/ase/io/sdf.html#read_sdf), it seems to read only the first molecule as well. I could get around this, but I was wondering what you do for larger datasets. Are you really storing hundreds of thousands of files somewhere?


lauri-codes avatar lauri-codes commented on May 28, 2024

Hi,

You're right: ASE actually seems to have very limited support for sdf files. If you want to stick with sdf files, you will have to read them with some other software and then construct ase.Atoms objects from the result. This should be easy; if you have problems, let me know.
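As a rough illustration of the "read with something else, then build ase.Atoms" route, here is a minimal sketch of a multi-record sdf reader. It assumes plain V2000 records terminated by `$$$$` and only extracts element symbols and coordinates; the function names `parse_sdf` and `to_ase_atoms` are made up for this example and are not part of ASE or DScribe.

```python
def _parse_molblock(lines):
    """Extract (symbols, positions) from one V2000 molblock given as a list of lines."""
    # Line 4 of a molblock is the counts line; the atom count sits in columns 1-3.
    n_atoms = int(lines[3][0:3])
    symbols, positions = [], []
    for atom_line in lines[4:4 + n_atoms]:
        # V2000 atom lines use fixed-width columns: x, y, z (10 chars each), then the symbol.
        x = float(atom_line[0:10])
        y = float(atom_line[10:20])
        z = float(atom_line[20:30])
        symbols.append(atom_line[31:34].strip())
        positions.append((x, y, z))
    return symbols, positions


def parse_sdf(text):
    """Return a list of (symbols, positions) tuples, one per record in an sdf string."""
    molecules, block = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":  # record terminator
            if block:
                molecules.append(_parse_molblock(block))
            block = []
        else:
            block.append(line)
    # Tolerate a final record that lacks the "$$$$" terminator.
    if any(line.strip() for line in block):
        molecules.append(_parse_molblock(block))
    return molecules


def to_ase_atoms(symbols, positions):
    """Build an ase.Atoms object; ASE is imported lazily so the parser works without it."""
    from ase import Atoms
    return Atoms(symbols=symbols, positions=positions)
```

With a file on disk, something like `[to_ase_atoms(s, p) for s, p in parse_sdf(open("ligands.sdf").read())]` would give a list of structures (the file name is only a placeholder). Bonds, charges and V3000 records are ignored in this sketch.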

As for using multiple configurations in general: it definitely makes sense to store large datasets in a single file. The ase.io page has a table of the file formats that support reading and writing multiple configurations. I have typically used the extended xyz format (.extxyz).

Just out of curiosity, what kind of data structure do you use to work with atomistic structures? Are you using some other library or something custom? So far ASE has been the easiest to work with, so we went with it. But if there are better and more widely accepted alternatives then we can reconsider.


dangthatsright avatar dangthatsright commented on May 28, 2024

I primarily work with protein/ligand interactions. Proteins are usually in pdb format, whereas ligands are usually in sdf files. I have not seen ASE or xyz used in many places. The main library that we use is RDKit, so a direct way to go from RDKit to ASE would be good, but going from RDKit straight to your featurizations would be best.

Right now I am doing RDKit to sdf to ASE which doesn't seem great, but at least it works :P
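For what it's worth, the intermediate sdf file can probably be skipped: an RDKit molecule with a 3D conformer already carries the atomic numbers and coordinates that ase.Atoms needs. The sketch below is one possible way to do this, not an official converter; `mol_to_arrays` and `mol_to_ase` are hypothetical helper names, while the RDKit calls used (`GetAtoms`, `GetAtomicNum`, `GetIdx`, `GetConformer`, `GetAtomPosition`) and the `ase.Atoms` constructor are existing API.

```python
def mol_to_arrays(mol, conf_id=-1):
    """Collect atomic numbers and 3D positions from an RDKit Mol with a conformer."""
    conformer = mol.GetConformer(conf_id)  # -1 selects the default conformer
    numbers, positions = [], []
    for atom in mol.GetAtoms():
        point = conformer.GetAtomPosition(atom.GetIdx())
        numbers.append(atom.GetAtomicNum())
        positions.append((point.x, point.y, point.z))
    return numbers, positions


def mol_to_ase(mol, conf_id=-1):
    """Convert an RDKit Mol to ase.Atoms (ASE imported lazily, only when called)."""
    from ase import Atoms
    numbers, positions = mol_to_arrays(mol, conf_id)
    return Atoms(numbers=numbers, positions=positions)
```

Combined with `Chem.SDMolSupplier("ligands.sdf", removeHs=False)` (placeholder file name), this would look like `[mol_to_ase(m) for m in supplier if m is not None]`, since the supplier yields None for entries it cannot parse. Bond and charge information is not carried over.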

