Embedding Database

This package provides a database schema and Python wrapper for storing the embeddings generated through various representation learning packages.

Currently, this package uses a SQL database through SQLAlchemy, but it might be extended to support a NoSQL database as an alternative.
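The real schema is defined with SQLAlchemy on PostgreSQL. Purely to illustrate the idea of a collection of entity vectors, the sketch below models it with Python's built-in sqlite3. The table and column names are hypothetical, not embeddingdb's, and because SQLite has no ARRAY type the vectors are stored as JSON text here.

```python
import json
import sqlite3

# Hypothetical two-table layout: one row per collection, one row per entity vector.
SCHEMA = """
CREATE TABLE collection (
    id INTEGER PRIMARY KEY,
    package_name TEXT NOT NULL
);
CREATE TABLE embedding (
    id INTEGER PRIMARY KEY,
    collection_id INTEGER NOT NULL REFERENCES collection (id),
    entity TEXT NOT NULL,
    vector TEXT NOT NULL  -- JSON list of floats; PostgreSQL would use ARRAY instead
);
"""


def store_collection(conn, package_name, embeddings):
    """Insert a collection and its {entity: vector} mapping; return the collection id."""
    cur = conn.execute(
        "INSERT INTO collection (package_name) VALUES (?)", (package_name,)
    )
    collection_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO embedding (collection_id, entity, vector) VALUES (?, ?, ?)",
        [(collection_id, entity, json.dumps(vec)) for entity, vec in embeddings.items()],
    )
    conn.commit()
    return collection_id


def load_collection(conn, collection_id):
    """Return {entity: vector} for one collection."""
    rows = conn.execute(
        "SELECT entity, vector FROM embedding WHERE collection_id = ?",
        (collection_id,),
    )
    return {entity: json.loads(vector) for entity, vector in rows}


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
cid = store_collection(conn, "word2vec", {"a": [0.1, 0.2], "b": [0.3, 0.4]})
print(load_collection(conn, cid))  # {'a': [0.1, 0.2], 'b': [0.3, 0.4]}
```

Keeping each collection in its own row lets embeddings from different packages sit side by side and be retrieved independently.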

Installation

Install embeddingdb from PyPI with:

$ pip install embeddingdb

Alternatively, install the latest development version of embeddingdb directly from GitHub with:

$ pip install git+https://github.com/cthoyt/embeddingdb

For developers, install embeddingdb in development mode from GitHub with:

$ git clone https://github.com/cthoyt/embeddingdb.git
$ cd embeddingdb
$ pip install -e .

Set the environment variable EMBEDDINGDB_CONNECTION to a valid SQLAlchemy connection string for a PostgreSQL instance, as this package uses the PostgreSQL-specific ARRAY type.
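Before running any command, it can help to check that the variable is set and actually points at PostgreSQL. The helper below is a hypothetical sketch using only the standard library; check_connection_string is not part of embeddingdb.

```python
import os
from urllib.parse import urlparse


def check_connection_string(env_var="EMBEDDINGDB_CONNECTION"):
    """Return the connection string, or raise if it is missing or not PostgreSQL."""
    value = os.environ.get(env_var)
    if not value:
        raise RuntimeError(f"{env_var} is not set")
    # SQLAlchemy URLs look like dialect[+driver]://user:password@host:port/database,
    # so the dialect is the part of the scheme before any "+driver" suffix.
    dialect = urlparse(value).scheme.split("+", 1)[0]
    if dialect != "postgresql":
        raise RuntimeError(f"{env_var} must point at PostgreSQL, got {dialect!r}")
    return value
```

A value such as postgresql://user:password@localhost:5432/embeddingdb (made-up credentials, host, and database name) passes the check, while a SQLite URL like sqlite:///embeddings.db is rejected.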

Command Line Interface

This package installs a command-line entry point, embeddingdb, that can be run directly from the shell.

Uploading Entity Embeddings

Entity embeddings produced by various types of representation learning, including network representation learning, knowledge graph embedding, and textual learning, can be uploaded and stored.

Upload embeddings generated by word2vec by specifying the file path with:

$ embeddingdb upload --fmt word2vec --path ~/path/to/file.txt
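For reference, word2vec's plain-text output starts with a header line giving the vocabulary size and the vector dimension, followed by one line per word containing the word and its components separated by spaces. The parser below is a minimal sketch of reading that format; it is not embeddingdb's own loader, and read_word2vec_text is a hypothetical name.

```python
import io


def read_word2vec_text(handle):
    """Parse word2vec text format into {word: [floats]}."""
    header = handle.readline().split()
    vocab_size, dim = int(header[0]), int(header[1])
    embeddings = {}
    for line in handle:
        parts = line.split()  # also tolerates the trailing space word2vec writes
        if not parts:
            continue  # skip blank lines
        word, values = parts[0], [float(x) for x in parts[1:]]
        if len(values) != dim:
            raise ValueError(f"{word!r} has {len(values)} values, expected {dim}")
        embeddings[word] = values
    if len(embeddings) != vocab_size:
        raise ValueError(f"expected {vocab_size} words, found {len(embeddings)}")
    return embeddings


sample = io.StringIO("2 3\ncat 0.1 0.2 0.3\ndog 0.4 0.5 0.6\n")
print(read_word2vec_text(sample))  # {'cat': [0.1, 0.2, 0.3], 'dog': [0.4, 0.5, 0.6]}
```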

Upload embeddings generated by pykeen by specifying the output directory with:

$ embeddingdb upload --fmt keen --path ~/path/to/directory/
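The exact files pykeen writes to its output directory depend on the pykeen version. As a hedged sketch, assuming the directory contains a JSON file mapping entity labels to lists of floats, the vectors could be loaded as below. The default file name entities_to_embeddings.json is an assumption to verify against your pykeen version, and read_keen_directory is hypothetical, not part of embeddingdb.

```python
import json
import tempfile
from pathlib import Path


def read_keen_directory(directory, filename="entities_to_embeddings.json"):
    """Load {entity: [floats]} from an (assumed) JSON file in a pykeen output directory."""
    path = Path(directory).expanduser() / filename  # expanduser handles "~/..."
    with path.open() as handle:
        data = json.load(handle)
    # Normalise values to plain float lists so every vector has the same type.
    return {entity: [float(x) for x in vector] for entity, vector in data.items()}


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "entities_to_embeddings.json").write_text(
        json.dumps({"EGFR": [0.1, 0.2], "TP53": [0.3, 0.4]})
    )
    print(read_keen_directory(tmp))  # {'EGFR': [0.1, 0.2], 'TP53': [0.3, 0.4]}
```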

Listing Entity Embeddings

After uploading, the collections can be listed with:

$ embeddingdb ls

Analyzing Entity Embeddings' Correlations

One of the motivations for building this repository was to provide a convenient way to compare the embeddings of entities generated by orthogonal embedding techniques. For example, we wanted to know to what extent the embeddings of proteins generated from their sequences with ratvec contained the same information as the embeddings generated from protein-protein interaction networks with pykeen or nrl.

To compare two collections, pass their identifiers in the database as the two positional arguments of analyze:

$ embeddingdb analyze 1 2
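Embeddings from different techniques usually live in different vector spaces, often with different dimensions, so their coordinates cannot be compared directly. One common approach, sketched below and not necessarily what embeddingdb analyze computes, is to restrict both collections to their shared entities, compute the cosine similarity of every entity pair within each collection, and take the Pearson correlation between the two lists of similarities. All function names here are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("similarities have zero variance")
    return cov / (sx * sy)


def similarity_correlation(first, second):
    """Correlate pairwise cosine similarities of entities shared by two collections."""
    shared = sorted(first.keys() & second.keys())
    if len(shared) < 3:
        raise ValueError("need at least three shared entities")
    pairs = list(combinations(shared, 2))
    sims_first = [cosine(first[a], first[b]) for a, b in pairs]
    sims_second = [cosine(second[a], second[b]) for a, b in pairs]
    return pearson(sims_first, sims_second)


first = {"a": [1, 0], "b": [0, 1], "c": [1, 1], "d": [1, 2]}
# Same geometry in a 3-dimensional space (scaled, extra zero axis), plus an unshared entity.
second = {k: [2 * x for x in v] + [0.0] for k, v in first.items()}
second["e"] = [1, 1, 1]
print(round(similarity_correlation(first, second), 6))  # 1.0
```

Because only within-collection similarities are compared, the two collections may have different dimensions; a correlation near 1 suggests they encode similar relationships between the shared entities.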

Running with Docker

After installing Docker and Docker Compose, the entire web application can be started with:

$ docker-compose up

Then send a GET request to the /test endpoint to instantiate the database and add a test collection.
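The request to /test can also be sent from Python with the standard library. The base URL http://localhost:5000 used below is an assumption; substitute the host and port your docker-compose configuration actually publishes. Both helper names are hypothetical.

```python
from urllib.request import urlopen


def build_endpoint_url(base_url, endpoint="/test"):
    """Join a base URL and an endpoint path without doubling the slash."""
    return base_url.rstrip("/") + "/" + endpoint.lstrip("/")


def call_test_endpoint(base_url="http://localhost:5000"):
    """GET the /test endpoint and return the HTTP status code and response body."""
    with urlopen(build_endpoint_url(base_url), timeout=10) as response:
        return response.status, response.read().decode("utf-8")
```

With the containers running, call_test_endpoint() issues the request; if the assumed port is wrong, urlopen raises a URLError.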

