
asifkhan2017's Projects

bert

TensorFlow code and pre-trained models for BERT

biolink-model

Schema and generated objects for biolink data model and upper ontology

capstone-project

Builds user communities on Twitter from users' posted content, using clustering and topic-detection methods.

charm

Parsing and testing Juju charms

dgp

Rethinking Knowledge Graph Propagation for Zero-Shot Learning, in CVPR 2019

elecbert

Improving Sentiment Analysis in Election-Based Conversations on Twitter with ElecBERT Language Model

generativessl

A deep generative model of labels for semi-supervised learning

jwescoder-corpusoptima_operationalcode

Contains fully operational code for creating large, biomedically pertinent semantic spaces from "clean corpora" scraped from public databases, in particular abstracts from the National Library of Medicine's PubMed biomedical literature database. The resulting semantic space has multiple uses, especially training Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) models in text-mining packages such as Gensim and scikit-learn.

The CorpusOptima system allows comprehensive, systematic, and efficient scraping of abstract text exclusively (without XML headers, author/location data, or other ancillary text) from all abstracts in a given chronological stretch (month by month in the primary version), organized so that all scraped text from a given month or year of PubMed abstracts is stored in a local file or database entry.

For incorporation into Gensim, scikit-learn, or other modules implementing LSA and related document-comparison tools, the corpora (semantic spaces) are available in two forms, each corresponding to a saved simplejson text file in the primary version of the code:

1. A large list of strings, each string (representing a separate document) holding the text of a single scraped abstract, akin to the "documents" variable in Radim Rehurek's first Gensim tutorial (https://radimrehurek.com/gensim/tut1.html).

2. A nested list in which each inner list holds the stemmed, lowercased, depunctuated, stopworded tokens of one abstract, so that each inner list again corresponds to a distinct document, akin to the "texts" variable in the same tutorial (https://radimrehurek.com/gensim/tut1.html).
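The two corpus forms described above can be sketched in a few lines. This is a minimal illustration, not the CorpusOptima code itself: the stand-in abstracts and the tiny stopword list are hypothetical, and stemming is omitted for brevity.

```python
import string

# Hypothetical mini-corpus standing in for scraped PubMed abstracts.
# This list of strings is form 1: one string per document ("documents").
documents = [
    "The BRCA1 gene is implicated in breast cancer.",
    "Breast cancer risk increases with BRCA1 mutations.",
]

# Illustrative stopword list (a real pipeline would use a fuller one).
STOPWORDS = {"the", "is", "in", "with", "a", "an", "of"}

def tokenize(doc):
    """Lowercase, strip punctuation, drop stopwords (stemming omitted)."""
    table = str.maketrans("", "", string.punctuation)
    return [w for w in doc.lower().translate(table).split()
            if w not in STOPWORDS]

# Form 2: a nested list with one inner token list per abstract ("texts").
texts = [tokenize(d) for d in documents]
```

From here, building a Gensim dictionary and bag-of-words corpus from `texts` proceeds exactly as in the cited tutorial.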
The basic code module first uploaded maxes out at 100,000 abstracts per scrape (the maximum allowed under the NLM E-utilities API); a looped variant allows scraping more than 100,000 abstracts, up to the total catalogued for a given month or year. I have used this code to scrape NLM PubMed abstracts month by month and year by year dating back to 1911, one of the first years in which abstracts were systematically catalogued for biomedical publications. The full semantic space (complete corpus) of biomedical abstracts, housing tens of millions of documents in total, is stored in public Dropbox and Google Drive directories (each file corresponding to one month's or one year's worth of saved corpora) as well as in a database under construction.
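The looped variant's paging past the per-request cap can be illustrated with the E-utilities `retstart`/`retmax` parameters. A hypothetical sketch (the function name and page size are illustrative, not from the repository), assuming a prior `esearch` call with `usehistory=y` has returned a `query_key` and `WebEnv` for the month's abstracts:

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def efetch_page_urls(total_abstracts, page_size, query_key, webenv):
    """Yield one efetch URL per page, stepping retstart forward so an
    arbitrarily large month's or year's worth of abstracts can be
    pulled in successive requests of at most page_size each."""
    for retstart in range(0, total_abstracts, page_size):
        params = {
            "db": "pubmed",
            "rettype": "abstract",
            "retmode": "text",
            "query_key": query_key,
            "WebEnv": webenv,
            "retstart": retstart,
            "retmax": page_size,
        }
        yield f"{EFETCH}?{urlencode(params)}"

# e.g. 250,000 catalogued abstracts at 100,000 per request -> 3 pages
urls = list(efetch_page_urls(250000, 100000, 1, "NCID_demo"))
```

Each yielded URL would then be fetched (with the rate limiting NCBI requires) and the returned abstract text appended to that month's local file or database entry.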

ncbi_bert

NCBI BERT, pre-trained on PubMed abstracts and clinical notes (MIMIC-III).

online-social-network-analysis

Empirical analysis of predictive algorithms for collaborative filtering; constructing a social network from Twitter data; community detection and link prediction using Facebook 'Like' data; categorizing movie reviews by sentiment; and a content-based recommendation algorithm, using Python, Pandas, NumPy, and scikit-learn.

pytorch-biggraph

Software used for generating embeddings from large-scale graph-structured data.

scrapper

Facebook, blog, Twitter, and Instagram scraper

socialnetworkanalysis

Empirical analysis of predictive algorithms for collaborative filtering; constructing a social network from Twitter data; community detection and link prediction using Facebook 'Like' data; categorizing movie reviews by sentiment; and a content-based recommendation algorithm, using Python, Pandas, NumPy, and scikit-learn.

tridnr

Tri-Party Deep Network Representation
