
ohdsi-phenotyping's Introduction

Hello there 👋, I am Juan M. Banda

[Header image]

Hello, my name is Juan M. Banda and I am currently an assistant professor of computer science at Georgia State University. In my research lab, Panacea Lab, we build machine learning, computer vision, and NLP methods that help generate insights from large-scale, multi-modal data sources. With applications to precision medicine, medical informatics, astroinformatics, and other domains, our work addresses domain-specific problems with data science methods and practices.

I have been an engineer at heart and in practice for the last 20 years. I have used Python, Bash, ontologies, and NLP tools to build pipelines that annotated over 68 million clinical notes. I have built custom ETLs to map over 8 million patient electronic health records, from 4 institutions, to common data models (OMOP) for large-scale analytics and machine learning. I have designed pipelines, databases, and processes to build research infrastructure for my current and previous labs. I have used R, SQL, MATLAB, Perl, Java, JavaScript, and other languages to acquire, clean, and operationalize data from multiple sources. I have mined over 9 billion Tweets for NLP tasks. In my earlier days, I built content-based image retrieval systems for NASA’s SDO mission, with the capacity to process and index over 40,000 images daily and to provide computer vision-aided similarity search for images. I started my engineering career designing and developing point-of-sale systems written in Visual Basic.

Apart from my technical skills, I have strong communication and writing skills (over 50 refereed publications) and management skills (I have managed over 40 employees and 20 students). Driven by the desire to improve patient outcomes and medical care, and to build things that change people’s lives, I am committed to releasing all my work under open-source licenses, following the FAIR data sharing principles.

✈️ Yes, that is me in the middle of the picture at the ruins of Abu Simbel. I am an avid traveler and have visited over 100 countries 🌎.

🛠️ Tools and Technologies

Operating Systems: Windows, CentOS, Ubuntu, macOS

Programming Languages: R, Python, PHP, JavaScript, Java, MATLAB, C++, HTML5, CSS3, Shell

Databases: PostgreSQL, MySQL, SQL Server, Oracle, MongoDB

Cloud Environments: Amazon AWS, Microsoft Azure, Google Cloud

Other Tools: TensorFlow, pandas, WEKA, NLTK, spaCy, NumPy, Elasticsearch, Vim, Git, GitHub, Jupyter, Colab, MapReduce, Spark, Solr

Currently Learning: Heroku


Lab-related projects

  • Covid-19 Twitter dataset
  • Social Media Mining Toolkit
  • APHRODITE

ohdsi-phenotyping's People

Contributors

jmbanda

Forkers

alisoncossette

ohdsi-phenotyping's Issues

Cases/Controls query code redundant in getPatientData.R?

If I understand things correctly, lines 94-267 and 271-419 differ only in the patient subset to which they are being applied (cases vs. controls). Perhaps merge (put all patients into one list and iterate over that, and create a separate list with case/control labels) or turn into function. Since you're appending the features anyway, may as well just combine, I think...?

I'm pretty sure you're already doing this, but I thought I'd add it just in case. :)
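
To make the merge suggestion concrete, here is a minimal sketch of the idea. It is written in Python for illustration only; it is not the code in getPatientData.R, and `build_labeled_rows`, `fetch_features`, and `fake_fetch` are hypothetical names standing in for whatever the per-patient queries actually do.

```python
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]


def build_labeled_rows(
    cases: List[int],
    controls: List[int],
    fetch_features: Callable[[int], Features],
) -> List[Tuple[int, str, Features]]:
    """Run one extraction loop over cases and controls together."""
    # Combine both cohorts into one list, keeping the label next to each ID.
    labeled = [(pid, "case") for pid in cases]
    labeled += [(pid, "control") for pid in controls]
    # A single pass replaces the two near-identical per-cohort blocks.
    return [(pid, label, fetch_features(pid)) for pid, label in labeled]


def fake_fetch(pid: int) -> Features:
    # Hypothetical stand-in for the per-patient database queries.
    return {"n_visits": float(pid % 3)}


rows = build_labeled_rows([1, 2], [3], fake_fetch)
```

The label travels with each row, so the downstream step that appends features no longer needs to know which cohort a patient came from.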

Maybe add a file or routine for cleanup of cached data?

I'm a big fan of saving-out-to-file all along the way. In the real world, users may want to change or redo some step of the workflow without rerunning the entire thing (especially the DB queries). For example, retrain the classifier (or try a new classifier on the same features) without having to re-extract the cases, features, etc.

That said, perhaps you might add a simple routine or script that is guaranteed to clean up / delete all of the cached data once the user is ready to do so.
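
Something along these lines might be enough. This is only a sketch in Python under the assumption that all cached output lives under a single directory; the function name `clear_cache` and the directory layout are hypothetical, not taken from the project.

```python
import shutil
from pathlib import Path


def clear_cache(cache_dir: Path) -> int:
    """Delete everything under cache_dir and return the number of top-level entries removed."""
    if not cache_dir.is_dir():
        # Nothing cached yet (or wrong path): treat as a no-op.
        return 0
    removed = 0
    for entry in cache_dir.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            # Remove a cached subdirectory and all of its contents.
            shutil.rmtree(entry)
        else:
            # Remove a single cached file (or a symlink, without following it).
            entry.unlink()
        removed += 1
    return removed
```

The directory itself is left in place, so later steps of the workflow can keep writing to the same location.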

Minor feedback on config.R

Line 3: modiy -> modify
Line 3: What does "do not modify when building an end-to-end phenotype" mean? When should the user NOT modify this file? Seems like you'd always have to modify it, right?
