Hi there 👋

I'm Shamik and I enjoy building solutions to problems, mostly through programming (and occasionally with WD-40). I work as a Lead Data Scientist, building machine learning applications that detect and anonymize PII and PHI in data breaches. I am also a part-time contributor to the BigScience Workshop, the BigBIO effort and the BigCode Project from 🤗. In addition, I work with PIISA, a group of data scientists, software developers and lawyers working to establish an open standard for PII protection that can be used across the globe. You can follow our efforts here. I also like to cook 👨‍🍳

├── Interests
│   ├── Natural Language Processing
│   ├── Explainable Machine Learning
│   ├── AI Ethics
│   ├── System Design
│   └── PII Anonymization
├── Occupations
│   ├── Software Engineer
│   ├── Graduate Research Assistant
│   ├── Lead Data Scientist
│   └── Senior Researcher
├── Locations
│   ├── Kolkata, India
│   ├── Boston, MA, USA
│   ├── Tallahassee, FL, USA
│   └── Leeds, England
└── Book Suggestions
    ├── Fiction
    │   ├── The Three-Body Problem - Cixin Liu
    │   ├── All the Light We Cannot See - Anthony Doerr
    │   └── Purple Hibiscus - Chimamanda Ngozi Adichie
    ├── Non-Fiction
    │   ├── Algorithms of Oppression - Safiya Umoji Noble
    │   ├── Braiding Sweetgrass - Robin Wall Kimmerer
    │   ├── Chaos Machine - Max Fisher
    │   ├── Viral Justice - Ruha Benjamin
    │   └── Weapons of Math Destruction - Cathy O'Neil
    └── Cookbooks
        ├── The Food Lab - J. Kenji López-Alt
        ├── Mi Cocina - Rick Martínez
        └── Dessert Person - Claire Saffitz
Projects
  1. Scientific Title Generator
  2. BigBIO dataloaders
  3. MIT 6.006 Solution Notebooks
Publications
  1. Explaining AI for Malware Detection: Analysis of Mechanisms of MalConv
  2. PhD Thesis: Towards Explainability in Machine Learning for Malware Detection
  3. Static Malware Modeling and Detection using Topic Models
  4. BigBIO: A Framework for Data-Centric Biomedical Natural Language Processing
  5. The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset

P.S. The tree was built using Rich
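
For readers curious how a tree like the one above can be produced, here is a minimal sketch using Rich's `Tree` class. It is not the original script: the helper names `build_profile_tree` and `render` are hypothetical, the root label is a placeholder, and only a few of the entries are reproduced.

```python
from rich.console import Console
from rich.tree import Tree


def build_profile_tree() -> Tree:
    """Build a small tree mirroring a few branches of the profile (illustrative only)."""
    # hide_root=True keeps the placeholder root label out of the output,
    # so the first-level branches appear at the left edge.
    root = Tree("profile", hide_root=True)

    interests = root.add("Interests")
    for topic in ("Natural Language Processing", "AI Ethics", "PII Anonymization"):
        interests.add(topic)

    # Tree.add returns the new child node, so nesting is just chained adds.
    books = root.add("Book Suggestions")
    fiction = books.add("Fiction")
    fiction.add("The Three-Body Problem - Cixin Liu")
    return root


def render(tree: Tree) -> str:
    """Render the tree to a plain string instead of printing to the terminal."""
    console = Console(width=80, color_system=None)
    with console.capture() as capture:
        console.print(tree)
    return capture.get()


if __name__ == "__main__":
    print(render(build_profile_tree()))
```

Rendering to a string rather than printing directly makes it easy to paste the result into a README.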

Shamik Bose's Projects

biglam

Dataset loaders for the BigLAM hackathon

biomedical

Tools for curating biomedical training data for large-scale language modeling

data-science-cheatsheet

A helpful 4-page data science cheatsheet to assist with exam reviews, interview prep, and anything in-between.

failed-ml

Compilation of high-profile real-world examples of failed machine learning projects

healthdata_eda

Visualizations of the healthcare situation in the USA, built from CD data

leetcode_top

A repo showcasing solutions to the top interview questions on LeetCode

mit6.006

Solutions to problems discussed in the lectures for the MIT 6.006 "Intro to Algorithms" course. The video playlist for the course is available here: https://www.youtube.com/playlist?list=PLUl4u3cNGP61Oq3tWYp6V_F-5jb5L2iHb

ml_prep

Machine Learning interview prep guide

pii-lib

Code for PII detection and redaction in code datasets

pytorch

A collection of Python notebooks with an intro to PyTorch

stanford_dl_ex

Programming exercises for the Stanford Unsupervised Feature Learning and Deep Learning Tutorial

trankit

Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing

wandb_course

A repository for the materials from the Weights and Biases "Effective MLOps" course
