I'm Shamik and I enjoy building solutions to problems, mostly through programming (and occasionally with WD-40). I work as a Lead Data Scientist building machine learning applications for detecting and anonymizing PII and PHI in data breaches. I am also a part-time contributor to the BigScience Workshop, the BigBIO effort and the BigCode Project from 🤗. In addition, I am working with PIISA, a group of data scientists, software developers and lawyers, to establish an open standard for PII protection that can be used across the globe. You can follow our efforts here. I also like to cook 👨‍🍳
├── Interests
│ ├── Natural Language Processing
│ ├── Explainable Machine Learning
│ ├── AI Ethics
│ ├── System Design
│ └── PII Anonymization
├── Occupations
│ ├── Software Engineer
│ ├── Graduate Research Assistant
│ ├── Lead Data Scientist
│ └── Senior Researcher
├── Locations
│ ├── Kolkata, India
│ ├── Boston, MA, USA
│ ├── Tallahassee, FL, USA
│ └── Leeds, England
└── Book Suggestions
  ├── Fiction
  │ ├── The Three-Body Problem - Cixin Liu
  │ ├── All the Light We Cannot See - Anthony Doerr
  │ └── Purple Hibiscus - Chimamanda Ngozi Adichie
  ├── Non-Fiction
  │ ├── Algorithms of Oppression - Safiya Umoji Noble
  │ ├── Braiding Sweetgrass - Robin Wall Kimmerer
  │ ├── The Chaos Machine - Max Fisher
  │ ├── Viral Justice - Ruha Benjamin
  │ └── Weapons of Math Destruction - Cathy O'Neil
  └── Cookbooks
    ├── The Food Lab - J. Kenji López-Alt
    ├── Mi Cocina - Rick Martínez
    └── Dessert Person - Claire Saffitz
Publications
- Explaining AI for Malware Detection: Analysis of Mechanisms of MalConv
- PhD Thesis: Towards Explainability in Machine Learning for Malware Detection
- Static Malware Modeling and Detection using Topic Models
- BigBIO: A Framework for Data-Centric Biomedical Natural Language Processing
- The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset
P.S. The tree was built using Rich.