knessettopicclassification's Introduction

💫 About Me

🔭 NLP geek, Computer Science MSc candidate researching NLP (multi-document summarization) at HebrewU & Data Scientist at PayPal
💬 Feel free to ask me about any of my repos, I love getting messages about my work! LinkedIn Twitter
☕ If my code or my notes helped you, you can buy me a coffee if you'd like Ko-Fi

💻 Tech Stack

Python NumPy Pandas Plotly scikit-learn MySQL Postgres C++ Java

👩‍💻 NLP / Data Science Public Projects

Corpify (2023)

A language model that rephrases spoken English into workplace-appropriate language!

In this project, we introduce the novel NLP task of corpy textual style-transfer, which involves the transformation of casual English text into a style suited for a professional workplace setting. We constructed an original parallel corpus comprising 634 sentences in casual English and their corporate-style paraphrases.

This project includes the dataset itself, the code for fine-tuning the style transfer models, 2 of the best performing fine-tuned models, and code for fine-tuning a style detection model for detecting corpy style in text.

Methods used in this NLP project: textual style transfer, text classification.

Python

An independent multi-phase NLP project for classifying parliamentary quotes in Hebrew into 8 topics. Also includes the annotated dataset.

In this project, I started with a raw dataset of quotes (in Hebrew) gathered from protocols of the Knesset (the Israeli parliament). In the first stage of the project, I used unsupervised topic modeling methods to cluster quotes by topic. The topic assignments created during the first stage were used to prioritize quotes for manual tagging: quotes with the highest confidence scores were sent to manual tagging. This process produced ~2,700 quotes that were manually tagged into 8 topics (in addition to a "no topic" tag). Then, in the second phase of the project, I trained a supervised classifier to predict quote topics.

Methods used in this NLP project: topic modeling (unsupervised), topic classification (supervised).
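The prioritization step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes the topic model yields a probability distribution over topics for each quote and that the "confidence score" is the probability of the most likely topic. The function name `prioritize_for_tagging`, the quote ids, and the toy probabilities are all hypothetical.

```python
from typing import Dict, List, Tuple


def prioritize_for_tagging(
    topic_probs: Dict[str, List[float]], k: int
) -> List[Tuple[str, int, float]]:
    """Return the k quotes whose most likely topic has the highest probability.

    Assumption: confidence = max topic probability (not confirmed by the source).
    Each result is (quote_id, suggested_topic_index, confidence).
    """
    ranked = []
    for quote_id, probs in topic_probs.items():
        # Index of the most probable topic for this quote.
        best_topic = max(range(len(probs)), key=probs.__getitem__)
        ranked.append((probs[best_topic], quote_id, best_topic))
    # Highest confidence first; ties broken by quote id for determinism.
    ranked.sort(key=lambda item: (-item[0], item[1]))
    return [(quote_id, topic, conf) for conf, quote_id, topic in ranked[:k]]


# Hypothetical toy input: three quotes, three topics.
example_probs = {
    "q1": [0.10, 0.85, 0.05],
    "q2": [0.40, 0.35, 0.25],
    "q3": [0.05, 0.05, 0.90],
}
selected = prioritize_for_tagging(example_probs, k=2)
```

Here `q2` is left out of the tagging queue because its topic distribution is nearly flat, so the suggested topic would give an annotator little to go on.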

Python

AI assistant that helps groups of friends or co-workers find a restaurant to order from together, that best matches the group members' dining preferences.

In this project, we used restaurant menus gathered via Wolt's API and created a smart system that helps groups of friends or co-workers find a single restaurant that matches everyone's needs and preferences (such as vegetarianism, price limits, preferred cuisines, etc.). We examined several different algorithms (none of them ML-based), all of which provided solutions that were incredibly close to the optimal solution (which could be found by iterating over the entire dataset of 30M combinations) in a fraction of the time (up to 11K times faster)!

Methods used in this AI project: local search, genetic algorithms.
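To give a feel for what local search looks like on this kind of problem, here is a small hill-climbing sketch. Everything in it is an assumption made for illustration: the menus, the group, the cost function, and the search space (one dish per member within a restaurant) are hypothetical and do not come from the project. In this toy cost each member's choice is independent, so hill climbing reaches the exact optimum; a real objective with shared constraints would only be approximated.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical toy data: restaurant -> list of (dish, price, is_vegetarian).
MENUS: Dict[str, List[Tuple[str, int, bool]]] = {
    "pizzeria": [("margherita", 45, True), ("pepperoni", 55, False)],
    "grill": [("steak", 120, False), ("salad", 40, True)],
    "noodle_bar": [
        ("tofu noodles", 50, True),
        ("beef noodles", 60, False),
        ("spring rolls", 30, True),
    ],
}

# Hypothetical group: one (needs_vegetarian, price_limit) pair per member.
GROUP: List[Tuple[bool, int]] = [(True, 50), (False, 60), (False, 100)]

PENALTY = 1000  # cost added for every violated preference


def cost(restaurant: str, dishes: List[int]) -> int:
    """Total price of the chosen dishes plus penalties for violated preferences."""
    total = 0
    for (needs_veg, limit), dish_idx in zip(GROUP, dishes):
        _, price, is_veg = MENUS[restaurant][dish_idx]
        if needs_veg and not is_veg:
            total += PENALTY
        if price > limit:
            total += PENALTY
        total += price
    return total


def hill_climb(restaurant: str, rng: random.Random) -> Tuple[List[int], int]:
    """Start from a random dish assignment and accept any single-dish swap
    that lowers the cost, until no swap improves it."""
    menu_size = len(MENUS[restaurant])
    dishes = [rng.randrange(menu_size) for _ in GROUP]
    current = cost(restaurant, dishes)
    improved = True
    while improved:
        improved = False
        for member in range(len(GROUP)):
            for dish_idx in range(menu_size):
                candidate = dishes[:member] + [dish_idx] + dishes[member + 1:]
                candidate_cost = cost(restaurant, candidate)
                if candidate_cost < current:
                    dishes, current, improved = candidate, candidate_cost, True
    return dishes, current


def best_restaurant(seed: int = 0) -> Tuple[str, List[int], int]:
    """Run one hill climb per restaurant and keep the cheapest result."""
    rng = random.Random(seed)
    results = {name: hill_climb(name, rng) for name in MENUS}
    name = min(results, key=lambda n: results[n][1])
    return name, results[name][0], results[name][1]


name, dishes, total = best_restaurant()
```

The appeal of this style of search is that each step evaluates only a handful of neighboring candidates instead of enumerating every combination.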

Python

📄 Detailed NLP Notes and Resources (Hebrew)

I have collected all of the detailed notes I wrote during my studies at HebrewU as well as courses I studied independently, and published them as a part of my goal to make Data Science & NLP topics more accessible to Hebrew speakers.

This collection contains detailed notes in Hebrew on subjects such as Math (Calculus, Linear Algebra, Probability, Discrete Math), foundations of Computer Science (Data Structures, Algorithms, Complexity), as well as advanced Data Science (Machine Learning, NLP).

This includes my recent detailed notes (90 pages) for Stanford's CS224N (NLP with DL) course, which gained more than 1K likes across Israeli DS & ML communities and were featured in the MDLI newsletter as "If you need to read only one post this week, make it this one".

Recently I decided to share my private Notion hub, where I organize all of my NLP knowledge (mostly in Hebrew). This hub is meant for my personal use, but since many people found it useful, I decided to share it. It contains notes by topic (such as NLP tasks, architectures, or uses) that overlap significantly with my CS224N notes, as well as notes I wrote on a few dozen NLP papers I have read in the past year for my studies and my research.

If you want to use this resource, it is highly recommended to download Notion Enhancer and enable its right-to-left feature, since Notion does not currently support RTL.

I shared my simple-but-useful system for queueing and reviewing papers I read (or plan to read). This Notion template is free to use, and also contains tips on how to personalize it to work for your needs.
