balaka-18 / rake_new2
A Python library that enables smooth keyword extraction from any text using the RAKE (Rapid Automatic Keyword Extraction) algorithm.
License: MIT License
I can add a welcome bot config file with a message that appears when a user opens an issue or pull request for the first time.
For reference, check out: https://github.com/apps/welcome
Please assign it to me.
Hi,
I would love to add issue templates to your repository. These would cover the following issue types: bug, documentation, feature, proposal and question.
A code of conduct is very important when many people contribute to a single project.
Here is a description of the code of conduct:
This Code of Conduct applies both within project spaces and in public spaces
I have made a logo for the website. The preview of the same is provided below. Kindly assign me this issue so that I can make a PR and work on it under DWoC. @BALaka-18
Create the frontend of a web application that'll be used to make the library accessible to the users via the web.
The color scheme is up to you.
The web app must contain :
Create the files according to convention.
--> All HTML files must be under : web_app/templates/
--> All CSS and JS(if any) files under : web_app/static/
PR INSTRUCTION :
ALL PRs MUST BE MADE TO THE web-app BRANCH ONLY, ELSE THEY WILL BE REJECTED.
1 week.
Shields for PyPI, issues, forks and stars should link to the corresponding pages on PyPI and GitHub. For now they just link to the images.
Add the Hacktoberfest logo to the very start of the README file and add the link to the Contributing_Guidelines.md file and the Issues tab in the repository. The links should read :
'CLICK HERE TO START CONTRIBUTING' --> Link to the Issues tab.
'READ THE CONTRIBUTING GUIDELINES' --> Link to the Contributing_Guidelines.md file
10 minutes.
I will add related images and improve the content of the contribution guidelines.
Edge Case : Since keywords are mainly formed by dropping stopwords, in some cases the extracted keywords do not capture the meaning of the text exactly.
For example : If the text is "I like sweet apples but I don't like sour apples", the extracted keywords will be 'I', 'like', 'sweet', 'sour', 'apples', with 'apples' shown as the highest-priority keyword. But the meaning changes completely if we summarize the text from these keywords, because the negation is lost.
This issue asks you to find a workaround for this problem, or to brainstorm with me and other interested contributors.
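The edge case can be reproduced with a small sketch of RAKE-style candidate extraction. This is not the code in rake_new2: the stopword list below is a tiny hypothetical one chosen for illustration, and `candidate_phrases` and `word_scores` are made-up helper names. It only shows how splitting on stopwords discards the negation.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stopword list for illustration only;
# it is not the list shipped with rake_new2.
STOPWORDS = {"i", "but", "don't"}


def candidate_phrases(text):
    """Split text into runs of consecutive non-stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    phrases, current = [], []
    for word in words:
        if word in STOPWORDS:
            # A stopword ends the current candidate phrase.
            if current:
                phrases.append(current)
            current = []
        else:
            current.append(word)
    if current:
        phrases.append(current)
    return phrases


def word_scores(phrases):
    """Score each word as degree / frequency, in the style of RAKE."""
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)  # co-occurrences, counting the word itself
    return {word: degree[word] / freq[word] for word in freq}


text = "I like sweet apples but I don't like sour apples"
phrases = candidate_phrases(text)
scores = word_scores(phrases)
print(phrases)
print(scores)
```

Under these assumptions both halves of the sentence yield a candidate phrase of the same shape, and "don't" never reaches the scoring step, so nothing in the output distinguishes the liked apples from the disliked ones. A workaround would have to carry the negation along with the phrase that follows it.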
Create a folder algorithm_addons in the root directory and write a Python function to work around this problem. If I approve, I will create a function directly in the rake_new2.py main file, with the contributor's name on top of the function.
Example naming convention : algorithm_addons/stopwords_debug.py
.py file must be properly formatted.
Recurring
Add a frontend folder that contains a React app.
Create a PULL_REQUEST_TEMPLATE.md file that must contain the skeleton of a PR description and generate the template each time a PR is created.
If you don't know how or what this template is, read this link on creating Pull request templates
20 minutes
TF-IDF is one of the most famous algorithms when it comes to keyword extraction from text. Your task is to create a function that will extract keywords from text using the TF-IDF algorithm and compare the results against this library. How similar / different are the results ?
NOTE : You have to build the Tf-idf algorithm for keyword extraction from scratch. You will then compare its performance against sklearn's TfidfVectorizer and rake_new2.
For your reference, you may read this link
Create a folder tfidf_vectorizer in the root directory. The folder must contain a .py file that will contain the function for extracting the keywords from text using the Tfidf algorithm written from scratch.
Structure : tfidf_vectorizer/extract_keywords_tfidf_scratch.py
requirements.txt file is updated if you are including any new library.
2.5-3 hours (or more if needed)
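As a starting point for the from-scratch task, here is a minimal sketch of TF-IDF keyword ranking using only the standard library. The function names, the tokenizer and the sample documents are assumptions made for illustration, not part of this repository. The IDF uses the smoothed form ln((1 + n) / (1 + df)) + 1, which is one common choice and matches scikit-learn's documented default; the term frequency here is a plain relative frequency, so scores will not match TfidfVectorizer exactly.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_keywords(documents, doc_index, top_n=3):
    """Return the top_n terms of documents[doc_index] ranked by TF-IDF."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    target = tokenized[doc_index]
    term_counts = Counter(target)
    scores = {}
    for term, count in term_counts.items():
        tf = count / len(target)
        idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1  # smoothed IDF
        scores[term] = tf * idf

    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda term: (-scores[term], term))[:top_n]


# Hypothetical sample corpus.
documents = [
    "rake extracts keywords keywords from text",
    "tfidf weighs terms from text",
    "text can contain html tags",
]
top_terms = tfidf_keywords(documents, doc_index=0)
print(top_terms)
```

Unlike RAKE, this needs a corpus of several documents to compute the IDF term, which is worth keeping in mind when comparing the two approaches on a single text.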
Create an extensive and in-depth literature study on various keyword extraction algorithms other than RAKE and Tf-Idf. Every algorithm must be accompanied by brief logical / mathematical explanation + examples (in text or in the form of pictures / diagrams)
Create a Literature_Survey.md file in the root directory.
5 days - 1 week.
Integrate the most appropriate GitHub Action for automatic generation of code coverage report on every code related PR made.
20 minutes
Hi @BALaka-18
Whenever people visit a project repository, they look at the README file first, as it is the most integral part of the project.
The first lines of a README (the project heading, a description and badges) are what make it look great. This project's README has no heading with the project name, the badges are not center-aligned, and only a few badges are used. Let me know if you would like me to improve it and give it a great look.
I would be glad to work on it @BALaka-18
Kindly assign it to me.
I can add a heading and a description, add a few more badges, and then center-align them all.
I look forward to working on it and getting a good level for it.
For now run and build artifacts are inside the repository:
- build
- dist
- __pycache__
- egg-info
They should be removed from the repository.
To keep unwanted build and run artifacts from being committed in the future, we should add a .gitignore file.
I propose taking the Python project gitignore from https://www.toptal.com/developers/gitignore. It's the classic .gitignore file for Python projects.
If you are ok with those changes I have a branch ready for a pull request!
No algorithm can escape edge cases. Your task is to check and test for probable edge cases where you think the algorithm might fail, by trial and error. Test the library on as many texts as you can.
For example : The previous version of this algorithm couldn't handle HTML tags in text. It was resolved in the current version that you see.
Create a folder test_cases in the root directory. The folder must contain a .txt file that will contain all the edge cases that you found, with each edge case on a separate line.
Structure : test_cases/edge_cases_file.txt
.txt file must be properly formatted.
Recurring
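A file in the one-case-per-line format described above can be fed through an extractor with a small harness like the following sketch. `run_edge_cases` and `toy_extract` are hypothetical names; `toy_extract` is a stand-in that fails on HTML-like input and should be replaced by the library's actual extraction call, which is not shown here.

```python
import tempfile
from pathlib import Path


def run_edge_cases(path, extract):
    """Run `extract` on every non-empty line of the file at `path`.

    Returns a list of (line_number, text, outcome) tuples, where outcome is
    either the extractor's return value or the exception it raised.
    """
    results = []
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # blank lines are not edge cases
        try:
            outcome = extract(text)
        except Exception as exc:  # record the failure instead of stopping
            outcome = exc
        results.append((number, text, outcome))
    return results


def toy_extract(text):
    """Hypothetical stand-in extractor that rejects markup."""
    if "<" in text:
        raise ValueError("unexpected markup")
    return text.split()


# Demo with a temporary file named like the one in the issue.
with tempfile.TemporaryDirectory() as tmp:
    case_file = Path(tmp) / "edge_cases_file.txt"
    case_file.write_text("plain text\n\n<p>tagged text</p>\n", encoding="utf-8")
    results = run_edge_cases(case_file, toy_extract)

for number, text, outcome in results:
    print(number, repr(text), repr(outcome))
```

Recording exceptions rather than letting them propagate means one failing case does not hide the others, which suits a trial-and-error search for inputs that break the algorithm.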
It would be useful to add descriptions to the labels so that people understand what they’re about and know when to use them.
Create a CONTRIBUTORS.md file that must contain the name of the contributors whose PRs get merged.
Format :
Contributor's GitHub profile picture as a thumbnail || Contributor Name(It must be a link to the contributor's GitHub profile) || Merged PR number.
Recurring process
TF-IDF is one of the most famous algorithms when it comes to keyword extraction from text. Your task is to create a function that will extract keywords from text using the TF-IDF algorithm and compare the results against this library. How similar / different are the results ?
For your reference, you may read these :
Create a folder tfidf_vectorizer in the root directory. The folder must contain a .py file that will contain the function for extracting the keywords from text using sklearn's TfidfVectorizer.
Structure : tfidf_vectorizer/extract_keywords_tfidf_sklearn.py
requirements.txt file is updated if you are including any new library.
1.5 hours
Hi,
I would love to add the Contributors.md file to your project in the form of a table and display a link to it in the README.md file.
Design the UI for the home page of the website using a UI design tool such as Figma.
Some links for reference:
https://templatemo.com/tm-540-lava-landing-page
Add the mentioned link in the README file just above the 'Installation' heading. The link should be displayed as 'READ MORE ABOUT RAKE' and should point to the URL : https://monkeylearn.com/keyword-extraction/
10 minutes.