balaka-18 / rake_new2

A Python library that enables smooth keyword extraction from any text using the RAKE (Rapid Automatic Keyword Extraction) algorithm.

License: MIT License

Python 100.00%

text text-data keyword-extraction keywords keyword-search nlp python-library

rake_new2's Issues

Integrate welcome bot

I can add a welcome bot config file with a suitable message that is shown whenever a user opens an issue or pull request for the first time.
For reference, check out: https://github.com/apps/welcome

Please assign it to me.

Create Issue Template

Hi,
I would love to add issue templates for your repository. The templates would cover the following issue types: bug, documentation, feature, proposal and question.

CODE OF CONDUCT

A code of conduct is important when many people contribute to a single project.
Here is a description of the code of conduct I would add:

There will be a pledge that everyone must follow, so that this community is a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, etc. The pledge also covers decorum in conversations on issues.
There will be standards, such as:
- Examples of unacceptable behavior by participants
- Examples of behavior that contributes to creating a positive environment
Responsibilities
How to contact someone if a contributor faces issues with other contributors
The scope of the code of conduct, for example: "This Code of Conduct applies both within project spaces and in public spaces"
These are the things I will be adding, in a little more detail.
Let me know if you can assign this to me, @BALaka-18.

Website logo

Description

I have made a logo for the website. The preview of the same is provided below. Kindly assign me this issue so that I can make a PR and work on it under DWoC. @BALaka-18

For reference

Please review.

Basic frontend for public web app

Description

Create the frontend of a web application that'll be used to make the library accessible to the users via the web.

The color scheme is up to you.

The web app must contain :

1. A text area for the users to type in the text.
2. A button that allows users to upload text files if they don't want to type.
3. Two radio buttons : (1) Keep HTML, (2) Don't keep HTML.
4. A dropdown list with options : (1) Get keywords only, (2) Get keywords with scores, (3) Get top 5 keywords.
5. Two radio buttons : (1) Show top 5 most frequent words, (2) Don't show frequent words.
6. A button that links people to the official PyPI page of rake_new2.
7. A button that has the text : Click to extract keywords.
8. ALL SEVEN ITEMS MUST BE WRAPPED INSIDE ONE BOX THAT WILL BE CENTERED.
9. HEADING : WELCOME TO rake_new2 : A PYTHON LIBRARY THAT HELPS IN SMOOTH KEYWORD EXTRACTION.
10. ON THE TOP RIGHT CORNER, THERE SHOULD BE A DESIGN LIKE THIS (any color). INSIDE THIS SHOULD BE A LOGO OF GITHUB, AND THIS TRIANGLE SHOULD LINK TO : https://github.com/BALaka-18/rake_new2/issues
11. Below the box (the box stated in point 8), there should be a bold, legible text that says : Want to contribute ? Have a better idea to enhance our library ? Click on the top right corner of this page.

File structure

Create the files according to convention.

--> All HTML files must be under : web_app/templates/
--> All CSS and JS(if any) files under : web_app/static/

PR INSTRUCTION :

ALL PRs MUST BE MADE TO THE web-app BRANCH ONLY, ELSE THEY WILL BE REJECTED.

Acceptance Criteria

All instructions provided in the Description must be strictly followed.
Must be neat and formal.
All criteria must be satisfied.
Must be functioning.
PR must follow PR instruction and PR template.

Definition of Done

All of the required items are completed.
Approval by 1 mentor.

Time Estimation

1 week.

Shields links are not correct

Shields for PyPI, issues, forks and stars should link to the corresponding pages on PyPI and GitHub. For now they just link to the images.

Improve Contributor.md

Add Hacktoberfest logo to README.md

Description

Add the Hacktoberfest logo to the very start of the README file and add the link to the Contributing_Guidelines.md file and the Issues tab in the repository. The links should read :

'CLICK HERE TO START CONTRIBUTING' --> Link to the Issues tab.
'READ THE CONTRIBUTING GUIDELINES' --> Link to the Contributing_Guidelines.md file

Acceptance Criteria

README must be properly formatted.
All instructions provided in the Description must be strictly followed.

Definition of Done

All of the required items are completed.
Approval by 1 mentor.

Time Estimation

10 minutes.

Enhance Contribution Guidelines

I will add related images and improve the content of the contribution guidelines.

Stopwords filter

Description

Edge Case : Since keywords are mainly made by avoiding stopwords, for some cases the keywords extracted do not interpret the meaning of the text exactly.

For example : If the text is "I like sweet apples but I don't like sour apples", the extracted keywords will be : 'I', 'like', 'sweet', 'sour', 'apples', with 'apples' shown as the highest priority keyword. But summarizing from these keywords changes the meaning completely, because the negation is lost.

This issue asks you to find a way around this problem, or to brainstorm with me and other interested contributors.
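
To make the edge case concrete, here is a minimal sketch of RAKE-style phrase splitting on the example sentence, followed by one possible workaround. It is not rake_new2's actual code: the stopword list, the `NEGATIONS` set, and the `candidate_phrases` function with its `keep_negations` flag are all hypothetical names chosen for illustration. The idea is to carry a negation word forward as a "not" prefix on the next phrase instead of discarding it.

```python
import re

# Hypothetical word lists for illustration only (not rake_new2's real stopword list).
NEGATIONS = {"don't", "not", "never", "no"}
STOPWORDS = {"i", "but", "the", "a", "an", "and", "or"} | NEGATIONS

TEXT = "I like sweet apples but I don't like sour apples"


def candidate_phrases(text, keep_negations=False):
    """Split text into candidate phrases at stopwords, RAKE-style.

    With keep_negations=True, a negation word still ends the current
    phrase, but the next phrase is prefixed with "not".
    """
    words = re.findall(r"[a-z']+", text.lower())
    phrases, current, negated = [], [], False

    def flush():
        nonlocal current, negated
        if current:
            prefix = ["not"] if negated else []
            phrases.append(" ".join(prefix + current))
            negated = False
        current = []

    for word in words:
        if keep_negations and word in NEGATIONS:
            flush()
            negated = True
        elif word in STOPWORDS:
            flush()
        else:
            current.append(word)
    flush()
    return phrases


print(candidate_phrases(TEXT))
# ['like sweet apples', 'like sour apples']
print(candidate_phrases(TEXT, keep_negations=True))
# ['like sweet apples', 'not like sour apples']
```

In the first call both phrases look equally positive. In the second, the negation survives as part of the phrase. Whether a prefix like this is the right representation is open for discussion in this issue.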

Read : How to use rake_new2

NOTE : This may be a multi-assignee issue

Folder Structure, Function details

Create a folder algorithm_addons in the root directory and write a Python function to work around this problem. If I approve, I will create a function directly in the rake_new2.py main file, with the contributors' names at the top of the function.

Example naming convention : algorithm_addons/stopwords_debug.py

Acceptance Criteria

The .py file must be properly formatted.
All instructions provided in the Description must be strictly followed.

Definition of Done

All of the required items are completed.
Approval by at least 1 mentor.

Time Estimation

Recurring

Adding frontend react app

Add frontend folder that contains react app

Create a Pull request Template file.

Description

Create a PULL_REQUEST_TEMPLATE.md file that must contain the skeleton of a PR description and generate the template each time a PR is created.

If you don't know how or what this template is, read this link on creating Pull request templates

Acceptance Criteria

PULL_REQUEST_TEMPLATE.md file must be properly created.
All instructions provided in the Description must be strictly followed.

Definition of Done

All of the required items are completed.
Approval by 1 mentor.

Time Estimation

20 minutes

Comparison against TF-IDF Vectorizer (from scratch)

Description

TF-IDF is one of the most famous algorithms when it comes to keyword extraction from text. Your task is to create a function that will extract keywords from text using the TF-IDF algorithm and compare the results against this library. How similar / different are the results ?

NOTE : You have to build the Tf-idf algorithm for keyword extraction from scratch. You will then compare its performance against sklearn's TfidfVectorizer and rake_new2.
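
As a starting point, a from-scratch TF-IDF keyword ranking could look like the sketch below. The function names (`tokenize`, `tfidf_keywords`), the toy `DOCS` corpus, and the choice of smoothed IDF, ln((1 + N) / (1 + df)) + 1, are assumptions made for illustration. That IDF form matches what sklearn documents as its default smoothing, which should make the later comparison easier, but sklearn also applies normalization that this sketch omits.

```python
import math
import re
from collections import Counter

# Toy corpus, assumed for illustration only.
DOCS = [
    "apples are sweet and apples are red",
    "lemons are sour",
    "grapes are sweet",
]


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_keywords(documents, doc_index, top_n=3):
    """Rank the terms of one document by TF-IDF against the whole corpus."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    counts = Counter(tokenized[doc_index])
    total = sum(counts.values())

    scores = {}
    for term, count in counts.items():
        tf = count / total                                   # relative term frequency
        idf = math.log((1 + n_docs) / (1 + df[term])) + 1    # smoothed IDF
        scores[term] = tf * idf

    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


for term, score in tfidf_keywords(DOCS, doc_index=0):
    print(f"{term}: {score:.4f}")
```

On this corpus, "apples" ranks first for the first document because it is frequent there and absent elsewhere, while "are" is penalized for appearing in every document. No stopword removal is done here, so common words can still rank highly.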

For reference :

For your reference, you may read this link

Folder Structure, Function details

Create a folder tfidf_vectorizer in the root directory. The folder must contain a .py file that will contain the function for extracting the keywords from text using the Tfidf algorithm written from scratch.

Structure : tfidf_vectorizer/extract_keywords_tfidf_scratch.py

Acceptance Criteria

Code must be properly formatted.
Code must be accompanied by appropriate comments.
File structure must be strictly maintained.
Test cases must be present at the end of the code.
Variables and functions must be properly named
IMPORTANT : Make sure requirements.txt file is updated if you are including any new library.
All instructions provided in the Description must be strictly followed.

Definition of Done

All of the required items are completed.
Approval by 1 mentor.

Time Estimation

2.5-3 hours (or more if needed)

Literature study on different keyword extraction algorithms

Description

Create an extensive and in-depth literature study on various keyword extraction algorithms other than RAKE and Tf-Idf. Every algorithm must be accompanied by brief logical / mathematical explanation + examples (in text or in the form of pictures / diagrams)

File structure

Create a Literature_Survey.md file in the root directory.

Acceptance Criteria

All instructions provided in the Description must be strictly followed.

Definition of Done

All of the required items are completed.
Approval by 1 mentor.

Time Estimation

5 days - 1 week.

Enable GitHub Actions for code coverage.

Description

Integrate the most appropriate GitHub Action for automatic generation of code coverage report on every code related PR made.

NOTE : Once assigned, please comment here to say which GitHub Action you're going to integrate before creating a PR. Integrate the Action only after I approve it.

Acceptance Criteria

All instructions provided in the Description must be strictly followed.

Definition of Done

All of the required items are completed.
Approval by 1 mentor.

Time Estimation

20 minutes

UPDATE AND IMPROVE README LOOK

Hi @BALaka-18

Whenever someone visits a project repository, they look at the README file first, since it is the most integral part of the repository.
The opening lines of a README (the project heading, a description, and badges) are what make it look great. I found that this README has no heading with the project name, the badges are not centered, and only a few badges are used. Let me know if you would like to improve it and give it a better look.

I would be glad to work on it, @BALaka-18.

Kindly assign it to me.

I can add a heading and a description, add a few more badges, and then center all of them.

I expect to work on it and get a good level for it.

cleaning the repository and adding a .gitignore

For now run and build artifacts are inside the repository:
- build
- dist
- __pycache__
- egg-info
They should be removed from the repository.
To keep build and run artifacts from being committed in the future, we should add a .gitignore file.
I propose taking the Python project gitignore from https://www.toptal.com/developers/gitignore. It's the classic .gitignore file for Python projects.

If you are OK with these changes, I have a branch ready for a pull request!

Test the current algorithm of rake_new2 to look for edge cases

Description

No algorithm can escape edge cases. Your task is to check and test for probable edge cases where you think the algorithm might fail, by trial and error. Test the library on as many texts as you can.

Read : How to use rake_new2

For example : The previous version of this algorithm couldn't handle HTML tags in text. It was resolved in the current version that you see.

NOTE : This may be a multi-assignee issue

Folder Structure, Function details

Create a folder test_cases in the root directory. The folder must contain a .txt file that will contain all the edge cases that you found, with each edge case in a separate line.

Structure : test_cases/edge_cases_file.txt
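
Once the file has one edge case per line, a small harness can replay every case and record which ones fail. The sketch below is only a suggestion: `run_edge_cases` and `toy_extractor` are hypothetical names, and the toy extractor is a stand-in, so you would pass in whatever rake_new2 call you are testing instead.

```python
def run_edge_cases(lines, extractor):
    """Run extractor on each non-empty line and record the outcome.

    Returns a list of (text, status, detail) tuples, where status is
    "ok" (detail is the extractor's result) or "error" (detail is the
    repr of the exception raised).
    """
    report = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines in the edge-case file
        try:
            report.append((text, "ok", extractor(text)))
        except Exception as exc:  # broad on purpose: we want to log every failure
            report.append((text, "error", repr(exc)))
    return report


def toy_extractor(text):
    """Stand-in extractor (an assumption): rejects HTML, otherwise splits on spaces."""
    if "<" in text:
        raise ValueError("HTML not supported")
    return text.lower().split()


cases = ["Plain sentence", "", "<p>HTML text</p>"]
for text, status, detail in run_edge_cases(cases, toy_extractor):
    print(status, "|", text, "|", detail)
```

To run it on the real file, open test_cases/edge_cases_file.txt and pass the file object as `lines`, since iterating over a file yields one line at a time.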

Acceptance Criteria

The .txt file must be properly formatted.
All instructions provided in the Description must be strictly followed.

Definition of Done

All of the required items are completed.
Approval by 1 mentor.

Time Estimation

Recurring

Labels to have a description

Description

It would be useful to add descriptions to the labels so that people understand what they’re about and know when to use them.

Create CODE_OF_CONDUCT

Create a CONTRIBUTORS.md file

Description

Create a CONTRIBUTORS.md file that must contain the name of the contributors whose PRs get merged.
Format :

Contributor's GitHub profile picture as a thumbnail || Contributor Name(It must be a link to the contributor's GitHub profile) || Merged PR number.

Acceptance Criteria

CONTRIBUTORS.md file must be properly formatted.
All instructions provided in the Description must be strictly followed.

Definition of Done

All of the required items are completed.
Approval by 1 mentor.

Time Estimation

Recurring process

Comparison against TF-IDF Vectorizer (using sklearn)

Description

For reference :

For your reference, you may read these :

Folder Structure, Function details

Create a folder tfidf_vectorizer in the root directory. The folder must contain a .py file that will contain the function for extracting the keywords from text using sklearn's TfidfVectorizer.

Structure : tfidf_vectorizer/extract_keywords_tfidf_sklearn.py

Acceptance Criteria

Code must be properly formatted.
Code must be accompanied by appropriate comments.
File structure must be strictly maintained.
Test cases must be present at the end of the code.
Variables and functions must be properly named
IMPORTANT : Make sure requirements.txt file is updated if you are including any new library.
All instructions provided in the Description must be strictly followed.

Definition of Done

All of the required items are completed.
Approval by 1 mentor.

Time Estimation

1.5 hours

Description

Add the mentioned link in the README file just above the 'Installation' heading. The link should be displayed as 'READ MORE ABOUT RAKE' and should point to the URL : https://monkeylearn.com/keyword-extraction/

Acceptance Criteria

README must be properly formatted.
All instructions provided in the Description must be strictly followed.

Definition of Done

All of the required items are completed.
Approval by 1 mentor.

Time Estimation

10 minutes.
