speedcoder5 / keywordexplorer

This project forked from pgfeldman/keywordexplorer

A collection of Python desktop (tkinter) apps that let you: (1) find (English) Wikipedia pages and look at page views over time, (2) look at the number of tweets containing substrings over time, (3) use GPT-3 to search for keywords and Twitter to see if those keywords occur in the wild, and (4) select terms to download tweets to a DB.

License: MIT License

Python 99.55% Makefile 0.44% Shell 0.01%

keywordexplorer's Introduction

Explorer Apps

There are six(!) applications in this project, KeywordExplorer, TweetsCountExplorer, TweetDownloader, WikiPageviewExplorer, TweetEmbedExplorer, and ModelExplorer. The latest stable version can be installed with pip:

pip install keyword-explorer

A brief overview of each can be reached using the links below.

KeywordExplorer is a Python desktop app that lets you use GPT-3 to search for keywords and Twitter to see if those keywords are any good.

TweetCountsExplorer is a Python desktop app that lets you explore the quantity of tweets containing keywords over days, weeks or months.

TweetDownloader is a Python desktop app that lets you select and download tweets containing keywords into a database. The number of tweets can be adjusted so that it is the same for each day or proportional. Users can apply daily and overall limits for each keyword corpus.

WikiPageviewExplorer is a Python desktop app that lets you explore keywords that appear as articles in Wikipedia, and chart their relative page views.

TweetEmbedExplorer is a Python desktop app for analyzing, filtering, and augmenting tweet information. Augmented information can then be used to create a train/test corpus for finetuning language models such as GPT-2.

ModelExplorer is a Python desktop app that lets a user interact with a finetuned GPT-2 model trained using TweetEmbedExplorer.

Before Using!

Most of these apps require that you have an OpenAI account and/or a Twitter developer account:

  • KeywordExplorer requires Twitter and OpenAI accounts
  • TweetCountExplorer requires a Twitter developer account
  • WikiPageviewExplorer uses the Wikipedia API (pip install wikipedia), and requires a user agent
  • TweetDownloader requires additional elements such as a database, which are discussed in its own section rather than here
  • TweetEmbedExplorer requires a Twitter account, an OpenAI account, and a MariaDB/MySQL database
  • ModelExplorer uses the HuggingFace transformers API (pip install transformers), and a MariaDB/MySQL database
  • ModelExplorer requires GPT-2 models trained on corpora generated by TweetEmbedExplorer. To train a model, follow these steps: How to train a model

The following links are very helpful:

In each case you'll have to get an ID and set it as an environment variable. The names must be OPENAI_KEY for your GPT-3 account and BEARER_TOKEN_2 for your Twitter account, as shown below for a Windows environment:

Environment variables
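
To confirm that the variables are visible to Python before launching an app, a quick check along these lines may help. This is a minimal sketch, not code from this project; the helper name missing_ids is hypothetical, and only the two variable names come from the text above.

```python
import os

# Variable names taken from the text above.
REQUIRED_IDS = ("OPENAI_KEY", "BEARER_TOKEN_2")


def missing_ids(env=None, required=REQUIRED_IDS):
    """Return the names in `required` that are unset or empty in `env`.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_ids()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All IDs found.")
```

Passing a plain dict as env makes the check easy to try without touching the real environment.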

If you don't have permissions to set up environment variables or just don't want to, you can set up a json file and load that instead:

{
  "BEARER_TOKEN_2": "AAAAAAAAAAAAAAAAAAAAAC-----------------------",
  "OPENAI_KEY": "sk-s------------------------------------",
  "USER_AGENT": "[email protected]"
}

In this case, BEARER_TOKEN_2 is for the Twitter V2 account, OPENAI_KEY is for GPT-3, and USER_AGENT is for accessing Wikipedia.

To load the file, click on the "File" menu and select "Load IDs". Then navigate to the json file and select it. After the IDs are loaded, any application that depends on them will run. If you try using an app that doesn't have an active ID, it will complain.

LoadID
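
For readers who want to reuse the same json file in their own scripts, loading it could look roughly like the sketch below. It is an assumption about how such a loader might work, not the apps' actual "Load IDs" code, and load_ids is a hypothetical name. Note that Python's json parser is strict: a trailing comma after the last entry raises json.JSONDecodeError.

```python
import json
import os


def load_ids(path, env=None):
    """Read a flat JSON object of IDs and copy its entries into `env`.

    Hypothetical helper, not this project's code. `env` defaults to
    os.environ. Returns the dict that was read from the file.
    """
    env = os.environ if env is None else env
    with open(path, encoding="utf-8") as f:
        ids = json.load(f)
    if not isinstance(ids, dict):
        raise ValueError("expected a JSON object mapping names to values")
    for name, value in ids.items():
        # Environment values must be strings.
        env[name] = str(value)
    return ids
```

Calling load_ids("ids.json") would then make OPENAI_KEY, BEARER_TOKEN_2, and USER_AGENT available through os.environ for the rest of the process.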

Alternatively, you can create a .env file in the folder from which you are running the apps, or in a parent folder thereof. An example of this file is provided as .env_example.

To use this method, copy .env_example to .env, enter your keys, and save the file.

The apps use dotenv to automatically search for this file and load the environment variables it defines. .env is ignored by git to make sure it is not committed.

DATABASE_USER=root
DATABASE_PASSWORD=password
DATABASE_HOST=localhost
DATABASE_SSL_CA=/home/username/.ssl/DigiCertGlobalRootG2.crt.pem
OPENAI_KEY=AAAAAAAAAAAAAAAAAAAAAC-----------------------
BEARER_TOKEN_2=AAAAAAAAAAAAAAAAAAAAAC-----------------------
USER_AGENT=[email protected]

Default values are used if an environment variable is omitted.
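
The lookup described above (the current folder first, then each parent folder) can be illustrated with the standard library alone. The sketch below is a simplified stand-in for what the dotenv package does, not its actual implementation; find_env_file and parse_env_file are hypothetical names, and the parser ignores quoting and variable expansion.

```python
from pathlib import Path


def find_env_file(start, name=".env"):
    """Return the first `name` found in `start` or any parent folder, else None."""
    start = Path(start).resolve()
    for folder in (start, *start.parents):
        candidate = folder / name
        if candidate.is_file():
            return candidate
    return None


def parse_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values
```

For example, find_env_file(Path.cwd()) returns the nearest .env at or above the working directory, and parse_env_file turns it into a plain dict.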

DATABASE_SSL_CA is only required if you are connecting to a non-local database via SSL. It is the path to the file that contains a PEM-formatted CA certificate. For example, if you are using an Azure MySQL database, the .pem file can be obtained here. Sometimes the root CA changes, as happened recently with Azure MySQL. The new certificates can be found here.
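
As an illustration of how optional settings like these can be handled, the sketch below collects the database values from the environment and adds the CA path only when one is set. The function name, the dictionary keys, and the fallback values are all assumptions for this example; the project's real defaults are not stated here.

```python
import os


def database_settings(env=None):
    """Collect database settings, adding the CA path only when one is given.

    The fallback values are illustrative guesses, not the project's defaults.
    """
    env = os.environ if env is None else env
    settings = {
        "user": env.get("DATABASE_USER", "root"),
        "password": env.get("DATABASE_PASSWORD", ""),
        "host": env.get("DATABASE_HOST", "localhost"),
    }
    ssl_ca = env.get("DATABASE_SSL_CA")
    if ssl_ca:
        # Only remote SSL connections need the PEM-formatted CA certificate.
        settings["ssl_ca"] = ssl_ca
    return settings
```

With no DATABASE_SSL_CA set, the result has no ssl_ca entry, which matches the local, non-SSL case.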

MySQL Database Setup

  1. Create a .my.cnf file in your home directory:

     touch ~/.my.cnf

  2. Open the .my.cnf file and paste the contents of .my.cnf_example into it.

  3. Modify the values in .my.cnf to match your MySQL configuration.

  4. Set the correct file permissions:

     chmod 600 ~/.my.cnf

  5. Verify the setup by running mysql without specifying user and password:

     make show-databases

  6. Create the databases:

     make create-databases
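
Since .my.cnf typically holds credentials, it can be worth confirming that the chmod 600 step took effect. The following is a small sketch for POSIX systems; is_private is a hypothetical helper, not part of this project.

```python
import os
import stat


def is_private(path):
    """True if neither group nor others have any permission bits on `path`."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


if __name__ == "__main__":
    cnf = os.path.expanduser("~/.my.cnf")
    if not os.path.exists(cnf):
        print(cnf, "not found")
    elif is_private(cnf):
        print(cnf, "is private")
    else:
        print(cnf, "is accessible to other users; run chmod 600 on it")
```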

make commands explained

make              description
help              This help
clean             Removes build artifacts
clean-all         Remove the virtual environment and build artifacts
venv              Create/update project's virtual environment. To activate, run: source activate
test              Run unit tests
dist              Create python package and run unit tests
publish           Create/publish python package to test pypi repo
show-databases    show existing databases (also tests database connection)
create-databases  create databases
drop-databases    drop databases
get-corpora       download some books from gutenberg to corpora

You should be good to use the apps!
