

Bible Word and Phrase Counter

This project contains two Python scripts, words.py and phrases.py, that parse an HTML file of the Bible and create treemap visualizations of the most common words and phrases.
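The word count can be sketched with the standard library alone. This is a minimal sketch, not the project's words.py. It uses Python's built-in html.parser in place of BeautifulSoup, which the project lists; with BeautifulSoup, calling `get_text()` on the parsed document does a similar job. The tokenizing regex and the helper names `html_to_text` and `count_words` are assumptions.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the visible text of an HTML document, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Hypothetical helper: strip tags and return the text content."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def count_words(text: str) -> Counter:
    """Hypothetical helper: lowercase, tokenize, and count words.

    The regex keeps runs of letters plus one optional apostrophe suffix,
    so "LORD's" is counted as "lord's" (an assumed tokenizing rule).
    """
    return Counter(re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower()))
```

On a real file, `count_words(html_to_text(open("bible.html", encoding="utf-8").read())).most_common(20)` would list the 20 most frequent words; `bible.html` is a placeholder file name.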

Words

[Image: treemap of the most common words]

Phrases

[Image: treemap of the most common phrases]
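Counting phrases as identical sentences, the baseline that the ML note below aims to improve, can be sketched as follows. This is a minimal sketch, not the project's phrases.py. The sentence-splitting rule (breaking on `.`, `;`, `:`, `!` and `?`) and the normalization step are assumptions, and `split_sentences`, `normalize`, and `count_phrases` are hypothetical names.

```python
import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    # Assumed rule: break on ., ;, :, ! and ? followed by optional whitespace.
    parts = re.split(r"[.;:!?]+\s*", text)
    return [p.strip() for p in parts if p.strip()]


def normalize(sentence: str) -> str:
    # Lowercase and drop punctuation so that trivially different copies
    # of the same phrase are counted together.
    return " ".join(re.findall(r"[a-z]+(?:'[a-z]+)?", sentence.lower()))


def count_phrases(text: str) -> Counter:
    """Count identical (normalized) sentences."""
    return Counter(n for n in map(normalize, split_sentences(text)) if n)
```

Splitting on `:` is a deliberate choice for this sketch, because many English Bible translations use colons to separate clauses within a verse.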

Phrase counting could be improved with machine learning (ML).

Here’s a high-level idea of how this could be done:

  • Preprocess the text: This could involve cleaning the text, removing stop words, and possibly lemmatizing words.

  • Convert sentences into vectors: Use an NLP model to convert each sentence into a vector. This could be a simple Bag-of-Words model, TF-IDF, or a more complex model such as Word2Vec or GloVe (whose word vectors can be averaged over the sentence) or BERT.

  • Calculate similarity: For each sentence, calculate its similarity to all other sentences. Cosine similarity, a common measure of similarity between vectors, is a natural choice.

  • Group sentences: Based on their similarities, group sentences together. This could be done using a clustering algorithm like K-means.

  • Count groups: Instead of counting identical sentences, count the number of sentences in each group.
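The five steps above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the project's code. The stop-word list, the TF-IDF smoothing, and the 0.8 threshold are illustrative choices, and lemmatization is skipped. In place of K-means, it links every pair of sentences whose cosine similarity reaches the threshold and treats the connected components as groups, which avoids choosing the number of clusters up front.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real run would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "unto", "that", "he", "it", "for"}


def preprocess(sentence: str) -> list[str]:
    """Step 1: lowercase, tokenize, drop stop words (no lemmatization here)."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", sentence.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Step 2: sparse TF-IDF vectors (term -> weight) with a smoothed idf."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in Counter(doc).items()}
        for doc in docs
    ]


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Step 3: cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def group_similar(sentences: list[str], threshold: float = 0.8) -> list[list[str]]:
    """Step 4: union every pair with similarity >= threshold (union-find).

    Step 5: return the groups sorted by size, largest first, so the group
    sizes replace counts of identical sentences.
    """
    vectors = tfidf_vectors([preprocess(s) for s in sentences])
    parent = list(range(len(sentences)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)

    groups: dict[int, list[str]] = {}
    for i, s in enumerate(sentences):
        groups.setdefault(find(i), []).append(s)
    return sorted(groups.values(), key=len, reverse=True)
```

The pairwise loop is O(n²) in the number of sentences, so for a full Bible a library such as scikit-learn (`TfidfVectorizer`, `KMeans`) would be the more practical route.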

Getting Started

These instructions will get you a copy of the project up and running on your local machine.

Prerequisites

You need to have Python installed on your machine. You also need the following Python libraries:

  • BeautifulSoup
  • collections
  • re
  • matplotlib
  • squarify

collections and re are part of the Python standard library. You can install the other libraries using pip:

pip install beautifulsoup4 matplotlib squarify
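squarify draws the treemaps on top of matplotlib. The sketch below shows that final step, assuming the counts live in a `collections.Counter`. `top_items` and `plot_treemap` are hypothetical helper names, and the figure size, `alpha`, and top-50 cut-off are illustrative choices rather than the project's settings.

```python
from collections import Counter


def top_items(counts: Counter, n: int = 50) -> tuple[list[str], list[int]]:
    """Return labels and sizes for the n most common items, largest first."""
    pairs = counts.most_common(n)
    labels = [f"{item}\n{count}" for item, count in pairs]
    sizes = [count for _, count in pairs]
    return labels, sizes


def plot_treemap(counts: Counter, n: int = 50, title: str = "Most common words") -> None:
    # Imported here so top_items works even where the plotting libraries are absent.
    import matplotlib.pyplot as plt
    import squarify

    labels, sizes = top_items(counts, n)
    plt.figure(figsize=(16, 9))
    # squarify expects sizes in descending order; most_common already sorts them.
    squarify.plot(sizes=sizes, label=labels, alpha=0.8)
    plt.axis("off")
    plt.title(title)
    plt.show()
```

Calling `plot_treemap(Counter(words))` on any list of words opens the chart in a matplotlib window.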

Running the Script

To run a script, navigate to the directory containing the scripts and run one of the following commands:

python words.py

or

python phrases.py

Authors

Charlie

License

This project is licensed under the MIT License.

Acknowledgments

Thanks to OpenAI for providing the initial guidance for this project.


Contributors

charliecidral
