
smart-city-slicker's Introduction

cs5293sp23-project3

Description

The term “smart city” is widely used, but there is no consensus on its definition. Many citizens and stakeholders are unsure what a smart city means in their community and how it affects them. Imagine you are a stakeholder in a rising smart city and want to know more about the themes and concepts of existing smart cities. You also want to know where your smart city places among the others. In this project, you will use text analysis techniques (cluster analysis, topic modeling, and summarization) to investigate themes and similarities among smart cities. The project can help you, as a stakeholder, understand smart cities using data from the 2015 Smart City Challenge.

How to install

Clone the repo

How to run

To analyze a document:

pipenv run python project3.py --document "city.pdf"

The value passed to --document has to be in quotation marks.

To run the tests:

pipenv run python -m pytest

tree

.
├── COLLABORATORS
├── LICENSE
├── Pipfile
├── Pipfile.lock
├── README.md
├── __pycache__
│   └── contractions.cpython-310.pyc
├── contractions.py
├── docs
├── model.pkl
├── project3.ipynb
├── project3.py
├── setup.cfg
├── setup.py
├── smartcity
│   ├── AK Anchorage.pdf
│   ├── AL Birmingham.pdf
│   ├── AL Montgomery.pdf
│   ├── AZ Scottsdale AZ.pdf
│   ├── AZ Tucson.pdf
│   ├── CA Chula Vista.pdf
│   ├── CA Fremont.pdf
│   ├── CA Fresno.pdf
│   ├── CA Long Beach.pdf
│   ├── CA Moreno Valley.pdf
│   ├── CA Oakland.pdf
│   ├── CA Oceanside.pdf
│   ├── CA Riverside.pdf
│   ├── CA Sacramento.pdf
│   ├── CA San Jose_0.pdf
│   ├── CT NewHaven.pdf
│   ├── DC_0.pdf
│   ├── FL Jacksonville.pdf
│   ├── FL Miami.pdf
│   ├── FL Orlando.pdf
│   ├── FL St. Petersburg.pdf
│   ├── FL Tallahassee.pdf
│   ├── FL Tampa.pdf
│   ├── GA Atlanta.pdf
│   ├── GA Brookhaven.pdf
│   ├── GA Columbus.docx
│   ├── IA Des Moines.pdf
│   ├── IN Indianapolis.pdf
│   ├── KY Louisville.pdf
│   ├── LA Baton Rouge.pdf
│   ├── LA New Orleans.pdf
│   ├── LA Shreveport.pdf
│   ├── MA Boston.pdf
│   ├── MD Baltimore.pdf
│   ├── MI Detroit.pdf
│   ├── MI Port Huron and Marysville.pdf
│   ├── MN Minneapolis St Paul.pdf
│   ├── MO St. Louis.pdf
│   ├── NC Charlotte.pdf
│   ├── NC Greensboro.pdf
│   ├── NC Raleigh.pdf
│   ├── NE Lincoln.pdf
│   ├── NE Omaha.pdf
│   ├── NJ Jersey City.pdf
│   ├── NJ Newark.pdf
│   ├── NM Albuquerque.docx
│   ├── NV Las Vegas.pdf
│   ├── NV Reno.pdf
│   ├── NY Albany Troy Schenectady Saratoga Springs.pdf
│   ├── NY Buffalo.pdf
│   ├── NY Mt Vernon Yonkers New Rochelle.pdf
│   ├── NY Rochester.pdf
│   ├── OH Akron.pdf
│   ├── OH Canton.pdf
│   ├── OH Cleveland.pdf
│   ├── OH Toledo.pdf
│   ├── OK Oklahoma City.pdf
│   ├── OK Tulsa.pdf
│   ├── RI Providence.pdf
│   ├── SC Greenville.pdf
│   ├── TN Chattanooga.pdf
│   ├── TN Memphis.pdf
│   ├── TN Nashville.pdf
│   ├── TX Lubbock.pdf
│   ├── VA Newport News.pdf
│   ├── VA Norfolk.pdf
│   ├── VA Richmond.pdf
│   ├── VA Virginia Beach.pdf
│   ├── WA Seattle.pdf
│   ├── WA Spokane.pdf
│   └── WI Madison.pdf
├── smartcity_eda.tsv
├── smartcity_predict.tsv
└── tests
    └── test_project3.py

External libraries

import PyPDF2
import os
import pandas as pd
import numpy as np
import nltk
import spacy
import unicodedata
from contractions import CONTRACTION_MAP
import re
from nltk.corpus import wordnet
import collections
#from textblob import Word
from nltk.tokenize.toktok import ToktokTokenizer
from bs4 import BeautifulSoup
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering
from joblib import dump, load
import argparse

Code organization

load data

Loads the given PDF and converts it to text.
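
The PDF-to-text step could look roughly like the sketch below. It is an illustration, not the code in project3.py: the function names `join_pages` and `load_pdf_text` are hypothetical, and it assumes a PyPDF2 release that provides `PdfReader` (older releases exposed `PdfFileReader` instead). The PyPDF2 import is done inside the function so the helper can be tried without the package installed.

```python
def join_pages(page_texts):
    """Join per-page text, skipping pages where extraction gave None or only whitespace."""
    return "\n".join(t.strip() for t in page_texts if t and t.strip())


def load_pdf_text(path):
    """Read every page of the PDF at `path` and return its text as one string.

    Hypothetical helper; assumes PyPDF2 with the PdfReader class is installed.
    """
    from PyPDF2 import PdfReader  # lazy import: only needed when a PDF is actually read

    reader = PdfReader(path)
    return join_pages(page.extract_text() for page in reader.pages)


# join_pages alone can be checked without a PDF on disk.
print(join_pages(["Smart city ", None, "", "proposal"]))
```

A call such as `load_pdf_text("smartcity/AK Anchorage.pdf")` would then return the full text of that proposal. The two .docx files in the smartcity folder would need a different reader.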

normalizer

Normalizes and cleans the text using the code from the textbook.
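
As a rough idea of what this kind of normalization involves, here is a minimal sketch using only the standard library. It is not the textbook code itself: the function names are hypothetical, `SAMPLE_CONTRACTIONS` is a three-entry stand-in for the `CONTRACTION_MAP` imported from contractions.py, and the exact steps and their order are assumptions.

```python
import re
import unicodedata

# Tiny stand-in for CONTRACTION_MAP from contractions.py (assumption).
SAMPLE_CONTRACTIONS = {"isn't": "is not", "can't": "cannot", "it's": "it is"}


def strip_accents(text):
    """Replace accented characters with their closest ASCII form."""
    return unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")


def expand_contractions(text, mapping=SAMPLE_CONTRACTIONS):
    """Expand every contraction found in `mapping`, ignoring case."""
    pattern = re.compile("|".join(re.escape(k) for k in mapping), flags=re.IGNORECASE)
    return pattern.sub(lambda m: mapping[m.group(0).lower()], text)


def normalize(text):
    """Strip accents, expand contractions, lowercase, and drop special characters."""
    text = strip_accents(text)
    text = expand_contractions(text)
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # keep letters, digits, whitespace
    return re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace


print(normalize("Café isn't  SMART!"))  # cafe is not smart
```

The imports listed above also include nltk and spacy, so the project's normalizer may well go further, for example with stopword removal or lemmatization. Those steps are left out here.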

load model

Loads the model used in Part 1.
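
A minimal sketch of loading the saved model with joblib, which appears in the imports above. The function name `load_model`, the default path, and the explicit existence check are assumptions, not the project's code.

```python
import os


def load_model(path="model.pkl"):
    """Load a model saved with joblib.dump; fail early if the file is missing."""
    if not os.path.exists(path):
        raise FileNotFoundError(f"model file not found: {path}")
    from joblib import load  # lazy import: only needed once the file exists

    return load(path)
```

Checking for the file first gives a clearer error message than the one raised from inside joblib when the script is run from the wrong directory.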

main

Gets the command-line arguments and runs the code.
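
The argument handling can be sketched with argparse as follows. Only the `--document` flag comes from this README; the function names, help text, and return value are hypothetical, and the real main presumably goes on to call the load, normalize, and model steps.

```python
import argparse


def build_parser():
    """Build a parser with the single --document option used in this README."""
    parser = argparse.ArgumentParser(description="Analyze one smart city document.")
    parser.add_argument("--document", required=True,
                        help="path to the PDF to analyze (quote it if it has spaces)")
    return parser


def main(argv=None):
    """Parse arguments; a full version would then run the steps described above."""
    args = build_parser().parse_args(argv)
    print(f"Processing {args.document}")
    return args.document


# Passing a list mimics the shell after it has removed the quotation marks.
main(["--document", "smartcity/AK Anchorage.pdf"])
```

Many file names in the smartcity folder contain spaces, which is one reason the path needs quotation marks on the command line: without them the shell splits the name into several arguments.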

Assumptions

The model loaded, but I could not fit the model correctly because the number of features did not match: the trained model had 20353 features, while the TF-IDF vector for the test document had only 1680. Little automated testing was done, since the model is hard to test. File creation and writing were tested manually. It is assumed that you have all the packages installed.
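
One common cause of this kind of mismatch is fitting a new vectorizer on the test document, so that its vocabulary differs from the one used in training. Whether that is what happened here is an assumption. The sketch below uses a plain bag-of-words stand-in written with the standard library (hypothetical `fit_vocabulary` and `transform` functions, not scikit-learn) to show how reusing the training vocabulary keeps the feature count fixed.

```python
from collections import Counter


def fit_vocabulary(docs):
    """Map each distinct training token to a fixed column index."""
    tokens = sorted({tok for doc in docs for tok in doc.split()})
    return {tok: i for i, tok in enumerate(tokens)}


def transform(doc, vocabulary):
    """Count tokens using the given vocabulary; tokens outside it are dropped."""
    row = [0] * len(vocabulary)
    for tok, n in Counter(doc.split()).items():
        if tok in vocabulary:
            row[vocabulary[tok]] = n
    return row


train_docs = ["smart transit data", "open data portal", "transit sensors network"]
new_doc = "smart sensors and smart data"

vocab = fit_vocabulary(train_docs)
refit_width = len(fit_vocabulary([new_doc]))  # width when fitted on the new document alone
reused_row = transform(new_doc, vocab)        # width always equals len(vocab)

print(len(vocab), refit_width, len(reused_row))  # 7 4 7
```

With scikit-learn, the equivalent is to save the fitted TfidfVectorizer alongside the model and call its `transform` method on the new document rather than `fit_transform`.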

smart-city-slicker's People

Contributors

fazilraja
