This project implements a data science pipeline covering data collection, ETL, and visualization on a webpage. It is built entirely in Python, both for the data science work and for rendering the web pages. I have used the Django framework to power the website, NumPy and Pandas to process the collected data, and Matplotlib and Seaborn to create the visualizations.
Refer to the `requirements.txt` file for all the modules used in this project. The steps to build the project are listed below and will be extended as the project develops.
Here I have documented each step that I followed to build this project, along with the date and time when I completed that step in the project timeline.
- Start the project with a virtual environment. To create the virtual environment, run `virtualenv djds` in any command-line terminal. Activate the virtual environment by running `djds\scripts\activate` on Windows, or `source djds/bin/activate` on Mac.
- Install all the necessary Python modules listed in `requirements.txt` by running `pip install -r requirements.txt`.
- Run `django-admin startproject analyzr .` in your command prompt or terminal to create the basic layout of your Django project. The `.` at the end creates the project files in the same directory where you run the command. You should have a `SECRET_KEY` in your `settings.py` file; keep this secret from everyone. In this repository, the key has been hidden using random `****` characters.
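One common way to keep the key out of the repository is to read it from an environment variable in `settings.py`. This is a minimal sketch of that idea, not necessarily what this project does; the variable name `DJANGO_SECRET_KEY` and the helper `load_secret_key` are hypothetical.

```python
import os


def load_secret_key(env=os.environ, name="DJANGO_SECRET_KEY"):
    """Return the secret key from the environment, or fail loudly if it is missing."""
    key = env.get(name)
    if not key:
        raise RuntimeError(f"Set the {name} environment variable before starting Django.")
    return key


# Hypothetical usage inside settings.py:
# SECRET_KEY = load_secret_key()
```

Failing at startup when the variable is missing is a deliberate choice here: it avoids silently running with an empty or default key.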
- Creating a Pandas dataframe from a queryset is fairly easy. The Django ORM returns a queryset object to the view function when a database call is performed. This object (for example, the result of `.values()`, which yields dictionaries) is passed into the `pandas.DataFrame` constructor, which turns any list of dictionaries into a Pandas dataframe object.
- A dataframe object has a `to_html()` method, which generates HTML for the dataframe so that it can be displayed on the webpage. This HTML snippet is then passed into the context, where the Django templating engine can include it in the response.
- The Django templating engine escapes HTML special characters in strings by default. That is why the HTML snippet generated by pandas cannot be shown as a table on the webpage unless it is marked safe. To do this, we simply put `{{ html_snippet | safe }}` in the template. This marks the variable `html_snippet` as safe, so it is not escaped. Note that this carries security risks, which you need to handle manually.
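The three steps above can be sketched outside Django as follows. The list of dictionaries stands in for what a `.values()` queryset would yield, the field names are hypothetical, and the standard library's `html.escape` is used only to approximate what template autoescaping does to the string when `| safe` is missing.

```python
import html

import pandas as pd

# Stand-in for a queryset's values: one dict per database row.
# The field names are hypothetical.
rows = [
    {"name": "pen", "price": 1.5, "quantity": 10},
    {"name": "notebook", "price": 3.0, "quantity": 4},
]

df = pd.DataFrame(rows)                  # list of dicts -> dataframe
html_snippet = df.to_html(index=False)   # dataframe -> "<table ...>" string

# Roughly what autoescaping does without `| safe`:
# the tags turn into visible text instead of a rendered table.
escaped = html.escape(html_snippet)
```

In a view, `html_snippet` would then go into the context dictionary, for example `{"html_snippet": html_snippet}`, and be rendered with the `safe` filter.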
- Run the command `python manage.py collectstatic`. This reads the project settings and creates a folder called `static_cdn` in the root directory. Inside this folder, there will be:
    static_cdn/
      static_root/
        admin/
          css/
            vendor/
              select2/
                some-files
            some .css files
          fonts/
            some .txt and .woff files
          img/
            gis/
              some .svg files
            some .svg files
          js/
            admin/
              some .js files
            vendor/
              jquery/
                some .js files
              select2/
                i18n/
                  some .js files
                some .js files
              xregexp/
                some .js files
            some .js files
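The folder listing above suggests static-file settings along the following lines. This is a sketch inferred from the folder names, not a copy of this project's `settings.py`; `STATIC_URL` and `STATIC_ROOT` are real Django settings, but the values shown are assumptions.

```python
from pathlib import Path

# In a real settings.py, BASE_DIR is usually Path(__file__).resolve().parent.parent;
# Path.cwd() is used here only so the fragment runs on its own.
BASE_DIR = Path.cwd()

# URL prefix under which static files are served (assumed value).
STATIC_URL = "/static/"

# Destination folder for `collectstatic` (path inferred from the listing above).
STATIC_ROOT = BASE_DIR / "static_cdn" / "static_root"
```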
- We will put all the extra code for the `products` app into a separate Python file, `utils.py`. At this point, create this file inside the `products` directory and write the functions that generate the graphs using Matplotlib and Seaborn, then produce a byte buffer representing each graph as a `png` image. Pass this buffer into the view and render it in the template to display it on the webpage.
- Add some logic in the `products/models.py` file to handle cases where the provided form is not filled in completely or required information is missing. This is done by showing error messages on the screen to guide the user through providing all the necessary information.
- Put a `navbar.html` in the root templates directory to create a consistent navigation panel for the website. Currently, the panel contains only a heading, which will be modified later.
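The graph-to-buffer step in `utils.py` might look roughly like the sketch below. The function names `figure_to_base64` and `bar_chart` are hypothetical, and encoding the buffer as base64 text is an assumption about how the image reaches the template. Seaborn is left out to keep the sketch short; it draws onto Matplotlib axes, so the buffer handling would be the same.

```python
import base64
from io import BytesIO

import matplotlib

matplotlib.use("Agg")  # render off-screen; a web server has no display
import matplotlib.pyplot as plt


def figure_to_base64(fig):
    """Save a figure into an in-memory PNG buffer and return it as base64 text."""
    buffer = BytesIO()
    fig.savefig(buffer, format="png", bbox_inches="tight")
    plt.close(fig)  # free the figure so repeated requests do not pile up memory
    return base64.b64encode(buffer.getvalue()).decode("utf-8")


def bar_chart(labels, values, title):
    """Draw a simple bar chart and return it as a base64-encoded PNG."""
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.bar(labels, values)
    ax.set_title(title)
    return figure_to_base64(fig)
```

A view could place the returned string in the context under a name such as `graph` (hypothetical), and the template could display it with `<img src="data:image/png;base64,{{ graph }}">`.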
- Create a modal displaying some statistical information about the data, style it, and add modal behaviors. The basic style and behavior are handled with Semantic UI; I have used a few custom styles to modify the look.
- Modify the titles over each type of chart in the `utils.py` file. Create a new HTML template for adding new records to the database from user input. The authentication system will be built later.
- Add the `add.html` file and create a view function `add_purchase_view` to render it. This page allows a user to add a new sales record to the database, which will affect the graph on the `main.html` page.
- Fill in `navbar.html` to create a working navigation bar for the entire website. It is included in `base.html` so that it is rendered in any template that extends `base.html`. This gives us a navigation panel for moving between pages.
- With this, we have a working website that lets us store new sales data and view sales stats based on the number of sales for different items and the selling price on a daily basis.
- Create a new app `csvs` in the same way we created the `products` app before. Follow the same steps to create `csvs/urls.py` and `csvs/forms.py`, add a model in `csvs/models.py`, and make migrations to create the database table.
- Remove the database and all the migration files, but leave the `__init__.py` files in the migrations directories. Run migrations like before to set up your new database, then create a superuser again. This time, also create a second user for working with data uploads.
- Create a view for the `csvs` app and add a template for uploading csv files. Put the template, called `upload.html`, into the templates directory local to the `csvs` app. Add a form in `forms.py` and render the form in the template using the view function.
- Build the logic to store all the data rows from an uploaded csv file into the database tables at once, so that we can use the data to view the plots and charts on our website. Make sure that your csv file has this format:
    `product_name, quantity, unit_price, salesman_id, date_time`
  Please make sure that your data does not have any header row.
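Reading a file in this format can be sketched with the standard library alone. The function name `parse_sales_csv` is hypothetical, and the column types are assumptions: integer quantity and salesman id, decimal price, and ISO-style timestamps.

```python
import csv
import io
from datetime import datetime
from decimal import Decimal


def parse_sales_csv(text):
    """Turn header-less CSV text into a list of dicts, one per sale."""
    sales = []
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    for row in reader:
        if not row:  # skip blank lines
            continue
        product_name, quantity, unit_price, salesman_id, date_time = row
        sales.append({
            "product_name": product_name,
            "quantity": int(quantity),
            "unit_price": Decimal(unit_price),
            "salesman_id": int(salesman_id),
            # Assumes ISO-style timestamps such as "2020-05-01 14:30:00".
            "date_time": datetime.fromisoformat(date_time),
        })
    return sales
```

Inside a Django view, the parsed rows could be turned into model instances and saved in one query with the model manager's `bulk_create`, which matches the "all at once" requirement above.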
- Upload a csv file to populate the database. A demo csv file has been provided to get you started; DO NOT MAKE ANY CHANGES TO THE FILE FOR NOW.
- Create a new app `customer` in the same way we created the `products` app before. Follow the same steps to create `customer/urls.py`, add a model in `customer/models.py`, and make migrations to create the database table.
- Create a view to show customer correlation stats on a webpage by rendering the `main.html` template local to the `customer` app. Don't forget to put a link to this template in the `navbar.html` template in the root `templates` directory.
- Create a `urls.py` to include new URLs pointing to the pages in the `customer` app. Include this file in the `urls.py` script of the project directory.
- Add some customer data to the database using the admin panel. Then browse the client site to view the results and verify that they look right.
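As a rough illustration of what a correlation stat involves, Pandas can compute it directly from two columns. The column names and values below are made up for the sketch; the actual `customer` model's fields may differ.

```python
import pandas as pd

# Hypothetical customer records; the real model's fields may differ.
customers = pd.DataFrame({
    "budget": [100, 200, 300, 400],
    "employment": [1, 2, 3, 4],
})

# Pearson correlation between two columns (a single number between -1 and 1).
correlation = customers["budget"].corr(customers["employment"])

# Pairwise correlations between all numeric columns, as a small table.
correlation_matrix = customers.corr()
```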
- Add `views.py` and `forms.py` to the project directory, i.e. the `analyzr` directory. Inside the `views.py` script, add 3 functions: one for home, one for login, and one for logging out. Random visitors cannot create user accounts on this site, because any user can input data into the database in any format, which can lead to various problems. Only an admin can add a new user account through the admin panel, and then anyone with those credentials can log in. This is suitable for a sales management site.
- Add a login form in `forms.py`. Render this form in the login template, called `login.html`, which is put in the root templates directory. The `home.html` template should also be put in the same directory.
- Add links to these 2 pages and the logout functionality in `analyzr/views.py` using suitable namespaces. Add all the necessary links with suitable logic in `navbar.html`, which renders links based on the authentication status of a visitor.
- Make sure to import the `login_required` decorator from `django.contrib.auth.decorators` and put it on top of every function-based view in the local `views.py` scripts in each of the 3 apps. This makes these views accessible only to authenticated users.
- Add the `LOGIN_URL` setting in the `settings.py` script in the project directory. With this setting, any visitor who tries to access a protected page, i.e. one whose view has the `login_required` decorator, is redirected to the login page.
If you find any mistakes, please open a pull request and let us know about the mistake so that we can amend the repository. If you don't understand something, reach out to me using:
Thank you!!!