The sjr-journal-ranking from abir0

SJR Journal Ranking Analysis

A web scraping and visualization project on SJR and WoS journal indexes.
View Dashboard

Table of Contents

Problem Statement
Built With
Installation
Results

Problem Statement

This project scrapes, analyzes, and visualizes data on research journals. It has two parts: web scraping, done with Selenium and Python, and data analysis and visualization, done with Tableau. It was completed as the first capstone project of the MasterCourse Data Science Cohort 2 program.

The data is scraped from the following websites:

SJR (SCImago Journal Rank)
WoS (Web of Science)

An external dataset is also used in this project:

Scopus

From these three sources, the following information is collected:

Journal Name or Title
Subject Area
Open Access Status
Publisher
Country
Coverage Year
Journal Rank
SJR Index
Quartile
H-Index
CiteScore
References Count
Citations Count
Documents Count ...

The scraped data is then cleaned and analyzed using Python libraries such as Pandas, NumPy, Matplotlib, and Seaborn, and the cleaned data is visualized in Tableau. The final dataset is available on Kaggle.
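As a rough illustration of the cleaning step, the sketch below normalizes journal titles and merges two small in-memory tables with Pandas. The column names (`Title`, `SJR`, `CiteScore`), the helpers `normalize_title` and `combine_sources`, and the sample rows are placeholders for illustration, not the project's actual schema or data.

```python
import pandas as pd


def normalize_title(title: str) -> str:
    """Lowercase a title and collapse whitespace so sources can be matched."""
    return " ".join(str(title).lower().split())


def combine_sources(sjr: pd.DataFrame, scopus: pd.DataFrame) -> pd.DataFrame:
    """Merge two journal tables on a normalized title key (hypothetical schema)."""
    sjr = sjr.assign(title_key=sjr["Title"].map(normalize_title))
    scopus = scopus.assign(title_key=scopus["Title"].map(normalize_title))
    # Drop duplicate journals within each source before joining.
    sjr = sjr.drop_duplicates(subset="title_key")
    scopus = scopus.drop_duplicates(subset="title_key")
    # Left join keeps every journal from the first table, matched or not.
    merged = sjr.merge(scopus.drop(columns="Title"), on="title_key", how="left")
    return merged.drop(columns="title_key")


# Placeholder rows, made up purely to exercise the function.
sjr_df = pd.DataFrame({"Title": ["Journal A ", "journal  b"], "SJR": [1.5, 0.4]})
scopus_df = pd.DataFrame({"Title": ["journal a", "Journal C"], "CiteScore": [6.2, 1.1]})
combined = combine_sources(sjr_df, scopus_df)
print(combined)
```

A left join is used here so that unmatched journals show up with missing values instead of silently disappearing; whether the project's notebook joins this way is not stated in this README.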

Built With

Python libraries and software used in this project include:

Python
Selenium
Pandas
NumPy
Matplotlib
Seaborn
Tableau

Installation

This project was developed with Python 3.11.0. Install a recent version of Python (3.11 or later) before running it.

Below are the steps to run the project:

Clone the repo

git clone https://github.com/abir0/SJR-Journal-Ranking.git

Initialize and activate a virtual environment

virtualenv venv
source venv/bin/activate

Install dependencies

pip install -r requirements.txt

Download Chrome WebDriver (ChromeDriver) from https://chromedriver.chromium.org/downloads and add the directory containing the chromedriver executable (chromedriver.exe on Windows) to the PATH environment variable.
Run the scraper scripts

python src/sjr_scraper.py
python src/wos_scraper.py

Run all the cells in the data transformation notebook in Google Colab, or download the notebook and run it in Jupyter.
This produces a file named combined_journal_ranking_data.csv, which is the final dataset.
Open the SJR Journal Ranking Analysis.twb file in Tableau (or open the public Tableau link) and connect the combined_journal_ranking_data.csv file to the workbook.
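Before connecting combined_journal_ranking_data.csv to Tableau, a quick sanity check of the file can catch an empty or truncated export. The sketch below uses only the standard-library csv module; the function name `summarize_csv` is hypothetical, and the check makes no assumption about which columns the file contains.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return row count, column names, and per-column counts of empty cells."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        columns = reader.fieldnames or []
        empty_counts = {name: 0 for name in columns}
        rows = 0
        for row in reader:
            rows += 1
            for name in columns:
                # Treat missing, blank, and whitespace-only cells as empty.
                if not (row.get(name) or "").strip():
                    empty_counts[name] += 1
    return {"rows": rows, "columns": columns, "empty": empty_counts}


if __name__ == "__main__":
    csv_path = Path("combined_journal_ranking_data.csv")
    if csv_path.exists():
        summary = summarize_csv(csv_path)
        print(f"{summary['rows']} rows, {len(summary['columns'])} columns")
        for name, count in summary["empty"].items():
            print(f"{name}: {count} empty cells")
    else:
        print(f"{csv_path} not found; run the notebook first")
```

Run it from the directory that holds the CSV. Columns with many empty cells are worth reviewing before building charts on them.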

Results

The final dashboard can be found here.

Here are the two dashboards:

Dashboard 1

Dashboard 2

Key findings from the analysis:

The correlation analysis shows a positive correlation between the SJR Index and CiteScore, H-index, and Cites per Doc, which suggests these metrics are better indicators than raw counts of citations, references, and documents.
For lower-ranking journals, however, these metrics carry less significance because of higher randomness (note that the correlation plots become more scattered toward the right).
Open Access journals have a higher average Citations per Document than non-Open Access journals.
One interesting observation: by number of documents, citations, and references, MDPI is among the top 5 publishers. MDPI publishes a large number of journals, but its low CiteScore indicates that their quality is not as high as that of the top 5 publishers by CiteScore.
Based on CiteScore, the top 5 publishers are Wiley, Elsevier, Springer, Nature Portfolio, and Routledge.
The top 5 countries by number of journals are the United States, United Kingdom, Netherlands, Germany, and Switzerland.
Medicine and Social Sciences are the top 2 subject areas by number of documents, references, and combined H-index.
