SJR Journal Ranking Analysis

A web scraping and visualization project on SJR and WoS journal indexes.
View Dashboard

Table of Contents
  1. Problem Statement
  2. Built With
  3. Installation
  4. Results

Problem Statement

This project scrapes, analyzes, and visualizes data on research journals. It has two parts: web scraping, done with Selenium and Python, and data analysis and visualization, done with Tableau. It was completed as the first capstone project of the MasterCourse Data Science Cohort 2 program.

The data is scraped from the following websites:

An external dataset is also used in this project:

From these 3 sources, the following information is scraped:

  • Journal Name or Title
  • Subject Area
  • Open Access Status
  • Publisher
  • Country
  • Coverage Year
  • Journal Rank
  • SJR Index
  • Quartile
  • H-Index
  • CiteScore
  • References Count
  • Citations Count
  • Documents Count ...

The scraped data is then cleaned and analyzed using Python libraries such as Pandas, NumPy, Matplotlib, and Seaborn, and the cleaned data is visualized in Tableau. The final dataset is available on Kaggle.
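As a rough illustration of the cleaning step, the sketch below normalizes journal titles, parses scraped numeric strings, and joins two sources on the normalized title. It is not the project's notebook code: the field names (`title`, `sjr_index`, `h_index`, `open_access`), the choice of which source supplies which field, and the treatment of commas as thousands separators are all assumptions made for the example. It uses only the standard library so that it runs without the project's dependencies.

```python
import re
from typing import Dict, List, Optional


def normalize_title(title: str) -> str:
    """Lowercase a title and collapse punctuation/whitespace runs to one space."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def parse_number(text: str) -> Optional[float]:
    """Convert a scraped numeric string to float; None for blanks/placeholders.

    Assumption: commas are thousands separators (e.g. "1,234.5").
    """
    cleaned = text.strip().replace(",", "")
    if cleaned in {"", "-", "N/A"}:
        return None
    return float(cleaned)


def combine(sjr_rows: List[Dict[str, str]],
            wos_rows: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Left-join fields from the second source onto the first by normalized title."""
    # Hypothetical layout: both inputs are lists of dicts with a "title" key.
    wos_by_title = {normalize_title(r["title"]): r for r in wos_rows}
    combined = []
    for row in sjr_rows:
        match = wos_by_title.get(normalize_title(row["title"]), {})
        combined.append({
            "title": row["title"].strip(),
            "sjr_index": parse_number(row["sjr_index"]),
            "h_index": parse_number(row["h_index"]),
            # Stays None when the journal is missing from the second source.
            "open_access": match.get("open_access"),
        })
    return combined
```

Matching on a normalized title is a simple heuristic; a stable identifier such as an ISSN, where available, would be a more reliable join key.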

Built With

Python libraries and software used in this project:

  • Selenium
  • Pandas
  • Tableau

Installation

This project was developed with Python 3.11.0. Install Python 3.11 or later before running it.

Below are the steps to run the project:

  1. Clone the repo
git clone https://github.com/abir0/SJR-Journal-Ranking.git
  2. Initialize and activate a virtual environment
virtualenv venv
source venv/bin/activate
  3. Install dependencies
pip install -r requirements.txt
  4. Download ChromeDriver from https://chromedriver.chromium.org/downloads and add the folder containing the chromedriver executable (chromedriver.exe on Windows) to the PATH environment variable.

  5. Run the scraper scripts

python src/sjr_scraper.py
python src/wos_scraper.py
  6. Run all the cells in the data transformation notebook in Google Colab, or download the notebook and run it in Jupyter.

  7. You will get a file named combined_journal_ranking_data.csv. This is the final dataset.

  8. Open the SJR Journal Ranking Analysis.twb file in Tableau (or open the public Tableau link) and connect the combined_journal_ranking_data.csv file to the workbook.
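For readers unfamiliar with Selenium, the scraper step in the list above follows a common pattern: open a page in Chrome, locate the table rows, and turn each row into a record. The sketch below shows that pattern only; it is not the code in `src/sjr_scraper.py` or `src/wos_scraper.py`. The column order in `COLUMNS`, the default CSS selector, and the function names are hypothetical, and the URL is left as a parameter.

```python
from typing import Dict, List

# Hypothetical column order for one ranking-table row; the real pages may differ.
COLUMNS = ["rank", "title", "sjr_index", "h_index"]


def row_to_record(cells: List[str]) -> Dict[str, str]:
    """Map the text of one table row's cells onto named fields."""
    return {name: cell.strip() for name, cell in zip(COLUMNS, cells)}


def scrape_table(url: str, row_selector: str = "table tbody tr") -> List[Dict[str, str]]:
    """Load `url` in headless Chrome and return one record per table row."""
    # Imported here so row_to_record can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")  # run without opening a browser window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        records = []
        for row in driver.find_elements(By.CSS_SELECTOR, row_selector):
            cells = [td.text for td in row.find_elements(By.TAG_NAME, "td")]
            if cells:  # skip header or spacer rows that contain no <td> cells
                records.append(row_to_record(cells))
        return records
    finally:
        driver.quit()  # always release the browser, even if scraping fails
```

Keeping the row parsing in a separate pure function makes it possible to check the field mapping without launching a browser.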

Results

The final dashboard can be found here.

Here are the two dashboards:

Key findings from the analysis:

  • The correlation analysis shows a positive correlation between the SJR Index and CiteScore, H-index, and Cites per Doc. These metrics are therefore better indicators of journal rank than the raw counts of citations, references, and documents.
  • For lower-ranking journals, however, these metrics are less informative because the data is noisier (note that the correlation plots become more scattered toward the right).
  • Open Access journals have a higher average Citations per Document than non-Open Access journals.
  • One interesting observation: by number of documents, citations, and references, MDPI is among the top 5 publishers. MDPI publishes a large number of journals, but its poor CiteScore suggests that their quality does not match that of the top 5 publishers by CiteScore.
  • Based on CiteScore, the top 5 publishers are: Wiley, Elsevier, Springer, Nature Portfolio, and Routledge.
  • The top 5 countries with the highest number of journals are: United States, United Kingdom, Netherlands, Germany, and Switzerland.
  • Medicine and Social Sciences are the top 2 subject areas by number of documents, references, and combined H-index.
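Two of the findings above rest on simple computations: a correlation between two metrics, and an average of one metric per group (for example, Open Access versus non-Open Access). A minimal sketch of both follows, assuming the data is available as plain Python lists and dictionaries. The function names and keys are hypothetical, and the project's own analysis may have used Pandas or Tableau for these steps instead.

```python
from math import sqrt
from statistics import mean
from typing import Dict, Hashable, List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    # Raises ZeroDivisionError if either sequence is constant.
    return cov / sqrt(var_x * var_y)


def mean_by_group(rows: List[dict], group_key: str, value_key: str) -> Dict[Hashable, float]:
    """Average `value_key` over rows that share the same `group_key` value."""
    groups: Dict[Hashable, List[float]] = {}
    for row in rows:
        groups.setdefault(row[group_key], []).append(row[value_key])
    return {group: mean(values) for group, values in groups.items()}
```

Computing `pearson` separately for higher-ranked and lower-ranked subsets of the journals is one way to check whether the relationship weakens with rank, as the scatter plots suggest.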
