russscholar-seeker's Introduction

RussScholar-Seeker: A Python package for predicting whether a name is Russian

I am aware that this topic may be viewed from a political perspective. That runs absolutely AGAINST my intent.

This project contains a set of programs that automatically identify and analyze Russian authors of academic papers. It uses a pre-trained BERT model to predict the geographical attribute of a name, that is, whether a given name is Russian.

The script lets users search the latest 1000 papers from selected conferences or journals and uses the trained model to flag authors who may have a Russian background. For each match it outputs the paper title, author names, and DOI. A web version is deployed at https://russscholar.online
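As a rough illustration of how such paper metadata might be retrieved, the sketch below queries DBLP's public publication search API and reduces each hit to a title, author list, and DOI. It is not the project's own scraper, which relies on requests and beautifulsoup4. The function names (`build_search_url`, `parse_hits`, `fetch_papers`), the choice of the JSON endpoint, and the assumed response layout are illustrative assumptions; only the standard library is used.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List

# DBLP publication search endpoint (assumed here; the project may scrape other pages).
DBLP_API = "https://dblp.org/search/publ/api"


def build_search_url(query: str, max_hits: int = 1000) -> str:
    """Build a search URL; `h` caps the number of hits returned per request."""
    params = {"q": query, "format": "json", "h": max_hits}
    return f"{DBLP_API}?{urllib.parse.urlencode(params)}"


def parse_hits(payload: Dict) -> List[Dict]:
    """Reduce a DBLP JSON search response to title, authors, and DOI."""
    hits = payload.get("result", {}).get("hits", {}).get("hit", [])
    papers = []
    for hit in hits:
        info = hit.get("info", {})
        authors = info.get("authors", {}).get("author", [])
        # A single author is returned as one object rather than a list.
        if isinstance(authors, dict):
            authors = [authors]
        papers.append({
            "title": info.get("title", ""),
            "authors": [a.get("text", "") for a in authors],
            "doi": info.get("doi"),  # None when the record carries no DOI
        })
    return papers


def fetch_papers(query: str, max_hits: int = 1000) -> List[Dict]:
    """Download and parse one page of search results (needs network access)."""
    url = build_search_url(query, max_hits)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return parse_hits(json.load(resp))
```

Calling `fetch_papers("AAAI 2021")` would issue a single request. The query string is only an example, and this sketch makes no claim about the order in which DBLP returns hits.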

Principle

The core of the project is the BertForSequenceClassification model from the transformers library, trained on a dataset of labeled names to distinguish Russian from non-Russian names. We first scrape metadata of academic papers, including titles and author names, from databases like DBLP. We then run the trained model on the fetched names to automatically identify Russian authors. The trained model is available on Hugging Face: https://huggingface.co/Gao-Tianci/RussScholar-Seeker
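The prediction step could look roughly like the sketch below, which loads the published checkpoint through the transformers `pipeline` helper and classifies a batch of names. It assumes the Hugging Face repository ships a tokenizer and a config that the text-classification pipeline can load; the helper names `load_classifier` and `classify_names` are hypothetical. The label strings the model emits, and which of them stands for "Russian", are not stated in this document, so the sketch returns them unchanged.

```python
from typing import Callable, List, Sequence, Tuple

# Repository named in the Hugging Face link above.
MODEL_ID = "Gao-Tianci/RussScholar-Seeker"


def load_classifier(model_id: str = MODEL_ID):
    """Create a text-classification pipeline (downloads the model on first use)."""
    # Imported lazily so the rest of the module works without transformers installed.
    from transformers import pipeline
    return pipeline("text-classification", model=model_id)


def classify_names(
    names: Sequence[str], classifier: Callable
) -> List[Tuple[str, str, float]]:
    """Return (name, label, score) for each name, using the model's own label strings."""
    results = classifier(list(names))
    return [(name, r["label"], r["score"]) for name, r in zip(names, results)]
```

A call such as `classify_names(names, load_classifier())` would yield one tuple per name. Inspecting the model's `config.id2label` mapping is one way to find out what each label means before interpreting the output.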

Production Process

  1. Data Preparation: First, we collected a set of names labeled as Russian and non-Russian to serve as the dataset for training the model.
  2. Model Training: We trained the model using BertForSequenceClassification and the collected dataset. During the training process, we adjusted the model parameters to achieve the best predictive performance.
  3. Data Scraping: We wrote web scraping programs to fetch metadata of academic papers from databases like DBLP.
  4. Prediction and Analysis: We ran the trained model on the fetched names to identify Russian authors and output the related information.
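Steps 3 and 4 can be sketched as a small filter that takes scraped paper records and a name predicate, then reports the papers with at least one flagged author. This is only an outline under assumptions: the record layout (`title`, `authors`, `doi`) and the names `flag_papers` and `format_report` are hypothetical, and the predicate is a stand-in for the trained model rather than the model itself.

```python
from typing import Callable, Dict, Iterable, List


def flag_papers(
    papers: Iterable[Dict], is_russian: Callable[[str], bool]
) -> List[Dict]:
    """Keep papers where the predicate flags at least one author name."""
    flagged = []
    for paper in papers:
        matches = [name for name in paper["authors"] if is_russian(name)]
        if matches:
            flagged.append({
                "title": paper["title"],
                "flagged_authors": matches,
                "doi": paper.get("doi"),
            })
    return flagged


def format_report(flagged: Iterable[Dict]) -> str:
    """Render one line per paper: title, flagged authors, DOI."""
    lines = []
    for p in flagged:
        authors = ", ".join(p["flagged_authors"])
        lines.append(f"{p['title']} | {authors} | {p['doi'] or 'no DOI'}")
    return "\n".join(lines)
```

Passing the predicate in as a parameter keeps the scraping and reporting code independent of the model, so the classifier can be swapped or mocked during testing.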

Usage Guide

Before using this tool, you need to install some necessary Python libraries, including transformers, torch, requests, and beautifulsoup4. The installation command is as follows:

pip install transformers torch requests beautifulsoup4

After that, you can run prediction.py to identify Russian authors. The command might look like this:

python prediction.py

Case Study: Identifying Russian Authors in AAAI 2021

One notable application of this project was the analysis of academic papers from the AAAI 2021 conference, as listed on DBLP (HTML, XML). The goal was to identify papers with Russian authors, showcasing the model's ability to provide insights into the geographical distribution of academic contributions.

Results

The model identified several papers with Russian authors, underlining the global collaboration in the field of Artificial Intelligence.

These results demonstrate the practical utility of RussScholar-Seeker in analyzing academic contributions and highlight the diverse international collaboration within the AI research community.

Implications

This case study underscores the potential of AI and NLP technologies in enhancing our understanding of academic landscapes. By automating the identification of geographical attributes of authors, we can gain valuable insights into global research trends, collaboration networks, and the geographical distribution of expertise.
