science-novelty's Introduction

Measuring Novelty and its Impact in Science using Natural Language Processing

Citation

If you use the code from this repository, please cite the following paper:

Arts S., Melluso N., Veugelers R. (2023). Beyond Citations: Text-Based Metrics for Assessing Novelty and its Impact in Scientific Publications. https://doi.org/10.48550/arXiv.2309.16437

Overview

This repository is dedicated to assessing the novelty of scientific publications and the impact of that novelty, using Python scripts and Jupyter notebooks. It has two purposes:

  • Reproduce Results:
    • Replicate the findings of the original paper, which analyzes data from the Microsoft Academic Graph (MAG), now OpenAlex, encompassing a comprehensive collection of papers from 1800 to 2020. The data can be accessed here: https://zenodo.org/record/8283353.
  • Custom Analysis:
    • Enable users to apply the analysis, including preprocessing and metrics calculation, to a tailored set of papers for individual research needs.

Science Novelty Schema

The methodology is systematically organized into the following segments:

  1. Data Collection
  2. Preprocessing
  3. Text Embeddings
  4. Cosine Distance
  5. New Word
  6. New Bigrams
  7. New Trigrams
  8. New Word Combinations

Each segment contributes to extracting text-based metrics of the novelty of scientific publications and of the impact of that novelty.
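To illustrate segments 5 to 8, here is a minimal sketch of how "new" lexical items could be identified: process papers in chronological order and record, for each paper, the items that no earlier paper contains. The function names, the tuple layout of `papers`, and the toy data are hypothetical and are not taken from the repository. Ties within the same year are broken by input order, which is a simplification.

```python
from itertools import combinations

def ngrams(tokens, n):
    """Return the set of contiguous n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def word_pairs(tokens):
    """Return the set of unordered pairs of distinct words in a token list."""
    return set(combinations(sorted(set(tokens)), 2))

def new_items_per_paper(papers, extract):
    """Map each paper id to the items that first appear in that paper.

    papers:  iterable of (paper_id, year, tokens) tuples (hypothetical layout)
    extract: function turning a token list into a set of items
    """
    seen = set()
    result = {}
    # Walk through papers from oldest to newest so "seen" only holds prior items.
    for paper_id, _year, tokens in sorted(papers, key=lambda p: p[1]):
        items = extract(tokens)
        result[paper_id] = items - seen
        seen |= items
    return result

# Toy, already-preprocessed papers (invented for illustration only).
papers = [
    ("p1", 1990, ["gene", "editing", "method"]),
    ("p2", 2012, ["crispr", "gene", "editing"]),
    ("p3", 2015, ["crispr", "gene", "editing", "method"]),
]

new_words = new_items_per_paper(papers, lambda t: ngrams(t, 1))
new_bigrams = new_items_per_paper(papers, lambda t: ngrams(t, 2))
new_trigrams = new_items_per_paper(papers, lambda t: ngrams(t, 3))
new_pairs = new_items_per_paper(papers, word_pairs)

print(new_words["p2"])    # words first introduced by p2
print(new_bigrams["p2"])  # bigrams first introduced by p2
print(new_pairs["p3"])    # word combinations first introduced by p3
```

Passing the extractor as an argument keeps one chronological loop for words, bigrams, trigrams, and word combinations. On a corpus the size of the full dataset, an in-memory set like this would likely need to be replaced by something more scalable.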

Usage Guide

Notebooks

The repository contains scripts and detailed Jupyter notebooks that guide users through each step of the process. The notebooks are particularly beneficial for those aiming to execute specific tasks or a subset of the entire process.

  • 0.tutorial: A comprehensive guide that offers a step-by-step walkthrough of all phases, serving as an introductory overview.
  • 1.data-collection: Instructions for downloading a custom set of papers from OpenAlex or searching within the Zenodo repository.
  • 2.preprocessing: A guide for preprocessing titles and abstracts (and full texts, if available) of a selected set of papers.
  • 3.text-embeddings and 4.cosine-distance: Notebooks for generating text embeddings and calculating cosine distance.
  • 5.new-word, 6.new-bigram, 7.new-trigram, 8.new-word-comb: Detailed guides for identifying new lexical elements and combinations in processed papers.
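As a rough illustration of the step covered by 4.cosine-distance, the sketch below computes the cosine distance between two embedding vectors in plain Python. The vectors are toy values, and `min_distance_to_prior` is a hypothetical helper: how the repository aggregates distances over prior papers is not described here, so taking the minimum is only an assumption made for this example.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

def cosine_distance(u, v):
    """Cosine distance, defined as 1 minus cosine similarity."""
    return 1.0 - cosine_similarity(u, v)

def min_distance_to_prior(focal, prior_embeddings):
    """Distance from a focal paper to its closest prior paper.

    Hypothetical aggregation chosen for illustration only.
    """
    return min(cosine_distance(focal, p) for p in prior_embeddings)

# Toy embeddings (real text embeddings have hundreds of dimensions).
focal_paper = [1.0, 0.0]
prior_papers = [[0.0, 1.0], [1.0, 1.0]]

print(cosine_distance(focal_paper, prior_papers[0]))
print(min_distance_to_prior(focal_paper, prior_papers))
```

A larger distance to prior work suggests that a paper's text is less similar to what came before it. For real embedding matrices, a vectorized implementation (for example with NumPy) would be the more practical choice.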

Custom Analysis

Users are encouraged to adapt the code to their own research needs. To this end, the notebooks are organized one per step, so that each step can be run or modified on its own.
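When adapting the analysis to a custom set of papers, the first step is turning titles and abstracts into tokens. The sketch below is a simplified stand-in that uses only the standard library. The stopword list, the tokenization rule, and the `preprocess` function are assumptions made for illustration and do not reproduce the steps in 2.preprocessing.

```python
import re

# Tiny illustrative stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "for", "to", "is", "we"}

def preprocess(title, abstract=""):
    """Lowercase, tokenize, and filter the title and abstract of one paper."""
    text = f"{title} {abstract}".lower()
    # Keep alphanumeric tokens, allowing internal hyphens ("gene-editing").
    tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text)
    # Drop stopwords and single-character tokens.
    return [t for t in tokens if t not in STOPWORDS and len(t) > 1]

tokens = preprocess("A New Method for Gene-Editing", "We present CRISPR.")
print(tokens)  # ['new', 'method', 'gene-editing', 'present', 'crispr']
```

The resulting token lists are the kind of input that later steps, such as detecting new words or computing embeddings, would operate on.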

Contribution & Feedback

Contributions that improve the code or extend its functionality are welcome. For any inquiries, issues, or feedback, feel free to open an issue or contact us directly at [email protected]. Part of this code is inspired by https://github.com/sam-arts/respol_patents_code.

Respect Copyrights

Users are reminded to adhere to copyright regulations and ethical guidelines when utilizing and adapting the provided resources and data.

science-novelty's People

Contributors

nicolamelluso
