papercast-dev / papercast

A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, Semantic Scholar, or PDF with GROBID and LangChain, and listen to them as a podcast. Customize your own pipelines.

Home Page: https://docs.papercast.dev

License: MIT License

Python 100.00%
arxiv grobid python semantic-scholar dag nlp pdf-converter pdf-document-processor pipeline document-parser

Contributors: g-simmons, papercast-dev

papercast's Issues

Enable code analysis for plugins

Issue: Pylance in VSCode is unable to recognize the dynamically imported plugins in the papercast package, leading to issues like "Go to definition" not working as expected for the installed plugins.

Example:

from papercast.processors import ArxivProcessor

In the import statement above, Pylance does not recognize ArxivProcessor as a member of the papercast.processors package, even after the papercast-arxiv plugin is installed.
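To illustrate why a static analyzer can lose track of such names, here is a minimal sketch. It assumes plugin classes get attached to a module at runtime; the module name demo_processors and the helper register_plugin_classes are hypothetical and are not taken from the papercast codebase.

```python
import importlib
import sys
import types


class ArxivProcessor:
    """Hypothetical stand-in for a class shipped by a plugin."""


def register_plugin_classes(module_name, classes):
    """Attach classes to a module at runtime (illustrative helper, not papercast's API)."""
    module = sys.modules.get(module_name) or types.ModuleType(module_name)
    for cls in classes:
        # The attribute only exists once this line runs, so a static
        # analyzer reading the module's source never sees it.
        setattr(module, cls.__name__, cls)
    sys.modules[module_name] = module
    return module


demo = register_plugin_classes("demo_processors", [ArxivProcessor])

# The import succeeds at runtime because the module is in sys.modules...
loaded = importlib.import_module("demo_processors").ArxivProcessor
print(loaded is ArxivProcessor)
```

The name resolves when the program runs, but there is no source file or stub declaring it, which is the gap that the stub-based approach below tries to close.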

Possible solution:

  • Change the plugin structure to one folder with four files: subscribers.py, processors.py, publishers.py, and types.py (see #11). Each class goes in the file matching its type. This may be a good idea regardless.
  • At install time for each plugin, generate stubs for each file, and place them in the corresponding submodule folders, e.g. papercast/processors/stubs/. Alternatively, figure out a way to generate stubs for only a specific class from within a module using stubgen.
  • Modify the __init__.pyi file so that the import can be written as from papercast.processors import ArxivProcessor instead of from papercast.processors.arxiv_processor import ArxivProcessor.
    • Note: if we're OK with the messier import suggested in #10, we might be able to skip this step.
  • Make an __init__.pyi file with the contents from .stubs import *
  • Add a line to the papercast/processors/__init__.py:
    from papercast.processors import *

Notes:

  • stubs/__init__.pyi is shared across plugins, which may not be ideal
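As a rough sketch of the stub idea, the snippet below renders the text of a .pyi file that re-exports plugin classes. The render_stub helper and the mapping passed to it are hypothetical; only the module path papercast.processors.arxiv_processor comes from the issue text. It uses the "import X as X" form, which is the stub-file convention for marking a name as re-exported.

```python
def render_stub(exports: dict[str, str]) -> str:
    """Return .pyi text re-exporting each class.

    exports maps a class name to the module that defines it
    (a hypothetical input format chosen for this sketch).
    """
    lines = [
        # "import X as X" marks the name as an explicit re-export in a stub.
        f"from {module} import {name} as {name}"
        for name, module in sorted(exports.items())
    ]
    return "\n".join(lines) + "\n"


stub_text = render_stub(
    {"ArxivProcessor": "papercast.processors.arxiv_processor"}
)
print(stub_text)
```

A plugin installer could write this text to a file under the stubs folder; how and when that happens is left open here, as it is in the issue.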

Enforce file naming conventions for plugins

Enforce a file structure for each plugin's Python package that looks like:

plugin_dir
├── processors.py
├── publishers.py
├── subscribers.py
└── types.py

Benefits:

  • Easier stub generation for #9
  • Consistent __init__.py across plugins
  • Consistency/structure might be a good thing for plugin contributors

Cons:

  • Slightly harder to open files for developers working with several plugins at once
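One way such a convention could be checked is sketched below. The missing_plugin_files helper is hypothetical, and the four required file names are taken from the layout above; the temporary directory stands in for a real plugin package.

```python
import tempfile
from pathlib import Path

# File names from the proposed plugin layout.
REQUIRED_FILES = ("processors.py", "publishers.py", "subscribers.py", "types.py")


def missing_plugin_files(plugin_dir: Path) -> list[str]:
    """Return the required file names that are absent from plugin_dir."""
    return [name for name in REQUIRED_FILES if not (plugin_dir / name).is_file()]


# Demonstrate on a throwaway directory containing only two of the four files.
with tempfile.TemporaryDirectory() as tmp:
    plugin_dir = Path(tmp)
    (plugin_dir / "processors.py").touch()
    (plugin_dir / "types.py").touch()
    missing = missing_plugin_files(plugin_dir)
    print(missing)
```

A check like this could run in a plugin's test suite or at install time; an empty list means the package follows the layout.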

Change plugin imports to from papercast.types.plugin import Class

We currently import stuff from plugins with

from papercast.processors import ArxivProcessor, SemanticScholarProcessor

This would change to

from papercast.processors.arxiv import ArxivProcessor
from papercast.processors.semanticscholar import SemanticScholarProcessor

Pros:

  • Possibility to simplify static code analysis?
    • Take a look at processors/__init__.py and see whether this would actually simplify anything. See #9
  • Maybe make it clearer where each object is coming from

Cons:

  • The import statement is not as clean: importing types from several locations takes up multiple lines
