imslp's Introduction

imslp

🎼 The clean and modern way of accessing IMSLP data and scores programmatically. 🎢

Installation

The package is available on PyPI and can be installed using your favorite package manager:

pip install imslp

Data Sources

This project attempts to use robust data sources that do not require web scraping:

  • MediaWiki API. IMSLP is one of tens of thousands of websites built on top of MediaWiki, the wiki software created for Wikipedia.org. As such, it can be accessed through the MediaWiki API, for which, fortunately, there is an excellent Python wrapper library called mwclient.

  • IMSLP API. For convenience, IMSLP provides some ad-hoc scripts that return a list of people and a list of works in a variety of formats, including JSON.

However, it also relies on scraping to collect additional information (such as the number of pages in a score, the number of times a score was downloaded, or the user-provided ratings).
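For the IMSLP API source, paging through a list needs only the standard library. The sketch below is not this package's code. The endpoint and the slash-separated parameter string are assumptions modelled on IMSLP's public API page. The same goes for the response shape: entries keyed "0", "1", and so on, plus a metadata object with a moreresultsavailable flag. It also assumes start is a zero-based offset. api_url, parse_page and fetch_all are hypothetical helpers.

```python
from __future__ import annotations

import json
from urllib.request import urlopen

# Assumed endpoint and parameter layout; check IMSLP's API documentation.
API_BASE = "https://imslp.org/imslpscripts/API.ISCR.php"
LIST_TYPES = {"people": 1, "works": 2}


def api_url(kind: str, start: int = 0) -> str:
    """URL of one JSON page of the people or works list, starting at `start`."""
    params = [
        "account=worklist",
        "disclaimer=accepted",
        "sort=id",
        f"type={LIST_TYPES[kind]}",
        f"start={start}",
        "retformat=json",
    ]
    # The parameters are joined with "/" into a single query string.
    return API_BASE + "?" + "/".join(params)


def parse_page(payload: str) -> tuple[list[dict], bool]:
    """Return (entries in numeric key order, whether more results exist)."""
    data = json.loads(payload)
    keys = sorted((k for k in data if k.isdigit()), key=int)
    more = bool(data.get("metadata", {}).get("moreresultsavailable", False))
    return [data[k] for k in keys], more


def fetch_all(kind: str) -> list[dict]:
    """Download every page of a list (performs network requests)."""
    results: list[dict] = []
    more = True
    while more:
        with urlopen(api_url(kind, start=len(results))) as response:
            entries, more = parse_page(response.read().decode("utf-8"))
        if not entries:  # guard against looping forever on an empty page
            break
        results.extend(entries)
    return results
```

Keeping URL building and parsing separate from fetch_all means both can be checked offline against a saved response before any requests are sent.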

Some quirks of IMSLP

Although IMSLP runs on a widely used open-source wiki platform, MediaWiki, it has a handful of quirks:

  • Composers are stored as categories, for instance Category:Scarlatti, Domenico. Each composer usually has three tabs, "Compositions", "Collaborations" and "Collections"; these are stored as separate categories named by concatenating the composer category and the subtype, such as Category:Scarlatti, Domenico/Collections.

  • PDF files for sheet music are stored as "images" (MediaWiki's term for uploaded files); unfortunately, the URLs computed for these files currently lack a scheme (such as https:), so they must be patched manually.

  • The imslpdisclaimeraccepted cookie must be set to "yes" for files to download properly (otherwise, downloading any file returns the disclaimer page instead). With mwclient, the cookies can be passed on login:

    cookies = {
        "imslp_wikiLanguageSelectorLanguage": "en",
        "imslpdisclaimeraccepted": "yes",
    }
  • Much of the metadata associated with images, such as the internal ID or the download counter, is stored separately from the MediaWiki metadata, so scraping the rendered HTML page is necessary.
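The first three quirks above can be handled by hand in a few lines. The sketch below is not this package's implementation, and tab_categories, patch_scheme, file_request, connect and category_titles are hypothetical names. It rests on three assumptions:

- The missing scheme means protocol-relative URLs of the form //host/path.
- connect assumes IMSLP serves its MediaWiki API from the web root (path "/").
- mwclient's Site.login accepts the cookies as described above.

```python
from __future__ import annotations

from urllib.request import Request

# The cookies quoted above: English interface, disclaimer already accepted.
COOKIES = {
    "imslp_wikiLanguageSelectorLanguage": "en",
    "imslpdisclaimeraccepted": "yes",
}
TABS = ("Compositions", "Collaborations", "Collections")


def tab_categories(composer: str) -> dict[str, str]:
    """Category name for each composer tab, as '<composer>/<tab>'.

    The 'Category:' prefix is left off because mwclient's
    site.categories[...] lookup adds the namespace itself.
    """
    return {tab: f"{composer}/{tab}" for tab in TABS}


def patch_scheme(url: str, scheme: str = "https") -> str:
    """Prepend a scheme to a scheme-less URL such as '//host/path'."""
    return f"{scheme}:{url}" if url.startswith("//") else url


def file_request(url: str) -> Request:
    """Build (but do not send) a download request carrying the cookies,
    so that the server returns the file rather than the disclaimer page."""
    cookie = "; ".join(f"{name}={value}" for name, value in COOKIES.items())
    return Request(patch_scheme(url), headers={"Cookie": cookie})


def connect(host: str = "imslp.org", path: str = "/"):
    """Open an mwclient Site and pass the cookies on login.

    path="/" assumes IMSLP serves api.php from its web root.
    """
    import mwclient  # third-party; imported here so the helpers above work without it

    site = mwclient.Site(host, path=path)
    site.login(cookies=COOKIES)
    return site


def category_titles(site, category: str) -> list[str]:
    """Full titles of the pages in a category, e.g. 'Scarlatti, Domenico'."""
    return [page.name for page in site.categories[category]]
```

For example, category_titles(connect(), tab_categories("Scarlatti, Domenico")["Collections"]) should list that tab's pages if the assumptions hold. This call needs network access and mwclient installed.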

Fortunately, all these quirks are handled by this package!

Related Projects

Here are a handful of other related projects on GitHub for accessing IMSLP data programmatically:

  • jjjake/imslp-scrape: Last commit in May 2012 (32 commits), mix of Python and shell, scraping the website for data (people, score links) with HTML parsing.

  • FrankTheCodeMonkey/IMSLP-Scraper: Last commit in June 2020 (6 commits), Python, scraping the website for data and scores, with HTML parsing and Selenium.

  • josefleventon/imslp-api: Last commit in May 2020 (17 commits), JavaScript, uses IMSLP's custom API to get the list of people and list of works programmatically through a web API query.

More recently, and in other languages:

Acknowledgements

Let's be clear that all the heavy lifting is done by mwclient, and by the volunteers who uploaded, scanned, and/or typeset the scores on IMSLP.

License

This project is licensed under the LGPLv3 license, with the understanding that importing a Python module is similar in spirit to dynamically linking against a library.

  • You can use the library imslp in any project, for any purpose, as long as you provide some acknowledgement to this original project for use of the library.

  • If you make improvements to imslp, you are required to make those changes publicly available.

imslp's People

Contributors

github-actions[bot], jlumbroso, ramseyharrison

imslp's Issues

Using this library

Hi there, interesting project! However, I can't figure out how to get it working.

Looking at your code I tried:

import imslp
cl = imslp.client.ImslpClient('User', 'PW')

but I get the error module 'imslp' has no attribute 'client'. Is __init__.py missing an import of client, or am I just missing the documentation that explains how this library works?
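This error is standard Python behaviour rather than something specific to imslp. Importing a package runs only its __init__.py. A submodule such as client therefore becomes an attribute of the package only once something imports it. That can be the package itself, or the caller via import imslp.client or from imslp import client, the form used in the next issue. Below is a minimal, self-contained reproduction with a throwaway package; the name demo_pkg_submodule and its contents are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_submodule_import() -> tuple:
    """Build a throwaway package with an empty __init__.py and report
    whether its 'client' submodule is an attribute before and after an
    explicit import of that submodule."""
    name = "demo_pkg_submodule"  # hypothetical package name
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = Path(tmp) / name
        pkg_dir.mkdir()
        (pkg_dir / "__init__.py").write_text("")  # note: no "from . import client"
        (pkg_dir / "client.py").write_text("class ImslpClient:\n    pass\n")
        importlib.invalidate_caches()  # make the freshly written files visible
        sys.path.insert(0, tmp)
        try:
            package = importlib.import_module(name)
            before = hasattr(package, "client")  # only __init__.py has run
            importlib.import_module(f"{name}.client")  # like "import imslp.client"
            after = hasattr(package, "client")  # the import bound it on the package
        finally:
            sys.path.remove(tmp)
            for module in (f"{name}.client", name):
                sys.modules.pop(module, None)
    return before, after
```

Calling demo_submodule_import() returns (False, True): the attribute is missing until the submodule is imported explicitly.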

How to search for works efficiently?

Going off the example in #1 (comment), I tried the following code:

from imslp import client
imslp_client = client.ImslpClient()
results = imslp_client.search_works(composer="Schoenberg")
ids = [work["intvals"]["pageid"] for work in results]
print(ids)

The search_works() call appears to hang, as there is no indication that anything is happening. However, sending a keyboard interrupt after a few minutes does return data. Is there a more efficient way to get information on a large number of works?
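One generic mitigation is to pay for the slow call once and cache its result on disk. This works regardless of how imslp fetches its data internally, but it does not make the first call faster. The sketch assumes the search results are JSON-serialisable (a list of dicts, as the comprehension above suggests). cached_json is a hypothetical helper, not part of imslp.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any, Callable


def cached_json(path: str | Path, compute: Callable[[], Any]) -> Any:
    """Return the JSON value stored at `path`; on a cache miss, call
    `compute()`, save its (JSON-serialisable) result there, and return it."""
    path = Path(path)
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    value = compute()
    path.write_text(json.dumps(value), encoding="utf-8")
    return value


# Usage sketch (not run here; requires imslp and network access):
# works = cached_json(
#     "schoenberg_works.json",
#     lambda: list(client.ImslpClient().search_works(composer="Schoenberg")),
# )
```

The cache is never invalidated automatically: delete the file to refresh it. Values round-trip through JSON, so, for example, tuples come back as lists.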
