sawtpedia's Introduction

Sawtpedia

Sawtpedia is a joint development project of Wikimedia Tunisia and the Data Engineering and Semantics Research Unit, carried out within the framework of Hack4OpenGLAM. Inspired by the logic of QRpedia, Yamen Bousrih first presented the idea at the Hack4OpenGLAM Showcase at the 2021 Creative Commons Global Summit and then disseminated it at Wikimedia conferences such as WikidataCon 2021. Deployed at https://sawtpedia.toolforge.org, Sawtpedia generates a QR code for a monument. When scanned, the code fetches the Wikidata item for that monument and opens the audio recording of the Wikipedia article about the monument in the mobile device's language, if one is available on Wikimedia Commons. If no recording exists, Sawtpedia tries to generate audio from the lead of the Wikipedia article in the user's language using the gTTS text-to-speech library.
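
As a rough illustration of what the QR code encodes, the sketch below builds the link for a monument from its Wikidata ID. The URL pattern is taken from the example reported in the issues section (https://sawtpedia.toolforge.org/link?id=Q845529); the function name `sawtpedia_link` and the ID validation are assumptions made for this example, not part of the tool's source code.

```python
import re
from urllib.parse import urlencode

# Base URL of the deployed web service (pattern as seen in the issue report).
BASE_URL = "https://sawtpedia.toolforge.org/link"


def sawtpedia_link(item_id: str) -> str:
    """Build the URL a QR code would point to (hypothetical helper)."""
    # Wikidata item IDs are a capital Q followed by a positive integer.
    if not re.fullmatch(r"Q[1-9][0-9]*", item_id):
        raise ValueError(f"not a Wikidata item ID: {item_id!r}")
    return f"{BASE_URL}?{urlencode({'id': item_id})}"


if __name__ == "__main__":
    print(sawtpedia_link("Q845529"))  # Marble Arch
```

The resulting string is what would be handed to a QR code generator such as the QRpedia web service.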

Principles

The tool uses the same principle as QRpedia, but the approach has been updated to reflect recent advances in web development and in Wikimedia projects. The tool is implemented in Python with Flask instead of PHP and relies on the large-scale multilingual structured data available in Wikidata. It has two components:

  • An HTML page with JavaScript and CSS that generates a QR code for a given monument. The input is the Wikipedia page of the monument in any language. The Wikidata item of the monument is retrieved from the Wikipedia page using JavaScript and mw.config. A QR code is then generated using the QRpedia web interface. It points to a web service that leads to the audio recording of the Wikipedia article about the monument in the language of the mobile device's web browser.
  • A web service implemented in Python with Flask that redirects the user to the audio recording of the Wikipedia article about the monument in the language of the mobile device's web browser. The input is the Wikidata ID of the monument. The web service retrieves the language of the user's web browser, then finds the URL of the audio recording in that language using a SPARQL query on the spoken text audio statements of Wikidata. Wikidata hub is used to return the Wikidata ID of the user's language based on its IETF language tag. If the file exists, the user is redirected to the audio. If it does not exist, the tool can:
    • convert the lead of the Wikipedia article about the Wikidata item in the user's language to audio using gTTS. The languages currently supported by gTTS are: Afrikaans (af), Arabic (ar), Bulgarian (bg), Bengali (bn), Bosnian (bs), Catalan (ca), Czech (cs), Welsh (cy), Danish (da), German (de), Greek (el), English (en), Esperanto (eo), Spanish (es), Estonian (et), Finnish (fi), French (fr), Gujarati (gu), Hindi (hi), Croatian (hr), Hungarian (hu), Armenian (hy), Indonesian (id), Icelandic (is), Italian (it), Japanese (ja), Javanese (jw), Khmer (km), Kannada (kn), Korean (ko), Latin (la), Latvian (lv), Macedonian (mk), Malayalam (ml), Marathi (mr), Myanmar Burmese (my), Nepali (ne), Dutch (nl), Norwegian (no), Polish (pl), Portuguese (pt), Romanian (ro), Russian (ru), Sinhala (si), Slovak (sk), Albanian (sq), Serbian (sr), Sundanese (su), Swedish (sv), Swahili (sw), Tamil (ta), Telugu (te), Thai (th), Filipino (tl), Turkish (tr), Ukrainian (uk), Urdu (ur), Vietnamese (vi), and Chinese (zh).
    • generate an error message if gTTS does not support the user's language.
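
The decision flow of the web service described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the tool's actual source: all function names are hypothetical, the query assumes that the recording's language is stored as a "language of work or name" (P407) qualifier on the "spoken text audio" (P989) statement, and the language item ID (for example Q150 for French) is assumed to have been resolved beforehand, for instance through Wikidata hub.

```python
import re
from typing import List, Optional

# gTTS language codes as listed in this README.
GTTS_LANGS = set(
    "af ar bg bn bs ca cs cy da de el en eo es et fi fr gu hi hr hu hy id is "
    "it ja jw km kn ko la lv mk ml mr my ne nl no pl pt ro ru si sk sq sr su "
    "sv sw ta te th tl tr uk ur vi zh".split()
)


def preferred_languages(accept_language: str) -> List[str]:
    """Return primary language subtags from an Accept-Language header, best first."""
    ranked = []
    for position, part in enumerate(accept_language.split(",")):
        tag, _, params = part.strip().partition(";")
        params = params.strip()
        quality = 1.0  # a missing q-value means full preference
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        primary = tag.strip().split("-")[0].lower()
        if primary and primary != "*" and quality > 0:
            ranked.append((-quality, position, primary))
    ranked.sort()  # highest quality first, header order breaks ties
    ordered: List[str] = []
    for _, _, primary in ranked:
        if primary not in ordered:
            ordered.append(primary)
    return ordered


def spoken_audio_query(item_id: str, language_item_id: str) -> str:
    """Build a SPARQL query for the spoken text audio of an item in one language."""
    for value in (item_id, language_item_id):
        if not re.fullmatch(r"Q[1-9][0-9]*", value):
            raise ValueError(f"not a Wikidata item ID: {value!r}")
    return (
        "SELECT ?audio WHERE { "
        f"wd:{item_id} p:P989 ?statement . "          # spoken text audio
        "?statement ps:P989 ?audio ; "
        f"pq:P407 wd:{language_item_id} . "           # assumed language qualifier
        "} LIMIT 1"
    )


def choose_action(audio_url: Optional[str], lang: str) -> str:
    """Decide between redirecting, synthesising with gTTS, or reporting an error."""
    if audio_url:
        return "redirect"
    if lang in GTTS_LANGS:
        return "tts"
    return "error"


def synthesize_lead(text: str, lang: str, path: str) -> None:
    """Write an MP3 of the lead text using gTTS (third-party, needs network access)."""
    from gtts import gTTS  # imported lazily so the rest runs without gTTS installed

    gTTS(text=text, lang=lang).save(path)
```

In a Flask view, the header would typically come from `request.headers.get("Accept-Language", "")`, and the query string would be sent to the Wikidata Query Service before calling `choose_action` with the result.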

Requirements

Team

Acknowledgements

  • Capacity building on web development with Flask was provided by the Data Engineering and Semantics Research Unit, University of Sfax, Tunisia, as part of the Federated Research Project PRF-COV19-D1-P1.
  • We thank Terence Eden and Roger Bamkin for providing the source code of QRpedia. We were inspired by the QRpedia principles and have reused several excerpts of its code, as well as the QRpedia web service that generates a QR code from a URL. Because Sawtpedia builds on QRpedia, we use the MIT License for our source code and adopt the Website Privacy Policy of Wikimedia UK for our tool.
  • We thank Legoktm, Mutante, AntiComposite, Reedy, RhinosF1, and Bryan Davis for supporting the deployment of the tool on Toolforge over SSH.
  • We thank Habib M'henni from Wikimedia Tunisia for his contribution to our testing of the tool.
  • We thank Abel Lifaeli Mbula for his contributions to the source code.

sawtpedia's Issues

TTS reads out image captions at the start

(Thanks for this project - I love it!)

I used https://en.wikipedia.org/wiki/Marble_Arch which took me to https://sawtpedia.toolforge.org/link?id=Q845529

That starts by reading out

The arch with The Cumberland Hotel, Great Cumberland Place and the trees of Bryanston Square beyond, parts of the British Regency-architecture Portman Estate; the hotel has an access to its Tube station

Before getting to the useful bit of the article:

Marble Arch is a 19th-century white marble-faced triumphal arch in London, England.

Is it possible to get the TTS to skip image captions for articles?
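
One possible direction, sketched here as an assumption rather than a description of how Sawtpedia currently fetches the lead: the MediaWiki TextExtracts API can return the introduction of an article as plain text (`prop=extracts` with `exintro` and `explaintext`), which generally leaves out image thumbnails and their captions. The helper name `lead_extract_url` is hypothetical.

```python
from urllib.parse import urlencode


def lead_extract_url(title: str, lang: str = "en") -> str:
    """Build a TextExtracts request for the plain-text lead of an article."""
    params = {
        "action": "query",
        "prop": "extracts",
        "exintro": 1,       # only the section before the first heading
        "explaintext": 1,   # plain text instead of HTML
        "redirects": 1,     # follow page redirects
        "titles": title,
        "format": "json",
    }
    return f"https://{lang}.wikipedia.org/w/api.php?{urlencode(params)}"


if __name__ == "__main__":
    print(lead_extract_url("Marble Arch"))
```

The extract text in the JSON response could then be passed to gTTS in place of text scraped from the rendered page.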

Ogg files not supported by iOS

Whilst Android phones play the Ogg files from Commons correctly, iOS devices currently do not play them.
