ceterach's Introduction

ceterach

Rather than attempting the impossible task of being a full service like EarwigBot, ceterach aims to be a simple, modular toolkit. It tries to strike a balance between a fully featured package and a light interface to MediaWiki, so that it is as capable of standing alone as it is of fitting seamlessly alongside other Python code.

ceterach emphatically aims to be a general MediaWiki interface, not a Wikipedia-centric one. Wikipedia-specific functionality can be added by creating extensions, a process that is as yet undocumented and left as an exercise to the reader.

Examples

The following short program demonstrates manipulating the text of a Wikipedia article:

from ceterach.api import MediaWiki

api = MediaWiki("https://en.wikipedia.org/w/api.php")
# username and password are assumed to be defined elsewhere
api.login(username, password)
p = api.page("Wikipedia")
if p.exists:
    text = p.content.replace("Jimmy Wales", "[[User:Jimbo Wales]]")
    summary = "Replaced Jimmy with his username"
    p.edit(text, summary, minor=True)
api.logout()

The following short program deletes the talk pages of a category's member pages (without recursing into subcategories), except for the page on Napoleon:

from ceterach.api import MediaWiki

api = MediaWiki("https://en.wikipedia.org/w/api.php")
# username and password are assumed to be defined elsewhere
api.login(username, password)

catname = input("What's the category? Do not enter the 'Category:' prefix: ")
c = api.category("Category:" + catname)
for p in c.members:
    if p.title == "Napoleon":
        print("Found Napoleon! Skipping...")
        continue
    if not p.is_talkpage:
        p = p.toggle_talk()
    p.delete("Hasta la vista")
api.logout()

The following short program emails everyone who edited the article on Napoleon:

from ceterach.api import MediaWiki

api = MediaWiki("https://en.wikipedia.org/w/api.php")
# username and password are assumed to be defined elsewhere
api.login(username, password)

p = api.page("Napoleon Bonaparte", follow_redirects=True)
# Any action performed on p is equivalent to working on the "Napoleon" page,
# but redirects are resolved lazily. Since we were given the title
# "Napoleon Bonaparte", redirects are not resolved until we first
# interact with the API using that title:
assert p.title == "Napoleon Bonaparte"
p.load_attributes()  # We can force page normalisation by calling this method
assert p.title == "Napoleon"

# You can set the follow_redirects parameter to False to ensure that you don't
# follow redirects:
p2 = api.page("Napoleon Bonaparte", follow_redirects=False)
p2.load_attributes()
assert p2.is_redirect
print(p2.content)  # prints '#REDIRECT [[Napoleon]] {{R from other name}}'
del p2

for r in p.revisions:  # p.revisions[n] is newer than p.revisions[n+1]
    u = r.user
    if u.is_emailable:
        subject = "Regarding your edit on Napoleon"
        if r.is_minor:
            subject = subject.replace("your edit", "your minor edit")
        body = "I saw revision number {}. Nice edit! Unless it was vandalism."
        body = body.format(r.revid)
        u.email(subject, body, cc=False)  # Don't spam myself lol
api.logout()

ceterach's People

Contributors

geofbot, legoktm, riamse

ceterach's Issues

Plugin system

The creation of a plugin system would solve the existential crisis that ceterach currently faces: do we tailor ourselves toward Wikimedia Foundation websites or do we make ourselves generic for most/all MediaWikis?

How? It would let us stay generic at the core, with plugins layered over that core to make it work with Wikimedia things.

I'd better get on this.
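
One shape the generic-core-plus-plugins idea could take is sketched below. This is a hypothetical illustration, not ceterach's actual design: `Core`, `register_plugin`, and `WikimediaPlugin` are invented names, and the host name check is only a placeholder for real Wikimedia-specific behaviour.

```python
from urllib.parse import urlparse


class Core:
    """Hypothetical generic core that knows nothing about any particular wiki farm."""

    def __init__(self, api_url):
        self.api_url = api_url
        self._plugins = {}

    def register_plugin(self, plugin):
        # Give the plugin a reference to the core, then store it under its name.
        plugin.attach(self)
        self._plugins[plugin.name] = plugin

    def plugin(self, name):
        return self._plugins[name]


class WikimediaPlugin:
    """Hypothetical plugin layering Wikimedia-specific behaviour over the core."""

    name = "wikimedia"

    def attach(self, core):
        self.core = core

    def is_wikimedia_site(self):
        # Illustrative check only: inspects the host name of the API URL.
        host = urlparse(self.core.api_url).hostname or ""
        return host.endswith((".wikipedia.org", ".wikimedia.org"))


core = Core("https://en.wikipedia.org/w/api.php")
core.register_plugin(WikimediaPlugin())
print(core.plugin("wikimedia").is_wikimedia_site())
```

The core never imports the plugin, so a non-Wikimedia wiki could simply skip registering it.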

api.logout() doesn't provide "token" parameter

Since the changes from #11 (which appear to have been necessary to fix another problem), I now get the following error when I call api.logout():

Traceback (most recent call last):
  File "/....py", line 46, in <module>
    api.logout()
  File "/.../src/ceterach/ceterach/api.py", line 299, in logout
    return len(self.call(action="logout", use_defaults=False)) == 0
  File "/.../src/ceterach/ceterach/api.py", line 272, in call
    return self._call(params, more_params, use_defaults=use_defaults)
  File "/.../src/ceterach/ceterach/api.py", line 217, in _call
    raise raiseme
ceterach.exceptions.CeterachError: 'missingparam': 'The "token" parameter must be set.'
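
The error indicates that the server expects a token with action=logout; recent MediaWiki releases require a CSRF token there. A minimal sketch of the two-step flow follows, assuming a hypothetical `call` callable that takes API parameters as keyword arguments and returns the decoded JSON. This is not ceterach's actual `MediaWiki.call` signature, and `fake_call` is only a stand-in so the sketch runs offline.

```python
def logout_with_token(call):
    """Fetch a CSRF token, then send it along with action=logout."""
    res = call(action="query", meta="tokens", type="csrf")
    token = res["query"]["tokens"]["csrftoken"]
    # An empty response is treated as success, mirroring the check in the traceback.
    return len(call(action="logout", token=token)) == 0


def fake_call(**params):
    """Stand-in for a real API request (hypothetical, for illustration only)."""
    if params.get("action") == "query":
        return {"query": {"tokens": {"csrftoken": "abc123+\\"}}}
    if params.get("action") == "logout" and params.get("token") == "abc123+\\":
        return {}
    return {"error": {"code": "missingparam"}}


print(logout_with_token(fake_call))
```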

Write tests

To ensure that python setup.py test doesn't fire up a local MediaWiki at localhost:8000 and run things there, we need an alternative. Perhaps it would be enough to check that requests are what they should be, by replacing MediaWiki.call with an identity function that returns the query parameters.
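
The identity-function idea could be sketched as follows. Because ceterach's internals are not shown here, the `MediaWiki` class below is a minimal stand-in and `page_info` is a hypothetical helper; only the patching technique is the point.

```python
import unittest
from unittest import mock


class MediaWiki:
    """Minimal stand-in for ceterach's class; only what the sketch needs."""

    def call(self, **params):
        raise RuntimeError("network access is not allowed in tests")

    def page_info(self, title):
        # Hypothetical helper that builds a query and hands it to call().
        return self.call(action="query", prop="info", titles=title)


def identity_call(**params):
    """Return the query parameters unchanged instead of contacting a wiki."""
    return params


class RequestShapeTest(unittest.TestCase):
    def test_page_info_builds_expected_query(self):
        api = MediaWiki()
        with mock.patch.object(api, "call", side_effect=identity_call):
            sent = api.page_info("Test")
        self.assertEqual(
            sent, {"action": "query", "prop": "info", "titles": "Test"}
        )


# Run the suite explicitly so the sketch does not call sys.exit().
suite = unittest.defaultTestLoader.loadTestsFromTestCase(RequestShapeTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

This checks only the shape of outgoing requests; it says nothing about how a real wiki would respond.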

Lack of documentation

Could you please document this code more thoroughly? It would be great if you could write a test case showing exactly how to use ceterach to extract information from a Wikipedia page.

Thanks!

readapidenied errors

We've upgraded our wiki to a new version of MediaWiki and now ceterach doesn't work anymore.

Here's the dump:

  File "/home/cad/anaconda/Anaconda3-2020.11/lib/python3.8/site-packages/ceterach-0.0.1-py3.8.egg/ceterach/utils.py", line 42, in wrapped
    if not hasattr(self, attr): self.load_attributes()

  File "/home/cad/anaconda/Anaconda3-2020.11/lib/python3.8/site-packages/ceterach-0.0.1-py3.8.egg/ceterach/page.py", line 107, in load_attributes
    self.__load(res)

  File "/home/cad/anaconda/Anaconda3-2020.11/lib/python3.8/site-packages/ceterach-0.0.1-py3.8.egg/ceterach/page.py", line 128, in __load
    res = res or next(i(kwargs, use_defaults=False))

  File "/home/cad/anaconda/Anaconda3-2020.11/lib/python3.8/site-packages/ceterach-0.0.1-py3.8.egg/ceterach/api.py", line 384, in olditerator
    res = self.call(params, rawcontinue='', **more_params)

  File "/home/cad/anaconda/Anaconda3-2020.11/lib/python3.8/site-packages/ceterach-0.0.1-py3.8.egg/ceterach/api.py", line 271, in call
    return self._call(params, more_params, use_defaults=use_defaults)

  File "/home/cad/phozone/latest/python/mwiki.py", line 140, in dbg_call
    raise raiseme

CeterachError: You need read permission to use this module.

I managed to hack a copy of api._call and print out the input parameters:

ceterach.api._call({'prop': ('info', 'revisions', 'categories'), 'rvprop': ('ids', 'flags', 'timestamp', 'user', 'comment', 'content'), 'inprop': ('protection',), 'rvlimit': 1, 'rvdir': 'older', 'titles': 'Test'},{'rawcontinue': ''},False)

The returned JSON is:

{'error': {'code': 'readapidenied', 'info': 'You need read permission to use this module.', '*': 'See http://wiki.phoelex.com/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce> for notice of API deprecations and breaking changes.'}}

There's a similar problem in the Java API, which is fixed by calling its "setVersion" function. It's described on Stack Overflow here: https://stackoverflow.com/questions/57146687/how-to-fix-readapidenied-error-on-mediawiki-api-in-java

So I'm guessing that something changed between MediaWiki versions that means a query through the API must now be made differently.
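
The returned JSON carries a machine-readable code next to the human-readable info, so a caller could branch on it. The sketch below is a hypothetical helper, not part of ceterach: `ApiError` and `raise_for_api_error` are invented names. As far as I know, readapidenied is returned when the requesting session lacks the read right (for example, an anonymous session on a private wiki), so checking whether the login step still succeeds after the upgrade may be a useful first step; that is a guess, not a confirmed diagnosis.

```python
class ApiError(Exception):
    """Hypothetical exception that keeps the API error code accessible."""

    def __init__(self, code, info):
        super().__init__("{!r}: {!r}".format(code, info))
        self.code = code
        self.info = info


def raise_for_api_error(res):
    """Raise ApiError if the decoded JSON response contains an 'error' key."""
    err = res.get("error")
    if err is not None:
        raise ApiError(err.get("code"), err.get("info"))
    return res


response = {
    "error": {
        "code": "readapidenied",
        "info": "You need read permission to use this module.",
    }
}

try:
    raise_for_api_error(response)
except ApiError as exc:
    if exc.code == "readapidenied":
        print("The session lacks read permission; was the login successful?")
```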

Warnings in Python 3.8

Hi, just updated to Anaconda 3.2020.07

When I import ceterach I now get the following warnings:

/path/to/python3.8/site-packages/ceterach-0.0.1-py3.8.egg/ceterach/page.py:46: SyntaxWarning: "is" with a literal. Did you mean "=="?
/path/to/python3.8/site-packages/ceterach-0.0.1-py3.8.egg/ceterach/page.py:46: SyntaxWarning: "is" with a literal. Did you mean "=="?
/path/to/python3.8/site-packages/ceterach-0.0.1-py3.8.egg/ceterach/page.py:49: SyntaxWarning: "is not" with a literal. Did you mean "!="?
/path/to/python3.8/site-packages/ceterach-0.0.1-py3.8.egg/ceterach/page.py:49: SyntaxWarning: "is not" with a literal. Did you mean "!="?
/path/to/python3.8/site-packages/ceterach-0.0.1-py3.8.egg/ceterach/page.py:46: SyntaxWarning: "is" with a literal. Did you mean "=="?
/path/to/python3.8/site-packages/ceterach-0.0.1-py3.8.egg/ceterach/page.py:46: SyntaxWarning: "is" with a literal. Did you mean "=="?
/path/to/python3.8/site-packages/ceterach-0.0.1-py3.8.egg/ceterach/page.py:49: SyntaxWarning: "is not" with a literal. Did you mean "!="?
/path/to/python3.8/site-packages/ceterach-0.0.1-py3.8.egg/ceterach/page.py:49: SyntaxWarning: "is not" with a literal. Did you mean "!="?
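
These warnings come from comparing against a literal with `is` or `is not`, which tests object identity rather than equality. The sketch below reproduces the warning without touching ceterach; the two source strings are invented examples, not the actual contents of page.py.

```python
import warnings

# Invented snippets: the first uses identity, the second uses equality.
buggy = 'x = "abc"\nresult = x is "abc"\n'
fixed = 'x = "abc"\nresult = x == "abc"\n'


def syntax_warnings(source):
    """Compile the source and return the text of any SyntaxWarning raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<example>", "exec")
    return [
        str(w.message) for w in caught if issubclass(w.category, SyntaxWarning)
    ]


print(syntax_warnings(buggy))  # at least one warning on Python 3.8 and later
print(syntax_warnings(fixed))  # no warnings
```

Switching to `==` and `!=` silences the warning and also avoids relying on interpreter-specific caching of literals.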

Make edit conflict handling cooler

The issue on the table: an edit is counted as starting when you call Page.edit(). That handles API-level edit conflicts but not every case: for example, it misses changes made while the edit is still being calculated (e.g. while parsing page content before calling Page.edit()). We need a way to mark a page as being edited before we actually send requests to the API.
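
One possible approach is to record when the edit began, and which revision it is based on, before any content is computed, then pass both to the API when saving. MediaWiki's action=edit accepts `basetimestamp` and `starttimestamp` parameters for this purpose. The sketch below is hypothetical: `EditSession` and `editing` are invented names, not ceterach's API, and the base timestamp is a made-up value.

```python
from contextlib import contextmanager
from datetime import datetime, timezone


class EditSession:
    """Hypothetical record of when an edit began and which revision it is based on."""

    def __init__(self, base_timestamp):
        self.base_timestamp = base_timestamp
        now = datetime.now(timezone.utc)
        self.start_timestamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")

    def edit_params(self, text, summary):
        # basetimestamp and starttimestamp let the server detect changes
        # made after the edit was started.
        return {
            "action": "edit",
            "text": text,
            "summary": summary,
            "basetimestamp": self.base_timestamp,
            "starttimestamp": self.start_timestamp,
        }


@contextmanager
def editing(base_timestamp):
    """Mark the start of an edit before any content is computed."""
    yield EditSession(base_timestamp)


with editing("2020-01-01T00:00:00Z") as session:
    new_text = "old text".replace("old", "new")  # stands in for slow parsing
    params = session.edit_params(new_text, "Example summary")

print(params["basetimestamp"], params["text"])
```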
