RSemanticMediaWikiBot

This is a bot developed in R for editing Semantic MediaWiki templates. The code is still very much in development, and it is highly recommended to test it on a few pages before letting it loose on a wiki.

The primary motivation for Yet Another MediaWiki Bot Framework is that this bot is specifically designed to help with batch editing of data contained within Semantic Templates, which are commonly used with Semantic MediaWiki.

The main idea is that this bot converts templates into data structures in R. For example, it allows you to read from a wiki page a template such as:

{{City
| point=52.015, 4.356667
| country=Netherlands
}}

...and then convert this data into a list within R. The data contained in the list can be accessed via template$point, template$country, etc.
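To make the conversion concrete, here is a minimal sketch in base R of how a flat template call could be turned into a named list. `parseTemplateSketch` is a hypothetical helper written for illustration, not the package's own parser. It assumes a single, non-nested template whose values contain no pipe characters, and it stores parameters under `data` to mirror the `template$data$...` access used later in this README.

```r
# Hypothetical helper, not part of RSemanticMediaWikiBot: parse one flat
# template call into a named list using base R only.
parseTemplateSketch <- function(wikiText) {
  # Strip the opening "{{" and closing "}}"
  inner <- sub("^\\s*\\{\\{", "", wikiText)
  inner <- sub("\\}\\}\\s*$", "", inner)
  # Split on the pipe characters that separate parameters
  parts <- trimws(strsplit(inner, "|", fixed = TRUE)[[1]])
  data <- list()
  for (part in parts[-1]) {
    # Split on the first "=" only, so values may themselves contain "="
    eq <- as.integer(regexpr("=", part, fixed = TRUE))
    if (eq > 0) {
      key <- trimws(substr(part, 1, eq - 1))
      value <- trimws(substr(part, eq + 1, nchar(part)))
      data[[key]] <- value
    }
  }
  # The first part is the template name; the rest are key=value pairs
  list(name = parts[1], data = data)
}

wikiText <- "{{City\n| point=52.015, 4.356667\n| country=Netherlands\n}}"
template <- parseTemplateSketch(wikiText)
print(template$data$country)
```

Nested templates and multiple-instance templates would need a real parser; this sketch only covers the simple case shown above.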

##Installation

Once you check out the code, you can install the package via:

cd Directory/Of/RSemanticMediaWikiBot
bash ./checkBuildAndInstall.sh

This runs a shell script which performs the steps below:

  1. Check that everything is ok:
     cd Directory/Of/RSemanticMediaWikiBot
     R CMD check .
  2. Build:
     cd ..
     R CMD build RSemanticMediaWikiBot
  3. Install it so that it is accessible within the R environment:
     sudo R CMD INSTALL RSemanticMediaWikiBot_0.1.tar.gz

The functions can then be accessed from within R code by first declaring:

library(RSemanticMediaWikiBot)

##Basic usage - logging in, reading, editing

###Logging in

#TODO fill these in based on your own configuration
username = "USERNAME"
password = "PASSWORD"
apiURL = "http://my.wiki.com/wiki/api.php"

bot = initializeBot(apiURL) #initialize the bot
login(username, password, bot) #login to the wiki
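If you would rather not keep credentials in the script itself, one option is to read them from environment variables. The sketch below is only a suggestion: `getCredentialsSketch` and the variable names `WIKI_USERNAME` and `WIKI_PASSWORD` are assumptions made for this example, and the package does not define or read them itself.

```r
# Hypothetical helper: WIKI_USERNAME and WIKI_PASSWORD are assumed variable
# names, not something RSemanticMediaWikiBot defines or reads itself.
getCredentialsSketch <- function() {
  username <- Sys.getenv("WIKI_USERNAME")
  password <- Sys.getenv("WIKI_PASSWORD")
  # Sys.getenv() returns "" for unset variables, so fail early with a hint
  if (!nzchar(username) || !nzchar(password)) {
    stop("Set WIKI_USERNAME and WIKI_PASSWORD before running the bot")
  }
  list(username = username, password = password)
}

# Usage with the calls shown above (requires a reachable wiki):
# credentials <- getCredentialsSketch()
# bot = initializeBot(apiURL)
# login(credentials$username, credentials$password, bot)
```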

###Reading page text

text = read(title="MyWikiPage", bot) 

###Editing and saving page text

edit(title="MyWikiPage", 
     text="this is the new page text", 
     bot, 
     summary="my edit summary")

###Deleting pages

delete(pageName, bot, reason="deleting old page")

##Working with template data

###Extracting templates

Assuming that you are not working with multiple-instance templates, you can retrieve and modify the data in a template as follows:

template = getTemplateByName("MyTemplateName", "MyWikiPage", bot)[[1]]
#[[1]] is needed as a list is returned
#If using multiple-instance templates, then multiple templates will be returned

###Getting and modifying values of template parameters

valueOfTemplate = template$data$NameOfTemplateParameter

You can then modify this value by:

template$data$NameOfTemplateParameter = newValue

###Removing template parameters

If you want to completely remove a parameter from a template (i.e. both the key and the value), such as changing this:

{{City
| point=52.015, 4.356667
| country=Netherlands
}}

to this:

{{City
| country=Netherlands
}}

then you can just do:

template$data$point = NULL
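This relies on ordinary R list semantics, which can be checked without a wiki. The sketch below uses a hand-built stand-in for the object returned by `getTemplateByName()`. The stand-in's structure is an assumption based on the `template$data$...` access shown above.

```r
# Stand-in for a template returned by getTemplateByName(); the exact
# structure of the real object is an assumption.
template <- list(data = list(point = "52.015, 4.356667",
                             country = "Netherlands"))

# Assigning NULL drops the element entirely (key and value)
template$data$point <- NULL
print(names(template$data))

# Assigning an empty string keeps the key in the list with a blank value
template$data$country <- ""
print(names(template$data))
```

Both `print` calls show only `country`: after the first assignment `point` is gone from the list, while the second leaves `country` in place with an empty value.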

###Writing the template back to the wiki page

The template with its new value can then be written back to the wiki as follows:

writeTemplateToPage(template, bot, editSummary="testing bot")

The template contains information about the page which it came from, so the name of the page does not need to be specified.

###Writing Spreadsheet Data to Multiple Pages

Spreadsheet data loaded into a data frame can be used to write data to templates on multiple pages. The first column of the data frame specifies the name of the page, while the second column is the name of the template to write to. The headers of the remaining columns need to correspond to the names of the parameters in that template. By default, the code does not overwrite existing values unless you explicitly tell it to. A list of the pages on which an existing value for a parameter was found is returned.

# default - will not overwrite existing parameter values that are already set
errorDFEntries = writeDataFrameToPageTemplates(dataFrame, bot, editSummary="what the bot is doing")

# overwrite existing values
errorDFEntries = writeDataFrameToPageTemplates(dataFrame, bot, overWriteConflicts=TRUE, editSummary="what the bot is doing")
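As an illustration of the expected layout and of the no-overwrite default, here is a small self-contained sketch. The page names and values are made up, and `mergeParamsSketch` is a hypothetical helper that only mimics the described behavior. It is not the package's implementation, and the package may define a conflict differently.

```r
# Assumed layout: column 1 = page name, column 2 = template name,
# remaining columns = template parameters. All values are illustrative.
dataFrame <- data.frame(
  page     = c("Delft", "Leiden"),
  template = c("City", "City"),
  point    = c("52.015, 4.356667", "52.16, 4.49"),
  country  = c("Netherlands", "Netherlands"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not the package's code): merge new values into
# existing template data and report parameters that already had a value.
mergeParamsSketch <- function(existing, new, overWriteConflicts = FALSE) {
  conflicts <- character(0)
  for (key in names(new)) {
    current <- existing[[key]]
    hasValue <- !is.null(current) && nzchar(current)
    if (hasValue && !identical(current, new[[key]])) {
      conflicts <- c(conflicts, key)
      if (!overWriteConflicts) next  # keep the existing value
    }
    existing[[key]] <- new[[key]]
  }
  list(data = existing, conflicts = conflicts)
}

existing <- list(country = "NL")             # what the page already holds
newValues <- as.list(dataFrame[1, -(1:2)])   # parameters from the first row
result <- mergeParamsSketch(existing, newValues)
print(result$conflicts)
```

In this sketch the existing `country` value is kept and reported as a conflict, while the missing `point` value is filled in. Passing `overWriteConflicts = TRUE` replaces the existing value instead.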

###Writing a Data Frame to a Table on a Single Page

The syntax for a sortable wikitable can be generated from a data frame. The code does not currently work out where to place the table on a page; it is up to you to paste the pieces together in a useful way.

# get the wiki table syntax
wikiTable = getWikiTableTextForDataFrame(df)

# put some text before and after the table
pageText = paste(someText, "\n\n", wikiTable, "\n\n", someMoreText, sep="")
  
# write this all to some wiki page
edit(title=pageTitle,
     text=pageText,
     bot,
     summary="adding a table")
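For reference, the sketch below shows roughly what sortable wikitable syntax for a data frame looks like. `wikiTableSketch` is a hypothetical re-implementation written for this example; the output of the package's own `getWikiTableTextForDataFrame()` may be formatted differently.

```r
# Hypothetical re-implementation for illustration only; the package's
# getWikiTableTextForDataFrame() may format its output differently.
wikiTableSketch <- function(df) {
  # One "! name" line per column header
  header <- paste0("! ", names(df))
  # Each row starts with "|-" followed by one "| value" line per cell
  rows <- vapply(seq_len(nrow(df)), function(i) {
    cells <- vapply(df[i, ], as.character, character(1))
    paste(c("|-", paste0("| ", cells)), collapse = "\n")
  }, character(1))
  paste(c('{| class="wikitable sortable"', header, rows, "|}"),
        collapse = "\n")
}

df <- data.frame(city = c("Delft", "Leiden"),
                 country = c("Netherlands", "Netherlands"),
                 stringsAsFactors = FALSE)
wikiTable <- wikiTableSketch(df)
cat(wikiTable, "\n")
```

The sketch does not escape cell values, so values containing pipe characters or newlines would need extra handling.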

##Future development/known issues

  • No support yet for multiple-instance templates. There needs to be a way to distinguish if one wants to edit an existing one, or add another.
  • No support yet for adding a new template to a page.
  • When editing a page, no check is done to see if it will create the page.
  • Nested template calls may not be parsed correctly.
  • If the code is not able to connect to the wiki API, then it will terminate instead of trying to connect again. In practical experience, this means that you may have to run a script multiple times if you have several thousand edits.
  • There seems to be a memory leak if you read and/or edit around 10,000+ pages.
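Until reconnection is handled inside the package, one possible workaround for the connection issue is to wrap individual calls in a retry loop. The sketch below is a hypothetical helper, not part of the package. It assumes that a failed API call surfaces as an ordinary R error that `tryCatch` can intercept, which has not been verified against the package's behavior.

```r
# Hypothetical wrapper, not part of RSemanticMediaWikiBot: retry a call a
# few times, pausing between attempts, before giving up.
withRetrySketch <- function(action, maxAttempts = 3, waitSeconds = 5) {
  for (attempt in seq_len(maxAttempts)) {
    result <- tryCatch(action(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    message("Attempt ", attempt, " failed: ", conditionMessage(result))
    if (attempt < maxAttempts) Sys.sleep(waitSeconds)
  }
  stop("All ", maxAttempts, " attempts failed")
}

# Demonstration with a stand-in function that fails twice, then succeeds
attempts <- 0
flakyCall <- function() {
  attempts <<- attempts + 1
  if (attempts < 3) stop("no connection")
  "page text"
}
text <- withRetrySketch(flakyCall, waitSeconds = 0)

# With the bot (requires a logged-in session):
# text <- withRetrySketch(function() read(title = "MyWikiPage", bot))
```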

##Contributors

  • cbdavis
