This is a package to build robots for MediaWiki wikis like Wikipedia. Some
example robots are included.

=======================================================================
PLEASE DO NOT PLAY WITH THIS PACKAGE. These programs can actually
modify the live wiki on the net, and proper wiki-etiquette should
be followed before running it on any wiki.
=======================================================================

To get started on proper usage of the bot framework, please refer to:

    http://meta.wikimedia.org/wiki/Using_the_python_wikipediabot

The contents of the package are:

=== Library routines ===

LICENSE                  : a reference to the MIT license
wikipedia.py             : The wikipedia library
wiktionary.py            : The wiktionary library
config.py                : Configuration module containing all defaults. Do not
                           change these! See below for how to change values.
titletranslate.py        : rules and tricks to auto-translate wikipage titles
date.py                  : Date formats in various languages
family.py                : Abstract superclass for wiki families. Subclassed by
                           the classes in the 'families' subdirectory.
catlib.py                : Library routines written especially to handle
                           category pages and recurse over category contents.
gui.py                   : Some GUI elements for solve_disambiguation.py
mediawiki_messages.py    : Access to the various translations of the MediaWiki
                           software interface.
pagegenerators.py        : Page generators.
userlib.py               : Library to work with users, their pages and talk
                           pages.
BeautifulSoup.py         : A Python HTML/XML parser designed for quick
                           turnaround projects like screen-scraping. See:
                           http://www.crummy.com/software/BeautifulSoup

=== Utilities ===

basic.py                 : A template from which simple bots can be made.
checkusage.py            : Provides a way for users of the Wikimedia toolserver
                           to check the use of images from Commons on other
                           Wikimedia wikis.
extract_wikilinks.py     : Two bots to get all linked-to Wikipedia pages from
                           an HTML file. They differ in their output:
                           extract_names gives bare names (can be used for
                           solve_disambiguation.py, table2wiki.py or
                           windows_chars.py), extract_wikilinks gives them in
                           interwiki-link format (can be used for
                           interwiki.py).
followlive.py            : Periodically grabs the list of new articles and
                           analyzes them. If an article is too short, a menu
                           will let you easily add a template.
get.py                   : Script that gets a page and writes its contents to
                           standard output.
login.py                 : Log in to an account on your "home" wikipedia.
splitwarning.py          : Split an interwiki.log file into warning files for
                           each separate language. Suggestion: zip the created
                           files up, put them somewhere on the internet, and
                           send an announcement of the location to the robot
                           mailing list.
test.py                  : Check whether you are logged in.
testfamily.py            : Check whether you are logged in on all known
                           languages in a family.
xmltest.py               : Read an XML file (e.g. the sax_parse_bug.txt
                           sometimes created by interwiki.py), and if it
                           contains an error, show a stacktrace with the
                           location of the error.
editarticle.py           : Edit an article with your favourite editor. Run
                           the script with the "--help" option to get
                           detailed information on the possibilities.
sqldump.py               : Extract information from local cur SQL dump
                           files, like the ones at http://download.wikimedia.org
rcsort.py                : A tool to see the recent changes ordered by user
                           instead of by date.
threadpool.py            :
xmlreader.py             :
watchlist.py             : Allows access to the bot account's watchlist.
wikicomserver.py         : This library allows the use of the pywikipediabot
                           directly from COM-aware applications.

=== Robots ===

capitalize_redirects.py  : Script to create redirects for capitalized article
                           titles.
casechecker.py           : Script to enumerate all pages in the wikipedia and
                           find all titles with mixed Latin and Cyrillic
                           alphabets.
category.py              : Add a category link to all pages mentioned on a
                           page, change or remove category tags.
catall.py                : Add or change categories on a number of pages.
catmove.pl               : Requires the Perl programming language. Takes a
                           list of category moves or removes to make and uses
                           category.py.
clean_sandbox.py         : This bot cleans the test page (sandbox).
commons_link.py          : This robot adds the commons template to link your
                           wiki project with Commons.
copyright.py             : This robot checks for copyrighted text using
                           Google, Yahoo! and Live Search.
cosmetic_changes.py      : Can make slight modifications to the source code of
                           a wiki page so that the code looks cleaner.
delete.py                : This script can be used to delete pages en masse.
disambredir.py           : Changes redirect names in disambiguation pages.
featured.py              : A robot to check featured articles.
fixes.py                 : This is not a bot. It holds the predefined
                           replacement tasks used by
                           "replace.py -fix:replacement".
image.py                 : This script can be used to change one image to
                           another or remove an image entirely.
imagetransfer.py         : Given a Wikipedia page, check the interwiki links
                           for images, and let the user choose among them for
                           images to upload.
inline_images.py         : This bot looks for images that are linked inline
                           (i.e., they are hosted on an external server and
                           hotlinked).
interwiki.py             : A robot to check interwiki links on all pages (or
                           a range of pages) of a wikipedia.
interwiki_graph.py       : Makes it possible to create graphs with
                           interwiki.py.
imageharvest.py          : Bot for getting multiple images from an external
                           site.
isbn.py                  : Bot that converts all ISBN-10 codes to the ISBN-13
                           format.
makecat.py               : Given an existing or new category, find pages for
                           that category.
movepages.py             : Bot that moves pages to another title.
nowcommons.py            : This bot can delete images marked with the
                           NowCommons template.
pagefromfile.py          : This bot takes its input from a file that contains
                           a number of pages to be put on the wiki.
redirect.py              : Fix double redirects and broken redirects. Note:
                           solve_disambiguation also has functions which treat
                           redirects.
refcheck.py              : This script checks references to see if they are
                           properly formatted.
replace.py               : Search articles for a text and replace it by
                           another text. Both texts are set in two
                           configurable text files. The bot can either work on
                           a set of given pages or crawl an SQL dump.
saveHTML.py              : Downloads the HTML pages of articles and images.
selflink.py              : This bot goes over multiple pages of the home wiki,
                           searches for selflinks, and allows removing them.
solve_disambiguation.py  : Interactive robot doing disambiguation.
speedy_delete.py         : This bot loads a list of pages from the category of
                           candidates for speedy deletion and gives the user
                           an interactive prompt to decide whether each should
                           be deleted or not.
spellcheck.py            : This bot spellchecks Wikipedia pages.
standardize_interwiki.py : A robot that downloads a page, and reformats the
                           interwiki links in a standard way (i.e. move all
                           of them to the bottom or the top, with the same
                           separator, in the right order).
standardize_notes.py     : Converts external links and notes/references to
                           Footnote3 ref/note format. Rewrites References.
table2wiki.py            : Semi-automatic conversion of HTML tables to wiki
                           tables.
templatecount.py         : Display the list of pages transcluding a given list
                           of templates.
template.py              : Change one template (that is {{...}}) into another.
touch.py                 : Bot goes over all pages of the home wiki, and edits
                           them without changing them.
unlink.py                : This bot unlinks a page on every page that links to
                           it.
unusedfiles.py           : Bot appends some text to all unused images and
                           other text to the respective uploaders.
upload.py                : Upload an image to Wikipedia.
us-states.py             : A robot to add redirects to cities for US state
                           abbreviations.
warnfile.py              : A robot that parses a warning file created by
                           interwiki.py on another wikipedia language,
                           and implements the suggested changes without
                           verifying them.
weblinkchecker.py        : Check if external links are still working.
welcome.py               : Script to welcome new users.
windows_chars.py         : Change characters that are not part of Latin-1 into
                           something harmless. It is advisable to do this on
                           Latin-1 wikis before switching to UTF-8.

=== Directories ===

archive                  : Contains old bots.
category                 :
copyright                : Contains information retrieved by copyright.py
deadlinks                : Contains information retrieved by weblinkchecker.py
disambiguations          : If you run solve_disambiguation.py with the -primary
                           argument, the bot will save information here.
families                 : Contains wiki-specific information like URLs,
                           languages, encodings etc.
featured                 : Stores featured articles in a cache file.
interwiki_dump           : If the interwiki bot is interrupted, it will store
                           a dump file here. This file will be read when using
                           the interwiki bot with -restore or -continue.
interwiki_graphs         : Contains graphs for interwiki_graph.py
logs                     : Contains logfiles.
mediawiki-messages       : Information retrieved by mediawiki_messages.py will
                           be stored here.
login-data               : login.py stores your cookies here (your password
                           won't be stored as plaintext).
simplejson               : A simple, fast, extensible JSON encoder and decoder
                           used by query.py.
spelling                 : Contains dictionaries for spellcheck.py.
userinterfaces           : Contains the Tkinter, WxPython, terminal and
                           transliteration interfaces; the user chooses one
                           in user-config.py.
watchlists               : Information retrieved by watchlist.py will be
                           stored here.
wiktionary               : Contains scripts used for the Wiktionary project.

=== Unit tests ===

wiktionarytest.py        : Unit tests for wiktionary.py

The following external software can be used with PyWikipediaBot:
 * Win32com library for use with wikicomserver.py
 * Pydot, Pyparsing and Graphviz for use with interwiki_graph.py
 * JSON for use with query.py
 * PyGoogle to access Google Web API and PySearch to access Yahoo! Search
   Web Services for use with copyright.py and pagegenerators.py
 * MySQLdb to access MySQL database for use with pagegenerators.py

PyWikipediaBot makes use of some modules that are part of python, but that
are not installed by default on some Linux distributions:
 * python-xml (required to parse XML via SAX2)
 * python-celementtree (recommended if you use XML dumps)
 * python-tkinter (optional, used by some experimental GUI stuff)

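To see which of the optional modules listed above are present, something
like the following sketch may help. It is not part of the package. The
helper name find_missing is hypothetical, and the import names in CANDIDATES
are assumptions about what those distribution packages provide (they differ
between Python versions, e.g. Tkinter vs. tkinter), so adjust them for your
system.

```python
def find_missing(module_names):
    """Return the names from module_names that cannot be imported."""
    missing = []
    for name in module_names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


# Assumed import names for python-xml, python-celementtree and
# python-tkinter; these are guesses, not taken from the package.
CANDIDATES = ["xml.sax", "xml.etree.cElementTree", "Tkinter"]

if __name__ == "__main__":
    for name in find_missing(CANDIDATES):
        print("not importable: %s" % name)
```

A name being reported only means that this particular import failed; the
module may still be available under a different name on your Python version.
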
More precise information, and a list of the options that are available for
the various programs, can be retrieved by running the bot with the -help
parameter, e.g.

    python interwiki.py -help

You need to have at least python version 2.4 (http://www.python.org/download/)
installed on your computer to be able to run any of the code in this package.
Although some of the code may work on python version 2.3, support for older
versions of python is not planned.

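If you are unsure which version you are running, a guard along these lines
can check it. This is only a sketch and not part of the package; the function
name check_python_version is hypothetical, and the minimum of 2.4 is taken
from the paragraph above.

```python
import sys

MINIMUM = (2, 4)  # minimum version stated in this README


def check_python_version(version_info=None, minimum=MINIMUM):
    """Return True if version_info (default: the running interpreter)
    is at least `minimum`, comparing major and minor numbers only."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if not check_python_version():
        sys.stderr.write("python %d.%d or newer is required\n" % MINIMUM)
        sys.exit(1)
```
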
You do not need to "install" this package to be able to make use of
it. You can actually just run it from the directory where you unpacked
it or where you have your copy of the SVN sources.

Before you run any of the programs, you need to create a file named
user-config.py in your current directory. It needs at least two lines:
The first line should set your real name; this will be used to identify you
when the robot is making changes, in case you are not logged in. The
second line sets the code of your home language. The file should look like:

===========
username='My name'
mylang='xx'
===========

There are other variables that can be set in the configuration file; please
check config.py for ideas.

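If you prefer to generate the file rather than type it, the two required
lines can be produced with a few lines of python. This is a sketch under
assumptions, not something shipped with the package: the function names
render_user_config and write_user_config are hypothetical, and only the two
variables described above (username and mylang) are written.

```python
def render_user_config(username, mylang):
    """Build the two required lines of user-config.py as a string."""
    # %r quotes the values so the result is valid python source.
    return "username=%r\nmylang=%r\n" % (username, mylang)


def write_user_config(path, username, mylang):
    """Write the rendered configuration to `path`, replacing any
    existing file at that location."""
    f = open(path, "w")
    try:
        f.write(render_user_config(username, mylang))
    finally:
        f.close()


# Example (hypothetical values):
#   write_user_config("user-config.py", "My name", "xx")
```

Note that write_user_config overwrites an existing file, so any extra
variables you added by hand would be lost.
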
After that, you are advised to create a username + password for the bot, and
run login.py. Anonymous editing is not possible.