Revision 4 - Fri Apr 25 03:31:52 2008 UTC by dantman (file size: 14018 bytes)
Importing Pywikipediabot code.
Importing some pywikipediabot family files (Just the Uncyclopedia one and the Wikimedia/Wikitravel wiki, afaik)
This is a package to build robots for MediaWiki wikis like Wikipedia. Some example robots are
included.

=======================================================================
PLEASE DO NOT PLAY WITH THIS PACKAGE. These programs can actually
modify the live wiki on the net, and proper wiki-etiquette should
be followed before running them on any wiki.
=======================================================================

To get started on proper usage of the bot framework, please refer to:

http://meta.wikimedia.org/wiki/Using_the_python_wikipediabot

The contents of the package are:

=== Library routines ===

LICENSE                  : A reference to the MIT license.
wikipedia.py             : The wikipedia library.
wiktionary.py            : The wiktionary library.
config.py                : Configuration module containing all defaults. Do
                           not change these! See below for how to change
                           values.
titletranslate.py        : Rules and tricks to auto-translate wiki page titles.
date.py                  : Date formats in various languages.
family.py                : Abstract superclass for wiki families. Subclassed by
                           the classes in the 'families' subdirectory.
catlib.py                : Library routines written especially to handle
                           category pages and recurse over category contents.
gui.py                   : Some GUI elements for solve_disambiguation.py.
mediawiki_messages.py    : Access to the various translations of the MediaWiki
                           software interface.
pagegenerators.py        : Page generators: routines that yield the pages a
                           bot should work on.
userlib.py               : Library for working with users, their user pages
                           and talk pages.
BeautifulSoup.py         : A Python HTML/XML parser designed for
                           quick-turnaround projects like screen-scraping. See:
                           http://www.crummy.com/software/BeautifulSoup
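
pagegenerators.py is built around page generators. As a rough illustration of that pattern (an assumption about the general idea, not the module's actual API), a source generator can yield page titles while filter generators wrap it lazily, so nothing is processed until the bot iterates. All function names below are hypothetical.

```python
# Sketch of the page-generator pattern: a source yields page titles and
# filters wrap it lazily. All names here are hypothetical; this is not
# the pagegenerators.py API.


def titles_from_list(titles):
    """Source generator: yield the given titles one at a time."""
    for title in titles:
        yield title


def namespace_filter(generator, prefixes):
    """Skip titles that start with any of the given namespace prefixes."""
    prefixes = tuple(prefixes)
    for title in generator:
        if not title.startswith(prefixes):
            yield title


def duplicate_filter(generator):
    """Yield each title only once, keeping the first occurrence."""
    seen = set()
    for title in generator:
        if title not in seen:
            seen.add(title)
            yield title


if __name__ == "__main__":
    source = titles_from_list(["Foo", "Talk:Foo", "Bar", "Foo"])
    pages = duplicate_filter(namespace_filter(source, ["Talk:", "User:"]))
    # Nothing is filtered until the loop below pulls titles through the chain.
    for title in pages:
        print(title)  # prints Foo, then Bar
```

Chaining generators this way keeps memory use flat even when the source is a long category or dump listing.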

=== Utilities ===

basic.py                 : A template from which simple bots can be made.
checkusage.py            : Provides a way for users of the Wikimedia Toolserver
                           to check the use of images from Commons on other
                           Wikimedia wikis.
extract_wikilinks.py     : Two bots to get all linked-to Wikipedia pages from an
                           HTML file. They differ in their output: extract_names
                           gives bare names (can be used for
                           solve_disambiguation.py, table2wiki.py or
                           windows_chars.py), extract_wikilinks gives them in
                           interwiki-link format (can be used for interwiki.py).
followlive.py            : Periodically grabs the list of new articles and
                           analyzes them. If an article is too short, a menu
                           lets you easily add a template.
get.py                   : Gets a page and writes its contents to standard
                           output.
login.py                 : Logs in to an account on your "home" wiki.
splitwarning.py          : Splits an interwiki.log file into warning files for
                           each separate language. Suggestion: zip the created
                           files, put them somewhere on the internet, and
                           announce their location on the robot mailing list.
test.py                  : Checks whether you are logged in.
testfamily.py            : Checks whether you are logged in on all known
                           languages of a family.
xmltest.py               : Reads an XML file (e.g. the sax_parse_bug.txt
                           sometimes created by interwiki.py) and, if it
                           contains an error, shows a stack trace with the
                           location of the error.
editarticle.py           : Edits an article with your favourite editor. Run the
                           script with the "--help" option to get detailed
                           information on the available options.
sqldump.py               : Extracts information from local cur SQL dump files,
                           like the ones at http://download.wikimedia.org
rcsort.py                : A tool to see recent changes ordered by user instead
                           of by date.
threadpool.py            :
xmlreader.py             :
watchlist.py             : Allows access to the bot account's watchlist.
wikicomserver.py         : Allows pywikipediabot to be used directly from
                           COM-aware applications.
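
rcsort.py shows recent changes ordered by user instead of by date. The regrouping step can be sketched as below; the (timestamp, user, title) record format and the function name are hypothetical, and the real script reads its data from the wiki rather than from a list.

```python
# Sketch of the regrouping idea behind rcsort.py. The (timestamp, user, title)
# tuples are a hypothetical stand-in for the data the real script reads.
from itertools import groupby
from operator import itemgetter


def sort_by_user(changes):
    """Return (user, edits) pairs with users in alphabetical order and each
    user's edits as (timestamp, title) pairs, newest first."""
    by_user = sorted(changes, key=itemgetter(1))  # groupby needs sorted input
    result = []
    for user, edits in groupby(by_user, key=itemgetter(1)):
        newest_first = sorted(((ts, title) for ts, _, title in edits),
                              reverse=True)
        result.append((user, newest_first))
    return result


if __name__ == "__main__":
    changes = [
        ("2008-04-25T03:10:00Z", "Bob", "Foo"),
        ("2008-04-25T03:20:00Z", "Alice", "Bar"),
        ("2008-04-25T03:30:00Z", "Bob", "Baz"),
    ]
    for user, edits in sort_by_user(changes):
        print(user)
        for timestamp, title in edits:
            print("    %s %s" % (timestamp, title))
```

ISO-style timestamps sort correctly as plain strings, which is why no date parsing is needed here.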

=== Robots ===

capitalize_redirects.py  : Creates capitalized redirects to articles.
casechecker.py           : Enumerates all pages on the wiki and finds all titles
                           with mixed Latin and Cyrillic alphabets.
category.py              : Adds a category link to all pages mentioned on a
                           page; changes or removes category tags.
catall.py                : Adds or changes categories on a number of pages.
catmove.pl               : Takes a list of category moves or removals to make
                           and uses category.py to carry them out. Requires
                           Perl.
clean_sandbox.py         : Cleans the sandbox (test) page.
commons_link.py          : Adds Commons templates linking your wiki project to
                           Wikimedia Commons.
copyright.py             : Checks text for copyright violations by searching
                           Google, Yahoo! and Live Search.
cosmetic_changes.py      : Makes slight modifications to the source code of a
                           wiki page so that the code looks cleaner.
delete.py                : Deletes pages en masse.
disambredir.py           : Replaces links to redirects on disambiguation pages.
featured.py              : A robot to check featured articles.
fixes.py                 : Not a bot itself: contains the predefined replacement
                           tasks used by "replace.py -fix:replacement".
image.py                 : Changes one image to another or removes an image
                           entirely.
imagetransfer.py         : Given a Wikipedia page, checks the interwiki links
                           for images and lets the user choose which of them to
                           upload.
inline_images.py         : Looks for images that are linked inline (i.e.,
                           hosted on an external server and hotlinked).
interwiki.py             : A robot to check interwiki links on all pages (or a
                           range of pages) of a wiki.
interwiki_graph.py       : Makes it possible to create graphs of interwiki
                           links with interwiki.py.
imageharvest.py          : Gets multiple images from an external site.
isbn.py                  : Converts all ISBN-10 codes to the ISBN-13 format.
makecat.py               : Given an existing or new category, finds pages for
                           that category.
movepages.py             : Moves pages to another title.
nowcommons.py            : Deletes images tagged with the NowCommons template.
pagefromfile.py          : Takes its input from a file that contains a number of
                           pages to be put on the wiki.
redirect.py              : Fixes double redirects and broken redirects. Note:
                           solve_disambiguation.py also has functions that
                           treat redirects.
refcheck.py              : Checks references to see whether they are properly
                           formatted.
replace.py               : Searches articles for a text and replaces it with
                           another text. Both texts are set in two configurable
                           text files. The bot can either work on a set of given
                           pages or crawl an SQL dump.
saveHTML.py              : Downloads the HTML pages of articles and images.
selflink.py              : Goes over multiple pages of the home wiki, searches
                           for self-links, and allows removing them.
solve_disambiguation.py  : Interactive robot for resolving links to
                           disambiguation pages.
speedy_delete.py         : Loads a list of pages from the category of candidates
                           for speedy deletion and gives the user an interactive
                           prompt to decide whether each should be deleted.
spellcheck.py            : Spellchecks Wikipedia pages.
standardize_interwiki.py : Downloads a page and reformats the interwiki links in
                           a standard way (i.e. moves all of them to the
                           bottom or the top, with the same separator, in the
                           right order).
standardize_notes.py     : Converts external links and notes/references to
                           Footnote3 ref/note format. Rewrites References.
table2wiki.py            : Semi-automatic conversion of HTML tables to wiki
                           tables.
templatecount.py         : Displays the list of pages transcluding a given list
                           of templates.
template.py              : Changes one template (that is, {{...}}) into another.
touch.py                 : Goes over all pages of the home wiki and edits them
                           without changing their content.
unlink.py                : Unlinks a page on every page that links to it.
unusedfiles.py           : Appends some text to all unused images, and other
                           text to their respective uploaders.
upload.py                : Uploads an image to Wikipedia.
us-states.py             : A robot to add redirects to cities for US state
                           abbreviations.
warnfile.py              : A robot that parses a warning file created by
                           interwiki.py on another Wikipedia language and
                           implements the suggested changes without verifying
                           them.
weblinkchecker.py        : Checks whether external links are still working.
welcome.py               : Welcomes new users.
windows_chars.py         : Changes characters that are not part of Latin-1 into
                           something harmless. It is advisable to do this on
                           Latin-1 wikis before switching to UTF-8.
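
isbn.py converts ISBN-10 codes to ISBN-13. The arithmetic of that conversion follows the standard ISBN rules: prefix 978, drop the old check digit, and recompute the check digit with alternating weights 1 and 3. The standalone function below is a sketch of that step only (its name is hypothetical); it does not validate the ISBN-10 check digit, and finding and rewriting the codes in page text is left out.

```python
def isbn10_to_isbn13(isbn10):
    """Convert an ISBN-10 (hyphens/spaces allowed) to an unhyphenated ISBN-13."""
    digits = isbn10.replace("-", "").replace(" ", "")
    if (len(digits) != 10 or not digits[:9].isdigit()
            or digits[9] not in "0123456789Xx"):
        raise ValueError("not an ISBN-10: %r" % isbn10)
    # Prefix 978 and drop the old check digit (which may be 'X').
    core = "978" + digits[:9]
    # ISBN-13 check digit: weight the 12 digits 1, 3, 1, 3, ... and add
    # whatever is needed to round the sum up to a multiple of 10.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(core))
    return core + str((10 - total % 10) % 10)


if __name__ == "__main__":
    print(isbn10_to_isbn13("0-306-40615-2"))  # 9780306406157
```

For 0-306-40615-2 the weighted sum of 978030640615 is 93, so the new check digit is 7.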

=== Directories ===

archive                  : Contains old bots.
category                 :
copyright                : Contains information retrieved by copyright.py.
deadlinks                : Contains information retrieved by weblinkchecker.py.
disambiguations          : If you run solve_disambiguation.py with the -primary
                           argument, the bot will save information here.
families                 : Contains wiki-specific information like URLs,
                           languages, encodings etc.
featured                 : Contains a cache file of featured articles.
interwiki_dump           : If the interwiki bot is interrupted, it will store a
                           dump file here. This file will be read when using the
                           interwiki bot with -restore or -continue.
interwiki_graphs         : Contains graphs for interwiki_graph.py.
logs                     : Contains log files.
mediawiki-messages       : Information retrieved by mediawiki_messages.py will
                           be stored here.
login-data               : login.py stores your cookies here (your password
                           won't be stored as plaintext).
simplejson               : A simple, fast, extensible JSON encoder and decoder
                           used by query.py.
spelling                 : Contains dictionaries for spellcheck.py.
userinterfaces           : Contains the Tkinter, wxPython, terminal and
                           transliteration interfaces; choose one in
                           user-config.py.
watchlists               : Information retrieved by watchlist.py will be stored
                           here.
wiktionary               : Contains scripts for the Wiktionary project.
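
The interwiki_dump directory supports resuming an interrupted run with -restore or -continue. The idea can be sketched as saving the titles that still need work and reading them back later. The function names, the file name and the one-title-per-line format are assumptions for illustration, not the format interwiki.py actually uses.

```python
# Sketch of the dump/restore idea behind the interwiki_dump directory. The
# file name and the one-title-per-line format are assumptions made for this
# example, not the format interwiki.py actually writes.
import io
import os


def save_dump(directory, filename, remaining_titles):
    """Write the titles still to be processed to a dump file; return its path."""
    if not os.path.isdir(directory):
        os.makedirs(directory)
    path = os.path.join(directory, filename)
    with io.open(path, "w", encoding="utf-8") as f:
        for title in remaining_titles:
            f.write(title + u"\n")
    return path


def load_dump(path):
    """Read the titles back so a new run can continue where the old one stopped."""
    with io.open(path, encoding="utf-8") as f:
        return [line.rstrip(u"\n") for line in f if line.strip()]


if __name__ == "__main__":
    import tempfile
    todo = [u"Foo", u"Bar", u"Baz"]
    # Pretend the run was interrupted after finishing the first title.
    path = save_dump(os.path.join(tempfile.mkdtemp(), "interwiki_dump"),
                     "example-dump.txt", todo[1:])
    print(load_dump(path))
```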


=== Unit tests ===

wiktionarytest.py        : Unit tests for wiktionary.py.


The following external software can be used with PyWikipediaBot:
* Win32com library for use with wikicomserver.py
* Pydot, Pyparsing and Graphviz for use with interwiki_graph.py
* JSON for use with query.py
* PyGoogle to access the Google Web API and PySearch to access the Yahoo!
  Search Web Services for use with copyright.py and pagegenerators.py
* MySQLdb to access a MySQL database for use with pagegenerators.py

PyWikipediaBot makes use of some modules that are part of Python, but that
are not installed by default on some Linux distributions:
* python-xml (required to parse XML via SAX2)
* python-celementtree (recommended if you use XML dumps)
* python-tkinter (optional, used by some experimental GUI features)


More precise information, and a list of the options that are available for
the various programs, can be retrieved by running the bot with the -help
parameter, e.g.

    python interwiki.py -help

You need to have at least Python version 2.4 (http://www.python.org/download/)
installed on your computer to be able to run any of the code in this package.
Although some of the code may work on Python version 2.3, support for older
versions of Python is not planned.
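
A script that depends on this package could check the stated minimum before doing anything else. A minimal sketch (the function name is hypothetical, and it checks only the lower bound stated here):

```python
import sys

# Minimum version stated in this README; 2.3 may partly work but is unsupported.
MINIMUM = (2, 4)


def python_is_supported(version_info=sys.version_info, minimum=MINIMUM):
    """Return True if the given interpreter version meets the stated minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if not python_is_supported():
        sys.stderr.write("Python %d.%d or newer is required.\n" % MINIMUM)
        sys.exit(1)
    print("Python %d.%d meets the minimum." % tuple(sys.version_info[:2]))
```

Comparing (major, minor) tuples avoids the pitfalls of comparing version strings, where "2.10" would sort before "2.4".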

You do not need to "install" this package to be able to make use of
it. You can actually just run it from the directory where you unpacked
it or where you have your copy of the SVN sources.

Before you run any of the programs, you need to create a file named
user-config.py in your current directory. It needs at least two lines:
The first line should set your real name; this will be used to identify you
when the robot is making changes, in case you are not logged in. The
second line sets the code of your home language. The file should look like:

===========
username='My name'
mylang='xx'
===========
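
Because user-config.py is ordinary Python, a hypothetical helper can execute it in an empty namespace and confirm that the two required settings are present. This is a sketch for checking your own file, not the way config.py itself loads it.

```python
import os

REQUIRED = ("username", "mylang")  # the two settings this README requires


def check_user_config(path="user-config.py"):
    """Execute user-config.py in a fresh namespace and return its settings.

    Hypothetical helper. Raises ValueError if a required setting is
    missing or empty.
    """
    namespace = {}
    with open(path) as f:
        exec(compile(f.read(), path, "exec"), namespace)
    missing = [name for name in REQUIRED if not namespace.get(name)]
    if missing:
        raise ValueError("%s does not set: %s" % (path, ", ".join(missing)))
    return dict((name, namespace[name]) for name in REQUIRED)


if __name__ == "__main__":
    if os.path.exists("user-config.py"):
        print(check_user_config())
    else:
        print("user-config.py not found in the current directory")
```

Run from the directory that holds your user-config.py; with the example above it would report username 'My name' and mylang 'xx'.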

There are other variables that can be set in the configuration file; please
check config.py for ideas.

After that, you are advised to create a username and password for the bot
and run login.py. Anonymous editing is not possible.