Now seems like a good time to ask: who is keeping backups of what? I've got backups of the articles I've done, and pics I've added, but not in any wiki-native format - they're .txt and .jpg, so restoration would take ages. Wgetting the wiki a while back proved a bit impractical, and it's far bigger now.
There's various bits of s/w out there that will spider their way through a website, grab all the files that make it up and write it to disc. I used one called WinHTTrack once which worked ok.
They're rarely much use when a site is heavily scripted, as you get the rendered output of the scripts rather than the actual website source. As NT (Tabby) hinted, the average wiki consists entirely of scripts, which interpret the wiki contents, images and markup and convert them to HTML. As with any software, to back it up you need to save the sources, not the output.
Whoever has the admin rights to the wiki server should be easily able to access the source files and images and make a backup of them.
Failing that, and if you don't need the wiki history or discussion pages or other metadata, perhaps one could adapt a spider to only follow the wiki "view source" links and scrape the unformatted text from them ...
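As a sketch of that idea: MediaWiki will serve the unformatted markup of any page via its `action=raw` URL parameter, so a scraper only needs a list of page titles rather than having to follow "view source" links. The base URL and titles below are made-up examples:

```python
# Sketch: grab the raw wikitext of each page via MediaWiki's action=raw
# URL, rather than scraping the rendered HTML output of the scripts.
# The base URL and page titles are hypothetical examples.
from urllib.parse import quote
from urllib.request import urlopen

def raw_url(base, title):
    """Build the action=raw URL for a given page title."""
    return f"{base}/index.php?title={quote(title)}&action=raw"

def fetch_raw(base, titles):
    """Save the unformatted markup of each title, one .wiki file per page."""
    for title in titles:
        with urlopen(raw_url(base, title)) as resp:
            text = resp.read().decode("utf-8")
        with open(title.replace("/", "_") + ".wiki", "w", encoding="utf-8") as f:
            f.write(text)

# e.g. fetch_raw("http://example.org/wiki", ["Main Page", "Loft insulation"])
```

This only captures the current revision's markup, not history or discussion pages, which matches the "don't need the metadata" caveat above.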
The MediaWiki software package has facilities for taking full backups of the database(s) that it uses to store the information that the scripts present as web pages. The admins should have access to the server to do this; IIRC it requires shell access to the server, preferably via SSH.
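With that shell access, the usual route is a database dump plus a copy of the uploaded files. A rough sketch, in which the host, user, database name and paths are all made up:

```shell
# Hypothetical example: dump the wiki's MySQL database over SSH, then pull
# the dump and the uploaded images down to the backup machine.
ssh user@wikihost "mysqldump --single-transaction wikidb | gzip > /tmp/wikidb.sql.gz"
scp user@wikihost:/tmp/wikidb.sql.gz .
# Images live on disk, not in the DB, so copy the directory tree as well.
rsync -az user@wikihost:/var/www/wiki/images/ ./images-backup/
```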
entered the entire index of pages, but instead of working it gave this:

XML Parsing Error: no element found
Location: formatting link
Line Number 1, Column 1: ^
I don't see anything elsewhere that might be used to export pages in any form. Anyone a bit more familiar with MediaWiki? Thanks
NT
ps To the folks who suggested using various inbuilt methods: I don't know what the terms mean, so I don't know how to do what was suggested.
pps As someone suggested, I did use wget a while ago to harvest the edit pages, which is where the article text is, but it's a pretty horrible way to do it, and would be a mare to reinstate.
MediaWiki sucks for this. DB level access is a royal nightmare to actually use (in practice it means that you often can't use a backup you took earlier).
One of the best and most robust ways, although painful, is to use Special:Export and Special:Import to produce XML dumps of the wiki content (including categories, templates and other namespaces). This has the great advantage that it can be done through normal wiki pages, without being a server admin (wiki admin permissions are usually needed, certainly for import). The downside is that Export needs a list of pages, not a wildcard - the best way I have to handle this is by installing the DPL (Dynamic Page List) extension and using it to make a page that outputs a list of page names to export. You can also use this same approach to replicate content (or part of the content) from one wiki to another.
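For what it's worth, Special:Export also accepts a POSTed newline-separated list of titles, so once you have a page list (from DPL, or Special:Allpages) the export can be driven from a script, batched to stay under the per-request page limit mentioned below. A rough sketch; the base URL and batch size are assumptions:

```python
# Sketch: drive Special:Export from a script, splitting the title list
# into batches since only a limited number of pages come back per request.
# The wiki URL and batch size of 50 are made-up assumptions.
from urllib.parse import urlencode
from urllib.request import urlopen

def batches(titles, size):
    """Split the title list into chunks of at most `size` pages."""
    return [titles[i:i + size] for i in range(0, len(titles), size)]

def export_all(base, titles, size=50):
    """POST each batch to Special:Export and save the XML it returns."""
    for n, chunk in enumerate(batches(titles, size)):
        data = urlencode({"pages": "\n".join(chunk), "curonly": "1"}).encode()
        with urlopen(f"{base}/index.php?title=Special:Export", data) as resp:
            with open(f"export-{n:03d}.xml", "wb") as f:
                f.write(resp.read())
```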
I've got all the xmls saved. It will only spit out a limited number of articles per xml page.
Next question is the image files. I've got the ones I've contributed only as jpgs, so need a much faster to restore format for all of them. Any suggestions?
Its hosted on one of Grunff's servers, so I expect (although have not checked!) it will be backed up as a matter of course. I will ask him and see what level of backup is in place.
I have access via FTP to the account that hosts all the MediaWiki stuff, and also to the MySQL server that holds the (non-image) content[1]. Last time I looked, however, it was getting a bit on the large side (i.e. several gig) for sucking down ADSL connections on a routine basis.
[1] the images are dumped into a directory hierarchy, and links to them are held in the DB rather than the images themselves being held as BLOBs
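Given that layout, the image half of a backup is just a file-level copy of that directory hierarchy, taken alongside the DB dump. A hypothetical example, with a made-up path:

```shell
# Hypothetical: archive the whole images hierarchy in one dated tarball.
# The DB only stores links to these files, so both halves are needed.
tar czf wiki-images-$(date +%Y%m%d).tar.gz -C /var/www/wiki images
```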
JOOI, how easy is it to get at the "raw data"? Is the database format nicely-documented and the data stored in an accessible form such that it can easily be pulled out independently of the wiki software? I'm always a bit wary of public information tucked away behind proprietary software in a proprietary format...
Dave Liquorice ( snipped-for-privacy@howhill.com) wibbled on Monday 03 January 2011 00:25:
formatting link
's all there.
As someone else said, the only way to do a proper backup is from the server with suitable permissions (MySQL dump and read access to the wiki config and media files).
I'm sure whoever is running it has thought of all that, but if people are worried, perhaps someone with the connections could ask?
In the absence of that, a wget mirror would be better than nothing, but it would be painful to reconstruct (unless using a "smart" backup script that pulled each page in edit mode so that the raw wiki markup was grabbed, plus spidering into full-resolution versions of any embedded images etc.) - and even then all the history is bye-bye.
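A sketch of what that "smart" wget approach might look like - the URLs are example placeholders, and the second loop assumes you already have a list of titles in a file:

```shell
# Sketch: mirror the rendered pages (better than nothing), skipping
# special pages and history links. example.org stands in for the real wiki.
wget --mirror --page-requisites --convert-links \
     --reject-regex 'Special:|action=history' \
     http://example.org/wiki/

# Then, given one title per line in titles.txt, also grab the raw wiki
# markup of each page so something reinstatable survives.
while read -r t; do
  wget -O "$t.wiki" "http://example.org/wiki/index.php?title=$t&action=raw"
done < titles.txt
```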
What's on the MediaWiki site is very far from "all of it".
In particular, be careful when trying to restore DB backups (this is a general problem for backups at that level), because you will run into problems if the target system is different from the source system. It's OK if you're restoring after a disk crash into an identical environment, but it's very likely to break if the installation was different (DB names for multiple wikis on shared DB servers is one), if the context is different (home vs. production server) or even if you try to put a backup from an older MW onto a newer MW.
Andy Dingley ( snipped-for-privacy@codesmiths.com) wibbled on Monday 03 January 2011 12:02:
That's in the details - which I take for granted when it comes to an actual implementation (this is what I do as well for a living).
The short of it is though that it comes down to backing up RDBMS and a bunch of files - I didn't want to write a treatise on the exact procedure which as you say is rather more complicated. Just affirming that any sort of meaningful backup does require access to the server.
DB names aren't a problem if you run individual backups per DB - at least not on Postgresql - I usually dispense with the "backup all DBs" program in favour of scripting my own that backs up each DB into a separate file, for very similar reasons.
I sort of expected MediaWiki to have a generic backup option at the application level to avoid problems with restoring to slightly different versions - Horde does, at least at a per-app level.
But if that link above is correct (by lack of omission) that doesn't seem to be an option?
Which would mean recover would have to be done to the same version - but I don't see that as too much of a problem.
If you install multiple wikis on the typical shared hosting with a single DB visible to that hosting account, they're disambiguated by a per-wiki name prefix on the DB object names, set when you first install. You're likely to see the same single-DB situation when you have multiple development wikis on a laptop (I have three hosted on this one). The need for this is also a reason to always use the optional prefix when installing, as it makes it easier to move them around later.
Andy Dingley ( snipped-for-privacy@codesmiths.com) wibbled on Monday 03 January 2011 13:42:
I do prefer the multiple DB way of installing stuff - much easier to manage. But I am aware of the table prefix method.
I don't think this is a problem - the stated issue is how to back up the running server. It's a perfectly reasonable assumption that the server would be reinstalled into the same environment; or, if one had to reinstall into a new environment, you'd have to make that environment the same, even installing an older matching version of MediaWiki and then upgrading to current if necessary, which is also reasonable. I think for this exercise it's a moot issue. I've seen the same scenario with MythTV, where the database gets modified by upgrades, so I'm aware of the issue.
One thing though - I always work with Postgresql avoiding MySQL unless I really have no choice.
Postgresql's pg_dump program has options to dump pure SQL as well as binary format dumps. Obviously, with a pure SQL dump, doing a search and replace on all prefixes is possible. How do MySQL dumps work?
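mysqldump emits plain SQL text by default, so the same search-and-replace approach works there too. A minimal sketch of rewriting a table prefix in a dump - the prefixes here are made-up examples:

```python
# Sketch: rewrite a MediaWiki table prefix in a plain-SQL mysqldump file.
# mysqldump quotes identifiers with backquotes; prefixes are hypothetical.
import re

def rename_prefix(sql, old, new):
    """Replace one table prefix with another inside backquoted identifiers."""
    return re.sub(r"`%s(\w+)`" % re.escape(old), r"`%s\1`" % new, sql)

dump = 'CREATE TABLE `oldwiki_page` (...); INSERT INTO `oldwiki_page` VALUES (1);'
print(rename_prefix(dump, "oldwiki_", "newwiki_"))
```

Restricting the match to backquoted identifiers avoids accidentally rewriting the prefix string where it appears inside the dumped row data.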