How to back up a blog
This is a great use of a blog, and there are lots of different ways you
could preserve it. For example, you could save each page as a file and
print the resulting files in colour. This would make the blog
accessible offline and sharable in much the same way as snapshots, with
the files you have saved providing a separate backup. Every browser
should have a "File Save" routine somewhere, or a "Save page as…"
command like the one in Google Chrome.
Both Microsoft's Internet Explorer and Opera provide a handy way to do this: they let you save pages in a standards-based MHTML
format. In IE, use the Save option called "Web archive, single file
(*.mht)". This saves a page as a single file rather than saving the
page's HTML as one file and all the other elements — images, scripts etc
— in a separate folder. (HTML, HyperText Markup Language, is the
language used to create most web pages. The M comes from MIME, or
Multipurpose Internet Mail Extensions. MHTML is shortened to .mht to
identify these files.)
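To make the "single file" idea concrete, here is a minimal Python sketch of how a page and its images can be bundled into one MIME message, which is roughly what an .mht file is. The function name build_mhtml and the example URLs are my own inventions for illustration, and real browsers add more headers than this, so a file produced this way is not guaranteed to open in every browser.

```python
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_mhtml(page_url, html, images):
    """Bundle an HTML page and its images into one multipart/related message.

    `images` maps each image URL to a (bytes, subtype) pair, e.g. (data, "png").
    This is a hypothetical helper, not any browser's actual save routine.
    """
    archive = MIMEMultipart("related", type="text/html")
    archive["Subject"] = "Saved page"
    archive["Content-Location"] = page_url

    # The HTML goes in first, labelled with the address it came from.
    page = MIMEText(html, "html", "utf-8")
    page["Content-Location"] = page_url
    archive.attach(page)

    # Each image becomes another part, labelled with its original URL so the
    # HTML's <img src="..."> references can be matched to the embedded copy.
    for url, (data, subtype) in images.items():
        part = MIMEImage(data, _subtype=subtype)
        part["Content-Location"] = url
        archive.attach(part)

    return archive.as_bytes()
```

Writing the returned bytes to a file ending in .mht gives you one self-contained archive instead of an HTML file plus a folder of images.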
There's also a Firefox extension called UnMHT, which adds MHT file support to Firefox. Another extension, Mozilla Archive Format, will save pages in either MHT or MAFF (Mozilla Archive Format File). You can get similar plug-ins for most popular browsers.
Another quick way to make a tangible record is to select and copy a hefty
chunk of the blog and paste it into a Microsoft Word document (docx). This
will copy panels and pictures as well as text. The results tend to be a
bit mixed, but sometimes it works very well. You can also try
capturing each page as an image, if you have an image capture program
that will snapshot whole web pages, not just the parts currently on
screen.
Of course, if it's a large blog, saving or copying one
page at a time soon becomes tedious. Also, saving individual pages
loses the structure of the blog. Programs known as "site rippers" solve
both problems by copying (ripping) whole sites and downloading them to
a folder on your PC's hard drive.
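For the curious, the core of a site ripper can be sketched in a few dozen lines of Python. This is only an illustration of the idea, not how HTTrack or any other real program works: the names LinkCollector, same_site_links, local_path and rip are made up, and the sketch ignores images, stylesheets, robots.txt and error handling, and does not rewrite links to point at the saved copies.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the absolute URLs of every <a href="..."> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links and drop any #fragment.
                self.links.append(urljoin(self.base_url, href).split("#")[0])


def same_site_links(base_url, html):
    """Return the sorted, de-duplicated links that stay on the same host."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    host = urlparse(base_url).netloc
    return sorted({u for u in collector.links if urlparse(u).netloc == host})


def local_path(url, root):
    """Map a URL to a file under `root`, mirroring the site's structure."""
    parsed = urlparse(url)
    path = parsed.path.lstrip("/")
    if not path or path.endswith("/"):
        path += "index.html"
    return Path(root) / parsed.netloc / path


def rip(start_url, root, limit=50):
    """Save up to `limit` pages, following same-site links breadth-first."""
    queue, seen = [start_url], set()
    while queue and len(seen) < limit:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        with urlopen(url) as response:
            html = response.read().decode("utf-8", errors="replace")
        target = local_path(url, root)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(html, encoding="utf-8")
        queue.extend(same_site_links(url, html))
    return seen
```

Calling rip with a blog's address and a folder name would save the pages it can reach, keeping the folder layout of the site. The page limit is there so an experiment does not accidentally try to download a very large site.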
Site rippers were in vogue in
the early days of the web, when people were paying by the minute to
dial up and read web pages online. It was relatively quick and easy to
rip a small website so that you could read it offline. Thanks to
ubiquitous broadband, there's not much call for that nowadays. Also,
today's bigger and more complicated websites are harder to rip than
simple HTML sites.
One survivor is HTTrack Website Copier, an open source site ripper that works on Microsoft Windows and Linux. There's a YouTube video that shows how to use it. Spadix Software's BackStreet Browser 3.2 for Windows looks easier to use. There's also the WinWSD WebSite Downloader, which you can download from CNet and other sites, but the author's home page no longer works.
Things are different if you own a Blogger
blog, or at least have a password to access the dashboard. In this
case, you can go to Settings (the spanner icon), click on Other, and
then choose "Export blog". This will save a backup copy of the whole
blog to your hard drive in XML (Extensible Markup Language) format.
It's
important to do this in case you inadvertently run into a dreaded
"policy violation" and Google deletes your blog. Note that to get a
proper backup, you have to download a backup copy of your blog's
Template as well.
On its own, a Blogger backup isn't very useful for offline reading. If
you double-click the .xml file, it will load into a browser, such as IE,
and if you scroll down past the confusing headers, you should find the
readable text of each post, but it's no substitute for the original
blog. And while you could import your .xml file into Blogger or the WordPress blogging system, you could still only read it online.
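If you are comfortable with a little scripting, the posts can be pulled out of the backup file. The Python sketch below rests on an assumption I have not verified against every export: that the file is an Atom feed in which each post is an entry carrying a category whose term ends in "kind#post". The function names extract_posts and to_offline_page are hypothetical, and the result is a plain page without the blog's template, images or comments.

```python
import xml.etree.ElementTree as ET
from html import escape

ATOM = "{http://www.w3.org/2005/Atom}"
# Assumed marker that distinguishes posts from settings, templates and comments.
POST_KIND = "http://schemas.google.com/blogger/2008/kind#post"


def extract_posts(xml_data):
    """Return the posts in an exported feed, oldest first.

    `xml_data` is the content of the .xml file, as bytes or text.
    """
    root = ET.fromstring(xml_data)
    posts = []
    for entry in root.iter(ATOM + "entry"):
        kinds = [c.get("term") for c in entry.findall(ATOM + "category")]
        if POST_KIND not in kinds:
            continue  # skip settings, template and comment entries
        posts.append({
            "title": entry.findtext(ATOM + "title", default="(untitled)"),
            "published": entry.findtext(ATOM + "published", default=""),
            "html": entry.findtext(ATOM + "content", default=""),
        })
    # ISO-style timestamps sort correctly as plain strings.
    return sorted(posts, key=lambda p: p["published"])


def to_offline_page(posts):
    """Join the posts into one very plain HTML page for offline reading."""
    sections = [
        f"<h2>{escape(p['title'])}</h2>"
        f"<p><em>{escape(p['published'])}</em></p>{p['html']}"
        for p in posts
    ]
    return "<html><body>" + "\n".join(sections) + "</body></html>"
```

You would read the backup file's contents, pass them to extract_posts, and save the output of to_offline_page as an .html file that opens in any browser. Pictures would still be loaded from the web unless you saved them separately.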
At
this point, I'm stuck for a suggestion. One idea, which I haven't
tried, would be to install a copy of WordPress on your PC and import the
Blogger backup into that. It's quite a lot of work and might not be
worth the effort. Perhaps a reader can suggest a better idea for offline
reading.
Finally, there are online services that will back up a
blog, or convert it into an ebook or even a printed book. Again, there
may be limits on what you can do unless you have access to the Blogger
dashboard.
The BlogBackupr website will back up a blog on a daily basis, using its RSS feed.
ZinePal
will convert a blog into a PDF file and into an ebook in the Amazon
Kindle, Mobipocket and ePub formats. However, it will only convert five
blog posts unless you sign up for the Pro version, which will convert
50. One ebook costs $5.
BookSmith
will convert up to 100 posts from Blogger or WordPress into an ebook,
and offer you the chance to buy a printed copy. However, you have to
give it your logon and password.
BlogBooker
will convert a Blogger, WordPress or LiveJournal blog into a PDF
ebook. This looks a decent bet because you can upload your Blogger
backup (.xml) file. However, the site is "donationware" and, quite
reasonably, requests a donation if you want to include images.
I
tried BlogBooker with an old blog and the result is quite book-like
with an index at the front, different chapters for different years,
left/right page spacing, headers/footers and so on. It also included
comments. With an automated spacing and layout system, there are bound
to be errors that a human editor would correct, but still, if you like
the result, you can get the PDF printed by Lulu. Either way, it's better than nothing.
I also tried PDF my URL,
because it only involves pasting in the blog's web address. Almost
instantly, the site created a paginated PDF copy that looked like my
original Blogger site, with colour panels etc. However, it only picked
up the last 15 of 67 posts, and no comments. If you want to modify the
settings, you have to sign up for a paid subscription.
If there are better services out there, please tell us about them in the comments.