Demork.py

Ladies and gentlemen, demork.py. My first foray into Python. Somebody else’s Python, specifically.

Demork.py takes a Mozilla history.dat file (in the now-legendary “mork” format) and spits out valid XML. I hope that some of you find it useful now, and that it makes migrating away from Mozilla’s current data format easier.

Thank you, Shaver, for keeping me from starting from scratch, and thanks more generally to the Mozilla Project for their fine, fine browser. I hope this helps a little, maybe. I’ve got one possibly-misguided question, if there is any XML expertise out there – I’ve done a straight-up regex replacement of all the ampersands in the URLs in the history file with “&amp;”; this makes for valid XML, but I foresee broken URLs. Should this be taken care of automagically during the trip back from XML, or should I be replacing them with “%something” instead?
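For anyone who wants to poke at that question, here is a minimal sketch of the round trip. It assumes Python's standard xml.sax.saxutils.escape in place of the regex replacement demork.py actually does, and the URL and the "url" element name are made up for illustration.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

# Hypothetical URL with a bare ampersand in the query string.
url = "http://example.com/search?q=mork&lang=en"

# escape() rewrites &, < and > as entities so the text is safe inside XML.
escaped = escape(url)

# Wrap the escaped URL in a made-up element to get a tiny XML document.
document = "<url>" + escaped + "</url>"

# Parsing the document turns the entity back into a plain ampersand.
parsed = ET.fromstring(document).text

print(escaped)
print(parsed == url)
```

If the parsed text matches the original URL, the escaping is undone by the XML parser on the way back out, and nothing needs to be percent-encoded on the way in.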

One last word: those of you who are thinking about rolling your own undocumented, one-off data format for whatever job you happen to be doing, don’t. Just don’t. Whether you realize it or not you are playing a nasty, frathouse-grade prank on the future. You are pre-emptively saran-wrapping millions of as-yet-unbuilt toilets, and people will remember your name and hate you for it.

If anybody would like to help me test this (I only have one history file, as you might expect) I’d appreciate it. I hope you can take a minute to run it against your history file like so:

python demork.py history.dat > history.xml

and then validate the XML file, either here or with the household appliance of your preference. Please let me know if it works, and send any details of failure. Many thanks.
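If you would rather check the output locally, something like the following should do as a rough first pass. It is only a sketch: it tests well-formedness with the standard library's ElementTree parser, not validity against any schema, and check_well_formed is a name I made up for this example.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_data):
    """Return None if xml_data parses as XML, else the parser's error message."""
    try:
        ET.fromstring(xml_data)
    except ET.ParseError as err:
        # The message includes the line and column where parsing stopped.
        return str(err)
    return None


# An escaped ampersand parses; a bare one does not.
print(check_well_formed("<url>a?x=1&amp;y=2</url>"))
print(check_well_formed("<url>a?x=1&y=2</url>"))
```

To run it against real output, read history.xml in binary mode and pass the bytes in, so the parser can honor whatever encoding the file declares.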

3 Comments

  1. Posted February 21, 2005 at 10:27 pm

    Your last paragraph is the answer to the perennial “XML is lame because it’s so obvious and non-novel” objection, of course.

    Your ampersanding: Once the document is parsed, it’s as if it were a regular ampersand. In fact, the conventional ampersand-handling is technically broken, and escaping is correct. If you ever get around to writing proper XHTML documents, you need to escape your ampersands.

  2. Mike Bruce
    Posted February 23, 2005 at 12:13 am

    Doesn’t work at all for me. Spits out a lot of errors on my actual history.dat and creates an XML document with no information in it. Spits out no errors on a very small history.dat, but still doesn’t put any useful information in the XML.

    I’ll investigate and let you know what’s going on.

  3. Tomas Herrmann
    Posted February 25, 2005 at 9:07 am

    May I ask you for instructions on how to read mork?
    I am trying to read and write mork files with Delphi. Unfortunately I found only
    http://www.mozilla.org/mailnews/arch/mork/primer.txt
    http://www.mozilla.org/mailnews/arch/mork/grammar.txt

    but these texts are very hard for me to understand!
    Thanks,
    Tomas.