Archive for July, 2011

Quick Note

God I LOVE Python’s IDLE.

Anyway, after hacking away for two days I was finally able to mock up a pipeline for the process behind pickling an article. I’m so pumped. Thanks pretty much entirely to the raw data from python.org’s playground Mm2 list archives, I was able to reverse engineer enough of the data to form some early theories on how this script should work.

I had been expecting pickles of articles sorted by index (article, thread, subject, etc.) somewhere and instead discovered that Python’s marshal module (http://docs.python.org/library/marshal.html) was able to open the contents of these files whereas the pickle module was not. Remembering that DumbBTree uses marshal, I looked back at the DumbBTree class in HyperDatabase.py and realized that it is what HyperDatabase saves to the /database folder within the list archives. After reading back through HyperDB, I noticed that DumbBTree instances contain pickle.dumps() string representations of the articles they hold. Since DumbBTrees are dictionary-like, an article can be recovered via pickle.loads(tree[articlemsgid]) (loads rather than load, because the stored value is a string and not a file object), which is the core of the algorithm.
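To make the marshal-then-pickle layering concrete, here is a minimal sketch. It assumes a tree file holds a single marshalled dict mapping message ids to pickled articles; the helper names and the plain-dict "article" are made up for illustration and are not taken from HyperDatabase.py. The real Mm2 files were written by Python 2, so reading them from Python 3 may need extra care that this sketch ignores.

```python
import marshal
import os
import pickle
import tempfile


def write_fake_tree(path, articles):
    """Write a stand-in tree file: one marshalled dict of pickled values.

    Hypothetical helper; only here so the sketch has something to read.
    """
    tree = {msgid: pickle.dumps(article) for msgid, article in articles.items()}
    with open(path, "wb") as fp:
        marshal.dump(tree, fp)


def read_articles(path):
    """Undo both layers: marshal.load for the dict, pickle.loads per value."""
    with open(path, "rb") as fp:
        tree = marshal.load(fp)
    return {msgid: pickle.loads(blob) for msgid, blob in tree.items()}


def demo():
    """Round-trip one fake article through a temporary tree file."""
    articles = {"<1@example.com>": {"subject": "hello", "author": "someone"}}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "fake-article")
        write_fake_tree(path, articles)
        return read_articles(path)
```

Calling pickle.load on the stored value would fail here, since pickle.load expects a file-like object with a read method.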

Now I need to develop an efficient algorithm for accessing each archive and generating StormArticle instances. Tomorrow’s project.
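One possible shape for that loop, sketched as a generator so only one tree file is in memory at a time. The "-article" filename suffix and the function name are assumptions for illustration, not something confirmed from the archives, and the sketch yields plain unpickled objects rather than StormArticle instances.

```python
import marshal
import os
import pickle


def iter_articles(database_dir, suffix="-article"):
    """Yield (filename, msgid, article) for every pickled entry found.

    Assumes each matching file is a single marshalled dict whose values
    are pickle.dumps() strings. The suffix is a guess at the naming scheme.
    """
    for name in sorted(os.listdir(database_dir)):
        if not name.endswith(suffix):
            continue  # skip anything that is not an article tree
        with open(os.path.join(database_dir, name), "rb") as fp:
            tree = marshal.load(fp)
        for msgid, blob in tree.items():
            yield name, msgid, pickle.loads(blob)
```

A caller would wrap each yielded article in whatever conversion step builds the database row, for example `for name, msgid, article in iter_articles(path): ...`.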


Upgrade Script

Yeah, I know, it’s been a while. My bad.

Now, I’ve changed my focus from removing the remnants of dumbbtree from pipermail to writing that upgrade script to migrate the existing Mm2 pickle structures to the Mm3 storm schema. Barry was able to give me the raw list archives from python.org’s “playground” Mm2 implementation and the structure in pipermail.py is making more sense now that I can see how the database files are laid out. I have a general idea of the algorithm but not enough worth posting – I’ll post back as soon as I determine how I’m going to do this. I’m generating the database independently of mailman for now and will later add some testing if needed after I have a working implementation.

Mostly posting to assure any followers that I am not dead and am still working on Mm3.


Update

Pretty sure I finished everything that needed to be done for HyperDatabase:

  1. Added StormArticle and Conversation models to be used in place of DumbBTrees
  2. Load the SQLite database and create the tables if necessary in the HyperDB __init__ method
  3. Rewrote methods in HyperDB to rely on the Storm API instead of indices/DumbBTrees
  4. Wrote __init__ and toArticle methods for StormArticle for easy conversion to and from pipermail Articles
  5. Added extra method(s) for conversation object support for Dushyant’s dynamic page generation and UI code
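As a rough illustration of items 2 and 4, the sketch below uses the standard-library sqlite3 module instead of Storm, so it runs without extra dependencies. The table layout, column names, and function names are all assumptions invented for the example and do not reflect the actual StormArticle or Conversation models.

```python
import sqlite3

# Hypothetical schema; the real models may have different columns.
SCHEMA = """
CREATE TABLE IF NOT EXISTS conversation (
    id INTEGER PRIMARY KEY,
    subject TEXT
);
CREATE TABLE IF NOT EXISTS article (
    msgid TEXT PRIMARY KEY,
    subject TEXT,
    author TEXT,
    body TEXT,
    conversation_id INTEGER REFERENCES conversation(id)
);
"""

ARTICLE_FIELDS = ("msgid", "subject", "author", "body")


def open_database(path):
    """Open the database, creating the tables only if they are missing."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def store_article(conn, article):
    """Convert a dict-shaped article into a row (the 'to database' direction)."""
    conn.execute(
        "INSERT OR REPLACE INTO article (msgid, subject, author, body) "
        "VALUES (:msgid, :subject, :author, :body)",
        article,
    )
    conn.commit()


def load_article(conn, msgid):
    """Convert a row back into a dict (the 'to article' direction)."""
    row = conn.execute(
        "SELECT msgid, subject, author, body FROM article WHERE msgid = ?",
        (msgid,),
    ).fetchone()
    return None if row is None else dict(zip(ARTICLE_FIELDS, row))
```

Because the schema uses CREATE TABLE IF NOT EXISTS, opening an existing database is harmless, which is the behavior item 2 describes.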

As far as I can tell, the three methods next, first, and clearIndex in HyperDatabase are no longer needed and have been commented out and replaced with a pass. Since those three deal specifically with DumbBTree, I did not think it necessary to try to duplicate the functionality with the database. If this is not the case then I can simply go back and implement them.

It seems that I have two primary objectives now:

  1. Remove remnants of DumbBTrees and indices from pipermail.py
  2. Write database upgrade script for Mm2 -> Mm3 procedure

Barry had previously mentioned that marshal needs to be removed from the project. I agree, and since it is only used in DumbBTree, a class that is no longer used (I will probably just remove it), that pretty much takes care of itself.

If there’s anything I’m missing, don’t hesitate to find me on IRC (dcrodman) or shoot me an email (dcrodman@gmail.com). Feedback and suggestions are welcome.
