February 2012
9 posts
I haven’t been drunk in 3 years... and I’ve been... →
January 2012
17 posts
Fast Ruby Parsers →
This thread mentions a bunch of Ruby parsers, with a focus on being fast for small languages. This is more for my own future reference than anything; it’s a great summary.
Parsing Wiki Text (Part 2)
More on parsing Wikipedia markup. So you’ve abandoned the idea of using regular expressions to parse the markup, congratulations.
This should save you from insanity and oblivion.
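If you need convincing, here is a rough sketch of where regular expressions fall over: nested templates. The sample string and the `outer_template` helper are made up for illustration (they are not from mwlib or any real dump), and the brace counter is nowhere near a real parser. It only shows that you need to track nesting depth, which a simple pattern can't do.

```python
import re

# Hypothetical snippet of wiki markup: a template nested inside another template.
text = "{{Infobox|name={{lang|en|Foo}}|year=2012}} trailing"

# Naive non-greedy regex: it stops at the first "}}", which closes the
# inner template, so the captured body is cut short.
naive = re.search(r"\{\{(.*?)\}\}", text).group(1)


def outer_template(s):
    """Return the body of the first top-level {{...}} by counting nesting depth."""
    start = s.find("{{")
    if start == -1:
        return None
    depth = 0
    i = start
    while i < len(s) - 1:
        pair = s[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}":
            depth -= 1
            i += 2
            if depth == 0:
                # Strip the outer pair of braces from both ends.
                return s[start + 2:i - 2]
        else:
            i += 1
    return None  # unbalanced braces


print(naive)
print(outer_template(text))
```

The regex gives you a truncated template body, while the depth counter returns the whole thing. Real wiki markup has many more constructs than this (links, tables, references), which is why a proper parsing library is the saner choice.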
mwlib is a Python library for parsing Wiki markup. It seems semi-official. Whatever, it’s what we’re going to use.
It can parse the zipped XML dumps as I’ve shown in previous...