Ben Humphreys

Parsing Edict XML with Perl and XML::LibXML

Edict is a Japanese-English dictionary that is free to use for research (as far as I know). It’s available in a few formats, the most useful of which is the XML dump of the English-only data.

It might help someone sometime, so I’ve posted a short Perl snippet showing how to parse the format.

A single entry looks like:
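
As a rough sketch of the idea, here is how an entry could be read with XML::LibXML, assuming JMdict-style element names (`entry`, `ent_seq`, `k_ele/keb`, `r_ele/reb`, `sense/gloss`). The sample entry, its values, and the helper `parse_entries` are made up for illustration and are not taken from the real dump.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# No "use utf8" here: the heredoc below stays as raw UTF-8 bytes
# (assuming this file is saved as UTF-8), and the parser decodes them
# according to the encoding declaration.
binmode(STDOUT, ':encoding(UTF-8)');

# Hypothetical sample entry with illustrative values; the element names
# are an assumption about the dump's layout.
my $sample = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<JMdict>
<entry>
<ent_seq>1358280</ent_seq>
<k_ele><keb>食べる</keb></k_ele>
<r_ele><reb>たべる</reb></r_ele>
<sense>
<gloss>to eat</gloss>
</sense>
</entry>
</JMdict>
XML

# Hypothetical helper: parse an XML string and return one hashref per entry.
sub parse_entries {
    my ($xml) = @_;
    my $doc = XML::LibXML->load_xml(string => $xml);

    my @entries;
    for my $entry ($doc->findnodes('//entry')) {
        push @entries, {
            seq      => $entry->findvalue('./ent_seq'),
            kanji    => [ map { $_->textContent } $entry->findnodes('./k_ele/keb') ],
            readings => [ map { $_->textContent } $entry->findnodes('./r_ele/reb') ],
            glosses  => [ map { $_->textContent } $entry->findnodes('./sense/gloss') ],
        };
    }
    return @entries;
}

# Print each entry on one line: sequence number, kanji, readings, glosses.
for my $e (parse_entries($sample)) {
    printf "%s: %s [%s] %s\n",
        $e->{seq},
        join(', ', @{ $e->{kanji} }),
        join(', ', @{ $e->{readings} }),
        join('; ', @{ $e->{glosses} });
}
```

For the full dictionary file, `XML::LibXML->load_xml(location => $path)` takes a file path instead of a string. This loads the whole document into memory, so a streaming approach may be a better fit for a file of that size.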

About

Computational linguistics researcher at Kyoto University, focussing on machine translation. Also learning Japanese, Korean, French and other badassery.