Ben Humphreys


Parsing Edict XML with Perl and XML::LibXML

Edict is a Japanese-English dictionary that is free to use for research (as far as I know). It’s available in a few formats, the most useful of which is an XML dump of the English-only data.

It might help someone sometime, so I’ve posted a short Perl snippet of how to parse the format.

A single entry looks like:

    • #programming
    • #phd
    • #japanese
    • #perl
  • 2 weeks ago

Japanese Equivalent of The Onion

I love The Onion. For a long time I wished there was an equivalent satirical site in Japanese. Someone just told me about the Kyoko Shimbun, a Japanese site full of made-up amusing stories.

For example, they have a story on McDonald’s Japan releasing the McDonal-don, a bowl of rice (don) topped with a burger.

Another story is about naturally drying baumkuchen cakes in the sun.

A third reports the discovery that pi is only 10 digits long, with all the calculations until now having been the result of a bug in the program running them. The quote from the researcher at the end is great.

The stories are short and have a good variety of vocabulary, so I think they are a pretty good way to practice Japanese.

    • #japanese
    • #language
  • 1 month ago

G30 at Bonn University

From the 6th to the 10th of December I made a flying visit to Bonn, Germany as part of Kyoto University’s G30 student recruitment drive. G30 is an initiative by the Japanese government to attract more foreign students to Japan, with the aim of having 300,000 foreign students by 2020. The first stages of the initiative involved recruiting more non-Japanese professors, converting certificates and documentation to English and changing course content itself into English. A number of departments within Kyoto University have completed these first three stages and are now trying to attract more foreign students to their new English-language courses.

While G30 tries to address the language barrier associated with studying in Japan, it is often criticised for not addressing the other issue with studying in Japan — the cost. G30 does not offer scholarships, but it does offer partial and full exemption from tuition fees. Living costs in Japan are high but without tuition fees on top, the cost is bearable.

Anyway, back to the trip.

Japan Fair

All day Wednesday I took part in Study Japan!, a fair put together by 10 or so Japanese universities, aimed at pulling in more German students for the G30 initiative I described above. Most of the big universities were there: Waseda University, Kyoto University, Tokyo Metropolitan University etc. We each had a stand, and a bunch of documents to give out to prospective students. I was there to tell them about what student life was like at Kyoto University and in Kyoto in general. Some of the questions that I tried to answer in my notes for G30 post did come up, but most of the time I just answered general questions. Surprisingly, only one person asked about the nuclear situation in Japan. That, and the fact that most of the students who came were in their first year, makes me think that they were not yet serious about studying in Japan.

Most of the students that came were from the humanities department. They were taking Japanese or Asian Studies as their major, with a few people from the management or economics department. We were there to represent the Informatics Department of Kyoto University so it was unfortunate that we didn’t have the right booklets to support them. I think that science and technology majors probably did not come because they assume that Japanese would be required to study in Japan. Clearly Japan needs to work a little more on letting people know that Japanese is not a requirement.

After learning more about G30 I’m a lot less skeptical about it. In the foreign community in Japan, it’s generally seen as a “cash grab” by the Japanese government to attract more fee-paying students without thinking about the practicalities of studying in Japan. However, the tuition exemption offer makes it a much more tempting prospect.

Food

I’ve had little experience of German food, beyond the snacks I had at Oktoberfest in Tokyo a few years ago. The food I had in Germany was generally delicious. On the first night we had German sausages and the German equivalent of mulled wine, called glühwein. The only thing I couldn’t quite get used to was the amount of salt in everything. It was a bit too much after a while.

Language

While there, everyone thought I was German. Being friendly, people would casually say things to me, but I would have to stop them half-way through or at the end of their long sentence and apologise that I couldn’t speak German.

I really felt that I was apologising every time I said this. Being in Germany, it seemed so rude that I was not able to understand the simplest things. I’ve tried studying German before but all I could remember was danke and auf wiedersehen. People were very nice about it and never seemed to get annoyed in the way I’ve heard Parisians do with non-French speakers. And of course everybody spoke English extremely well.

If it was French, I think I would be able to guess what the other person was asking me from the cognates that exist in English. But with German I found it impossible to guess what they were saying. I’ve heard that German is supposed to be close to English but it seems so much further than French.

The irony is that in Japan, people mostly assume that I cannot speak Japanese when I can. But in Germany they assume I can speak the language, but I can’t. Oh cruel irony.

People & Schadenfreude

It’s been many years since I’ve travelled outside of Asia, but I was struck by how much I felt I was ‘on the same wavelength’ as the German people I met. When something amusing or odd happened, I often met eyes with other people around, and we exchanged knowing looks that said “You’re seeing this too, and thinking the same, right?”

One perfect example of this happened on the last day as we tried to take the express ICE train to Frankfurt airport. There had been a suicide on the line at around 10am and all the trains on the line were still not moving by the time we tried to get our 11am train. Jumping in front of trains in Germany is a rare enough occurrence that they do not have a quick response to it. In Japan people jumping in front of trains is probably the most popular method of suicide and happens literally every day, and so Japanese train companies are extremely efficient at cleaning up the mess and getting trains running again.

Anyway, we were sitting on the train and waiting for it to start moving. Every so often, a rather stressed-sounding German train official spoke through the train’s PA system giving us updates. Every time he gave a new update, he seemed to get more and more stressed, with his voice rising in volume and pitch. The PA system would start to crackle and cut out as he got louder and louder. Everyone on the train found this hilarious. Nobody could do anything so it seemed everyone was resigned to waiting and laughing at the ridiculousness of our situation.

The train staff had tried to connect another set of cars to ours, but there had been a software malfunction, and other complications that were making this poor young train conductor more and more stressed. Later on he began asking people to get off the train as it was exceeding the legal limit for passengers it could carry. He was almost screaming “Please get off the train, there is another coming in 3 minutes, please get that one. We cannot leave until you do. Please.” Passengers were laughing at this poor guy. It was pretty funny. I guess it’s somewhat like schadenfreude.

Overall I really enjoyed my trip. But the guilt I felt at not being able to speak the language has really brought home how important it is to learn the language of the country you’re planning on visiting. And how it would be completely impossible for me to live somewhere without knowing the language.

    • #germany
    • #japan
    • #japanese
    • #g30
    • #university
    • #trips
  • 2 months ago

Mixing Kana and Kanji and MT

Writing in a mix of Kanji and Kana makes text a lot easier to read for machines as well as humans. I found this while messing with Google Translate and the Japanese particle “no” (の).

“かんこくのでんきせいひんのかかく” → “Dress belongings or writing full-bodied electric kettle”

“韓国の電気製品の価格” → “South Korean electronics prices”

    • #japanese
    • #translation
  • 2 months ago

Having fun with languages. Prerequisites: Kanji, English, a slightly twisted mind.

    • #language
    • #japanese
  • 6 months ago

Remembering 雨 Related Kanji

I sometimes find it easier to make up stories about Kanji to remember how to write them, especially if they look really similar. Here’s how I remember weather-related Kanji that use 雨. Most of them are easy, but I often get the right-hand parts of dew and mist confused.

  • 雨 rain - Basic.
  • 雲 cloud - (Can’t think of one).
  • 雪 snow - Katakana ヨ at the bottom which is like ユ in ゆき.
  • 露 dew - Has 足 at the bottom, you get dew on your feet.
  • 霧 mist - mist is hard to predict, so contains 予. Also because mist is really fine, it has no power… so it has 力 at the bottom right.
  • 霰 hail - Not sure about this one, bottom looks like 昔, hail is an old word?

Does anyone else use systems like this for remembering Kanji?

    • #japanese
    • #language
  • 7 months ago

All !English All The Time

(For the non-programmers: !English means Not English.)

Today I finally took the plunge and removed all English-language feeds from my Google Reader account. I read about All Japanese All The Time a while back and the idea kept coming back to me — removing English as my main source of information. Or more precisely, changing it to Japanese, Korean or French.

I have enough news and design blogs in Japanese to learn new vocabulary, but I still haven’t found enough on linguistics or NLP yet.

I read half the time via Reeder on iPad and half the time through the Pure Reader theme/plugin for Firefox.

With news and reading material covered, my next objective is finding news and TV I want to watch. I just finished watching the phenomenal Deadwood and there’s nothing I’ve seen on Japanese TV that even comes close to it. Cultural differences are part of it, but the acting in Japanese dramas is horrific.

TOEFL

On the topic of English, I booked myself in for a TOEFL test today. Despite English being my native tongue, Kyoto University require that I submit a TOEFL or TOEIC score to apply to their doctorate program. So TOEFL get $200 for me to go and have a chat with them. I’m actually a little worried I’ll use too much informal English in the speaking sections, and be marked down for “insufficient vocabulary usage”.

    • #japanese
    • #english
  • 8 months ago

I took the plunge and bought AnkiSRS for iPhone. It’s the most expensive piece of iOS software I’ve bought to date (£15). I’ll write a full review later.

I just spent 20 minutes trying to work out how to sync my existing decks online. As with the rest of Anki, it’s annoying to use, ugly and as user friendly as non-Ubuntu Linux.

Watch the video to see the 10-step process.

    • #japanese
  • 9 months ago

Just finished watching 告白 (Confessions). The direction, story and soundtrack were a welcome change from the western films I’ve been watching recently. I’m going to try to get back into Japanese cinema.

The video above is one of the ambient tracks from the film.

    • #music
    • #japanese
  • 9 months ago

All Japanese All The Time

Had a fascinating talk with a fellow university student on Saturday night. He has been learning Japanese for a while, but a year ago he started full Japanese immersion based on All Japanese All The Time.

The author of the site managed complete native-level fluency within 18 months, all while living in the US. And I think it’s possible.

The site has a lot of useful information, but it is kind of spread around, so here’s a quick distillation based on what the guy told me and what I’ve read on the site.

  • Becoming fluent at a language requires raw hours, nothing else.
  • Immerse yourself 100% in the language.
  • Change everything you consume to the language you’re learning - news, books, music, films, TV, friends, social networking.
  • Listen to and watch things even if you don’t understand them; you need raw exposure to the language.
  • Only do stuff that is fun. If it’s not fun, drop it and move onto something else. Volume of input material is the key.

That’s about it. It’s obvious when you think about it, but the way the guy explained it to me and how it’s presented on the site makes a very strong point. The passion and conviction of the explanation is extremely infectious.

I’ve got 6 months in which to get my Japanese up to a sufficient level to be able to take part in laboratory discussions in technical Japanese, so fluency is really my short-term priority. I still need to be reading about research but if I follow the rules I can just read about it in Japanese and kill 2 birds with one stone.

For more information check out All Japanese All The Time.

    • #japanese
    • #language
  • 9 months ago

About

Computational linguistics researcher at Kyoto University, focussing on machine translation. Also learning Japanese, Korean, French and other badassery.
(日本語版)

Me, Elsewhere

  • @benhumphreys on Twitter
  • benhumphreys on github