December 2011
19 posts
2 tags
Using MongoDB for Research - Don't
Parse data using your programming language of choice. Wonderful. Insert data into MongoDB in an easy-to-understand hierarchical structure. Write other scripts to compare, process and analyse the data. Joy. Add more data to the database. See BSONElement exception. Curse, search the internet for why. Give up, run mysterious db.repairDatabase(). Hope data is OK. Fail. Reload data. Run some...
Dec 26th
16 notes
3 tags
Segmentation and Evaluation
This is just a short post as it’s too long to put on Twitter. Today I tried segmenting NTCIR-7 English–Japanese MT data by various methods to see whether the segmentation affected the BLEU and RIBES scores. Character-level BLEU was tried in BLEU in characters (Denoual 2005), which showed that BLEU on the character level correlates with word-level BLEU for English....
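To make the word-level versus character-level comparison concrete, here is a minimal sketch. It is not the scoring script used in the post or in the cited paper: it is a plain, unsmoothed, single-reference sentence BLEU, and the two toy sentences and the helper names (`bleu`, `word_tokens`, `char_tokens`) are assumptions made up for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU against one reference, without smoothing."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        total = sum(cand.values())
        # Clipped matches: a candidate n-gram counts at most as often
        # as it occurs in the reference.
        matched = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or matched == 0:
            return 0.0
        log_precisions.append(math.log(matched / total))
    # Brevity penalty for candidates no longer than the reference.
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)


def word_tokens(sentence):
    """Segment on whitespace (word-level BLEU)."""
    return sentence.split()


def char_tokens(sentence):
    """Segment into single non-space characters (character-level BLEU)."""
    return [ch for ch in sentence if not ch.isspace()]


# Toy sentences, invented for this example (not NTCIR-7 data).
reference = "the cat sat on the mat"
candidate = "the cat sat on a mat"

word_bleu = bleu(word_tokens(candidate), word_tokens(reference))
char_bleu = bleu(char_tokens(candidate), char_tokens(reference))
print(f"word-level BLEU: {word_bleu:.3f}")
print(f"char-level BLEU: {char_bleu:.3f}")
```

Only the segmentation function changes between the two scores, so any other segmenter (for example a morphological analyser for Japanese) could be swapped in the same way.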
Dec 25th
15 notes
Dec 24th
Dec 23rd
Dec 21st
1 note
Dec 20th
3 tags
TeX/PDF → HTML
I haven’t managed to get the idea of publishing papers in HTML out of my head. I’m convinced now that 99% of the work is in decent conversion to HTML. The presentation aspect is tricky but can be done with copious amounts of CSS and JavaScript. Back to conversion. It seems there are two possible ways to tackle it, each with its own strengths and difficulties: TeX → HTML, or PDF...
Dec 13th
15 notes
6 tags
G30 at Bonn University
From the 6th to the 10th of December I made a flying visit to Bonn, Germany as part of Kyoto University’s G30 student recruitment drive. G30 is an initiative by the Japanese government to attract more foreign students to Japan, with the aim of having 300,000 foreign students by 2020. The first stages of the initiative involved recruiting more non-Japanese professors,...
Dec 11th
15 notes
2 tags
Mixing Kana and Kanji and MT
Writing in a mix of Kanji and Kana makes text a lot easier to process for machines as well as humans. Found this while messing with Google Translate and Japanese “no”.
“かんこくのでんきせいひんのかかく” → “Dress belongings or writing full-bodied electric kettle”
“韓国の電気製品の価格” → “South Korean electronics prices”
Dec 11th
8 notes
Dec 10th
1 note
Dec 10th
Dec 10th
Dec 10th
2 tags
Dec 9th
3 notes
5 tags
Dear Science — Let’s stop using PDF — Part 2
I’ve thought more about how to implement what I put forward in the first part of Dear Science — Let’s stop using PDF, and I believe the problem can be broken down into two parts:
Generating HTML: converting LaTeX to HTML
Presentation: presenting text and figures in a resolution-independent way
Generating HTML
This is probably the harder of the two tasks. Researchers are...
Dec 4th
33 notes
4 tags
Dear Science — Let's stop using PDF
It’s 2011, it’s the future. The Earth is doomed. I’m making a Space Ark. For Space. There’s no room for printed material on my Space Ark. “A4” is just an abstract concept for when we used dead trees to store our information. For when we collated facts like so many dead butterflies and bound them in books to sit on shelves and gather dust. It’s 2011 and...
Dec 2nd
27 notes
Dec 1st
November 2011
46 posts
Nov 30th