Ben Humphreys

Playing with NLTK

I was playing with the Python NLTK again for the first time in a while and set myself the task of finding all the different ways “to go” is used in English. I was aiming for a list of phrases such as “to go up”, “to go down”, and so on.

Apologies if the Python code is horrible; I’m still getting used to how things are done in it. I’m still thinking in a mix of Perl and Ruby and reading the manual a lot.

Code:

import re

import nltk
from nltk.corpus import brown

porter = nltk.PorterStemmer()

# Matches tags made of word characters; Brown punctuation tags
# such as '.' or ',' fail this test, so punctuation is skipped.
word_tag = re.compile(r'\w+')

phrases = []

def process(sentence):
    """Record every verb that stems to 'go' together with the word after it."""
    for (w1, t1), (w2, t2) in nltk.bigrams(sentence):
        stem1 = porter.stem(w1)
        if stem1 == 'go' and t1.startswith('V') and word_tag.match(t2):
            phrases.append('to %s %s (%s, %s)' % (stem1, w2, t1, t2))

for tagged_sent in brown.tagged_sents():
    process(tagged_sent)

fdist = nltk.FreqDist(phrases)

# most_common() yields (phrase, count) pairs, highest count first
with open('output.txt', 'w') as f:
    for key, val in fdist.most_common():
        f.write("%d : %s\n" % (val, key))

The results are:

202 : to go to (VBG, TO)
94 : to go to (VB, IN)
34 : to go to (VBG, IN)
29 : to go on (VB, RP)
27 : to go on (VBG, RP)
25 : to go out (VB, RP)
24 : to go back (VB, RB)
18 : to go home (VB, NR)
18 : to go into (VB, IN)
18 : to go up (VB, RP)
18 : to go with (VB, IN)
10 : to go down (VB, RP)
9 : to go back (VBG, RB)
8 : to go into (VBG, IN)
8 : to go through (VB, IN)
7 : to go a (VB, AT)
7 : to go and (VB, CC)
7 : to go for (VB, IN)
7 : to go out (VBG, RP)
7 : to go over (VB, RP)
7 : to go to (VB, TO)
...
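
Only VB and VBG show up in the rows listed above, which suggests the stemmer is only reducing “go” and “going” to “go”, so forms like “went” and “gone” are probably being missed. Here is a minimal sketch of one way around that, using an explicit lookup set instead of the stemmer. The names GO_FORMS and go_phrases and the hand-tagged sentences are made up for illustration; they are not part of NLTK or of the script above.

```python
import re
from collections import Counter

# Hypothetical lookup set of inflected forms of "go" (an assumption,
# used here in place of the Porter stemmer).
GO_FORMS = {'go', 'goes', 'going', 'went', 'gone'}

# Same idea as in the script: punctuation tags like '.' do not match.
WORD_TAG = re.compile(r'\w+')

def go_phrases(tagged_sentence):
    """Yield 'to go X (tag1, tag2)' for each form of 'go' followed by a word."""
    pairs = zip(tagged_sentence, tagged_sentence[1:])
    for (w1, t1), (w2, t2) in pairs:
        if w1.lower() in GO_FORMS and t1.startswith('V') and WORD_TAG.match(t2):
            yield 'to go %s (%s, %s)' % (w2.lower(), t1, t2)

# Hand-tagged toy sentences in the style of the Brown tagset.
sentences = [
    [('She', 'PPS'), ('went', 'VBD'), ('home', 'NR'), ('.', '.')],
    [('He', 'PPS'), ('goes', 'VBZ'), ('.', '.')],
    [('They', 'PPSS'), ('had', 'HVD'), ('gone', 'VBN'), ('out', 'RP'), ('.', '.')],
]

counts = Counter(p for s in sentences for p in go_phrases(s))
for phrase, n in counts.most_common():
    print("%d : %s" % (n, phrase))
```

The second sentence contributes nothing because “goes” is followed by punctuation, which is the same filtering the tag regex does in the original script.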

I could do more to trim down the results, such as merging the different verb part-of-speech tags (VB, VBG) into one.
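
A rough sketch of what that merging might look like, assuming the counts are kept as (phrase, verb tag, following tag) keys rather than as formatted strings. The function name merge_verb_tags and the coarse tag 'V' are my own choices for illustration, and the sample counts are copied from the results above.

```python
from collections import Counter

def merge_verb_tags(counts):
    """Collapse every tag starting with 'V' into a single 'V' and sum the counts."""
    merged = Counter()
    for (phrase, verb_tag, next_tag), n in counts.items():
        coarse = 'V' if verb_tag.startswith('V') else verb_tag
        merged[(phrase, coarse, next_tag)] += n
    return merged

# A few rows taken from the results listed above.
sample = Counter({
    ('to go on', 'VB', 'RP'): 29,
    ('to go on', 'VBG', 'RP'): 27,
    ('to go back', 'VB', 'RB'): 24,
    ('to go back', 'VBG', 'RB'): 9,
})

merged = merge_verb_tags(sample)
for (phrase, tag, next_tag), n in merged.most_common():
    print("%d : %s (%s, %s)" % (n, phrase, tag, next_tag))
# prints:
# 56 : to go on (V, RP)
# 33 : to go back (V, RB)
```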

If you have any comments, I’d be happy to hear them.

About

Computational linguistics researcher at Kyoto University, focussing on machine translation. Also learning Japanese, Korean, French and other badassery.
