# Small hands-on exercises
a = FreqDist([len(i) for i in text1])
# iterate over all word lengths, then get each length's frequency, to see which word length is used most
a.items() # (sample, count) pairs
a.max()
a[i] # how many times i occurs
a.freq(i) # proportion, relative frequency
a.inc(sample) # add a sample (NLTK 2 API; in NLTK 3, FreqDist is a Counter, so use a[sample] += 1)
a.N() # total number of samples
a.keys() # in line 28
a.tabulate() # draw a table
a.plot() # draw a chart, optional: (cumulative = True)
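The calls above can be tried end to end without the book's corpora: `FreqDist` is essentially a counting dict (NLTK 3 builds it on `collections.Counter`), so the same word-length distribution can be sketched with the stdlib alone. The sample sentence here is invented for illustration:

```python
from collections import Counter

# stand-in for text1: any list of tokens works
tokens = "the quick brown fox jumps over the lazy dog".split()

# frequency of each word length, mirroring FreqDist([len(i) for i in text1])
fd = Counter(len(w) for w in tokens)

total = sum(fd.values())                   # fd.N() in NLTK terms
most_common_len = fd.most_common(1)[0][0]  # fd.max()
freq_of_3 = fd[3] / total                  # fd.freq(3): relative frequency

print(sorted(fd.items()))                  # (length, count) pairs, like fd.items()
print(most_common_len, fd[most_common_len])
```

Here lengths 3, 4, 5 occur 4, 2, and 3 times, so the most common word length is 3.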

#string tests on words (-> boolean)
s = 'str' # can be a word or a sentence
s.startswith(t) # s begins with t
s.endswith(t) # s ends with t
s.islower() # all cased characters are lowercase
s.isupper() # all cased characters are uppercase
s.isalpha() # all characters are letters
s.isalnum() # all characters are letters or digits
s.isdigit() # all characters are digits
s.istitle() # each word starts with an uppercase letter (title case)
#example
sorted([i for i in set(text1) if i.endswith('ableness')]) # shared suffix
sorted([i for i in set(text4) if 'gnt' in i])
sorted([i for i in set(text6) if i.istitle()])
sorted([i for i in set(sent7) if i.isdigit()])
...if '-' in i and 'index' in i # text7: the Wall Street Journal's various indexes
...if i.istitle() and len(i) > 10 # in text3 these are mostly personal and place names, which is fun; a few other words qualify too
...if not i.islower()
...if 'cie' in i or 'cei' in i

#operate on elements
len(i) for i in text1
i.upper() for i in text1
len(set([i.lower() for i in text1 if i.isalpha()])) # removes duplicates like This/this, and excludes numbers and punctuation
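Normalizing case and filtering non-alphabetic tokens, as in the last line, shrinks the vocabulary; a runnable sketch with an invented token list:

```python
# stand-in for text1: tokens with repeated casings, punctuation, and a number
tokens = ["This", "this", "THIS", "is", "a", "test", ",", "42"]

# lowercase every alphabetic token, drop punctuation and digits,
# then deduplicate with set() so "This"/"this"/"THIS" count once
vocab = set(t.lower() for t in tokens if t.isalpha())
print(len(vocab), sorted(vocab))
```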

babelize_shell() # translate back and forth with any language 12 times (requires the full data package)
nltk.chat.chatbots()#interesting

http://languagelog.ldc.upenn.edu/nll/  Language Log, a useful natural language processing blog

Next tasks:

1. Exercises for each chapter

2. Read along with the English edition

3.Create something