{"id":160,"date":"2016-11-30T00:17:02","date_gmt":"2016-11-29T16:17:02","guid":{"rendered":"http:\/\/blog.defjia.top\/?p=160"},"modified":"2016-11-30T00:17:02","modified_gmt":"2016-11-29T16:17:02","slug":"python%e8%87%aa%e7%84%b6%e8%af%ad%e8%a8%80%e5%a4%84%e7%90%86natural-language-processing%e5%9f%ba%e4%ba%8enltk%e5%ba%93%e7%9a%84%e5%ad%a6%e4%b9%a01-2","status":"publish","type":"post","link":"https:\/\/blog.defjia.top\/?p=160","title":{"rendered":"Python Natural Language Processing: Learning with the nltk Library 1.2"},"content":{"rendered":"<blockquote><p>#a small combined application<br \/>\na = FreqDist([len(i) for i in texti])<br \/>\n#first collect the length of every word, then count how often each length occurs, to find which word length is used most<br \/>\na.items() #key-value pairs<br \/>\na.max() #the most frequent sample<br \/>\na[i] #times of i<br \/>\na.freq(i) #proportion (relative frequency)<br \/>\na[sample] += 1 #add sample (a.inc(sample) in NLTK 2)<br \/>\na.N() #total number of samples<br \/>\na.keys() #in line 28<br \/>\na.tabulate() #draw table<br \/>\na.plot() #draw image; optional: cumulative=True<\/p>\n<p>#word tests (-&gt; boolean)<br \/>\ns = 'str' #can be a word or a sentence<br \/>\ns.startswith('')<br \/>\ns.endswith('')<br \/>\ns.islower() #all cased characters are lowercase<br \/>\ns.isupper() #all cased characters are uppercase<br \/>\ns.isalpha() #all are letters<br \/>\ns.isalnum() #all are letters or numbers<br \/>\ns.isdigit() #all are digits<br \/>\ns.istitle() #every word starts with an uppercase letter<br \/>\n#example<br \/>\nsorted([i for i in set(text1) if i.endswith('ableness')]) #same suffix<br \/>\nsorted([i for i in set(text4) if 'gnt' in i])<br \/>\nsorted([i for i in set(text6) if i.istitle()])<br \/>\nsorted([i for i in set(sent7) if i.isdigit()])<br \/>\n&#8230;&#8230;if '-' in i and 'index' in i #text7:
the various indices in the Wall Street Journal<br \/>\n&#8230;&#8230;if i.istitle() and len(i) &gt; 10 #in text3 most matches are names of people and places, which is interesting; a few others fit as sentence subjects<br \/>\n&#8230;&#8230;if not i.islower()<br \/>\n&#8230;&#8230;if 'cie' in i or 'cei' in i<\/p>\n<p>#operate on elements<br \/>\n[len(i) for i in text1]<br \/>\n[i.upper() for i in text1]<br \/>\nlen(set([i.lower() for i in text1 if i.isalpha()])) #removes duplicates such as This and this, and excludes numbers and punctuation<\/p>\n<p>babelize_shell() #translates back and forth with any chosen language 12 times (needs the largest package)<br \/>\nnltk.chat.chatbots() #interesting<\/p><\/blockquote>\n<p>http:\/\/languagelog.ldc.upenn.edu\/nll\/ \u00a0LanguageLog: a useful natural language processing blog<\/p>\n<p>Next tasks:<\/p>\n<p>1. The exercises in each chapter<\/p>\n<p>2. Read alongside the English edition<\/p>\n<p>3. Create something<\/p>\n","protected":false},"excerpt":{"rendered":"<p>#a small combined application a = FreqDist([len(i) for i in texti]) #first collect the length of every word, then count how often each length occurs, to find which word length is used most a.items() #key-value pairs a.max() a[i] #times of i a.freq(i) #proportion (relative frequency) a[sample] += 1 #add sample 
a.N()#to\u2026\u2026<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3],"tags":[],"class_list":["post-160","post","type-post","status-publish","format-standard","hentry","category-3"],"_links":{"self":[{"href":"https:\/\/blog.defjia.top\/index.php?rest_route=\/wp\/v2\/posts\/160","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.defjia.top\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.defjia.top\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.defjia.top\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.defjia.top\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=160"}],"version-history":[{"count":0,"href":"https:\/\/blog.defjia.top\/index.php?rest_route=\/wp\/v2\/posts\/160\/revisions"}],"wp:attachment":[{"href":"https:\/\/blog.defjia.top\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=160"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.defjia.top\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=160"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.defjia.top\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=160"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}