Practical examples of NLTK use [closed]

I'm playing around with the Natural Language Toolkit (NLTK).

Its documentation (Book and HOWTO) are quite bulky and the examples are sometimes slightly advanced.

Are there any good but basic examples of uses/applications of NLTK? I'm thinking of things like the NTLK articles on the Stream Hacker blog.


Asked by: Adrian577 | Posted: 06-12-2021






Answer 1

Here's my own practical example for the benefit of anyone else looking this question up (excuse the sample text, it was the first thing I found on Wikipedia):

import nltk
import pprint

tokenizer = None
tagger = None

def init_nltk():
    global tokenizer
    global tagger
    tokenizer = nltk.tokenize.RegexpTokenizer(r'\w+|[^\w\s]+')
    tagger = nltk.UnigramTagger(nltk.corpus.brown.tagged_sents())

def tag(text):
    global tokenizer
    global tagger
    if not tokenizer:
        init_nltk()
    tokenized = tokenizer.tokenize(text)
    tagged = tagger.tag(tokenized)
    tagged.sort(lambda x,y:cmp(x[1],y[1]))
    return tagged

def main():
    text = """Mr Blobby is a fictional character who featured on Noel
    Edmonds' Saturday night entertainment show Noel's House Party,
    which was often a ratings winner in the 1990s. Mr Blobby also
    appeared on the Jamie Rose show of 1997. He was designed as an
    outrageously over the top parody of a one-dimensional, mute novelty
    character, which ironically made him distinctive, absurd and popular.
    He was a large pink humanoid, covered with yellow spots, sporting a
    permanent toothy grin and jiggling eyes. He communicated by saying
    the word "blobby" in an electronically-altered voice, expressing
    his moods through tone of voice and repetition.

    There was a Mrs. Blobby, seen briefly in the video, and sold as a
    doll.

    However Mr Blobby actually started out as part of the 'Gotcha'
    feature during the show's second series (originally called 'Gotcha
    Oscars' until the threat of legal action from the Academy of Motion
    Picture Arts and Sciences[citation needed]), in which celebrities
    were caught out in a Candid Camera style prank. Celebrities such as
    dancer Wayne Sleep and rugby union player Will Carling would be
    enticed to take part in a fictitious children's programme based around
    their profession. Mr Blobby would clumsily take part in the activity,
    knocking over the set, causing mayhem and saying "blobby blobby
    blobby", until finally when the prank was revealed, the Blobby
    costume would be opened - revealing Noel inside. This was all the more
    surprising for the "victim" as during rehearsals Blobby would be
    played by an actor wearing only the arms and legs of the costume and
    speaking in a normal manner.[citation needed]"""
    tagged = tag(text)    
    l = list(set(tagged))
    l.sort(lambda x,y:cmp(x[1],y[1]))
    pprint.pprint(l)

if __name__ == '__main__':
    main()

Output:

[('rugby', None),
 ('Oscars', None),
 ('1990s', None),
 ('",', None),
 ('Candid', None),
 ('"', None),
 ('blobby', None),
 ('Edmonds', None),
 ('Mr', None),
 ('outrageously', None),
 ('.[', None),
 ('toothy', None),
 ('Celebrities', None),
 ('Gotcha', None),
 (']),', None),
 ('Jamie', None),
 ('humanoid', None),
 ('Blobby', None),
 ('Carling', None),
 ('enticed', None),
 ('programme', None),
 ('1997', None),
 ('s', None),
 ("'", "'"),
 ('[', '('),
 ('(', '('),
 (']', ')'),
 (',', ','),
 ('.', '.'),
 ('all', 'ABN'),
 ('the', 'AT'),
 ('an', 'AT'),
 ('a', 'AT'),
 ('be', 'BE'),
 ('were', 'BED'),
 ('was', 'BEDZ'),
 ('is', 'BEZ'),
 ('and', 'CC'),
 ('one', 'CD'),
 ('until', 'CS'),
 ('as', 'CS'),
 ('This', 'DT'),
 ('There', 'EX'),
 ('of', 'IN'),
 ('inside', 'IN'),
 ('from', 'IN'),
 ('around', 'IN'),
 ('with', 'IN'),
 ('through', 'IN'),
 ('-', 'IN'),
 ('on', 'IN'),
 ('in', 'IN'),
 ('by', 'IN'),
 ('during', 'IN'),
 ('over', 'IN'),
 ('for', 'IN'),
 ('distinctive', 'JJ'),
 ('permanent', 'JJ'),
 ('mute', 'JJ'),
 ('popular', 'JJ'),
 ('such', 'JJ'),
 ('fictional', 'JJ'),
 ('yellow', 'JJ'),
 ('pink', 'JJ'),
 ('fictitious', 'JJ'),
 ('normal', 'JJ'),
 ('dimensional', 'JJ'),
 ('legal', 'JJ'),
 ('large', 'JJ'),
 ('surprising', 'JJ'),
 ('absurd', 'JJ'),
 ('Will', 'MD'),
 ('would', 'MD'),
 ('style', 'NN'),
 ('threat', 'NN'),
 ('novelty', 'NN'),
 ('union', 'NN'),
 ('prank', 'NN'),
 ('winner', 'NN'),
 ('parody', 'NN'),
 ('player', 'NN'),
 ('actor', 'NN'),
 ('character', 'NN'),
 ('victim', 'NN'),
 ('costume', 'NN'),
 ('action', 'NN'),
 ('activity', 'NN'),
 ('dancer', 'NN'),
 ('grin', 'NN'),
 ('doll', 'NN'),
 ('top', 'NN'),
 ('mayhem', 'NN'),
 ('citation', 'NN'),
 ('part', 'NN'),
 ('repetition', 'NN'),
 ('manner', 'NN'),
 ('tone', 'NN'),
 ('Picture', 'NN'),
 ('entertainment', 'NN'),
 ('night', 'NN'),
 ('series', 'NN'),
 ('voice', 'NN'),
 ('Mrs', 'NN'),
 ('video', 'NN'),
 ('Motion', 'NN'),
 ('profession', 'NN'),
 ('feature', 'NN'),
 ('word', 'NN'),
 ('Academy', 'NN-TL'),
 ('Camera', 'NN-TL'),
 ('Party', 'NN-TL'),
 ('House', 'NN-TL'),
 ('eyes', 'NNS'),
 ('spots', 'NNS'),
 ('rehearsals', 'NNS'),
 ('ratings', 'NNS'),
 ('arms', 'NNS'),
 ('celebrities', 'NNS'),
 ('children', 'NNS'),
 ('moods', 'NNS'),
 ('legs', 'NNS'),
 ('Sciences', 'NNS-TL'),
 ('Arts', 'NNS-TL'),
 ('Wayne', 'NP'),
 ('Rose', 'NP'),
 ('Noel', 'NP'),
 ('Saturday', 'NR'),
 ('second', 'OD'),
 ('his', 'PP$'),
 ('their', 'PP$'),
 ('him', 'PPO'),
 ('He', 'PPS'),
 ('more', 'QL'),
 ('However', 'RB'),
 ('actually', 'RB'),
 ('also', 'RB'),
 ('clumsily', 'RB'),
 ('originally', 'RB'),
 ('only', 'RB'),
 ('often', 'RB'),
 ('ironically', 'RB'),
 ('briefly', 'RB'),
 ('finally', 'RB'),
 ('electronically', 'RB-HL'),
 ('out', 'RP'),
 ('to', 'TO'),
 ('show', 'VB'),
 ('Sleep', 'VB'),
 ('take', 'VB'),
 ('opened', 'VBD'),
 ('played', 'VBD'),
 ('caught', 'VBD'),
 ('appeared', 'VBD'),
 ('revealed', 'VBD'),
 ('started', 'VBD'),
 ('saying', 'VBG'),
 ('causing', 'VBG'),
 ('expressing', 'VBG'),
 ('knocking', 'VBG'),
 ('wearing', 'VBG'),
 ('speaking', 'VBG'),
 ('sporting', 'VBG'),
 ('revealing', 'VBG'),
 ('jiggling', 'VBG'),
 ('sold', 'VBN'),
 ('called', 'VBN'),
 ('made', 'VBN'),
 ('altered', 'VBN'),
 ('based', 'VBN'),
 ('designed', 'VBN'),
 ('covered', 'VBN'),
 ('communicated', 'VBN'),
 ('needed', 'VBN'),
 ('seen', 'VBN'),
 ('set', 'VBN'),
 ('featured', 'VBN'),
 ('which', 'WDT'),
 ('who', 'WPS'),
 ('when', 'WRB')]

Answered by: Paul159 | Posted: 07-01-2022



Answer 2

NLP in general is very useful so you might want to broaden your search to general application of text analytics. I used NLTK to aid MOSS 2010 by generating file taxonomy by extracting concept maps. It worked really well. It doesn't take long before files start to cluster in useful ways.

Often times to understand text analytics you have to think in tangents to the ways you are used to thinking. For example, text analytics is extremely useful to discovery. Most people, though, don't even know what the difference is between search and discovery. If you read up on those subjects you will likely "discover" ways in which you might want to put NLTK to work.

Also, consider your world view of text files without NLTK. You have a bunch of random length strings separated by whitespace and punctuation. Some of the punctuation changes how it is used such as the period (which is also a decimal point and a postfix marker for an abbreviation.) With NLTK you get words and more to the point you get parts of speech. Now you have a handle on the content. Use NLTK to discover the concepts and actions in the document. Use NLTK to get at the "meaning" of the document. Meaning in this case refers to the essencial relationships in the document.

It is a good thing to be curious about NLTK. Text Analytics is set to breakout in a big way in the next few years. Those who understand it will be better suited to take advantage of the new opportunities better.

Answered by: Richard193 | Posted: 07-01-2022



Answer 3

I'm the author of streamhacker.com (and thanks for the mention, I get a fair amount of click traffic from this particular question). What specifically are you trying to do? NLTK has a lot of tools for doing various things, but is somewhat lacking clear information on what to use the tools for, and how best to use them. It's also oriented towards academic problems, and so it can be heavy going to translate the pedagogical examples to practical solutions.

Answered by: Kellan654 | Posted: 07-01-2022



Similar questions

python - Practical examples on redis and celery

I am new to redis and celery. I have gone through the basic tutorial of both, but I am not getting how to implement then in task scheduling job I am unable to start with the scripting part. I am not getting how to write a script to make a queue, run the workers etc. I would need a practical example


python - Practical examples on redis and celery

I am new to redis and celery. I have gone through the basic tutorial of both, but I am not getting how to implement then in task scheduling job I am unable to start with the scripting part. I am not getting how to write a script to make a queue, run the workers etc. I would need a practical example


python - Where to learn Pyramid by following practical examples?

Closed. This question does not meet Stack Overflow guid...


java - Practical GUI toolkit?

I am thinking about cross-platform with nice programming language bindings (Java, Ruby and Python). What would be the "flattest" learning curve but yet enough powers to perform most of the standard GUI features? What would you guys/gals recommend; FOX, wx,


Practical point of view: Why would I want to use Python with C++?

I've been seeing some examples of Python being used with c++, and I'm trying to understand why would someone want to do it. What are the benefits of calling C++ code from an external language such as Python? I'd appreciate a simple example - Boost::Python will do


Why is it not possible to create a practical Perl to Python source code converter?

It would be nice if there existed a program that automatically transforms Perl code to Python code, making the resultant Python program as readable and maintainable as the original one, let alone working the same way. The most obvious solution would just invoke perl via Python utils: #!/usr/bin/python os.exec("tail -n -2 "+__file__+" | perl -") ...the rest of file is the original perl p...


Python __call__ special method practical example

I know that __call__ method in a class is triggered when the instance of a class is called. However, I have no idea when I can use this special method, because one can simply create a new method and perform the same operation done in __call__ method and instead of calling the instance, you can call the method. I would really appreciate it if someone gives me a practical usage of this speci...


python - Practical Django Projects - Pages 71 and 80

I am reading the book "Practical Django Projects". It is a nice book. I have a few questions though : On page 71, there is the following code : from django.conf.urls.defaults import * from django.contrib import admin admin.autodiscover() from coltrane.models import Entry entry_info_dict = { 'queryset': Entry.objects.all(), 'date_field': 'pub_date', } ... However no v...


python - Practical Django Projects - Page 168

On page 168, there are two pieces of code : def clean_password2(self): if self.cleaned_data['password1'] != self.cleaned_data['password2']: raise forms.ValidationError("You must type the same password each time") return self.cleaned_data['password2'] def clean(self): if 'password1' in self.cleaned_data and 'password2' in self.cleaned_data: if self.cleaned_data['password1'] != self.clea...


Is it practical to put Python under version control?

Where practical, I like to have tools required for a build under version control. The ideal is that a fresh checkout will run on any machine with a minimal set of tools required to be installed first. Is it practical to have Python under version control? How about python packages? My naive attempt to use Sphinx without installing it fails due to a dependency on Docutils. Is there a way to do use it without ...


debian - Will it be practical to implement deb preinst, postint, etc. scripts in Python, not in sh

I'm interested in what pitfalls can be (except Python is not installed in target system) when using Python for deb package flow control scripts (preinst, postinst, etc.). Will it be practical to implement those scripts in Python, not in sh? As I understand it's at least possible.


python - Practical examples on redis and celery

I am new to redis and celery. I have gone through the basic tutorial of both, but I am not getting how to implement then in task scheduling job I am unable to start with the scripting part. I am not getting how to write a script to make a queue, run the workers etc. I would need a practical example


python - What is the simple and practical example for using formatter module?

I found a formatter module in Python 3.4 (Lib/formatter.py). I can import it. import formatter Then what? What is the purpose of this module? There is no unit test for this module so I can not find any examples. The documentation is very cryptic. """Generic output formatting. Formatter objects transform an abstract flow of formatting events into specific output events on w...






Still can't find your answer? Check out these communities...



PySlackers | Full Stack Python | NHS Python | Pythonist Cafe | Hacker Earth | Discord Python



top