Stata/Natural Language Processing


Reading a text file

If lines are short (fewer than 244 characters, the maximum length of a string variable in Stata 12 and earlier), one can use insheet. This command reads a tab- or comma-delimited text file into Stata's memory.

. insheet using toto.txt, clear
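
When each line should be kept whole rather than split at tabs or commas, one possible approach is to tell insheet to split on a character that never occurs in the file. The sketch below assumes that the pipe character "|" is absent from toto.txt and that the file has no header row; with the nonames option Stata calls the resulting variable v1.

. insheet using toto.txt, clear nonames delimiter("|")
. describe
. list v1 in 1

Under those assumptions, each observation holds one line of the file as a string.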

String functions

First have a look at the list of string functions already included in Stata.

. h string functions
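
As a quick illustration, a few of these functions can be tried directly with display, without loading any data. The strings below are made up for the example; the functions are lower(), length(), wordcount(), word(), subinstr() and strpos().

. display lower("Hello World")
. display length("Stata")
. display wordcount("open books for an open world")
. display word("open books for an open world", 2)
. display subinstr("a-b-c", "-", " ", .)
. display strpos("wordscores", "score")

The same functions can be used with generate or replace to transform a string variable observation by observation.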

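Regular Expressions

Stata includes three functions for regular expressions: regexm() tests whether a string matches a pattern, regexr() replaces the first match with another string, and regexs() returns a subexpression from the most recent regexm() match.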
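
A minimal sketch of how the three functions fit together follows. The variable names txt and num and the sample string are invented for the illustration. Note that regexs() is evaluated together with the regexm() call in the if qualifier, which is the usual idiom.

. display regexm("toto.txt", "^toto")
. display regexr("colour and colour", "colour", "color")
. clear
. set obs 1
. generate str20 txt = "Page 42 of 100"
. generate str5 num = regexs(1) if regexm(txt, "([0-9]+)")
. list

Here regexs(1) picks out the first parenthesized group of the match, so num should contain the first run of digits found in txt.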

Wordscores

Ken Benoit, Michael Laver and Will Lowe have developed wordscores, a set of Stata commands that read text files, count the frequency of each word, and compute an index of similarity between texts.
