Stata/Natural Language Processing
From Wikibooks, open books for an open world
Reading a text file
If every line is short (fewer than 244 characters, the maximum string length in older versions of Stata), one can use insheet. This command reads the text file into Stata's memory.
. insheet using toto.txt, clear
String functions
First have a look at the list of string functions already included in Stata.
. h string functions
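As a minimal sketch of a few of these functions, the commands below build a one-observation dataset and apply lower(), wordcount(), word(), strpos() and subinstr() to it. The variable names (text, lowered, nwords, second, position, replaced) and the sample sentence are made up for illustration.
. clear
. set obs 1
. generate str60 text = "Open books for an open world"
. * convert to lower case so that later comparisons ignore case
. generate str60 lowered = lower(text)
. * number of space-separated words in the string
. generate nwords = wordcount(text)
. * the second word of the string
. generate str20 second = word(text, 2)
. * position of the first occurrence of "open" (0 if absent)
. generate position = strpos(lowered, "open")
. * replace every occurrence of "open" with "free"
. generate str60 replaced = subinstr(lowered, "open", "free", .)
. list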
Regular Expressions
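Stata includes three functions for regular expressions: regexm() tests whether a string matches a pattern, regexr() replaces the first match with another string, and regexs() returns a subexpression captured by the previous regexm() call.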
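The following sketch shows how the three functions might be combined on a one-observation dataset. The variable names (text, matched, digits, swapped) and the sample string are hypothetical; regexs() is called in the same command as regexm() so that it refers to that match.
. clear
. set obs 1
. generate str20 text = "Stata 12"
. * 1 if the string contains a run of digits, 0 otherwise
. generate matched = regexm(text, "[0-9]+")
. * keep the digits captured by the parenthesised subexpression
. generate str5 digits = regexs(1) if regexm(text, "([0-9]+)")
. * replace the first run of digits with "13"
. generate str20 swapped = regexr(text, "[0-9]+", "13")
. list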
Wordscores
Ken Benoit, Michael Laver and Will Lowe have developed wordscores, a set of Stata commands that read text files, count the frequency of each word and compute an index of similarity between texts.
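To give an idea of what counting word frequencies involves, here is a rough sketch that uses only built-in Stata commands (split, reshape and contract). It is not the wordscores package and does not reproduce its syntax; the variable names (text, w, id, position, occurrences) and the sample sentence are assumptions made for the example.
. clear
. set obs 1
. generate str60 text = "open books for an open world"
. * break the sentence into one variable per word: w1, w2, ...
. split text, generate(w)
. * stack the words so that each observation holds a single word
. generate id = _n
. reshape long w, i(id) j(position)
. drop if w == ""
. * collapse to one observation per distinct word with its frequency
. contract w, freq(occurrences)
. list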