User:Pfctdayelise/Using Wikipedia as a resource for computational linguistics
The aim of this book is to outline some areas of research in computational linguistics or natural language processing where Wikipedia, and by extension the other Wikimedia projects, have the potential to be valuable resources. It is not intended to serve as an introduction to either of these fields and does not assume any knowledge of Wikipedia.
- 1 Description of Wikimedia projects
- 2 Description of the English Wikipedia
- 3 Possible tasks
Description of Wikimedia projects
Wiktionary, Wikinews, Commons, Wikibooks, Wikisource, Wikiquote. Languages. Meta & Commons: direct translations of help (etc.) pages. Who contributes? Growth.
Description of the English Wikipedia
- Accessible - dumps
- Coverage - biased toward pop-culture and geek topics (best coverage); WikiProjects
- Format - Manual of Style (MOS), but not reliably followed
- Featured articles (FAs), cleanup tags
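To make the "Accessible - dumps" point concrete, here is a minimal sketch of streaming page titles and wikitext out of a MediaWiki XML export. It assumes the usual `<page>` / `<title>` / `<revision>` / `<text>` layout; the export namespace version varies between dumps, so the sketch strips namespaces instead of hard-coding one. The names `iter_pages`, `local_name` and `SAMPLE` are made up for illustration, and the sample is a tiny hand-written stand-in, not real dump data.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace, turning '{uri}page' into 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, wikitext) pairs from a MediaWiki XML export stream.

    Uses iterparse so that a large dump is never held in memory at once.
    """
    title, text = None, ""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, ""
            elem.clear()  # release the finished page's subtree


# Hand-written stand-in for a dump file (assumption: illustrative only).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Example</title>
    <revision><text>Some wikitext.</text></revision>
  </page>
  <page>
    <title>Sample</title>
    <revision><text>#REDIRECT [[Example]]</text></revision>
  </page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE)))
for page_title, page_text in pages:
    print(page_title, "->", page_text)
```

For a real dump, the same function could be given an open (and, if needed, decompressed) file object instead of the in-memory sample.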
Possible tasks
Word sense disambiguation
Word and phrase translation
Web mining, data mining
Geospatial term disambiguation and named entity recognition
Image analysis (?)
Synonymy, abbreviations (via redirects, RDRs)
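As a rough illustration of the redirect idea, the sketch below groups redirect page titles under the article they point to, giving candidate synonym and abbreviation sets. It assumes the English `#REDIRECT [[Target]]` wikitext syntax only (other language editions also accept localized keywords), and the function names and sample pages are hypothetical. Redirects also cover misspellings and merely related terms, so the output would need filtering before being treated as true synonyms.

```python
import re
from collections import defaultdict

# Matches '#REDIRECT [[Target]]', ignoring case, and stops the target
# at a section anchor ('#') or a pipe ('|').
REDIRECT_RE = re.compile(r"^\s*#REDIRECT\s*:?\s*\[\[([^\]|#]+)", re.IGNORECASE)


def redirect_target(wikitext):
    """Return the redirect target title, or None for a non-redirect page."""
    match = REDIRECT_RE.match(wikitext)
    return match.group(1).strip() if match else None


def synonym_candidates(pages):
    """Map each target title to the set of titles that redirect to it."""
    groups = defaultdict(set)
    for title, text in pages:
        target = redirect_target(text)
        if target:
            groups[target].add(title)
    return dict(groups)


# Hypothetical (title, wikitext) pairs, for illustration only.
sample_pages = [
    ("USA", "#REDIRECT [[United States]]"),
    ("U.S.", "#redirect [[United States#History]]"),
    ("United States", "The United States is a country."),
]

candidates = synonym_candidates(sample_pages)
print(candidates)
```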
- http://wm.sieheauch.de/?p=48 papers
- Wiki Research Bibliography
- Michael Strube
- "Web corpus mining by instance of Wikipedia"
- Video mining