R Programming/Introduction

From Wikibooks, open books for an open world

What is R?

R is statistical software used for data analysis. It includes a large number of statistical procedures, such as t-tests, chi-square tests, standard linear models, instrumental variables estimation, and local polynomial regressions. It also provides high-level graphics capabilities. R has a few minor syntactic similarities with the C programming language, but the two work differently: R is an interpreted language, whereas C is typically compiled.
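
As a minimal sketch of the procedures listed above, the following runs a t-test and fits a standard linear model. The built-in mtcars dataset and the variables used are assumptions chosen only for illustration.

```r
# Built-in dataset shipped with R (chosen here only for illustration)
data(mtcars)

# Two-sample t-test: fuel consumption (mpg) by transmission type (am)
tt <- t.test(mpg ~ am, data = mtcars)

# Standard linear model: mpg explained by car weight (wt)
fit <- lm(mpg ~ wt, data = mtcars)

print(tt$p.value)   # p-value of the t-test
print(coef(fit))    # intercept and slope of the fitted line
```

Both calls return objects that can be stored, printed, and inspected further, for example with summary(fit).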

Why use R?

  • R is free software. It is an official GNU project and is distributed under the GNU General Public License (GPL).
  • R is a powerful data-analysis package with many standard and cutting-edge statistical functions. See the Comprehensive R Archive Network (CRAN)'s Task Views to get an idea of what you can do with R.
  • R is a programming language, so its abilities can easily be extended through the use of user-defined functions. A large collection of user-contributed functions and packages can be found in CRAN's Contributed Packages.
  • R is widely used in political science, statistics, econometrics, actuarial sciences, sociology, finance, etc.
  • R is available for all major operating systems (Windows, macOS, GNU/Linux).
  • R is object-oriented. Virtually anything (e.g., complex data structures) can be stored as an R object.
  • R is a matrix language: vectors and matrices are basic data types, and many operations apply to them directly.
  • R syntax is much more systematic than Stata or SAS syntax.
  • R can be installed on your USB stick[1].
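
Some of the points above can be sketched in a few lines: a user-defined function, an object holding a composite structure, and matrix operations. The function name standardize and the sample values are hypothetical and serve only as an illustration.

```r
# A user-defined function (the name `standardize` is hypothetical)
standardize <- function(x) {
  (x - mean(x)) / sd(x)
}

z <- standardize(c(2, 4, 6, 8))

# Composite structures can be stored as a single object, here a list
result <- list(values = z, n = length(z))

# Matrix operations work on whole matrices at once
m <- matrix(1:4, nrow = 2)   # filled column by column
m %*% m                      # matrix product
t(m)                         # transpose
```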

Alternatives to R

  • S-PLUS is a commercial implementation of the S programming language, of which R is a free implementation.
  • Gretl is free software for econometrics. It has a graphical user interface and is nice for beginners.
  • SPSS is proprietary software which is often used in sociology, psychology and marketing. It is known to be easy to use.
  • GNU PSPP is a free-software alternative to SPSS.
  • SAS is proprietary software that can be used with very large datasets such as census data.
  • Stata is proprietary software that is often used in economics and epidemiology.
  • MATLAB is proprietary software used widely in the mathematical sciences and engineering.
  • Octave is free software similar to MATLAB. The syntax is largely the same, and most MATLAB code can be run in Octave.
  • Python is a general-purpose programming language. Libraries for data analysis, such as pandas, are available for it[2][3].

Beginners can have a look at GNU PSPP or Gretl. Intermediate users can check out Stata. Advanced users who like matrix programming may prefer MATLAB or Octave. Very advanced users may use C or Fortran.

R programming style

  • R is an object-oriented programming language. This means that virtually everything can be stored as an R object. Each object has a class, which describes what the object contains and how each function treats it. For instance, plot(x) produces different output depending on whether x is a regression object or a vector.
  • The assignment symbol is "<-". Alternatively, the classical "=" symbol can be used.

The following two statements are equivalent:

 > a <- 2
 > a = 2
  • Arguments are passed to functions inside round brackets (parentheses).
  • One can easily combine functions: the value returned by one function can be passed directly as an argument to another.
  • The symbol "#" starts a comment that runs to the end of the line:
 # This is a comment
 5 + 7 # This is also a comment
  • Commands are normally separated by a newline. If you want to put more than one statement on a line, you can use the ";" delimiter.
 a <- 1:10 ; mean(a)
  • You can also have one statement on multiple lines.
  • R is case sensitive: a and A are two different objects.
  • Traditionally, underscores "_" are not used in names; dots "." are often used instead. An object name cannot begin with an underscore.
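
Several of the points above can be illustrated with a short sketch. The data values and object names below are made up for the example.

```r
# summary(), like plot() and print(), is generic: the result depends on the class
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
fit <- lm(y ~ x)

class(x)       # "numeric"
class(fit)     # "lm"
summary(x)     # quartiles and mean of a numeric vector
summary(fit)   # regression table for a fitted model

# Functions can be combined by nesting calls
round(mean(sqrt(x)), 2)

# One statement can span several lines, as long as each line is
# syntactically incomplete (here, the parenthesis is still open)
total <- sum(1,
             2,
             3)

# Names are case sensitive: a and A are different objects
a <- 1
A <- 2
a == A   # FALSE
```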

How you can help

Here are some things editors do to keep this book internally consistent. If you have something to contribute, go ahead and make your contribution. Other editors can touch up your edits afterwards so that they conform to the guidelines.

The local manual of style WB:LMOS for the R programming book, including a brief explanation of why we do it that way, is:

  • Examples use "source" tags: <source lang="rsplus"> a <- 1:10 ; mean(a) </source>. That makes them look pretty to our readers.
  • The names of packages are in bold: '''Hmisc'''.
  • The names of functions are in "code" tags: <code>lm()</code>.
  • Page titles -- the part after "R Programming/" -- are in sentence case, like "R Programming/Working with data frames". We couldn't decide between sentence case and title case, so I flipped a coin.
  • Every page has <noinclude>{{R Programming/Navigation}}</noinclude> at the top and {{R Programming/Navbar|Mathematics|Probability Distributions}} at the bottom. That makes it easier to navigate from one page to another online.
