R Programming/Introduction


What is R?

R is a programming language and software environment for statistical computing and data analysis. Together with its contributed packages, it offers a large number of statistical procedures, such as t-tests, chi-square tests, standard linear models, instrumental variables estimation, and local polynomial regressions. It also provides high-level graphics capabilities. R's syntax has a few minor similarities to that of C, but the two languages work in different ways: R is interpreted and is typically used interactively, whereas C is compiled.
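
As a brief illustration of such procedures, the following sketch runs a t-test and fits a linear model on mtcars, a data set that ships with R. The variable names t_res and fit are chosen here for illustration only.

```r
# mtcars is a built-in data set: fuel consumption and design of 32 cars.

# Two-sample t-test: does mpg differ between automatic (am = 0)
# and manual (am = 1) transmissions?
t_res <- t.test(mpg ~ am, data = mtcars)
print(t_res)

# Standard linear model: mpg explained by car weight (wt).
fit <- lm(mpg ~ wt, data = mtcars)
print(coef(fit))   # named vector with "(Intercept)" and "wt"
summary(fit)       # coefficient table, R-squared, etc.
```

Both results are ordinary R objects, so they can be stored, printed, or passed on to other functions.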

Why use R?

  • R is free software. R is an official GNU project and is distributed under the GNU General Public License (GPL).
  • R is a powerful data-analysis package with many standard and cutting-edge statistical functions. See the Comprehensive R Archive Network (CRAN)'s Task Views to get an idea of what you can do with R.
  • R is a programming language, so its abilities can easily be extended through the use of user-defined functions. A large collection of user-contributed functions and packages can be found in CRAN's Contributed Packages.
  • R is widely used in political science, statistics, econometrics, actuarial sciences, sociology, finance, etc.
  • R is available for all major operating systems (Windows, macOS, GNU/Linux).
  • R is object-oriented. Virtually anything (e.g., complex data structures) can be stored as an R object.
  • R is a matrix language: vectors and matrices are basic data types, and most operations apply to them as a whole.
  • R syntax is much more systematic than Stata or SAS syntax.
  • R can be installed on, and run from, a USB stick[1].
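
A minimal sketch of two of the points above: extending R with a user-defined function, and working with matrices as whole objects. The function name standardize is hypothetical and chosen only for this example.

```r
# A user-defined function: centre a vector and scale it to unit
# standard deviation. (Hypothetical helper, not part of base R.)
standardize <- function(x) {
  (x - mean(x)) / sd(x)
}

z <- standardize(c(2, 4, 6))
print(z)           # -1 0 1

# Matrices are filled column by column by default.
m <- matrix(1:4, nrow = 2)

print(m * 2)       # element-wise multiplication
print(m %*% m)     # matrix product
print(t(m))        # transpose
```

Note that * works element by element, while %*% is the matrix product.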

Alternatives to R

  • S-PLUS is a commercial implementation of the S programming language, of which R is a free implementation.
  • Gretl is free software for econometrics. It has a graphical user interface and is nice for beginners.
  • SPSS is proprietary software which is often used in sociology, psychology and marketing. It is known to be easy to use.
  • GNU PSPP is a free-software alternative to SPSS.
  • SAS is proprietary software that can be used with very large datasets such as census data.
  • Stata is proprietary software that is often used in economics and epidemiology.
  • Julia is a general-purpose programming language with capabilities similar to MATLAB, R and Python and performance that can approach that of C. It can call libraries written in all of those languages.
  • MATLAB is proprietary software used widely in the mathematical sciences and engineering.
  • Octave is free software similar to MATLAB. Its syntax is largely compatible with MATLAB's, so much MATLAB code can be used in Octave.
  • Python is a general-purpose programming language. It has dedicated libraries for data analysis, such as pandas[2][3].

Beginners can have a look at GNU PSPP or Gretl. Intermediate users can check out Stata. Advanced users who like matrix programming may prefer MATLAB or Octave. Very advanced users may use C or Fortran.

R programming style

  • R is an object-oriented programming language. Virtually everything can be stored as an R object, and each object has a class. The class describes what the object contains and determines what generic functions do with it. For instance, plot(x) produces different output depending on whether x is a regression object or a vector.
  • The assignment symbol is "<-". Alternatively, the classical "=" symbol can be used.

The following two statements are equivalent:

 > a <- 2
 > a = 2
  • Arguments are passed to functions inside round brackets (parentheses).
  • Function calls can easily be nested. For instance, you can directly type
 mean(rnorm(1000)^2)
  • The symbol "#" starts a comment that runs to the end of the line:
 # This is a comment
 5 + 7 # This is also a comment
  • Commands are normally separated by a newline. If you want to put more than one statement on a line, you can use the ";" delimiter.
 a <- 1:10 ; mean(a)
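  • A single statement can also span multiple lines, as long as each line ends where the expression is still incomplete (for example, after an operator or inside open parentheses).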
  • R is case sensitive: a and A are two different objects.
  • Traditionally, underscores "_" are not used in names, and dots "." are often preferred. Underscores are permitted inside a name, but a name cannot begin with an underscore.
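  • You can also use the native pipe operator |> (available since R 4.1.0), which passes the value on its left as the first argument of the function call on its right:
 1:10 |> mean()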
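
The points about classes, case sensitivity and multi-line statements can be sketched as follows. The example uses summary() rather than plot() so that it produces no graphics, and relies on cars, a data set that ships with R. The object names are illustrative only.

```r
# Every object has a class.
x <- c(1.5, 2, 3.5)
print(class(x))      # "numeric"

fit <- lm(dist ~ speed, data = cars)
print(class(fit))    # "lm"

# summary() is a generic function: what it does depends on the
# class of its argument, just as plot() does.
s_vec <- summary(x)     # quantiles and mean of a numeric vector
s_fit <- summary(fit)   # regression summary, class "summary.lm"

# R is case sensitive: a and A are two different objects.
a <- 1
A <- 2
print(a == A)        # FALSE

# One statement spread over several lines: each line ends with an
# operator, so R knows the expression is not yet complete.
total <- 1 +
  2 +
  3
print(total)         # 6
```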

How you can help

Here are some things editors do to keep this book internally consistent. If you have something to contribute, go ahead and make your contribution. Other editors can touch up your edits afterwards so that they conform to the guidelines.

The local manual of style WB:LMOS for the R programming book, including a brief explanation of why we do it that way, is:

  • Examples use "syntaxhighlight" tags: <syntaxhighlight lang="rsplus"> a <- 1:10 ; mean(a) </syntaxhighlight>. That makes them look pretty to our readers.
  • Package names are in bold: '''Hmisc'''.
  • Function names are in "code" tags: <code>lm()</code>.
  • Page titles -- the part after "R Programming/" -- are in sentence case, like "R Programming/Working with data frames". We couldn't decide between sentence case and title case, so I flipped a coin.
  • Every page has <noinclude>{{R Programming/Navigation}}</noinclude> at the top and {{R Programming/Navbar|Mathematics|Probability Distributions}} at the bottom. That makes it easier to navigate from one page to another online.

References

  1. "Portable R" by Andrew Redd. http://sourceforge.net/projects/rportable/
  2. "Python Data Analysis Library". pandas.pydata.org/. Retrieved February 14, 2013.
  3. "Getting started with Pandas". blog.kaggle.com. January 17, 2013. Retrieved February 14, 2013.