# R Programming/Tobit And Selection Models

## Tobit (type 1 Tobit)

In this section, we look at a simple tobit model, in which the outcome variable is censored: its actual value is observed only when it lies above (or below) a given threshold, and the threshold value is recorded otherwise.

• `tobit()` in the AER package. This is a wrapper for `survreg()` from the survival package.
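```
N <- 1000
u <- rnorm(N)
x <- -1 + rnorm(N)
ystar <- 1 + x + u          # latent outcome
y <- ystar * (ystar > 0)    # observed outcome, left-censored at 0
hist(y)

# OLS ignores the censoring
ols <- lm(y ~ x)
summary(ols)

# Plot a correlation matrix and a scatter plot matrix
library(GGally)
library(ggplot2)
library(ggfortify)
DATA <- data.frame(y = y, x = x)
ggcorr(DATA)
ggpairs(DATA)

# Diagnostic plots for a linear model of y on all other columns
M <- lm(y ~ ., data = DATA)
autoplot(M, label.size = 3)

# Tobit regression, left-censored at 0
library(AER)
tobit.fit <- tobit(y ~ x, left = 0, right = Inf, dist = "gaussian")
summary(tobit.fit)
```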
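Since `tobit()` wraps `survreg()`, the same model can be sketched directly with the survival package. The example below is a minimal illustration rather than the AER implementation: it regenerates the simulated data with an arbitrary seed (added here only for reproducibility) and encodes left-censoring at 0 with `Surv(..., type = "left")`, where the event indicator is `TRUE` for uncensored observations. The object name `tobit.survreg` is an illustrative choice.

```r
library(survival)

set.seed(42)                      # arbitrary seed, only for reproducibility
N <- 1000
u <- rnorm(N)
x <- -1 + rnorm(N)
ystar <- 1 + x + u                # latent outcome
y <- ystar * (ystar > 0)          # observed outcome, left-censored at 0

# Event indicator: TRUE when y is actually observed (not censored)
tobit.survreg <- survreg(Surv(y, y > 0, type = "left") ~ x,
                         dist = "gaussian")
summary(tobit.survreg)

# In this simulation the true intercept, slope, and error scale are all 1
coef(tobit.survreg)
tobit.survreg$scale
```

Comparing these estimates with the OLS coefficients shows how much ignoring the censoring distorts the slope.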

## Selection models (type 2 tobit or heckit)

In this section we look at an endogenous selection process. The outcome y is observed only if d is equal to one, where d is a binary variable that is correlated with the error term of y.

• `heckit()` and `selection()` in the sampleSelection package. The command is called `heckit()` in honor of James Heckman.
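```
N <- 1000
u <- rnorm(N)
v <- rnorm(N)
x <- -1 + rnorm(N)
z <- 1 + rnorm(N)               # enters the selection equation only
d <- (1 + x + z + u + v > 0)    # selection indicator; its error u + v is correlated with u
ystar <- 1 + x + u              # latent outcome
y <- ystar * (d == 1)           # outcome, set to 0 where it is not observed
hist(y)

# Naive OLS on all observations
ols <- lm(y ~ x)
summary(ols)

library(sampleSelection)
# Maximum likelihood estimation
heckit.ml <- heckit(selection = d ~ x + z, outcome = y ~ x, method = "ml")
summary(heckit.ml)

# Heckman's two-step estimation
heckit.2step <- heckit(selection = d ~ x + z, outcome = y ~ x, method = "2step")
summary(heckit.2step)
```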
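The two-step estimator can also be sketched by hand, which makes the correction explicit: a probit for selection, the inverse Mills ratio computed from its linear predictor, and an OLS regression on the selected sample with that ratio as an extra regressor. This is only an illustration in base R, not the sampleSelection implementation. The seed, the larger sample size, and the object names are assumptions introduced here, and the second-step standard errors reported by `lm()` are not corrected for the estimated regressor.

```r
set.seed(123)                     # arbitrary seed, only for reproducibility
N <- 5000
u <- rnorm(N)
v <- rnorm(N)
x <- -1 + rnorm(N)
z <- 1 + rnorm(N)
d <- as.integer(1 + x + z + u + v > 0)
y <- ifelse(d == 1, 1 + x + u, NA)   # outcome is missing when d is 0

# Step 1: probit model for the selection indicator
probit <- glm(d ~ x + z, family = binomial(link = "probit"))
index <- predict(probit, type = "link")   # estimated linear predictor

# Inverse Mills ratio evaluated at the linear predictor
imr <- dnorm(index) / pnorm(index)

# Step 2: OLS on the selected sample, adding the inverse Mills ratio
step2 <- lm(y ~ x + imr, subset = d == 1)
coef(step2)
```

The coefficient on `x` should be close to the true value of 1, and a clearly non-zero coefficient on `imr` signals that selection is correlated with the outcome error.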

## Multi-index selection models

In this section we look at endogenous selection processes in matching markets. Matching is concerned with who transacts with whom, and how: for example, which students attend which college. The outcome y is observed only for equilibrium student-college pairs (or matches). These matches are indicated by d equal to one, where d is a binary variable that is correlated with the error term of y.

• `stabit()` and `stabit2()` in the matchingMarkets package. The command is called `stabit()` in reference to the application in stable matching markets.

Simulate two-sided matching data for 20 markets (`m=20`) with 100 students per market (`nStudents=100`) and 20 colleges, each with a quota of 5 students (`nSlots=rep(5,20)`). The true parameters in the selection and outcome equations are all equal to 1.

```
library(matchingMarkets)
xdata <- stabsim2(m = 20, nStudents = 100, nSlots = rep(5, 20),
  colleges = "c1",
  students = "s1",
  outcome = ~ c1:s1 + eta + nu,
  selection = ~ -1 + c1:s1 + eta
)
```

Observe the bias from sorting between students and colleges.

```
lm1 <- lm(y ~ c1:s1, data = xdata$OUT)
summary(lm1)
```

Correct for sorting bias by running the Gibbs sampler in Sorensen (2007).

```
fit2 <- stabit2(OUT = xdata$OUT,
  colleges = "c1",
  students = "s1",
  outcome = y ~ c1:s1,
  selection = ~ -1 + c1:s1,
  niter = 1000
)
summary(fit2)
```

## Truncation

• truncreg package
• DTDA package ("An R package for analyzing truncated data").
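
As a minimal sketch of how truncation differs from censoring, the example below drops observations at or below 0 entirely, rather than recording them as 0, and fits a truncated regression with `truncreg()`. It assumes the truncreg package is installed; the simulated data, the seed, and the object names are illustrative choices, not taken from the package documentation.

```r
library(truncreg)

set.seed(1)                       # arbitrary seed, only for reproducibility
N <- 2000
x <- -1 + rnorm(N)
ystar <- 1 + x + rnorm(N)         # true intercept and slope are both 1

# Truncation: observations with ystar <= 0 are missing altogether
full <- data.frame(y = ystar, x = x)
trunc.data <- full[full$y > 0, ]

# OLS on the truncated sample, for comparison
ols.trunc <- lm(y ~ x, data = trunc.data)
coef(ols.trunc)

# Truncated Gaussian regression, left-truncated at 0
fit.trunc <- truncreg(y ~ x, data = trunc.data,
                      point = 0, direction = "left")
summary(fit.trunc)
coef(fit.trunc)
```

The truncated regression should recover a slope near 1, while OLS on the truncated sample does not account for the missing observations.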