# Interrupted Time Series Analysis Using STATA*


Professor Nicholas Corsaro School of Criminal Justice University of Cincinnati

*Lecture Presented at the Justice Research Statistics Association (JRSA) Conference, Denver, CO.


Introduction: As a starting framework, let’s consider our understanding of research designs. We want to test whether (and to what extent) some social mechanism (i.e., intervention or policy) influences an outcome of interest.

A highly important criminal justice (CJ) example is the question, “Does police officer presence influence crime and disorder in hotspots?” The Minneapolis and Kansas City patrol (replication) experimental designs showed us that police officer presence does indeed correspond with a reduction in crime and disorder. The researchers used an experimental design to assess police program impact.

Experimental Designs

The value of true experiments is that they are the best design available for us to test:

1) Whether there is an empirical association between an independent and dependent variable,

2) Whether time-order is established in that the change in the independent variable(s) occurred before change in the dependent variable(s),

3) Whether or not there was some extraneous variable that influenced the relationship between the independent and dependent variable.

These designs also help isolate the mechanism of change and control for contextual influences.

However, sometimes the studies we conduct do not allow for the creation of treatment and control groups with randomization. There can be practical and ethical barriers to randomization (e.g., large-scale interventions, sex offender programs, etc.).

When this is the case, we typically rely on the most rigorous quasi-experimental design available to us to control for these same threats to validity.


Quasi-Experimental Designs (see Cook and Campbell (1979) for a more systematic review)1:

1. Uncontrolled before-and-after designs – measure changes in an outcome before/after the introduction of a treatment, and thus any change is presumed to be due to the intervention (t-test).

2. Controlled before-and-after designs – a control population is identified (either before the intervention or using an ex post facto design), and a between-group difference estimator is used to assess intervention effect (e.g., difference-in-difference estimate).

3. Time series designs – attempt to assess whether an intervention had an effect significantly greater than the underlying trend. The pre-intervention period serves as the control.
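The between-group difference estimator mentioned in item 2 can be written out as a short formula. This is the standard two-group, two-period difference-in-difference form; the symbols are notation introduced here for illustration and are not taken from the lecture. Here $\bar{Y}$ is the mean outcome, subscripts $T$ and $C$ denote the treatment and control populations, and "pre" and "post" denote the periods before and after the intervention.

$$
\hat{\delta}_{DiD} = \left(\bar{Y}_{T,\text{post}} - \bar{Y}_{T,\text{pre}}\right) - \left(\bar{Y}_{C,\text{post}} - \bar{Y}_{C,\text{pre}}\right)
$$

The second term removes the change that the control population experienced over the same period, which is exactly what an uncontrolled before-and-after design (item 1) cannot do.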

So, when deciding to use an interrupted time series design, we essentially have a before and after design without a control group. In real life, there may be a large-scale program or intervention where no suitable comparison group can be identified (e.g., the financial Troubled Asset Relief Program (TARP) bailouts in 2008 and 2009 that occurred nationwide).

To use a criminal justice example, the Cincinnati Initiative to Reduce Violence (CIRV) was a focused deterrence police-led intervention that took place across the entire city of Cincinnati (rather than within a specific neighborhood, as was the case with the ‘hotspots’ policing interventions mentioned above).

The absence of a control group might lead researchers to simply conduct an ‘uncontrolled before/after design’ – but what is the problem here?

With this type of design there are several threats to internal validity such as history, regression to the mean, contamination, external event effects, etc.

Certainly using either controlled before/after designs or time series designs does not eliminate these threats to internal validity – but they can minimize their potential influence, provided certain theoretical, empirical, and statistical assumptions are met.

1 Cook, T.D. & Campbell, D.T. (1979). Quasi-Experimentation: Design and Analysis for Field Settings. Boston, MA: Houghton-Mifflin.


Time Series Designs Explained

Time series designs attempt to detect whether an intervention has a significant effect on an outcome above and beyond some underlying trend. Thus, time series designs increase the confidence with which the estimated effect can be attributed to an intervention (though, as noted earlier, they do not control for extraneous influences that occur uniquely between the pre- and post-intervention periods and go unaccounted for or unmeasured).

In the time series design, data are collected at multiple time points (a standard ‘rule of thumb’ is roughly 40-60 observations, evenly split between the pre- and post-intervention periods, though in practice having more pre-intervention measures is typically most important).

We need a sufficient number of observations in order to obtain a stable estimate of the underlying trend. If the post-intervention estimate (or slope) falls outside of the confidence interval of what the expected outcome would have been absent the intervention (i.e., the underlying trend), we are more confident in concluding that the intervention had an impact on the outcome and that this change in the outcome is not likely due to chance.
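One common way to make the comparison against the underlying trend concrete is a segmented regression. The form below is a widely used textbook formulation and is offered only as a sketch; it is not necessarily the model used in this lecture, and all symbols are notation assumed here. $Y_t$ is the outcome at time $t$, $T_t$ is a time counter, $T_0$ is the onset period, and $D_t$ equals 1 when $t \ge T_0$ and 0 otherwise.

$$
Y_t = \beta_0 + \beta_1 T_t + \beta_2 D_t + \beta_3 (T_t - T_0) D_t + \varepsilon_t
$$

Under this setup, $\beta_1$ is the pre-intervention (underlying) trend, $\beta_2$ is the abrupt change in level at onset, and $\beta_3$ is the change in slope after onset. With time series data the errors $\varepsilon_t$ are usually serially correlated, which is why ordinary regression standard errors are not trusted without further modeling.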

Figure 1: Visual Display of Time Series Design


Interrupted time series (e.g., Figure 1) is a special case of the time series design.

The following is typically required of this design:

A) The treatment/intervention must occur at a specific point in time

B) The series (outcome) is expected to change immediately and abruptly as a result of the intervention (though alternative functional forms can be fit)

C) We have a clear pre-intervention functional form

D) We have many pre-intervention observations

E) No alternative (unmeasured factor) causes the change in the outcome

What are some real world constraints to interrupted time series?

A) Long spans of data are not always available

B) Measurement can change (e.g., domestic violence, gang or terrorism related incidents)

C) Implementation of an intervention can span several periods (no unique onset)

D) Instantaneous and abrupt effects are not always observed (alternative models are complicated)

E) Effect sizes are typically small

By understanding these assumptions and potential pitfalls, we have a solid foundation to move into actually modeling time series data.

In this class, we are going to cover two time series approaches using STATA software.

1 – Autoregressive Integrated Moving Average (ARIMA) Time Series Analysis

2 – Maximum Likelihood Time Series Analysis (Poisson and Negative Binomial Regression)

Each of these approaches has strengths and limitations – based on assumptions of the models.
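As a rough preview, the two approaches map onto the following Stata commands. This is a minimal sketch, not the lecture's actual specifications: the variable names (lninjuries, injuries, interven, gas) come from the Cincinnati crash data used in these notes, interven is assumed to be a 0/1 intervention dummy, and the AR(1) error structure is purely illustrative.

```stata
* Assumes the data have already been declared a monthly series with tsset.
* The AR(1) term is an illustrative choice, not a recommended specification.

* 1) ARIMA-type model: logged outcome, intervention dummy, fuel price control
arima lninjuries interven gas, ar(1)

* 2) Count models fit by maximum likelihood on the raw monthly counts
poisson injuries interven gas
nbreg injuries interven gas
```

The ARIMA model treats the (logged) outcome as continuous and models serial correlation directly, while the Poisson and negative binomial models respect the count nature of the outcome.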

But, before we go into detail for these models, let’s review how to open, operate and designate longitudinal data in STATA.


Open the “Cincinnati Only” SPSS data (to visually see the variables in ASCII format)

These data were collected as part of a citywide police initiative designed to reduce vehicle crashes. The onset period for the intervention was September 2006 (see Gerard et al., 2012 – Police Chief).

Site = 1 (1 = Cincinnati).

Injuries, Lninjuries (Log number of monthly injuries), and fatals are all monthly vehicle crash counts (outcome variables).

Gas = fuel price average per month in Cincinnati (control variable to adjust for potential exposure).

String_date = month-year (as is the case in Excel and SPSS). Stata will always read this type of measure as a string.

Monthly Dummy variables.

I created MONTH and YEAR variables (Jan = 1, Feb = 2, etc. for all months, and YEAR = actual year). These numeric measures allow for the creation of a monthly time variable.

In order to designate the data as a MONTHLY TIME SERIES in STATA, it’s easiest to CREATE a DATE variable in STATA from numeric variables.


OPEN THE STATA FILE “Cincinnati Only”

Type the following:

gen date = ym(year, month) [Generates new variable called “date”]

format date %tm [Formats the new date measure as a time variable]

list date [lists the dates you created – just as a check]

tsset date, monthly [designates time series, date variable in monthly format]

*Delta = the gap between measures (1 month)*

Note: At the beginning of every time series analysis (i.e., every time you open a new time series file)– be sure to run the tsset command first.
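Once the monthly date variable exists, an intervention indicator can be built from it. The sketch below assumes that September 2006 itself counts as the first post-intervention month; the names post and months_since are hypothetical and are used only so that nothing already in the file is overwritten.

```stata
* Hypothetical variable names: post and months_since are illustrative only.
* Assumption: the post-intervention period starts in September 2006.

* 0/1 indicator for months at or after the onset
gen post = (date >= ym(2006, 9)) if !missing(date)

* Months elapsed since onset (0 throughout the pre-intervention period)
gen months_since = cond(post, date - ym(2006, 9), 0)

* Check how many observations fall on each side of the onset
tabulate post

* Typing tsset with no arguments redisplays the current time series settings
tsset
```

The tabulate output is a quick way to confirm that there are enough pre-intervention observations before fitting any model.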

Also, the lninjuries variable was already created. You can create your own logged variable in Stata (just as a guide for the future).

gen loginjuries = log(injuries)
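One caution when logging counts: Stata's log() is the natural log and returns missing for zero, so any month with zero injuries would drop out of a model using the logged outcome. A minimal check, assuming loginjuries was created as above (loginjuries1 is a hypothetical name for an add-one alternative):

```stata
* Count months with zero injuries; log() of these is missing (.)
count if injuries == 0

* Hypothetical alternative: add 1 before logging so zero months are kept
gen loginjuries1 = log(injuries + 1)

* Compare the supplied logged variable with the one created by hand
compare lninjuries loginjuries
```

Whether an add-one adjustment is appropriate depends on the data; for series with many zeros, the count models are usually the more natural choice.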


ALWAYS GET TO KNOW YOUR DATA FIRST!!

summarize injuries lninjuries loginjuries fatals gas interven string_date month year date, detail
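Beyond summary statistics, plotting the series is a useful way to get to know time series data. The following is a sketch of some standard Stata commands, assuming the data have been tsset and that interven is a 0/1 intervention dummy:

```stata
* Plot the monthly injury series with a vertical line at the September 2006 onset
tsline injuries, tline(2006m9)

* Autocorrelation and partial autocorrelation plots for the logged series
ac lninjuries
pac lninjuries

* Pre/post means, standard deviations, and counts for the outcomes
tabstat injuries fatals, by(interven) statistics(mean sd n)
```

The line plot shows trend and seasonality at a glance, and the autocorrelation plots give a first indication of how much serial dependence a model will need to account for.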

Outcomes

It’s clear th
