The National Highway Traffic Safety
Administration (NHTSA), through its Fatality Analysis
Reporting System (FARS), provided the data required to examine the relationship
between speed limits and accidents. NHTSA, part of the U.S. Department of
Transportation, started gathering accident data in 1975. The
data are gathered from all 50 states and cover fatal motor vehicle crashes. They are collected from numerous sources,
including police crash records, state highway department data, and state vital
statistics records. Using these sources, an analyst interprets and codes the data
directly onto an electronic data file (Transportation, 2017).
A limitation of this data is that the
source documents have the potential to be biased. When an accident report is
completed, speeding is marked as a factor in the record if it is suspected in
the accident. This judgment is likely to be subjective on the part of the person completing
the paperwork. Unless a speed detection device was present, the factor of
speeding relies solely on personal reporting or subjective analysis. Another
level of collection bias comes from the interpretation of speeding under
prevailing conditions. The appropriate speed for road conditions is a matter of
judgment, and speeding is likely to be overreported in the event of an accident.
It is easy to attribute an accident to speeding if the weather is poor. Under
such a scenario, speeding will be overreported and the data biased.
The data from NHTSA has the potential
for measurement error throughout its collection process, from the law enforcement
officer completing the accident report to the analyst updating the reporting
system. Mistakes are common and can accumulate across the state and nation. FARS
has automatic detection software that is used during data entry
to minimize the chance of input errors, but I suspect that
no comparable system monitors reporting at the law enforcement officer
level. Additionally, from state to state and county to county, accident reports
are likely completed to different standards. The information that gets reported,
and how often, is likely to differ from one unit to the next. These factors
lead to biases in the data parameters and need to be considered in final
To prepare the raw accident
data for regression analysis, the data are aggregated at the state level. The
numerous individual accident observations are combined into aggregate observations. This
dramatically reduces the number of observations, which limits the predictive
capability of the analysis and the reliability of the research conclusions.
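
The aggregation step can be sketched as follows. This is a minimal illustration, not the procedure actually used: the record fields, the sample rows, and the choice of one observation per state and year are all assumptions made for the example, and the field names are not FARS column names.

```python
from collections import defaultdict

# Hypothetical crash-level records (illustrative fields, not FARS columns).
crashes = [
    {"state": "OH", "year": 2015, "fatalities": 1, "speeding": True,  "intersection": False},
    {"state": "OH", "year": 2015, "fatalities": 2, "speeding": False, "intersection": True},
    {"state": "OH", "year": 2016, "fatalities": 1, "speeding": True,  "intersection": True},
    {"state": "TX", "year": 2015, "fatalities": 3, "speeding": True,  "intersection": False},
]


def aggregate_by_state_year(records):
    """Collapse crash-level rows into one observation per (state, year)."""
    panel = defaultdict(lambda: {"Fatalities": 0, "Speeding": 0, "Intersection": 0})
    for r in records:
        cell = panel[(r["state"], r["year"])]
        cell["Fatalities"] += r["fatalities"]            # persons killed
        cell["Speeding"] += int(r["speeding"])           # count of speeding-related crashes
        cell["Intersection"] += int(r["intersection"])   # count of intersection crashes
    return dict(panel)


panel = aggregate_by_state_year(crashes)
for key in sorted(panel):
    print(key, panel[key])
```

Four crash records collapse into three aggregate observations here, which shows on a small scale how aggregation shrinks the sample available to the regression.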
In order to minimize the data limitations, a fixed effects panel data regression is
chosen as the empirical method for analysis. As discussed above,
the data suffer from the aggregation of accidents into observations for this
analysis. However, this is outweighed by the benefits of the panel data
regression. Panel data offer some important advantages over cross-sectional
data alone. Their multi-dimensional structure simplifies statistical inference
in many cases. Using this
empirical method will minimize the bias inherent in specific state data. Panel
data give more informative data, more variability, and less collinearity among the
variables. An alternate method of analysis is time series analysis. However, time series research
is commonly beset with multicollinearity. This is less common in panel data across
states. Panel data also control for variables that cannot be observed or
measured and for variables that change over time but not across entities.
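
The last point can be made concrete by writing out a generic fixed effects specification. The notation is an assumption introduced for illustration and does not appear in the original analysis: $y_{it}$ is the outcome for state $i$ in year $t$, $x_{it}$ is a regressor, $\alpha_i$ is a state effect, and $\lambda_t$ is a year effect. Whether year effects were included in the regressions reported here is not stated.

```latex
\begin{align}
y_{it} &= \alpha_i + \lambda_t + \beta x_{it} + \varepsilon_{it} \\
% One-way case (no \lambda_t): subtracting each state's time average removes \alpha_i
y_{it} - \bar{y}_i &= \beta\,(x_{it} - \bar{x}_i) + (\varepsilon_{it} - \bar{\varepsilon}_i)
\end{align}
```

In this sketch $\alpha_i$ absorbs anything that is constant within a state, observed or not, and $\lambda_t$ absorbs anything that changes over time but is common to all states.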
With the benefits
and limitations of panel data in mind, the model formulated for this research
is as follows:
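$$\text{Fatalities} = \beta_0 + \beta_1(\text{Roadway}) + \beta_2(\text{Roadside}) + \beta_3(\text{Atmospheric}) + \beta_4(\text{Intersection}) + \beta_5(\text{Speeding})$$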
Fatalities – Total number of fatally injured persons in
Roadway – Total number of accidents occurring on
Roadside – Total number of accidents occurring on
Atmospheric – Total number of accidents involving
prevailing atmospheric conditions of Snow/Rain/Fog at the time of the crash
Intersection – Total number of accidents involving
Speeding – Total number of accidents where the driver’s speed was related to the crash as indicated
by law enforcement.
The results from Stata using the NHTSA data provide little useful
information, suggesting no relationship between the distance to an intersection
and the likelihood of a fatality. The analysis provided the following output:
$$\text{Fatalities} = 15.59 + 1.03(\text{Roadway}) + 0.72(\text{Roadside}) + 0.37(\text{Atmospheric}) - 0.17(\text{Intersection}) + 0.12(\text{Speeding})$$
The coefficients have little intuitive interpretation.
Roadway and Roadside are the only statistically significant variables in the
regression, and they provide no insight into accidents, speed limits, and fatalities:
some portion of accidents on the roadway will obviously result in fatalities.
The results of the initial regression provide no insight into the overall
hypothesis because neither Intersection nor Speeding is significant.
The evidence is inconclusive as to whether the relation to
intersections affects fatalities.
As a general assumption, the time-invariant component of the error term is
believed to be correlated with the independent
variables in the regression model. That is, the
assumption that the error term is uncorrelated with the independent variables is violated.
This violation would result in biased parameter estimates, which
are therefore not BLUE (best linear unbiased estimators). This was resolved using fixed effects GLS regression. The
intuition behind this estimation method is that the estimated coefficients on
the individual dummy variables provide an estimate for each individual (here, each state), thereby
providing a simple way of removing the individual component from the error term
and putting it directly into the regression model. Doing so increases the
efficiency of our estimates by exploiting the panel nature of the data to resolve
the unobserved component of the error term that does not change over time.
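
The effect of removing the individual component can be illustrated with a small within (demeaning) estimator, which yields the same slope as including a dummy variable for each individual. The sketch below is not the Stata routine used for the results; it assumes a single regressor and a made-up, noise-free panel with a true slope of 2, in which the state with the higher intercept also has higher values of x. The function names and data are hypothetical.

```python
from collections import defaultdict


def within_estimator(rows):
    """Fixed effects (within) slope for y = a_i + b*x + e with one regressor.

    rows: iterable of (entity, x, y). Each entity's own means are subtracted
    from its x and y values, which removes the entity intercept a_i.
    """
    groups = defaultdict(list)
    for entity, x, y in rows:
        groups[entity].append((x, y))
    sxy = sxx = 0.0
    for obs in groups.values():
        x_bar = sum(x for x, _ in obs) / len(obs)
        y_bar = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            sxy += (x - x_bar) * (y - y_bar)
            sxx += (x - x_bar) ** 2
    return sxy / sxx


def pooled_ols_slope(rows):
    """Ordinary least squares slope that ignores the entity structure."""
    xs = [x for _, x, _ in rows]
    ys = [y for _, _, y in rows]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical panel: y = a_i + 2*x, with a_A = 10 and a_B = 100.
rows = [("A", 1, 12), ("A", 2, 14), ("A", 3, 16),
        ("B", 11, 122), ("B", 12, 124), ("B", 13, 126)]

fe_slope = within_estimator(rows)
pooled_slope = pooled_ols_slope(rows)
print("within estimate:", fe_slope)
print("pooled estimate:", pooled_slope)
```

In this constructed example the within estimate recovers the slope of 2, while the pooled estimate is far larger because it attributes the difference in state intercepts to x.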
Although panel data regressions
limit the impact of multicollinearity, a review of the collinearity among the
variables showed that Roadway and Intersection are correlated
at the 0.9755 level. In hindsight, it seems
obvious that the two variables are highly related and likely redundant, creating
an over-specified equation. Perfect multicollinearity violates one of the OLS
assumptions, and near-perfect multicollinearity, while not a strict violation,
inflates the standard errors. The greater the multicollinearity, the greater the standard
errors. When high multicollinearity is present, confidence intervals for
coefficients tend to be very wide.
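
The size of this effect at a correlation of 0.9755 can be gauged with the variance inflation factor (VIF). The sketch below assumes the simplest case, in which a regressor is collinear with only one other variable, so that VIF = 1 / (1 - r^2); with additional regressors in the model the VIF is at least this large. The function name is illustrative.

```python
def vif_two_regressors(r):
    """Variance inflation factor for a regressor whose only collinearity
    is a pairwise correlation r with one other regressor."""
    return 1.0 / (1.0 - r ** 2)


r = 0.9755  # Roadway-Intersection correlation reported in the text
vif = vif_two_regressors(r)

print("variance inflation:", round(vif, 2))
print("standard error inflation:", round(vif ** 0.5, 2))
```

Under this assumption the coefficient variance is inflated roughly twentyfold, well above the value of 10 that is often used as a rule-of-thumb threshold for problematic collinearity.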
After testing for multicollinearity
and maintaining the fixed effects regression, an updated regression was produced:
$$\text{Fatalities} = \beta_0 + \beta_1(\text{Atmospheric}) + \beta_2(\text{Intersection}) + \beta_3(\text{Speeding})$$
The results provided the following coefficients:
30.35 + 1.29(Atmospheric) + 1.34(Intersection)
The coefficients of Intersection and Speeding were significant at the 5% and 10% levels,
respectively. The secondary results suggest that accidents involving
intersections result in greater fatalities. Removing the variable Roadway
changed the coefficient of Intersection and made it statistically significant
in the secondary regression. The preliminary results provided no insight into
the role of intersections in fatalities, whereas the secondary results
suggest that they are an important factor in statewide fatalities.