When the web application is launched from Improving agricultural injury surveillance: comparing injuries captured Iowa’s Workers’ Compensation and trauma registry data, one is initially directed to a couple of user guides. The first guide, titled Web Application User Guide, is the current page; it provides a quick overview of the modeling framework and a detailed description of the inner workings of the application. Navigate to the Modeling Framework tab to see a more detailed description of the model construction and the calculation of the predicted probabilities. The web application itself can be accessed from the Web Application tab, which loads a Posit Shiny application that calculates the predicted probabilities of being in Iowa’s Workers’ Compensation database and Iowa’s Trauma Registry. The remainder of this guide details how the web application works. First, a quick overview of the model used to calculate the predicted probabilities is provided to support understanding of the application basics.
We developed a logistic regression model to probabilistically predict which data source captured an injured person, given their injury characteristics and demographics. The predictive model used explanatory variables found in both data sources: age, gender, injury cause, and injury nature, and modeled the event that an injured person’s information was recorded in the Workers’ Compensation data. For categorical predictors (gender, injury cause, and injury nature), we used effects coding instead of reference coding, so the average effect would be zero.
The Web Application tab launches the application that calculates the predicted probabilities of being in Iowa’s Workers’ Compensation database and Trauma Registry. The application has two panels. The left panel receives input from the user, while the right panel displays the results based on that input. Predicted probabilities are calculated from sex, age, cause of injury, and nature of injury. The starting value for sex, cause of injury, and nature of injury is Average of effect. This value is a by-product of effects coding in the modeling framework, which allows probabilities to be calculated while averaging over all possible values of a predictor. So, at any time, to obtain a predicted probability or a set of predicted probabilities under the average effect of a categorical variable, select the Average of effect option for that variable. Initially, a person’s age is excluded from the probability calculation.
The left panel of the application takes a select set of inputs from the user. Each categorical variable (sex, cause of injury, and nature of injury) has a drop-down menu listing all of its possible values. To explore the user inputs, consider the following example. Suppose one is interested in the predicted probability that a male who fell and fractured his arm is in the workers’ compensation database. Then, one only needs to select Male in the Sex drop-down menu, Falls in the Cause of Injury drop-down menu, and Fractures/dislocation/sprains/strains in the Nature of Injury drop-down menu.
Age is the only quantitative variable in the model and application. Age is specified with two age sliders titled Minimum and Maximum Age and a checkbox denoting whether age should be included in the probability calculation. If age is of interest, there are several points to consider. First, the Include age in calculation of predicted probabilities checkbox must be selected. Second, the minimum and maximum ages must be set to values within the range of possible ages. By default, both are set to 45, the mean and median age of the combined data sources. If the two age sliders are set to different values and the Calculate multiple predicted probabilities checkbox is not selected, the age used is the average of the selected minimum and maximum ages. To obtain predicted probabilities over a range of ages, the Calculate multiple predicted probabilities checkbox must be selected, and the minimum and maximum ages must differ. If the minimum age exceeds the maximum, the application swaps the two values in the background. Combining the above steps for specifying injury characteristics and demographics yields predicted probabilities specific to the characteristics of interest.
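The age rules described above can be summarized in a short R sketch. The function `resolve_ages()` and its argument names are hypothetical and are not taken from the application's source code; the one-year step between ages in a range is also an assumption.

```r
# Hypothetical helper mirroring the age rules described in this guide.
# It is NOT the application's actual code.
resolve_ages <- function(min_age, max_age,
                         include_age = FALSE, multiple = FALSE) {
  # Age is ignored unless the "include age" checkbox is selected.
  if (!include_age) {
    return(NULL)
  }
  # Swap the slider values if the minimum exceeds the maximum.
  if (min_age > max_age) {
    tmp <- min_age
    min_age <- max_age
    max_age <- tmp
  }
  # A range of ages is used only when multiple probabilities are requested
  # and the two sliders differ (one-year step assumed here).
  if (multiple && min_age != max_age) {
    return(seq(min_age, max_age, by = 1))
  }
  # Otherwise a single age is used: the average of the two slider values.
  (min_age + max_age) / 2
}

resolve_ages(45, 45, include_age = TRUE)                   # default sliders
resolve_ages(60, 30, include_age = TRUE)                   # averaged age
resolve_ages(60, 30, include_age = TRUE, multiple = TRUE)  # range of ages
```

With the default sliders (both at 45) the single age used is 45, regardless of whether multiple probabilities are requested.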
Regardless of the injury characteristics and demographics specified
in the user input panel, the results section (right panel) will have the
same general setup. There will be columns denoting the data source of
the injury case (Source), sex (Sex), cause of
injury (Cause), nature of injury (Nature), age
(Age), predicted probability (Probability),
and the bounds of the \(95\%\) Wald confidence interval
(LB and UB). By default, the sex, age, cause, and
nature of injury fields are empty. This occurs because the
categorical variables have initial values of Average of
effect, and age is not selected for inclusion in the
calculation. Probabilities for Iowa’s Workers’ Compensation database
(Workers' compensation) and Trauma Registry
(Trauma registry) are displayed for each probability
calculation. As injury characteristics and demographics are defined,
their associated columns in the results section will be filled in, and
the probabilities (predicted and confidence interval) will be
updated.
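This guide does not state how the Wald interval is constructed. One common construction, assumed here purely for illustration, builds the interval on the log-odds scale and back-transforms it to the probability scale. The linear predictor and standard error below are hypothetical values, not model output, and deriving the Trauma Registry bounds as complements is likewise an assumption.

```r
# Hypothetical inputs: an estimated linear predictor (log-odds) and its
# standard error. Neither value comes from the fitted model.
eta_hat <- 0.40
se_eta <- 0.15

z <- qnorm(0.975)                   # critical value for a 95% interval
prob <- plogis(eta_hat)             # predicted probability (Workers' compensation)
lb <- plogis(eta_hat - z * se_eta)  # lower bound (LB)
ub <- plogis(eta_hat + z * se_eta)  # upper bound (UB)

# Assumed layout of the results: the Trauma Registry row is the complement.
results <- data.frame(
  Source = c("Workers' compensation", "Trauma registry"),
  Probability = c(prob, 1 - prob),
  LB = c(lb, 1 - ub),
  UB = c(ub, 1 - lb)
)
results
```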
Currently, the web application does not allow the user to download
the calculated predicted probabilities. Therefore, to obtain the
probabilities for external use, the agstudy1app package
(https://github.com/deboonstra/agstudy1app) must be installed.
Then, simply call the agstudy1app::pred_prob()
function with the injury characteristics of interest. If one chooses to
run the web application on their local machine, they will need to
git clone the agstudy1app
repository to their desired location. Then, in an interactive session of
R, where agstudy1app is the current working
directory, execute the following command.
agstudy1app::launch()
Authors: Joseph E. Cavanaugh and D. Erik Boonstra
We developed a logistic regression model to probabilistically predict
which data source captured an injured person, given their injury
characteristics and demographics. The predictive model used explanatory
variables found in both data sources: age, gender, injury cause, and
injury nature, and modeled the event that an injured person’s
information was recorded in the Workers’ Compensation (WC) data. For
categorical predictors (gender, injury cause, and injury nature), we
used effects coding instead of reference coding, so that the average
effect would be zero. To demonstrate how effect parameterization works,
consider the variable sex, which has two levels (male and female).
Unlike reference coding, effects coding uses two parameters
\(\left(\beta^{S}_{M} \text{ and
}\beta^{S}_{F}\right)\) to characterize the effect of sex. What
allows one to “average over” the sex effect is that the effects
parameterization imposes the constraint \[\beta^{S}_{M}
+ \beta^{S}_{F} = 0.\] As seen in the paper accompanying this web
application, Improving agricultural injury surveillance: comparing injuries
captured Iowa’s Workers’ Compensation and trauma registry data, the
estimated sex effects are 0.3959 for females
and -0.3959 for males, which sum to zero, as the constraint requires. The
logistic regression model implemented this parameterization for the
other categorical predictors (cause and nature of the injury). Cause of
injury has a total of 8 levels and nature of injury has 6 levels, which
means \[
\begin{aligned}
\sum_{j = 1}^{8}\beta^{C}_{j} &= 0 \text{ and} \\
\sum_{k = 1}^{6}\beta^{N}_{k} &= 0,
\end{aligned}
\] just as shown with the predictor sex. For more information on
the specific levels of the cause of injury and nature of injury
variables, see the R documentation for
?agstudy1app::iwc_itr_only.
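The sum-to-zero constraint can be seen in a minimal R sketch that uses base R's `contr.sum()` on toy data. The eight cases below are invented for illustration and are not the study data; the sketch shows one way to obtain effects coding in `glm()` and is not necessarily how the authors fit their model.

```r
# Toy data only: eight hypothetical injury cases, not the study data.
toy <- data.frame(
  wc = c(1, 0, 1, 0, 1, 0, 0, 1),
  sex = factor(rep(c("Female", "Male"), times = 4))
)

# Sum-to-zero (effects) coding: Female is coded +1 and Male is coded -1.
contrasts(toy$sex) <- contr.sum(2)

fit <- glm(wc ~ sex, data = toy, family = binomial())

# Only one sex coefficient is estimated; the other is its negative, so the
# two effects sum to zero by construction.
beta_f <- unname(coef(fit)[2])
beta_m <- -beta_f
c(Female = beta_f, Male = beta_m, Sum = beta_f + beta_m)
```

For a predictor with more levels, such as cause of injury, the same coding estimates all but one effect, and the remaining effect is the negative of the sum of the others.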
Based on the predictive model, we estimated the probabilities of an agricultural injury being captured in either the trauma registry or workers’ compensation repositories. Using the above notation, let \(\hat{\alpha}\) denote the estimate of the model intercept term, \(\hat{\beta}^{A}\) denote the estimate of the age effect, and \(\hat{\beta}^{S}_{i}, \hat{\beta}^{C}_{j}, \text{ and } \hat{\beta}^{N}_{k}\) denote the estimates of \(\beta^{S}_{i}, \beta^{C}_{j}, \text{ and } \beta^{N}_{k}\). Then, the predicted probability that an injury case is in Iowa’s statewide workers’ compensation database is \[\hat{\pi}(\mathbf{x}) = \frac{\exp\left(\hat{\alpha} + \hat{\beta}^{S}_{M}x^{S}_{M} + \hat{\beta}^{S}_{F}x^{S}_{F} + \hat{\beta}^{C}_{1}x^{C}_{1} + \ldots + \hat{\beta}^{C}_{8}x^{C}_{8} + \hat{\beta}^{N}_{1}x^{N}_{1} + \ldots + \hat{\beta}^{N}_{6}x^{N}_{6} + \hat{\beta}^{A}x^{A}\right)}{1 + \exp\left(\hat{\alpha} + \hat{\beta}^{S}_{M}x^{S}_{M} + \hat{\beta}^{S}_{F}x^{S}_{F} + \hat{\beta}^{C}_{1}x^{C}_{1} + \ldots + \hat{\beta}^{C}_{8}x^{C}_{8} + \hat{\beta}^{N}_{1}x^{N}_{1} + \ldots + \hat{\beta}^{N}_{6}x^{N}_{6} + \hat{\beta}^{A}x^{A}\right)},\] where \(\mathbf{x}\) contains the injury characteristics and demographics of interest. As an example, suppose the probability that an injury to a 45-year-old male who fell and fractured a bone is in the workers’ compensation database is of interest. Then, \[\hat{\pi}(\mathbf{x}_{0}) = \frac{\exp\left[\hat{\alpha} + \hat{\beta}^{S}_{M} + \hat{\beta}^{C}_{2} + \hat{\beta}^{N}_{1} + \hat{\beta}^{A}(45)\right]}{1 + \exp\left[\hat{\alpha} + \hat{\beta}^{S}_{M} + \hat{\beta}^{C}_{2} + \hat{\beta}^{N}_{1} + \hat{\beta}^{A}(45)\right]},\] where \(x^{S}_{M} = 1\), \(x^{C}_{2} = 1\), \(x^{N}_{1} = 1\), \(x^{A} = 45\), and the remaining elements of \(\mathbf{x}_{0}\) are zero. The probability that this injury is located in Iowa’s trauma registry is simply \(1- \hat{\pi}(\mathbf{x}_{0})\).
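Now, if one only has the age of a person, effects coding allows for “averaging over” the effects of sex, cause of injury, and nature of injury in the calculation of the predicted probability. Because the effects of each categorical predictor sum to zero, averaging the linear predictor over the levels of that predictor removes its terms, which is equivalent to setting all of its indicators to zero. The predicted probability that this person is in WC is \[\hat{\pi}(\mathbf{x}) = \frac{\exp\left(\hat{\alpha} + \hat{\beta}^{A}x^{A}\right)}{1 + \exp\left(\hat{\alpha} + \hat{\beta}^{A}x^{A}\right)}.\]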
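The two probability calculations above can be traced numerically in a minimal R sketch. Only the sex effects are taken from the text; the intercept, age effect, and the cause and nature effects are hypothetical placeholders chosen for illustration, so the printed probabilities are not results from the fitted model.

```r
# Sex effects reported in the text above.
beta_sex <- c(Female = 0.3959, Male = -0.3959)

# HYPOTHETICAL values for illustration only; the real estimates are in the
# accompanying paper and the agstudy1app package.
alpha <- -0.50          # intercept
beta_age <- 0.01        # age effect (per year)
beta_fall <- 0.20       # cause-of-injury effect for falls
beta_fracture <- -0.30  # nature-of-injury effect for fractures

# 45-year-old male who fell and fractured a bone.
eta_full <- alpha + beta_sex[["Male"]] + beta_fall + beta_fracture +
  beta_age * 45
p_wc <- plogis(eta_full)  # same as exp(eta) / (1 + exp(eta))
p_tr <- 1 - p_wc          # trauma registry probability

# Age only: all categorical indicators are zero, so their effects drop out.
p_wc_age_only <- plogis(alpha + beta_age * 45)

c(wc = p_wc, trauma = p_tr, wc_age_only = p_wc_age_only)
```

Here `plogis()` is base R's logistic distribution function, which evaluates the same ratio of exponentials written in the formulas.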