Web Application User Guide

Introduction

When the web application is launched from Improving agricultural injury surveillance: comparing injuries captured Iowa’s Workers’ Compensation and trauma registry data, one is first directed to a couple of user guides. The first guide, titled Web Application User Guide, is the current page; it provides a quick overview of the modeling framework and a detailed description of the inner workings of the application. Navigate to the Modeling Framework tab to see a more detailed description of the model construction and the calculation of the predicted probabilities. The web application itself is accessed through the Web Application tab, which loads a Posit Shiny application that calculates the predicted probabilities of being in Iowa’s Workers’ Compensation database and Iowa’s Trauma Registry. The remaining information in this guide details how the web application works. First, a quick overview of the model used to calculate the predicted probabilities is provided to enhance the understanding of the application basics.

Model overview

Application launch

The Web Application tab launches the application that calculates the predicted probabilities of being in Iowa’s Workers’ Compensation database and Trauma Registry. The application has two panels. The left panel receives the input from the user, while the right panel displays the results based on the information given. For this application, predicted probabilities are calculated based on sex, age, cause of injury, and nature of injury. The starting value for sex, cause of injury, and nature of injury is Average of effect. This value is a by-product of effects coding in the modeling framework, which allows probabilities to be calculated while averaging over all the possible values of a predictor. So, at any time, if one wants to obtain a predicted probability or a set of predicted probabilities while considering the average effect of a categorical variable, select the Average of effect option for the variable of interest. Initially, a person’s age is excluded from the probability calculation.

Application Usage

User inputs

The left panel of the application takes a select set of inputs from the user. There is a drop-down menu of all possible choices for each categorical variable (sex, cause of injury, and nature of injury). For the purpose of exploring the user inputs, consider the following example. Suppose one is interested in finding the predicted probability that a male who fell and fractured his arm is in the workers’ compensation database. Then, all one would need to do is select Male in the Sex drop-down menu, Falls in the Cause of Injury drop-down menu, and Fractures/dislocation/sprains/strains in the Nature of Injury drop-down menu.

Age is the only quantitative variable in the model and application. To specify age, there are two age sliders titled Minimum and Maximum Age and a checkbox denoting whether age should be included in the probability calculation. If age is of interest, then there are several points to consider. First, the Include age in calculation of predicted probabilities checkbox must be selected. Second, the minimum and maximum age must be set to some value within the range of possible ages. By default, the minimum and maximum ages are both set to 45, the mean and median age of the combined data sources. If the two age sliders are set to different values and the Calculate multiple predicted probabilities checkbox is not selected, the age used is the average of the minimum and maximum age selected. To obtain a range of predicted probabilities over various ages, the Calculate multiple predicted probabilities checkbox must be selected, and the minimum and maximum ages must differ. If the minimum age exceeds the maximum, the application swaps the two ages in the background.

Combining all the above steps for specifying injury characteristics and demographics will provide predicted probabilities specific to the characteristics of interest.
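
The age rules described in this section can be summarized in a short sketch. It is written in Python for illustration only and is not the application's R code. The function name `resolve_ages`, its arguments, the assumption that the sliders take whole-number ages, and the one-year step used for a range of ages are all assumptions rather than details taken from the application.

```python
def resolve_ages(min_age, max_age, include_age, multiple):
    """Return the list of ages a probability would be calculated for.

    Hypothetical helper mirroring the rules described in this guide:
    - age is ignored unless the include-age checkbox is selected;
    - the two slider values are swapped if min_age exceeds max_age;
    - with the multiple-probabilities checkbox selected and distinct ages,
      a range of ages is used (one-year steps are an assumption);
    - otherwise the average of the two slider values is used.
    """
    if not include_age:
        return []  # age excluded from the calculation

    low, high = sorted((min_age, max_age))  # swap if entered in reverse

    if multiple and low != high:
        return list(range(low, high + 1))  # assumed one-year increments

    return [(low + high) / 2]  # single age: average of the two sliders


# Default state: both sliders at 45, age checkbox not selected.
default_ages = resolve_ages(45, 45, include_age=False, multiple=False)

# Sliders entered in reverse order, single probability requested.
single_age = resolve_ages(50, 30, include_age=True, multiple=False)

# Sliders entered in reverse order, multiple probabilities requested.
age_range = resolve_ages(50, 30, include_age=True, multiple=True)
```

Under these assumptions, `single_age` is `[40.0]`, and `age_range` starts at 30 and ends at 50 even though the sliders were entered in reverse order.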

Results

Regardless of the injury characteristics and demographics specified in the user input panel, the results section (right panel) has the same general setup. There are columns denoting the data source of the injury case (Source), sex (Sex), cause of injury (Cause), nature of injury (Nature), age (Age), predicted probability (Probability), and the bounds of the \(95\%\) Wald confidence interval (LB and UB). By default, the sex, age, cause, and nature of injury fields are empty. This occurs because the categorical variables have initial values of Average of effect, and age is not selected for inclusion in the calculation. Probabilities for Iowa’s Workers’ Compensation database (Workers' compensation) and Trauma Registry (Trauma registry) are displayed for each probability calculation. As injury characteristics and demographics are defined, their associated columns in the results section are filled in, and the probabilities (predicted and confidence interval) are updated.
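
One common way to form a Wald interval for a predicted probability from a logistic regression is to build the interval on the log-odds scale and then transform its endpoints back to probabilities. Whether the application computes LB and UB in exactly this way is an assumption. The sketch below is in Python for illustration only, and the linear predictor and standard error passed to it are hypothetical values, not estimates from the model.

```python
import math


def inv_logit(eta):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def wald_interval(eta, se, z=1.959964):
    """Return (probability, lower bound, upper bound).

    eta: linear predictor (log-odds) for the case of interest
    se:  standard error of eta
    z:   standard normal quantile for a 95% two-sided interval
    """
    lower = inv_logit(eta - z * se)
    upper = inv_logit(eta + z * se)
    return inv_logit(eta), lower, upper


# Hypothetical inputs, chosen only to show the shape of the output.
prob, lb, ub = wald_interval(eta=0.4, se=0.1)
```

Because the interval is built on the log-odds scale, the resulting bounds always stay between 0 and 1.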

On-device usage of web application

Currently, the web application does not allow the user to download the calculated predicted probabilities. Therefore, to obtain the probabilities for external use, the agstudy1app package (https://github.com/deboonstra/agstudy1app) must be installed. Then, simply call the agstudy1app::pred_prob() function with the injury characteristics of interest. If one chooses to run the web application on a local machine, they will need to git clone the agstudy1app repository to the desired location. Then, in an interactive R session where agstudy1app is the current working directory, execute the following command.

agstudy1app::launch()

Detailed Overview of the Modeling Framework used in GPACH Surveillance of Agriculture Injuries in Iowa

Authors: Joseph E. Cavanaugh and D. Erik Boonstra

Model overview

We developed a logistic regression model to probabilistically predict which data source captured an injured person, given their injury characteristics and demographics. The predictive model used explanatory variables found in both data sources: age, gender, injury cause, and injury nature, and modeled the event that an injured person’s information was recorded in the Workers’ Compensation (WC) data. For categorical predictors (gender, injury cause, and injury nature), we used effects coding instead of reference coding, so that the average effect would be zero. To demonstrate how effect parameterization works, consider the variable sex, which has two levels (male and female). Unlike reference coding, effects coding uses two parameters \(\left(\beta^{S}_{M} \text{ and }\beta^{S}_{F}\right)\) to characterize the effect of sex. What allows one to “average over” the sex effect is that effect parameterization assumes that \[\beta^{S}_{M} + \beta^{S}_{F} = 0.\] As seen in the paper accompanying this web application, Improving agricultural injury surveillance: comparing injuries captured Iowa’s Workers’ Compensation and trauma registry data, the estimated sex effects are 0.3959 for females and -0.3959 for males, which sum to zero, as one would hope. The logistic regression model implemented this parameterization for the other categorical predictors (cause and nature of the injury). Cause of injury has a total of 8 levels and nature of injury has 6 levels, which means \[ \begin{aligned} \sum_{j = 1}^{8}\beta^{C}_{j} &= 0 \text{ and} \\ \sum_{k = 1}^{6}\beta^{N}_{k} &= 0, \end{aligned} \] just as shown with the predictor sex. For more information on the specific levels of the cause and nature of injury variables, see the R documentation for ?agstudy1app::iwc_itr_only.
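
The sum-to-zero constraint can be checked with a few lines of code. The sketch below is in Python for illustration only and is not part of the agstudy1app package. The sex effects are the estimates quoted above; the helper names and the three-level example are hypothetical.

```python
# Estimated sex effects quoted from the accompanying paper.
sex_effects = {"Female": 0.3959, "Male": -0.3959}


def average_effect(effects):
    """Average of a predictor's effects; zero under effects coding."""
    return sum(effects.values()) / len(effects)


def complete_effects(free_effects, last_level):
    """Fill in the final level's effect from the sum-to-zero constraint.

    With K levels only K - 1 effects are free; the remaining one is the
    negative of their sum.
    """
    completed = dict(free_effects)
    completed[last_level] = -sum(free_effects.values())
    return completed


# The male effect is implied by the female effect.
implied_sex = complete_effects({"Female": 0.3959}, "Male")

# Hypothetical three-level predictor, for illustration only.
implied_three = complete_effects({"a": 0.5, "b": -0.2}, "c")
```

Because the effects of each categorical predictor sum to zero, leaving a predictor's term out of the linear predictor is equivalent to plugging in its average effect, which is what the Average of effect option relies on.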

Computation of predicted probabilities

Based on the predictive model, we estimated the probabilities of an agricultural injury being captured in either the trauma registry or workers’ compensation repositories. Using the above notation, let \(\hat{\alpha}\) denote the estimate of the model intercept term, \(\hat{\beta}^{A}\) denote the estimate of the age effect, and \(\hat{\cdot}\) denote the estimates of \(\beta^{S}_{i}, \beta^{C}_{j}, \text{ and } \beta^{N}_{k}\). Then, the predicted probability that an injury case is in Iowa’s statewide workers’ compensation database is \[\hat{\pi}(\mathbf{x}) = \frac{\exp\left(\hat{\alpha} + \hat{\beta}^{S}_{M}x^{S}_{M} + \hat{\beta}^{S}_{F}x^{S}_{F} + \hat{\beta}^{C}_{1}x^{C}_{1} + \ldots + \hat{\beta}^{C}_{8}x^{C}_{8} + \hat{\beta}^{N}_{1}x^{N}_{1} + \ldots + \hat{\beta}^{N}_{6}x^{N}_{6} + \hat{\beta}^{A}x^{A}\right)}{1 + \exp\left(\hat{\alpha} + \hat{\beta}^{S}_{M}x^{S}_{M} + \hat{\beta}^{S}_{F}x^{S}_{F} + \hat{\beta}^{C}_{1}x^{C}_{1} + \ldots + \hat{\beta}^{C}_{8}x^{C}_{8} + \hat{\beta}^{N}_{1}x^{N}_{1} + \ldots + \hat{\beta}^{N}_{6}x^{N}_{6} + \hat{\beta}^{A}x^{A}\right)},\] where \(\mathbf{x}\) contains the injury characteristics and demographics of interest. As an example, suppose the probability that an injury to a 45-year-old male who fell and fractured a bone is in the workers’ compensation database is of interest. Then, \[\hat{\pi}(\mathbf{x}_{0}) = \frac{\exp\left[\hat{\alpha} + \hat{\beta}^{S}_{M} + \hat{\beta}^{C}_{2} + \hat{\beta}^{N}_{1} + \hat{\beta}^{A}(45)\right]}{1 + \exp\left[\hat{\alpha} + \hat{\beta}^{S}_{M} + \hat{\beta}^{C}_{2} + \hat{\beta}^{N}_{1} + \hat{\beta}^{A}(45)\right]},\] where \(x^{S}_{M} = 1\), \(x^{C}_{2} = 1\), \(x^{N}_{1} = 1\), \(x^{A} = 45\), and the remaining elements of \(\mathbf{x}_{0}\) are zero. The probability that this injury is in Iowa’s trauma registry is simply \(1- \hat{\pi}(\mathbf{x}_{0})\).

Now, if one only has the age of a person, effects coding allows for “averaging over” the effects of sex, cause of injury, and nature of injury in the calculation of the predicted probability. The predicted probability that this person is in WC is \[\hat{\pi}(\mathbf{x}) = \frac{\exp\left(\hat{\alpha} + \hat{\beta}^{A}x^{A}\right)}{1 + \exp\left(\hat{\alpha} + \hat{\beta}^{A}x^{A}\right)}.\]
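
The two calculations in this section can be sketched in code. The example below is in Python for illustration only and does not use the agstudy1app package. The male sex effect (-0.3959) is the estimate quoted in the model overview; the intercept, age effect, cause effect, and nature effect are hypothetical placeholder values, and the function name `predicted_probability` is an assumption.

```python
import math

# Hypothetical placeholder values, NOT the fitted model's estimates.
ALPHA_HAT = -0.5          # intercept
BETA_AGE_HAT = 0.01       # age effect
BETA_CAUSE_FALLS = 0.2    # effect for cause of injury = falls
BETA_NATURE_FRACT = -0.1  # effect for nature of injury = fractures

# Male sex effect quoted in the model overview.
BETA_SEX_MALE = -0.3959


def predicted_probability(intercept, age_effect, age, selected_effects=()):
    """Return (P(workers' compensation), P(trauma registry)).

    selected_effects holds the effect of each categorical level chosen.
    A predictor left at its average effect contributes nothing, because
    its effects sum to zero.
    """
    eta = intercept + age_effect * age + sum(selected_effects)
    p_wc = 1.0 / (1.0 + math.exp(-eta))  # same as exp(eta) / (1 + exp(eta))
    return p_wc, 1.0 - p_wc


# 45-year-old male who fell and fractured a bone.
p_wc_full, p_tr_full = predicted_probability(
    ALPHA_HAT, BETA_AGE_HAT, 45,
    selected_effects=(BETA_SEX_MALE, BETA_CAUSE_FALLS, BETA_NATURE_FRACT),
)

# Only age known: sex, cause, and nature are averaged over.
p_wc_age_only, p_tr_age_only = predicted_probability(
    ALPHA_HAT, BETA_AGE_HAT, 45
)
```

The age-only call drops the categorical terms entirely, which matches the reduced formula above containing only the intercept and the age term.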

Predicted Probabilities for GPACH Surveillance of Agriculture Injuries in Iowa