A Spatio-Temporal Investigation of the Growth Rate of COVID-19 Incidents in Ohio, USA, Early in the Pandemic

Understanding the initial growth rate of an epidemic is important for epidemiologists and policy makers as it can impact their mitigation strategies such as school closures, quarantines, or social distancing. Because the transmission rate depends on the contact rate of the susceptible population with infected individuals, similar growth rates might be experienced in nearby geographical areas. This research determined the growth rate of cases and deaths associated with COVID-19 in the early period of the 2020 pandemic in Ohio, United States. The evolution of cases and deaths was modeled through a Besag-York-Molliè model with linearand power-type deterministic time dependence. The analysis showed that the growth rate of the time component of the model was subexponential in both cases and deaths once the time-lag across counties of the appearance of the first COVID-19 case was considered. Moreover, deaths in the northeast counties in Ohio were strongly related to the deaths in nearby counties. Publication Date: July 2021 https://doi.org/10.18061/ojs.v121i2.8059 OHIO J SCI 121(2):33-47 Having an understanding of the spread of COVID-19 early in the pandemic is important for multiple reasons. A fast spread can put a strain on the healthcare system, especially hospitals which may be forced to work either at or beyond their capacity (as they were in the first months of the pandemic (Miller et al. 2020)). Shortages of medical and protective equipment, combined with government mandated shutdowns of the economy, can occur in such times of high healthcare burden (Miller et al. 2020; Selvitella and Foster 2020; Foster and Selvitella 2021). Having information on the spatial and temporal characteristics of the transmission of COVID-19 during the first stages of the pandemic was of assistance to policy makers (who strive for a more effective allocation of limited resources) and helped to predict the dynamics of early stages of subsequent waves or novel mutations. The interest of this study lies in understanding the spatio-temporal diffusion of COVID-19, specifically understanding the spread of COVID-19 across counties in Ohio. A Besag-York-Molliè (BYM) model (Besag et al. 1991; Blangiardo and Cameletti 2015) was developed from the cases and deaths in Ohio for the first 3 weeks following the initial 1 Address correspondence to Alessandro Maria Selvitella, Department of Mathematical Sciences, Purdue University Fort Wayne, 267 Kettler Hall, 2101 E. Coliseum Blvd., Fort Wayne, IN 46805, USA. Email: aselvite@pfw.edu © 2021 Selvitella et al. This article is published under a Creative Commons Attribution 4.0 International License ( https://creativecommons.org/licenses/by/4.0/) 34 VOL. 121(2) SPATIO-TEMPORAL MODEL OF COVID-19 IN OHIO COVID-19 case in each county. The first county in which the virus was detected was Cuyahoga on 9 March 2020; the last day considered in the study was 5 May 2020 (in Harrison County). This model provides insight to the following questions: First, to what extent does spatial correlation influence the growth of COVID-19 cases and deaths in Ohio counties? Second, was the initial growth of cases in Ohio subexponential, exponential, or superexponential (Tolle 2003)? This second question is important for both epidemiologists and policymakers. Indeed, an important factor to determine when a pandemic begins is the initial growth rate, which in turn gives information to the most important epidemiological parameter, the basic reproduction number —the expected number of secondary cases caused by a single infected individual in a fully susceptible population (Diekmann et al. 2013; Brauer et al. 2019). In the first weeks of pandemics, the main component of the growth curve of fatalities is generally of exponential type, but in some cases it has been shown to have a subexponential (Chowell et al. 2016) or superexponential (Tolle 2003) behavior. The paper is completed with: Appendix A providing relevant geographical information about Ohio, Appendix B illustrating the R-code developed, and Appendix C containing the estimates of the relevant parameters of the models developed (in particular those used for the Results and Discussion Section). METHODS Dataset and Software The dataset analyzed (CDC 2020) contained the number of cases and deaths associated with COVID-19 in the first 3 weeks of the pandemic, beginning with the first day COVID-19 appeared in each of the 88 counties in Ohio. Note that the first case did not appear on the same day in every county. For data analytic purposes, 3 supplemental spreadsheets titled "Supplemental Table 1," "Supplemental Table 2," and "Supplemental Table 3" are available at: http://hdl.handle.net/1811/92834 . The statistical analysis utilized the R-INLA statistical package (Rue et al. 2009; R-INLA 2020) available for R statistical software (R Core Team 2020). R-INLA is a package that performs approximate Bayesian inference for latent Gaussian models such as those considered in this manuscript. Spatio-Temporal Model Suppose yit represents the number of cases or deaths at a discrete time t with t = 1,...,T with T the time extension of this analysis. The index i = 1,...,n identifies the county in Ohio with n the total number of counties in Ohio. It is assumed that yit ∼ Poiss(λit), namely the case where the cases and deaths are distributed with a Poisson random variable with parameter λit is considered. Note that this assumption implies that the cases and deaths were equidispersed (McCullagh and Nelder 1989). The parameter λit is assumed to be the product λit = Ei ρit . While ρit represents the rate of appearance of new cases and deaths at time t in area (county) i, the expected number of cases or deaths in area i is defined as follows. Suppose the population under consideration is partitioned into J classes and rj is the standardized reference rate for class j = 1,...,J and Pij is the population count of class j in county i. Then the expected number of cases or deaths in area i is given by: . A link function ηit = log (ρit) is used to relate the cases or deaths to their spatio-temporal covariates in the following way: The intercept b0 represents the average outcome rate in the entire study region (i.e., Ohio in this case); the covariate vi represents the area-specific effect and it is modeled as exchangeable, while to ui is given an autoregressive structure (Besag 1974; Blangiardo and Cameletti 2015). More specifically, suppose that N(i) represents the set of neighborhoods of area i, for i = 1,...,n and that u−i represents the area-specific effects excluding county i. Then, the conditional distribution ui | u−i is given by:


INTRODUCTION
In December 2019, a new respiratory virus was first identified in a human patient in China (CDC 2020;WHO 2020). That virus was subsequently classified as SARS-CoV-2 or severe acute respiratory syndrome coronavirus 2 and is responsible for the disease now commonly referred to as COVID-19 that became a global pandemic in 2020 (WHO 2020).
A major concern with any virus is how fast it spreads, and in particular the growth rate of cases and deaths. Growth rates are especially important in the initial stages of a pandemic, when they are usually exponential (Diekmann et al. 2013;Chowell et al. 2016;Brauer et al. 2019;Hilton and Keeling 2020). Because the transmission rate depends on the contact rate of the susceptible population with infected individuals, similar growth rates might be experienced in nearby geographical areas. Geographical regions might have regionspecific factors which affect the growth rate, such as population density, mobility of the population, access to healthcare, and a number of other factors (Selvitella and Foster 2020;Foster and Selvitella 2021).
Having an understanding of the spread of COVID-19 early in the pandemic is important for multiple reasons. A fast spread can put a strain on the healthcare system, especially hospitals which may be forced to work either at or beyond their capacity (as they were in the first months of the pandemic (Miller et al. 2020)). Shortages of medical and protective equipment, combined with government mandated shutdowns of the economy, can occur in such times of high healthcare burden (Miller et al. 2020;Selvitella and Foster 2020;Foster and Selvitella 2021). Having information on the spatial and temporal characteristics of the transmission of COVID-19 during the first stages of the pandemic was of assistance to policy makers (who strive for a more effective allocation of limited resources) and helped to predict the dynamics of early stages of subsequent waves or novel mutations.
The interest of this study lies in understanding the spatio-temporal diffusion of COVID-19, specifically understanding the spread of COVID-19 across counties in Ohio. A Besag-York-Molliè (BYM) model (Besag et al. 1991;Blangiardo and Cameletti 2015) was developed from the cases and deaths in Ohio for the first 3 weeks following the initial VOL. 121(2) SPATIO-TEMPORAL MODEL OF COVID-19 IN OHIO COVID-19 case in each county. The first county in which the virus was detected was Cuyahoga on 9 March 2020; the last day considered in the study was 5 May 2020 (in Harrison County).
This model provides insight to the following questions: First, to what extent does spatial correlation influence the growth of COVID-19 cases and deaths in Ohio counties? Second, was the initial growth of cases in Ohio subexponential, exponential, or superexponential (Tolle 2003)? This second question is important for both epidemiologists and policymakers. Indeed, an important factor to determine when a pandemic begins is the initial growth rate, which in turn gives information to the most important epidemiological parameter, the basic reproduction number -the expected number of secondary cases caused by a single infected individual in a fully susceptible population (Diekmann et al. 2013;Brauer et al. 2019). In the first weeks of pandemics, the main component of the growth curve of fatalities is generally of exponential type, but in some cases it has been shown to have a subexponential (Chowell et al. 2016) or superexponential (Tolle 2003) behavior.
The paper is completed with: Appendix A providing relevant geographical information about Ohio, Appendix B illustrating the R-code developed, and Appendix C containing the estimates of the relevant parameters of the models developed (in particular those used for the Results and Discussion Section).

Dataset and Software
The dataset analyzed (CDC 2020) contained the number of cases and deaths associated with COVID-19 in the first 3 weeks of the pandemic, beginning with the first day COVID-19 appeared in each of the 88 counties in Ohio. Note that the first case did not appear on the same day in every county. For data analytic purposes, 3 supplemental spreadsheets titled "Supplemental Table 1," "Supplemental Table 2," and "Supplemental Table 3" are available at: http://hdl.handle.net/1811/92834 .
The statistical analysis utilized the R-INLA statistical package (Rue et al. 2009;R-INLA 2020) available for R statistical software (R Core Team 2020). R-INLA is a package that performs approximate Bayesian inference for latent Gaussian models such as those considered in this manuscript.

Spatio-Temporal Model
Suppose y it represents the number of cases or deaths at a discrete time t with t = 1,…,T with T the time extension of this analysis. The index i = 1,…,n identifies the county in Ohio with n the total number of counties in Ohio. It is assumed that y it ∼ Poiss(λ it ), namely the case where the cases and deaths are distributed with a Poisson random variable with parameter λ it is considered. Note that this assumption implies that the cases and deaths were equidispersed (McCullagh and Nelder 1989). The parameter λ it is assumed to be the product λ it = E i ρ it . While ρ it represents the rate of appearance of new cases and deaths at time t in area (county) i, the expected number of cases or deaths in area i is defined as follows. Suppose the population under consideration is partitioned into J classes and r j is the standardized reference rate for class j = 1,…,J and P ij is the population count of class j in county i. Then the expected number of cases or deaths in area i is given by: .
A link function η it = log (ρ it ) is used to relate the cases or deaths to their spatio-temporal covariates in the following way: The intercept b 0 represents the average outcome rate in the entire study region (i.e., Ohio in this case); the covariate v i represents the area-specific effect and it is modeled as exchangeable, while to u i is given an autoregressive structure (Besag 1974;Blangiardo and Cameletti 2015). More specifically, suppose that N(i) represents the set of neighborhoods of area i, for i = 1,…,n and that u −i represents the area-specific effects excluding county i. Then, the conditional distribution u i | u −i is given by: with where µ i represents the mean for area i, s i 2 = σ u 2 /N i is the variance of area i, and N i = | N(i) | is the number of neighborhoods of area i. In this formulation, σ u 2 represents the amount of variation between the spatially-structured random effects, and r ik is the indicator of spatial proximity between areas i and k which is defined as r ik = φW ik with W ik = a ik /N i and a ik = 1 if i and k are neighbors and 0 otherwise. The factor φ > 0 is introduced to control the properness of the probability distribution and needs to ensure the positive definiteness of the variance-covariance matrix (I − φW ) S 2 , with I the n×n identity matrix, W = {W ik } n i,k = 1 , and S 2 = diag{s 1 2 ,...,s n 2 } (Cressie 1993). With these assumptions, the proper conditional autoregressive (CAR) specification u is a multivariate normal random variable, defined as follows: with µ={µ 1 ,…,µ n } the mean vector. This structure implies that the correlation between areas i and j can be computed using the following formula (Blangiardo and Cameletti 2015): It is further assumed that φ = 1 and , a specification called intrinsic conditional autoregressive which together with the exchangeable random effect gives rise to the so-called Besag-York-Molliè model (BYM) (Besag et al. 1991;Blangiardo and Cameletti 2015). By assuming that µ i = 0 for every i = 1,…,n, then (Refer to Besag et al. 1991;Best et al. 2005;Lee 2011; and Lawson 2018 for further details.) Concerning the temporal component Φ it , a parametric trend is considered (as presented in Bernardinelli et al. 1995) which assumes for t = 1,…,T and i = 1,…,n, but with p to be determined and not necessarily p = 1. Here, β represents the main trend, while δ i represents the differential trends and the interaction between space and time components with the identifiability assumption .

Deviance Information Criterion
The deviance information criterion (DIC) is a measure of model fitting typically used for Bayesian models (Spiegelhalter et al. 2002) which includes a trade-off between goodness of fit and model complexity.
Given a random variable Y, whose distribution depends on a set of parameters θ, the DIC is given by the following formula: In this formula, p D represents the effective number of parameters, which is computed as with Here, D(θ) := −2log (p(y|θ)) represents the deviance and D the posterior expectation of the deviance. The DIC can be equivalently rewritten as in which case it resembles the common expression of the Akaike information criterion (AIC), of which the DIC is a generalization (James et al. 2013).

Statistical Analysis
The statistical analysis is divided into 2 parts. In both cases, homogeneity in the population is assumed, namely that E i = 1 for every i = 1,…,n.
In the first part of the study, the cases and deaths were modeled using the linear ( p = 1) BYM and information about the spatial correlation and temporal differential effect emerging from the assumptions was extracted. It is worth noticing that, in these models, the time index is translated of a county-dependent factor t i , i = 1,…, n so that the analysis for each county starts the day in which the first case appeared in that county. This procedure was performed before modeling.
These results are visualized by plotting both the posterior mean of the spatial main effect ζ i = exp(u i + v i ) and the differential effect δ i for i = 1,…,n of the cases and deaths models with linear growth rate p = 1. The thresholds which determine the classes (colors) in the plots are based on the 0, 0.25, 0.5, 0.75, 1-quantiles of the empirical distributions of the corresponding parameter for all Ohio counties.
In the second part of the study, the best models were determined according to the DIC among the family of BYM with a generic nonlinear term p ∈ [0,2], with the search performed over steps of length 0.1. The search for an optimal model was performed independently and for both outcome variables (number of cases and number of deaths). Fig. 1 and Fig. 2 represent the distribution of spatial and temporal effects, respectively, on the COVID-19 cases by county in Ohio for the linear model p = 1.

Spatial Main Effect and Temporal Differential Effect in the Linear Model
The spatial effect is given by the posterior mean of the spatial main effect ζ i = exp(u i + v i ) for i = 1,…, n. For the full output, see Appendix C. The counties with the darkest color (red) are the counties where the spatial effect ζ i is stronger (Fig. 1). Especially in northeast Ohio and southwest Ohio, the model supports the presence of spatial correlation between counties.
The temporal effect is given by the differential effect δ i for i = 1,…, n and represents the extra growth rate of county i with respect to the Ohio growth rate. For the full output, see Appendix C. The counties with the darkest color (red) are the counties where the differential temporal effect δ i is stronger (Fig. 2). Again, in northeast and southwest Ohio, the model mildly supports the presence of the temporal differential effect. Fig. 3 and Fig. 4 represent the distribution of spatial and temporal effects, respectively, on the COVID-19 related deaths by county in Ohio for the linear model p = 1.  Fig. 3, the spatial effect is given by the posterior mean of the spatial main effect ζ i = exp(u i + v i ) for i = 1,…, n. For the full output, see Appendix C. Also, the counties with the darkest color (red) are the counties where the spatial effect ζ i is stronger. The spatial effect in this case is striking and shows a strong spatial correlation of the diffusion of COVID-19 in the first 3 weeks of the pandemic. Northeast Ohio, in particular, is the area where the posterior mean of the spatial main effect is the strongest, with gradual decrease in the radial direction from northeast to southwest.
In Fig. 4, the temporal effect is given by the differential effect δ i for i = 1,…, n and represents the extra growth rate of county i with respect to the Ohio growth rate. For the full output, see Appendix C. The counties with the darkest color (red) are the counties where the differential temporal effect δ i is stronger. Similar to the model for the cases, in northeast Ohio and southwest Ohio, the model mildly supports the presence of temporal differential effect.
It is important to remember that the 21 days considered for each county are not temporally aligned, as the 21 days are considered from the day the first case of COVID-19 was reported in that county. This means that the temporal component due to the lag between the first case in one county and the first in another county, which is substantial, has already been taken into account in both models (number of both cases and deaths). This is in agreement with the milder effect of the temporal component in the models. Nevertheless, this analysis shows that even quotienting out the natural differential time-lag mentioned above, there is still some residual temporal effect distinguishable across counties.

Optimal Growth
This statistical analysis shows that the best model according to the DIC has sublinear growth both in the models where the outcome variables are the number of cases and those where the outcome variables are the number of deaths.
In particular, the model for the cases has optimal time growth when the exponent of the time nonlinearity is p = 0.3 (Table 1) .
The model for the deaths has optimal time growth when the exponent of the time nonlinearity is p = 0.7 (Table 2).
Qualitatively, this indicates that the residual time component-once the differential time-lag of the appearance of the first case across counties had been accounted-had a sublinear growth. Understanding the exact growth rate of the number of cases and deaths has crucial implications for public health and can guide governmental policy decisions. Note that the analysis performed in this study includes a single family of models, so it cannot be used alone to guide a governmental decision on such an important issue.

Conclusions
This research studied the growth rate of the incidents (both cases and deaths) associated with COVID-19, in Ohio, during the 21 days after the first case appeared in each county. The evolution of the cases and deaths was modeled through a Besag-York-Molliè model with linearand power-type time dependence. This model showed that the spatial effect on the deaths was substantial throughout Ohio, while the residual differential effect of the time component was mild, but present, in every county. In particular, the northeast counties in Ohio exhibited the strongest spatial effect, especially in the model of deaths related to COVID-19. Furthermore, this analysis showed that the growth rate of cases and deaths in Ohio were subexponential in the early stages of the pandemic-once the natural lag between the appearance of the first case per county was taken into consideration.
As more data become available, future analysis will be dedicated to comparing the initial growth rate to the growth rate of cases and deaths at the beginning of subsequent waves of the pandemic and to determine the possibility of seasonal components. The initial growth rate of an epidemic relates to the basic reproduction number and so the former is of great importance to policy makers as it can influence governmental decisions and prompt mitigation strategies.

APPENDICES Appendix A: Ohio Geography
Editor's Note: The authors assert, and the reviewers acknowledge, that the discrepancy between county names in the map and the county names read internally to the R-code below did not affect the overall analysis. Specifically, the names in the map are correct, but the names of the columns in the R-code below include the county name "Unknown" and omit Wyandot County. This inconsistency is derived from the original Excel file downloaded from the CDC that is posted here ( http://hdl.handle.net/1811/92834 ) and whose column names were not modified during the analysis and kept as read by the command "Import Dataset" in the R-software.
For reference, a map of Ohio with the names of all 88 counties is provided. This can be used to identify the specific county from the plot reported in previous sections and specific parameters as reported in Appendix C below.