Starting in spring 2020, I built some visualizations of Connecticut COVID-19 data. You’ll find a webpage with some of my graphics at davebraze.github.io/ct-covid19. A GitHub repo with the R code used to make the graphs is at github.com/davebraze/ct-covid19. The data are (mostly) pulled from the state’s open data portal using the Socrata API and the RSocrata R package.
My main reason for making these graphs is that I was not satisfied with other visualizations of Connecticut’s COVID-19 data that were then available. There are links to some of those below. All are based on the same Connecticut DPH data as my own dataviz.
I am not even close to being an epidemiologist or infectious disease expert. If you’re looking for an expert take on anything to do with COVID-19, then you’re in the wrong place. That said, at the bottom of the web page mentioned above, you’ll find links to a few general Covid-19 resources that might be useful to you.
I started putting together graphs based on Connecticut Department of Public Health (DPH) data in late March 2020, and I’ve been putting graphs up on various social media platforms since the first week of April. You can follow me on Twitter, which is where I’ll usually announce updates to this page (or just check back here every once in a while). The date at the top of the page will always indicate when the graph data was last updated.
Around the end of May 2020, I got busy with other things and stopped updating this page. Covid activity in Connecticut had subsided after the spring surge and stayed pretty flat through the summer. But by mid-September it seemed clear that we were at the beginning of a second wave of infections, and I finally got back to updating these graphs in mid-November. Updates continued on weekdays until around mid-May 2021. At that point, the page went silent until the end of October 2021. When the winter surge started ramping up in November, I once again began updating the page, but only about once a week. As I write, it is mid-February 2022, and we are on the back side of that winter surge.
Regardless, this web page is home to my most recent Connecticut Covid-19 graphs. Over the months (and years) of the pandemic, I have expanded the collection of dataviz and added some narrative around the graphics. I have to admit that the narrative was added piecemeal, and the result is somewhat fragmented.
I’m still trying to figure out what comes next, but it’s reasonable for you to think of this web page as a “living document.” It’s going to change over time. And, yes, there will be typos.
Finally, at the bottom of this page, you’ll find a few general Covid-19 resources that might be useful to you.
I am a freelance consultant for hire. In my day job, I spend most of my time on research and data in the areas of education, language, and literacy, although I work with data in other areas as well. You can find out more at my website and blog: davebraze.org.
This webpage, the graphs it contains, and the code used to generate them are copyright 2020-2022 by David Braze. The web page content and graphs are released under the Creative Commons Attribution 4.0 (CC BY 4.0) license. Cutting through the legalese, it means that you’re free to re-use these materials (do what you want with them) so long as you acknowledge who made them and where you got them.
The Connecticut Department of Public Health provides its most comprehensive set of statistics for the state as a whole (no breakdown by county or town). As of April 1, 2020, there were four basic daily statistics available for Connecticut: the number of Covid-19 tests completed, the number of confirmed Covid-19 cases, the number of Covid-19-related deaths, and the number of Covid-19 hospitalizations. The first three are cumulative counts. In other words, each represents the total number since record keeping began, so those numbers can only go up (or stay the same) from one day to the next. Hospitalizations, on the other hand, is the number of people in the hospital on a given day, and it can go up or down from one day to the next. For two of the statistics, confirmed cases and deaths, the state also gives breakdowns by age, gender, and race, but it provides no cross-tabulations. Only three of the statistics are available at the county level; tests completed is not. Only confirmed cases and deaths are available for individual towns. Age, gender, and race breakdowns are not available for county or town data.
Another thing to keep in mind is that the numbers reported for these statistics are not perfect. DPH is trying to keep policy makers and the public informed in a timely way, but getting the numbers out as quickly as possible means that they are sometimes not exactly right. Local news outlets have covered some Covid-19 reporting discrepancies in Connecticut.
The number of confirmed cases is a cumulative count of people who have tested positive for coronavirus since record keeping began (about March 1). But most people haven’t been tested, and there are certainly many more Covid-19 cases among untested Connecticut residents. The confirmed-case count does not distinguish between people who are currently ill and people who have recovered.
The number of tests completed is the total, cumulative number of Covid-19 tests that have been reported to the state. My understanding is that Connecticut is reporting the results of PCR-based tests, which identify the virus’s genetic material in people who are currently infected but can’t tell whether someone once had the virus and then recovered (they’ll test negative). This contrasts with an antibody test, which can tell whether a person ever had the virus but can’t distinguish between people infected at the time of the test and people infected at some point in the past. The PCR type is the more useful test for establishing the current rate of infection in the state. Also, up until May 3rd, the number of tests completed equaled the number of people tested, but since May 3rd some people have been tested more than once, so the number of tests completed is greater than the number of people tested.
Test positivity is the ratio of positive tests to all tests reported in a specific time frame (usually 1 day). For example, daily test positivity is calculated by dividing the number of positive tests reported each day (Confirmed Cases) by the total number of tests reported on that day (Tests Completed).
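A minimal Python sketch of the daily calculation, assuming you already have per-day counts of newly reported cases and tests (the cumulative DPH series would first need to be differenced). The function name `daily_test_positivity` is hypothetical, not taken from the project’s R code, and the numbers are made up. Days with no tests reported return `None` rather than triggering a division by zero.

```python
def daily_test_positivity(new_cases, new_tests):
    """Percent of tests reported each day that were positive.

    new_cases, new_tests: equal-length lists of per-day reported counts.
    Days with zero tests reported (e.g., weekends once DPH stopped
    weekend reporting) yield None instead of a division-by-zero error.
    """
    return [100 * c / t if t else None for c, t in zip(new_cases, new_tests)]


# Hypothetical numbers, for illustration only.
print(daily_test_positivity([50, 120, 0], [1000, 2400, 0]))
# [5.0, 5.0, None]
```

Following the definition above, confirmed cases stand in for positive tests. Since tests and people stopped being one-to-one after May 3rd, treat the result as an approximation.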
The hospitalizations count is the number of people who are, on that day, hospitalized due to Covid-19. This is the only one of the 4 counts provided by Connecticut DPH that is not cumulative. Unlike some states (e.g., New York), Connecticut does not break out how many of those patients are in the ICU vs. standard acute care.
Hospitalization rate is important because it directly measures health outcomes of interest and is not subject to most of the measurement issues that arise with case counts and related variables (e.g., Test Positivity).
The cumulative deaths statistic is fairly self-explanatory. It is just the total number of people who have died as a result of, or while sick with, Covid-19. It’s worth looking at the link above on Covid-19 reporting discrepancies in Connecticut. Additional reporting points out that, as of May 2020, Connecticut uses an antiquated process for compiling death certificates and reporting them to the CDC. So there is a significant delay in the availability of Connecticut state mortality data, which means that it is not possible to have a real-time view of the increase this year over previous years in “death from all causes.” Once available, that statistic will likely give a more accurate measure of lives lost due to Covid-19 than other indicators.
The dates associated with three of these variables (tests completed, cases, deaths) are the dates the numbers are reported, not the dates the events actually occurred. This is important for understanding why the numbers fluctuate the way they do. For example, from July 2020 onward there is a clear weekly cycle in the numbers. This is because, beginning on July 1, the state does not report numbers on weekends or holidays, so the reported numbers for those days are effectively zero. From July forward, the first reporting day of each week is a ‘catch-up’ day, and the numbers reported on that day include weekend (and possibly holiday) numbers.
Another driver of idiosyncrasy in these data is that every once in a while the state dumps a bunch of back-logged results into its data file all at once. This happened, for example, on January 18, 2021, resulting in an apparent huge spike in reported tests for that day. In fact, those tests were the result of a back-log that had been accumulating for quite a while. Just how long is not entirely clear to me. Other data dumps to catch up with reporting back-logs show up as the two large spikes in deaths that we see in April.
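To see why the weekly cycle appears, and why a seven-day average largely cancels it, here is a small Python simulation. It assumes a simplified reporting rule (Saturday and Sunday counts are added to Monday’s report, and holidays are ignored) and a made-up, constant 100 events per day. `reported_counts` and `trailing_mean` are hypothetical helper names, not part of any DPH or project code. The simulated series starts on a Saturday so that every Monday’s catch-up has its weekend inside the series.

```python
from datetime import date, timedelta


def reported_counts(actual, start):
    """Simulate weekday-only reporting (simplified; holidays ignored).

    actual[i] is the number of events that occurred on start + i days.
    Events on Saturday or Sunday are held back and added to the next
    weekday's report, so weekend days report 0.
    """
    reported = [0] * len(actual)
    backlog = 0
    for i, n in enumerate(actual):
        day = start + timedelta(days=i)
        if day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
            backlog += n
        else:
            reported[i] = n + backlog
            backlog = 0
    return reported


def trailing_mean(values, window=7):
    """Mean of each run of `window` consecutive values (trailing windows)."""
    return [sum(values[i - window + 1 : i + 1]) / window
            for i in range(window - 1, len(values))]


# Four weeks starting Saturday, July 11, 2020, with a constant 100 events/day.
start = date(2020, 7, 11)
actual = [100] * 28
reported = reported_counts(actual, start)
print(reported[:10])                 # [0, 0, 300, 100, 100, 100, 100, 0, 0, 300]
print(set(trailing_mean(reported)))  # {100.0}
```

Any seven consecutive days contain exactly one Monday and its preceding weekend, so each seven-day window of reported counts adds up to seven days’ worth of actual events. That is why the average recovers the steady 100 per day even though individual reported days swing between 0 and 300. Holidays, which this sketch ignores, shift counts by an extra day and can still leave small bumps.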
The map files used in Figures 7 and 8 were downloaded from the Map and Geographic Information Center at the UConn Library.
Population data for Connecticut towns was scraped from the Wikipedia page List of Towns in Connecticut. Town population data corresponds to 2010 US census counts.
Case data for other states used in Figure 3 are downloaded from the New York Times repository of Covid-19 data on GitHub.
While CT DPH reports cumulative counts, these can be misleading numbers. I’ve plotted them just the same in Figure 1. The problem with this kind of graph is that it makes it hard to see how much the numbers are changing from one day to the next. What it does make easy to see is the difference between cumulative and non-cumulative statistics: cumulative counts (Cases, Deaths, Tests) can never go down, but a non-cumulative count (Hospitalizations) can decrease over time. For these reasons, I stopped updating this figure of cumulative counts as of the end of May. From the start of the pandemic through June, DPH reported statistics 7 days a week. Beginning in July, they no longer report statistics on weekends or holidays, so stats for those days are reported on the next business day. I think Figure 2 gives a clearer view of day-to-day change.
One way to visually emphasize day-to-day change in the numbers is to start with the DPH’s cumulative counts (and daily count, in the case of Hospitalizations), and subtract the previous day’s count from each number. I’m showing these day-to-day differences in Figure 2. These difference values can look a little ‘spiky’. That is mostly due to the fact that, starting in July, each Monday’s reports include counts for the preceding weekend as well. Think of it as a sort of catch-up day. In order to smooth that out a bit, I’ve taken seven-day running averages of the reported numbers and added them as dashed lines in Figure 2. Averaging in this way makes longer-term trends easier to spot. It’s important to keep in mind that Figure 1 and Figure 2 are based on the same underlying data. They just emphasize different aspects of it.
Another difference between the two graphs is that Figure 2 includes a new variable, Test Positivity. It is calculated as the number of Cases diagnosed on a given day (second panel of Figure 2), divided by the number of Tests Reported on that same day (top panel of the same figure). Test Positivity is thus the percentage of tests that came back positive on a given day.
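The two steps described above, differencing and a seven-day running average, can be sketched in a few lines of Python. This is an illustration with made-up numbers, not the R code used for Figure 2; the helper names are hypothetical, and using a trailing (rather than centered) window is an assumption.

```python
def day_to_day_change(series):
    """Subtract the previous day's value from each day's value.

    For a cumulative count this gives the number newly reported that day;
    for a daily census like Hospitalizations it gives the net change
    (which can be negative). The first day has no predecessor, so the
    result is one element shorter than the input.
    """
    return [today - yesterday for yesterday, today in zip(series, series[1:])]


def running_mean(values, window=7):
    """Trailing running mean; None until a full window is available."""
    return [None if i + 1 < window
            else sum(values[i + 1 - window : i + 1]) / window
            for i in range(len(values))]


cumulative_cases = [100, 107, 121, 121, 121, 156, 163, 170, 184]  # made up
new_cases = day_to_day_change(cumulative_cases)
print(new_cases)                        # [7, 14, 0, 0, 35, 7, 7, 14]
print(running_mean(new_cases))          # [None, None, None, None, None, None, 10.0, 11.0]
print(day_to_day_change([40, 45, 43]))  # hospitalizations: [5, -2]
```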
The most recent day’s values, both the reported value and the seven day average, are labeled on the right-hand side of Figure 2. These can be compared to numbers in the table labelled “Connecticut COVID-19 Summary” at the top of the Connecticut DPH page here. Keep in mind that the state page seems to be updated every weekday (but not weekends). My page, the one you’re looking at now, is updated less frequently. The date at the top of this page indicates when it was last updated.
It’s also interesting to look at how Connecticut is doing relative to other states. Figure 3 shows the growth in cumulative Covid-19 cases for all 50 states. Connecticut, shown in blue, was consistently between 9th and 12th place for number of cases in the early stage of the pandemic, well into June. So Connecticut was really outperforming, and not in a good way, considering it is only the 29th most populous state.
Starting in summer, Connecticut’s response to the pandemic had the effect of reducing spread of the virus, while at the same time COVID started spreading out of control in states that were less affected during the spring. Sadly, starting in November, COVID cases started increasing all across the country.
In the following figures, I try to show how the pandemic is affecting individual towns across Connecticut. Unfortunately, detailed statistics are not available for each town, so we have to rely mostly on Covid case counts, which the state does publish for each town.
All other things being equal, you would expect larger towns to have more cases than smaller towns, and that comes out pretty clearly in Figures 4 and 5, both across the state as a whole and within individual counties.
Figure 6 divides Connecticut’s towns up by size, grouping towns of similar size together. This lets us see how each town is doing relative to other towns of similar size. What’s more, instead of showing the raw number of cases for each town, it shows the number of cases per 10,000 population. This graph makes clear that larger towns do tend to have more cases than smaller towns, even accounting for differences in population.
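The per-population rate and the size grouping are both simple to compute. The Python sketch below is only illustrative: the size cut points, the town names, and all the numbers are made up, and the groupings actually used in Figure 6 may differ.

```python
from bisect import bisect_right

# Hypothetical size cut points; the groupings actually used in Figure 6 may differ.
SIZE_BREAKS = [5_000, 10_000, 25_000, 50_000]
SIZE_LABELS = ["under 5k", "5k-10k", "10k-25k", "25k-50k", "50k and up"]


def size_group(population):
    """Label a town by population band (each band includes its lower bound)."""
    return SIZE_LABELS[bisect_right(SIZE_BREAKS, population)]


def cases_per_10k(cases, population):
    """Cumulative cases per 10,000 residents."""
    return cases * 10_000 / population


# Made-up towns: name -> (cumulative cases, population)
towns = {"Town A": (1_800, 60_000), "Town B": (90, 4_500), "Town C": (375, 15_000)}
for name, (cases, pop) in towns.items():
    print(name, size_group(pop), cases_per_10k(cases, pop))
# Town A 50k and up 300.0
# Town B under 5k 200.0
# Town C 10k-25k 250.0
```

Dividing by population before comparing is what makes towns of different sizes comparable; grouping by size then compares each town with its peers.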
The map in Figure 7 shows the total cumulative number of COVID-19 cases for every town in Connecticut over the entire span of the pandemic to date. So the numbers are an indication of the total Covid-19 load in a town over the duration of the pandemic. The numbers shown for each town in the map correspond to the endpoints of the lines in Figures 4 and 5.
Figure 8 shows the average Covid-19 Test Positivity for the most recent 10-day period with available data. In contrast to the map in Figure 7, Figure 8 gives an indication of where the current Covid hotspots are in the state. Under ideal circumstances, Test Positivity is an estimate of the percentage of a town’s population that is infected with Covid-19 when the test sample is collected. In some circumstances, though, Test Positivity may not be a good estimate of prevalence. For example, if not enough testing is being done in an area, then test positivity will not give a reliable estimate of COVID-19 prevalence.
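There are at least two reasonable ways to compute an average positivity over a window: pool the counts (total positives divided by total tests), or average the daily percentages. The text doesn’t say which one Figure 8 uses. The Python sketch below, with hypothetical function names and made-up numbers, simply shows that the two can differ noticeably when testing volume varies from day to day. Pooling weights each day by how many tests were done, and it sidesteps dividing by zero on days with no tests.

```python
def pooled_positivity(cases, tests, days=10):
    """Total positives / total tests (%) over the most recent `days` entries."""
    c, t = sum(cases[-days:]), sum(tests[-days:])
    return 100 * c / t if t else None


def mean_daily_positivity(cases, tests, days=10):
    """Unweighted mean of daily positivity (%), skipping days with no tests."""
    daily = [100 * c / t for c, t in zip(cases[-days:], tests[-days:]) if t]
    return sum(daily) / len(daily) if daily else None


# Two made-up days: a small batch of tests, then a much larger one.
cases, tests = [1, 20], [10, 400]
print(mean_daily_positivity(cases, tests))        # 7.5
print(round(pooled_positivity(cases, tests), 2))  # 5.12
```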
Figure 8 is now interactive. Hover over the positivity number for any town to see more details. You can also zoom by clicking and dragging to select the area you want to zoom. Undo the zoom by double clicking anywhere on the map. None of this is very usable on small screens (phones!); it’s best viewed on a larger tablet or computer. (Tests/10k/day is the average number of tests performed in a town each day, for each 10,000 population.)
In Figure 9, you have a chart of the reported number of Covid-19 cases among Connecticut school children in the 2020-21 school year, as provided by the state. Numbers are broken down by ‘learning model’, that is, whether the reported cases indicate children engaged in remote learning, fully in-person learning, or some hybrid model.
These numbers are pretty much completely useless! The reason is that the state provides no baselines. We are not told how many students were tested in each week; we are not told how many students are in each learning model; we are not told the proportion of students in each learning model that were tested each week.
Because of these deficiencies, comparisons across groups are impossible, as are comparisons over time. It is impossible to draw any conclusions about differences in transmission rates between learning models, and it is impossible to draw any conclusions about change over time in Covid-19 prevalence among students. Don’t try to interpret these numbers. They are meaningless.
So, why does the lack of a baseline matter so much? In order to make comparisons across groups (or within the same group over time), we need to divide the number of positive cases by the number actually tested in each group (or within the same group at different points in time). Without accurate baseline/denominator values, there is no way to determine the percentage of a group that is affected by Covid, and no way to make valid comparisons across groups.
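A tiny worked example makes the point. With invented numbers (nothing here comes from state data, since the state did not publish these denominators), the group with more reported cases can turn out to have the lower positivity once each count is divided by the number tested. The group names and counts in this Python sketch are hypothetical.

```python
def percent_positive(positives, tested):
    """Share of those tested who tested positive, as a percentage."""
    return 100 * positives / tested


# Hypothetical numbers only; the state did not publish these denominators.
groups = {"remote": (40, 400), "in-person": (90, 3_000)}
for name, (positives, tested) in groups.items():
    print(f"{name}: {positives} cases, {percent_positive(positives, tested):.1f}% positive")
# remote: 40 cases, 10.0% positive
# in-person: 90 cases, 3.0% positive
```

Without the `tested` column, only the raw case counts would be visible, and they point in the opposite direction from the rates.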
There is a nice blog post by Erin Bromage, PhD in microbiology and immunology, detailing what we knew in early May about how person-to-person transmission of Covid-19 actually happens. It includes specific information on what the experts know about which situations are more risky, and which ones are less risky: https://www.erinbromage.com/post/the-risks-know-them-avoid-them. This is a great article for people who are worried about how to manage risk for themselves and their loved ones as governments start to scale back on Covid-19 restrictions on movement and business activities.
Bromage got it mostly right, but we have learned some things about COVID transmission since then. The table below is taken from a research summary of what is known about COVID transmission routes as of October 22, 2020. You can get the source document here.
Keeping people at home and cutting back on public activities, including some kinds of business activities, was always intended as a temporary measure. The point of it is to keep the healthcare system from being overloaded and to allow time to build capacity for managing the spread of Covid-19. The question has always been how and when to re-open. It was never about whether or not to re-open at all. So, how do we move beyond the emergency situation with some businesses closed and many people sheltering in place?
Several credible road maps have been proposed for how and when to re-open in a responsible way. Here are three of them:
What they all have in common are specific milestones to use for deciding when and how to phase in re-opening across different parts of the country, after the spread of Covid-19 has been brought under control.
On May 18th, Governor Lamont issued executive order 7PP, which introduced rules for a phased re-opening of activity within the state, by business sector. Those rules are available here: Sector Rules for Re-opening. As far as I can tell, the rules don’t differentiate at all by region, even though some parts of the state (e.g., some counties) are clearly more impacted by Covid-19 than others.
All summaries and analyses in this report were carried out using the R statistical environment, version 4.1.0. The report itself was produced using an Rmarkdown workflow. The specific code can be found on GitHub at github.com/davebraze/ct-covid19. The following table lists the R packages, beyond base R, used in building this report.
package | version | date |
---|---|---|
cowplot | 1.1.1 | 2020-12-30 |
dplyr | 1.0.9 | 2022-04-28 |
FDButils | 0.0.10 | 2022-01-29 |
forcats | 0.5.1 | 2021-01-27 |
fs | 1.5.2 | 2021-12-08 |
ggplot2 | 3.3.6 | 2022-05-03 |
ggpmisc | 0.4.7 | 2022-06-15 |
ggpp | 0.4.4 | 2022-04-10 |
ggrepel | 0.9.1 | 2021-01-15 |
here | 1.0.1 | 2020-12-13 |
htmlwidgets | 1.5.4 | 2021-09-08 |
httr | 1.4.3 | 2022-05-04 |
kableExtra | 1.3.4 | 2021-02-20 |
lubridate | 1.8.0 | 2021-10-07 |
plotly | 4.10.0 | 2021-10-09 |
RCurl | 1.98.1.7 | 2022-06-09 |
readr | 2.1.2 | 2022-01-30 |
RSocrata | 1.7.11.2 | 2021-09-14 |
sf | 1.0.7 | 2022-03-07 |
stringr | 1.4.0 | 2019-02-10 |
tabulizer | 0.2.2 | 2018-06-07 |
wordstonumbers | 1.0.1 | 2020-11-13 |
XML | 3.99.0.10 | 2022-06-09 |