McMaster research team digitizes more than 100 years of Canadian infectious disease data

The new, publicly accessible database can be used to study the patterns of disease incidence and strengthen public health preparedness.

By Blake Dillon, December 17, 2025



Twenty-five years ago, in a neglected storage area at the Ontario Ministry of Health, David Earn happened upon epidemiological gold: two boxes of hand-written documents accounting for 50 years of weekly infectious disease incidence reports, spanning 1939-1989.  

The buried treasure was exactly the sort of thing that the McMaster University professor hoped to unearth during his visit — historical public health data that could help contextualize current and future infectious disease outbreaks.  

“Initially, the Ministry said that they couldn’t provide the data — that they didn’t have the time to search through their archives for us,” recalls Earn, a professor in McMaster’s Department of Mathematics and Statistics. “So, I offered to come to Toronto and look through their files myself, if they would let me. I basically begged, insisting on the value of the historical records, and I wouldn’t let it go. Eventually, I guess I became too much of a nuisance and they relented.”   


The documents uncovered that day catalyzed a massive retrospective research project that has culminated in a complete, province-by-province inventory of Canadian infectious disease records.    

The result, published today in PLOS Global Public Health, is what Earn describes as a “genuinely beautiful dataset” that strings together more than 100 years of historical epidemiological information.

Altogether, the new database — the Canadian Notifiable Disease Incidence Dataset, or “CANDID” — contains more than a million infectious disease incidence counts that date back as far as 1903.   

The dataset, which is now publicly accessible, captures weekly, monthly, and quarterly case numbers for diseases like poliomyelitis, hepatitis, tuberculosis, whooping cough, influenza, rubella, mumps, measles, and many others, and tracks their spread in each province and territory across time.   


“Data like these reveal the speed and shape of outbreaks and recurrent epidemics of the past, and allow us to test models that predict patterns of spread,” Earn says. “This new dataset can be leveraged to understand the ecology and evolution of infectious disease across Canada’s history, and to help us prepare for emerging and re-emerging diseases in the future.”  

In fact, Earn’s team has already used the database to better understand the spatial and temporal incidence of polio and whooping cough across several decades of Canadian history.  

While the new study was 25 years in the making, Earn says it really accelerated in 2021, when a large pandemic-related NSERC network grant allowed him to recruit Steven Walker, a former McMaster postdoctoral fellow, to his team.  

Walker, who rejoined McMaster as a data scientist in Earn’s group, was tasked with curating, cleaning, and harmonizing the troves of data that Earn and his associates had previously unearthed from libraries, public health offices, and provincial and federal agencies across Canada.

“We would start with scans of handwritten or typewritten documents and manually transcribe them into Microsoft Excel to ensure that we had functional replicas of every original document,” Walker explains. “But the replicas aren’t conducive to data analysis, due to inconsistent formatting, so we’ve also been developing flexible data structures that are more convenient for analysis and discovery.” 

Earn, a member of the Michael G. DeGroote Institute for Infectious Disease Research, hopes that the new dataset, and the herculean effort to assemble it, will help spur important changes to Canada’s current infectious disease reporting standards. He notes that the public release of infectious disease data is arguably more limited now than at any point during the 20th century, including the pre-digital era.


Today, the Public Health Agency of Canada issues only annual, nationally aggregated incidence counts, not weekly or regional information, which limits opportunities for important studies of epidemic patterns, seasonal effects, and geographic variation.

Earn says that the reduced resolution of today’s data is due in large part to patient privacy protection, a critically important consideration, but one that he believes can be maintained even with increased sharing of useful data.

“It is extremely important to protect patient privacy, and our federal, provincial, and territorial agencies have developed protocols for data release that aim to ensure privacy is protected,” he says. “But there is no individual-level information in aggregate counts of infectious disease cases, and no identifying information can be extracted from these data. I think that current data release protocols should be thoughtfully and carefully reconsidered, so that they still prioritize privacy, but also allow for the release of more useful information, which could help us to prepare for future outbreaks — to the benefit of all Canadians.” 

In the meantime, Earn’s group encourages epidemiologists in Canada and elsewhere to use CANDID to study the patterns of disease incidence, to learn from historical surveillance efforts, and to strengthen public health preparedness.
