AWS Public Sector Blog
Brain Data Science Platform increases EEG accessibility with open data and research enabled by AWS
Introduction
About 4.5 million electroencephalogram (EEG) tests are performed in the US each year. That’s more than if every person in Oregon, Connecticut, or Iowa got an EEG. Whereas magnetic resonance imaging (MRI) scans use magnetic fields and radio waves to generate images of the structure of the brain, EEGs use wires placed on the scalp to record brain function, as seen through the electrical activity that neurons generate when they send signals to each other. The Brain Data Science Platform (BDSP), hosted on Amazon Web Services (AWS), is increasing EEG accessibility through cooperative data sharing and research enabled by the cloud.
Because they provide insights into brain activity and not just structure, EEGs are one of the most common tests ordered by doctors to help make a diagnosis for people with brain problems. This includes seizures and epilepsy, coma, stroke, developmental delays in children, and sleep disorders. However, in current practice, EEGs are not always part of a diagnostic plan, even when they could provide important information. Experts trained in EEG are in short supply, and methods to automatically interpret EEGs are not yet advanced enough to fill this gap. For these reasons, relatively few diagnostic plans include EEGs. The cloud helps by facilitating data sharing and research innovation, making EEGs available for more patients’ medical care plans.
How EEGs work
Brain cells talk to each other using tiny electrical impulses and are constantly active, even during sleep. This activity is measured by placing small metal discs (electrodes), held in place by tape or glue, at different locations on the scalp. These electrodes detect the tiny voltage fluctuations from the activity of millions of neurons in the brain. The electrodes are connected to an amplifier, which magnifies the signals they pick up. Because the acquired signals are weak, they are susceptible to various types of interference, including noise from the environment or from other non-brain biological sources. The amplifier therefore performs signal conditioning to filter out unwanted noise and artifacts while preserving the relevant brain activity. This can involve processes such as amplification, filtering, and isolation.
Once conditioned, signals are converted from analog to digital format using an analog-to-digital converter (ADC). The ADC samples the analog signals at a specific rate and converts them into a digital representation that can be processed and analyzed by a computer. The digitized EEG signals can undergo further processing, like additional filtering, artifact removal, and feature extraction. Various algorithms and techniques can be applied to extract meaningful information from the EEG signals, depending on the specific analysis or diagnostic purpose. For example, algorithms can detect EEG activity that is normal for a given age, as shown in Figure 1.
Algorithms can also detect harmful brain activity like a seizure, as shown in Figure 2.
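The analog-to-digital step described above can be sketched in a few lines of Python. This is only an illustration, not the BDSP team's acquisition pipeline: the 256 Hz sampling rate, 16-bit resolution, ±100 microvolt input range, and the synthetic 10 Hz sine wave standing in for a conditioned EEG signal are all assumptions chosen for the example.

```python
import math

def sample_and_quantize(signal, duration_s, sample_rate_hz, n_bits, v_min, v_max):
    """Sample a continuous-time signal and map each sample to an integer code."""
    levels = 2 ** n_bits
    step = (v_max - v_min) / (levels - 1)      # volts represented by one code step
    n_samples = int(duration_s * sample_rate_hz)
    codes = []
    for n in range(n_samples):
        t = n / sample_rate_hz                 # time of the n-th sample
        v = min(max(signal(t), v_min), v_max)  # clip to the converter's input range
        codes.append(round((v - v_min) / step))
    return codes, step

def analog_alpha(t):
    # Hypothetical stand-in for a conditioned EEG signal: 10 Hz, 50 microvolts.
    return 50e-6 * math.sin(2 * math.pi * 10 * t)

# Assumed settings: 1 second at 256 Hz, 16 bits, +/-100 microvolt range.
codes, step = sample_and_quantize(analog_alpha, 1.0, 256, 16, -100e-6, 100e-6)
print(len(codes), "samples; one code step is", step, "volts")
```

A higher sampling rate captures faster activity, and more bits give finer voltage resolution, at the cost of more data to store and process.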
The processed EEG signals are stored for later review or analyzed in real time. The data can be viewed as waveforms or examined through spectral analysis, event-related potential analysis, or machine learning (ML) algorithms to detect abnormalities or patterns of interest.
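As an illustration of the spectral analysis mentioned above, the following sketch estimates how much of a digitized signal's power falls into a few conventional EEG frequency bands. It uses a naive discrete Fourier transform from the Python standard library for clarity. The band edges are common approximations that vary between sources, and the synthetic test signal, the function names, and the 256 Hz sampling rate are assumptions for the example, not part of BDSP.

```python
import cmath
import math

# Conventional (approximate) EEG bands in Hz; exact edges differ between sources.
BANDS = {"delta": (0.5, 4.0), "theta": (4.0, 8.0), "alpha": (8.0, 13.0), "beta": (13.0, 30.0)}

def power_spectrum(samples, sample_rate_hz):
    """Return (frequencies, relative power) for the non-negative DFT frequencies."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]     # remove the DC offset
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(centered))
        freqs.append(k * sample_rate_hz / n)
        power.append(abs(coeff) ** 2 / n)      # unnormalized, for comparison only
    return freqs, power

def band_powers(samples, sample_rate_hz):
    """Sum the spectral power that falls inside each band."""
    freqs, power = power_spectrum(samples, sample_rate_hz)
    return {name: sum(p for f, p in zip(freqs, power) if lo <= f < hi)
            for name, (lo, hi) in BANDS.items()}

# Synthetic 2-second signal: a strong 10 Hz rhythm plus a weaker 2 Hz component.
rate = 256
signal = [math.sin(2 * math.pi * 10 * i / rate) + 0.3 * math.sin(2 * math.pi * 2 * i / rate)
          for i in range(2 * rate)]
powers = band_powers(signal, rate)
dominant = max(powers, key=powers.get)
print(dominant, powers)
```

In practice, a fast Fourier transform from a numerical library would replace the naive loop, and real recordings would need artifact removal first.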
Most people who need EEGs cannot get them
In the US, about 75 percent of EEGs are interpreted by neurologists without expertise in EEG interpretation. This can lead to mistakes in tricky cases, such as when an EEG looks abnormal but is not. As a result, some patients who actually have a heart problem are misdiagnosed with epilepsy. In many parts of the world, patients are unable to get an EEG because the doctors available are not trained to interpret EEGs. People who have sleep problems face similar challenges in getting a diagnosis. Doctors can order a sleep test (which includes recording EEG and other signals overnight while the patient sleeps) to help diagnose sleep problems. However, getting the sleep test done can take a long time or may not be possible because there is a shortage of sleep specialists.
Automating clinical neurophysiology test interpretation
A research team at Harvard Medical School, headed by Drs. Brandon Westover and Robert Thomas at Beth Israel Deaconess Medical Center and Drs. Valdery Moura Junior and Sahar Zafar at Massachusetts General Hospital, aims to make EEGs more easily accessible by using artificial intelligence (AI) to automate medical diagnosis based on EEG. They are joined in this effort by other scientists from several institutions.[i]
The team is working to automate EEG and sleep testing interpretation by developing the Brain Data Science Platform (BDSP), the world’s largest and most diverse set of EEG and sleep testing data. Using this data, the team is constructing algorithms that diagnose sleep disorders, detect seizures and other forms of harmful brain activity in hospitalized patients who are critically ill, predict the risk of future seizures, and calculate the probability that a patient with coma due to brain damage will be able to recover consciousness. To be useful in the real world, these algorithms need to cope with EEG patterns from all people, regardless of age, gender, race, and ethnicity, and across a vast number of different health conditions. Thus, the algorithms that underlie automated EEG and sleep test interpretation must be well-trained and well-tested so that the resulting diagnoses are at least as reliable as those currently obtained when EEGs are interpreted by human experts with specialty medical training in EEG and sleep test interpretation.
Decoding health information from brain activity during sleep
Beyond the diagnostic information currently available from EEGs, the team believes that there is hidden information, especially during sleep, that reveals insights into the health of the brain and that even experts cannot see. The understanding is that each of the different stages of sleep (rapid-eye movement (REM) sleep, and the light and deep stages of non-REM sleep) has certain patterns that are normal for a given age and gender. Divergence from these norms can indicate health that is better or worse than normal.
The team is developing AI algorithms that use sleep signals to detect early signs of diseases like Parkinson’s disease and Alzheimer’s disease. Earlier detection allows treatments to be given sooner, when they can be more effective. The team has already found that information from sleep can predict life expectancy. Finally, one member of the research team, Dr. Haoqi Sun, has developed a way to measure “brain age,” as distinct from chronologic age. This validates the concept that someone who is 80 can have a brain that functions like that of someone 20 years younger. Accelerated brain aging (brain age older than chronologic age) is linked to a variety of brain health problems, including declining cognitive functioning and diseases like Alzheimer’s. The team believes that the ability to measure brain age, and similar types of hidden health information in sleep, may enable doctors to treat diseases more efficiently and effectively while providing more direct ways to measure the effects of those treatments on brain health.
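The comparison between brain age and chronologic age comes down to a simple difference, sketched below. The function name and the example numbers are hypothetical and used only for illustration. How the brain age estimate itself is derived from sleep EEG is not covered here.

```python
def brain_age_gap(brain_age_years, chronologic_age_years):
    """Positive: the brain appears older than the person; negative: younger."""
    return brain_age_years - chronologic_age_years

# The example from the text: an 80-year-old whose brain functions
# like that of someone 20 years younger.
gap = brain_age_gap(60, 80)
print(gap)

# A hypothetical case of accelerated brain aging: brain age above chronologic age.
accelerated = brain_age_gap(72, 65) > 0
```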
The world’s largest collection of EEGs
The team has assembled a massive collection of EEG and sleep data – currently more than 200,000 EEG recordings and 26,000-plus sleep tests. These span all medical settings where EEGs and sleep tests are performed, including outpatient neurology clinics, epilepsy centers, sleep centers, and home settings where data is collected using wearable consumer devices. To make the data more valuable for research, additional “metadata” is being collected as well. This includes diagnoses, medications, laboratory testing results, and brain imaging, including head computed tomography (CT) scans and brain MRI images.
What’s next: Generalizing EEG AI models, making the data available to clinicians and researchers everywhere, and growing the dataset
Dr. Westover intends for any clinic or hospital in the world to benefit from the models that BDSP researchers develop. The benefits would not be limited to clinical groups with access to powerful on-premises servers and clusters.
“We intend to offer the machine learning EEG interpretation models as an online service, where sites can upload their EEG data and get their results back within seconds,” says Dr. Westover. “If we want to engage clinical sites without such a level of internal IT infrastructure and bandwidth, we need to offer them a simple way to access our developments. This is where the cloud is going to be crucial.”
Dr. Westover is collaborating with AWS to increase access to brain data and support researchers focusing on brain health. Twelve other hospitals have already committed to adding their data to BDSP on AWS. Their contributions further increase the possibilities for new discoveries by making it possible to study rare diseases, which are typically not seen often enough at any single hospital to allow rigorous research. Dr. Westover believes BDSP will transform the field of brain health, paving the way for more personalized and precise ways to diagnose, treat, and prevent neurological disease.
“These datasets are going to let us launch the field of precision brain health,” said Dr. Westover.
BDSP: Brain datasets now available for research
With support from the AWS Open Data Sponsorship Program, the BDSP datasets are now openly available at no cost to researchers around the world. The BDSP dataset is one of the largest collections of brain data in existence.
Learn more
- Ready to get started exploring the BDSP EEG and Sleep datasets? Complete the required training and accept the data use agreement for the project. Supporting information is available at the Registry of Open Data on AWS.
- Discover how to access open data and share your open data on AWS with the Registry of Open Data on AWS and the Open Data Sponsorship Program.
- Learn more about how AWS supports researchers with the AWS Cloud Credit for Research program.
[i] Collaborators include Junior Moura, PhD, Umakanth Katwa, MD, Wolfgang Ganglberger, PhD, Thijs Nassi, MSc, Erik-Jan Meulenbrugge, MSc, Yalda Amidi, PhD, Jin Jing, PhD, Haoqi Sun, PhD, Mouhsin Shafi, MD, PhD, Daniel Goldenholz, MD, PhD, Arjun Singh, MD, Sahar Zafar, MD, Shibani Mukerji, MD, PhD, Jurriaan Peters, MD, and Tobias Loddenkemper, MD at Harvard Medical School; Aaron Struck, MD at University of Wisconsin; Jennifer Kim, MD, PhD at Yale University; Emmanuel Mignot, MD, PhD and Chris Lee-Messer, MD, PhD at Stanford University; Gari Clifford, PhD, Samaneh Nasiri, PhD, and Lynne Marie Trotti, MD at Emory University; and Dennis Hwang, MD at Kaiser Permanente.