Carnegie Mellon University

March 20, 2018

Using machine learning to understand microbial relationships

Krista Burns

The ecosystem in and around the Amazon River is the most biodiverse in the world. But it has some competition from the roughly thirty feet of the human gastrointestinal (GI) tract. This microbiome, the sum total of microorganisms in a particular environment, has recently been the research focus of Carnegie Mellon Electrical and Computer Engineering (ECE) Professor Radu Marculescu.

“It turns out, the interactions that happen in the human [GI] microbiome have far more implications than we originally thought,” says Marculescu. “People associate changes in the microbiome to depression, infections, even cancer, so it’s sort of like a second brain for humans.”

Marculescu, along with ECE Ph.D. student Chieh Lo, has developed a machine learning algorithm — called MPLasso — that uses data to infer associations and interactions between microbes in the GI microbiome. MPLasso mines medical and scientific literature from the past few decades in search of experimental data from research focused on various types of microbial interactions and associations. MPLasso pulls this disparate information into a centralized dataset that catalogs microbial interactions within the human GI tract.

Machine learning is a novel approach for this type of investigation. Marculescu’s CMU-based System Level Design Group, which commits time to cyber-physical systems research, seemed like the right venue in which to tackle such a project. In doing so, he found a way to provide medical researchers and professionals with a catalog of inferred microbial interactions that can bolster the understanding of how those interactions influence and impact human health.

Until now, it’s been challenging to get a good look at how microorganisms interact in the human GI tract. Marculescu knows it will still be years before advanced technologies like engineered ingestible pills and bacteria are ready for mainstream adoption, but he sees MPLasso as a major step in helping researchers better understand how the microorganisms in the human GI tract co-exist.

Marculescu says this type of information is extremely valuable for preventive medicine because it lays the groundwork for uncovering how microbial interactions translate into a person being healthy or sick. If researchers first understand what microbes are present and how they behave together, they can then start establishing cause-and-effect relationships between microbial interactions and various types of ailments.

“Researchers also observe real experimentation. They observe microbial presence at, and interactions during, various events in the body,” says Marculescu. “Based on this, one can infer a network of interactions that is predictive in nature.”

MPLasso has been shown to be 95 percent accurate in the associations and interactions it infers, in part because it addresses the high dimensionality and compositionality of human microbiome data. High dimensionality means that the number of potential microbial associations and interactions is far larger than the number of samples available in any given library of data. Compositionality means that the data report each microbe's abundance as a percentage of a whole rather than as an exact measurement.
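To make these two properties concrete, here is a minimal Python sketch. It is an illustration only and is not taken from MPLasso: the read counts are invented, and the centered log-ratio (CLR) transform shown is a common way of handling compositional data in general, not necessarily the step MPLasso uses.

```python
import math

# Hypothetical read counts: 3 samples (rows) x 4 microbial taxa (columns).
# Sample 1 has exactly twice the counts of sample 2.
counts = [
    [120, 30, 40, 10],
    [60, 15, 20, 5],
    [10, 50, 30, 10],
]


def to_proportions(row):
    """Convert raw counts to fractions of the sample total."""
    total = sum(row)
    return [c / total for c in row]


def clr(proportions):
    """Centered log-ratio: log of each part relative to the geometric mean."""
    logs = [math.log(p) for p in proportions]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def n_pairs(n_taxa):
    """Number of distinct taxon pairs, i.e. potential pairwise associations."""
    return n_taxa * (n_taxa - 1) // 2


proportions = [to_proportions(row) for row in counts]
clr_values = [clr(p) for p in proportions]

# Compositionality: samples 1 and 2 differ in absolute counts,
# yet their proportions are identical, so the absolute scale is lost.
print(proportions[0])
print(proportions[1])

# High dimensionality: the number of candidate pairs grows quadratically
# with the number of taxa and quickly exceeds the number of samples.
print(len(counts), "samples;", n_pairs(len(counts[0])), "candidate pairs")
print("100 taxa give", n_pairs(100), "candidate pairs")
```

With only four taxa there are already more candidate pairs than samples, and the gap widens quickly as taxa are added, which is why an inference method needs some form of regularization or prior knowledge to narrow down the candidates.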

Marculescu and Lo have made MPLasso publicly available through GitHub. Researchers who download it for their own use can also upload their own data to the platform. MPLasso offers a user-friendly interface to a database that updates continuously as it mines newly uploaded and relevant data.

“You’ll see the difference between yesterday and a few months ago because the algorithm is automatically collecting and improving your model,” says Marculescu. “Your model today is better than the one two weeks ago or two months ago simply because if something has been reported that aligns with what you’re looking at, the associations become that much stronger.”

The human GI tract is host to what Marculescu calls “gazillions of potential problems for the human body.” Since its biodiversity rivals that of the Amazon, knowing how all those living things in our GI tract interact and affect us would be enormously useful. Marculescu’s work will help make that possible.

Read Marculescu and Lo’s research paper, recently published in PLOS Computational Biology.