Carnegie Mellon University


November 26, 2018

Strength training deep neural networks

By Lucas Grasha

Krista Burns

Google Translate and Tesla’s Autopilot have one thing in common: deep neural networks. Deep neural networks (DNNs) have grown popular alongside data-driven technological advances because they can analyze and process enormous amounts of input. In Tesla’s Autopilot, a DNN analyzes visual and radar data to determine the vehicle’s position. DNN-driven medical screening equipment could one day infer patients’ illnesses from their symptoms to aid clinicians.

These systems are often swift and efficient. But because a DNN processes so much data, it is also prone to errors and slowdowns. To address these issues, a team led by Pulkit Grover created PolyDot coding, a technique for training DNNs more efficiently.

Sanghamitra Dutta, a researcher on Grover’s team, notes that DNNs must compute over enormous data sets to generate solutions. “The greatest computation hurdle is dealing with faults, which slow down the training,” Dutta says.

PolyDot addresses several shortcomings of DNNs, such as high energy use and poor fault management, by spreading the network’s computations across many processors. The math behind a DNN involves matrices and vectors, which hold and process the network’s data. Because DNNs require huge data sets to work, multiplying matrices and vectors that each contain hundreds of data points understandably slows a computer down.
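To make the splitting concrete, here is a minimal Python sketch of a matrix-vector product divided into smaller row-block tasks. It is a simplified illustration under stated assumptions, not the team’s PolyDot code: the matrix is split by rows into equal blocks, the “processors” are simulated as ordinary function calls, and the helper names `matvec`, `split_rows` and `distributed_matvec` are hypothetical.

```python
# Hypothetical helpers: a toy, single-machine simulation of splitting A @ x
# into smaller tasks. Not the PolyDot implementation.

def matvec(A, x):
    """Plain matrix-vector product; A is a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def split_rows(A, num_blocks):
    """Split A into num_blocks equal, contiguous row blocks (sub-matrices)."""
    if len(A) % num_blocks != 0:
        raise ValueError("number of rows must be divisible by num_blocks")
    size = len(A) // num_blocks
    return [A[i * size:(i + 1) * size] for i in range(num_blocks)]


def distributed_matvec(A, x, num_workers):
    """Give each simulated worker one sub-matrix, then stitch the pieces together."""
    # Each worker multiplies only its own sub-matrix by x: a smaller task.
    partial_results = [matvec(block, x) for block in split_rows(A, num_workers)]
    # Row blocks are disjoint, so the full result is the concatenation of the parts.
    return [value for part in partial_results for value in part]


A = [[1, 2], [3, 4], [5, 6], [7, 8]]
x = [2, 1]
print(distributed_matvec(A, x, num_workers=2))  # [4, 10, 16, 22]
```

In this plain version every worker’s piece is still needed, so one slow or faulty processor holds up the whole result; that is the weakness coding is meant to address.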

“The key innovation in PolyDot,” Dutta said, “is the splitting of matrices and vectors into sub-matrices, which lets processors perform smaller tasks and lets the computer ignore straggling processors that are faulty anyway.”
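Ignoring stragglers works because the sub-tasks are coded: extra, redundant tasks are added so that the answer can be rebuilt from whichever workers finish first. The Python sketch below shows the simplest textbook version of this idea, with three workers and a single “parity” block, so that any two of the three results suffice. It is an illustration under our own assumptions (a two-way row split, integer data, hypothetical helper names `encode` and `decode`), not PolyDot itself, whose encoding is polynomial-based and splits matrices and vectors in a more elaborate way.

```python
# A minimal (3, 2) coded matrix-vector product: 2 data blocks + 1 parity block.
# Hypothetical helper names; a toy illustration, not the PolyDot scheme.

def matvec(A, x):
    """Plain matrix-vector product; A is a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def encode(A):
    """Split A into row blocks A1, A2 and add a parity block A1 + A2 (one task per worker)."""
    if len(A) % 2 != 0:
        raise ValueError("this toy code needs an even number of rows")
    half = len(A) // 2
    A1, A2 = A[:half], A[half:]
    parity = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(A1, A2)]
    return [A1, A2, parity]


def decode(results):
    """Rebuild A @ x from any 2 of the 3 worker outputs.

    `results` maps worker index (0, 1 or 2) to that worker's output vector;
    a worker missing from the dict is a straggler and is simply ignored.
    """
    if 0 in results and 1 in results:
        y1, y2 = results[0], results[1]
    elif 0 in results and 2 in results:
        y1 = results[0]
        y2 = [p - a for p, a in zip(results[2], y1)]  # (A1 + A2)x - A1 x = A2 x
    elif 1 in results and 2 in results:
        y2 = results[1]
        y1 = [p - b for p, b in zip(results[2], y2)]  # (A1 + A2)x - A2 x = A1 x
    else:
        raise ValueError("need outputs from at least 2 of the 3 workers")
    return y1 + y2  # list concatenation: stack the two halves of A @ x


A = [[1, 2], [3, 4], [5, 6], [7, 8]]
x = [2, 1]
outputs = {i: matvec(task, x) for i, task in enumerate(encode(A))}
del outputs[1]  # pretend worker 1 is a straggler and never reports back
print(decode(outputs))  # [4, 10, 16, 22]
```

The price is one extra worker’s worth of computation; the payoff is that the slowest worker no longer sets the pace.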

A digital take on divide-and-conquer, PolyDot was developed jointly by Dutta and Haewon Jeong, another researcher on Grover’s team, in collaboration with Penn State University researcher Viveck Cadambe and his team. It combines deep ideas from Shannon’s information theory with an understanding of how DNNs are trained.

“The novel, resilient, and fast DNN training architecture was enabled by long discussions with Tze Meng Low, who is a collaborator on this line of work,” Grover said, adding, “this is also a great example of how CMU researchers come together to do something that is more than the sum of its parts: a supercomputing expert, Low, collaborating with our information theory and statistical inference team.”

Applying coding to DNN training, though, is tricky. Conventionally, the weight matrices are re-encoded with every data update, which strains computational power. The team sped up training by keeping the matrices encoded across updates and by using information theory to ignore any processors that are slow or faulty.
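Why does ignoring slow processors speed things up? A rough way to see it: without coding, the result is ready only when the slowest worker finishes; with a code in which any k of n outputs suffice, it is ready when the k-th fastest worker finishes. The Python sketch below compares the two for a made-up set of finishing times. The times and the function names are hypothetical, and the comparison ignores the fact that each coded task is somewhat larger, because the work is split k ways rather than n ways.

```python
# Hypothetical finishing-time comparison; the numbers are invented for illustration.

def finish_time_uncoded(times):
    """Uncoded: every worker's piece is needed, so wait for the slowest worker."""
    return max(times)


def finish_time_coded(times, k):
    """(n, k)-coded: any k of the n outputs suffice, so wait for the k-th fastest."""
    if not 1 <= k <= len(times):
        raise ValueError("k must be between 1 and the number of workers")
    return sorted(times)[k - 1]


# Finishing times in seconds for 4 workers; worker 2 is a straggler.
times = [1.0, 1.2, 9.5, 1.1]
print(finish_time_uncoded(times))     # 9.5
print(finish_time_coded(times, k=3))  # 1.2
```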

“This innovation in information theory to best enable modern neural networks is yet another demonstration of how this beautiful set of ideas can influence modern computational tasks,” Grover said.

Another challenge is managing the layers in a network. Each layer processes the data at a different level of detail, as when an autonomous vehicle interprets its surroundings: the input layers take in the raw scene, while middle layers pick out specific parts of the general picture, such as curbs or signs. Each layer is handled by processors and has its own weights, which determine how much each piece of incoming data matters.
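In code, each layer boils down to a matrix-vector multiplication by that layer’s weight matrix, followed by a simple nonlinearity, with the output of one layer feeding the next. The tiny Python sketch below shows that structure using made-up weights and a ReLU nonlinearity (an assumption; real networks vary). These per-layer multiplications are the kind of work a coded scheme spreads across processors. The names `dense_layer` and `forward` are hypothetical.

```python
# A toy two-layer forward pass with invented weights; hypothetical helper names.

def dense_layer(W, b, x):
    """One layer: weighted sum of inputs (W @ x + b) followed by a ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + bias)
            for row, bias in zip(W, b)]


def forward(layers, x):
    """Pass x through each (W, b) layer in order; each output feeds the next layer."""
    for W, b in layers:
        x = dense_layer(W, b, x)
    return x


layers = [
    ([[0.5, -1.0], [1.0, 1.0]], [0.0, 0.0]),  # layer 1: 2 inputs -> 2 outputs
    ([[2.0, 1.0]], [0.5]),                    # layer 2: 2 inputs -> 1 output
]
print(forward(layers, [1.0, 2.0]))  # [3.5]
```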

“But matrices encode their weights afresh with every data update, which costs computational power,” Dutta said. “If matrices’ weights remain encoded throughout updates, we encode far fewer data points to make a DNN work. In the end, this makes training DNNs more feasible.”
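Keeping the weights encoded can work because the code is linear. For a fully connected layer, a gradient step changes the weight matrix by a rank-one term, W - lr * outer(delta, x), where delta is the layer’s error vector and x its input. With a simple sum-parity code (rows split into halves W1 and W2, plus a parity block W1 + W2), the parity block can be updated using delta1 + delta2 instead of re-encoding the whole matrix, so only a short vector is encoded per update. The Python sketch below checks this on a toy example. It assumes a two-way row split and a plain gradient step, uses hypothetical helper names, and is not the team’s implementation.

```python
# Toy check that a sum-parity-encoded weight matrix can be updated directly,
# without re-encoding it. Hypothetical helper names; not the team's code.

def encode_matrix(W):
    """Row-split W into halves W1, W2 and append the parity block W1 + W2.

    Touches every entry of W, so doing this on every update is expensive.
    """
    half = len(W) // 2
    W1, W2 = W[:half], W[half:]
    parity = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(W1, W2)]
    return [W1, W2, parity]


def encode_vector(v):
    """Apply the same code to a vector: halves v1, v2 plus v1 + v2 (cheap)."""
    half = len(v) // 2
    v1, v2 = v[:half], v[half:]
    return [v1, v2, [a + b for a, b in zip(v1, v2)]]


def gradient_step(block, delta, x, lr):
    """Return block - lr * outer(delta, x) for one worker's block of rows."""
    return [[w - lr * d * xi for w, xi in zip(row, x)]
            for row, d in zip(block, delta)]


def update_encoded(encoded_W, delta, x, lr):
    """Update every encoded block directly; only the error vector delta is encoded."""
    return [gradient_step(block, d, x, lr)
            for block, d in zip(encoded_W, encode_vector(delta))]


W = [[1, 0], [0, 1], [2, 2], [3, 1]]
delta = [1, 0, -1, 2]   # made-up error vector for this layer
x = [1, 2]              # made-up layer input
lr = 0.5

fresh = encode_matrix(gradient_step(W, delta, x, lr))  # update, then re-encode
kept = update_encoded(encode_matrix(W), delta, x, lr)  # encode once, keep updating
print(fresh == kept)  # True
```

Encoding the matrix costs work proportional to all of its entries, while encoding delta costs work proportional only to its length; that is one way to read Dutta’s point that far fewer data points need encoding.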

Dutta mentioned another source of slowdown: DNNs require each layer to fully process its data before the next layer can begin its work. PolyDot instead frees processors to crunch data without waiting for the preceding layer to finish completely.

A few questions posed by mathematician John von Neumann motivated Dutta’s research. Von Neumann asked: How can the human mind compute information amid irrelevant stimuli? And if brains and DNNs are analogous, how does the brain efficiently ignore errors?

“It’s hypothesized that, for the brain, it may be more efficient to accept ‘processor-level’ errors,” Dutta says. That hypothesis underlies PolyDot’s choice to ignore processors stymied by errors and complete the computation without them. PolyDot could ultimately improve data processing in DNNs and advance the networks’ original goal: to mimic the architecture and efficiency of the human mind.

The work was supported by a grant from the NSF WiFiUS program and an NSF CAREER award. Dutta and Jeong are the lead researchers on the project. Collaborators include Tze Meng Low, Viveck Cadambe, and Jeremy Bai, an undergraduate researcher from the Chinese University of Hong Kong who was visiting Grover’s lab.