Artificial Intelligence Used to Classify Real Galaxies in Hubble Images

Apr 25 2018

A machine learning technique known as "deep learning," which has been widely used in face recognition and other image- and speech-recognition applications, has shown promise in helping astronomers analyze images of galaxies and gain insight into how they form and evolve.

Image credit: University of California, Santa Cruz

In a new study, accepted for publication in the Astrophysical Journal and available online, scientists used computer simulations of galaxy formation to train a deep learning algorithm, which then proved remarkably good at analyzing images of galaxies from the Hubble Space Telescope.

The scientists used output from the simulations to produce mock images of simulated galaxies as they would appear in observations by the Hubble Space Telescope. The mock images were used to train the deep learning system to recognize three main phases of galaxy evolution previously identified in the simulations. The team then gave the system a large set of real Hubble images to classify.
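The workflow described above (train on labeled simulated data, then classify observed data) can be sketched in a few lines of Python. This is only a toy illustration, not the team's method: it swaps the deep neural network for a simple softmax classifier, and it replaces images with two hypothetical summary features (central concentration and blueness). The phase labels, cluster centers, noise levels, and function names are all assumptions made for the example.

```python
import math
import random

# Hypothetical labels for the three phases (names are an assumption).
PHASES = ["pre-nugget", "blue nugget", "post-nugget"]
# Hypothetical cluster centers in (central concentration, blueness), scaled to [-1, 1].
CENTERS = [(-0.7, 0.7), (0.7, 0.7), (0.7, -0.7)]


def make_sample(rng, n_per_class, noise, offset=0.0):
    """Draw toy galaxies as (features, phase index) pairs."""
    data = []
    for label, (cx, cy) in enumerate(CENTERS):
        for _ in range(n_per_class):
            x = (cx + offset + rng.gauss(0, noise),
                 cy + offset + rng.gauss(0, noise))
            data.append((x, label))
    return data


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(weights, x):
    feats = (x[0], x[1], 1.0)  # the constant 1.0 acts as the bias input
    return softmax([sum(w * f for w, f in zip(row, feats)) for row in weights])


def train(data, epochs=200, lr=0.5):
    """Full-batch gradient descent on the softmax cross-entropy loss."""
    weights = [[0.0, 0.0, 0.0] for _ in PHASES]
    for _ in range(epochs):
        grad = [[0.0, 0.0, 0.0] for _ in PHASES]
        for x, label in data:
            probs = predict_proba(weights, x)
            feats = (x[0], x[1], 1.0)
            for k in range(len(PHASES)):
                err = probs[k] - (1.0 if k == label else 0.0)
                for j in range(3):
                    grad[k][j] += err * feats[j] / len(data)
        for k in range(len(PHASES)):
            for j in range(3):
                weights[k][j] -= lr * grad[k][j]
    return weights


def classify(weights, x):
    probs = predict_proba(weights, x)
    return max(range(len(PHASES)), key=lambda k: probs[k])


rng = random.Random(0)
# Labeled "mock" data standing in for the simulated galaxies.
simulated = make_sample(rng, 100, noise=0.15)
# Noisier, slightly shifted data standing in for real observations.
# Real observations carry no phase labels; the toy keeps them only to score agreement.
observed = make_sample(rng, 100, noise=0.2, offset=0.1)

weights = train(simulated)
agreement = sum(classify(weights, x) == y for x, y in observed) / len(observed)
print(f"agreement on the 'observed' sample: {agreement:.2f}")
```

The point of the sketch is the split between the two data sets: the classifier never sees the "observed" sample during training, so any agreement reflects patterns learned from the simulated sample alone.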

The results revealed an extraordinary level of consistency in the neural network's classifications of simulated and real galaxies.

"We were not expecting it to be all that successful. I'm amazed at how powerful this is," said co-author Joel Primack, professor emeritus of physics and a member of the Santa Cruz Institute for Particle Physics (SCIPP) at UC Santa Cruz. "We know the simulations have limitations, so we don't want to make too strong a claim. But we don't think this is just a lucky fluke."

Galaxies are complex phenomena, changing their appearance as they evolve over billions of years, and images of galaxies can provide only snapshots in time. Astronomers can look deeper into the universe and thereby "back in time" to see earlier galaxies (because of the time it takes light to travel cosmic distances), but following the evolution of an individual galaxy over time is possible only in simulations. Comparing simulated galaxies to observed galaxies can reveal important details of the actual galaxies and their likely histories.

Blue Nuggets

In the new study, the researchers were particularly interested in a phenomenon seen in the simulations early in the evolution of gas-rich galaxies, when big flows of gas into the center of a galaxy fuel the formation of a small, dense, star-forming region called a "blue nugget." (Young, hot stars emit short "blue" wavelengths of light, so blue indicates a galaxy with active star formation, while older, cooler stars emit more "red" light.)

In both simulated and observational data, the computer program found that the "blue nugget" phase occurs only in galaxies with masses within a specific range. This phase is followed by quenching of star formation in the central region, leading to a compact "red nugget" phase. The consistency of the mass range was an exciting finding, because it suggests the deep learning algorithm is identifying on its own a pattern that results from a key physical process occurring in real galaxies.

"It may be that in a certain size range, galaxies have just the right mass for this physical process to occur," said co-author David Koo, professor emeritus of astronomy and astrophysics at UC Santa Cruz.

The researchers used state-of-the-art galaxy simulations (the VELA simulations) developed by Primack and an international team of collaborators, including Daniel Ceverino (University of Heidelberg), who ran the simulations, and Avishai Dekel (Hebrew University), who led analysis and interpretation of them and developed new physical concepts based on them. All such simulations are limited, however, in their ability to capture the complex physics of galaxy formation.

In particular, the simulations used in this study did not include feedback from active galactic nuclei (the injection of energy from radiation as gas is accreted by a central supermassive black hole). Many astronomers regard this process as a crucial factor in regulating star formation in galaxies. Nevertheless, observations of distant, young galaxies appear to show evidence of the phenomenon that leads to the blue nugget phase seen in the simulations.

CANDELS

For the observational data, the scientists used images of galaxies obtained through the Cosmic Assembly Near-infrared Deep Extragalactic Legacy Survey (CANDELS), the largest project in the history of the Hubble Space Telescope. First author Marc Huertas-Company, an astronomer at the Paris Observatory and Paris Diderot University, had already done pioneering work applying deep learning methods to galaxy classification using publicly available CANDELS data.

Koo, a CANDELS co-investigator, invited Huertas-Company to visit UC Santa Cruz to continue this work. Google has supported their work on deep learning in astronomy through gifts of research funds to Koo and Primack, allowing Huertas-Company to spend the past two summers in Santa Cruz, with plans for another visit in the summer of 2018.

This project was just one of several ideas we had. We wanted to pick a process that theorists can define clearly based on the simulations, and that has something to do with how a galaxy looks, then have the deep learning algorithm look for it in the observations. We're just beginning to explore this new way of doing research. It's a new way of melding theory and observations.

David Koo, Co-Author

For quite some time, Primack has been partnering closely with Koo and other astronomers at UC Santa Cruz to compare his team's simulations of galaxy formation and evolution with the CANDELS observations. "The VELA simulations have had a lot of success in terms of helping us understand the CANDELS observations," Primack said. "Nobody has perfect simulations, though. As we continue this work, we will keep developing better simulations."

According to Koo, deep learning has the potential to reveal aspects of the observational data that humans cannot see. The downside is that the algorithm is like a "black box," so it is hard to know which features in the data the machine is using to make its classifications. Network interrogation techniques can, however, identify which pixels in an image contributed most to the classification, and the team tested one such method on their network.
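One simple way to ask which pixels matter most is occlusion sensitivity: blank out each pixel in turn and record how much the classification score drops. The article does not say which interrogation technique the team used, so the sketch below is only an illustration of the general idea. The function `toy_nugget_score` is a hypothetical stand-in for a trained network, and the 5x5 image is made up for the example.

```python
def occlusion_map(image, score_fn, fill=0.0):
    """Return, per pixel, the score drop when that pixel is replaced by `fill`.

    Larger values mean the pixel contributed more to the score.
    """
    base = score_fn(image)
    heat = []
    for r, row in enumerate(image):
        heat_row = []
        for c in range(len(row)):
            occluded = [list(rw) for rw in image]  # copy so the input is untouched
            occluded[r][c] = fill
            heat_row.append(base - score_fn(occluded))
        heat.append(heat_row)
    return heat


def toy_nugget_score(image):
    """Hypothetical classifier score: fraction of total light in the central pixel."""
    total = sum(sum(row) for row in image)
    if total == 0:
        return 0.0
    mid = len(image) // 2
    return image[mid][mid] / total


# Made-up 5x5 "galaxy": faint disk with a bright, compact center.
galaxy = [[1.0] * 5 for _ in range(5)]
galaxy[2][2] = 10.0

heat = occlusion_map(galaxy, toy_nugget_score)
# Locate the pixel whose removal lowers the score the most.
top_pixel = max(
    ((r, c) for r in range(5) for c in range(5)),
    key=lambda rc: heat[rc[0]][rc[1]],
)
print("most influential pixel:", top_pixel)
```

With a real network, `score_fn` would be the model's output probability for a given class, and patches rather than single pixels are often occluded to keep the number of evaluations manageable.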

Deep learning looks for patterns, and the machine can see patterns that are so complex that we humans don't see them. We want to do a lot more testing of this approach, but in this proof-of-concept study, the machine seemed to successfully find in the data the different stages of galaxy evolution identified in the simulations.

David Koo, Co-Author

In the future, he said, astronomers will have much more observational data to analyze as a result of large survey projects and new telescopes such as the James Webb Space Telescope, the Large Synoptic Survey Telescope, and the Wide-Field Infrared Survey Telescope. Deep learning and other machine learning methods could be powerful tools for making sense of these massive datasets.

"This is the beginning of a very exciting time for using advanced artificial intelligence in astronomy," Koo said.

In addition to Primack, Koo, and Huertas-Company, the co-authors of the paper include Avishai Dekel at Hebrew University in Jerusalem (also a visiting researcher at UC Santa Cruz); Sharon Lapiner at Hebrew University; Daniel Ceverino at University of Heidelberg; Raymond Simons at Johns Hopkins University; Gregory Snyder at Space Telescope Science Institute; Mariangela Bernardi and H. Dominguez Sanchez at University of Pennsylvania; Zhu Chen at Shanghai Normal University; Christoph Lee at UC Santa Cruz; and Berta Margalef-Bentabol and Diego Tuccillo at the Paris Observatory.

In addition to the support from Google, this research was partially supported by grants from France-Israel PICS, the US-Israel Binational Science Foundation, the U.S. National Science Foundation, and the Hubble Space Telescope. The VELA computer simulations were run on NASA's Pleiades supercomputer and at DOE's National Energy Research Scientific Computing Center (NERSC).