When demonstrated using three video recordings from the 2017 mass shooting in Las Vegas that left 58 people dead and hundreds wounded, the system correctly estimated the shooter’s actual location — the north wing of the Mandalay Bay hotel. The estimate was based on three gunshots fired within the first minute of what would be a prolonged massacre.
Alexander Hauptmann, research professor in CMU’s Language Technologies Institute, said the system, called Video Event Reconstruction and Analysis (VERA), won’t necessarily replace the commercial microphone arrays that public safety officials already use to locate shooters, although it may be a useful supplement when such arrays aren’t available.
One key motivation for assembling VERA was to create a tool that could be used by human rights workers and journalists who investigate war crimes, terrorist acts and human rights violations, Hauptmann said.
“Military and intelligence agencies are already developing these types of technologies,” said fellow researcher Jay D. Aronson, a professor of history at CMU and director of the Center for Human Rights Science. “We think it’s crucial for the human rights community to have the same types of tools. It provides a necessary check on state power.”
The researchers presented VERA and released it as open-source code last month at the Association for Computing Machinery’s International Conference on Multimedia in Nice, France.
Hauptmann said he has used his expertise in video analysis to help investigators analyze events such as the 2014 Maidan massacre in Ukraine, which left at least 50 anti-government protesters dead. Inspired by that work — and the insight of ballistics experts and architecture colleagues from the firm SITU Research — Hauptmann, Aronson and Junwei Liang, a Ph.D. student in language and information technology, have pulled together several technologies for processing video, while automating their use as much as possible.
VERA uses machine learning techniques to synchronize the video feeds and calculate the position of each camera based on what that camera is seeing. But it’s the audio from the video feeds that’s pivotal in localizing the source of the gunshots, Hauptmann said. Specifically, the system looks at the time delay between the crack caused by a supersonic bullet’s shock wave and the muzzle blast, which travels at the speed of sound. It also uses audio to identify the type of gun used, which determines bullet speed. VERA can then calculate the shooter’s distance from the smartphone.
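The distance calculation described above can be illustrated with a little arithmetic. What follows is a minimal sketch, not VERA's actual code: it assumes the bullet flies at a constant speed and passes close to the phone, so the crack arrives after roughly d/v seconds and the muzzle blast after d/c seconds. The function name, the 343 m/s speed of sound and the 900 m/s bullet speed are illustrative assumptions, not values from the researchers.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at about 20 degrees C


def shooter_distance(delay_s, bullet_speed_m_s, sound_speed_m_s=SPEED_OF_SOUND_M_S):
    """Estimate shooter distance (metres) from the crack-to-blast delay.

    Simplified model (an assumption, not VERA's method): constant bullet
    speed v and a bullet passing near the microphone, so that
        delay = d / c - d / v   =>   d = delay * c * v / (v - c)
    """
    if delay_s < 0:
        raise ValueError("the muzzle blast cannot arrive before the crack")
    if bullet_speed_m_s <= sound_speed_m_s:
        raise ValueError("a subsonic bullet produces no shock-wave crack")
    return delay_s * sound_speed_m_s * bullet_speed_m_s / (
        bullet_speed_m_s - sound_speed_m_s
    )


if __name__ == "__main__":
    # Hypothetical rifle round at 900 m/s, blast heard 0.5 s after the crack:
    # the estimate comes out at roughly 277 m.
    print(f"{shooter_distance(0.5, 900.0):.1f} m")
```

The formula also shows why identifying the type of gun matters: the same half-second delay implies a different distance for every bullet speed.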
“When we began, we didn’t think you could detect the crack with a smartphone because it’s really short,” Hauptmann said. “But it turns out today’s cell phone microphones are pretty good.”
By using video from three or more smartphones, the direction from which the shots were fired — and the shooter’s location — can be calculated based on the differences in how long it takes the muzzle blast to reach each camera.
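The idea of locating a sound from arrival-time differences can be sketched in a few lines. This is only an illustration under stated assumptions, not the researchers' implementation: it works in two dimensions, assumes the phones' positions and synchronized arrival times are already known, ignores the per-phone distance estimates, and swaps in a simple coarse-to-fine grid search for whatever solver VERA uses. All function names and parameters here are hypothetical.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at about 20 degrees C


def tdoa_misfit(candidate, mic_positions, arrival_times,
                sound_speed_m_s=SPEED_OF_SOUND_M_S):
    """Sum of squared timing errors for a candidate source position.

    The moment the shot was fired is unknown, so the best-fitting firing
    time (the mean offset) is removed; only time differences remain.
    """
    travel = [math.dist(candidate, mic) / sound_speed_m_s for mic in mic_positions]
    offsets = [t - p for t, p in zip(arrival_times, travel)]
    mean_offset = sum(offsets) / len(offsets)
    return sum((o - mean_offset) ** 2 for o in offsets)


def locate_source(mic_positions, arrival_times, search_half_width_m=500.0,
                  sound_speed_m_s=SPEED_OF_SOUND_M_S, levels=10, grid_points=21):
    """Coarse-to-fine grid search for the position that best explains the
    differences in muzzle-blast arrival times (2-D, hypothetical sketch)."""
    if len(mic_positions) < 3 or len(mic_positions) != len(arrival_times):
        raise ValueError("need at least three microphones, one time per microphone")
    # Start the search at the centre of the microphone positions.
    best = (sum(x for x, _ in mic_positions) / len(mic_positions),
            sum(y for _, y in mic_positions) / len(mic_positions))
    half = search_half_width_m
    for _ in range(levels):
        step = 2.0 * half / (grid_points - 1)
        candidates = [(best[0] - half + i * step, best[1] - half + j * step)
                      for i in range(grid_points) for j in range(grid_points)]
        best = min(candidates, key=lambda p: tdoa_misfit(
            p, mic_positions, arrival_times, sound_speed_m_s))
        half = 2.0 * step  # zoom in around the best grid point found so far
    return best


if __name__ == "__main__":
    # Hypothetical layout: four phones at the corners of a 100 m square.
    phones = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (100.0, 100.0)]
    true_source = (37.0, 64.0)
    times = [5.0 + math.dist(true_source, p) / SPEED_OF_SOUND_M_S for p in phones]
    print(locate_source(phones, times, search_half_width_m=200.0))
```

The usage example places four phones rather than three because, in this simplified time-difference model, an extra recording helps rule out ambiguous solutions; real recordings would also carry timing noise that this sketch does not model.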
With mass protests proliferating in places such as Hong Kong, Egypt and Iraq, identifying where a shot originated can be critical to determining whether protesters, police or other groups are responsible when a shooting takes place, Aronson said.
But VERA is not limited to detecting gunshots. It is an event analysis system that can be used to locate a variety of other sounds relevant to human rights and war crimes investigations, he said. He and Hauptmann hope that other groups will add functionality to the open-source software.
“Once it’s open source, the journalism and human rights communities can build on it in ways we don’t have the imagination for or time to do,” Aronson added.