Scientists have developed software that uses crowdsourcing to help algorithms identify faces in photos, which could help uncover the mysteries of the nearly four million Civil War-era photographs that may exist in the historical record. Kurt Luther, an assistant professor at Virginia Tech in the US, was inspired to develop the software for Civil War Photo Sleuth while visiting the Heinz History Center’s exhibit called “Pennsylvania’s Civil War” in Pittsburgh, Pennsylvania.
There he stumbled upon a Civil War-era portrait of Oliver Croxton, his great-great-great uncle, who served in Company E of the 134th Pennsylvania. The portrait showed Croxton clad in a corporal’s uniform.
“Historical photos can tell us a lot about not only our own familial history but also inform the historical record of the time more broadly than just reading about the event in a history book,” said Luther.
The Civil War Photo Sleuth project allows users to upload photos, tag them with visual cues, and connect them to profiles of Civil War soldiers with detailed records of military history.
Photo Sleuth’s initial reference database contained more than 15,000 identified Civil War soldier portraits from public domain sources like the US Military History Institute, as well as private collections.
More than 600 users contributed more than 2,000 Civil War photos to the website in the first month after the launch, and roughly half of those photos were unidentified. Over 100 of these unknown photos were linked to specific soldiers, and an expert analysis found that over 85 per cent of these proposed identifications were probably or definitely correct. Presently, the database has grown to over 4,000 registered users and more than 8,000 photos.
“Typically, crowdsourced research such as this is challenging for novices if users don’t have specific knowledge of the subject area,” said Luther. “The step-by-step process of tagging visual clues and applying search filters linked to military service records makes this detective work more accessible, even for those that may not have a deeper knowledge of Civil War military history,” he said.
Person identification tasks can be challenging in larger candidate pools because there is a greater risk of false positives.
The novel approach behind Civil War Photo Sleuth is based on the analogy of finding a needle in a haystack. The data pipeline has three haystack-related components: building the haystack, narrowing down the haystack, and finding the needle in the haystack.
When combined, they allow users to identify unknown soldiers while reducing the risk of false positives. Any time a user uploads a photo to identify it, the photo gets added to the site’s digital archive or “haystack,” making it available for future searches.
Following upload, the user tags metadata related to the photograph such as photo format or inscriptions, as well as visual clues, such as coat colour, chevrons, shoulder straps, collar insignia, and hat insignia.
These tags are linked to search filters to prioritise the most likely matches. For example, tagging a photo with the “hunting horn” hat insignia would surface potential matches who served in the infantry, while hiding results from the cavalry or artillery.
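The link between a visual tag and a search filter can be sketched in a few lines of Python. This is only an illustration of the idea, not the site's actual code: the `SoldierRecord` type, the `filter_by_insignia` function, and the placeholder soldier names are all hypothetical. Only the "hunting horn" to infantry mapping comes from the example above; the other two insignia entries are added for illustration.

```python
from dataclasses import dataclass

# Hypothetical tag-to-branch lookup. Only "hunting horn" -> infantry is taken
# from the article; the other two entries are illustrative additions.
INSIGNIA_TO_BRANCH = {
    "hunting horn": "infantry",
    "crossed sabers": "cavalry",
    "crossed cannons": "artillery",
}


@dataclass(frozen=True)
class SoldierRecord:
    """A simplified stand-in for a soldier profile (assumed fields)."""
    name: str
    branch: str


def filter_by_insignia(records, hat_insignia):
    """Keep only records whose branch matches the tagged hat insignia.

    If the tag is not in the lookup, no filtering is applied, so an
    unfamiliar tag never hides a possible match.
    """
    branch = INSIGNIA_TO_BRANCH.get(hat_insignia)
    if branch is None:
        return list(records)
    return [r for r in records if r.branch == branch]


# Placeholder records, not real soldiers.
records = [
    SoldierRecord("Soldier A", "infantry"),
    SoldierRecord("Soldier B", "cavalry"),
    SoldierRecord("Soldier C", "artillery"),
]

infantry_only = filter_by_insignia(records, "hunting horn")
```

Falling back to the unfiltered list for an unknown tag is a design choice in this sketch: a filter that is too strict could remove the correct soldier from the candidate pool.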
Next, the site uses state-of-the-art face recognition technology to eliminate very different-looking faces and sort the remaining ones by similarity. Both the tagging and face recognition steps narrow down the haystack.
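The article does not say how the face recognition step works internally. As a rough sketch of "eliminate very different-looking faces and sort the rest by similarity", the Python below assumes each face has already been converted to a numeric vector (an embedding) and compares vectors with cosine similarity. The function names, the toy two-dimensional vectors, and the `min_similarity` cutoff of 0.5 are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat an all-zero vector as matching nothing.
    return dot / (norm_a * norm_b)


def rank_candidates(query, candidates, min_similarity=0.5):
    """Drop candidates below the cutoff, then sort the rest, best first.

    `candidates` maps a candidate name to its face vector. The cutoff
    value is an assumed placeholder, not a figure from the article.
    """
    scored = [(name, cosine_similarity(query, vec))
              for name, vec in candidates.items()]
    kept = [(name, score) for name, score in scored if score >= min_similarity]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real face embeddings.
query_face = [1.0, 0.0]
candidate_faces = {
    "Candidate A": [1.0, 0.0],   # identical direction
    "Candidate B": [1.0, 1.0],   # somewhat similar
    "Candidate C": [0.0, 1.0],   # unrelated, falls below the cutoff
    "Candidate D": [-1.0, 0.0],  # opposite, falls below the cutoff
}

ranking = rank_candidates(query_face, candidate_faces)
```

In this toy data, the two dissimilar candidates are removed and the remaining two are returned in order of similarity, leaving a shorter list for a person to inspect.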
Finally, users find the needle in the haystack by exploring the highest-probability matches in more detail.
A comparison tool with pan and zoom controls helps users carefully inspect a possible match and, if they decide it’s a match, link the previously unknown photo to its new identity and biographical details. The approach behind Photo Sleuth, which combines crowdsourcing with facial recognition to trace Civil War photos, has broad applications beyond identifying historical photos, too.
The software has the potential to generate new ways to think about building person identification systems that look beyond face recognition and leverage the complementary strengths of both human and artificial intelligence.