Human Rights Watch (HRW) continues to reveal how photos of children, casually posted online years ago, are being used to train AI models, even when families use strict privacy settings. Last month, HRW researcher Hye Jung Han found 170 photos of Brazilian kids in LAION-5B, a popular AI dataset built from Common Crawl snapshots of the public web. Now, she has released a second report, flagging photos of Australian children, including Indigenous children who may be particularly vulnerable to harm.
These photos are linked in the dataset “without the knowledge or consent of the children or their families.” They span the entirety of childhood, making it possible for AI image generators to create realistic deepfakes. URLs in the dataset sometimes reveal identifying information about children, including names and locations where photos were shot, making it easy to track down children whose images might not otherwise be discoverable online.
This puts children at risk, Han said, as some parents may not realize these dangers exist. For instance, from one photo showing “two boys, ages 3 and 4, grinning with paintbrushes,” Han traced “both children’s full names, ages, and the name of the preschool they attend in Perth, Western Australia.” Another image found by Han showed “a close-up of two boys making funny faces,” from a video posted on YouTube with privacy settings adjusted to “unlisted.” Only someone with the link was supposed to have access, but Common Crawl archived the image. YouTube policies prohibit AI scraping or harvesting of identifying information.
YouTube’s spokesperson, Jack Malon, told Ars that YouTube has “been clear that unauthorized scraping of YouTube content is a violation of our Terms of Service, and we continue to take action against this type of abuse.” However, Han worries that even if YouTube joins efforts to remove the images, the damage is done, as AI tools have already trained on them. Han argues that kids need regulators to intervene before AI training happens, not after.
Han’s report coincides with the forthcoming reform draft of Australia’s Privacy Act. The reforms include Australia’s first child data protection law, the Children’s Online Privacy Code. However, even people involved in long-running discussions are not sure how much the government will announce.
“Children in Australia are waiting to see if the government will adopt protections for them,” Han said, emphasizing that “children should not have to live in fear that their photos might be stolen and weaponized.”
Han reviewed fewer than 0.0001 percent of the 5.85 billion images and captions in the dataset. Given her small sample size, she expects a significant undercount of children impacted. “It’s astonishing that out of a random sample of 5,000 photos, I immediately encountered 190 photos of Australian children,” Han told Ars. She expected more photos of cats than personal photos of children since LAION-5B reflects the entire internet. LAION is working with HRW to remove flagged images, but cleaning the dataset is slow. Links to Brazilian kids’ photos reported a month ago remain.
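The figures above can be checked with a few lines of arithmetic. The sketch below uses only the numbers reported here (5.85 billion entries, a sample of 5,000 photos, 190 flagged photos); the variable names are illustrative. It does not extrapolate to the full dataset, since the article gives no basis for assuming the sample rate holds across all of LAION-5B.

```python
# Figures as reported in the article; variable names are illustrative only.
DATASET_SIZE = 5_850_000_000  # images and captions in LAION-5B
SAMPLE_SIZE = 5_000           # photos in Han's random sample
FLAGGED = 190                 # photos of Australian children found in the sample

# Share of the whole dataset that the sample represents, in percent.
sample_share_pct = SAMPLE_SIZE / DATASET_SIZE * 100

# Share of the sample that was flagged, in percent.
flagged_share_pct = FLAGGED / SAMPLE_SIZE * 100

print(f"Sample as share of dataset: {sample_share_pct:.7f} percent")
print(f"Flagged share of sample: {flagged_share_pct:.1f} percent")
```

On these numbers, the sample covers well under 0.0001 percent of the dataset, while roughly 1 in 26 sampled photos was flagged.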
In June, LAION’s spokesperson, Nathan Tyler, told Ars that as a “nonprofit, volunteer organization,” they are committed to addressing the misuse of children’s data. However, removing links does not remove the images online, where they can still be used in other AI datasets. Han points out that removing links doesn’t change AI models already trained on the data.
Children whose photos are used in AI training face privacy risks, including the potential for harmful or explicit deepfakes. Last month, about 50 girls in Melbourne reported that photos from their social media profiles had been manipulated into sexually explicit deepfakes and circulated online. First Nations children face unique cultural harms, because AI training could perpetuate the reproduction of photos of deceased people during mourning periods, when such reproduction is restricted.
AI models are notorious for leaking private information, Han said. Guardrails in image generators do not always prevent these leaks, and the safeguards on some tools have been repeatedly broken. LAION recommends that parents remove kids’ images from the web to prevent abuse, but Han calls this “unrealistic and outrageous.” She insists on legal protections so kids don’t have to worry about their photos being misused.