
A few weeks ago, I came across a wild post on Reddit’s r/DHExchange, a subreddit for trading large datasets: “I hoarded a large database of something valuable, just not what’s [sic] you expect…150k stools images.”
The post, made by a user called Ill_Car_7351, was advertising exactly what it sounds like: a database of poop images, collected from an AI poop-analyzing app that he had launched several years ago. Basically, 25,000 people had been taking images of their poop and uploading them to his app. He’d been collecting, analyzing, and annotating these images and now wanted to sell access to them: “I’ve got 150k+ labeled and classified images of 💩 from roughly 25K different people. Jokes aside, I know there’s a lot of value in it (hard to obtain, useful for ML [machine learning] training, cancer studies etc) but not sure on how to move about it. Feels like I’m sitting on a pile of shi..ny coins but can’t find who wants them.” The poster added that “the images are extremely rare,” and that he was trying to figure out how much money he could sell them for.
The comments were from people who were mostly horrified: “When I was 5 the teacher taught me how to read. I now regret that happened,” one read. “What in the fuck,” another read. “How to delete someone else’s post,” a third said.
I messaged the poster and told him I was interested in obtaining the database. Thus began my journey into the Internet of Shit and, by extension, the unpleasant world of the underground sale of highly sensitive, app-collected user data for AI training.
The poop database comes from PoopCheck, an app made by a company called Soft All Things that purports to use AI to analyze images of your stool in order to give you a “daily gut health score.”
“Our AI analyzes your poop using the Bristol Stool Scale and advanced pattern recognition. Get insights on consistency, color, shape, and what they mean for your digestive health,” the app advertises. The Bristol Stool Scale classifies stools into one of seven types ranging from “separate hard lumps, like little pebbles” to “watery with no solid pieces.”


The app also features a “community,” which had 151,317 “shared stools” at the time of this writing, and a “leaderboard.” People can share images of their poop for commentary from other users and earn points for participating. I found the posts in the community a bit hard to stomach, with titles “like play dough,” “Concerned,” and “Dealing with this on and off for the past 3 weeks.” Pictures are not automatically shared to the community; when you take a photo, the app asks if you want to share it.
“Popular” posts on the app include people speculating as to whether their fellow community members have parasites or colon cancer; in the comments section of a few posts I saw people recommending ivermectin to the original poster.
Though users have the option to share their poops with other users, the app provides mixed messages about the fact that the data uploaded to the app will be analyzed, annotated, and packaged with other poops into a commercial database to be sold to AI companies.
On the App Store page for PoopCheck, it says “The developer does not collect any data from this app.” The link to the privacy policy from within the App Store download page does not mention anything about selling or sharing the data and says “your health data is encrypted in transit and at rest. Photos are processed securely. We implement industry-standard security measures to protect your data.”
The PoopCheck website’s About page states “Privacy First” and “Health data is sensitive. That’s why privacy isn’t a feature, it’s our foundation. Your photos are encrypted. You can delete everything at any time. We built PoopCheck the way we’d want our own health apps built.” The FAQ also notes “your privacy is our priority.”
This is completely different from the “Service Agreement” and “Terms and Conditions” people agree to when they actually open the app and make an account. The Service Agreement states that “by uploading stool images or any health-related data to the App, you grant Soft All Things LLC a worldwide, irrevocable, perpetual, unconditional, royalty-free, fully-paid, transferable, sub licensable license to use, reproduce, modify, adapt, distribute, sell, license, and create derivative works from such content for any lawful purpose, including but not limited to research, commercial exploitation, product development, and third party licensing. You acknowledge that your images and data may be used to create, train, improve, and commercialize AI technologies and machine learning models, and that such models and any outputs derived from your data may be licensed or sold to third parties, including medical organizations, research institutions, and commercial partners.”
It adds that “your data may be irreversibly incorporated into AI models and aggregated datasets. Deletion of your account will remove your personal profile data but does not require the removal of anonymized, aggregated, or derivative data already processed or incorporated into AI models.” Under a section called “Sharing of Information,” it adds that the company reserves the right to share or sell the data “for any business purpose,” including “AI and Data Licensing.”
On Reddit, I messaged Ill_Car_7351 and said “Hi – am interested in this database you posted about. Can you share any more info about what you’re looking for / details about the app where it was collected? also any chance there’s like, a sample of what the data looks like etc?” They responded quickly and said “Hey! The db was gathered by real users, we had 25k users over the last couple years, since we launched the app. It’s called PoopCheck btw if you wanna see it. Let’s maybe talk via email? I’ll be happy to share a sample of the data if that interests you.”
I sent an email to someone named “Marco” at Soft All Things, who identified himself as one of the founders of PoopCheck. I said I had reached out on Reddit and was interested in a sample of the data. I used my real email address and real name.
“We can surely send you a sampling of the dataset, would a Google Drive link containing an image folder and JSON data work? We can also figure out other ways if you prefer,” Marco said. “In terms of the actual dataset you need, what would be the size of it for your needs? And what would you be using it for? Just so we can make sure it’s actually a good fit for your use case.”
I told Marco that I wanted 10,000 pieces of data and said I would use it for AI training. I asked him for pricing and what type of data was included.
Marco responded:
“You’ll find a folder with images and JSON metadata covering the key fields we capture per entry. Let us know if you have any questions about it.
To give you a better idea of the dataset and pricing options: we currently have over 150,000 images validated by AI. Around 5,000 of these have also been manually reviewed by a member of our team, who verified the AI output and labeling, making this portion more valuable and priced accordingly. It’s also worth noting that certain types on the Bristol Stool Scale are rarer than others, so availability may vary depending on your specific needs.
With that in mind, here there is an estimation of pricing options:
• 10,000 unreviewed images (AI-validated) — $3,000
• 5,000 fully human-reviewed & annotated (on top of AI validation) — $4,000
• 5,000 reviewed + 5,000 unreviewed — $5,000
It would be great to have a quick call to take this further as there are a few things about the dataset’s structure and coverage that are easier to walk through live.”

The sample dataset Marco sent me included 20 images of poop from four specific users (five poops each). Each image was tied to a series of user-reported data points as well as AI analyses of the image. The AI-analyzed data points included the time the poop was taken, the Bristol Type of each poop, whether it was “healthy” or “unhealthy,” its “shape” and “consistency,” whether there was blood or mucus in it, its quantity (“large,” “normal,” or “small”), and whether it was “floating” or not. Each of these data points also had a “confidence” score for how confident the AI was in its analysis. Each image also had user-reported information, which included the answers to a series of questions: “when did you have your last meal,” “any discomfort while pooping?” (“Hard to pass,” “burning,” “sharp pain,” etc.), “How long did it take?” “Did it smell stronger than usual?” and “Coffee or alcohol in the last 12 hours?” The data also included demographic information: age ranges, sex, height, weight, and sensitivities such as “lactose intolerance” or “irritable bowel syndrome.” Each image was tied to a specific user through a field called “externalIndividualID.”
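To make the shape of that metadata concrete, here is a rough sketch in Python of what one entry might look like and why the “externalIndividualID” field matters. This is not the company’s actual schema: only the name “externalIndividualID” comes from the sample, and every other field name, along with the placeholder values, is an assumption made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AiLabel:
    """One AI-derived attribute paired with its confidence score."""
    value: str
    confidence: float  # assumed here to be a number between 0 and 1


@dataclass
class StoolRecord:
    # The only field name taken from the sample data described above.
    externalIndividualID: str
    # Hypothetical names for the kinds of fields the sample contained.
    taken_at: str                    # timestamp attached to the entry
    ai_labels: Dict[str, AiLabel]    # Bristol type, blood, mucus, floating, ...
    survey_answers: Dict[str, str]   # user-reported answers
    demographics: Dict[str, str]     # age range, sex, height, weight, sensitivities


def group_by_individual(records: List[StoolRecord]) -> Dict[str, List[StoolRecord]]:
    """Collect every entry that shares the same externalIndividualID."""
    grouped: Dict[str, List[StoolRecord]] = defaultdict(list)
    for record in records:
        grouped[record.externalIndividualID].append(record)
    return dict(grouped)


def placeholder(person: str, taken_at: str) -> StoolRecord:
    """Build an obviously fake entry; none of these values come from the dataset."""
    return StoolRecord(
        externalIndividualID=person,
        taken_at=taken_at,
        ai_labels={"bristol_type": AiLabel(value="4", confidence=0.9)},
        survey_answers={"how_long_did_it_take": "placeholder"},
        demographics={"age_range": "placeholder", "sex": "placeholder"},
    )


records = [
    placeholder("person-a", "2000-01-01T08:00"),
    placeholder("person-a", "2000-01-02T08:00"),
    placeholder("person-b", "2000-01-01T09:00"),
]
by_person = group_by_individual(records)
for person, entries in by_person.items():
    print(person, len(entries))
```

Because every entry carries the same per-person identifier, a buyer does not get isolated images. Grouping on that one field yields a running health history for each individual, with demographic details attached.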

Soft All Things is not exactly quiet about the database that it has created. On the PoopCheck website, it has a page called “For Business,” which advertises its database. It sells access to both the “Stool Analysis API,” which “turns a stool photo into a structured health report,” and the “Annotated Dataset” of 140,000+ images to “train your own models.” It advertises this as the “largest consumer stool image dataset we know of.”
It should perhaps not be terribly surprising that a free app in which you upload images of your poop to a random company would have a business model focused on packaging and selling that data. But this type of data collection, of our literal poop, highlights how almost anything we do on our phones can ultimately end up for sale. The fact that the company is advertising this data for sale at all indicates that there is an AI gold rush for any and all types of data, even our literal waste.
Research has shown, over and over again, that de-identified “anonymous” data doesn’t necessarily remain anonymous when combined with other datasets. Toward the end of last year, the appliance giant Kohler endured a security shitshow when a researcher showed that its stool-analyzing smart toilet camera was not actually properly encrypting the images that it sent to Kohler. The concern there was that your poop data would be somehow accessed by bad actors. In the case of PoopCheck, anyone can simply buy access.
After I told Marco I was writing an article about PoopCheck and its database, he stopped responding to me and did not answer any of my questions.


