48 million profiles left exposed by data scraping firm, report says

The exposed data includes detailed information scraped from Facebook, Twitter, LinkedIn and Zillow.

LocalBlox, a company that scrapes user information from social media and other websites to repackage and sell, left 48 million of its records exposed on a public server, according to a report released Wednesday by cybersecurity firm UpGuard.

The data on each individual reportedly includes names, addresses, dates of birth, LinkedIn job histories, public Facebook data, Twitter handles and information from real estate listing site Zillow. Facebook, Twitter, LinkedIn and Zillow told ZDNet, which first reported the story, that data scraping without prior consent violates their policies.

The LocalBlox case bears some similarity to the data scandal embroiling Facebook, in which data firm Cambridge Analytica was revealed to have improperly obtained a trove of data profiles on 87 million Facebook users for political purposes. One main difference with the LocalBlox case, however, is that the data was left unprotected and breachable.

UpGuard said its Cyber Risk Team discovered a public Amazon Web Services S3 bucket containing the compressed 1.2-terabyte database. The firm says it notified LocalBlox 10 days later and the bucket was secured shortly after.


Analysis by UpGuard shows that LocalBlox has a sophisticated way of threading together data from different sources to produce a granular profile of individual users.

“The database appears to work by tracking an IP address, matching collected data to that IP address when able, and thus providing a clearer image of the behavior and background of the user at that IP address,” UpGuard writes.
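
As a rough illustration of the IP-keyed matching UpGuard describes, the Python sketch below groups records from separate sources under a shared IP address. The field names, the sample records and the `merge_by_ip` helper are all hypothetical; nothing here reflects LocalBlox's actual schema or code.

```python
from collections import defaultdict


def merge_by_ip(records):
    """Group per-source records into one profile per IP address.

    Each record is a dict with an "ip" key, a "source" key and arbitrary
    other fields. All field names here are hypothetical.
    """
    profiles = defaultdict(dict)
    for record in records:
        ip = record.get("ip")
        if ip is None:
            # No IP address to match on, so the record cannot be linked.
            continue
        for key, value in record.items():
            if key in ("ip", "source"):
                continue
            # Keep the first value seen for a field; later sources only fill gaps.
            profiles[ip].setdefault(key, value)
    return dict(profiles)


# Made-up sample data using a documentation-only IP range (RFC 5737).
sample = [
    {"ip": "203.0.113.7", "source": "social", "handle": "@example"},
    {"ip": "203.0.113.7", "source": "real_estate", "city": "Springfield"},
    {"source": "jobs", "employer": "Example Corp"},
]
merged = merge_by_ip(sample)
```

In this sketch the two records that share an IP address collapse into a single profile, while the record with no IP address is left unlinked, which mirrors the "when able" caveat in UpGuard's description.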

LocalBlox showcases on its website the level of detail that it collects on individuals, neighborhoods and companies. The company says it has a voter database with 180 million records, an automotive database with 440 million records, 400 million consumer email addresses with demographic attributes, and many other types of records.

On its website, LocalBlox says its platform “automatically crawls, discovers, extracts, indexes, maps and augments data in a variety of formats from the web and from exchange networks, adding crowd-sourced verification as needed. LocalBlox helps companies acquire and utilize a vast amount of information from sources held captive on the web with exceptional speed and scale.”

LocalBlox founder Ashfaq Rahman told ZDNet that the 48 million figure is inflated because the dataset includes records that are intentionally fake for testing purposes.


UpGuard says that given how lucrative the data analytics industry is, it’s not surprising that LocalBlox’s 48 million-strong dataset exists. Rather, it’s worrying that it was so easily found.

“With this kind of business interest in data harvesting, processing, and resale, it should be no wonder that so many massive and intrusive data sets exist in the world, providing companies and political parties with detailed blueprints on how to influence people,” UpGuard writes. “What should be a wonder is that these datasets aren’t better secured and administered. This exposure was not the result of a clever hack, or well-planned scheme, but of a simple misconfiguration of an enterprise asset— an S3 storage bucket— which left the data open to the entire internet.”
