
Facebook Data Leak: 533 Million Users Exposed Through a Contact Import Feature

Personal data from 533 million Facebook users across 106 countries was posted on a hacking forum, exposing phone numbers, emails, and personal details scraped through a contact import vulnerability.

James
Security Consultant

In April 2021, a user on a low-level hacking forum posted the personal data of 533 million Facebook users — for free. The dataset covered users from 106 countries and included phone numbers, full names, locations, email addresses, biographical information, and in some cases relationship status and employer details.

This was not a hack in the traditional sense. No servers were breached. No encryption was broken. Instead, attackers exploited a feature that Facebook had designed to help users find their friends.

The Contact Import Scraping Technique

Facebook offered a "contact import" feature that let users upload their phone contacts to find friends on the platform. The intended use was straightforward: upload your phone book, and Facebook would match those numbers against its user database and suggest connections.

Attackers exploited this by programmatically generating massive lists of phone numbers and feeding them through the contact import tool. When a phone number matched a Facebook account, the system returned the associated user profile information. By iterating through millions of phone numbers systematically, scrapers could build a comprehensive database mapping phone numbers to Facebook profiles.
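
The enumeration described above can be illustrated with a toy model. This is a sketch under stated assumptions, not Facebook's actual implementation: the in-memory `USERS` table, the `match_contacts` and `enumerate_range` functions, and the fictional 555 numbers are all invented for illustration.

```python
# Toy, in-memory model of a contact-matching feature. Everything here is
# hypothetical: the data, the function names, and the number format.
USERS = {
    "+15550100003": {"name": "Alice Example", "city": "Springfield"},
    "+15550100007": {"name": "Bob Example", "city": "Shelbyville"},
}


def match_contacts(uploaded_numbers):
    """Return profile data for every uploaded number that has an account."""
    return {n: USERS[n] for n in uploaded_numbers if n in USERS}


def enumerate_range(prefix, start, stop, width):
    """Generate every number in a range, standing in for a fake address book."""
    return [f"{prefix}{i:0{width}d}" for i in range(start, stop)]


# The "address book" is just a block of consecutive numbers, not real contacts.
fake_address_book = enumerate_range("+1555010", 0, 10, 4)
harvested = match_contacts(fake_address_book)

# Each match links a guessed number to a real identity.
for number, profile in harvested.items():
    print(number, profile["name"], profile["city"])
```

The point of the sketch is that the lookup does exactly what it was designed to do. The leak comes from accepting an unbounded number of guesses and returning identifying data for each hit.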

Facebook disabled this particular contact import functionality in 2019 after discovering the abuse. However, the data that had already been scraped remained in circulation. It passed through various private sales channels before being dumped publicly in April 2021.

What Was Exposed

The completeness of the records varied by country, but the dataset generally included:

  • Phone numbers for virtually all 533 million records
  • Full names linked to those phone numbers
  • Facebook IDs (persistent identifiers)
  • Locations (city, state, country)
  • Birthdates for many records
  • Email addresses for a subset of records
  • Biographical details including employer, education, and relationship status

For context, the dataset included approximately 32 million records from the US, 11 million from the UK, 10 million from France, and significant numbers from nearly every country where Facebook operates.

Facebook's Response (or Lack Thereof)

Facebook's response was widely criticized. The company argued that the data had been scraped, not breached, and that the vulnerability had been fixed in 2019. A company spokesperson stated: "This is old data that was previously reported on in 2019. We found and fixed this issue in August 2019."

The company did not notify affected users. There was no breach notification because Facebook maintained it was not a breach — it was scraping of publicly or semi-publicly available information through an intended feature.

This distinction between "breach" and "scraping" became a significant point of contention:

  • Regulators disagreed. Ireland's Data Protection Commission (DPC) opened an investigation into whether Facebook had complied with GDPR. In November 2022 the DPC fined Meta 265 million euros over this incident, finding that the company had breached GDPR's data protection by design and by default requirements (Article 25).
  • Security researchers pointed out that regardless of the terminology, the end result was identical: hundreds of millions of people had their personal data exposed without consent.
  • Users were not given tools to check if they were affected. Troy Hunt eventually added the dataset to Have I Been Pwned, making it searchable.

The Real-World Risk

Phone number exposure is particularly dangerous because phone numbers are:

  • Used for two-factor authentication. SIM swapping attacks become easier to target when attackers know which phone number belongs to which person.
  • Used for account recovery. Many services allow password resets via SMS to a phone number.
  • Persistent identifiers. People change phone numbers far less frequently than email addresses or passwords.
  • Vectors for smishing. Targeted SMS phishing campaigns become far more effective when the attacker knows the target's name, location, and employer.

The combination of phone numbers with full names, locations, and employment information created a rich dataset for social engineering attacks. An attacker could craft a highly targeted phishing message: "Hi [Name], this is [Employer]'s IT department. We need you to verify your account at..."

The Scraping Problem at Scale

The Facebook data leak highlighted a broader problem that affects every platform with user-facing lookup or search functionality. Any feature that allows querying a database, even indirectly, can be exploited for mass data collection if rate limiting and abuse detection are insufficient.

This is not unique to Facebook. Similar scraping incidents have affected LinkedIn, Twitter, Clubhouse, and numerous other platforms. The fundamental tension is between usability (letting users find connections) and security (preventing automated mass enumeration).

Effective defenses against scraping include:

  • Aggressive rate limiting on any endpoint that returns user data
  • CAPTCHA challenges that scale with request volume
  • Anomaly detection that identifies patterns consistent with automated enumeration
  • Limiting data returned in search and lookup features to the minimum necessary
  • Audit logging that makes large-scale scraping detectable
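
Two of the defenses above, rate limiting and enumeration detection, can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than a production design: `LookupLimiter`, `sequential_fraction`, the caller ID `acct-1`, and the limits used are all hypothetical, and a real deployment would need shared state across servers and thresholds tuned to real traffic.

```python
import time
from collections import defaultdict, deque


class LookupLimiter:
    """Sliding-window cap on how many numbers one caller may look up."""

    def __init__(self, max_lookups, window_seconds, clock=time.monotonic):
        self.max_lookups = max_lookups
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the example can use a fake clock
        self._events = defaultdict(deque)  # caller_id -> lookup timestamps

    def allow(self, caller_id, count=1):
        """Record `count` lookups and return True if the caller is under the cap."""
        now = self.clock()
        events = self._events[caller_id]
        # Drop timestamps that have aged out of the window.
        while events and now - events[0] >= self.window_seconds:
            events.popleft()
        if len(events) + count > self.max_lookups:
            return False
        events.extend([now] * count)
        return True


def sequential_fraction(numbers):
    """Fraction of adjacent (sorted, de-duplicated) numbers that differ by 1."""
    values = set()
    for n in numbers:
        digits = "".join(ch for ch in n if ch.isdigit())
        if digits:
            values.add(int(digits))
    ordered = sorted(values)
    if len(ordered) < 2:
        return 0.0
    adjacent = sum(1 for a, b in zip(ordered, ordered[1:]) if b - a == 1)
    return adjacent / (len(ordered) - 1)


# Rate limiting, driven by a fake clock so the behavior is deterministic.
fake_now = [0.0]
limiter = LookupLimiter(max_lookups=5, window_seconds=60, clock=lambda: fake_now[0])
first = limiter.allow("acct-1", count=5)   # uses the whole budget
second = limiter.allow("acct-1", count=1)  # rejected within the same window
fake_now[0] = 61.0
third = limiter.allow("acct-1", count=1)   # allowed once the window has passed

# Enumeration heuristic: a generated block of numbers versus a mixed upload.
generated = [f"+1555010{i:04d}" for i in range(10)]
organic = ["+15550100003", "+14155550123", "+442079460000"]
generated_score = sequential_fraction(generated)
organic_score = sequential_fraction(organic)
```

Neither check is sufficient alone. A per-caller limit can be spread across many accounts, and a scraper can shuffle or space out its guesses, which is why the list above pairs these controls with data minimization and audit logging.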

How Safeguard.sh Helps

Safeguard.sh helps organizations understand their data exposure and third-party risk posture. When your employees' personal data appears in leaks like the Facebook dataset, it becomes ammunition for social engineering attacks against your organization. Our platform monitors for credential and data exposure across known breach databases, provides visibility into which third-party services your organization depends on, and enforces security policies that account for the reality that personal data — including phone numbers used for MFA — may already be compromised. This kind of proactive monitoring is essential in a world where data scraped years ago can surface as a threat today.
