Global privacy regulators update guidance on protecting against unauthorized data scraping

11 November 2024

Following discussions with leading technology companies, sixteen regulators published a statement outlining recommendations for how organizations can protect against unauthorized data scraping and comply with applicable data protection and privacy laws when sharing data with third parties. While many of the recommendations reiterate prior statements, the regulators raised a few new considerations and signaled their continued attention to data scraping activities globally.

On October 28, 2024, sixteen international regulators who are members of the Global Privacy Assembly’s International Enforcement Cooperation Working Group (IEWG) co-signed a Concluding Statement building on the Joint statement on data scraping and the protection of privacy (the Initial Statement) from August 24, 2023. The signatories include representatives of twelve of the authorities that signed the Initial Statement, along with four new representatives: the Commissioner of the Office of the Data Protection Authority in Guernsey, the Director of the Agencia Española de Protección de Datos in Spain, the Président of the Commission de Contrôle des Informations Nominatives in Monaco, and the Commissioner of the Privacy Protection Authority in Israel.

In the Initial Statement, discussed in more detail in a prior blog post, regulators sought feedback from leading technology companies, including Alphabet Inc. (YouTube), ByteDance Ltd. (TikTok), Meta Platforms, Inc. (Instagram, Facebook and Threads), Microsoft Corporation (LinkedIn), Sina Corp (Weibo), and X Corp. (X, previously Twitter), inviting them to comment on how they follow the recommendations outlined in the Initial Statement. This Concluding Statement now outlines lessons learned from these industry discussions, as well as additional guidance for organizations that host publicly accessible personal data.

Key Takeaways

Regulators outlined several key takeaways from their engagement with industry stakeholders, including:

Take ongoing steps to prevent unlawful scraping: Regulators advise organizations that host personal data to effectively protect against unlawful scraping by deploying a combination of safeguarding measures that are regularly reviewed and updated to keep pace with advances in scraping techniques and technologies.
Leverage AI to combat AI-powered scraping: While AI is used by some sophisticated data scrapers to evade detection, it can also represent part of the solution, serving to enhance protections against unlawful scraping.
Organizations of all sizes might consider taking measures: The recommendation to take measures to protect against unauthorized scraping applies to both large corporations and Small and Medium Enterprises (SMEs). SMEs can implement lower-cost measures, such as bot detection, rate limiting and CAPTCHAs, with assistance from service providers.
Contractual language alone may not be sufficient: Where social media companies (SMCs) and other organizations contractually authorize scraping of personal data from their platforms, those contractual terms alone, while a helpful safeguard, may not ensure that scraping is appropriate. Organizations may strengthen existing contractual terms by specifying limitations on the information that may be scraped and the purposes for which it may be used, as well as the consequences for non-compliance with those terms. The statement recommends that organizations have a lawful basis for granting access or permitting collection of personal data, be transparent about the scraping they allow, and obtain consent where required by law.
Provide responsible methods for permissible scraping: When an organization grants lawful permission for third parties to collect publicly accessible personal data from its platform, providing such access via an Application Programming Interface (API) can allow the organization greater control over the data through the use of credentials and the logging and monitoring of data activity. This control can facilitate the detection and mitigation of unauthorized scraping.
Comply with existing data protection laws for all web scraping activities: SMCs and other organizations that use scraped data sets and/or use data from their own platforms to train AI, such as Large Language Models, must comply with data protection and privacy laws as well as any AI-specific laws where those exist.

Updates from Initial Statement

Much of the Concluding Statement aligned with the Initial Statement, including the recommendation to have a multifaceted approach to data protection and the focus on the presence of anti-scraping measures at all organizations, not just SMCs. However, there were a few notable differences:

Increased attention on AI

While AI was not explicitly mentioned in the Initial Statement, it is a clear focal point for regulators in the Concluding Statement. The rapid growth of AI has been both a pain point and an opportunity for data privacy protection programs. The regulators caution organizations to look out for AI tools that can scrape data more efficiently (e.g., via “intelligent” bots that can simulate real user activity) and recommend exploring the use of AI tools in data protection to combat the risks posed by AI tools that aid in unauthorized scraping or are otherwise put to nefarious use. The regulators also expressly discuss the use of scraped data, including data from an organization’s own platforms, to train AI. In addition to reminding organizations to comply with data protection and privacy laws, as well as any AI-specific laws where they exist, the regulators recommend that organizations comply with recent guidelines on AI and data scraping issued by various data protection authorities and international organizations, such as the 2023 Global Privacy Assembly Resolution on Generative Artificial Intelligence Systems, the Roundtable of G7 Data Protection and Privacy Authorities 2023 Statement on Generative AI, the Hiroshima Process International Code of Conduct for Advanced AI Systems, and guidelines from the Dutch, Italian, and UK DPAs.

Focus on the organization’s role, not the individual’s role

Unlike the Initial Statement, the Concluding Statement focuses exclusively on the role of organizations, rather than the role of individuals, in preventing unauthorized scraping. The Initial Statement described steps that both organizations and individuals can take to protect against such scraping, whereas the Concluding Statement discusses only what organizations can do. This may indicate a shift away from individual accountability for unlawful scraping towards exclusive organizational accountability and responsibility for preventing individuals’ data from being scraped without consent. While it is still important for individuals to take precautions against unlawful scraping, organizations may want to focus their efforts on building out a robust, company-wide data protection plan to align with regulators’ guidelines.

Authorized data sharing may pose a threat to data security

Unlike the Initial Statement, which focused primarily on unlawful scraping conducted by a third party, the Concluding Statement extends its discussion of data security risks to those that arise when organizations authorize the sharing of data, either because it is required by law or because the scraping furthers their commercial goals. The regulators are cognizant of the fact that, even when the sharing is authorized, personal data may still be used in violation of laws that require consent or limit the uses of such data. In light of this, the statement advises organizations to put safeguards in place to protect personal data when sharing it with third parties, including using APIs, building out strong contractual terms, and monitoring the amount of data taken and the uses of any shared data.

Impact for Organizations

Ultimately, this Concluding Statement emphasizes the continued recommendation that all organizations hosting publicly available personal data or scraping such data have a robust data protection program, including the use of AI where possible. In all scenarios, organizations must follow the applicable laws, but when using scraped data to train AI, the regulators recommend that organizations also follow any applicable principles or guidelines. Additionally, when voluntarily sharing data with third parties, the statement advises that organizations take security measures beyond mere contractual terms and use APIs where possible to protect user data in authorized sharing scenarios.

Authored by Nathan Salminen, Alyssa Golay, and Emma Kotfica.