A European Union flag flutters outside the EU Commission headquarters in Brussels. The EU's new AI rules will require companies to share key details about how they build their AI models. (Image: Reuters)

Much to the chagrin of OpenAI and other AI companies, the European Union is implementing stricter oversight for AI developers.

EU negotiators recently struck a landmark deal on the world's first comprehensive set of artificial intelligence (AI) rules.

The newly agreed draft of the EU's upcoming AI Act will require OpenAI, the company behind popular AI chatbot ChatGPT, and other companies to divulge key details about the process of building their products.

While the companies will still largely be auditing themselves, the upcoming AI Act is a promising development at a time when AI companies are racing to stay at the forefront of the field by launching powerful AI systems with almost no regulatory oversight.

Notably, the law is slated to come into force in 2025 after EU member states approve it. It forces companies to shed more light on how they develop their powerful "general purpose" AI systems, which are capable of generating images and text.

Why is training data so important?

A copy of the draft seen by Bloomberg Opinion suggests these AI companies will have to share a detailed summary of their training data with EU regulators. Users of these AI systems might wonder who cares about training data; the AI companies themselves clearly do.

Reportedly, two of the leading Europe-based AI companies tried to water down those transparency requirements. Moreover, over the last few years, AI developers like OpenAI and Google have become more secretive about the data they scrape from the internet to train their AI tools.

For instance, Sam Altman-led OpenAI has not shared a comprehensive report on the data it used to create ChatGPT, which included books, websites and other texts.

As a result, the company has managed to avoid public scrutiny over its use of copyrighted works and the biased data sets it may have used to train its AI models. Likewise, Microsoft recently said it should not be held responsible if Copilot users infringe on copyrighted material.

Biased data is a major problem in AI and a key reason regulatory intervention is needed. In a study conducted by Stanford University, AI tools like ChatGPT were used to generate employment letters for hypothetical people, and the letters were teeming with sexist stereotypes.

The AI tool allegedly described men as "experts" and women as "beauty" and a "delight". A slew of other studies have highlighted similarly troubling outputs. If companies are forced to show their homework, researchers and regulators will be in a better position to pinpoint where things are going wrong in their training data.

Under the new rules, companies offering the biggest AI models will have to start testing them for security risks and work out how much energy their systems consume. They will then have to share this vital information with the European Commission.

Citing an internal note to the EU Parliament, Luca Bertuzzi, an editor at the EU news website Euractiv, reported that OpenAI and several Chinese companies will fall into that category. However, a closer look at the draft legislation suggests the act could have gone further.

There's still scope for improvement in the upcoming AI guidelines

"This summary should be comprehensive in its scope instead of technically detailed, for example by listing the main data collections or sets that went into training the model, such as large private or public databases or data archives, and by providing a narrative explanation about other data sources used," the draft legislation states.

This wording is vague enough to allow companies like OpenAI to withhold several key data points, including what kind of personal data they are using in their training sets.

Aside from this, AI companies will be able to hold back other information, such as how prevalent abusive or violent imagery and text is in their training data, and how many content moderators they have hired to monitor how their tools are used.

Answering these questions would require far more specific disclosures. Moreover, the rules could have forced these companies to give third-party researchers and academics access to the training data used in their models for auditing purposes.

Instead, the EU will continue to rely on these companies to audit themselves. "We just came out of 15 years of begging social media platforms for information on how their algorithms work," said Daniel Leufer, a Brussels-based senior policy analyst at Access Now.

Although somewhat half-baked, the EU's AI Act is a decent start as far as regulating AI is concerned. It will be interesting to see whether other regions, including the UK and the US, follow suit and introduce similar AI regulations in the coming months.