EU AI Act Compliance Tool Exposes Big Tech’s AI Shortcomings

As major tech companies race to comply with the European Union’s upcoming AI Act, a new tool reveals that prominent AI models are falling short in key areas like cybersecurity and bias prevention. Developed by Swiss startup LatticeFlow AI, alongside ETH Zurich and Bulgaria’s INSAIT, this tool, the LLM Checker, tests generative AI models like those from Meta and OpenAI against the stringent requirements of the EU’s AI Act.

Table of Contents

What is the EU AI Act?

The EU AI Act, spurred by the rise of general-purpose AI (GPAI) models like OpenAI’s ChatGPT, aims to regulate artificial intelligence across the bloc. The law will be enforced in stages over the next two years, and companies failing to comply could face hefty fines—up to 35 million euros or 7% of global revenue.

LLM Checker: A Game Changer for AI Compliance

The LLM Checker tool evaluates AI models across dozens of categories, providing a score between 0 and 1 based on factors like technical robustness, cybersecurity, and bias. It revealed that AI models from top tech firms such as Meta, OpenAI, Alibaba, and Mistral scored an average of 0.75 or higher, indicating overall solid performance but also exposing areas that require improvement to meet EU standards.

Compliance Challenges: Discriminatory Output & Cybersecurity Risks

While some models showed promising results, there were notable shortcomings. For instance, OpenAI’s GPT-3.5 Turbo scored only 0.46 in the category of discriminatory output, a critical issue as AI models often reflect human biases. Alibaba Cloud’s Qwen1.5 72B Chat performed even worse, scoring just 0.37 in the same category.

Cybersecurity was another challenge. Meta’s Llama 2 13B Chat received a score of 0.42 for “prompt hijacking,” a type of cyberattack where hackers disguise malicious prompts. Mistral’s 8x7B Instruct model also struggled, scoring 0.38 in the same test.

Industry Response and Roadmap

Though companies like Meta and Mistral declined to comment, LatticeFlow’s CEO, Petar Tsankov, emphasized that the test results are a “positive roadmap” for companies to align their models with the AI Act. Tsankov is confident that with proper adjustments, these models can meet regulatory demands.

A First Step Toward AI Regulation

The European Commission views this as a “first step” in translating the AI Act into technical requirements. While enforcement mechanisms are still being finalized, tools like the LLM Checker provide early insight into compliance pitfalls for tech companies, offering a pathway to refine AI models in line with the law.