How do you stop an AI model turning Nazi? What the Grok drama reveals about AI training

What makes an AI ‘behave’ this way?

Pre-training

First, developers curate the data used during pre-training – the first step in building a chatbot. This involves not just filtering unwanted content, but also emphasising desired material.

GPT-3 was shown Wikipedia up to six times more than other datasets as OpenAI considered it higher quality. Grok is trained on various sources, including posts from X, which might explain why Grok has been reported to check Elon Musk’s opinion on controversial topics.

Musk has shared that xAI curates Grok’s training data, for example to improve legal knowledge and to remove LLM-generated content for quality control. He also appealed to the X community for difficult “galaxy brain” problems and facts that are “politically incorrect, but nonetheless factually true.” We don’t know if these data were used, or what quality-control measures were applied.

Fine-tuning

The second step, fine-tuning, adjusts LLM behaviour using feedback. Developers create detailed manuals outlining their preferred ethical stances, which either human reviewers or AI systems then use as a rubric to evaluate and improve the chatbot’s responses, effectively coding these values into the machine.

A Business Insider investigation revealed xAI’s instructions to human “AI tutors” instructed them to look for “woke ideology” and “cancel culture.”

While the onboarding documents said Grok shouldn’t “impose an opinion that confirms or denies a user’s bias”, they also stated it should avoid responses that claim both sides of a debate have merit when they do not.

System prompts

The system prompt – instructions provided before every conversation – guides behaviour once the model is deployed.

To its credit, xAI publishes Grok’s system prompts. Its instructions to “assume subjective viewpoints sourced from the media are biased” and “not shy away from making claims which are politically incorrect, as long as they are well substantiated” were likely key factors in the latest controversy.

These prompts are being updated daily at the time of writing, and their evolution is a fascinating case study in itself.

Guardrails

Finally, developers can also add guardrails – filters that block certain requests or responses. OpenAI claims it doesn’t permit ChatGPT “to generate hateful, harassing, violent or adult content”. Meanwhile, the Chinese model DeepSeek censors discussion of Tianamen Square.

Ad-hoc testing when writing this article suggests Grok is much less restrained in this regard than competitor products.

The transparency paradox

Grok’s Nazi controversy highlights a deeper ethical issue: would we prefer AI companies to be explicitly ideological and honest about it, or maintain the fiction of neutrality while secretly embedding their values?

Every major AI system reflects its creator’s worldview – from Microsoft Copilot’s risk-averse corporate perspective to Anthropic Claude’s safety-focused ethos. The difference is transparency.

Musk’s public statements make it easy to trace Grok’s behaviours back to Musk’s stated beliefs about “woke ideology” and media bias. Meanwhile, when other platforms misfire spectacularly, we’re left guessing whether this reflects leadership views, corporate risk aversion, regulatory pressure, or accident.

This feels familiar. Grok resembles Microsoft’s 2016 hate-speech-spouting Tay chatbot, also trained on Twitter data and set loose on Twitter before being shut down.

But there’s a crucial difference. Tay’s racism emerged from user manipulation and poor safeguards – an unintended consequence. Grok’s behaviour appears to stem at least partially from its design.

The real lesson from Grok is about honesty in AI development. As these systems become more powerful and widespread (Grok support in Tesla vehicles was just announced), the question isn’t whether AI will reflect human values. It’s whether companies will be transparent about whose values they’re encoding and why.

Musk’s approach is simultaneously more honest (we can see his influence) and more deceptive (claiming objectivity while programming subjectivity) than his competitors.

In an industry built on the myth of neutral algorithms, Grok reveals what’s been true all along: there’s no such thing as unbiased AI – only AI whose biases we can see with varying degrees of clarity.

Aaron J. Snoswell, Senior Research Fellow in AI Accountability, Queensland University of Technology

This article is republished from The Conversation under a Creative Commons license. Read the original article.

Source link

How do you stop an AI model turning Nazi? What the Grok drama reveals about AI training

First Somvati Amavasya Of The Year Could Bring New Job, More Money, Career Growth For These 4 Zodiac Signs

India is country of innovation, says France President Macron at Bharat Innovates event

National Stock Exchange IPO: Draft Papers Expected Next Week, Reviving Listing Plans

Lack of coordination in handling Nipah outbreak, alleges Leader of the Opposition Pinarayi Vijayan

PM Modi and Trump to Hold Key Bilateral Meeting on June 17 During G7 Summit

UP: Woman alleges deception in relationship, pressure to convert; one arrested

Starmer to Meet Japan’s Takaichi as Fighter Jet Funding Sputters

‘His Mask & Wig Both Come Off’: Mahua Moitra Slams TMC MP Sudip Bandyopadhyay Over BJP Meeting In Delhi | India News

FIFA World Cup 2026 Points Table And Team Standings

Yogi condemns remarks against Akhilesh Yadav’s daughter

‘All we got to bury was a DNA sample’: British victim’s mother seeks answers a year after Air India crash | Ahmedabad News

Will not allow illegal oil shipments from Iran, U.S. tells India

पियवा किसनवा 90’S Old Hindi Songs🥰 90s Love Song😍 Udit Narayan, Alka Yagnik, Kumar Sanu songs Hindi

Sadabahar Hindi Songs Collection | 90s Hits Hindi Song |90s Evergreen Hindi Love Songs Audio Jukebox

90s Bollywood Wedding Songs | Evergreen Bollywood Hits | Shadi Song | Sadabahar Hindi Songs Jukebox

आज तो बाल बाल बच गया😄90’S Old Hindi Songs🥰 90s Love Song😍 Udit Narayan, Alka Yagnik, Kumar Sanu song

भाभी ने बचाई ननद की जान 😆 90’S Old Hindi Songs 🥺90s Love Song 😍Udit Narayan, Alka Yagnik, Kumar

When Online Love Becomes Real💞Chinese mix Hindi Songs💞Cin Klip💞Chinese Drama💞Korean Mix Hindi Songs

Cold Rude boy falling for cute girl 💕 korean mix hindi songs 💞 Chinese mix hindi songs

90s हिंदी सदाबहार गीत | 90’s Romantic Hindi Songs | 90’s सदाबहार फिल्मी गाने | 90’s Bollywood Songs

90’S Old Hindi Songs🥰 90s Love Song😍 Udit Narayan, Alka Yagnik, Kumar Sanu songs Hindi Jukebox

How do you stop an AI model turning Nazi? What the Grok drama reveals about AI training

Related Posts

Yogi condemns remarks against Akhilesh Yadav’s daughter

Best Air Fryer & Sandwich Maker Combos Under ₹7,000 | Tech News

DoJ Approves Paramount Skydance-Warner Bros. Deal, Cementing Ellison Family Control Of American Media

SpaceX IPO: Everything you need to know

9 Festivals to Celebratein August in India

Corruption cases against govt officials: SC bats for striking balance | Latest News India

Guru Randhawa – SIRRA ( Official Video )

Baharon Phool Barsao – Suraj – Rajendra Kumar, Vyjayanthimala – Old Hindi Songs

Phool Maangu Na Bahaar Maangu – Video Song | Raja | Madhuri Dixit & Sanjay Kapoor

Dil Ka Rishta Song – Aishwarya Rai,Arjun Rampal, Alka Yagnik,Udit Narayan,Kumar Sanu, Nadeem-Shravan

First Somvati Amavasya Of The Year Could Bring New Job, More Money, Career Growth For These 4 Zodiac Signs

India is country of innovation, says France President Macron at Bharat Innovates event

National Stock Exchange IPO: Draft Papers Expected Next Week, Reviving Listing Plans

Categories

Recent Posts