Jailbreaking ChatGPT: How Translating Harmful Prompts Helps Users Circumvent Restrictions

In a recent study, researchers at Brown University found a surprisingly simple way to bypass the safety filters of advanced AI systems such as ChatGPT, and all it requires is a little Google Translate. By translating harmful prompts into lesser-known languages such as Scottish Gaelic or Zulu, the team was able to get past the model's safeguards 79% of the time.

Though this approach to gaming an AI's safeguards is undeniably clever, it's the larger implications that are particularly troubling. Using a handful of lesser-known languages, the team showed that when a harmful prompt is translated before being submitted, the underlying AI's safety filters, which are primarily tuned for more commonly spoken languages, often fail to catch the inappropriate content.

Of course, in the grand scheme of things, this is hardly the only blind spot. Many such AI systems with ostensibly “gender neutral” programs have, ironically, made default assumptions that certain professions are male, and slipping past a content moderation label is a similar failure: an accidental yet happily allowed miscategorization on a massive scale. Still, the researchers are keen to point out that their findings do have a couple of important takeaways. Chief among them is that, in this case at least, AI has yet to become self-aware … sort of.
