Jailbreaking ChatGPT: How Translating Harmful Prompts Helps Users Circumvent Restrictions

In a new study, researchers at Brown University have discovered an ingenious way to bypass the safety filters of advanced AI systems such as ChatGPT, and all it requires is a little Google Translate. By translating harmful prompts into low-resource languages like Scottish Gaelic or Zulu, the team was able to slip past the model's safeguards roughly 79% of the time.

Though this new method of gaming the AI's guardrails is undeniably clever, it's the larger implications that are particularly troubling. Using a handful of lesser-known languages, the team demonstrated that once a harmful prompt is translated, the model's safety filters, which are primarily tuned for widely spoken languages like English, routinely fail to catch it, and the system goes ahead and produces the inappropriate content anyway.

Of course, this isn't the first blind spot of its kind: many of these systems, despite nominally "gender neutral" design, still default to assuming certain professions are male, and a moderation gap like this one could mean harmful content being miscategorized and waved through on a massive scale. Still, the researchers are keen to point out that their findings do have a couple of important takeaways. Chief among them, in this case at least, is that AI has yet to become self-aware … sort of.