Weird behavior observed in GPT-o
October 2, 2024
Watch on YouTube

Here is a case I find fascinating: ChatGPT being completely wrong... then continuing to be wrong again and again, while claiming that it understood the instructions perfectly and came up with correct results.

When called out, it backtracks, getting even the backtracking wrong ("confessing" that it made a mistake about something that wasn't actually a mistake).

The nerd in me finds this endlessly entertaining!

There is also something disconcerting about it, as - very tangentially - it is related to the alignment problem. (Maybe not really, but hear me out...)

Examples I saw on Robert Miles' excellent YouTube channel: ask an AI to manufacture as many paper clips as it can, or to bring you a cup of coffee, and it might burn through the whole world's resources to create paper clips, or run over a baby, a dog, or anything else in its way to bring you that coffee. (Just speaking from memory here.) Even when you try to give it instructions not to do harm or to align with human values, it might ignore those instructions when it decides that completing the goal takes priority.

Same here... I gave it instructions to make sure the map path was contiguous ("guardrails" that should narrow down what it does). But it also needed to incorporate funny street names. So it simply invented imaginary paths with wildly odd street names and ignored the other part of the instructions.
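(Just to make concrete what I meant by "contiguous": checking that is a mechanical task once you have the street network as data. Here is a minimal sketch in Python; the street names and intersections below are made up purely for illustration, not anything GPT-o actually produced.)

```python
# Minimal sketch: check that a proposed route is contiguous,
# i.e. every street in the list shares an intersection with the next one.
# The intersection data below is invented purely for illustration.

INTERSECTIONS = {
    frozenset({"Main St", "Oak Ave"}),
    frozenset({"Oak Ave", "Frying Pan Rd"}),
    frozenset({"Frying Pan Rd", "Chicken Dinner Rd"}),
}

def route_is_contiguous(route):
    """Return True if each consecutive pair of streets intersects."""
    return all(
        frozenset({a, b}) in INTERSECTIONS
        for a, b in zip(route, route[1:])
    )

print(route_is_contiguous(["Main St", "Oak Ave", "Frying Pan Rd"]))      # True
print(route_is_contiguous(["Main St", "Chicken Dinner Rd", "Oak Ave"]))  # False
```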

Maybe I'm off base here... maybe finding contiguous paths on a U.S. map just isn't in GPT-o's training. So the "alignment" aspect could be a stretch.

Still, intuitively, it feels similar... like GPT-o ignored what I said and kept insisting it had found the right solution, eager to get the points for "task successfully completed," i.e. finding driving directions that made me laugh.

Also weird: if it's not in its programming to find connected streets or give directions, why does it claim that this is something it can do?

What other tasks have you given ChatGPT where you experienced similar behavior? Leave some examples in the YouTube comments 🙂