I find it almost more fascinating when GPT-o fails a task (and how it fails) than when it gets things right.
Here, I gave it the task of coming up with size comparisons between different things.
Looking back, that seems to be outside the scope of such an LLM. It's a predictive model, after all: it doesn't visualize things or do math in its head (like a human would) before coming up with comparisons.
If it has no prior examples for a specific use case (here: creating fresh new comparisons between things) garnered from around the web and other documents, how could it do that?
I now suspect that all the comparisons it came up with for the song are inaccurate, unless they're quotes from somewhere.
If every dollar Jeff Bezos owns was a grain of rice, would it really fill the Statue of Liberty twice? Or only one time? Or ten times? There is not really a way for me to verify.
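That said, the order of magnitude is at least checkable with back-of-envelope arithmetic. Every number in the sketch below is my own rough assumption (an assumed net worth, an assumed rice-grain volume, an assumed interior volume for the statue), not a verified figure; the point is only that the question is answerable in principle.

```python
# Back-of-envelope check: Bezos's net worth as rice grains vs. the
# Statue of Liberty. All three inputs are rough assumptions.
NET_WORTH_USD = 200e9      # assumed net worth in dollars
GRAIN_VOLUME_M3 = 2.5e-8   # assumed volume of one rice grain (~25 mm^3)
STATUE_VOLUME_M3 = 2500.0  # assumed interior volume of the statue

# One grain per dollar, so total rice volume scales with net worth.
total_rice_m3 = NET_WORTH_USD * GRAIN_VOLUME_M3
fills = total_rice_m3 / STATUE_VOLUME_M3
print(f"{total_rice_m3:.0f} m^3 of rice fills the statue {fills:.1f} times")
```

With these particular guesses the rice comes to 5,000 m³, about two statue-fulls, but changing any assumption by a factor of two changes the answer by the same factor.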
The tricky part now is: how would a non-tech-savvy person know which tasks an LLM is good at, and which ones aren't appropriate? It gives confident answers either way.
The finished song is here by the way: "The Comparison Song".
Anyway. I think it's really fascinating that e.g. an LLM can be very good at college physics, since those calculations largely follow pre-established patterns.
On the other hand, from my observations now, I think general LLMs would fail at tasks that combine physics with other goals, such as aesthetics or usability (the way e.g. home contractors use physics).
Anyway, I rambled on too long. Check out the song, as I found it to be weirdly mesmerizing once the lyrics were set to music...