OpenAI o1 when it FAILS



My very first look at the new OpenAI o1-preview, published yesterday, aimed especially at science and advanced reasoning. You are live with my very first encounter and tests of OpenAI o1.

Unfortunately I hit the max quota of OpenAI o1 real fast, but I recorded my tests. This video presents my chronological questions and tests, exploring OpenAI o1 for the first time.

My test was simple: generate thematic topic clusters from the latest 70 AI research papers of yesterday with the OpenAI o1-preview model. Normally that would involve an SBERT sentence transformer model with a domain-specific tokenizer, a dimensionality reduction with UMAP from a high-dimensional vector space, and further optimizations, since all the texts were on brand-new research topics, unseen by any AI system.
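The classic pipeline mentioned above (embed the titles, reduce the dimensionality, then cluster) can be sketched in a few lines. This is only an illustrative sketch: it uses scikit-learn's TF-IDF and TruncatedSVD as lightweight stand-ins for SBERT embeddings and UMAP, and the paper titles are made up; for the real setup you would swap in the sentence-transformers and umap-learn libraries.

```python
# Sketch of the embed -> reduce -> cluster pipeline for topic clustering.
# TF-IDF and TruncatedSVD stand in here for SBERT embeddings and UMAP;
# the titles below are invented examples, not the actual 70 papers.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans

titles = [
    "Scaling laws for sparse mixture-of-experts transformers",
    "Routing strategies in mixture-of-experts language models",
    "Diffusion models for protein structure generation",
    "Generative diffusion for molecular docking",
    "Benchmarking retrieval-augmented generation pipelines",
    "Evaluating retrieval quality in RAG systems",
]

# 1) Embed: map each title into a vector space (SBERT in the real pipeline).
vectors = TfidfVectorizer(stop_words="english").fit_transform(titles)

# 2) Reduce: project the high-dimensional vectors down (UMAP in the real pipeline).
reduced = TruncatedSVD(n_components=3, random_state=0).fit_transform(vectors)

# 3) Cluster: group the reduced vectors into thematic topic clusters.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(reduced)

for label, title in sorted(zip(labels, titles), key=lambda x: x[0]):
    print(label, title)
```

The point of the comparison is that o1 was asked to do this whole pipeline implicitly, in one prompt, without any of the dedicated tooling.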

Q: Why does my title have the word “FAIL” in it?
A: Given my uploaded text segment of 70 scientific titles and technical annexes, o1 failed on the first attempt and reported back that it detected only 8 papers. This is a clear fail. My hypothesis is that another agent network then activated, kind of a “give him 1 agent at first and only when necessary, activate the whole fleet”. IF o1 has an internal self-validation check, this should not have happened. If a human has to continuously monitor and evaluate each AI response, it gets boring real soon and I could do the thinking myself. And any economic case for an industrial AI application – at profit-oriented companies – vanishes.

Although not perfect, it is an intelligent tool – especially for science, to examine cross-discipline publications and uncover thematic insights.

By the way, the recorded failure of OpenAI o1 to immediately recognize that the text contained 70 research papers could theoretically indicate that some agents were activated in the second attempt, and then OmniOne succeeded. But we have to wait for the technical paper by OpenAI.

Nice @OpenAI

All rights with me. Looking forward to continuing my tests, since this was only a first look and in no way a real profound testing regime. Therefore try it out yourself, why not leave your impressions in the comments, and next time we will all know a bit more about the real performance of OpenAI o1. The tactic that we are only allowed really limited access and then have to wait for weeks (!) for further testing is not helping the community to assess the performance of OmniOne. So let us be patient. Smile.

#airesearch
#chatgpt
#airesearchlab #openai

OpenAI's New "Strawberry" AI Is Still Making Idiotic Mistakes