OpenAI o1 when it FAILS



My very first look at the new OpenAI o1-preview, published yesterday, aimed especially at science and advanced reasoning. You are live with my very first encounter and tests of OpenAI o1.

Unfortunately I hit the max quota of OpenAI o1 real fast, but I recorded my tests. This video presents my chronological questions and tests, exploring OpenAI o1 for the first time.

My test was simple: generate thematic topic clusters from the latest 70 AI research papers of yesterday with the OpenAI o1-preview model. Normally that would involve an SBERT sentence transformer model with a domain-specific tokenizer, a dimensionality reduction with UMAP from a high-dimensional vector space, and further optimizations, since all the texts were on brand-new research topics, unseen by any AI system.
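The classic pipeline mentioned above (embed the titles, reduce the dimensionality, then cluster) can be sketched in a few lines. This is only an illustrative sketch: it uses scikit-learn's TF-IDF and TruncatedSVD as lightweight stand-ins for SBERT embeddings and UMAP, and the paper titles are made up; for the real setup you would swap in the sentence-transformers and umap-learn libraries.

```python
# Sketch of the embed -> reduce -> cluster pipeline for topic clustering.
# TF-IDF and TruncatedSVD stand in here for SBERT embeddings and UMAP;
# the titles below are invented examples, not the actual 70 papers.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans

titles = [
    "Scaling laws for sparse mixture-of-experts transformers",
    "Routing strategies in mixture-of-experts language models",
    "Diffusion models for protein structure generation",
    "Generative diffusion for molecular docking",
    "Benchmarking retrieval-augmented generation pipelines",
    "Evaluating retrieval quality in RAG systems",
]

# 1) Embed: map each title into a vector space (SBERT in the real pipeline).
vectors = TfidfVectorizer(stop_words="english").fit_transform(titles)

# 2) Reduce: project the high-dimensional vectors down (UMAP in the real pipeline).
reduced = TruncatedSVD(n_components=3, random_state=0).fit_transform(vectors)

# 3) Cluster: group the reduced vectors into thematic topic clusters.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(reduced)

for label, title in sorted(zip(labels, titles), key=lambda x: x[0]):
    print(label, title)
```

The point of the comparison is that o1 was asked to do this whole pipeline implicitly, in one prompt, without any of the dedicated tooling.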

Q: Why does my title have the word “FAIL” in it?
A: Given my uploaded text segment of 70 scientific titles and technical annexes, o1 failed on the first attempt and reported back that it detected only 8 papers. This is a clear fail. My hypothesis is that another agent network then activated, kind of a “give him 1 agent at first and only when necessary, activate the whole fleet”. IF o1 has an internal self-validation check, this should not have happened. If a human has to continuously monitor and evaluate each AI response, it gets boring real soon and I could do the thinking myself. And any economic case for an industrial AI application – at profit-oriented companies – vanishes.

Although not perfect, it is an intelligent tool – especially for science, to examine cross-discipline publications and uncover thematic insights.

By the way, the recorded failure of OpenAI o1 to immediately recognize that the text contained 70 research papers could theoretically indicate that some agents were activated in the second attempt, and then OmniOne succeeded. But we have to wait for the technical paper by OpenAI.

Nice @OpenAI

All rights with me. Looking forward to continuing my tests, since this was only a first look and in no way a real profound testing regime. Therefore try it out yourself, why not leave your impressions in the comments, and next time we will all know a bit more about the real performance of OpenAI o1. The tactic that we are only allowed really limited access and then have to wait for weeks (!) for further testing is not helping the community to assess the performance of OmniOne. So let us be patient. Smile.

#airesearch
#chatgpt
#airesearchlab #openai

OpenAI's New "Strawberry" AI Is Still Making Idiotic Mistakes