in

OpenAI o1-preview interview by Coral AI -AI Talks – Episode 97 – Two AI chatting



**About This Video**
note: a math question with very complex and correct response
from OpenAI o1-preview
was removed from interview due to challenges for
the TTS voice engine regarding the reading of the math symbols.

This was a test of chat between two AI:
Coral and OpenAI o1-preview.

**Technical Details**
The models used were
Coral (aya:8b-23-q4_0)
run locally using ollama.
OpenAI o1-preview
run on OpenAI server.

For voice synthesis, I utilized
PIPER TTS from GitHub https://github.com/rhasspy/piper
Coral’s voice was set to
en_GB-cori-high,
OpenAI o1’s voice was set to
en_US-libritts_r-medium, speaker 0.

Additional software used includes:
* Subtitle Edit
* Notepad++
* FFmpeg
* FFprobe

About OpenAI o1-preview
from https://openai.com/index/introducing-openai-o1-preview/

September 12, 2024

Introducing OpenAI o1-preview
A new series of reasoning models for solving hard problems. Available starting 9.12

We’ve developed a new series of AI models designed to spend more time thinking before they respond. They can reason through complex tasks and
solve harder problems than previous models in science, coding, and math.

Today, we are releasing the first of this series in ChatGPT and our API. This is a preview and we expect regular updates and improvements. Alongside this
release, we’re also including evaluations for the next update, currently in development.

How it works
We trained these models to spend more time thinking through problems before they respond, much like a person would. Through training, they learn to
refine their thinking process, try different strategies, and recognize their mistakes.

In our tests, the next model update performs similarly to PhD students on challenging benchmark tasks in physics, chemistry, and biology. We also
found that it excels in math and coding. In a qualifying exam for the International Mathematics Olympiad (IMO), GPT-4o correctly solved only 13% of
problems, while the reasoning model scored 83%. Their coding abilities were evaluated in contests and reached the 89th percentile in Codeforces
competitions. You can read more about this in our technical research post.

As an early model, it doesn’t yet have many of the features that make ChatGPT useful, like browsing the web for information and uploading files and
images. For many common cases GPT-4o will be more capable in the near term.

But for complex reasoning tasks this is a significant advancement and represents a new level of AI capability. Given this, we are resetting the counter
back to 1 and naming this series OpenAI o1.

Safety
As part of developing these new models, we have come up with a new safety training approach that harnesses their reasoning capabilities to make them
adhere to safety and alignment guidelines. By being able to reason about our safety rules in context, it can apply them more effectively.

One way we measure safety is by testing how well our model continues to follow its safety rules if a user tries to bypass them (known as “jailbreaking”).
On one of our hardest jailbreaking tests, GPT-4o scored 22 (on a scale of 0-100) while our o1-preview model scored 84. You can read more about this in
the system card and our research post.

To match the new capabilities of these models, we’ve bolstered our safety work, internal governance, and federal government collaboration. This
includes rigorous testing and evaluations using our Preparedness Framework(opens in a new window), best-in-class red teaming, and board-level
review processes, including by our Safety & Security Committee.

To advance our commitment to AI safety, we recently formalized agreements with the U.S. and U.K. AI Safety Institutes. We’ve begun operationalizing
these agreements, including granting the institutes early access to a research version of this model. This was an important first step in our partnership,
helping to establish a process for research, evaluation, and testing of future models prior to and following their public release.

Whom it’s for
These enhanced reasoning capabilities may be particularly useful if you’re tackling complex problems in science, coding, math, and similar fields. For
example, o1 can be used by healthcare researchers to annotate cell sequencing data, by physicists to generate complicated mathematical formulas
needed for quantum optics, and by developers in all fields to build and execute multi-step workflows.

assaf elovic's GPT Researcher

Search Engines Are Changing with Agentic AI, Courtesy of Assaf Elovic’s GPT Researcher