OpenAI frontier models and Codex are now available on AWS
OpenAI frontier models and Codex are now generally available on AWS, giving enterprises a new path to build with OpenAI through the AWS environments, controls, and procurement w...
Databricks brings GPT-5.5 to enterprise agent workflows
Databricks uses GPT-5.5 for enterprise agent workflows after the model set a new state of the art on the OfficeQA Pro benchmark.
Introducing GPT-5.4 mini and nano
GPT-5.4 mini and nano are smaller, faster versions of GPT-5.4 optimized for coding, tool use, multimodal reasoning, and high-volume API and sub-agent workloads.
CyberSecEval 2 - A Comprehensive Evaluation Framework for Cybersecurity Risks and Capabilities of Large Language Models
Introducing the Gemini 2.5 Computer Use model
Available in preview via the API, our Computer Use model is a specialized model built on Gemini 2.5 Pro’s capabilities to power agents that can interact with user interfaces.
Show HN: Git Issues – versioned task management for AI agents
Article URL: https://steviee.github.io/git-issues/ Comments URL: https://news.ycombinator.com/item?id=48635995
Is it agentic enough? Benchmarking open models on your own tooling
A New Framework for Evaluating Voice Agents (EVA)
Five Eyes joint statement on AI models taking down governments and businesses
Article URL: https://www.cisa.gov/news-events/news/five-eyes-cyber-security-agencies-statement Comments URL: https://news.ycombinator.com/item?id=48636626
Gemma Scope 2: helping the AI safety community deepen understanding of complex language model behavior
Open interpretability tools for language models are now available across the entire Gemma 3 family with the release of Gemma Scope 2.
Strengthening our Frontier Safety Framework
We’re strengthening the Frontier Safety Framework (FSF) to help identify and mitigate severe risks from advanced AI models.
Sakana AI Ships Fugu, an Orchestration Model Claiming Fable 5 Performance
Article URL: https://pokee.ai/blog/pokee-ai-daily-2026-06-22 Comments URL: https://news.ycombinator.com/item?id=48636012
Introducing GPT-5.2
GPT-5.2 is our most advanced frontier model for everyday professional work, with state-of-the-art reasoning, long-context understanding, coding, and vision. Use it in ChatGPT an...
Introducing next-generation audio models in the API
For the first time, developers can also instruct the text-to-speech model to speak in a specific way—for example, “talk like a sympathetic customer service agent”—unlocking a ne...
Inside Mirakl's agentic commerce vision
Mirakl is redefining commerce through AI agents and ChatGPT Enterprise—achieving faster documentation, smarter customer support, and building toward agent-native commerce with M...
Predicting model behavior before release by simulating deployment
OpenAI introduces Deployment Simulation, a method to predict AI model behavior before deployment using real conversation data to improve safety and evaluation accuracy.
OpenAI o3-mini System Card
This report outlines the safety work carried out for the OpenAI o3-mini model, including safety evaluations, external red teaming, and Preparedness Framework evaluations.
Operator System Card
Drawing from OpenAI’s established safety frameworks, this document highlights our multi-layered approach, including model and product mitigations we’ve implemented to protect ag...
Enterprises power agentic workflows in Cloudflare Agent Cloud with OpenAI
Cloudflare brings OpenAI’s GPT-5.4 and Codex to Agent Cloud, enabling enterprises to build, deploy, and scale AI agents for real-world tasks with speed and security.
ENEOS Materials brings ChatGPT Enterprise to manufacturing
ENEOS Materials uses ChatGPT Enterprise to speed research, improve plant design safety, and cut HR analysis time by 90%, with 80% reporting better workflows.