OpenAI frontier models and Codex are now available on AWS
OpenAI frontier models and Codex are now generally available on AWS, giving enterprises a new path to build with OpenAI through the AWS environments, controls, and procurement w...
Databricks brings GPT-5.5 to enterprise agent workflows
Databricks uses GPT-5.5 for enterprise agent workflows after the model set a new state of the art on the OfficeQA Pro benchmark.
Introducing GPT-5.4 mini and nano
GPT-5.4 mini and nano are smaller, faster versions of GPT-5.4 optimized for coding, tool use, multimodal reasoning, and high-volume API and sub-agent workloads.
The AI Conundrum: We are living in highly subsidized, interesting times
If you trace the timeline of how LLMs went from a technologist's dream to early text-generation toys, to the world-shifting launch of ChatGPT, and finally to the daily drivers o...
CyberSecEval 2 - A Comprehensive Evaluation Framework for Cybersecurity Risks and Capabilities of Large Language Models
Introducing the Gemini 2.5 Computer Use model
Available in preview via the API, our Computer Use model is a specialized model built on Gemini 2.5 Pro’s capabilities to power agents that can interact with user interfaces.
Compass – guardrails and a hard budget cap for AI coding agents
Article URL: https://github.com/dshakes/compass
Conduit – Self-hosted Bitcoin Lightning payments for AI agents
Article URL: https://github.com/Jake1848/conduit
Is it agentic enough? Benchmarking open models on your own tooling
A New Framework for Evaluating Voice Agents (EVA)
Gemma Scope 2: helping the AI safety community deepen understanding of complex language model behavior
Open interpretability tools for language models are now available across the entire Gemma 3 family with the release of Gemma Scope 2.
Strengthening our Frontier Safety Framework
We’re strengthening the Frontier Safety Framework (FSF) to help identify and mitigate severe risks from advanced AI models.
Introducing GPT-5.2
GPT-5.2 is our most advanced frontier model for everyday professional work, with state-of-the-art reasoning, long-context understanding, coding, and vision. Use it in ChatGPT an...
Introducing next-generation audio models in the API
For the first time, developers can also instruct the text-to-speech model to speak in a specific way—for example, “talk like a sympathetic customer service agent”—unlocking a ne...
Inside Mirakl's agentic commerce vision
Mirakl is redefining commerce through AI agents and ChatGPT Enterprise—achieving faster documentation, smarter customer support, and building toward agent-native commerce with M...
Predicting model behavior before release by simulating deployment
OpenAI introduces Deployment Simulation, a method to predict AI model behavior before deployment using real conversation data to improve safety and evaluation accuracy.
OpenAI o3-mini System Card
This report outlines the safety work carried out for the OpenAI o3-mini model, including safety evaluations, external red teaming, and Preparedness Framework evaluations.
Operator System Card
Drawing from OpenAI’s established safety frameworks, this document highlights our multi-layered approach, including model and product mitigations we’ve implemented to protect ag...
Enterprises power agentic workflows in Cloudflare Agent Cloud with OpenAI
Cloudflare brings OpenAI’s GPT-5.4 and Codex to Agent Cloud, enabling enterprises to build, deploy, and scale AI agents for real-world tasks with speed and security.
ENEOS Materials brings ChatGPT Enterprise to manufacturing
ENEOS Materials uses ChatGPT Enterprise to speed research, improve plant design safety, and cut HR analysis time by 90%, with 80% reporting better workflows.