TODAY · 20 SIGNALS Last Update: 2026-06-25 23:17
#01

RIFT-Bench: Dynamic Red-teaming For Agentic AI Systems

arXiv:2606.23927v1 Announce Type: new Abstract: Agentic AI systems powered by large language models (LLMs) are rapidly evolving into autonomous decision-making systems, exposing...

arXiv AI /
#02

Neuro-Symbolic Drive: Rule-Grounded Faithful Reasoning for Driving VLAs

arXiv:2606.23938v1 Announce Type: new Abstract: Driving VLA models incorporating Chain-of-Thought (CoT) reasoning are attractive because they leverage pretrained VLM representat...

arXiv AI /
#03

Critique of Agent Model

arXiv:2606.23991v1 Announce Type: new Abstract: What is an agent? What constitutes agency? With the rise of Large Language Model (LLM) systems marketed as "coding agents", "A...

arXiv AI /
#04

Graph-Based Phonetic Error Correction of Noisy ASR

arXiv:2606.24889v1 Announce Type: new Abstract: Automatic speech recognition (ASR) systems, despite low overall word error rates, produce residual lexical errors that disproport...

arXiv Computation and Language /
#05

AgentOdyssey: Open-Ended Long-Horizon Text Game Generation for Test-Time Continual Learning Agents

arXiv:2606.24893v1 Announce Type: new Abstract: For agents to learn continuously from interaction with the world at test time, they must be able to explore effectively, acquire...

arXiv Computation and Language /
#06

Error-Aware TF-IDF Retrieval-Augmented Generation for ASR Error Correction

arXiv:2606.24915v1 Announce Type: new Abstract: End-to-end automatic speech recognition systems frequently hallucinate rare entities and domain-specific terms, especially in low...

arXiv Computation and Language /
#07

OpenAI frontier models and Codex are now available on AWS

OpenAI frontier models and Codex are now generally available on AWS, giving enterprises a new path to build with OpenAI through the AWS environments, controls, and procurement w...

OpenAI News /
#08

Databricks brings GPT-5.5 to enterprise agent workflows

Databricks uses GPT-5.5 for enterprise agent workflows after the model set a new state of the art on the OfficeQA Pro benchmark.

OpenAI News /
#09

Introducing GPT-5.4 mini and nano

GPT-5.4 mini and nano are smaller, faster versions of GPT-5.4 optimized for coding, tool use, multimodal reasoning, and high-volume API and sub-agent workloads.

OpenAI News /
#10

CyberSecEval 2 - A Comprehensive Evaluation Framework for Cybersecurity Risks and Capabilities of Large Language Models

Hugging Face Blog /
#11

Introducing the Gemini 2.5 Computer Use model

Available in preview via the API, our Computer Use model is a specialized model built on Gemini 2.5 Pro’s capabilities to power agents that can interact with user interfaces.

Google DeepMind Blog /
#12

Export Claude.ai chats, artifacts, and visible thinking

Article URL: https://github.com/lekandigital/claude-export-hub Comments URL: https://news.ycombinator.com/item?id=48680164 Points: 1 # Comments: 0

Hacker News AI /
#13

Is it agentic enough? Benchmarking open models on your own tooling

Hugging Face Blog /
#14

Introducing computer use in Gemini 3.5 Flash

Google DeepMind Blog /
#15

General LLMs outperform specialized clinical AI tools on medical benchmarks

Article URL: https://www.nature.com/articles/s41591-026-04431-5 Comments URL: https://news.ycombinator.com/item?id=48680020 Points: 1 # Comments: 0

Hacker News AI /
#16

A New Framework for Evaluating Voice Agents (EVA)

Hugging Face Blog /
#17

Gemma Scope 2: helping the AI safety community deepen understanding of complex language model behavior

Open interpretability tools for language models are now available across the entire Gemma 3 family with the release of Gemma Scope 2.

Google DeepMind Blog /
#18

Ask HN: Have you ever given your AI agent a phone number?

As agents do more and more work, they need more access to real-world tools such as mail, phone numbers, payments, etc. Recently I gave a phone number to my Hermes. It's doing...

Hacker News AI /
#19

Safe and Generalizable Hierarchical Multi-Agent RL via Constraint Manifold Control

arXiv:2606.24010v1 Announce Type: new Abstract: Multi-agent systems are widely used in safety-critical applications that require coordinated behavior under strict safety constra...

arXiv AI /
#20

Can Language Model Agents be Helpful Circuit Explainers in Mechanistic Interpretability?

arXiv:2606.24026v1 Announce Type: new Abstract: Mechanistic interpretability has made substantial progress in automatically localizing circuits, but explaining what localized co...

arXiv AI /