TODAY · 20 SIGNALS Last Update: 2026-06-27 23:02
#01

Detecting and Controlling Sycophancy with Cascading Linear Features

arXiv:2606.26155v1 (new). Abstract: Interpreting and controlling model behaviors through activation steering methods requires many pairs of contrastive samples that...

arXiv AI
#02

Life After Benchmark Saturation: A Case Study of CORE-Bench

arXiv:2606.26158v1 (new). Abstract: When a benchmark's accuracy saturates, it is often retired and replaced with a more challenging version. We show that this approa...

arXiv AI
#03

AlgoEvolve: LLM-driven Meta-evolution of Algorithmic Trading Programs

arXiv:2606.26173v1 (new). Abstract: Recent work shows that Large Language Models (LLMs) can act as semantic mutation operators for the evolutionary discovery of prog...

arXiv AI
#04

OpenAI frontier models and Codex are now available on AWS

OpenAI frontier models and Codex are now generally available on AWS, giving enterprises a new path to build with OpenAI through the AWS environments, controls, and procurement w...

OpenAI News
#05

Databricks brings GPT-5.5 to enterprise agent workflows

Databricks uses GPT-5.5 for enterprise agent workflows after the model set a new state of the art on the OfficeQA Pro benchmark.

OpenAI News
#06

Introducing GPT-5.4 mini and nano

GPT-5.4 mini and nano are smaller, faster versions of GPT-5.4 optimized for coding, tool use, multimodal reasoning, and high-volume API and sub-agent workloads.

OpenAI News
#07

CyberSecEval 2 - A Comprehensive Evaluation Framework for Cybersecurity Risks and Capabilities of Large Language Models

Hugging Face Blog
#08

Introducing the Gemini 2.5 Computer Use model

Available in preview via the API, our Computer Use model is a specialized model built on Gemini 2.5 Pro’s capabilities to power agents that can interact with user interfaces.

Google DeepMind Blog
#09

MobileGuard: A Mobile-Native Governance Framework for Agentic AI

Article URL: https://zenodo.org/records/20970167
Comments URL: https://news.ycombinator.com/item?id=48701972

Hacker News AI
#10

Is it agentic enough? Benchmarking open models on your own tooling

Hugging Face Blog
#11

A New Framework for Evaluating Voice Agents (EVA)

Hugging Face Blog
#12

Everyone feared AI taking over; the real danger is AI serving just the few

Everyone feared AI would enslave humanity, but it looks like the real fight is stopping governments and Big Tech from enslaving AI for the benefit of the few. Amid the newly ann...

Hacker News AI
#13

Gemma Scope 2: helping the AI safety community deepen understanding of complex language model behavior

Open interpretability tools for language models are now available across the entire Gemma 3 family with the release of Gemma Scope 2.

Google DeepMind Blog
#14

Strengthening our Frontier Safety Framework

We’re strengthening the Frontier Safety Framework (FSF) to help identify and mitigate severe risks from advanced AI models.

Google DeepMind Blog
#15

What Happens When AI Agents Refuse to Work Until They're Paid

Article URL: https://blog.owulveryck.info/2026/06/25/from-isolated-agents-to-agentic-mesh-orchestrating-sdlc-with-a2a-and-ap2.html
Comments URL: https://news.ycombinator.com/ite...

Hacker News AI
#16

Agentic Analysis for Agentic Infrastructure: An LLM-Powered Pipeline for Comparative Governance of DAO and Corporate AI Protocols

arXiv:2606.26203v1 (new). Abstract: As AI agent protocols proliferate, the governance structures shaping their interoperability standards remain empirically underexa...

arXiv AI
#17

Knowledge-augmented Agentic AI for Mental Health Medication Information Seeking

arXiv:2606.26205v1 (new). Abstract: Patients increasingly seek medication information online, yet safety knowledge for psychiatric drugs is split between regulatory...

arXiv AI
#18

Governing Actions, Not Agents: Institutional Attestation as a Governance Model for Autonomous AI Systems

arXiv:2606.26298v1 (new). Abstract: Autonomous AI agents may begin to perform consequential, irreversible actions such as clinical prescribing and production softwar...

arXiv AI
#19

COrigami: An AI Pipeline for Co-Designing Flat-Foldable Visually Recognisable Origami

arXiv:2606.26299v1 (new). Abstract: While generative AI has achieved remarkable success in solving problems with verifiable solutions, generating physical art that s...

arXiv AI
#20

The Verification Horizon: No Silver Bullet for Coding Agent Rewards

arXiv:2606.26300v1 (new). Abstract: A classical intuition holds that verifying a solution is easier than producing one. For today's coding agents, this intuition is...

arXiv AI