TODAY · 20 SIGNALS Last Update: 2026-06-30 23:12
#01

Commonplace: Self-hosted, privacy-tiered memory for your AI agents

Article URL: https://github.com/itsmeduncan/commonplace · Comments URL: https://news.ycombinator.com/item?id=48740235 · Points: 1 · Comments: 0

Hacker News AI /
#02

ScarfBench: Benchmarking AI Agents for Enterprise Java Framework Migration

Hugging Face Blog /
#03

Recursive Self-Evolving Agents via Held-Out Selection

arXiv:2606.28374v1 · Announce Type: new · Abstract: LLM agents are increasingly improved without weight updates by evolving a natural-language artifact, such as reflections, workflo...

arXiv AI /
#04

GPTNT: Benchmarking Real-Time Collaboration Between Multimodal Agents on Keep Talking And Nobody Explodes

arXiv:2606.28514v1 · Announce Type: new · Abstract: Multimodal models are increasingly deployed to solve tasks collaboratively with humans or other artificial agents. Existing bench...

arXiv AI /
#05

IMCBench: A benchmark for multimodal LLMs in Image-grounded Medical Conversations

arXiv:2606.28556v1 · Announce Type: new · Abstract: Recent advances in large language models and vision-language models have enabled reasoning over multimodal data, offering opportu...

arXiv AI /
#06

Developmental Trajectories of Situation Modeling and Mentalizing in Transformer Language Models

arXiv:2606.28524v1 · Announce Type: new · Abstract: Recent work suggests that Large Language Models (LLMs) are sensitive to the belief states of agents described by text, as measure...

arXiv Computation and Language /
#07

A French OSCE Dialogue Dataset and Controllable Virtual Patient System for Clinical Training

arXiv:2606.28526v1 · Announce Type: new · Abstract: The clinical and communication skills of medical students are commonly assessed through Objective Structured Clinical Examination...

arXiv Computation and Language /
#08

AnTenA: Actionable and Explainable Tensor Analysis System with Large Language Models

arXiv:2606.28708v1 · Announce Type: new · Abstract: Accurately explaining hidden patterns in multi-aspect data has typically been done by leveraging labels and/or accompanying auxil...

arXiv Computation and Language /
#09

OpenAI frontier models and Codex are now available on AWS

OpenAI frontier models and Codex are now generally available on AWS, giving enterprises a new path to build with OpenAI through the AWS environments, controls, and procurement w...

OpenAI News /
#10

Databricks brings GPT-5.5 to enterprise agent workflows

Databricks uses GPT-5.5 for enterprise agent workflows after the model set a new state of the art on the OfficeQA Pro benchmark.

OpenAI News /
#11

Introducing GPT-5.4 mini and nano

GPT-5.4 mini and nano are smaller, faster versions of GPT-5.4 optimized for coding, tool use, multimodal reasoning, and high-volume API and sub-agent workloads.

OpenAI News /
#12

CyberSecEval 2 - A Comprehensive Evaluation Framework for Cybersecurity Risks and Capabilities of Large Language Models

Hugging Face Blog /
#13

Introducing the Gemini 2.5 Computer Use model

Available in preview via the API, our Computer Use model is a specialized model built on Gemini 2.5 Pro’s capabilities to power agents that can interact with user interfaces.

Google DeepMind Blog /
#14

Show HN: Mimir – local-first encrypted memory for AI agents (single Rust binary)

Article URL: https://github.com/Perseus-Computing-LLC/mimir · Comments URL: https://news.ycombinator.com/item?id=48739468 · Points: 1 · Comments: 2

Hacker News AI /
#15

Claude Code Skills: 98 AI architectures, Haiku at 93% of Fable 5 quality

Article URL: https://github.com/GPire/claude-skills-swarm · Comments URL: https://news.ycombinator.com/item?id=48740141 · Points: 1 · Comments: 0

Hacker News AI /
#16

Is it agentic enough? Benchmarking open models on your own tooling

Hugging Face Blog /
#17

Gemma Scope 2: helping the AI safety community deepen understanding of complex language model behavior

Open interpretability tools for language models are now available across the entire Gemma 3 family with the release of Gemma Scope 2.

Google DeepMind Blog /
#18

Strengthening our Frontier Safety Framework

We’re strengthening the Frontier Safety Framework (FSF) to help identify and mitigate severe risks from advanced AI models.

Google DeepMind Blog /
#19

Search for Truth from Reasoning: A Dynamic Representation Editing Framework for Steering LLM Trajectories

arXiv:2606.28589v1 · Announce Type: new · Abstract: Current approaches to enhance Large Language Model (LLM) reasoning, such as Chain-of-Thought and "Wait" prompts, primarily encour...

arXiv AI /
#20

Aristotelian Virtue Profiling of LLMs through Ethical Dilemmas

arXiv:2606.28683v1 · Announce Type: new · Abstract: Large Language Models (LLMs) often face ethical tradeoffs in which several responses may be defensible but express different prio...

arXiv AI /