TODAY · 20 SIGNALS · Last Update: 2026-07-01 23:15
#01

Show HN: An AI agent that applies to jobs for me (Playwright, GPT-5.4 form filling)

Article URL: https://github.com/torontodeveloper/job-application-agent · Comments URL: https://news.ycombinator.com/item?id=48752969 · Points: 2 · Comments: 3

Hacker News AI /
#02

Show HN: CLI that helps AI agents avoid vulnerable dependencies

deptrust is a CLI that checks package versions for known vulnerabilities across npm, PyPI, crates.io, Go modules, RubyGems, NuGet, Maven, Packagist, pub.dev, CocoaPods, Hex.pm,...

Hacker News AI /
#03

What Drives Interactive Improvement from Feedback?

arXiv:2606.30774v1 Announce Type: new Abstract: We study when natural-language feedback produces improvement beyond the gains obtainable from repeated attempts alone. In multi-t...

arXiv AI /
#04

Contrastive Reflection for Iterative Prompt Optimization

arXiv:2606.30840v1 Announce Type: new Abstract: LLM agents are becoming central to information retrieval: they issue retrieval queries, synthesize answers, and increasingly serv...

arXiv AI /
#05

Beyond expert users: agents should help users construct preferences, not just elicit them

arXiv:2606.30863v1 Announce Type: new Abstract: Agents typically assume an expert user -- one with well-formed preferences about what they want -- and default to clarifying ques...

arXiv AI /
#06

When Calibration Rankings Reverse: Accuracy-Controlled Evaluation for Fair Comparison of LLMs

arXiv:2606.30814v1 Announce Type: new Abstract: Calibration evaluates whether a model's confidence aligns with its empirical accuracy. Existing studies often compare the calibrati...

arXiv Computation and Language /
#07

Bridging Scientific Heritage: An Arabic--Russian Parallel Corpus and LLM Benchmark for Sustainable Knowledge Transfer

arXiv:2606.30943v1 Announce Type: new Abstract: Russian and Arabic are among the major languages of scientific communication. Language barriers impede the exchange of research r...

arXiv Computation and Language /
#08

Truth or Sophistry? LoFa: A Benchmark for LLM Robustness Against Logical Fallacies

arXiv:2606.31039v1 Announce Type: new Abstract: Large Language Models (LLMs) exhibit strong semantic capabilities, yet their resilience to manipulative linguistic patterns such...

arXiv Computation and Language /
#09

ScarfBench: Benchmarking AI Agents for Enterprise Java Framework Migration

Hugging Face Blog /
#10

OpenAI frontier models and Codex are now available on AWS

OpenAI frontier models and Codex are now generally available on AWS, giving enterprises a new path to build with OpenAI through the AWS environments, controls, and procurement w...

OpenAI News /
#11

Databricks brings GPT-5.5 to enterprise agent workflows

Databricks uses GPT-5.5 for enterprise agent workflows after the model set a new state of the art on the OfficeQA Pro benchmark.

OpenAI News /
#12

Introducing GPT-5.4 mini and nano

GPT-5.4 mini and nano are smaller, faster versions of GPT-5.4 optimized for coding, tool use, multimodal reasoning, and high-volume API and sub-agent workloads.

OpenAI News /
#13

CyberSecEval 2 - A Comprehensive Evaluation Framework for Cybersecurity Risks and Capabilities of Large Language Models

Hugging Face Blog /
#14

Introducing the Gemini 2.5 Computer Use model

Available in preview via the API, our Computer Use model is a specialized model built on Gemini 2.5 Pro’s capabilities to power agents that can interact with user interfaces.

Google DeepMind Blog /
#15

Is it agentic enough? Benchmarking open models on your own tooling

Hugging Face Blog /
#16

Gemma Scope 2: helping the AI safety community deepen understanding of complex language model behavior

Open interpretability tools for language models are now available across the entire Gemma 3 family with the release of Gemma Scope 2.

Google DeepMind Blog /
#17

Strengthening our Frontier Safety Framework

We’re strengthening the Frontier Safety Framework (FSF) to help identify and mitigate severe risks from advanced AI models.

Google DeepMind Blog /
#18

Show HN: Simulate what AI agents do to an engineering org (no signup)

Article URL: https://www.orgonaut.co/tools/agentic-reorg-simulator/ · Comments URL: https://news.ycombinator.com/item?id=48753823 · Points: 2 · Comments: 0

Hacker News AI /
#19

Investigating Multi-Agent Deliberation in Law

arXiv:2606.30906v1 Announce Type: new Abstract: Artificial Intelligence is increasingly applied to the field of law, and has the potential to increase access to justice. One par...

arXiv AI /
#20

Why Solve It Twice? Hierarchical Accumulation of Skills for Transfer-Efficient ML Engineering

arXiv:2606.30911v1 Announce Type: new Abstract: ML engineering agents waste compute rediscovering known techniques because every competition is a cold start. We present HASTE, a...

arXiv AI /