AI-generated summaries

Today's ML research, without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

67 Papers today
8h Update frequency
7 Days of history
A Stationary-Distribution Theory for Triplet-Based Plateau Search in Random Forest Ensemble-Size Selection
Andrey A. Dukhovny, Andrey M. Lange
Theory Optimization
  • Introduces a stationary-distribution theory for ensemble-size selection in Random Forests.
  • Models the ensemble size selection process as a birth-death Markov chain.
  • Demonstrates that the central ensemble size fluctuates around a stationary regime.
  • Derives key scaling relationships for the stationary center and variance of ensemble size.
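As background for the birth-death framing above, here is a minimal sketch of how a stationary distribution is computed for a generic finite birth-death chain using detailed balance. The transition rates below are hypothetical and are not taken from the paper; the paper's actual chain, scaling laws, and triplet-based search rule are not reproduced here.

```python
def stationary_distribution(birth, death):
    """Stationary distribution of a finite birth-death chain.

    birth[k] is the probability of moving from state k to k+1,
    death[k] is the probability of moving from state k+1 to k.
    Detailed balance gives pi[k+1] = pi[k] * birth[k] / death[k].
    """
    if len(birth) != len(death):
        raise ValueError("birth and death must have the same length")
    weights = [1.0]
    for b, d in zip(birth, death):
        weights.append(weights[-1] * b / d)
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical rates chosen so the chain drifts toward the middle states.
birth = [0.6, 0.5, 0.4, 0.2, 0.1]
death = [0.1, 0.2, 0.4, 0.5, 0.6]

pi = stationary_distribution(birth, death)
mean = sum(k * p for k, p in enumerate(pi))
var = sum((k - mean) ** 2 * p for k, p in enumerate(pi))
print("stationary distribution:", [round(p, 4) for p in pi])
print("center:", mean, "variance:", round(var, 4))
```

The mean and variance of `pi` play the role of the "stationary center" and "variance" mentioned in the summary, under the assumption that states index a quantity such as ensemble size.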
Fork-Think with Confidence
Zena Al-Khalili, Rafi Hakim, Dietrich Klakow, Ji-Ung Lee
NLP Large Language Models Efficient ML
  • Introduces a new decide-first-then-think paradigm for LLM reasoning.
  • Demonstrates significant reductions in token consumption and runtime compared to traditional methods.
  • Shows that later forking points can lead to improved generation quality.
  • Combines Fork-Think with existing techniques like early stopping for enhanced performance.
From Search to Synthesis: Training LLMs as Zero-Shot Workflow Generators
Gan Luo, Zihan Qin, Bin Dong, Wotao Yin
Large Language Models Reinforcement Learning Optimization
  • MetaFlow enables zero-shot workflow generation by learning task-level patterns instead of instance-specific solutions.
  • The approach consists of a two-stage training process: supervised fine-tuning followed by reinforcement learning with execution feedback.
  • MetaFlow demonstrates strong generalization capabilities, performing well on both trained and untrained tasks.
  • The model achieves competitive performance against state-of-the-art baselines across multiple benchmarks.
Signed-Permutation Coordinate Transport for RMSNorm Transformers
John Sweeney
Large Language Models Theory Interpretability
  • RMSNorm transformers require a signed-permutation gauge for accurate coordinate transport.
  • Sign-marginalized Hungarian matching improves matching accuracy compared to raw signed-correlation methods.
  • Coordinate-preserving transport outperforms endpoint matching in recovering cross-run coordinates.
  • Signed transport maintains the trajectory of optimizer states, unlike permutation-only methods.
Transformers as Bayesian In-Context Experimenters: Smoothness-Adaptive Efficient ATE Estimation
Jiachun Li, David Simchi-Levi
Theory Efficient ML Optimization
  • Introduces Bayesian in-context experimenters that use transformer architectures for average treatment effect (ATE) estimation.
  • Utilizes a mixture-of-experts approach to adaptively handle unknown outcome smoothness.
  • Proves that the transformer can learn the history-to-propensity mapping through supervised pretraining.
  • Demonstrates empirical success in mimicking Bayesian-Neyman allocation behavior and improving ATE precision.
SP-CACW: Convergence-Aware Client Weighting for Selfish Personalized Learning
Yaron Kiselman, Kfir Y. Levy
Federated Learning
  • Introduction of SP-CACW framework for selfish personalized learning in federated settings.
  • Convergence-aware client weighting minimizes the target client's convergence error.
  • Guarantees provided for convergence rates, particularly in cluster-structured problems.
  • Demonstrated effectiveness on multiple datasets, outperforming existing methods.
Fisher-Routed Mixture of Experts for Federated Class-Incremental Learning
Wenhao Yuan, Chenchen Lin, Jian Chen, Jinfeng Xu, Zewei Liu, Edith Cheuk Han Ngai
Federated Learning
  • Introduces FedFMX to address challenges in federated class-incremental learning.
  • Develops FRES and AES modules for expert routing and selection based on stability-plasticity trade-offs.
  • Employs routing-aware regularization to promote balanced expert utilization.
  • Demonstrates superior performance over existing methods on multiple benchmark datasets.
Low-dimensional topology of deep neural networks
Junyu Ren, Lek-Heng Lim
Theory
  • Layer-skipping in ResNets is as powerful as attention mechanisms in transformers for changing linking numbers.
  • Feedforward networks with monotonic activations are less expressive than ResNets and transformers.
  • Nonmonotonic activations can enhance the expressivity of feedforward networks, allowing them to match the capabilities of more complex architectures.
  • The study highlights the importance of low-dimensional topology in understanding and designing neural network architectures.
Multi-Agent Routing as Set-Valued Prediction: A WildChat Benchmark and Cost-Aware Evaluation
Ananto Nayan Bala, Faisal Muhammad Shah
NLP Large Language Models Optimization
  • Introduces a set-valued prediction framework for multi-agent routing from natural language prompts.
  • Develops a WildChat-derived benchmark with 3,000 prompts and a controlled agent catalog for evaluation.
  • Demonstrates that supervised routing methods outperform nearest-neighbor and zero-shot approaches.
  • Highlights the effectiveness of a weighted routing layer in improving utility in constrained settings.
Discovering Collaboration from Novelty: Random Network Distillation for Clustered Federated Learning
Davide Domini, Gianluca Aguzzi, Ivana Dusparic, Danilo Pianini, Mirko Viroli
Federated Learning
  • Introduces a lightweight clustering approach using Random Network Distillation for Federated Learning.
  • Decouples clustering from the main training loop, reducing computational and communication costs.
  • Enables autonomous discovery of client groups based on local novelty estimates.
  • Demonstrates effectiveness on computer vision benchmarks with non-IID data.
Fora: From Weight-Space to Function-Space Protection in Capability-Preserving Fine-Tuning
Rui Zhou, Tianci Xie
NLP Large Language Models Efficient ML
  • Fora reformulates capability preservation as function-space protection, focusing on activation subspaces rather than weight geometry.
  • The method employs a unique update mechanism that combines high-capacity and controlled spectral channels to protect capabilities.
  • Fora consistently outperforms traditional fine-tuning methods in preserving existing capabilities while adapting to new tasks.
  • The study highlights the importance of projecting onto capability-derived directions rather than weight-derived directions for effective adaptation.
Estimating Supply Incrementality in Two-sided Marketplaces: A Causal Machine Learning Approach
Yufei Wu, Daniel Schmierer, Dan Zylberglejd
Theory
  • The paper introduces a causal machine learning approach to estimate supply incrementality in two-sided marketplaces.
  • It combines double machine learning with a hierarchical Bayesian framework to address endogeneity and substitution effects.
  • The methodology utilizes geospatial measures to enhance feature construction and improve model accuracy.
  • The results indicate that the model can effectively estimate the impact of additional supply on total bookings across different listing segments.
On the Convergence of Self-Improving Online LLM Alignment
Xudong Wu, Pangpang Liu, Vaneet Aggarwal, Jiayu Chen
Reinforcement Learning Large Language Models Optimization
  • Introduction of SAIL-RevKL, a regularized version of the SAIL algorithm to improve convergence properties.
  • Proof of the Polyak-Łojasiewicz (PL) condition for the regularized objective, ensuring global convergence.
  • Demonstration of near-linear sample complexity for achieving desired accuracy.
  • Empirical validation showing superior performance of SAIL-RevKL over the original SAIL.
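For readers unfamiliar with the condition named above, the standard textbook form of the Polyak-Łojasiewicz inequality and the linear rate it yields for plain gradient descent are sketched below. The symbols $f$, $\theta$, $\mu$, and $L$ are generic notation, not the paper's; the SAIL-RevKL objective, its constants, and its update rule are not given in this summary.

```latex
% PL inequality for a differentiable f with minimum value f^* and constant \mu > 0:
\frac{1}{2}\,\lVert \nabla f(\theta) \rVert^{2} \;\ge\; \mu\,\bigl(f(\theta) - f^{*}\bigr)
\quad \text{for all } \theta .

% If f is also L-smooth, gradient descent with step size 1/L satisfies
f(\theta_{k}) - f^{*} \;\le\; \Bigl(1 - \frac{\mu}{L}\Bigr)^{k}\,\bigl(f(\theta_{0}) - f^{*}\bigr).
```

The inequality does not require convexity, which is why it is a common route to global convergence guarantees for nonconvex objectives.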
ScaleAware-JEPA: Latent Representation for Discovery in Multiscale Physical Fields
Guang-Xing Li
Theory Generative Models Computer Vision
  • Introduces ScaleAware-JEPA for constructing latent coordinates in continuous scalar fields.
  • Utilizes Constrained Diffusion Decomposition (CDD) to separate fields into scale components.
  • Aligns context prediction with the diffusion scale rather than fixed image patches.
  • Demonstrates effectiveness in diverse applications, including MHD turbulence and urban structures.
Do Models Read What They Write? Causal Registers in Scratchpad Reasoning
Benjamin Shih, John Winnicki, Eric Darve
Large Language Models Interpretability Theory
  • Scratchpads can enhance model alignment by exposing intermediate reasoning.
  • Models trained to write intermediate states perform better in causal reasoning tasks than those that only report final states.
  • Editing internal representations of written states allows for testing whether models compute from those states.
  • The study demonstrates that running-state supervision can make scratchpad states causally usable in model computations.
Scalar Representations of Neural Network Training Dynamics
Pedro Jiménez-González, Miguel C. Soriano, Lucas Lacasa
Optimization Theory
  • Training trajectories of ANNs can be represented as temporal networks.
  • Scalar embeddings preserve critical dynamical features of training dynamics.
  • A characteristic time analogous to Lyapunov time captures decorrelation in training trajectories.
  • Asymptotic training states exhibit a common statistical distribution.
Review Residuals: Update-Conditioned Residual Gating for Transformers
Kyle Kramer
Large Language Models Theory Efficient ML
  • Introduction of Review Residuals, which scale updates based on a learned gate conditioned on the proposed update.
  • Demonstration of depth-stability issues with convex gating forms, leading to the adoption of an additive form for stable training.
  • Significant performance improvements at larger model sizes, particularly at 590M and 1B parameters, compared to standard and Highway residuals.
  • The method shows a trajectory of growing benefits with model size, indicating its potential for large-scale applications.
AdaJEPA: An Adaptive Latent World Model
Ying Wang, Oumayma Bounou, Yann LeCun, Mengye Ren
Reinforcement Learning Robotics Optimization
  • AdaJEPA allows for real-time adaptation of latent world models during planning.
  • The model utilizes self-supervised signals from observed transitions to improve predictions.
  • AdaJEPA shows substantial performance improvements in both in-distribution and out-of-distribution tasks.
  • The approach is efficient, requiring only a single gradient step for adaptation.
Bridging the Gap Between Latent and Explicit Reasoning with Looped Transformers
Ying Fan, Anej Svete, Kangwook Lee
NLP Large Language Models Efficient ML
  • LOTUS is the first latent-CoT method to match explicit CoT performance at the 3B scale.
  • The method reduces thought-phase latency by 2.5 to 6.9 times compared to traditional CoT.
  • LOTUS employs a looped padded Transformer architecture with parallel supervision on gold CoT tokens.
  • The latent space of LOTUS is interpretable, recovering gold reasoning steps and alternative valid steps.
Robust Strategic Classification under Decision-Dependent Cost Uncertainty
Sura Alhanouti, Güzin Bayraksan, Parinaz Naghizadeh
Optimization Theory
  • Introduces a two-stage robust optimization framework for strategic classification.
  • Models decision-dependent cost uncertainty, reflecting real-world scenarios.
  • Demonstrates that awareness of temporal effects can enhance robustness against manipulation.
  • Shows that strategic sacrifices in performance can lead to better long-term outcomes.
A Bayesian Filtering Approach for Learning Lagrangian Dynamics from Noisy Measurements
Kundan Kumar, Shreya Das, Simo Särkkä
Robotics Theory Optimization
  • Introduces a Bayesian state estimation framework for learning Lagrangian dynamics from noisy measurements.
  • Models unknown forces as white Gaussian noise, leading to a stochastic state-space model.
  • Utilizes neural networks to parameterize kinetic and potential energies within the Lagrangian formulation.
  • Demonstrates improved performance over traditional Lagrangian neural networks (LNNs) and approximate Bayesian filters in numerical experiments.
Behavior Cloning is Not All You Need: The Optimality of On-Policy Distillation for Noisy Expert Feedback
Ved Sriraman, Peihan Liu, Daniel Hsu, Adam Block
NLP Large Language Models Theory
  • Introduces a noisy expert model to explain the performance gap between offline and online imitation learning.
  • Demonstrates that offline learning from noisy trajectories has exponential sample complexity, while online methods can achieve polynomial complexity.
  • Proposes a novel variant of On-Policy Distillation (OPD) that outperforms traditional methods under noisy expert feedback.
  • Provides theoretical insights and empirical results supporting the effectiveness of on-policy methods in training language models.
Sequential sparse Gaussian process quantile regression
Hugo Nicolas, Olivier Le Maître
Theory Efficient ML Optimization
  • Introduces a sparse Bayesian quantile regression formulation using Laplace approximation.
  • Develops adaptive strategies for inducing-input infilling and data acquisition.
  • Combines enrichment mechanisms into a unified sequential algorithm.
  • Demonstrates improved computational efficiency and predictive accuracy in numerical experiments.
C$^{2}$R: Cross-sample Consistency Regularization Mitigates Feature Splitting and Absorption in Sparse Autoencoders
Haoran Jin, Xiting Wang, Shijie Ren, Hong Xie, Defu Lian
NLP Large Language Models Interpretability
  • Identifies the lack of cross-sample consistency as a root cause of feature splitting and absorption in Sparse Autoencoders.
  • Introduces C2R, a regularization technique that enforces consistent latent selection across samples.
  • Demonstrates that C2R significantly mitigates issues of feature splitting and absorption without compromising reconstruction fidelity.
  • Provides a theoretical analysis and empirical validation of the proposed method against state-of-the-art baselines.
Mixture-of-Control: State-Aware Fine-Tuning for Transformer-based Models
Duc Anh Nguyen, Tien Ngoc Luu, Tung Pham, Toan Tran
NLP Large Language Models Efficient ML
  • Mixture-of-Control (MoC) integrates local and global control signals for enhanced representation learning.
  • The framework allows efficient cross-block communication without significant computational overhead.
  • MoC maintains memory efficiency while outperforming traditional state-based fine-tuning methods.
  • The approach is architecture-agnostic, applicable to various transformer configurations.
Adaptive Block Diffusion: Resolving Training-Inference Mismatch in Diffusion Language Models
Gagan Jain
NLP Large Language Models Generative Models
  • Introduction of Adaptive Block Diffusion (ABD) to resolve training-inference mismatch in diffusion language models (DLMs).
  • ABD treats denoising configurations as stochastic variables, optimizing over a full configuration space.
  • Guarantees denoising optimality for any inference policy supported during training.
  • Demonstrates structural invariance, avoiding off-grid degradation and maintaining performance across scales.
Contextual Slate GLM Bandits with Limited Adaptivity
Tanmay Goyal, Sukruta Prakash Midigeshi, Gaurav Sinha
Theory Optimization Reinforcement Learning
  • Introduction of B-SlateGLinCB and RS-SlateGLinCB algorithms for contextual slate bandits.
  • Establishment of regret bounds that are independent of the non-linearity parameter κ.
  • Demonstration of computational efficiency with polynomial time complexity per round.
  • Empirical validation showing superior performance compared to existing limited adaptivity baselines.
QVal: Cheaply Evaluating Dense Supervision Signals for Long-Horizon LLM Agents
Sergio Hernández-Gutiérrez, Matteo Merler, Ilze Amanda Auzina, Joschka Strüber, Ameya Prabhu, Matthias Bethge
Large Language Models Reinforcement Learning Optimization
  • QVal provides a cheap, training-free method to evaluate dense supervision signals for LLM agents.
  • The framework allows for direct comparison of different dense supervision methods without conflating results with training engineering factors.
  • Benchmarking reveals that simple prompting methods consistently outperform more complex dense supervision techniques.
  • QVal is extensible, facilitating the evaluation of new methods and environments.
Calibration, Not Compilation: Detecting and Repairing Misspecified Probabilistic Programs Written by Language Models
Jian Xu, Delu Zeng, John Paisley, Qibin Zhao
Large Language Models Theory Generative Models
  • Calibration oracles outperform traditional unit tests in detecting statistical misspecifications in probabilistic programs.
  • A benchmark of 14 misspecification types shows that calibration can flag 88% of bugs with a low false positive rate.
  • Calibration feedback significantly enhances the repair process for LLM-generated programs compared to unit test feedback.
  • A substantial portion of runnable programs generated by LLMs are statistically misspecified, indicating a critical need for calibration.
ReGuide: From Test-Time Guidance to Self-Improving Diffusion Policies
Tzu-Hsiang Lin, Srinivas Shakkottai, Dileep Kalathil, P. R. Kumar
Robotics Reinforcement Learning Generative Models
  • ReGuide repurposes guided rollouts as training data for iterative self-improvement.
  • Phase-Conditioned Guidance (PCG) is used to generate reliable corrective rollouts.
  • The framework allows for both fine-tuning and retraining of policies using successful guided rollouts.
  • ReGuide demonstrates significant performance improvements over existing methods, achieving 1.3–7.7× success rate increases.
FedXDS: Leveraging Model Attribution Methods to counteract Data Heterogeneity in Federated Learning
Maximilian Andreas Hoefler, Karsten Mueller, Wojciech Samek
Federated Learning Interpretability Efficient ML
  • FedXDS is the first approach to leverage XAI for selective data sharing in federated learning.
  • The method uses propagation-based attribution to identify and share only task-relevant features.
  • FedXDS incorporates differential-metric privacy to enhance privacy guarantees while maintaining utility.
  • Experimental results show superior accuracy and faster convergence compared to state-of-the-art methods.
Muon learns balanced solutions in matrix factorization without slow saddle-to-saddle dynamics
Mark Rhee, Jamie Simon, Dhruva Karkada
Optimization Theory Efficient ML
  • Muon optimizer avoids slow saddle-to-saddle dynamics, leading to faster convergence.
  • It remains stable even with learning rates exceeding critical thresholds.
  • The optimizer conserves a different matrix quantity than gradient descent, affecting convergence behavior.
  • A learning rate schedule is proposed that achieves alignment in only two optimization steps.
Sequential RC-TGAN: Generating Relational Time Series with Spectral Envelope Loss
Mohamed Gueye, Yazid Attabi, Manuel Morales, Maxime Dumas
Generative Models Time Series Optimization
  • Introduction of Seq. RC-TGAN, a temporal extension of the RC-TGAN framework.
  • Novel spectral envelope loss function that optimizes latent periodic structures in categorical time series.
  • Extension of spectral methodology to continuous features using a Gaussian Mixture Model discretization.
  • Development of new evaluation metrics for assessing frequency-domain fidelity.
Replica Symmetry Breaking and Algorithmic Thresholds in Empirical Risk Minimization under Multi-Index Model
Andrea Montanari, Kangjie Zhou
Theory Optimization Efficient ML
  • Develops a precise understanding of the empirical risk landscape in high-dimensional settings.
  • Introduces the Incremental Approximate Message Passing (IAMP) algorithm for empirical risk minimization.
  • Characterizes the relationship between training and test errors in the context of high-dimensional asymptotics.
  • Demonstrates that the proposed algorithm achieves optimal performance among polynomial-time algorithms.
Predict, Reuse, and Repair: Accelerating Dynamic Sparse Attention for Long-Context LLM Decoding
Tianyu Wang, Gourav Rattihalli, Aditya Dhakal, Junbo Li, Zhiwei Ren, Dejan Milojicic, Longfei Shangguan
Large Language Models Efficient ML
  • PRR (Predict, Reuse, and Repair) reduces per-token decoding latency by up to 40% while maintaining accuracy.
  • The method exploits temporal locality in dynamic sparse attention (DSA) selections for efficient prediction and speculation.
  • Incremental attention repair allows for efficient correction of missed blocks without recomputing attention from scratch.
  • PRR achieves significant speedup over existing DSA methods across multiple LLMs.
Optimizing Nursing Care Taxi Dispatch Leveraging Integer Linear Programming Solvers and Machine Learning
Riku Nakao, Akihito Hiromori, Hamada Rizk, Hirozumi Yamaguchi
Optimization
  • Introduction of the Nursing Care Taxi Dispatch (NCTD) problem as a complex variant of the Vehicle Routing Problem (VRP).
  • Utilization of a supervised machine learning approach based on the Transformer architecture combined with integer linear programming solvers.
  • Effective handling of multiple constraints including wheelchair accessibility and user compatibility.
  • Demonstrated improvements in operating time and constraint violation rates compared to existing methods.
Learning Gaussian Graphical Models from a Glauber Trajectory Without Mixing
Eric Shen, Tony Wu, Mahbod Majid, Ankur Moitra
Graph Learning Theory
  • Introduces a polynomial-time algorithm for learning Gaussian graphical models from Glauber dynamics.
  • Establishes a method that does not depend on mixing time, addressing a significant gap in existing approaches.
  • Combines local edge testing and robust statistical aggregation to handle temporal dependencies in data.
  • Provides theoretical guarantees for the accuracy of the proposed method despite the challenges posed by non-i.i.d. observations.
Improving Certified Robustness via Adversarial Distillation
Matteo Melis, Jesus Martinez Del Rincon, Vishal Sharma
Theory Optimization
  • Introduction of AD-CERT, a new certified training objective combining adversarial distillation and interval bound propagation (IBP) bounds.
  • AD-CERT achieves state-of-the-art certified accuracy on multiple robustness benchmarks.
  • Logit-level distillation from a robust teacher is shown to be more effective than clean or feature-space distillation.
  • The method provides a better trade-off between certified robustness and standard accuracy.
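To make the IBP ingredient mentioned above concrete, the following is a minimal sketch of interval bound propagation through one linear layer followed by a ReLU. It illustrates only the generic bounding step; the weights, input, and radius are hypothetical, and the AD-CERT objective and its distillation term are not reproduced here.

```python
def ibp_linear(lower, upper, weight, bias):
    """Propagate elementwise input bounds through y = W x + b."""
    out_lower, out_upper = [], []
    for row, b in zip(weight, bias):
        lo = hi = b
        for w, l, u in zip(row, lower, upper):
            # A non-negative weight keeps the bound order; a negative one swaps it.
            if w >= 0:
                lo += w * l
                hi += w * u
            else:
                lo += w * u
                hi += w * l
        out_lower.append(lo)
        out_upper.append(hi)
    return out_lower, out_upper


def ibp_relu(lower, upper):
    """ReLU is monotone, so it can be applied to each bound directly."""
    return [max(0.0, l) for l in lower], [max(0.0, u) for u in upper]


# Hypothetical input point, L-infinity radius, and 2x2 layer.
x = [1.0, -1.0]
eps = 0.1
lower = [v - eps for v in x]
upper = [v + eps for v in x]
W = [[1.0, -2.0], [-1.0, 0.5]]
b = [0.0, 0.5]

l1, u1 = ibp_linear(lower, upper, W, b)
l2, u2 = ibp_relu(l1, u1)
print("pre-activation bounds:", l1, u1)
print("post-ReLU bounds:", l2, u2)
```

Certified training methods typically turn such output bounds into a worst-case loss; how AD-CERT combines them with distillation is described in the paper, not here.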
OTCache: Optimal Transport for Geometry-Aware Caching in Diffusion Models
Huanlin Gao, Fang Zhao, Qiang Hui, Fuyuan Shi, Shaoan Zhao, Yantao Li, Chao Tan, Ting Lu, Yuren You, Kai Wang, Shiguo Lian
Generative Models Optimization Efficient ML
  • OTCache provides a training-free framework for accelerating diffusion model sampling.
  • It overcomes limitations of existing caching methods by modeling schedule evolution as a smooth trajectory in policy space.
  • The framework achieves significant acceleration in sampling while improving fidelity.
  • Experiments validate the effectiveness of OTCache across multiple datasets.
CoMet: Context and Multiplicity Decomposition for Multimodal Uncertainty Estimation
Sanghyuk Chun, William Yang, Amaya Dharmasiri, Olga Russakovsky
Multimodal Large Language Models NLP
  • Introduces CoMet, a method for uncertainty estimation in multimodal large language models.
  • Decomposes uncertainty into context-specific and multiplicity-specific components.
  • Utilizes a lightweight post-hoc uncertainty module for efficient estimation.
  • Demonstrates improved performance on multimodal benchmarks compared to existing methods.
Probabilistic Inversion with Flow Matching
Baldur Paulwitz, Stefan Buske
Generative Models Optimization Theory
  • Flow Matching is adapted for probabilistic inversion in geophysics, enhancing traditional methods.
  • Probabilistic inversion allows for uncertainty assessment without requiring initial guesses or regularization.
  • The method is evaluated through case studies, demonstrating its applicability to both simple and complex models.
  • Flow Matching bridges the gap between Variational Inference and Diffusion Models in probabilistic modeling.
Evil Spectra: How Optimisers can Amplify or Suppress Emergent Misalignment
Jason R. Brown, Patrick Leask, Lev McKinney
NLP Large Language Models Optimization
  • Optimizer choice significantly influences emergent misalignment severity, with a 7× variation in misalignment rates observed.
  • Model size and family have negligible effects on emergent misalignment when using the Adam optimizer.
  • Final log training loss is a strong predictor of alignment, but the optimizer's role becomes more critical after extensive training.
  • Optimizers that produce flatter singular value spectra in learned weights better preserve alignment.
Safe Online Learning via Smooth Safety-Structured Policy Composition
Hongpeng Cao, Liqun Zhao, Yuliang Gu, Naira Hovakimyan, Lui Sha, Marco Caccamo
Reinforcement Learning Robotics Theory
  • AutoSafe integrates safety monitoring and intervention into the action generation process, which keeps learning dynamics smooth.
  • The architecture allows for risk-dependent transitions between performance and safety, ensuring continuous interaction.
  • Empirical results show AutoSafe provides strong safety assurance while maintaining stable learning dynamics.
  • The method is validated in both simulated environments and real-world applications, such as a cart-pole system.
Amplifying Membership Signal Through Chained Regeneration
Wojciech Łapacz, Stanisław Pawlak
Generative Models Multimodal Theory
  • Introduction of MADreMIA, a framework for enhancing membership inference through iterative regeneration.
  • Demonstration that chained generations yield stronger membership signals than one-shot methods.
  • Identification of 're-members' and 're-non-members' to differentiate between training data and unseen samples.
  • Comprehensive evaluations across multiple generative model families showing improved inference efficiency.
TRIAGE: Role-Typed Credit Assignment for Agentic Reinforcement Learning
Yuanda Xu, Zhengze Zhou, Hejian Sang, Xiaomin Li, Jiaxin Zhang, Xinchen Du, Zhipeng Wang, Alborz Geramifard
Reinforcement Learning
  • TRIAGE introduces a role-typed credit assignment framework for agentic RL.
  • It classifies actions into four semantic roles to improve credit assignment accuracy.
  • The framework outperforms standard GRPO and other baseline methods in multiple benchmarks.
  • TRIAGE reduces unnecessary actions in successful trajectories, enhancing efficiency.
Interpretable Inverse Design of Metal-Organic Frameworks with Large Language Model Agents
Kyungmin Nam, Seunghee Han, Jihan Kim
NLP Large Language Models Generative Models
  • LLM4MOF enables interpretable design of MOFs without requiring extensive property-labeled datasets.
  • The framework operates autonomously through a closed-loop process, refining hypotheses over multiple iterations.
  • It successfully identifies high-performing MOFs and generates new structures de novo.
  • LLM4MOF outperforms random search and genetic algorithms in efficiency and effectiveness.
Golden Hour Divide: Trauma Care Accessibility and Resource Vulnerability in Sri Lanka
Sonath Kirindage, Vihanga Nimsara, Sakindu Rajapaksa, Kavyanga Hathurusinghe, Lahiru Dilshan, Subavarshana Arumugam, Nathali Athukorala, Sandareka Wickramanayake, Nisansa de Silva
  • Significant disparities in trauma care accessibility exist across Sri Lanka, particularly in the Northern and Eastern provinces.
  • The study introduces a framework using spatial analysis to quantify gaps in emergency care resources.
  • Four policy-actionable archetypes of districts are identified based on their healthcare resource availability and clinical needs.
  • Improving accessibility by 25% in high-priority areas could reduce the national need-gap by 9.65%.
Few-Step Boltzmann Generators via Scalable Likelihood Flow Maps
RuiKang OuYang, Hanlin Yu, Xinyue Ai, Yutong He, Nicholas M. Boffi, Pradeep Ravikumar, Jose Miguel Hernandez-Lobato, Max Simchowitz, Benjamin Kurt Miller, Omar Chehab
Generative Models Efficient ML Theory
  • SCALLOP introduces a Hutchinson-free likelihood distillation objective, improving scalability and reducing variance.
  • The method achieves up to 100× reduction in training variance and faster convergence compared to F2D2.
  • SCALLOP demonstrates significant speed improvements in inference time, being 10× faster than normalizing flows.
  • Empirical results show consistent performance gains in both molecular and image domains.
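For context on what "Hutchinson-free" refers to, here is a minimal sketch of the classical Hutchinson trace estimator, which approximates tr(A) by averaging v^T A v over random Rademacher probe vectors. The matrix, sample count, and seed are hypothetical; the SCALLOP objective that avoids this estimator is not reproduced here.

```python
import random


def hutchinson_trace(matvec, dim, num_samples, rng):
    """Estimate tr(A) as the average of v^T A v over Rademacher probes v."""
    total = 0.0
    for _ in range(num_samples):
        v = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        av = matvec(v)
        total += sum(vi * avi for vi, avi in zip(v, av))
    return total / num_samples


# Hypothetical symmetric matrix, accessed only through matrix-vector products.
A = [[2.0, 1.0, 0.0],
     [1.0, 3.0, 0.5],
     [0.0, 0.5, 4.0]]


def matvec(v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


rng = random.Random(0)
estimate = hutchinson_trace(matvec, dim=3, num_samples=2000, rng=rng)
exact = sum(A[i][i] for i in range(3))
print("Hutchinson estimate:", estimate, "exact trace:", exact)
```

The estimator is unbiased, but each probe adds noise from the off-diagonal entries, which is the kind of variance the summary says the method reduces.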
A Transferable Learned Temporal Prior for Transmission Reconstruction and Decision-Relevant Uncertainty in Real Outbreak Labels
Md Ahsan Karim
Time Series Graph Learning Theory
  • Introduces a transferable temporal prior for outbreak transmission reconstruction.
  • Demonstrates significant performance improvement over traditional parametric baselines.
  • Identifies a high level of uncertainty in epidemiological transmission labels.
  • Shows that retaining uncertain transmission links alters source prioritization.
Depth Exploration for LLM Decoding
Weisi Yang, Zipeng Sun, Stephen Xia
NLP Large Language Models Efficient ML
  • DEX replaces single-depth selection with parallel exploration of multiple candidate depths.
  • The method preserves lossless equivalence to standard autoregressive decoding while reducing computational costs.
  • Empirical results show DEX outperforms existing depth-adaptive methods and achieves competitive throughput.
  • The concept of Earliest Available Depth (EAD) is introduced to quantify token readiness.
Interface-Aware Neural Newton Preconditioning for Robust Cohesive Zone Model Simulations
Zhangyong Liang, Huanhuan Gao
Optimization Theory
  • Introduction of Interface-Aware Neural Newton Preconditioning (IA-NNP) to make Cohesive Zone Model (CZM) simulations more robust.
  • Preservation of original traction-separation laws while improving convergence.
  • Development of two solver-level implementations for effective preconditioning.
  • Demonstrated improved performance in numerical benchmarks over traditional methods.
When to Truncate a Feature Ranking: A Residual-Overlap Stopping Rule for Subset Selection
Jesus S. Aguilar-Ruiz
Theory Efficient ML Interpretability
  • Introduces a calibrated residual-overlap stopping rule for feature selection.
  • Utilizes the Bhattacharyya coefficient to measure class-conditional marginal separation.
  • Provides a single class-independent subset of features based on a statistical interpretation.
  • Demonstrates effectiveness on high-dimensional genomic datasets, achieving significant dimensionality reduction.
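The Bhattacharyya coefficient named above can be sketched for one feature as follows: bin the feature's values per class, then sum the square roots of the products of matching bin probabilities. The data, bin edges, and helper names are hypothetical, and the paper's calibrated stopping rule built on top of this overlap measure is not reproduced here.

```python
import math


def histogram(values, edges):
    """Normalized histogram over half-open bins [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts)
    if total == 0:
        raise ValueError("no values fall inside the bin edges")
    return [c / total for c in counts]


def bhattacharyya_coefficient(p, q):
    """BC(p, q) = sum_i sqrt(p_i * q_i); 1 means identical, 0 means disjoint."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of bins")
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))


# Hypothetical values of a single feature for two classes.
class_a = [0.1, 0.2, 0.3, 0.4]
class_b = [0.3, 0.4, 0.6, 0.7]
edges = [0.0, 0.25, 0.5, 0.75, 1.0]

p = histogram(class_a, edges)
q = histogram(class_b, edges)
bc = bhattacharyya_coefficient(p, q)
print("class-conditional overlap:", bc)
```

A low coefficient means the two class-conditional marginals barely overlap, so the feature separates the classes well on its own.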
Why Do Few-Step Text Latents Fail When Image Latents Work? Non-Commitment at Sharp Categorical Readouts
Zhongyao Wang
NLP Generative Models Theory
  • Deterministic few-step generation fails for text latents due to geometric issues related to sharp categorical readouts.
  • DABI and CCI diagnostics reveal significant differences in performance between text and image decoders.
  • Two mechanisms, categorical commitment and stochastic re-injection, allow some systems to escape deterministic transport limitations.
  • The paper establishes a non-commitment theorem and sharp transport laws that inform the accuracy-depth-stiffness tradeoff.
Blackknife: Hard-Label Query-Limited Black-Box Attacks on Heterogeneous Graph Neural Networks
Honglin Gao, Junhao Ren, Lan Zhao, Yue Yang, Jindong Chang, Gaoxi Xiao
Graph Learning
  • Blackknife operates under strict black-box conditions without access to model internals or complete graph structures.
  • The framework constructs a local surrogate model to generate effective perturbations for attacks.
  • Blackknife demonstrates high attack success rates across multiple benchmark datasets.
  • The method remains effective against topology-based defenses, indicating significant vulnerabilities in HGNNs.
Read more
Probing Memorization of Tabular In-Context Learning
Francesco Capano, Jonas Böhler
Large Language Models Theory Interpretability
  • Introduces ICLMEM, a framework for probing memorization in LTMs.
  • Detects moderate memorization signals in LTMs across various tasks.
  • Memorization is strongest in low-cardinality and binary tasks.
  • Memorization signals largely disappear under realistic training conditions.
Read more
Can Tabular In-Context Learners Generalize to Biomolecular Property Prediction?
Davy Guan, Lu Zhang, Asiri Wijesinghe, Allen Zhu, He Zhao, Helen Power, F. Hafna Ahmed, Andrew Warden, Cheng Soon Ong, Daniel M. Steinberg
Theory Efficient ML
  • Tabular in-context learners can effectively generalize to biomolecular property prediction tasks.
  • The performance of these models is highly dependent on the choice of sequence or molecular representation.
  • The study provides a systematic evaluation of tabular foundation models in scientific prediction settings.
  • TabPFN3 and TabICL show competitive results in protein fitness regression and small-molecule classification.
Read more
Multistage Defer Trees for Hybrid Interpretability: If at First You Can't Succeed, Tree Again
Zakk Heile, Hayden McTavish, Margo Seltzer, Cynthia Rudin
Interpretability
  • Introduction of Multistage Defer Trees (MDTs) for improved interpretability and accuracy.
  • Iterative training algorithm that narrows the deferral region while enhancing model performance.
  • Ability to compress MDTs into simpler representations, maintaining interpretability.
  • Demonstrated improved accuracy-deferral-sparsity trade-offs compared to existing methods.
Read more
Randomized Exploration for Linear Bandits via Absolute Perturbations
Toshinori Kitamura, Shuai Liu, Csaba Szepesvári
Theory Efficient ML
  • Introduction of Absolute Thompson Sampling (ATS) to ensure optimism in expectation while maintaining computational efficiency.
  • ATS achieves a regret bound of Õ(d^(3/2)√K), matching existing bounds for Thompson Sampling.
  • Ensemble Absolute Thompson Sampling (EATS) converges to UCB behavior as ensemble size increases, providing a practical interpolation between randomized and deterministic approaches.
  • The proposed methods simplify the regret analysis compared to traditional TS approaches, avoiding complex anti-concentration arguments.
Read more
Toward an Energy-Optimized Operation of Data Centers Located in Wind Farms Using Reinforcement Learning
Jan Stenner, Alexander Kilian, Sebastian Peitz, Hermann de Meer
Reinforcement Learning Optimization Efficient ML
  • Introduces a fixed-day RL environment for HPC data centers in wind farms.
  • Identifies and addresses a credit-assignment problem in pure RL applications.
  • Evaluates optimization-based Imitation Learning and potential-based Reward Shaping as countermeasures.
  • Demonstrates strong empirical performance improvements with RL techniques.
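As background on the potential-based reward shaping mentioned above, the following sketch shows the standard shaping term F(s, s') = γΦ(s') − Φ(s) and checks numerically that it shifts the discounted return only by the constant −Φ(s0). The potential values, rewards, and function names are made-up illustrations, not the paper's data-center environment or its potential function.

```python
def shaped_reward(reward, phi_s, phi_next, gamma, done):
    """Add the potential-based shaping term gamma * Phi(s') - Phi(s).

    The potential of a terminal state is treated as zero, which keeps the
    shaping from changing which policy is optimal.
    """
    next_potential = 0.0 if done else phi_next
    return reward + gamma * next_potential - phi_s


def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over one episode."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


# Hypothetical three-step episode with a sparse reward at the end.
gamma = 0.9
rewards = [0.0, 0.0, 1.0]
potentials = [0.2, 0.5, 0.8, 0.0]  # Phi(s0), Phi(s1), Phi(s2), Phi(terminal)

shaped = [
    shaped_reward(r, potentials[t], potentials[t + 1], gamma, done=(t == len(rewards) - 1))
    for t, r in enumerate(rewards)
]

plain_return = discounted_return(rewards, gamma)
shaped_return = discounted_return(shaped, gamma)

# The shaping terms telescope, so the two returns differ by exactly Phi(s0).
print(plain_return, shaped_return, plain_return - potentials[0])
```

Because the difference is a constant that depends only on the start state, the shaped rewards give denser feedback per step without changing the ranking of policies, which is why this form of shaping is commonly used against credit-assignment problems.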
Read more
ITSPACE: Monotone Gaussian Optimal Transport Updates
Woojoo Na, Jennifer Dy
Optimization Efficient ML Theory
  • ITSPACE optimizes the Bures-Wasserstein objective for covariance alignment using a proximal majorization-minimization approach.
  • The method ensures that updates remain positive semidefinite and rank-constrained, suitable for real-time applications.
  • ITSPACE outperforms existing methods in terms of speed and efficiency in achieving low BW-gap solutions.
  • The paper provides theoretical guarantees for the method's performance, including bounds on deviations from exact descent.
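For context on the Bures-Wasserstein (BW) objective mentioned above, here is a minimal sketch of the standard squared BW distance between two positive semidefinite covariance matrices, BW²(A, B) = tr(A) + tr(B) − 2 tr((A^(1/2) B A^(1/2))^(1/2)). This is only the textbook distance, computed with eigendecompositions; it is not the ITSPACE update rule, and the helper names are illustrative assumptions.

```python
import numpy as np


def psd_sqrt(m):
    """Symmetric square root of a positive semidefinite matrix via eigh."""
    m = (m + m.T) / 2.0  # symmetrize to suppress round-off asymmetry
    eigvals, eigvecs = np.linalg.eigh(m)
    eigvals = np.clip(eigvals, 0.0, None)  # remove tiny negative eigenvalues
    return (eigvecs * np.sqrt(eigvals)) @ eigvecs.T


def bures_wasserstein_sq(a, b):
    """Squared Bures-Wasserstein distance between PSD matrices a and b."""
    root_a = psd_sqrt(a)
    cross = psd_sqrt(root_a @ b @ root_a)
    return float(np.trace(a) + np.trace(b) - 2.0 * np.trace(cross))


# For commuting (here diagonal) matrices the distance reduces to the squared
# difference of the square roots of the eigenvalues: (2 - 1)^2 + (1 - 1)^2.
a = np.diag([4.0, 1.0])
b = np.diag([1.0, 1.0])
print(bures_wasserstein_sq(a, b))
```

The gap between a current covariance estimate and a target under this distance is the kind of quantity a "BW-gap" would measure; the proximal majorization-minimization steps that reduce it are specific to the paper and are not sketched here.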
Read more
Hierarchical Global Attention (HGA)
Woernle Frank, Fedosov Vladimir, Grinenko Artemiy
NLP Large Language Models Efficient ML
  • HGA is a drop-in replacement for dense causal attention, preserving original model parameters.
  • It enables long-context transformers to operate efficiently at 64K tokens without retraining.
  • The hierarchical routing mechanism reduces memory consumption and maintains performance.
  • HGA achieves a minimal loss gap compared to dense attention while using only 3% sparsity.
Read more
Geometry-Preserving Orthonormal Initialization for Low-Rank Adaptation in RLVR
Ruijia Zhang, Jiacheng Zhu, Hanqing Zhu, Laixi Shi
Reinforcement Learning NLP Large Language Models
  • Introduces geometry-preserving orthonormal initialization for LoRA in RLVR.
  • Demonstrates that orthonormal initialization minimizes performance gaps compared to full fine-tuning.
  • Presents two new LoRA variants, LoRA-RLPO and LoRA-RLMO, which outperform standard LoRA.
  • Provides theoretical insights into the instability of existing LoRA variants in RLVR.
Read more
Calibrating the Evaluator: Does Probability Calibration Mitigate Preference Coupling in LLM Agent Feedback Loops?
Zewen Liu
Large Language Models Reinforcement Learning Theory
  • Introduces the first study on evaluator calibration as a method to mitigate preference coupling in LLM feedback loops.
  • Demonstrates that confidence-calibrated TTRL reduces coupling coefficients and divergence metrics significantly.
  • Confirms that the observed effects are not due to output format changes through a symmetric-LR control.
  • Releases a calibrated TTRL protocol as a lightweight solution for LLM deployment pipelines.
Read more
Policy Optimization Achieves Data-Dependent Regret Bounds in MDPs with Unknown Transitions
Mingyi Li, Taira Tsuchiya, Kenji Yamanishi
Reinforcement Learning Optimization Theory
  • Developed a policy optimization algorithm for episodic tabular MDPs with unknown transitions.
  • Achieved data-dependent regret bounds, including first-order, second-order, and path-length complexities.
  • Introduced a transition-dependent complexity term that captures the cost of estimating the transition kernel.
  • Demonstrated gap-dependent polylog(T) regret in stochastic regimes.
Read more
Predictable GRPO: A Closed-Form Model of Training Dynamics
Rajat Ghosh, Datta Nimmaturi, Aryan Singhal, Vaishnavi Bhargava, Henry Wong, Johnu George, Debojyoti Dutta
Reinforcement Learning Large Language Models Theory
  • Introduces a closed-form model for GRPO training dynamics, enhancing mechanistic understanding.
  • Reinterprets empirical saturation laws through a stochastically-forced damped oscillator framework.
  • Provides measurable predictions and diagnostics for training dynamics, distinguishing failure modes.
  • Empirical validation shows strong correlation with training reward trajectories across different models.
Read more
Reliability, Faithfulness, and the Limits of Post-hoc Explanations of Opaque Scientific Models
Nick Oh, Helen Jin
Theory Interpretability
  • Post-hoc explanation methods do not guarantee insights into the structure of phenomena.
  • Reliability and faithfulness are necessary but insufficient for justified claims about the world.
  • The paper distinguishes between descriptive and justificatory assessments of models and explanations.
  • The authors argue that the composition of reliability and faithfulness does not lead to valid claims about the underlying structure of phenomena.
Read more
Mind the Residual Gap: Probabilistic Downscaling under Real-World Bias
Yujin Kim, Nidhi Soma, Sarah Dean
Generative Models Theory Optimization
  • Identifies residual target misspecification as a fundamental cause of under-dispersion in probabilistic downscaling.
  • Introduces ReMatch, a method that aligns training and test-time residual distributions using optimal transport.
  • Demonstrates that ReMatch outperforms traditional mean-residual models and state-of-the-art super-resolution techniques.
  • Provides empirical evidence through controlled synthetic benchmarks and real-world applications.
Read more