AI-generated summaries

Today's ML research, without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

Papers today: 64
Update frequency: every 8 hours
Days of history: 7
Neural-Behavioral Representation of Natural Whole-body Movement in Monkeys
Jieshi He, Puzhe Li, Yanan Sui, Mu-ming Poo
Robotics, Generative Models, Time Series
  • Development of a neural-behavioral recording platform for capturing large-scale epidural cortical activity and 3D whole-body kinematics in freely moving monkeys.
  • Introduction of a neural-behavioral model that integrates cortical representations with learned behavior priors for natural whole-body movement reconstruction.
  • First demonstration of continuous and realistic whole-body movement reconstruction from cortical neural representations in primates.
  • The model outperforms traditional behavior-only generative models and LSTM-based models in movement prediction.
AsymVLM: Asymmetric Token Pruning for Efficient Vision-Language Model Inference
Yilin Feng, Ahmed Burak Gulhan, Mahmut Taylan Kandemir
Multimodal, Efficient ML, Computer Vision
  • AsymVLM proposes a modality-aware token compression strategy that differentiates between vision and text tokens.
  • The framework utilizes a learned importance scorer for vision tokens and a per-sample adaptive budget for pruning.
  • AsymVLM achieves up to 54% FLOPs savings while improving performance on specific multimodal tasks.
  • The method maintains competitive accuracy on holistic benchmarks and outperforms standard LLM cache methods in text-dominated scenarios.
Bridging Chemists and AI: An Expert-Augmented Framework for Interpretable Route Evaluation
Yujia Guo, Mikhail Kabeshov, Tat Hong Duong Le, Samuel Genheden, Marco V. Mijangos, Varvara Voinarvoska, Giulia Bergonzini, Ola Engkvist, Samuel Kaski
Interpretability
  • Introduces an expert-augmented framework that combines machine learning with chemists' expertise for route evaluation.
  • Utilizes a DeepSets-based model to assess synthetic routes based on tree edit distances and expert evaluations.
  • Achieves high correlation with expert ratings, indicating the model's effectiveness in capturing complex chemical reasoning.
  • Demonstrates significant improvements in accuracy over previous baseline models in predicting route feasibility and quality.
A Geometric View of SRC: Learning Representations for Stable Residual Inference
Vangelis P. Oikonomou
Theory
  • Introduces a strict training-inference separation for SRC, treating it as a fixed rule during inference.
  • Formalizes residual-ordering stability and identifies geometric obstructions that affect residual comparisons.
  • Derives a quantitative lower bound on the residual margin under specific geometric conditions.
  • Proposes geometry-shaping objectives that enhance representation learning without using SRC during training.
Ensemble Score Filtering for Real-Data Energy Consumption Forecast Correction
Ruoyu Hu, Dahai Yu, Feng Bao, Guang Wang, Guannan Zhang
Time Series
  • Introduces Ensemble Score Filter (EnSF) for energy consumption forecast correction.
  • Demonstrates the limitations of open-loop forecasting models over long horizons.
  • Shows that EnSF significantly improves state estimation compared to traditional methods.
  • Utilizes score-based diffusion models for efficient data assimilation.
FedQHD: Closed-Form Function-Space Federated Reinforcement Learning
Yuchen Hou, Yongshan Chen, Zhuowen Zou, Calvin Yeung, Mohsen Imani, Tian Lan, Mahdi Imani
Reinforcement Learning, Federated Learning, Optimization
  • FedQHD provides a closed-form federated Q-learning algorithm that effectively handles heterogeneous encoders.
  • The paper introduces a pointwise bound on the federation gap, decomposing it into interpretable components.
  • Empirical validation shows that FedQHD outperforms traditional FedAvg and distillation-based methods on benchmark tasks.
  • The method simplifies the aggregation process, avoiding iterative optimization and enhancing computational efficiency.
ExDBSCAN: Explaining DBSCAN with Counterfactual Reasoning -- Additional Material
Pernille Matthews, Lena Krieger, Tommaso Amico, Artur Zimek, Thomas Seidl, Ira Assent
Interpretability
  • ExDBSCAN is the first method specifically designed to generate counterfactual explanations for DBSCAN.
  • The method ensures both diversity and proximity in counterfactual generation using a physics-inspired model.
  • ExDBSCAN achieves perfect validity in counterfactual assignments, maintaining correct cluster classifications.
  • Empirical results show that ExDBSCAN outperforms existing baseline methods across multiple datasets.
Gram: Assessing sabotage propensities via automated alignment auditing
David Lindner, Victoria Krakovna, Sebastian Farquhar
Large Language Models, Theory, Optimization
  • Introduction of Gram, a specialized framework for assessing sabotage in AI agents.
  • Evaluation of Gemini models reveals a 2-3% misbehavior rate in simulated scenarios.
  • Identified 'overeagerness' as a key factor contributing to model misbehavior.
  • Gram includes a pipeline for targeted experiments to analyze misbehavior causes.
Parallel Adaptive Multi-Objective Evolutionary Learning of Discretized Bayesian Network Classifiers for Clinical Data
Damy M.F. Ha, Tanja Alderliesten, Peter A.N. Bosman
Optimization, Interpretability, Graph Learning
  • The Baymex algorithm is parallelized to improve computational efficiency.
  • An adaptive steering mechanism is introduced to reduce overfitting.
  • Baymex is evaluated on real-world clinical datasets, demonstrating its applicability.
  • The algorithm achieves statistically similar or better predictive performance compared to traditional methods.
Designing Active Tether-Net Systems for Space Debris Capture with Graph-Learning-Aided Mixed-Combinatorial Optimization
Feng Liu, Achira Boonrath, Gishnu Madhu, Eleonora M. Botta, Souma Chowdhury
Graph Learning, Optimization, Robotics
  • Introduces a graph-learning-aided optimization approach for space debris capture systems.
  • Reduces complex mixed-combinatorial nonlinear programming (MCNLP) problems to nonlinear programming (NLP) problems that are easier to solve.
  • Demonstrates faster convergence to optimal solutions using GNNs compared to classical methods.
  • Highlights the importance of simultaneous design and control optimization in active tether-net systems.
CLUBench: A Clustering Benchmark
Feng Xiao, Dazhi Fu, Chris Ding, Jicong Fan
Theory, Optimization
  • CLUBench evaluates 24 clustering algorithms on 131 datasets, providing a comprehensive benchmark.
  • Deep clustering methods do not significantly outperform conventional algorithms like KMeans and SpeClu.
  • Combining pretrained embeddings with conventional algorithms enhances clustering performance for image and text data.
  • The study reveals persistent challenges in clustering, even with advanced foundation models.
Deep Adaptive Dimension Reduction for Bayesian Inference in Inverse Problems
Yueyang Wang, Xili Wang, Kejun Tang, Xiaoliang Wan, Tao Zhou, Chao Yang
Generative Models, Theory, Efficient ML
  • Introduction of Variational Flow (VF) for effective dimension reduction in Bayesian inference.
  • Development of an iterative prior updating strategy to enhance posterior approximation.
  • Integration of VF with an adaptive Fourier Neural Operator (FNO) for improved surrogate modeling.
  • Demonstrated superior performance in high-dimensional inverse problems compared to traditional methods.
OVA-IB: One vs All Information Bottleneck for Multi-Modal Alignment
Tianchao Li, Shujian Yu, Xinrui Zu, Zhaolong Wei, Jeremy Gummeson, Jack C.P. Cheng, Robert Jenssen
Multimodal
  • OVA-IB provides a principled approach for multi-modal alignment using the Information Bottleneck principle.
  • The framework captures higher-order dependencies among multiple modalities, overcoming limitations of pairwise methods.
  • Sufficiency and minimality are defined modality-wise, enhancing the alignment process.
  • Experiments show that OVA-IB achieves strong performance across multiple tasks and benchmarks.
A Novel Tensor Product-Based Neural Network for Solving Partial Differential Equations
Qihong Yang, Yangtao Deng, Qiaolin He, Shiquan Zhang
Theory, Efficient ML
  • Introduction of TPNet, a novel architecture for PDE solving.
  • Utilizes a tensor-product scheme to reduce model complexity.
  • Implements a block time-marching strategy for efficiency.
  • Achieves better accuracy and shorter training times than traditional methods.
From Short Histories to Long Futures: Horizon-Aware Graph Neural Networks for Long Horizon Forecasting
Zesheng Liu, Maryam Rahnemoonfar
Graph Learning, Time Series, Theory
  • Introduces a multi-horizon GNN emulator for long-term forecasting of geophysical systems.
  • Utilizes a graph representation to capture spatial interactions and time-varying attributes.
  • Implements a horizon-conditioned mapping to predict future states, reducing error accumulation.
  • Demonstrates improved accuracy and stability in long-range predictions compared to traditional methods.
MIC: Maximizing Informational Capacity in Adaptive Representations via Isotropic Subspace Alignment
Dang Hong Nguyen, Nhi Ngoc-Yen Nguyen, Huy-Hieu Pham
NLP, Optimization, Efficient ML
  • Introduction of MIC framework for optimizing multi-scale representation learning.
  • Development of Soft Collapse Regularization to manage redundancy in nested subspaces.
  • Implementation of Spectral Isotropy Regularization for ensuring uniform embedding distribution.
  • Demonstration of MIC's superior performance in high-compression scenarios.
KLAS: Using Similarity to Stitch Neural Networks for Improved Accuracy-Efficiency Tradeoffs
Debopam Sanyal, Anantharaman Iyer, Alind Khare, Trisha Jain, Akshay Jajoo, Myungjin Lee, Clayton Kerce, Alexey Tumanov
Computer Vision, Efficient ML, Large Language Models
  • KLAS improves upon heuristic-based stitching methods by using KL divergence for stitch selection.
  • The framework automates the selection of anchors and blocks, enhancing generalizability across model families.
  • Experiments show significant improvements in accuracy-efficiency tradeoffs compared to existing methods.
  • KLAS can be applied to both vision transformers and convolutional neural networks.
Mechanistic origins of catastrophic forgetting: why RL preserves circuits better than SFT?
Jeanmely Rojas Nunez, Viraj Sawant, Nathan Allen, Nomgondalai Amgalanbaatar, Yannis Zongo, Vasu Sharma, Maheep Chaudhary
NLP, Large Language Models, Reinforcement Learning
  • Reinforcement learning (RL) preserves prior capabilities better than supervised fine-tuning (SFT) due to stronger retention of internal circuits.
  • Differential circuit vulnerability is introduced as a measure to assess the degradation of internal circuits during fine-tuning.
  • SFT adapts more quickly to new tasks but results in greater circuit disruption and forgetting.
  • RL maintains a higher percentage of base circuit retention, indicating a trade-off between adaptation speed and circuit preservation.
The Good, the Bad, and the Ugly of Markov Boundary for Tabular Prediction
Shu Wan, Abhinav Gorantla, Huan Liu, K. Selçuk Candan
Theory, Graph Learning, Efficient ML
  • Restricting regressors to the Markov boundary can improve prediction accuracy, especially in high-dimensional and sparse feature spaces.
  • The process of recovering the Markov boundary through causal discovery often fails to outperform models trained on the full feature set.
  • Causal discovery prioritizes structural recovery over predictive accuracy, leading to potential inefficiencies.
  • False negatives and positives in boundary recovery have asymmetric impacts on prediction performance.
Moment Matching Q-Learning
Yiyan (Edgar) Liang, Sifei Liu, Weitong Zhang
Reinforcement Learning, Generative Models, Efficient ML
  • Introduction of MoMa QL, which minimizes the maximum mean discrepancy (MMD) between conditional distributions to improve RL efficiency.
  • Theoretical proof of MoMa QL's effectiveness and convergence, showing its consistency with existing models.
  • Empirical results indicate superior performance of MoMa QL compared to traditional offline RL methods.
  • MoMa QL enables efficient fine-tuning of policies through accelerated action sampling.
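For readers unfamiliar with the quantity named in the first bullet, the following is a minimal sketch of the standard biased estimator of squared maximum mean discrepancy (MMD) between two sample sets, using a Gaussian kernel. It is not MoMa QL itself: the function names, the kernel choice, and the `bandwidth` parameter are assumptions made for illustration, and the summary does not say which kernel or estimator the paper uses.

```python
import math


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between samples xs and ys.

    xs and ys are lists of vectors (lists of floats). The kernel and
    bandwidth are illustrative assumptions, not taken from the paper.
    """
    def mean_kernel(a, b):
        # Average kernel value over all pairs drawn from a and b.
        total = sum(rbf_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)
```

Identical sample sets give an estimate of zero, and the value grows as the two sets move apart, which is what makes MMD usable as a training objective for matching distributions.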
Label-Free Reinforcement Learning via Cross-Model Entropy
Matt Gorbett, Hossein Shirazi
Reinforcement Learning, Large Language Models, NLP
  • Introduction of Cross-Model Entropy (CME) as a label-free reward signal for RL.
  • CME leverages a separate verifier model to evaluate the quality of responses, avoiding self-referential pitfalls.
  • Integrating CME into Group Relative Policy Optimization (GRPO) enables effective training on open-ended instruction-following tasks.
  • Models trained with CME rewards outperform their untrained counterparts across various model families.
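To make the GRPO bullet concrete: GRPO is generally described as standardizing each sampled response's reward against the other responses to the same prompt, so any scalar reward can be plugged in. The sketch below assumes the rewards are plain floats (here they would come from a label-free signal such as CME, whose exact definition the summary does not give). The names `group_relative_advantages` and `eps` are illustrative, and using the population standard deviation is an implementation choice, not something stated in the source.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of responses to the same prompt.

    rewards: list of floats, one per sampled response.
    eps: small constant (assumed here) to avoid division by zero when
         every response in the group receives the same reward.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

A group in which all responses score the same yields zero advantage for every response, so such a group contributes no learning signal under this scheme.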
Ridge Regression from Poisson Resetting: A Renewal Perspective on Spectral Regularization
Petar Jolakoski
Theory, Optimization
  • Establishes a novel connection between stochastic resetting and ridge regression.
  • Demonstrates that Poisson resetting yields the ridge estimator through a Laplace-transform relationship.
  • Extends the analysis to general renewal reset laws, highlighting differences in spectral filters.
  • Investigates the impact of Ornstein-Uhlenbeck processes on the mean and covariance of estimators.
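One way to see how a Laplace-transform relationship can produce the ridge estimator, offered as a sketch under assumptions the summary does not state: take gradient flow on the least-squares loss, started at and reset to the origin, with resets arriving at Poisson rate $r$. The symbols $A = X^\top X$ (assumed invertible), $b = X^\top y$, and $w(t)$ are introduced here for illustration and may not match the paper's notation or setting.

```latex
% Gradient flow on (1/2)||Xw - y||^2 from w(0) = 0:
\dot{w}(t) = -\bigl(A\,w(t) - b\bigr)
\quad\Longrightarrow\quad
w(t) = \bigl(I - e^{-tA}\bigr) A^{-1} b .

% In stationarity the time since the last reset is Exp(r), so the mean state is
% r times the Laplace transform of the trajectory evaluated at r:
\mathbb{E}[w]
  = \int_0^\infty r\,e^{-r t}\, w(t)\,\mathrm{d}t
  = \bigl(I - r\,(A + rI)^{-1}\bigr) A^{-1} b
  = (A + rI)^{-1} b .

% This is the ridge estimator for (1/2)||Xw - y||^2 + (\lambda/2)||w||^2
% with \lambda = r.
```

Under these assumptions the reset rate plays the role of the ridge penalty: frequent resets keep the iterate near the origin, just as a large penalty shrinks the estimate.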
Attention as In-Context Empirical Bayes: A Two-Stage View via Particle Dynamics
Matthew Smart, Soumya Ganguly, Nilava Metya, Alexandre V. Morozov, Anirvan M. Sengupta
Theory
  • Generalizes in-context denoising to all-token corruption, revealing a two-stage empirical Bayes interpretation of attention.
  • Demonstrates that self-attention dynamics approximate an anti-diffusive denoising operator in a continuous-depth and large-context regime.
  • Establishes that effective denoising can be achieved without a noise schedule, using fixed kernel bandwidth and finite integration horizon.
  • Proves a sequential posterior-mean recovery theorem for a class of stable priors, enhancing understanding of attention mechanisms.
Singularity-aware Optimization via Randomized Geometric Probing: Towards Stable Non-smooth Optimization
Ruoran Xu, Borong She, Xiaobo Jin, Qiufeng Wang
Optimization
  • Introduction of Singularity-aware Adam (S-Adam) optimizer for non-smooth optimization.
  • Development of the Local Geometric Instability (LGI) metric for estimating local instability.
  • Adaptive damping mechanism that modulates step sizes based on local geometric conditions.
  • Rigorous convergence guarantees to Clarke stationary points at an optimal rate.
Digitally enriching a screening population for pancreatic cancer using routine blood-based measures and clinical histories
Chris Varghese, Leo Y. Li-Han, Richa Bisht, Ellen Larson, Frank Lee, Ryan M. Carr, Tanios S. Bekaii-Saab, Shounak Majumder, John D. Halamka, Mark Truty, Ajit H. Goenka, Hojjat Salehinejad, Cornelius A. Thiels
Time Series
  • Developed a Transformer-based model to predict pancreatic cancer risk using routine clinical data.
  • Achieved high predictive performance with AUC scores indicating strong risk stratification capabilities.
  • Model can identify individuals at high risk for pancreatic cancer years before diagnosis.
  • Provides a foundation for population-level screening initiatives to improve early detection.
Bastion: Budget-Aware Speculative Decoding with Tree-structured Block Diffusion Drafting
Soowon Oh, Nam Cao, Yujin Kim, Hojung Jung, Huzama Ahmad, Sangmin Bae, Se-Young Yun
NLP, Large Language Models, Efficient ML
  • BASTION introduces a dynamic tree-structured approach to speculative decoding, enhancing efficiency over static methods.
  • The framework integrates an acceptance surrogate, an online latency estimator, and an adaptive expansion mechanism.
  • BASTION achieves up to 6.61× speedup over standard autoregressive decoding and 39% improvement over state-of-the-art block-diffusion baselines.
  • The method is training-free and preserves the target model's distribution without requiring per-setting tuning.
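The claim that the target model's distribution is preserved rests, in speculative decoding generally, on an accept-or-resample verification step. The sketch below shows the standard single-token rule from speculative sampling. It is not BASTION's tree-structured procedure, which the summary does not describe, and the function name and the dictionary representation of the distributions are assumptions for illustration.

```python
import random


def accept_or_resample(token, p_target, q_draft, rng):
    """Verify one drafted token with the standard speculative-sampling rule.

    token:    the token proposed by the draft model (sampled from q_draft).
    p_target: dict mapping token -> probability under the target model.
    q_draft:  dict mapping token -> probability under the draft model.
    rng:      a random.Random instance.
    Returns (token, accepted).
    """
    # Accept with probability min(1, p(token) / q(token)).
    ratio = p_target.get(token, 0.0) / q_draft[token]
    if rng.random() < min(1.0, ratio):
        return token, True

    # Rejected: resample from the residual max(0, p - q), renormalized.
    residual = {t: max(0.0, p - q_draft.get(t, 0.0)) for t, p in p_target.items()}
    threshold = rng.random() * sum(residual.values())
    cumulative = 0.0
    for t, weight in residual.items():
        cumulative += weight
        if threshold < cumulative:
            return t, False
    # Floating-point fallback: return the token with the largest residual mass.
    return max(residual, key=residual.get), False
```

When the draft and target distributions agree, every drafted token is accepted. The further they diverge, the more often the step falls back to the residual distribution, which is why draft quality drives the achievable speedup.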
Probabilistic bias adjustment of seasonal forecasts using generative machine learning: A case study of Arctic sea ice predictions
Parsa Gooya, Reinel Sospedra-Alfonso
Generative Models, Time Series
  • Development of a probabilistic bias adjustment framework using generative machine learning.
  • Extension of the conditional variational autoencoder (cVAE) model to improve resolution and reduce blurriness in predictions.
  • Demonstrated improved calibration and reduced errors in bias-adjusted forecasts compared to benchmarks.
  • Utilization of a higher resolution target dataset enhances the quality of predictions.
Self-Play Reinforcement Learning under Imperfect Information in Big 2
Aalok Patwa
Reinforcement Learning
  • PPO outperforms Monte Carlo Q approximation, SARSA, and Q-learning in Big 2.
  • Moderate entropy regularization enhances PPO's performance by maintaining policy stochasticity.
  • Current-policy self-play is a more effective training strategy than checkpoint self-play.
  • The study provides a controlled empirical analysis of RL objectives and training design choices in Big 2.
Cluster-Level Attention-Guided Parallel Decoding for Masked Diffusion Language Models
Heqiang Qi, Wei Huang, Mingyuan Bai, Xiangming Meng
NLP, Large Language Models, Generative Models
  • Introduction of confidence-induced clusters (CICs) as span-level update units for masked diffusion language models (MDLMs).
  • Development of CLAD, a training-free cluster-level decoder that enhances parallel decoding.
  • Utilization of self-attention maps to model inter-cluster dependencies for conflict-aware selection.
  • Demonstrated speedups of 1.77× to 8.47× over traditional token-level decoding methods.
In-Context Reward Adaptation for Robust Preference Modeling
Zhenyu Sun, Zheng Xu, Ermin Wei
Reinforcement Learning, Large Language Models, Theory
  • Introduces In-Context Reward Adaptation for dynamic preference modeling.
  • Incorporates human response time to enhance model adaptability.
  • Demonstrates significant improvements in robustness against preference distribution shifts.
  • Challenges the effectiveness of static reward models in RLHF frameworks.
Forget Less, Generalize More: Unifying Temporal and Structural Adaptation for Dynamic Graphs
Qian Chang, Ciprian Doru Giurcaneanu, Runsong Jia, Xia Li, Guoping Hu, Xiufeng Cheng, Jinqing Yang, Mengjia Wu, Yi Zhang
Graph Learning, Theory, Time Series
  • Introduction of Dual-Scale Retentive Dynamics (DSRD) for dynamic graph representation learning.
  • Unified framework that captures both temporal and structural dependencies through a shared retentive state.
  • Adaptive decay kernels with learnable parameters enhance the model's ability to balance short-term and long-term dependencies.
  • Theoretical insights into the stability and boundedness of the retentive dynamics.
Causal Intelligence for Constraint-Aware Intervention Design to Induce State Transitions
Zixuan Song, Uwe Mueller, Dimitris V. Manatakis
Optimization, Graph Learning, Interpretability
  • COAST provides a causal-intelligence approach for designing interventions that induce state transitions.
  • The framework employs a modular structure, allowing for flexibility in feature selection and causal modeling.
  • COAST utilizes a multi-objective optimization formulation to balance efficacy, complexity, and stability of interventions.
  • The approach is validated on synthetic and real biological datasets, demonstrating its capability to identify causal drivers and effective interventions.
Test Time Training for Supervised Causal Learning
Zizhen Deng, Jiaru Zhang, Rui Ding, Huang Bojun, Jinzhuo Wang, Qiang Fu, Shi Han, Dongmei Zhang
Graph Learning, Theory
  • Identifies critical limitations in existing Supervised Causal Learning methods.
  • Introduces the TTT-SCL framework for dynamic training set generation at test time.
  • Establishes a theoretical basis connecting TTT-SCL to score-based methods.
  • Demonstrates significant performance improvements across various datasets.
CalArena: A Large-Scale Post-Hoc Calibration Benchmark
Eugène Berta, David Holzmüller, Francis Bach, Michael I. Jordan
Computer Vision, Theory, Optimization
  • Introduction of CalArena, a large-scale benchmark for post-hoc calibration methods.
  • Standardized evaluation across nearly 2000 experiments in diverse classification settings.
  • Proposal of Post-Hoc Improvement (PHI) as a new metric for assessing calibration methods.
  • Findings indicate that smooth calibration functions outperform binning-based approaches.
K-FinHallu: A Hallucination Detection Benchmark for Multi-Turn RAG in Korean Finance
Eunbyeol Cho, Yunseung Lee, Mirae Kim, Jeewon Yang, Youngjun Kwak, Edward Choi
NLP, Large Language Models
  • K-FinHallu is the first multi-turn hallucination detection benchmark for Korean financial RAG.
  • The benchmark incorporates a hierarchical taxonomy for hallucinations that accounts for justified abstention.
  • Existing LLMs struggle with fine-grained financial diagnostics and refusal behavior.
  • Fine-tuning an 8B model on K-FinHallu can yield competitive performance with leading models.
Towards Continuous-time Causal Foundation Models
Dennis Thumm, Ruben Wiedemann, Ying Chen
Time Series
  • Introduces a continuity criterion for continuous-time causal priors based on trajectory-law invariance.
  • Develops a three-tier taxonomy for classifying continuous-time causal models.
  • Demonstrates that fine-grid integration significantly outperforms naive integration in empirical evaluations.
  • Proposes a construction using Ornstein-Uhlenbeck (OU) processes and MLPs on random DAGs to achieve continuous-time modeling.
MōLe-Λ: Learning the Coupled-Cluster Response State for Energies, Gradients, and Properties
Andreas Burger, Luca Thiede, Abdulrahman Aldossary, Jorge A. Campos-Gonzalez-Angulo, Alex Zook, Jérôme Florian Gonthier, Alán Aspuru-Guzik
Theory, Efficient ML
  • MōLe-Λ predicts both T and Λ amplitudes, enhancing the accuracy of molecular property predictions.
  • The model retains the symmetry-aware architecture of MōLe while adding new readouts for left-hand amplitudes.
  • MōLe-Λ achieves CC-quality energies and forces while being over two orders of magnitude faster than full CCSD methods.
  • The approach allows for the recovery of higher-order molecular properties that standard energy models cannot access.
OISD: On-Policy Internal Self-Distillation of Language Models
Xinyu Liu, Darryl Cherian Jacob, Yang Zhou, Jindong Wang, Pan He
NLP, Large Language Models, Reinforcement Learning
  • Introduction of a new paradigm for reasoning in RL called on-policy internal self-distillation.
  • OISD framework utilizes the final layer as an internal teacher to guide intermediate layers.
  • Two alignment mechanisms (logit and attention alignment) are proposed for effective distillation.
  • Substantial improvements in reasoning tasks over existing strong baselines.
Causal Label Recovery in Payment Networks
Gaurav Dhama
Theory
  • Introduces a Sequential Triply Robust Estimator (STR) for fraud label recovery in payment networks.
  • Models fraud label recovery as a sequential missing-data problem with multiple selection gates.
  • Proves the consistency and efficiency of the STR under specific conditions.
  • Demonstrates that the STR significantly reduces bias compared to naive estimators.
A Predictive Law for On-Policy Self-Distillation From World Feedback
Tommy He, Jerome Sieber, Matteo Saponati
Reinforcement Learning, Large Language Models, Theory
  • Identification of a strong predictive law linking the initial student-self-teacher performance gap to the final performance improvement in on-policy self-distillation (OPSD).
  • Demonstration of the generalizability of this relationship across different model families and privileged context types.
  • Establishment of the predictive law's validity with increasing model size, indicating potential scaling laws.
  • Provision of a practical framework for early performance estimation in OPSD configurations, reducing the need for costly training iterations.
Access Sets Matter: Budgeting Expert Reads for Scalable Weight-Space Model Merging
Yuanyi Wang, Yanggan Gu, Su Lu, Yifan Yang, Zhaoyi Yan, Congkai Xie, Jianmin Wu, Hongxia Yang
Large Language Models, Optimization, Efficient ML
  • Introduces MergePipe, a budget-aware execution layer for LLM merging.
  • Reframes model merging as an expert access-set problem to optimize I/O efficiency.
  • Achieves significant reductions in expert-read I/O and improves merging speed.
  • Proves budget soundness and establishes bounds on omitted-update errors.
Anti Mode-Collapse in Mean-Field Transformer via Auxiliary Variables
Masaaki Imaizumi, Masanori Koyama, Noboru Isobe, Kohei Hayashi
Theory, NLP, Large Language Models
  • Introduction of auxiliary variables in mean-field transformers prevents mode collapse.
  • The USA-AV model provides a theoretical framework for analyzing token dynamics with fixed auxiliary marginals.
  • Positional encodings and prefix tokens can achieve exact representations of target distributions.
  • Conditional Dirac structures can coexist with non-collapsed marginalized distributions.
Learning Robust and Task-Invariant Functional Representation from fMRI through Siamese Self-Supervised Learning
Jiyao Wang, Peiyu Duan, Nicha C. Dvornek, Lawrence H. Staib, Denis Sukhodolsky, Pamela Ventola, James S. Duncan
Graph Learning, Multimodal, Efficient ML
  • Introduction of BrainSimSiam, a self-supervised representation learning framework for fMRI.
  • Demonstrated superior performance over traditional supervised and self-supervised methods.
  • Utilizes positive-only data pairs to enhance generalizability across tasks.
  • Incorporates a joint ROI masking scheme for improved interpretability.
A Training-Time Diagnostic for Generalization via the Log-Alignment Ratio
Ali Shehper, Ashish Vaswani
Theory, Optimization, Large Language Models
  • The Log-Alignment Ratio (LAR) reformulates the alignment between weight and activation spectra, providing insights into generalization.
  • In grokking tasks, LAR predicts the effective dimension of learned functions, correlating with the number of principal components.
  • In large-scale pre-training, LAR stabilizes in non-overfitting regimes and declines sharply as overfitting approaches.
  • LAR is computable from forward pass quantities, offering a low-cost diagnostic tool for monitoring training dynamics.
How Much Is a Dataset Worth? Scaling Laws, the Vendi Score, and Matrix Spectral Functions
Jeff A. Bilmes, Gantavya Bhatt, Arnav M. Das
Theory, Optimization, Efficient ML
  • Introduces matrix spectral functions as a generalization of the Vendi Score and determinantal point processes (DPPs).
  • Demonstrates that both neural scaling laws and the Vendi Score are submodular.
  • Develops a fast optimization method that significantly reduces computational costs.
  • Finds that facility location methods outperform the Vendi Score in predicting dataset value.
How's it going? Reinforcement learning in language models recruits a functional welfare axis
Andy Q Han, David J. Chalmers, Pavel Izmailov
NLP, Large Language Models, Reinforcement Learning
  • Reinforcement learning recruits a pre-existing representation of functional welfare in language models.
  • The study introduces a novel maze environment to analyze the effects of RL on model behavior.
  • Negative and positive reward vectors align with negative and positive emotions, respectively.
  • The functional welfare axis influences model behavior across unrelated domains.
Molecular Lead Optimization via Agentic Tool Planning
Lingxiao Li, Haobo Zhang, Ruohao Fan, Bin Chen, Jiayu Zhou
Optimization, Large Language Models
  • TRACE is a novel LLM-reasoning agent designed for molecular lead optimization.
  • The agent formulates tool selection as a sequential decision-making process, improving optimization effectiveness.
  • TRACE incorporates an in-context self-correction mechanism and a similarity-guided trajectory reuse strategy.
  • Experiments show TRACE achieves superior results in ADMET optimization tasks compared to traditional methods.
Spectral Guidance for Flexible and Efficient Control of Diffusion Models
Gabriel Moreira, Manuel Marques, João Paulo Costeira, Chenyan Xiong
Generative Models
  • Introduction of Spectral Guidance for flexible control of diffusion models.
  • Utilization of a self-supervised learning objective to estimate the spectral decomposition of the diffusion operator.
  • Achieved a 37 percentage point increase in accuracy on CIFAR-10 and 4× faster sampling.
  • Supports complex controls like mask guidance without auxiliary models.
A Fully Convolutional Approach to Denoising Structural Dynamics Data from X-Ray Photon Correlation Spectroscopy
Nisar Nellikunnummel, Andi Barbour, Lutz Wiegart, Tatiana Konstantinova, Anthony DeGennaro
Time Series
  • Introduction of a fully convolutional denoising autoencoder (FC-DAE) for X-ray photon correlation spectroscopy (XPCS) data.
  • FC-DAE can handle inputs of arbitrary dimensions, enhancing flexibility over traditional methods.
  • Model trained on experimental data with data augmentation to improve generalization.
  • Demonstrated ability to recover intricate dynamical features in low SNR conditions.
A Theoretical and Experimental Study of a Novel Adaptive Learning Algorithm
Sakshi Kumari, Shyam Kumar M, Sushmitha P
Optimization, Theory
  • C-Adam is introduced as a new adaptive optimizer that addresses the convergence issues of Adam and AMSGrad.
  • The optimizer employs a 'line of sight' approach to enhance parameter updates and reduce oscillations.
  • Theoretical proofs are provided to ensure the convergence of C-Adam.
  • Numerical experiments demonstrate C-Adam's effectiveness in achieving optimal solutions with lower regret compared to existing optimizers.
Improving Adversarial Robustness of Attribution via Implicit Regularization
Amir Mehrpanah, Matteo Gamba, Hossein Azizpour
Interpretability, Theory, Optimization
  • Implicit regularization from SGD can enhance adversarial robustness of gradient-based attributions.
  • Attention-based attribution methods face limitations in robustness due to softmax normalization.
  • Replacing softmax attention with kernel-based attention can restore robustness in transformer models.
  • The paper provides a theoretical framework connecting optimization dynamics to attribution robustness.
Efficient Test-Time Finetuning of LLMs via Convex Reconstruction and Gradient Caching
Alaa Khamis, Alaa Maalouf
NLP, Large Language Models, Efficient ML
  • HullFT introduces a geometric approach to test-time finetuning (TTFT) that improves the quality-efficiency tradeoff.
  • The method employs sparse convex approximation to select a diverse and relevant support set from kNN candidates.
  • A novel integerization procedure converts fractional weights into an exact training multiset, preserving approximation quality.
  • Gradient Reuse is leveraged to reduce computational costs during finetuning by reusing cached gradients.
When, why, and how do diffusion posterior samplers fail? A finite-sample lens
Benjamin A. Burns, Sara Fridovich-Keil
Generative Models, Theory
  • Introduces a finite-sample perspective to analyze diffusion posterior samplers.
  • Identifies how likelihood approximations can lead to erroneous posterior distributions.
  • Demonstrates that issues arise from multimodal priors, not just complex measurement models.
  • Provides algorithmic analysis and finite-sample rates for posterior sampling methods.
Read more
SoundnessBench: Can Your AI Scientist Really Tell Good Research Ideas from Bad Ones?
Sy-Tuyen Ho, Minghui Liu, Huy Nghiem, Furong Huang
Large Language Models Theory NLP
  • Introduction of SoundnessBench, a benchmark for evaluating the soundness of research proposals.
  • Identification of a pervasive optimism bias in LLMs, leading to misclassification of research proposal soundness.
  • Demonstration that current LLMs are unreliable as first-gate evaluators for scientific rigor.
  • Highlighting the importance of pre-execution evaluation in the research pipeline.
Read more
Evolving Features vs Evolving Entire Trees with GP for Interpretable Survival Analysis
Thalea Schlender, Peter A.N. Bosman, Tanja Alderliesten
Interpretability
  • Introduces a genetic programming approach to evolve features and survival tree structures for improved interpretability and accuracy.
  • Demonstrates that evolving features enhances the predictive performance of shallow survival trees.
  • Finds that full joint evolution of features and tree structures yields the best overall performance.
  • Addresses limitations of traditional greedy tree induction methods by optimizing globally.
Read more
Unveiling Multi-regime Patterns in SciML: Distinct Failure Modes and Regime-specific Optimization
Yuxin Wang, Yuanzhe Hu, Xiaokun Zhong, Xiaopeng Wang, Haiquan Lu, Tianyu Pang, Michael W. Mahoney, Yujun Yan, Pu Ren, Yaoqing Yang
Optimization Theory
  • Identification of a consistent three-regime structure in SciML models: Well-Trained, Under-Trained, and Over-Trained.
  • Optimization effectiveness is regime-specific; no single method performs well across all regimes.
  • Fine-grained failure modes in SciML models challenge traditional loss-landscape interpretations.
  • Development of a regime-aware diagnostic framework for analyzing model performance and training dynamics.
Read more
OOD-GraphLLM: Graph Large Language Model for Out-of-Distribution Generalized Drug Synergy Prediction
Xin Wang, Linxin Xiao, Yang Yao, Wenwu Zhu
Graph Learning Large Language Models
  • Introduces OOD-GraphLLM for out-of-distribution drug synergy prediction.
  • Addresses limitations of existing DSP methods that rely on in-distribution assumptions.
  • Utilizes a target-adaptive disentangled molecular graph encoding model.
  • Implements a pairwise attentive graph architecture search for optimal representation.
Read more
Self-Trained Verification for Training- and Test-Time Self-Improvement
Chen Henry Wu, Aditi Raghunathan
Reinforcement Learning Large Language Models Theory
  • Self-trained verification (STV) improves the accuracy of reasoning models by enhancing the verifier's feedback mechanism.
  • STV leads to significant performance gains at test time, particularly on hard math and scientific reasoning tasks.
  • Verifier-in-the-loop training (ViL) allows for further improvements in the generator's performance beyond traditional reinforcement learning methods.
  • The approach highlights the importance of developing scalable verification methods that do not rely on human feedback.
Read more
On Distributional Reinforcement Learning in Chaotic Dynamical Systems
James Rudd-Jones, Mirco Musolesi, María Pérez-Ortiz
Reinforcement Learning Theory Optimization
  • Distributional RL objectives are smoother than expectation-based objectives in chaotic systems.
  • Return distributions exhibit Lipschitz continuity in the 1-Wasserstein metric, even with exponentially diverging trajectories.
  • Empirical analysis shows that distributional objectives lead to smoother loss landscapes and lower variance in one-step targets.
  • Distributional Q-learning methods outperform non-distributional methods in chaotic control tasks.
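
For readers unfamiliar with the 1-Wasserstein metric mentioned above, here is a minimal sketch of how it can be computed between two empirical return distributions. It assumes one-dimensional, equally weighted samples of equal size, in which case the distance is the mean absolute difference between sorted samples. It illustrates the metric only and does not reproduce the paper's analysis; the function name is hypothetical.

```python
def wasserstein_1(samples_a, samples_b):
    """1-Wasserstein distance between two equal-sized 1-D empirical distributions.

    Assumes each sample carries equal weight; the optimal coupling then pairs
    the i-th smallest value of one sample with the i-th smallest of the other.
    """
    if not samples_a or len(samples_a) != len(samples_b):
        raise ValueError("samples must be non-empty and of equal length")
    a, b = sorted(samples_a), sorted(samples_b)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)
```

Shifting every return by a constant c moves the distribution by exactly |c| in this metric, which is the kind of proportional change a Lipschitz bound controls.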
Read more
Distributionally Robust Set Representation Learning Under Inference-Time Element Corruption
Yankai Chen, Hanrong Zhang, Bowei He, Philip S. Yu, Xue (Steve) Liu
Optimization Theory
  • SW-DRSO enhances robustness in Set Representation Learning against inference-time element corruption.
  • The framework optimizes a tractable surrogate of the worst-case expected loss over corrupted sets.
  • Barycentric adversaries are used to approximate the worst-case optimization efficiently.
  • Extensive experiments show that SW-DRSO outperforms state-of-the-art baselines in robustness and accuracy.
Read more
TIMEGATE: Sustainable Time-Boxed Promotion Gates for Continual ML Adaptation Under Resource Constraints
Abhijit Chakrabroty, Suddhasvatta Das, Kevin A. Gary, Yash Shah
Efficient ML
  • TIMEGATE introduces a time-boxed policy layer for efficient continual ML adaptation.
  • Labeling is shown to be more effective than training, achieving a 2.3× performance improvement.
  • The metric-availability signal M provides a reliable calibration and audit mechanism.
  • The framework achieves 66% evaluation-compute savings without silent mis-promotions.
Read more
Influence-Guided Symbolic Regression: Scientific Discovery via LLM-Driven Equation Search with Granular Feedback
Evgeny S. Saveliev, Samuel Holt, Nabeel Seedat, David L. Bentley, Jim Weatherall, Mihaela van der Schaar
Large Language Models Interpretability Optimization
  • IGSR improves symbolic regression by providing granular feedback on term contributions, enhancing model refinement.
  • The method integrates LLMs with MCTS to navigate complex search spaces effectively.
  • IGSR was validated on diverse datasets, showcasing its capability for genuine scientific discovery.
  • A case study demonstrated IGSR's potential to uncover novel biological hypotheses supported by experimental validation.
Read more
Model Merging by Output-Space Projection
Bethan Evans, Benjamin Etheridge, Stephen Roberts, Jared Tanner
Optimization Theory Efficient ML
  • Introduces a formal framework for model merging as a convex quadratic program.
  • Subsumes existing heuristic methods, providing optimality guarantees.
  • Offers a closed-form diagnostic for predicting merge quality.
  • Demonstrates empirical superiority over existing methods in single-layer settings.
Read more
Theoretical Foundations and Effective Algorithms for Policy-Aware Simulator Learning
Christoph Dann, Yishay Mansour, Mehryar Mohri
Reinforcement Learning Theory Robotics
  • Introduces a policy-aware minimax objective for robust simulator learning.
  • Establishes theoretical guarantees for online learning with sublinear regret.
  • Demonstrates a tractable method to bound policy-value gaps using a critic.
  • Proposes a duality between worst-case policy finding and Error-MDP problems.
Read more