AI-generated summaries

Today's ML research,
without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

59 Papers today
8h Update frequency
7 Days of history
OPRD: On-Policy Representation Distillation
Shenzhi Yang, Guangcheng Zhu, Bowen Song, Haobo Wang, Mingxuan Xia, Xing Zheng, Yingfan Ma, Zhongqi Chen, Weiqiang Wang, Gang Chen
NLP Large Language Models Efficient ML
  • OPRD shifts the focus of on-policy distillation (OPD) from output probabilities to hidden-state representations.
  • The method eliminates sampling variance in gradient estimation, leading to more stable training.
  • OPRD exposes rich structural information from the teacher's intermediate layers, enhancing supervision.
  • Empirical results show OPRD outperforms traditional OPD methods on competitive benchmarks.
Generative Criticality in Large Language Model Temperature Scaling
Huajian Ruan, Jinyang Li, Xingyu Guo, Lingxiao Wang
NLP Large Language Models Theory
  • Introduction of a statistical-field framework for analyzing LLM outputs.
  • Identification of critical behavior in LLM text generation driven by temperature scaling.
  • Validation of findings through intrinsic dimension estimation using the TwoNN method.
  • Observations of susceptibility peaks and order parameter changes near a critical temperature.
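The TwoNN method mentioned above estimates intrinsic dimension from the ratio of each point's second- to first-nearest-neighbour distance. Below is a minimal sketch of the maximum-likelihood form of that estimator, assuming a small point cloud held in a NumPy array. The function name `twonn_dimension` is hypothetical, and this is not the authors' analysis pipeline for LLM outputs.

```python
import numpy as np

def twonn_dimension(points: np.ndarray) -> float:
    """Maximum-likelihood TwoNN estimate of intrinsic dimension."""
    # Squared pairwise distances via the Gram-matrix identity (O(N^2) memory).
    sq_norms = (points ** 2).sum(axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * points @ points.T
    np.fill_diagonal(d2, np.inf)          # exclude each point from its own neighbours
    d2 = np.maximum(d2, 0.0)              # clip tiny negative round-off
    r = np.sqrt(np.sort(d2, axis=1)[:, :2])  # first and second neighbour distances
    mu = r[:, 1] / r[:, 0]                # ratio r2 / r1 >= 1
    return float(len(mu) / np.log(mu).sum())

rng = np.random.default_rng(0)
# A 2-D linear subspace embedded in 10-D space: the estimate should be near 2.
plane = rng.normal(size=(1000, 2)) @ rng.normal(size=(2, 10))
print(twonn_dimension(plane))
```

Only the ratio of neighbour distances enters the estimate, which is why it is fairly insensitive to the overall scale of the embedding.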
A Sliced-Wasserstein Framework on Correlation Matrices for EEG Decoding
Chen Hu, Rui Wang, Jiale Zhou, Jingjun Yi, Shaocheng Jin, Yidong Song, Yefeng Zheng
Time Series Theory Efficient ML
  • Introduction of a Sliced-Wasserstein framework for EEG decoding using correlation matrices.
  • Development of Pullback Euclidean Metric Sliced Wasserstein (PEMSW) discrepancies.
  • Demonstration of improved generalization in EEG decoding under distribution shifts.
  • Low training overhead and no additional inference cost associated with the proposed method.
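For intuition about the sliced-Wasserstein building block, the sketch below estimates the squared sliced 2-Wasserstein distance between two equal-size point clouds by averaging closed-form 1-D Wasserstein distances over random projection directions. It works on plain Euclidean vectors. The PEMSW discrepancies described above additionally map correlation matrices through a pullback Euclidean metric, which this sketch does not attempt to reproduce. `sliced_wasserstein` and its parameters are illustrative names.

```python
import numpy as np

def sliced_wasserstein(x, y, n_projections=200, seed=0):
    """Monte Carlo estimate of the squared sliced 2-Wasserstein distance.

    x, y: arrays of shape (n, d) with the same number of samples n,
    treated as uniform empirical distributions.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    # Random directions drawn uniformly on the unit sphere.
    theta = rng.normal(size=(n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project both clouds onto every direction: shape (n, n_projections).
    px, py = x @ theta.T, y @ theta.T
    # In 1-D, optimal transport between equal-size empirical measures
    # matches sorted samples, so W2^2 is the mean squared gap of sorted values.
    px.sort(axis=0)
    py.sort(axis=0)
    return float(((px - py) ** 2).mean())
```

Each projection costs only a sort, which is what makes sliced variants cheap compared with solving a full optimal transport problem.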
Maximising the Set-Piece Return: Optimising Football Corner Tactics with Graph Reinforcement Learning
Sean Groom, Michael Groom, Francisco Belo, Axl Rice, Liam Anderson, Victor-Alexandru Darvariu, Shuo Wang
Reinforcement Learning Graph Learning Optimization
  • Introduces a reinforcement learning framework for optimizing football corner kick tactics.
  • Formulates corner kick optimization as a Markov Decision Process (MDP) to facilitate reward-driven exploration.
  • Utilizes Graph Neural Networks to capture the spatial dynamics of player positions.
  • Demonstrates superior performance compared to traditional optimization methods on real-world data.
DiffSlack: Learning under Nonlinear Inequality Constraints via Learnable Slack Variables
Ziqian Wang, Chenxi Fang, Zhen Zhang
Robotics Optimization Theory
  • DiffSlack reformulates nonlinear inequalities as equalities using learnable slack variables, enhancing constraint satisfaction in neural networks.
  • The framework incorporates a differentiable projection layer that allows for end-to-end training while ensuring feasibility.
  • A two-stage curriculum learning approach stabilizes training and improves performance on complex tasks.
  • DiffSlack achieves superior results in vehicle path planning, outperforming traditional and learning-based methods.
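The reformulation in the first bullet follows the textbook slack-variable identity below, where each inequality constraint g_i(x) ≤ 0 becomes an equality with a non-negative slack s_i. Tying s_i to an unconstrained learnable parameter σ_i through softplus is an assumption made for illustration and may differ from DiffSlack's own parameterization.

```latex
g_i(x) \le 0
\;\Longleftrightarrow\;
\exists\, s_i \ge 0 \ \text{such that}\ g_i(x) + s_i = 0,
\qquad
s_i = \operatorname{softplus}(\sigma_i) = \log\bigl(1 + e^{\sigma_i}\bigr)
```

Softplus keeps s_i strictly positive, so an active constraint (s_i = 0) is only approached in the limit; the squared form s_i = σ_i^2 is a common alternative that can reach zero exactly.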
DP-MacAdam: Differentially Private Mechanism with Adaptive Clipping and Adaptive Momentum
Naima Tasnim, Lalitha Sankar, Oliver Kosut
Optimization Theory Efficient ML
  • DP-MacAdam combines adaptive clipping and adaptive momentum under differential privacy.
  • The algorithm uses the same empirical statistics for both clipping and momentum, enhancing training efficiency.
  • A novel bias correction factor is introduced for unbiased gradient variance estimation.
  • Empirical results show improved performance over existing DP optimizers without manual tuning.
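As background for the adaptive-clipping bullet, the sketch below shows one step of standard DP-SGD with a fixed clipping norm: per-example gradients are clipped to L2 norm at most C, summed, perturbed with Gaussian noise of standard deviation σC, and averaged. DP-MacAdam replaces the fixed C and plain updates with adaptive quantities computed from shared statistics. Those details are not reproduced here, and `dp_sgd_step` and its arguments are hypothetical names.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD update with fixed per-example clipping.

    per_example_grads: array of shape (batch, dim).
    """
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    # Rescale rows whose L2 norm exceeds clip_norm; rows inside the ball are unchanged.
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale
    # Gaussian noise calibrated to the clipping norm (the per-example sensitivity).
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / per_example_grads.shape[0]
    return params - lr * noisy_mean
```

With a fixed C, choosing the clipping norm is a manual tuning problem, which is the motivation for adaptive clipping schemes.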
Causal Longitudinal Prior-Fitted Networks for Counterfactual Outcome Prediction
Amirhossein Zare, Amirhessam Zare, Herlock Rahimi, Reza Salarikia, Mohammad Kashkooli
Time Series Theory Efficient ML
  • CAUSALLONGPFN is the first PFN-style model for longitudinal causal prediction under planned treatment sequences.
  • The model is pretrained on synthetic data, allowing it to handle complex longitudinal dynamics without retraining.
  • It provides competitive performance against traditional longitudinal causal estimators on various benchmarks.
  • The approach eliminates the need for domain-specific training at test time, making it more efficient.
A prism hierarchy of learning regimes in large linear autoencoders
Eugene Golikov, Yaroslav Gusev, Dmitry Yarotsky
Theory Optimization
  • Introduction of a prism hierarchy to classify extreme learning regimes in linear autoencoders.
  • Identification of five basic extreme regimes associated with specific hyperparameter scaling relations.
  • Extension of diagram-based methods to analyze finite training sets, enabling separate study of train and population losses.
  • Derivation of explicit loss evolution expressions for four of the five regimes, with strong empirical validation.
MDP-GRPO: Stabilized Group Relative Policy Optimization for Multi-Constraint Instruction Following
Mohammad Mahdi Salmani-Zarchi, Zahra Rahimi, Heshaam Faili, Mohammad Javad Dousti
Reinforcement Learning Large Language Models Optimization
  • Identification of three failure modes in group-relative policy updates: low-variance amplification, mean-centering blindness, and zero-variance collapse.
  • Introduction of multi-temperature sampling to enhance reward diversity in small-batch settings.
  • Development of dual-anchor advantages to restore learning signals in problematic reward scenarios.
  • Application of bounded, asymmetric advantage shaping based on Prospect Theory to improve robustness.
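Two of the failure modes in the first bullet can be seen directly in the standard group-relative advantage, which normalizes each reward by its group's mean and standard deviation. The sketch below uses that common GRPO form, with a small epsilon in the denominator as an implementation assumption. It illustrates one reading of the failure modes only and does not implement MDP-GRPO's multi-temperature sampling, dual-anchor advantages, or Prospect-Theory shaping.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-6):
    """Standard GRPO-style advantages for one group of sampled responses."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# Mixed rewards: a usable, centered learning signal.
print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
# Every response gets the same reward: all advantages are exactly zero,
# so this group contributes no gradient (zero-variance collapse).
print(group_relative_advantages([1.0, 1.0, 1.0, 1.0]))
# Tiny reward differences are divided by a tiny std and blown up
# to unit scale (low-variance amplification).
print(group_relative_advantages([0.50, 0.51, 0.50, 0.51]))
```

Small groups make both cases more likely, which is consistent with the summary's focus on reward diversity in small-batch settings.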
On the training of physics-informed neural operators for solving parametric partial differential equations
Nanxi Chen, Chuanjie Cui, Airong Chen, Sifan Wang, Rujin Ma
Optimization Theory Efficient ML
  • CViT architecture consistently delivers strong performance in PINO training.
  • Optimization challenges such as gradient imbalance and causal violation arise in PINOs.
  • Mitigation strategies from PINN training improve PINO predictive accuracy.
  • Physics-informed training can outperform data-driven approaches under certain conditions.
Revisiting Prototype Rehearsal for Exemplar-Free Continual Learning: Manifold-Aware Boundary Sampling with Adaptive Class-Balanced Loss
Hongye Xu, Bartosz Krawczyk
Computer Vision
  • Prototype rehearsal can be competitive in exemplar-free class-incremental learning (EFCIL) if redesigned to account for nearest-enemy information and class imbalance.
  • Constrained Expansive Over-Sampling (CEOS) generates boundary-aware rehearsal samples that respect the data manifold.
  • Adaptive Class-Balanced (ACB) loss addresses the imbalance between old and new classes through temporal weighting.
  • The proposed methods achieve state-of-the-art performance in EFCIL benchmarks, closing the gap with drift-compensation methods.
From Prediction to Self: Developmental Conditions for Agency in Minimal Neural Systems
Evan Ye
Theory Robotics Time Series
  • Identifies four critical developmental conditions for agency in neural systems.
  • Introduces agency gain as a measurable metric for self-awareness in predictive systems.
  • Demonstrates that self-representation is contingent on causal usefulness.
  • Falsifies 12 hypotheses that clarify the limitations of predictive coding and passive memory.
Staged Factorial Screening for Budget-Constrained Micro-Pretraining
Felipe Chavarro Polania
Optimization Efficient ML Theory
  • Introduces a staged screening methodology for micro-pretraining that emphasizes factorial design and local refinement.
  • Demonstrates that early recipe effects are significantly influenced by budget constraints, particularly in terms of batch size and model depth.
  • Identifies specific configurations that retain performance effects under budget constraints, while random searches can yield competitive results without clear factor attribution.
  • Recommends a bridge-centered approach for effective long-term performance across different hardware setups.
ERRORQUAKE: Heavy-Tailed Error Severity Distributions in Open-Weight Large Language Models
Jason Z Wang
NLP Large Language Models Theory
  • ERRORQUAKE-10K benchmark scores model responses on a continuous severity scale, revealing nuanced error distributions.
  • A Non-Reducibility Theorem proves that error severity profiles and error rates are informationally non-redundant.
  • Significant differences in error severity distributions exist among models with similar accuracy, indicating the need for more detailed evaluation metrics.
  • Human validation confirms the reliability of the severity scoring system, with high inter-rater agreement.
Quantifying the Privacy of Counterfactuals by Leveraging Membership Inference Attacks Against Synthetic Data
Maryam Babaei, Yingke Wang, Hadrien Lautraite, Heber H. Arcolezi, Ulrich Aivodji, Sebastien Gambs
Theory
  • Counterfactuals can reveal sensitive information and are vulnerable to privacy attacks.
  • Membership inference attacks (MIAs) designed for synthetic data can be adapted to counterfactuals.
  • Successful MIAs can be conducted without querying the model, using only the released counterfactuals.
  • The study introduces an ensembling MIA that operates in a no-box setting, expanding the attack landscape.
Generalized TV–ℓp Structured Priors for Bayesian T1 Mapping
Disi Lin, Martin Berggren, Tommy Löfstedt
Theory
  • Introduction of a generalized TV–ℓp prior for Bayesian T1 mapping.
  • Demonstrated properness of the proposed prior and its integration into a Bayesian framework.
  • Evaluation against multiple existing methods shows improved uncertainty quantification.
  • Results indicate lower variance and bias in estimates, enhancing reliability.
Towards Unified and Data-Efficient Prognostics and Health Management with Tabular Foundation Models
Raffael Theiler, Lev Telyatnikov, Leandro Von Krannichfeldt, Olga Fink
Time Series Efficient ML
  • Tabular Foundation Models can effectively handle fragmented and partially observed industrial PHM data.
  • In-context learning allows for task adaptation without retraining, reducing deployment overhead.
  • The proposed models outperform traditional methods in low-data regimes and across various PHM tasks.
  • Temporal context can be preserved in tabular representations, enhancing model performance.
Non-Negative Matrix Factorization for Event Data
Raphaël Romero
Time Series
  • Introduction of EventNMF, a continuous-time NMF model for event data.
  • Directly models event times as Poisson processes, avoiding preprocessing pitfalls.
  • Utilizes non-negative B-spline basis for latent temporal factors.
  • Demonstrates effectiveness on synthetic and real-world datasets.
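Modeling event times as a Poisson process with a non-negative B-spline intensity comes down to the standard inhomogeneous Poisson log-likelihood, log L = Σ_i log λ(t_i) − ∫ λ(t) dt over [0, T]. The sketch below evaluates it for a single intensity built from degree-1 (hat-shaped) B-splines with non-negative weights. The basis degree, the equally spaced knots, and the function names are assumptions, and the sketch omits the factorization structure that EventNMF uses to couple many event streams.

```python
import numpy as np

def hat_basis(t, knots):
    """Degree-1 B-splines (hat functions) on equally spaced knots; shape (len(t), K)."""
    h = knots[1] - knots[0]
    return np.maximum(0.0, 1.0 - np.abs(t[:, None] - knots[None, :]) / h)

def poisson_log_likelihood(event_times, weights, T):
    """log L = sum_i log lambda(t_i) - integral_0^T lambda(t) dt.

    weights: non-negative spline coefficients (at least two), one per knot.
    """
    weights = np.asarray(weights, dtype=float)
    knots = np.linspace(0.0, T, len(weights))
    h = knots[1] - knots[0]
    intensity = hat_basis(np.asarray(event_times, dtype=float), knots) @ weights
    # Each interior hat integrates to h over [0, T]; the two boundary hats to h / 2.
    basis_integrals = np.full(len(weights), h)
    basis_integrals[[0, -1]] = h / 2
    return float(np.log(intensity).sum() - weights @ basis_integrals)
```

Because the integral is linear in the weights and available in closed form, no binning of event times is needed, which is the kind of preprocessing the second bullet says the model avoids.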
Trust-Aware Predictive Emissions Monitoring for Gas Turbine Fleets with Limited Labelled Data
Rebecca Potts, Aiden Durrant, Rick Hackney, Georgios Leontidis
Time Series
  • Introduces a trust-aware framework for emissions prediction in gas turbine fleets with limited labelled data.
  • Combines multiple techniques for uncertainty quantification and confidence estimation to assess prediction reliability.
  • Demonstrates significant reduction in prediction error through confidence-based filtering.
  • Provides actionable insights for deploying predictive emissions monitoring systems in industrial settings.
On Advantage Estimates for Max@K Policy Gradients
Shota Takashiro, Soichiro Nishimori, Paavo Parmas, Yongmin Kim, Kohsei Matsutani, Gouki Minegishi, Yusuke Iwasawa, Takeshi Kojima, Yutaka Matsuo
Reinforcement Learning Large Language Models Optimization
  • Introduces a Leave-Two-Out baseline for policy-gradient estimators that ensures centered advantages.
  • Develops MaxPO, an efficient method for optimizing max@K objectives in reinforcement learning.
  • Provides a unified framework for understanding existing advantage estimators in the context of max@K.
  • Empirical results show significant reductions in gradient variance and improved performance on reasoning tasks.
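For context on the max@K setting, the sketch below shows two standard building blocks: the familiar leave-one-out baseline, where each sample's baseline is the mean reward of the other samples in its group, and the combinatorial unbiased estimate of the expected maximum of k draws from n ≥ k samples. Neither is the paper's Leave-Two-Out baseline or MaxPO, which this sketch does not reproduce, and the function names are illustrative.

```python
from math import comb

import numpy as np

def leave_one_out_advantages(rewards):
    """Advantage of each sample relative to the mean of the other samples."""
    r = np.asarray(rewards, dtype=float)
    baseline = (r.sum() - r) / (len(r) - 1)
    return r - baseline  # these advantages sum to zero

def expected_max_at_k(rewards, k):
    """Unbiased estimate of E[max of k i.i.d. draws] from n >= k samples.

    The i-th smallest reward (1-indexed) is the maximum of a uniformly random
    k-subset with probability comb(i - 1, k - 1) / comb(n, k).
    """
    r = np.sort(np.asarray(rewards, dtype=float))
    n = len(r)
    weights = np.array([comb(i - 1, k - 1) for i in range(1, n + 1)]) / comb(n, k)
    return float(weights @ r)
```

For example, with rewards [0, 1, 2, 3] and k = 2, the estimate averages the maximum over all six pairs, giving 14/6.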
Dominant-Layer ZO: A Single Layer Dominates Zeroth-Order Fine-Tuning of LLMs
Wanhao Yu, Ziyan Wang, Zheng Wang, Abeer Matar Almalky, Yihang Zuo, Shuteng Niu, Sen Lin, Adnan Siraj Rakin, Deliang Fan, Li Yang
Large Language Models Optimization Efficient ML
  • Discovery of a dominant-layer phenomenon in ZO fine-tuning, where tuning a single layer can match or exceed full-model performance.
  • The dominant layer is task-agnostic but model-specific, identifiable through activation outlier analysis.
  • Perturbation effects propagate effectively through the dominant layer, enhancing optimization signals under ZO updates.
  • Dominant-layer ZO fine-tuning shows improved performance and significant training speedup compared to existing methods.
When Good Enough Is Optimal: Multiplication-Only Matrix Inversion Approximation for Quantized Gated DeltaNet
Luoming Zhang, Yuwei Ren, Kui Zhang, Tian Liu, Lingjuan Ge, Denghao Li, Matthew Harper Langston, Yin Huang, Weiliang Will Zeng, Liang Zhang
Large Language Models Efficient ML NLP
  • Introduces a multiplication-only algorithm for matrix inversion in Gated DeltaNet, enhancing computational efficiency.
  • Utilizes a truncated Neumann series with structural masking to eliminate sequential dependencies.
  • Achieves up to 5× speedup in kernel execution on NPUs without sacrificing accuracy.
  • Adapts the method for low-bit integer quantization, addressing dynamic range issues.
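The multiplication-only idea rests on a standard fact, shown here assuming (as is common in chunked delta-rule kernels) that the matrix to invert has the form I − L with L strictly lower triangular. Such an L is nilpotent (L^n = 0 for an n×n matrix), so (I − L)^(-1) = I + L + L^2 + ... + L^(n-1) exactly, and the sum can be formed with matrix products alone. The sketch below evaluates it with the repeated-squaring product (I + L)(I + L^2)(I + L^4)..., which reaches the exact inverse in about log2(n) steps. The paper's structural masking, truncation depth, and quantization handling are not reproduced, and the function name is hypothetical.

```python
import numpy as np

def inv_unit_lower(strict_lower: np.ndarray) -> np.ndarray:
    """Inverse of (I - L) for strictly lower-triangular L, using only matmuls.

    Uses (I - L)^-1 = I + L + L^2 + ... (the series ends because L is nilpotent),
    evaluated as the product (I + L)(I + L^2)(I + L^4)...
    """
    n = strict_lower.shape[0]
    eye = np.eye(n)
    result = eye + strict_lower
    power = strict_lower
    covered = 2  # result currently equals I + L + ... + L^(covered - 1)
    while covered < n:
        power = power @ power           # L^(2^j)
        result = result @ (eye + power)
        covered *= 2
    return result
```

Stopping the loop early gives a truncated series, trading exactness for fewer multiplications.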
LLM Explainability with Counterfactual Chains and Causal Graphs
Nirit Nussbaum-Hoffer, Nitay Calderon, Liat Ein-Dor, Roi Reichart
NLP Large Language Models Interpretability
  • Introduces a causal paradigm for explainability in LLMs using causal graphs.
  • Develops an MCMC-inspired method for generating counterfactuals to enhance data coverage.
  • Evaluates the approach on three LLMs across various classification tasks.
  • Demonstrates that discovered causal graphs reflect meaningful dependencies in LLM reasoning.
Consistency Training Along the Transformer Stack
Sukrati Gautam, Neil Shah, Arav Dhoot, Bryan Maruyama, Caroline Wei, Rohan Kapoor, Robert Sidey, Prakhar Gupta, Zi Cheng Huang, David Demitri Africa
NLP Large Language Models Theory
  • Introduction of two new consistency training methods: MLPCT and AttCT.
  • Application of consistency training to four new threat models, enhancing model robustness.
  • Discovery of cross-threat generalization, where training on one threat improves performance on others.
  • Identification of a shared mechanism among the new methods, while BCT operates through a distinct one.
PC Layer: Polynomial Weight Preconditioning for Improving LLM Pre-Training
Senmiao Wang, Tiantian Fang, Haoran Zhang, Yushun Zhang, Kunxiang Zhao, Alex Schwing, Ruoyu Sun
Large Language Models Optimization Theory
  • Introduction of the PC layer for polynomial weight preconditioning in LLMs.
  • Empirical results show significant improvements in training efficiency and accuracy.
  • Theoretical proof linking weight spectrum control to convergence rates in deep linear networks.
  • No additional inference cost after training, making it practical for real-world applications.
Representation Learning Enables Scalable Multitask Deep Reinforcement Learning
Johan Obando-Ceron, Lu Li, Scott Fujimoto, Pierre-Luc Bacon, Aaron Courville, Pablo Samuel Castro
Reinforcement Learning Robotics Efficient ML
  • Representation learning is essential for scalable multitask reinforcement learning.
  • The MR.Q algorithm outperforms model-based methods and deep RL baselines.
  • Increased model capacity leads to consistent performance improvements.
  • Predictive objectives are critical for effective representation learning.
A Machine Learning-Based Framework for Discovering Huntington's Disease Stages: Integrating Graph Representation Learning and clustering to Uncover Progression Dynamics in Longitudinal Enroll-HD Dataset
Lubna M. Abu Zohair, Marta Vallejo, MD Azher Uddin, John R. Woodward, Hind Zantout
Graph Learning Time Series Multimodal
  • Developed an unsupervised machine learning framework for identifying Huntington's disease stages.
  • Utilized graph-based representation learning to capture temporal relationships in longitudinal clinical data.
  • Discovered four meaningful disease stages with clear clinical measurement boundaries.
  • Achieved robust clustering performance, surpassing traditional clinical staging methods.
Tangram: Unlocking Non-Uniform KV Cache for Efficient Multi-turn LLM Serving
Hyungmin Kim, Minsoo Kim, Hongseok Kim, Jungwook Choi
Large Language Models Efficient ML
  • Tangram improves multi-turn LLM serving efficiency by addressing non-uniform KV cache challenges.
  • The system utilizes deterministic memory scheduling to eliminate runtime overhead.
  • It employs a decoupled paging architecture to maximize memory reclamation and reduce fragmentation.
  • Ahead-of-Time load balancing ensures uniform GPU utilization without runtime planning delays.
CLaaS: Continual learning as a service for sample efficient online learning
Kion Fallah, Silen Naihin, Barak Widawsky, Qingqing Mao
NLP Large Language Models Reinforcement Learning
  • CLaaS enables sample-efficient continual learning from real-world deployment experiences.
  • The system utilizes an experience replay buffer to enhance gradient reuse during training.
  • CLaaS outperforms traditional in-context learning methods in terms of retention and adaptability.
  • The approach facilitates real-time improvements to agent performance through a chat API.
Less is MoE: Trimming Experts in Domain-Specialist Language Models
Haoze He, Xinkai Zou, Xuan Jiang, Xingyuan Ding, Ao Qu, Juncheng Billy Li, Heather Miller
NLP Large Language Models Efficient ML
  • Fisher importance outperforms traditional metrics for identifying critical model dimensions.
  • Fisher-MoE enables fine-grained compression at the intermediate dimension level, preserving model capabilities.
  • At a 50% compression ratio, Fisher-MoE reduces memory usage by ~45% and increases inference speed by 21%.
  • The study reveals that model capabilities are distributed across experts but concentrated in a small subset of intermediate dimensions.
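The first two bullets rely on Fisher importance. A common diagonal empirical-Fisher score averages squared per-sample gradients. The sketch below applies it at the granularity of an expert's intermediate dimensions: it sums the score over the weights that read from or write to each hidden unit and keeps the top fraction. The grouping rule, the 50% keep ratio, and all names are assumptions for illustration, not Fisher-MoE's exact criterion.

```python
import numpy as np

def fisher_dim_scores(grad_in, grad_out):
    """Diagonal empirical-Fisher importance per intermediate (hidden) dimension.

    grad_in:  per-sample gradients of the up-projection, shape (batch, d_ff, d_model)
    grad_out: per-sample gradients of the down-projection, shape (batch, d_model, d_ff)
    """
    # Empirical Fisher diagonal: mean of squared per-sample gradients.
    fisher_in = (grad_in ** 2).mean(axis=0)    # (d_ff, d_model)
    fisher_out = (grad_out ** 2).mean(axis=0)  # (d_model, d_ff)
    # Sum over all weights attached to hidden unit j.
    return fisher_in.sum(axis=1) + fisher_out.sum(axis=0)

def keep_top_dims(scores, keep_ratio=0.5):
    """Indices of the highest-scoring dimensions, in ascending index order."""
    k = max(1, int(round(len(scores) * keep_ratio)))
    return np.sort(np.argsort(scores)[-k:])
```

Pruning whole intermediate dimensions removes a row of the up-projection and a column of the down-projection together, so the compressed expert stays a dense, smaller MLP.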
Your GFlowNet Secretly Learns an Optimal Transport Plan
Ian Maksimov, Nikita Morozov, Denis Belomestny, Sergey Samsonov
Generative Models Graph Learning Optimization
  • Establishes a theoretical link between GFlowNets and optimal transport problems.
  • Demonstrates that fixing the initial flow in GFlowNets leads to a Kantorovich OT formulation.
  • Shows that GFlowNets can recover optimal transport plans and approximate solutions effectively.
  • Expands the GFlowNet framework's applicability to large graph OT problems.
Sharp First-Order Lower Bounds for Higher-Order Smooth Nonconvex Optimization
Dongruo Zhou
Optimization Theory
  • Introduces dimension-free first-order lower bounds for higher-order smooth nonconvex functions.
  • Establishes matching lower bounds of Ω(ϵ^(-7/4)) for Hessian-Lipschitz functions and Ω(ϵ^(-5/3)) for third-order-smooth functions.
  • Utilizes a block-chain mechanism for constructing hard instances that preserve smoothness.
  • Closes long-standing gaps in the lower-bound landscape for first-order oracle complexity.
End-to-End Subgraph Detection with GraphDETR
Dexiong Chen, Till Hendrik Schulz, Karsten Borgwardt
Graph Learning
  • GraphDETR reformulates subgraph detection as a set prediction problem, enhancing efficiency and scalability.
  • The framework allows for both exact and approximate matching of subgraphs, overcoming limitations of traditional methods.
  • GraphDETR achieves strong performance in detecting molecular functional groups, with an average precision of 91.2 on the ChEMBL dataset.
  • The model's architecture integrates GNNs and transformer-based set prediction, providing a unified and end-to-end solution.
What Objects Enable, Not What They Are: Functional Latent Spaces for Affordance Reasoning
Rohan Siva, Neel P. Bhatt, Yunhao Yang, Seoyoung Lee, Nishant Gadde, Christian Ellis, Alvaro Velasquez, Zhangyang Wang, Ufuk Topcu
Robotics
  • Introduction of A4D framework for affordance reasoning in robot planning.
  • Mapping of visual observations to a functional latent space based on object functionalities.
  • Achievement of 94% inference accuracy on existing affordances, outperforming previous methods.
  • Improvement of new-affordance inference accuracy from ~70% to over 90% with minimal training data.
Anomaly Detection for Electro-Hydrostatic Actuators using LSTM Autoencoder
Nehal Afifi, Abdelmonem Elhendawi, Felix Leitenberger, Nadine Piat, Sven Matthiesen
Time Series
  • Proposes an LSTM autoencoder framework for anomaly detection in EHA sensor signals.
  • Achieves high accuracy (99.0%) and precision (100%) in detecting anomalies.
  • Demonstrates effectiveness under various fault-injection scenarios.
  • Addresses limitations of traditional anomaly detection methods in capturing temporal dependencies.
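Autoencoder-based detectors like the one summarized above typically flag a window as anomalous when its reconstruction error exceeds a threshold fitted on normal data. The sketch below shows only that scoring-and-thresholding step, with the trained LSTM autoencoder abstracted away as precomputed reconstructions. The mean-plus-k-standard-deviations rule and k = 3 are assumptions, and the paper may calibrate its threshold differently.

```python
import numpy as np

def reconstruction_errors(windows, reconstructions):
    """Mean squared error per window; arrays of shape (n_windows, time, channels)."""
    return ((windows - reconstructions) ** 2).mean(axis=(1, 2))

def fit_threshold(normal_errors, k=3.0):
    """Threshold = mean + k * std of errors on fault-free validation windows."""
    return float(normal_errors.mean() + k * normal_errors.std())

def flag_anomalies(errors, threshold):
    """Boolean mask: True where a window's error exceeds the threshold."""
    return errors > threshold
```

Fitting the threshold only on fault-free windows keeps the detector unsupervised with respect to fault types.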
Adaptive state-action abstractions via rate-distortion
Fernando E. Rosas
Reinforcement Learning Robotics Theory
  • Introduces soft state-action abstractions that adaptively adjust granularity during learning.
  • Develops a learning-abstraction decomposition that separates value error into learning and abstraction errors.
  • Proposes an adaptive abstraction principle that refines abstractions based on learning progress.
  • Demonstrates the effectiveness of the framework on classic tabular control benchmarks.
HoT-SSM: Higher-order Temporal Knowledge Graph Reasoning with State Space Models for Health Care
Thummaluru Siddartha Reddy, Vempalli Naga Sai Saketh, Yash Punjabi, Mahesh Chandran
Graph Learning Time Series Interpretability
  • Introduces a temporal knowledge-infused hypergraph framework for modeling EHR data.
  • Proposes a dynamic hypergraph state space model to capture higher-order relationships and long-range temporal information.
  • Demonstrates significant performance improvements over existing state-of-the-art models on clinical prediction tasks.
  • Establishes theoretical guarantees for the robustness of the learned representations.
TailLoR: Protecting Principal Components in Parameter-Efficient Continual Learning
Marius Dragoi, Ioana Pintilie, Alexandra Dragomir, Antonio Barbalau, Florin Brad
Efficient ML NLP Large Language Models
  • TailLoR introduces a low-rank adaptation method that protects dominant singular components during continual learning.
  • The method employs a soft spectral penalty to guide updates towards less critical, lower-rank components.
  • TailLoR does not require access to prior task adapters, enhancing privacy for sequential adaptations.
  • The approach matches or exceeds the performance of existing state-of-the-art continual learning methods.
Catastrophic Forgetting as Accessibility Collapse: A Three-Level Framework for Knowledge Persistence in Continual Learning
Ayushman Trivedi, Bhavika Melwani
Theory
  • Introduces a three-level framework for understanding knowledge in neural networks: storage, representation, and accessibility.
  • Demonstrates that catastrophic forgetting is primarily an accessibility issue, with earlier layers retaining or improving their representational quality.
  • Establishes the Accessibility Gap and Projection Energy as new diagnostic metrics for continual learning.
  • Shows that a classifier reset can recover 75.7% of original task performance without modifying the backbone.
When Denser Credit Is Not Enough: Evidence-Calibrated Policy Optimization for Long-Horizon LLM Agent Training
Yuanfan Li, Qi Zhou, Wenjing Duan, Lu Chen
Reinforcement Learning Large Language Models Optimization
  • Identifies limitations in existing reinforcement learning methods for long-horizon LLM training, particularly regarding credit assignment.
  • Proposes Evidence-Calibrated Policy Optimization (ECPO) as a solution to improve credit reliability.
  • ECPO combines techniques to group rollouts and suppress noise, enhancing stability in training.
  • Demonstrates significant performance improvements over GiGPO in empirical experiments.
Robust and sparse support vector machine via hybrid truncated loss for supervised classification
Yuliang Yang, Chen Chen, Yuxiang Liu, Huiru Wang
Optimization Theory
  • Introduction of a hybrid truncated loss function (Lht) that balances boundedness and sparsity.
  • Development of Lht-SVM for single-view classification with improved robustness to outliers.
  • Extension to multi-view learning with MvLht-SVM, adhering to both consensus and complementarity principles.
  • Demonstrated superior performance in accuracy and robustness compared to existing methods.
Intercomparison of Machine Learning Algorithms for Remote Sensing-based In-season Crop Mapping
August Posch, Jitendra Kumar, Forrest M. Hoffman, Auroop R. Ganguly
Optimization Time Series Computer Vision
  • In-season crop mapping is essential for timely responses to climate-related agricultural threats.
  • Support Vector Machines outperformed other algorithms with a mean F1 score of 0.74 for almonds and 0.59 for corn.
  • Interannual variability significantly affects mapping accuracy, indicating the need for robust validation methods.
  • The study combines remote sensing data with crop rotation history for improved mapping accuracy.
Efficient Mean Curvature Computation on High-Dimensional Data Manifolds
Alexandre L. M. Levada
Efficient ML Theory Graph Learning
  • Introduces an exact algebraic identity that reduces mean curvature computation cost from O(m^4) to O(m^2).
  • Utilizes truncated SVD to further reduce computational complexity in high-dimensional settings.
  • Demonstrates significant speedups (50 to 300 times) in real-world datasets without substantial accuracy loss.
  • Establishes local mean curvature as a practical geometric feature for diverse machine learning applications.
StableRCA: Robust Graph-Agnostic Mechanism-Level Root Cause Analysis
Xiaoyu Lin, Nicholas Tagliapietra, Kehan Li, Lavdim Halilaj, Juergen Luettin
Graph Learning Theory Interpretability
  • StableRCA is a graph-agnostic framework that identifies root causes without requiring a known causal graph.
  • The framework utilizes local Markov boundaries to differentiate true root causes from marginal anomalies.
  • Theoretical guarantees are provided for the identification of intervention targets based on conditional distribution shifts.
  • Extensive experiments show StableRCA's robustness to graph misspecification and effectiveness across diverse datasets.
LEVANTE-bench: Multi-Scale Comparison of VLMs to Children Using Cognitive Tasks (or, "Is Your VLM Smarter Than a 5th Grader?")
Alvin Wei Ming Tan, David Cardinal, Tania Lorido-Botran, Laura Bravo-Sanchez, Sunny Yu, Michael C. Frank
Multimodal
  • LEVANTE-bench provides a comprehensive dataset for comparing VLMs with children's cognitive performance.
  • The benchmark evaluates VLMs across multiple cognitive tasks and languages, enhancing cross-cultural comparisons.
  • Alignment between VLMs and children's cognitive abilities varies significantly across different scales of evaluation.
  • Current VLM architectures show limitations in matching children's cognitive error distributions, particularly in complex reasoning tasks.
Differentiable Efficient Operator Search
Xiaohuan Pei, Jiyuan Zhang, Yuanfan Guo, Weiguo Feng, Tao Huang, Cho-Jui Hsieh, Chang Xu
Efficient ML Multimodal
  • Introduction of a unified operator space that consolidates various token reduction methods.
  • Development of the Efficient Operator Search framework that automates the search for optimal operator configurations.
  • Demonstration of competitive performance against existing baselines across multiple benchmarks.
  • Reinterpretation of traditional token reduction techniques as special cases of a shared operator framework.
MolE-RAG: Molecular Structure-Enhanced Retrieval-Augmented Generation for Chemistry
Joey Chan, Wonbin Kweon, Ashley Shin, Niharika Bhattacharjee, Pengcheng Jiang, Yue Guo, Jiawei Han
NLP Large Language Models Generative Models
  • MolE-RAG is a training-free framework that enhances LLM-based molecular property prediction.
  • It integrates three types of context: literature retrieval, molecular context injection, and structural retrieval.
  • The framework significantly improves prediction accuracy across various tasks, outperforming SMILES-only baselines.
  • Context source utility varies by model and task, indicating the need for adaptive strategies in molecular predictions.
Online KL-Regularized Reinforcement Learning with Function Approximation under Misspecification
Haoyang Hong, Zichen Wang, Quanquan Gu, Huazheng Wang
Reinforcement Learning Theory Optimization
  • Introduces KL misspecification formulations for contextual bandits and episodic RL.
  • Establishes high-probability KL-regret guarantees with explicit misspecification terms.
  • Combines Gibbs quadratic self-bounding inequalities with regression-based algorithms.
  • Demonstrates that standard realizable KL-regularized settings are recoverable as special cases.
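The KL-regularized objective behind these results has a well-known closed-form maximizer in the bandit case: maximizing E_π[r] − β·KL(π ‖ π_ref) gives π*(a) ∝ π_ref(a)·exp(r(a)/β), with optimal value β·log E_{π_ref}[exp(r/β)]. The sketch below computes it for a finite action set with known rewards. It is background for the setting rather than the paper's algorithm, which must learn r from data under misspecification, and the names are illustrative.

```python
import numpy as np

def kl_regularized_policy(ref_probs, rewards, beta):
    """Maximizer of E_pi[r] - beta * KL(pi || pi_ref) over a finite action set."""
    logits = np.log(ref_probs) + np.asarray(rewards, dtype=float) / beta
    logits -= logits.max()  # numerical stability before exponentiating
    p = np.exp(logits)
    return p / p.sum()

def regularized_value(probs, ref_probs, rewards, beta):
    """E_pi[r] - beta * KL(pi || pi_ref), for checking optimality numerically."""
    kl = float(np.sum(probs * np.log(probs / ref_probs)))
    return float(probs @ rewards) - beta * kl
```

Smaller β moves the optimal policy toward the greedy action; larger β keeps it close to π_ref.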
Sharp Low-Degree Thresholds for Planted-vs-Planted Testing
Anda Skeja, Daniel Gutiérrez Espinoza, Fiona Skerman, Alexander S. Wein
Theory
  • Establishment of sharp low-degree thresholds for planted-vs-planted testing.
  • Development of a low-degree certificate framework for testing and recovery.
  • Identification of strong and weak testing thresholds in planted models.
  • Demonstration that testing thresholds do not depend on the specific pair of planted structures.
Mitigating the Curse of Dimensionality in Uniform Convergence of Deep Neural Networks via Smooth Activations
Yizhe Ding, Runze Li, Jia Liu, Lingzhou Xue
Theory
  • Establishes a theoretical lower bound demonstrating the curse of dimensionality for ReLU networks in uniform convergence.
  • Develops a comprehensive theoretical framework for smooth DNNs, including novel pseudo-dimension and approximation guarantees.
  • Proves that smooth DNNs can achieve better uniform convergence rates compared to ReLU networks across various regression tasks.
  • Provides empirical support through simulation studies and real-world applications.
SALT: When More Rollouts Don't Help in Group-Based Policy Optimization and How to Make Them Matter
Powei Chang, Jinpeng Zhang, Chaoqun Sun, MiniWell Tsao, Lianrui Li, Jianxiang Xiang, Chenyu Wang, Yukang Gao, Dongying Kong
Reinforcement Learning Large Language Models Optimization
  • Identifies the inefficiency of increasing rollouts in GRPO-style RLVR due to low-rank redundancy in gradient features.
  • Introduces SALT, a method that reweights group-relative updates to enhance learning effectiveness.
  • Demonstrates that SALT improves update geometry and performance across diverse benchmarks.
  • Provides a new perspective on the structural inefficiencies in group-based policy optimization.
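One standard way to quantify the low-rank redundancy in the first bullet is the entropy-based effective rank of the matrix of per-rollout gradient features: the exponential of the Shannon entropy of its normalized singular values. The sketch below is a diagnostic under that assumption. SALT's own redundancy measure and reweighting rule are not reproduced, and the function name is hypothetical.

```python
import numpy as np

def effective_rank(features):
    """Entropy-based effective rank of a (n_rollouts, dim) feature matrix."""
    s = np.linalg.svd(features, compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]  # zero singular values contribute nothing to the entropy
    return float(np.exp(-(p * np.log(p)).sum()))

rng = np.random.default_rng(0)
# 16 rollouts whose gradient features span only a 2-D subspace.
redundant = rng.normal(size=(16, 2)) @ rng.normal(size=(2, 64))
print(effective_rank(redundant))
```

If the effective rank stays near 2 as more rollouts are added, the extra rollouts add almost no new gradient directions, which is the inefficiency the summary describes.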
Causal Modeling of Selection in Evolution
Haoyue Dai, Zeyu Tang, Peter Spirtes, Kun Zhang
Theory
  • Distinction between static and evolutionary selection is crucial for causal discovery.
  • Existing models for static selection do not adequately capture evolutionary processes.
  • A new model for evolutionary selection is introduced, improving causal analysis.
  • The proposed methodology is validated through experimental results.
Read more
CaliDist: Calibrating Large Language Models via Behavioral Robustness to Distraction
Mohammad Anas Jawad, Cornelia Caragea
NLP Large Language Models
  • CaliDist introduces a behavior-centric approach that calibrates LLMs by assessing their robustness to distraction.
  • The method quantifies prediction changes and confidence shifts when prompts are perturbed with distractors.
  • Extensive experiments show significant reductions in Expected Calibration Error (ECE) and Brier Score compared to existing methods.
  • The findings suggest a strong correlation between prediction stability and model accuracy, indicating that susceptibility to distractions is a reliable proxy for error likelihood.
Read more
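The two metrics CaliDist is evaluated on can be computed as follows. This is a minimal sketch assuming one confidence score per question (for the predicted answer) and a 0/1 correctness label; it uses equal-width confidence bins for ECE (10 by default, an illustrative choice) and the binary form of the Brier score. It does not implement CaliDist's distraction-based confidence estimate itself.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin predictions by confidence, then average |accuracy - mean confidence|
    over bins, weighted by the fraction of predictions falling in each bin."""
    conf = np.asarray(confidences, dtype=float)
    acc = np.asarray(correct, dtype=float)
    # Bin index in [0, n_bins - 1]; confidence 1.0 goes into the last bin.
    idx = np.minimum((conf * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        in_bin = idx == b
        if in_bin.any():
            ece += in_bin.mean() * abs(acc[in_bin].mean() - conf[in_bin].mean())
    return ece

def brier_score(confidences, correct):
    """Binary Brier score: mean squared gap between confidence and 0/1 correctness."""
    conf = np.asarray(confidences, dtype=float)
    acc = np.asarray(correct, dtype=float)
    return float(np.mean((conf - acc) ** 2))
```

Lower is better for both. ECE only looks at bin-level averages, while the Brier score also penalizes confident individual errors, which is why the two are usually reported together.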
PyCC.id: A package for hypothesis-driven equation discovery with structural identifiability
Federico J. Gonzalez
Time Series Theory Interpretability
  • PyCC.id enables hypothesis-driven equation discovery using structural skeletons.
  • The library reduces the ambiguity in model selection by incorporating domain knowledge and prior information.
  • Structural skeletons ensure that the discovered models are physically consistent and interpretable.
  • PyCC.id supports various paradigms for equation discovery, enhancing its flexibility and applicability.
Read more
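The general idea of fitting a hypothesized structural skeleton can be illustrated with plain least squares. This is not the PyCC.id API: `fit_skeleton` is a hypothetical helper that assumes the skeleton is linear in its unknown coefficients, estimates dx/dt by finite differences, and uses a numerical rank check as a simple stand-in for identifiability. Structural identifiability analysis proper is a stronger, data-independent property.

```python
import numpy as np

def fit_skeleton(t, x, terms):
    """Hypothetical helper: fit dx/dt = sum_k c_k * term_k(x) for a user-specified
    skeleton, where `terms` maps a readable name to a function of x."""
    dxdt = np.gradient(x, t)  # finite-difference estimate of the derivative
    library = np.column_stack([f(x) for f in terms.values()])
    # Simple data-level check: collinear terms cannot be told apart by this data.
    if np.linalg.matrix_rank(library) < library.shape[1]:
        raise ValueError("skeleton terms are not identifiable from this data")
    coef, *_ = np.linalg.lstsq(library, dxdt, rcond=None)
    return dict(zip(terms.keys(), coef))
```

For data from logistic growth, x(t) = 1 / (1 + 9 e^(-t)), the hypothesized skeleton dx/dt = a·x + b·x² recovers a ≈ 1 and b ≈ -1. Restricting the search to the hypothesized terms is what keeps the fitted model interpretable, in contrast to searching over a large generic term library.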
Policy-Conditioned Counterfactual Credit for Verifiable Reinforcement Learning of Long-Horizon Language Agents
Renwei Meng
Reinforcement Learning NLP Large Language Models
  • CVT-RL introduces a constrained policy-gradient algorithm with dense verifiable rewards.
  • The PCCC estimator evaluates the contribution of interventions to final success.
  • Significant improvements in task success rates and evidence accuracy were achieved.
  • The methodology reduces reward hacking incidents compared to existing baselines.
Read more
Plug-and-Play Guidance for Discrete Diffusion Models via Gradient-Informed Logit Correction
Hongkun Dou, Zike Chen, Fengji Li, Hongjue Li, Yue Deng
Generative Models
  • Introduction of GILC as a training-free guidance framework for discrete diffusion models.
  • Utilization of a Jacobian-free mechanism for stable logit correction to address gradient instability.
  • Formal connection established between GILC and policy gradients for handling non-differentiable objectives.
  • Demonstration of state-of-the-art performance in constrained generation tasks across multiple scientific domains.
Read more
Deciphering Two Training Clocks in Grokking via Deep Linear Network Theory with Conditional ReLU Reduction
Hu Tan, Kuo Gai, Shihua Zhang
Theory
  • Introduces the concept of 'two training clocks' to explain the separation of fitting and representation simplification in deep learning.
  • Demonstrates that classification loss decreases logarithmically while representation simplification occurs on a polynomial time scale.
  • Establishes a connection between deep linear networks and ReLU networks, highlighting the role of activation patterns in training dynamics.
  • Provides empirical evidence through experiments on modular arithmetic tasks that support the theoretical framework.
Read more
Diffusion Models for Adaptive Sequential Data Generation
Haoyang Cao, Minshuo Chen, Yinbin Han, Renyuan Xu
Generative Models Time Series Theory
  • Introduction of the AD-Seq framework for adapted sequential data generation.
  • Ensures that generated values respect temporal dependencies and information flow.
  • Development of a novel score-matching objective for parallel training.
  • Statistical guarantees for score approximation and distribution estimation.
Read more
Pretraining Recurrent Networks without Recurrence
Akarsh Kumar, Phillip Isola
Theory Efficient ML NLP
  • Introduces Supervised Memory Training (SMT) as an alternative to BPTT for RNN training.
  • SMT allows for time-parallel training and stable gradient paths, improving learning of long-range dependencies.
  • Memory transition labels are generated using a Transformer model, decoupling memory representation from memory dynamics.
  • SMT outperforms BPTT in various tasks, demonstrating its effectiveness in training nonlinear RNNs.
Read more
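The summary does not spell out why labeled memory states remove the need for backpropagation through time. One way to see it: once every step has a target state, the training loss splits into independent per-step regressions that can all be solved at once. The sketch below assumes a linear transition m_t ≈ A m_{t-1} + B x_t fitted by least squares. SMT itself trains nonlinear RNNs on Transformer-generated labels, so the linear model, the helper name `fit_transition_parallel`, and the synthetic labels used to check it are illustrative assumptions.

```python
import numpy as np

def fit_transition_parallel(memory, inputs):
    """Fit m_t ~= A m_{t-1} + B x_t from labelled states, using all steps at once.

    memory: (T+1, d) array of labelled states m_0..m_T
    inputs: (T, k) array of inputs x_1..x_T
    Returns A with shape (d, d) and B with shape (d, k).
    """
    prev = memory[:-1]                  # m_0..m_{T-1}, taken from labels, not from the model
    targets = memory[1:]                # m_1..m_T
    design = np.hstack([prev, inputs])  # each row is one independent regression example
    W, *_ = np.linalg.lstsq(design, targets, rcond=None)  # W stacks [A^T; B^T]
    d = memory.shape[1]
    return W[:d].T, W[d:].T
```

Because the previous state in each row comes from the labels rather than from the model's own rollout, no step waits on the one before it; in a gradient-based version, the same property lets all time steps be processed in parallel with short, stable gradient paths.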