AI-generated summaries

Today's ML research,
without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

70 Papers today
8h Update frequency
7 Days of history
Learning Retrieval Models with Sparse Autoencoders
Thibault Formal, Maxime Louis, Hervé Dejean, Stéphane Clinchant
NLP Large Language Models Efficient ML
  • Introduction of SPLARE, a new learned sparse retrieval (LSR) method utilizing Sparse Autoencoders.
  • SAEs provide semantically structured, expressive, and language-agnostic features for retrieval.
  • SPLARE outperforms traditional vocabulary-based LSR methods in multilingual and out-of-domain settings.
  • The SPLARE-7B model achieves competitive results on the MMTEB benchmark.
Read more
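The mechanic behind sparse-autoencoder features is compact enough to sketch. Below is a generic TopK sparse autoencoder on random weights, purely illustrative: the function names, dimensions, and sparsity scheme are assumptions, not SPLARE's trained model.

```python
import numpy as np

def sae_encode(x, W_enc, b_enc, k):
    """ReLU pre-activations, then keep only the top-k (a TopK sparse code)."""
    pre = np.maximum(x @ W_enc + b_enc, 0.0)
    code = np.zeros_like(pre)
    idx = np.argsort(pre)[-k:]          # indices of the k largest activations
    code[idx] = pre[idx]
    return code

def sae_decode(code, W_dec):
    return code @ W_dec

rng = np.random.default_rng(0)
d, m, k = 16, 64, 4                     # input dim, dictionary size, active units
W_enc = rng.normal(scale=0.1, size=(d, m))
W_dec = rng.normal(scale=0.1, size=(m, d))
b_enc = np.zeros(m)

x = rng.normal(size=d)                  # stand-in for a dense text embedding
code = sae_encode(x, W_enc, b_enc, k)
x_hat = sae_decode(code, W_dec)
assert np.count_nonzero(code) <= k      # sparse enough for an inverted index
```

In an LSR setting the nonzero entries of `code` would index a learned vocabulary rather than surface tokens, which is what makes the representation compatible with an inverted index.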
Deconstructing the Failure of Ideal Noise Correction: A Three-Pillar Diagnosis
Chen Feng, Zhuo Zhi, Zhao Huang, Jiawei Ge, Ling Xiao, Nicu Sebe, Georgios Tzimiropoulos, Ioannis Patras
Theory
  • Statistically consistent methods for learning with noisy labels (LNL) often underperform empirical approaches despite their theoretical guarantees.
  • Providing a perfect noise transition matrix does not resolve the performance issues of noise correction methods.
  • The failure of noise correction is attributed to deeper limitations in the corrected objective rather than just T estimation.
  • A comprehensive analysis reveals three levels of understanding: macroscopic convergence, microscopic dynamics, and information-theoretic limits.
Read more
Do Diffusion Models Dream of Electric Planes? Discrete and Continuous Simulation-Based Inference for Aircraft Design
Aurelien Ghiglino, Daniel Elenius, Anirban Roy, Ramneet Kaur, Manoj Acharya, Colin Samplawski, Brian Matejek, Susmit Jha, Juan Alonso, Adam Cobb
Generative Models Optimization Robotics
  • Introduction of a hierarchical sampling approach using two diffusion models for eVTOL design.
  • MixeDiT enables joint sampling of discrete and continuous parameters, improving design flexibility.
  • MaskeDiT supports inference over variable-dimensional design spaces, addressing challenges in traditional SBI.
  • First comprehensive application of SBI to a realistic, large-scale aerospace design problem with 144 topologies and up to 136 parameters.
Read more
PhysMoDPO: Physically-Plausible Humanoid Motion with Preference Optimization
Yangsong Zhang, Anujith Muraleedharan, Rikhat Akizhanov, Abdul Ahad Butt, Gül Varol, Pascal Fua, Fabio Pizzati, Ivan Laptev
Generative Models Optimization Robotics
  • Introduction of PhysMoDPO, a framework for optimizing humanoid motion generation.
  • Integration of a Whole-Body Controller into the training pipeline for improved physical compliance.
  • Use of physics-based and task-specific rewards for effective optimization.
  • Demonstrated improvements in physical realism and task performance in simulations.
Read more
LightMoE: Reducing Mixture-of-Experts Redundancy through Expert Replacing
Jiawei Hao, Zhiwei Hao, Jianyuan Guo, Li Shen, Yong Luo, Han Hu, Dan Zeng
Large Language Models Efficient ML
  • Introduces a novel expert replacing paradigm to reduce redundancy in MoE models.
  • LightMoE framework enhances expert selection and recovery strategies.
  • Achieves competitive performance with significant compression ratios.
  • Demonstrates improvements over existing expert compression methods.
Read more
Scaling Laws and Pathologies of Single-Layer PINNs: Network Width and PDE Nonlinearity
Faris Chaudhry
Theory Optimization
  • Identifies dual optimization failures in Single-Layer PINNs: baseline and compounding pathologies.
  • Demonstrates that scaling behavior is governed by a complex, non-separable relationship rather than a simple power law.
  • Highlights spectral bias as a significant factor hindering the learning of high-frequency solution components.
  • Proposes a systematic methodology for measuring scaling effects in PINNs across different PDE types.
Read more
Level Up: Defining and Exploiting Transitional Problems for Curriculum Learning
Zhenwei Tang, Amogh Inamdar, Ashton Anderson, Richard Zemel
Theory Efficient ML Interpretability
  • Introduces a method for identifying transitional problems that mark competence boundaries in machine learning models.
  • Demonstrates that a curriculum based on these transitional problems significantly improves learning efficiency.
  • Establishes a direct measure of problem difficulty relative to model competence, enhancing interpretability.
  • Validates the approach through experiments in chess and mathematics, outperforming traditional training strategies.
Read more
TERMINATOR: Learning Optimal Exit Points for Early Stopping in Chain-of-Thought Reasoning
Alliot Nagle, Jakhongir Saydaliev, Dhia Garbaya, Michael Gastpar, Ashok Vardhan Makkuva, Hyeji Kim
Large Language Models Efficient ML Optimization
  • Introduction of hindsight-optimal reasoning length (HORL) for determining optimal exit points in reasoning tasks.
  • Development of TERMINATOR, an inference-time early-exit algorithm that reduces unnecessary computation in large reasoning models (LRMs).
  • Creation of a novel dataset for optimal reasoning lengths based on the first logical arrival of answers.
  • Significant reductions in CoT lengths (14%-55%) while outperforming existing methods across multiple datasets.
Read more
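The early-exit idea (stop once the answer has stabilized) can be illustrated with a toy rule. The `patience` heuristic and answer extractor below are stand-in assumptions, not TERMINATOR's learned exit criterion.

```python
def early_exit(steps, extract_answer, patience=2):
    """Walk a chain-of-thought; stop once the extracted answer has repeated
    `patience` times in a row. Returns (answer, steps_consumed)."""
    last, streak = None, 0
    for i, step in enumerate(steps, start=1):
        ans = extract_answer(step)
        if ans is not None and ans == last:
            streak += 1
        else:
            last, streak = ans, 0
        if last is not None and streak >= patience:
            return last, i              # exit early, skip remaining steps
    return last, len(steps)

# toy trace: the answer first appears at step 3 and is then only re-confirmed
trace = ["let x = ...", "so maybe 40?", "recompute: 42", "check: 42",
         "final: 42", "wrap up ..."]
answer, used = early_exit(trace,
                          lambda s: s.split()[-1] if s[-1].isdigit() else None)
```

Here `used` is 5 rather than 6: the last confirmation step is never scanned, which is the computation the paper aims to skip at much larger scale.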
SciDesignBench: Benchmarking and Improving Language Models for Scientific Inverse Design
David van Dijk, Ivan Vrkic
NLP Large Language Models Reinforcement Learning
  • Introduction of SciDesignBench, a benchmark with 520 tasks across 14 scientific domains for evaluating language models in inverse design.
  • Demonstration that existing language models struggle with one-turn de novo design, achieving only 29% success.
  • Long-horizon feedback utilization is a distinct capability, with different models excelling in various task settings.
  • Introduction of RLSF, a simulator-feedback training recipe that improves model performance in scientific design tasks.
Read more
Robust Self-Training with Closed-loop Label Correction for Learning from Noisy Labels
Zhanhui Lin, Yanlin Liu, Sanping Zhou
Theory Optimization Efficient ML
  • Introduces a self-training framework that synergistically optimizes a classifier and a label correction function.
  • Provides theoretical guarantees for model stability during noise correction.
  • Achieves state-of-the-art performance on CIFAR and Clothing1M datasets with reduced training time.
  • Utilizes intermediate feature representations for richer information transfer.
Read more
A Kolmogorov-Arnold Surrogate Model for Chemical Equilibria: Application to Solid Solutions
Leonardo Boledi, Dirk Bosbach, Jenna Poonoosamy
Efficient ML
  • Introduction of Kolmogorov-Arnold networks as a surrogate model for chemical equilibria.
  • First application of data-driven models to co-precipitation with radionuclide incorporation.
  • Significant reduction in prediction errors compared to traditional multilayer perceptrons.
  • Demonstration of KANs' effectiveness in handling complex thermodynamic systems.
Read more
Massive Redundancy in Gradient Transport Enables Sparse Online Learning
Aur Shalev Merin
Theory Efficient ML Time Series
  • Only 6% of the recurrent Jacobian paths are needed to recover 84% of full RTRL's adaptation ability.
  • The redundancy in gradient transport is robust across various tasks and architectures, including LSTMs and transformers.
  • Sparse RTRL is more numerically stable than full RTRL in chaotic dynamics, while still effective in non-chaotic tasks.
  • The findings suggest that the optimization of Jacobian propagation may not be necessary due to inherent redundancy.
Read more
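The sparsification step is easy to picture: mask the recurrent Jacobian down to its largest-magnitude entries before propagating influence. A toy sketch with a 25% mask (the 6%/84% figures above refer to the paper's own tasks and models):

```python
import numpy as np

def topk_mask(J, frac):
    """Zero all but the largest-magnitude fraction `frac` of entries in J."""
    k = max(1, int(frac * J.size))
    thresh = np.sort(np.abs(J), axis=None)[-k]
    return np.where(np.abs(J) >= thresh, J, 0.0)

rng = np.random.default_rng(1)
n = 8
W = rng.normal(scale=0.5, size=(n, n))
h = rng.normal(size=n)

# recurrent Jacobian of h_next = tanh(W @ h) with respect to h
J = (1.0 - np.tanh(W @ h) ** 2)[:, None] * W

J_sparse = topk_mask(J, 0.25)   # keep 25% of the gradient-transport paths
kept = np.count_nonzero(J_sparse)
```

In a full RTRL loop this masked Jacobian would multiply the influence matrix each step, so the sparsity compounds into a large savings over the sequence.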
Mamba-3: Improved Sequence Modeling using State Space Principles
Aakash Lahoti, Kevin Y. Li, Berlin Chen, Caitlin Wang, Aviv Bick, J. Zico Kolter, Tri Dao, Albert Gu
Large Language Models Efficient ML NLP
  • Mamba-3 achieves improved inference efficiency and model quality compared to Transformer-based models.
  • Introduces a complex-valued state update rule for enhanced state tracking capabilities.
  • Utilizes a MIMO formulation to improve computational efficiency during decoding.
  • Demonstrates significant accuracy gains in downstream language modeling tasks.
Read more
Unlearning-based sliding window for continual learning under concept drift
Michał Wozniak, Marek Klonowski, Maciej Maczynski, Bartosz Krawczyk
Theory Efficient ML Time Series
  • Introduces UIL, a framework that utilizes machine unlearning for efficient continual learning under concept drift.
  • Demonstrates that unlearning outdated data followed by incremental adaptation can mimic the performance of full retraining with lower computational costs.
  • Empirical results indicate that the proposed method is competitive with existing sliding-window approaches in various drift scenarios.
Read more
Linear Predictability of Attention Heads in Large Language Models
Khalid Shaikh, Asmit Kumar Singh, Rebecca Christopher Dsouza, Shikhar Shiromani
Large Language Models Efficient ML Interpretability
  • Pretrained LLMs exhibit strong linear predictability among attention heads, particularly in their QKV activations.
  • This predictability emerges during pretraining and is absent at random initialization.
  • The authors achieve significant KV cache compression by caching only reference heads and reconstructing others on-the-fly.
  • Mean R² values indicate high fidelity in reconstructing target heads from reference heads, with values often exceeding 0.76.
Read more
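The underlying measurement is ordinary least squares between heads. A synthetic sketch, with random data standing in for real QKV activations:

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 256, 32                           # tokens, per-head dimension
ref = rng.normal(size=(T, d))            # "reference" head activations
M = rng.normal(scale=0.3, size=(d, d))   # hidden linear relation
target = ref @ M + 0.05 * rng.normal(size=(T, d))  # near-linear "target" head

# fit the linear map once, then reconstruct the target head on the fly
M_hat, *_ = np.linalg.lstsq(ref, target, rcond=None)
pred = ref @ M_hat

ss_res = ((target - pred) ** 2).sum()
ss_tot = ((target - target.mean(axis=0)) ** 2).sum()
r2 = 1.0 - ss_res / ss_tot               # fidelity of the reconstruction
```

The compression scheme then follows: cache KV only for reference heads and regenerate the rest through the fitted maps, at the cost of one small matmul per reconstructed head.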
Computation and Communication Efficient Federated Unlearning via On-server Gradient Conflict Mitigation and Expression
Minh-Duong Nguyen, Senura Hansaja, Le-Tuan Nguyen, Quoc-Viet Pham, Ken-Tye Yong, Nguyen H. Tran, Dung D. Le
Federated Learning Efficient ML Theory
  • FOUL introduces a two-stage framework for efficient Federated Unlearning.
  • The learning-to-unlearn stage prepares the model for unlearning by encoding key features.
  • On-server unlearning preserves privacy and reduces computational overhead.
  • A new metric, time-to-forget, quantifies the speed of unlearning effectiveness.
Read more
Bases of Steerable Kernels for Equivariant CNNs: From 2D Rotations to the Lorentz Group
Alan Garbarz
Theory
  • Introduces a new method for solving the steerable kernel constraint in equivariant CNNs.
  • Provides explicit bases for different symmetry groups and tensor types.
  • Eliminates the need for complex computations involving Clebsch-Gordan coefficients.
  • Demonstrates the method's applicability across various symmetry groups.
Read more
PLUME: Building a Network-Native Foundation Model for Wireless Traces via Protocol-Aware Tokenization
Swadhin Pradhan, Shazal Irshad, Jerome Henry
Time Series Efficient ML Interpretability
  • Plume is a 140M-parameter model tailored for 802.11 wireless traces, utilizing structured PDML dissections.
  • The protocol-aware tokenizer significantly reduces sequence length and increases information density compared to BPE.
  • Plume achieves high accuracy in next-packet prediction and anomaly detection, outperforming larger models in efficiency.
  • The model supports on-premises deployment, enhancing privacy and enabling real-time analysis.
Read more
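Protocol-aware tokenization is easiest to see by contrast with byte-level tokenization. The field names below are illustrative only, not actual PDML paths:

```python
def protocol_tokens(frame):
    """One token per dissected 802.11-style field, instead of raw bytes/BPE."""
    return [f"{field}={value}" for field, value in frame.items()]

# hypothetical dissected beacon frame
frame = {"fc.type": "mgmt", "fc.subtype": "beacon", "rssi": -42, "channel": 6}
tokens = protocol_tokens(frame)

# a byte-level tokenizer's sequence length scales with the serialized text
serialized = str(frame)
assert len(tokens) == 4 and len(tokens) < len(serialized)
```

Fewer, denser tokens per packet is what lets a 140M-parameter model see long traces within a modest context window.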
Synthetic Data Generation for Brain-Computer Interfaces: Overview, Benchmarking, and Future Directions
Ziwei Wang, Zhentao He, Xingyi He, Hongbin Wang, Tianwang Jia, Jingwei Luo, Siyang Li, Xiaoqing Chen, Dongrui Wu
Generative Models Time Series
  • Synthetic data generation can alleviate data scarcity in BCIs.
  • The paper categorizes generative algorithms into four methodological types.
  • Benchmarking of existing algorithms across various BCI paradigms is conducted.
  • Challenges such as data heterogeneity and privacy concerns are discussed.
Read more
CrossADR: enhancing adverse drug reactions prediction for combination pharmacotherapy with cross-layer feature integration and cross-level associative learning
Y. Cheung
Graph Learning
  • CrossADR improves ADR prediction accuracy for combination pharmacotherapy.
  • Utilizes a gated-residual-flow graph neural network for feature integration.
  • Introduces a learnable ADR embedding space to capture dynamic biological correlations.
  • Evaluated on a comprehensive dataset with 1,376 drugs and 946,000 combinations.
Read more
Swap-guided Preference Learning for Personalized Reinforcement Learning from Human Feedback
Gihoon Kim, Euntai Kim
Reinforcement Learning
  • Introduces Swap-guided Preference Learning (SPL) to enhance personalization in RLHF.
  • Addresses the issue of posterior collapse in Variational Preference Learning (VPL).
  • Utilizes fictitious swap annotators to improve user-specific latent variable encoding.
  • Implements three key components: swap-guided base regularization, P-IAF, and adaptive latent conditioning.
Read more
Translational Gaps in Graph Transformers for Longitudinal EHR Prediction: A Critical Appraisal of GT-BEHRT
Krish Tadigotla
Graph Learning Time Series Interpretability
  • GT-BEHRT improves EHR representation by encoding intra-visit relationships.
  • The framework achieved high discrimination metrics but revealed significant translational gaps.
  • Key gaps include lack of calibration analysis and incomplete fairness auditing.
  • The study emphasizes the need for comprehensive evaluation before clinical deployment.
Read more
Maximizing Incremental Information Entropy for Contrastive Learning
Jiansong Zhang, Zhuoqin Yang, Xu Wu, Xiaoling Luo, Peizhong Liu, Linlin Shen
Computer Vision Theory Efficient ML
  • Introduces a new theoretical framework for contrastive learning focusing on incremental information entropy.
  • Proposes a dual optimization strategy involving a learnable transformation and an encoder regularizer.
  • Demonstrates improved performance in small-batch settings across multiple datasets.
  • Provides a plug-and-play capability for enhancing existing self-supervised models.
Read more
Continual Fine-Tuning with Provably Accurate and Parameter-Free Task Retrieval
Hang Thi-Thuy Le, Long Minh Bui, Minh Hoang, Trong Nghia Hoang
Theory Efficient ML NLP
  • Introduces PROTEUS, a parameter-free task retrieval framework for continual fine-tuning.
  • Provides theoretical guarantees linking retrieval error to clustering properties of task representations.
  • Combines adaptive knowledge transfer with a clustering-based retrieval mechanism.
  • Demonstrates significant performance improvements over existing continual learning methods.
Read more
MONET: Modeling and Optimization of neural NEtwork Training from Edge to Data Centers
Jérémy Morlier, Robin Geens, Stef Cuyckens, Arne Symons, Marian Verhelst, Vincent Gripon, Mathieu Léonardon
Optimization Efficient ML
  • Introduction of MONET, a framework for modeling neural network training on heterogeneous dataflow accelerators.
  • Demonstration of MONET's capabilities through case studies on ResNet-18 and GPT-2.
  • Exploration of the training design space and optimization of layer-fusion configurations.
  • Application of a genetic algorithm to solve activation checkpointing challenges.
Read more
Is the reconstruction loss culprit? An attempt to outperform JEPA
Alexey Potapov, Oleg Shcherbakov, Ivan Kravchenko
Theory Generative Models Time Series
  • JEPA-style predictive learning is generally more robust to noise than reconstruction-based autoencoders.
  • Autoencoder performance is heavily influenced by objective asymmetries and bottleneck effects.
  • Gated predictive autoencoders can effectively select predictable components, improving stability and performance.
  • The study underscores the necessity of rigorous evaluation methods in representation learning research.
Read more
PDE-SSM: A Spectral State Space Approach to Spatial Mixing in Diffusion Transformers
Eshed Gal, Moshe Eliasof, Siddharth Rout, Eldad Haber
Computer Vision Generative Models Efficient ML
  • PDE-SSM replaces self-attention with a learnable PDE, improving spatial inductive bias.
  • The computational complexity of PDE-SSM is O(N log N), significantly more efficient than O(N^2) for self-attention.
  • PDE-SSM-DiT achieves comparable or superior performance to existing diffusion transformers while reducing compute.
  • The approach provides a principled generalization of state space models to multi-dimensional spatial data.
Read more
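The O(N log N) claim comes from doing global spatial mixing in Fourier space. A one-dimensional sketch, with a fixed diffusion-like filter standing in for the learnable PDE operator:

```python
import numpy as np

def spectral_mix(x, filt):
    """Global spatial mixing in O(N log N): FFT -> per-frequency gain -> iFFT.
    `filt` plays the role of a learnable spectral operator."""
    return np.fft.irfft(np.fft.rfft(x) * filt, n=x.shape[-1])

rng = np.random.default_rng(5)
N = 64
x = rng.normal(size=N)
filt = np.exp(-0.1 * np.arange(N // 2 + 1))  # damping high frequencies, like diffusion

y = spectral_mix(x, filt)
assert y.shape == x.shape
```

Every output position depends on every input position, as with self-attention, but via one FFT pair instead of an N×N interaction matrix.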
CASHomon Sets: Efficient Rashomon Sets Across Multiple Model Classes and their Hyperparameters
Fiona Katharina Ewald, Martin Binder, Matthias Feurer, Bernd Bischl, Giuseppe Casalicchio
Optimization Interpretability Efficient ML
  • Introduction of CASHomon sets, extending Rashomon sets to multiple model classes.
  • Development of TruVaRImp, an active learning algorithm for efficient level set estimation.
  • Empirical results show TruVaRImp outperforms traditional methods in identifying CASHomon set members.
  • Analysis reveals significant variability in feature importance across model classes.
Read more
RXNRECer Enables Fine-grained Enzymatic Function Annotation through Active Learning and Protein Language Models
Zhenkun Shi, Jun Zhu, Dehang Wang, BoYu Chen, Qianqian Yuan, Zhitao Mao, Fan Wei, Weining Wu, Xiaoping Liao, Hongwu Ma
NLP Large Language Models Interpretability
  • RXNRECer directly predicts enzyme-catalyzed reactions, bypassing the limitations of EC number reliance.
  • The framework integrates protein language modeling and active learning for improved annotation accuracy.
  • Significant performance improvements were observed over traditional EC-based methods.
  • RXNRECer supports scalable applications in proteome-wide annotation and enzyme promiscuity identification.
Read more
M²RNN: Non-Linear RNNs with Matrix-Valued States for Scalable Language Modeling
Mayank Mishra, Shawn Tan, Ion Stoica, Joseph Gonzalez, Tri Dao
NLP Large Language Models Efficient ML
  • M2RNN introduces matrix-valued hidden states and non-linear transitions for improved language modeling.
  • The architecture overcomes limitations of linear RNNs, particularly in state tracking and long-context retrieval.
  • Empirical results demonstrate significant performance gains over existing models with smaller state sizes.
  • Hybrid M2RNN models outperform traditional attention-based architectures in long-context tasks.
Read more
Robust and Computationally Efficient Linear Contextual Bandits under Adversarial Corruption and Heavy-Tailed Noise
Naoto Tani, Futoshi Futami
Theory Efficient ML Optimization
  • Introduces a new algorithm (CR-Hvt-UCB) that is robust to both adversarial corruption and heavy-tailed noise.
  • Achieves computational efficiency with O(1) per-round updates, contrasting with existing methods that are computationally expensive.
  • Establishes regret bounds that are applicable even when the noise moment bounds and total corruption are unknown.
  • Generalizes existing results under finite-variance assumptions, providing a more flexible framework for real-world applications.
Read more
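Per-round updates that are independent of the horizon typically rest on rank-one inverse updates. A sketch of the Sherman-Morrison identity that makes this work; the robustness machinery of CR-Hvt-UCB itself is not shown:

```python
import numpy as np

def sherman_morrison(A_inv, x):
    """(A + x x^T)^{-1} from A^{-1} in O(d^2), no refactorization."""
    Ax = A_inv @ x
    return A_inv - np.outer(Ax, Ax) / (1.0 + x @ Ax)

rng = np.random.default_rng(2)
d = 5
A = np.eye(d)                       # ridge term, lambda = 1
A_inv = np.eye(d)
for _ in range(50):                 # 50 rounds of observed context vectors
    x = rng.normal(size=d)
    A += np.outer(x, x)
    A_inv = sherman_morrison(A_inv, x)

# the incremental inverse matches a from-scratch inversion
assert np.allclose(A_inv, np.linalg.inv(A))
```

The cost per round stays O(d²) regardless of how many rounds have elapsed, in contrast to refitting on the full history.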
Competition-Aware CPC Forecasting with Near-Market Coverage
Sebastian Frey, Edoardo Beccari, Maximilian Kranz, Nicolò Alberto Pellizzari, Ali Mete Karaman, Qiwei Han, Maximilian Kaiser
Time Series Graph Learning Optimization
  • Cost-per-click (CPC) forecasting is reframed as a problem of partial competition observability.
  • Observable proxies for competition, including semantic, behavioral, and geographic signals, enhance forecasting accuracy.
  • Competition-aware forecasting shows significant improvements in stability and error profiles, especially for high-CPC, high-volatility keywords.
  • The methodology is validated using a large-scale dataset from the car-rental sector, demonstrating practical applicability.
Read more
L2GTX: From Local to Global Time Series Explanations
Ephrem Tibebe Mekonnen, Luca Longo, Lucas Rizzo, Pierpaolo Dondio
Time Series
  • L2GTX is a fully model-agnostic method for generating global explanations in time series classification.
  • The method aggregates local explanations from a selective set of instances to form class-wise global insights.
  • L2GTX effectively reduces redundancy in explanations by clustering local temporal event primitives.
  • The approach maintains high fidelity in global explanations, as evidenced by stable mean local surrogate fidelity (R²) across datasets.
Read more
Federated Learning of Binary Neural Networks: Enabling Low-Cost Inference
Nitin Priyadarshini Shankar, Soham Lahiri, Sheetal Kalyani, Saurav Prakash
Federated Learning Efficient ML
  • FedBNN achieves lower runtime computation and memory complexity compared to traditional real-valued models.
  • The framework utilizes binary weights to significantly reduce model size and inference time.
  • Comprehensive evaluations show FedBNN's competitive performance against state-of-the-art federated learning methods.
  • FedBNN is designed to operate efficiently under various data heterogeneity settings.
Read more
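Binary-weight inference rests on replacing each weight row with a sign pattern plus a single scale. A sketch in the style of XNOR-Net binarization (whether FedBNN uses exactly this scaling is an assumption):

```python
import numpy as np

def binarize(W):
    """Sign(W) scaled by the mean |w| per output row (XNOR-Net-style)."""
    alpha = np.abs(W).mean(axis=1, keepdims=True)  # one float per row
    return alpha * np.sign(W)

rng = np.random.default_rng(3)
W = rng.normal(size=(4, 16))
Wb = binarize(W)

# each row carries one float scale plus a 1-bit sign per weight
assert all(len(np.unique(np.abs(row))) == 1 for row in Wb)
```

Storage drops from 32 bits to roughly 1 bit per weight, and matrix products reduce to sign flips and additions, which is where the low-cost-inference claim comes from.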
Exact Federated Continual Unlearning for Ridge Heads on Frozen Foundation Models
Yijun Quan, Wentai Wu, Giovanni Montana
Federated Learning
  • Introduces a method for exact federated continual unlearning using ridge regression heads on frozen foundation models.
  • Develops a communication-efficient protocol that supports continual add/delete requests without retraining.
  • Proves deterministic exactness and invariance properties of the proposed method.
  • Demonstrates experimental validation across multiple benchmarks with high accuracy and low computational cost.
Read more
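Exactness for a ridge head is easy to demonstrate: when the head is kept as sufficient statistics, deleting a sample and re-solving is numerically identical to retraining without it. A single-client sketch; the paper's federated, continual protocol adds considerably more on top:

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, lam = 30, 6, 1.0
X = rng.normal(size=(n, d))                 # frozen-backbone features
y = rng.normal(size=n)

# ridge head kept as sufficient statistics (all that must be stored)
G = X.T @ X
b = X.T @ y
w_full = np.linalg.solve(G + lam * np.eye(d), b)

# exact unlearning of sample i: subtract its contribution and re-solve
i = 7
G -= np.outer(X[i], X[i])
b -= y[i] * X[i]
w_unlearned = np.linalg.solve(G + lam * np.eye(d), b)

# identical (up to floating point) to retraining without sample i
X_r, y_r = np.delete(X, i, axis=0), np.delete(y, i)
w_retrained = np.linalg.solve(X_r.T @ X_r + lam * np.eye(d), X_r.T @ y_r)
assert np.allclose(w_unlearned, w_retrained)
```

Because only `G` and `b` are needed, add and delete requests commute, which is what enables continual unlearning without any retraining pass.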
ADV-0: Closed-Loop Min-Max Adversarial Training for Long-Tail Robustness in Autonomous Driving
Tong Nie, Yihong Tang, Junlin He, Yuewen Mei, Jie Sun, Lijun Sun, Wei Ma, Jian Sun
Robotics Optimization Theory
  • ADV-0 is the first closed-loop training framework for long-tail problems in autonomous driving.
  • It couples adversarial generation and policy optimization in an end-to-end manner.
  • The framework employs a preference-based solution to the zero-sum game, ensuring stability and efficiency.
  • Theoretically, ADV-0 guarantees convergence to a Nash Equilibrium and certified performance bounds.
Read more
Visualizing Critic Match Loss Landscapes for Interpretation of Online Reinforcement Learning Control Algorithms
Jingyi Liu, Jian Guo, Eberhard Gill
Reinforcement Learning Interpretability Optimization
  • Introduces a visualization method for analyzing critic match loss landscapes in online RL.
  • Constructs a three-dimensional loss surface and two-dimensional optimization path to characterize critic learning behavior.
  • Demonstrates the method using ADHDP on cart-pole and spacecraft control tasks.
  • Provides quantitative indices for structured comparison of training outcomes.
Read more
Budget-Sensitive Discovery Scoring: A Formally Verified Framework for Evaluating AI-Guided Scientific Selection
Abhinaba Basu, Pavan Chakraborty
Theory
  • Introduction of the Budget-Sensitive Discovery Score (BSDS) as a formally verified evaluation metric.
  • Discovery Quality Score (DQS) provides a single summary statistic that prevents budget cherry-picking.
  • Case study shows that LLMs do not outperform existing machine learning models in drug discovery candidate selection.
  • The framework is applicable to various domains beyond drug discovery, including safety triage and clinical trials.
Read more
Overcoming the Modality Gap in Context-Aided Forecasting
Vincent Zhihao Zheng, Étienne Marcotte, Arjun Ashok, Andrew Robert Williams, Lijun Sun, Alexandre Drouin, Valentina Zantedeschi
Time Series Multimodal
  • Introduced a semi-synthetic methodology for generating verifiably useful contexts from time series datasets.
  • Created CAF-7M, a large corpus of 7 million context-augmented time series windows with a verified test set.
  • Demonstrated that the proposed methodology enables effective context utilization and real-world transfer.
  • Showed that dataset quality is a primary bottleneck in context-aided forecasting, rather than model architecture.
Read more
Trust-Region Noise Search for Black-Box Alignment of Diffusion and Flow Models
Niklas Schweiger, Daniel Cremers, Karnik Ram
Generative Models Optimization
  • Introduction of Trust-Region Search (TRS) for black-box optimization of noise samples in generative models.
  • TRS achieves a balance between exploration and exploitation, enhancing adaptability to various tasks.
  • Significant improvements in sample quality for text-to-image, molecule, and protein design tasks compared to existing methods.
  • Minimal hyperparameter tuning required, making TRS versatile across different generative architectures.
Read more
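The exploration/exploitation balance can be sketched as a (1+1)-style search over noise with an adaptive radius. The widen/shrink factors below are arbitrary assumptions, not TRS's actual schedule:

```python
import numpy as np

def trust_region_noise_search(reward, d, iters=60, radius=1.0, seed=0):
    """Black-box search over noise vectors: propose within a radius of the
    incumbent, widen the radius on improvement, shrink it on failure."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=d)                    # incumbent noise sample
    best = reward(z)
    for _ in range(iters):
        cand = z + radius * rng.normal(size=d)
        val = reward(cand)
        if val > best:
            z, best = cand, val
            radius *= 1.5                     # exploit: trust the region more
        else:
            radius = max(radius * 0.7, 1e-3)  # explore more locally
    return z, best

# toy "alignment" reward: prefer noise close to a fixed target direction
target = np.ones(8) / np.sqrt(8)
z, best = trust_region_noise_search(lambda v: -np.sum((v - target) ** 2), d=8)
```

Only reward evaluations are used, never gradients, which is why the same loop applies unchanged across image, molecule, and protein generators.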
Scribe Verification in Chinese manuscripts using Siamese, Triplet, and Vision Transformer Neural Networks
Dimitrios-Chrysovalantis Liakopoulos, Yanbo Zhang, Chongsheng Zhang, Constantine Kotropoulos
Computer Vision
  • Introduces a unified framework for scribe verification using deep learning models.
  • Compares Siamese, Triplet, and Vision Transformer architectures for handwriting analysis.
  • Demonstrates that the MobileNetV3+ Custom Siamese model achieves superior performance.
  • Utilizes two diverse datasets to evaluate model effectiveness in scribe verification.
Read more
Representation Learning for Spatiotemporal Physical Systems
Helen Qu, Rudy Morel, Michael McCabe, Alberto Bietti, François Lanusse, Shirley Ho, Yann LeCun
Theory Time Series Optimization
  • Traditional machine learning methods for spatiotemporal systems focus on next-frame prediction, leading to high computational costs and error accumulation.
  • The paper emphasizes the importance of downstream tasks, such as physical parameter estimation, as a measure of representation quality.
  • Joint Embedding Predictive Architectures (JEPAs) outperform traditional pixel-level prediction methods in learning useful representations for scientific tasks.
  • The study evaluates methods on three physical systems, demonstrating the effectiveness of latent space learning.
Read more
LaPro-DTA: Latent Dual-View Drug Representations and Salient Protein Feature Extraction for Generalizable Drug–Target Affinity Prediction
Zihan Dun, Liuyi Xu, An-Yang Lu, Shuang Li, Yining Qian
Graph Learning
  • Introduces a latent dual-view drug representation to mitigate overfitting and enhance generalization.
  • Employs a salient protein feature extraction strategy to improve the identification of relevant bioactive regions.
  • Utilizes a cross-view multi-head attention mechanism for comprehensive interaction modeling.
  • Achieves state-of-the-art performance on benchmark datasets, particularly in unseen-drug scenarios.
Read more
RelayCaching: Accelerating LLM Collaboration via Decoding KV Cache Reuse
Yingsheng Geng, Yuchong Gao, Weihong Wu, Guyue Liu, Jiang Liu
Large Language Models Efficient ML NLP
  • RelayCaching enables efficient reuse of KV caches in multi-agent LLM systems.
  • The method achieves over 80% KV cache reuse and reduces time-to-first-token (TTFT) by up to 4.7 times.
  • RelayCaching maintains accuracy comparable to full prefilling with minimal overhead.
  • The approach systematically characterizes KV deviations, allowing targeted rectification.
Read more
Beyond Attention: True Adaptive World Models via Spherical Kernel Operator
Vladimer Khasia
Theory NLP Reinforcement Learning
  • Critiques the limitations of conventional world model approaches that rely on latent space projections.
  • Introduces the Spherical Kernel Operator (SKO) as a replacement for traditional attention mechanisms.
  • SKO utilizes localized ultraspherical polynomials to achieve direct function approximation without saturation.
  • Empirical results show that SKO accelerates convergence and improves performance in autoregressive language modeling.
Read more
BoSS: A Best-of-Strategies Selector as an Oracle for Deep Active Learning
Denis Huseljic, Paul Hahn, Marek Herde, Christoph Sandrock, Bernhard Sick
Computer Vision Efficient ML Optimization
  • BoSS is the first scalable oracle strategy for batch active learning applicable to large datasets and complex deep neural networks.
  • The method combines an ensemble of selection strategies with a performance-based selection approach.
  • BoSS significantly outperforms existing oracle strategies under comparable computational constraints.
  • Current state-of-the-art active learning strategies do not consistently achieve oracle performance, indicating room for improvement.
Read more
From Stochastic Answers to Verifiable Reasoning: Interpretable Decision-Making with LLM-Generated Code
Anirudh Jaidev Mahesh, Ben Griffin, Fuat Alican, Joseph Ternasky, Zakari Salifu, Kelvin Amoaba, Yagiz Ihlamur, Aaron Ontoyin Yin, Aikins Laryea, Afriyie Samuel, Yigit Ihlamur
NLP Large Language Models Interpretability
  • LLMs are reframed as code generators rather than rule evaluators, enhancing interpretability and reducing costs.
  • Automated statistical validation methods are introduced to filter low-quality rules without human intervention.
  • A cluster-based gap analysis method is developed to identify and refine decision logic for underperforming founder subpopulations.
  • The proposed framework achieves competitive performance on VCBench, outperforming existing LLMs while ensuring interpretability.
Read more
Sobolev–Ricci Curvature
Kyoichi Iwasaki, Tam Le, Hideitsu Hino
Graph Learning Theory Efficient ML
  • Introduction of Sobolev–Ricci Curvature (SRC) for graph structures.
  • Efficient evaluation of SRC using a tree-metric structure.
  • SRC recovers Ollivier–Ricci curvature on length-measure trees.
  • SRC vanishes in the Dirac limit, aligning with zero-curvature cases.
Read more
Knowledge, Rules and Their Embeddings: Two Paths towards Neuro-Symbolic JEPA
Yongchao Huang, Hassan Raza
Theory Interpretability Multimodal
  • Introduction of a bidirectional neuro-symbolic framework (RiJEPA) that merges neural networks with symbolic logic.
  • Utilization of Energy-Based Constraints (EBC) to shape the latent space and improve out-of-distribution generalization.
  • Development of continuous rule discovery methods that bypass traditional combinatorial search limitations.
  • Empirical success in achieving zero-shot logical accuracy in various applications, including clinical settings.
Read more
Sampling-guided exploration of active feature selection policies
Gabriel Bernardino, Anders Jonsson, Patrick Clarysse, Nicolas Duchateau
Reinforcement Learning Optimization Efficient ML
  • Introduces a heuristic-based strategy for exploring feature combinations in larger datasets.
  • Implements a post-fit regularization strategy to reduce decision complexity.
  • Demonstrates improved performance over existing methods in accuracy and policy complexity.
  • Addresses the limitations of previous reinforcement learning approaches in feature selection.
Read more
Optimize Wider, Not Deeper: Consensus Aggregation for Policy Optimization
Zelal Su (Lain) Mustafaoglu, Sungyoung Lee, Eshan Balachandar, Risto Miikkulainen, Keshav Pingali
Reinforcement Learning Optimization Theory
  • Introduces a Fisher-geometric decomposition of PPO updates into signal and waste, explaining the optimization-depth dilemma.
  • Proposes CAPO, which aggregates multiple PPO replicates to improve policy optimization by focusing on width rather than depth.
  • Demonstrates that consensus in natural parameter space leads to better performance and compliance than traditional averaging methods.
  • Empirical results show CAPO outperforms PPO and deeper baselines by significant margins across continuous control tasks.
Read more
Generalization and Memorization in Rectified Flow
Mingxing Rao, Daniel Moyer
Generative Models Theory Efficient ML
  • Introduction of three test statistics for Membership Inference Attacks tailored for Rectified Flow models.
  • Significant performance improvements in MIA metrics through complexity calibration.
  • Discovery of a temporal pattern in memorization dynamics, with peak susceptibility at the midpoint of integration.
  • Proposed substitution of uniform sampling with a Symmetric Exponential distribution to reduce memorization risks.
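Test statistics for membership inference typically follow the loss-threshold template: score each sample by the model's per-sample error and predict "member" below a threshold. A synthetic sketch of that template (the Gaussian error distributions below are stand-ins, not the paper's rectified-flow statistics):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: the "model" reconstructs training members better (lower
# per-sample error) than unseen samples. In a rectified-flow setting
# the score would instead be a flow-matching residual, e.g. evaluated
# near the midpoint of integration where memorization peaks.
member_err = rng.normal(0.2, 0.05, size=500)     # members: low error
nonmember_err = rng.normal(0.5, 0.05, size=500)  # non-members: high error

# Threshold attack: predict "member" when the error is below tau.
tau = 0.35
scores = np.concatenate([member_err, nonmember_err])
labels = np.concatenate([np.ones(500), np.zeros(500)])  # 1 = member
pred = (scores < tau).astype(float)
accuracy = (pred == labels).mean()
print(f"attack accuracy: {accuracy:.2f}")
```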
Read more
RESQ: A Unified Framework for REliability- and Security Enhancement of Quantized Deep Neural Networks
Ali Soltan Mohammadi, Samira Nazari, Ali Azarpeyvand, Mahdi Taheri, Milos Krstic, Michael Huebner, Christian Herglotz, Tara Ghasempouri
Computer Vision Efficient ML Robotics
  • Introduces a unified framework (RESQ) for enhancing both fault and attack resilience in quantized DNNs.
  • Demonstrates significant improvements in resilience metrics without sacrificing accuracy.
  • Reveals the asymmetric relationship between fault resilience and adversarial robustness.
  • Validates the framework across multiple architectures and datasets, showcasing its general applicability.
Read more
Chunk-Guided Q-Learning
Gwanwoo Song, Kwanyoung Park, Youngwoon Lee
Reinforcement Learning
  • CGQ mitigates TD error accumulation by regularizing a single-step critic with a chunk-based critic.
  • Theoretical results show that CGQ achieves tighter critic optimality bounds than existing methods.
  • Empirical evaluations indicate CGQ outperforms both single-step and action-chunked TD methods on long-horizon tasks.
  • CGQ retains fine-grained value propagation while providing stability through chunk-based backups.
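As a rough sketch of how a chunk-based backup can regularize a single-step critic (notation, targets, and the regularization weight below are assumed for illustration, not the paper's algorithm):

```python
import numpy as np

gamma, lam = 0.99, 0.5  # discount; chunk-regularization weight (assumed)

def single_step_target(r, q_next):
    # Standard one-step TD backup: r + gamma * max_a' Q(s', a')
    return r + gamma * np.max(q_next)

def chunk_target(rewards, q_final):
    # n-step ("chunk") backup over a short action chunk:
    # sum_k gamma^k r_k + gamma^n * max_a Q(s_n, a)
    n = len(rewards)
    ret = sum(gamma**k * r for k, r in enumerate(rewards))
    return ret + gamma**n * np.max(q_final)

# Toy values for one transition and one 4-step chunk.
q_sa = 1.0  # current single-step critic estimate Q(s, a)
y1 = single_step_target(0.1, np.array([1.2, 0.8]))
yc = chunk_target([0.1, 0.0, 0.2, 0.1], np.array([1.1, 0.9]))

# Combined objective: fit the one-step target while staying close to
# the chunk-based backup, which limits single-step TD error buildup.
loss = (q_sa - y1) ** 2 + lam * (q_sa - yc) ** 2
print(f"one-step target {y1:.3f}, chunk target {yc:.3f}, loss {loss:.3f}")
```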
Read more
ZO-SAM: Zero-Order Sharpness-Aware Minimization for Efficient Sparse Training
Jie Ji, Gen Li, Kaiyuan Deng, Fatemeh Afghah, Xiaolong Ma
Optimization Efficient ML
  • ZO-SAM integrates zero-order optimization into SAM to improve sparse training efficiency.
  • The method reduces computational costs by halving the backpropagation requirements.
  • ZO-SAM stabilizes training and accelerates convergence by reducing gradient variance.
  • Models trained with ZO-SAM show improved robustness under distribution shifts.
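The general recipe — estimate SAM's ascent direction with zero-order finite differences so that only the final update needs backpropagation — can be sketched on a toy objective. Everything below (objective, hyperparameters, estimator details) is illustrative, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
rho, lr = 0.05, 0.1          # SAM radius and learning rate (assumed values)

def loss(w):                 # toy objective standing in for the training loss
    return 0.5 * np.sum(w**2) + 0.1 * np.sum(np.sin(5 * w))

def grad(w):                 # analytic gradient (stands in for backprop)
    return w + 0.5 * np.cos(5 * w)

def zo_ascent_dir(w, mu=1e-3, n_dirs=8):
    # Zero-order estimate of the ascent (sharpness) direction via random
    # finite differences: only loss evaluations, no backpropagation.
    g = np.zeros_like(w)
    for _ in range(n_dirs):
        u = rng.standard_normal(w.shape)
        g += (loss(w + mu * u) - loss(w - mu * u)) / (2 * mu) * u
    return g / n_dirs

w = np.array([1.5, -1.0, 0.8, -0.6])
w0 = w.copy()
for _ in range(50):
    d = zo_ascent_dir(w)
    eps = rho * d / (np.linalg.norm(d) + 1e-12)  # SAM perturbation
    w = w - lr * grad(w + eps)                   # the single backprop per step

print(f"loss: {loss(w0):.3f} -> {loss(w):.3f}")
```

Vanilla SAM would compute `grad(w)` a second time to form `eps`; replacing that gradient with the zero-order estimate is what halves the backpropagation cost.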
Read more
3DTCR: A Physics-Based Generative Framework for Vortex-Following 3D Reconstruction to Improve Tropical Cyclone Intensity Forecasting
Jun Liu, Xiaohui Zhong, Kai Zheng, Jiarui Li, Yifei Li, Tao Zhou, Wenxu Qian, Shun Dai, Ruian Tie, Yangyang Zhao, Hao Li
Generative Models Time Series Optimization
  • 3DTCR combines physical constraints with generative AI for improved TC intensity forecasting.
  • The framework utilizes conditional Flow Matching and two-stage transfer learning for vortex-following reconstruction.
  • 3DTCR significantly outperforms existing high-resolution forecasting systems in TC intensity prediction.
  • The model reduces RMSE of maximum wind speed by 36.5% compared to traditional inputs.
Read more
IGU-LoRA: Adaptive Rank Allocation via Integrated Gradients and Uncertainty-Aware Scoring
Xuan Cui, Huiyue Li, Run Zeng, Yunfei Zhao, Jinrui Qian, Wei Duan, Bo Liu, Zhanpeng Zhou
NLP Large Language Models Efficient ML
  • IGU-LoRA adapts rank allocation based on layer importance, improving upon static rank methods.
  • The use of Integrated Gradients allows for more stable and globally informed importance estimates.
  • An uncertainty-aware scoring mechanism enhances the robustness of rank selection.
  • The method shows superior performance across multiple NLP tasks and architectures.
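Integrated Gradients itself is standard: attribute by averaging gradients along a straight path from a baseline to the input, scaled by the input-baseline difference. A minimal sketch on a toy function (how IGU-LoRA aggregates such attributions into per-layer importance scores is specific to the paper):

```python
import numpy as np

def f(x):               # toy scalar "model output"
    return np.sum(x**2) + x[0] * x[1]

def grad_f(x):          # analytic gradient of f
    g = 2 * x
    g[0] += x[1]
    g[1] += x[0]
    return g

def integrated_gradients(x, baseline, steps=256):
    # IG_i = (x_i - x'_i) * mean_k grad f(x' + a_k (x - x'))_i
    alphas = (np.arange(steps) + 0.5) / steps   # midpoint rule on [0, 1]
    avg = np.mean([grad_f(baseline + a * (x - baseline)) for a in alphas],
                  axis=0)
    return (x - baseline) * avg

x = np.array([1.0, 2.0, -1.0])
base = np.zeros(3)
ig = integrated_gradients(x, base)

# Completeness axiom: attributions sum to f(x) - f(baseline).
print(ig, ig.sum(), f(x) - f(base))
```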
Read more
Outcome-Aware Tool Selection for Semantic Routers: Latency-Constrained Learning Without LLM Inference
Huamin Chen, Xunzhuo Liu, Junchen Jiang, Bowei He, Xue Liu
NLP Large Language Models Efficient ML
  • OATS improves tool selection accuracy without increasing latency or resource costs.
  • The method interpolates tool embeddings based on historical success, enhancing performance.
  • Two learned extensions were evaluated, with mixed results depending on data density.
  • The approach maintains a strict latency budget suitable for high-throughput environments.
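One plausible reading of success-based embedding interpolation can be sketched as follows; the interpolation rule, embeddings, success rates, and tool names below are all invented for illustration and are not the paper's method:

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Static tool embeddings (e.g., from a small encoder) and running
# success rates observed for past routed queries.
tools = {"calculator": np.array([0.9, 0.1, 0.0]),
         "search":     np.array([0.1, 0.9, 0.2])}
success = {"calculator": 0.9, "search": 0.4}

# Centroids of query embeddings each tool historically succeeded on.
succeeded_queries = {"calculator": np.array([0.8, 0.2, 0.1]),
                     "search":     np.array([0.2, 0.8, 0.3])}

# Outcome-aware adjustment: pull each tool embedding toward its
# success centroid, weighted by its success rate.
alpha = 0.3
adjusted = {name: (1 - alpha * success[name]) * emb
                  + alpha * success[name] * succeeded_queries[name]
            for name, emb in tools.items()}

# Routing stays a single nearest-neighbor lookup, so no LLM inference
# and no extra latency at query time.
query = np.array([0.7, 0.3, 0.1])
choice = max(adjusted, key=lambda n: cosine(query, adjusted[n]))
print(choice)
```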
Read more
High-Probability Bounds for SGD under the Polyak-Lojasiewicz Condition with Markovian Noise
Avik Kar, Siddharth Chandak, Rahul Singh, Eric Moulines, Shalabh Bhatnagar, Nicholas Bambos
Optimization Theory
  • First uniform-in-time high-probability bound for SGD under the PŁ condition with Markovian noise.
  • Allows noise magnitude to grow with function value, enabling analysis of practical sampling strategies.
  • Establishes decay rates for both high-probability and expected suboptimality gaps.
  • Introduces a novel proof technique using the Poisson equation and probabilistic induction.
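For reference, the Polyak-Łojasiewicz (PŁ) condition assumed above is the standard inequality, with PŁ constant $\mu > 0$ and optimal value $f^\ast$:

```latex
\frac{1}{2}\,\bigl\|\nabla f(x)\bigr\|^{2} \;\geq\; \mu\,\bigl(f(x) - f^{\ast}\bigr)
\qquad \text{for all } x .
```

It guarantees that every stationary point is a global minimizer without requiring convexity, which is why it is a popular relaxation of strong convexity in SGD analyses.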
Read more
Multifidelity Surrogate Modeling of Depressurized Loss of Forced Cooling in High-temperature Gas Reactors
Meredith Eaheart, Majdi I. Radaideh
Optimization Efficient ML Theory
  • Multifidelity surrogate models can effectively reduce computational costs in nuclear reactor transient analysis.
  • Models trained on dominant input variables outperform those using the full set of inputs.
  • Two-fidelity configurations often yield better performance than three-fidelity setups at similar computational costs.
  • Multifidelity Gaussian processes demonstrated the best overall performance among the methods evaluated.
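The additive-correction idea behind multifidelity surrogates can be sketched in a few lines: use the cheap low-fidelity model everywhere and learn only its discrepancy from a handful of expensive high-fidelity runs. The functions, sample counts, and polynomial correction below are illustrative stand-ins (the paper uses Gaussian processes on reactor transients):

```python
import numpy as np

def f_hi(x):   # expensive "high-fidelity" code
    return np.sin(8 * x) * x

def f_lo(x):   # cheap approximation with a systematic bias
    return np.sin(8 * x) * x + 0.3 * (x - 0.5)

x_hi = np.linspace(0, 1, 6)   # scarce high-fidelity runs

# Additive-correction multifidelity surrogate: model the discrepancy
# d(x) = f_hi(x) - f_lo(x) with a low-order fit, then predict
# f_lo(x) + d(x).
d_coef = np.polyfit(x_hi, f_hi(x_hi) - f_lo(x_hi), deg=2)

def surrogate(x):
    return f_lo(x) + np.polyval(d_coef, x)

x_test = np.linspace(0, 1, 200)
err_mf = np.max(np.abs(surrogate(x_test) - f_hi(x_test)))
err_lo = np.max(np.abs(f_lo(x_test) - f_hi(x_test)))
print(err_mf, err_lo)
```

Because the discrepancy is smoother than the response itself, a few high-fidelity samples suffice to correct the cheap model.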
Read more
Anterior's Approach to Fairness Evaluation of Automated Prior Authorization System
Sai P. Selvaraj, Khadija Mahmoud, Anuj Iravane
Theory
  • Proposes a fairness evaluation framework based on model error rates rather than approval outcomes.
  • Utilizes a large dataset of human-reviewed prior authorization cases to assess demographic consistency.
  • Demonstrates that model error rates are consistent across most demographics, with some limitations in subgroup analysis.
  • Highlights the complexity of prior authorization processes and the need for rigorous fairness assessments.
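The core of an error-rate-based fairness check is simple to sketch: compare the model's error rate, not its approval rate, across demographic groups. The data and the 5% error rate below are synthetic stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
group = rng.choice(["A", "B"], size=n)       # demographic label
truth = rng.integers(0, 2, size=n)           # human-reviewed ground truth

# Model predictions with the same 5% error rate in both groups.
flip = rng.random(n) < 0.05
pred = np.where(flip, 1 - truth, truth)

# Error-rate parity: per-group error rates should be consistent.
for g in ["A", "B"]:
    mask = group == g
    err = (pred[mask] != truth[mask]).mean()
    print(f"group {g}: error rate {err:.3f}")
```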
Read more
Ultra-Early Prediction of Tipping Points: Integrating Dynamical Measures with Reservoir Computing
Xin Li, Qunxi Zhu, Chengli Zhao, Bolin Zhao, Xue Zhang, Xiaojun Duan, Wei Lin
Time Series
  • Introduces a novel framework (RCDyM) for predicting tipping points in complex dynamical systems.
  • Integrates reservoir computing with dynamical measures to analyze time series data without requiring system parameters.
  • Demonstrates ultra-early prediction capabilities by extrapolating trends in dynamical measures.
  • Validated through rigorous theoretical analysis and extensive numerical evaluations on synthetic and real-world datasets.
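Reservoir computing itself is easy to sketch: a fixed random recurrent network whose states feed a trained linear readout. The toy echo state network below does one-step-ahead prediction of a sine series; all sizes and signals are illustrative, and the paper's RCDyM framework layers dynamical measures on top of this primitive:

```python
import numpy as np

rng = np.random.default_rng(1)
n_res = 100  # reservoir size (illustrative)

# Fixed random input and reservoir weights; only the readout is trained.
W_in = rng.uniform(-0.5, 0.5, (n_res, 1))
W = rng.uniform(-0.5, 0.5, (n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # spectral radius < 1

def run(u_seq):
    # Drive the reservoir with the input sequence, collecting states.
    x = np.zeros(n_res)
    states = []
    for u in u_seq:
        x = np.tanh(W @ x + W_in @ np.atleast_1d(u))
        states.append(x.copy())
    return np.array(states)

t = np.linspace(0, 20, 400)
u = np.sin(t)
X = run(u[:-1])
washout = 50  # discard initial transient states

# Linear readout fit by least squares: predict the next value of u.
W_out, *_ = np.linalg.lstsq(X[washout:], u[1:][washout:], rcond=None)
pred = X[washout:] @ W_out
rmse = np.sqrt(np.mean((pred - u[1:][washout:]) ** 2))
print(f"one-step RMSE: {rmse:.4f}")
```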
Read more
Not All Latent Spaces Are Flat: Hyperbolic Concept Control
Maria Rosaria Briglia, Simone Facchiano, Paolo Cursi, Alessio Sampieri, Emanuele Rodolà, Guido Maria D'Amely di Melendugno, Luca Franco, Fabio Galasso, Iacopo Masi
Generative Models Computer Vision Multimodal
  • Introduction of Hyperbolic Control (HyCon) for T2I models to enhance concept manipulation.
  • Utilization of hyperbolic geometry to achieve smoother and more predictable semantic transitions.
  • Integration with existing generative models via a lightweight adapter without the need for retraining.
  • Demonstration of state-of-the-art results across multiple safety benchmarks and T2I backbones.
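Working in a hyperbolic rather than flat latent space means replacing vector addition with its Poincaré-ball analogue. A minimal sketch of Möbius addition, the basic operation for moving a latent code along a concept direction (illustrative only; HyCon's actual machinery is richer than this):

```python
import numpy as np

def mobius_add(x, y, c=1.0):
    # Möbius addition in the Poincaré ball of curvature -c: the
    # hyperbolic analogue of x + y.
    xy, x2, y2 = x @ y, x @ x, y @ y
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    den = 1 + 2 * c * xy + c**2 * x2 * y2
    return num / den

x = np.array([0.3, 0.1])         # a latent code inside the unit ball
v = np.array([0.4, -0.2])        # a "concept direction" inside the ball

shifted = mobius_add(x, v)
print(shifted, np.linalg.norm(shifted))  # result stays inside the ball
```

Unlike Euclidean addition, this operation never leaves the ball, which is what makes controlled traversals along concept directions well defined.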
Read more
MR-GNF: Multi-Resolution Graph Neural Forecasting on Ellipsoidal Meshes for Efficient Regional Weather Prediction
Andrii Shchur, Inna Skarga-Bandurova
Graph Learning Time Series Efficient ML
  • Introduction of MR-GNF, a lightweight and efficient model for regional weather forecasting.
  • Utilization of a tri-band ellipsoidal mesh for boundary-free cross-scale coupling.
  • Implementation of an axial graph-attention network for implicit 3D coupling.
  • Achieves competitive forecasting skill with significantly lower computational costs.
Read more
From Garbage to Gold: A Data-Architectural Theory of Predictive Robustness
Terrence J. Lee-St. John, Jordan L. Lawson, Bartlomiej Piechowski-Jozwiak
Theory
  • Predictive robustness arises from the synergy between data architecture and model capacity, not just data cleanliness.
  • High-dimensional, error-prone predictors can effectively mitigate noise in predictive modeling.
  • Informative collinearity enhances model reliability and convergence efficiency.
  • Proactive Data-Centric AI strategies can optimize predictor selection for better robustness.
Read more
DreamReader: An Interpretability Toolkit for Text-to-Image Models
Nirmalendu Prakash, Narmeen Oozeer, Michael Lan, Luka Samkharadze, Phillip Howard, Roy Ka-Wei Lee, Dhruv Nathawani, Shivam Raval, Amirali Abdullah
Generative Models Interpretability Multimodal
  • DreamReader provides a unified framework for interpretability in T2I diffusion models.
  • Introduces novel intervention techniques such as LoReFT and classifier-guided gradient steering.
  • Facilitates systematic analysis and intervention across different diffusion architectures.
  • Demonstrates effective control over generated images through targeted interventions.
Read more
AI-Driven Predictive Maintenance with Real-Time Contextual Data Fusion for Connected Vehicles: A Multi-Dataset Evaluation
Kushal Khemani, Anjum Nazir Qureshi
Multimodal Time Series Interpretability
  • Introduction of a multi-source contextual fusion architecture for predictive maintenance.
  • Demonstrated significant improvement in classification accuracy with the inclusion of contextual features.
  • Achieved high performance on a real-world predictive maintenance dataset.
  • Provided empirical evidence of model robustness against noise.
Read more
As Language Models Scale, Low-order Linear Depth Dynamics Emerge
Buddhika Nettasinghe, Geethu Joseph
NLP Large Language Models Interpretability
  • Low-order linear surrogates can accurately capture the depth dynamics of transformer models.
  • The fidelity of these surrogates improves with the size of the language model.
  • Linear surrogates enable more efficient multi-layer interventions compared to heuristic methods.
  • The study reveals a systems-level regularity in the dynamics of scaling language models.
Read more
How Log-Barrier Helps Exploration in Policy Optimization
Leonardo Cesani, Matteo Papini, Marcello Restelli
Reinforcement Learning Optimization Theory
  • Introduction of LB-SGB, which ensures a minimum level of exploration in policy optimization.
  • Theoretical guarantees for LB-SGB include O(ϵ⁻¹) sample complexity and convergence without unrealistic assumptions.
  • Connection established between log-barrier regularization and Natural Policy Gradient, emphasizing the importance of Fisher information.
  • Empirical results show LB-SGB's superior performance in convergence compared to SGB and NPG.
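The log-barrier idea is concrete even in a toy bandit: add λ Σ_a log π(a) to the objective so that gradient ascent cannot drive any action's probability to zero. A minimal sketch with exact gradients for a softmax policy (rewards, λ, and step size are illustrative, and this is not the paper's LB-SGB algorithm):

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

rewards = np.array([1.0, 0.2, 0.1])   # toy 3-armed bandit
lr = 0.5                              # step size (illustrative)

def ascend(lam, steps=500):
    # Exact gradient ascent on J(theta) = E_pi[R] + lam * sum_a log pi(a)
    # for a softmax policy pi = softmax(theta).
    theta = np.zeros_like(rewards)
    for _ in range(steps):
        p = softmax(theta)
        g_ret = p * (rewards - p @ rewards)   # policy-gradient term
        g_bar = lam * (1.0 - len(p) * p)      # log-barrier term
        theta += lr * (g_ret + g_bar)
    return softmax(theta)

p_plain = ascend(lam=0.0)     # no barrier: probabilities collapse
p_barrier = ascend(lam=0.05)  # barrier keeps mass on every action
print(p_plain.min(), p_barrier.min())
```

The barrier trades a small amount of exploitation for a guaranteed exploration floor, which is the mechanism behind the convergence guarantees above.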
Read more
Directional Routing in Transformers
Kevin Taylor
NLP Large Language Models Interpretability
  • Directional routing enhances transformer efficiency with minimal parameter overhead.
  • Routing is the dominant computational mechanism, crucial for factual recall and induction tasks.
  • Disabling routing leads to a significant drop in model performance, while individual attention heads show redundancy.
  • The model organizes into adaptive and fixed routing regimes, optimizing performance across layers.
Read more