AI-generated summaries

Today's ML research, without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

67 Papers today
8h Update frequency
7 Days of history
A Simple Plug-in for Improving Eviction-Based KV Cache Compression
Yuping Lin, Jiayuan Ding, Yue Xing, Pengfei He, Jiliang Tang, Subhabrata Mukherjee
NLP Large Language Models Efficient ML
  • VECTOR introduces a three-way token allocation strategy that considers both importance and reconstructability.
  • The method employs a lightweight Ordinary Least Squares (OLS) calibration for efficient value reconstruction.
  • VECTOR can be integrated into existing eviction-based methods with minimal adaptations.
  • Empirical results show significant performance improvements in high-compression scenarios.
Understanding and Improving Noisy Embedding Techniques in Instruction Finetuning
Abhay Yadav
NLP Large Language Models Theory
  • SymNoise outperforms NEFTune by 6.7% in fine-tuning performance.
  • The study clarifies the functional equivalence of Gaussian and uniform noise when scaled appropriately.
  • SymNoise enhances model performance by regulating local curvature during training.
  • The method demonstrates significant improvements across multiple instruction datasets.
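The summary does not say which scaling makes the two noise types equivalent. One natural reading, assumed here, is variance matching: Uniform(-a, a) has variance a²/3, so a Gaussian with σ = a/√3 has the same second moment. The sketch below checks this numerically; the half-width `a` and the helper names are hypothetical and not taken from the paper.

```python
import math
import random

def variance(xs):
    """Population variance of a list of floats."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

def uniform_noise(n, a, rng):
    """Draw n samples from Uniform(-a, a); the variance is a**2 / 3."""
    return [rng.uniform(-a, a) for _ in range(n)]

def matched_gaussian_noise(n, a, rng):
    """Draw n Gaussian samples whose variance matches Uniform(-a, a)."""
    sigma = a / math.sqrt(3.0)  # assumption: match second moments
    return [rng.gauss(0.0, sigma) for _ in range(n)]

rng = random.Random(0)
a = 0.5  # hypothetical noise half-width
u = uniform_noise(100_000, a, rng)
g = matched_gaussian_noise(100_000, a, rng)
print(variance(u), variance(g), a ** 2 / 3)
```

Both empirical variances should land close to a²/3. Whether the paper's equivalence is stated in exactly these terms is not confirmed by the summary.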
Archimedean Copula Inference via Taylor-Mode AD
Cambridge Yang, Dongdong Li
Theory Efficient ML Optimization
  • ACOPULA is a JAX-native framework that handles arbitrary per-variable censoring in nested Archimedean copulas.
  • It computes exact likelihoods and parameter gradients in polynomial time, overcoming limitations of existing tools.
  • The framework supports both classical and neural copula generators, making it family-agnostic.
  • ACOPULA achieves a ∼650× speedup over existing R implementations.
Preisach Attention: A Hysteretic Model of Sequential Memory
Piotr Frydrych
Theory Efficient ML Time Series
  • Introduction of the Preisach Attention Layer (PAL) as a new sequence modeling architecture.
  • PAL achieves Turing-completeness at O(1) depth, shallower than traditional transformers require.
  • Establishes expressiveness separation between PAL and transformer attention based on rate-independence.
  • The extremum stack in PAL acts as a minimal sufficient statistic for input history.
Assessing Predictive Models for Fairness Based on Movement Patterns
Francesco Lettich, Mario A. Nascimento, Chiara Pugliese, Chiara Renso
Theory
  • Introduces the concept of assessing fairness in predictive models based on individuals' movement patterns.
  • Challenges the traditional assumption of spatial fairness tied to a single geographical location.
  • Proposes a method that incorporates multiple locations, visit frequency, and duration into fairness assessments.
  • Demonstrates the effectiveness of the approach through experiments on synthetic datasets.
The Attribution Contract: Feature Attribution for Generative Language Models
Giang Nguyen
NLP Large Language Models Interpretability
  • Introduces the Attribution Contract to clarify feature attribution claims in generative language models.
  • Identifies contract ambiguity as a significant issue in feature attribution for generative language models (GLMs).
  • Highlights the self-attribution fallacy, where attributions to generated tokens are misinterpreted as prompt-level explanations.
  • Proposes that feature attribution methods should be assessed as method-contract pairs rather than in isolation.
Learning partially observed systems with neural Hamiltonian ordinary differential equations
Sunniva Meltzer, Sølve Eidnes, Alexander Johannes Stasik
Time Series Theory Robotics
  • NHODE learns the dynamics of partially observed systems by combining Hamiltonian neural networks (HNNs) and neural ODEs.
  • The framework allows for training with loss defined only on observed variables, enabling inference of latent states.
  • Incorporating physical structure improves prediction accuracy and stability in complex dynamical systems.
  • The method is evaluated on various systems, demonstrating robustness in challenging scenarios.
Coupling-Robust Accuracy in Multiphysics Physics Informed Neural Networks via Kronecker-Preconditioned Optimization
Youngjae Park, Jaemin Kim, Junghwa Hong
Optimization Theory
  • The spectral radius of the standard neural tangent kernel (NTK) increases with the square of the coupling strength in linearly coupled systems.
  • Block-diagonal Gauss-Newton preconditioning can stabilize the learning process by bounding the spectral radius of the preconditioned NTK.
  • SOAP+GN optimizer maintains coupling-robust accuracy across various multiphysics systems, outperforming traditional optimizers.
  • The method is validated through 234 experiments, demonstrating its effectiveness in both 1D and 2D systems.
Multi-Gate Residuals
Zhizhan Zheng, Feiyun Zhang, Shuchun Liu, Tian Xia, Xi Liu, Dasheng Hu, Hongquan Zhou
NLP Large Language Models Efficient ML
  • MGR stabilizes activation scales without incurring additional communication overhead.
  • The architecture combines features of existing methods to improve efficiency and performance.
  • Empirical results show tangible performance improvements over traditional architectures.
  • MGR addresses the challenges of information dilution and unbounded magnitude drift in deep networks.
Certification from Examples is Hard for Circuits and Transformers under Minimal Overparametrization
Artur Back de Luca, Kimon Fountoulakis
Theory
  • Exact certification can be exponentially hard even with minimal overparametrization.
  • Adding a single gate to threshold circuits of depth ≥2 can exponentially increase certification size.
  • Log-precision Transformers exhibit similar certification hardness with slight architectural changes.
  • Approximate certification still requires large certificates despite allowing polynomial mistakes.
RelPrism: A Multi-Faceted Pre-training Framework with Self-Generated Tasks for Relational Databases
Jinyu Yang, Cheng Yang, Junze Chen, Zedi Liu, Muhan Zhang, Hanyang Peng, Chuan Shi
Graph Learning
  • RelPrism is a multi-faceted self-supervised learning framework tailored for relational databases.
  • It constructs intrinsic, relational, and hybrid attributes to capture diverse information for predictive tasks.
  • The framework utilizes multi-granularity clustering to enhance representation learning.
  • Experimental results show a 4.15% improvement in ROC-AUC for classification tasks and a 10.75% reduction in MAE for regression tasks compared to existing methods.
Robust OT-Guided Generative Residual Domain Adaptation for Bike-Sharing Demand Prediction under Temporal Domain Shift
Yiming Ma
Time Series Optimization Generative Models
  • Introduces Gen-ROTDA, a robust framework for bike-sharing demand prediction under temporal shifts.
  • Focuses on residual domain adaptation rather than raw demand transfer for improved prediction accuracy.
  • Utilizes robust optimal transport to enhance stability against abnormal data records.
  • Demonstrates superior performance in mean absolute error compared to various baseline methods.
Same Architecture, Different Capacity: Optimizer-Induced Spectral Scaling Laws
Nandan Kumar Jha, Brandon Reagen
NLP Large Language Models Optimization
  • Optimizers significantly affect the spectral scaling laws of Transformer architectures.
  • AdamW and Muon optimizers yield markedly different scaling behaviors, particularly in rare-token representations.
  • Matched validation loss does not guarantee similar representation structures between different optimizers.
  • Optimizer-induced spectral shifts can surpass architectural effects in shaping representation capacity.
Value-Gradient Hypothesis of RL for LLMs
Arip Asadulaev, Daniil Ognev, Karim Salta, Martin Takac
Large Language Models Reinforcement Learning Theory
  • Critic-free RL methods like PPO and GRPO can effectively enhance LLMs despite theoretical concerns about long-horizon credit assignment.
  • The actor update in these methods behaves like a value-gradient signal, allowing for effective credit transport.
  • Empirical costates in discrete transformers approximate theoretical value gradients, with controlled error margins.
  • A decomposition of RL impact into value-gradient signals and reward headroom provides a practical criterion for RL effectiveness.
Worse than Random: The Importance of a Baseline for Unsupervised Feature Selection
Muhammad Rajabinasab, Michael E. Houle, Oussama Chelly, Arthur Zimek
Theory
  • Random feature selection is proposed as a necessary baseline for evaluating unsupervised feature selection methods.
  • Many state-of-the-art methods are shown to perform worse than random feature selection.
  • The absence of a proper baseline complicates the assessment of the value added by new feature selection methods.
  • The paper provides empirical evidence supporting the need for consistent evaluation standards in unsupervised feature selection.
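A random-selection baseline of the kind described above can be sketched as follows. This is a minimal illustration, not the paper's protocol: `random_feature_baseline`, `toy_score`, and the `informative` set are hypothetical, and in practice the score function would be whatever downstream metric the evaluated method reports.

```python
import random

def random_feature_baseline(n_features, k, n_trials, score_fn, seed=0):
    """Average score of k uniformly random features over n_trials draws."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_trials):
        # Sample k distinct feature indices without replacement.
        subset = sorted(rng.sample(range(n_features), k))
        scores.append(score_fn(subset))
    return sum(scores) / len(scores)

# Toy score (hypothetical): fraction of selected features that are informative.
informative = {0, 1, 2, 3, 4}

def toy_score(subset):
    return sum(1 for j in subset if j in informative) / len(subset)

baseline = random_feature_baseline(
    n_features=50, k=10, n_trials=2000, score_fn=toy_score
)
print(round(baseline, 3))
```

A selection method adds value only if its score on the same metric clearly exceeds this averaged baseline; averaging over many draws keeps a single lucky subset from setting the bar.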
No Epoch Like the Present: Robust Climate Emulation Requires Out-of-Distribution Generalisation
Bradley Stanley-Clamp, Anson Lei, Hannah M. Christensen, Ingmar Posner
Theory Time Series
  • Climate emulation is fundamentally an out-of-distribution prediction task.
  • Seasonal variation can effectively serve as a proxy for long-term climate shifts.
  • Current hybrid-ML emulators show significant performance degradation under realistic distribution shifts.
  • A novel evaluation framework is proposed that does not require additional data collection.
Valid and Expressive Copulas for Irregular Multivariate Time Series
Christian Klötergens, Tom Hanika, Lars Schmidt-Thieme, Vijaya Krishna Yalavarthi
Time Series
  • Introduction of CoPFITi, the first copula model specifically for irregular multivariate time series (IMTS).
  • Ensures marginalization consistency by decoupling marginal distributions from the dependency structure.
  • Demonstrates improved performance over existing non-copula baselines and previous copula models.
  • MargFlow, a model for univariate marginals, achieves the best marginal likelihood in evaluations.
AutoMCU: Feasibility-First MCU Neural Network Customization via LLM-based Multi-Agent Systems
Penglin Dai, Zijie Zhou, Xincao Xu, Junhua Wang, Xiao Wu, Lixin Duan
Efficient ML Large Language Models
  • AutoMCU shifts the focus from proxy-driven hardware-aware search to a feasibility-first approach, prioritizing backend-verified deployability.
  • The system integrates hardware-in-the-loop mechanisms to eliminate infeasible architecture candidates before training.
  • AutoMCU reduces the customization time for neural networks on MCUs to 1-2 hours, significantly improving efficiency.
  • Real-device deployments confirm the practical applicability of the proposed system for edge intelligence applications.
Models Can Model, But Can't Bind: Structured Grounding in Text-to-Optimization
Zhiqi Gao, Albert Ge, Alexander Berenbeim, Nathaniel D. Bastian, Frederic Sala
NLP Large Language Models Optimization
  • Text-to-optimization requires both modeling and binding capabilities, with binding being the primary bottleneck.
  • Text2Opt-Bench is introduced as a scalable benchmark for evaluating text-to-optimization models across diverse problem categories.
  • The BIND method improves model accuracy by binding data programmatically.
  • Training binding-specific models yields better results than end-to-end supervised fine-tuning or reinforcement learning.
Learning Through Noise: Why Subliminal Learning Works and When It Fails
Vincent C. Brockers, Roman D. Ventzke, Valentin Neuhaus, Belén Hidalgo-Ogalde, Viola Priesemann
Theory
  • Subliminal learning does not require shared initializations between teacher and student models.
  • The compatibility of output heads (auxiliary and classification) is crucial for successful subliminal learning.
  • Architectural differences between teacher and student models can be accommodated as long as expressiveness conditions are met.
  • The study provides a theoretical basis for understanding subliminal learning and quantifies its limits.
Optimization of randomized neural networks for transfer operator approximation
Mohammad Tabish, Stefan Klus
Optimization Theory Efficient ML
  • Introduction of RaNNDy, a randomized neural network for transfer operator approximation.
  • Proposed algorithm optimizes activation functions while keeping hidden layer parameters fixed.
  • Demonstrated effectiveness on benchmark problems like stochastic differential equations.
  • Offers a computationally efficient alternative to fully trained neural networks.
GraphFlow: A Graph-Based Workflow Management for Efficient LLM-Agent Serving
Ao Li, Shangpeng Yang, Fahao Chen, Tianheng Xu, Peng Li, Zhou Su
Large Language Models Graph Learning Efficient ML
  • GraphFlow introduces a unified graph representation (wGraph) for dynamic workflow management.
  • The system enables adaptive workflow generation based on task-specific semantics.
  • GraphFlow optimizes memory usage by managing KV caches more efficiently.
  • Extensive experiments show significant performance improvements over state-of-the-art methods.
Multiple Neural Operators Achieve Near-Optimal Rates for Multi-Task Learning
Adrien Weihs, Hayden Schaeffer
Theory
  • Derivation of near-optimal approximation rates for Multiple Neural Operators (MNO) in multi-task learning.
  • Establishment of refined statistical learning rates that match those of single-task operator learning.
  • Introduction of lower complexity bounds for multiple operator learning, indicating intrinsic complexity barriers.
  • Comparison of MNO with DeepONet, showing similar performance in terms of approximation complexity.
Tabular foundation models for robust calibration of near-infrared chemical sensing data
Robin Reiter, Denis Cornet, Fabien Michel, Lauriane Rouan, Gregory Beurier
Theory Efficient ML
  • TabPFN shows promise as a calibration strategy for NIR chemical sensing data.
  • Preprocessing-optimized TabPFN outperforms traditional models like partial least squares (PLS) and modern methods like CatBoost in regression tasks.
  • In classification, TabPFN applied directly to raw spectra achieves top performance.
  • Robustness analyses indicate limitations of TabPFN in handling spectral outliers and extrapolated samples.
The Value of Covariance Matching in Gaussian DDPMs and the Lanczos Sampler
Md Sahil Akhtar, Aymane El Gadarri, Vivek F. Farias, Adam D. Jozefiak
Generative Models Theory Efficient ML
  • Matching full posterior covariance reduces path-KL error from Ω(1/T) to O(1/T²).
  • The Lanczos Gaussian Sampler (LGS) enables practical covariance matching without dense storage.
  • LGS improves sample quality over strong diagonal-covariance baselines with minimal computational overhead.
  • The method leverages covariance-vector products through Jacobian-vector products for efficient sampling.
Convex Compositional Reasoning Models
Meir Roketlishvili, Semyon Semenov, Maksim Bobrin, Viktor Kovalchuk, Albert Baichorov, Abduragim Shtanchaev, Fakhri Karray, Dmitry V. Dylov, Martin Takáč, Arip Asadulaev
Optimization Theory Efficient ML
  • Identifies non-convex energy composition as a source of spurious minima in compositional energy-based models.
  • Introduces Convex Compositional Energy Minimization (CCEM) framework that maintains convexity in energy composition.
  • Develops a deterministic optimization pipeline for training and inference on tight convex relaxations.
  • Provides theoretical guarantees that convex composition prevents spurious local minima and certifies global optimality.
LLM-driven design of physics-constrained constitutive models: two agents are better than one
Marius Tacke, Matthias Busch, Kian Abdolazizi, Jonas Eichinger, Kevin Linka, Roland Aydin, Christian Cyron
Large Language Models Generative Models Theory
  • Introduction of a dual-agent system for constitutive model generation using LLMs.
  • The Creator agent proposes models while the Inspector agent ensures compliance with physical laws.
  • Significant improvement in model validity, achieving 100% compliance with physical constraints.
  • Models maintain high accuracy and generalization to unseen loading conditions.
Cost-Effective Model Evaluation with Meta-Learning
Trinh Pham, Viet Huynh, Hongzhi Yin, Quoc Viet Hung Nguyen, Thanh Tam Nguyen
Efficient ML NLP Computer Vision
  • Introduction of MetaEvaluator, a cost-effective, model-agnostic evaluation framework.
  • Utilization of meta-learning to transfer knowledge from reference models for performance estimation.
  • Development of MetaDataset, a large-scale dataset for training and evaluating the framework.
  • Demonstration of significant cost reduction in model evaluation compared to traditional methods.
Three Costs of Amortizing Gaussian Process Inference with Neural Processes
Robin Young
Theory Generative Models Efficient ML
  • Decomposes the KL divergence between a Gaussian process (GP) and a latent neural process (LNP) into three interpretable sources of error.
  • Establishes bounds on the truncation component of the bottleneck term related to kernel smoothness.
  • Identifies persistent costs of label contamination in neural process predictions.
  • Provides architectural recommendations to enhance predictive variance estimation.
DiLaDiff: Distilled Latent-Augmented Diffusion for Language Modeling
Jean-Marie Lemercier, Tomas Geffner, Karsten Kreis, Morteza Mardani, Arash Vahdat, Ante Jukić
NLP Generative Models Large Language Models
  • DiLaDiff combines continuous latent representations with discrete decoding for improved language modeling.
  • The model significantly accelerates inference while maintaining high sampling quality.
  • Consistency distillation enables rapid generation of high-quality text outputs.
  • DiLaDiff outperforms traditional masked diffusion models in both quality and efficiency.
Maestro: Reinforcement Learning to Orchestrate Hierarchical Model-Skill Ensembles
Jinyang Wu, Guocheng Zhai, Ruihan Jin, Yuhao Shen, Zhengxi Lu, Fan Zhang, Haoran Luo, Zheng Lian, Zhengqi Wen, Jianhua Tao
Reinforcement Learning Multimodal Large Language Models
  • MAESTRO reframes multimodal tasks as a sequential decision-making process over a hierarchical model-skill registry.
  • The orchestration policy is optimized using outcome-based reinforcement learning, eliminating the need for step-level supervision.
  • MAESTRO achieves an average accuracy of 70.1%, outperforming state-of-the-art models like GPT-5.
  • The framework demonstrates plug-and-play generalization to unseen models and skills without retraining.
Harnesses for Inference-Time Alignment over Execution Trajectories
Boyuan Wang, Bochao Li, Minghan Wang, Yuxin Tao, Fang Kong
NLP Large Language Models Optimization
  • Harness design is framed as an inference-time alignment problem, focusing on workflow and guidance components.
  • Optimal granularity in task decomposition must align with the agent's capabilities and retry budgets.
  • Guidance improves performance only when it aligns with task evidence; misalignment can lead to hallucinations.
  • Partial harnessing, which specifies only initial steps, can be more effective than fully structured workflows.
Latent Cache Flow: Model-to-Model Communication Without Text
Maximillian Rossi, Prajwal Raghunath, Eugene Wu
Large Language Models Efficient ML NLP
  • Latent Cache Flow (LCF) reduces the size of communication adapters significantly compared to Cache-to-Cache (C2C).
  • LCF allows for efficient model-to-model communication without the need for aligned contexts.
  • The method improves accuracy and speed of communication between LLMs, outperforming traditional text-based methods.
  • LCF-X extends the capabilities of LCF for cross-context communication by summarizing KV caches.
Relevant Walk Search for Explaining Graph Neural Networks
Ping Xiong, Thomas Schnake, Michael Gastegger, Grégoire Montavon, Klaus-Robert Müller, Shinichi Nakajima
Graph Learning Interpretability
  • Introduces polynomial-time algorithms for identifying relevant walks in GNNs, enhancing the scalability of GNN-LRP.
  • Presents two algorithms: EMP-neu for exact neuron-level walk identification and AMP-ave for approximate node-level walk identification.
  • Demonstrates superior performance of the proposed methods across multiple application domains with high accuracy.
  • Addresses the computational challenges of GNN-LRP, reducing complexity from exponential to polynomial.
Diffusion Domain Expansion: Learning to Coordinate Pre-trained Diffusion Models
Egor Lifar, Semyon Savkin, Timur Garipov, Shangyuan Tong, Tommi Jaakkola
Generative Models Audio & Speech Computer Vision
  • DDE extends the generative capabilities of pre-trained diffusion models to larger objects and complex conditioning.
  • The coordinator network is designed to be parameter-efficient and can generalize beyond training sizes.
  • DDE outperforms existing coordinated generation methods in both qualitative and quantitative evaluations.
  • The method is applicable to various domains, including audio and image generation.
The Implicit Bias of Depth: From Neural Collapse to Softmax Codes
Connall Garrod, Jonathan P. Keating, Christos Thrampoulidis
Theory Optimization
  • Depth induces an implicit low-rank bias that affects the emergence of Neural Collapse.
  • Low-rank structures promote efficient norm propagation through matrix multiplications.
  • The study connects depth-induced biases to softmax codes, revealing a relationship with max-margin solutions.
  • Training dynamics show that increasing depth can destabilize Neural Collapse, favoring low-rank alternatives.
Ternary Decision Trees with Locally-Adaptive Uncertainty Zones
William Smits
Interpretability Theory Efficient ML
  • Introduction of ternary decision trees with uncertainty zones to enhance decision boundary handling.
  • Five methods for local computation of uncertainty zone width δ are proposed and evaluated.
  • All proposed methods significantly outperform standard CART in terms of accuracy across multiple datasets.
  • The margin method achieves the highest efficiency and requires no additional hyperparameters.
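To make the idea of an uncertainty zone concrete, here is a minimal sketch of a three-way split on a single scalar feature. It assumes the zone is a symmetric band of half-width δ around the split threshold and that samples inside the band go to a third branch; the summary does not state how the paper routes or labels such samples, and `ternary_route` is a hypothetical name.

```python
def ternary_route(x, threshold, delta):
    """Route a scalar feature value to 'left', 'right', or 'uncertain'.

    Values within delta of the threshold fall into the uncertainty zone.
    """
    if x < threshold - delta:
        return "left"
    if x > threshold + delta:
        return "right"
    return "uncertain"

# Usage with a hypothetical threshold of 0.5 and zone half-width 0.1.
for value in (0.2, 0.55, 0.9):
    print(value, ternary_route(value, threshold=0.5, delta=0.1))
```

With δ = 0 the function reduces to an ordinary binary split, apart from values exactly equal to the threshold. The five methods for choosing δ locally are not reproduced here.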
Anytime Training with Schedule-Free Spectral Optimization
Anuj Apte, Pranav Deshpande, Niraj Kumar, Shouvanik Chakrabarti, Junhyung Lyle Kim
Optimization Theory Efficient ML
  • SF-NorMuon outperforms SF-AdamW and matches or exceeds tuned AdamW optimizers.
  • The method allows for high-quality checkpoints at any training stage without predefined horizons.
  • Theoretical guarantees for stability in long-horizon training are established.
  • Weight decay is identified as crucial for maintaining performance during extended training periods.
Noise Schedule Design for Diffusion Models: An Optimal Control Perspective
Seo Taek Kong, Weina Wang, R. Srikant
Generative Models Optimization Theory
  • The paper formulates noise schedule design as an optimal control problem, enhancing theoretical understanding.
  • It establishes that a broader class of noise schedules can achieve O(d/n) sampling error bounds.
  • The introduction of Affine-Coupled Schedules (ACS) allows for systematic tuning of noise schedules.
  • Empirical results show that optimized schedules outperform traditional heuristic methods in image generation tasks.
What Linear Probes Miss: Multi-View Probing for Weight-Space Learning
Eunwoo Heo, Kyeongkook Seo, Jaejun Yoo
Theory Efficient ML Computer Vision
  • Identifies limitations of first-order single-view probing methods, which can lead to indistinguishable representations for distinct weight matrices.
  • Introduces MVProbe, a multi-perspective framework that integrates first-order and Gram-based interaction views for enhanced representation learning.
  • Demonstrates state-of-the-art performance on the Model Jungle benchmark across diverse architectures, including ResNet and Stable Diffusion LoRA.
  • Provides a principled per-sample standardization scheme to balance contributions from different probing views.
Leveraging Foundation Models for Causal Generative Modeling
Aneesh Komanduri, Xintao Wu
Generative Models Computer Vision Multimodal
  • Introduction of FM-CGM, a modular framework for causal generative modeling using foundation models.
  • Development of Causal Semantic Guidance (CSG) to ensure accurate propagation of semantic interventions.
  • Demonstration of the framework's ability to perform zero-shot causal discovery and counterfactual generation.
  • Empirical validation showing the framework's effectiveness in generating visually plausible counterfactual images.
It's the humans, not the data: Geopolitical bias in LLMs originates in post-training, amplified by the language of the prompt
Stuart Bladon, Brinnae Bent
Large Language Models NLP
  • Geopolitical bias in LLMs originates in the post-training phase, not pre-training.
  • Bias shifts are influenced by the nationality of the model developers.
  • The language used to prompt the model can amplify existing biases.
  • Significant bias shifts were observed across multiple LLMs from different labs.
Convex Low-resource Accent-Robust Language Detection in Speech Recognition
Miria Feng, William Tan, Mert Pilanci
Audio & Speech Optimization Efficient ML
  • Introduction of Convex Language Detection (CLD) framework for robust language identification.
  • Utilization of convex optimization techniques to ensure global optimality and fast training.
  • Theoretical guarantees of robustness and stability against feature perturbations.
  • Empirical validation showing high accuracy in low-resource dialect identification tasks.
CoSPlay: Cooperative Self-Play at Test-Time with Self-Generated Code and Unit Test
Zhangyi Hu, Chenhui Liu, Tian Huang, Jindong Li, Yang Yang, Jiemin Wu, Zining Zhong, Menglin Yang, Yutao Yue
Large Language Models Reinforcement Learning Generative Models
  • CoSPlay is a ground-truth-free (GT-free) and training-free framework for code generation.
  • It employs a cooperative self-play mechanism to improve both code candidates and self-generated unit tests.
  • The framework demonstrates significant performance improvements over existing RLVR models.
  • CoSPlay shows scalability and generalizability across different model backbones.
Expand More, Shrink Less: Shaping Effective-Rank Dynamics for Dense Scaling in Recommendation
Guoming Li, Shangyu Zhang, Junwei Pan, Wentao Ning, Jin Chen, Gengsheng Xue, Chao Zhou, Shudong Huang, Haijie Gu, Menglin Yang
Optimization
  • RankMixer suffers from embedding collapse, limiting its expressivity and scalability.
  • RankElastor introduces parameterized full mixing and GLU-improved P-FFNs to enhance representation quality.
  • Empirical results show that RankElastor consistently outperforms RankMixer in CTR prediction tasks.
  • The architecture demonstrates improved effective rank, indicating better mitigation of representation collapse.
MedExpMem: Adapting Experience Memory for Differential Diagnosis
Qianhan Feng, Zhongzhen Huang, Yakun Zhu, Yannian Gu, Winnie Chiu Wing Chu, Xiaofan Zhang, Qi Dou
Multimodal
  • MedExpMem enables vision-language models (VLMs) to accumulate differential diagnosis expertise through an experience memory framework.
  • The framework employs a two-phase memory construction process that mimics clinical learning.
  • It organizes knowledge around diagnosis pairs, enhancing the model's ability to differentiate between similar conditions.
  • Evaluation on a radiology benchmark shows consistent accuracy improvements, validating the approach's effectiveness.
Alike Parts: A Feature-Informed Approach to Local and Global Prototype Explanations
Jacek Karolczak, Jerzy Stefanowski
Interpretability
  • Introduction of 'alike parts' for local explanations, highlighting features shared between instances and prototypes.
  • Augmentation of global prototype selection with a feature importance term to enhance diversity in feature attributions.
  • Evaluation on six benchmark datasets showing that diversity in feature importance does not compromise model fidelity.
  • Significant extension of previous work with a broader evaluation of algorithms and a more extensive experimental analysis.
Adaptive Mass-Segmented KV Compression for Long-Context Reasoning
Junzhe Yang, Xiaoyu Shen
Large Language Models NLP Efficient ML
  • Identifies 'Region Wipe-out' as a critical failure mode in KV cache compression.
  • Proposes AMS KV Compression, focusing on region-aware quota allocation.
  • Introduces EMA-based stabilization to maintain coherence in reasoning.
  • Demonstrates strong performance improvements across diverse reasoning tasks.
Learning Individual Dynamics from Sparse Cross-Sectional Snapshots
Christian Lagemann, Kai Lagemann, Steven L. Brunton, Sach Mukherjee
Time Series Theory Efficient ML
  • Introduces CADENCE, a framework for inferring individual dynamics from sparse data.
  • Establishes identifiability guarantees for single-timepoint trajectory inference.
  • Combines a score-based spatial encoder with a Soft Mixture-of-Experts router.
  • Demonstrates superior performance compared to state-of-the-art sequential models.
Uncovering the Latent Potential of Deep Intermediate Representations
Arnesh Batra, Arush Gumber, Aniket Khandelwal, Jashn Khemani, Anubha Gupta
NLP Large Language Models Multimodal
  • Task-specific subspaces formed by intermediate layers often outperform the final layer in downstream tasks.
  • LOES effectively identifies optimal layer combinations, enhancing performance by minimizing residual error under geometric constraints.
  • The proposed GeoReg loss function stabilizes representation geometry during fine-tuning, preventing feature collapse.
  • Performance improvements scale with model depth and are applicable across various architectures and data modalities.
Quantitative coronary calcification analysis for prediction of myocardial ischemia using non-contrast CT calcium scoring
Juhwan Lee, Sadeer Al-Kindi, Ammar Hoori, Tao Hu, Hao Wu, Justin N. Kim, Robert Gilkeson, Sanjay Rajagopalan, David L. Wilson
Interpretability
  • Developed a machine learning framework for predicting myocardial ischemia from non-contrast CT calcium scoring.
  • Analyzed 1,375 patients, identifying 74 variables including clinical data and calcium-omics features.
  • Achieved high precision (98.9%) and significant improvement in predictive performance with calcium-omics features.
  • Demonstrated the strong association of calcified arteries with myocardial ischemia through logistic regression analysis.
Structured-Sparse Attention for Entity Tracking with Subquadratic Sequence Complexity
Hangyue Zhao, Paul Caillon, Erwan Fagnou, Alexandre Allauzen
NLP Efficient ML
  • Introduces a structured-sparse view of attention in entity tracking.
  • Develops a blockwise inverse formulation that achieves subquadratic sequence complexity.
  • Demonstrates practical speedups in inference time while maintaining accuracy.
  • Identifies a limitation related to the capacity of attention heads in tracking multiple properties.
Memory-R2: Fair Credit Assignment for Long-Horizon Memory-Augmented LLM Agents
Sikuan Yan, Ahmed Bahloul, Ercong Nie, Susanna Schwarzmann, Riccardo Trivisonno, Volker Tresp, Yunpu Ma
Large Language Models Reinforcement Learning Optimization
  • Memory-R2 addresses the challenge of fair credit assignment in memory-augmented LLM agents.
  • The LoGo-GRPO algorithm combines local and global optimization for improved trajectory comparisons.
  • A shared-parameter architecture enables efficient co-learning between memory extraction and management.
  • Memory construction is formulated as a multi-step decision process, enhancing flexibility and accuracy.
Read more
ModeSwitch-LLM: A Lightweight Phase-Aware Controller for Cross-Mode LLM Inference on a Single GPU
Aman Sunesh, Ali Alshehhi, Hivansh Dhakne
Large Language Models Efficient ML Optimization
  • Introduces ModeSwitch-LLM, a controller for optimizing LLM inference on a single GPU.
  • Achieves 2.10× latency speedup and 51.7% lower energy per token compared to FP16.
  • Maintains accuracy close to FP16 with minimal increase in error.
  • Demonstrates that rule-based routing is more effective than learned routing policies.
Read more
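The two headline ModeSwitch-LLM figures use different conventions (a speedup factor for latency, a percentage reduction for energy), so it can help to put them on the same scale. The sketch below is plain arithmetic on the numbers quoted in the summary; the function names are illustrative and not taken from the paper.

```python
def latency_reduction_pct(speedup):
    """Convert an 'N x' speedup into a percentage reduction in latency."""
    return (1.0 - 1.0 / speedup) * 100.0


def efficiency_gain(energy_reduction_pct):
    """Convert a percentage reduction in energy per token into an 'N x' factor."""
    return 1.0 / (1.0 - energy_reduction_pct / 100.0)


# Figures quoted in the summary, both relative to the FP16 baseline.
latency_cut = latency_reduction_pct(2.10)   # about 52.4 percent lower latency
energy_gain = efficiency_gain(51.7)         # about 2.07x tokens per unit energy

print(f"latency reduced by {latency_cut:.1f}%")
print(f"energy efficiency improved {energy_gain:.2f}x")
```

Read this way, the latency and energy gains are of similar size: both are roughly a halving relative to FP16.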
A mathematical theory of balancing relational generalization and memorization
Luke Cheng, Samuel Lippl
Theory
  • Introduction of a novel task paradigm, transitive inference with exceptions, to study relational generalization and memorization.
  • Analytical characterization of kernel ridge regression models shows sensitivity to representational geometry in generalization tasks.
  • Validation of theoretical insights in pretrained language models indicates systematic errors aligned with the proposed theory.
  • Emphasis on the need for task paradigms that capture the complexity of balancing generalization and memorization.
Read more
Implicit Regularization of Mini-Batch Training in Graph Neural Networks
Clement Wang, Antoine Vialle, Robin Vaysse, Thomas Bonald
Graph Learning Optimization Efficient ML
  • Random Node Sampling (RNS) outperforms full-graph training on 8 out of 10 datasets.
  • Backward error analysis reveals that mini-batch SGD implicitly minimizes a modified objective with a gradient-variance regularizer.
  • RNS offers significant computational savings, achieving 2× to 12× speedups and up to 3× lower peak GPU memory usage.
  • The study highlights the importance of the sampler choice in shaping the effective learning objective in GNNs.
Read more
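For readers unfamiliar with backward error analysis, the modified objective mentioned in the mini-batch GNN summary above typically has the following shape. This is the generic small-learning-rate form known for mini-batch SGD, written here as an assumption: the summary does not give the paper's constants or how the expectation depends on the graph sampler. Here \eta is the learning rate, L the full-graph loss, and L_B the loss on a sampled batch B.

```latex
\tilde{L}(\theta)
  \;=\; L(\theta)
  \;+\; \frac{\eta}{4}\,\bigl\|\nabla L(\theta)\bigr\|^{2}
  \;+\; \frac{\eta}{4}\,\mathbb{E}_{B}\,\bigl\|\nabla L_{B}(\theta) - \nabla L(\theta)\bigr\|^{2}
```

The last term is the gradient-variance regularizer: it depends on the distribution of batches B, which is why the choice of sampler changes the effective objective.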
CALAD: Channel-Aware contrastive Learning for multivariate time series Anomaly Detection
Jaehyeop Hong, Youngbum Hur
Time Series
  • CALAD introduces a channel-aware approach to anomaly detection, enhancing the relevance of different channels.
  • The framework uses reconstruction errors to estimate channel importance without requiring labeled anomalies.
  • A channel-wise contrastive augmentation strategy is employed to align learning with anomaly semantics.
  • Combining contrastive learning with an auxiliary reconstruction head allows for the preservation of normal patterns.
Read more
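One way to picture label-free channel importance, as described in the CALAD summary above, is to normalize per-channel reconstruction errors into weights. The sketch below is a hypothetical illustration, not CALAD's actual formula: the function name, the use of mean squared error, and the choice to give higher-error channels more weight are all assumptions.

```python
def channel_importance(x, x_hat):
    """Hypothetical channel weights from per-channel reconstruction error.

    x, x_hat: lists of time steps, each a list with one value per channel.
    Returns weights that sum to 1 (uniform if every channel is reconstructed exactly).
    """
    n_channels = len(x[0])
    mse = [0.0] * n_channels
    for row, rec in zip(x, x_hat):
        for c in range(n_channels):
            mse[c] += (row[c] - rec[c]) ** 2
    mse = [m / len(x) for m in mse]

    total = sum(mse)
    if total == 0.0:
        # Assumption: with no reconstruction error, treat all channels equally.
        return [1.0 / n_channels] * n_channels
    return [m / total for m in mse]


# Toy series with 2 channels; only the second channel is reconstructed badly.
series = [[1.0, 1.0], [1.0, 1.0]]
recon = [[1.0, 0.0], [1.0, 0.0]]
weights = channel_importance(series, recon)
print(weights)
```

No anomaly labels appear anywhere in the computation, which is the property the summary highlights.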
Strong Teacher Not Needed? On Distillation in LLM Pretraining
Taiming Lu, Zhuang Liu
NLP Large Language Models Theory
  • Weaker teachers can improve stronger students under certain conditions.
  • Stronger teachers do not always lead to better student performance; overtraining can be detrimental.
  • Distillation enhances generalization more effectively than in-domain fitting.
  • Teacher-student compatibility is crucial for effective knowledge transfer.
Read more
The Matching Principle: A Geometric Theory of Loss Functions for Nuisance-Robust Representation Learning
Vishal Rajput
Theory
  • Introduces the Matching Principle, linking various robustness challenges in representation learning.
  • Establishes a unified statistical framework for estimating nuisance covariances and regularizing encoder Jacobians.
  • Presents empirical validation across multiple domains, confirming theoretical predictions on deployment drift.
  • Introduces the Trajectory Deviation Index (TDI) as a new measure of embedding sensitivity.
Read more
The Readout Shortcut: Positional Number Copying Dominates Arithmetic CoT Readout in Small Language Models
Ming Liu
NLP Large Language Models
  • CoT prompting is necessary for arithmetic tasks in small LMs, but its step order is less important than previously believed.
  • Models often rely on a positional shortcut, copying the last number in the answer context, which significantly impacts accuracy.
  • The presence of the correct answer can account for 54-92 percentage points of accuracy, demonstrating a strong dependency on positional information.
  • Different models exhibit varying degrees of content gating, affecting their ability to reject distractor numbers.
Read more
Hierarchical Variational Policies for Reward-Guided Diffusion
Kushagra Pandey, Farrin Marouf Sofian, Jan Niklas Groeneveld, Felix Draxler, Stephan Mandt
Generative Models Computer Vision Efficient ML
  • Introduces a unified framework for test-time guidance in diffusion models using hierarchical variational policies.
  • Develops Amortized HVP (AHVP) for efficient generation of high-quality reward-aligned samples.
  • Presents Semi-Amortized HVP (SHVP) that combines amortized proposals with test-time refinement.
  • Demonstrates significant improvements in perceptual quality and inference speed over state-of-the-art methods.
Read more
Understanding Multimodal Failure in Action-Chunking Behavioral Cloning
Lorenzo Mazza, Massimiliano Datres, Ariel Rodriguez, Sebastian Bodenstedt, Gitta Kutyniok, Stefanie Speidel
Robotics Generative Models Theory
  • Multimodality in behavioral cloning presents significant challenges when multiple valid actions correspond to the same observation.
  • Posterior-prior regularization can enhance sampling reliability but may also lead to loss of critical action-conditioned information.
  • The Lipschitz constant of the mapping from base to action space affects the multimodality of action-space generative policies.
  • Empirical experiments validate the theoretical findings regarding multimodality collapse and its implications for policy performance.
Read more
SepsisAI Orchestrator: A Containerized and Scalable Platform for Deploying AI Models and Real-Time Monitoring in Early Sepsis Detection
Santiago Ospitia, John Sanabria, John Garcia-Henao
Efficient ML
  • Introduction of SepsisAI Orchestrator as a modular, open-source platform for early sepsis detection.
  • Integration of HL7 FHIR-inspired preprocessing, NoSQL storage, and a containerized LightGBM classifier.
  • Empirical characterization of horizontal scaling under realistic concurrency, revealing a U-shaped latency curve.
  • Provision of a reproducible deployment recipe for clinical prediction tasks beyond sepsis.
Read more
Optimal Dimension-Free Sampling for Regularized Classification
Meysam Alishahi, Alexander Munteanu, Simon Omlor, Jeff M. Phillips
Theory Optimization Efficient ML
  • Establishes optimal sampling bounds for Lipschitz continuous classification loss functions.
  • Demonstrates k²/ε² and k/ε² bounds for different regularization terms.
  • Identifies conditions under which linear sampling complexity can be achieved.
  • Improves upon existing sensitivity sampling bounds through refined analytical techniques.
Read more
Why SGD is not Brownian Motion: A New Perspective on Stochastic Dynamics
Igor Ignashin, Anna Radovskaya, Andrew Semenov, Egor Lopatin, Stanislav Potapov, Aleksandr Kovalenko, Andrey Veprikov, Aleksandr Shestakov, Andrey Leonidov, Aleksandr Beznosikov
Optimization Theory
  • Proposes a new framework for SGD dynamics that accounts for finite learning rates and minibatch sampling.
  • Derives a discrete Fokker-Planck equation that reveals discrepancies with standard Langevin approximations.
  • Identifies distinct dynamical regimes based on the curvature of the loss landscape, with implications for optimization behavior.
  • Provides empirical evidence supporting the theoretical framework through analysis of neural network models.
Read more
Non-normal spectral signatures of instability in neural network training dynamics
Souvik Ghosh
Optimization Theory
  • Linearized update operators for Adam and SGD with momentum are generically non-normal.
  • Non-normality leads to transient amplification of perturbations during training.
  • The eigenvector condition number κ(V) serves as a more effective stability measure than the spectral radius.
  • Numerical experiments confirm κ(V) can separate stable and unstable training phases.
Read more
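The contrast between spectral radius and κ(V) in the summary above can be seen on a toy 2×2 example. The sketch below is not the paper's experiment: the matrix entries and helper names are assumptions chosen for illustration. It builds a non-normal operator whose spectral radius is below 1 (so it is asymptotically stable) yet which amplifies a perturbation more than tenfold before decaying, and whose eigenvector matrix V is badly conditioned.

```python
import math


def cond_2x2(M):
    """2-norm condition number of a 2x2 matrix from its singular values."""
    (a, b), (c, d) = M
    fro2 = a * a + b * b + c * c + d * d          # sigma_max^2 + sigma_min^2
    det = abs(a * d - b * c)                      # sigma_max * sigma_min
    s_max2 = (fro2 + math.sqrt(max(fro2 * fro2 - 4 * det * det, 0.0))) / 2
    return s_max2 / det                           # sigma_max / sigma_min


def eigvec_matrix_upper(l1, c, l2):
    """Unit eigenvectors (as columns) of [[l1, c], [0, l2]] with l1 != l2."""
    n = math.hypot(c, l2 - l1)
    return [[1.0, c / n], [0.0, (l2 - l1) / n]]


def transient_norms(A, x, steps):
    """Norms of A x, A^2 x, ..., A^steps x."""
    norms = []
    for _ in range(steps):
        x = [A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1]]
        norms.append(math.hypot(x[0], x[1]))
    return norms


# Toy update operator: eigenvalues 0.9 and 0.8, strong off-diagonal coupling.
l1, c, l2 = 0.9, 5.0, 0.8
A = [[l1, c], [0.0, l2]]

spectral_radius = max(abs(l1), abs(l2))
kappa = cond_2x2(eigvec_matrix_upper(l1, c, l2))
norms = transient_norms(A, [0.0, 1.0], 200)
peak = max(norms)

print(f"spectral radius = {spectral_radius:.2f}")
print(f"kappa(V)        = {kappa:.1f}")
print(f"peak |A^k x|    = {peak:.2f}, final = {norms[-1]:.2e}")
```

The spectral radius alone predicts decay from the first step; the large κ(V) is what signals that a unit perturbation can first grow substantially.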
MARS: Magnitude-Aware Rank Statistics
Muhammad Rajabinasab, Afsaneh M. Nejad, Arthur Zimek
Theory
  • MARS addresses the issue of magnitude-blindness in traditional rank statistics.
  • It incorporates performance metric values into the ranking process for more accurate evaluations.
  • MARS uses a dynamic regularization of the Critical Difference to reflect performance volatility.
  • The methodology includes a non-parametric permutation test for stability assessment.
Read more
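The MARS summary above mentions a non-parametric permutation test without describing its statistic. As a generic illustration of that family of tests (not MARS's procedure), the sketch below runs a paired sign-flip permutation test on per-dataset scores of two methods. The function name, the mean-difference statistic, and the toy scores are assumptions.

```python
import random


def paired_permutation_pvalue(scores_a, scores_b, n_perm=2000, seed=0):
    """Two-sided paired permutation test on the mean score difference.

    Under the null hypothesis the two methods are exchangeable on each
    dataset, so the sign of every per-dataset difference can be flipped.
    """
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs) / len(diffs))
    hits = 0
    for _ in range(n_perm):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(sum(flipped) / len(flipped)) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


# Toy accuracies on 10 datasets; method A is better on every one.
method_a = [0.91, 0.85, 0.78, 0.88, 0.93, 0.81, 0.76, 0.89, 0.84, 0.90]
method_b = [0.86, 0.80, 0.75, 0.82, 0.90, 0.77, 0.70, 0.85, 0.80, 0.84]

p_different = paired_permutation_pvalue(method_a, method_b)
p_identical = paired_permutation_pvalue(method_a, method_a)
print(f"A vs B: p = {p_different:.4f}")
print(f"A vs A: p = {p_identical:.4f}")
```

Because the statistic uses the score differences themselves rather than their ranks, the test is sensitive to the magnitude of the gaps, which is the concern the summary raises about traditional rank statistics.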