AI-generated summaries

Today's ML research, without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

57 papers today · 8h update frequency · 7 days of history
Human-like autonomy emerges from self-play and a pinch of human data
Daphne Cornelisse, Julian Hunt, Zixu Zhang, Waël Doulazmi, Kevin Joseph, Jaime Fernández Fisac, Eugene Vinitsky
Reinforcement Learning · Robotics
  • Spiced self-play combines self-play RL with minimal human data to improve driving policy alignment with human behavior.
  • Only 30 minutes of human driving data is used, far less than traditional imitation learning methods require.
  • The method avoids complex reward engineering and domain randomization, simplifying the training process.
  • Policies trained with this approach exhibit lower collision rates and more human-like driving behaviors.
Thermodynamic Signatures of Reasoning: Free-Energy and Spectral-Form-Factor Diagnostics for Hallucination Detection in Large Language Models
Salim Khazem
NLP · Large Language Models · Theory
  • Introduction of Free-Energy Signatures (FES) for hallucination detection in LLMs.
  • FES captures thermodynamic properties of attention Laplacians, enhancing spectral analysis.
  • Empirical results show FES significantly improves AUROC over existing methods.
  • The study establishes a connection between spectral statistics and reasoning quality in LLMs.
Effective Dimension Governs Generalization in Quantum Kernel Vision Models
Jian Xu, Delu Zeng, John Paisley, Qibin Zhao
Computer Vision · Theory
  • Effective dimension (d_eff) is a key factor governing generalization in quantum vision models.
  • Entanglement structure and quantum noise are two mechanisms that influence d_eff.
  • Test accuracy across different entangling ansätze collapses onto a single function of d_eff.
  • Quantum noise can act as a spectral regularizer, improving accuracy in overfitting scenarios.
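For readers unfamiliar with the term: one common definition of the effective dimension of a kernel whose Gram matrix has eigenvalues μ_i is the regularized count d_eff(λ) = Σ_i μ_i / (μ_i + λ), which roughly counts the eigenvalues larger than λ. The paper may define d_eff differently (for example, as a participation ratio), so the sketch below, with its hypothetical `effective_dimension` helper and regularizer `lam`, only illustrates the general idea that a flatter spectrum yields a larger d_eff than a fast-decaying one.

```python
def effective_dimension(eigenvalues, lam):
    """Regularized effective dimension: sum_i mu_i / (mu_i + lam).

    `eigenvalues` are the eigenvalues mu_i of a kernel (Gram) matrix and
    `lam` > 0 is an illustrative ridge regularizer (an assumption, not
    necessarily the paper's definition).
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    # Each eigenvalue contributes ~1 if mu_i >> lam and ~0 if mu_i << lam.
    return sum(mu / (mu + lam) for mu in eigenvalues if mu > 0)


# Hypothetical spectra: flat (every direction used) vs. geometrically decaying.
flat = [1.0] * 8
decaying = [2.0 ** -i for i in range(8)]
print(effective_dimension(flat, 0.1))      # larger d_eff
print(effective_dimension(decaying, 0.1))  # smaller d_eff
```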
Hierarchical Control in Multi-Agent Games: LLM-based Planning and RL Execution
Jannik Hösch, Alessandro Sestini, Florian Fuchs, Amir Baghi, Joakim Bergdahl, Konrad Tollmar, Jean-Philippe Barrette-LaPierre, Linus Gisslén
Reinforcement Learning · Large Language Models
  • Proposes a hierarchical LLM+RL architecture for multi-agent coordination.
  • Performs competitively with hand-crafted behavior trees.
  • Significantly outperforms flat RL approaches in task execution.
  • User study indicates LLM+RL agents are perceived as more human-like.
Manifold Bandits: Bayesian Curriculum Learning over the Latent Geometry of Large Language Models
Darrien McKenzie, Nicklas Hansen, Xiaolong Wang
Reinforcement Learning · Large Language Models · NLP
  • Introduces Bayesian Manifold Curriculum (BMC) for structured problem sampling in RL for LLMs.
  • Frames problem sampling as a manifold-structured bandit problem, emphasizing the relationships among tasks.
  • Demonstrates the importance of balancing productivity, diversity, and utility in problem selection.
  • Presents Latent Task Trees for hierarchical task organization based on model embeddings.
Neural Additive and Basis Models with Feature Selection and Interactions
Yasutoshi Kishimoto, Kota Yamanishi, Takuya Matsuda, Shinichi Shirakawa
Interpretability · Efficient ML · Theory
  • Incorporation of a feature selection mechanism into Neural Additive Models (NAM) and Neural Basis Models (NBM) to enhance computational efficiency.
  • Ability to handle high-dimensional datasets and capture feature interactions without losing interpretability.
  • Proposed models (NAM-FS and NBM-FS) perform comparably to or better than existing generalized additive models (GAMs).
  • Demonstrated that selecting features during training is more effective than using pre-selected features.
VERITAS: Verifier-Guided Proof Search for Zero-Shot Formal Theorem Proving
Manish Acharya, Zhenyu Liao, Yueke Zhang, Kevin Leach, Yu Huang, Yifan Zhang
Theory · Large Language Models · Reinforcement Learning
  • Introduces a zero-shot framework that utilizes structured verifier feedback in proof search.
  • Implements a two-phase protocol combining Best-of-N sampling and Critic-guided MCTS.
  • Achieves higher theorem-solving rates than existing methods, particularly in complex scenarios.
  • Releases VERITAS-CombiBench, a benchmark of 55 combinatorics theorems for further research.
Shifting-based Optimizable Linear Relaxations for General Activation Functions
Philipp Kern, László Antal, Erika Ábráham, Carsten Sinz
Optimization · Theory · Efficient ML
  • SLiR provides a general framework for optimizable linear relaxations applicable to various activation functions.
  • The method requires minimal manual effort, needing only a Lipschitz constant or critical points for parameterization.
  • SLiR enables the verification of up to 7.8 times more properties than existing methods.
  • The approach integrates seamlessly with modern bound-propagation frameworks.
StreamKL: Fast and Memory-Efficient KL Divergence for Boosting Attention Distillation
Guangda Liu, Yiquan Wang, Chengwei Li, Wenhao Chen, Jing Lin, Yiwu Yao, Danning Ke, Wenchao Ding, Jieru Zhao
Efficient ML · Large Language Models · Optimization
  • StreamKL is the first fused GPU primitive for attention KL divergence, eliminating quadratic memory costs.
  • The method achieves significant speedups (up to 43×) over existing attention distillation techniques.
  • StreamKL reduces the extra HBM footprint from O(N_Q·N_K) to O(1), facilitating long-context attention distillation.
  • The approach is particularly beneficial for large language models and other applications requiring efficient attention mechanisms.
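Attention distillation compares, for each query, the teacher's and student's softmax distributions over keys; materializing both N_Q × N_K attention matrices is what makes the naive approach quadratic in memory. The sketch below is not StreamKL's fused GPU kernel. It is a minimal single-row illustration, assuming plain Python lists of teacher and student logits, of how KL(p || q) can be accumulated in one pass over the keys with constant extra memory, using the online log-sum-exp rescaling familiar from FlashAttention-style kernels. The names `streaming_kl` and `direct_kl` are hypothetical.

```python
import math


def streaming_kl(teacher_logits, student_logits):
    """KL(p || q) for p = softmax(teacher_logits), q = softmax(student_logits),
    accumulated in one pass with O(1) extra memory (online log-sum-exp)."""
    m_p = m_q = float("-inf")  # running maxima of the two logit streams
    s_p = s_q = 0.0            # running sums of exp(logit - running max)
    w = 0.0                    # running sum of exp(a - m_p) * (a - b)
    for a, b in zip(teacher_logits, student_logits):
        if a > m_p:
            # New teacher maximum: rescale everything accumulated so far.
            scale = math.exp(m_p - a) if s_p > 0 else 0.0
            s_p *= scale
            w *= scale
            m_p = a
        e = math.exp(a - m_p)
        s_p += e
        w += e * (a - b)
        if b > m_q:
            s_q = s_q * math.exp(m_q - b) if s_q > 0 else 0.0
            m_q = b
        s_q += math.exp(b - m_q)
    log_z_p = m_p + math.log(s_p)
    log_z_q = m_q + math.log(s_q)
    # KL(p || q) = E_p[a - b] - log Z_p + log Z_q
    return w / s_p - log_z_p + log_z_q


def direct_kl(teacher_logits, student_logits):
    """Reference two-pass computation (overflows for very large logits)."""
    log_z_p = math.log(sum(math.exp(a) for a in teacher_logits))
    log_z_q = math.log(sum(math.exp(b) for b in student_logits))
    return sum(
        math.exp(a - log_z_p) * ((a - log_z_p) - (b - log_z_q))
        for a, b in zip(teacher_logits, student_logits)
    )


teacher = [1.0, 3.0, -2.0, 0.5]
student = [0.0, 2.5, 1.0, -1.0]
print(streaming_kl(teacher, student), direct_kl(teacher, student))
```

On a GPU the same recurrence would run per query row inside a tiled kernel; the Python loop only shows the arithmetic.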
ProMUSE: Progressive Multi-modal Uncertainty-guided Staged Evidential Alzheimer Disease Classification
Long Doan, Branden Chen, Ethan Litton, Huan Huang, Jiajing Huang, Yixin Xie, Weihua Zhou, Nandakumar Narayanan, Chen Zhao
Multimodal · Efficient ML
  • ProMUSE reduces reliance on costly MRI and PET imaging by 50–90% while maintaining diagnostic accuracy.
  • The framework adds imaging modalities progressively, guided by the level of predictive uncertainty.
  • Evidential classification is performed initially with low-cost clinical data, enhancing accessibility.
  • ProMUSE demonstrates competitive performance across multiple datasets, indicating robustness and generalizability.
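Evidential classifiers commonly output non-negative per-class evidence e_k, treated as Dirichlet parameters α_k = e_k + 1; with K classes and total strength S = Σ_k α_k, the uncertainty mass is u = K / S and the expected class probabilities are α_k / S. The sketch below uses that standard formulation to show how an uncertainty-gated cascade can stop before acquiring expensive modalities. It is not ProMUSE's architecture: the stage order, the threshold, and the evidence values are all hypothetical, and each stage's evidence function is assumed to already consume every modality acquired up to that point.

```python
def dirichlet_uncertainty(evidence):
    """Return (u, probs) for Dirichlet parameters alpha_k = e_k + 1."""
    alpha = [e + 1.0 for e in evidence]
    strength = sum(alpha)                  # S = sum_k alpha_k
    probs = [a / strength for a in alpha]  # expected class probabilities
    return len(alpha) / strength, probs    # u = K / S


def staged_predict(stages, threshold):
    """Query stages in order of cost; stop once uncertainty <= threshold.

    `stages` is a list of (name, evidence_fn) pairs; each hypothetical
    evidence_fn returns per-class evidence using all data acquired so far.
    """
    used = []
    for name, evidence_fn in stages:
        used.append(name)
        u, probs = dirichlet_uncertainty(evidence_fn())
        if u <= threshold:
            break  # confident enough: skip the remaining, costlier stages
    return probs, u, used


# Hypothetical 3-class evidence, from the cheapest to the costliest stage.
stages = [
    ("clinical", lambda: [2.0, 1.0, 0.0]),  # u = 3 / 6  = 0.5
    ("mri", lambda: [12.0, 1.0, 0.0]),      # u = 3 / 16 = 0.1875
    ("pet", lambda: [30.0, 0.0, 0.0]),
]
probs, u, used = staged_predict(stages, threshold=0.25)
print(used, u)  # PET is never queried in this example
```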
UltraQuant: 4-bit KV Caching for Context-Heavy Agents
Inesh Chakrabarti, David Limpus, Aditi Ghai Rana, Bowen Bao, Spandan Tiwari, Thiago Crepaldi, Ashish Sirasao
Large Language Models · Efficient ML
  • UltraQuant compresses the KV cache to 4 bits, improving caching efficiency for context-heavy agents.
  • The framework integrates TurboQuant-style rotation and codebook quantization.
  • Key optimizations on AMD GPUs enhance performance and reduce latency.
  • UltraQuant achieves a 3.47× reduction in time-to-first-token in late rounds.
LOKI: Memory-Free Null-Space Constrained Lifelong Knowledge Editing
Masih Eskandar, Miquel Sirera Perelló, Stratis Ioannidis, Jennifer Dy
NLP · Large Language Models · Efficient ML
  • Dynamic layer selection allows for per-sample modification, enhancing flexibility in knowledge editing.
  • Utilizes null-space projection to preserve past knowledge without needing previous data access.
  • Achieves significant performance improvements over existing lifelong knowledge editing methods.
  • Reduces computational overhead and avoids extensive pre-processing requirements.
Quantum-classical physics-informed Kolmogorov-Arnold networks for PDEs
Xiang Rao, Yuxuan Shen
Theory · Efficient ML
  • Introduction of QCPIKAN, a novel quantum-classical physics-informed network for PDEs.
  • Theoretical proof of accelerated error convergence and reduced numerical dispersion.
  • Validation across three seepage scenarios demonstrating superior performance.
  • Enhanced global prediction accuracy and local error control compared to existing models.
CRAX: Fast Safe Reinforcement Learning Benchmarking
Tristan Tomilin, Mourad Boustani, Mickey Beurskens, Thiago D. Simão
Reinforcement Learning · Robotics · Efficient ML
  • CRAX provides a hardware-accelerated SafeRL benchmark, significantly speeding up simulations compared to traditional CPU-based setups.
  • The benchmark includes diverse tasks and difficulty levels, allowing for comprehensive evaluation of SafeRL methods.
  • No single SafeRL method dominates across all tasks, indicating the importance of understanding performance-safety trade-offs.
  • Curriculum learning and safety transfer techniques can improve agent performance in complex environments.
On the Variance of Temporal Difference Learning and its Reduction Using Control Variates
Hsiao-Ru Pan, Bernhard Schölkopf
Reinforcement Learning · Theory
  • The asymptotic variance of TD learning is bounded above by that of Monte Carlo methods.
  • TD learning reduces variance by effectively aggregating over a larger pool of trajectories.
  • Shorter horizon updates in TD learning incur less variance for a fixed number of samples.
  • Direct Advantage Estimation (DAE) serves as a regression-adjusted control variate, achieving tighter variance bounds than TD.
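The control-variate idea in the last point can be shown generically: to estimate E[f] from samples, subtract a correlated quantity g whose mean is known, scaled by the regression coefficient β = Cov(f, g) / Var(g). The sketch below is this textbook regression-adjusted estimator, not DAE itself; the function name and the toy data are hypothetical.

```python
def control_variate_estimate(f_vals, g_vals, g_mean):
    """Regression-adjusted control-variate estimate of E[f].

    f_vals, g_vals: paired samples; g_mean: the known expectation of g.
    """
    n = len(f_vals)
    f_bar = sum(f_vals) / n
    g_bar = sum(g_vals) / n
    cov = sum((f - f_bar) * (g - g_bar) for f, g in zip(f_vals, g_vals)) / n
    var = sum((g - g_bar) ** 2 for g in g_vals) / n
    beta = cov / var if var > 0 else 0.0  # least-squares slope of f on g
    # Correct the plain sample mean by how far the sample of g missed its mean.
    return f_bar - beta * (g_bar - g_mean)


# Toy data where f = 2g + 3 exactly and E[g] = 3 is known, so E[f] = 9.
g = [1.0, 2.0, 4.0, 7.0]
f = [2.0 * x + 3.0 for x in g]
print(sum(f) / len(f))                      # plain Monte Carlo mean: 10.0
print(control_variate_estimate(f, g, 3.0))  # corrected estimate: 9.0
```

When f and g are perfectly correlated, as in this toy case, the correction removes all sampling error; in practice the variance reduction depends on how strongly they correlate.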
Convex training of Lipschitz-regularized shallow neural networks
Chao Yin, Antoine Lesage-Landry
Optimization · Theory
  • Introduction of a convex training method for Lipschitz-regularized shallow neural networks (SNNs).
  • The proposed method guarantees that the optimal network is no worse than the initial pre-trained network.
  • Demonstrated improvements in accuracy and robustness against adversarial attacks.
  • The convex program can be solved efficiently using standard optimization solvers.
The Token Is a Group Element: On Lie-Algebra Attention over Matrix Lie Groups
Przemyslaw Musialski
Robotics · Theory · Efficient ML
  • Introduces Lie-Algebra Attention, where tokens are elements of matrix Lie groups.
  • Attention scores are computed in closed form from Lie-algebra norms of relative poses, eliminating the need for learned kernels.
  • Demonstrates applicability to various matrix Lie groups, with empirical validation on SE(2), SO(3), and Aff(2).
  • Outperforms traditional learned-kernel methods while using significantly fewer parameters.
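To make the scoring rule concrete for SE(2), the sketch below scores a query pose against key poses by the norm of the Lie-algebra element log(g_q^{-1} g_k), computed with the closed-form SE(2) logarithm, and turns the negated norms into softmax weights. The negative-norm score, the rotation weight `w_rot`, the `temperature`, and the function names are illustrative assumptions rather than the paper's exact formulation. Poses are `(theta, (x, y))` tuples.

```python
import math


def se2_log_norm(theta_i, t_i, theta_j, t_j, w_rot=1.0):
    """Norm of log(g_i^{-1} g_j) for SE(2) poses g = (theta, (x, y))."""
    # Relative rotation, wrapped to [-pi, pi].
    dth = math.atan2(math.sin(theta_j - theta_i), math.cos(theta_j - theta_i))
    # Relative translation expressed in frame i: R(-theta_i) (t_j - t_i).
    dx, dy = t_j[0] - t_i[0], t_j[1] - t_i[1]
    c, s = math.cos(theta_i), math.sin(theta_i)
    tx, ty = c * dx + s * dy, -s * dx + c * dy
    # SE(2) log map: rho = V^{-1} t with V = [[a, -b], [b, a]].
    if abs(dth) < 1e-9:
        a, b = 1.0, 0.0  # small-angle limit: V -> identity
    else:
        a = math.sin(dth) / dth
        b = (1.0 - math.cos(dth)) / dth
    det = a * a + b * b
    rho_x = (a * tx + b * ty) / det
    rho_y = (-b * tx + a * ty) / det
    return math.sqrt(rho_x ** 2 + rho_y ** 2 + (w_rot * dth) ** 2)


def lie_attention_weights(query, keys, temperature=1.0):
    """Softmax over negated Lie-algebra norms of relative poses."""
    scores = [-se2_log_norm(query[0], query[1], th, t) / temperature
              for th, t in keys]
    m = max(scores)
    exps = [math.exp(sc - m) for sc in scores]
    z = sum(exps)
    return [e / z for e in exps]


query = (0.0, (0.0, 0.0))
keys = [(0.0, (0.0, 0.0)), (0.5, (1.0, 0.0)), (math.pi, (3.0, 2.0))]
print(lie_attention_weights(query, keys))  # the identical pose gets the most weight
```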
Evolutionary Two-Stage Hyperparameter Optimization Strategies for Physics-Informed Neural Networks
Fedor Buzaev, Dmitry Efremenko, Egor Bugaev, Andrei Ermakov, Denis Derkach, Daria Pugacheva, Fedor Ratnikov
Optimization
  • Introduction of a two-stage optimization strategy for PINNs to enhance training efficiency.
  • Demonstration that evolutionary algorithms outperform classical hyperparameter tuning methods.
  • Evidence-based guidelines for budget allocation between exploration and exploitation phases.
  • Significant reduction in mean error achieved under constrained computational resources.
Direct Advantage Estimation for Scalable and Sample-efficient Deep Reinforcement Learning
Hsiao-Ru Pan, Bernhard Schölkopf
Reinforcement Learning · Efficient ML · Theory
  • Extension of DAE to partially observable domains, enhancing its applicability.
  • Introduction of a discrete latent dynamics model to reduce computational complexity.
  • Demonstrated scalability with function approximator capacity while retaining efficiency.
  • Achieved competitive performance with significantly less data compared to existing methods.
Adversarial Bandit Optimization with Globally Bounded Perturbations to Convex Losses
Zhuoyu Cheng, Kohei Hatano, Eiji Takimoto
Optimization · Theory
  • Introduces a bandit optimization model with C-approximately convex and β-smooth function sequences.
  • Establishes expected regret guarantees that account for adversarial perturbations under a global budget.
  • Demonstrates that sublinear expected regret is achievable even with non-convex losses.
  • Modifies existing algorithms to separate contributions from structured convex components and perturbations.
Integrating national forest inventory, airborne lidar, and satellite imagery for wall-to-wall mapping of forest structure with computer vision
Luke J. Zachmann, David D. Diaz, Vincent A. Landau, Chelsey Walden-Schreiner, Tony Chang, Nathan E. Rutenbeck, Katharyn A. Duffy, Kiarie Ndegwa, Andreas Gros, Scott Conway, Guy Bayes
Computer Vision
  • Introduction of the VibrantForests framework for comprehensive forest structure mapping.
  • Utilization of satellite imagery and lidar data to estimate multiple forest attributes.
  • Demonstration of improved predictive capabilities over existing models, particularly in diverse forest conditions.
  • Provision of annual updates at high spatial resolution (10 meters) for effective forest management.
Low-Energy Reduced RISC-V Instruction Subset Processor for Tsetlin Machine Inference at the Edge
Chanda Gupta, Sanidhya Bhatia, Shaurya Priyadarshi, Himani Panwar, Rishad Shafik, Sudip Roy
Efficient ML
  • Introduction of a programmable RISC-V architecture tailored for Tsetlin Machine inference.
  • Instruction profiling and simplification techniques to enhance performance and reduce energy consumption.
  • Demonstrated superior accuracy of Tsetlin Machines compared to Binarized Neural Networks on various datasets.
  • Achieved up to 98% reduction in execution time and 29.7× reduction in energy consumption.
When, Where, and How: Adaptive Binning for Tabular Self-Supervised Learning
Daehwan Kim, Haejun Chung, Ikbeom Jang
Theory · Efficient ML
  • Introduction of Adaptive Binning for training-adaptive discretization in tabular SSL.
  • Feature-wise coarse-to-fine curriculum that refines discretization based on learning dynamics.
  • Integration of categorical and ordinal supervision for improved representation learning.
  • Demonstrated consistent performance gains across multiple medical tabular datasets.
Compositionality Emerges in a Narrow Depth-Connectivity Regime: Architecture Constraints and Solution Manifolds
Dat H. Do, Rushi Shah, Duc V. Le, Dianbo Liu
Theory · Generative Models · Computer Vision
  • Compositionality emerges in a narrow depth-connectivity regime, with specific sparsity patterns being crucial.
  • Gradient descent fails to find compositional solutions outside this regime, leading to fractured representations.
  • Similarity-based pruning (SP) and a depth predictor increase the likelihood of discovering compositional structures.
  • A theoretical framework is provided to explain the conditions under which compositional solutions are reachable.
Computational Identifiability
Lucius E.J. Bynum, Rajesh Ranganath, Kyunghyun Cho
Theory
  • Introduction of computational identifiability as a practical alternative to theoretical identifiability.
  • Formalization of the relationship between causal effect estimation and meta-learning.
  • Empirical validation of computational identifiability across various complex scenarios.
  • Framework allows for identification with small sample sizes and ambiguous data.
Flow Map Denoisers: Traversing the Distortion-Perception Plane for Inverse Problems
Nicolas Zilberstein, Morteza Mardani, Santiago Segarra
Computer Vision · Generative Models · Theory
  • Flow maps can represent a continuum of denoisers, enabling traversal of the distortion-perception plane with a single model.
  • The lookahead parameter allows control over the tradeoff between distortion and perceptual quality.
  • The method achieves exact optimality for Gaussian targets and shows promising empirical results for natural images.
  • The integration into a Plug-and-Play framework provides a versatile solver for various inverse problems.
The Significance of Style Diversity in Annotation-Free Synthetic Data Generation
Zahra Abbasiantaeb, Zeno Belligoli, Omar Essam, Mohammad Aliannejadi
NLP · Large Language Models · Generative Models
  • Proposes an annotation-free framework for synthetic dialogue generation.
  • Demonstrates that style diversity is more crucial than topic diversity for data utility.
  • Introduces two stylization models (Univ and Exam) for enhancing linguistic style.
  • Achieves up to 93.3% performance of human-annotated data in intent classification tasks.
Protein Representation Learning with Secondary-Structure and Energy-Filtered Hydrogen-Bond Graphs
Mohamed Mouhajir, Limei Wang, El Houcine Bergou, Hajar El Hammouti, Lamiae Azizi, Dongqi Fu
Graph Learning
  • Introduction of SSProNet, a graph neural network that integrates secondary structure and hydrogen-bond interactions for protein representation.
  • Utilization of biophysically grounded graph topology that reflects stabilizing forces rather than mere proximity.
  • Augmentation of residue nodes with secondary structure assignments to enhance local structural context.
  • Empirical validation shows consistent performance improvements across various protein-related tasks.
Closing the Social-Semantic Gap: SPSD for Edge-Based Prompt Compression in Cloud LLM Inference
Abhinit Sen, Ajeet Kumar, Manaranjan Pradhan
NLP · Large Language Models · Efficient ML
  • Introduces SPSD, a novel edge-based prompt compression technique for LLM inference.
  • Achieves an average reduction of 99.9 tokens per prompt while maintaining response quality.
  • Demonstrates significant energy savings, estimated at 70–270 μWh per call.
  • Utilizes a 4-bit quantized SLM to compress prompts before transmission to cloud LLMs.
Activation- and Influence-Aware Ranks (AIR): Function-Preserving SVD Compression for LLMs
Nico Harder, Daniel Becking, Karsten Mueller, Wojciech Samek
Large Language Models · Efficient ML · Optimization
  • AIR integrates activation and influence metrics for improved SVD-based compression of LLMs.
  • The method achieves over 18% lower perplexity than SVD-LLM(W) at 60% parameter retention.
  • AIR requires approximately 90% less calibration data while maintaining model quality.
  • The framework leads to significant gains in system-level efficiency, including reduced peak memory and latency.
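As a point of reference for the 60% figure: a rank-r truncated SVD of an m × n weight matrix can be stored as two factors (singular values folded into one of them) with r(m + n) parameters, so a uniform parameter-retention ratio ρ corresponds to a rank of at most ρ·mn/(m + n). AIR, as its name suggests, chooses ranks using activation and influence information, so the hypothetical helper below only shows the uniform-rank baseline arithmetic, not AIR's allocation rule.

```python
def rank_for_retention(m, n, retention):
    """Largest rank r whose SVD factors (m x r and r x n, singular values
    folded into one factor) keep at most `retention` * m * n parameters."""
    if not 0 < retention <= 1:
        raise ValueError("retention must be in (0, 1]")
    return int(retention * m * n / (m + n))


m, n = 4096, 4096  # hypothetical square projection matrix
r = rank_for_retention(m, n, 0.6)
print(r, r * (m + n) / (m * n))  # rank and the retention actually achieved
```

For a square matrix even full retention only allows rank n/2 under this storage scheme, which is why factorized SVD compression pays off only at substantially reduced ranks.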
SL-S4Wave: Self-Supervised Learning of Physiological Waveforms with Structured State Space Models
Feng Wu, Harsh Deep, Eric Lehman, Sanyam Kapoor, Guoshuai Zhao, Rahul Krishnan, Gari Clifford, Li-wei H Lehman
Time Series
  • SL-S4Wave outperforms existing supervised and self-supervised methods in arrhythmia detection.
  • The framework demonstrates high label efficiency, requiring fewer labeled examples for training.
  • It effectively models long-range temporal dependencies in noisy, multichannel physiological waveforms.
  • SL-S4Wave shows strong cross-domain generalization to unseen arrhythmia types.
Insulin4RL: Real-Time Insulin Management in the Intensive Care Unit for Offline Reinforcement Learning
Thomas Frost, Steve Harris
Reinforcement Learning
  • Insulin4RL is a new offline RL (ORL) dataset that captures real clinical decision-making without temporal discretization.
  • The dataset includes over 375,000 labeled insulin titration decisions from ICU patients, providing a rich resource for ORL research.
  • Baseline experiments demonstrate that varying temporal assumptions can lead to divergent policies in insulin management.
  • The paper emphasizes the need for realistic evaluation environments to avoid biased conclusions about model performance.
Data Bias Mitigation under Coverage Constraints & The Price of Fairness
Bruno Scarone, Alfredo Viola, Renée J. Miller
Optimization · Theory
  • Introduces coverage constraints to ensure adequate representation of intersectional subgroups in training data.
  • Balances bias mitigation with data efficiency, allowing for small approximation errors.
  • Formulates bias mitigation as an integer linear program to optimize data modification strategies.
  • Characterizes the cost of achieving fairness, aiding in decision-making for data governance.
PaAno+: Multiscale Encoding and Cross-Variable Attention for Time Series Anomaly Detection
Youji Zhu, Hongbing Wang, Wenchao Liu, Xiaodong Liu, Xiangguang Xiong
Time Series
  • PaAno+ introduces a lightweight model for time series anomaly detection that balances accuracy and computational efficiency.
  • The model employs multiscale feature extraction and cross-variable attention to enhance anomaly detection capabilities.
  • A novel self-supervised learning task is designed to improve the model's understanding of time series structure.
  • Extensive experiments show that PaAno+ achieves superior performance on benchmark datasets compared to existing methods.
When Calibration Fails the Vulnerable Hospital: Federated Conformal Risk Control via Risk-Curve Shrinkage
Nafis Fuad Shahid
Federated Learning · Computer Vision · Theory
  • Quantifies the marginal-conditional coverage gap in federated conformal risk control (CRC) using real brain tumor data.
  • Proposes a shrinkage-based federated CRC protocol that improves prediction set efficiency while maintaining coverage.
  • Demonstrates that naively pooling calibration scores can cause significant coverage violations at individual institutions.
  • Shows that finite-sample correction terms are necessary to maintain coverage guarantees.
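For context on the finite-sample correction: in single-site conformal risk control with a loss that is non-increasing in a threshold λ and bounded by B, a standard choice is the smallest λ with (n/(n+1))·R̂_n(λ) + B/(n+1) ≤ α, where R̂_n is the empirical risk on n calibration points. The B/(n+1) term is what grows large at small sites. The hypothetical helper below implements that rule on a grid of λ values; it is not the paper's shrinkage protocol.

```python
def crc_threshold(loss_curves, lambdas, alpha, B=1.0):
    """Smallest grid lambda whose corrected risk is <= alpha.

    loss_curves[i][k] is calibration point i's loss at lambdas[k]; lambdas
    are ascending and every curve is assumed non-increasing and <= B.
    """
    n = len(loss_curves)
    for k, lam in enumerate(lambdas):
        emp_risk = sum(curve[k] for curve in loss_curves) / n
        # Finite-sample-corrected risk bound used by conformal risk control.
        if n / (n + 1) * emp_risk + B / (n + 1) <= alpha:
            return lam
    # No grid value qualifies: fall back to the largest (assumed most
    # conservative) threshold.
    return lambdas[-1]


# Toy miss-indicator losses for n = 4 calibration points.
lambdas = [0.0, 0.25, 0.5, 0.75, 1.0]
curves = [
    [1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 1, 1, 0],
]
print(crc_threshold(curves, lambdas, alpha=0.5))  # 0.75
```

With only four calibration points the correction B/(n+1) = 0.2 already consumes 40% of the α = 0.5 budget; with n = 9 and α = 0.1, the rule can only be met where the empirical risk is exactly zero.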
3D-DLP: Self-Supervised 3D Object-Centric Scene Representation Learning
Ellina Zhang, Madhaven Iyengar, Amir Zadeh, Chuan Li, Deepak Pathak, David Held, Tal Daniel
Computer Vision · Robotics
  • Introduction of the first self-supervised object-centric scene representation for colored 3D voxels.
  • Demonstration of interpretable and controllable 3D latent particles for scene representation.
  • Methodological innovations including an appearance-aware K-means keypoint prior and chroma reconstruction loss.
  • Significant performance gains in robotic manipulation tasks using 3D-DLP compared to traditional methods.
What Makes Effective Supervision in Latent Chain-of-Thought: An Information-Theoretic Analysis
Xinghao Chen, Chak Tou Leong, Wenjin Guo, Jian Wang, Wenjie Li, Xiaoyu Shen
NLP · Large Language Models · Theory
  • Identifies dual collapse in outcome supervision as a barrier to effective latent reasoning.
  • Proposes a framework decomposing process supervision into Trajectory and Space Supervision.
  • Introduces the Unified Latent Probe (ULP) for measuring mutual information in latent reasoning.
  • Finds that generative reconstruction is more effective than geometric compression for preserving information capacity.
Constrained hybrid modelling to predict microbial dynamics and organic matter turnover in soil systems
Paul Collart, Juergen Gall, Andrea Schnepf, Holger Pagel, Lars Doorenbos
Theory · Optimization
  • Introduction of HySoMi, a hybrid modeling framework for soil carbon cycling predictions from microbial genomic data.
  • Integration of ecological theory into the model through a constrained loss function to enhance prediction accuracy.
  • Demonstration of improved performance over traditional models, even with small training datasets.
  • Evaluation on both synthetic and real datasets, showcasing the model's effectiveness in learning unmeasurable components.
Emyx: Fast and efficient all-atom protein generation
Nicholas J. Williams, Ward Haddadin, Matteo P. Ferla, Constantin Schneider, Nicholas B. Woodall, Ruby Sedgwick, Christian D. Madsen, Andrew L. Hopkins, Edward O. Pyzer-Knapp
Generative Models · Efficient ML
  • Emyx introduces a simplified architecture for all-atom protein generation, reducing training costs and improving efficiency.
  • The model outperforms existing state-of-the-art methods in generating proteins with high structural novelty and accuracy.
  • Emyx achieves significant computational savings, requiring only 682 GPU-hours of training, substantially less than competing methods.
  • The model bridges flow matching training with diffusion model sampling techniques, enhancing its applicability.
Multi-Modal Contrastive Learning for Implicit Earth Embeddings via Location Tying
Jonathan Hecht, Lukas Arzoumanidis, Ziyue Li, Youness Dehbi
Multimodal
  • Introduction of two multimodal contrastive learning architectures: MELT and SALT.
  • Both architectures utilize unpaired geospatial data to improve location encoding.
  • Performance of MELT and SALT matches the best existing two-modality baseline.
  • Increasing modality diversity does not necessarily enhance performance, indicating limitations in the location encoder.
FlexLAM: Resolving the Bottleneck Trade-off in Latent Action Learning
Takanori Yoshimoto, Yang Hu, Naruya Kondo, Tatsuya Matsushima
Robotics · Computer Vision · Reinforcement Learning
  • Identifies a bottleneck trade-off in fixed-capacity latent action models (LAMs) that affects action alignment.
  • Introduces retained-prefix training for variable-length latent actions, enhancing transition decoding.
  • Demonstrates that FlexLAM outperforms fixed-capacity LAMs across all evaluated token budgets.
  • Supports inference-time token-budget adjustments without retraining.
Interactive Pareto navigation for deep multi-task learning
Augustina C. Amakor, Konstantin Sonntag, Sebastian Peitz
Optimization
  • Introduction of the Preference Pareto Exploration (PPE) framework for interactive navigation of Pareto fronts.
  • Utilization of a predictor-corrector method to efficiently explore Pareto-optimal solutions.
  • Avoidance of explicit Hessian computations through the use of Krylov subspace methods.
  • Demonstration of the method's functionality and performance on toy problems and deep learning tasks.
Matching Markets meet Cumulative Prospect Theory: Towards Optimal and Adversarially Robust Learning
Ananya Kunisetty, Avishek Ghosh
Theory · Optimization
  • Introduces Cumulative Prospect Theory (CPT) in the context of multi-agent multi-armed bandit problems.
  • Derives regret bounds for a CPT-weighted learning algorithm in matching markets.
  • Implements an improved algorithm that optimally selects arms during exploration to achieve lower regret.
  • Addresses adversarial settings with corrupted rewards, ensuring robust learning outcomes.
Exploring the potential of AlphaEarth and TESSERA embeddings for Fine-scale Local Climate Zone Mapping: A case study across five cities in Switzerland
Htet Yamin Ko Ko, Clement Atzberger
Computer Vision
  • TESSERA embeddings outperform traditional Sentinel-1/2 composites and AlphaEarth for LCZ mapping.
  • An attention-based U-Net architecture is effective for generating fine-scale LCZ maps.
  • The study demonstrates the potential of embedding datasets to reduce preprocessing and manual feature engineering.
  • Higher-resolution reference data significantly enhances classification accuracy.
Multi-Task Bayesian In-Context Learning
Qingyang Zhu, Eric Karl Oermann, Kyunghyun Cho
Theory · Efficient ML · Time Series
  • Introduces a flexible framework for test-time adaptation in Bayesian predictive inference.
  • Demonstrates that the proposed method matches oracle Bayesian predictors across diverse tasks.
  • Achieves significantly faster inference compared to traditional Bayesian methods.
  • Shows robust generalization under controlled out-of-meta-distribution prior shifts.
Hard or Just Unreached? Diagnosing the Sampling Blind Spot in Math-Reasoning Difficulty Estimation
Luca Zhou, Sajel Shah, Emanuele Rodolà, Roberto Dessì
NLP · Reinforcement Learning · Large Language Models
  • The pass@k metric has a persistent blind spot for the hardest examples in math reasoning tasks.
  • A deterministic decoding regime can solve a significant fraction of problems that sampling methods fail to reach.
  • Activation grafting serves as an effective diagnostic tool to identify and recover hard examples from the model's residual stream.
  • Current difficulty estimation methods may misclassify problems, conflating 'hard' with 'unreached'.
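The blind spot is visible directly in the widely used unbiased pass@k estimator (introduced alongside the HumanEval benchmark), pass@k = 1 - C(n - c, k) / C(n, k) for n samples of which c are correct: any problem with c = 0 scores exactly 0 for every k, so it is labeled maximally hard even if another decoding regime, such as greedy decoding, would solve it. The helper name below is hypothetical; the formula is the standard one.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate from n sampled attempts, c of them correct."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


for c in (0, 1, 5):
    print(c, [round(pass_at_k(100, c, k), 3) for k in (1, 10, 100)])
```

In this toy table, a problem with even a single correct sample out of 100 moves from near 0 toward 1 as k grows, whereas c = 0 stays at 0 no matter how large k becomes.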
Zero-Inflated Gaussian Distributions Enable Parameter-Space Sparsity in Estimation-of-Distribution Algorithms
Andreas Faust, Sven Nitzsche, Juergen Becker
Optimization
  • Introduction of zero-inflated Gaussian distributions for sparse parameter optimization in EDAs.
  • Joint optimization of sparsity patterns and active values without additional hyperparameters.
  • Identification of latent parameters from observed samples, enhancing model robustness.
  • Empirical results demonstrate superior performance of ZIG-EDA compared to traditional methods.
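A zero-inflated Gaussian places a point mass at exactly zero (probability π) and a Gaussian N(μ, σ²) elsewhere. Because a continuous Gaussian draw is almost never exactly zero, zeros in elite samples reveal which parameters were switched off, which makes a simple refit possible. The sketch below shows per-dimension sampling and one such moment-based refit; it is a generic illustration with hypothetical names and clamping constants, not ZIG-EDA's actual update rule.

```python
import random


def sample_zig(pi, mu, sigma, rng):
    """One candidate: dimension d is exactly 0 with probability pi[d],
    otherwise drawn from N(mu[d], sigma[d]^2)."""
    return [0.0 if rng.random() < p else rng.gauss(m, s)
            for p, m, s in zip(pi, mu, sigma)]


def fit_zig(elites, eps=0.01, min_sigma=1e-3):
    """Refit per-dimension (pi, mu, sigma) from elite candidates."""
    pi, mu, sigma = [], [], []
    for d in range(len(elites[0])):
        column = [x[d] for x in elites]
        active = [v for v in column if v != 0.0]
        # Share of exact zeros, clamped so no dimension is frozen forever.
        pi.append(min(max(1.0 - len(active) / len(column), eps), 1.0 - eps))
        if active:
            m = sum(active) / len(active)
            var = sum((v - m) ** 2 for v in active) / len(active)
            mu.append(m)
            sigma.append(max(var ** 0.5, min_sigma))
        else:
            mu.append(0.0)
            sigma.append(min_sigma)
    return pi, mu, sigma


rng = random.Random(0)
population = [sample_zig([0.5, 0.1], [0.0, 2.0], [1.0, 0.5], rng) for _ in range(50)]
elites = population[:10]  # placeholder: a real EDA keeps the best-scoring candidates
print(fit_zig(elites))
```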
Weibull Weight-Scale Parameter Evolution under AdamW Training Dynamics
Tiexin Ding
Optimization · Theory · Large Language Models
  • Establishes a connection between Weibull weight-scale parameter λ and AdamW squared-norm dynamics.
  • Demonstrates that alignment force significantly influences the rise phase of λ, contributing 88–94% of the force budget.
  • Identifies a transition from growth to relaxation of λ corresponding to a balance between alignment and decay forces.
  • Introduces a spline displacement method for recovering alignment force from sparse checkpoints with high accuracy.
When to Trust, How to Distill: Multi-Foundation Model Guidance for Lightweight, Robust Scientific Time Series Forecasting
Rupasree Dey, Abdul Matin, Nathan Orwick, Yao Zhang, Shrideep Pallickara, Sangmi Lee Pallickara
Time Series · Efficient ML · Theory
  • Introduction of Guard, a framework for dynamic multi-teacher knowledge distillation.
  • Utilization of a contextual router for adaptive teacher selection based on input statistics.
  • Implementation of an uncertainty-aware gating mechanism to filter unreliable teacher guidance.
  • Demonstrated significant RMSE reduction in various scientific forecasting tasks.
ADaPT: Token-Level Decoupling for Efficient Large Reasoning Models
Tingyun Li, Zishang Jiang, Jinyi Han, Xinyi Wang, Sihang Jiang, Han Xia, Zhaoqian Dai, Shuguang Ma, Fei Yu, Jiaqing Liang, Yanghua Xiao
NLP · Large Language Models · Efficient ML
  • Identifies sequence-level coupling as a primary cause of performance degradation in efficient reasoning methods.
  • Proposes ADaPT, a token-level framework that decouples efficiency and correctness signals during training.
  • Enables precise control over the efficiency-performance trade-off at inference time.
  • Demonstrates significant reductions in inference costs without sacrificing reasoning performance.
On the QUEST for Uncertainty Quantification via Highest Density Regions
Sam Goring, Tom Kuipers, Nicola Paoletti, David S. Watson
Theory
  • QUEST provides a novel framework for uncertainty quantification based on highest density regions.
  • The approach addresses limitations of traditional UQ methods that rely on proper scoring rules.
  • QUEST measures satisfy important axioms from the UQ literature, enhancing their theoretical soundness.
  • Empirical evaluations show that QUEST performs better than standard UQ measures in regression tasks.
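A highest density region at level 1 - α is the smallest region that holds probability 1 - α. For a unimodal one-dimensional predictive distribution represented by samples, it can be estimated as the shortest interval covering the required fraction of sorted samples, and its width gives a simple uncertainty score. The sketch below is that textbook estimator with a hypothetical function name; QUEST's measures are defined in the paper and may differ, and a multimodal distribution would need a union of intervals instead.

```python
import math


def shortest_interval(samples, coverage):
    """Shortest interval containing ceil(coverage * n) of the samples:
    an empirical HDR estimate for a unimodal 1-D distribution."""
    xs = sorted(samples)
    n = len(xs)
    k = math.ceil(coverage * n)  # number of samples the interval must contain
    if not 1 <= k <= n:
        raise ValueError("coverage must leave at least one sample inside")
    # Slide a window of k consecutive sorted samples and keep the narrowest.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


lo, hi = shortest_interval([0.0, 1.0, 2.0, 50.0], coverage=0.75)
print(lo, hi, hi - lo)  # the outlier at 50.0 is excluded
```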
Algebraic Dead Directions in LayerNorm Transformers: A Forward-Pass-Only Diagnostic at LLM Scale
Tejas Pradeep Shirodkar, P. J. Narayanan
Theory · Large Language Models · Optimization
  • Introduces a forward-pass-only method to identify dead directions in LayerNorm transformers.
  • Derives an algebraic kernel direction from the LayerNorm scale parameter, requiring no complex computations.
  • Validates the method on 14 pretrained transformers, achieving high accuracy in predicting dead directions.
  • Demonstrates that training significantly deepens the kernel direction and opens additional dead directions.
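One direction follows purely from LayerNorm's algebra: with y = γ ⊙ (x - mean(x)) / sqrt(var(x) + ε) + β, the centered coordinates sum to zero, so Σ_i (y_i - β_i) / γ_i = 0 for every input. The shifted output y - β is therefore always orthogonal to the vector with entries 1/γ_i, a direction computable from γ alone. Whether this matches the paper's exact kernel construction is an assumption; the function names below are hypothetical.

```python
import math


def layer_norm(x, gamma, beta, eps=1e-5):
    """Standard LayerNorm over a single vector."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    return [g * (v - mean) * inv_std + b for v, g, b in zip(x, gamma, beta)]


def kernel_direction(gamma):
    """Unit vector along 1/gamma (requires every gamma_i != 0; a zero
    gamma_i instead makes coordinate i constant, i.e., trivially dead)."""
    d = [1.0 / g for g in gamma]
    norm = math.sqrt(sum(v * v for v in d))
    return [v / norm for v in d]


gamma = [1.5, 0.5, -2.0, 0.8]
beta = [0.1, 0.0, -0.3, 0.2]
u = kernel_direction(gamma)
for x in ([0.3, -1.2, 2.5, 0.0], [10.0, 4.0, -7.0, 1.0]):
    y = layer_norm(x, gamma, beta)
    # Component of (y - beta) along u: zero up to rounding for any input x.
    print(sum((yi - bi) * ui for yi, bi, ui in zip(y, beta, u)))
```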
Advances in Scientific Machine Learning for Coupled Fluid Flow and Transport
Gabriel F. Barros, Rômulo M. Silva, Alvaro L. G. A. Coutinho
Theory · Efficient ML
  • Introduction of SciML methods for modeling complex fluid flow and transport phenomena.
  • Review of linear and nonlinear surrogate modeling techniques, including PINNs and β-VAEs.
  • Presentation of new contributions in modeling turbidity currents and thermal flows.
  • Discussion of computational challenges and the role of HPC strategies in SciML.
Latent Confounded Causal Discovery via Lie Bracket Geometry
Sridhar Mahadevan
Theory · Graph Learning
  • Introduces BRIDGE and SKFM algorithms for causal discovery under latent confounding.
  • Establishes that latent confounding obstructs coherent causal information transport.
  • Demonstrates high performance on synthetic data while highlighting challenges with real data.
  • Combines information-geometric and categorical methods for improved causal inference.
Federated Bilevel Performative Prediction
Liangxin Qian, Chang Liu, Xuanyu Cao, Jun Zhao, Kwok-Yan Lam
Optimization · Federated Learning · Theory
  • Introduces federated bilevel performative prediction, addressing decision-dependent distribution shifts.
  • Establishes the concept of the federated bilevel performatively stable (FBPS) point with conditions for its existence and uniqueness.
  • Develops two algorithms, FBi-RRM and FBi-SGD, with convergence guarantees under specific conditions.
  • Demonstrates improved performance in strategic learning tasks and validates stability thresholds through experiments.
Self-Adaptive Scale Handling for Forecasting Time Series with Scale Heterogeneity
Xu Zhang, Zhengang Huang, Yunzhi Wu, Xun Lu, Erpeng Qi, Yunkai Chen, Zhongya Xue, Peng Wang, Wei Wang
Time Series
  • Introduction of a self-Adaptive Scale-handling (AS) module for scale-heterogeneous time series forecasting.
  • The AS module includes Scale Calibrating (SC) and Scaling Selection (SS) components to optimize scale handling.
  • Empirical results demonstrate significant improvements in forecasting accuracy when using the AS module with existing models.
  • The approach preserves semantic discriminability while reducing inverse-scaling errors.
Connect the Dots: Training LLMs for Long-Lifecycle Agents with Cross-Domain Generalization Via Reinforcement Learning
Yanxi Chen, Weijie Shi, Yuexiang Xie, Boyi Hu, Yaliang Li, Bolin Ding, Jingren Zhou
Large Language Models · Reinforcement Learning
  • Introduction of the CoD framework for training LLMs to enhance long-lifecycle agent capabilities.
  • End-to-end reinforcement learning approach interleaving task-solving and context-updating episodes.
  • Empirical validation showing improved task-solving performance through self-updated context.
  • Demonstration of cross-domain generalization potential of the CoD meta-capability.