AI-generated summaries

Today's ML research, without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

Papers today: 59
Update frequency: every 8 hours
Days of history: 7
Reinforcement Learning for Neural Model Editing
Shaivi Malik
Reinforcement Learning Computer Vision NLP
  • Introduces a reinforcement learning framework for neural model editing.
  • Develops two environments: MaskWorld and ShiftWorld for different weight modification strategies.
  • Achieves significant improvements in bias mitigation and machine unlearning tasks.
  • Demonstrates the potential of RL to automate model editing without specialized algorithms.
The Mathematics of AI Winters: The mathematical Taxonomy of Paradigm Fragility in AI Winter
Miquel Noguer i Alonso, David Pacheco Aznar
Theory
  • AI winters were influenced by formal mathematical barriers, not just engineering failures.
  • Key limitations include representational capacity, optimization hardness, and statistical learnability.
  • The paper synthesizes existing mathematical theories to provide a unified interpretation of AI winters.
  • Later breakthroughs in mathematics and algorithms were crucial for overcoming the identified barriers.
Clustering Node Attributed Networks with Graph Neural Networks and Self Learning
Rodrigo de Sapienza Luna, Daniel Ratton Figueiredo
Graph Learning
  • Introduces DCSL-GNN, a novel unsupervised framework for clustering attributed networks.
  • Utilizes self-learning and context generation across multiple rounds to improve node representation.
  • Demonstrates superior performance compared to traditional clustering methods that use only network structure or node attributes.
  • Empirical results indicate competitiveness with state-of-the-art methods on real datasets.
Understanding helpfulness and harmless tension in reward models
Eshaan Tanwar, Pepa Atanasova
NLP Large Language Models Reinforcement Learning
  • Mixed-objective reward models underperform compared to single-objective models due to alignment tension.
  • Neurons associated with helpfulness and harmlessness objectives exhibit interference, affecting model performance.
  • A significant proportion of neurons are shared between the two objectives, contributing to alignment tension.
  • The study provides insights into the internal mechanisms of reward models and their representation of alignment objectives.
Beyond the Commitment Boundary: Probing Epiphenomenal Chain-of-Thought in Large Reasoning Models
Daniel Scalena, Sara Candussio, Luca Bortolussi, Elisabetta Fersini, Malvina Nissim, Gabriele Sarti
NLP Large Language Models Interpretability
  • Introduces a causal framework for analyzing CoT reasoning in language models.
  • Identifies a commitment boundary where models stabilize their answers.
  • Demonstrates that reasoning beyond the commitment boundary is often redundant.
  • Uses lightweight attention probes to predict answer-formation stages accurately.
When Does Routing Become Interpretable? Causal Probes on Block Attention Residuals
Aydin Javadov
Interpretability Large Language Models
  • Introduces a routing-ablation framework for analyzing Block AttnRes models.
  • Demonstrates that explicit depth routing alone does not guarantee mechanistic interpretation.
  • Identifies three localized causal motifs in the trained Block AttnRes model.
  • Finds a sharp dissociation between routing mass and causal importance.
Improving Crash Frequency Prediction from Simulated Traffic Conflicts Using Machine Learning Based Microsimulation
Xian Liu, Carlo G. Prato, Gustav Markkula
Optimization
  • ML-based behavior models improve the realism of traffic microsimulation.
  • The study demonstrates that conflicts generated by ML models yield better crash predictions than conflicts generated by rule-based models.
  • Current ML models struggle to produce realistic crash scenarios despite accurately simulating conflicts.
  • Extreme Value Theory is effectively used to model crash frequency from simulated conflicts.
WHAR Arena: Benchmarking the State of the Art in Efficient Wearable Human Activity Recognition
Maximilian Burzer, Tobias King, Till Riedel, Michael Beigl, Tobias Röddiger
Efficient ML Time Series
  • Introduction of a standardized benchmarking framework for WHAR.
  • Curation of 30 datasets and 17 models to facilitate fair comparisons.
  • Evaluation of performance metrics alongside deployment efficiency.
  • Identification of a distributed state of the art in WHAR rather than dominance by a single architecture.
Boosting Direct Preference Optimization with Penalization
Pengwei Sun
NLP Large Language Models Optimization
  • DPOP enhances DPO by incorporating a gated penalty on reference-greedy responses.
  • The penalty is activated only when the current policy misclassifies the preferred response.
  • Empirical results show that DPOP outperforms existing methods such as DPO, SimPO, and AlphaDPO.
  • The method demonstrates the utility of reference-greedy responses in preference optimization.
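For orientation, the standard DPO objective that DPOP builds on can be sketched as follows. This is a minimal sketch, not the paper's implementation: the function names, the `beta` default, and the reading of "misclassifies the preferred response" as a negative implicit-reward margin are all assumptions. The gated penalty on reference-greedy responses is not reproduced, because the summary does not give its form.

```python
import math

def dpo_margin(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Implicit-reward margin between the preferred (w) and rejected (l) response.

    Inputs are sequence log-probabilities under the policy and the reference model.
    """
    return beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))

def dpo_loss(margin):
    """Standard DPO loss, -log(sigmoid(margin)), computed without overflow."""
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

def gate_is_open(margin):
    """Assumed gate: the policy ranks the rejected response above the preferred one."""
    return margin < 0
```

Under this reading, an extra penalty term would be added to `dpo_loss` only for pairs where `gate_is_open` returns `True`.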
CLARITree: Cholesky and Lookahead Accelerations for Regression with Interpretable Piecewise Linear Trees
Yixiao Wang, Hayden McTavish, Varun Babbar, Margo Seltzer, Cynthia Rudin
Efficient ML Interpretability Optimization
  • CLARITree combines lookahead search strategies with Cholesky updates for efficient regression tree construction.
  • The algorithm maintains numerical stability and computational feasibility for continuous-feature searches.
  • Empirical results show that CLARITree consistently outperforms greedy induction methods and scales better than optimal baselines.
  • The method achieves a strong trade-off between runtime and accuracy, making it suitable for large-scale problems.
Multi-Bitwidth Quantization for LLMs Using Additive Codebooks
Liza Babaoglu, Shuangyi Chen, Ashish Khisti
Large Language Models Efficient ML Theory
  • Drop-by-Drop enables inference-time precision control for LLMs without retraining.
  • The method is grounded in information theory and successive refinement, allowing for progressive compression.
  • It utilizes Matryoshka-style supervision to enhance the training of additive codebooks.
  • The framework maintains low perplexity and strong task accuracy across multiple bitwidths.
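The idea of additive codebooks with progressive precision can be illustrated with a toy residual quantizer. This is a sketch under stated assumptions, not the Drop-by-Drop method itself: it quantizes a single scalar, the codebooks are hand-picked rather than learned, and the names `encode`, `decode`, and `CODEBOOKS` are hypothetical.

```python
def encode(x, codebooks):
    """Greedy residual encoding: pick one codeword per codebook, coarse to fine."""
    residual, indices = x, []
    for book in codebooks:
        idx = min(range(len(book)), key=lambda i: abs(residual - book[i]))
        indices.append(idx)
        residual -= book[idx]
    return indices

def decode(indices, codebooks, stages):
    """Sum the first `stages` codewords; using fewer stages spends fewer bits."""
    return sum(book[i] for book, i in zip(codebooks[:stages], indices[:stages]))

# Hypothetical codebooks: each stage halves the step size and contains 0,
# so adding a stage can never increase the reconstruction error.
CODEBOOKS = [[-1.0, 0.0, 1.0], [-0.5, 0.0, 0.5], [-0.25, 0.0, 0.25]]
```

A single stored code can then be decoded at several precisions, for example `decode(encode(0.8, CODEBOOKS), CODEBOOKS, stages=s)` for `s` from 1 to 3, without re-encoding.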
Let's Ask Gauss: Improved One-Run Privacy Auditing
Adya Agrawal, Yu Wei, Jaspal Singh, Malik Magdon-Ismail, Vassilis Zikas
Theory Efficient ML Federated Learning
  • Introduces a Gaussian-pair auditor for improved one-run privacy auditing in DP-SGD.
  • Demonstrates that canary-aligned scores converge to a Gaussian distribution, allowing for tighter privacy bounds.
  • Proves practical convergence guarantees for the Gaussian asymptotics within typical training steps.
  • Achieves significant improvements in empirical lower bounds compared to existing auditing methods.
The Metric Picks the Winner: Evaluation Choice Flips Model Rankings for Drug-Response Prediction in Unseen Chemistry
Dhruv Agarwal, Riya Bisht
Theory Optimization Interpretability
  • Complex models often fail to outperform simple baselines in drug-response prediction when faced with unseen chemistry.
  • A staged approach combining baseline reporting, non-parametric retrieval, and fusion with chemistry embeddings improves prediction accuracy.
  • Model rankings can invert based on the evaluation metric, emphasizing the need for careful metric selection in model assessment.
  • Deep learning models can outperform simpler models when evaluated with a well-calibrated metric that reflects true predictive performance.
Speculative Rollback Correction for Quality-Diverse Web Agent Imitation
Longkun Hao, Hongyu Lin, Hao Li, Zhichao Yang, Haojie Hao, Dongshuo Huang, Haitao Yang, Hongyu Ge, Mingjie Xie, Yanjun Wu, Zihao Yin, Yan Bai, Yihang Lou
Reinforcement Learning Robotics Optimization
  • Introduction of Speculative Rollback Correction (SRC) for web agent imitation learning.
  • SRC allows for localized expert intervention, preserving useful exploration while correcting harmful actions.
  • The framework achieves significant performance gains on long-horizon tasks compared to baseline methods.
  • SRC supports the retention of diverse solution paths, enhancing the training signal for agents.
Quantizing Time-Series Models As Dynamical Systems: Trajectory-Based Quantization Sensitivity Score
Mariya Pavlova, Harrison Bo Hua Zhu, Elizaveta Semenova, Yingzhen Li
Time Series Efficient ML Theory
  • Introduction of Trajectory-based Quantization Sensitivity Score (TQS) for quantization sensitivity analysis.
  • Decoupling of sensitivity estimation from quantization choices allows for flexible quantization budget planning.
  • Development of TQS-PTQ, a calibration-free mixed-precision quantization framework.
  • Identification of distinct quantization sensitivity patterns in time-series models compared to large language models.
Dolph2Vec: Self-Supervised Representations of Dolphin Vocalizations
Chiara Semenzin, Faadil Mustun, Roberto Dessi, Pierre Orhan, Alexis Emanuelli, Yair Lakretz, Gonzalo de Polavieja, German Sumbre
Audio & Speech
  • Dolph2Vec is the first large-scale, species-specific self-supervised learning (SSL) model for dolphin vocalizations.
  • The dataset includes over 180,000 whistles collected over five years, enabling detailed analysis of dolphin communication.
  • Dolph2Vec significantly outperforms general-purpose models in signature whistle classification and whistle detection tasks.
  • The model's embeddings capture interpretable acoustic units, aiding in the understanding of dolphin communication patterns.
Order Is Not Control
Gareth Seneque, Lap-Hang Ho, Nafise Erfanian Saeedi, Jeffrey Molendijk, Tim Elson
Theory Interpretability Large Language Models
  • Control requires a receiver-gated response law that maps states and actions to measurable outcomes.
  • Order is distinct from control; interventions can induce order without achieving control.
  • Empirical evidence from biological systems and LLMs supports the proposed response law framework.
  • Local control is defined by the ability to move a target response while keeping side effects bounded.
Scale Buys Interpolation, Structure Buys a Horizon: Certified Predictability for Equivariant World Models
Hongbo Wang
Theory Robotics Time Series
  • Introduces a certified horizon for equivariant latent world models, stratified by Lyapunov spectrum.
  • Establishes that only equivariant models can achieve a predictable horizon, with a matching lower bound for approximate equivariance.
  • Empirical validation shows that equivariant networks significantly outperform non-equivariant models in predicting chaotic dynamics.
  • The certificate allows for training-free audits of pretrained models, enhancing trustworthiness without additional data.
Once-for-All: Scalable Simultaneous Forecasting via Equilibrium State Estimation
Beinan Xu, Andy Song, Jiti Gao, Feng Liu
Time Series Efficient ML Theory
  • ESE enables simultaneous forecasting of multiple systems in a single pass, improving efficiency.
  • The method demonstrates a 10–70× speedup compared to state-of-the-art methods while maintaining accuracy.
  • ESE can be integrated with existing predictors, enhancing their capabilities for multi-prediction.
  • The approach is robust under diverse perturbations and scales effectively with the number of systems.
How Useful is Causal Invariance for Domain Adaptation in Finite-Sample Settings?
Julia Kostin, Kasra Jalaldoust, Elias Bareinboim, Samory Kpotufe, Fanny Yang
Theory
  • Causal invariance can improve supervised domain adaptation by identifying invariant predictors.
  • Finite-sample gains depend on the target-risk margins and finite-source estimation errors.
  • An adaptive aggregation procedure can outperform target-only learning under certain conditions.
  • The study connects theoretical results to structural shifts in linear Structural Causal Models (SCMs).
Limits of spectral learning under noise
Sabin Roman, Ljupco Todorovski, Saso Dzeroski, Marta Sales-Pardo, Roger Guimera
Theory Interpretability
  • Noise induces a predictable drift in spectral coefficient vectors.
  • The magnitude of drift is related to the effective number of active spectral modes.
  • A closed-form expression for the overlap between noisy and noiseless coefficients is derived.
  • Numerical experiments validate the theoretical predictions across multiple spectral bases.
Disparate Impact in Synthetic Data Generation
Paul Andrey, Michaël Perrot, Batiste Le Bars, Marc Tommasi
Generative Models Graph Learning Theory
  • Introduces a new definition of fairness in synthetic data generation (SDG) based on disparate impact.
  • Highlights the importance of assessing utility equality across sensitive groups.
  • Investigates the causes of disparate impact, including estimation errors and sampling biases.
  • Proposes a group-wise modeling approach to enhance utility and fairness.
Crossing the Validation Crisis: Cross-Validation Reduces Benchmarking Variance Surprisingly Well
Célestin Eve, Gaël Varoquaux, Thomas Moreau
Theory
  • Cross-validation significantly reduces benchmarking variance and increases confidence in performance estimates.
  • The concept of 'sample gain' quantifies the benefits of using multiple cross-validation splits.
  • Diminishing returns from additional splits occur later than expected, suggesting more splits can be beneficial.
  • A dynamic early-stopping procedure for cross-validation can optimize computational efficiency.
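As a point of reference, plain k-fold cross-validation with a spread estimate across splits can be sketched as below. This is a minimal sketch, not the paper's procedure: `kfold_indices`, `cross_validate`, and the `score_fn(train, test)` callback are hypothetical names, and the paper's "sample gain" measure and early-stopping rule are not reproduced.

```python
import random
import statistics

def kfold_indices(n, k, seed=0):
    """Shuffle range(n) and cut it into k near-equal test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(score_fn, n, k, seed=0):
    """Return (mean, standard error) of score_fn(train, test) over k >= 2 folds.

    The standard error here is the naive one: it treats fold scores as
    independent, which they are not, so read it as a rough indication only.
    """
    scores = []
    for test in kfold_indices(n, k, seed):
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        scores.append(score_fn(train, test))
    sem = statistics.stdev(scores) / len(scores) ** 0.5
    return statistics.mean(scores), sem
```

Here `score_fn` would fit a model on the `train` indices and return its metric on the `test` indices; averaging over more splits is what reduces the variance of the reported number.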
Boltzmann Attention: Learnable Ising Couplings for Cooperative Attention
Gilhan Kim, Daniel K. Park
NLP Large Language Models Theory
  • Boltzmann Attention introduces learnable pairwise couplings to enhance attention mechanisms.
  • The method outperforms standard softmax attention in tasks involving longer sequences.
  • Ablation studies confirm that improvements are due to the learnable couplings.
  • The Ising model framework allows for potential integration with quantum computing techniques.
Multimodal Graph Negative Learning
Zhengyu Wu, Xu Wang, Hongchao Qin, Xunkai Li, Guang Zeng, Rong-Hua Li, Guoren Wang
Graph Learning Multimodal
  • Introduces GraphMNL, a framework for addressing semantic imbalance in MAGs.
  • Utilizes Negative Learning to guide inferior branches without forcing imitation of dominant branches.
  • Implements a graph-aware reliability arbitration mechanism for branch selection.
  • Achieves significant performance improvements over existing methods on benchmark datasets.
Rubric-Guided Self-Distillation: Post-Training Without Rubric Verifiers
MohammadHossein Rezaei, Anas Mahmoud, Zihao Wang, Utkarsh Tyagi, Advait Gosai, Razvan-Gabriel Dumitru, Aakash Sabharwal, Bing Liu, Yunzhong He
NLP Large Language Models Reinforcement Learning
  • RGSD eliminates the need for external LLM verifiers in training, reducing computational overhead.
  • The method transforms sparse trajectory-level rewards into dense per-token learning signals.
  • RGSD achieves competitive performance compared to traditional judge-based methods while being more efficient.
  • The study highlights the importance of rubric conditioning in enhancing model responses.
MiniPIC: Flexible Position-Independent Caching in <100LOC
Nathan Ordonez, Thomas Parnell
Large Language Models NLP Efficient ML
  • MiniPIC enables flexible position-independent caching with minimal code changes.
  • It introduces user-controlled primitives for cache reuse, enhancing flexibility.
  • The system achieves significant performance improvements in LLM inference tasks.
  • MiniPIC integrates seamlessly with existing KV cache implementations.
μVLA: On Recurrent Memory for Partially Observable Manipulation in VLA Models
Egor Cherepanov, Nikita Kachaev, Daniil Zelezetsky, Aydar Bulatov, Artem Pshenitsyn, Yuri Kuratov, Alexey Skrynnik, Aleksandr I. Panov, Alexey K. Kovalev
Multimodal Robotics
  • Introduces μVLA, a family of recurrent fine-tunes of OpenVLA-OFT focused on isolating the effects of recurrence.
  • Demonstrates significant performance improvements on MIKASA-Robo tasks, particularly under partial observability.
  • Establishes a controlled study framework to evaluate the impact of recurrence without confounding factors.
  • Identifies performance regimes for minimal recurrence, highlighting when it is sufficient and when additional memory structures are needed.
Emerging Flexible Designs for Geospatial Multimodal Foundation Models
Philipe Dias, Waqwoya Abebe, Abhishek Potnis, Aristeidis Tsaris, Dan Lu, Xiao Wang, Dalton Lunga
Computer Vision Multimodal
  • Standardized benchmarking of geospatial foundation models using unified pretraining objectives and evaluation protocols.
  • Insights into the impact of tokenization and fusion strategies on model robustness and spectral reasoning.
  • Identification of trade-offs between flexibility and homogeneity in model architectures.
  • Demonstration of Flex's adaptability to missing or heterogeneous bands compared to standard architectures.
A Stabilized Path-Space Approach to Diffusion-Based Posterior Sampling
Evan Scope Crafts, Umberto Villa, Saviz Mowlavi, Yanting Ma, Hassan Mansour, Wael H. Ali
Generative Models Optimization Theory
  • Introduces a stabilized path-space framework for diffusion-based posterior sampling.
  • Connects posterior sampling to stochastic optimal control, enhancing uncertainty quantification.
  • Eliminates bias from the initial value function through time reparameterization.
  • Demonstrates improved accuracy and robustness in benchmark inverse problems.
Reliability of Probabilistic Emulation of Physical Systems
Sam F. Greenbury, Radka Jersakova, Paolo Conti, Marjan Famili, Christopher Iliffe Sprague, Edwin Brown, Jason D. McEwen
Generative Models Theory Efficient ML
  • Developed a framework to evaluate the reliability of probabilistic emulation methods.
  • CRPS-trained ensembles generally provide more reliable uncertainties and faster inference than generative models.
  • Generative models trained in latent space can achieve comparable coverage to CRPS ensembles but with higher latency.
  • Introduced AutoCast and AutoSim for modular modeling and flexible dataset generation.
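The CRPS (continuous ranked probability score) mentioned above has a well-known empirical form for a finite ensemble, sketched here for a single scalar target. The function name `crps_ensemble` is an assumption, and this is not taken from the paper's AutoCast or AutoSim code.

```python
def crps_ensemble(members, observation):
    """Empirical CRPS for a scalar target: E|X - y| - 0.5 * E|X - X'|.

    The first term rewards members close to the observation; the second
    credits ensemble spread. Lower is better, and 0 means every member
    equals the observation.
    """
    m = len(members)
    accuracy = sum(abs(x - observation) for x in members) / m
    spread = sum(abs(a - b) for a in members for b in members) / (m * m)
    return accuracy - 0.5 * spread
```

For a single-member ensemble the score reduces to the absolute error, which is why CRPS is often described as a probabilistic generalization of MAE.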
SupraBench: A Benchmark for Supramolecular Chemistry
Tianyi Ma, Yijun Ma, Zehong Wang, Weixiang Sun, Ziming Li, Connor R. Schmidt, Chuxu Zhang, Matthew J. Webber, Yanfang Ye
Large Language Models NLP
  • Introduction of SupraBench, the first benchmark for supramolecular chemistry tasks.
  • Development of four fundamental tasks and one auxiliary task for evaluating LLMs.
  • Release of SUPRAPMC, a large corpus of supramolecular chemistry articles for research.
  • Benchmarking reveals significant performance gaps in existing LLMs, highlighting areas for improvement.
Simplex-Constrained Sparse Bagging: Transitioning from Uniform Priors to Sparse Posteriors in Ensemble Learning
Meher Sai Preetam Madiraju, Meher Bhaskar Madiraju
Optimization Efficient ML Theory
  • SCSB transitions from uniform priors to sparse posteriors in bagging ensembles.
  • Introduces a concave quadratic penalty to address the L1-simplex paradox.
  • Achieves up to 96% ensemble compression with linear inference speedups.
  • Improves probability calibration and generalization accuracy.
CausalMoE: A Billion-Scale Multimodal Foundation Model for Granger Causal Discovery with Pattern-Routed Heterogeneous Experts
Bo Liu, Di Dai, Jingwei Liu, Jiarui Jin, Xiaocheng Fang, Guangkun Nie, Hongyan Li, Shenda Hong
Time Series Multimodal Large Language Models
  • CausalMoE addresses the limitations of existing Granger causal discovery (GCD) methods by modeling patch-level temporal heterogeneity.
  • The model utilizes a Pattern-Routed Mixture of Heterogeneous Experts to route time-series data to specialized experts.
  • Integration of LLMs and VLMs allows for the incorporation of multimodal semantic priors in causal discovery.
  • CausalMoE achieves state-of-the-art results on supervised benchmarks and demonstrates effective generalization in few-shot settings.
DynamicPTQ: Mitigating Activation Quantization Collapse via Residual-Stream Dynamics
Zimo Zhao, Maolin Wang, Bowen Yu, Bowen Liu, Xiao Han, Xiangyu Zhao
Large Language Models Efficient ML NLP
  • DynamicPTQ addresses activation quantization collapse during post-training quantization (PTQ).
  • The method introduces phase-aware mixed-precision quantization based on residual-stream dynamics.
  • DynamicPTQ improves model performance while maintaining low memory overhead.
  • The approach can be integrated with existing PTQ methods like QuaRot and SpinQuant.
Representing Time Series as Structured Programs for LLM Reasoning
Jaeho Kim, Changhun Oh, Seokhyun Lee, Irina Rish, Changhee Lee
Large Language Models Time Series
  • Introduction of T2SP, a structured representation for time series that aligns with LLM capabilities.
  • T2SP is deterministic, invertible, and training-free, making it compatible with off-the-shelf LLMs.
  • Demonstrated improvements in reasoning performance and reduced inference time across various tasks.
  • Addresses the representation mismatch that hampers LLMs' performance on time series analysis.
Different Layers, Different Manifolds: Module-Wise Weight-Space Geometry in Transformer Optimization
Kirato Yoshihara
Optimization Large Language Models NLP
  • Different transformer modules (attention vs. MLP) prefer different weight-space geometries.
  • Pairing Stiefel geometry for attention layers with DGram geometry for MLP layers yields the best performance.
  • Uniform manifold constraints can lead to instability in training dynamics.
  • Singular value growth in DGram-constrained attention weights can cause softmax saturation.
Loss-Shift Transfer via Bayes Quotients
Vasileios Sevetlidis
Theory
  • Identifies loss shift as a distinct transfer failure mode from distribution shift.
  • Introduces Bayes quotients to compare losses and their refinements.
  • Establishes that a representation optimal for a coarser loss is insufficient for a finer loss.
  • Quantifies the frozen-transfer gap in terms of conditional mutual information.
Forecasting Is Not Attribution: Localizing Decoder Bypass in Graph-Based Neural Marketing Mix Models
Yunbo Wang, Bolbi Liu
Graph Learning Time Series Theory
  • Identification of 'attribution bypass' in graph-based neural marketing mix models.
  • Introduction of DICE-MMM as a two-stage diagnostic framework for graph learning.
  • Demonstration that low forecasting error can coexist with misaligned attribution graphs.
  • Empirical evidence showing that oracle graphs significantly improve attribution diagnostics.
A green solvent screening tool for emerging materials via uncertainty aware, transformer enhanced transfer learning
Ioannis Kouroudis, Simon Ternes, Zhaosu Gu, Gohar Ali Siddiqui, Marina Ustinova, Angelo Lembo, Alessio Gagliardi, Aldo Di Carlo
Optimization
  • Development of a machine learning tool for green solvent screening.
  • Integration of uncertainty quantification in predictions.
  • High performance achieved on limited data targets.
  • Augmentation of solubility descriptor data by up to two orders of magnitude.
Getting Better at Working With You: Compiling User Corrections into Runtime Enforcement for Coding Agents
Yujun Zhou, Kehan Guo, Haomin Zhuang, Xiangqi Wang, Yue Huang, Zhenwen Liang, Pin-Yu Chen, Tian Gao, Nuno Moniz, Nitesh V. Chawla, Xiangliang Zhang
Large Language Models NLP
  • Identifies a gap between user correction access and compliance in coding agents.
  • Introduces Trace, a skill-layer pipeline that converts user corrections into runtime-enforceable rules.
  • Demonstrates that memory alone is insufficient for ensuring compliance with user preferences.
  • Achieves significant reductions in preference violations across multiple coding tasks.
Out-of-Distribution (OOD) Detectors for Open-Set RF Fingerprinting
Sudeepta Mondal, Ganesh Sundaramoorthi
Theory
  • Introduces a unified mathematical framework for applying OOD detection methods to RF fingerprinting.
  • Demonstrates the feasibility of tuning OOD detectors without access to OOD data.
  • On the POWDER dataset, achieves performance comparable to traditional methods that rely on true OOD data.
  • Establishes a baseline for future research in open-set RF fingerprinting.
Circuit Synchronization Precedes Generalization: Causal Evidence from Fourier Structure in Grokking Transformers
Achyuthan Sivasankar
Theory Interpretability Large Language Models
  • Introduction of Frequency Synchronization Degree (FSD) as a predictor for grokking in transformers.
  • Synchronization, as measured by FSD, occurs well before the grokking event, which the paper presents as causal evidence linking circuit formation to generalization.
  • Weight decay influences when generalization occurs; the paper derives an empirical scaling law relating the two.
  • Multi-block transformer architectures show stronger FSD precursors compared to single-layer models.
Select and Improve: Understanding the Mechanics of Post-Training for Reasoning
Akshay Krishnamurthy, Audrey Huang, Nived Rajaraman
NLP Large Language Models Reinforcement Learning
  • Identifies two core mechanisms of RL post-training: strategy selection and strategy improvement.
  • Demonstrates that diverse reasoning patterns in pre-training data are essential for effective strategy selection.
  • Shows that RL training with more difficult questions enhances strategy improvement and out-of-distribution generalization.
  • Links observed phenomena like strategy amplification and composition to the core mechanisms rather than treating them as separate.
Fed-FBD: Federated Functional Block Diversification for Isolation, Privacy, and Surgical Unlearning
Weijie Chen, Alan B. McMillan
Federated Learning
  • Fed-FBD provides architecturally guaranteed block-level isolation to prevent adversarial contamination.
  • The framework achieves privacy by design: membership inference is indistinguishable from chance even before additional privacy measures are applied.
  • Surgical unlearning is facilitated through aggregate block replacement, achieving minimal AUC loss without retraining.
  • Experimental results show that Fed-FBD maintains accuracy close to FedAvg while providing enhanced security and privacy.
Uncertainty Estimation for Molecular Diffusion Models
Paul Seij, Christian A. Naesseth, Stephan Mandt, Metod Jazbec
Generative Models
  • Introduces a method for estimating uncertainty in molecular diffusion models.
  • Utilizes a Laplace approximation to measure noise prediction variability.
  • Demonstrates that the uncertainty score correlates negatively with established quality metrics.
  • Shows that filtering based on uncertainty can improve sample quality without retraining.
Predicting Cognitive Load from Speech and Interaction Dynamics in Dyadic Conversations
Tahiya Chowdhury
Audio & Speech Multimodal
  • Cognitive load can be predicted from speech dynamics in natural dyadic conversations.
  • The study employs a regression approach rather than classification to capture continuous variations in cognitive load.
  • Turn-taking dynamics and speaker participation are critical indicators of cognitive load.
  • The research utilizes a diverse dataset of remote collaborative tasks to enhance ecological validity.
Detecting Explanatory Insufficiency in Learned Representations: A Framework for Representational Vigilance
Jacques Raynal, Pierre Slangen, Elsa Raynal, Jacques Margerit
Theory Interpretability
  • Introduces the VER framework for monitoring representational adequacy.
  • Distinguishes between representational inadequacy and ordinary prediction errors.
  • Emphasizes the importance of identifying persistent residual structures in learned representations.
  • Aims to complement existing evaluation methods rather than replace them.
Enhanced Low-Density Region Exploration in Classifier-Guided Diffusion Models Through Modified Reverse Diffusion Sampling
Jagriti Singh, Shekhar Verma, Muneendra Ojha
Generative Models Computer Vision
  • Introduces a density-aware extension to classifier-guided diffusion models.
  • Targets low-density regions during sampling without additional training.
  • Implements dual guidance to enhance sample diversity and fidelity.
  • Demonstrates improved recall of rare samples on ImageNet.
How Much Memory Do We Need? Adaptive Memory Gate for Neural Operators
Jihyeon Hur, Yongseok Kwon, Min-Gi Jo, Jeongwhan Choi, Noseong Park
Theory Efficient ML Time Series
  • AMGFNO introduces a dynamic memory weight modulation mechanism for neural operators.
  • The optimal memory weight varies with resolution and viscosity, necessitating an adaptive approach.
  • AMGFNO achieves significant performance improvements over fixed-memory approaches.
  • The method is validated on complex PDEs, showcasing its practical applicability.
Decoding Insect Song: A Multitask Semisupervised Orthoptera Bioacoustic Classifier
Olga Isupova, Danil Kuzin, Ella Browning, Tom Mills, Steven Reece
Audio & Speech
  • PULSE addresses the limitations of existing automated tools for insect bioacoustics by integrating semi-supervised and multi-task learning.
  • The framework outperforms state-of-the-art models in species classification metrics, demonstrating the effectiveness of combining labelled and unlabelled data.
  • Active learning further improves model performance, indicating the potential for continuous learning in ecological monitoring.
  • The embeddings generated by PULSE encode ecologically relevant information, supporting ecological research and conservation efforts.
Accelerating Speculative Diffusions via Block Verification
Alexander Soen, Hisham Husain, Valentin De Bortoli, Arnaud Doucet
Generative Models Efficient ML Theory
  • Introduces efficient Γ-maximal coupling for diffusion models, simplifying existing techniques.
  • Adapts block verification from LLMs to enhance acceptance rates in diffusion sampling.
  • Presents the Free Drafter, a heuristic that outperforms previous drafting strategies.
  • Demonstrates empirical speedups of up to 6.3% in sampling latency without additional training.
Exposure Bias as Epistemic Underidentification in Recursive Forecasting
Riku Green, Zahraa S. Abdallah, Telmo M Silva Filho
Theory Time Series
  • Exposure bias in recursive forecasting is linked to epistemic underidentification due to insufficient state representation.
  • The authors introduce a formal framework using induced states and provenance to analyze recursive forecasting errors.
  • Empirical evidence shows that fixed induced states create distinct local corrective tasks and that closed-loop corrections can improve performance.
  • The study highlights the importance of considering provenance information in recursive forecasting to mitigate exposure bias.
LoRA-Muon: Spectral Steepest Descent on the Low-Rank Manifold
Franz Louis Cesista, Katherine Crowson, Cédric Simal, Stella Biderman
Optimization Efficient ML Large Language Models
  • LoRA-Muon is derived from the spectral steepest descent rule of the Muon optimizer, tailored for low-rank settings.
  • The method ensures that optimal learning rates transfer across different model configurations, which makes tuning more efficient.
  • LoRA-Muon is gauge-invariant and avoids the computational overhead of QR-decomposition, improving memory efficiency.
  • Empirical results indicate that LoRA-Muon can outperform dense training baselines in specific scenarios.
ReCal: Reward Calibration for RL-based LLM Routing
Qihang Yu, Hanwen Tong, Zhengqi Zhang, Bo Zheng, Feng Wei, Shengyu Zhang, Zemin Liu, Fei Wu
NLP Large Language Models Reinforcement Learning
  • ReCal introduces a hierarchical reward decomposition mechanism to clarify learning signals for RL-based LLM routing.
  • The framework employs variance-aware reweighting and per-dataset normalization to address optimization variability.
  • ReCal improves routing performance and training stability across diverse datasets compared to existing methods.
  • The approach separates objective-level supervision from distribution-level optimization variability, enhancing policy learning.
A Stationary (and Therefore Compatible) Representation is All You Need
Niccolò Biondi, Federico Pernici, Simone Ricci, Alberto Del Bimbo
Theory Computer Vision Efficient ML
  • Stationary representations learned via d-Simplex fixed classifiers imply compatibility.
  • Combining cross-entropy and contrastive loss captures higher-order dependencies.
  • The proposed method achieves state-of-the-art performance in compatible representation learning.
  • The approach allows for uninterrupted retrieval services during model updates.
Adaptive Weighted Averaging
Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, Manish Purohit
Optimization Theory Efficient ML
  • Introduces the SBern strategy, which is admissible and strictly dominates uniform random selection.
  • Constructs the Speel strategy that dominates any arbitrary fixed deterministic strategy.
  • Demonstrates impossibility results in non-independent observation settings.
  • Establishes new online-to-batch conversion bounds in stochastic optimization.
Interpretable Factor Decomposition for Decision Intelligence in Large-Scale Financial Markets: Evidence from China's A-Share Market
Xiao Han, Yao Xiao, Zhen Zhang, Moxuan Zheng
Interpretability
  • The XGBoost model with TreeSHAP attribution effectively decomposes equity return predictability into interpretable factors.
  • Behavioral signals (turnover and momentum) are significantly more predictive than traditional valuation ratios in the Chinese A-share market.
  • The model demonstrates strong performance with a mean AUC of 0.547 and an annualized Sharpe Ratio of 2.23.
  • Ablation analysis reveals insights into feature substitutability, enhancing the understanding of predictive factors.
Positional Encoding in the Context of Memristor-Based Analog Computation for Automatic Speech Recognition
Benedikt Hilmes, Nick Rossenbach, Ralf Schlüter
Audio & Speech Efficient ML NLP
  • Memristors enable efficient analog computation for neural models in NLP.
  • Large output values from positional encodings can degrade performance in memristor-based systems.
  • Adjusting ADC configurations can significantly reduce performance degradation.
  • Relative positional encodings improve model performance in low-precision environments.