AI-generated summaries

Today's ML research,
without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

43 Papers today
8h Update frequency
7 Days of history
Revisiting Decentralized Online Convex Optimization with Compressed Communication
Hao Zhou, Xiaoyu Wang, Chang Yao, Mingli Song, Yuanyu Wan
Optimization Theory Efficient ML
  • Introduction of two FTRL-type algorithms for D-OCO with compressed communication.
  • First algorithm matches existing regret bounds in full-information settings.
  • Second algorithm improves regret bounds and communication costs in bandit settings.
  • The dual update mechanism of FTRL facilitates effective communication compression.
Probabilistic Low-Voltage Peak Load Forecasting with Time Series Foundation Models Evaluated on Application-Oriented Metrics
Benedikt Kaas, Manuel Treutlein, Hannes Benedikt Gerber, Oliver Neumann, Cheewan Phatthanakhuha, Oliver Resch, Ralf Mikut, Veit Hagenmeyer
Time Series
  • Extensive evaluation of time series foundation models for low-voltage load forecasting.
  • Chronos-2 demonstrated superior performance in peak load prediction.
  • Ablation study indicates TSFMs can handle uncertainty without weather covariates.
  • Introduction of a novel application-oriented metric for evaluating forecasting performance.
WARP: Weight-Space Analysis for Recovering Training Data Portfolios
Tzu-Heng Huang, Aditya Goyal, John Cooper, Frederic Sala
NLP Large Language Models Interpretability
  • WARP recovers domain mixtures from fine-tuned model weights, addressing the access asymmetry in AI research.
  • The framework uses model merging to simulate training trajectories, allowing for the extraction of geometric features related to training data.
  • WARP demonstrates superior performance compared to traditional membership inference methods.
  • The method is robust across different training recipes, including overtraining scenarios.
Conditional Inference Trees and Forests for Feature Selection
Robert Milletich, Justin Downes, Steve Goley, Newel Hirst
Theory Efficient ML
  • CIT and CIF effectively reduce split-selection bias in feature selection.
  • CIF ranks 4th among 17 classification methods and 3rd among 18 regression methods in benchmark tests.
  • Adaptive stopping and threshold search parameters significantly influence runtime efficiency.
  • High-dimensional simulations reveal potential shortcomings in feature sampling strategies.
Gaming Consensus: Coordinated Manipulation in Crowdsourced Fact-Checking
Nikil Roashan Selvam, Jay Baxter, Sophie Hilgard, Brad Miller, Keith Coleman, Ellen Vitercik, Sanmi Koyejo
Theory
  • Demonstrates a coordinated manipulation attack on crowdsourced fact-checking systems.
  • Empirical findings show that a small number of coordinated ratings can significantly alter note quality scores.
  • Reveals a counterintuitive property of the rating system where 'Not Helpful' ratings can enhance a note's perceived helpfulness.
  • Develops a cost model for manipulation efforts, informing mitigation strategies.
On the Utility and Factual Reliability of Pruned Mixture-of-Experts Models in the Biomedical Domain
Atsuki Yamaguchi, Szymon Palucha, Léo Bijar, Aline Villavicencio, Nikolaos Aletras
NLP Large Language Models Efficient ML
  • First systematic study on the impact of expert pruning on factual reliability in MoE models within the biomedical domain.
  • Moderate pruning preserves utility while extreme pruning increases hallucination risks.
  • Utility and reliability degrade significantly when shifting from in-domain to general-domain tasks.
  • Reliability assessments are essential for high-stakes deployments, beyond mere utility evaluations.
Multi-Head Recurrent Memory Agents
Jiatong Li, Samuel Yeh, Sharon Li
Large Language Models NLP Optimization
  • Identifies memory retention failure as the key issue in recurrent memory agents for long contexts.
  • Proposes Multi-Head Recurrent Memory (MHM) to structurally prevent overwriting of retained information.
  • Introduces MHM-LRU, a lightweight implementation that guarantees uniform memory head utilization.
  • Demonstrates substantial improvements in memory retention and accuracy across various benchmarks.
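The least-recently-used idea behind MHM-LRU can be pictured with a toy sketch. This is not the authors' implementation: the class name `LRUMemoryHeads`, the use of plain strings as memory contents, and the rule "each write overwrites the least recently written head" are all assumptions made for illustration. Under that rule, writes cycle through the heads, so every head is used equally often.

```python
from collections import OrderedDict


class LRUMemoryHeads:
    """Toy fixed-size memory: each write replaces the least recently written head."""

    def __init__(self, num_heads):
        # Keys are head indices; order runs from least to most recently written.
        self.heads = OrderedDict((i, None) for i in range(num_heads))
        self.write_counts = [0] * num_heads

    def write(self, content):
        # Take the least recently written head, overwrite it, mark it most recent.
        head, _ = self.heads.popitem(last=False)
        self.heads[head] = content
        self.write_counts[head] += 1
        return head


memory = LRUMemoryHeads(num_heads=3)
for chunk in ["a", "b", "c", "d", "e", "f"]:
    memory.write(chunk)
print(memory.write_counts)
```

In this sketch a new chunk never touches the heads written most recently, which is one simple way to keep fresh information from being overwritten immediately.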
Program-as-Weights: A Programming Paradigm for Fuzzy Functions
Wentao Zhang, Liliana Hotsko, Woojeong Kim, Pengyu Nie, Stuart Shieber, Yuntian Deng
NLP Large Language Models Efficient ML
  • Introduces the Program-as-Weights (PAW) paradigm for fuzzy function programming.
  • Utilizes a two-stage compilation process to convert natural language specifications into neural binaries.
  • Demonstrates significant efficiency gains with a smaller interpreter outperforming larger models.
  • Releases FuzzyBench, a dataset with 10 million examples for fuzzy tasks.
A More Accurate Algorithm Comparison through A/B Testing using Offline Evaluation Methods
Koki Konishi, Masataka Ushiku, Yuta Saito
Theory Optimization
  • A/B testing can lead to higher algorithm selection error rates than offline evaluation methods.
  • The proposed estimator introduces a middle algorithm to induce positive correlation, improving selection accuracy.
  • The new method achieves the same selection error rate as traditional methods with only half the data.
  • The findings challenge the established view of A/B testing as the superior method for algorithm selection.
SABER: A Semantic-Aligned Brain Network Analysis Framework via Multi-scale Hypergraphs
Yidan Xu, Xiangmin Han, Rundong Xue, Huihui Ye
Graph Learning Large Language Models Interpretability
  • SABER integrates LLM-derived semantics directly into the brain network classification process.
  • The framework employs multi-scale hypergraphs to effectively model complex interactions among brain regions.
  • A decision-level semantic alignment mechanism allows for direct influence of semantic information on predictions.
  • Extensive evaluations show SABER outperforms existing methods on datasets like ABIDE and ADHD-200.
Beyond Adam: SOAP and Muon for Faster, Label-Efficient Training of Machine Learning Interatomic Potentials
Gil Harari, Yoel Zimmermann, Ola Tangen Kulseng, Laura Zichi, Chuin Wei Tan, Marc L. Descoteaux, Boris Kozinsky
Optimization Efficient ML
  • SOAP and SOAP-Muon optimizers consistently improve energy and force accuracy compared to AdamW.
  • These optimizers demonstrate robust performance even with reduced force supervision, suggesting a pathway to lower force-label requirements.
  • The resulting MLIPs maintain physical fidelity, accurately reproducing ab initio calculations and experimental data.
  • Muon shows limited benefits over AdamW, indicating that not all optimizers provide substantial improvements.
Predictive Conformal Slip Monitoring: An Empirical Evaluation of Rolling Split Conformal Prediction for Pre-Incident Traction Loss Detection
Varshith Roy Kotla
Theory Time Series
  • The study evaluates Rolling Split Conformal Prediction for detecting pre-incident traction loss in motorsport telemetry.
  • A significant methodological correction was made by including vehicle speed as an explicit feature in the model.
  • The results showed a mean precision and recall of 0.0, indicating the method's ineffectiveness in real-world applications.
  • High false-alarm rates were attributed to violations of the exchangeability assumption in the conformal prediction framework.
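For readers unfamiliar with the technique, a rolling split conformal detector can be sketched as follows. This is a generic illustration, not the paper's pipeline: the function names, the window size, the miscoverage level `alpha`, and the made-up prediction errors are all assumptions. The radius is the standard split-conformal quantile, the ceil((n+1)(1-alpha))-th smallest absolute residual in the calibration window.

```python
import math
from collections import deque


def conformal_radius(residuals, alpha):
    """Split-conformal quantile: the ceil((n+1)(1-alpha))-th smallest residual."""
    n = len(residuals)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(residuals)[k - 1]


def rolling_flags(errors, window, alpha):
    """Flag each error that exceeds the radius calibrated on the previous `window` errors."""
    calibration = deque(maxlen=window)
    flags = []
    for err in errors:
        score = abs(err)
        full = len(calibration) == window
        flags.append(full and score > conformal_radius(calibration, alpha))
        calibration.append(score)
    return flags


# Hypothetical one-step prediction errors with a single spike.
errors = [0.1, -0.2, 0.15, 0.05, -0.1, 0.12, 0.9, 0.1]
flags = rolling_flags(errors, window=5, alpha=0.2)
print(flags)
```

The coverage guarantee of this construction relies on the new score being exchangeable with the scores in the window, which is the assumption the summary identifies as violated.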
SA-HGNN: Sample-Adaptive Hyperbolic Graph Neural Network for EEG-Based Depression Recognition
Yang Li, Pan Hu, Yan Zhang, Wenfan Yang, Tao Wu, Lianbo Guo
Graph Learning
  • Introduction of SA-HGNN, a model specifically designed for EEG-based depression recognition.
  • Dynamic construction of personalized brain network topologies to better capture complex spatial relationships.
  • Utilization of hyperbolic geometry to address the limitations of Euclidean space in modeling hierarchical structures.
  • Incorporation of an Attention Pooling module to mitigate noise interference in EEG signals.
Finite-Lag Operator Geometry of Recurrent Representations
Kanishka Reddy
Theory Time Series Optimization
  • Introduces finite-lag operator geometry for recurrent representations, focusing on dynamics rather than static snapshots.
  • Defines key constructs such as the conditional transport law Q∆ and the source-centered transport tensor G∆.
  • Proves structural results including affine covariance and stability of the Gaussian estimator on bounded trajectory clouds.
  • Demonstrates the ability to detect deterministic recurrent motion not captured by traditional methods.
Population-Based Multi-Objective Training of Discriminators for Semi-Supervised GANs
Francisco Sedeño, Francisco Chicano, Jamal Toutouh
Generative Models Optimization Theory
  • Introduces a population-based training framework for SSL-GANs that separates supervised and unsupervised losses.
  • Utilizes Pareto-based selection to maintain diverse discriminator populations, improving training stability.
  • Demonstrates improved classification accuracy and robustness over existing SSL-GAN methods.
  • Explores various evolutionary strategies, including elitist replacement and mono-objective ablation.
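Pareto-based selection over two losses can be illustrated with a minimal sketch. The discriminator names and loss values below are invented, and the paper's actual selection operator may differ; the code only shows the generic non-dominated filter, in which a candidate survives unless another candidate is at least as good on both losses and strictly better on one.

```python
def dominates(a, b):
    """True if a is no worse than b on every loss and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(population):
    """Return the names of candidates that no other candidate dominates."""
    front = []
    for name, losses in population.items():
        rivals = (other for other_name, other in population.items() if other_name != name)
        if not any(dominates(rival, losses) for rival in rivals):
            front.append(name)
    return front


# Hypothetical (supervised_loss, unsupervised_loss) pairs for four discriminators.
population = {
    "d0": (0.30, 0.80),
    "d1": (0.50, 0.40),
    "d2": (0.60, 0.90),
    "d3": (0.70, 0.35),
}
print(pareto_front(population))
```

Here `d2` is dropped because `d0` beats it on both losses, while the other three represent different trade-offs and are all kept, which is how such a filter preserves diversity.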
Online Resource Allocation with Continuous Random Consumption: Regret under Degeneracy
Jiawei Zhang
Theory Optimization
  • Introduces a model for online resource allocation with continuous random consumption.
  • Defines an active weighted-mass exponent to analyze additive regret.
  • Demonstrates that continuous random consumption can lead to polynomial regret in certain cases.
  • Shows that a sample-path marginal policy can achieve logarithmic regret under specific conditions.
Hybrid quantum-classical neural network for sentiment analysis
Giacomo Cappiello, Filippo Caruso, Xing Liang, Dimitrios Makris
NLP
  • Hybrid quantum-classical neural networks can effectively perform sentiment analysis.
  • The study utilizes a dataset of COVID-19-related tweets for sentiment classification.
  • Hybrid models show comparable accuracy to classical models but with enhanced learning dynamics.
  • Transfer learning experiments reveal significant improvements in performance for spam classification tasks.
Black-Box Inference of LLM Architectural Properties with Restrictive API Access
Christopher Ellis, Shreyas Chaudhari, Mei-Yu Wang, Leighton Barnes, Giulia Fanti, José M. F. Moura
Large Language Models NLP Theory
  • Introduces NightVision, an attack for inferring LLM architectural properties under restrictive API access.
  • Utilizes a novel common-set prompting technique to recover hidden dimensions without logit bias or top-k access.
  • Employs timing measurements to estimate depth and parameter count based on model characteristics.
  • Achieves significant accuracy in recovering architectural parameters across various open-source LLMs.
Dynamic Neural Graph Encoding of Inference Processes in Deep Weight Space
Di Wu, Huan Liu, Zhixiang Chi, Yuanhao Yu, Konstantinos N. Plataniotis, Yang Wang
Graph Learning Optimization Theory
  • Introduction of dynamic neural graphs for modeling neural network parameters.
  • Development of the Dynamic Neural Graph Encoder (DNG-Encoder) to process dynamic graphs.
  • Creation of INR2JLS for mapping INR weights into a joint latent space.
  • Demonstration of improved classification accuracy on CIFAR-10 and CIFAR-100 datasets.
The Rollout Infrastructure Tax in Coding-Agent Reinforcement Learning
Daniel Thi Graviet, Lovre Pesut, Ivan Dagelic, Vedran Jukic, Ivan Burazin
Reinforcement Learning Efficient ML
  • Introduction of the 'rollout infrastructure tax' concept, highlighting the impact of execution substrate on RL efficiency.
  • Significant variations in performance metrics (cold-start latency and worker-hours) across different execution substrates.
  • Proposal of design requirements for optimizing rollout-native substrates to enhance coding-agent RL performance.
  • Emphasis on the need to treat execution infrastructure as a core concern in the development of RL systems.
Rank-Then-Act: Reward-Free Control from Frame-Order Progress
Yuriy Maksyuta, George Bredis, Ruslan Rakhimov, Daniil Gavrilov
Reinforcement Learning Computer Vision Multimodal
  • RTA provides a framework for learning control policies without environment rewards.
  • The method utilizes a Vision–Language Model trained on shuffled video segments to derive ordinal progress rankings.
  • A correlation-based reward signal is introduced, computed as Spearman correlation, allowing for stable learning across tasks.
  • RTA outperforms existing video-based reward learning methods and demonstrates strong generality across tasks.
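A Spearman-correlation reward of the kind described can be sketched in a few lines. The summary does not say exactly how the paper computes it, so this is an assumption-laden illustration: it supposes a ranker has assigned one progress score per frame, that there are no tied scores, and that the reward compares those scores with the true temporal order. The function names are hypothetical.

```python
def ranks(values):
    """1-based rank of each value; assumes no ties for simplicity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_reward(progress_scores):
    """Spearman correlation between predicted progress and true frame order (needs n >= 2)."""
    n = len(progress_scores)
    predicted = ranks(progress_scores)
    # The true temporal order of the frames is simply 1..n.
    d_squared = sum((p - t) ** 2 for p, t in zip(predicted, range(1, n + 1)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


print(spearman_reward([0.1, 0.4, 0.5, 0.9]))  # progress rises with time
print(spearman_reward([0.9, 0.5, 0.4, 0.1]))  # progress falls with time
```

Because the reward depends only on ranks, it is bounded in [-1, 1] and unaffected by the scale of the scores, which may be part of why a correlation-based signal is stable across tasks.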
Role-Aware Neural Convex Divergence Heads for Asymmetric Representation Learning
He Huang, Lu Shen, Yunfeng Huang, Li Qi
NLP Graph Learning Theory
  • Introduction of a role-aware neural convex divergence head for asymmetric representation learning.
  • Theoretical characterization of the proposed method, retaining classical Bregman properties.
  • Empirical validation shows improved directional accuracy on multiple benchmarks.
  • The method serves as a plug-in distance module for various encoder architectures.
Generalization in offline RL: The structure is more important than the amount of pessimism
Max Weltevrede, Matthijs T.J. Spaan, Wendelin Böhmer
Reinforcement Learning Theory Robotics
  • The structure of pessimism is more critical for generalization than the amount of pessimism in offline RL.
  • A symmetric value function can lead to better generalization compared to a mildly pessimistic, non-symmetric one.
  • Data augmentation should prioritize symmetry preservation during policy extraction to improve generalization.
  • Empirical results validate the effectiveness of the proposed methods in a rotationally symmetric environment.
Wind-Aware Reinforcement Learning Control of a Small Quadrotor Using Learned Onboard Wind Estimation in Simulated Atmospheric Turbulence
Abdullah Al Tasim, Wei Sun
Reinforcement Learning Robotics
  • Introduces a two-stage learning pipeline for wind estimation and control in small quadrotors.
  • Achieves high accuracy in wind estimation using an attention-augmented GRU network.
  • Demonstrates a significant reduction in trajectory tracking error with a wind-aware RL controller.
  • Highlights the regime-dependent value of wind perception in improving control performance.
Ask the Right Comparison: Bias-Aware Bayesian Active Top-$k$ Ranking with LLM Judges
Jian Xu, Delu Zeng, John Paisley, Qibin Zhao
NLP Large Language Models Optimization
  • Introduces a bias-aware Bayesian model for ranking with LLM judges, addressing systematic biases in their evaluations.
  • Develops a top-k-aware active acquisition strategy that focuses on identifying the top-k items efficiently under a fixed comparison budget.
  • Demonstrates significant improvements in identifying top-k items compared to naive aggregation methods, especially with biased judges.
  • Finds that verbosity bias is prevalent among cheaper judges, while frontier judges exhibit minimal bias.
X-LogSMask: Expand Transformer for Graph-Structured Data
Leyan Li, Rennong Yang, Zhenxing Zhang, Liping Hu
Graph Learning
  • X-LogSMask introduces a logarithmic structural mask for graph data, enhancing interpretability and efficiency.
  • The method allows multi-hop information propagation within a single Transformer layer by assigning different powers of the adjacency matrix to attention heads.
  • X-LogSMask achieves state-of-the-art performance on 13 out of 20 benchmark datasets.
  • The approach maintains the core Transformer architecture while improving its applicability to graph-structured data.
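The idea of giving each attention head a different power of the adjacency matrix can be sketched as below. The summary does not state which powers are used; this sketch assumes head h receives the sparsity pattern of A^(2^h), which is one reading of "logarithmic". The helper names and the small path graph are hypothetical.

```python
def bool_matmul(a, b):
    """Boolean matrix product: entry (i, j) is True if some k links i -> k -> j."""
    size = len(a)
    return [
        [any(a[i][k] and b[k][j] for k in range(size)) for j in range(size)]
        for i in range(size)
    ]


def head_masks(adjacency, num_heads):
    """Head h gets the reachability pattern of A^(2^h), built by repeated squaring."""
    masks = []
    current = [[bool(x) for x in row] for row in adjacency]
    for _ in range(num_heads):
        masks.append(current)
        current = bool_matmul(current, current)
    return masks


# Path graph 0-1-2-3-4 with self-loops, so A^k covers every node within k hops.
num_nodes = 5
adjacency = [[abs(i - j) <= 1 for j in range(num_nodes)] for i in range(num_nodes)]
masks = head_masks(adjacency, num_heads=3)
print([mask[0] for mask in masks])
```

With three heads, node 0 can attend to its 1-hop, 2-hop, and 4-hop neighborhoods within a single layer, so the reach grows exponentially in the number of heads.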
kNNGuard: Turning LLM Hidden Activations into a Training-Free Configurable Guardrail
Mahmoud Abdelfattah, Hamid Nasiri, Peter Garraghan
NLP Large Language Models
  • kNNGuard is a training-free guardrail framework that utilizes LLM hidden activations for prompt classification.
  • It achieves competitive or superior F1 scores compared to fine-tuned guardrails while being significantly faster.
  • The framework allows for rapid domain adaptation by updating a small reference bank of labeled examples.
  • kNNGuard employs a multi-layer kNN approach that fuses activation-space and embedding-space scores.
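A training-free kNN guardrail with score fusion can be sketched roughly as follows. This is an illustration under stated assumptions, not kNNGuard itself: it uses a single activation bank rather than multiple layers, Euclidean distance, a plain weighted average as the fusion rule, and tiny made-up vectors. All function names, the value of `k`, the fusion weight, and the 0.5 threshold are hypothetical.

```python
import math


def knn_unsafe_fraction(query, bank, k):
    """Fraction of the k nearest reference vectors (Euclidean) labelled unsafe."""
    nearest = sorted(bank, key=lambda item: math.dist(query, item[0]))[:k]
    return sum(label for _, label in nearest) / k


def fused_score(act_query, emb_query, act_bank, emb_bank, k=3, weight=0.5):
    """Weighted average of the activation-space and embedding-space kNN scores."""
    act = knn_unsafe_fraction(act_query, act_bank, k)
    emb = knn_unsafe_fraction(emb_query, emb_bank, k)
    return weight * act + (1 - weight) * emb


# Made-up reference banks: (vector, label) with label 1 = unsafe, 0 = safe.
act_bank = [((0.0, 0.0), 0), ((0.1, 0.0), 0), ((0.0, 0.2), 0),
            ((1.0, 1.0), 1), ((0.9, 1.1), 1), ((1.1, 0.9), 1)]
emb_bank = [((0.0,), 0), ((0.2,), 0), ((0.4,), 0),
            ((1.0,), 1), ((1.2,), 1), ((1.4,), 1)]

score = fused_score((0.95, 1.0), (1.1,), act_bank, emb_bank)
print(score, "block" if score >= 0.5 else "allow")
```

Adapting such a guardrail to a new domain only requires editing the reference banks, with no gradient updates, which matches the rapid-adaptation point in the summary.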
Set Diffusion: Interpolating Token Orderings Between Autoregression and Diffusion for Fast and Flexible Decoding
Marianne Arriola, Volodymyr Kuleshov
NLP Large Language Models Generative Models
  • Set diffusion allows for flexible-length and flexible-position token generation, improving decoding flexibility.
  • The set-causal diffusion architecture supports KV cache updates after every inference step, enhancing efficiency.
  • Set diffusion achieves better speed-quality tradeoffs than prior diffusion models across various tasks.
  • The method outperforms block diffusion in infilling performance.
Predicting Closed-Loop Performance of Latent World Models: Offline Checkpoint Selection for MPC and Model-Based RL Under Non-Markovian Rewards in LunarLander
Nikolai Smolyanskiy
Reinforcement Learning Robotics Optimization
  • Introduces a suite of 40 structural validation-time metrics for evaluating world models.
  • Presents the Composite Reward Observability Fraction (CROF) for offline checkpoint selection.
  • Demonstrates that CROF correlates with better closed-loop performance compared to traditional metrics.
  • Achieves a significant improvement in return using the CROF-selected model with fewer real-environment interactions.
EHHN: An Event-driven Heterogeneous Hypergraph Network for Object-Centric Next Activity Prediction
Jiaxing Wang, Kaitao Chen, Zhubin Han, Chenyu Hou, Bin Cao, Jing Fan, Ji Zhang
Graph Learning Time Series Theory
  • EHHN introduces a heterogeneous hypergraph representation for object-centric next activity prediction, preserving multi-object participation.
  • The dual-stream architecture effectively models both local event-driven state changes and global execution patterns.
  • EHHN achieves superior performance compared to existing OCEL-based predictors, with notable improvements in accuracy and efficiency.
Multi-modal Rail Crossing Safety Analysis
Paimon Goulart, Chansong Lim, Nícolas Roque dos Santos, Yue Dong, Sheldon Peterson, Jia Chen, Evangelos E. Papalexakis
Multimodal
  • Integration of visual and structured data improves safety assessments at railway crossings.
  • Vision-Language Models (VLMs) effectively analyze complex visual scenes for risk evaluation.
  • The proposed system achieves a macro F1 score of 0.757 in classifying crossing risks.
  • The methodology addresses critical challenges in data preparation and learning paradigms.
Frequency Shift Physics-Informed Extreme Learning Machine for Solving High-Frequency Partial Differential Equations
Xiong Xiong, Ruonan Zhai, Zheng Zeng, Sheng Zhou, Rongchun Hu, Zichen Deng
Theory Efficient ML
  • Introduces FS-PIELM to mitigate spectral bias in high-frequency PDE solutions.
  • Utilizes a novel weight initialization mechanism that shifts the mean of Gaussian weights.
  • Demonstrates improved accuracy over existing methods in multiple benchmark problems.
  • Maintains computational efficiency with only a single linear solve required.
Regularized Variational and Spectral Log-Density-Ratio Estimation in the Gaussian Location Model
Francis Bach
Theory
  • Introduces ridge-regularized log-density-ratio estimation in a Gaussian location model.
  • Compares variational and spectral estimators, highlighting their performance under different observation conditions.
  • Derives high-dimensional asymptotic equivalents to analyze estimator behavior.
  • Demonstrates that variational estimators outperform spectral estimators with many observations.
Don't Let Gains FADE: Breaking Down Policy Gradient Weights in RL
Juliette Decugis, Sean O'Brien, Francis Bach, Gabriel Synnaeve, Taco Cohen
Reinforcement Learning Large Language Models
  • Introduces a framework for analyzing policy gradient weights in RL, clarifying the effects of different advantage functions.
  • Proposes FADE, a self-adapting advantage function that optimizes gradient weights based on training dynamics.
  • Demonstrates that balancing positive and negative gradient masses is crucial for stable and efficient RL training.
  • FADE achieves faster learning and improved accuracy-diversity trade-offs compared to static methods.
Neuron-Aware Active Few-Shot Learning for LLMs
Zhuowei Chen, Liwei Chen, Christian Schunn, Raquel Coelho, Xiang Lorraine Li
NLP Large Language Models
  • NEUFS shifts the selection paradigm from output-level signals to internal model dynamics.
  • The framework utilizes neuron activation patterns for sample representation and selection.
  • A dual-criteria selection strategy ensures diversity and targets informative samples prone to hallucinations.
  • Extensive experiments show NEUFS outperforms existing AFSL methods in reasoning and text classification tasks.
Evolutionary Feature Engineering for Structured Data
Ege Onur Taga, Yilin Zhuang, M. Emrullah Ildiz, Petros Mol, Abhimanyu Das, Karthik Duraisamy, Samet Oymak
Time Series Optimization Large Language Models
  • Introduces EFE, a framework for evolving preprocessing transformations using LLMs.
  • EFE-Time improves time-series forecasting by discovering dataset-specific normalization programs.
  • EFE-Tab evolves compact feature programs that enhance interpretability and performance in tabular data.
  • Demonstrates significant performance improvements across various datasets and models.
NeuroBridge: Bridging Multi-Task MRI Knowledge for Neurodegenerative Disease Diagnosis
Mengyu Li, Guoyao Shen, Chad W. Farris, Xin Zhang
Multimodal
  • NeuroBridge integrates multi-task learning with self-supervised MRI pretraining for neurodegenerative disease diagnosis.
  • Achieved high classification accuracy, particularly in distinguishing AD and MCI from cognitively normal controls.
  • Demonstrated strong cross-cohort generalization and effective probability-based analysis for opportunistic screening.
  • Utilizes a gated fusion mechanism to combine multiple MRI-derived findings into a unified diagnostic representation.
Denser $\neq$ Better: Limits of On-Policy Self-Distillation for Continual Post-Training
Meng Wang, Haohan Zhao, Wenzhuo Liu, Lu Yang, Geng Liu, Haiyang Guo, Guo-Sen Xie, Gaofeng Meng, Hongbin Liu, Fei Zhu
NLP Large Language Models Reinforcement Learning
  • On-policy self-distillation can accelerate specialization but is fragile in continual learning contexts.
  • SDPO shows stronger forgetting and potential collapse compared to GRPO.
  • Denser self-distillation can amplify noise and artifacts, complicating the learning process.
  • The paper distinguishes between on-policy data and the training objective, clarifying their roles in continual learning.
How Should Transformers Encode Numeric Values in Electronic Health Records?
Maria Elkjær Montgomery, Christian Igel, Mikkel Odgaard, Martin Sillesen, Mads Nielsen
NLP Optimization Time Series
  • Introduces a unified evaluation framework for numeric reasoning in EHR transformers.
  • Systematically compares discrete, continuous, and hybrid numeric value encodings.
  • Demonstrates a precision-stability trade-off in numeric reasoning approaches.
  • Finds that transformers can perform approximate numeric computations reliably.
Message Passing Based Two-Timescale Bayesian Learning for Joint Channel and Memory Hardware Impairments Tracking
Wei Xu, An Liu
Theory Optimization Time Series
  • Introduces a two-timescale Bayesian learning framework for joint channel and hardware impairment tracking.
  • Utilizes a residual recurrent gated unit (RGRU) to effectively model intra-slot memory of hardware impairments.
  • Implements a message-passing algorithm that allows for efficient channel estimation and impairment calibration.
  • Demonstrates superior performance in channel estimation error reduction compared to traditional compensators.
DeadPool: Resilient LLM Training with Hot-Swapping via Zero-Overhead Checkpoint
Haotian Xie, Junlin Chen, Mingkai Zheng, Lishan Yang, Zhao Zhang
Large Language Models Efficient ML Optimization
  • DEADPOOL enables hot-swapping of failed nodes in LLM training without job termination.
  • The system employs an asynchronous in-memory checkpointing mechanism to achieve zero overhead during error-free execution.
  • Recovery from node failures is completed in under 40 seconds, significantly reducing downtime.
  • The methodology is evaluated on up to 512 GPUs and models with up to 65 billion parameters.
Efficient Temporal Point Processes via Monotone Alternating Splines
Cheng Wan, Quyu Kong, Feng Zhou
Time Series Efficient ML Theory
  • Identifies fundamental limitations of Monotone Neural Networks in CCIF modeling.
  • Introduces Monotone Alternating Splines (MAS) to enhance flexibility and computational efficiency.
  • Establishes a theoretical foundation for MAS, including generalization error analysis.
  • Demonstrates superior performance of MAS on synthetic and real-world datasets.
Decomposer: Learning to Decompile Symbolic Music to Programs
Yewon Kim, Apurva Gandhi, David Chung, Graham Neubig, Chris Donahue
Audio & Speech Reinforcement Learning Generative Models
  • DECOMPOSER effectively converts MIDI to Strudel, addressing the inverse problem of musical instruction recovery.
  • The framework utilizes a two-stage training process: supervised fine-tuning followed by reinforcement learning.
  • A synthetic dataset, STRUDEL-SYNTH, is created to facilitate the supervised learning phase.
  • The model achieves superior performance in both MIDI reconstruction fidelity and code readability compared to existing methods.