AI-generated summaries

Today's ML research, without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

59 Papers today
8h Update frequency
7 Days of history
Rapid FinFET Modelling Using an Autoencoder
Amit Sarkar, Suman Sau, Swagata Mandal
Efficient ML
  • Utilizes an autoencoder for efficient FinFET modeling.
  • Incorporates drain-to-source voltage (VDS) as an input feature.
  • Achieves high accuracy with minimal training data.
  • Successfully reconstructs full I-V curves and extracts key device metrics.
CKM-Driven Communication-Aware UAV Intelligent Trajectory Optimization for Urban Inspection
Yang Xiaomeng, Jia Ziye, Zhu Qiuming, Wu Qihui
Optimization Reinforcement Learning Robotics
  • Introduction of a channel knowledge map (CKM) for UAV trajectory planning.
  • Utilization of a diffusion model for constructing a time-accumulated CKM.
  • Development of a graph attention network soft actor-critic (GATSAC) algorithm for optimizing UAV paths.
  • Demonstrated improvement in communication reliability and trajectory efficiency.
The Degeneracy Distillery
T. Lucas Makinen, Deaglan J. Bartlett, Niall Jeffrey, Benjamin D. Wandelt
Theory Efficient ML Interpretability
  • Introduction of a three-stage pipeline for detecting and resolving parameter degeneracies.
  • Method leverages Fisher information geometry to identify independent parameter combinations.
  • Demonstrated significant reductions in simulation budget for posterior estimation.
  • Validated on synthetic problems and applied to real-world scientific challenges.
Learning Subset-Shared Invariances for Domain Generalization with Mixture-of-Experts
Tien-Hung Nguyen, Tien-Dat Tran, M.-Duong Nguyen, Kok-Seng Wong
Theory
  • Identifies limitations of global invariance in domain generalization, which can reduce predictive information.
  • Introduces subset-shared invariance, reframing DG as learning structured, subset-dependent invariances.
  • Proposes a routing-based Mixture-of-Experts framework to selectively enforce invariance across domain subsets.
  • Demonstrates improved out-of-domain generalization and robustness under heterogeneous domain shifts.
Exploring Dualistic Meta-Learning to Enhance Domain Generalization in Open Set Scenarios
Xiran Wang, Jian Zhang, Lei Qi, Yang Gao, Yinghuan Shi
Computer Vision Theory Optimization
  • Introduces MEDIC, a dualistic meta-learning strategy for open set domain generalization.
  • Addresses the issue of biased decision boundaries caused by imbalanced sample distributions.
  • Implements simultaneous gradient matching for inter-domain and inter-class tasks.
  • Provides a theoretical analysis of gradient matching that improves upon previous methods.
Natural Identifiers for Privacy and Data Audits in Large Language Models
Lorenzo Rossi, Bartłomiej Marek, Franziska Boenisch, Adam Dziedzic
Large Language Models NLP Theory
  • Introduction of natural identifiers (NIDs) as a solution for post-hoc privacy audits in LLMs.
  • NIDs allow for the generation of additional random strings from the same distribution, facilitating auditing without retraining.
  • The method adapts existing differential privacy auditing frameworks to leverage NIDs effectively.
  • Empirical validation shows accurate inference of training membership without false positives.
Memory-Efficient Policy Libraries with Low-Rank Adaptation in Reinforcement Learning
Samuel Valland Lyngset, Tor Viljen Raanaas, Gard Sveipe, Eirik Møller Nilsen, Jim Torresen, Kai Olav Ellefsen, Tobias Lømo
Reinforcement Learning Robotics Efficient ML
  • LoRA can reduce memory usage by 20-160 times compared to full fine-tuning.
  • 90-95% storage savings enable the deployment of multiple specialized policies in memory-constrained environments.
  • No significant performance loss when using LoRA for fine-tuning compared to traditional methods.
  • The approach addresses catastrophic forgetting by maintaining a library of specialist policies.
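The storage figures above follow from how LoRA stores a low-rank update instead of a full weight matrix. A minimal sketch of that arithmetic, assuming a single dense layer; the layer sizes and rank are hypothetical and not taken from the paper:

```python
def full_params(d_in: int, d_out: int) -> int:
    """Parameters in one dense weight matrix (bias ignored)."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in a LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


# Hypothetical policy layer sizes, chosen only for illustration.
d_in, d_out, rank = 256, 256, 4
ratio = full_params(d_in, d_out) / lora_params(d_in, d_out, rank)
savings = 1 - 1 / ratio
print(f"{ratio:.0f}x fewer parameters per stored policy, {savings:.1%} saved")
```

The ratio grows with layer width and shrinks with rank, which is one plausible reason a range of savings is reported rather than a single number.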
Communicability-Inspired Positional Encoding (CIPE)
Yipeng Zhang, Zhongtian Sun, Pietro Liò, Kelin Xia
Graph Learning
  • CIPE leverages communicability to create a positional encoding that reflects meaningful graph structure.
  • The method introduces an Attention-Compatible Geometry that enhances self-attention mechanisms in Transformers.
  • Dimensionality alignment is employed to map CIPE representations to a shared embedding space.
  • Empirical results show a 35.5% performance improvement over existing positional encodings across multiple benchmarks.
A Survey on Federated Causal Discovery and Inference
Xianjie Guo, Yuwei Wang, Guodu Xiang, Xiaoli Tang, Kui Yu, Han Yu, Qiang Yang
Federated Learning Graph Learning Theory
  • The paper provides a systematic review of Federated Causal Discovery and Inference, bridging a gap in existing literature.
  • FCD and FCI are formalized as complementary stages in a unified federated causal reasoning pipeline.
  • The authors categorize methodologies based on design decisions, federation topology, and structural scope.
  • Key practical challenges include data heterogeneity, privacy, and the need for theoretical guarantees.
Grad Detect: Gradient-Based Hallucination Detection in LLMs
Anand Kamat, Daniel Blake, Brent M. Werness
NLP Large Language Models Interpretability
  • Grad Detect is the first framework for hallucination detection based on layer-wise gradient analysis.
  • It outperforms confidence-based methods by 3-8 percentage points in hallucination detection.
  • The final five transformer layers concentrate over 97% of the discriminative gradient signal.
  • The method simultaneously predicts response correctness and model abstention.
GRACE: Gated Refinement for Accurate Causal Edge Discovery in High-Dimensional Time Series
Mohammad Fesanghary, Abhinav Havaldar
Time Series Graph Learning Theory
  • GRACE improves causal edge discovery by refining constraint-based methods with a gated neural model.
  • The use of Hard Concrete gates with L0 regularization allows for robust binary decisions in edge selection.
  • Empirical results show GRACE outperforms traditional methods in both synthetic and real-world datasets.
  • The framework is computationally efficient, achieving results 75 times faster than nonlinear CI tests.
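For readers unfamiliar with Hard Concrete gates, the sketch below shows the standard formulation from the L0-regularization literature (Louizos et al.). It is not GRACE's implementation: the stretch limits and temperature are the commonly used defaults, and `log_alpha` stands for a hypothetical learned per-edge parameter.

```python
import math
import random

# Stretch limits and temperature commonly used with Hard Concrete gates;
# the values GRACE uses are not stated in the summary.
GAMMA, ZETA, BETA = -0.1, 1.1, 2.0 / 3.0


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sample_gate(log_alpha: float, rng: random.Random) -> float:
    """Draw one stochastic gate value in [0, 1] for a candidate edge."""
    u = min(max(rng.random(), 1e-6), 1.0 - 1e-6)  # keep the logs finite
    s = sigmoid((math.log(u) - math.log(1.0 - u) + log_alpha) / BETA)
    stretched = s * (ZETA - GAMMA) + GAMMA
    return min(1.0, max(0.0, stretched))  # hard clip yields exact 0s and 1s


def prob_nonzero(log_alpha: float) -> float:
    """Probability the gate is open; summed over edges this is the L0 penalty."""
    return sigmoid(log_alpha - BETA * math.log(-GAMMA / ZETA))
```

Because the clipped gate takes the values 0 and 1 with non-zero probability, an edge can be switched fully off or on while the penalty stays differentiable in `log_alpha`.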
Machine Learning Modeling for Real-Time Melt Pool Monitoring in Laser Powder Bed Fusion Additive Manufacturing: A Hybrid Approach
Inioluwa Emmanuel, Zhuo Yang, Ho Yeung, Xinyao Zhang
Computer Vision Efficient ML
  • Developed a hybrid machine learning framework for real-time melt pool monitoring in LPBF.
  • Achieved high classification performance with a balanced dataset of 1,200 images.
  • Hybrid model outperformed purely deep learning models in terms of accuracy and inference time.
  • Demonstrated the potential of combining pretrained CNN features with classical ML methods.
RAVEN: A Regime-Aware Variable-context Expert Network for Financial Time Series Forecasting
Cheng He, Zhenyu Guan, Xijie Liang, Defu Lian, Jiajia Li, Enhong Chen, Patrick P. C. Lee, Geng Hu, Zehao Chen
Time Series
  • RAVEN addresses the limitations of fixed context windows in financial time series forecasting.
  • The model dynamically determines the temporal context for each input sample using a hierarchical windowing approach.
  • RAVEN achieves state-of-the-art performance in cumulative log-return prediction and fund sales forecasting.
  • The introduction of Correlation-Aware Weighting (CAW) enhances the aggregation of expert outputs.
KLip-PPO: A per-sample KL perspective on PPO-Clip
Riccardo Colletti, Robin Holzinger
Reinforcement Learning Optimization Theory
  • Establishes a per-sample equivalence between PPO-Clip and PPO-KL surrogates.
  • Demonstrates that both surrogates produce indistinguishable training outcomes on benchmark tasks.
  • Clarifies the implicit structure of the PPO-Clip algorithm, highlighting its step function behavior.
  • Suggests potential extensions and generalizations of the algorithm for broader applications.
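The step-function behavior mentioned above can be seen in the standard PPO-Clip surrogate itself. The sketch below covers only that well-known objective for a single sample, not the paper's KL equivalence; `gradient_is_active` is a hypothetical helper name.

```python
def ppo_clip_objective(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Standard per-sample PPO-Clip surrogate (to be maximized)."""
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped * advantage)


def gradient_is_active(ratio: float, advantage: float, eps: float = 0.2) -> bool:
    """True while the sample still contributes a gradient through the ratio."""
    if advantage > 0:
        return ratio < 1.0 + eps
    if advantage < 0:
        return ratio > 1.0 - eps
    return False


# A sample with positive advantage stops pushing once the ratio exceeds 1 + eps.
for r in (0.5, 1.0, 1.5):
    print(r, ppo_clip_objective(r, 1.0), gradient_is_active(r, 1.0))
```

Each sample is either fully on (gradient equal to the advantage) or fully off (zero gradient), which is the on/off structure a per-sample analysis can exploit.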
MGI: Member vs Generated Inference
Bihe Zhao, Michel Meintz, Juangui Xu, Franziska Boenisch, Adam Dziedzic
Generative Models Computer Vision
  • Introduction of the Member vs Generated Inference (MGI) task.
  • Existing methods inadequately distinguish between training members and generated samples.
  • Proposed Data Circuit Breaker (DCB) method effectively addresses MGI.
  • DCB combines signals from an autoencoder and latent generator for improved accuracy.
Managing Task Execution for Unknown Workloads in Batteryless IoT: A Hardware-Agnostic Evaluation
Samer Nasser, Henrique Duarte Moura, Ritesh Kumar Singh, Maarten Weyn, Jeroen Famaey
Reinforcement Learning Optimization Efficient ML
  • Introduction of hardware-agnostic dynamic scheduling strategies for batteryless IoT systems.
  • Comparison of model-free Reinforcement Learning and Approximated Prediction methods against traditional approaches.
  • Evaluation of methods using real-world solar data and dynamic transmission profiles.
  • Identification of operational trade-offs among different scheduling strategies.
FlowPipe: LLM-Enhanced Conditional Generative Flow Networks for Data Preparation Pipeline Construction
Kunyu Ni, Lei Cao, Jie He, Xiaotong Zhang, Jianfeng Jin, Junyu Dong, Yanwei Yu
Reinforcement Learning Generative Models Large Language Models
  • FlowPipe automates data preparation pipeline construction, addressing the combinatorial complexity and high evaluation costs.
  • It utilizes Conditional Generative Flow Networks for effective credit assignment and decision-making.
  • Deep Semantic Modulation enhances the context-awareness of the pipeline construction process.
  • The framework significantly outperforms existing methods, improving accuracy and training speed.
An LLM-based Two-Stage Transformer Framework for Cross-Domain Bearing Fault Diagnosis with Limited Data
Jinghan Wang, Feng Cheng, Wentao Wu, Hang Li, Gaoliang Peng, Tianchen Liu
Time Series
  • Proposes a two-stage transfer learning framework for bearing fault diagnosis under limited data.
  • Introduces explicit knowledge transfer mechanisms for improved performance in dual-shift scenarios.
  • Develops a dynamic classification head for seamless adaptation across heterogeneous fault taxonomies.
  • Achieves significant accuracy improvements over existing methods with minimal labeled data.
The Geometry of Sequential Learning: Lie-Bracket Prediction of Transfer Order
John Sweeney
Theory Optimization Large Language Models
  • Introduces a commutator theory of transfer order that connects order-dependent target loss to directional bracket scores.
  • Presents a drift-matched Trotter estimator for efficient pairwise planning using gradients and Hessian-vector products.
  • Develops Lie-Bracket Tournaments for scalable scheduling of multiple domains, avoiding exhaustive permutation evaluations.
  • Validates the approach across various post-training and domain adaptation scenarios, demonstrating robust performance.
Exact Schur-Sylvester Dimensionality Reductions for Non-Smooth Stochastic Complexity and Manifold Sampling
Trenton Lau, Gary P. T. Choi
Theory Efficient ML Optimization
  • Introduces a new formulation for computing NML code-length that bypasses traditional computational bottlenecks.
  • Reduces the complexity of projection and volume factor calculations from O(N^3) to O(k^3 + N^2k).
  • Generalizes the method to various non-smooth estimators including Sparse SVMs and Elastic Net.
  • Demonstrates significant speedup in sampling efficiency on high-dimensional datasets.
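To get a feel for the complexity reduction quoted above, the two bounds can be compared directly. This is back-of-the-envelope arithmetic that ignores constants; the values of N and k are hypothetical.

```python
def dense_cost(n: int) -> int:
    """Leading-order operation count for the O(N^3) route."""
    return n ** 3


def reduced_cost(n: int, k: int) -> int:
    """Leading-order operation count for the O(k^3 + N^2 k) route."""
    return k ** 3 + n ** 2 * k


n, k = 10_000, 10  # hypothetical problem size and reduced dimension
speedup = dense_cost(n) / reduced_cost(n, k)
print(f"roughly {speedup:.0f}x fewer operations")
```

When k is much smaller than N, the N^2 k term dominates and the gain is roughly N / k.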
Fast and Slow Variational Continual Learning
Subarnaduti Paul, Yohan Jung, Mohammad Emtiyaz Khan, Siddharth Swaroop, Thomas Möllenhoff, Martin Mundt
Optimization Large Language Models Theory
  • Introduces Continual IVON (CoVON) optimizer for continual learning.
  • Incorporates fast and slow adaptation mechanisms into the VCL framework.
  • Merges past posteriors to create a slow-moving prior for knowledge retention.
  • Demonstrates superior performance over existing VCL optimizers and weight-regularization strategies.
Offline Reinforcement Learning for Warehouse SLAM Throughput Control
Tina Dongxu Li, Mouhacine Benosman, Rajat Kumar, Kevin Tan, Ken Meszaros, Trevor Dardik
Reinforcement Learning Optimization
  • Introduces an offline RL framework for optimizing SLAM throughput in warehouses.
  • Employs a history-informed state representation and a compact action space to enhance learning.
  • Utilizes a balanced reward function to address upstream and downstream operational metrics.
  • Demonstrates superior performance of the CQL policy in improving system health and reducing throttling.
QC-SMOTE: Quality-Controlled SMOTE for Imbalanced Classification
Parth Upman, Shreyank N Gowda
Theory Efficient ML Optimization
  • QC-SMOTE improves the generation of synthetic samples by assessing their reliability and quality.
  • The method adapts its sampling strategy based on the local geometry of the data.
  • Experiments show significant performance improvements in AUC-ROC and Macro F1 scores compared to existing methods.
  • QC-SMOTE provides a graceful degradation mechanism by reverting to duplication in noisy regions.
UC-Search: Risk-Aware Test-Time Search for Delayed Constrained Time-Series Control
Xibai Wang
Time Series Reinforcement Learning Optimization
  • UC-Search is a novel framework for risk-aware decision-making in time-series control under uncertainty.
  • The methodology combines a feasibility automaton with uncertainty-guided search mechanisms for improved action selection.
  • Empirical results indicate significant performance gains over traditional methods in various test scenarios.
  • The paper establishes theoretical foundations for when bounded lookahead can enhance decision-making.
How Modular Is a Frontier Mixture-of-Experts? A Pre-registered Causal Test in Which Apparent Expert Modularity Mostly Dissolves
Tony Salomone, Deep Gandhi, Ali Asaria
NLP Large Language Models Interpretability
  • Only one out of six pre-registered expert families shows robust modularity.
  • Apparent modularity is sensitive to the choice of corpus, metric, and statistical bar.
  • Ablation-based assessments of modularity require careful control to avoid misleading conclusions.
  • The study provides a pre-registered causal testing framework for evaluating expert modularity.
Data Augmentation: A Fourier Analysis Perspective
Behrooz Tahmasebi, Melanie Weber, Stefanie Jegelka
Theory Efficient ML
  • Partial data augmentation can achieve statistical benefits comparable to full augmentation using a smaller subset of group elements.
  • The theoretical framework employs Fourier analysis and representation theory to analyze the effectiveness of partial augmentation.
  • Statistical optimality is maintained as long as the sampled subset size is sufficiently large relative to the invariant dimension.
  • Exact invariance cannot be achieved without averaging over the entire group, emphasizing the limitations of partial methods.
Closed-Loop Graph Algorithm Execution with Small Language Models: Step Accuracy and Rollout Reliability
Michal Podstawski
NLP Large Language Models Graph Learning
  • Introduces a closed-loop framework for evaluating small language models in graph algorithm execution.
  • Demonstrates that high next-step prediction accuracy does not ensure reliable overall execution.
  • Identifies significant differences in performance between traversal and weighted graph procedures.
  • Utilizes a comprehensive evaluation methodology that includes teacher-forced and autonomous rollout assessments.
Sesame: Structure-Aware Molecular Generation via Spatial Density-Map Conditioning
Konstantin Yatsenko, Arvind Thiagarajan
Generative Models
  • Introduces a novel density map conditioning architecture for structure-aware molecular generation.
  • Supports both de novo generation and fragment-conditioned lead optimization through a unified conditioning mechanism.
  • Implements a hybrid discrete-continuous diffusion process for effective molecular generation.
  • Utilizes trajectory finetuning to enhance the quality of generated molecules.
Speculative Decoding at Temperature Zero: A Scoped Safety-Invariance Screen with a 48,072-Sample Expansion
Sahil Kadadekar
Large Language Models Efficient ML Theory
  • Introduces the Typical-Acceptance Invariance Screen (TAIS) for assessing safety in speculative decoding.
  • Demonstrates no detectable safety divergence in outputs from speculative versus target-only decoding at temperature zero.
  • Utilizes a large dataset of 64,855 samples to validate findings across multiple model configurations.
  • Establishes a clear boundary between inference-time acceleration and other safety considerations in model training and deployment.
Emergent Capabilities Arise Randomly from Learning Sparse Attention Patterns
Vatsal Baherwani, Zixi Chen, Shikai Qiu, Andrew Gordon Wilson, Pavel Izmailov
NLP Large Language Models Theory
  • Emergent capabilities in transformer models arise stochastically and are influenced by model size.
  • Learning task-relevant attention patterns is crucial for the emergence of capabilities.
  • Context length and pattern sparsity significantly affect the learning difficulty of attention patterns.
  • Scaling the number of attention heads improves learning efficiency, while increasing head dimension yields diminishing returns.
Learning to Trigger: Reinforcement Learning at the Large Hadron Collider
Zixin Ding, Shaghayegh Emam, Giovanna Salvi, Cecilia Tosciri, Abhijith Gandrakota, Jennifer Ngadiuba, Nhan Tran, Christian Herwig, David W. Miller, Yuxin Chen
Reinforcement Learning
  • Introduced a reinforcement learning framework for real-time trigger threshold optimization at the LHC.
  • Demonstrated significant improvements in signal efficiency and background rate stability using RL-based methods.
  • Developed two new variants of Group-Filtered Policy Optimization tailored for streaming control.
  • Achieved successful transfer of the RL agent from simulated to real collision data without fine-tuning.
Reconstructing GRACE Terrestrial Water Storage with Spatio-Temporal Graph Neural Networks: An Application to South America
Lukas Arzoumanidis, Lara Johannsen, Klara Middendorf, Annette Eicker, Youness Dehbi
Graph Learning Time Series
  • Introduces a spatio-temporal graph neural network for reconstructing TWS anomalies from meteorological data.
  • Achieves high correlation with GRACE observations, demonstrating effectiveness in capturing hydrological dynamics.
  • Outperforms traditional reconstruction methods in terms of predictor efficiency and accuracy.
  • Successfully reproduces major climatic events, validating the model's applicability in climate science.
Training Dynamics of Neural Software Defect Predictors under Coupled Data-Quality Issues
Emmanuel Charleson Dapaah, Philip Makedonski, Jens Grabowski
Theory
  • Investigates the interaction of class imbalance and class overlap in neural software defect prediction.
  • Proposes a novel empirical protocol to analyze training dynamics under various data-quality conditions.
  • Aims to catalog training-dynamics patterns associated with data-quality issues for better model diagnostics.
  • Highlights the limitations of relying solely on endpoint performance metrics for understanding model behavior.
A Zeroth-Order Deep Learning Method for Fully Nonlinear Parabolic Partial Differential Equations with Unknown Coefficients
Yanwei Jia, Du Ouyang, Huyên Pham, Xun Yu Zhou
Theory Optimization
  • Introduces a model-free approach to solving fully nonlinear parabolic PDEs with unknown coefficients.
  • Utilizes zeroth-order derivative estimators from Monte Carlo simulations for learning solutions and their derivatives.
  • Establishes a non-asymptotic error bound and analyzes bias-variance tradeoff for the proposed method.
  • Demonstrates competitive performance in numerical experiments across various dimensions.
Is Variational Monte Carlo Robust? Sharp Moment Thresholds and Heavy-tailed Stochastic Optimization
Philipp Grohs, Davide Nobile
Theory Optimization
  • VMC's stochastic optimization is heavily influenced by the nodal geometry of the wave function.
  • Local energy and gradient estimators in VMC are generally heavy-tailed and lack higher moments.
  • The proposed PS-Clip-VMC variant improves robustness and convergence properties of VMC.
  • Preliminary results indicate PS-Clip-VMC significantly outperforms traditional VMC methods.
Tensorion: A Tensor-Aware Generalization of the Muon Optimizer
Vladimir Bogachev, Vladimir Aletov, Alexander Molozhavenko, Sergei Kudriashov, Maxim Rakhuba
Optimization Computer Vision Efficient ML
  • Tensorion extends the Muon optimizer to higher-order tensors, preserving the multilinear structure crucial for optimization.
  • The optimizer utilizes a linear minimization oracle over a specially defined tensor norm ball, balancing computational efficiency and optimization effectiveness.
  • Experiments show that Tensorion outperforms conventional optimizers like Adam in terms of convergence and stability on tensor-based tasks.
  • The proposed method adapts unfolding strategies for tensors, enhancing the practical application of tensor optimization in deep learning.
Lightweight Transformer Models for On-Device Fault Detection: A Benchmark Study on Resource-Constrained Deployment
Disha Patel
Efficient ML Time Series Theory
  • Lightweight transformers can match traditional ML methods in accuracy but at a much higher resource cost.
  • TinyBERT-4L is the most deployment-friendly transformer model with a balance of size and latency.
  • INT8 quantization can significantly reduce model size while maintaining high accuracy.
  • An adaptive inference pipeline can optimize performance by routing most predictions through a lightweight model.
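As a rough illustration of why INT8 quantization shrinks a model with little accuracy loss, here is a minimal sketch of symmetric per-tensor quantization. It is a generic textbook scheme, not necessarily the one benchmarked in the paper, and the weights are made up.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the stored integers."""
    return [v * scale for v in q]


weights = [1.0, -0.5, 0.25, 0.0]  # hypothetical FP32 weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, max_error)
```

Each weight is stored in one byte instead of four, and the round-trip error per weight is bounded by half the scale.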
A Time-Reparameterized Cumulative Intensity Extrapolation Sampler for Discrete Flow Matching
Feiyang Fu, Hehe Fan
Generative Models Efficient ML NLP
  • Introduces TR-CIE sampler to improve sampling efficiency in DFM.
  • Utilizes a schedule-based time reparameterization to mitigate stiffness.
  • Implements a cumulative-intensity extrapolation rule for better approximation.
  • Requires only one function evaluation per step, maintaining efficiency.
DREG: A Layer-Wise Jacobian Regularization as a General-Purpose Penalty
Rowan Martnishn
Computer Vision NLP Theory
  • DREG outperforms traditional regularizers in accuracy and noise robustness.
  • It is particularly effective under the GELU activation function.
  • DREG shows significant advantages in data-scarce environments.
  • The method requires minimal implementation effort, functioning as a plug-and-play regularizer.
AsyncOPD: How Stale Can On-Policy Distillation Be?
Wonjun Kang, Kevin Galim, Seunghyuk Oh, Minjun Kang, Sanghyun Park, Donghoon Kim, Minjae Lee, Minseo Kim, Rishabh Tiwari, Yuchen Zeng, Hyung Il Koo, Kangwook Lee
Large Language Models Reinforcement Learning Efficient ML
  • AsyncOPD addresses the staleness issue in on-policy distillation by decoupling rollout generation from learner updates.
  • The study finds that forward KL is more robust to stale data than reverse KL.
  • Existing asynchronous reinforcement learning stabilization techniques do not effectively mitigate OPD staleness.
  • A multi-sample Monte Carlo estimator is proposed to reduce variance in reverse KL OPD implementations.
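The forward and reverse KL objectives contrasted above differ only in which distribution takes the expectation. A minimal sketch on toy discrete distributions, assuming the usual distillation convention (forward = KL(teacher || student), reverse = KL(student || teacher)); the probabilities are invented for illustration.

```python
import math


def kl(p, q):
    """KL(p || q) for discrete distributions, assuming q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


teacher = [0.7, 0.2, 0.1]  # hypothetical next-token distributions
student = [0.4, 0.4, 0.2]

forward_kl = kl(teacher, student)  # expectation under the teacher
reverse_kl = kl(student, teacher)  # expectation under the student's own samples
print(forward_kl, reverse_kl)
```

Reverse KL is estimated from samples drawn by the student, so it is the term that depends directly on how fresh the rollouts are.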
Closing the Loop: Formally Verified Law as a Reward Signal for Self-Improving Legal AI
Armin Heydari, Torben Leowald
NLP Large Language Models Reinforcement Learning
  • Current legal AI systems lack the capability for autonomous legal reasoning due to their reliance on unverifiable outcomes.
  • The proposed architecture integrates LLMs with formal verification to ensure provable correctness in legal reasoning.
  • The system provides structural guarantees for legal argumentation, addressing open-textured legal analysis.
  • Demonstrated effectiveness through practical examples in various legal contexts.
Digital Twin-Driven Adaptive Sim-to-Real Alignment via Reinforcement Learning for Vibration-Based Bearing Health Monitoring Under Data Scarcity
Jinghan Wang, Yanjun Chen, Wei Zhang, Wentao Wu, Tianchen Liu, Gaoliang Peng
Reinforcement Learning Time Series Optimization
  • Proposes a novel RL-driven approach for sim-to-real feature alignment in bearing health monitoring.
  • Addresses the limitations of existing class-agnostic domain adaptation methods by recognizing the heterogeneous nature of fault classes.
  • Utilizes a three-stage framework that combines physics-based pretraining, RL for adaptive alignment, and an asymmetry-aware training strategy.
  • Achieves a cross-equipment linear probing accuracy of 92.8% without the need for encoder retraining.
Do Thinking Tokens Help with Safety?
Narutatsu Ri, Abhishek Panigrahi, Sanjeev Arora
Large Language Models NLP Theory
  • Thinking tokens do not significantly enhance safety decision-making in reasoning models.
  • The outcome of compliance or refusal can be predicted early in the thinking process.
  • Existing safety interventions often lead to over-refusal and suppress deliberation signals.
  • The thinking process is more akin to prefix completion than to genuine deliberation.
Parallel Manifold Steering: Efficient Adaptation of Large Associative Memories via Residual Energy Shaping
Kanishk Awadhiya
Large Language Models Efficient ML Theory
  • H-Res introduces a new method for adapting large Transformers without modifying synaptic weights or increasing sequence length.
  • The approach preserves the attention entropy of the model and supports Neural Collapse.
  • Empirical results show a 26% improvement in associative retrieval tasks compared to global weight modification methods.
  • H-Res avoids the computational overhead of prompt-based methods, making it efficient for structured domains.
The Gentle Collapse: Distributional Metrics for Continual Learning
Ahmed Anwar, Andreas Wagner, Federico Raue, Tobias Nauen, Andreas Dengel
Computer Vision Theory
  • Introduces six new metrics for evaluating catastrophic forgetting that provide a continuous view of forgetting dynamics.
  • Demonstrates that these metrics reveal information about forgetting that traditional accuracy metrics cannot capture.
  • Shows that using metric scores as loss weights can effectively reduce forgetting in continual learning tasks.
  • Establishes that the slope of metrics over short time windows is a reliable indicator for prioritizing replay samples.
Learning the Koopman Operator using Attention Free Transformers
Mohammed Nagdi, Evangelos-Marios Nikolados, Alexey Yermakov, Mars Gao, Nathan Kutz, Filippo Menolascina
Time Series Theory Optimization
  • Introduction of an attention-free latent memory block to improve long-horizon prediction accuracy.
  • Dynamic re-encoding mechanism to detect and correct latent drift, enhancing model robustness.
  • Evaluation on three benchmark systems shows significant error reduction compared to existing models.
  • Koopman+AFT model outperforms GRU and Transformer autoencoders in long-horizon predictions.
Deciphering Fingerprints of 3D Molecular Surfaces for Accurate Epitope Prediction
Fang Wu, Weihao Xuan, Jure Leskovec, Yejin Choi, Li Erran Li
Graph Learning
  • SurfBind is a surface-centric framework that enhances epitope prediction by focusing on molecular surface representations.
  • The model integrates geometric and physicochemical cues using a Transformer architecture with hierarchical prediction.
  • Experiments show that SurfBind outperforms existing methods, achieving state-of-the-art results on benchmark datasets.
  • The framework demonstrates strong generalization capabilities across diverse antibody contexts and conformational states.
Holistic Data Scheduler for LLM Pre-training via Multi-Objective Reinforcement Learning
Chenhao Dang, Jing Ma, Mingjie Liao
Large Language Models Reinforcement Learning Optimization
  • Introduction of the Holistic Data Scheduler (HDS) for LLM pre-training.
  • HDS employs a multi-objective reward function that considers data quality, inter-domain influence, and model-driven aspects.
  • Utilizes the Soft Actor-Critic (SAC) algorithm for reinforcement learning in a continuous control space.
  • Achieved 44% fewer training iterations and a 7.2% improvement on the MMLU 0-shot task compared to existing methods.
Cyclic Denoising Reveals Ultrastable Memories in Diffusion Models
Rishabh Sharma, Stefano Martiniani
Generative Models
  • Cyclic denoising is a new extraction attack method for image diffusion models.
  • The technique reveals ultrastable attractors that correspond to memorized training images.
  • Cyclic denoising requires no prior knowledge of training data and operates based on the model's own dynamics.
  • The dynamics show a yielding-like transition as the noise amplitude varies.
Reliable Conformal Prediction for Ordinal Classification Using the Ranked Probability Score
Stefan Haas, Luca Killmaier, Alireza Javanmardi, Eyke Hüllermeier
Theory
  • Introduction of RPS as a nonconformity measure for conformal prediction in ordinal classification.
  • The method ensures contiguous prediction sets that respect the ordinal structure of labels.
  • Theoretical guarantees for marginal coverage and reduced miscoverage severity.
  • Model-agnostic approach that can be integrated with various probabilistic predictors.
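One way to picture the idea above is plain split conformal prediction with the ranked probability score as the nonconformity score. The sketch below is that generic recipe under stated assumptions, not the authors' exact procedure; all function names are hypothetical.

```python
import math


def rps(probs, label):
    """Ranked probability score of a predicted distribution vs. an ordinal label."""
    cdf, score = 0.0, 0.0
    for k, p in enumerate(probs):
        cdf += p
        target = 1.0 if k >= label else 0.0  # CDF of a point mass at `label`
        score += (cdf - target) ** 2
    return score


def calibrate(cal_probs, cal_labels, alpha):
    """Split-conformal threshold: the ceil((n+1)(1-alpha))-th smallest score."""
    scores = sorted(rps(p, y) for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    return scores[rank - 1] if rank <= n else float("inf")


def prediction_set(probs, threshold):
    """All labels whose score does not exceed the calibrated threshold."""
    return [y for y in range(len(probs)) if rps(probs, y) <= threshold]
```

Between adjacent labels y and y+1 the score changes by 2F(y) - 1, where F is the predicted CDF. Since F is non-decreasing, the score is convex in y, so the labels below any threshold form a contiguous run.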
Adaptive Joint Compression and Synchronisation in Federated Split Learning for IoT Rainfall Prediction
Wenjie Ding, Yi Sin Lin, Jiale Liu, Baoyi Liu, Guanghua Liu, Zhuolu Li, Suleiman Sabo, Chuadhry Mujeeb Ahmed, Aydin Abadi, Rehmat Ullah, Rajiv Ranjan
Federated Learning Time Series Efficient ML
  • Introduction of a joint optimization approach for activation compression and synchronization intervals in FSL.
  • Validation of the framework through extensive simulations and real-world Raspberry Pi deployments.
  • Demonstrated significant reductions in communication overhead while maintaining predictive performance.
  • Adaptive scheduling mechanism adjusts communication parameters based on runtime latency signals.
ASAP: Agent-System Co-Design for Wall-Clock-Centered Auto HPO Research for ML Experiments
Taicheng Guo, Haomin Zhuang, Kehan Guo, Yujun Zhou, Nitesh V. Chawla, Olaf Wiest, Xiangliang Zhang
Optimization
  • ASAP integrates multiple inductive-biased optimizers to enhance sample efficiency in HPO.
  • The approach focuses on minimizing real-world wall-clock time rather than just iteration count.
  • Innovative techniques such as KV-cache reuse and speculation parallelism are employed to optimize performance.
  • Extensive experiments validate the superiority of ASAP over traditional HPO methods.
EPTS: Elastic Post-Training Sparsity for Efficient Large Language Model Compression
Ke Xu, Jiaqi Wan, Wenhao Hu, Han Pu, Xiaoyun Wang
NLP Large Language Models Efficient ML
  • EPTS provides a unified framework for Multi-Sparsity optimization, eliminating the need for separate optimization sessions.
  • The MS-HiLoRA mechanism allows for effective knowledge inheritance across different sparsity levels.
  • The MSFM enhances model adaptability to varying sparsity configurations.
  • EPTS demonstrates competitive performance against existing methods while improving deployment efficiency.
A Fair Evaluation of Graph Foundation Models for Node Property Prediction
Oleg Platonov, Gleb Bazhenov, Dmitry Eremeev, Liudmila Prokhorenkova
Graph Learning
  • The study reevaluates nine recent GFMs for node property prediction in a standardized manner.
  • Only the latest GFMs based on the Prior-data Fitted Networks paradigm outperform well-tuned GNNs.
  • The paper emphasizes the need for a unified evaluation framework in the GFM community.
  • Higher computational costs are associated with the GFMs that outperform GNNs.
Learning with a Single Rollout via Monte Carlo Pass@k Critic
Fengdi Che, Yang Liu, Lei Yu, Meng Cao, Tong Che, A. Rupam Mahmood, Dale Schuurmans
Reinforcement Learning Large Language Models NLP
  • Introduces SR-PPO for efficient token-level credit assignment in RL for language models.
  • Uses a single rollout to reduce computational cost and improve credit-assignment accuracy.
  • Employs a Pass@k metric to provide a more selective learning signal compared to traditional Pass@1.
  • Demonstrates stable learning dynamics and improved success rates on reasoning benchmarks.
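For context on the Pass@k metric mentioned above, the sketch below computes the widely used unbiased multi-sample estimator. This is the conventional definition of the metric, not the single-rollout critic the paper proposes.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of pass@k from n samples of which c are correct."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


# With 3 correct answers out of 10 samples, the metric rises quickly with k.
for k in (1, 5):
    print(k, pass_at_k(n=10, c=3, k=k))
```

Estimating this quantity directly needs many rollouts per prompt, which is the cost a single-rollout approach aims to avoid.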
When Top-1 Fails: Calibrating LoRA Monitors for Masked Diffusion LMs
Lucky Verma, Pratik Yadav
NLP Large Language Models Generative Models
  • Top-1 concentration fails as a reliable stability warning for DLM fine-tuning.
  • Max gradient norm serves as a more effective parameter-side signal for training stability.
  • The study provides a family-calibrated triage protocol with significant predictive precision.
  • Calibration of monitoring thresholds should be specific to each DLM family.
EMAgnet: Parameter-Space EMA Regularization for Policy Gradient Self-Play in Large Games
Tristan Maidment, JB Lanier, Chase McDonald, Nathan Tsang, Eugene Vinitsky, Roy Fox, Albert Wang, Wesley N. Kerr
Reinforcement Learning
  • EMAgnet introduces an adaptive regularization target using an exponential moving average of policy parameters.
  • The method improves upon traditional uniform regularization by focusing on viable strategies and discarding dominated ones.
  • Evaluation shows EMAgnet achieves lower exploitability and better performance in various game scenarios.
  • The approach is applicable to deep reinforcement learning, extending previous tabular methods to neural network policies.
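The core mechanism above, regularizing toward a slowly moving average of the policy's own parameters, can be sketched in a few lines. The squared-distance penalty, the decay, and the coefficient are assumptions made for illustration; the summary does not state the exact form EMAgnet uses.

```python
def ema_update(ema, params, decay=0.99):
    """Move the EMA target a small step toward the current parameters."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema, params)]


def ema_penalty(params, ema, coeff=0.1):
    """Assumed regularizer: squared distance between parameters and EMA target."""
    return coeff * sum((p - e) ** 2 for p, e in zip(params, ema))


# Hypothetical training loop fragment: the penalty is added to the policy loss,
# then the target is refreshed after each optimizer step.
params, ema = [1.0, -2.0], [0.0, 0.0]
loss_term = ema_penalty(params, ema)
ema = ema_update(ema, params)
print(loss_term, ema)
```

Unlike a fixed uniform target, the EMA follows wherever training has recently been, so the pull toward strategies the policy has already abandoned fades over time.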
The Inference-Compute Frontier and a Latency-Efficient Architecture for Limit Order Book Prediction
C. Evans Hedges
Time Series Efficient ML
  • Identification of a power-law relationship between predictive loss and structural forward work in LOB prediction.
  • Demonstration that latency behaves differently from compute, necessitating separate considerations in model design.
  • Introduction of FastBiNLOB, an architecture that achieves lower latency while maintaining high predictive accuracy.
  • Empirical validation of the inference-compute frontier across various model families.
FactorLibrary: From Polynomials to Circuits via Recursive Subgoals
Rohan Pandey, Michael Ruofan Zeng, Weikun K. Zhang, Kaijie Jin, Naomi Morato, Archit Ganapule, Bhaumik Mehta, Jarod Alper
Reinforcement Learning Theory Optimization
  • Introduces FactorLibrary to manage combinatorial search space in arithmetic circuit optimization.
  • Formulates the problem as a reinforcement learning task with both bottom-up and top-down approaches.
  • Demonstrates that the top-down PPO+MCTS agent achieves a 91.8% success rate for complexity up to 8.
  • Shows that learned policies generalize well to unseen targets, outperforming random baselines.