AI-generated summaries

Today's ML research,
without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

58 papers today
Updated every 8 hours
7 days of history
Reinforcement Learning with LLM-Guided Action Spaces for Synthesizable Lead Optimization
Tao Li, Kaiyuan Hou, Tuan Vinh, Monika Raj, Zhichun Guo, Carl Yang
Reinforcement Learning · Large Language Models · Optimization
  • MOLREACT bridges the gap between property optimization and synthetic feasibility in drug discovery.
  • The framework uses a tool-augmented LLM to propose feasible chemical transformations dynamically.
  • A dedicated policy model optimizes multi-step reaction trajectories to maximize long-term rewards.
  • The SMILES-based caching mechanism significantly reduces optimization time.
Read more
Bayesian Optimization for Mixed-Variable Problems in the Natural Sciences
Yuhao Zhang, Ti John, Matthias Stosiek, Patrick Rinke
Optimization
  • Introduces a generalized probabilistic reparameterization method for mixed-variable optimization.
  • Demonstrates the effectiveness of Bayesian optimization in handling non-equidistant discrete variables.
  • Conducts extensive benchmarks to optimize kernel formulations and validate the proposed method.
  • Shows that the approach can efficiently optimize complex objective landscapes in real-world scenarios.
Read more
Joint Task Offloading, Inference Optimization and UAV Trajectory Planning for Generative AI Empowered Intelligent Transportation Digital Twin
Xiaohuan Li, Junchuan Fan, Bingqi Zhang, Rong Yu, Xumin Huang, Qian Chen
Reinforcement Learning · Generative Models · Optimization
  • Integration of GAI with ITDT enhances data processing and fidelity.
  • Joint optimization of task offloading, inference, and UAV trajectories is crucial for system performance.
  • The SU-HATD3 algorithm effectively addresses the challenges of dynamic network environments.
  • Numerical results indicate significant improvements in system utility and convergence compared to baseline algorithms.
Read more
Persistence-Augmented Neural Networks
Elena Xinyi Wang, Arnur Nigmetov, Dmitriy Morozov
Computer Vision · Graph Learning · Interpretability
  • Introduces a persistence-based data augmentation framework for deep learning.
  • Utilizes the Morse–Smale complex to retain local topological information.
  • Demonstrates efficiency with a computational complexity of O(n log n).
  • Achieves superior performance on histopathology image classification and 3D porous material regression compared to existing methods.
Read more
ADAPTive Input Training for Many-to-One Pre-Training on Time-Series Classification
Paul Quinlan, Qingguo Li, Xiaodan Zhu
Time Series
  • Introduction of the ADAPT framework for efficient many-to-one pre-training in time-series classification.
  • Demonstrated ability to train on 162 diverse time-series datasets simultaneously.
  • Achieved state-of-the-art performance on classification benchmarks.
  • Framework designed to be model agnostic, allowing for future improvements in model architectures.
Read more
Prediction Arena: Benchmarking AI Models on Real-World Prediction Markets
Jaden Zhang, Gardenia Liu, Oliver Johansson, Hileamlak Yitayew, Kamryn Ohly, Grace Li
Reinforcement Learning
  • Prediction Arena benchmarks AI models in live prediction markets with real capital.
  • Cohort 1 models showed significant performance differences between Kalshi and Polymarket.
  • The model grok-4-20-checkpoint achieved the highest settlement win rate across platforms.
  • Initial prediction accuracy is crucial for model success, while research volume does not correlate with outcomes.
Read more
Fraud Detection System for Banking Transactions
Ranya Batsyas, Ritesh Yaduwanshi
Theory · Optimization · Efficient ML
  • The framework utilizes the PaySim synthetic dataset to model fraudulent transactions.
  • Employs CRISP-DM methodology for structured analysis and model development.
  • Implements SMOTE to address class imbalance in the dataset.
  • Compares multiple machine learning models, highlighting the effectiveness of ensemble methods.
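For context, SMOTE's core idea is to synthesize new minority-class samples by interpolating between an existing minority sample and one of its minority-class neighbors. The sketch below is a generic illustration (not the paper's PaySim pipeline) and simplifies by pairing random minority samples instead of k-nearest neighbors:

```python
import numpy as np

def smote_sketch(X_min, n_new, rng=None):
    """Generate n_new synthetic minority samples by interpolating
    between random pairs of existing minority samples (simplified
    SMOTE: partners are drawn at random, not via kNN)."""
    rng = rng or np.random.default_rng(0)
    i = rng.integers(0, len(X_min), n_new)   # base samples
    j = rng.integers(0, len(X_min), n_new)   # interpolation partners
    lam = rng.random((n_new, 1))             # interpolation weights in [0, 1)
    return X_min[i] + lam * (X_min[j] - X_min[i])

# Toy minority class: 5 fraudulent transactions with 3 features each.
X_min = np.array([[1.0, 0.2, 3.0],
                  [1.1, 0.1, 2.9],
                  [0.9, 0.3, 3.2],
                  [1.2, 0.2, 3.1],
                  [1.0, 0.1, 3.0]])
X_syn = smote_sketch(X_min, n_new=20)
print(X_syn.shape)  # (20, 3)
```

Because each synthetic point lies on a segment between two real minority points, it stays inside the minority class's convex hull, which is what makes SMOTE gentler than naive duplication.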
Read more
Guardian-as-an-Advisor: Advancing Next-Generation Guardian Models for Trustworthy LLMs
Yue Huang, Haomin Zhuang, Jiayi Ye, Han Bao, Yanbo Wang, Hang Hua, Siyuan Wu, Pin-Yu Chen, Xiangliang Zhang
NLP · Large Language Models · Reinforcement Learning
  • Introduction of Guardian-as-an-Advisor (GaaA) as a soft-gating alternative to traditional hard-gated safety checkers.
  • Development of GuardSet, a large-scale dataset with over 208,000 examples for training guardian models.
  • GuardAdvisor model achieves competitive performance while reducing unnecessary refusals and maintaining low latency.
  • The framework enhances the utility of LLMs by providing interpretable risk assessments without blocking generation.
Read more
Zero-shot Multivariate Time Series Forecasting Using Tabular Prior Fitted Networks
Mayuka Jayawardhana, Nihal Sharma, Kazem Meidani, Bayan Bruss, Tom Goldstein, Doron Bergman
Time Series
  • Introduces a framework for zero-shot multivariate time series forecasting using tabular foundation models.
  • Addresses the limitation of treating MTS as independent univariate problems by modeling inter-channel dependencies.
  • Reformulates MTS forecasting as scalar regression problems, enabling the use of existing tabular models without retraining.
  • Empirical results indicate improved performance over traditional methods and competitive results against specialized time series models.
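The reformulation described above can be sketched generically: slide a window over all channels, flatten each window into one feature row, and predict a single next value. The "tabular model" here is plain least squares standing in for the prior-fitted network used in the paper; the series and window size are toy assumptions:

```python
import numpy as np

def make_table(series, window):
    """Flatten a (T, C) multivariate series into a tabular dataset:
    each row holds `window` past steps of all channels, and the
    target is the next value of one channel (scalar regression)."""
    T, C = series.shape
    X, y = [], []
    for t in range(window, T):
        X.append(series[t - window:t].ravel())  # window * C features
        y.append(series[t, 0])                  # forecast channel 0
    return np.array(X), np.array(y)

# Toy 2-channel series: channel 0 is a trend, channel 1 leads it by one.
T = 60
trend = np.arange(T, dtype=float)
series = np.stack([trend, trend + 1.0], axis=1)

X, y = make_table(series, window=4)
w, *_ = np.linalg.lstsq(X, y, rcond=None)       # stand-in tabular model
pred = X[-1] @ w
print(round(float(pred), 3), round(float(y[-1]), 3))
```

Because every row sees all channels' history, cross-channel structure (here, channel 1 leading channel 0) is available to the regressor, which is exactly the inter-channel dependency the univariate treatment discards.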
Read more
DMax: Aggressive Parallel Decoding for dLLMs
Zigeng Chen, Gongfan Fang, Xinyin Ma, Ruonan Yu, Xinchao Wang
NLP · Large Language Models · Generative Models
  • DMax mitigates error accumulation in parallel decoding of dLLMs.
  • Introduces On-Policy Uniform Training (OPUT) for effective self-correction.
  • Proposes Soft Parallel Decoding (SPD) for robust intermediate state representation.
  • Demonstrates significant improvements in TPF on multiple benchmarks.
Read more
GAN-based Domain Adaptation for Image-aware Layout Generation in Advertising Poster Design
Chenchen Xu, Min Zhou, Tiezheng Ge, Weiwei Xu
Generative Models · Computer Vision
  • Introduction of the CGL-Dataset for training image-aware layout generation models.
  • Development of two GAN-based models: CGL-GAN and PDA-GAN, with the latter utilizing unsupervised domain adaptation.
  • Proposal of three novel content-aware metrics for evaluating layout generation quality.
  • PDA-GAN demonstrates significant improvements over CGL-GAN in generating aesthetically pleasing layouts.
Read more
DSPR: Dual-Stream Physics-Residual Networks for Trustworthy Industrial Time Series Forecasting
Yeran Zhang, Pengwei Yang, Guoqing Wang, Tianyu Li
Time Series · Graph Learning · Interpretability
  • DSPR effectively decouples stable trends from regime-dependent dynamics in industrial time series forecasting.
  • The framework incorporates an Adaptive Window module for flow-dependent transport delays and a Physics-Guided Dynamic Graph for learning time-varying interactions.
  • DSPR achieves state-of-the-art predictive performance with high Mean Conservation Accuracy and Total Variation Ratio.
  • The model provides interpretable insights into physical mechanisms, enhancing understanding beyond mere prediction.
Read more
Less Approximates More: Harmonizing Performance and Confidence Faithfulness via Hybrid Post-Training for High-Stakes Tasks
Haokai Ma, Lee Yan Zhen, Gang Yang, Yunshan Ma, Ee-Chien Chang, Tat-Seng Chua
NLP · Large Language Models · Reinforcement Learning
  • Introduces HyTuning, a hybrid post-training framework for LLMs.
  • Proposes Progressive Reasoning Gain (PRG) to measure the reliability of reasoning steps.
  • Addresses challenges of data scarcity, overconfidence, and erroneous updates in high-stakes tasks.
  • Demonstrates significant improvements in accuracy and confidence faithfulness through extensive experiments.
Read more
Provably Adaptive Linear Approximation for the Shapley Value and Beyond
Weida Li, Yaoliang Yu, Bryan Kian Hsiang Low
Theory · Efficient ML · Interpretability
  • Introduces a theoretical framework for approximating semi-values with improved query complexities.
  • Develops Adalina, an adaptive algorithm that achieves linear-time and linear-space efficiency.
  • Establishes a connection between existing approximation algorithms and provides insights on paired sampling benefits.
  • Demonstrates that the proposed methods can significantly reduce the number of utility queries required for accurate approximations.
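The baseline such methods improve on is permutation-sampling Monte Carlo estimation of the Shapley value, which spends one utility query per player per permutation. A minimal sketch of that baseline (not the paper's Adalina algorithm), using a toy additive game whose exact Shapley values are known:

```python
import numpy as np

def mc_shapley(utility, n, n_perms, rng=None):
    """Estimate Shapley values by averaging each player's marginal
    contribution over random permutations of the n players."""
    rng = rng or np.random.default_rng(0)
    phi = np.zeros(n)
    for _ in range(n_perms):
        perm = rng.permutation(n)
        coalition, prev = [], utility(())
        for p in perm:
            coalition.append(p)
            cur = utility(tuple(coalition))
            phi[p] += cur - prev                # marginal contribution
            prev = cur
    return phi / n_perms

# Additive toy game: a coalition's utility is the sum of fixed weights,
# so player i's exact Shapley value is weights[i].
weights = np.array([1.0, 2.0, 3.0, 4.0])
u = lambda S: float(weights[list(S)].sum()) if S else 0.0
est = mc_shapley(u, 4, n_perms=200)
print(est)  # [1. 2. 3. 4.]
```

In the additive game every marginal contribution is exact, so the estimate converges immediately; for general utilities the variance of these marginals is what adaptive and paired-sampling schemes target.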
Read more
Learning is Forgetting: LLM Training As Lossy Compression
Henry C. Conklin, Tom Hosking, Tan Yi-Chern, Julian Gold, Jonathan D. Cohen, Thomas L. Griffiths, Max Bartolo, Seraphina Goldfarb-Tarrant
NLP · Large Language Models · Theory
  • LLMs are conceptualized as instances of lossy compression, retaining only relevant information from training data.
  • Pre-training follows a two-phase trajectory consistent with Information Bottleneck theory, with models approaching optimal compression over time.
  • The degree of optimal compression correlates significantly with performance across multiple benchmarks for various LLM families.
  • Quantifying preference information in models predicts downstream performance, indicating alignment with human-like preferences.
Read more
Optimal Decay Spectra for Linear Recurrences
Yang Cao
NLP · Large Language Models · Theory
  • Introduces Position-Adaptive Spectral Tapering (PoST) to improve long-range memory in linear recurrent models.
  • Establishes a design blueprint for optimal memory channel distribution based on logarithmic equipartition.
  • Demonstrates minimax optimality through Spectral Reparameterization for geometrically spaced decay rates.
  • Implements Position-Adaptive Scaling to eliminate scale mismatch and enhance approximation bounds.
Read more
A Systematic Framework for Tabular Data Disentanglement
Ivan Tjuawinata, Andre Gunawan, Anh Quan Tran, Nitish Kumar, Payal Pote, Harsh Bansal, Chu-Hung Chi, Kwok-Yan Lam, Parventanis Murthy
Theory · Generative Models · Time Series
  • Introduces a systematic framework for tabular data disentanglement.
  • Modularizes the disentanglement process into four core components.
  • Identifies limitations of existing methods and proposes a comprehensive view.
  • Demonstrates the framework's applicability through a case study.
Read more
What Drives Representation Steering? A Mechanistic Case Study on Steering Refusal
Stephen Cheng, Sarah Wiegreffe, Dinesh Manocha
NLP · Large Language Models · Interpretability
  • Introduces a multi-token activation patching framework for analyzing steering vectors in LLMs.
  • Finds that refusal steering primarily interacts with the OV circuit, with minimal impact from the QK circuit.
  • Demonstrates that different steering methodologies leverage highly interchangeable circuits.
  • Shows that refusal steering vectors can be sparsified by 90-99% while maintaining performance.
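The sparsification result can be illustrated generically: keep only the largest-magnitude coordinates of a vector and zero the rest. The vector below is a random stand-in, not an actual refusal direction, and `keep_frac` is an illustrative parameter:

```python
import numpy as np

def sparsify(v, keep_frac):
    """Zero out all but the largest-magnitude entries of v,
    keeping a `keep_frac` fraction of coordinates."""
    k = max(1, int(len(v) * keep_frac))
    idx = np.argsort(np.abs(v))[-k:]   # indices of the top-k entries
    out = np.zeros_like(v)
    out[idx] = v[idx]
    return out

rng = np.random.default_rng(0)
v = rng.normal(size=4096)              # stand-in steering vector
v99 = sparsify(v, keep_frac=0.01)      # 99% of coordinates zeroed
print(int((v99 != 0).sum()))           # 40 nonzero coordinates
```

The surviving coordinates keep their original values, so the sparse vector acts on the same directions as the dense one, just restricted to its dominant components.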
Read more
Bit-by-Bit: Progressive QAT Strategy with Outlier Channel Splitting for Stable Low-Bit LLMs
Binxing Xu, Hao Gu, Lujun Li, Hao Wang, Bei Liu, Jiacheng Liu, Qiyuan Zhu, Xintong Yang, Chao Li, Sirui Han, Yike Guo
Large Language Models · Efficient ML · Optimization
  • Introduces a progressive QAT framework that enhances stability during low-bit training.
  • Utilizes outlier channel splitting to reduce quantization errors.
  • Enables a 'train once, deploy any precision' capability through nested quantization grids.
  • Achieves significant performance improvements over existing QAT methods on LLaMA models.
Read more
Data Warmup: Complexity-Aware Curricula for Efficient Diffusion Training
Jinhong Lin, Pan Wang, Zitong Zhan, Lin Zhang, Pedro Morgado
Generative Models · Efficient ML
  • Identifies the mismatch between data complexity and model readiness as a source of inefficiency in diffusion training.
  • Introduces a semantic-aware image complexity metric that combines foreground dominance and typicality.
  • Demonstrates significant improvements in IS and FID on ImageNet with a simple-to-complex training curriculum.
  • Confirms that the order of image complexity is critical for performance, as reversing the curriculum harms results.
Read more
MIPT-SSM: Scaling Language Models with O(1) Inference Cache via Phase Transitions
Yasong Fan
NLP · Large Language Models · Theory
  • Introduces a learned measurement rate for dynamic computation routing in sequence models.
  • Proves the incompatibility of norm-preserving and selective forgetting in linear operators.
  • Achieves significant performance improvements over Transformers in text classification tasks.
  • Demonstrates a 42.8x reduction in memory usage compared to traditional Transformers.
Read more
Cognitive-Causal Multi-Task Learning with Psychological State Conditioning for Assistive Driving Perception
Keito Inoshita, Nobuhiro Hayashida, Akira Imanishi
Multimodal
  • Introduction of CauPsi, a cognitive science-grounded causal multi-task learning framework for ADAS.
  • Implementation of a Causal Task Chain for hierarchical task dependency modeling.
  • Incorporation of psychological state signals into multi-task learning through Cross-Task Psychological Conditioning.
  • Achieved 82.71% mean accuracy on the AIDE dataset with only 5.05M parameters, surpassing prior work.
Read more
CausalVAE as a Plug-in for World Models: Towards Reliable Counterfactual Dynamics
Ziyi Ding, Xianxin Lai, Weiyu Chen, Xiao-Ping Zhang, Jiayu Chen
Generative Models · Graph Learning · Theory
  • CausalVAE is introduced as a plug-in for latent world models to improve counterfactual dynamics.
  • The model captures causal relationships among latent variables using a directed acyclic graph (DAG) structure.
  • A staged training strategy is employed to stabilize sequential training and enhance interpretability.
  • Significant improvements in counterfactual retrieval metrics, especially in the Physics benchmark.
Read more
Fast Heterogeneous Serving: Scalable Mixed-Scale LLM Allocation for SLO-Constrained Inference
Jiaming Cheng, Duong Tung Nguyen
Large Language Models · Optimization · Efficient ML
  • Introduction of two efficient heuristics (GH and AGH) for LLM inference allocation.
  • Incorporation of constraint-aware mechanisms to ensure feasibility under resource and SLO constraints.
  • AGH achieves over 260× speedup compared to traditional MILP approaches.
  • Robust performance under stress tests, maintaining stable costs and controlled SLO violations.
Read more
Kuramoto Oscillatory Phase Encoding: Neuro-inspired Synchronization for Improved Learning Efficiency
Mingqing Xiao, Yansen Wang, Dongqi Han, Caihua Shan, Dongsheng Li
Computer Vision · Efficient ML · Theory
  • Introduction of Kuramoto Oscillatory Phase Encoding (KoPE) to Vision Transformers.
  • KoPE enhances learning efficiency through neuro-inspired synchronization mechanisms.
  • Demonstrated improvements in training, parameter, and data efficiency.
  • KoPE excels in structured understanding tasks like semantic segmentation and few-shot reasoning.
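The underlying Kuramoto dynamics are standard: each oscillator's phase obeys dθ_i/dt = ω_i + (K/N) Σ_j sin(θ_j − θ_i), and synchronization is measured by the order parameter r = |mean(exp(iθ))|. A minimal Euler simulation of that model (a generic illustration, unrelated to the paper's Vision Transformer integration):

```python
import numpy as np

def kuramoto(theta, omega, K, dt, steps):
    """Euler integration of the Kuramoto model:
    dθ_i/dt = ω_i + (K/N) Σ_j sin(θ_j − θ_i)."""
    N = len(theta)
    for _ in range(steps):
        coupling = np.sin(theta[None, :] - theta[:, None]).sum(axis=1)
        theta = theta + dt * (omega + (K / N) * coupling)
    return theta

def order_parameter(theta):
    """r = |mean(exp(iθ))|: 0 = incoherent, 1 = fully synchronized."""
    return np.abs(np.exp(1j * theta).mean())

rng = np.random.default_rng(0)
N = 50
theta0 = rng.uniform(0, 2 * np.pi, N)      # random initial phases
omega = rng.normal(0, 0.1, N)              # near-identical frequencies
r0 = order_parameter(theta0)
rT = order_parameter(kuramoto(theta0, omega, K=2.0, dt=0.05, steps=400))
print(round(r0, 2), round(rT, 2))          # r grows toward 1
```

With coupling strength K well above the spread of natural frequencies, the phases lock and r approaches 1; weakening K below the critical threshold leaves the population incoherent.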
Read more
GIRL: Generative Imagination Reinforcement Learning via Information-Theoretic Hallucination Control
Prakul Sunil Hiremath
Reinforcement Learning · Generative Models · Robotics
  • GIRL addresses imagination drift in model-based reinforcement learning by using a cross-modal grounding signal.
  • The framework employs an uncertainty-adaptive trust-region bottleneck to control the imagination process.
  • Theoretical contributions include a new value-gap bound that remains valid as the discount factor approaches one.
  • Empirical results show GIRL outperforms DreamerV3 and TD-MPC2 across various tasks, demonstrating improved sample efficiency.
Read more
Sheaf-Laplacian Obstruction and Projection Hardness for Cross-Modal Compatibility on a Modality-Independent Site
Tibor Sloboda
Multimodal · Theory · Graph Learning
  • Introduces a modality-independent site for evaluating cross-modal compatibility.
  • Defines projection hardness and sheaf-Laplacian obstruction as key invariants for alignment.
  • Establishes a connection between sheaf spectral gap and global alignment stability.
  • Demonstrates non-transitivity in compatibility and the potential for bridging through intermediate modalities.
Read more
TTVS: Boosting Self-Exploring Reinforcement Learning via Test-time Variational Synthesis
Sikai Bai, Haoxi Li, Jie Zhang, Yongjiang Liu, Song Guo
Reinforcement Learning · Large Language Models
  • TTVS enhances self-exploring RL by dynamically augmenting training data from unlabeled test queries.
  • The framework consists of two modules: Online Variational Synthesis and Test-time Hybrid Exploration.
  • TTVS outperforms existing test-time adaptation methods and state-of-the-art supervised RL techniques.
  • The approach is agnostic to policy optimization algorithms, allowing flexible integration with various methods.
Read more
Robust Length Prediction: A Perspective from Heavy-Tailed Prompt-Conditioned Distributions
Jing Wang, Yu-Yang Qian, Ke Xue, Chao Qian, Peng Zhao, Zhi-Hua Zhou
Large Language Models · Efficient ML · Theory
  • Output-length prediction is essential for efficient LLM serving and resource allocation.
  • Existing methods treat output length as a deterministic scalar, which is statistically misaligned with the true nature of LLM outputs.
  • The proposed ProD methods leverage multiple generations to create robust training targets, improving prediction accuracy.
  • Empirical results show significant improvements in prediction quality over previous state-of-the-art methods.
Read more
Regret-Aware Policy Optimization: Environment-Level Memory for Replay Suppression under Delayed Harm
Prakul Sunil Hiremath
Reinforcement Learning · Optimization · Graph Learning
  • Introduces the Replay Suppression Diagnostic (RSD) to analyze replay phenomena in RL.
  • Establishes a theoretical framework showing that replay cannot be suppressed without changing action distributions.
  • Proposes Regret-Aware Policy Optimization (RAPO) to modify transition dynamics based on historical harm.
  • Demonstrates significant reduction in replay and retention of task performance in graph diffusion tasks.
Read more
Cluster Attention for Graph Machine Learning
Oleg Platonov, Liudmila Prokhorenkova
Graph Learning
  • Introduction of Cluster Attention (CLATT) to enhance graph machine learning models.
  • CLATT allows nodes to attend to all nodes within their clusters, improving receptive fields.
  • Augmentation of MPNNs and Graph Transformers with CLATT leads to significant performance gains.
  • Experimental validation on 12 real-world graph datasets demonstrates the effectiveness of CLATT.
Read more
Validated Synthetic Patient Generation for Small Longitudinal Cohorts: Coagulation Dynamics Across Pregnancy
Jeffrey D. Varner, Maria Cristina Bravo, Carole McBride, Thomas Orfeo, Ira Bernstein
Generative Models · Time Series · Theory
  • Introduces multiplicity-weighted Stochastic Attention (SA) for synthetic patient generation.
  • SA preserves the geometry of small longitudinal cohorts while generating new patient profiles.
  • Synthetic patients were validated against real data and found to be statistically indistinguishable.
  • SA enables targeted amplification of rare clinical subgroups without retraining.
Read more
SPAMoE: Spectrum-Aware Hybrid Operator Framework for Full-Waveform Inversion
Zhenyu Wang, Peiyuan Li, Yongxiang Shi, Ruoyu Wu, Chenfei Liao, Lei Zhang
Optimization
  • SPAMoE effectively decouples high and low-frequency information flows in FWI.
  • The Spectral-Preserving DINO Encoder maintains balanced frequency content, improving model stability.
  • The Adaptive Spectral Mixture-of-Experts enhances multi-scale geological structure reconstruction.
  • SPAMoE outperforms existing FWI methods, achieving a 54.1% reduction in average MAE.
Read more
Bias Redistribution in Visual Machine Unlearning: Does Forgetting One Group Harm Another?
Yunusa Haruna, Adamu Lawan, Ibrahim Haruna Abdulhamid, Hamza Mohammed Dauda, Jiaquan Zhang, Chaoning Zhang, Shamsuddeen Hassan Muhammad
Computer Vision · Theory · Interpretability
  • Bias redistribution occurs when a model forgets a demographic group, often amplifying bias in other groups.
  • The study reveals that forgetting the Young Female group primarily benefits the Old Female group, indicating a gender-dominant structure in CLIP's embedding space.
  • Current unlearning methods struggle to achieve perfect forgetting due to the geometric relationships between embeddings.
  • A novel redistribution score is introduced to quantify bias redistribution in machine unlearning.
Read more
PolicyLong: Towards On-Policy Context Extension
Junlong Jia, Ziyang Chen, Xing Wu, Chaochen Gao, TingHao Yu, Feng Zhang, Songlin Hu
NLP · Large Language Models
  • PolicyLong introduces an on-policy framework for long-context training, addressing the off-policy gap in traditional methods.
  • The framework employs an iterative self-curriculum that adapts to the model's evolving capabilities.
  • Both positive contexts and hard negatives are dynamically selected based on the current model's entropy landscape.
  • Experiments show significant performance improvements over baseline methods, especially at longer context lengths.
Read more
Is your algorithm unlearning or untraining?
Eleni Triantafillou, Ahmed Imtiaz Humayun, Monica Ribero, Alexander Matt Turner, Michael C. Mozer, Georgios Kaissis
Theory
  • Distinction between Unlearning and Untraining is crucial for clarity in research.
  • Untraining removes the influence of specific examples, while Unlearning targets the underlying distribution.
  • Misunderstanding these terms can lead to inappropriate metrics and hinder progress.
  • The paper aims to initiate discussions on technical definitions and overlooked research questions.
Read more
The Lifecycle of the Spectral Edge: From Gradient Learning to Weight-Decay Compression
Yongzhong Xu
Theory · Optimization
  • The spectral edge transitions from a gradient-driven to a weight-decay-driven state during training.
  • At grokking, gradient and weight decay align along the spectral edge, indicating a phase transition.
  • After grokking, the spectral edge's orientation is crucial for model performance, while displacements along it are not.
  • Three universality classes of spectral edges are identified based on their functional content.
Read more
Implicit Regularization and Generalization in Overparameterized Neural Networks
Zeran Johannsen
Theory · Optimization
  • Overparameterized neural networks can generalize well despite classical predictions of overfitting.
  • Implicit regularization through optimization algorithms like SGD influences generalization performance.
  • Smaller batch sizes lead to flatter minima and lower test errors.
  • Sparse subnetworks can achieve performance comparable to full models, highlighting effective capacity constraints.
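Such sparse subnetworks are typically obtained by magnitude pruning; a minimal global-magnitude-pruning sketch (a generic illustration with random toy layers, not the paper's procedure):

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Global magnitude pruning: zero the smallest-magnitude fraction
    `sparsity` of all weights, returning pruned copies and the masks."""
    flat = np.concatenate([np.abs(w).ravel() for w in weights])
    thresh = np.quantile(flat, sparsity)   # global magnitude cutoff
    masks = [np.abs(w) > thresh for w in weights]
    return [w * m for w, m in zip(weights, masks)], masks

rng = np.random.default_rng(0)
layers = [rng.normal(size=(64, 32)), rng.normal(size=(32, 10))]
pruned, masks = magnitude_prune(layers, sparsity=0.9)
kept = sum(int(m.sum()) for m in masks)
total = sum(w.size for w in layers)
print(kept, total)   # roughly 10% of weights survive
```

Using one global threshold rather than a per-layer one lets layers with larger weights keep more of them, which is a common default in lottery-ticket-style experiments.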
Read more
Quantization Impact on the Accuracy and Communication Efficiency Trade-off in Federated Learning for Aerospace Predictive Maintenance
Abdelkarim Loukili
Federated Learning · Time Series · Efficient ML
  • AeroConv1D model designed specifically for aerospace predictive maintenance.
  • INT4 quantization achieves accuracy similar to FP32 while reducing communication costs by 8×.
  • Non-IID evaluation reveals the true performance of quantization methods, contrasting with IID assumptions.
  • INT2 quantization, while showing lower MAE, leads to significant instability in performance metrics.
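The 8× communication figure follows directly from bit widths (32 bits / 4 bits = 8). A minimal symmetric per-tensor INT4 round-trip, as a generic illustration rather than the paper's exact scheme:

```python
import numpy as np

def int4_quantize(w):
    """Symmetric per-tensor quantization to INT4: map floats onto the
    integer levels {-7, ..., 7} via a single scale factor."""
    scale = np.abs(w).max() / 7.0
    q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0, 0.02, 1024).astype(np.float32)  # toy model update
q, scale = int4_quantize(w)
w_hat = dequantize(q, scale)
err = np.abs(w - w_hat).max()
print(err <= scale / 2 + 1e-8)  # rounding error ≤ half a quantization step
# Payload: 1024 values * 4 bits vs 1024 * 32 bits -> 8x fewer bits sent.
```

Only the 4-bit codes plus one scale per tensor travel over the network, which is where the communication saving comes from; the INT2 instability noted above corresponds to shrinking the level set from 15 values to 3.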
Read more
A Direct Approach for Handling Contextual Bandits with Latent State Dynamics
Zhen Li, Gilles Stoltz
Reinforcement Learning · Theory · Optimization
  • Introduces a general model for linear contextual bandits with latent-state dynamics, allowing rewards to depend on contexts, actions, and hidden states.
  • Achieves stronger high-probability regret bounds compared to previous work, which relied on simplified models.
  • Demonstrates that the belief-dependent reward model is a significant simplification and does not capture the complexities of the problem.
  • Provides a more direct and efficient methodology for handling contextual bandits with latent state dynamics.
Read more
The Impact of Dimensionality on the Stability of Node Embeddings
Tobias Schumacher, Simon Reichelt, Markus Strohmaier
Graph Learning
  • Dimensionality significantly affects the stability of node embeddings.
  • Different embedding methods exhibit varying stability patterns with increased dimensions.
  • Higher dimensionality does not guarantee better performance in downstream tasks.
  • The study emphasizes the importance of selecting appropriate embedding dimensions.
Read more
Learning Markov Processes as Sum-of-Square Forms for Analytical Belief Propagation
Peter Amorese, Morteza Lahijanian
Theory · Efficient ML · Time Series
  • Introduces a functional modeling framework using Sum-of-Squares forms for analytical belief propagation in Markov processes.
  • Provides a theoretical analysis of the limitations of SoS for conditional density estimation.
  • Presents a novel functional form that alleviates restrictions of SoS while preserving theoretical attributes.
  • Demonstrates a training method that ensures valid distribution constraints are met.
Read more
Tree-of-Evidence: Efficient "System 2" Search for Faithful Multimodal Grounding
Micky C. Nnamdi, Benoit L. Marteau, Yishan Zhong, J. Ben Tamo, May D. Wang
Multimodal · Interpretability · Optimization
  • Tree-of-Evidence (ToE) is introduced as a novel algorithm for improving interpretability in multimodal models.
  • ToE employs a beam search strategy to identify minimal evidence sets necessary for model predictions.
  • The algorithm retains high predictive performance while providing auditable evidence traces.
  • ToE adapts its search strategy based on the ambiguity of the data, effectively integrating multiple modalities.
Read more
Adversarial Label Invariant Graph Data Augmentations for Out-of-Distribution Generalization
Simon Zhang, Ryan P. DeMilt, Kun Jin, Cathy H. Xia
Graph Learning · Optimization · Theory
  • Introduction of RIA, a method for OoD generalization under covariate shift.
  • Adversarial label invariant data augmentations are used to create diverse training environments.
  • The methodology includes an alternating gradient descent-ascent algorithm for optimization.
  • Extensive experiments show RIA outperforms existing OoD generalization approaches.
Read more
SAGE: Sign-Adaptive Gradient for Memory-Efficient LLM Optimization
Wooin Lee, Hyun-Tae Kim
Large Language Models · Optimization · Efficient ML
  • SAGE addresses the memory bottleneck of the AdamW optimizer in LLM training.
  • The optimizer effectively manages the unique challenges posed by embedding layers' sparse, high-variance gradients.
  • SAGE combines a Lion-style update with a memory-efficient adaptive scale for improved stability and convergence.
  • The proposed method outperforms existing optimizers in terms of perplexity and memory efficiency.
Read more
Conservation Law Breaking at the Edge of Stability: A Spectral Theory of Non-Convex Neural Network Optimization
Daniel Nobrega Medeiros
Optimization · Theory
  • Gradient flow preserves conservation laws in L-layer ReLU networks, confining optimization trajectories.
  • Discrete gradient descent breaks these conservation laws, leading to a drift characterized by a non-integer exponent α.
  • A closed-form spectral crossover formula for drift is derived, explaining the observed behavior across different architectures.
  • Cross-entropy loss is shown to induce exponential Hessian spectral compression, independent of training set size.
Read more
Approximation of the Basset force in the Maxey-Riley-Gatignol equations via universal differential equations
Finn Sommer, Vamika Rathi, Sebastian Goetschel, Daniel Ruprecht
Theory · Optimization · Time Series
  • Introduces a neural network-based approximation for the Basset force in MaRGE.
  • Transforms complex integro-differential equations into solvable ordinary differential equations.
  • Compares FNN and LSTM architectures to effectively model the history effects.
  • Demonstrates the applicability of universal differential equations in fluid dynamics.
Read more
Multimodal Latent Reasoning via Predictive Embeddings
Ashutosh Adhikari, Mirella Lapata
Multimodal
  • PEARL eliminates the need for explicit tool invocation at inference time, reducing overhead.
  • The framework supports multi-step reasoning and avoids training-inference mismatch.
  • PEARL outperforms traditional supervised fine-tuning and reconstruction-based methods.
  • Empirical analysis reveals that reconstruction-based methods focus on embedding learning rather than true latent transformations.
Read more
SOLAR: Communication-Efficient Model Adaptation via Subspace-Oriented Latent Adapter Reparametrization
Seyed Mahmoud Sajjadi Mohammadabadi, Xiaolong Ma, Lei Yang, Feng Yan, Junshan Zhang
Efficient ML · Computer Vision · NLP
  • SOLAR significantly reduces the communication and storage costs of PEFT methods.
  • The framework is model-agnostic and can be applied post-training without modifying existing fine-tuning processes.
  • The method leverages subspace similarity to create compact and efficient adapter representations.
  • Theoretical bounds on reconstruction error are established, allowing for controlled compression.
Read more
Physics-informed neural operators for the in situ characterization of locally reacting sound absorbers
Jonas M. Schmid, Johannes D. Schmid, Martin Eser, Steffen Marburg
Audio & Speech
  • Introduces a novel physics-informed neural operator approach for estimating acoustic surface admittance.
  • Utilizes deep operator networks to learn mappings from measurement data without requiring an explicit forward model.
  • Incorporates governing acoustic equations as regularization to enhance prediction consistency and noise robustness.
  • Demonstrates accurate reconstruction of admittance components and reliable acoustic field predictions using synthetic data.
Read more
Introducing Echo Networks for Computational Neuroevolution
Christian Kroos, Fabian Küch
Audio & Speech · Efficient ML · Time Series
  • Introduction of Echo Networks, a new type of recurrent neural network for neuroevolution.
  • Echo Networks utilize a single connection matrix for topology and weights, enhancing mutation and recombination processes.
  • Demonstrated effectiveness in classifying electrocardiography signals with minimal network sizes.
  • Potential for systematicity in network evolution, addressing limitations of traditional neuroevolution methods.
Read more
EgoEverything: A Benchmark for Human Behavior Inspired Long Context Egocentric Video Understanding in AR Environment
Qiance Tang, Ziqi Wang, Jieyu Lin, Ziyun Li, Barbara De Salvo, Sai Qian Zhang
Computer Vision · Multimodal
  • EgoEverything incorporates human attention into question generation, improving realism in AR interactions.
  • The benchmark includes over 5,000 question-answer pairs and spans more than 100 hours of video.
  • A novel VQA generation pipeline with multi-agent collaboration and attention-inspired sampling is introduced.
  • Evaluation reveals that existing VLMs perform poorly on EgoEverything, indicating a need for improved models in AR contexts.
Read more
Accelerating Training of Autoregressive Video Generation Models via Local Optimization with Representation Continuity
Yucheng Zhou, Jianbing Shen
Generative Models · Optimization · Efficient ML
  • Training on fewer video frames is faster but increases error accumulation.
  • Local Optimization method reduces error propagation by optimizing tokens within localized windows.
  • Representation Continuity strategy enhances video consistency and reduces errors.
  • The proposed methods achieve better performance than existing autoregressive video generation methods.
Read more
An Imperfect Verifier is Good Enough: Learning with Noisy Rewards
Andreas Plesner, Francisco Guzmán, Anish Athalye
Reinforcement Learning · Large Language Models
  • RLVR is robust to noise in verification, tolerating up to 15% noise without significant performance loss.
  • Precision in verification is more important than recall for effective RL training.
  • Diminishing returns are observed when improving verifier accuracy beyond a certain point.
  • The findings generalize across different model families and sizes, indicating broad applicability.
Read more
Automating aggregation strategy selection in federated learning
Dian S. Y. Pang, Endrias Y. Ergetu, Eric Topham, Ahmed E. Fetit
Federated Learning
  • Introduces a novel framework for automating aggregation strategy selection in Federated Learning.
  • Utilizes large language models for single-trial strategy inference and genetic search for multi-trial exploration.
  • Demonstrates improved robustness and generalization in non-IID conditions through extensive experiments.
  • Reduces reliance on manual intervention and trial-and-error experimentation in strategy selection.
Read more
Structured Distillation of Web Agent Capabilities Enables Generalization
Xing Han Lù, Siva Reddy
Large Language Models
  • Introduction of AGENT-AS-ANNOTATORS framework for web agent training.
  • Generation of a synthetic dataset (A3-SYNTH) with 3,000 web tasks.
  • 9B-parameter student model achieved 41.5% on WebArena, surpassing closed-source models.
  • Significant transfer learning capabilities demonstrated on unseen platforms.
Read more
Flow Learners for PDEs: Toward a Physics-to-Physics Paradigm for Scientific Computing
Yilong Dai, Shengyu Chen, Xiaowei Jia, Runlong Yu
Theory · Generative Models · Optimization
  • Current learned PDE solvers often rely on state prediction, which is inadequate for complex scientific problems.
  • Flow learners parameterize transport vector fields, allowing for continuous-time predictions and better uncertainty quantification.
  • The proposed approach aligns solver structure with the physical evolution described by PDEs, enhancing the modeling of dynamics.
  • The paper outlines a new research agenda focused on transport-based learning for PDEs.
Read more
The Role of Emotional Stimuli and Intensity in Shaping Large Language Model Behavior
Ameen Patel, Felix Lee, Kyle Liang, Joseph Thomas
NLP · Large Language Models
  • Emotional prompting can significantly influence LLM performance, including accuracy and toxicity.
  • The study introduces a broader emotional spectrum, including both positive and negative emotions.
  • Sycophantic behavior in LLMs increases with positive emotional stimuli, raising concerns about reliability.
  • A novel prompt-generation pipeline was developed to create a diverse set of emotional prompts.
Read more