AI-generated summaries

Today's ML research, without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

46 Papers today
8h Update frequency
7 Days of history
Agentic AIs Are the Missing Paradigm for Out-of-Distribution Generalization in Foundation Models
Xin Wang, Haibo Chen, Wenxuan Liu, Wenwu Zhu
Theory · Large Language Models · Multimodal
  • Foundation models face unique OOD challenges that differ from classical OOD assumptions.
  • A stage-aware formalization of OOD is necessary to account for multi-stage training distributions.
  • Model-centric methods have intrinsic limitations, leading to a 'parameter coverage ceiling' for certain inputs.
  • Agentic systems extend the capabilities of FMs by integrating perception and external strategies.
Read more
No Triangulation Without Representation: Generalization in Topological Deep Learning
Johannes S. Schmidt, Martin Carrasco, Ernst Röell, Guy Wolf, Nello Blaser, Bastian Rieck
Graph Learning · Theory
  • Extension of the MANTRA dataset to include a wider variety of manifold triangulations.
  • Demonstration that GNNs and HOMP methods can saturate the benchmark with the right representations.
  • Introduction of a novel evaluation protocol focusing on representational diversity and triangulation refinement.
  • Findings indicate that existing models fail to generalize beyond combinatorial structures, highlighting a gap in TDL.
Read more
Structure-Preserving Gaussian Processes Via Discrete Euler-Lagrange Equations
Jan-Hendrik Ewering, Kathrin Flaßkamp, Niklas Wahlström, Thomas B. Schön, Thomas Seel
Robotics · Time Series · Theory
  • Introduction of Lagrangian Gaussian Processes (LGPs) for learning dynamics models.
  • Preservation of the geometric structure of the Lagrange-d’Alembert principle (see the sketch below).
  • Ability to learn from discrete position data without requiring velocity or momentum measurements.
  • Demonstrated data efficiency and generalization in synthetic and real-world scenarios.
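
For readers without the variational-integrator background, the equations below are the standard forced (Lagrange-d'Alembert) discrete Euler-Lagrange form such methods build on; this is textbook notation, not necessarily the paper's, and the discrete Lagrangian L_d and discrete forces f_d± are generic placeholders. Note that the update involves only the positions q_{k-1}, q_k, q_{k+1}, which is why no velocity or momentum measurements are required.

```latex
% Discrete action over positions q_0, ..., q_N with step size h:
%   S_d = \sum_{k} L_d(q_k, q_{k+1}), \qquad
%   L_d(q_k, q_{k+1}) \approx \int_{t_k}^{t_{k+1}} L(q, \dot{q})\, dt
% Stationarity of S_d gives the discrete Euler--Lagrange equations:
D_1 L_d(q_k, q_{k+1}) + D_2 L_d(q_{k-1}, q_k) = 0
% Forced (Lagrange--d'Alembert) variant with discrete forces f_d^{\pm}:
D_1 L_d(q_k, q_{k+1}) + D_2 L_d(q_{k-1}, q_k)
  + f_d^{-}(q_k, q_{k+1}) + f_d^{+}(q_{k-1}, q_k) = 0
```
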
Read more
WARP: A Benchmark for Primal-Dual Warm-Starting of Interior-Point Solvers
Dhruv Suri, Helgi Hilmarsson, Shourya Bose
Optimization
  • The evaluation baseline for warm-start methods has been corrected, revealing that previous claims of iteration reductions were misleading.
  • Primal prediction accuracy is anticorrelated with convergence speed in interior-point methods.
  • Providing complete primal-dual information significantly reduces solver iterations compared to primal-only methods.
  • The authors release a benchmark suite and a new model (WARP) that effectively predicts the full interior-point state.
Read more
Crafting Reversible SFT Behaviors in Large Language Models
Yuping Lin, Pengfei He, Yue Xing, Yingqian Cui, Jiayuan Ding, Subhabrata Mukherjee, Hui Liu, Zhen Xiang
NLP · Large Language Models · Interpretability
  • Introduces the concept of sparse behavioral carriers for controlling SFT-induced behaviors in LLMs.
  • Proposes Loss-Constrained Dual Descent (LCDD) for constructing these carriers through joint optimization.
  • Demonstrates the effectiveness of SFT-Eraser in reversing SFT behaviors without weight modification.
  • Provides evidence that the sparse structure is essential for causal control over behaviors.
Read more
Hypothesis generation and updating in large language models
Hua-Dong Xiong
Large Language Models · Theory · Interpretability
  • LLMs generate and update hypotheses based on sparse numerical examples, revealing their inductive biases.
  • LLM belief updating is often well described by a two-parameter Bayesian fit, but shows systematic biases toward narrower hypotheses.
  • There is a significant gap between hypothesis evaluation and generation performance in LLMs.
  • LLMs generalize poorly to unobserved parts of the hypothesis domain, indicating limitations in their reasoning capabilities.
Read more
AeroJEPA: Learning Semantic Latent Representations for Scalable 3D Aerodynamic Field Modeling
Francisco Giral, Abhijeet Vishwasrao, Andrea Arroyo Ramo, Mahmoud Golestanian, Federica Tonti, Adrian Lozano-Duran, Steven L. Brunton, Sergio Hoyas, Hector Gomez, Soledad Le Clainche, Ricardo Vinuesa
Optimization · Theory · Generative Models
  • AeroJEPA introduces a novel predictive latent architecture for aerodynamic modeling.
  • The architecture separates the prediction of latent representations from the resolution of output fields.
  • It demonstrates competitive performance on high-fidelity aerodynamic datasets.
  • The learned latent space enables effective interpolation and design optimization.
Read more
Does Synthetic Data Help? Empirical Evidence from Deep Learning Time Series Forecasters
Hugo Cazaux, Eyjólfur Ingi Ásgeirsson, Hlynur Stefánsson
Time Series
  • Synthetic data augmentation is architecture-dependent, benefiting channel-mixing models while degrading performance in channel-independent models.
  • In low-resource settings, synthetic data can significantly enhance model performance, particularly with TimesNet.
  • The Seasonal-Trend generator is the most effective synthetic data method across tested benchmarks.
  • Hard curriculum switching is detrimental, leading to increased mean squared error (MSE).
Read more
RVPO: Risk-Sensitive Alignment via Variance Regularization
Ivan Montero, Tomasz Jurczyk, Bhuwan Dhingra
Reinforcement Learning · Large Language Models · Optimization
  • RVPO addresses constraint neglect in multi-objective RLHF by penalizing inter-reward variance.
  • The LogSumExp operator is shown to effectively act as a smooth variance penalty (see the worked example below).
  • RVPO improves adherence to critical constraints while avoiding late-stage training degradation.
  • The framework is validated across two distinct multi-objective paradigms with significant performance improvements.
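
The variance-penalty reading of LogSumExp can be made concrete via a second-order expansion: −τ·log E[exp(−r/τ)] ≈ mean(r) − Var(r)/(2τ). A minimal numerical check (illustrative only; RVPO's actual objective and reward setup are not reproduced here):

```python
import numpy as np

def lse_aggregate(rewards, tau):
    # Smooth soft-min of per-objective rewards:
    #   -tau * log( mean( exp(-r / tau) ) )
    # For large tau this expands to mean(r) - Var(r) / (2 * tau),
    # i.e., a mean reward with an explicit variance penalty.
    r = np.asarray(rewards, dtype=float)
    return -tau * np.log(np.mean(np.exp(-r / tau)))

rewards = np.array([1.0, 0.9, 0.2])   # one neglected constraint
for tau in (10.0, 1.0, 0.1):
    approx = rewards.mean() - rewards.var() / (2 * tau)
    print(f"tau={tau}: LSE={lse_aggregate(rewards, tau):.4f}, "
          f"mean - var/(2*tau)={approx:.4f}")
# As tau shrinks, the aggregate moves toward the worst reward (0.2),
# so the neglected constraint dominates the training signal.
```
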
Read more
Nationwide EHR-Based Chronic Rhinosinusitis Prediction Using Demographic-Stratified Models
Sicong Chang, Yidan Shen, Justina Varghese, Akshay R Prabhakar, Sebastian Guadarrama-Sistos-Vazquez, Jiefu Chen, Masayoshi Takashima, Omar G. Ahmed, Renjie Hu, Xin Fu
Interpretability
  • Utilized nationwide EHR data to enhance CRS prediction accuracy.
  • Developed a hybrid feature-selection method to condense clinical codes.
  • Implemented demographic-stratified models to capture variations in disease presentation.
  • Achieved an AUC of 0.8461, improving prediction discrimination.
Read more
Two-Stage Learned Decomposition for Scalable Routing on Multigraphs
Filip Rydin, Morteza Haghir Chehreghani, Balázs Kulcsár
Optimization · Reinforcement Learning · Graph Learning
  • Introduces Node-Edge Policy Factorization (NEPF) for scalable routing on multigraphs.
  • Utilizes a pre-encoding edge aggregation scheme to reduce memory and computational costs.
  • Employs a non-autoregressive architecture for efficient edge selection.
  • Demonstrates superior performance in solution quality and speed compared to existing methods.
Read more
AdaGamma: State-Dependent Discounting for Temporal Adaptation in Reinforcement Learning
Yaomin Wang, Jianting Pan, Ran Tian, Xiaoyang Li, Yu Zhang, Hengle Qin, Tianshu YU
Reinforcement Learning · Theory · Robotics
  • AdaGamma introduces a practical implementation of state-dependent discounting in deep RL (sketched below).
  • The method includes a return-consistency objective to stabilize learning and prevent TD-error collapse.
  • Empirical results show consistent improvements in performance on continuous-control tasks.
  • AdaGamma was successfully validated in a real-world online A/B test on the JD Logistics platform.
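
As a rough sketch of what state-dependent discounting looks like in a TD target. The discount-head architecture here is an assumption for illustration, not the paper's exact design, and AdaGamma's return-consistency objective is not shown:

```python
import torch

def td_target(reward, next_q, next_state, gamma_head):
    # State-dependent discount: gamma is predicted per next state and
    # squashed to (0, 1), replacing a single global constant.
    gamma = torch.sigmoid(gamma_head(next_state)).squeeze(-1)
    return reward + gamma * next_q

gamma_head = torch.nn.Linear(8, 1)    # hypothetical discount head
s_next = torch.randn(32, 8)           # batch of next-state features
r = torch.randn(32)                   # rewards
q_next = torch.randn(32)              # bootstrapped next-state values
y = td_target(r, q_next, s_next, gamma_head)   # TD targets, shape (32,)
```
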
Read more
MOSAIC: Module Discovery via Sparse Additive Identifiable Causal Learning for Scientific Time Series
Shicheng Fan, Nour Elhendawy, Jianle Sun, Ke Fang, Kun Zhang, Yihang Wang, Lu Cheng
Time Series · Interpretability · Generative Models
  • MOSAIC integrates identifiable causal learning with support recovery for enhanced interpretability in scientific time series.
  • The framework employs a sparse temporal VAE with an additive decoder to clarify the influence of latent variables.
  • Theoretical guarantees for the recovery of identifiable supports are provided, ensuring robustness in various applications.
  • Empirical results demonstrate successful recovery of interpretable latent mechanisms across multiple scientific domains.
Read more
When and Why SignSGD Outperforms SGD: A Theoretical Study Based on ℓ1-norm Lower Bounds
Hongyi Tao, Dingzhi Yu, Lijun Zhang
Optimization · Theory · Efficient ML
  • SignSGD achieves superior convergence rates compared to SGD under specific conditions, particularly in the presence of sparse noise (see the toy example below).
  • The paper establishes tight ℓ1-norm lower bounds for SignSGD, providing a clear characterization of its performance.
  • The theoretical framework is extended to matrix optimization, showing that the advantages of sign-based methods persist in higher dimensions.
  • Empirical validation demonstrates the practical benefits of SignSGD, aligning theoretical predictions with observed performance in large-scale model training.
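
A toy illustration of why sign-based steps can help under sparse, heavy-tailed noise. This is not the paper's construction; the dimension, noise model, and step sizes are arbitrary choices:

```python
import numpy as np

def sgd_step(x, grad, lr):
    # Plain SGD: the step scales with the gradient, so a single
    # heavy-tailed noise spike can throw the iterate far off.
    return x - lr * grad

def signsgd_step(x, grad, lr):
    # SignSGD: only the per-coordinate sign is used, so every step
    # is bounded by lr regardless of how large the noise spike is.
    return x - lr * np.sign(grad)

rng = np.random.default_rng(0)
x_sgd = np.full(50, 5.0)
x_sign = x_sgd.copy()
for _ in range(500):
    # Sparse, spiky gradient noise on a quadratic f(x) = ||x||^2 / 2.
    noise = np.where(rng.random(50) < 0.02, rng.normal(0.0, 50.0, 50), 0.0)
    x_sgd = sgd_step(x_sgd, x_sgd + noise, 0.05)
    x_sign = signsgd_step(x_sign, x_sign + noise, 0.05)
print(f"final ||x||  SGD: {np.linalg.norm(x_sgd):.2f}  "
      f"SignSGD: {np.linalg.norm(x_sign):.2f}")
```
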
Read more
Accelerating LMO-Based Optimization via Implicit Gradient Transport
Won-Jun Jang, Si-Hyeon Lee
Optimization · Theory · Efficient ML
  • Introduction of LMO-IGT, a new class of stochastic LMO-based optimization methods (the underlying LMO primitive is sketched below).
  • Development of a unified framework for analyzing stochastic LMO-based methods.
  • Introduction of the regularized support function (RSF) as a new stationarity measure.
  • Theoretical improvement in iteration complexity for LMO-IGT compared to existing methods.
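
For context, LMO-based methods (in the Frank-Wolfe family) replace projections with a linear minimization oracle over the feasible set. A minimal sketch using the ℓ1 ball as an example feasible set; LMO-IGT's implicit gradient transport is not reproduced here:

```python
import numpy as np

def lmo_l1_ball(grad, radius=1.0):
    # Linear minimization oracle over the l1 ball:
    #   argmin_{||s||_1 <= radius} <grad, s>
    # is a signed vertex along the largest-magnitude coordinate.
    s = np.zeros_like(grad)
    i = np.argmax(np.abs(grad))
    s[i] = -radius * np.sign(grad[i])
    return s

def frank_wolfe_step(x, grad, t, radius=1.0):
    # Classic projection-free update: move toward the LMO vertex
    # with the standard 2 / (t + 2) step size.
    s = lmo_l1_ball(grad, radius)
    eta = 2.0 / (t + 2.0)
    return (1.0 - eta) * x + eta * s
```
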
Read more
Enabling Federated Inference via Unsupervised Consensus Embedding
Yui Hashimoto, Takayuki Nishio, Yuichi Kitagawa, Takahito Tanimura
Federated Learning · Computer Vision · Time Series
  • CE-FI enables federated inference without sharing raw inputs or model parameters.
  • The framework consists of a Consensus Embedding layer and a Cooperative Output layer.
  • CE-FI outperforms solo inference and matches conventional methods under non-IID conditions.
  • The approach is applicable beyond image classification, including text and time-series tasks.
Read more
Contrastive Identification and Generation in the Limit
Xiaoyu Li, Andi Han, Jiaojiao Jiang, Junbin Gao
Theory
  • Introduces contrastive identification and generation in the limit, focusing on relational data.
  • Presents an exact characterization of contrastive identifiable classes and a new combinatorial dimension.
  • Demonstrates a reversal under finite adversarial corruption, highlighting robustness in contrastive learning.
  • Establishes a common crossing graph to analyze learning challenges in contrastive settings.
Read more
Internalizing Outcome Supervision into Process Supervision: A New Paradigm for Reinforcement Learning for Reasoning
Fei Ding, Yongkang Zhang, Runhao Liu, Yuhao Liao, Zijian Zeng, Sibo Wang, Huiming Yang
Reinforcement Learning · Large Language Models · NLP
  • Reframes reasoning RL as internalizing outcome supervision into process supervision.
  • Introduces the IOP framework for automatic generation of process-level signals.
  • Demonstrates improved policy optimization through failure repair during RL.
  • Achieves 4.9-6.9% accuracy improvement and 2.3x sample efficiency over existing methods.
Read more
When Can Voting Help, Hurt, or Change Course? Exact Structure of Binary Test-Time Aggregation
Yi Liu
Theory
  • Majority voting can exhibit nonmonotonic behavior under heterogeneous latent correctness distributions (see the worked example below).
  • The voting curve is equivalent to a signed voting signature that captures the latent correctness distribution.
  • Different voting behaviors can arise from simple latent mixtures, challenging the notion that more votes always help.
  • The study separates two estimation regimes: direct access to per-example success probabilities versus finite repeat-depth grouped labels.
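
The nonmonotonicity claim is easy to reproduce with a two-component latent mixture: accuracy first rises with more votes (easy examples consolidate) and then falls (hard examples with per-vote accuracy below 1/2 are driven toward zero). A small sketch, with mixture weights and probabilities chosen purely for illustration:

```python
from scipy.stats import binom

def majority_acc(p, n):
    # P(majority of n i.i.d. votes is correct) for odd n,
    # with per-vote accuracy p.
    k = n // 2 + 1
    return binom.sf(k - 1, n, p)   # P(X >= k)

# Latent mixture: half "easy" examples (p = 0.9), half "hard" (p = 0.45).
mixture = [(0.5, 0.90), (0.5, 0.45)]
for n in (1, 3, 5, 11, 51):
    acc = sum(w * majority_acc(p, n) for w, p in mixture)
    print(f"n={n:>2}: accuracy={acc:.4f}")
# Accuracy rises from n=1 to n=3-5, then falls toward 0.5:
# "more votes always help" fails under heterogeneity.
```
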
Read more
Structural Correspondence and Universal Approximation in Diagonal plus Low-Rank Neural Networks
Ying Chen, Aoxi Li, Jihun Kim, Javad Lavaei
Theory · Efficient ML
  • Purely low-rank neural networks suffer from 'orthogonal blindness,' limiting their expressivity for function approximation.
  • The introduction of a minimal sparse diagonal component in DLoR structures enables universal approximation.
  • The Structural Correspondence framework allows for effective decomposition of full-rank transformations into low-rank components.
  • DLoR networks can achieve better parameter-to-expressivity scaling through multiplicative depth compared to additive width.
Read more
Towards Steering without Sacrifice: Principled Training of Steering Vectors for Prompt-only Interventions
Yuntai Bao, Qinfeng Li, Xinyan Yu, Xuhong Zhang, Ge Su, Wenqi Zhang, Liu Yan, Haiqin Weng, Jianwei Yin
NLP · Large Language Models · Theory
  • Joint training of steering factors and directions eliminates the need for post-hoc factor selection.
  • Prompt-Only Steering Vectors (PrOSV) outperform traditional full-sequence steering vectors (FSSVs); a sketch of the underlying intervention appears below.
  • PrOSV achieves a better balance between model utility and adversarial robustness.
  • Optimal initialization sizes and learning rates are crucial for effective joint training.
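
As background, activation steering adds a learned direction to a model's residual stream at inference time. A minimal sketch with a hypothetical HuggingFace-style layer path; the paper's prompt-only variant would restrict the addition to prompt positions, and nothing here is PrOSV's actual implementation:

```python
import torch

def make_steering_hook(v, alpha):
    # Forward hook that adds alpha * v to a layer's hidden states.
    # A prompt-only variant would apply this at prompt positions only;
    # this sketch steers every position for simplicity.
    def hook(module, inputs, output):
        h = output[0] if isinstance(output, tuple) else output
        h = h + alpha * v          # broadcast over (batch, seq, hidden)
        return (h,) + output[1:] if isinstance(output, tuple) else h
    return hook

# Hypothetical usage with a HuggingFace-style decoder (names illustrative):
# v = trained_direction                      # shape (hidden_dim,)
# layer = model.model.layers[12]
# handle = layer.register_forward_hook(make_steering_hook(v, alpha=4.0))
# ... generate ...
# handle.remove()
```
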
Read more
Hitting Time Isomorphism for Multi-Stage Planning with Foundation Policies
Magnus Victor Boock, Abdullah Akgül, Mustafa Mert Çelikok, Melih Kandemir
Reinforcement Learning · Theory · Robotics
  • Introduces a new framework for offline reinforcement learning based on hitting time observations.
  • Proves the existence and uniqueness of a Hilbert-space displacement geometry for controlled Markov processes.
  • Develops Isomorphic Embedding Learning (IEL) as a goal-agnostic foundation policy learning algorithm.
  • Demonstrates that IEL improves upon existing methods in offline maze locomotion tasks.
Read more
Physics-Informed Neural Networks with Learnable Loss Balancing and Transfer Learning
Reza Pirayeshshirazinezhad
Theory · Efficient ML
  • Introduction of a self-supervised mechanism for adaptive loss balancing in PINNs (one standard learnable-balancing scheme is sketched below).
  • Integration of transfer learning to enhance efficiency in scientific machine learning tasks.
  • Validation on a challenging heat transfer problem with limited data, achieving significant performance improvements.
  • The framework provides a general recipe for embedding physics adaptively into neural networks.
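
The paper's specific self-supervised balancing mechanism is not detailed in this summary; as a reference point, one widely used learnable loss-balancing scheme (uncertainty weighting in the style of Kendall et al.) looks like this:

```python
import torch

class BalancedLoss(torch.nn.Module):
    # Learnable log-variances s_i weight each term as exp(-s_i) * L_i + s_i,
    # so the optimizer itself trades off data fit vs. the physics residual.
    def __init__(self, n_terms=2):
        super().__init__()
        self.log_vars = torch.nn.Parameter(torch.zeros(n_terms))

    def forward(self, losses):
        total = torch.zeros(())
        for s, loss in zip(self.log_vars, losses):
            total = total + torch.exp(-s) * loss + s
        return total

balancer = BalancedLoss(n_terms=2)
data_loss, pde_residual = torch.tensor(0.8), torch.tensor(2.3)
loss = balancer([data_loss, pde_residual])   # single scalar to backprop
```
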
Read more
How Many Iterations to Jailbreak? Dynamic Budget Allocation for Multi-Turn LLM Evaluation
Shai Feldman, Yaniv Romano
NLP · Large Language Models · Optimization
  • Introduction of DAPRO, a dynamic budget allocation framework for multi-turn LLM evaluation.
  • Theoretical guarantees for budget constraints and coverage without requiring conditional independence assumptions.
  • Demonstrated lower variance and improved coverage rates compared to static budget allocation methods.
  • Applicability of the framework to various safety and utility evaluation tasks in LLMs.
Read more
SAT: Sequential Agent Tuning for Coordinator-Free Plug and Play Multi-LLM Training with Monotonic Improvement Guarantees
Yi Xie, Yangyang Xu, Yi Fan, Bo Liu
Large Language Models · Reinforcement Learning · Efficient ML
  • Introduction of Sequential Agent Tuning (SAT) for decentralized multi-LLM training.
  • Theoretical guarantees for monotonic improvement and plug-and-play invariance.
  • Empirical results show SAT-trained teams outperform larger models on benchmarks.
  • Demonstration of effective agent upgrades without retraining the entire team.
Read more
Retain-Neutral Surrogates for Min-Max Unlearning
Junhao Cai, Dohun Kim, Dowon Kim, Sung Il Choi, Chengjun Jin, Juhyun Park, Changhee Joo
Optimization · Theory · Efficient ML
  • Introduction of Retain-Orthogonal Surrogate Unlearning (ROSU) for effective min-max unlearning.
  • ROSU constrains the inner perturbation to maximize forget gain while maintaining retain neutrality (see the projection sketch below).
  • Theoretical analysis shows improved performance under positive alignment of gradients.
  • Empirical results demonstrate ROSU's advantages across multiple datasets, especially in high-coupling scenarios.
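
The name suggests a retain-orthogonal construction; below is a first-order illustration of "retain neutrality" via gradient projection. This is an assumption about the mechanism for intuition's sake, not necessarily ROSU's exact surrogate:

```python
import torch

def retain_orthogonal_direction(g_forget, g_retain, eps=1e-12):
    # Project the forget gradient onto the orthogonal complement of the
    # retain gradient: to first order, stepping along the result leaves
    # the retain loss unchanged while still increasing the forget loss.
    coef = torch.dot(g_forget, g_retain) / (torch.dot(g_retain, g_retain) + eps)
    return g_forget - coef * g_retain

g_f = torch.randn(1000)           # gradient of the forget objective
g_r = torch.randn(1000)           # gradient of the retain objective
d = retain_orthogonal_direction(g_f, g_r)
print(torch.dot(d, g_r))          # ~0: first-order retain-neutral
```
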
Read more
OpenG2G: A Simulation Platform for AI Datacenter-Grid Runtime Coordination
Jae-Won Chung, Zhirui Liang, Yanyong Mao, Jiasi Chen, Mosharaf Chowdhury, Vladimir Dvorkin
Optimization · Theory · Efficient ML
  • OpenG2G is an open-source library for simulating AI datacenter-grid runtime coordination.
  • The platform supports various control strategies, allowing for standardized evaluation of their impacts on coordination outcomes.
  • OpenG2G captures metrics from both AI datacenters and power systems, facilitating comprehensive analysis.
  • The simulation reveals trade-offs between AI operational metrics and grid performance, aiding in design decision-making.
Read more
Weak-to-Strong Generalization is Nearly Inevitable (in Linear Models)
Scott Geng, Dutch Hansen, Jerry Li
Theory
  • Weak-to-strong generalization can occur in linear logistic regression without requiring a mismatch in model capacity.
  • The phenomenon is nearly inevitable under mild distributional assumptions, suggesting broad applicability.
  • The study challenges existing theoretical beliefs about the necessity of model capacity differences for generalization.
  • Empirical observations align with theoretical findings, reinforcing the robustness of the phenomenon.
Read more
MACS: Modality-Aware Capacity Scaling for Efficient Multimodal MoE Inference
Bo Li, Chuan Wu, Shaolin Zhu
Multimodal · Large Language Models · Efficient ML
  • MACS introduces a novel Entropy-Weighted Load mechanism to address information heterogeneity in visual tokens.
  • The Dynamic Modality-Adaptive Capacity mechanism allows real-time allocation of expert resources based on input composition.
  • The proposed methods significantly improve inference efficiency in MoE MLLMs compared to existing approaches.
  • The paper systematically analyzes the straggler effect in multimodal contexts, highlighting the unique challenges it poses.
Read more
Q-MMR: Off-Policy Evaluation via Recursive Reweighting and Moment Matching
Xiang Li, Nan Jiang
Reinforcement Learning · Theory · Optimization
  • Q-MMR provides a dimension-free finite-sample guarantee for off-policy evaluation.
  • The framework learns weights inductively through a moment matching objective.
  • It establishes connections to existing methods like importance sampling and linear FQE.
  • The paper offers new insights into the coverage concept in offline reinforcement learning.
Read more
On Semantic Loss Fine-Tuning Approach for Preventing Model Collapse in Causal Reasoning
Pratik Deshmukh, Atirek Gupta
Theory · Large Language Models · Graph Learning
  • Identification of catastrophic model collapse during fine-tuning of causal reasoning tasks.
  • Introduction of a semantic loss function with graph-based logical constraints to prevent collapse.
  • Demonstrated significant performance improvements over traditional fine-tuning methods.
  • Comprehensive evaluation across 200,000+ samples validates the effectiveness of the proposed approach.
Read more
Weight-Decay Turns Transformer Loss Landscapes Villani: Functional-Analytic Foundations for Optimization and Generalization
Abhijit Das, Sayantan Dutta
NLP · Large Language Models · Optimization
  • Weight decay is shown to be essential for meeting Villani's differential growth conditions in Transformer loss landscapes.
  • The paper introduces empirical diagnostics to visualize the effects of weight decay on model curvature.
  • Explicit convergence rates for Langevin-based optimizers are derived, linking theoretical insights with practical training efficiency.
  • The authors provide a reproducible experimental suite for evaluating functional-analytic properties in large Transformer models.
Read more
Adaptive Selection of LoRA Components in Privacy-Preserving Federated Learning
Myoungjun Kim, Sangwoo Park, Yoseob Han, Jin-Hyun Ahn
Federated Learning · Optimization · NLP
  • Introduction of AS-LoRA, an adaptive framework for LoRA in federated learning.
  • Layer-wise and round-wise adaptivity enhances optimization by selecting components based on training dynamics.
  • Curvature-aware scoring function accelerates convergence and biases solutions towards flatter minima.
  • AS-LoRA shows significant performance improvements over existing methods under strict differential privacy budgets.
Read more
Grokking or Glitching? How Low-Precision Drives Slingshot Loss Spikes
Liu Hanqing, Jianjun Cao, Yuanze Li, Zijian Zhou
Theory · Optimization · Efficient ML
  • The Slingshot Mechanism is primarily driven by finite-precision arithmetic in cross-entropy loss computation.
  • Numerical Feature Inflation (NFI) is identified as a feedback loop causing abnormal growth in parameters and logits.
  • The paper provides theoretical insights into the dynamics of loss spikes and proposes practical interventions to stabilize training.
  • NFI dynamics can lead to rapid parameter norm growth, challenging classical gradient-flow analyses.
Read more
E = T*H/(O+B): A Dimensionless Control Parameter for Mixture-of-Experts Ecology
Qingjun Zhang
Computer Vision · NLP · Theory
  • E ≥ 0.5 guarantees zero dead experts, eliminating the need for auxiliary losses.
  • Dead experts can revive, contradicting the traditional view of permanent expert death.
  • Task complexity affects the critical threshold for E, indicating a need for adaptive strategies.
  • Ecological structures in MoE are temperature-invariant, suggesting stable diagnostics.
Read more
Matrix-Decoupled Concentration for Autoregressive Sequences: Dimension-Free Guarantees for Sparse Long-Context Rewards
Pei-Sen Li
Large Language Models · Reinforcement Learning · Theory
  • Introduction of the Matrix-Decoupled Concentration (MDC) framework to address concentration bounds for autoregressive sequences.
  • Resolution of scalar collapse and causal structure mismatch issues in existing concentration frameworks.
  • Establishment of a McDiarmid-type inequality that guarantees dimension-free O(1) variance proxies for sparse rewards.
  • Demonstration of optimal transport constants recovery for homogeneous Markov chains and order-optimal bounds for causal trees.
Read more
SMolLM: Small Language Models Learn Small Molecular Grammar
Akhil Jindal, Harang Ju
NLP · Generative Models · Interpretability
  • SMolLM achieves 95% validity in generating SMILES with only 53K parameters.
  • The model outperforms a larger GPT model (527K parameters) in terms of validity.
  • A weight-shared transformer architecture allows for mechanistic interpretability of the generation process.
  • The model resolves SMILES constraints in a structured manner across multiple passes.
Read more
MinMax Recurrent Neural Cascades
Alessandro Ronca
Theory · Efficient ML · NLP
  • MinMax RNCs can express all regular languages and are capable of parallel evaluation with logarithmic complexity.
  • The architecture maintains bounded states and outputs, preventing issues of vanishing or exploding gradients.
  • Empirical results show superior performance on synthetic tasks compared to existing RNN architectures.
  • A MinMax RNC with 127M parameters achieves competitive performance in next-token prediction tasks.
Read more
Can Attribution Predict Risk? From Multi-View Attribution to Planning Risk Signals in End-to-End Autonomous Driving
Le Yang, Ruoyu Chen, Haijun Liu, Jiawei Liang, ShangQuan Sun, Xiaochun Cao
Computer Vision · Robotics · Interpretability
  • Introduces a hierarchical attribution framework for analyzing decision-level risks in autonomous driving.
  • Develops a coarse-to-fine attribution method that integrates multi-view camera inputs for trajectory planning.
  • Derives three statistics to quantify reliance on visual evidence, serving as predictive signals for planning risk.
  • Demonstrates strong correlation between attribution statistics and planning risks through extensive experiments.
Read more
In-Context Black-Box Optimization with Unreliable Feedback
Nicolas Samuel Blumer, Julien Martinelli, Samuel Kaski
Optimization
  • Introduction of FICBO, a framework for optimizing black-box functions with unreliable feedback.
  • Utilization of a structured prior on feedback to model the reliability of auxiliary feedback sources.
  • Empirical results show FICBO's superiority over classical and amortized optimization baselines.
  • The model's ability to adaptively infer feedback reliability enhances query selection.
Read more
Towards Metric-Faithful Neural Graph Matching
Jyotirmaya Shivottam, Subhankar Mishra
Graph Learning · Theory
  • Introduces a geometric framework linking encoder geometry to GED estimation quality.
  • Demonstrates that bi-Lipschitz encoders improve the stability and accuracy of GED surrogates.
  • Establishes a theoretical basis for the influence of encoder distortion on downstream estimation performance.
  • Implements FSW-GNN as a drop-in replacement in neural GED architectures, resulting in significant performance improvements.
Read more
Differentiable Parameter Optimization for DAEs with State-Dependent Events
Ion Matei, Maksym Zhenirovskyy, Anthony Wong
Optimization · Theory
  • Formulates parameter learning for hybrid DAEs as a constrained least-squares problem.
  • Develops two gradient-computation strategies: automatic differentiation through simulation and explicit discrete adjoint method.
  • Compares the two methods regarding their handling of gradients, event times, and implementation complexity.
  • Both methods provide gradients that are local to the event path selected by the forward simulation.
Read more
Asymmetric On-Policy Distillation: Bridging Exploitation and Imitation at the Token Level
Nan Jia, Haojin Yang, Xing Ma, Jiesong Lian, Shuailiang Zhang, Weipeng Zhang, Ke Zeng, Xunliang Cai, Zequn Sun
Reinforcement Learning · Large Language Models · Optimization
  • AOPD addresses high-variance updates and vanishing gradients in standard on-policy distillation.
  • The framework utilizes localized divergence minimization to improve learning in non-positive advantage regions.
  • AOPD shows significant performance improvements on mathematical reasoning benchmarks compared to traditional methods.
  • The method enhances capability retention during sequential tool-use adaptation.
Read more
Graph Normalization: Fast Binarizing Dynamics for Differentiable MWIS
Laurent Guigues
Graph Learning · Optimization · Theory
  • Graph Normalization (GN) provides a differentiable solution to the NP-hard MWIS problem (a generic relaxation is sketched below).
  • GN guarantees convergence to a valid binary output without the need for external annealing schedules.
  • The methodology utilizes a quasi-Newton descent approach through Majorization-Minimization.
  • GN demonstrates superior performance on large graphs, achieving near-optimal solutions rapidly.
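
GN's binarizing dynamics are not spelled out in this summary; for orientation, a generic differentiable MWIS relaxation with a sigmoid parameterization and an edge penalty looks like the sketch below. The final thresholding here stands in for GN's guaranteed binarization:

```python
import torch

def mwis_relaxation(weights, adj, steps=500, lr=0.1, penalty=2.0):
    # Differentiable MWIS surrogate: maximize sum_i w_i * x_i minus a
    # penalty for selecting both endpoints of an edge, with x = sigmoid(z).
    z = torch.zeros_like(weights, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        x = torch.sigmoid(z)
        obj = weights @ x - penalty * (x @ adj @ x) / 2
        opt.zero_grad()
        (-obj).backward()
        opt.step()
    return (torch.sigmoid(z) > 0.5).float()   # final thresholding

# Triangle graph with one heavy node: the optimum picks node 0 alone.
w = torch.tensor([3.0, 1.0, 1.0])
A = torch.tensor([[0., 1., 1.], [1., 0., 1.], [1., 1., 0.]])
print(mwis_relaxation(w, A))   # expected: tensor([1., 0., 0.])
```
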
Read more
Efficient Serving for Dynamic Agent Workflows with Prediction-based KV-Cache Management
Haoyu Zheng, Fangcheng Fu, Jia Wu, Binhang Yuan, Yongqiang Zhang, Hao Wang, Yuanyuan Zhu, Xiao Yan, Jiawei Jiang
NLP · Large Language Models · Efficient ML
  • PBKV predicts future agent invocations to optimize KV-cache management in dynamic workflows.
  • The system employs hierarchical eviction and conservative prefetching to enhance cache reuse.
  • PBKV demonstrates significant performance improvements over existing cache management techniques.
  • The predictor's design is robust to errors, ensuring stable performance across varying conditions.
Read more
Approximate Next Policy Sampling: Replacing Conservative Target Policy Updates in Deep RL
Dillon Sandhu, Ronald Parr
Reinforcement Learning · Theory · Optimization
  • Introduction of Approximate Next Policy Sampling (ANPS) as an alternative to conservative policy updates.
  • Development of Stable Value Approximate Policy Iteration (SV-API) to implement ANPS effectively.
  • Demonstration of improved performance on RL benchmarks with larger policy updates.
  • Establishment of theoretical bounds that highlight the importance of aligning training distribution with the next policy's state visitation.
Read more