AI-generated summaries

Today's ML research,
without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

24 Papers today
8h Update frequency
7 Days of history
Learning Large-Scale Modular Addition with an Auxiliary Modulus
Hanato Kikuchi, Ryosuke Masuya, Kazuhiko Kawamoto, Hiroshi Kera
Theory Efficient ML
  • Introduces a covariate-shift-free method for learning modular addition.
  • Utilizes an auxiliary modulus to reduce training difficulty while preserving input distribution.
  • Demonstrates improved scalability and sample efficiency over previous methods.
  • Achieves high accuracy even with smaller training datasets compared to sparse methods.
Read more
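The auxiliary-modulus construction is the paper's own contribution and is not reproduced here; as a point of reference, the base task it targets (learning addition modulo p from examples) can be sketched as follows. All names and parameters are illustrative.

```python
import numpy as np

def modadd_dataset(p, n_samples, rng, k=2):
    """Toy modular-addition data: sum k operands modulo p.

    This is only the task family the paper studies, not its
    auxiliary-modulus training scheme.
    """
    x = rng.integers(0, p, size=(n_samples, k))
    y = x.sum(axis=1) % p
    return x, y

rng = np.random.default_rng(0)
x, y = modadd_dataset(p=97, n_samples=5, rng=rng)
```

The summary's point is that training on this task directly is hard at large p; the auxiliary modulus reportedly eases training without changing the input distribution of (x, y) pairs.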
The Position Curse: LLMs Struggle to Locate the Last Few Items in a List
Zhanqi Zhang, Hua-Dong Xiong, Robert C. Wilson, Mikio Aoi, Marcelo G. Mattar, Li Ji-An
NLP Large Language Models
  • Identified the 'Position Curse' in LLMs, where they struggle with backward retrieval of items in lists.
  • Developed POSBENCH, a dataset aimed at improving position-based retrieval through post-training.
  • Demonstrated that LoRA fine-tuning can enhance retrieval performance but does not reach saturation.
  • Introduced PYINDEX, a benchmark for assessing position-based retrieval in code understanding.
Read more
Bilevel Graph Structure Learning, Revisited: Inner-Channel Origins of the Reported Gain
Minkyoung Kim, Beakcheol Jang
Graph Learning Optimization Theory
  • Inner-loop training dynamics contribute significantly to performance gains in bilevel GSL, often more than graph rewiring.
  • The frozen-φ control allows for a clearer understanding of the contributions to performance gains by isolating training dynamics from graph modifications.

  • Empirical results show that the inner channel accounts for 78-101% of gains in spatio-temporal flow forecasting and 37-44% in node classification.
  • Three independent diagnostics validate the findings, providing a robust framework for future evaluations of bilevel GSL.
Read more
GameGen-Verifier: Parallel Keypoint-Based Verification for LLM-Generated Games via Runtime State Injection
Chaobo Jia, Ruipeng Wan, Ting Sun, Weihao Tan, Borui Wan, Yuxuan Tong, Guangming Sheng, Hong Xu
Large Language Models Generative Models Reinforcement Learning
  • GameGen-Verifier reformulates game verification from open-ended exploration to state-grounded checks of specification-derived keypoints.
  • The approach enables parallelizable and localized verification, reducing reliance on unreliable gameplay.
  • GGV-HARNESS provides a robust framework for managing verification processes at scale.
Read more
The E$Δ$-MHC-Geo Transformer: Adaptive Geodesic Operations with Guaranteed Orthogonality
Arash Shahmansoori
Theory Optimization Efficient ML
  • Introduces a Data-Dependent Cayley rotation with unconditional orthogonality for all inputs.
  • Combines Cayley rotation and Householder reflection through a learned gate for enhanced adaptability.
  • Implements a midpoint-collapse regularizer to encourage effective operator selection.
  • Demonstrates superior performance in stability and loss metrics compared to existing architectures.
Read more
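The "guaranteed orthogonality" claim rests on a standard fact: the Cayley transform maps any skew-symmetric matrix to an orthogonal one. A minimal NumPy sketch of that building block (not the paper's full data-dependent construction; function names are illustrative):

```python
import numpy as np

def cayley_rotation(A):
    """Map a square matrix to an orthogonal matrix via the Cayley transform.

    Taking the skew-symmetric part S = (A - A.T)/2 guarantees that (I - S)
    is invertible and that Q = (I + S) @ inv(I - S) is orthogonal for any
    real input A.
    """
    n = A.shape[0]
    S = 0.5 * (A - A.T)  # skew-symmetric part
    I = np.eye(n)
    return (I + S) @ np.linalg.inv(I - S)

rng = np.random.default_rng(0)
Q = cayley_rotation(rng.normal(size=(4, 4)))
# Orthogonality check: Q.T @ Q should equal I up to floating-point error.
err = np.abs(Q.T @ Q - np.eye(4)).max()
```

Because S has purely imaginary eigenvalues, (I - S) is always invertible, which is what makes the orthogonality unconditional rather than dependent on the input.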
On the Invariance and Generality of Neural Scaling Laws
Xing Han, Ziyin Liu, Suchi Saria, Paul Pu Liang
Theory NLP Time Series
  • Introduces a framework for generalizable Neural Scaling Laws based on information theory.
  • Distinguishes between bijective and non-bijective transformations and their effects on scaling laws.
  • Introduces the concept of information resolution to quantify the preservation of task-relevant information.
  • Validates the framework across language, vision, and speech domains.
Read more
Actor-Critic Algorithm for Dynamic Expectile and CVaR
Yudong Luo, Erick Delage
Reinforcement Learning Optimization Theory
  • Introduces a surrogate policy gradient method for dynamic risk optimization without transition perturbation.
  • Develops model-free value learning methods for expectile and CVaR using elicitability.
  • Presents an off-policy actor-critic algorithm tailored for dynamic risk measures.
  • Empirical results show superior performance in learning risk-averse policies compared to existing approaches.
Read more
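Expectiles are defined through an asymmetric squared loss, which is what makes them elicitable and hence learnable by gradient methods, as the summary notes. A minimal sketch of estimating an expectile this way (names and hyperparameters are illustrative, not the paper's algorithm):

```python
import numpy as np

def expectile_loss_grad(value, targets, tau):
    """Gradient of the asymmetric squared (expectile) loss w.r.t. `value`.

    Positive errors are weighted by tau and negative errors by (1 - tau);
    tau = 0.5 recovers the ordinary mean-squared-error gradient.
    """
    diff = targets - value
    weight = np.where(diff >= 0, tau, 1.0 - tau)
    return -2.0 * np.mean(weight * diff)

def fit_expectile(targets, tau, lr=0.1, steps=2000):
    """Estimate the tau-expectile of a sample by gradient descent."""
    v = float(np.mean(targets))
    for _ in range(steps):
        v -= lr * expectile_loss_grad(v, targets, tau)
    return v

x = np.array([0.0, 1.0, 2.0, 10.0])
m = fit_expectile(x, tau=0.5)  # the 0.5-expectile is the mean
```

For tau > 0.5 the estimate shifts toward the upper tail, which is the mechanism risk-averse value learning exploits.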
A Hierarchical Ensemble Pipeline for Anomaly Detection in ESA Satellite Telemetry
Lorenzo Riccardo Allegrini, Geremia Pompei
Time Series
  • Introduction of a hierarchical ensemble pipeline for anomaly detection in satellite telemetry.
  • Integration of shapelet-based and statistical feature extraction techniques.
  • Use of a two-level masking strategy to enhance model diversity and prevent information leakage.
  • Demonstrated strong generalization capabilities on the ESA-ADB dataset.
Read more
On the Divergence of Differential Temporal Difference Learning without Local Clocks
David Antrobius, Shangtong Zhang
Reinforcement Learning Theory
  • Differential TD learning can diverge with a global clock while converging with a local clock in average-reward RL.
  • The correspondence between local and global clocks in convergence analysis breaks down in average-reward settings.
  • The choice of learning rates has a more pronounced effect on convergence in average-reward RL compared to discounted RL.
  • This work closes an open problem regarding the convergence of differential TD algorithms with global clocks.
Read more
RateQuant: Optimal Mixed-Precision KV Cache Quantization via Rate-Distortion Theory
Fei Zuo, Zikang Zhou, Hao Cong, Xiaoyan Xi, Ho Fai Leung
NLP Large Language Models Efficient ML
  • Introduces RATEQUANT for optimal mixed-precision KV cache quantization.
  • Addresses distortion model mismatch, a critical failure mode in existing quantization methods.
  • Implements a rate-distortion optimization framework for effective bit allocation.
  • Achieves a 70% reduction in perplexity for KIVI and improves QuaRot performance.
Read more
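The rate-distortion bit allocation is the paper's contribution; the trade-off it optimizes over, though, is the standard one between bit width and quantization distortion. A toy per-tensor uniform quantizer makes that trade-off concrete (illustrative only, not RateQuant itself):

```python
import numpy as np

def quantize_uniform(x, bits):
    """Symmetric uniform quantization to a given bit width, per tensor."""
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / levels
    q = np.clip(np.round(x / scale), -levels, levels)
    return q * scale, scale

rng = np.random.default_rng(0)
kv = rng.normal(size=(64,))  # stand-in for one KV-cache channel

# Distortion (MSE) falls as bits increase; a mixed-precision scheme
# spends more bits where distortion hurts most.
d4 = np.mean((quantize_uniform(kv, 4)[0] - kv) ** 2)
d8 = np.mean((quantize_uniform(kv, 8)[0] - kv) ** 2)
```

The "distortion model mismatch" bullet refers to methods that allocate bits against a distortion proxy that does not track the quantizer's actual error behavior.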
Latent Order Bandits
Emil Carlsson, Newton Mwai, Fredrik D. Johansson
Reinforcement Learning Theory Efficient ML
  • Introduction of Latent Order Bandits (LOB) to improve personalization in bandit algorithms.
  • LOB requires only a partial order of action preferences, allowing for variability in reward distributions.
  • Development of a UCB algorithm (lobUCB) and a Thompson sampling algorithm (lobTS) tailored for LOB.
  • Empirical results show competitive performance against traditional latent bandits, especially in cases of differing reward scales.
Read more
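lobUCB builds on the classic UCB index, adding the shared partial order of action preferences. For orientation, here is only the generic UCB1 template it extends, on a two-armed Bernoulli bandit (all names illustrative; the LOB machinery is not reproduced):

```python
import numpy as np

def ucb1_select(counts, means, t, c=2.0):
    """Classic UCB1 index: empirical mean plus an exploration bonus.

    Unpulled arms get an infinite index so each arm is tried once first.
    """
    bonus = np.sqrt(c * np.log(max(t, 1)) / np.maximum(counts, 1))
    idx = np.where(counts == 0, np.inf, means + bonus)
    return int(np.argmax(idx))

rng = np.random.default_rng(0)
p_true = np.array([0.2, 0.8])          # true Bernoulli arm means
counts = np.zeros(2)
means = np.zeros(2)
for t in range(1, 501):
    a = ucb1_select(counts, means, t)
    r = float(rng.random() < p_true[a])  # sample a Bernoulli reward
    counts[a] += 1
    means[a] += (r - means[a]) / counts[a]  # incremental mean update
```

After 500 rounds the better arm dominates the pull counts; LOB's point is that a known preference order lets this concentration happen with less per-user exploration.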
Modulated learning for private and distributed regression with just a single sample per client device
Praneeth Vepakomma, Amirhossein Reisizadeh, Samuel Horváth, Munther Dahleh
Federated Learning Theory Efficient ML
  • Introduces a method for federated learning with clients having only one data sample.
  • Utilizes a cosine-modulated transformation and Gaussian noise for privacy preservation.
  • Achieves unbiased gradient estimation that matches centralized gradient updates.
  • Establishes asymptotic normality for valid statistical inference on regression coefficients.
Read more
PerCaM-Health: Personalized Dynamic Causal Graphs for Healthcare Reasoning
Elahe Khatibi, Ziyu Wang, Saba A. Farahani, Di Huang, Hung Cao, Ramesh Jain, Amir M. Rahmani
Graph Learning Time Series Interpretability
  • Introduces a framework for learning personalized dynamic causal graphs from longitudinal health data.
  • Bridges the gap between cohort-level models and patient-specific causal discovery.
  • Utilizes a knowledge-guided population temporal graph adapted with patient-specific evidence.
  • Enables patient-level counterfactual queries for healthcare interventions.
Read more
Graph Representation Learning Augmented Model Manipulation on Federated Fine-Tuning of LLMs
Hanlin Cai, Kai Li, Houtianfu Wang, Haofan Dong, Yichen Li, Falko Dressler, Ozgur B. Akan
Large Language Models Federated Learning Graph Learning
  • Introduction of AugMP, a novel manipulation strategy targeting federated fine-tuning (FFT) of LLMs.
  • Utilization of a graph representation learning framework to synthesize malicious updates.
  • Demonstrated significant degradation in model performance while maintaining benign-like characteristics.
  • AugMP outperforms existing defenses by evading detection based on statistical consistency metrics.
Read more
Convergent Stochastic Training of Attention and Understanding LoRA
Zhengkai Sun, Dibyakanti Kumar, Alejandro F Frangi, Anirbit Mukherjee, Mingfei Sun
Theory Optimization Efficient ML
  • Establishes trainability of attention layers and LoRA under stochastic methods without data or architecture assumptions.
  • Proves that a mild regularization induces a Poincaré inequality for Gibbs measures, facilitating convergence.
  • Introduces a novel SDE framework that captures the dynamics of stochastic gradient methods for training.
  • Provides theoretical insights into the optimization dynamics of neural models using LoRA.
Read more
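For readers unfamiliar with the object being analyzed: LoRA replaces a full weight update with a low-rank one, W + alpha * B @ A, training only A and B. A minimal sketch of that parameterization (the paper's convergence analysis, not reproduced here, is about training such layers stochastically):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=1.0):
    """Forward pass of a linear layer with a LoRA adapter.

    The frozen weight W is augmented by a low-rank update alpha * B @ A;
    only A and B are trained, so trainable parameters scale with the
    rank r rather than with the full d_out x d_in matrix.
    """
    return x @ (W + alpha * B @ A).T

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 6, 2
W = rng.normal(size=(d_out, d_in))      # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01   # "down" projection, small init
B = np.zeros((d_out, r))                # "up" projection, zero init
x = rng.normal(size=(3, d_in))

# With B initialized to zero, the adapted layer matches the frozen layer,
# so fine-tuning starts exactly at the pretrained model.
y0 = lora_forward(x, W, A, B)
```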
GRAPHLCP: Structure-Aware Localized Conformal Prediction on Graphs
Peyman Baghershahi, Fangxin Wang, Debmalya Mandal, Sourav Medya
Graph Learning
  • GRAPHLCP integrates graph topology into localized conformal prediction for GNNs.
  • The framework includes a feature-aware densification step to enhance prediction reliability in sparse graphs.
  • Personalized PageRank is used to model structural proximity, improving anchor sampling and calibration.
  • Extensive experiments show GRAPHLCP achieves efficient marginal and conditional coverage.
Read more
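GRAPHLCP localizes conformal prediction using graph structure; the split-conformal core it localizes is standard and short. A sketch of that core for classification, with toy calibration data (all names illustrative; the topology-aware anchoring is the paper's addition and is not shown):

```python
import numpy as np

def conformal_quantile(scores, alpha):
    """Finite-sample-corrected quantile of calibration nonconformity scores."""
    n = len(scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(scores, level, method="higher")

def prediction_sets(probs, qhat):
    """Include every class whose nonconformity 1 - p(y|x) is below qhat."""
    return [np.flatnonzero(1.0 - p <= qhat) for p in probs]

rng = np.random.default_rng(0)
# Toy calibration set: softmax outputs and true labels for 200 points.
cal_probs = rng.dirichlet(np.ones(3) * 5, size=200)
cal_labels = rng.integers(0, 3, size=200)
scores = 1.0 - cal_probs[np.arange(200), cal_labels]

qhat = conformal_quantile(scores, alpha=0.1)   # target 90% coverage
sets = prediction_sets(cal_probs[:5], qhat)
```

Localized variants compute qhat from calibration points near the test node rather than globally, which is where structural proximity (here, Personalized PageRank) enters.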
Transfer Learning Across Fast- and Full-Simulation Domains in High-Energy Physics
Matthias Schott, Lucie Flek
Theory Efficient ML Graph Learning
  • Transfer learning can effectively bridge the gap between fast-simulated and fully simulated datasets in HEP.
  • Pretrained models consistently outperform independently trained baselines across various tasks.
  • Significant reduction in target-domain training data requirements, typically by a factor of two.
  • Demonstrates the utility of fast simulation data in creating reusable representations.
Read more
When Are Experts Misrouted? Counterfactual Routing Analysis in Mixture-of-Experts Language Models
Youngsik Yoon, Siwei Wang, Wei Chen, Jungseul Ok
NLP Large Language Models Efficient ML
  • Standard top-k routing in MoE models is effective for confident tokens but misaligned for fragile tokens.
  • Routing decisions are evaluated based on executed routes, leading to potentially suboptimal choices.
  • A minimal update to the router can improve performance on difficult reasoning tasks.
  • The study emphasizes the need to consider routing quality as a critical aspect of MoE training.
Read more
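The "standard top-k routing" the paper analyzes is simple to state: score all experts, keep the k highest, renormalize their gate weights. A minimal sketch (illustrative names; the counterfactual analysis itself is not reproduced):

```python
import numpy as np

def top_k_route(logits, k=2):
    """Standard top-k MoE routing: pick the k highest-scoring experts per
    token and softmax-renormalize their gate weights."""
    idx = np.argsort(logits, axis=-1)[..., -k:]            # top-k expert ids
    top = np.take_along_axis(logits, idx, axis=-1)
    gates = np.exp(top - top.max(axis=-1, keepdims=True))  # stable softmax
    gates /= gates.sum(axis=-1, keepdims=True)
    return idx, gates

logits = np.array([[2.0, 0.1, 1.0, -1.0]])  # one token, four experts
experts, weights = top_k_route(logits, k=2)
```

The paper's "fragile tokens" are those where this greedy selection picks experts whose counterfactual replacement would have scored better, which top-k by construction never checks.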
Convex Optimization with Nested Evolving Feasible Sets
Karthick Krishna M., Haricharan Balasundaram, Rahul Vaze
Optimization Theory
  • Introduction of CONES framework for convex optimization with evolving feasible sets.
  • Development of a lazy algorithm achieving O(T^(1−β)) regret and O(T^β) movement cost.
  • FRUGAL algorithm achieves zero regret with O(log T) movement cost for strongly convex loss functions.
  • Establishment of lower bounds for movement cost in relation to regret, proving FRUGAL's optimality.
Read more
Beyond Distribution Estimation: Simplex Anchored Structural Inference Towards Universal Semi-Supervised Learning
Yaxin Hou, Jun Ma, Hanyang Li, Bo Han, Jie Yu, Yuheng Jia
Theory Graph Learning Computer Vision
  • Introduction of Universal Semi-supervised Learning (UniSSL) to address challenges with unlabeled data distributions.
  • Development of Simplex Anchored Graph-state Equipartition (SAGE) to leverage inter-sample relations for representation learning.
  • Utilization of a simplex equiangular tight frame to improve representation separation.
  • Implementation of a weighting strategy to enhance the quality of pseudo-labels.
Read more
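The simplex equiangular tight frame the SAGE method anchors to has a closed form: K unit-norm directions with identical pairwise inner products -1/(K-1), the maximally separated arrangement for K classes. A sketch of that construction (the rows span a (K-1)-dimensional subspace of R^K; how the paper uses the frame is not reproduced):

```python
import numpy as np

def simplex_etf(k):
    """Rows are the K vertices of a simplex equiangular tight frame.

    Each row is unit-norm, and every pair of rows has inner product
    -1/(K-1).
    """
    return np.sqrt(k / (k - 1)) * (np.eye(k) - np.ones((k, k)) / k)

E = simplex_etf(4)
G = E @ E.T  # Gram matrix: 1 on the diagonal, -1/3 off the diagonal
```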
Less Random, More Private: What is the Optimal Subsampling Scheme for DP-SGD?
Andy Dong, Ayfer Özgür
Theory Efficient ML Federated Learning
  • Balanced Iteration Subsampling (BIS) outperforms Poisson subsampling in terms of privacy amplification.
  • BIS is optimal at both low and high noise levels, addressing participation variance effectively.
  • A near-exact Monte Carlo accountant for BIS is introduced, improving privacy evaluation accuracy.
  • Empirical results show a significant reduction in required noise multipliers with BIS.
Read more
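The Poisson-subsampling baseline the paper compares against is easy to state: every example joins each minibatch independently, so each example's total participation count is random. A sketch of that baseline and the participation variance a balanced scheme would remove (BIS itself is the paper's algorithm and is not reproduced; parameters are illustrative):

```python
import numpy as np

def poisson_subsample(n, q, rng):
    """Poisson subsampling (the standard DP-SGD baseline): each of the n
    examples joins the minibatch independently with probability q."""
    return np.flatnonzero(rng.random(n) < q)

rng = np.random.default_rng(0)
n, q, steps = 1000, 0.05, 200
counts = np.zeros(n, dtype=int)
for _ in range(steps):
    counts[poisson_subsample(n, q, rng)] += 1

# Per-example participation is Binomial(steps, q): mean steps * q = 10,
# but with nonzero variance across examples.
mean_part = counts.mean()
var_part = counts.var()
```

Fixing each example's participation count (as a balanced scheme does) removes this variance, which the summary credits for the improved privacy amplification.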
Causal-Aware Foundation-Model for Bilevel Optimization in Discrete Choice Settings
Shivaram Subramanian, Zhengliang Xue, Markus Ettl, Yingdong Lu, Jayant Kalagnanam
Optimization
  • Introduction of C3PO network for bi-level decision-making in discrete-choice settings.
  • Integration of imitation learning, multi-task learning, and in-context learning for effective pricing strategies.
  • Demonstrated strong performance in simulated and real-world datasets.
  • C3PO consistently improves pricing KPIs, especially with higher customer price sensitivity.
Read more
ProteinJEPA: Latent prediction complements protein language models
Dan Ofer, Dafna Shahaf, Michal Linial
NLP Generative Models Theory
  • Introduction of masked-position MLM+JEPA as a superior training recipe for protein language models.
  • Demonstrated that this hybrid approach outperforms MLM-only methods in several downstream tasks.
  • Identified critical factors for success, including the retention of MLM objectives and targeted latent predictions.
  • Provided a comprehensive evaluation across multiple models and tasks, showcasing the versatility of the proposed method.
Read more
Geometric Kolmogorov–Arnold Network (GeoKAN)
Abhijit Sen, Bikram Keshari Parida, Giridas Maiti, Mahima Arya, Denys I. Bondar
Theory Optimization Efficient ML
  • GeoKAN introduces a geometry-aware approach to KAN models, enhancing function approximation capabilities.
  • The model learns a diagonal Riemannian metric that adapts the input space for better representation.
  • Three main variants of GeoKAN are developed, each tailored for different applications in function approximation and physics-informed learning.
  • GeoKAN reallocates representational resolution dynamically, improving performance in regions with sharp variations.
Read more