AI-generated summaries

Today's ML research, without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

47 Papers today
8h Update frequency
7 Days of history
CalArena: A Large-Scale Post-Hoc Calibration Benchmark
Eugène Berta, David Holzmüller, Francis Bach, Michael I. Jordan
Computer Vision Theory Optimization
  • Introduction of a large-scale benchmark for post-hoc calibration covering diverse tasks and models.
  • Standardized implementations of numerous calibration methods for reproducible comparisons.
  • Proposed Post-Hoc Improvement (PHI) metric for evaluating calibration methods.
  • Empirical findings indicate smooth calibration functions outperform binning methods.
The Sample Complexity of Multiclass and Sparse Contextual Bandits
Liad Erez, Fan Chen, Alon Cohen, Tomer Koren, Yishay Mansour, Shay Moran, Alexander Rakhlin
Theory Reinforcement Learning Efficient ML
  • Introduces tight sample complexity bounds for contextual bandits with sparse rewards.
  • Achieves significant improvement over previous bounds by eliminating high-degree polynomial dependencies.
  • Utilizes two complementary approaches: exploration-by-optimization and low-variance exploration.
  • Establishes minimax optimality of the proposed bounds up to logarithmic factors.
MarginGate: Sparse Margin-Triggered Verification for Batch-Invariant LLM Inference
Kexin Chu, Yang Zhou, Wei Zhang
Large Language Models Efficient ML
  • MarginGate selectively verifies only low-margin decoding steps, significantly reducing computational overhead.
  • The approach restores deterministic decoding in LLMs while maintaining high performance on high-margin steps.
  • The method demonstrates a substantial reduction in latency compared to existing per-token verification methods.
  • Empirical results show that token flips during batch processing are rare, allowing for targeted verification.
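The margin-triggered idea in the bullets above can be pictured with a small sketch. This is only an illustration under stated assumptions: the function names, the toy logits, and the threshold value are hypothetical and are not taken from the paper. It assumes the "margin" is the gap between the two largest logits at a decoding step, and it only flags the steps that a separate verifier would re-check.

```python
def top2_margin(logits):
    """Gap between the largest and second-largest logit (assumed margin definition)."""
    first, second = sorted(logits, reverse=True)[:2]
    return first - second


def steps_to_verify(step_logits, threshold):
    """Indices of decoding steps whose top-2 margin falls below the threshold."""
    return [
        i for i, logits in enumerate(step_logits)
        if top2_margin(logits) < threshold
    ]


# Toy logits for three decoding steps (illustrative values only).
step_logits = [
    [5.0, 1.0, 0.5],   # large margin: the argmax is unlikely to flip
    [2.0, 1.9, 0.1],   # near-tie: small numeric noise could flip the argmax
    [3.0, 0.0, -1.0],  # large margin
]

# Hypothetical threshold; a real system would have to tune or derive it.
flagged = steps_to_verify(step_logits, threshold=0.5)
print(flagged)
```

Only the near-tie step is flagged, which is why verification cost can stay low if low-margin steps are rare.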
BlockBatch: Multi-Scale Consensus Decoding for Efficient Diffusion Language Model Inference
Xiaoyou Wu, Cheng-Jhih Shih, Binfei Ji, Yong Liu, Yingyan (Celine) Lin
NLP Large Language Models Generative Models Efficient ML
  • BlockBatch introduces block-size diversity as a new axis for improving dLLM inference efficiency.
  • The framework allows for simultaneous processing of multiple block sizes, enhancing parallelism and accuracy.
  • Confidence-gated merging and leader-based synchronization optimize the use of computational resources.
  • Periodic full-sequence refreshes correct accumulated errors in the KV cache, ensuring consistency.
Access Sets Matter: Budgeting Expert Reads for Scalable Weight-Space Model Merging
Yuanyi Wang, Yanggan Gu, Su Lu, Yifan Yang, Zhaoyi Yan, Congkai Xie, Jianmin Wu, Hongxia Yang
Large Language Models Efficient ML Optimization
  • Introduction of MergePipe as a budget-aware execution layer for LLM merging.
  • Reformulation of model merging as an expert access-set problem to optimize I/O operations.
  • Demonstration of significant reductions in expert-read I/O and substantial speed improvements.
  • Establishment of theoretical bounds for omitted updates and budget soundness in merging operations.
Inferring the Size of Large Language Models From Popular Text Memorization
Ivica Nikolic
Large Language Models NLP
  • Introduces a method to infer lower bounds on LLM sizes using text memorization signals.
  • Develops two complementary inference methods: a pairwise statistical test and a scaling-law estimator.
  • Validates the methods on both open-weight and closed-weight models, achieving high accuracy.
  • Reveals differences in scaling strategies among major LLM developers based on inferred sizes.
When RL Suppresses Its Own Vocabulary: Recovering Reasoning Diversity in Puzzle-to-Math Transfer
Mayug Maniparambil, Arjun Karuvally, Terrence Sejnowski, Fergal Reid
Reinforcement Learning Large Language Models
  • RL on constraint-satisfaction puzzles can transfer to hard mathematics benchmarks, isolating RL's contribution from SFT.
  • A primitive- and motif-level analysis framework reveals a depth-recovery tradeoff in reasoning capabilities.
  • Introducing a perplexity-based novelty bonus during RL training restores suppressed reasoning primitives and improves performance.
  • The proposed method increases the hard-math capability ceiling from 16.0% to 36.0% without using math problems in training.
A Geometric View of SRC: Learning Representations for Stable Residual Inference
Vangelis P. Oikonomou
Theory
  • Introduces a geometric framework for understanding the stability of Sparse Representation Classification (SRC).
  • Establishes a strict separation between training and inference, treating SRC as a fixed inference rule.
  • Identifies geometric obstructions that can collapse the residual margin, affecting classification reliability.
  • Derives a quantitative lower bound on the residual margin under specific geometric conditions.
Momentum Based Reward Design for Low Emission Traffic Signal Control
Chinmay Mundane, Amith Manoharan, Arun Singh
Reinforcement Learning Optimization
  • Introduction of a Momentum-Based Reward Function (MBRF) for traffic signal control.
  • MBRF promotes continuous vehicle movement rather than penalizing congestion.
  • Evaluation conducted using SUMO with standard traffic metrics.
  • Results indicate improved throughput-emission trade-offs and stable learning behaviors.
Parallax: Parameterized Local Linear Attention for Language Modeling
Yifei Zuo, Dhruv Pai, Zhichen Zeng, Alec Dewulf, Shuming Hu, Zhaoran Wang
NLP Large Language Models Efficient ML
  • Introduction of Parallax, a scalable parameterized Local Linear Attention mechanism.
  • Demonstrated improvements in perplexity and downstream accuracy over traditional Softmax Attention.
  • Development of a hardware-aware algorithm that enhances computational efficiency.
  • Identification of a strong interaction between the Parallax architecture and the Muon optimizer.
Conf-Gen: Conformal Uncertainty Quantification for Generative Models
Gabriel Loaiza-Ganem, Kevin Zhang, Wei Cui, Marc T. Law, Kin Kwan Leung
Generative Models Theory Large Language Models
  • Conf-Gen extends conformal risk control to generative models, enabling uncertainty quantification in unsupervised learning.
  • The framework relaxes theoretical assumptions of traditional conformal prediction, making it applicable to a wider range of tasks.
  • Conf-Gen has been empirically validated, showing superior performance in LLM question answering and other generative tasks.
  • A flexible Python package is provided to support the implementation of Conf-Gen across various applications.
Do Physics Foundation Models Learn Generalizable Physics? A Bias-Aware Benchmark Across Physical Regimes and Distribution Shifts
Mengdi Chu, Yang Liu, Ayan Biswas, Han-Wei Shen
Theory Time Series
  • Current physics foundation models demonstrate conditional rather than universal generalization capabilities.
  • Model performance is significantly affected by physical regime, temporal scale, and initial conditions.
  • Increasing training data complexity only partially addresses generalization limitations.
  • Pretraining and model scaling do not consistently eliminate biases and can sometimes introduce negative transfer.
In-Context Reward Adaptation for Robust Preference Modeling
Zhenyu Sun, Zheng Xu, Ermin Wei
Reinforcement Learning Large Language Models Theory
  • Proposes In-Context Reward Adaptation to model diverse human preferences dynamically.
  • Incorporates human response time as an auxiliary signal to enhance preference modeling.
  • Demonstrates that traditional binary preference comparisons are insufficient for robust adaptation.
  • Validates the method through experiments on synthetic and real-world datasets.
Molecular Lead Optimization via Agentic Tool Planning
Lingxiao Li, Haobo Zhang, Ruohao Fan, Bin Chen, Jiayu Zhou
Optimization Large Language Models
  • TRACE introduces a trajectory-aware approach to molecular lead optimization, improving decision-making over traditional methods.
  • The agent effectively coordinates multiple optimization tools while adhering to structural constraints.
  • In-context self-correction enhances the stability and reliability of the optimization process.
  • Experimental results show significant improvements in ADMET-related properties compared to baseline models.
HARP: Hadamard-Preconditioned Adaptive Rotation Processor for Extreme LLM Quantization
Artur Zagitov, Gleb Molodtsov, Aleksandr Beznosikov
Large Language Models Efficient ML Optimization
  • HARP is a learnable two-sided orthogonal processor designed for extreme low-bit PTQ.
  • It adapts the quantization basis to each layer and backend, improving robustness against outliers.
  • HARP maintains full-precision equivalence and is compatible with existing Hadamard-based PTQ pipelines.
  • The method shows significant improvements in perplexity and accuracy over fixed RHT across various model sizes.
Optimal Gap-Dependent Regret for Private Stochastic Decision-Theoretic Online Learning
Tommaso Cesari, Roberto Colomboni
Theory
  • Introduces a horizon-free algorithm for stochastic decision-theoretic online learning under pure differential privacy.
  • Achieves an optimal regret bound that resolves a COLT open problem regarding gap-dependent regret rates.
  • Utilizes a novel approach of randomizing prefix lengths to control privacy and regret effectively.
  • Clarifies the separation between statistical and privacy costs in the context of online learning.
Alignment-Guided Score Matching for Text-to-Image Alignment in Diffusion Models
Jaa-Yeon Lee, Yeobin Hong, Taesung Kwon, Jong Chul Ye
Generative Models Multimodal Computer Vision
  • Introduces a reward-free framework for text-to-image alignment in diffusion models.
  • Addresses limitations of existing contrastive learning methods by providing explicit guidance for negative pairs.
  • Achieves significant improvements in semantic consistency and counting accuracy in generated images.
  • Compatible with existing diffusion model architectures, enhancing their performance without extensive retraining.
Sequential Physics-Constrained Neural Operator Forward Modeling for the Norne Reservoir System
Clement Etienam, Juntao Yang, Oleg Ovcharenko, Nick Luiken, Tsubasa Onishi, Nefeli Moridis, Issam Said
Theory Efficient ML Time Series
  • Development of a physics-constrained neural operator framework for reservoir modeling.
  • Rigorous mathematical formulation addressing stability and convergence issues.
  • Empirical validation shows high accuracy and efficiency compared to traditional simulators.
  • Demonstration of the self-reinforcing cycle between physics constraints and training efficiency.
Mean-Field Diffuser: Scaling Offline MARL to Thousands of Agents
Wenhao Li, Xiangfeng Wang, Bo Jin
Reinforcement Learning Generative Models Theory
  • MF-Diffuser effectively scales offline MARL to thousands of agents by utilizing mean-field approximations.
  • The framework combines trajectory planning with generative modeling in the Wasserstein space.
  • Hierarchical coarse-to-fine planning allows for efficient population growth during the denoising process.
  • Theoretical guarantees provide insights into suboptimality and convergence of the generated policies.
Unveiling Multi-regime Patterns in SciML: Distinct Failure Modes and Regime-specific Optimization
Yuxin Wang, Yuanzhe Hu, Xiaokun Zhong, Xiaopeng Wang, Haiquan Lu, Tianyu Pang, Michael W. Mahoney, Yujun Yan, Pu Ren, Yaoqing Yang
Optimization Theory
  • Identification of a consistent three-regime structure in SciML models: Well-Trained, Under-Trained, and Over-Trained.
  • Optimization methods exhibit regime-specific effectiveness, necessitating tailored approaches for different training regimes.
  • Fine-grained failure modes in SciML challenge traditional loss-landscape interpretations.
  • Development of a regime-aware diagnostic framework that connects optimizer behavior to loss landscape features.
RL2ML: Finite-Rollout Surrogate Objectives from Reinforcement Learning to Maximum Likelihood
Yifu Zheng
Reinforcement Learning NLP Large Language Models
  • Introduction of RL2ML, a family of finite-rollout surrogate objectives.
  • Development of a closed-form, unbiased gradient estimator for RLVR.
  • Identification of a subcritical-supercritical update-scale transition affecting training dynamics.
  • Establishment of a one-dimensional optimization framework for selecting surrogate objectives.
Opir: Efficient Multi-Task Safety Classification for Toxicity, Jailbreaks, Hate Speech, and Harmful Content
Ihor Stepanov, Aleksandr Smechov
NLP Large Language Models Efficient ML
  • Introduction of Opir, an efficient multi-task safety classification model for LLMs.
  • Development of a comprehensive three-level safety taxonomy with 996 categories.
  • Release of edge variants with fewer than 100M parameters for binary safe/unsafe classification.
  • Competitive performance against eight contemporary guardrail systems with lower latency.
Striding Across Reynolds Numbers: Representation Geometry in Neural PDE Generalisation
Jianing Shi
Theory
  • Representation geometry significantly influences cross-Reynolds number generalization in neural PDE solvers.
  • ConvAE-Relay demonstrates effective state matching without requiring target-regime fitting.
  • Local and multi-scale representations outperform global linear or spectral representations.
  • Autoregressive drift is identified as a primary bottleneck in predictive accuracy.
On-Policy Replay for Continual Supervised Fine-Tuning
Yan Chen, Taojie Zhu, Meng Zhang, Xin Chen, Jiaqi Huang, Dongyang Xu, Yizhi Wang
NLP Large Language Models
  • Introduces On-Policy Replay (OPR) to enhance continual SFT without auxiliary losses.
  • Demonstrates that on-policy signals can effectively reduce catastrophic forgetting in LLMs.
  • Provides a label-free scoring method (OPR-SC) for constructing replay buffers.
  • Achieves significant improvements in backward transfer metrics across multiple LLMs.
MōLe-Λ: Learning the Coupled-Cluster Response State for Energies, Gradients, and Properties
Andreas Burger, Luca Thiede, Abdulrahman Aldossary, Jorge A. Campos-Gonzalez-Angulo, Alex Zook, Jérôme Florian Gonthier, Alán Aspuru-Guzik
Theory Efficient ML
  • MōLe-Λ extends MōLe by predicting both T and Λ amplitudes, enhancing the range of molecular properties that can be computed.
  • The model maintains the symmetry constraints of the original MōLe architecture while adding new readouts for Λ amplitudes.
  • MōLe-Λ achieves CC-quality accuracy for energies and forces while recovering higher-order properties that standard models cannot.
  • The computational efficiency of MōLe-Λ is significantly improved, being over two orders of magnitude faster than full CCSD calculations.
Bridging Chemists and AI: An Expert-Augmented Framework for Interpretable Route Evaluation
Yujia Guo, Mikhail Kabeshov, Tat Hong Duong Le, Samuel Genheden, Marco V. Mijangos, Varvara Voinarvoska, Giulia Bergonzini, Ola Engkvist, Samuel Kaski
Interpretability
  • Introduces an expert-augmented framework that combines machine learning with chemists' expertise for route evaluation.
  • Utilizes a DeepSets-based model to assess synthetic routes based on tree edit distance and expert evaluations.
  • Achieves a Spearman correlation coefficient of 0.78 and a Pearson correlation of 0.77 in category assessments.
  • Demonstrates a top-1 ranking accuracy of 60.2%, significantly outperforming previous baselines.
The Good, the Bad, and the Ugly of Markov Boundary for Tabular Prediction
Shu Wan, Abhinav Gorantla, Huan Liu, K. Selçuk Candan
Theory Graph Learning Efficient ML
  • Restricting regressors to the Markov boundary can improve prediction accuracy, especially in larger and sparser feature spaces.
  • Causal discovery methods often fail to effectively recover the Markov boundary for practical predictive tasks.
  • The predictive costs of false negatives and false positives in feature selection are not equal, complicating the recovery process.
  • Many feature sets can outperform the full feature set, indicating that the exact Markov boundary is not the only viable option.
Influence-Guided Symbolic Regression: Scientific Discovery via LLM-Driven Equation Search with Granular Feedback
Evgeny S. Saveliev, Samuel Holt, Nabeel Seedat, David L. Bentley, Jim Weatherall, Mihaela van der Schaar
Large Language Models Interpretability Optimization
  • IGSR improves symbolic regression by providing granular feedback on term contributions.
  • The method combines LLM-generated candidate functions with influence scores for effective pruning.
  • Integration with MCTS allows for efficient exploration of the equation search space.
  • Demonstrated effectiveness on diverse datasets, including real-world biological applications.
Hista and Numca: Estimate State Value Effectively for LLM Reinforcement Learning
Zizhe Chen, Jiqian Dong, Yizhou Tian, Garry Yang, Yongqiang Chen, Zhitang Chen, James Cheng
Reinforcement Learning Large Language Models NLP
  • Introduction of the State Value Estimation Benchmark (SVEB) to assess state value estimation methods in LLMs.
  • Identification of limitations in standard PPO approaches, which often yield coarse group-average state values.
  • Development of Numca, a heuristic leveraging numerical spans for effective state value estimation.
  • Proposal of Hista, a general framework using hidden states for improved state value estimation.
Pre-Registering the Detectable Effect: A Paired-MDE Budget for 4-bit Quantization Benchmarks, with a Pilot Audit
Zexin Zhuang, Yanhang Li, Zhichao Fan
Large Language Models Theory Efficient ML
  • Introduces a conservative paired MDE bound for quantization comparisons.
  • Demonstrates that much of the perceived unreliability in benchmarks is due to binomial sampling noise.
  • Presents a Quantization Reliability Index (QRI) to assess signal-to-noise ratios in quantization studies.
  • Recommends pre-registering MDE targets and reporting discordant counts for improved reliability.
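To make the binomial-noise point above concrete, here is a minimal sketch of how large an accuracy gap must be before it is distinguishable from sampling noise. It uses the textbook minimum detectable effect for two independent proportions, which is an assumption of this sketch and not the paper's paired bound; the function names and the example accuracy and benchmark sizes are hypothetical.

```python
import math
from statistics import NormalDist


def binomial_se(p, n):
    """Standard error of an accuracy estimate p measured on n items."""
    return math.sqrt(p * (1 - p) / n)


def approx_mde(p, n, alpha=0.05, power=0.8):
    """Textbook MDE for two independent proportions, n items each.

    Assumption: unpaired normal approximation, used here for illustration.
    A paired analysis based on discordant counts would generally differ.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return (z_alpha + z_beta) * math.sqrt(2 * p * (1 - p) / n)


# Hypothetical benchmark: baseline accuracy 0.70 on 1,000 vs 4,000 items.
for n in (1000, 4000):
    print(n, round(binomial_se(0.7, n), 4), round(approx_mde(0.7, n), 4))
```

Under this approximation, quadrupling the number of items halves the detectable gap, so small benchmarks may be unable to resolve the accuracy differences that quantization studies typically report.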
ExDBSCAN: Explaining DBSCAN with Counterfactual Reasoning -- Additional Material
Pernille Matthews, Lena Krieger, Tommaso Amico, Artur Zimek, Thomas Seidl, Ira Assent
Interpretability
  • ExDBSCAN is the first method specifically designed to generate counterfactual explanations for DBSCAN clustering.
  • The method provides both noise-to-cluster and cluster-to-cluster transition explanations.
  • ExDBSCAN employs a physics-inspired model to ensure diversity and proximity in counterfactual generation.
  • Empirical results show that ExDBSCAN outperforms four baseline methods while achieving perfect validity.
TaxDistill: Improving Metagenomic Taxonomic Annotation via Distilled Genomic Foundation Models
Rongye Ye, Lun Li, Zheng Luo, Yiran Zhan, Shuhui Song
Generative Models
  • TaxDistill is a knowledge distillation framework specifically designed for metagenomic taxonomic annotation.
  • The framework utilizes GenomeOcean as a teacher network to mitigate label noise from traditional sequence retrieval methods.
  • TaxDistill achieves significant improvements in classification performance across various microbial datasets.
  • The method incorporates uncertainty awareness, allowing for high reliability in real-world applications.
Open World Autoencoding Drift Detection with Novel Class Recognition in Tabular Non-stationary Data Streams
Joanna Komorniczak
Theory Time Series Efficient ML
  • Proposes a novel unsupervised method for detecting concept drift and recognizing novel classes in data streams.
  • Utilizes mirrored autoencoders for independent adaptation to changing data distributions.
  • Demonstrates competitive performance against state-of-the-art methods through experiments on synthetic data.
  • Addresses the challenges of real-time data processing in the presence of concept drift and novel class emergence.
Forget Less, Generalize More: Unifying Temporal and Structural Adaptation for Dynamic Graphs
Qian Chang, Ciprian Doru Giurcaneanu, Runsong Jia, Xia Li, Guoping Hu, Xiufeng Cheng, Jinqing Yang, Mengjia Wu, Yi Zhang
Graph Learning Time Series Theory
  • Introduction of Dual-Scale Retentive Dynamics (DSRD) framework for dynamic graphs.
  • Unified retentive state that captures both temporal and structural dependencies.
  • Adaptive decay kernels with learnable parameters for balancing short-term and long-term dependencies.
  • Theoretical insights into stability and boundedness of the model.
LoRe: Adaptive Interaction-Evaluation Routing with Per-Step Interaction Budgets for Iterative Graph Solvers
Jintao Li, Yong-Yi Wang, Zheng-An Wang, Heng Fan
Graph Learning Optimization Efficient ML
  • LoRe introduces a training-free, inference-time wrapper for iterative graph solvers that enforces per-step interaction budgets.
  • The method dynamically routes computation to prioritize high-conflict interactions, improving efficiency over static sparsification techniques.
  • Empirical results show LoRe achieves up to 15× speedup and 44× memory reduction on the Traveling Salesperson Problem.
  • The approach demonstrates cross-task generality and robustness to topology shifts in large-scale combinatorial optimization problems.
Learning to Perturb Hidden Representations for Generalizable Deep Learning
Hua Li
Theory
  • Establishes a unified framework for hidden activation perturbation.
  • Introduces Learning to Perturb Activations (LPA) for adaptive class-level perturbations.
  • Theoretically connects activation perturbation to flat minima and perturbation amplification.
  • Demonstrates that LPA outperforms existing methods across various classification scenarios.
Designing Active Tether-Net Systems for Space Debris Capture with Graph-Learning-Aided Mixed-Combinatorial Optimization
Feng Liu, Achira Boonrath, Gishnu Madhu, Eleonora M. Botta, Souma Chowdhury
Graph Learning Optimization Robotics
  • Introduces a graph-learning-aided optimization approach for space debris capture.
  • Transforms a complex MCNLP problem into a simpler NLP problem using GNNs.
  • Demonstrates faster convergence to optimal solutions compared to traditional methods.
  • Validates the framework through practical design scenarios for tether-net systems.
The Hamilton-Jacobi Theory of Deep Learning
Jose Marie Antonio Miñoza, Erika Fille T. Legara, Christopher P. Monterola
Theory Optimization
  • Training a neural network is framed as solving Hamilton-Jacobi initial-value problems.
  • A single parameter ε unifies different perspectives on neural networks, tropical algebra, and PDEs.
  • The paper establishes a minimax optimal generalization rate and certifiable adversarial robustness.
  • Backpropagation is shown to correspond to the co-state equation of the Hamiltonian system for residual networks.
Cluster-Level Attention-Guided Parallel Decoding for Masked Diffusion Language Models
Heqiang Qi, Wei Huang, Mingyuan Bai, Xiangming Meng
NLP Large Language Models Generative Models
  • Introduction of confidence-induced clusters (CICs) as span-level update units for decoding in MDLMs.
  • Development of CLAD, a training-free cluster-level decoder that enhances parallelism in decoding.
  • Utilization of self-attention maps to model inter-cluster dependencies and avoid conflicts during decoding.
  • Demonstrated significant speedups (1.77×–8.47×) over traditional token-level decoding methods.
Bastion: Budget-Aware Speculative Decoding with Tree-structured Block Diffusion Drafting
Soowon Oh, Nam Cao, Yujin Kim, Hojung Jung, Huzama Ahmad, Sangmin Bae, Se-Young Yun
NLP Large Language Models Efficient ML
  • Introduction of a tree-based block diffusion drafting method for speculative decoding.
  • Dynamic construction of query-dependent trees to optimize decoding speed and quality.
  • Integration of an acceptance surrogate, online latency estimator, and adaptive expansion mechanism.
  • Achieves up to 6.61× speedup over standard autoregressive decoding.
FormInv: A Measurement Protocol for Semantic Invariance in Mathematical Reasoning Benchmarks
Nishal Thomas, Noel Thomas
Theory Large Language Models
  • FormInv reveals significant flaws in existing mathematical reasoning benchmarks regarding semantic invariance.
  • Accuracy metrics can be misleading, as they do not account for inconsistencies across semantically equivalent paraphrases.
  • The proposed invariance framework formalizes semantic invariance using SCR and Cochran’s Q.
  • FormInv includes a comprehensive benchmark and an algorithm for model selection based on semantic consistency.
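Cochran's Q, mentioned above, is a standard test for whether several matched binary outcomes share the same success rate. The sketch below computes the statistic for a table where each row is one problem and each column is one paraphrase (1 = correct, 0 = incorrect). The table layout and the toy data are assumptions made for illustration; how FormInv builds its tables, and how it defines SCR, is not shown here.

```python
def cochrans_q(table):
    """Cochran's Q statistic for a list of rows of k binary outcomes.

    Q = (k - 1) * (k * sum(C_j^2) - N^2) / (k * N - sum(R_i^2)),
    where C_j are column totals, R_i are row totals, and N is the grand total.
    """
    k = len(table[0])
    col_totals = [sum(row[j] for row in table) for j in range(k)]
    row_totals = [sum(row) for row in table]
    n_total = sum(row_totals)
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - n_total ** 2)
    denominator = k * n_total - sum(r * r for r in row_totals)
    if denominator == 0:
        # Every row is all-0 or all-1, so the paraphrases never disagree.
        return 0.0
    return numerator / denominator


# Toy data: 6 problems, 3 paraphrases each (illustrative values only).
outcomes = [
    [1, 1, 1],
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
]
print(cochrans_q(outcomes))
```

Under the null hypothesis of equal per-paraphrase accuracy, Q is approximately chi-squared with k - 1 degrees of freedom, so a large value suggests the model's correctness depends on the wording.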
One Mask to Rule Them All: On Hidden Facts after Editing and How to Find Them
Ali Holmov, Paul Youssef, Nandi Schoots, Christin Seifert
NLP Large Language Models
  • Knowledge editing methods ROME and MEMIT modify MLP weights to update facts without retraining.
  • A common subset of weights is critical for maintaining edits, which can be isolated using a binary mask.
  • The edits suppress original knowledge retrieval rather than overwriting it, leading to limitations in propagating changes.
  • Injecting the identified mask prior to editing drastically reduces the success rate of edits.
Learning Robust and Task-Invariant Functional Representation from fMRI through Siamese Self-Supervised Learning
Jiyao Wang, Peiyu Duan, Nicha C. Dvornek, Lawrence H. Staib, Denis Sukhodolsky, Pamela Ventola, James S. Duncan
Graph Learning Multimodal Efficient ML
  • Introduction of BrainSimSiam, a self-supervised framework for fMRI representation learning.
  • Outperforms traditional supervised and self-supervised models in various tasks.
  • Utilizes positive-only data pairs to avoid the challenges of defining negative samples.
  • Integrates voxel-wise and graph-based representations through a joint ROI masking scheme.
Singularity-aware Optimization via Randomized Geometric Probing: Towards Stable Non-smooth Optimization
Ruoran Xu, Borong She, Xiaobo Jin, Qiufeng Wang
Optimization Theory
  • Introduction of Singularity-aware Adam (S-Adam) optimizer to handle non-smooth optimization challenges.
  • Development of the Local Geometric Instability (LGI) metric for estimating local instability in loss landscapes.
  • Adaptive damping mechanism that adjusts step sizes based on geometric instability.
  • Rigorous convergence guarantees to Clarke stationary points with optimal rates.
KLAS: Using Similarity to Stitch Neural Networks for Improved Accuracy-Efficiency Tradeoffs
Debopam Sanyal, Anantharaman Iyer, Alind Khare, Trisha Jain, Akshay Jajoo, Myungjin Lee, Clayton Kerce, Alexey Tumanov
Computer Vision Efficient ML
  • KLAS improves accuracy-efficiency tradeoffs in stitched neural networks by leveraging KL divergence.
  • The framework automates stitch selection, overcoming the limitations of heuristic-based approaches.
  • KLAS achieves up to 1.21% higher accuracy or 1.33× reduction in FLOPs compared to existing methods.
  • The method is applicable across various model families, including vision transformers and CNNs.
On Distributional Reinforcement Learning in Chaotic Dynamical Systems
James Rudd-Jones, Mirco Musolesi, María Pérez-Ortiz
Reinforcement Learning Theory Optimization
  • Distributional RL objectives are smoother than expectation-based objectives in chaotic systems.
  • Return distributions are Lipschitz continuous in the 1-Wasserstein metric, even with diverging trajectories.
  • Empirical analysis shows that distributional objectives lead to lower variance and better optimization in chaotic environments.
  • Distributional Q-learning methods outperform non-distributional approaches in specific chaotic control tasks.
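The 1-Wasserstein metric named above has a simple form for one-dimensional samples, which may help in reading the continuity claim. The sketch below assumes two equal-sized sets of sampled returns, in which case the distance between their empirical distributions is the mean absolute difference of the sorted samples. The function name and the sample values are hypothetical and do not come from the paper.

```python
def wasserstein1(xs, ys):
    """1-Wasserstein distance between two equal-sized 1-D empirical samples."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equally many samples in xs and ys")
    # In 1-D the optimal coupling matches samples in sorted order.
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in pairs) / len(xs)


# Illustrative returns sampled under two nearby policies (made-up values).
returns_a = [0.0, 1.0, 2.0]
returns_b = [3.0, 1.0, 2.0]
print(wasserstein1(returns_a, returns_b))
```

Because the distance compares whole distributions in sorted order, individual trajectories can diverge while the distance between return distributions stays small.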
TRACER: Persistent Regularization for Robust Multimodal Finetuning
Hesam Asadollahzadeh, Feng Liu, Christopher Leckie, Sarah M. Erfani
Multimodal Theory Computer Vision
  • Introduces a theoretical framework for multimodal contrastive finetuning with closed-form solutions.
  • Identifies the collapse issue of EMA teachers in robust finetuning and proposes WMA teachers as a solution.
  • Develops TRACER, a method that combines contrastive learning with WMA-guided distillation.
  • Demonstrates consistent improvements in OOD accuracy and calibration across multiple CLIP architectures.