AI-generated summaries

Today's ML research,
without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

48 Papers today
8h Update frequency
7 Days of history
SL-BiLEM: Structured Learnable Behavior-in-the-Loop Epidemic Modeling for Forecasting and Policy Evaluation
Haochun Wang, Sendong Zhao, Jingbo Wang, Yanrui Du, Bing Qin, Ting Liu
Time Series
  • SL-BiLEM integrates behavioral dynamics into epidemic modeling, addressing feedback loops in disease spread.
  • The model shows a 76% improvement over neural-mechanistic baselines and maintains robustness under distribution shifts.
  • It provides interpretable components for effective transmission, facilitating counterfactual analysis for policy evaluation.
  • SL-BiLEM achieves 100% bootstrap confidence interval coverage across synthetic counterfactual experiments.
Read more
SYNAPSE: Neuro-Symbolic Visual Thought-to-Text Decoding via Topological Semantic Denoising
Akshaj Murhekar, Abhijit Mishra
NLP Large Language Models Multimodal
  • Introduction of SYNAPSE, a lightweight neuro-symbolic framework for EEG-to-text decoding.
  • Implementation of a graph purification mechanism to enhance semantic stability.
  • Development of a latent exemplar retrieval strategy for improved text generation.
  • Demonstrated robust performance across multiple EEG benchmarks and frozen LLMs.
Read more
Online Irregular Multivariate Time Series Forecasting via Uncertainty-Driven Dual-Expert Calibration
Haonan Wen, Hanyang Chen, Songhe Feng
Time Series
  • Introduces Under-Cali, a framework for online IMTS forecasting that adapts to distribution shifts.
  • Utilizes an uncertainty estimator to guide the routing of samples to different calibration experts.
  • Maintains a frozen source forecasting model while enabling lightweight, model-agnostic adaptation.
  • Demonstrates significant performance improvements in dynamic scenarios with evolving data distributions.
Read more
Dimensionality Reduction for Robust Federated Learning: A Theoretical Analysis and Convergence Guarantee
Shiyuan Zuo, Jiashuo Li, Rongfei Fan, Han Hu, Jie Xu
Federated Learning Theory Efficient ML
  • Introduction of Projected Dimensionality Reduction (PDR) framework for efficient robust aggregation in FL.
  • Theoretical convergence guarantees established for both non-convex and strongly convex functions.
  • Significant reduction in server computational complexity to O(Mp), where M is the number of clients and p is the model dimension.
  • Empirical results demonstrate orders of magnitude speedups in execution time while maintaining accuracy.
Read more
Latent Recurrent Transformer: Architecture Exploration, Training Strategies, and Scaling Behavior
Zeyi Huang, Xuehai He, LiLiang Ren, Yiping Wang, Baolin Peng, Hao Cheng, Shuohang Wang, Pengcheng He, Jianfeng Gao, Yong Jae Lee, Yelong Shen
NLP Large Language Models Efficient ML
  • Introduction of Latent Recurrent Transformer (LRT) that reuses previous token hidden states as recurrent memory.
  • Development of interleaved parallel training to efficiently pretrain LRT without sacrificing parallelism.
  • LRT maintains standard transformer architecture while enhancing context utilization through recurrent pathways.
  • Empirical results show improved language modeling and in-context learning with minimal parameter increase.
Read more
Symbolic Regression via Latent Iterative Refinement
Xieting Chu, Sriram Vishwanath, Vijay Ganesh
Interpretability Optimization Theory
  • Introduction of Latent Equation Embedding (LEE) for symbolic regression.
  • Iterative amortized inference closes the gap between one-shot predictions and true posteriors.
  • LEE produces significantly simpler expressions compared to existing methods.
  • Combines iterative refinement with continuous gradient descent for enhanced robustness.
Read more
SilIF: Silhouette-Augmented Isolation Forest for Unsupervised Transaction Fraud Detection
Venkatakrishnan Gopalakrishnan
Theory Interpretability Efficient ML
  • SilIF enhances Isolation Forest by adding a silhouette-based scoring layer.
  • Demonstrated a statistically significant improvement in fraud detection performance on the IEEE-CIS benchmark.
  • Identified conditions under which the silhouette augmentation is effective or ineffective.
  • Provides open-source code for reproducibility of experiments.
Read more
Machine Learning methods for event classification and vertex reconstruction of the 12C + 12C reaction with the MATE-TPC
Minghui Zhang, Xiaobin Li, Jie Chen, Ningtao Zhang, Fenhua Lu, Junrui Ma, Jiazhen Yan, Wanqin Tu, Xiaodong Tang, Bingshui Gao, Chengui Lu, Zhichao Zhang, Jinlong Zhang, Weiping Liu
Computer Vision
  • Machine learning techniques can effectively classify nuclear reaction events.
  • High classification accuracy achieved with deep learning models (up to 97% for simulated data).
  • Successful identification of misclassified events compared to traditional methods.
  • Development of a CNN model for accurate reaction vertex reconstruction.
Read more
Gradient Transformer: Learning to Generate Updates for LLMs
Binh-Nguyen Nguyen, Khang Tran, NhatHai Phan, Issa Khalil
NLP Large Language Models Efficient ML
  • Introduces the first data-free weak-to-strong knowledge distillation method for LLMs.
  • Develops GRAD-TRANSFORMER, a Transformer-based model that generates LLM update vectors from TinyLM updates.
  • Demonstrates superior performance compared to state-of-the-art knowledge distillation methods.
  • Enables privacy-preserving fine-tuning of LLMs without accessing sensitive data.
Read more
Dense2MoE: Pushing the Pareto Frontier of On-Device LLMs via Unified Pruning and Upcycling
Fengfa Li, Hongjin Ji, Yifeng Ding, Lei Ren, Chen Wei
NLP Large Language Models Efficient ML
  • Introduction of Dense2MoE, a unified framework combining pruning and upcycling for on-device LLMs.
  • Utilization of Layer-Fusion UpCycling (LF-UC) to maintain model accuracy while reducing redundancy.
  • Demonstrated significant improvements in inference latency versus model accuracy, pushing the Pareto frontier.
  • Validated across multiple benchmarks and model scales, confirming broad applicability.
Read more
Learn from Weaknesses: Automated Domain Specialization for Small Computer-Use Agents
Suji Kim, Kangsan Kim, Sung Ju Hwang
Efficient ML
  • LEARNWEAK automates domain specialization for small CUAs, addressing performance gaps with large models.
  • The framework synthesizes targeted training data based on identified weaknesses, eliminating the need for manual annotation.
  • An error-aware specialization objective allows for precise updates by distinguishing between planning and execution errors.
  • LEARNWEAK outperforms existing models, achieving significant performance improvements across multiple domains.
Read more
Function-Valued Causal Influence in Nonlinear Time Series
Valentina V. Kuskova, Dmitry Zaytsev, Michael Coppedge
Time Series
  • Scalar edge scores in nonlinear causal models can misrepresent the complexity of causal relationships.
  • Function-valued causal influence provides a more accurate representation of state-dependent effects.
  • The proposed framework allows for direct estimation of causal response functions from trained models.
  • Synthetic experiments demonstrate that qualitatively different causal mechanisms can yield similar scalar scores.
Read more
Balancing Plasticity and Stability with Fast and Slow Successor Features
Raymond Chua, Doina Precup, Blake Richards
Reinforcement Learning Robotics Theory
  • Introduces a continual RL setup with smooth, continuous non-stationarity.
  • Demonstrates that performance degradation is primarily due to instability rather than insufficient plasticity.
  • Proposes a framework integrating Successor Features with multi-timescale synaptic consolidation.
  • Uses cross-attention over SFs to interpret the contributions of stability and plasticity across timescales.
Read more
Augmenting Attention with Exponentially Decaying Memory Improves Query-Aware KV Sparsity
Xiuying Wei, Caglar Gulcehre
NLP Large Language Models Efficient ML
  • Integration of exponentially decaying memory into the RAT+ attention framework improves query-aware sparse inference.
  • RAT+ consistently enhances accuracy over standard attention mechanisms across various sparse budgets.
  • Significant performance improvements were observed in tasks using Quest, MoBA, and SnapKV with the memory-augmented architecture.
  • Two hypotheses are proposed to explain the benefits of the memory module: improved critical-token selection and enhanced information retention.
Read more
Semi-Supervised Hypothesis Testing by Betting on Predictions
Yaniv Tenzer, Elad Tolochinsky, Yaniv Romano
Theory
  • Introduces a testing-by-betting framework for semi-supervised hypothesis testing.
  • Develops an e-statistic to construct valid sequential tests under label and concept shifts.
  • Demonstrates that the proposed tests retain validity and power despite inaccurate predictions.
  • Shows significant power gains in simulations and applications to large language models.
Read more
Transfer Learning using 66 Diseases for Disease Forecasting Applications
Lauren J Beesley, Alexander C Murph, Dave Osthus, Lauren A Castro
Time Series
  • Incorporating multiple data streams significantly enhances disease forecasting accuracy.
  • The study compiles a publicly available database of infectious disease data spanning 66 diseases.
  • Data quality is crucial; irrelevant data can degrade forecasting performance.
  • The research categorizes data inputs into four classes to assess their impact on model performance.
Read more
Model Merging on Loss Landscape: A Geometry Perspective
Juanwu Lu, Anand Bhaskar, Brian Axelrod, Ekaterina Tolstaya, Tristan Emrich
Theory Optimization Efficient ML
  • EpiMer introduces a geometry-based approach to model merging, addressing the limitations of existing flat-geometry methods.
  • The framework utilizes the Fréchet mean on a Riemannian manifold, focusing on a low-rank subspace to simplify computations.
  • Theoretical analysis provides insights into the conditions under which curvature-aware merging is beneficial.
  • Empirical results show that EpiMer outperforms traditional merging methods across various tasks.
Read more
Benchmarking Inductive Biases for Multivariate Time-Series Anomaly Detection with a Robust Multi-View Channel-Graph Detector
Junhao Wei, Yanxiao Li, Bidong Chen, Yifu Zhao, Haochen Li, Dexing Yao, Baili Lu, Xudong Ye, Jietian Feng, Sio-Kei Im, Yapeng Wang, Xu Yang
Time Series
  • Introduces a unified benchmarking study for MTS anomaly detection methods.
  • Evaluates ten detectors across five datasets under consistent protocols.
  • Finds that no single inductive bias dominates across all datasets.
  • CCG-MSD achieves the best performance and robustness metrics.
Read more
Stage-wise Distortion-Perception Traversal in Zero-shot Inverse Problems with Diffusion Models
Jiawei Zhang, Ziyuan Liu, Leon Yan, Zhenyu Xiao, Yuantao Gu
Generative Models Computer Vision Efficient ML
  • Introduction of a stage-wise framework for D-P traversal in zero-shot inverse problems using diffusion models.
  • The MAP-RPS method combines MAP estimation with re-noised posterior sampling to balance distortion and perception.
  • Extension to latent space with LMAP-RPS enhances applicability and efficiency.
  • Theoretical analyses support the effectiveness of the proposed methods.
Read more
Thinned Mean Field Langevin Dynamics
Zonghao Chen, Heishiro Kanagawa, François-Xavier Briol, Chris J. Oates, Lester Mackey
Theory Optimization Efficient ML
  • KT-MFLD reduces computational complexity from O(N^2) to O(N^(3/2)) by using a thinned particle coreset.
  • The method retains convergence guarantees similar to traditional MFLD under mild conditions.
  • Empirical validation shows KT-MFLD outperforms Random-MFLD and other coreset methods in multiple tasks.
  • The approach is particularly beneficial for high-dimensional and multimodal distributions.
Read more
$E^3$-Agent: An Executable and Evolving Agent for Resource Management of Edge Generative Inference
Rui Bao, Yaping Sun, Zhiyong Chen, Feng Yang, Meixia Tao, Nan Li, Wenjun Zhang
NLP Large Language Models Generative Models
  • E3-Agent features a dual-layer architecture for efficient resource management in edge environments.
  • It adapts to non-stationary conditions through online learning from execution feedback.
  • The system significantly reduces latency and improves performance stability compared to static resource management approaches.
  • E3-Agent is designed to handle the complexities of heterogeneous devices and dynamic workloads in edge AI applications.
Read more
Meta-Attention: Bayesian Per-Token Routing for Efficient Transformer Inference
Alan Ferrari
NLP Large Language Models Efficient ML
  • Meta-Attention dynamically selects attention mechanisms for each token based on contextual demands.
  • The Bayesian Meta-Controller uses a Dirichlet prior to inform routing decisions, improving compute-performance trade-offs.
  • Empirical results show a 2.4× reduction in projected normalized FLOP cost compared to prior-free routing methods.
  • The framework effectively prevents routing collapse while maintaining task performance.
Read more
SPHERE-JEPA: Spherical Prediction with Homogeneous Embeddings
Léo Nicollier, Max Dunitz, Marc Pic, Pablo Musé, Enric Meinhardt-Llopis, Gabriele Facciolo
Computer Vision Theory Efficient ML
  • Extends theoretical analysis of optimal SSL representations from Euclidean to Riemannian manifolds.
  • Proves that uniform distributions on the hypersphere are optimal for k-NN and kernel ridge regression.
  • Introduces SUSReg, a projection-based regularizer that enforces hyperspherical uniformity.
  • Demonstrates significant performance improvements over Gaussian-based regularization on standard benchmarks.
Read more
Law of Neural Interaction: Depth-Width Shape, Interaction Efficiency, and Generalization
Wenjie Sun, Jinning Yang, Shuai Zhang, Mengnan Du
Large Language Models Theory Efficient ML
  • Generalization in neural networks is influenced by interaction efficiency, which can be optimized by adjusting the depth-width ratio.
  • The concept of 'neural interaction' is introduced, extending superposition to gradient space and providing new metrics for quantifying interaction.
  • An efficient interaction interval exists under fixed budgets, which remains stable as resource budgets scale up.
  • Models closer to the efficient interaction interval demonstrate superior performance on benchmarks like MMLU-Pro.
Read more
Convergence of Spectral Descent for Non-smooth Optimization
Yixuan Yang, Yuqing He, Song Li
Optimization Theory Efficient ML
  • Establishment of global linear convergence for Spectral Descent and Truncated Spectral Descent in non-smooth optimization.
  • Introduction of a neighborhood-based subgradient selection mechanism to stabilize optimization trajectories.
  • Theoretical guarantees for robust low-rank matrix recovery under mixed noise conditions.
  • Empirical validation showing superior performance of Muon-type optimizers compared to traditional methods.
Read more
Time Series Causal Discovery via Context-Conditioned and Causality-Augmented Pretraining
Biao Ouyang, Tengxue Zhang, Zhihao Zhuang, Yang Shu, Chenjuan Guo, Bin Yang
Time Series
  • PTCD is a pretraining framework that enhances causal discovery in time series data.
  • It utilizes a dual-scale iterative attention mechanism to model complex temporal dependencies.
  • The framework incorporates intervention-based learning to break spurious correlations.
  • PTCD shows superior performance on multiple real-world out-of-distribution datasets.
Read more
Personalized Observation Normalization for Federated Reinforcement Learning in Simulation Environments with Heterogeneity
Yiran Pang, Zhen Ni, Xiangnan Zhong
Reinforcement Learning Federated Learning Robotics
  • Introduction of Personalized Observation Normalization (PON) for FedRL to address input distribution heterogeneity.
  • Demonstration that shared normalization parameters are ineffective due to diverse local distributions.
  • PON utilizes continuously updated statistics for local normalization, improving consistency across agents.
  • Experimental results show PON accelerates training and outperforms baseline methods in heterogeneous environments.
Read more
Teacher-Student Representational Alignment for Reinforcement Learning-Driven Imitation Learning
Meraj Mammadov, Pedro Zuidberg Dos Martires, Johannes Andreas Stork
Reinforcement Learning Robotics
  • Introduces a novel method to bridge the imitation gap between RL-based teacher and IL-based student policies.
  • Utilizes a shared embedding space to hide private information from the teacher, enabling direct imitation by the student.
  • Demonstrates improved student performance with reduced imitation gap across multiple challenging environments.
  • The method is adaptable to existing frameworks with minimal hyperparameter tuning.
Read more
Bridging Classification and Reconstruction: Cooperative Time Series Anomaly Detection
Qideng Tang, Dai Chaofan, Wubin Ma, Yahui Wu, Haohao Zhou, Tao Zhang, Huan Li, Dalin Zhang
Time Series
  • CoAD unifies classification and reconstruction methods to enhance time series anomaly detection.
  • The framework addresses limitations of existing OE and MAE methods, improving generalization and masking accuracy.
  • CoAD employs probability-informed soft masking to better identify subtle anomalies.
  • Extensive experiments show CoAD outperforms state-of-the-art methods in accuracy and efficiency.
Read more
Aligning Few-Step Generative Models by Amortizing Sample-based Variational Inference
Jaewoo Lee, Hyeongyu Kang, Dohyun Kim, Kyuil Sim, Woocheol Shin, Minsu Kim, Taeyoung Yun, Jeongjae Lee, Sanghyeok Choi, Tabitha Edith Lee, Jongchul Ye, Jinkyoo Park
Generative Models Reinforcement Learning Robotics
  • FAV is a general alignment framework that does not rely on restrictive assumptions about generative models.
  • The method utilizes Stein Variational Gradient Descent for sample-based variational inference.
  • FAV outperforms existing policy extraction methods in robotics manipulation tasks.
  • The framework successfully fine-tunes various few-step generative models for image generation.
Read more
The Role of Causal Features in Strategic Classification for Robustness and Alignment
Antonio Gois, Sophia Gunluk, Nir Rosenfeld, Nidhi Hegde, Simon Lacoste-Julien, Dhanya Sridhar
Theory
  • Causal classifiers can achieve optimal classification error after sufficient user adaptation.
  • Out-of-distribution risk can be decomposed into bias and feature utilization, highlighting the advantages of causal classifiers.
  • Causal features can align long-term incentives between institutions and users, potentially improving user outcomes.
  • Empirical results validate theoretical predictions regarding user behavior in strategic classification contexts.
Read more
Adversarial Dual On-Policy Distillation from Expressive Flow-based Teacher
Zhenglin Wan, Jingxuan Wu, Xingrui Yu, Chubin Zhang, Mingcong Lei, Bo An, Ivor W. Tsang, Yang You
Reinforcement Learning Robotics Generative Models
  • FA-OPD combines adversarial dual on-policy distillation with a Flow Matching teacher for improved learning from demonstrations.
  • The method provides two complementary supervision channels: a reward channel for exploration and an action channel for stabilization.
  • FA-OPD shows significant improvements over traditional behavioral cloning and existing on-policy distillation methods.
  • The approach is validated across multiple robotic tasks, showcasing robustness against noisy and limited demonstrations.
Read more
Resource-Constrained Affect Modelling via Variance Regularisation Pruning
Kosmas Pinitas, Konstantinos Katsifis
Efficient ML
  • Introduces Variance-Regularised Pruning (VR) to enhance model robustness in affective computing.
  • Evaluates model parameters based on their contribution to both accuracy and variability across users.
  • Achieves up to 80% sparsity while maintaining near-baseline performance on the AGAIN dataset.
  • Addresses the need for efficient and reliable affective models in resource-constrained environments.
Read more
When Interpretability Is Unequally Distributed: Fairness in Hybrid Interpretable Models
Ziba Jabbar Zare, Ulrich Aïvodji, Julien Ferry, Thibaut Vidal
Interpretability
  • Introduction of Interpretability Coverage Disparity (ICD) as a measure of fairness in hybrid interpretable models.
  • Demonstration of significant disparities in interpretability allocation across demographic groups.
  • Development of methods to mitigate ICD with minimal impact on model performance.
  • Highlighting the importance of auditing models for both predictive and interpretability fairness.
Read more
Single-Rollout Hidden-State Dynamics for Training-Free RLVR Data Selection
Jianghao Wu, Jianfei Cai, Weiqiang Wang, Jin Ye, Daniel F. Schmidt, Yasmeen George
Reinforcement Learning Large Language Models Efficient ML
  • SHIFT enables one-shot, training-free, and label-free data selection for RLVR.
  • Utilizes Reasoning-Induced Representation Shift (RIRS) as a proxy for instance utility.
  • Achieves better performance than existing training-free diversity and uncertainty-based selection methods.
  • Demonstrates effectiveness in ultra-low budget scenarios across various benchmarks.
Read more
The Energy Blind Spot: NVIDIA's Flagship Edge AI Hardware Cannot Support Process-Level Energy Attribution
Deepak Panigrahy, Aakash Tyagi
Efficient ML Theory
  • NVIDIA's GB10 hardware lacks the necessary interfaces for process-level energy attribution.
  • Agentic AI workloads consume significantly more energy than linear workflows, with orchestration structure being a major factor.
  • The absence of CPU energy counters on ARM-based systems limits reproducibility in energy measurement.
  • The paper proposes a hardware specification for energy attribution and interim calibration methods.
Read more
Trust Region Q Adjoint Matching
Yonghoon Dong, Kyungmin Lee, Changyeon Kim, Jaehyuk Kim, Jinwoo Shin
Reinforcement Learning Optimization Robotics
  • Introduces TRQAM, which stabilizes off-policy RL by controlling deviations from pretrained policies.
  • Proves that the path-space KL divergence can be expressed as a function of the trust-region parameter λ.
  • Demonstrates that TRQAM outperforms existing methods in offline RL and offline-to-online RL settings.
  • Identifies the amplification of critic errors as a critical issue in existing adjoint matching methods.
Read more
HRVConformer: Neonatal Hypoxic-Ischemic Encephalopathy Classification from the Heart Rate signals
Shuwen Yu, William P Marnane, Geraldine B. Boylan, Gordon Lightbody
Time Series
  • HRVConformer processes raw heart rate signals directly, eliminating the need for handcrafted features.
  • The architecture combines convolutional layers and Transformer-based attention mechanisms for improved classification performance.
  • The model achieved an AUC of 83.23% and accuracy of 74.56%, surpassing several baseline models.
  • The improved Pan-Tompkins algorithm enhanced the quality of heart rate signal extraction from ECG recordings.
Read more
Revisiting Metafeatures to Explain Model Differences on Tabular Data
Markus Herre, Andrej Tschalzev, Sascha Marton, Christian Bartelt
Interpretability
  • Dataset meta-features do not robustly explain performance gaps between neural networks and tree-based models.
  • One weak association is found for non-foundation vs. foundation model gaps, which does not generalize well.
  • A robust association between TabICLv2 and TabPFN-2.6 improves held-out predictions.
  • Meta-feature predictors do not significantly outperform simple baseline models.
Read more
Learning Energy-Based Models from Stochastic Interpolants using Spatiotemporal Differences
Hanlin Yu, RuiKang OuYang, Partha Kaushik, Arto Klami, Michael U. Gutmann, Omar Chehab
Generative Models Theory Efficient ML
  • Introduces a unifying taxonomy for energy-based model training methods based on spatial and temporal variations.
  • Identifies limitations of existing temporal and spatial methods, particularly in multi-modal settings and support mismatch.
  • Proposes Spatiotemporal Noise-Contrastive Estimation (stNCE) as a solution to jointly learn spatial and temporal differences.
  • Demonstrates that stNCE leads to new training objectives that outperform existing methods on various benchmarks.
Read more
Kan Extension Transformers: A Categorical Unification of Attention, Diffusion, and Predict-Detach Self-Conditioning
Sridhar Mahadevan
NLP Large Language Models Theory
  • KETs provide a categorical framework that unifies various Transformer implementations.
  • The use of detached predictive carriers allows for effective self-conditioning without leaking future information.
  • Quadratic KET outperforms other causal architectures on larger datasets like WikiText-2 and WikiText-103.
  • The predict-detach regime yields the most substantial performance gains across all datasets.
Read more
WINDQuant: Weight-Informed Neural Decision-Making for Global Mixed-Precision LLM Quantization
Phong Nam Huu Nguyen, Khoi M. Le, Cong-Duy T Nguyen, Anh Tuan Luu, Thong Thanh Nguyen, Tho Quan
Large Language Models Reinforcement Learning Efficient ML
  • WINDQuant reformulates mixed-precision quantization as a sequential decision-making problem, allowing for adaptive bit-width allocation.
  • The framework operates at a fine-grained level, enabling more precise and flexible quantization strategies compared to existing methods.
  • WINDQuant demonstrates competitive performance on LLaMA models with up to 70 billion parameters without requiring full model retraining.
  • The approach integrates activation-aware mechanisms and supports a range of quantization operators from 1-bit to 8-bit.
Read more
On the Learnability of Test-Time Adaptation: A Recovery Complexity Perspective
Zhi Zhou, Ming Yang, Shi-Yu Tian, Kun-Yang Yu, Lan-Zhe Guo, Yu-Feng Li
Theory
  • Introduces a theoretical framework for studying TTA learnability.
  • Defines (ϵ, δ)-Recovery Complexity and (ϵ, ρ)-TTA Learnability metrics.
  • Develops a unified model for analyzing non-stationary test streams.
  • Derives bounds on recovery complexity, highlighting fundamental limits of TTA.
Read more
Detectability in Diversity: Improved Canary Crafting for Privacy Auditing in One Run
Mathieu Dagréou, Aurélien Bellet
Optimization Efficient ML Theory
  • Introduces an efficient method for crafting canaries for one-run privacy auditing.
  • Combines influence functions with bilevel optimization to enhance canary detectability and diversity.
  • Empirical validation shows improved privacy leakage estimates with reduced computational costs.
  • Addresses the issue of interference among canaries, which affects membership inference accuracy.
Read more
Reparametrizing Shampoo and SOAP for Subspace Basis Updates and BFloat16 Storage
Alan Milligan, Zikun Xu, Simon Lacoste-Julien, Felix Dangel, Wu Lin
Optimization Efficient ML
  • Introduces a reparametrization of Shampoo-based methods to support BFloat16 storage.
  • Reduces computational overhead by updating only part of the basis through QR decomposition in a subspace.
  • Improves performance of KL-SOAP to match or exceed KL-Shampoo under BFloat16 storage.
  • Compatible with various subspace selection strategies, enhancing flexibility in optimization.
Read more
PLS in the Mirror of Self-Attention
Jiangsheng (Jason) You
Theory Optimization
  • PLS can be viewed as a linearized version of self-attention, bridging traditional statistical methods and modern neural network paradigms.
  • The reformulation of PLS as a regression problem allows for greater flexibility in modeling relationships between predictors and responses.
  • Introducing a modified cost function enhances the ability of PLS to handle non-orthogonal transformations and nonlinear activations.
  • The study provides insights into dimensionality normalization in self-attention mechanisms, suggesting potential improvements in learning efficiency.
Read more
MobileMoE: Scaling On-Device Mixture of Experts
Yanbei Chen, Hanxian Huang, Ernie Chang, Jacob Szwejbka, Digant Desai, Zechun Liu, Vikas Chandra, Raghuraman Krishnamoorthi
NLP Large Language Models Efficient ML
  • MobileMoE establishes a new Pareto frontier for on-device LLMs with sub-billion active parameters.
  • The proposed MoE scaling law optimizes architecture for mobile memory and compute constraints.
  • MobileMoE models achieve 2-4× fewer inference FLOPs compared to leading dense LLMs.
  • The first efficient MoE inference is demonstrated on commodity smartphones with significant speed improvements.
Read more
Bayesian Deployment Approval for Learned Landing Controllers under Finite Rollout Validation
Fei Jiang, Lei Yang
Reinforcement Learning Robotics Theory
  • Introduces a Bayesian framework for validating learned landing controllers under uncertainty.
  • Defines deployment capability as the probability of meeting safety constraints during landing.
  • Utilizes Bayesian inference to quantify uncertainty in deployment readiness.
  • Develops a sequential validation mechanism for real-time decision-making during testing.
Read more