AI-generated summaries

Today's ML research,
without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

48 Papers today
8h Update frequency
7 Days of history
EdgeRazor: A Lightweight Framework for Large Language Models via Mixed-Precision Quantization-Aware Distillation
Shu-Hao Zhang, Le-Tong Huang, Xiang-Sheng Deng, Xin-Yi Zou, Chen Wu, Nan Li, Shao-Qun Zhang
Large Language Models Efficient ML NLP
  • EdgeRazor integrates quantization and distillation for efficient LLM deployment.
  • The framework introduces mixed-precision quantization for better resource allocation.
  • Empirical results show significant performance improvements over existing quantization methods.
  • EdgeRazor reduces storage requirements and accelerates decoding times.
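For readers unfamiliar with the two building blocks this summary names: uniform fake quantization and temperature-scaled distillation are standard components. The sketch below is a minimal illustration of those textbook techniques only, not the EdgeRazor method; `fake_quantize` and `distill_kl` are hypothetical helper names.

```python
import math

def fake_quantize(w, bits):
    """Symmetric uniform fake quantization: round onto 2^(bits-1)-1 levels."""
    levels = 2 ** (bits - 1) - 1
    scale = max(abs(x) for x in w) / levels or 1.0
    return [round(x / scale) * scale for x in w]

def softmax(z):
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]

def distill_kl(teacher_logits, student_logits, T=2.0):
    """KL(teacher || student) at temperature T, the usual distillation loss."""
    p = softmax([z / T for z in teacher_logits])
    q = softmax([z / T for z in student_logits])
    return (T * T) * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

In quantization-aware distillation schemes, a loss like `distill_kl` is evaluated on the fake-quantized student so training compensates for rounding error; mixed precision then assigns different `bits` budgets to different layers.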
When Safety Geometry Collapses: Fine-Tuning Vulnerabilities in Agentic Guard Models
Ismail Hossain, Sai Puppala, Jannatul Ferdaus, Md Jahangir Alam, Yoonpyo Lee, Syed Bahauddin Alam, Sajedul Talukder
NLP Large Language Models Theory
  • Benign fine-tuning can lead to a complete collapse of safety alignment in guard models.
  • The phenomenon of safety geometry collapse is more severe in purpose-built guard models than in general-purpose LLMs.
  • Fisher-Weighted Safety Subspace Regularization (FW-SSR) effectively restores safety alignment during fine-tuning.
  • Structural representational geometry is a more reliable predictor of safety behavior than absolute displacement metrics.
FL-Sailer: Efficient and Privacy-Preserving Federated Learning for Scalable Single-Cell Epigenetic Data Analysis via Adaptive Sampling
Guangyi Zhang, Yi Dai, Yiyun He, Junhao Liu
Federated Learning Efficient ML Theory
  • FL-Sailer is the first federated learning framework tailored for scATAC-seq data analysis.
  • Adaptive leverage score sampling reduces dimensionality by 80% while preserving biological interpretability.
  • The invariant VAE architecture effectively disentangles biological signals from technical confounders.
  • FL-Sailer demonstrates superior performance compared to centralized methods in multi-institutional settings.
ITBoost: Information-Theoretic Trust for Robust Boosting
Ye Su, Longlong Zhao, Diego Garcia-Gil, Jipeng Guo, Gangchun Zhang, Jinxin Chen, Jinsong Chen
Theory Optimization
  • ITBoost leverages an information-theoretic trust framework to address label noise in boosting.
  • The method employs an MDL-based sample weighting mechanism that focuses on residual time-series characteristics.
  • Theoretical analysis shows ITBoost has a tighter generalization error bound than standard GBDT under label noise.
  • Empirical results indicate ITBoost outperforms existing boosting algorithms and deep tabular models in noisy settings.
An End-to-End Framework for Building Large Language Models for Software Operations
Jingkai He, Pengfei Chen, Chenghui Wu, Shuang Liang, Ye Li, Gou Tan, Xidao Wen, Chuanfu Zhang
Large Language Models Reinforcement Learning NLP
  • Introduction of OpsLLM, a domain-specific LLM for software operations.
  • Implementation of a Human-in-the-Loop mechanism for high-quality data curation.
  • Development of a domain process reward model (DPRM) to enhance RCA accuracy.
  • Demonstrated significant performance improvements over existing LLMs in QA and RCA tasks.
Disentangling Shared and Task-Specific Representations from Multi-Modal Clinical Data
He Lyu, Huolin Zeng, Junren Wang, Huazhen Yang, Linchao He, Yong Chen, Zhirui Li, Andreas Maier, Siming Bayer, Huan Song
Multimodal
  • Introduces Orthogonal Task Decomposition (OrthTD) for disentangling shared and task-specific representations.
  • Utilizes a unified Transformer architecture for multimodal data fusion.
  • Achieves superior performance in clinical outcome prediction compared to existing methods.
  • Demonstrates significant improvements in identifying rare events within imbalanced datasets.
Replay-Based Continual Learning for Physics-Informed Neural Operators
Yizheng Wang, Mohammad Sadegh Eshaghi, Xiaoying Zhuang, Timon Rabczuk, Yinghua Liu
Efficient ML Theory
  • Introduces a replay-based continual learning strategy for physics-informed neural operators.
  • Utilizes a distillation-based constraint to preserve knowledge and mitigate catastrophic forgetting.
  • Employs a PDE-based scoring strategy to focus on poorly performing samples for efficient training.
  • Demonstrates improved adaptability to OOD data without requiring labeled datasets.
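Replay-based continual learning keeps a small memory of earlier samples to rehearse alongside new ones. The paper's PDE-based scoring for choosing which samples to keep is its own contribution and is not reproduced here; as a generic baseline, a reservoir-sampled buffer (uniform over the whole stream) can be sketched as:

```python
import random

class ReplayBuffer:
    """Reservoir-sampled replay memory: uniform over everything seen so far."""
    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.rng = random.Random(seed)
        self.data = []
        self.seen = 0

    def add(self, item):
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append(item)
        else:
            # replace a stored item with probability capacity / seen
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.data[j] = item

    def sample(self, k):
        return self.rng.sample(self.data, min(k, len(self.data)))
```

A distillation constraint like the one the summary mentions would then be computed on batches drawn via `sample`, alongside the loss on incoming data.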
Discovering Sparse Counterfactual Factors via Latent Adjustment for Survey-based Community Intervention
Fatima Ashraf, Muhammad Ayub Sabir, Junbiao Pang, Yufang Zhou, Yan Shang
Optimization Theory Interpretability
  • Introduces a framework for sparse, policy-feasible community interventions based on survey data.
  • Utilizes a fixed-basis nonnegative latent representation for stable comparisons pre- and post-intervention.
  • Employs Shapley attribution for identifying important latent factors relevant to intervention strategies.
  • Combines optimal transport with weighted ℓ2,1 penalties to ensure sparsity in intervention adjustments.
PRISM-CTG: A Foundation Model for Cardiotocography Analysis with Multi-View SSL
Sheng Wong, Ravi Shankar, Beth Albert, Hao Fei, Lin Li, Imane Ben M'Barek, Manu Vatish, Gabriel Davis Jones
Time Series
  • PRISM-CTG is the first foundation model for CTG analysis utilizing self-supervised learning.
  • The model integrates multiple supervisory signals to enhance representation learning.
  • It demonstrates significant performance improvements across various CTG tasks.
  • PRISM-CTG shows strong generalization capabilities on external datasets.
Proteo-R1: Reasoning Foundation Models for De Novo Protein Design
Fang Wu, Weihao Xuan, Heli Qi, Hanqun Cao, Heng-Jui Chang, Zeqi Zhou, Haokai Zhao, Ma Jian, Carl Ma, Yu-Chi Cheng, Kuan Pang, Xiangru Tang, Zehong Wang, Guanlue Li, Hanchen Wang, Kejun Ying, Pan Lu, Chiho Im, Seungju Han, Peng Xia, Tinson Xu, Yinxi Li, Deyao Zhu, Pheng-Ann Heng, Naoto Yokoya, Masashi Sugiyama, Li Erran Li, Jure Leskovec, Yejin Choi
Generative Models Large Language Models Multimodal
  • Proteo-R1 decouples molecular understanding from geometric generation, enhancing interpretability.
  • The framework employs a dual-expert architecture combining a reasoning expert and a generation expert.
  • Explicit residue-level decisions improve the incorporation of biochemical knowledge into the design process.
  • Proteo-R1 allows for modular integration with various generative models, increasing flexibility.
Improving FMQA via Initial Training Data Design Considering Marginal Bit Coverage in One-Hot Encoding
Taiga Hayashi, Yuya Seki, Kotaro Terada, Yosuke Mukasa, Shuta Kikuchi, Shu Tanaka
Optimization
  • Introduces a method for designing initial training data to ensure complete marginal bit coverage in FMQA.
  • Proposes two sampling techniques, Latin Hypercube Sampling (LHS) and the Sobol’ sequence, for improved optimization.
  • Demonstrates significant performance improvements in optimization tasks, especially with larger variable sets.
  • Highlights the importance of initial training data design in the context of black-box optimization problems.
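Latin Hypercube Sampling, one of the two techniques this summary names, stratifies each axis into n equal bins and draws exactly one point per bin, guaranteeing full one-dimensional (marginal) coverage. A minimal library-free sketch of standard LHS, not the paper's coverage-aware initialization:

```python
import random

def latin_hypercube(n, d, seed=0):
    """n points in [0,1)^d with exactly one point per 1/n-wide bin per axis."""
    rng = random.Random(seed)
    cols = []
    for _ in range(d):
        col = [(i + rng.random()) / n for i in range(n)]  # one draw per stratum
        rng.shuffle(col)                                  # decorrelate axes
        cols.append(col)
    return [tuple(col[i] for col in cols) for i in range(n)]
```

That per-axis coverage property is the analogue of the marginal bit coverage the paper targets for one-hot-encoded variables; the Sobol' sequence reaches low discrepancy by a different, deterministic construction.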
When Does Gene Regulatory Network Inference Break? A Controlled Diagnostic Study of Causal and Correlational Methods on Single-Cell Data
Miguel Fernandez-de-Retana, Ruben Sanchez-Corcuera, Unai Zulaika, Aritz Bilbao-Jayo, Aitor Almeida
Graph Learning
  • Causal methods for GRN inference often underperform compared to correlation-based methods in realistic benchmarks.
  • A controlled diagnostic framework was developed to isolate and evaluate the impact of seven specific biological and technical pathologies.
  • Causal methods show superiority in ideal conditions but are significantly hindered by dropout and latent confounders.
  • An error-type decomposition reveals qualitatively different errors among methods with similar aggregate accuracy.
A Closed-Form Adaptive-Landmark Kernel for Certified Point-Cloud and Graph Classification
Sushovan Majhi, Atish Mitra, Žiga Virk, Pramita Bagchi
Graph Learning Theory Efficient ML
  • Introduction of PALACE, an adaptive landmark kernel for point-cloud and graph classification.
  • The method provides closed-form guarantees for distortion bounds and classification rates.
  • Empirical results show PALACE outperforms existing methods on multiple benchmarks.
  • Adaptive landmark placement significantly reduces computational budget compared to uniform grids.
OCRR: A Benchmark for Online Correction Recovery under Distribution Shift
Adrian Grassi
NLP Efficient ML Theory
  • Introduction of OCRR, a benchmark for assessing online correction recovery in classification systems.
  • Evaluation of nine baseline algorithms across multiple datasets and correction policies.
  • Demonstration of the substrate's superior performance in recovering from errors compared to traditional methods.
  • Highlighting the inadequacy of static benchmarks in capturing the dynamics of online learning and correction.
OGPO: Sample Efficient Full-Finetuning of Generative Control Policies
Sarvesh Patil, Mitsuhiko Nakamoto, Manan Agarwal, Shashwat Saxena, Jesse Zhang, Giri Anantharaman, Cleah Winston, Chaoyi Pan, Douglas Chen, Nai-Chieh Huang, Zeynep Temel, Oliver Kroemer, Sergey Levine, Abhishek Gupta, Hongkai Dai, Paarth Shah, Max Simchowitz
Reinforcement Learning Generative Models Robotics
  • OGPO enables sample-efficient full-finetuning of GCPs using off-policy critic networks.
  • The algorithm achieves state-of-the-art performance on complex manipulation tasks.
  • OGPO can finetune poorly-initialized behavior cloning policies without expert data.
  • An optimized variant, OGPO+, incorporates additional enhancements for improved performance.
Using Common Random Numbers for Simulation-based Planning with Rollouts
Sandarbh Yadav, Frederic J Maliakkal, Harshad Khadilkar, Shivaram Kalyanakrishnan
Reinforcement Learning Theory Optimization
  • Introduction of a new estimator for value difference in simulation-based planning.
  • Demonstration of variance reduction using common random numbers in rollouts.
  • Validation of the proposed method through experiments on synthetic tasks.
  • Application of the method in real-world scenarios, including pension disbursement and game planning.
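The core idea here, common random numbers, is easy to demonstrate independently of the paper's estimator: when two candidate policies are compared by rollouts, driving both with the same noise stream makes the shared randomness cancel in the difference. A toy illustration (the `rollout` environment below is made up for the demo):

```python
import random
import statistics

def rollout(shift, seed, horizon=10):
    """Toy rollout: return is the drift `shift` plus per-step Gaussian noise."""
    rng = random.Random(seed)
    return sum(shift + rng.gauss(0.0, 1.0) for _ in range(horizon))

def value_difference(n_pairs, use_crn):
    """Estimate V(shift=0.1) - V(shift=0.0); the true gap is 10 * 0.1 = 1.0."""
    diffs = []
    for i in range(n_pairs):
        seed_b = i if use_crn else n_pairs + i  # CRN: reuse the seed
        diffs.append(rollout(0.1, i) - rollout(0.0, seed_b))
    return diffs

crn = value_difference(200, use_crn=True)
ind = value_difference(200, use_crn=False)
# with CRN the shared noise cancels term-by-term, so the variance collapses
```

In this contrived example the cancellation is exact; in realistic simulators the two trajectories eventually diverge, so CRN reduces rather than eliminates the variance.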
Hierarchical Support Vector State Partitioning for Distilling Black Box Reinforcement Learning Policies
Senne Deproost, Mehrdad Asadi, Ann Nowé
Reinforcement Learning Interpretability
  • Introduction of Support Vector State Partitioning (SVSP) for distilling RL policies.
  • SVSP achieves a 7.4% improvement in mean return over Voronoi State Partitioning (VSP).
  • Reduction of required sub-policies by 82.1% compared to VSP.
  • Validation on LunarLanderContinuous shows SVSP outperforms both TD3 and VSP.
Quadrature-TreeSHAP: Depth-Independent TreeSHAP and Shapley Interactions
Ron Wettenstein, Rory Mitchell, Peng Yu
Interpretability Efficient ML Theory
  • Introduces a quadrature-based reformulation of Path-Dependent TreeSHAP.
  • Achieves depth-independent computation of Shapley values and higher-order interactions.
  • Demonstrates significant speed improvements over existing TreeSHAP methods.
  • Provides a stable and efficient implementation for both CPU and GPU.
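As background for this entry: TreeSHAP computes Shapley values, which average a feature's marginal contribution over all coalitions. The paper's quadrature reformulation is not reproduced here; the brute-force definition (exponential in the number of players, so toy-sized only) is:

```python
from itertools import combinations
from math import factorial

def shapley_values(value, n):
    """Exact Shapley values by enumerating all coalitions (small n only)."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(len(others) + 1):
            for S in combinations(others, k):
                S = set(S)
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi[i] += w * (value(S | {i}) - value(S))
    return phi

# for an additive game, each player's Shapley value is its own weight
weights = [3.0, 1.0, 2.0]
phi = shapley_values(lambda S: sum(weights[j] for j in S), 3)
```

The point of TreeSHAP, and of this paper's depth-independent variant, is precisely to avoid this exponential enumeration for tree ensembles.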
Toward Structural Multimodal Representations: Specialization, Selection, and Sparsification via Mixture-of-Experts
Hahyeon Choi, Nojun Kwak
Multimodal
  • S3 framework decomposes multimodal inputs into semantic experts for improved task-specific representation.
  • The methodology includes three stages: Specialization, Selection, and Sparsification.
  • S3 demonstrates superior performance on MultiBench benchmarks compared to existing multimodal learning methods.
  • A reverse U-shaped trend in performance indicates optimal sparsity levels enhance accuracy.
RFPrompt: Prompt-Based Expert Adaptation of the Large Wireless Model for Modulation Classification
Md Raihan Uddin, Tolunay Seyfi, Fatemeh Afghah
Efficient ML Multimodal
  • RFPrompt offers a parameter-efficient adaptation mechanism for wireless foundation models to handle OOD tasks.
  • The framework utilizes learnable prompt tokens to adapt a frozen pretrained backbone, minimizing parameter overhead.
  • Empirical results show significant improvements in robustness and performance for real-world IQ classification tasks.
  • RFPrompt effectively closes over 79% of the performance gap to fully fine-tuned models while updating only 0.34% of the parameters.
Unified Framework of Distributional Regret in Multi-Armed Bandits and Reinforcement Learning
Harin Lee, Min-hwan Oh
Reinforcement Learning Theory
  • Introduces a unified framework for distributional regret in MAB and RL.
  • Presents a novel algorithm (EQO+) with a flexible exploration bonus.
  • Establishes both gap-independent and gap-dependent distributional regret bounds.
  • Achieves optimal trade-offs between expected and distributional regret.
Breaking the Quality-Privacy Tradeoff in Tabular Data Generation via In-Context Learning
Xinyan Han, Yan Lu, Xiaoyu Lin, Yuanyuan Jiang, Yuanrui Wang, Xuanyue Li, Wenchao Zou, Xingxuan Zhang
Generative Models
  • DiffICL is the first approach to frame tabular data generation as an in-context learning problem.
  • The method effectively mitigates the memorization issue prevalent in small-data regimes, enhancing both data quality and privacy.
  • DiffICL outperforms existing generative models across 14 datasets in terms of quality and privacy protection.
  • The synthetic data generated can be used for data augmentation, improving downstream task performance.
Constraint-Enhanced Reinforcement Learning Based on Dynamic Decoupled Spherical Radial Squashing
Qijun Liao, Zhaoxin Yu, Jue Yang
Reinforcement Learning Robotics Optimization
  • DD-SRad achieves hard per-step constraint satisfaction with probability 1.
  • The method provides exact ℓ∞ coverage of the feasible action space, addressing the limitations of existing spherical parameterization methods.
  • Empirical results show significant improvements in task performance and constraint adherence compared to traditional methods.
  • The approach is compatible with existing off-policy RL frameworks, enabling seamless integration.
Text-Conditional JEPA for Learning Semantically Rich Visual Representations
Chen Huang, Xianhang Li, Vimal Thilak, Etai Littwin, Josh Susskind
Computer Vision NLP Multimodal
  • Introduction of TC-JEPA, which enhances I-JEPA with text conditioning for better semantic representation.
  • Utilization of image captions to reduce prediction uncertainty in masked feature prediction.
  • Demonstrated improvements in downstream performance, training stability, and scalability.
  • Establishment of a new vision-language pretraining paradigm based on feature prediction, outperforming contrastive methods.
From Video-to-PDE: Data-Driven Discovery of Nonlinear Dye Plume Dynamics
Cesar Acosta-Minoli, Sayantan Sarkar
Computer Vision Theory Interpretability
  • Development of a comprehensive video-to-PDE pipeline for modeling dye plume dynamics.
  • Utilization of weak-form regression to mitigate issues with noisy video data.
  • Implementation of rollout calibration and bootstrap diagnostics for coefficient assessment.
  • The derived PDE model outperforms traditional advection-diffusion models.
Rethinking the Rank Threshold for LoRA Fine-Tuning
Juneyoung Park
NLP Large Language Models Theory
  • The rank requirement for LoRA fine-tuning can be reduced from 12 to 1 for binary classification tasks.
  • The use of non-symmetric manifold dimension analysis leads to a weaker capacity requirement.
  • The Polyak–Łojasiewicz inequality allows for the removal of the rank threshold in cross-entropy settings.
  • Empirical results demonstrate that rank 1 performs competitively across various binary classification tasks.
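For context on the rank claim: LoRA keeps the pretrained weight W frozen and learns a low-rank update, y = Wx + (α/r)·BAx, so rank 1 means A is a single row and B a single column. A library-free sketch of the generic LoRA forward pass (not the paper's analysis):

```python
def lora_forward(x, W, A, B, alpha=1.0, r=1):
    """y = W x + (alpha / r) * B (A x), where A has r rows and B has r columns."""
    base = [sum(wij * xj for wij, xj in zip(row, x)) for row in W]
    h = [sum(aij * xj for aij, xj in zip(arow, x)) for arow in A]      # in -> r
    delta = [sum(bij * hj for bij, hj in zip(brow, h)) for brow in B]  # r -> out
    return [b + (alpha / r) * d for b, d in zip(base, delta)]

# rank-1 adapter: A is a single row, B a single column
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]
B = [[2.0], [3.0]]
y = lora_forward([1.0, 2.0], W, A, B)  # base [1, 2] plus 1 * B(A x) = [6, 9]
```

Computing the update as two skinny matmuls (A then B) is what keeps the trainable parameter count at r·(in + out) instead of in·out.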
Uni-OPD: Unifying On-Policy Distillation with a Dual-Perspective Recipe
Wenjin Hou, Shangpin Peng, Weinong Wang, Zheng Ruan, Yue Zhang, Zhenglin Zhou, Mingqi Gao, Yifei Chen, Kaiqi Wang, Hongming Yang, Chengquan Zhang, Zhuotao Tian, Han Hu, Yi Yang, Fei Wu, Hehe Fan
Large Language Models Reinforcement Learning Multimodal
  • Identification of two core bottlenecks in OPD: insufficient exploration of informative states and unreliable teacher supervision.
  • Introduction of a dual-perspective optimization strategy that enhances both student exploration and teacher signal reliability.
  • Comprehensive validation of Uni-OPD across diverse settings, showcasing its effectiveness and versatility.
  • Demonstration of faster convergence and improved performance compared to existing OPD and reinforcement learning methods.
On Adaptivity in Zeroth-Order Optimization
Hassan Dbouk, Nidham Gazagnadou, Matthias Reisser, Christos Louizos
Optimization Large Language Models Efficient ML
  • Adaptive ZO methods like ZO-Adam do not outperform well-tuned ZO-SGD in high-dimensional settings.
  • ZO gradients are isotropic and lack the coordinate-wise heterogeneity that adaptive methods exploit.
  • MEAZO is proposed as a memory-efficient alternative that achieves global step size adaptation with minimal memory usage.
  • MEAZO matches the performance of ZO-Adam while retaining the memory footprint of ZO-SGD.
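This entry concerns adaptivity on top of zeroth-order (ZO) optimization, which estimates gradients from function values alone. A textbook sketch of the two-point random-direction estimator behind ZO-SGD (generic, not MEAZO or ZO-Adam):

```python
import random

def zo_sgd(f, x, steps=200, lr=0.05, mu=1e-3, seed=0):
    """ZO-SGD with the two-point random-direction gradient estimator."""
    rng = random.Random(seed)
    d = len(x)
    for _ in range(steps):
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        fp = f([xi + mu * ui for xi, ui in zip(x, u)])
        fm = f([xi - mu * ui for xi, ui in zip(x, u)])
        g = (fp - fm) / (2 * mu)          # estimates the derivative along u
        x = [xi - lr * g * ui for xi, ui in zip(x, u)]
    return x

# minimize a quadratic using only function evaluations
quad = lambda x: sum(xi * xi for xi in x)
x_min = zo_sgd(quad, [1.0, -2.0])
```

Note that the estimated gradient `g * u` always points along the single random direction `u`; this isotropy is exactly why, per the summary, coordinate-wise adaptive scaling has little structure to exploit in ZO settings.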
A geometric relation of the error introduced by sampling a language model's output distribution to its internal state
Albert F. Modenbach
NLP Large Language Models Interpretability
  • Introduces a geometric framework to analyze sampling errors in language models.
  • Demonstrates that the curvature of token embeddings relates to the model's internal world representation.
  • Uses chess as a controlled environment to evaluate model behavior and decision-making.
  • Shows that the geometry of token space can reflect the model's internal representation of problems.
ELVIS: Ensemble-Calibrated Latent Imagination for Long-Horizon Visual MPC
Yurui Du, Pinhao Song, Yutong Hu, Renaud Detry
Reinforcement Learning Robotics Optimization
  • ELVIS combines recurrent state-space models with Gaussian-mixture MPPI for improved long-horizon planning.
  • The framework adapts the effective return horizon in real-time, enhancing robustness against model errors.
  • It employs uncertainty-aware exploration and exploitation strategies to improve planning reliability.
  • ELVIS achieves state-of-the-art performance on benchmark visual tasks and demonstrates effective zero-shot transfer to real-world applications.
OSAQ: Outlier Self-Absorption for Accurate Low-bit LLM Quantization
Zhikai Li, Zhen Dong, Xuewen Liu, Jing Zhang, Qingyi Gu
Large Language Models Efficient ML Optimization
  • OSAQ introduces an additive weight transformation to suppress outliers in low-bit quantization.
  • The method exploits the low-rank properties of the Hessian matrix to identify a stable null space.
  • OSAQ does not require inter-layer transformations, maintaining efficiency during inference.
  • The approach is validated through extensive experiments, showing significant performance improvements over existing methods.
Continual Distillation of Teachers from Different Domains
Nicolas Michel, Maorong Wang, Jiangpeng He, Toshihiko Yamasaki
Efficient ML
  • Introduction of the Continual Distillation paradigm for training models on a sequence of teacher models.
  • Identification of Unseen Knowledge Transfer (UKT) and Unseen Knowledge Forgetting (UKF) as critical challenges.
  • Development of Self External Data Distillation (SE2D) to mitigate UKF while maximizing UKT.
  • Empirical validation of SE2D's effectiveness in improving cross-domain generalization.
Road Risk Monitor: A Deployable U.S. Road Incident Forecasting System with Live Weather and Road-Level Tiles
Anton Ivchenko
Time Series
  • Development of a nationwide road incident forecasting system integrating multiple data sources.
  • Implementation of a dual-scale modeling approach for improved prediction accuracy.
  • Provision of a public codebase for reproducibility and local deployment.
  • Achieved high performance metrics for both baseline and road-segment models.
Why Geometric Continuity Emerges in Deep Neural Networks: Residual Connections and Rotational Symmetry Breaking
Kyungwon Jeong, Won-Gi Paeng, Honggyo Suh
Theory Large Language Models
  • Geometric continuity in weight matrices is influenced by residual connections and symmetry-breaking nonlinearities.
  • Activation functions and normalization layers have distinct roles in shaping geometric continuity.
  • Continuity is projection-specific in transformers, with different layers exhibiting varying degrees of continuity.
  • A nonlinear but rotation-preserving activation fails to maintain continuity, highlighting the importance of symmetry breaking.
Distribution-Free Pretraining of Classification Losses via Evolutionary Dynamics
Meng Xiang, Yan Pei
Optimization Theory
  • EDL learns a transferable classification loss using synthetic data without real sample access.
  • The framework employs a ranking-consistency objective to enforce meaningful loss penalties.
  • An evolutionary strategy with chaotic mutation enhances the robustness and exploration of loss shape optimization.
  • EDL can replace traditional loss functions like cross-entropy, yielding competitive accuracy.
DynaTab: Dynamic Feature Ordering as Neural Rewiring for High-Dimensional Tabular Data
Al Zadid Sultan Bin Habib, Gianfranco Doretto, Donald A. Adjeroh
Theory Optimization Efficient ML
  • DynaTab introduces dynamic feature ordering to improve model performance on high-dimensional tabular data.
  • The model predicts when feature permutation will be beneficial based on dataset complexity.
  • DynaTab integrates order-aware mechanisms such as positional embeddings and masked attention.
  • The architecture shows significant performance gains compared to 45 state-of-the-art models.
Adaptive Data Compression and Reconstruction for Memory-Bounded EEG Continual Learning
Chengcheng Xie
Time Series Efficient ML Theory
  • Introduction of ADaCoRe, a novel pipeline for memory-efficient EEG continual learning.
  • Utilization of morphology-aware techniques to enhance data compression and reconstruction.
  • Demonstrated significant performance gains over existing UICL methods under strict memory constraints.
  • Ablation studies highlight the importance of each component in the proposed pipeline.
QUIVER: Cost-Aware Adaptive Preference Querying in Surrogate-Assisted Evolutionary Multi-Objective Optimization
Florian A. D. Burnat
Optimization
  • QUIVER optimizes the allocation of resources between objective evaluations and preference elicitation.
  • The method adapts the selection of query modalities based on the difficulty of the optimization problem.
  • QUIVER outperforms traditional single-modality baselines in terms of utility regret.
  • The approach integrates a value-of-information perspective to unify objective exploration and preference learning.
Disease Is a Spectral Perturbation
John D. Mayfield, Matthew S. Rosen
Theory Interpretability Multimodal
  • Introduces the concept of a biomarker Hamiltonian to model disease transformation.
  • Characterizes disease as a spectral perturbation of the healthy biomarker covariance structure.
  • Derives optimal prognostic statistics based on eigenmode projections.
  • Establishes a unified framework that connects various existing multiomics methods.
Calibration of the underlying surface parameters for urban flood using latent variables and adjoint equation
Yongfu Tian, Shan Ding, Guofeng Su, Jianguo Chen
Optimization
  • Introduces a Bayesian framework for urban flood parameter calibration using latent variables.
  • Utilizes the adjoint equation of the Urban Flood Dynamical System model for efficient optimization.
  • Demonstrates rapid convergence and robustness to observation time intervals in calibration.
  • Achieves significant accuracy in calibrating Manning's coefficient for urban roads.
Knowledge-Free Correlated Agreement for Incentivizing Federated Learning
Leon Witt, Togrul Abbasli, Kentaroh Toyoda, Wojciech Samek, Lucy Klinger
Federated Learning Theory Efficient ML
  • KFCA is a knowledge-free mechanism that incentivizes honest reporting in federated learning.
  • It eliminates the label-flipping vulnerability present in existing methods like Correlated Agreement.
  • KFCA supports real-time reward computation without the need for report aggregation.
  • Empirical evaluations show significant reductions in reward computation costs compared to traditional methods.
Spatiotemporal Convolutions on EEG signal -- A Representation Learning Perspective on Efficient and Explainable EEG Classification with Convolutional Neural Nets
Laurits Dixen, Stefan Heinrich, Paolo Burelli
Time Series Efficient ML Interpretability
  • 2D spatiotemporal convolutions significantly reduce training time for high-dimensional EEG classification tasks.
  • The representational geometry differs between 1D and 2D CNN models, impacting the interpretability of learned features.
  • Maintaining performance while improving efficiency is crucial for real-time EEG applications.
  • Architectural design in CNNs should consider the unique characteristics of EEG data for better feature extraction.
Gated Subspace Inference for Transformer Acceleration
Stephen J. Thomas
NLP Large Language Models Efficient ML
  • GSI exploits the low effective rank of token activation manifolds for inference acceleration.
  • The method achieves significant speedups (3.0× to 10.5×) without requiring retraining or architectural changes.
  • A per-token gating mechanism ensures output distribution preservation.
  • GSI extends previous work by covering all linear maps in transformers, not just MLP layers.
Beyond Activation Alignment: The Geometry of Neural Sensitivity
Amirhossein Yavari, Farnaz Zamani Esfahlani
Theory
  • Introduces a framework focusing on local decodable information for comparing neural representations.
  • Defines second-moment local perturbation-discrimination tasks and summarizes them using expected projected pullback/Fisher metrics.
  • Develops the Spectral Riemannian Alignment Score (S-RAS) for comparing neural representations.
  • Empirically validates the framework across artificial and biological systems, including neural networks and mouse visual cortex data.
Pretrained Model Representations as Acquisition Signals for Active Learning of MLIPs
Eszter Varga-Umbrich, Shikha Surana, Paul Duckworth, Jules Tilly, Olivier Peltre, Zachary Weller-Davies
Efficient ML
  • Pretrained model representations can serve as effective acquisition signals for active learning in MLIPs.
  • The proposed finite-width NTK and activation kernel outperform traditional acquisition methods.
  • Using pretrained models reduces the data required for training MLIPs by significant margins.
  • The latent space of pretrained models preserves chemically meaningful structures.
Most ReLU Networks Admit Identifiable Parameters
Moritz Grillo, Guido Montúfar
Theory
  • Introduces a unified framework using weighted polyhedral complexes to study parameter identifiability.
  • Establishes that most ReLU architectures with sufficient width have identifiable parameters.
  • Settles the functional dimension for nearly all ReLU architectures as the number of parameters minus the number of hidden neurons.
  • Demonstrates that minimal architectures can still have non-trivial parameter redundancies.
Superposition Is Not Necessary: A Mechanistic Interpretability Analysis of Transformer Representations for Time Series Forecasting
Alper Yıldırım
Time Series
  • A single-layer transformer can match the performance of deeper models in time series forecasting.
  • Expanding the dictionary size in sparse autoencoders yields minimal changes in forecasting performance.
  • Targeted interventions on latent features produce negligible forecast perturbations.
  • Superposition is not required for competitive performance in time series forecasting.
FAAST: Forward-Only Associative Learning via Closed-Form Fast Weights for Test-Time Supervised Adaptation
Guangsheng Bao, Hongbo Zhang, Han Cui, Yanbin Zhao, Yue Zhang
Efficient ML NLP Large Language Models
  • FAAST enables forward-only associative adaptation, avoiding backpropagation and iterative updates.
  • The method constructs fast weights in closed form, allowing for efficient inference without retaining memory.
  • FAAST achieves comparable or superior performance to backpropagation-based methods while drastically reducing adaptation time and memory usage.
  • The approach is modular and can be integrated into existing neural networks, including large language models.