AI-generated summaries

Today's ML research,
without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

48 Papers today
8h Update frequency
7 Days of history
Geodesics of Dynamic Graphs for Regime Change Detection
William Cappelletti, Étienne Voutaz, Pascal Frossard
Graph Learning Time Series Theory
  • Introduces a geodesic-based framework for detecting regime changes in dynamic graphs.
  • Defines regimes as coherent dynamics characterized by geodesics in graph space.
  • Outperforms existing change point detection methods on synthetic and real-world data.
  • Aligns detected change points with significant external events during the Covid-19 pandemic.
Read more
WhiFlash: Accelerating Speculative Decoding with Token-Level Cross-Paradigm Routing
Young D. Kwon, Miles Williams, Rui Li, Alexandros Kouris, Stylianos I. Venieris
NLP Large Language Models Efficient ML
  • WhiFlash is the first cross-paradigm speculative decoding method that integrates autoregressive and diffusion-based drafting.
  • The method employs a token-level routing mechanism to dynamically select the most effective drafting model during inference.
  • Novel cache-management optimizations reduce switching overhead to below 7% of per-round latency.
  • WhiFlash achieves significant throughput gains compared to state-of-the-art models, enhancing performance in diverse tasks.
Read more
$α$-PFN: Fast Entropy Search via In-Context Learning
Herilalaina Rakotoarison, Steven Adriaensen, Tom Viering, Carl Hvarfner, Samuel Müller, Frank Hutter, Eytan Bakshy
Optimization Efficient ML
  • Introduces $α$-PFN to improve the efficiency of Entropy Search in Bayesian optimization.
  • Utilizes a two-stage amortization strategy with Prior-data Fitted Networks for rapid acquisition function evaluation.
  • Achieves significant speed improvements (over 50x) compared to traditional Monte Carlo methods.
  • Demonstrates competitive performance against state-of-the-art ES methods on various benchmarks.
Read more
Network Recovery from Cascade Data: A Debiased Jacobian-Based Machine Learning Approach
Lei Huang
Graph Learning Theory Time Series
  • CascadeNet does not require a predefined diffusion model, reducing the risk of misspecification.
  • The method employs a flexible estimator for the transition function, allowing for a wide range of applications.
  • Neyman-orthogonal debiasing ensures unbiased estimates of the network Jacobian, facilitating formal statistical inference.
  • CascadeNet outperforms existing methods in both simulated and real-world scenarios, particularly in recovering true network structures.
Read more
QDSP: An Interpretable Structured Learning Framework for Predicting Death or Cerebral Palsy in Very Low Birth Weight Infants
Ling Wang, Xiaolong Li, Hui Zhou, Jing Shi, Fuhao Zhang, Dapeng Chen, Nan Mu
Interpretability
  • QDSP integrates QSS and DSP for robust and interpretable predictions in clinical settings.
  • The framework achieved high accuracy and AUC in predicting outcomes for VLBWI.
  • QDSP outperformed traditional machine learning and deep learning methods in the evaluation.
  • The model identified clinically relevant predictors, enhancing interpretability.
Read more
Covariance Shrinkage via Stochastic Interpolation
Mathieu Chalvidal, Florentin Coeurdoux, Eric Vanden-Eijnden
Theory Optimization
  • Covariance shrinkage is reformulated as empirical risk minimization over a stochastic interpolant.
  • Three mechanisms for risk reduction are identified: scheduling, coupling, and early stopping.
  • The method allows for non-linear flow maps that escape the limitations of classical shrinkage methods.
  • The neural estimator outperforms traditional shrinkage methods in terms of out-of-sample performance.
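For readers unfamiliar with the classical baseline these bullets compare against, the sketch below shows linear shrinkage of a sample covariance toward a scaled identity. It is not the paper's stochastic-interpolant estimator. The function names, the toy data, and the fixed intensity `lam` are illustrative assumptions; classical estimators typically choose the intensity from the data.

```python
def sample_covariance(rows):
    """Unbiased sample covariance of a list of equal-length observations."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
         for j in range(p)]
        for i in range(p)
    ]


def linear_shrinkage(cov, lam):
    """Blend cov with mu * I, where mu is the mean of cov's diagonal.

    lam = 0 returns cov unchanged; lam = 1 returns the scaled identity.
    """
    p = len(cov)
    mu = sum(cov[i][i] for i in range(p)) / p
    return [
        [(1 - lam) * cov[i][j] + (lam * mu if i == j else 0.0) for j in range(p)]
        for i in range(p)
    ]


# Hypothetical toy data: 4 observations of 2 variables.
data = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 4.0]]
S = sample_covariance(data)
S_shrunk = linear_shrinkage(S, lam=0.5)
```

Because the target is a multiple of the identity, this family only rescales eigenvalues linearly and keeps the trace fixed, which is the kind of restriction the non-linear flow maps in the summary are said to escape.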
Read more
ResearchClawBench: A Benchmark for End-to-End Autonomous Scientific Research
Wanghan Xu, Shuo Li, Tianlin Ye, Qinglong Cao, Yixin Chen, Hengjian Gao, Yiheng Wang, Qi Li, Kun Li, Sheng Xu, Shengdu Chai, Fangchen Yu, Xiangyu Zhao, Zhangrui Zhao, Weijie Ma, Zijie Guo, Haoyu Zhou, Haoxiang Yin, Lixue Cheng, Chaofan Hu, Haoxuan Li, Lu Mi, Xuxuan Xie, Yifan Zhou, Ruizhe Chen, Zhiwang Zhou, Xingjian Guo, Yuhao Zhou, Xuming He, Shengyuan Xu, Xinyu Gu, Jiamin Wu, Mianxin Liu, Chunfeng Song, Fenghua Ling, Dongzhan Zhou, Shixiang Tang, Yuqiang Li, Mao Su, Peng Ye, Siqi Sun, Bin Wang, Xue Yang, Zhenfei Yin, Tianfan Fu, Guangtao Zhai, Wanli Ouyang, Bo Zhang, Lei Bai, Wenlong Zhang
Large Language Models Optimization Theory
  • ResearchClawBench provides a structured evaluation framework for autonomous scientific research across diverse domains.
  • The benchmark includes 40 tasks based on real scientific papers, enabling a realistic assessment of AI capabilities.
  • Current autonomous research agents and LLMs show limited effectiveness in achieving target-paper-level re-discovery.
  • Expert-curated rubrics allow for detailed evaluation of scientific outputs, addressing the complexity of scientific research.
Read more
Theoretical Foundations of Continual Learning via Drift-Plus-Penalty
Nazreen Shah, Govinda Arya, Bharath B.N., Ranjitha Prasad
Theory Optimization
  • Introduces a control-theoretic approach to continual learning, framing it as a dynamic process.
  • Proposes the COLD framework, which regulates forgetting through a virtual queue and the DPP principle.
  • Establishes stability and convergence guarantees for the proposed methods.
  • Demonstrates superior performance compared to existing CL methods on benchmark datasets.
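As background on the drift-plus-penalty (DPP) principle mentioned above, here is a minimal generic sketch of a virtual queue enforcing a long-run average constraint. It is not the COLD algorithm. Treating "forgetting" as the constrained cost, the two candidate actions, and the values of `budget` and `v` are all assumptions made for illustration.

```python
def dpp_step(queue, actions, budget, v):
    """One drift-plus-penalty step.

    actions: list of (name, penalty, cost) tuples.
    Picks the action minimizing v * penalty + queue * (cost - budget),
    then updates the virtual queue with the chosen action's excess cost.
    """
    name, penalty, cost = min(
        actions, key=lambda a: v * a[1] + queue * (a[2] - budget)
    )
    new_queue = max(queue + cost - budget, 0.0)
    return name, cost, new_queue


# Hypothetical trade-off: "plastic" fits the new task but forgets,
# "stable" protects old tasks at the expense of new-task loss.
actions = [("plastic", 0.0, 1.0), ("stable", 1.0, 0.0)]
budget, v, steps = 0.25, 2.0, 200

queue, total_cost, history = 0.0, 0.0, []
for _ in range(steps):
    name, cost, queue = dpp_step(queue, actions, budget, v)
    total_cost += cost
    history.append(name)

average_cost = total_cost / steps
```

The queue grows whenever the cost exceeds the budget, which makes costly actions progressively less attractive. By construction the average excess cost is bounded by the final queue length divided by the number of steps.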
Read more
CoMetaPNS: Continually Meta-learning Personalized Neural Surrogates for Cardiac Electrophysiology Simulations
Ryan Missel, Xiajun Jiang, Linwei Wang
Generative Models Optimization Efficient ML
  • Introduces a continual meta-learning framework for personalized cardiac simulations.
  • Addresses the challenge of catastrophic forgetting in traditional meta-learning models.
  • Utilizes a Bayesian Gaussian Mixture Model for effective data integration and identification.
  • Demonstrates superior performance in simulation accuracy and computational scalability.
Read more
When Should an AI Scientist Stop? Verifiable Experiment Steering and Refusal for Autonomous Discovery
Neel Tushar Shah, Manglam Kartik
Theory Optimization Interpretability
  • CARTOGRAPH introduces a verification layer for AI scientists that integrates experiment selection, ambiguity resolution, and model refusal.
  • The framework outperforms traditional methods in various experimental settings, demonstrating significant advantages in decision-making.
  • CARTOGRAPH can retract incorrect model identifications based on new evidence, enhancing the reliability of autonomous discovery.
  • The study emphasizes the need for AI systems to have mechanisms for stopping claims when the underlying model library is inadequate.
Read more
RETROSPECT: RETROsynthesis via Sequential Prediction, and Chemically Transformed-ranking
Raja Sekhar Pappala, Shreyas Vinaya Sathyanarayana, Ronit Kumar Choudhary, Arjun Verma, Deepak Warrier
Generative Models Graph Learning Optimization
  • Introduction of RETROSPECT, a modular framework for retrosynthesis that separates proposal generation from candidate selection.
  • Development of the ChemAlign Transformer, which employs advanced training techniques for improved prediction accuracy.
  • Implementation of a LambdaMART reranker that enhances candidate selection based on various chemical descriptors.
  • Demonstration of high accuracy rates on the USPTO-50K dataset, supporting the effectiveness of the proposed methods.
Read more
From Shortcuts to Reasoning: Robust Post-Training of Theory of Mind with Reinforcement Learning
Jike Zhong, Yuxiang Lai, Ming Li, Yuheng Li, Wuao Liu, Behzad Dariush, Konstantinos Psounis, Shao-Yuan Lo
Reinforcement Learning NLP Large Language Models
  • Identification of pervasive shortcut issues in popular ToM datasets that mislead model evaluations.
  • Development of a framework for auditing ToM datasets to quantify shortcut prevalence.
  • Introduction of Thinking-RFT, which significantly improves ToM reasoning capabilities over traditional methods.
  • Demonstration of robust generalization and performance gains in complex reasoning tasks.
Read more
Heterogeneous Effects of Green Finance on Urban Decarbonization: Evidence from 285 Cities in China
Xueyang Li, Jinlei Ma
Theory Interpretability
  • Green finance significantly lowers carbon intensity in urban areas.
  • The effects of green finance are most pronounced in less developed cities.
  • Energy structure optimization is the primary mechanism through which green finance operates.
  • Different financial instruments have varying impacts on decarbonization.
Read more
Product units in gated recurrent units improve nuclear-mass prediction
Ziyuan Li, Paulo S.A. Freitas, John W. Clark, Babette Dellen
Time Series
  • Introduction of MI-PU-GRU and AM-PU-GRU architectures for nuclear mass prediction.
  • Utilization of complex-valued computations to capture amplitude and phase dynamics.
  • Significant reduction in prediction errors compared to traditional GRU and other models.
  • Establishment of a new benchmark for sequence-based nuclear mass prediction.
Read more
SRT: Super-Resolution for Time Series via Disentangled Rectified Flow
Jufang Duan, Shenglong Xiao, Yuren Zhang
Time Series Generative Models
  • Introduces SRT, a framework for time series super-resolution that reconstructs high-resolution data from low-resolution inputs.
  • Utilizes a disentangled rectified flow approach to decompose time series into trend and seasonal components.
  • Implements a novel cross-resolution attention mechanism to enhance detail generation.
  • SRT-large variant shows strong zero-shot super-resolution capabilities through extensive pre-training.
Read more
Beyond Homophily: Towards Generalized Graph Reconstruction Attack and Defense
Zhanke Zhou, Bo Han, Xuan Li, Jiangchao Yao, Sanmi Koyejo, Michael K. Ng
Graph Learning
  • Graph reconstruction attacks can expose sensitive information from GNNs, necessitating effective defense mechanisms.
  • The study provides a systematic characterization of adjacency recoverability influenced by graph homophily and heterophily.
  • The proposed MC-GRA (+) attack method enhances reconstruction fidelity over prior techniques.
  • The MC-GPB (+) defense method successfully mitigates reconstruction success with only slight accuracy trade-offs.
Read more
A Held-Out Transition-Pair Falsifier for Long-Horizon Non-Abelian State Tracking
Jeonghoon Lee
Theory
  • Introduces a held-out transition-pair falsifier to evaluate non-Abelian state tracking.
  • Demonstrates that a projected recurrent state model can achieve perfect predictions over long horizons.
  • Mechanism diagnostics reveal the relationship between projection temperature and model performance.
  • Confirms that blocking local transition memorization pathways is crucial for accurate state tracking.
Read more
scCBGM: Interpretable Single-Cell Counterfactual Editing
Alma Andersson, Aya Abdelsalam Ismail, Edward De Brouwer, Doron Haviv, Tommaso Biancalani, Kyunghyun Cho, Gabriele Scalia, Aïcha BenTaieb, Hector Corrada Bravo
Generative Models Interpretability
  • Introduction of scCBGM for interpretable single-cell counterfactual editing.
  • Architectural innovations enhance model performance without dimensional constraints.
  • Development of a synthetic benchmark for rigorous evaluation of counterfactuals.
  • Demonstrated superior performance in editing accuracy and generalization across datasets.
Read more
ConSteer-RL: Steering Reasoning Capabilities in Large Language Models via Confidence-Aware Reinforcement Learning
Qing Miao, Yiming Zhao, Jing Yang, Chenxi Liu, Yuehai Chen, Yuewen Liu, Shaoyi Du, Badong Chen
NLP Large Language Models Reinforcement Learning
  • ConSteer-RL integrates model-internal confidence signals into RLVR training.
  • The framework employs a confidence-aware reward shaping mechanism to improve reasoning accuracy.
  • Experimental results show significant performance improvements over existing GRPO methods.
  • The approach does not require additional human annotations or complex verification systems.
Read more
Trajectory Geometry of Transformer Representations Across Layers
Vishal Pandey, Gopal Singh
NLP Large Language Models Interpretability
  • Introduces a trajectory-geometric framework for transformer interpretability.
  • Identifies significant trajectory convergence for semantically related prompts in deeper layers.
  • Demonstrates that reasoning tasks exhibit higher trajectory curvature than lexical tasks.
  • Shows measurable trajectory bifurcation for ambiguous tokens, indicating effective disambiguation.
Read more
Finite Certificates for In-Context Determinacy and a Threshold Theory of Emergence in Language Models
Faruk Alpay, Hamdi Alakkad
NLP Large Language Models Theory
  • Introduces finite semantic certificates for verifying language model behavior.
  • Establishes an exact row-space criterion for finite determinacy in context-conditioned queries.
  • Proves an anti-mirage theorem to differentiate between genuine semantic transitions and scoring discontinuities.
  • Demonstrates NP-completeness in extracting the smallest forcing subcontext.
Read more
Fourier fractal dimension to predict the generalization of deep neural networks
Joao B. Florindo, Davi Wanderley Misturini
Theory Optimization
  • Introduces a novel generalization measure based on Fourier fractal dimension.
  • Demonstrates strong correlation between the proposed measure and actual generalization gap.
  • Outperforms existing methods in predicting generalization without validation data.
  • Presents a customized Fourier-based optimizer to regularize fractal dimension during training.
Read more
GlucoFM-Bench: Benchmarking Time-Series Foundation Models for Blood Glucose Forecasting
Baiying Lu, Zhaohui Liang, Ryan Pontius, Shengpu Tang, Temiloluwa Prioleau
Time Series
  • Introduction of GlucoFM-Bench, the first benchmark for evaluating TSFMs in blood glucose forecasting.
  • Assessment of eight state-of-the-art models across 15 diabetes-relevant datasets.
  • Demonstration of strong transferability of pre-trained TSFMs, particularly in zero-shot and few-shot scenarios.
  • Highlighting the superior performance of lightweight LSTM models when abundant task-specific data is available.
Read more
Uniform Stability and Generalization Error of GD and SGD on Fixed-Point Parameters
Jonghyun Shin, Sejun Park
Theory Optimization
  • Deterministic rounding degrades GD's generalization error from O(T/n) to O(T/√n).
  • Uniform stability of GD becomes Ω(T), leading to vacuous generalization bounds.
  • SGD with deterministic rounding achieves tighter uniform stability bounds depending on dimensionality.
  • Stochastic rounding can increase generalization error with higher dimensions.
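The two rounding schemes named in these bullets can be sketched as follows. This only illustrates the rounding operators themselves, not the paper's stability analysis or its bounds. The grid spacing `step`, the update size, and the function names are assumptions chosen for the example.

```python
import math
import random


def round_nearest(x, step):
    """Deterministic round-to-nearest onto the grid {k * step}."""
    return step * math.floor(x / step + 0.5)


def round_stochastic(x, step, rng):
    """Round up with probability equal to the fractional position of x,
    so the result equals x in expectation."""
    scaled = x / step
    lower = math.floor(scaled)
    round_up = rng.random() < (scaled - lower)
    return step * (lower + (1 if round_up else 0))


step = 1.0            # grid spacing of the fixed-point format (assumed)
update = 0.3          # an update smaller than half a grid step
rng = random.Random(0)

deterministic = round_nearest(update, step)
samples = [round_stochastic(update, step, rng) for _ in range(20000)]
stochastic_mean = sum(samples) / len(samples)
```

Round-to-nearest maps every such small update to zero, while stochastic rounding is unbiased but injects per-coordinate noise, which is one intuition for why its effect could depend on dimension.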
Read more
Closed-Form Spectral Regularization for Multi-Task Model Merging
Yongxian Wei, Runxi Cheng, Xingxuan Zhang, Li Shen, Chun Yuan, Peng Cui, Dacheng Tao
Efficient ML Multimodal Optimization
  • Introduces a closed-form spectral regularization approach for multi-task model merging.
  • Demonstrates that iterative optimization acts as an implicit spectral regularizer.
  • Proposes SWUDI and SWUDI-A, which significantly reduce computational costs while maintaining or improving accuracy.
  • Achieves substantial reductions in wall-clock time and GPU memory usage compared to state-of-the-art methods.
Read more
Learn to Match: Two-Sided Matching with Temporally Extended Feedback
Haijing Zong, Yancheng Liang, Boyang Zhou, Natasha Jaques
Reinforcement Learning Theory Optimization
  • Introduces a framework for two-sided matching with temporally extended feedback.
  • Models matching as a partially observable Markov game with evolving agent profiles.
  • LEARN2MATCH benchmark supports decentralized decision-making in dynamic matching markets.
  • Independent PPO outperforms CA-ETC in social welfare and regret but has higher information-friction loss.
Read more
RASFT: Rollout-Adaptive Supervised Fine-Tuning for Reasoning
Yongliang Miao, Fengyuan Liu, Wei Shi, Yanguang Liu, Fei Sun, Na Zou, Mengnan Du
NLP Large Language Models Reinforcement Learning
  • RASFT addresses the limitations of traditional SFT by introducing a policy-aware framework for reasoning tasks.
  • The framework dynamically adjusts expert supervision based on the model's problem-solving ability, enhancing adaptability.
  • Empirical results show significant performance improvements over standard SFT and other reinforcement learning methods.
  • RASFT preserves the model's inherent reasoning capabilities while effectively integrating expert guidance.
Read more
Improved Convergence Analysis of Topology Dependence in Decentralized SGD
Yuki Takezawa, Anastasia Koloskova, Sebastian U. Stich
Theory Optimization Federated Learning
  • Developed a novel proof technique for improved convergence rates in Decentralized SGD.
  • Showed that the full eigenvalue spectrum of the mixing matrix governs convergence rates, not just the spectral gap.
  • Provided experimental evidence that aligns theoretical predictions with observed training behaviors.
  • Demonstrated that sparse topologies with small spectral gaps can perform better than previously thought.
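To make "mixing matrix", "eigenvalue spectrum", and "spectral gap" concrete, the sketch below uses a ring of 8 nodes where each node averages with its two neighbours. The ring topology and the uniform 1/3 weights are assumptions for illustration, and the code shows plain gossip averaging rather than the paper's analysis of Decentralized SGD.

```python
import math


def ring_spectrum(n):
    """Eigenvalues of the ring mixing matrix with weight 1/3 on self and on
    each of the two neighbours (a circulant matrix, so they are closed-form)."""
    return [1 / 3 + (2 / 3) * math.cos(2 * math.pi * k / n) for k in range(n)]


def gossip_step(values):
    """One averaging round: each node averages itself with both ring neighbours."""
    n = len(values)
    return [(values[i - 1] + values[i] + values[(i + 1) % n]) / 3 for i in range(n)]


def consensus_error(values):
    """Euclidean distance of the node values from their common mean."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values))


n = 8
eigs = sorted(ring_spectrum(n), key=abs, reverse=True)
spectral_gap = 1 - abs(eigs[1])   # eigs[0] is the eigenvalue 1

values = [float(i) for i in range(n)]
errors = [consensus_error(values)]
for _ in range(20):
    values = gossip_step(values)
    errors.append(consensus_error(values))
```

The spectral gap gives a worst-case contraction factor per round, but the remaining entries of `eigs` are much smaller in magnitude, so components of the disagreement along those directions vanish far faster than the gap alone suggests.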
Read more
Loss-Guided Adaptive Scale Refinement for Molecular Force Prediction
Limin Yu
Graph Learning
  • Introduces a loss-guided adaptive scale refinement framework for molecular force prediction.
  • Demonstrates substantial complementarity between short- and long-range force prediction branches.
  • Shows that continuous scale interpolation outperforms hard routing, indicating the existence of effective intermediate scales.
  • Establishes that a compact set of discrete scale anchors can approximate the continuous oracle scale space.
Read more
Towards Serverless Semi-Decentralized Federated Learning with Heterogeneous Optimizers
Su Wang, Mung Chiang, H. Vincent Poor
Federated Learning Optimization Theory
  • Introduces SSD-FL, a serverless approach to semi-decentralized federated learning.
  • Addresses the challenges of cluster formation in decentralized environments with heterogeneous optimizers.
  • Implements a unique scoring metric for assessing data and optimizer heterogeneity.
  • Demonstrates improved convergence speeds and communication efficiency in experiments.
Read more
Asymptotic Optimality of Thompson Sampling for Risk-Averse Bandits with Sub-Gaussian Rewards
Joel Q. L. Chang
Theory
  • Introduces ρ-NPTSSG, a nonparametric Thompson Sampling algorithm for risk-averse bandits.
  • Achieves instance-optimal regret for any continuous risk functional on distributions with bounded density and sub-Gaussian tails.
  • Resolves an open problem regarding the optimality of the ρ-NPTS algorithm from previous work.
  • Develops key technical contributions, including discretization lemmas that facilitate the algorithm's performance.
Read more
Assessing Sample Quality in Conditional Generation under Compositional Shift
Berker Demirel, Valentino Maiorca, Marco Fumero, Theofanis Karaletsos, Francesco Locatello
Generative Models Computer Vision
  • Introduces a reference-free trust score for assessing sample quality in conditional generation.
  • Combines global realism and attribute-wise faithfulness to evaluate generated samples.
  • Demonstrates empirical effectiveness on biological imaging and vision benchmarks.
  • Enables early sample rejection during the generation process, improving efficiency.
Read more
Unsupervised Continual Clustering via Forward-Backward Knowledge Distillation
Mohammadreza Sadeghi, Sareh Soleimani, Zihan Wang, Narges Armanfard
Theory Efficient ML
  • FBCC is the first framework for unsupervised continual clustering that integrates representation learning and clustering in a sequential manner without replay.
  • The dual-phase forward-backward knowledge distillation strategy mitigates catastrophic forgetting by using lightweight student models to guide the teacher.
  • FBCC provides a memory-efficient solution by storing task-specific knowledge in compact student models instead of large-scale models or past samples.
  • Extensive experiments show that FBCC outperforms state-of-the-art unsupervised and supervised continual learning methods in terms of clustering accuracy.
Read more
Where Rectified Flows Leak: Characterising Membership Signals Along the Interpolation Path
Thomas Sesmat, Gabriel Meseguer-Brocal, Geoffroy Peeters
Generative Models Theory Audio & Speech
  • Rectified Flows can encode subtle traces of training data that may not be directly observable.
  • A bell-shaped curve characterizes the reconstruction gap between training and test data along the interpolation path.
  • The peak location of the membership signal can be derived mathematically under certain assumptions.
  • The findings are validated across different modalities, including audio and images.
Read more
Inferring hidden forcing in a biological oscillator using Kolmogorov-Arnold networks
Julian Szereszewski, Facundo Fainstein, Leandro E. Fernandez, Gabriel B. Mindlin
Interpretability Time Series Theory
  • Kolmogorov-Arnold networks (KAN) effectively reconstruct hidden forces in dynamical systems from partial observations.
  • The study reveals a two-phase activation pattern in avian respiratory dynamics that is not apparent from pressure measurements alone.
  • Electromyographic recordings validate the predictions made by the reconstructed dynamics.
  • The approach highlights the potential of interpretable machine learning methods in uncovering hidden structures in complex biological systems.
Read more
SafeECGMatch: Calibration-Aware Joint Frequency and Time Space Semi-Supervised Learning for Open-Set ECG Classification
Hongkyu Koh, Ikbeom Jang
Time Series
  • Introduction of SafeECGMatch, a novel SSL framework for ECG classification addressing label distribution mismatch.
  • Implementation of a dual-view calibration mechanism that integrates time and frequency domain learning.
  • Demonstration of state-of-the-art accuracy and calibration performance on benchmark ECG datasets.
  • Focus on reducing overconfidence in predictions, enhancing model reliability in clinical applications.
Read more
Spatiotemporal Imputation with Graph-Informed Flow Matching
Zepeng Zhang, Aref Einizade, Jhony H. Giraldo, Olga Fink
Generative Models Graph Learning Time Series
  • Introduction of GiFlow, a novel framework for spatiotemporal imputation.
  • Utilization of a graph-informed prior based on adaptive spatiotemporal filtering.
  • Demonstration of GiFlow's superior performance over existing methods across various datasets.
  • Integration of spatial and temporal attention mechanisms for improved modeling.
Read more
Test-Time Adaptive Composition for Machine Learning as a Service (MLaaS) in IoT Environments
Deepak Kanneganti, Sajib Mistry, Sheik Mohammad Mostakim Fattah, Aneesh Krishna
Efficient ML
  • Introduction of a TTA-aware composability model to assess service compatibility in MLaaS compositions.
  • Development of a service-level adaptation model to regulate personalized adaptations during inference.
  • Demonstration of improved computational efficiency over traditional adaptive approaches.
  • Focus on enabling personalized adaptation at the composition level, addressing the dynamic nature of IoT environments.
Read more
RECAP: Regression Evaluation for Continual Adaptation of Prompts
Harsh Deshpande, Kushal Chawla, Sangwoo Cho, William Campbell
NLP Large Language Models
  • RECAP benchmark measures continual-learning phenomena at the constraint level under a proactive protocol.
  • Existing prompt adaptation methods are inadequate for proactive adaptation, showing no significant performance improvement.
  • The study highlights the need for new methodologies that can adapt to evolving constraints without prior feedback.
  • The benchmark transforms static datasets into temporal evaluation streams, allowing for rigorous evaluation of prompt adaptation methods.
Read more
Attention at the Theoretical Minimum: A Mathematics of Arrays Framework for Memory-Optimal Transformer Kernels
Lenore Mullin, Gaetan Hains
Theory Efficient ML Optimization
  • Introduces a Mathematics of Arrays (MoA) framework for optimizing transformer attention mechanisms.
  • Achieves theoretical minimum memory traffic by eliminating intermediate arrays through algebraic construction.
  • Demonstrates a formal lower bound for data movement, significantly reducing memory costs compared to standard implementations.
  • Projects substantial speedup and energy reduction in real-world applications, especially at large sequence lengths.
Read more
OffQ: Taming Structured Outliers in LLM Quantization by Offsetting
Haoqi Wang, Lorenz K. Mueller, Jiawei Zhuang, Mathieu Salzmann, Lukas Cavigelli
Large Language Models Efficient ML NLP
  • Introduces OffQ, a method to mitigate activation outliers in low-bit quantization.
  • Utilizes a top-1 PCA to identify low-dimensional outlier subspaces.
  • Concentrates outliers into fewer channels and converts them into group-wise offsets.
  • Achieves effective W4A4KV4 quantization with uniform precision.
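The "top-1 PCA" step can be illustrated with a short power-iteration sketch that finds the leading principal direction of a set of activations and removes it. This is not the OffQ method: the synthetic activations, the single high-variance channel, and the projection-based removal are assumptions for illustration, and the conversion into group-wise offsets is not shown.

```python
import math
import random


def top_principal_direction(rows, iters=50):
    """Leading principal direction of centered data, found by power iteration
    on X^T X without forming the matrix explicitly. Returns a unit vector."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        proj = [sum(r[j] * v[j] for j in range(p)) for r in centered]
        w = [sum(proj[i] * centered[i][j] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def remove_direction(row, v):
    """Subtract the component of row along the unit vector v."""
    coeff = sum(a * b for a, b in zip(row, v))
    return [a - coeff * b for a, b in zip(row, v)]


# Hypothetical activations: 4 channels, channel 2 carries large outliers.
rng = random.Random(0)
acts = [[rng.gauss(0, 1) * (20.0 if j == 2 else 1.0) for j in range(4)]
        for _ in range(200)]

direction = top_principal_direction(acts)
residual = [remove_direction(r, direction) for r in acts]

max_before = max(abs(x) for r in acts for x in r)
max_after = max(abs(x) for r in residual for x in r)
```

In this toy setting the leading direction lines up with the outlier channel, and the residual has a much smaller dynamic range, which is the property a low-bit quantizer benefits from.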
Read more
STELLAR: Spatio-Temporal Environmental Learning with Latent Alignment and Refinement for Long-Tailed Species Distribution Modeling
Shufeng Kong, Tao Yu, Yuanyuan Wei, Caihua Liu, Junwen Bai, Yingheng Wang, Marc Grimson, Daniel Fink, Carla P. Gomes
Graph Learning Time Series Generative Models
  • STELLAR addresses the limitations of existing JSDM approaches by integrating spatio-temporal dynamics and community structure.
  • The framework employs a Graph-Temporal Encoder to capture historical environmental changes and species interactions.
  • A novel Context-Anchored Latent Alignment mechanism enhances species clustering based on environmental preferences.
  • The Imbalance-Aware Decoupled Decoding module effectively tackles the long-tail distribution of species.
Read more
GRASP: Geometry-aware Residual Alignment for Scalable Pretraining Data Attribution
Yue Min, Ruining Chen, Yujun Li
Theory Efficient ML Large Language Models
  • GRASP formalizes pretraining data attribution as subset-level counterfactual utility prediction.
  • The method incorporates a quadratic geometric penalty to model interactions between data subsets.
  • GRASP significantly reduces artifact construction costs and improves efficiency in evaluating data subsets.
  • The approach demonstrates superior performance compared to existing scalable data attribution methods.
Read more
Principles and Practice of Deep Representation Learning: or a Mathematical Theory of Memory
Sam Buchanan, Druv Pai, Peng Wang, Yi Ma
Theory Optimization Efficient ML
  • Establishes a mathematical framework for understanding deep representation learning.
  • Unifies classical and modern approaches to data representation and compression.
  • Introduces auto-encoding architectures for self-correction and improvement.
  • Connects theoretical principles to practical applications in AI tasks.
Read more
GeoGNN: Time Series Geo-Localization using Two-Tower Graph Neural Networks
Toan Tran, Waqwoya Abebe, Abhishek Potnis, Supriya Chinthavali, Cyrus Shahabi, Li Xiong, Dalton Lunga
Graph Learning Time Series
  • GeoGNN is a two-tower architecture combining spatial and temporal learning for time series geolocalization.
  • The model leverages graph neural networks to embed geographic candidates and extract features from time series data.
  • GeoGNN outperforms traditional baselines, enhancing geolocalization accuracy by about 27% on average.
  • The approach addresses unique challenges in geolocalizing time series, which lack explicit geographic cues.
Read more
The Geometry of Last-Layer Model Stealing
Snigdha Chandan Khilar
Theory Large Language Models NLP
  • Introduces a geometric perspective on last-layer model stealing using exterior differential systems.
  • Identifies the polar space of the quadratic generator as key to recovering the projection matrix.
  • Demonstrates that the intrinsic dimension of the hidden-state manifold reveals information about nonlinear sublayers.
  • Establishes a clear identifiability boundary for parameters beneath the last layer.
Read more
Knowledge Graphs and Reasoning LLMs for Finding Simple Yet Effective Transcriptomic Perturbation Predictors
Jake Fawkes, Liam Hodgson, Jason Hartford
Reinforcement Learning Large Language Models Graph Learning
  • The K-nearest neighbour approach using knowledge graphs achieves competitive performance in predicting transcriptomic perturbations.
  • Reinforcement learning can optimize LLMs to enhance their predictive capabilities for biological responses.
  • The proposed methods generalize well to unseen perturbations, indicating their robustness.
  • The RL-optimized LLM improves performance on downstream tasks, such as differential expression prediction.
Read more
Beyond Linear and Overcomplete Regimes: A Mean-Field Analysis of Bottleneck Autoencoders
Santanu Das, Ramyak Bilas, Pascal Esser, Satyaki Mukherjee
Theory Optimization
  • The paper provides a theoretical framework for analyzing nonlinear bottleneck autoencoders using mean-field methods.
  • It establishes that the learning dynamics of finite-width networks closely track the mean-field risk trajectory.
  • The study highlights the importance of both nonlinearity and bottleneck constraints in representation learning.
  • The authors derive a system of coupled PDEs to characterize the learning dynamics of BAEs.
Read more