AI-generated summaries

Today's ML research,
without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

59 Papers today
8h Update frequency
7 Days of history
Attention Sinks Are Provably Necessary in Softmax Transformers: Evidence from Trigger-Conditional Tasks
Yuval Ran-Milo
  • Attention sinks are a necessary feature of softmax Transformers when computing trigger-conditional tasks.
  • The study introduces a trigger-conditional task that reflects the behavior of attention heads in practical scenarios.
  • Theoretical proofs demonstrate that single-layer and multi-layer softmax models must exhibit sink behavior to achieve task accuracy.
  • ReLU attention can solve the same task without any sink formation, indicating that normalization is the key factor driving sink behavior.
Read more
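A toy contrast between softmax and ReLU attention illustrates the normalization argument above (an illustrative sketch, not the paper's construction; the score values are made up):

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

# One query scoring 5 keys; all keys are irrelevant (negative scores),
# but key 0 -- the "sink" -- is slightly less so.
scores = np.array([-1.0, -5.0, -5.0, -5.0, -5.0])

# Softmax weights must sum to 1, so the probability mass has to go
# somewhere: it piles onto the sink even though nothing is relevant.
w_softmax = softmax(scores)

# ReLU attention has no such constraint and can simply attend to nothing.
w_relu = np.maximum(scores, 0.0)

print(w_softmax.round(3))  # sink key absorbs ~93% of the mass
print(w_relu)              # all zeros
```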
Huntington Disease Automatic Speech Recognition with Biomarker Supervision
Charles L. Wang, Cady Chen, Ziwei Gong, Julia Hirschberg
  • Introduces a high-fidelity clinical corpus for HD speech ASR, the first of its kind for end-to-end evaluation.
  • Demonstrates that different ASR architectures exhibit unique error patterns when processing HD speech.
  • Achieves a significant reduction in WER through HD-specific adaptations of the Parakeet-TDT model.
  • Proposes the use of clinically grounded biomarkers as auxiliary supervision for ASR adaptation.
Read more
Duration Aware Scheduling for ASR Serving Under Workload Drift
Darshan Makwana, Yash Jogi, Harsh Kotta, Aayush Kubba
  • Duration-aware scheduling can significantly reduce end-to-end latency in ASR systems.
  • SJF reduces median latency by up to 73% but may cause increased tail latency.
  • HRRN provides a balanced approach, improving median latency while controlling tail latency degradation.
  • The proposed methods maintain performance under workload drift with minimal scheduling overhead.
Read more
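The SJF/HRRN trade-off in the bullets can be sketched in a few lines (a generic scheduling illustration, not the paper's serving system; the queue values are made up):

```python
def sjf_pick(jobs, now):
    """Shortest Job First: always run the job with the smallest
    (estimated) service time, ignoring how long others have waited."""
    return min(range(len(jobs)), key=lambda i: jobs[i][1])

def hrrn_pick(jobs, now):
    """Highest Response Ratio Next: rank by (wait + service) / service,
    so short jobs are still favored but long waiters eventually win,
    which caps the tail-latency growth SJF suffers from."""
    def ratio(i):
        arrival, service = jobs[i]
        return ((now - arrival) + service) / service
    return max(range(len(jobs)), key=ratio)

# (arrival_time, estimated_duration): a long job that has waited 10s
# vs. a shorter job that just arrived.
queue = [(0.0, 5.0), (9.0, 4.0)]
print(sjf_pick(queue, now=10.0))   # 1: the shorter job
print(hrrn_pick(queue, now=10.0))  # 0: the long waiter's ratio is higher
```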
Systematic Scaling Analysis of Jailbreak Attacks in Large Language Models
Xiangwen Wang, Ananth Balashankar, Varun Chandrasekaran
  • Introduces a scaling-law framework for analyzing jailbreak attacks in LLMs.
  • Demonstrates that prompting-based attacks are more compute-efficient than optimization-based methods.
  • Identifies distinct success-stealthiness operating points for different attack paradigms.
  • Finds that the ease of eliciting harm is highly goal-dependent, with misinformation being the most accessible target.
Read more
Multilingual Financial Fraud Detection Using Machine Learning and Transformer Models: A Bangla-English Study
Mohammad Shihab Uddin, Md Hasibul Amin, Nusrat Jahan Ema, Bushra Uddin, Tanvir Ahmed, Arif Hassan Zidan
  • Investigation of financial fraud detection in a multilingual Bangla-English context.
  • Comparison of classical machine learning models with transformer-based architectures.
  • Identification of unique linguistic patterns in fraudulent messages.
  • Demonstration of the effectiveness of classical models in low-resource language settings.
Read more
Slack More, Predict Better: Proximal Relaxation for Probabilistic Latent Variable Model-based Soft Sensors
Zehua Zou, Yiran Ma, Yulong Zhang, Zhengnan Li, Zeyu Yang, Jinhao Xie, Xiaoyu Jiang, Zhichao Chen
  • Introduction of KProxNPLVM to improve soft sensor modeling accuracy.
  • Utilization of Wasserstein distance as a proximal operator to relax the learning objective.
  • Rigorous derivation and proof of convergence for the new variational inference strategy.
  • Demonstration of improved performance through extensive experiments on various datasets.
Read more
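For intuition on the Wasserstein proximal term, the 2-Wasserstein distance between one-dimensional Gaussians has a simple closed form (a generic illustration under that simplifying assumption; KProxNPLVM's actual objective is not reproduced here):

```python
import numpy as np

def w2_gaussian_1d(mu1, sigma1, mu2, sigma2):
    """Closed-form 2-Wasserstein distance between N(mu1, sigma1^2)
    and N(mu2, sigma2^2): W2^2 = (mu1-mu2)^2 + (sigma1-sigma2)^2."""
    return np.sqrt((mu1 - mu2) ** 2 + (sigma1 - sigma2) ** 2)

# A proximal-style relaxed update: move the mean toward a data-fit
# target while a quadratic (here, W2^2-in-mean) penalty keeps it
# close to the previous iterate.
prev_mu, target_mu, lam = 0.0, 2.0, 0.5
new_mu = (target_mu + lam * prev_mu) / (1.0 + lam)  # argmin of (mu-t)^2 + lam*(mu-p)^2
print(new_mu, w2_gaussian_1d(prev_mu, 1.0, new_mu, 1.0))
```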
Survival Meets Classification: A Novel Framework for Early Risk Prediction Models of Chronic Diseases
Shaheer Ahmad Khan, Muhammad Usamah Shahid, Muddassar Farooq
Interpretability
  • Integration of survival analysis with classification techniques for chronic disease risk prediction.
  • Development of models using only EMR data, excluding lab results, for early disease alerts.
  • Survival models outperform traditional classifiers in predictive performance metrics.
  • Novel explanation methodology for model outputs validated by medical experts.
Read more
Procedural Fairness via Group Counterfactual Explanation
Gideon Popoola, John Sheppard
  • Formalizes procedural fairness as group counterfactual explanation invariance.
  • Introduces Group Counterfactual Integrated Gradients (GCIG) as a training-time regularization method.
  • GCIG minimizes cross-group variation in feature attributions to ensure consistent reasoning.
  • Empirical results show reduced explanation disparity and competitive predictive performance.
Read more
Chemical Reaction Networks Learn Better than Spiking Neural Networks
Sophie Jaffard, Ivo F. Sbalzarini
Theory
  • CRNs can learn classification tasks without requiring hidden layers, unlike SNNs.
  • The paper provides mathematical guarantees for the learning behavior of CRNs.
  • Numerical experiments show CRNs outperform SNNs in classifying handwritten digits.
  • The study highlights the potential of CRNs in machine learning applications.
Read more
Causal Matrix Completion under Multiple Treatments via Mixed Synthetic Nearest Neighbors
Minrui Luo, Zhiheng Zhang
Theory Efficient ML Optimization
  • Introduction of Mixed Synthetic Nearest Neighbors (MSNN) for causal matrix completion under multiple treatments.
  • MSNN retains the statistical properties of SNN while improving sample efficiency in data-scarce environments.
  • The method allows for the sharing of imputation coefficients across treatments based on shared latent factors.
  • Empirical results show MSNN's effectiveness in estimating causal effects where traditional methods fail.
Read more
Entropy-Preserving Reinforcement Learning
Aleksei Petrenko, Ben Lipkin, Kevin Chen, Erik Wijmans, Marco Cusumano-Towner, Raja Giryes, Philipp Krähenbühl
  • Entropy collapse in policy gradient algorithms can hinder exploration and lead to suboptimal policies.
  • Active monitoring and control of entropy during training is essential for maintaining diversity in learned trajectories.
  • The paper introduces REPO and ADAPO as mechanisms for effective entropy regulation.
  • Maintaining a steady entropy trajectory correlates with improved performance in language model reasoning tasks.
Read more
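The entropy-collapse phenomenon and the bonus-style fix are easy to see on a toy policy (a generic entropy-regularization sketch; REPO and ADAPO themselves are not reproduced here, and the `beta` value is made up):

```python
import numpy as np

def policy_entropy(logits):
    """Shannon entropy of a softmax policy -- the quantity that
    'collapses' when one action's logit dominates during training."""
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return -np.sum(p * np.log(p + 1e-12))

uniform = policy_entropy(np.zeros(4))                      # log(4), maximal
peaked = policy_entropy(np.array([10.0, 0.0, 0.0, 0.0]))   # near 0, collapsed

# A common regulation mechanism: add beta * H(pi) to the objective so
# the gradient pushes entropy back up as it approaches zero.
beta = 0.01
print(uniform, peaked, beta * peaked)
```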
Matching Features, Not Tokens: Energy-Based Fine-Tuning of Language Models
Samy Jelassi, Mujin Kwun, Rosie Zhao, Yuanzhi Li, Nicolo Fusi, Yilun Du, Sham M. Kakade, Carles Domingo-Enrich
NLP Large Language Models Optimization
  • Introduces a feature-matching loss for fine-tuning language models that targets sequence-level statistics.
  • Proposes Energy-Based Fine-Tuning (EBFT) as a practical method to optimize the feature-matching loss.
  • EBFT outperforms traditional supervised fine-tuning (SFT) and matches reinforcement learning with verifiable rewards (RLVR) in downstream tasks.
  • Demonstrates lower validation cross-entropy compared to both SFT and RLVR.
Read more
Client-Conditional Federated Learning via Local Training Data Statistics
Rickard Brännvall
Federated Learning
  • Proposes a method that conditions a global model on PCA statistics of local training data.
  • Achieves performance comparable to an Oracle baseline across various heterogeneity types.
  • Demonstrates unique robustness to data sparsity, maintaining accuracy with reduced client data.
  • Avoids the need for additional communication and complex client clustering.
Read more
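The conditioning idea can be sketched as computing a compact PCA summary of each client's local data (an illustrative sketch only; how the paper feeds these statistics into the global model is not reproduced):

```python
import numpy as np

def client_pca_stats(X, k=2):
    """Summarize a client's local training data by its mean plus the
    top-k principal directions and singular values (via SVD) -- a
    compact vector the global model can be conditioned on."""
    mu = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - mu, full_matrices=False)
    return np.concatenate([mu, vt[:k].ravel(), s[:k]])

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))   # one client's local dataset
cond = client_pca_stats(X, k=2)
print(cond.shape)               # (5 + 2*5 + 2,) = (17,)
```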
Frequentist Consistency of Prior-Data Fitted Networks for Causal Inference
Valentyn Melnychuk, Vahid Balazadeh, Stefan Feuerriegel, Rahul G. Krishnan
  • PFNs can exhibit prior-induced confounding bias, affecting their frequentist consistency.
  • A one-step posterior correction (OSPC) is proposed to recalibrate PFNs and restore consistency.
  • The OSPC leads to a semi-parametric Bernstein-von Mises theorem for calibrated PFNs.
  • Martingale posteriors are utilized to implement the OSPC effectively.
Read more
Separable neural architectures as a primitive for unified predictive and generative intelligence
Reza T. Batley, Apurba Sarker, Rajib Mostakim, Andrew Klichine, Sourav Saha
  • Introduction of Separable Neural Architectures (SNA) as a new neural primitive.
  • SNAs exploit latent factorisable structures in various domains, enhancing predictive and generative capabilities.
  • Demonstrated effectiveness across multiple applications, including reinforcement learning and turbulent flow modeling.
  • Establishes a structural analogy between chaotic dynamics and linguistic autoregression.
Read more
H2LooP Spark Preview: Continual Pretraining of Large Language Models for Low-Level Embedded Systems Code
Amit Singh, Vedant Nipane, Pulkit Agrawal, Jatin Kishnani
Large Language Models Generative Models NLP
  • Development of a specialized training corpus for embedded systems code using repository-datasheet pairs.
  • Successful application of continual pretraining to adapt a large language model for a niche domain.
  • Significant performance improvements in perplexity and code generation accuracy compared to existing models.
  • Open-source release of the model checkpoint to facilitate further research in embedded systems LLMs.
Read more
On the Role of Reversible Instance Normalization
Gaspard Berthelier, Tahar Nabil, Etienne Le Naour, Richard Niamke, Samir Perlaza, Giovanni Neglia
Time Series
  • Identifies three key challenges in normalization for time series forecasting: temporal, spatial, and conditional distribution shifts.
  • Conducts ablation studies on RevIN, revealing redundancies and detrimental components.
  • Challenges the effectiveness of RevIN in mitigating distribution shifts.
  • Proposes improvements for RevIN to enhance robustness and generalization in forecasting models.
Read more
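The RevIN mechanism under study is a per-instance normalize/denormalize pair (minimal sketch without RevIN's learnable affine parameters, which the paper's ablations also examine):

```python
import numpy as np

def revin_normalize(x, eps=1e-5):
    """Instance-normalize a series with its own mean and std before the
    forecaster sees it; return the stats for the reverse step."""
    mu, sigma = x.mean(), x.std() + eps
    return (x - mu) / sigma, (mu, sigma)

def revin_denormalize(y, stats):
    """Map the forecast back to this instance's original scale."""
    mu, sigma = stats
    return y * sigma + mu

x = np.array([10.0, 12.0, 11.0, 13.0])
z, stats = revin_normalize(x)
y_hat = z  # identity "forecaster", for illustration only
print(revin_denormalize(y_hat, stats))  # recovers x
```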
Harnessing Data Asymmetry: Manifold Learning in the Finsler World
Thomas Dagès, Simon Weber, Daniel Cremers, Ron Kimmel
  • Introduction of Finsler geometry to capture asymmetric dissimilarities in manifold learning.
  • Development of a Finsler manifold learning pipeline that enhances existing asymmetric embedding techniques.
  • Experimental validation showing superior performance of Finsler embeddings over traditional Euclidean methods.
  • Revelation of hidden structures and information in data that symmetric methods fail to capture.
Read more
The Latent Color Subspace: Emergent Order in High-Dimensional Chaos
Mateusz Pach, Jessica Bader, Quentin Bouniot, Serge Belongie, Zeynep Akata
  • Introduction of the Latent Color Subspace (LCS) in the VAE latent space of the FLUX model.
  • Demonstration that color can be represented in a three-dimensional subspace closely resembling the HSL color model.
  • Development of a training-free method for color intervention based on the LCS interpretation.
  • Validation of the LCS through mid-generation color observation and targeted interventions.
Read more
Deep Learning Network-Temporal Models For Traffic Prediction
Yufeng Xin, Ethan Fan
Time Series Graph Learning Large Language Models
  • Introduction of two deep learning models for multivariate time series prediction in network traffic.
  • The GAT model captures both temporal and topological correlations, while the LLM model excels in generalization.
  • Extensive performance evaluations demonstrate the superiority of the LLM model over traditional methods.
  • Insights into correlation variability and prediction distribution discrepancies are provided.
Read more
Hindsight-Anchored Policy Optimization: Turning Failure into Feedback in Sparse Reward Settings
Yuning Wu, Ke Wang, Devin Chen, Kai Wei
Reinforcement Learning Large Language Models Optimization
  • HAPO addresses the dilemma of sparse rewards in reinforcement learning by integrating hindsight mechanisms.
  • The Synthetic Success Injection (SSI) operator allows for dynamic anchoring to teacher demonstrations during failures.
  • A Thompson sampling-inspired gating mechanism governs the intervention, enabling a self-paced learning curriculum.
  • HAPO demonstrates asymptotic consistency, recovering unbiased gradients as the policy improves.
Read more
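The Thompson-sampling-style gate in the third bullet can be sketched with a Beta-Bernoulli posterior (an illustrative stand-in, not HAPO's actual gating rule; the 0.5 threshold is made up):

```python
import numpy as np

def thompson_gate(successes, failures, rng):
    """Sample a success probability from a Beta posterior over the
    policy's competence and intervene (inject a teacher anchor) only
    when the sampled value is low."""
    p = rng.beta(1 + successes, 1 + failures)
    return p < 0.5  # intervene while the policy still looks weak

rng = np.random.default_rng(0)
early = np.mean([thompson_gate(1, 20, rng) for _ in range(1000)])
late = np.mean([thompson_gate(50, 2, rng) for _ in range(1000)])
print(early, late)  # interventions fade as the policy improves
```

Because intervention probability shrinks with the success count, the gate yields the self-paced curriculum the summary describes.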
Teleodynamic Learning: A New Paradigm for Interpretable AI
Enrique ter Horst, Juan Diego Zambrano
  • Teleodynamic Learning shifts the focus from static optimization to dynamic co-evolution of structure, parameters, and resources.
  • The approach models learning as navigation in a constrained dynamical system with coupled inner and outer dynamics.
  • Three central phenomena emerge: stabilization without external criteria, phase-structured behavior, and geometry-based convergence guarantees.
  • The Distinction Engine (DE11) demonstrates the effectiveness of this paradigm, achieving high accuracy on benchmark datasets.
Read more
Effective Resistance Rewiring: A Simple Topological Correction for Over-Squashing
Bertran Miquel-Oliver, Manel Gil-Sorribes, Victor Guallar, Alexis Molina
Graph Learning
  • Introduces Effective Resistance Rewiring (ERR) to address over-squashing in GNNs.
  • Utilizes effective resistance as a global measure to identify structural bottlenecks.
  • Demonstrates a trade-off between over-squashing and oversmoothing in GNNs.
  • Combines ERR with normalization techniques to improve model performance.
Read more
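Effective resistance itself has a standard closed form via the Laplacian pseudoinverse, which makes the bottleneck detection concrete (a sketch of the measure only, not the paper's full rewiring procedure):

```python
import numpy as np

def effective_resistance(A):
    """All-pairs effective resistance from the graph Laplacian
    pseudoinverse: R_uv = L+_uu + L+_vv - 2 L+_uv. High-R pairs mark
    structural bottlenecks -- candidate locations for new edges."""
    L = np.diag(A.sum(axis=1)) - A
    Lp = np.linalg.pinv(L)
    d = np.diag(Lp)
    return d[:, None] + d[None, :] - 2 * Lp

# Two triangles joined by a single bridge edge (2-3): a bottleneck.
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
R = effective_resistance(A)
print(R[0, 5] > R[0, 1])  # cross-bridge pairs have the highest resistance
```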
Temporal Straightening for Latent Planning
Ying Wang, Oumayma Bounou, Gaoyue Zhou, Randall Balestriero, Tim G. J. Rudner, Yann LeCun, Mengye Ren
Robotics Optimization Theory
  • Introduces temporal straightening to improve representation learning for latent planning.
  • Utilizes a curvature regularizer to create straighter latent trajectories.
  • Demonstrates improved alignment between Euclidean and geodesic distances in latent space.
  • Achieves significant performance gains in goal-reaching tasks with gradient-based planning.
Read more
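A curvature regularizer of the kind described can be sketched as penalizing the turning angle between consecutive latent displacements (an illustrative form; the paper's exact regularizer may differ):

```python
import numpy as np

def curvature_penalty(z):
    """Bending penalty for a latent trajectory z of shape (T, d):
    mean of 1 - cos(angle) between consecutive displacement vectors.
    Zero exactly when the trajectory is a straight line."""
    d = np.diff(z, axis=0)
    d = d / (np.linalg.norm(d, axis=1, keepdims=True) + 1e-12)
    cos = np.sum(d[:-1] * d[1:], axis=1)
    return np.mean(1.0 - cos)

straight = np.stack([np.linspace(0, 1, 10)] * 2, axis=1)            # a line in 2-D
bent = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)      # right angles
print(curvature_penalty(straight), curvature_penalty(bent))         # 0 vs 1
```

On straight trajectories, Euclidean distance between latents matches path (geodesic) length, which is the alignment the third bullet refers to.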
On the Robustness of Langevin Dynamics to Score Function Error
Daniel Yiming Cao, August Y. Chen, Karthik Sridharan, Yuchen Wu
Generative Models Theory
  • Langevin dynamics is not robust to L2 (or Lp) errors in score function estimates, unlike diffusion models.
  • Even small L2 errors can lead to significant deviations from the target distribution in Langevin dynamics.
  • The results caution against the use of Langevin dynamics with estimated scores in high-dimensional generative modeling.
  • The paper provides a formal proof of the limitations of Langevin dynamics regarding score estimation errors.
Read more
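The sensitivity to score error is visible even in one dimension with unadjusted Langevin dynamics (a toy sketch, not the paper's analysis; the shift of 0.5 is an artificial score error):

```python
import numpy as np

def langevin_sample(score, x0=0.0, step=0.05, n_steps=50_000):
    """Unadjusted Langevin dynamics: x <- x + (step/2)*score(x)
    + sqrt(step)*noise. With the exact score of N(0,1) it samples near
    the target; a constant score error shifts the stationary mean."""
    rng = np.random.default_rng(0)
    x, xs = x0, []
    for _ in range(n_steps):
        x = x + 0.5 * step * score(x) + np.sqrt(step) * rng.normal()
        xs.append(x)
    return np.array(xs[n_steps // 2:])  # discard burn-in

exact = langevin_sample(lambda x: -x)            # true score of N(0, 1)
biased = langevin_sample(lambda x: -(x - 0.5))   # score shifted by 0.5
print(exact.mean(), biased.mean())               # ~0.0 vs. ~0.5
```

The systematic error propagates undamped into the sampled distribution, in line with the robustness gap the bullets describe.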
Taming the Adversary: Stable Minimax Deep Deterministic Policy Gradient via Fractional Objectives
Taeho Lee, Donghwan Lee
Reinforcement Learning Robotics Optimization
  • Introduction of MMDDPG framework for robust policy learning in RL.
  • Formulation of training as a minimax optimization problem between user and adversary.
  • Use of a fractional objective to balance performance and disturbance magnitude.
  • Demonstrated improved robustness in MuJoCo environments against disturbances.
Read more
Slow-Fast Inference: Training-Free Inference Acceleration via Within-Sentence Support Stability
Xingyu Xie, Zhaochen Yu, Yue Liao, Tao Wang, Kim-Chuan Toh, Shuicheng Yan
NLP Large Language Models Efficient ML
  • Identification of within-sentence support stability, where attention support remains stable over short coherent spans.
  • Introduction of Slow-Fast Inference (SFI), a training-free framework that alternates between fast and slow decoding steps.
  • Development of a training-free Selector that converts dense-attention evidence into reusable memory.
  • Significant improvements in decoding throughput without sacrificing quality, achieving 1.6× to 14.4× speedup.
Read more
Automatic Generation of High-Performance RL Environments
Seth Karten, Rahul Dev Appapogu, Chi Jin
Reinforcement Learning Efficient ML Robotics
  • High-performance RL environments can be generated automatically and cheaply, reducing the engineering burden.
  • The proposed methodology includes hierarchical verification to ensure semantic equivalence across environments.
  • Significant speedups were achieved in various environments, with performance improvements ranging from 1.5× to 42×.
  • The approach allows for the creation of entirely new environments, such as TCGJax, which did not exist prior to this work.
Read more
STAMP: Selective Task-Aware Mechanism for Text Privacy
Fengwei Tian, Payel Bhattacharjee, Heidi Hanson, Geoffrey D. Rubin, Joseph Y. Lo, Ravi Tandon
  • STAMP selectively allocates privacy budgets based on token importance and sensitivity.
  • The polar mechanism preserves the magnitude of embeddings while perturbing their direction.
  • Experimental results show STAMP outperforms existing methods in maintaining utility while ensuring privacy.
  • The framework is applicable in scenarios requiring client-side text privacy protection.
Read more
Personalized Federated Learning via Gaussian Generative Modeling
Peng Hu, Jianwei Ma
Federated Learning Generative Models
  • Introduces pFedGM, a method for personalized federated learning using Gaussian generative modeling.
  • Balances global collaboration and personalization through a dual objective approach.
  • Decouples the Gaussian classifier into a navigator and a statistic extractor for improved representation learning.
  • Demonstrates effectiveness across diverse scenarios and datasets, outperforming existing methods.
Read more
Flowcean - Model Learning for Cyber-Physical Systems
Maximilian Schmidt, Swantje Plambeck, Markus Knitt, Hendrik Rose, Goerschwin Fey, Jan Christian Wieck, Stephan Balduin
  • Flowcean automates model generation for Cyber-Physical Systems using data-driven learning.
  • The framework emphasizes modularity and usability, allowing integration of various learning libraries.
  • Flowcean addresses the limitations of existing machine learning frameworks in handling diverse CPS applications.
  • The framework supports customization of data-driven learning pipelines for specific CPS characteristics.
Read more
Grammar of the Wave: Towards Explainable Multivariate Time Series Event Detection via Neuro-Symbolic VLM Agents
Sky Chenwei Wan, Tianjun Hou, Yifei Wang, Xiqing Chang, Aymeric Jan
  • Introduction of Knowledge-Guided Time Series Event Detection (K-TSED) to leverage natural language descriptions for event detection.
  • Development of the Event Logic Tree (ELT) framework to model the temporal-logic structures of events.
  • Creation of a neuro-symbolic VLM agent system (SELA) that combines logic analysis and signal inspection.
  • Establishment of a benchmark dataset from real-world time series data to validate the proposed method.
Read more
Resource-Efficient Iterative LLM-Based NAS with Feedback Memory
Xiaojie Gu, Dmitry Ignatov, Radu Timofte
  • Introduces a closed-loop iterative NAS pipeline using LLMs for architecture generation and refinement.
  • Utilizes a historical feedback memory to learn from past attempts, improving the efficiency of the search process.
  • Demonstrates significant performance improvements on image classification tasks with minimal computational resources.
  • Establishes a framework that is accessible for resource-constrained environments, favoring compact models for edge deployment.
Read more
Statistical and structural identifiability in representation learning
Walter Nelson, Marco Fumero, Theofanis Karaletsos, Francesco Locatello
Theory
  • Introduces distinct concepts of statistical and structural identifiability in representation learning.
  • Proposes model-agnostic definitions of near-identifiability allowing for error tolerance.
  • Demonstrates that ICA can resolve linear ambiguities in representations.
  • Achieves state-of-the-art disentanglement using a simple combination of autoencoders and ICA.
Read more
Graph Tokenization for Bridging Graphs and Transformers
Zeyuan Guo, Enmao Diao, Cheng Yang, Chuan Shi
Graph Learning
  • Introduces a graph tokenization framework that combines reversible serialization with BPE.
  • Guides serialization using global statistics to enhance structural representation.
  • Enables standard Transformers to process graph data without architectural changes.
  • Achieves state-of-the-art results on 14 benchmark datasets, surpassing existing models.
Read more
Cross-Domain Policy Optimization via Bellman Consistency and Hybrid Critics
Ming-Hong Chen, Kuan-Chen Pan, You-De Huang, Xi Liu, Ping-Chun Hsieh
  • Introduction of cross-domain Bellman consistency to measure transferability of source-domain models.
  • Development of the QAvatar framework that combines Q functions from source and target domains.
  • Establishment of convergence properties for QAvatar, ensuring reliable knowledge transfer.
  • Demonstration of QAvatar's superior performance over existing CDRL methods in benchmark tasks.
Read more
Exhaustive Circuit Mapping of a Single-Cell Foundation Model Reveals Massive Redundancy, Heavy-Tailed Hub Architecture, and Layer-Dependent Differentiation Control
Ihor Kendiukhov
  • Exhaustive circuit tracing reveals a heavy-tailed hub architecture with significant annotation bias.
  • 1.8% of features account for a disproportionate amount of connectivity in the model.
  • Redundancy in feature interactions increases with interaction order, indicating a subadditive architecture.
  • Late-layer features are causally linked to promoting cellular maturity, while early-layer features push away from it.
Read more
Fingerprinting Concepts in Data Streams with Supervised and Unsupervised Meta-Information
Ben Halstead, Yun Sing Koh, Patricia Riddle, Mykola Pechenizkiy, Albert Bifet, Russel Pears
Time Series
  • FiCSUM framework combines supervised and unsupervised meta-information for concept representation.
  • Dynamic weighting strategy allows for flexible adaptation to different datasets.
  • FiCSUM significantly outperforms existing methods in detecting concept drift and classification accuracy.
  • The framework captures a wide range of concept behaviors, enhancing the identification of recurring concepts.
Read more
Geometry-Aware Probabilistic Circuits via Voronoi Tessellations
Sahil Sidheekh, Sriraam Natarajan
  • Introduces Voronoi tessellations as a method to enhance the geometric adaptability of probabilistic circuits.
  • Formalizes the incompatibility between Voronoi-based routing and tractable inference in PCs.
  • Presents two solutions: a certified approximate inference framework and a structural condition for exact inference.
  • Develops a differentiable relaxation for Voronoi tessellations to enable gradient-based learning.
Read more
Task-Conditioned Routing Signatures in Sparse Mixture-of-Experts Transformers
Mynampati Sri Ranganadha Avinash
NLP Large Language Models Interpretability
  • Introduction of routing signatures as a compact representation of expert activation patterns.
  • Demonstration of strong task-conditioned clustering of routing signatures in the OLMoE model.
  • Validation of routing patterns against permutation and load-balancing baselines.
  • High accuracy in task classification using routing signatures.
Read more
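A routing signature of the kind described can be sketched as the normalized histogram of top-k expert selections over a sequence (an illustrative reconstruction; the synthetic "task" logits are made up):

```python
import numpy as np

def routing_signature(router_logits, top_k=2):
    """Fraction of top-k routing slots assigned to each expert across a
    sequence. router_logits: (n_tokens, n_experts). This compact vector
    is what gets compared (and clustered) across tasks."""
    n_tokens, n_experts = router_logits.shape
    top = np.argsort(-router_logits, axis=1)[:, :top_k]
    counts = np.bincount(top.ravel(), minlength=n_experts)
    return counts / counts.sum()

rng = np.random.default_rng(0)
# Two synthetic "tasks" that bias different experts -> distinct signatures.
task_a = rng.normal(size=(64, 8)); task_a[:, 0] += 2.0
task_b = rng.normal(size=(64, 8)); task_b[:, 5] += 2.0
sig_a, sig_b = routing_signature(task_a), routing_signature(task_b)
print(np.argmax(sig_a), np.argmax(sig_b))  # each task's dominant expert
```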
Single molecule localization microscopy challenge: a biologically inspired benchmark for long-sequence modeling
Fatemeh Valeh, Monika Farsang, Radu Grosu, Gerhard SchĂĽtz
  • Introduction of SMLM-C as a benchmark for evaluating long-sequence models in biological imaging.
  • Demonstration of significant performance degradation in state space models under conditions of high temporal discontinuity.
  • Highlighting the unique challenges posed by sparse, irregular, and noise-corrupted temporal signals in SMLM data.
  • Emphasis on the necessity for improved sequence modeling methodologies to address the complexities of biological data.
Read more
High-resolution weather-guided surrogate modeling for data-efficient cross-location building energy prediction
Piragash Manmatharasan, Girma Bitsuamlak, Katarina Grolinger
Optimization Time Series Efficient ML
  • Introduces a high-resolution weather-informed surrogate modeling approach for building energy prediction.
  • Achieves cross-location generalization with minimal simulation effort, enabling zero-shot predictions.
  • Utilizes weekly weather data to capture fine-grained weather-energy relationships.
  • Evaluates multiple time-series learning strategies for optimal weather input encoding.
Read more
Sharpness-Aware Minimization for Generalized Embedding Learning in Federated Recommendation
Fengyuan Yu, Xiaohua Feng, Yuyuan Li, Changwang Zhang, Jun Wang, Chaochao Chen
  • Introduces FedRecGEL, a framework focusing on generalized item embedding learning in federated recommendation systems.
  • Reformulates the federated recommendation problem as a multi-task learning challenge, emphasizing item-centered perspectives.
  • Utilizes sharpness-aware minimization to enhance the stability and generalization of item embeddings.
  • Demonstrates significant performance improvements over existing federated recommendation methods through extensive experiments.
Read more
Beyond the Class Subspace: Teacher-Guided Training for Reliable Out-of-Distribution Detection in Single-Domain Models
Hong Yang, Devroop Kar, Qi Yu, Travis Desell, Alex Ororbia
Computer Vision Theory Efficient ML
  • Identification of Domain-Sensitivity Collapse (DSC) as a critical failure mode in single-domain OOD detection.
  • Introduction of Teacher-Guided Training (TGT) to enhance domain sensitivity in single-domain models.
  • Demonstrated significant improvements in OOD detection performance across multiple benchmarks.
  • TGT maintains in-domain classification accuracy while reducing false positive rates for OOD detection.
Read more
Monitoring and Prediction of Mood in Elderly People during Daily Life Activities
Daniel Bautista-Salinas, Joaquín Roca González, Inmaculada Méndez, Oscar Martinez Mozos
Time Series
  • Development of a wearable system for mood monitoring in elderly individuals.
  • Utilization of ecological momentary assessment (EMA) for real-time mood evaluation.
  • Machine learning classifier trained on physiological data from a wristband.
  • Promising results in mood prediction accuracy, especially for happiness and activeness.
Read more
IsoCompute Playbook: Optimally Scaling Sampling Compute for LLM RL
Zhoujun Cheng, Yutao Xie, Yuxiao Qu, Amrith Setlur, Shibo Hao, Varad Pimpalkhute, Tongtong Liang, Feng Yao, Zhengzhong Liu, Eric Xing, Virginia Smith, Ruslan Salakhutdinov, Zhiting Hu, Taylor Killian, Aviral Kumar
Reinforcement Learning Large Language Models Optimization
  • Optimal allocation of sampling compute in LLM RL is crucial for maximizing performance.
  • The number of parallel rollouts per problem increases with compute budget but saturates at higher levels.
  • Easy and hard problem sets show similar scaling trends driven by different mechanisms.
  • Performance is less sensitive to the number of unique problems per batch compared to rollouts per problem.
Read more
Security Considerations for Artificial Intelligence Agents
Ninghui Li, Kaiyuan Zhang, Kyle Polley, Jerry Ma
NLP Large Language Models Theory
  • AI agents introduce unique security vulnerabilities distinct from traditional software systems.
  • The blurring of code and data in LLM-powered agents creates new attack surfaces.
  • Existing security mechanisms are often inadequate for the dynamic and autonomous nature of AI agents.
  • A layered defense strategy is proposed to address the security challenges of AI agents.
Read more
ARROW: Augmented Replay for RObust World models
Abdulaziz Alyahya, Abdallah Al Siyabi, Markus R. Ernst, Luke Yang, Levin Kuhlmann, Gideon Kowadlo
  • ARROW introduces a dual-buffer system for memory-efficient continual reinforcement learning.
  • The algorithm is inspired by neuroscience, specifically the Complementary Learning Systems theory.
  • ARROW demonstrates reduced forgetting in tasks without shared structure compared to traditional methods.
  • The approach maintains comparable forward transfer, indicating effective knowledge retention.
Read more
Disentangled Representation Learning through Unsupervised Symmetry Group Discovery
Dang-Nhu Barthélémy, Annabi Louis, Argentieri Sylvain
Reinforcement Learning Robotics Theory
  • Introduces a method for autonomous discovery of symmetry group structures in representation learning.
  • Proves the identifiability of the true symmetry group decomposition under minimal assumptions.
  • Develops two algorithms: one for symmetry group discovery and another for LSBD representation learning.
  • Demonstrates superior performance of the proposed method over existing LSBD approaches in various environments.
Read more
Structure-Aware Epistemic Uncertainty Quantification for Neural Operator PDE Surrogates
Haoze Song, Zhihao Li, Mengyi Deng, Xin Li, Duyi Pan, Zhilu Lai, Wei Wang
Theory Optimization Efficient ML
  • Introduces a structure-aware UQ scheme for neural operators in PDE modeling.
  • Focuses on epistemic uncertainty arising from limited data and imperfect training.
  • Implements targeted perturbations in the lifting module to improve uncertainty estimates.
  • Demonstrates superior performance in uncertainty coverage and alignment on PDE benchmarks.
Read more
Higher-Order Modular Attention: Fusing Pairwise and Triadic Interactions for Protein Sequences
Shirin Amiraslani, Xin Gao
Theory Efficient ML
  • HOMA introduces triadic interactions to enhance the representation of protein sequences.
  • The method is designed to be computationally efficient, suitable for long biological sequences.
  • HOMA shows consistent performance improvements across multiple protein-related tasks.
  • The framework allows for controlled comparisons among different attention mechanisms.
Read more
Learning Tree-Based Models with Gradient Descent
Sascha Marton
Optimization Interpretability Reinforcement Learning
  • Introduces a gradient descent approach for learning decision trees, overcoming limitations of traditional methods.
  • Utilizes backpropagation on a dense decision tree representation for joint optimization of tree parameters.
  • Extends the method to tree ensembles with instance-wise weighting for improved performance and interpretability.
  • Achieves state-of-the-art results in multiple domains, including multimodal and reinforcement learning.
Read more
Deep Learning-Based Metamodeling of Nonlinear Stochastic Dynamic Systems under Parametric and Predictive Uncertainty
Haimiti Atila, Seymour M.J. Spence
Time Series
  • Introduces three metamodeling frameworks for nonlinear dynamic systems that account for both loading and parameter uncertainties.
  • Employs deep learning techniques to enhance metamodeling capabilities, overcoming limitations of traditional methods.
  • Demonstrates effective prediction uncertainty quantification through Monte Carlo dropout integrated with LSTM.
  • Validates the proposed frameworks on two distinct case studies, showcasing their adaptability and performance.
Read more
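The Monte Carlo dropout idea in the third bullet works by keeping dropout active at inference and averaging stochastic passes (a minimal sketch on a single linear layer, not the paper's LSTM metamodel; the weights and input are made up):

```python
import numpy as np

def mc_dropout_predict(weights, x, p=0.2, n_samples=200):
    """Run several stochastic forward passes with dropout left on and
    return the mean and std of the outputs: the std serves as the
    prediction-uncertainty estimate."""
    rng = np.random.default_rng(0)
    preds = []
    for _ in range(n_samples):
        mask = rng.random(weights.shape) > p        # drop each weight w.p. p
        preds.append((weights * mask / (1 - p)) @ x)  # inverted-dropout scaling
    preds = np.array(preds)
    return preds.mean(), preds.std()

w = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 2.0, 3.0])
mean, std = mc_dropout_predict(w, x)
print(mean, std)  # mean near w @ x = 4.5, with a nonzero spread
```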
Language Generation with Replay: A Learning-Theoretic View of Model Collapse
Giorgio Racca, Michal Valko, Amartya Sanyal
NLP Large Language Models Theory
  • Introduces a learning-theoretic framework for analyzing model collapse in LLMs.
  • Demonstrates that replay can limit generatability in both the non-uniform and in-the-limit generation settings.
  • Findings support existing practical methods like data cleaning and watermarking but also identify their limitations.
  • Establishes a clear distinction between the effects of replay on different generative tasks.
Read more
Scaling Reasoning Efficiently via Relaxed On-Policy Distillation
Jongwoo Ko, Sara Abdali, Young Jin Kim, Tianyi Chen, Pashmina Cameron
Reinforcement Learning Large Language Models Efficient ML
  • REOPOLD stabilizes on-policy distillation by relaxing strict imitation constraints.
  • The framework integrates modern RL insights to enhance training efficiency.
  • Empirical results show significant improvements in sample efficiency and test-time scaling.
  • REOPOLD allows smaller models to perform comparably to larger models in reasoning tasks.
Read more
Jailbreak Scaling Laws for Large Language Models: Polynomial-Exponential Crossover
Indranil Halder, Annesya Banerjee, Cengiz Pehlevan
NLP Large Language Models Theory
  • Adversarial prompt-injection can amplify attack success rates from polynomial to exponential growth.
  • A theoretical model based on spin-glass theory provides insights into the behavior of LLMs under adversarial conditions.
  • Short prompts lead to polynomial scaling of attack success rates, while long prompts result in exponential scaling.
  • Empirical validation shows varying attack susceptibility across different LLMs, correlating with their reasoning abilities.
Read more
AutoScout: Structured Optimization for Automating ML System Configuration
Jimmy Shong, Yuhan Ding, Yihan Jiang, Liheng Jing, Haonan Chen, Gaokai Zhang, Aditya Akella, Fan Lai
Optimization Efficient ML
  • AutoScout formulates ML system configuration as a mixed-discrete/continuous optimization problem.
  • It employs a hybrid optimization framework that integrates sparse and dense parameter optimization.
  • AutoScout achieves 2.7–3.0× training speedup compared to expert-tuned settings.
  • The system is 13.7–16.5× faster than existing system configurators.
Read more
Topological DeepONets and a generalization of the Chen-Chen operator approximation theorem
Vugar Ismailov
Theory
  • Introduction of Topological DeepONets for approximating nonlinear operators in locally convex spaces.
  • Generalization of the Chen-Chen operator approximation theorem to encompass broader function spaces.
  • Construction of a neural network architecture that utilizes continuous linear functionals for input processing.
  • Demonstration of uniform approximation capabilities for continuous operators on compact sets.
Read more
Exploiting Expertise of Non-Expert and Diverse Agents in Social Bandit Learning: A Free Energy Approach
Erfan Mirzaei, Seyed Pooya Shariatpanahi, Alireza Tavakoli, Reshad Hosseini, Majid Nili Ahmadabadi
Reinforcement Learning Theory Optimization
  • Introduces a free energy-based social bandit learning algorithm that integrates individual and social learning.
  • Proves theoretical convergence to the optimal policy without requiring shared rewards or social norms.
  • Demonstrates improved learning performance in the presence of non-expert agents.
  • Maintains logarithmic regret, indicating efficient exploration and exploitation.
Read more