AI-generated summaries

Today's ML research,
without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

64 Papers today
8h Update frequency
7 Days of history
Dummy-Aware Weighted Attack (DAWA): Breaking the Safe Sink in Dummy Class Defenses
Yunrui Yu, Xuxiang Feng, Pengda Qin, Pengyang Wang, Kafeng Wang, Cheng-zhong Xu, Hang Su, Jun Zhu
Theory Optimization
  • Conventional evaluation methods overestimate the robustness of dummy-class defenses.
  • DAWA introduces a dual-targeting approach that simultaneously attacks both the true and dummy labels.
  • Extensive experiments show a significant reduction in the measured robustness of dummy-class defenses.
  • The study highlights the need for continuous evolution in adversarial robustness evaluation methodologies.
Read more
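As a rough illustration of the dual-targeting idea above (the summary does not spell out the DAWA objective, so this is an assumed toy formulation): an attack loss that penalizes probability mass on both the true class and the dummy "safe sink" drives the prediction into some other, non-dummy wrong class when minimized.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def dual_target_loss(logits, true_idx, dummy_idx):
    # Penalize probability on BOTH the true label and the dummy "safe sink",
    # so minimizing this loss pushes the prediction to another wrong class.
    p = softmax(logits)
    return float(np.log(p[true_idx] + 1e-12) + np.log(p[dummy_idx] + 1e-12))
```

A single-target attack that only suppresses the true class can be absorbed by the dummy class; the second term is what "breaks the safe sink."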
Curvature-Guided LoRA: Steering in the pretrained NTK subspace
Frédéric Zheng, Alexandre Proutière
NLP Efficient ML Optimization
  • Introduction of the prediction alignment problem focusing on model outputs.
  • Development of Curvature-Guided LoRA (CG-LoRA) leveraging curvature information for low-rank updates.
  • Demonstration of improved performance and faster convergence compared to existing LoRA variants.
  • Emphasis on the significance of function-space alignment in parameter-efficient fine-tuning.
Read more
Lie Generator Networks for Nonlinear Partial Differential Equations
Shafayeth Jamil, Rehan Kapadia
Theory Interpretability Time Series
  • Introduction of LGN-KM, a neural operator for nonlinear PDEs that lifts dynamics into a linear space.
  • Structured decomposition of the Koopman generator enhances stability and interpretability.
  • Successful application on Navier–Stokes turbulence, recovering known physical properties from data alone.
  • Demonstration of gauge invariance across different flow regimes.
Read more
An Explicit Surrogate for Gaussian Mixture Flow Matching with Wasserstein Gap Bounds
Elham Rostami, Taous-Meriem Laleg-Kirati, Hamidou Tembine
Optimization Generative Models Theory
  • Development of a closed-form surrogate for Gaussian mixture transport using affine flow dynamics.
  • Establishment of second-order agreement between the surrogate and the exact Gaussian Wasserstein cost under local commuting assumptions.
  • Derivation of an explicit cubic bound on the surrogate-Wasserstein gap in local commuting regimes.
  • Introduction of a path-splitting strategy for improved error control in nonlocal transport scenarios.
Read more
Beta-Scheduling: Momentum from Critical Damping as a Diagnostic and Correction Tool for Neural Network Training
Ivan Pasichnyk
Optimization Interpretability Efficient ML
  • Introduces a diagnostic pipeline that connects damping regimes, error-specific gradient attribution, and surgical layer correction.
  • Successfully identifies and corrects errors in specific layers of neural networks without full retraining.
  • Demonstrates cross-optimizer invariance in identifying problematic layers, suggesting architectural issues rather than optimizer artifacts.
  • Achieves significant computational savings (82%) and performance improvements (+22) compared to full retraining.
Read more
Variational Graph Neural Networks for Uncertainty Quantification in Inverse Problems
David Gonzalez, Alba Muixi, Beatriz Moya, Elias Cueto
Graph Learning
  • Introduces a hybrid architecture combining graph neural networks with variational inference for uncertainty quantification.
  • Addresses limitations of traditional deterministic methods in inverse problems by providing measures of confidence in predictions.
  • Demonstrates high precision in recovering physical parameters and estimating loads with associated confidence intervals.
  • Validates methodology through practical applications in solid mechanics, showcasing its effectiveness in real-world scenarios.
Read more
Mitigating Forgetting in Continual Learning with Selective Gradient Projection
Anika Singh, Aayush Dhaulakhandi, Varun Chopade, Likhith Malipati, David Martinez, Kevin Zhu
Optimization Theory Efficient ML
  • Introduction of Selective Forgetting-Aware Optimization (SFAO) to mitigate catastrophic forgetting.
  • Utilization of cosine similarity and per-layer gating for controlled gradient updates.
  • Achieves a 90% reduction in memory cost while maintaining competitive accuracy.
  • Demonstrates improved performance on continual learning benchmarks, especially with MNIST datasets.
Read more
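The cosine-similarity gating described above can be sketched generically (an A-GEM-style projection assumed for illustration; the summary does not give SFAO's exact update rule): when the current task's gradient conflicts with a stored reference gradient, the conflicting component is projected out.

```python
import numpy as np

def gated_project(g_new, g_ref, eps=1e-12):
    # Per-layer gate: only intervene when the current-task gradient
    # conflicts with the stored reference gradient (negative cosine).
    cos = g_new @ g_ref / (np.linalg.norm(g_new) * np.linalg.norm(g_ref) + eps)
    if cos >= 0.0:
        return g_new
    # Project out the conflicting component along the reference direction.
    return g_new - (g_new @ g_ref) / (g_ref @ g_ref + eps) * g_ref
```

Applying the gate per layer, rather than to one flattened gradient, lets benign layers update freely while only conflicting layers are corrected.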
Foundations of Polar Linear Algebra
Giovanni Guasti
Theory Efficient ML Interpretability
  • Introduction of Polar Linear Algebra as a structured framework for operator learning.
  • Demonstrated effectiveness on the MNIST benchmark, showing reliable training of polar operators.
  • Imposing self-adjoint-inspired spectral constraints improves training stability and convergence.
  • Reduction in parameter count and computational complexity while enhancing interpretability.
Read more
Target-Aligned Reinforcement Learning
Leonard S. Pleiss, James Harrison, Maximilian Schiffer
Reinforcement Learning Theory Optimization
  • TARL mitigates the stability-recency tradeoff by prioritizing updates based on alignment between target and online network estimates.
  • A novel offline-online target alignment metric is introduced to quantify the agreement between value estimates.
  • Theoretical analysis indicates that learning on aligned transitions acts as a variance reduction mechanism, improving learning efficiency.
  • Empirical results show consistent improvements over standard RL algorithms in various environments.
Read more
From Density Matrices to Phase Transitions in Deep Learning: Spectral Early Warnings and Interpretability
Max Hennick, Guillaume Corlouer
Theory Interpretability Optimization
  • Introduction of the 2-datapoint reduced density matrix (2RDM) for studying phase transitions in neural networks.
  • Derivation of spectral diagnostics: spectral heat capacity for early warning of second-order transitions and participation ratio for dimensionality assessment.
  • Top eigenvectors of the 2RDM offer mechanistic insights into the nature of transitions.
  • Validation of the framework across multiple distinct settings.
Read more
Reward-Based Online LLM Routing via NeuralUCB
Ming-Hua Tsai, Phat Tran
Large Language Models NLP Reinforcement Learning
  • NeuralUCB is proposed as a novel approach for cost-aware LLM routing.
  • The method outperforms random and min-cost baselines in utility reward.
  • Achieves lower inference costs while maintaining competitive rewards compared to max-quality models.
  • The study highlights challenges in action discrimination and exploration.
Read more
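NeuralUCB scores each candidate model with a neural reward estimate plus an uncertainty bonus; the linear special case (LinUCB) below shows the routing loop in miniature. The arm/feature setup is an illustrative assumption, not the paper's configuration.

```python
import numpy as np

class LinUCBRouter:
    """Linear-UCB router over a set of LLM 'arms'. NeuralUCB swaps the
    linear reward model for a neural network with gradient-based bonuses."""
    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]    # per-arm covariance
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # per-arm reward sums

    def select(self, x):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            bonus = self.alpha * np.sqrt(x @ A_inv @ x)
            scores.append(theta @ x + bonus)  # mean estimate + exploration
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```

A cost-aware variant would fold the per-arm inference cost into the reward before calling `update`, which is how routing trades quality against cost.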
HCLSM: Hierarchical Causal Latent State Machines for Object-Centric World Modeling
Jaber Jaber, Osama Jaber
Robotics Computer Vision Graph Learning
  • HCLSM integrates object-centric decomposition, hierarchical temporal dynamics, and causal reasoning in a single differentiable architecture.
  • The two-stage training protocol enhances model performance by first specializing object slots before predicting dynamics.
  • The model achieves state-of-the-art performance on the PushT benchmark with significant improvements in prediction accuracy and speed.
  • A custom Triton kernel optimizes the selective state space model scan, drastically reducing computational time.
Read more
On the Asymptotics of Self-Supervised Pre-training: Two-Stage M-Estimation and Representation Symmetry
Mohammad Tinati, Stephen Tu
Theory
  • Develops an asymptotic theory for self-supervised pre-training using two-stage M-estimation.
  • Addresses the challenge of group symmetry in pre-training estimators through Riemannian geometry.
  • Establishes a link between pre-training representations and downstream predictors via orbit-invariance.
  • Applies theoretical results to case studies, showing substantial improvements over prior work.
Read more
Preconditioned Attention: Enhancing Efficiency in Transformers
Hemanth Saratchandran
Efficient ML Optimization Theory
  • Standard attention mechanisms in Transformers can produce ill-conditioned matrices, negatively impacting training efficiency.
  • Preconditioned attention introduces a conditioning matrix to improve the condition number of attention matrices.
  • The method is theoretically grounded and empirically validated across multiple transformer applications.
  • Preconditioned attention is compatible with various existing attention mechanisms, enhancing their performance.
Read more
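As a generic illustration of why conditioning matters (the paper's conditioning matrix is not specified in the summary, so a simple Jacobi/diagonal preconditioner on a synthetic ill-conditioned matrix stands in here):

```python
import numpy as np

# A synthetic matrix whose rows live on very different scales, giving a
# large condition number -- the failure mode described above.
A = np.array([[100.0, 0.1],
              [0.1,   1.0]])

# Jacobi preconditioning: rescale by the inverse diagonal so every row is
# O(1), which collapses the spread of singular values.
D_inv = np.diag(1.0 / np.diag(A))
cond_before = np.linalg.cond(A)
cond_after = np.linalg.cond(D_inv @ A)
```

A smaller condition number means signals pass through the matrix more uniformly during backpropagation, which is the training-efficiency effect the paper targets.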
Kernel Dynamics under Path Entropy Maximization
Jnaneshwar Das
Theory
  • The kernel function is treated as a dynamical variable, influencing the optimization landscape of inference.
  • Fixed points of the dynamics correspond to self-consistent kernels that reinforce their own distinction structures.
  • Kernel change incurs a thermodynamic cost, establishing a link between information theory and thermodynamics.
  • The framework connects various domains, including biology, learning, and craft mastery, through structured correspondences.
Read more
Capturing Multivariate Dependencies of EV Charging Events: From Parametric Copulas to Neural Density Estimation
Martin Výboh, Gabriela Grmanová
Time Series
  • Introduces Vine copulas and CODINE for modeling EV charging events.
  • Demonstrates superior performance in capturing multivariate dependencies compared to traditional methods.
  • Evaluates models on diverse real-world datasets, enhancing generalizability.
  • Preserves tail behaviors and correlation structures effectively.
Read more
Monodense Deep Neural Model for Determining Item Price Elasticity
Lakshya Garg, Sai Yaswanth, Deep Narayan Mishra, Karthik Kumaran, Anupriya Sharma, Mayank Uniyal
Optimization Theory Time Series
  • Proposes a novel framework for estimating item price elasticity using large-scale transactional data.
  • Introduces the Monodense deep neural network, which combines various neural network layers for improved performance.
  • Demonstrates the ability to model price elasticity without requiring control/treatment groups, making it scalable for millions of items.
  • Shows superior performance of the proposed model compared to traditional econometric and machine learning methods.
Read more
Concept frustration: Aligning human concepts and machine representations
Enrico Parisini, Christopher J. Soelistyo, Ahab Isaac, Alessandro Barp, Christopher R.S. Banerji
Interpretability
  • Introduces the concept of 'concept frustration' to describe contradictions in known concepts due to unobserved concepts.
  • Develops task-aligned similarity measures for detecting concept frustration in machine learning models.
  • Demonstrates that frustration can degrade both performance and interpretability in concept-based models.
  • Provides a closed-form expression for Bayes-optimal classifier accuracy, highlighting the impact of frustration.
Read more
PRISM: PRIor from corpus Statistics for topic Modeling
Tal Ishon, Yoav Goldberg, Uri Shaham
NLP
  • PRISM enhances LDA by using corpus-intrinsic statistics for initialization.
  • The method improves topic coherence and interpretability without relying on external knowledge.
  • Empirical results show PRISM's effectiveness across diverse datasets, including text and biological data.
  • The approach is particularly beneficial in domains with limited external resources.
Read more
Stochastic Dimension Implicit Functional Projections for Exact Integral Conservation in High-Dimensional PINNs
Zhangyong Liang
Theory Optimization Efficient ML
  • Introduces the Stochastic Dimension Implicit Functional Projection (SDIFP) framework for enforcing exact conservation laws in PINNs.
  • Bypasses the need for deterministic quadrature and spatial grid dependencies, enhancing scalability in high dimensions.
  • Implements a doubly-stochastic unbiased gradient estimator (DS-UGE) to reduce memory complexity during optimization.
  • Maintains O(1) point-wise inference efficiency while ensuring mathematical regularity of solutions.
Read more
Label-efficient Training Updates for Malware Detection over Time
Luca Minnei, Cristian Manca, Giorgio Piras, Angelo Sotgiu, Maura Pintor, Daniele Ghiani, Davide Maiorca, Giorgio Giacinto, Battista Biggio
Efficient ML
  • Proposes a model-agnostic framework combining active learning and semi-supervised learning for malware detection.
  • Demonstrates a reduction in manual labeling costs by up to 90% while maintaining detection performance.
  • Introduces a feature-level drift analysis methodology to understand feature stability and its impact on performance.
  • Evaluates a comprehensive set of AL and SSL techniques across multiple platforms (Android and Windows).
Read more
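The active learning half of such a framework typically reduces to an acquisition function; a minimal entropy-based version (an assumed illustration, not necessarily the paper's specific AL strategy) looks like:

```python
import numpy as np

def entropy_query(probs, k):
    # Entropy of each sample's predicted class distribution: high entropy
    # means the detector is unsure, so the sample is worth a manual label.
    ent = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    return np.argsort(-ent)[:k]  # indices of the k most uncertain samples
```

Semi-supervised learning then covers the rest of the pool, e.g. by pseudo-labeling high-confidence (low-entropy) samples, which is how large labeling-cost reductions become possible.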
Tucker Attention: A generalization of approximate attention mechanisms
Timon Klein, Jonas Kusch, Sebastian Sager, Stefan Schnake, Steffen Schotthöfer
NLP Large Language Models Efficient ML
  • Tucker Attention generalizes existing approximate attention mechanisms, providing a more interpretable framework.
  • It significantly reduces the number of parameters required for self-attention while maintaining performance.
  • The method encompasses existing techniques like GQA and MLA, offering insights into their low-rank structures.
  • Tucker Attention is compatible with advanced attention techniques such as flash-attention and RoPE.
Read more
A Tight Expressivity Hierarchy for GNN-Based Entity Resolution in Master Data Management
Ashwin Ganesan
Graph Learning Theory Efficient ML
  • Introduces a separation theory for entity resolution using MPNNs on typed entity-attribute graphs.
  • Establishes necessary and sufficient conditions for various entity resolution tasks.
  • Demonstrates a complexity gap between detecting single and multiple shared attributes.
  • Proposes a minimal-architecture principle for selecting MPNN adaptations based on task requirements.
Read more
ERPO: Token-Level Entropy-Regulated Policy Optimization for Large Reasoning Models
Song Yu, Li Li
NLP Large Language Models Reinforcement Learning
  • ERPO improves reasoning in LLMs by focusing on token-level dynamics instead of sequence-level advantages.
  • Critical Decision Pivots (CDPs) are identified as crucial points where the model's reasoning is most sensitive to perturbations.
  • The methodology includes Entropy-aware Gating, Bucket-based Implicit Normalization, and Result-anchored Advantage Synthesis.
  • Extensive experiments show ERPO significantly enhances reasoning accuracy and produces more concise derivation paths compared to GRPO.
Read more
Derived Fields Preserve Fine-Scale Detail in Budgeted Neural Simulators
Wenshuo Wang, Fan Zhang
Optimization Theory Efficient ML
  • Introduces Derived-Field Optimization (DerivOpt) for state design in neural simulators.
  • Demonstrates that primitive and derived fields retain detail differently under fixed storage budgets.
  • Shows that fine-scale fidelity can be significantly improved by optimizing the choice of carried fields.
  • Empirical results indicate that carried-state design is a critical factor in neural simulation performance.
Read more
Efficient and Scalable Granular-ball Graph Coarsening Method for Large-scale Graph Node Classification
Guan Wang, Shuyin Xia, Lei Qian, Guoyin Wang, Yi Liu, Yi Wang, Wei Wang
Graph Learning Efficient ML
  • Introduces a multi-granularity granular-ball graph coarsening algorithm.
  • Achieves linear time complexity in the graph coarsening process.
  • Enhances training efficiency and scalability of GCNs for large-scale datasets.
  • Demonstrates superior performance in node classification tasks compared to existing methods.
Read more
HISA: Efficient Hierarchical Indexing for Fine-Grained Sparse Attention
Yufei Xu, Fanxu Meng, Fan Jiang, Yuxuan Wang, Ruijie Zhou, Jiexi Wu, Zhixin Pan, Zhaohui Wang, Xiaojuan Tang, Wenjie Pei, Tongxuan Liu, Di yin, Xing Sun, Muhan Zhang
Large Language Models Efficient ML
  • HISA provides a two-stage hierarchical indexing approach that significantly reduces indexing complexity.
  • The method achieves 2-4× speedup in kernel-level benchmarks without compromising selection quality.
  • HISA is a drop-in replacement for existing indexers, requiring no retraining or architectural modifications.
  • Empirical results show that HISA closely matches the performance of traditional sparse attention mechanisms.
Read more
Reducing Oracle Feedback with Vision-Language Embeddings for Preference-Based RL
Udita Ghosh, Dripta S. Raychaudhuri, Jiachen Li, Konstantinos Karydis, Amit Roy-Chowdhury
Reinforcement Learning Robotics Multimodal
  • ROVED combines vision-language embeddings with targeted oracle feedback for efficient preference-based reinforcement learning.
  • The framework reduces oracle queries by up to 80% while maintaining performance comparable to oracle-only methods.
  • A parameter-efficient fine-tuning method enhances the quality of VLE-generated preferences.
  • The adapted VLE demonstrates strong cross-task generalization, yielding up to 90% cumulative annotation savings.
Read more
Deep Learning-Based Anomaly Detection in Spacecraft Telemetry on Edge Devices
Christopher Goetze, Tim Schlippe, Daniel Lakey
Time Series Optimization Efficient ML
  • Three deep learning approaches for anomaly detection in spacecraft telemetry were evaluated.
  • The forecasting & threshold method outperformed other approaches with a CEF0.5 of 92.7%.
  • Neural architecture optimization significantly reduced model size and computational requirements.
  • Optimized models can operate within the stringent constraints of space-grade hardware.
Read more
Improving Efficiency of GPU Kernel Optimization Agents using a Domain-Specific Language and Speed-of-Light Guidance
Siva Kumar Sastry Hari, Vignesh Balaji, Sana Damani, Qijing Huang, Christos Kozyrakis
Optimization Efficient ML
  • Introduction of µCUTLASS, a compact DSL for GPU kernel optimization.
  • Implementation of Speed-of-Light (SOL) guidance to improve optimization efficiency.
  • Demonstrated significant speedups in kernel performance over traditional methods.
  • Reduction in token costs while maintaining high performance.
Read more
From Astronomy to Astrology: Testing the Illusion of Zodiac-Based Personality Prediction with Machine Learning
Abhinna Sundar Samantaray, Finnja Annika Fluhrer, Dhruv Saini, Omkar Charaple, Anish Kumar Singh, Dhruv Vansraj Rathore
Theory
  • Astrology lacks a credible causal mechanism and has not demonstrated predictive validity.
  • A synthetic dataset was created to test zodiac-based personality predictions using machine learning.
  • Machine learning classifiers showed performance indistinguishable from random chance.
  • The success of astrology is attributed to cognitive biases and the overlap of common personality traits.
Read more
ATLAS-RTC: Closing the Loop on LLM Agent Output with Token-Level Runtime Control
Christopher Cruz
Large Language Models NLP Generative Models
  • ATLAS-RTC intercepts token generation at the logit level, allowing for real-time corrections.
  • The system employs a graduated intervention policy to ensure structured outputs without modifying model weights.
  • Significant improvements in JSON schema satisfaction (from 56.7% to 76.7%) and tool call reliability (from 28.3% to 58.3%) were achieved.
  • The paper provides an honest characterization of failure modes and conditions under which ATLAS-RTC may degrade.
Read more
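The core mechanic of logit-level interception can be sketched as follows (a toy always-on mask; the token IDs and constraint are illustrative, and the paper's graduated policy is more selective):

```python
import numpy as np

def constrained_step(logits, allowed_ids):
    # Intercept at the logit level: tokens the structural checker disallows
    # are masked to -inf, then decoding proceeds (greedily here) as usual.
    masked = np.full_like(logits, -np.inf)
    masked[allowed_ids] = logits[allowed_ids]
    return int(np.argmax(masked))
```

A graduated policy would only apply the mask when the output starts to drift from the required structure, rather than intervening at every step as this toy does.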
Key-Embedded Privacy for Decentralized AI in Biomedical Omics
Rongyu Zhang, Hongyu Dong, Gaole Dai, Ziqi Qiao, Shenli Zheng, Yuan Zhang, Aosong Cheng, Xiaowei Chi, Jincai Luo, Pin Li, Li Du, Dan Wang, Yuan Du, Xudong Xing, Jianxu Chen, Shanghang Zhang
Federated Learning
  • Introduction of INFL, a lightweight federated learning method for biomedical applications.
  • Integration of a secret key into the model architecture to enhance privacy.
  • Demonstrated effectiveness across diverse biomedical omics tasks.
  • Maintains model utility while ensuring strong privacy controls.
Read more
The Spectral Edge Thesis: A Mathematical Framework for Intra-Signal Phase Transitions in Neural Network Training
Yongzhong Xu
Theory
  • The Spectral Edge Thesis provides a new mathematical framework for understanding phase transitions in neural network training.
  • Empirical evidence shows that gap dynamics in the Gram matrix are crucial for predicting grokking events.
  • The framework is architecture-agnostic and relies on NTK eigenvalues and Hessian curvatures.
  • The study confirms 19 out of 20 predictions, highlighting the robustness of the proposed framework.
Read more
Causality-inspired Federated Learning for Dynamic Spatio-Temporal Graphs
Yuxuan Liu, Wenchao Xu, Haozhao Wang, Zhiming He, Zhaofeng Shi, Chongyang Xu, Peichao Wang, Boyuan Zhang
Graph Learning Federated Learning Time Series
  • SC-FSGL addresses the challenges of representation entanglement and negative transfer in Federated Learning for dynamic graphs.
  • The framework introduces a Conditional Separation Module to decouple transferable causal knowledge from client-specific noise.
  • A Causal Codebook is utilized to promote knowledge sharing and consistency across clients through contrastive learning.
  • Experiments show significant performance improvements over existing state-of-the-art methods on various datasets.
Read more
Mind the Gap: A Framework for Assessing Pitfalls in Multimodal Active Learning
Dustin Eisenhardt, Yunhee Jeong, Florian Buettner
Multimodal
  • Identification of three critical pitfalls in multimodal active learning: missing modalities, modality imbalance, and varying interaction structures.
  • Development of a benchmarking framework using synthetic datasets to isolate and analyze the effects of these pitfalls.
  • Empirical comparison of unimodal and multimodal query strategies, demonstrating that existing methods do not adequately address the identified challenges.
  • Findings indicate that models often rely on a single modality, leading to imbalanced representations.
Read more
Learning to Select Visual In-Context Demonstrations
Eugene Lee, Yu-Chi Lin, Jiajie Diao
Multimodal Reinforcement Learning Computer Vision
  • Introduction of LSD, a framework that reformulates demonstration selection as a sequential decision-making problem.
  • Utilization of a Dueling DQN agent to learn optimal demonstration sets that maximize MLLM performance.
  • Identification of a task-dependent dichotomy in visual ICL, highlighting the effectiveness of kNN for subjective tasks and LSD for objective tasks.
  • Comprehensive evaluation across five visual regression benchmarks demonstrating the superiority of the proposed method.
Read more
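The kNN baseline in the dichotomy above is simple to state (the embedding source and pool are illustrative assumptions):

```python
import numpy as np

def knn_demos(query_emb, pool_embs, k):
    # Cosine-similarity kNN: return indices of the k pool examples closest
    # to the query embedding -- the retrieval baseline LSD is compared with.
    q = query_emb / np.linalg.norm(query_emb)
    p = pool_embs / np.linalg.norm(pool_embs, axis=1, keepdims=True)
    return np.argsort(-(p @ q))[:k]
```

LSD replaces this one-shot retrieval with a Dueling DQN that picks demonstrations sequentially, so each pick can condition on the ones already chosen.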
On the Mirage of Long-Range Dependency, with an Application to Integer Multiplication
Zichao Wei
Theory
  • Long-range dependency in integer multiplication is a mirage, not an intrinsic property.
  • Representing integers in a 2D grid allows multiplication to be performed with local operations.
  • A neural cellular automaton with minimal parameters can achieve high generalization in multiplication tasks.
  • Existing architectures like Transformers struggle with multiplication due to their reliance on 1D representations.
Read more
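The 2D-grid claim above can be made concrete without any learning: write the digits of the operands along the two axes, fill each cell with a one-digit product, and resolve the grid with purely local anti-diagonal sums and carries. The code is a classical illustration of rules of this flavor, not the paper's neural cellular automaton.

```python
def grid_multiply(a, b):
    """Schoolbook multiplication on a 2D grid: cell (i, j) holds
    digit_a[i] * digit_b[j]; anti-diagonals are summed with local carries."""
    da = [int(d) for d in str(a)][::-1]  # least-significant digit first
    db = [int(d) for d in str(b)][::-1]
    grid = [[x * y for y in db] for x in da]
    out, carry = [], 0
    for k in range(len(da) + len(db) - 1):
        s = carry + sum(grid[i][k - i] for i in range(len(da)) if 0 <= k - i < len(db))
        out.append(s % 10)
        carry = s // 10
    while carry:
        out.append(carry % 10)
        carry //= 10
    return int("".join(map(str, out[::-1])))
```

Every operation touches only one anti-diagonal and a carry from its neighbor, which is why a 1D token sequence (forcing apparent long-range dependencies) is the hard part, not the arithmetic itself.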
ARCS: Autoregressive Circuit Synthesis with Topology-Aware Graph Attention and Spec Conditioning
Tushar Dhananjay Pathak
Reinforcement Learning Generative Models Graph Learning
  • ARCS generates complete analog circuit designs in milliseconds, significantly faster than traditional methods.
  • Achieves 99.9% simulation validity with only 8 SPICE evaluations, a substantial reduction compared to genetic algorithms.
  • Introduces Group Relative Policy Optimization (GRPO) to improve reinforcement learning for multi-topology circuit design.
  • Utilizes grammar-constrained decoding to ensure 100% structural validity of generated circuits.
Read more
MR-ImagenTime: Multi-Resolution Time Series Generation through Dual Image Representations
Xianyong Xu, Yuanjun Zuo, Zhihong Huang, Yihan Qin, Haoxian Xu, Leilei Du, Haotian Wang
Time Series Generative Models
  • MR-CDM effectively handles variable-length time series without fixed input windows.
  • The framework incorporates multi-scale trend decomposition to model temporal patterns at different resolutions.
  • Experiments show MR-CDM significantly outperforms existing state-of-the-art models in forecasting accuracy.
  • The proposed method enhances the robustness of forecasts across heterogeneous sequence lengths and temporal scales.
Read more
Quality-Controlled Active Learning via Gaussian Processes for Robust Structure-Property Learning in Autonomous Microscopy
Jawad Chowdhury, Ganesh Narasimha, Jan-Chi Yang, Yongtao Liu, Rama Vasudevan
Optimization Efficient ML Robotics
  • Introduces ActiveQC, a gated active learning framework that prioritizes high-quality data acquisition.
  • Combines curiosity-driven sampling with physics-informed quality control to mitigate the effects of noisy data.
  • Demonstrates superior performance over traditional active learning methods in structure-property learning tasks.
  • Successfully applied in real-time autonomous microscopy experiments, validating its practical utility.
Read more
Optimistic Actor-Critic with Parametric Policies for Linear Markov Decision Processes
Max Qiushi Lin, Reza Asad, Kevin Tan, Haque Ishfaq, Csaba Szepesvari, Sharan Vaswani
Reinforcement Learning Theory Optimization
  • Introduces an optimistic actor-critic framework for linear MDPs with parametric policies.
  • Utilizes logit-matching regression for the actor and Langevin Monte Carlo for the critic.
  • Achieves state-of-the-art sample complexity in both on-policy and off-policy settings.
  • Demonstrates practical applicability through experiments in linear MDPs and Atari environments.
Read more
Hybrid Quantum-Classical Spatiotemporal Forecasting for 3D Cloud Fields
Fu Wang, Qifeng Lu, Xinyu Long, Meng Zhang, Xiaofei Yang, Weijia Cao, Xiaowen Chu
Time Series
  • Introduction of QENO, a hybrid quantum-classical framework for 3D cloud forecasting.
  • Utilization of a topology-aware quantum enhancement block to model nonlocal interactions.
  • Development of a dynamic fusion temporal unit that integrates quantum features with classical memory.
  • Demonstrated superior performance over existing forecasting models in terms of accuracy and structural fidelity.
Read more
Symbolic Density Estimation: A Decompositional Approach
Angelo Rajendram, Xieting Chu, Vijay Ganesh, Max Fieg, Aishik Ghosh
Theory Interpretability
  • Introduction of AI-Kolmogorov for Symbolic Density Estimation (SymDE).
  • Multi-stage pipeline includes decomposition, nonparametric estimation, support estimation, and symbolic regression.
  • Demonstrated efficacy on synthetic and exotic distributions, including applications in high-energy physics.
  • Addresses challenges in ensuring valid probability distributions and discovering complex symbolic expressions.
Read more
Physics-Guided Transformer (PGT): Physics-Aware Attention Mechanism for PINNs
Ehsan Zeraatkar, Rodion Podorozhny, Jelena Tešić
Theory Efficient ML Optimization
  • Introduction of Physics-Guided Transformer (PGT) that integrates physical structure into self-attention.
  • PGT achieves significant improvements in reconstruction accuracy and stability compared to traditional PINNs and other methods.
  • Utilizes a heat-kernel-derived additive bias to enforce physical consistency in attention mechanisms.
  • Demonstrates effective performance on both diffusion-dominated and convection-dominated systems.
Read more
From Physics to Surrogate Intelligence: A Unified Electro-Thermo-Optimization Framework for TSV Networks
Mohamed Gharib, Leonid Popryho, Inna Partin-Vaisband
Optimization Graph Learning
  • Introduces a unified framework for electro-thermal modeling and optimization of TSV networks.
  • Combines physics-informed analytical modeling with GNN surrogates for efficient design-space exploration.
  • Achieves significant reduction in computational time for TSV configuration evaluations.
  • Demonstrates strong validation results against traditional FEM methods.
Read more
Physics-Informed Framework for Impact Identification in Aerospace Composites
Natália Ribeiro Marinho, Richard Loendersloot, Jan Willem Wiegman, Frank Grooteman, Tiedo Tinga
Theory Interpretability
  • Introduction of a physics-informed framework for impact identification in aerospace composites.
  • Integration of physical knowledge with data-driven inference to enhance reliability.
  • Demonstrated capability to infer impact parameters with high accuracy under challenging conditions.
  • Stable performance even with reduced data and increased noise, indicating robustness.
Read more
Match or Replay: Self Imitating Proximal Policy Optimization
Gaurav Chaudhary, Laxmidhar Behera, Washim Uddin Mondal
Reinforcement Learning Robotics Efficient ML
  • Introduction of Self-Imitating Proximal Policy Optimization (SIPP) for improved exploration and sample efficiency.
  • MATCH strategy utilizes optimal transport to prioritize rewarding state-action transitions in dense reward settings.
  • REPLAY strategy enhances learning in sparse reward environments by replaying successful trajectories.
  • Empirical validation shows SIPP outperforms state-of-the-art self-imitating RL methods.
Read more
Meteorology-Driven GPT4AP: A Multi-Task Forecasting LLM for Atmospheric Air Pollution in Data-Scarce Settings
Prasanjit Dey, Soumyabrata Dev, Bianca Schoen-Phelan
Large Language Models Time Series Efficient ML
  • GPT4AP is a parameter-efficient multi-task forecasting model for air pollution.
  • The model utilizes a pre-trained GPT-2 backbone with adaptations to reduce trainable parameters.
  • It demonstrates superior performance in few-shot and zero-shot learning scenarios compared to existing models.
  • GPT4AP maintains competitive accuracy in long-term forecasting with full training data.
Read more
Distributed Online Submodular Maximization under Communication Delays: A Simultaneous Decision-Making Approach
Zirui Xu, Vasileios Tzoumas
Optimization Robotics Theory
  • Introduces the DOG algorithm for distributed online submodular maximization.
  • Addresses communication delays that hinder existing sequential and one-hop coordination methods.
  • Establishes a trade-off between coordination performance and convergence time based on network structure.
  • Provides theoretical performance guarantees and approximation ratios for DOG.
Read more
InkDrop: Invisible Backdoor Attacks Against Dataset Condensation
He Yang, Dongyi Lv, Song Ma, Wei Xi, Zhi Wang, Hanlin Gu, Yajie Wang
Computer Vision Efficient ML Theory
  • InkDrop enhances stealthiness in backdoor attacks against Dataset Condensation.
  • The method utilizes model uncertainty near decision boundaries to create effective perturbations.
  • InkDrop maintains model utility while embedding malicious behavior into condensed datasets.
  • Extensive experiments validate the effectiveness and imperceptibility of the proposed attack.
Read more
Realistic Market Impact Modeling for Reinforcement Learning Trading Environments
Lucas Riera Abbade, Anna Helena Reali Costa
Reinforcement Learning Optimization Time Series
  • Introduction of a suite of Gymnasium-compatible trading environments with realistic market impact models.
  • Significant differences in trading behavior and performance metrics when using the AC model compared to a fixed cost model.
  • Hyperparameter optimization is crucial for improving out-of-sample performance and preventing pathological trading behaviors.
  • The choice of algorithm interacts with the cost model in environment-specific ways, affecting overall trading outcomes.
Read more
Subspace Optimization for Backpropagation-Free Continual Test-Time Adaptation
Damian Sójka, Sebastian Cygert, Marc Masana
Optimization Efficient ML
  • Introduction of PACE, a backpropagation-free TTA method optimizing normalization layers.
  • Utilization of CMA-ES and Fastfood projections to enhance adaptation capabilities.
  • Dynamic stopping criterion to minimize computational overhead during stable domains.
  • Integration of a domain-specialized vector bank for rapid adaptation to recurring domains.
Read more
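The backpropagation-free idea above can be illustrated with a much simpler gradient-free optimizer: a (1+1) evolution strategy on a black-box loss. PACE itself uses full CMA-ES with Fastfood projections; this sketch only shows the core principle of adapting parameters without gradients, with an illustrative quadratic loss.

```python
# Toy (1+1)-ES: mutate, keep the mutation only if it improves, and adapt
# the step size (a 1/5th-rule-style heuristic).
import random

def one_plus_one_es(loss, x, sigma=0.5, steps=200, seed=0):
    rng = random.Random(seed)
    best = loss(x)
    for _ in range(steps):
        cand = [xi + rng.gauss(0, sigma) for xi in x]
        f = loss(cand)
        if f < best:
            x, best = cand, f
            sigma *= 1.1   # successful step: widen the search
        else:
            sigma *= 0.95  # failed step: narrow it
    return x, best

# Black-box loss: distance to an unknown optimum at (1, -2).
loss = lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2
x, best = one_plus_one_es(loss, [0.0, 0.0])
```

CMA-ES extends this by adapting a full covariance matrix over mutations, which is what makes it practical for the normalization-layer parameters PACE optimizes.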
Refined Detection for Gumbel Watermarking
Tor Lattimore
NLP Large Language Models Theory
  • Introduces a refined detection mechanism for Gumbel watermarking.
  • Proven to be nearly optimal among model-agnostic watermarking schemes.
  • Detection can be performed without access to the model.
  • Establishes upper and lower bounds on token requirements for detection.
Read more
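As background for the entry above, here is a hedged sketch of the Gumbel-style watermark being detected: each token is chosen as argmax over r_i ** (1 / p_i), where the r_i are keyed pseudorandom uniforms, and the detector recomputes the r's and scores the observed tokens. The tiny vocabulary, hash construction, and uniform "language model" are illustrative stand-ins, not the paper's refined detector.

```python
import hashlib, math, random

VOCAB = 32

def keyed_uniforms(key, pos):
    """Pseudorandom uniforms in (0, 1), one per vocabulary entry."""
    out = []
    for i in range(VOCAB):
        h = hashlib.sha256(f"{key}:{pos}:{i}".encode()).digest()
        out.append((int.from_bytes(h[:8], "big") + 1) / (2 ** 64 + 2))
    return out

def watermark_sample(probs, key, pos):
    r = keyed_uniforms(key, pos)
    return max(range(VOCAB), key=lambda i: r[i] ** (1.0 / max(probs[i], 1e-12)))

def detect_score(tokens, key):
    # Much larger than len(tokens) on watermarked text; about len(tokens) otherwise.
    return sum(-math.log(1.0 - keyed_uniforms(key, pos)[tok])
               for pos, tok in enumerate(tokens))

rng = random.Random(0)
probs = [1.0 / VOCAB] * VOCAB  # toy uniform "LM"
marked = [watermark_sample(probs, "secret", p) for p in range(64)]
unmarked = [rng.randrange(VOCAB) for _ in range(64)]
```

Note that detection needs only the key and the token sequence, consistent with the bullet that no model access is required.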
The Geometry of Polynomial Group Convolutional Neural Networks
Yacoub Hendi, Daniel Persson, Magdalena Larfors
Theory
  • Introduction of a mathematical framework for PGCNNs using graded group algebras.
  • Two parametrization methods (Hadamard and Kronecker products) for polynomial activation functions.
  • Dimension of the neuromanifold is determined by the number of layers and group size, not by polynomial degree.
  • Description of the general fiber of the Kronecker parametrization and conjecture for Hadamard parametrization.
Read more
FI-KAN: Fractal Interpolation Kolmogorov-Arnold Networks
Gnankan Landry Regis N'guessan
Theory Efficient ML Interpretability
  • Introduction of two FI-KAN architectures: Pure FI-KAN and Hybrid FI-KAN.
  • Learnable fractal dimensions allow for adaptive basis functions that match target regularity.
  • Hybrid FI-KAN shows substantial performance improvements over traditional KAN across various benchmarks.
  • The study provides empirical evidence supporting the regularity-matching hypothesis in function approximation.
Read more
AutoStan: Autonomous Bayesian Model Improvement via Predictive Feedback
Oliver Dürr
Optimization Theory Interpretability
  • AutoStan autonomously builds and improves Bayesian models using Stan without manual intervention.
  • The framework utilizes NLPD and sampler diagnostics as feedback for iterative model enhancement.
  • AutoStan demonstrates superior performance on diverse datasets compared to existing black-box methods.
  • The approach is agent-agnostic, applicable to any CLI coding agent capable of executing shell commands.
Read more
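The feedback signal named above, NLPD, is easy to illustrate: it is the average negative log density that a model's predictive distribution assigns to held-out data. The Gaussian predictive below is a minimal stand-in; AutoStan applies the idea to arbitrary Stan models.

```python
import math

def nlpd_gaussian(y_held_out, mu, sigma):
    """Average negative log density of held-out points under N(mu, sigma)."""
    return sum(0.5 * math.log(2 * math.pi * sigma ** 2)
               + (y - mu) ** 2 / (2 * sigma ** 2)
               for y in y_held_out) / len(y_held_out)

y = [0.1, -0.2, 0.05, 0.3]
narrow = nlpd_gaussian(y, mu=0.0, sigma=0.2)  # well-matched predictive
wide = nlpd_gaussian(y, mu=0.0, sigma=5.0)    # overly diffuse predictive
# Lower NLPD = better predictions, so the iteration would keep the narrow model.
```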
DiSGMM: A Method for Time-varying Microscopic Weight Completion on Road Networks
Yan Lin, Jilin Hu, Shengnan Guo, Christian S. Jensen, Youfang Lin, Huaiyu Wan
Graph Learning Time Series Optimization
  • DiSGMM addresses two layers of data sparsity in microscopic weight completion.
  • The method uses Gaussian mixture models for flexible and closed-form distribution representation.
  • DiSGMM combines static and dynamic embeddings to balance known weights and inherent segment information.
  • Experiments show significant performance improvements over existing methods.
Read more
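One reason the entry cites Gaussian mixtures as "flexible and closed-form": moments of a mixture follow directly from the component parameters, with no sampling. A hedged one-dimensional sketch, with illustrative travel-time numbers rather than anything from the paper:

```python
def mixture_moments(weights, means, variances):
    """Closed-form mean and variance of a 1-D Gaussian mixture."""
    mu = sum(w * m for w, m in zip(weights, means))
    var = sum(w * (v + m * m) for w, m, v in zip(weights, means, variances)) - mu * mu
    return mu, var

# Bimodal travel time for a road segment: free flow (70%) vs congested (30%).
mu, var = mixture_moments(weights=[0.7, 0.3], means=[30.0, 90.0], variances=[16.0, 100.0])
```

The large variance relative to either component reflects the bimodality that a single Gaussian weight estimate would hide.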
Loss Gap Parity for Fairness in Heterogeneous Federated Learning
Brahim Erraji, Michaël Perrot, Aurélien Bellet
Federated Learning Optimization Theory
  • EAGLE algorithm minimizes disparities in loss gaps among clients in federated learning.
  • Focuses on fairness in relative improvements rather than loss parity, avoiding performance degradation for certain clients.
  • Theoretical convergence guarantees are provided for non-convex loss functions.
  • Empirical results show EAGLE reduces loss gap variance while maintaining or improving overall model performance.
Read more
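The fairness quantity described above can be sketched directly: each client's "loss gap" is its improvement from federated training, and fairness is measured on the spread of those gaps rather than on the raw losses. The numbers below are illustrative, not from the paper.

```python
def loss_gaps(before, after):
    """Per-client improvement from federated training."""
    return [b - a for b, a in zip(before, after)]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

before = [1.0, 2.0, 0.5]       # local-only losses per client
fair_after = [0.6, 1.6, 0.1]   # everyone improves by ~0.4
unfair_after = [0.2, 1.9, 0.5] # gains concentrated on client 0

fair_var = variance(loss_gaps(before, fair_after))
unfair_var = variance(loss_gaps(before, unfair_after))
```

Note how the "fair" outcome equalizes relative improvement even though the clients' absolute losses remain heterogeneous, which is exactly the distinction from plain loss parity.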
Squish and Release: Exposing Hidden Hallucinations by Making Them Surface as Safety Signals
Nathaniel Oh, Paul Attie
NLP Large Language Models Interpretability
  • Introduction of the Squish and Release (S&R) framework for detecting hidden hallucinations in AI models.
  • Identification of a fixed detector body in the model's architecture that is crucial for safety evaluations.
  • Demonstration that synthetically engineered cores significantly outperform empirically discovered cores in releasing hidden signals.
  • Establishment of the Order-Gap Benchmark to evaluate model performance across various domains.
Read more
OneComp: One-Line Revolution for Generative AI Model Compression
Yuma Ichikawa, Keiji Kimura, Akihiro Yoshida, Yudai Fujimoto, Hiroki Tokura, Yamato Arai, Yoshiyuki Ishii, Yusei Kawakami, Genki Shikada, Achille Jacquemond, Yoshihiko Fujisawa, Katsuki Fujisawa, Takumi Honda, Akira Sakai
Efficient ML Generative Models Optimization
  • OneComp automates the model compression process, making it accessible to practitioners.
  • The framework adapts to available hardware, optimizing quantization stages accordingly.
  • It integrates various compression techniques, ensuring improved model quality with each stage.
  • OneComp serves as a bridge between theoretical research and practical application in model deployment.
Read more
Why not to use Cosine Similarity between Label Representations
Beatrix M. G. Nielsen
Theory
  • Cosine similarity does not correlate with model probabilities in softmax classifiers.
  • It is possible to create models with identical probabilities but different cosine similarities.
  • Translation of unembeddings can lead to misleading cosine similarity values.
  • Centering or fixing the length of representations does not resolve the disconnect between cosine similarity and probabilities.
Read more
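The paper's core point above is easy to demonstrate: adding the same vector to every unembedding shifts all logits equally, so softmax probabilities are unchanged, yet the cosine similarities between the label representations change. A minimal sketch with illustrative vectors:

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

x = [1.0, 2.0]                # final hidden state
W = [[1.0, 0.0], [0.0, 1.0]]  # unembeddings for two labels
shift = [5.0, 5.0]
W_shifted = [[wi + s for wi, s in zip(w, shift)] for w in W]

p = softmax([sum(wi * xi for wi, xi in zip(w, x)) for w in W])
p_shifted = softmax([sum(wi * xi for wi, xi in zip(w, x)) for w in W_shifted])
cos = cosine(W[0], W[1])                       # orthogonal labels: 0.0
cos_shifted = cosine(W_shifted[0], W_shifted[1])
```

The probabilities are identical while the cosine similarity jumps from 0 to nearly 1, which is why centering or length-fixing cannot repair the disconnect.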
A Latent Risk-Aware Machine Learning Approach for Predicting Operational Success in Clinical Trials based on TrialsBank
Iness Halimi, Emmanuel Piffo, Oumnia Boudersa, Yvan Marcel Carre Vilmorin, Melissa Ait-ikhlef, Karima Kone, Andy Tan, Augustin Medina, Juliette Hernando, Sheila Ernest, Vatche Bartekian, Karine Lalonde, Mireille E Schnitzer, Gianolli Dorcelus
Optimization
  • Introduces a hierarchical latent risk-aware machine learning framework for predicting clinical trial success.
  • Utilizes a curated dataset from TrialsBank comprising 13,700 trials.
  • Achieves high F1-scores (0.93, 0.92, 0.91) across Phase I-III trials.
  • Demonstrates improved discrimination of operational failures by incorporating latent risk factors.
Read more
A Neural Tension Operator for Curve Subdivision across Constant Curvature Geometries
Hassan Ugail, Newton Howard
Computer Vision Theory Generative Models
  • Introduction of a shared learned tension predictor for curve subdivision across different geometries.
  • The method achieves lower bending energy and angular roughness compared to traditional fixed-tension methods.
  • Theoretical guarantees ensure structural safety and convergence of the proposed approach.
  • Empirical results demonstrate effective generalization beyond the training distribution.
Read more
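For context on the entry above, the Euclidean baseline it generalizes is the classical four-point subdivision scheme, whose fixed tension parameter w the paper replaces with a learned, geometry-aware predictor. A hedged sketch (w = 1/16 is the standard interpolatory choice; w = 0 degenerates to midpoint insertion):

```python
def four_point_step(pts, w=1.0 / 16.0):
    """One subdivision round on a closed polygon of 2D points."""
    n = len(pts)
    out = []
    for i in range(n):
        p0, p1, p2, p3 = (pts[(i + j - 1) % n] for j in range(4))
        out.append(p1)  # old vertices are kept (interpolatory scheme)
        out.append(tuple((0.5 + w) * (a + b) - w * (c + d)
                         for a, b, c, d in zip(p1, p2, p0, p3)))
    return out

square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
refined = four_point_step(square)
```

Each round doubles the vertex count, and the new points bulge slightly outward from each edge, rounding the polygon toward a smooth limit curve; a per-edge learned tension lets that bulge adapt to the local geometry.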