AI-generated summaries

Today's ML research,
without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

62 Papers today
8h Update frequency
7 Days of history
Forgetting to Witness: Efficient Federated Unlearning and Its Visible Evaluation
Houzhe Wang, Xiaojie Zhu, Chi Chen
Federated Learning Generative Models Efficient ML
  • Introduction of a complete pipeline for federated unlearning, including an efficient unlearning approach and a novel evaluation framework.
  • Utilization of knowledge distillation to facilitate the unlearning process without requiring historical data.
  • Development of Skyeye, a visualization framework that assesses the forgetting capacity of federated unlearning models.
  • Demonstration of the effectiveness of the proposed methods through comprehensive experimental results.
Read more
LMI-Net: Linear Matrix Inequality--Constrained Neural Networks via Differentiable Projection Layers
Sunbochen Tang, Andrea Goertzen, Navid Azizan
Optimization Theory Efficient ML
  • LMI-Net introduces a differentiable projection layer specifically designed for enforcing LMI constraints in neural networks (a minimal projection sketch follows this entry).
  • The framework utilizes Douglas–Rachford splitting for efficient projection and implicit differentiation.
  • Theoretical convergence guarantees ensure that the model reliably satisfies LMI constraints.
  • Experimental results show improved feasibility and robustness under various conditions compared to traditional soft-constrained models.
Read more
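LMI-Net's layer solves its projection with Douglas–Rachford splitting and differentiates through it implicitly; that machinery is not reproduced here. As a minimal sketch of the underlying building block, assuming PyTorch, the snippet below projects a symmetric matrix onto the PSD cone (the simplest LMI feasible set) by clipping negative eigenvalues, with gradients flowing through `torch.linalg.eigh`:

```python
import torch

def project_psd(A: torch.Tensor) -> torch.Tensor:
    """Nearest PSD matrix in Frobenius norm: symmetrize, then clip
    negative eigenvalues to zero. Differentiable end to end."""
    A_sym = 0.5 * (A + A.transpose(-1, -2))      # enforce symmetry
    eigvals, eigvecs = torch.linalg.eigh(A_sym)  # spectral decomposition
    eigvals = eigvals.clamp(min=0.0)             # drop the negative spectrum
    return eigvecs @ torch.diag_embed(eigvals) @ eigvecs.transpose(-1, -2)

A = torch.randn(4, 4, requires_grad=True)
loss = project_psd(A).trace()
loss.backward()  # gradients pass through the projection
```

Note that the eigendecomposition gradient is ill-conditioned near repeated eigenvalues, which is one reason more careful projection schemes matter in practice.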
Training Without Orthogonalization, Inference With SVD: A Gradient Analysis of Rotation Representations
Chris Choy
Computer Vision Robotics Theory
  • Removing orthogonalization during training improves rotation estimation in deep learning.
  • SVD orthogonalization introduces gradient pathologies that hinder training performance (the orthogonalization step itself is sketched after this entry).
  • Direct 9D regression is preferable to 6D regression in the unorthogonalized training regime.
  • The paper provides a theoretical analysis of the SVD Jacobian's spectrum and its implications for gradient flow.
Read more
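For reference, the SVD orthogonalization the paper analyzes is the standard map from an unconstrained 3×3 ("9D") prediction to the nearest rotation; under the paper's recommendation it would be applied at inference only. A minimal PyTorch sketch of the special-orthogonal projection:

```python
import torch

def svd_orthogonalize(m: torch.Tensor) -> torch.Tensor:
    """Project a (batched) 3x3 prediction onto SO(3): take U V^T from
    the SVD, flipping the sign of the last singular direction whenever
    det(U V^T) = -1 so the result is a proper rotation."""
    U, _, Vh = torch.linalg.svd(m)
    det = torch.linalg.det(U @ Vh)
    S = torch.diag_embed(torch.stack(
        [torch.ones_like(det), torch.ones_like(det), det], dim=-1))
    return U @ S @ Vh

pred = torch.randn(8, 3, 3)   # raw 9D regression outputs, one per example
R = svd_orthogonalize(pred)   # valid rotations with det(R) = +1
```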
Optimal-Transport-Guided Functional Flow Matching for Turbulent Field Generation in Hilbert Space
Li Kunpeng, Wan Chenguang, Qu Zhisong, Lim Kyungtak, Virginie Grandgirard, Xavier Garbet, Yu Hua, Ong Yew Soon
Generative Models
  • FOT-CFM generalizes Conditional Flow Matching to infinite-dimensional Hilbert spaces.
  • Incorporation of Optimal Transport theory allows for efficient and accurate turbulence generation.
  • The method achieves high fidelity in reproducing turbulent statistics with fewer function evaluations.
  • Neural Operators are used for parameterizing the vector field, enabling resolution-invariant dynamics.
Read more
On the Geometry of Positional Encodings in Transformers
Giansalvo Cirrincione
NLP Large Language Models Theory
  • Positional information is essential for Transformers to perform order-sensitive tasks.
  • Distinct vector representations for sequence positions are learned during training.
  • An optimal positional encoding approximates statistical distances between word distributions.
  • The sinusoidal encoding is theoretically justified as nearly optimal for certain corpora (see the sketch after this entry).
Read more
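For reference, the sinusoidal encoding discussed is the standard one from "Attention Is All You Need"; a minimal NumPy version:

```python
import numpy as np

def sinusoidal_encoding(num_positions: int, d_model: int) -> np.ndarray:
    """PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(...):
    each dimension pair encodes position at one geometric frequency."""
    assert d_model % 2 == 0, "use an even model dimension"
    pos = np.arange(num_positions)[:, None]
    freqs = np.exp(-np.log(10000.0) * np.arange(0, d_model, 2) / d_model)
    pe = np.zeros((num_positions, d_model))
    pe[:, 0::2] = np.sin(pos * freqs)
    pe[:, 1::2] = np.cos(pos * freqs)
    return pe
```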
Automated Attention Pattern Discovery at Scale in Large Language Models
Jonathan Katzy, Razvan-Mihai Popescu, Erik Mekkes, Arie van Deursen, Maliheh Izadi
Large Language Models Interpretability
  • Introduces AP-MAE as a scalable method for analyzing attention patterns in LLMs.
  • Demonstrates that attention patterns can predict the correctness of model outputs.
  • Shows that AP-MAE generalizes across different models with minimal loss.
  • Establishes attention patterns as a tractable and informative object of study.
Read more
Generative models for decision-making under distributional shift
Xiuyuan Cheng, Yunqin Zhu, Yao Xie
Generative Models Optimization Theory
  • Generative models can effectively represent and transform distributions under distributional shifts.
  • The framework introduced leverages mathematical tools for constructing decision-relevant distributions.
  • Generative models enhance the operations research toolkit by supporting representation, robustness, and inference.
  • The paper provides theoretical guarantees for the use of generative models in decision-making contexts.
Read more
One Model for All: Multi-Objective Controllable Language Models
Qiang He, Yucheng Yang, Tianyi Zhou, Meng Fang, Mykola Pechenizkiy, Setareh Maghsudi
NLP Large Language Models Optimization
  • Introduction of Multi-Objective Control (MOC) for training LLMs to accommodate diverse user preferences.
  • MOC integrates multi-objective optimization principles into RLHF, allowing a single model to handle multiple objectives.
  • Demonstrated superior performance in controllability, output quality, and generalization compared to baseline methods.
  • MOC's training cost is comparable to single-objective RLHF, making it efficient for practical applications.
Read more
On Dominant Manifolds in Reservoir Computing Networks
Noa Kaplan, Alberto Padoan, Anastasia Bizyaeva
Time Series Theory
  • Establishes a link between training data and the geometry of reservoir computing dynamics.
  • Demonstrates that dominant modes correspond to approximations of Koopman eigenfunctions.
  • Introduces a spectral analysis framework for characterizing trained linear RC systems.
  • Discusses generalization to nonlinear reservoirs through dominance theory.
Read more
Learning-Based Multi-Criteria Decision Making Model for Sawmill Location Problems
Mahid Ahmed, Ali Dogru, Chaoyang Zhang, Chao Meng
Optimization
  • Introduction of a novel LB-MCDM framework that minimizes reliance on subjective expert judgment.
  • Integration of machine learning with GIS-based spatial analysis for dynamic site suitability assessment.
  • Identification of the Supply-Demand Ratio as the most influential factor in sawmill location selection.
  • Demonstration of the framework's effectiveness through a case study in Mississippi.
Read more
General Multimodal Protein Design Enables DNA-Encoding of Chemistry
Jarrid Rector-Brooks, Théophile Lambert, Marta Skreta, Daniel Roth, Yueming Long, Zi-Qi Li, Xi Zhang, Miruna Cretu, Francesca-Zhoufan Li, Tanvi Ganapathy, Emily Jin, Avishek Joey Bose, Jason Yang, Kirill Neklyudov, Yoshua Bengio, Alexander Tong, Frances H. Arnold, Cheng-Hao Liu
Generative Models Multimodal
  • DISCO enables simultaneous design of protein sequences and 3D structures without pre-defined motifs.
  • The model generates enzymes that catalyze new-to-nature reactions with high activity.
  • Random mutagenesis confirms that enzyme activity can be enhanced through directed evolution.
  • DISCO expands the searchable space for DNA-encoded chemical reactivity.
Read more
Topological Characterization of Churn Flow and Unsupervised Correction to the Wu Flow-Regime Map in Small-Diameter Vertical Pipes
Brady Koenig, Sushovan Majhi, Atish Mitra, Abigail Stein, Burt Todd
Theory
  • Introduces the first topology-based characterization of churn flow using Euler Characteristic Surfaces.
  • Develops an unsupervised learning framework that blends ECS-derived kernels with gas velocity.
  • Demonstrates significant discrepancies between ECS-inferred transitions and existing models, indicating under-prediction of slug flow persistence.
  • Achieves high classification accuracy and recall rates without labeled training data, surpassing traditional supervised methods.
Read more
Autoencoder-Based Parameter Estimation for Superposed Multi-Component Damped Sinusoidal Signals
Momoka Iida, Hayato Motohashi, Hirotaka Takahashi
Time Series
  • Development of an autoencoder-based method for parameter estimation of damped sinusoidal signals (the signal family is sketched after this entry).
  • Evaluation of the method under different training data distributions (Gaussian vs. uniform).
  • High accuracy in estimating parameters even in challenging scenarios with noise and overlapping components.
  • Demonstration of the method's robustness in practical applications.
Read more
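The signal family here is a superposition of exponentially damped sinusoids in noise. A minimal generator for such data (the parameterization and values are illustrative, not the paper's):

```python
import numpy as np

def superposed_damped_sinusoids(t, components, noise_std=0.01, seed=0):
    """Sum of A * exp(-gamma * t) * sin(2*pi*f*t + phi) terms plus
    Gaussian noise; 'components' is a list of (A, gamma, f, phi)."""
    rng = np.random.default_rng(seed)
    clean = sum(A * np.exp(-g * t) * np.sin(2 * np.pi * f * t + phi)
                for A, g, f, phi in components)
    return clean + rng.normal(0.0, noise_std, size=t.shape)

t = np.linspace(0.0, 1.0, 512)
x = superposed_damped_sinusoids(t, [(1.0, 3.0, 12.0, 0.4),
                                    (0.6, 5.0, 27.0, 1.1)])
```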
Representational Collapse in Multi-Agent LLM Committees: Measurement and Diversity-Aware Consensus
Dipkumar Patel
Large Language Models NLP Efficient ML
  • Representational collapse occurs when agents produce similar outputs, reducing the effectiveness of majority voting.
  • The Diversity-Aware Latent Consensus (DALC) protocol improves accuracy by incorporating diversity weights based on embedding geometry.
  • Ablation studies show that hint sharing enhances performance more than diversity weighting alone.
  • The choice of embedding proxy significantly affects both the severity of representational collapse and the accuracy of downstream tasks.
Read more
Scaling DPPs for RAG: Density Meets Diversity
Xun Sun, Baiheng Xie, Li Huang, Qiang Gao
NLP Large Language Models Optimization
  • Introduction of ScalDPP, a diversity-aware retrieval mechanism for RAG using DPPs (a baseline DPP selection sketch follows this entry).
  • Development of a scalable dynamic kernel construction method to enhance complementarity in context selection.
  • Introduction of Diverse Margin Loss (DML) to optimize the embedding space for better retrieval outcomes.
  • Demonstration of ScalDPP's superiority over standard RAG models in experimental evaluations.
Read more
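ScalDPP's dynamic kernel construction and Diverse Margin Loss are the paper's contributions and are not sketched here. The baseline idea, DPP-style selection that trades retrieval relevance against redundancy, can be illustrated with a quality-weighted similarity kernel and naive greedy MAP inference:

```python
import numpy as np

def greedy_dpp(embeddings: np.ndarray, relevance: np.ndarray, k: int):
    """Greedy MAP for a DPP with kernel L = diag(q) S diag(q), where q
    holds (positive) relevance scores and S is cosine similarity: each
    step adds the item with the largest log-det gain, so near-duplicate
    passages are penalized even when individually relevant."""
    E = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    L = np.outer(relevance, relevance) * (E @ E.T)
    chosen, pool = [], list(range(len(relevance)))
    for _ in range(min(k, len(pool))):
        gains = [np.linalg.slogdet(L[np.ix_(chosen + [i], chosen + [i])])[1]
                 for i in pool]
        best = pool[int(np.argmax(gains))]
        chosen.append(best)
        pool.remove(best)
    return chosen
```

The naive log-det recomputation here is expensive; it is exactly this kind of cost that motivates scalable variants such as the one the paper proposes.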
Algebraic Diversity: Group-Theoretic Spectral Estimation from Single Observations
Mitchell A. Thornton
Theory
  • Introduces a framework for group-theoretic spectral estimation from single observations.
  • Proves that the symmetric group is optimal for algebraic diversity in signal processing.
  • Demonstrates significant performance improvements in various applications, including MUSIC and massive MIMO.
  • Reveals new algebraic structures in transformer models, suggesting enhancements for large language models.
Read more
Controllable Image Generation with Composed Parallel Token Prediction
Jamie Stirling, Noura Al-Moubayed, Chris G. Willcocks, Hubert P. H. Shum
Generative Models Computer Vision Multimodal
  • Introduces a new framework for composing conditional distributions in discrete generative models.
  • Achieves a 63.4% relative reduction in error rates compared to previous methods across three datasets.
  • Offers a 2.3× to 12× speed-up in real-time generation over comparable continuous methods.
  • Demonstrates effective control over image generation through concept weighting and negation.
Read more
Improving Feasibility via Fast Autoencoder-Based Projections
Maria Chzhen, Priya L. Donti
Reinforcement Learning Optimization Efficient ML
  • Proposes a data-driven approach for enforcing complex operational constraints using an autoencoder.
  • Introduces a structured, convex latent representation for efficient feasibility corrections.
  • Demonstrates significant speed improvements in feasibility enforcement compared to traditional methods.
  • Empirical results show near-perfect feasibility in constrained optimization and safer actions in reinforcement learning.
Read more
Modeling Patient Care Trajectories with Transformer Hawkes Processes
Saumya Pandey, Varun Chandola
Time Series
  • Introduces a Transformer Hawkes Process framework for modeling healthcare utilization trajectories.
  • Imbalance-aware training objective improves sensitivity to rare but critical healthcare events.
  • Demonstrates improved prediction of event types and timing on real-world healthcare data.
  • Provides qualitative interpretability of model predictions related to patient risk and vulnerability.
Read more
Top-K Retrieval with Fixed-Size Linear-Attention Completion: Backbone- and KV-Format-Preserving Attention for KV-Cache Read Reduction
Yasuto Hoshi, Daisuke Miyashita, Jun Deguchi
NLP Large Language Models Efficient ML
  • Introduces a hybrid attention module that combines exact anchors with Top-K retrieval and a fixed-size completion term.
  • Maintains the original backbone language model and KV-cache format to ensure compatibility.
  • Utilizes a subtractive completion cache computed at prefill time to estimate contributions from unretrieved tokens.
  • Demonstrates improved performance in long-context benchmarks, especially in high-entropy attention scenarios.
Read more
FNO$^{\angle\theta}$: Extended Fourier neural operator for learning state and optimal control of distributed parameter systems
Zhexian Li, Ketan Savla
Optimization Theory
  • Introduction of FNO∠θ, an extended architecture of the Fourier neural operator.
  • Utilization of the Ehrenpreis-Palamodov principle to represent states and optimal controls of linear PDEs.
  • Modification of the FNO layer to incorporate complex frequency variables for improved learning.
  • Demonstrated significant performance enhancements in learning state and optimal control for the nonlinear Burgers’ equation.
Read more
Towards Intelligent Energy Security: A Unified Spatio-Temporal and Graph Learning Framework for Scalable Electricity Theft Detection in Smart Grids
AbdulQoyum A. Olowookere, Usman A. Oguntola, Ebenezer Leke Odekanle, Maridiyah A. Madehin, Aisha A. Adesope
Graph Learning Time Series
  • Development of a comprehensive AI-driven framework for electricity theft detection.
  • Integration of multi-source data including electrical, environmental, and renewable energy inputs.
  • Use of hybrid anomaly detection models combining LSTM, TCN, and Autoencoders.
  • Application of graph-based learning techniques to capture spatial dependencies.
Read more
Diagonal-Tiled Mixed-Precision Attention for Efficient Low-Bit MXFP Inference
Yifu Ding, Xinhao Zhang, Jinyang Guo
Large Language Models Efficient ML
  • Introduction of Diagonal-Tiled Mixed-Precision Attention (DMA) for efficient LLM inference.
  • Development of a fully fused GPU kernel that integrates quantization and attention computation.
  • Empirical evaluations show DMA maintains generation quality with significant speedup.
  • DMA addresses challenges of low-bit quantization and unfused operations in attention mechanisms.
Read more
Learning Stable Predictors from Weak Supervision under Distribution Shift
Mehrdad Shoeibi, Elias Hossain, Ivan Garibay, Niloofar Yousefi
Theory
  • Formalization of supervision drift as a failure mode in weakly supervised learning.
  • Development of a controlled evaluation protocol to isolate supervision drift effects.
  • Demonstration of partial cross-domain robustness but severe temporal non-transferability.
  • Introduction of feature stability analysis as a diagnostic for detecting non-transferability.
Read more
Cross-Machine Anomaly Detection Leveraging Pre-trained Time-series Model
Yangmeng Li, Kei Sano, Toshihiro Kitao, Ryoji Anzaki, Yukiya Saitoh, Hironori Moki, Dragan Djurdjanovic
Time Series
  • Proposes a novel framework for cross-machine anomaly detection using pre-trained time-series models.
  • Integrates a domain-invariant feature extractor to enhance generalization across different machines.
  • Utilizes Random Forest Classifiers for feature disentanglement into machine-related and condition-related aspects.
  • Demonstrates superior performance over existing methods in detecting anomalies in industrial settings.
Read more
Not All Turns Are Equally Hard: Adaptive Thinking Budgets For Efficient Multi-Turn Reasoning
Neharika Jali, Anupam Nayak, Gauri Joshi
Large Language Models Reinforcement Learning Efficient ML
  • Introduces Turn-Adaptive Budgets (TAB) for efficient multi-turn reasoning in LLMs.
  • Models multi-turn reasoning as a multi-objective Markov Decision Process.
  • Achieves up to 35% token savings while maintaining accuracy on benchmarks.
  • Proposes TAB All-SubQ for further optimization using prior knowledge of sub-questions.
Read more
The Role of Generator Access in Autoregressive Post-Training
Amit Kiran Rege
NLP Large Language Models Theory
  • Generator access significantly influences the effectiveness of autoregressive post-training.
  • Prefix control allows learners to revisit and extend previously constructed prefixes, breaking the no-reset barrier.
  • Observation richness becomes meaningful only after prefix control is established.
  • The study establishes a clear distinction between prefix control and observation richness in the context of learning.
Read more
Grokking as Dimensional Phase Transition in Neural Networks
Ping Wang
Theory
  • Grokking is identified as a dimensional phase transition in neural networks.
  • Effective dimensionality (D) transitions from sub-diffusive to super-diffusive at generalization onset.
  • Gradient field geometry, rather than network architecture, determines the effective dimensionality.
  • The study employs finite-size scaling to analyze gradient avalanche dynamics across multiple model sizes.
Read more
Hidden in the Multiplicative Interaction: Uncovering Fragility in Multimodal Contrastive Learning
Tillmann Rheude, Stefan Hegselmann, Roland Eils, Benjamin Wild
Multimodal
  • Identifies fragility in Symile's symmetric treatment of modalities, which can degrade performance under misalignment.
  • Introduces Gated Symile, a gating mechanism that adapts modality contributions based on reliability.
  • Demonstrates improved retrieval accuracy on both synthetic and real-world datasets compared to existing methods.
  • Highlights the importance of modeling modality-specific reliability in multimodal contrastive learning.
Read more
ACES: Who Tests the Tests? Leave-One-Out AUC Consistency for Code Generation
Hui Sun, Yun-Ji Zhang, Zheng Xie, Ren-Biao Liu, Yali Du, Xin-Ye Li, Ming Li
Large Language Models Theory Optimization
  • Introduces LOO-AUC as a method to evaluate test reliability without knowing code correctness.
  • Proposes ACES, a scoring system with two variants for weighting tests based on their discriminative power.
  • Demonstrates that ACES achieves state-of-the-art results in Pass@k across multiple benchmarks.
  • Establishes a theoretical foundation linking test consistency to discriminative power.
Read more
Gym-Anything: Turn any Software into an Agent Environment
Pranjal Aggarwal, Graham Neubig, Sean Welleck
Reinforcement Learning Multimodal Generative Models
  • Gym-Anything enables the automatic creation of interactive environments for a wide variety of software applications.
  • The framework uses a multi-agent approach for environment setup and auditing, enhancing reliability and scalability.
  • CUA-World includes over 10,000 long-horizon tasks, significantly expanding the scope of agent training and evaluation.
  • The proposed auditing mechanism improves task completion rates in long-horizon benchmarks.
Read more
A Mixture of Experts Foundation Model for Scanning Electron Microscopy Image Analysis
Sk Miraj Ahmed, Yuewei Lin, Chuntian Cao, Shinjae Yoo, Xinpei Wu, Won-Il Lee, Nikhil Tiwale, Dan N. Le, Thi Thu Huong Chu, Jiyoung Kim, Kevin G. Yager, Chang-Yong Nam
Computer Vision
  • Introduction of the first foundation model tailored for SEM image analysis.
  • Utilization of a self-supervised transformer architecture with a Mixture of Experts mechanism.
  • Demonstrated capability of defocus-to-focus image translation without paired supervision.
  • Outperformed existing state-of-the-art techniques in multiple evaluation metrics.
Read more
Toward Consistent World Models with Multi-Token Prediction and Latent Semantic Enhancement
Qimin Zhong, Hao Liao, Haiming Qin, Mingyang Zhou, Rui Mao, Wei Chen, Naipeng Chao
NLP Large Language Models Theory
  • Theoretical analysis of MTP reveals its potential for inducing belief states but also highlights risks of structural hallucinations.
  • LSE-MTP framework aligns multi-token predictions with ground-truth trajectories to enforce latent consistency.
  • Experiments show that LSE-MTP improves path legality and robustness in multi-step planning tasks.
Read more
Jeffreys Flow: Robust Boltzmann Generators for Rare Event Sampling via Parallel Tempering Distillation
Guang Lin, Christian Moya, Di Qi, Xuda Ye
Generative Models Theory Efficient ML
  • Introduction of Jeffreys Flow as a robust generative framework for rare event sampling.
  • Utilizes symmetric Jeffreys divergence to mitigate mode collapse in Boltzmann generators.
  • Demonstrates scalability and accuracy on complex multi-dimensional benchmarks.
  • Provides theoretical guarantees for improved sampling accuracy and reduced mode collapse.
Read more
Reproducing AlphaZero on Tablut: Self-Play RL for an Asymmetric Board Game
Tõnis Lees, Tambet Matiisen
Reinforcement Learning
  • Adaptation of AlphaZero for asymmetric games like Tablut.
  • Implementation of separate policy and value heads for each player (see the sketch after this entry).
  • Identification of catastrophic forgetting as a challenge in self-play.
  • Significant performance improvements observed over 100 iterations.
Read more
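One plausible reading of "separate policy and value heads for each player" is a shared trunk with per-side heads. The sketch below is an assumption about the architecture, not the authors' implementation; the board encoding, layer sizes, and move-space dimension are placeholders:

```python
import torch
import torch.nn as nn

class TwoSidedAlphaZeroNet(nn.Module):
    """Shared trunk with per-player policy/value heads
    (0 = attacker, 1 = defender)."""
    def __init__(self, in_dim=81 * 4, hidden=256, n_moves=2000):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.policy = nn.ModuleList(nn.Linear(hidden, n_moves) for _ in range(2))
        self.value = nn.ModuleList(nn.Linear(hidden, 1) for _ in range(2))

    def forward(self, x, player: int):
        h = self.trunk(x)
        return self.policy[player](h), torch.tanh(self.value[player](h))

net = TwoSidedAlphaZeroNet()
logits, v = net(torch.randn(1, 81 * 4), player=0)  # 9x9 Tablut board, 4 planes
```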
Data Distribution Valuation Using Generalized Bayesian Inference
Cuong N. Nguyen, Cuong V. Nguyen
Theory
  • Introduction of the Generalized Bayes Valuation (GBV) framework for data distribution valuation.
  • Utilization of generalized Bayesian inference with a transferability loss function.
  • Extension of GBV to Continual Generalized Bayes Valuation (CGBV) for continuous data streams.
  • Demonstration of GBV's applicability to annotator evaluation and data augmentation.
Read more
PRIME: Prototype-Driven Multimodal Pretraining for Cancer Prognosis with Missing Modalities
Kai Yu, Shuang Zhou, Yiran Song, Zaifu Zhan, Jie Peng, Kaixiong Zhou, Tianlong Chen, Feng Xie, Meng Wang, Huazhu Fu, Mingquan Lin, Rui Zhang
Multimodal
  • PRIME is designed to handle missing modalities in clinical data for cancer prognosis.
  • It employs a prototype memory bank for semantic imputation and representation learning.
  • The framework achieves superior performance on multiple cancer prognosis tasks compared to existing methods.
  • PRIME supports robust predictions even when modalities are missing during inference.
Read more
Anticipatory Reinforcement Learning: From Generative Path-Laws to Distributional Value Functions
Daniel Bloch
Reinforcement Learning Theory Time Series
  • Introduction of Anticipatory Reinforcement Learning (ARL) framework for non-Markovian environments.
  • Development of a 'Single-Pass' policy evaluation mechanism that avoids high-variance Monte Carlo methods.
  • Utilization of Marcus-compliant Neural Controlled Differential Equations (CDEs) for accurate path dynamics.
  • Formulation of a Self-Consistent Field (SCF) equilibrium to ensure consistency between deterministic and stochastic representations.
Read more
A machine learning framework for uncovering stochastic nonlinear dynamics from noisy data
Matteo Bosso, Giovanni Franzese, Kushal Swamy, Maarten Theulings, Alejandro M. Aragón, Farbod Alijani
Time Series Theory Interpretability
  • Introduces a hybrid framework that combines symbolic regression with Gaussian processes for modeling stochastic dynamics.
  • Successfully identifies both symbolic and stochastic components of dynamical systems from noisy data.
  • Demonstrates data efficiency, requiring only 10²–10³ data points for effective modeling.
  • Validates the approach on both numerical benchmarks and experimental biological systems.
Read more
SODA: Semi On-Policy Black-Box Distillation for Large Language Models
Xiwen Chen, Jingjing Wang, Wenhui Zhu, Peijie Qiu, Xuanzhao Dong, Hejian Sang, Zhipeng Wang, Alborz Geramifard, Feng Luo
NLP Large Language Models Efficient ML
  • Introduction of semi on-policy distillation, utilizing a static snapshot of student responses for effective black-box alignment.
  • SODA achieves a 10× speedup in training time and 27% reduction in peak GPU memory usage compared to GAD.
  • Extensive evaluations show SODA outperforms or matches GAD on 15 out of 16 benchmark results.
  • The method eliminates the need for adversarial training, simplifying the distillation process.
Read more
LLMs Should Express Uncertainty Explicitly
Junyu Guo, Shangding Gu, Ming Jin, Costas Spanos, Javad Lavaei
Large Language Models NLP
  • Uncertainty in LLMs should be treated as an explicit interface rather than a latent quantity.
  • Two complementary interfaces are proposed: verbalized confidence for final answers and an <uncertain> marker during reasoning.
  • The verbalized-confidence interface improves calibration and reduces overconfident errors (an ECE sketch follows this entry).
  • The reasoning-time interface enhances visibility of failures and aids in retrieval control.
Read more
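Calibration of verbalized confidence is conventionally measured with expected calibration error (ECE); below is a minimal sketch of that standard metric, not necessarily the paper's exact protocol:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin predictions by stated confidence and average the gap between
    mean confidence and empirical accuracy, weighted by bin mass."""
    conf = np.asarray(confidences, dtype=float)
    acc = np.asarray(correct, dtype=float)
    bins = np.minimum((conf * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            ece += mask.mean() * abs(conf[mask].mean() - acc[mask].mean())
    return ece

print(expected_calibration_error([0.9, 0.8, 0.6, 0.95], [1, 1, 0, 0]))
```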
Learning $\text{AC}^0$ Under Graphical Models
Gautam Chandrasekaran, Jason Gaitonde, Ankur Moitra, Arsen Vasilyan
Theory
  • Introduces quasipolynomial-time algorithms for learning AC⁰ under graphical models with strong spatial mixing.
  • Circumvents traditional Fourier analysis limitations by leveraging new sampling algorithms.
  • Extends the applicability of low-degree polynomial approximation to other function classes.
  • Addresses the critique of reliance on product structure in learning theory.
Read more
Collapse-Free Prototype Readout Layer for Transformer Encoders
Giansalvo Cirrincione, Rahul Ranjeev Kumar
Theory Efficient ML NLP
  • Introduction of DDCL-Attention, a prototype-based competitive readout layer for transformers.
  • Mathematical guarantees against prototype collapse and formal training stability.
  • Versatile application in multiple paradigms, including readout layers and hierarchical compression.
  • Empirical validation demonstrating the effectiveness and efficiency of the method across various datasets.
Read more
NativeTernary: A Self-Delimiting Binary Encoding with Unary Run-Length Hierarchy Markers for Ternary Neural Network Weights, Structured Data, and General Computing Infrastructure
Maharshi Savdhariya
NLP Large Language Models Efficient ML
  • Introduces NativeTernary, a binary encoding scheme for ternary values.
  • Utilizes unary run-length encoding to represent semantic hierarchy levels.
  • Addresses the lack of native binary formats for ternary neural networks.
  • Offers three encoding variants with different delimiter configurations.
Read more
Reasoning Through Chess: How Reasoning Evolves from Data Through Fine-Tuning and Reinforcement Learning
Lucas Dionisopoulos, Nicklas Majamaki, Prithviraj Ammanabrolu
NLP Large Language Models Reinforcement Learning
  • Fine-tuning on single best moves leads to effective RL but unfaithful reasoning.
  • Training on multi-move trajectories results in more stable RL and faithful reasoning.
  • Reinforcement learning improves move quality and reduces hallucination rates.
  • SFT-checkpoint metrics can predict final RL performance.
Read more
Vehicle-as-Prompt: A Unified Deep Reinforcement Learning Framework for Heterogeneous Fleet Vehicle Routing Problem
Shihong Huang, Shengjie Wang, Lei Gao, Hong Ma, Zhanluo Zhang, Feng Zhang, Weihua Zhou
Reinforcement Learning Optimization
  • Introduces a unified DRL framework for solving HFVRP and its variants.
  • Develops the Vehicle-as-Prompt mechanism to streamline decision-making.
  • Achieves superior performance compared to existing DRL methods and traditional heuristics.
  • Demonstrates strong zero-shot generalization across diverse problem scales.
Read more
Territory Paint Wars: Diagnosing and Mitigating Failure Modes in Competitive Multi-Agent PPO
Diyansha Singh
Reinforcement Learning
  • Introduces Territory Paint Wars as a benchmark for studying competitive MARL failure modes.
  • Identifies five critical implementation failure modes affecting PPO performance.
  • Characterizes competitive overfitting, where self-play performance does not indicate generalization ability.
  • Proposes opponent mixing as a simple yet effective solution to improve generalization.
Read more
Curvature-Aware Optimization for High-Accuracy Physics-Informed Neural Networks
Anas Jnini, Elham Kiyani, Khemraj Shukla, Jorge F. Urban, Nazanin Ahmadi Daryakenari, Johannes Muller, Marius Zeinhofer, George Em Karniadakis
Optimization Efficient ML Theory
  • Introduction of curvature-aware optimization techniques for PINNs.
  • Demonstration of improved convergence and accuracy on complex differential equations.
  • Comparison of new PINN methods against high-order numerical methods.
  • Addressing scalability for batched training in large data-driven problems.
Read more
ReLU Networks for Exact Generation of Similar Graphs
Mamoona Ghafoor, Tatsuya Akutsu
Generative Models Graph Learning Theory
  • Introduces ReLU networks for exact graph generation within specified edit distances.
  • Eliminates reliance on training data, ensuring validity of generated graphs.
  • Demonstrates scalability and exactness for large graphs (up to 1400 vertices).
  • Outperforms existing generative models in meeting edit distance constraints.
Read more
The Geometric Alignment Tax: Tokenization vs. Continuous Geometry in Scientific Foundation Models
Prashant C. Raju
Theory
  • The Geometric Alignment Tax is a fundamental issue in foundation models that leads to geometric distortion.
  • Replacing cross-entropy with continuous objectives significantly improves geometric stability.
  • Learned codebooks can worsen geometric stability despite better reconstruction.
  • Three distinct failure regimes in biological foundation models were identified: Local-Global Decoupling, Representational Compression, and Geometric Vacuity.
Read more
Expectation Maximization (EM) Converges for General Agnostic Mixtures
Avishek Ghosh
Theory Optimization
  • The paper extends the EM algorithm to a general agnostic setting, allowing for arbitrary parametric functions.
  • Gradient EM is proposed as a modification of the traditional EM algorithm, focusing on exponential convergence to loss minimizers.
  • The framework encompasses various problems, including mixed linear classifiers and generalized linear regression.
  • Convergence results are established under proper initialization and separation conditions, highlighting the robustness of the approach.
Read more
Convolutional Neural Network and Adversarial Autoencoder in EEG images classification
Albert Nasybullin, Semen Kurkin
Computer Vision
  • Combination of computer vision and neural networks for EEG classification.
  • Development of a dataset of 2D EEG topograms from raw EEG signals.
  • Implementation of a CNN architecture tailored for EEG image classification.
  • Successful classification of motor cortex activities during hand movements.
Read more
Is Prompt Selection Necessary for Task-Free Online Continual Learning?
Seoyoung Park, Haemin Lee, Hankook Lee
Efficient ML Theory
  • Prompt selection strategies in task-free OCL often yield suboptimal results.
  • The proposed SinglePrompt framework eliminates the need for prompt selection.
  • SinglePrompt focuses on classifier optimization with a single prompt per self-attention block.
  • The framework employs cosine similarity-based logit design to reduce forgetting.
Read more
The UNDO Flip-Flop: A Controlled Probe for Reversible Semantic State Management in State Space Models
Hongxu Zhou
Theory
  • Introduces the UNDO Flip-Flop task to evaluate reversible semantic state retrieval in SSMs.
  • Demonstrates that existing models fail to learn the required stack-based rollback mechanism.
  • Establishes a systematic failure in retrieving historical states under adversarial conditions.
  • Highlights the gap between theoretical expressibility of models and their practical learning outcomes.
Read more
Same Graph, Different Likelihoods: Calibration of Autoregressive Graph Generators via Permutation-Equivalent Encodings
Laurits Fredsgaard, Aaron Thomas, Michael Riis Andersen, Mikkel N. Schmidt, Mahito Sugiyama
Generative Models Graph Learning Theory
  • Introduces Linearization Uncertainty (LU) as a metric for evaluating the consistency of likelihoods across different linearizations of graphs.
  • Demonstrates that biased linearization strategies can lead to lower NLL but higher calibration errors under random permutations.
  • Shows that LU correlates better with molecular stability than NLL, suggesting it is a more reliable quality measure for generated graphs.
Read more
Beauty in the Eye of AI: Aligning LLMs and Vision Models with Human Aesthetics in Network Visualization
Peng Zhang, Xuefeng Li, Xiaoqi Wang, Han-Wei Shen, Yifan Hu
Computer Vision Large Language Models Generative Models
  • A large-scale user study collected human-preferred visualizations for 11,531 graphs, creating a benchmark dataset.
  • Prompt engineering significantly improved the alignment of LLM outputs with human aesthetic preferences.
  • Vision Models can achieve alignment with human preferences comparable to human-human agreement.
  • The findings indicate the feasibility of using AI as a scalable proxy for human labelers in network visualization.
Read more
Weight-Informed Self-Explaining Clustering for Mixed-Type Tabular Data
Lehao Li, Qiang Huang, Yihao Ang, Bryan Kian Hsiang Low, Anthony K. H. Tung, Xiaokui Xiao
Interpretability
  • Introduction of WISE framework for clustering mixed-type tabular data.
  • Development of Binary Encoding with Padding (BEP) for feature alignment.
  • Implementation of Leave-One-Feature-Out (LOFO) for diverse feature weighting.
  • Creation of Discriminative FreqItems (DFI) for coherent explanations.
Read more
PCA-Driven Adaptive Sensor Triage for Edge AI Inference
Ankit Hemant Lade, Sai Krishna Jasti, Nikhil Sinha, Indar Kumar, Akanksha Tiwari
Time Series Efficient ML Optimization
  • PCA-Triage optimizes bandwidth allocation by adapting sampling rates based on sensor data correlations (see the sketch after this entry).
  • The algorithm runs efficiently with zero trainable parameters and operates within a strict time budget.
  • Empirical results show PCA-Triage achieves superior fault detection performance compared to existing methods.
  • The method is robust against data loss and noise, making it suitable for industrial IoT applications.
Read more
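The core mechanism, scoring sensors by how much of the data's principal-component structure they carry and budgeting bandwidth accordingly, can be sketched as follows; the scoring rule here is an illustrative assumption, not the paper's exact algorithm:

```python
import numpy as np

def pca_sensor_priorities(X: np.ndarray) -> np.ndarray:
    """Score each sensor by its variance-weighted PCA loading mass and
    normalize to a budget share; a triage policy could then sample
    high-score sensors at higher rates. X is (samples, sensors)."""
    Xc = X - X.mean(axis=0)                        # center each sensor
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    comp_var = s ** 2 / max(len(X) - 1, 1)         # per-component variance
    score = (Vt ** 2 * comp_var[:, None]).sum(axis=0)
    return score / score.sum()

X = np.random.default_rng(0).normal(size=(500, 8))
print(pca_sensor_priorities(X).round(3))
```

Like the method it illustrates, this routine has zero trainable parameters, which is what makes it plausible under a strict edge-side time budget.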
k-Maximum Inner Product Attention for Graph Transformers and the Expressive Power of GraphGPS
Jonas De Schouwer, Haitz Sáez de Ocáriz Borde, Xiaowen Dong
Graph Learning Efficient ML Theory
  • Introduction of k-Maximum Inner Product (k-MIP) attention for graph transformers (a simplified sketch follows this entry).
  • Achieves linear memory complexity and up to ten-fold speedup over full attention mechanisms.
  • Proves that k-MIP transformers can approximate full-attention transformers to arbitrary precision.
  • Demonstrates competitive performance on large-scale graph benchmarks.
Read more
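k-MIP attention's linear-memory formulation is not reproduced here; the sketch below shows only the selection rule, keeping each query's top-k keys by scaled inner product, implemented densely for clarity, so it does not realize the memory savings:

```python
import torch

def topk_attention(q, k, v, top_k=8):
    """Per query, keep the top_k largest scaled inner products, mask
    the rest to -inf, softmax over survivors, aggregate values."""
    scores = q @ k.transpose(-1, -2) / k.shape[-1] ** 0.5
    vals, idx = scores.topk(min(top_k, k.shape[-2]), dim=-1)
    masked = torch.full_like(scores, float('-inf')).scatter_(-1, idx, vals)
    return torch.softmax(masked, dim=-1) @ v

q = torch.randn(2, 16, 64)    # (batch, queries, dim)
kv = torch.randn(2, 128, 64)  # (batch, keys/values, dim)
out = topk_attention(q, kv, kv)
```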
Blind-Spot Mass: A Good-Turing Framework for Quantifying Deployment Coverage Risk in Machine Learning Systems
Biplab Pal, Santanu Bhattacharya, Madanjit Singh
Theory
  • Introduction of blind-spot mass as a metric for quantifying deployment coverage risk in ML systems (the underlying Good–Turing estimator is sketched after this entry).
  • Demonstration of how heavy-tailed distributions can lead to coverage blindness in model performance.
  • Validation of the framework in two distinct domains: wearable human activity recognition and clinical data analysis.
  • Identification of dominant contributors to coverage risk through blind-spot decomposition.
Read more
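The blind-spot idea builds on the classical Good–Turing missing-mass estimator, which is compact enough to state in full: the probability mass of categories never seen is estimated by the fraction of observations whose category occurred exactly once.

```python
from collections import Counter

def good_turing_missing_mass(observations) -> float:
    """Good-Turing estimator of unseen mass: N1 / N, where N1 is the
    number of categories observed exactly once."""
    counts = Counter(observations)
    n1 = sum(1 for c in counts.values() if c == 1)
    return n1 / len(observations)

print(good_turing_missing_mass(list("aaabbcdd") + ["e"]))  # 'c','e' singletons -> 2/9
```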
Context is All You Need
Jean Erik Delanois, Shruti Joshi, Ryan Golden, Teresa Nick, Maxim Bazhenov
Generative Models NLP Computer Vision
  • Introduction of CONTXT, a simple method for contextual adaptation in ANNs.
  • Demonstrates consistent performance improvements across various tasks without retraining.
  • Addresses the limitations of existing domain adaptation methods that are complex and resource-intensive.
  • Incorporates insights from neuroscience regarding context processing in biological systems.
Read more
Dynamic Linear Coregionalization for Realistic Synthetic Multivariate Time Series
Annita Vapsi, Penghang Liu, Saheed Obitayo, Aakriti, Manoj Cherukumalli, Prathamesh Patil, Amit Varshney, Nicolas Marchesotti, Elizabeth Fons, Vamsi K. Potluru, Manuela Veloso
Time Series
  • DynLMC generates synthetic multivariate time series with realistic, nonstationary correlation structures.
  • The model incorporates time-varying correlations, regime-switching, and lagged dependencies.
  • Fine-tuning on DynLMC-generated data improves robustness and generalization of FMTS.
  • The approach addresses the limitations of existing synthetic data generators that rely on static correlations.
Read more