AI-generated summaries

Today's ML research, without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

69 papers today
Updated every 8 hours
7 days of history
Forecasting with Guidance: Representation-Level Supervision for Time Series Forecasting
Jiacheng Wang, Liang Fan, Baihua Li, Luyan Zhang
Time Series
  • Identifies limitations of error-only supervision in deep learning-based time series forecasting.
  • Introduces ReGuider, a plug-in method for representation-level supervision using pretrained time series foundation models.
  • Demonstrates that ReGuider enhances the expressiveness of temporal representations in forecasting models.
  • Shows consistent improvements in forecasting accuracy across various datasets and architectures.
Read more
IPatch: A Multi-Resolution Transformer Architecture for Robust Time-Series Forecasting
Aymane Harkati, Moncef Garouani, Olivier Teste, Julien Aligon, Mohamed Hamlich
Time Series
  • IPatch integrates point-wise and patch-wise representations for improved time-series forecasting.
  • The architecture includes mechanisms for capturing both fine-grained and coarse temporal dependencies.
  • Experimental results show significant improvements in forecasting accuracy and robustness compared to traditional methods.
  • IPatch is effective across various prediction horizons and benchmark datasets.
Read more
i-IF-Learn: Iterative Feature Selection and Unsupervised Learning for High-Dimensional Complex Data
Chen Ma, Wanjie Wang, Shuhao Fan
Interpretability
  • Introduction of i-IF-Learn, a framework for joint feature selection and clustering.
  • Adaptive feature selection statistic that balances supervised and unsupervised signals.
  • Utilization of low-dimensional embeddings to enhance clustering performance.
  • Empirical results show significant improvements over classical and deep clustering methods.
Read more
Unveiling Hidden Convexity in Deep Learning: a Sparse Signal Processing Perspective
Emi Zeger, Mert Pilanci
Theory Optimization Interpretability
  • Convex equivalences of ReLU neural networks can simplify optimization and enhance theoretical understanding.
  • Reframing neural network training as a convex optimization task allows for efficient global optimization.
  • The paper presents an equivalence theorem connecting two-layer ReLU networks to convex group Lasso problems.
  • Experimental results indicate performance benefits when applying convex optimization frameworks to neural network training.
Read more
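The equivalence theorem referenced above is usually stated, in the hidden-convexity literature, in roughly the following form (a sketch of the standard two-layer result, not necessarily this paper's exact statement):

```latex
\min_{\{v_i, w_i\}}\;
  \frac{1}{2}\Big\|\sum_{i=1}^{P} D_i X (v_i - w_i) - y\Big\|_2^2
  + \beta \sum_{i=1}^{P} \big(\|v_i\|_2 + \|w_i\|_2\big)
\quad \text{s.t.}\;\; (2D_i - I) X v_i \ge 0,\;\; (2D_i - I) X w_i \ge 0
```

Here the D_i enumerate the finitely many ReLU activation patterns of the data matrix X, and the group-l2 penalty on each (v_i, w_i) pair is what makes the problem a group Lasso.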
Transformers Trained via Gradient Descent Can Provably Learn a Class of Teacher Models
Chenyang Zhang, Qingyue Zhao, Quanquan Gu, Yuan Cao
Theory Optimization
  • One-layer transformers can effectively learn from a general class of teacher models.
  • The paper establishes a tight convergence guarantee for population loss with a rate of Θ(1/T).
  • Transformers demonstrate robust out-of-distribution generalization capabilities.
  • The study identifies a bilinear structure that underpins various learning tasks, enabling unified theoretical guarantees.
Read more
PoiCGAN: A Targeted Poisoning Based on Feature-Label Joint Perturbation in Federated Learning
Tao Liu, Jiguang Lv, Dapeng Man, Weiye Xi, Yaole Li, Feiyu Zhao, Kuiming Wang, Yingchao Bian, Chen Xu, Wu Yang
Federated Learning Computer Vision Generative Models
  • PoiCGAN introduces a targeted poisoning attack framework that enhances stealthiness while maintaining model performance.
  • The method leverages dual-feature collaborative perturbations to minimize the impact on the main task's accuracy.
  • Experiments show a significant increase in attack success rates compared to existing methods.
  • The approach highlights new vulnerabilities in Federated Learning systems, necessitating stronger defenses.
Read more
End-to-End Efficient RL for Linear Bellman Complete MDPs with Deterministic Transitions
Zakaria Mhammedi, Alexander Rakhlin, Nneka Okolo
Reinforcement Learning Theory Efficient ML
  • Introduces a computationally efficient algorithm for linear Bellman complete MDPs with deterministic transitions.
  • Algorithm is end-to-end efficient for finite action spaces and requires only an argmax oracle for larger action spaces.
  • Achieves ε-optimal policy with polynomial sample and computational complexity.
  • Addresses a significant gap in existing literature regarding exploration in linear Bellman complete MDPs.
Read more
Lagrangian Relaxation Score-based Generation for Mixed Integer Linear Programming
Ruobing Wang, Xin Li, Yujie Fang, Mingzhong Wang
Optimization
  • Introduction of SRG, a generative framework for MILP solving.
  • Utilization of Lagrangian relaxation to enhance solution quality and feasibility.
  • Joint modeling of decision variables to overcome independence assumptions.
  • Demonstrated superior performance compared to existing machine learning baselines.
Read more
CoordLight: Learning Decentralized Coordination for Network-Wide Traffic Signal Control
Yifeng Zhang, Harsh Goel, Peizhuo Li, Mehul Damani, Sandeep Chinchali, Guillaume Sartoretti
Reinforcement Learning Optimization
  • Introduces Queue Dynamic State Encoding (QDSE) for enhanced traffic state representation.
  • Develops Neighbor-aware Policy Optimization (NAPO) to improve agent coordination.
  • Demonstrates superior performance over existing traffic signal control methods.
  • Addresses challenges of partial observability and decentralized decision-making.
Read more
Causal Discovery in Action: Learning Chain-Reaction Mechanisms from Interventions
Panayiotis Panayiotou, Özgür Şimşek
Theory Graph Learning
  • Causal discovery in chain-reaction systems can be achieved through blocking interventions.
  • The proposed method provides a unique identification of causal structures with finite-sample guarantees.
  • Experiments show that the method outperforms observational heuristics in complex causal scenarios.
  • The approach is applicable to various real-world systems exhibiting cascade-like structures.
Read more
Latent Algorithmic Structure Precedes Grokking: A Mechanistic Study of ReLU MLPs on Modular Arithmetic
Anand Swaroop
Theory Interpretability
  • ReLU MLPs exhibit near-binary square wave input weights rather than sinusoidal weights during grokking.
  • The output weights maintain a phase-sum relation, indicating a structured internal representation.
  • An idealized MLP constructed from extracted parameters achieves high accuracy despite the original model's poor performance.
  • Grokking sharpens latent algorithmic structures rather than discovering new algorithms.
Read more
Reservoir-Based Graph Convolutional Networks
Mayssa Soussia, Gita Ayu Salsabila, Mohamed Ali Mahjoub, Islem Rekik
Graph Learning
  • Introduction of RGC-Net, combining reservoir computing with graph convolution for improved GNN performance.
  • Utilization of fixed-random reservoir weights and a leaky integrator to enhance feature retention and mitigate over-smoothing.
  • Demonstrated state-of-the-art performance in graph classification and generative tasks, particularly in dynamic brain connectivity.
  • Introduction of TRGC-Net for comparing fixed and trainable reservoir weights, offering insights into reservoir dynamics.
Read more
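The fixed-random-reservoir-plus-leaky-integrator recipe is the classic echo-state update; a generic sketch of that update on plain vectors (not the paper's graph-convolutional variant, and all names here are ours):

```python
import math
import random

def make_reservoir(n_in, n_res, seed=0, scale=0.1):
    """Fixed random weights: sampled once, never trained."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_res)]
    w_res = [[rng.uniform(-scale, scale) for _ in range(n_res)] for _ in range(n_res)]
    return w_in, w_res

def leaky_step(h, x, w_in, w_res, leak=0.3):
    """Leaky integrator: the new state blends the old state with the
    nonlinear update, which slows forgetting and aids feature retention."""
    pre = [
        sum(wi * xi for wi, xi in zip(w_in[j], x))
        + sum(wr * hj for wr, hj in zip(w_res[j], h))
        for j in range(len(h))
    ]
    return [(1 - leak) * hj + leak * math.tanh(pj) for hj, pj in zip(h, pre)]

w_in, w_res = make_reservoir(n_in=2, n_res=4)
h = [0.0] * 4
for x in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]):
    h = leaky_step(h, x, w_in, w_res)
print(len(h))  # 4-dimensional reservoir state
```

Only a readout on top of h would be trained; the reservoir weights stay frozen.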
On the Use of Bagging for Local Intrinsic Dimensionality Estimation
Kristóf Péter, Ricardo J. G. B. Campello, James Bailey, Michael E. Houle
Theory
  • Introduces bagging as a variance-reduction technique for LID estimation.
  • Analyzes the complex interplay between sampling rate, neighborhood size, and ensemble size.
  • Demonstrates significant improvements in estimation accuracy through empirical results.
  • Proposes methods for combining bagging with neighborhood smoothing for enhanced performance.
Read more
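The bagging recipe itself is generic: estimate LID on several bootstrap subsamples and average, trading a little bias for reduced variance. A toy sketch using the standard Levina-Bickel MLE estimator (the 1-D data and all parameter values are illustrative, not the paper's setup):

```python
import math
import random

def mle_lid(dists, k):
    """Levina-Bickel MLE estimate of local intrinsic dimension
    from the distances to the k nearest neighbors."""
    r = sorted(dists)[:k]
    s = sum(math.log(r[k - 1] / rj) for rj in r[:-1])
    return (k - 1) / s

def bagged_lid(point, data, k, n_bags=20, rate=0.7, seed=0):
    """Bagging: estimate LID on bootstrap subsamples and average."""
    rng = random.Random(seed)
    ests = []
    for _ in range(n_bags):
        sample = rng.sample(data, int(rate * len(data)))
        d = [abs(p - point) for p in sample if p != point]  # 1-D toy metric
        ests.append(mle_lid(d, k))
    return sum(ests) / len(ests)

# Toy 1-D data: the true intrinsic dimension is 1.
rng = random.Random(1)
data = [rng.random() for _ in range(500)]
print(round(bagged_lid(0.5, data, k=10), 2))
```

The interplay the paper analyzes lives in the three knobs `rate`, `k`, and `n_bags`.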
Towards Effective Experiential Learning: Dual Guidance for Utilization and Internalization
Fei Bai, Zhipeng Chen, Chuan Hao, Ming Yang, Ran Tao, Bryan Dai, Wayne Xin Zhao, Jian Yang, Hongteng Xu
NLP Large Language Models Reinforcement Learning
  • DGO introduces a unified framework that combines external and internal experience for improved training effectiveness.
  • The framework operates through a closed-loop system of experience utilization and internalization.
  • DGO consistently outperforms baseline methods, demonstrating enhanced reasoning capabilities in LLMs.
  • The method achieves an average score of 32.41% on six benchmarks, improving to 39.38% with test-time scaling.
Read more
Steering Code LLMs with Activation Directions for Language and Library Control
Md Mahbubur Rahman, Arjun Guha, Harshitha Menon
Large Language Models NLP
  • Code LLMs exhibit strong implicit preferences for specific programming languages and libraries.
  • Layer-wise activation directions can be estimated to steer model outputs effectively.
  • Interventions can influence code generation even under neutral or conflicting prompts.
  • Steering strength varies by model and target, with risks of quality degradation from strong interventions.
Read more
The Coordinate System Problem in Persistent Structural Memory for Neural Architectures
Abhinaba Basu
Theory
  • Introduction of the Dual-View Pheromone Pathway Network (DPPN) for persistent structural memory.
  • Identification of coordinate stability and graceful transfer mechanisms as independent requirements for effective memory.
  • Demonstration that learned coordinate systems are unstable and hinder memory persistence.
  • Fixed random Fourier features provide stable coordinates but do not ensure effective transfer.
Read more
A Learning Method with Gap-Aware Generation for Heterogeneous DAG Scheduling
Ruisong Zhou, Haijun Zou, Li Zhou, Chumin Sun, Zaiwen Wen
Reinforcement Learning Optimization Theory
  • WeCAN framework effectively addresses scheduling of heterogeneous DAGs using reinforcement learning.
  • Introduces a two-stage single-pass design for efficient schedule generation.
  • Develops an order-space analysis to identify and eliminate generation-induced optimality gaps.
  • Demonstrates superior performance in makespan compared to existing scheduling methods.
Read more
Scaling Recurrence-aware Foundation Models for Clinical Records via Next-Visit Prediction
Haresh Rengaraj Rajamohan, Xiang Gao, Weicheng Zhu, Shih-Lun Huang, Long Chen, Gabe Schulman, Huizhen Jin, Shengduo Li, Yixuan Wang, Huidi Yang, Kyunghyun Cho, Cem M. Deniz, Narges Razavian
Generative Models Time Series NLP
  • Introduction of RAVEN, a generative model for next-visit prediction in EHRs.
  • Implementation of a history-dependent regularization mechanism to improve prediction of new disease onsets.
  • Demonstration of RAVEN's competitive performance in zero-shot disease incidence forecasting.
  • Investigation of scaling behaviors in data-constrained and compute-saturated regimes.
Read more
Deep Convolutional Neural Networks for predicting highest priority functional group in organic molecules
Kunal Khatri, Vineet Mehta, Manish Narwaria, Bhaskar Chaudhary
Computer Vision
  • Introduction of a CNN model for predicting the highest priority functional group in organic molecules.
  • Utilization of FTIR spectroscopy data for training the model.
  • Comparison of CNN performance with traditional SVM methods, showing improved accuracy.
  • Emphasis on the significance of functional group priority in organic chemistry.
Read more
Transcending Classical Neural Network Boundaries: A Quantum-Classical Synergistic Paradigm for Seismic Data Processing
Zhengyi Yuan, Xintong Dong, Xinyang Wang, Zheng Cong, Shiqi Dong
Generative Models Theory Time Series
  • Introduction of QC-GAN, the first application of quantum neural networks in seismic exploration.
  • Integration of quantum and convolutional pathways enhances feature representation and processing capacity.
  • Novel QC feature complementarity loss ensures non-overlapping information encoding.
  • Experimental validation shows superior performance in denoising and interpolation tasks.
Read more
Attack Assessment and Augmented Identity Recognition for Human Skeleton Data
Joseph G. Zalameda, Megan A. Witherow, Alexander M. Glandon, Jose Aguilera, Khan M. Iftekharuddin
Generative Models Computer Vision Optimization
  • Introduction of Attack-AAIRS framework to enhance model robustness against adversarial attacks.
  • Utilization of GANs to generate synthetic adversarial samples for training.
  • Demonstrated improved resilience of HCN-ID models to various adversarial attack methods.
  • Maintained accuracy on real data while improving robustness against unseen attacks.
Read more
Symbolic-KAN: Kolmogorov-Arnold Networks with Discrete Symbolic Structure for Interpretable Learning
Salah A Faroughi, Farinaz Mostajeran, Amirhossein Arzani, Shirko Faroughi
Interpretability
  • Symbolic-KANs bridge the gap between symbolic regression and neural networks by embedding discrete symbolic structures.
  • The architecture allows for the direct selection of univariate primitives, leading to compact closed-form expressions.
  • Symbolic-KANs effectively recover governing structures in various applications, including regression and dynamical systems.
  • The framework extends to physics-informed learning, producing accurate solutions from governing constraints.
Read more
CN-Buzz2Portfolio: A Chinese-Market Dataset and Benchmark for LLM-Based Macro and Sector Asset Allocation from Daily Trending Financial News
Liyuan Chen, Shilong Li, Jiangpeng Yan, Shuoling Liu, Qiang Yang, Xiu Li
NLP Large Language Models
  • Introduction of CN-Buzz2Portfolio as a benchmark for evaluating LLMs in financial asset allocation.
  • Focus on macro and sector-level asset allocation rather than individual stock picking.
  • Implementation of a Tri-Stage CPA Agent Workflow to assess LLM performance.
  • Significant disparities observed among LLMs in translating financial narratives into portfolio strategies.
Read more
MsFormer: Enabling Robust Predictive Maintenance Services for Industrial Devices
Jiahui Zhou, Dan Li, Ruibing Jin, Jian Lou, Yanran Zhao, Zhenghua Chen, Zigui Jiang, See-Kiong Ng
Time Series
  • Introduction of MsFormer, a lightweight Multi-scale Transformer for predictive maintenance.
  • Incorporation of a Multi-scale Sampling module to capture multi-scale temporal correlations.
  • Use of a lightweight attention mechanism tailored for data-scarce environments.
  • Extensive validation on real-world datasets showing significant performance improvements.
Read more
Bridging the Gap Between Climate Science and Machine Learning in Climate Model Emulation
Luca Schmidt, Nina Effenberger
Efficient ML
  • ML emulators can significantly reduce the computational costs associated with traditional climate models.
  • There is a disconnect between the climate science and machine learning communities regarding the use of emulators.
  • A framework integrating both fields can enhance the design and reliability of climate model emulators.
  • Closer collaboration can create feedback loops that improve both emulators and physical simulations.
Read more
Generalizing Dynamics Modeling More Easily from Representation Perspective
Yiming Wang, Zhengnan Zhang, Genghe Zhang, Jiawen Dan, Changchun Li, Chenlong Hu, Chris Nugent, Jun Liu, Ximing Li, Bo Yang
Time Series
  • Introduction of a generalized Pre-trained Dynamics EncoDER (PDEDER) for improved dynamics modeling.
  • Utilization of the Lyapunov exponent to minimize chaotic behavior in the latent space.
  • Incorporation of reconstruction and forecasting objectives to enhance model performance.
  • Evaluation on 12 dynamic systems shows significant improvements in forecasting accuracy.
Read more
Linear-Nonlinear Fusion Neural Operator for Partial Differential Equations
Heng Wu, Junjie Wang, Benzhuo Lu
Efficient ML Theory Interpretability
  • Introduction of a linear-nonlinear multiplicative fusion mechanism for improved training efficiency.
  • LNF-NO architecture effectively decouples linear and nonlinear effects for better representation.
  • Demonstrated significant training speed improvements (up to 2.7x faster) compared to existing models.
  • Achieves comparable or better accuracy across various PDE benchmarks.
Read more
TuneShift-KD: Knowledge Distillation and Transfer for Fine-tuned Models
Yushi Guan, Jeanine Ohene-Agyei, Daniel Kwan, Jean Sebastien Dandurand, Yifei Zhang, Nandita Vijaykumar
NLP Large Language Models Efficient ML
  • TuneShift-KD automates the distillation of specialized knowledge from fine-tuned models to target models.
  • The method relies on identifying perplexity differences to create a synthetic training dataset.
  • It does not require access to original training data or additional training of discriminators.
  • Models fine-tuned with TuneShift-KD show improved accuracy over previous knowledge transfer methods.
Read more
Upper Entropy for 2-Monotone Lower Probabilities
Tuan-Anh Vu, Sébastien Destercke, Frédéric Pichon
Theory Optimization Efficient ML
  • Introduces a polynomial-time algorithm for computing upper entropy for 2-monotone lower probabilities.
  • Demonstrates significant improvements over existing algorithms, making computations more efficient.
  • Provides specialized algorithms for belief functions, possibility distributions, and probability intervals.
  • Presents an approximation algorithm that effectively bounds the results and controls errors.
Read more
Likelihood hacking in probabilistic program synthesis
Jacek Karwowski, Younesse Kaddar, Zihuiwen Ye, Nikolay Malkin, Sam Staton
Reinforcement Learning Theory Generative Models
  • Formal definition of likelihood hacking in probabilistic programming languages.
  • Establishment of sufficient conditions for preventing likelihood hacking.
  • Implementation of SafeStan and SafePyMC to enforce safety constraints.
  • Empirical evidence of likelihood hacking in RL-driven automated scientists.
Read more
MetaKube: An Experience-Aware LLM Framework for Kubernetes Failure Diagnosis
Wei Sun, Ting Wang, Xinran Tian, Wanshun Lan, Xuhan Feng, Haoyue Li, Fangxin Wang
Large Language Models
  • MetaKube integrates episodic memory networks, specialized language models, and causal knowledge graphs for enhanced Kubernetes diagnostics.
  • The framework allows for dynamic reasoning pathways, optimizing diagnostic speed and depth based on problem familiarity.
  • MetaKube's locally-deployable model ensures data privacy while achieving high diagnostic performance.
  • Experiential learning through EPMN significantly improves diagnostic accuracy over time.
Read more
UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience
Zichuan Lin, Feiyu Liu, Yijun Yang, Jiafei Lyu, Yiming Gao, Yicheng Liu, Zhicong Lu, Yangbin Yu, Mingyu Yang, Junyou Li, Deheng Ye, Jie Jiang
Reinforcement Learning Optimization Multimodal
  • Introduction of UI-Voyager, a self-evolving mobile GUI agent.
  • Utilization of Rejection Fine-Tuning (RFT) for autonomous data and model co-evolution.
  • Implementation of Group Relative Self-Distillation (GRSD) to improve credit assignment in learning.
  • Achieved an 81.0% Pass@1 success rate on the AndroidWorld benchmark, outperforming existing models.
Read more
Deep Neural Regression Collapse
Akshay Rangamani, Altay Unal
Theory
  • Deep Neural Regression Collapse (NRC) occurs across all layers of deep regression models, not just the last layer.
  • Models exhibiting Deep NRC learn the intrinsic dimension of low-rank targets, indicating generalization rather than memorization.
  • Weight decay is necessary for inducing Deep NRC, highlighting its importance in model training.
  • The study provides a complete description of NRC conditions applicable to deep regressors.
Read more
Central Dogma Transformer III: Interpretable AI Across DNA, RNA, and Protein
Nobuyuki Ota
Interpretability Multimodal
  • CDT-III aligns its architecture with the central dogma, enhancing interpretability and biological relevance.
  • The two-stage architecture effectively separates transcription and translation processes, improving prediction accuracy.
  • Joint prediction of RNA and protein changes leads to better performance and interpretability.
  • The model can predict clinical side effects and generate hypotheses without clinical data, showcasing its practical applications.
Read more
Robustness Quantification and Uncertainty Quantification: Comparing Two Methods for Assessing the Reliability of Classifier Predictions
Adrián Detavernier, Jasper De Bock
Theory
  • RQ outperforms UQ in assessing classifier prediction reliability, particularly under distribution shifts.
  • Both RQ and UQ can be combined for enhanced reliability assessments.
  • The study emphasizes the significance of reliability in high-stakes AI applications.
  • A comprehensive comparison is conducted using real datasets, expanding beyond previous studies focused on artificial data.
Read more
Diet Your LLM: Dimension-wise Global Pruning of LLMs via Merging Task-specific Importance Score
Jimyung Hong, Jaehyung Kim
Large Language Models Efficient ML
  • DIET is a dimension-wise global pruning framework that generates a single global mask for LLMs.
  • The method requires no additional training, relying solely on activation profiling from a small number of task-specific samples.
  • DIET consistently outperforms state-of-the-art structured pruning methods across various sparsity levels and model sizes.
  • The framework demonstrates significant accuracy gains, particularly in zero-shot commonsense reasoning tasks.
Read more
AscendOptimizer: Episodic Agent for Ascend NPU Operator Optimization
Jiehao Wu, Zixiao Huang, Wenhao Li, Chuyun Shen, Junjie Sheng, Xiangfeng Wang
Optimization Efficient ML
  • AscendOptimizer addresses the knowledge bottleneck in optimizing AscendC operators due to a lack of public reference implementations.
  • The framework employs a two-stage optimization process that alternates between host-side tiling and kernel-side optimization.
  • It achieves a 1.19× speedup over existing baselines, with 49.61% of operators outperforming their references.
  • The method leverages hardware feedback for evolutionary search and creates a library of optimization motifs through kernel rewinding.
Read more
Dual-Criterion Curriculum Learning: Application to Temporal Data
Gaspard Abel, Eloi Campagne, Mohamed Benloughmari, Argyris Kalogeratos
Time Series
  • Introduction of the Dual-Criterion Curriculum Learning (DCCL) framework combining loss-based and density-based difficulty assessments.
  • DCCL addresses the challenge of defining meaningful difficulty measures in Curriculum Learning.
  • Empirical evaluations show that DCCL outperforms traditional loss-only baselines in time-series forecasting tasks.
  • The framework is modular and applicable to a wide range of data types, enhancing its versatility.
Read more
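As a generic illustration of the dual-criterion idea, one can blend a loss-based and a density-based difficulty score and order samples easy-to-hard (the normalization and weighting below are our own illustrative choices, not the paper's):

```python
def dual_criterion_order(losses, densities, alpha=0.5):
    """Rank sample indices easy-to-hard by a convex combination of
    normalized loss (high loss = hard) and inverse density (sparse = hard)."""
    def norm(xs):
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo + 1e-12) for x in xs]
    nl = norm(losses)
    nd = norm([1.0 / (d + 1e-12) for d in densities])
    difficulty = [alpha * l + (1 - alpha) * d for l, d in zip(nl, nd)]
    return sorted(range(len(losses)), key=lambda i: difficulty[i])

# Sample 2 has the lowest loss and high density, so it comes first.
print(dual_criterion_order([0.9, 0.2, 0.1], [0.5, 0.1, 0.9]))  # → [2, 0, 1]
```

A curriculum schedule would then feed the model prefixes of this ordering of growing length.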
Precision-Varying Prediction (PVP): Robustifying ASR systems against adversarial attacks
Matías Pizarro, Raghavan Narasimhan, Asja Fischer
Audio & Speech
  • PVP enhances ASR robustness by varying numerical precision during inference.
  • The method does not require retraining or access to model internals.
  • A lightweight detection strategy is proposed based on transcription consistency across precision modes.
  • Experiments show significant improvements in robustness and detection performance across multiple ASR models.
Read more
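The core mechanic, running the same model under different numeric precisions and flagging inputs whose outputs disagree, can be sketched with a stdlib float32 round-trip (a toy linear model standing in for the authors' ASR pipeline; the tolerance is illustrative):

```python
import struct

def to_float32(x):
    """Round-trip through IEEE-754 single precision: a stand-in
    for running inference in a lower-precision mode."""
    return struct.unpack('f', struct.pack('f', x))[0]

def model(x, weights, cast=lambda v: v):
    """Toy linear 'model'; `cast` injects the precision mode."""
    return sum(cast(w) * cast(xi) for w, xi in zip(weights, x))

def consistent(x, weights, tol=1e-3):
    """PVP-style check: a benign input gives nearly the same output in
    every precision mode; an attack tuned to one mode tends not to."""
    full = model(x, weights)
    low = model(x, weights, cast=to_float32)
    return abs(full - low) < tol

weights = [0.123456789, -0.987654321, 0.555555555]
print(consistent([1.0, 2.0, 3.0], weights))
```

No retraining or access to model internals is needed: the check only compares outputs across precision modes, which is the property the detection strategy above exploits.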
Multimodal Training to Unimodal Deployment: Leveraging Unstructured Data During Training to Optimize Structured Data Only Deployment
Zigui Wang, Minghui Sun, Jiang Shu, Matthew M. Engelhard, Lauren Franz, Benjamin A. Goldstein
Multimodal
  • Introduces a multimodal learning framework that leverages unstructured EHR data for training while deploying a structured-only model.
  • Utilizes contrastive learning and knowledge distillation to transfer knowledge from a teacher model to a student model.
  • Achieves an AUROC of 0.705, outperforming the structured-only baseline of 0.656.
  • Highlights the importance of unstructured data in enhancing model performance in clinical settings.
Read more
Instruction-Tuned, but Not More Verifiable Instruction-Following: A Cross-Task Diagnosis for LoRA Adapters
Junyi Zou
NLP Large Language Models
  • Nominal training objectives do not consistently predict actual performance improvements across tasks.
  • The concept of 'capability drift' describes the mismatch between nominal labels and realized capabilities.
  • Routine cross-task evaluations are essential before deploying models to avoid unintended performance shifts.
  • Different benchmarks operationalize instruction following differently, leading to mixed evidence across evaluations.
Read more
DeepDTF: Dual-Branch Transformer Fusion for Multi-Omics Anticancer Drug Response Prediction
Yuhan Zhao, Jacob Tennant, James Yang, Zhishan Guo, Young Whang, Ning Sui
Multimodal Graph Learning Interpretability
  • DeepDTF integrates multi-omics data and drug structures using a dual-branch Transformer architecture.
  • The model achieves superior performance on drug response prediction tasks compared to existing baselines.
  • It includes an interpretability module that connects predictions to biological pathways and gene attributions.
  • DeepDTF addresses challenges of cross-modal misalignment and high-dimensional data in cancer drug response modeling.
Read more
Lightweight Fairness for LLM-Based Recommendations via Kernelized Projection and Gated Adapters
Nan Cui, Wendy Hui Wang, Yue Ning
NLP Large Language Models Efficient ML
  • Proposes a lightweight bias mitigation method for LLM-based recommendations.
  • Combines kernelized INLP for bias removal with a gated MoE adapter for utility restoration.
  • Achieves fairness improvements without sacrificing recommendation accuracy.
  • No additional trainable parameters are required, making it computationally efficient.
Read more
GEM: Guided Expectation-Maximization for Behavior-Normalized Candidate Action Selection in Offline RL
Haoyu Wang, Jingcheng Wang, Shunyu Wu, Xinwei Xiao
Reinforcement Learning
  • GEM provides a multimodal and controllable action selection framework for offline RL.
  • The method preserves distinct action hypotheses while focusing on high-value regions through GMMs.
  • Candidate-based selection allows for a flexible compute-quality trade-off at inference time.
  • GEM mitigates the risk of out-of-distribution errors associated with naive candidate maximization.
Read more
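Candidate-based selection from a behavior-shaped mixture is easy to sketch in isolation (a 1-D toy with a hand-written value function; in the paper the GMMs are fit to offline data):

```python
import random

def sample_gmm(components, rng):
    """Draw one action from a 1-D Gaussian mixture of (weight, mean, std)."""
    r, acc = rng.random(), 0.0
    for w, mu, sigma in components:
        acc += w
        if r <= acc:
            return rng.gauss(mu, sigma)
    return rng.gauss(components[-1][1], components[-1][2])

def select_action(components, value_fn, n_candidates=16, seed=0):
    """Candidate-based selection: sample in-distribution candidates from
    the behavior-shaped mixture, keep the highest-value one. More
    candidates = more compute but a better action (the trade-off above)."""
    rng = random.Random(seed)
    cands = [sample_gmm(components, rng) for _ in range(n_candidates)]
    return max(cands, key=value_fn)

# Two behavior modes; the value function prefers actions near 2.0.
mix = [(0.5, -1.0, 0.2), (0.5, 2.0, 0.2)]
best = select_action(mix, value_fn=lambda a: -(a - 2.0) ** 2)
print(round(best, 1))
```

Because candidates are drawn from the behavior-shaped mixture rather than optimized freely, the argmax stays near the data, which is what limits out-of-distribution error.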
Kronecker-Structured Nonparametric Spatiotemporal Point Processes
Zhitong Xu, Qiwei Yuan, Yinghao Chen, Yan Sun, Bin Shen, Shandian Zhe
Time Series Theory Interpretability
  • KSTPP enables explicit discovery of event relationships while maintaining modeling flexibility.
  • The model captures complex interaction patterns, including excitation, inhibition, and time-varying effects.
  • Kronecker algebra is leveraged to reduce computational complexity and enhance scalability.
  • The framework outperforms existing neural point process models in predictive tasks.
Read more
Self Paced Gaussian Contextual Reinforcement Learning
Mohsen Sahraei Ardakani, Rui Song
Reinforcement Learning Optimization Theory
  • SPGL avoids costly numerical optimizations by using a closed-form update for Gaussian contexts.
  • The method maintains sample efficiency and adaptability while reducing computational overhead.
  • SPGL shows superior performance in contextual RL benchmarks compared to existing methods.
  • The approach is scalable and applicable to high-dimensional context spaces.
Read more
Beyond Accuracy: Introducing a Symbolic-Mechanistic Approach to Interpretable Evaluation
Reza Habibi, Darian Lee, Magy Seif El-Nasr
NLP Interpretability
  • Traditional accuracy metrics fail to reliably distinguish between generalization and memorization in machine learning models.
  • The proposed symbolic-mechanistic evaluation framework combines symbolic rules with mechanistic interpretability to provide deeper insights into model behavior.
  • A case study on NL-to-SQL tasks illustrates the limitations of standard evaluation metrics, revealing hidden failures in models that appear competent based on accuracy alone.
  • The authors emphasize the need for mechanism-aware evaluation, particularly for tasks with clear algorithmic requirements.
Read more
Rethinking Multimodal Fusion for Time Series: Auxiliary Modalities Need Constrained Fusion
Seunghan Lee, Jun Seo, Jaehoon Lee, Sungdong Yoo, Minjae Kim, Tae Yoon Lim, Dongwan Kang, Hwanil Choi, SoonYoung Lee, Wonbin Ahn
Time Series Multimodal
  • Naive multimodal fusion strategies often underperform compared to unimodal TS models.
  • Constrained fusion methods, including the proposed Controlled Fusion Adapter (CFA), significantly improve performance.
  • CFA allows for controlled integration of auxiliary textual information without modifying the TS backbone.
  • The study involved over 20,000 experiments across diverse datasets and models, validating the effectiveness of constrained fusion.
Read more
Synthetic Mixed Training: Scaling Parametric Knowledge Acquisition Beyond RAG
Seungju Han, Konwoo Kim, Chanwoo Park, Benjamin Newman, Suhas Kotha, Jaehun Jung, James Zou, Yejin Choi
NLP Large Language Models
  • Synthetic Mixed Training combines synthetic QAs and documents to improve knowledge acquisition.
  • The approach yields log-linear improvements in performance as data volume and generator strength increase.
  • Focal Rewriting enhances document diversity by conditioning generation on specific questions.
  • The proposed methods outperform RAG in five out of six benchmark settings.
Read more
Permutation-Symmetrized Diffusion for Unconditional Molecular Generation
Gyeonghoon Ko, Juho Lee
Generative Models
  • Introduces a direct modeling approach for diffusion on the quotient manifold to achieve permutation invariance.
  • Derives an explicit expression for the heat kernel on the quotient manifold, enhancing understanding of diffusion dynamics.
  • Utilizes MCMC to approximate the permutation-symmetrized score for training.
  • Demonstrates competitive performance in unconditional molecular generation tasks on the QM9 dataset.
Read more
Conditionally Identifiable Latent Representation for Multivariate Time Series with Structural Dynamics
Minkey Chang, Jae-Young Kim
Time Series
  • Introduction of the Identifiable Variational Dynamic Factor Model (iVDFM) for multivariate time series.
  • Achieves identifiability by conditioning on the innovation process rather than latent states.
  • Utilizes linear diagonal dynamics to preserve identifiability and ensure computational efficiency.
  • Demonstrates improved factor recovery and intervention accuracy on synthetic and real-world data.
Read more
Efficient Controller Learning from Human Preferences and Numerical Data Via Multi-Modal Surrogate Models
Lukas Theiner, Maik Pfefferkorn, Yongpeng Zhao, Sebastian Hirt, Rolf Findeisen
Optimization Robotics Multimodal
  • Introduces a multi-fidelity, multi-modal Bayesian optimization framework.
  • Integrates low-fidelity numerical data with high-fidelity human preferences.
  • Utilizes Gaussian process surrogate models for efficient learning.
  • Demonstrates application in tuning an autonomous vehicle's trajectory planner.
Read more
Research on Individual Trait Clustering and Development Pathway Adaptation Based on the K-means Algorithm
Qianru Wei, Jihaoyu Yang, Cheng Zhang, Jinming Yang
Theory
  • Utilizes K-means clustering to categorize students based on individual traits.
  • Focuses on the fitness of students for specific career paths rather than just predicting career outcomes.
  • Provides targeted career guidance based on clustering results, enhancing personalized support.
  • Demonstrates the effectiveness of data-driven approaches in improving employment success rates for students.
Read more
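K-means itself is standard; a minimal pure-Python version of the clustering step described above (the two-dimensional trait vectors and cluster count are illustrative):

```python
import math
import random

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(pts):
    return [sum(col) / len(pts) for col in zip(*pts)]

def kmeans(points, k, iters=20, seed=0):
    """Plain K-means: assign each point to the nearest centroid,
    then recompute centroids, for a fixed number of iterations."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: dist(p, centroids[c]))
            clusters[j].append(p)
        centroids = [mean(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return centroids, clusters

# Toy trait vectors, e.g. (quantitative score, verbal score).
students = [(0.1, 0.9), (0.2, 0.8), (0.9, 0.1), (0.8, 0.2)]
centroids, clusters = kmeans(students, k=2)
print(sorted(len(c) for c in clusters))  # two balanced clusters
```

In the paper's setting, each resulting cluster would then be mapped to career guidance tailored to that trait profile.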
From Arithmetic to Logic: The Resilience of Logic and Lookup-Based Neural Networks Under Parameter Bit-Flips
Alan T. L. Bacellar, Sathvik Chemudupati, Shashank Nag, Allison Seigler, Priscila M. V. Lima, Felipe M. G. França, Lizy K. John
Theory Efficient ML
  • Resilience against bit-flip errors is a structural property of neural architectures.
  • Lower precision, higher sparsity, bounded activations, and shallow depth improve resilience.
  • Logic and Lookup-Based Neural Networks (LUT-NNs) demonstrate superior stability under corruption.
  • A novel Even-Layer Recovery effect is observed in logic-based architectures.
Read more
Asymptotic Learning Curves for Diffusion Models with Random Features Score and Manifold Data
Anand Jerry George, Nicolas Macris
Generative Models Theory
  • Asymptotic expressions for errors in diffusion models are derived, highlighting the impact of manifold structure on sample complexity.
  • For linear manifolds, sample complexity scales linearly with intrinsic dimension, while this advantage diminishes for non-linear manifolds.
  • The study uses random feature neural networks to parameterize the score function, providing insights into the learning process of diffusion models.
  • The findings suggest that the geometric structure of data significantly influences the performance of generative models.
Read more
LineMVGNN: Anti-Money Laundering with Line-Graph-Assisted Multi-View Graph Neural Networks
Chung-Hoo Poon, James Kwok, Calvin Chow, Jang-Hyeon Choi
Graph Learning
  • Introduction of LineMVGNN, a new GNN model for AML detection.
  • Utilizes line graphs to enhance transaction information propagation.
  • Demonstrates superior performance compared to existing state-of-the-art methods.
  • Addresses scalability and interpretability issues in traditional AML systems.
Read more
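The line-graph construction underlying this idea can be sketched in a few lines with networkx (a toy graph, not the paper's pipeline): each transaction becomes a node, and two transactions are linked when one's receiver is the other's sender, so message passing propagates along chains of transfers rather than only between accounts.

```python
import networkx as nx

# Toy transaction graph: accounts as nodes, transfers as directed edges.
g = nx.DiGraph()
g.add_edge("A", "B")
g.add_edge("B", "C")
g.add_edge("C", "A")

# Directed line graph: nodes are the original edges (transactions);
# an edge (u, v) -> (v, w) links consecutive transfers through account v.
lg = nx.line_graph(g)
```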
Language-Assisted Image Clustering Guided by Discriminative Relational Signals and Adaptive Semantic Centers
Jun Ma, Xu Zhang, Zhengxing Jiao, Yaxin Hou, Hui Liu, Junhui Hou, Yuheng Jia
Computer Vision NLP Multimodal
  • Proposes a new framework for Language-Assisted Image Clustering (LAIC) addressing key limitations of existing methods.
  • Enhances inter-class discriminability by utilizing cross-modal relations for self-supervision signals.
  • Implements prompt learning to create adaptive semantic centers for improved clustering assignments.
  • Achieves an average performance improvement of 2.6% over state-of-the-art methods across multiple datasets.
Read more
Cost-Sensitive Neighborhood Aggregation for Heterophilous Graphs: When Does Per-Edge Routing Help?
Eyal Weiss
Graph Learning
  • Introduces Cost-Sensitive Neighborhood Aggregation (CSNA) for GNNs to handle heterophilous graphs.
  • Distinguishes between adversarial and informative heterophily regimes and their implications for message routing.
  • Demonstrates that CSNA can preserve class-discriminative signals where mean aggregation fails.
  • Finds that per-edge routing is beneficial in adversarial contexts but not in informative ones.
Read more
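The failure mode of mean aggregation under adversarial heterophily is easy to see on a scalar toy example. The signed rule below is only an illustration of the cost-sensitive idea, not the paper's actual CSNA routing:

```python
import numpy as np

# One target node with scalar feature +1, three neighbors from the other class.
x_self = 1.0
x_neigh = np.array([-1.0, -1.0, -1.0])

# Plain mean aggregation over {self} plus neighbors flips the node's sign:
mean_agg = (x_self + x_neigh.sum()) / 4       # class signal destroyed

# A signed aggregation that treats heterophilous edges as carrying
# *negated* information preserves the class-discriminative signal.
signed_agg = (x_self + (-x_neigh).sum()) / 4  # sign preserved
```

In the informative-heterophily regime the cross-class edges already carry usable signal, which is why the paper finds per-edge routing helps only in the adversarial case.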
Probabilistic Geometric Alignment via Bayesian Latent Transport for Domain-Adaptive Foundation Models
Kuepon Aueawatthanaphisut
Theory
  • Introduction of a novel uncertainty-aware probabilistic latent transport framework for foundation model adaptation.
  • Development of a Bayesian transport operator for geometry-preserving feature transfer under distributional shifts.
  • Integration of optimal transport dynamics with PAC-Bayesian generalization control, providing theoretical guarantees.
  • Empirical results demonstrate superior performance in latent manifold alignment and uncertainty calibration.
Read more
Causality-Driven Disentangled Representation Learning in Multiplex Graphs
Saba Nasiri, Selin Aviyente, Dorina Thanou
Graph Learning
  • Introduces CaDeM, a causal inference-based framework for disentangled representation learning in multiplex graphs.
  • Employs a GCN backbone and integrates three objectives for effective disentanglement of common and private embeddings.
  • Demonstrates significant improvements in representation learning performance across various datasets and tasks.
  • Addresses limitations of existing methods that do not explicitly control for confounding effects in multiplex graphs.
Read more
Wireless communication empowers online scheduling of partially-observable transportation multi-robot systems in a smart factory
Yaxin Liao, Qimei Cui, Kwang-Cheng Chen, Xiong Li, Jinlian Chen, Xiyu Zhao, Xiaofeng Tao, Ping Zhang
Robotics Optimization
  • Introduces a communication-enabled online scheduling framework for T-MRS in smart factories.
  • Integrates wireless M2M networking with route scheduling to enhance AGV coordination.
  • Demonstrates significant improvements in scheduling efficiency compared to traditional methods.
  • Highlights the differences between M2M and human-to-human communication in the context of scheduling.
Read more
Safe Reinforcement Learning with Preference-based Constraint Inference
Chenglin Li, Guangchun Ruan, Hua Geng
Reinforcement Learning Robotics Optimization
  • Introduces PbCRL, a novel method for inferring safety constraints from human preferences.
  • Addresses limitations of traditional Bradley-Terry models in capturing heavy-tailed cost distributions.
  • Incorporates a dead zone mechanism and SNR loss to improve exploration and constraint alignment.
  • Demonstrates superior performance in safety and reward compared to existing methods.
Read more
MolEvolve: LLM-Guided Evolutionary Search for Interpretable Molecular Optimization
Xiangsen Chen, Ruilong Wu, Yanyan Lan, Ting Ma, Yang Liu
Large Language Models Optimization Interpretability
  • Introduces MolEvolve, an evolutionary framework for molecular optimization using LLMs.
  • Addresses the challenge of interpretability in molecular property prediction.
  • Utilizes a closed-loop verification mechanism to ensure high-precision chemical insights.
  • Outperforms existing GNN and LLM-based methods in property prediction and optimization tasks.
Read more
Robustness Quantification for Discriminative Models: a New Robustness Metric and its Application to Dynamic Classifier Selection
Rodrigo F. L. Lassance, Jasper De Bock
Theory
  • Introduction of a new robustness metric applicable to any probabilistic discriminative classifier.
  • The metric is based on Constant Odds Ratio (COR) perturbation, allowing for broader applicability.
  • Demonstrated correlation with accuracy through experiments using Accuracy Rejection Curves.
  • Application of the metric in dynamic classifier selection to improve prediction reliability.
Read more
Manifold Generalization Provably Precedes Memorization in Diffusion Models
Zebang Shen, Ya-Ping Hsieh, Niao He
Generative Models Theory
  • Diffusion models can generate novel samples with coarse scores by capturing the geometry of the data.
  • The manifold hypothesis provides a framework for understanding generalization in diffusion models.
  • Generalization occurs at a faster statistical rate than full density estimation, especially for smooth manifolds.
  • Coarse score accuracy can still yield fine on-manifold coverage, enabling high-quality sample generation.
Read more
Towards Safe Learning-Based Non-Linear Model Predictive Control through Recurrent Neural Network Modeling
Mihaela-Larisa Clement, Mónika Farsang, Agnes Poks, Johannes Edelmann, Manfred Plöchl, Radu Grosu, Ezio Bartocci
Optimization Robotics Reinforcement Learning
  • Introduction of Sequential-AMPC, a sequential neural policy for NMPC.
  • Integration of safety-augmented online evaluation and fallback mechanisms.
  • Significant reduction in expert MPC rollouts needed for training.
  • Improved feasibility rates and closed-loop safety compared to baseline methods.
Read more
Full waveform inversion method based on diffusion model
Caiyun Liu, Siyang Pei, Qingfeng Yu, Jie Xiong
Generative Models Optimization Theory
  • Introduction of a conditional diffusion model for full waveform inversion.
  • Utilization of two-dimensional density information to improve inversion accuracy.
  • Demonstrated enhanced resolution and structural fidelity in inversion results.
  • Increased stability and robustness in complex geological scenarios.
Read more
Learning Response-Statistic Shifts and Parametric Roll Episodes from Wave–Vessel Time Series via LSTM Functional Models
Jose del Aguila Ferrandis, Kevin T. Crofton
Time Series
  • Development of a data-driven surrogate model using LSTM networks for predicting parametric roll in vessels.
  • The model is trained on wave-motion time series generated from both experiments and simulations, making it versatile.
  • Focus on capturing not just the dynamics of parametric roll but also the statistical shifts in response distributions.
  • Evaluation of various loss functions to improve the model's accuracy in tail risk prediction.
Read more
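Before any LSTM sees the data, the wave-motion series has to be sliced into supervised (input window, target window) pairs; a minimal windowing sketch, with channel layout, lookback, and horizon chosen purely for illustration:

```python
import numpy as np

def make_windows(series, lookback, horizon):
    """Slice a (T, features) series into (input, target) pairs, the
    supervised format a sequence model of wave-to-roll dynamics needs."""
    X, y = [], []
    for t in range(len(series) - lookback - horizon + 1):
        X.append(series[t : t + lookback])
        y.append(series[t + lookback : t + lookback + horizon, 0])  # roll channel
    return np.stack(X), np.stack(y)

rng = np.random.default_rng(0)
waves = rng.normal(size=(500, 2))  # hypothetical [roll angle, wave elevation]
X, y = make_windows(waves, lookback=50, horizon=10)
```

The paper's tail-risk focus then comes in through the loss function applied to these targets, not through the windowing itself.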
A Direct Classification Approach for Reliable Wind Ramp Event Forecasting under Severe Class Imbalance
Alejandro Morales-Hernández, Fabrizio De Caro, Gian Marco Paldino, Pascal Tribel, Alfredo Vaccaro, Gianluca Bontempi
Time Series
  • Introduces a direct classification approach for forecasting wind power ramp events (WPREs), addressing severe class imbalance.
  • Develops a data preprocessing strategy that enhances feature extraction from power observations.
  • Combines majority-class undersampling with ensemble learning to improve model performance.
  • Achieves over 85% accuracy and 88% weighted F1 score in numerical simulations.
Read more
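The undersampling-plus-ensemble combination can be sketched as follows (a generic balanced-bagging pattern, assuming the paper's variant is similar in spirit): each ensemble member trains on all rare ramp events plus a fresh equal-sized random draw of non-events.

```python
import numpy as np

rng = np.random.default_rng(0)

# Severely imbalanced labels: ramp events (1) are rare.
y = np.array([1] * 20 + [0] * 980)
X = rng.normal(size=(1000, 6))

minority = np.flatnonzero(y == 1)
majority = np.flatnonzero(y == 0)

# Each ensemble member sees every minority sample plus a fresh random
# majority subsample of equal size, so every member trains balanced.
n_members = 5
member_sets = []
for _ in range(n_members):
    sampled = rng.choice(majority, size=len(minority), replace=False)
    idx = np.concatenate([minority, sampled])
    member_sets.append((X[idx], y[idx]))
```

Averaging or voting over the members then recovers coverage of the majority class that any single undersampled model would lose.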