AI-generated summaries

Today's ML research,
without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

48 Papers today
8h Update frequency
7 Days of history
Multi-Scale Reversible Chaos Game Representation: A Unified Framework for Sequence Classification
Sarwan Ali, Taslim Murad
Interpretability Theory Computer Vision
  • Introduction of MS-RCGR, a novel encoding framework for biological sequences.
  • MS-RCGR guarantees reversibility and captures multi-resolution compositional patterns.
  • Empirical evidence shows that combining CGR features with protein language model embeddings improves classification accuracy.
  • The framework bridges traditional machine learning, computer vision, and hybrid approaches for sequence analysis.
Read more
Disentangling Damage from Operational Variability: A Label-Free Self-Supervised Representation Learning Framework for Output-Only Structural Damage Identification
Xudong Jian, Charikleia Stoura, Simon Scandella, Eleni Chatzi
Time Series
  • Introduces a self-supervised, label-free framework for structural damage identification.
  • Employs an autoencoder with disentangled latent representations to separate damage from operational variability.
  • Utilizes VICReg for invariance to nuisance factors and a frequency-domain constraint for consistency (see the sketch below).
  • Demonstrates effectiveness on real-world datasets, including a bridge and a gearbox.
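For the mechanics behind the VICReg bullet: a minimal PyTorch sketch of the standard VICReg objective on paired embeddings of two views. Weights follow the original VICReg defaults; the paper's autoencoder, disentangled latents, and frequency-domain constraint are not reproduced here.

```python
import torch
import torch.nn.functional as F

def vicreg_loss(z1, z2, sim_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    """Standard VICReg objective on two views' embeddings of shape (batch, dim).
    Weights are the original VICReg defaults; the summarized paper may differ."""
    # Invariance: pull paired embeddings together.
    sim = F.mse_loss(z1, z2)
    # Variance: keep each dimension's std above 1 to prevent collapse.
    std1 = torch.sqrt(z1.var(dim=0) + eps)
    std2 = torch.sqrt(z2.var(dim=0) + eps)
    var = F.relu(1.0 - std1).mean() + F.relu(1.0 - std2).mean()
    # Covariance: push off-diagonal covariance entries toward zero.
    n, d = z1.shape
    z1c, z2c = z1 - z1.mean(0), z2 - z2.mean(0)
    def off_diag_sq(z):
        c = (z.T @ z) / (n - 1)
        return (c - torch.diag(torch.diag(c))).pow(2).sum() / d
    cov = off_diag_sq(z1c) + off_diag_sq(z2c)
    return sim_w * sim + var_w * var + cov_w * cov
```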
Read more
Barrier-enforced multi-objective optimization for direct point and sharp interval forecasting
Worachit Amnuaypongsa, Yotsapat Suparanonrat, Pana Wanitchollakit, Jitkomut Songsiri
Time Series Optimization
  • Introduces a multi-objective optimization framework for simultaneous point and interval forecasting.
  • Ensures non-crossing prediction intervals while maximizing sharpness through a novel loss function (see the quantile-loss sketch below).
  • Eliminates the need for manual hyperparameter tuning by using adaptive weight selection.
  • Demonstrates superior performance in solar irradiance forecasting compared to existing methods.
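For orientation, interval forecasters are typically trained with the pinball (quantile) loss; a minimal numpy sketch follows, with a naive check of the non-crossing property that the paper enforces inside its barrier-based loss. The barrier term and the adaptive weight selection themselves are not shown.

```python
import numpy as np

def pinball_loss(y, q_pred, tau):
    """Pinball (quantile) loss for quantile level tau in (0, 1)."""
    diff = y - q_pred
    return np.mean(np.maximum(tau * diff, (tau - 1) * diff))

# Toy example: lower/upper quantile forecasts around observed values.
y  = np.array([1.0, 2.0, 3.0])
lo = np.array([0.6, 1.4, 2.7])   # 5th-percentile forecast
hi = np.array([1.5, 2.6, 3.4])   # 95th-percentile forecast
loss = pinball_loss(y, lo, 0.05) + pinball_loss(y, hi, 0.95)
assert np.all(lo <= hi)          # non-crossing; the paper bakes this into the loss
```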
Read more
Sampling for Quality: Training-Free Reward-Guided LLM Decoding via Sequential Monte Carlo
Jelena Markovic-Voronov, Wenhui Zhu, Bo Long, Zhipeng Wang, Suyash Gupta, Kayhan Behdin, Bee-Chung Chen, Deepak Agarwal
NLP Large Language Models Generative Models
  • Introduces a training-free probabilistic framework for reward-guided decoding in LLMs.
  • Defines a reward-augmented target distribution that enhances sequence-level quality.
  • Develops Sequential Monte Carlo algorithms for efficient sampling from modified distributions (see the sketch below).
  • Achieves significant performance improvements on benchmarks like HumanEval and MATH500.
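A generic sketch of the sequential Monte Carlo pattern the bullets describe: keep N partial sequences, reweight them under a reward-tilted target, and resample. Here `propose_token` and `reward` are placeholders for the base LLM and the reward model, and the weighting is a simplified version of what the paper formalizes.

```python
import numpy as np

def smc_decode(propose_token, reward, n_particles=8, steps=20, beta=1.0, seed=0):
    """Training-free reward-guided decoding via SMC (simplified sketch).
    propose_token(seq) -> next token from the base model (placeholder).
    reward(seq)        -> scalar score of a partial sequence (placeholder)."""
    rng = np.random.default_rng(seed)
    particles = [[] for _ in range(n_particles)]
    for _ in range(steps):
        particles = [seq + [propose_token(seq)] for seq in particles]
        # Tilt toward high reward; a full SMC treatment uses incremental weights.
        w = np.exp(beta * np.array([reward(seq) for seq in particles]))
        w /= w.sum()
        resampled = rng.choice(n_particles, size=n_particles, p=w)
        particles = [list(particles[i]) for i in resampled]
    return max(particles, key=reward)

# Toy run: integer "tokens", reward favors large sums.
gen = np.random.default_rng(1)
best = smc_decode(lambda seq: int(gen.integers(0, 10)), lambda seq: float(sum(seq)))
```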
Read more
SetFlow: Generating Structured Sets of Representations for Multiple Instance Learning
Nikola Jovišić, Milica Škipina, Vanja Švenda
Generative Models Computer Vision Multimodal
  • SetFlow introduces a generative approach to model entire MIL bags, overcoming limitations of instance-level methods.
  • The architecture effectively captures intra-bag dependencies and is conditioned on class labels and input scale.
  • Evaluation on mammography data shows improved performance in classification tasks when using generated samples for augmentation.
  • SetFlow demonstrates competitive results even when trained exclusively on synthetic data.
Read more
A Survey of Reinforcement Learning for Large Language Models under Data Scarcity: Challenges and Solutions
Zhiyin Yu, Yuchen Mou, Juncheng Yan, Junyu Luo, Chunchun Chen, Xing Wei, Yunhui Liu, Hongru Sun, Yuxing Zhang, Jun Xu, Yatao Bian, Ming Zhang, Wei Ye, Tieke He, Jie Yang, Guanjie Zheng, Zhonghai Wu, Bo Zhang, Lei Bai, Xiao Luo
Large Language Models Reinforcement Learning NLP
  • Introduces a hierarchical framework for understanding RL in LLMs under data scarcity.
  • Categorizes existing methods into data-centric, training-centric, and framework-centric perspectives.
  • Highlights the challenges of data scarcity in RL applications for LLMs.
  • Provides a comprehensive roadmap for future research in data-efficient RL.
Read more
FB-NLL: A Feature-Based Approach to Tackle Noisy Labels in Personalized Federated Learning
Abdulmoneam Ali, Ahmed Arafa
Federated Learning
  • FB-NLL decouples user clustering from iterative training, enhancing robustness against noisy labels.
  • The framework employs a one-shot clustering method based on feature covariances, reducing communication and computation costs.
  • A feature-consistency strategy is introduced for label detection and correction, improving learning performance.
  • FB-NLL is model-independent and compatible with existing noise-robust training techniques.
Read more
Balance-Guided Sparse Identification of Multiscale Nonlinear PDEs with Small-coefficient Terms
Zhenhua Dang, Lei Zhang, Long Wang, Guowei He
Theory Interpretability Optimization
  • Introduces BG-SINDy, a method for identifying small-coefficient terms in nonlinear PDEs (see the SINDy sketch below).
  • Utilizes balance-guided sparsification to prioritize terms based on physical importance.
  • Employs a progressive pruning strategy to eliminate insignificant terms effectively.
  • Demonstrates the method's effectiveness through numerical experiments on various PDEs.
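For context on the base method BG-SINDy extends: standard SINDy fits sparse coefficients over a library of candidate terms with sequentially thresholded least squares, sketched below in numpy. The paper's balance-guided weighting and progressive pruning replace the fixed threshold used here, which is exactly what discards small-coefficient terms too aggressively.

```python
import numpy as np

def sindy_stlsq(Theta, dXdt, threshold=0.1, iters=10):
    """Sequentially thresholded least squares, the base SINDy solver.
    Theta: (n_samples, n_terms) candidate-term library, e.g. [1, x, x^2, ...].
    dXdt:  (n_samples, n_states) estimated time derivatives."""
    Xi = np.linalg.lstsq(Theta, dXdt, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(Xi) < threshold   # fixed cutoff drops small-coefficient terms;
        Xi[small] = 0.0                  # BG-SINDy instead ranks them by physical balance
        for k in range(dXdt.shape[1]):   # refit the surviving terms per state
            keep = ~small[:, k]
            if keep.any():
                Xi[keep, k] = np.linalg.lstsq(Theta[:, keep], dXdt[:, k], rcond=None)[0]
    return Xi
```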
Read more
HELM: Harness-Enhanced Long-horizon Memory for Vision-Language-Action Manipulation
Zijian Zeng, Fei Ding, Huiming Yang, Xianwei Li
Multimodal Robotics
  • HELM addresses long-horizon manipulation failures in VLA models through a novel framework.
  • The State Verifier (SV) is a key innovation, providing pre-execution failure predictions that enhance task success rates.
  • HELM demonstrates that merely extending context windows does not resolve long-horizon execution issues.
  • The framework includes comprehensive evaluations against multiple baselines, showcasing its effectiveness.
Read more
Structure-guided molecular design with contrastive 3D protein-ligand learning
Carles Navarro, Philipp Tholke, Gianni de Fabritiis
Generative Models Multimodal
  • Introduces an SE(3)-equivariant transformer for encoding 3D protein-ligand interactions.
  • Combines contrastive learning with autoregressive molecular generation for efficient virtual screening.
  • Achieves competitive results in zero-shot virtual screening on the LIT-PCBA benchmark.
  • Generates target-specific molecules that are synthetically accessible and aligned with commercial chemical spaces.
Read more
LLM-Extracted Covariates for Clinical Causal Inference: Rethinking Integration Strategies
Lei Liu, Jialin Chen, Kathy Macropol
NLP Large Language Models Interpretability
  • Integration strategy significantly affects treatment effect estimates in causal inference.
  • Directly augmenting propensity score models with LLM-extracted covariates reduces estimation bias effectively (see the sketch below).
  • Interpretable structured covariates outperform black-box embeddings in terms of bias and auditability.
  • Fine-tuning an open-source model improves extraction accuracy and addresses privacy concerns.
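A minimal sketch of the augmentation strategy in the second bullet: append LLM-extracted covariates to the structured ones before fitting the propensity model, then estimate the treatment effect by inverse propensity weighting. The data layout and function name are hypothetical; this is the generic recipe, not the paper's full pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ipw_ate(X_struct, X_llm, treated, y):
    """IPW estimate of the average treatment effect, with LLM-extracted
    covariates (X_llm) appended to structured covariates (X_struct)."""
    X = np.hstack([X_struct, X_llm])   # direct augmentation of the PS model
    ps = LogisticRegression(max_iter=1000).fit(X, treated).predict_proba(X)[:, 1]
    ps = np.clip(ps, 0.01, 0.99)       # trim extreme propensities for stability
    w1, w0 = treated / ps, (1 - treated) / (1 - ps)
    return (w1 * y).sum() / w1.sum() - (w0 * y).sum() / w0.sum()
```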
Read more
Rethinking Dataset Distillation: Hard Truths about Soft Labels
Priyam Dey, Aditya Sahdev, Sunny Bhati, Konda Reddy Mopuri, R. Venkatesh Babu
Computer Vision Efficient ML Theory
  • Soft labels significantly influence the performance of dataset distillation methods, often overshadowing the benefits of high-quality coresets.
  • In the soft-label plus knowledge-distillation (SL+KD) regime, performance is primarily dictated by compute rather than data quality or size (see the distillation sketch below).
  • The introduction of CAD-Prune and CA2D demonstrates a new approach to dataset distillation that improves performance across various settings.
  • The study questions the effectiveness of current dataset distillation practices and suggests reevaluating methodologies in light of these findings.
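The SL+KD regime refers to training the student on temperature-softened teacher outputs; the classical distillation loss is sketched below in PyTorch. This is the standard formulation for reference, not the paper's CAD-Prune or CA2D methods.

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Classical soft-label distillation: KL divergence between
    temperature-softened teacher and student distributions."""
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    # T^2 keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)
```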
Read more
FedProxy: Federated Fine-Tuning of LLMs via Proxy SLMs and Heterogeneity-Aware Fusion
Tao Fan, Guoqiang Ma, Yuanfeng Song, Lixin Fan, Kai Chen, Qiang Yang
Large Language Models Federated Learning Optimization
  • FedProxy addresses the trilemma of IP protection, client privacy, and performance loss in federated learning.
  • The framework utilizes a Proxy SLM for effective federated fine-tuning, enhancing representation capacity.
  • Heterogeneity-aware aggregation strategies are implemented to mitigate parameter interference during model updates.
  • FedProxy achieves performance comparable to centralized fine-tuning while maintaining privacy and IP security.
Read more
LBLLM: Lightweight Binarization of Large Language Models via Three-Stage Distillation
Siqing Song, Chuang Wang, Yong Lang, Yi Yang, Xu-Yao Zhang
Large Language Models Efficient ML
  • Introduces a three-stage quantization strategy for effective low-bit quantization of LLMs.
  • Combines PTQ initialization with lightweight QAT to enhance model performance.
  • Eliminates the need for high-precision auxiliary channels and rotation matrices.
  • Achieves significant improvements in perplexity and accuracy with minimal computational resources.
Read more
Learning Posterior Predictive Distributions for Node Classification from Synthetic Graph Priors
Jeongwhan Choi, Jongwoo Kim, Woosung Kang, Noseong Park
Graph Learning
  • NodePFN is the first to extend the posterior predictive network paradigm to graphs, enabling universal node classification.
  • The method utilizes synthetic graph priors that systematically control homophily, community structures, and feature-label relationships.
  • A dual-branch architecture is developed to integrate context-query attention with local message passing for enhanced learning.
  • NodePFN achieves competitive performance across 23 diverse benchmarks, particularly excelling in heterophily graph scenarios.
Read more
Separating Geometry from Probability in the Analysis of Generalization
Maxim Raginsky, Benjamin Recht
Theory Optimization
  • Introduces a deterministic framework for analyzing generalization in machine learning.
  • Decouples geometric properties from probabilistic assumptions in traditional generalization analysis.
  • Establishes variational principles that relate in-sample and out-of-sample performance.
  • Provides new insights into the stability of machine learning algorithms under data perturbations.
Read more
HardNet++: Nonlinear Constraint Enforcement in Neural Networks
Andrea Goertzen, Kaveh Alim, Navid Azizan
Optimization Robotics Theory
  • Introduces HardNet++, a method for enforcing nonlinear constraints in neural networks.
  • Utilizes a differentiable projection framework for simultaneous enforcement of linear and nonlinear constraints (see the projection sketch below).
  • Demonstrates convergence guarantees for achieving small constraint violations.
  • Validates the method through experiments on a nonlinear model predictive control task.
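For intuition about the projection idea: for linear equality constraints Ax = b there is a closed-form orthogonal projection that can sit after a network's last layer and stays differentiable. The numpy sketch below covers only this toy linear case; HardNet++'s handling of nonlinear constraints is the paper's contribution and is not reproduced.

```python
import numpy as np

def project_onto_equality(y, A, b):
    """Orthogonal projection of output y onto {x : Ax = b}.
    Closed form, differentiable in y; assumes A has full row rank."""
    return y - A.T @ np.linalg.solve(A @ A.T, A @ y - b)

A = np.array([[1.0, 1.0, 1.0]])        # toy constraint: components sum to 1
b = np.array([1.0])
y = np.array([0.2, 0.5, 0.9])          # raw network output
x = project_onto_equality(y, A, b)
print(x, A @ x)                        # A @ x recovers [1.0] exactly
```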
Read more
The Logical Expressiveness of Topological Neural Networks
Amirreza Akbari, Amauri H. Souza, Vikas Garg
Graph Learning Theory
  • TNNs incorporate higher-order relational structures, enhancing their expressiveness compared to traditional GNNs.
  • The k-CCWL test and topological counting logic (TC_k) are introduced as new frameworks for analyzing TNNs.
  • The paper establishes the equivalence between k-CCWL, TC_{k+2}, and a topological pebble game, providing a unified understanding of TNN expressiveness.
  • The findings highlight the limitations of GNNs in capturing complex graph properties and suggest TNNs as a more robust alternative.
Read more
Concept Inconsistency in Dermoscopic Concept Bottleneck Models: A Rough-Set Analysis of the Derm7pt Dataset
Gonzalo Nápoles, Isel Grau, Yamisleydi Salgueiro
Computer Vision Interpretability
  • Identified concept-level inconsistencies in the Derm7pt dataset affecting model accuracy.
  • Established a theoretical accuracy ceiling of 92.1% for CBMs using hard concepts.
  • Developed Derm7pt+, a consistent benchmark subset that improves classification quality.
  • Demonstrated the effectiveness of EfficientNet architectures in achieving high performance metrics.
Read more
Latent Phase-Shift Rollback: Inference-Time Error Correction via Residual Stream Monitoring and KV-Cache Steering
Manan Gupta, Dhruv Kumar
NLP Large Language Models Generative Models
  • LPSR introduces a novel inference-time error correction method for large language models.
  • The method detects reasoning errors in real-time by monitoring the residual stream and identifying phase shifts.
  • LPSR achieves superior performance on the MATH-500 benchmark compared to existing methods and larger models.
  • The study reveals that optimal layers for error detection and task accuracy differ, informing better model monitoring strategies.
Read more
Decompose, Structure, and Repair: A Neuro-Symbolic Framework for Autoformalization via Operator Trees
Xiaoyang Liu, Zineng Dong, Yifan Bai, Yantao Li, Yuntian Liu, Tao Luo
NLP Large Language Models Theory
  • Introduction of the DSR framework for modular autoformalization of mathematical statements.
  • Decomposition of NL statements into logical components and mapping them to structured operator trees.
  • Development of the PRIME benchmark for evaluating autoformalization across various mathematical domains.
  • DSR achieves state-of-the-art performance in autoformalization tasks.
Read more
The High Explosives and Affected Targets (HEAT) Dataset
Bryan Kaiser, Kyle Hickmann, Sharmistha Chakrabarti, Soumi De, Sourabh Pandit, David Schodt, Jesus Pulido, Divya Banesh, Christine Sweeney
Theory Efficient ML
  • HEAT provides a comprehensive dataset for training AI surrogate models in high-explosive dynamics.
  • The dataset includes over 661,000 snapshots from various simulations, capturing essential physical phenomena.
  • It enables the development of computationally efficient models that can replace traditional high-cost experiments.
  • The dataset is structured to facilitate the training of models that predict the evolution of multi-material interactions under shock loading.
Read more
A PPA-Driven 3D-IC Partitioning Selection Framework with Surrogate Models
Shang Wang, Shuai Liu, Owen Randall, Matthew E. Taylor
Optimization
  • DOPP, the proposed framework, bridges the gap between proxy metrics and true power-performance-area (PPA) outcomes.
  • The framework achieves significant improvements in PPA metrics over Open3DBench.
  • DOPP reduces the number of expensive PPA evaluations while maintaining performance.
  • The methodology allows for tailored solutions based on user-specific PPA preferences.
Read more
AC-SINDy: Compositional Sparse Identification of Nonlinear Dynamics
Peter Racioppo
Theory Interpretability Time Series
  • AC-SINDy replaces sparse basis selection with a learned computational graph structure.
  • The method separates state estimation from dynamics identification, improving noise robustness.
  • Feature Normalization ensures learned coefficients reflect functional importance.
  • Pruning-based structure learning enables recovery of sparse, interpretable dynamics.
Read more
Task Switching Without Forgetting via Proximal Decoupling
Pourya Shamsolmoali, Masoumeh Zareapoor, Eric Granger, William A. P. Smith, Yue Lu
Theory Optimization Efficient ML
  • Introduces a novel operator splitting approach to continual learning, separating task learning from stability enforcement.
  • Utilizes Douglas-Rachford splitting to enable selective parameter updates, enhancing model adaptability (see the sketch below).
  • Achieves state-of-the-art performance on standard benchmarks without the need for replay buffers or complex architectures.
  • Theoretical justification supports the effectiveness of the proposed method in addressing the stability-plasticity dilemma.
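The Douglas-Rachford iteration behind the second bullet alternates two proximal steps with a reflection; below is a generic numpy sketch on a toy composite objective. The paper's split into task-learning and stability terms is abstracted here into the two prox operators.

```python
import numpy as np

def douglas_rachford(prox_f, prox_g, z0, steps=200):
    """Generic Douglas-Rachford splitting for min_x f(x) + g(x)."""
    z = z0.copy()
    for _ in range(steps):
        x = prox_g(z)
        y = prox_f(2 * x - z)   # reflect through x, then prox of f
        z = z + y - x
    return prox_g(z)

# Toy composite problem: min 0.5*||x - a||^2 + lam*||x||_1.
a, lam = np.array([3.0, -0.2, 1.5]), 1.0
prox_f = lambda v: (v + a) / 2.0                                # prox of the quadratic
prox_g = lambda v: np.sign(v) * np.maximum(np.abs(v) - lam, 0)  # soft threshold
print(douglas_rachford(prox_f, prox_g, np.zeros(3)))            # ~[2.0, 0.0, 0.5]
```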
Read more
D-QRELO: Training- and Data-Free Delta Compression for Large Language Models via Quantization and Residual Low-Rank Approximation
Junlin Li, Shuangyong Song, Guodong Du, Ngai Wong, Xuebo Liu, Yongxiang Li, Min Zhang, Jing Li, Xuelong Li
Large Language Models Efficient ML NLP
  • D-QRELO effectively combines quantization and low-rank approximation for delta compression (see the sketch below).
  • The method is training- and data-free, enhancing its generality and efficiency.
  • Extensive experiments show D-QRELO outperforms existing delta compression methods.
  • The paper provides insights into how SFT data scale affects delta compression efficiency.
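The general quantize-then-correct recipe from the first bullet can be illustrated end to end: quantize the fine-tuning delta to low bit-width, then approximate the quantization residual with a truncated SVD. This numpy sketch shows the generic pattern, not D-QRELO's specific algorithm.

```python
import numpy as np

def compress_delta(W_base, W_ft, bits=4, rank=16):
    """Toy delta compression: uniform low-bit quantization of the delta
    plus a rank-r correction of what quantization lost."""
    delta = W_ft - W_base
    scale = np.abs(delta).max() / (2 ** (bits - 1) - 1)
    q = np.round(delta / scale).astype(np.int8)        # quantized delta
    residual = delta - q * scale
    U, s, Vt = np.linalg.svd(residual, full_matrices=False)
    A, B = U[:, :rank] * s[:rank], Vt[:rank]           # low-rank residual factors
    W_restored = W_base + q * scale + A @ B
    return q, scale, A, B, W_restored
```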
Read more
REALM: Reliable Expertise-Aware Language Model Fine-Tuning from Noisy Annotations
Sajjad Ghiasvand, Mark Beliaev, Mahnoosh Alizadeh, Ramtin Pedarsani
NLP Large Language Models
  • REALM learns annotator expertise values without supervision, enhancing model robustness against noisy labels.
  • The method significantly outperforms traditional noisy SFT approaches across multiple datasets and tasks.
  • Accuracy improvements of up to 50% are observed in adversarial settings, with gains increasing with model capacity.
  • REALM adapts to multi-task scenarios by capturing per-annotator reliability through a learned matrix.
Read more
Parkinson's Disease Detection via Self-Supervised Dual-Channel Cross-Attention on Bilateral Wrist-Worn IMU Signals
Meheru Zannat
Time Series
  • Introduces a dual-channel cross-attention architecture to capture motor asymmetry in PD.
  • Achieves high classification accuracy with minimal labeled data through self-supervised learning.
  • Demonstrates real-time inference capabilities suitable for edge deployment.
  • Addresses the clinical challenge of differentiating PD from other neurodegenerative diseases.
Read more
On the Generalization Bounds of Symbolic Regression with Genetic Programming
Masahiro Nomura, Ryoki Hamano, Isao Ono
Theory Interpretability
  • Derives a generalization bound for GP-based symbolic regression models.
  • Decomposes the generalization gap into structure-selection and constant-fitting components.
  • Links practical design choices in GP to explicit complexity terms in the generalization bound.
  • Provides a theoretical perspective on common practices like parsimony pressure and depth limits.
Read more
AutoPPA: Automated Circuit PPA Optimization via Contrastive Code-based Rule Library Learning
Chongxiao Li, Pengwei Jin, Di Huang, Guangrun Sun, Husheng Han, Jianan Mu, Xinyao Zheng, Jiaguo Zhu, Shuyi Xing, Hanjun Wei, Tianyun Ma, Shuyao Cheng, Rui Zhang, Ying Wang, Zidong Du, Qi Guo, Xing Hu
Optimization
  • AutoPPA automates PPA optimization without human intervention by generating rules from raw RTL code.
  • The E2I workflow contrasts and abstracts optimization rules from diverse code pairs, improving scalability and efficiency.
  • An adaptive multi-step search framework enhances the retrieval and application of optimization rules.
  • Experimental results show AutoPPA achieves up to 15.31% area improvement and 11.28% delay reduction compared to manual and state-of-the-art methods.
Read more
Multi-Label Phase Diagram Prediction in Complex Alloys via Physics-Informed Graph Attention Networks
Eunjeong Park, Amrita Basak
Graph Learning
  • Introduction of a physics-informed graph attention network for phase diagram prediction.
  • Utilization of a large dataset generated from CALPHAD calculations for training.
  • Incorporation of thermodynamic constraints to ensure physical consistency in predictions.
  • High performance metrics achieved, including a macro-F1 score of 0.951.
Read more
Harmful Intent as a Geometrically Recoverable Feature of LLM Residual Streams
Isaac Llorente-Saguer
NLP Large Language Models Interpretability
  • Harmful intent is geometrically recoverable from LLM residual streams as a linear direction (see the probe sketch below).
  • Detection performance is stable across various model architectures and alignment variants.
  • Harmful intent and refusal behaviors are functionally dissociated in LLMs.
  • Operational metrics like TPR@1%FPR should accompany AUROC in safety evaluations.
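"Recoverable as a linear direction" means a linear probe on residual-stream activations can separate harmful from benign prompts. The sketch below uses synthetic activations with a planted direction in place of real model internals, purely to show the probing recipe.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d = 64
true_dir = rng.normal(size=d)                        # planted "harm" direction
acts = rng.normal(size=(500, d))                     # stand-in residual-stream activations
labels = (acts @ true_dir > 0).astype(int)           # synthetic harmful/benign labels

probe = LogisticRegression(max_iter=1000).fit(acts, labels)
w = probe.coef_[0] / np.linalg.norm(probe.coef_[0])  # recovered linear direction
cos = abs(w @ true_dir) / np.linalg.norm(true_dir)
print(f"cosine to planted direction: {cos:.2f}")     # close to 1 on this toy data
```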
Read more
Revisiting Auxiliary Losses for Conditional Depth Routing: An Empirical Study
Qingwei Lin
NLP Large Language Models Efficient ML
  • G3 (JEPA-guided gate) improves optimization dynamics compared to G1 (MLP gate).
  • Removing utility/rank losses enhances performance for both gate architectures.
  • Structural mismatch between oracle labels and actual execution can lead to negative impacts from auxiliary losses.
  • The study provides insights into the interactions of auxiliary signals in conditional depth routing.
Read more
L1 Regularization Paths in Linear Models by Parametric Gaussian Message Passing
Yun-Peng Li, Hans-Andrea Loeliger
Optimization Theory Efficient ML
  • Introduction of two dual algorithms for computing L1 regularization paths (see the path sketch below).
  • Utilization of parametric Gaussian message passing for efficient computation.
  • Broad applicability to various linear models including LASSO and SVM.
  • Focus on exact path computation rather than approximate methods.
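For readers new to regularization paths: the exact LASSO path is piecewise linear in the penalty, and LARS recovers every breakpoint. scikit-learn's lars_path illustrates the object being computed; the paper computes such paths via parametric Gaussian message passing instead.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import lars_path

X, y = make_regression(n_samples=100, n_features=10, noise=5.0, random_state=0)
# method="lasso" gives the exact piecewise-linear L1 path: one column of
# coefficients per breakpoint where a feature enters or leaves the model.
alphas, active, coefs = lars_path(X, y, method="lasso")
print(alphas.shape, coefs.shape)   # (n_breakpoints,), (n_features, n_breakpoints)
```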
Read more
Shifting the Gradient: Understanding How Defensive Training Methods Protect Language Model Integrity
Satchel Grant, Victor Gillioz, Jake Ward, Thomas McGrath
NLP Large Language Models Theory
  • PPS and IP achieve defensive benefits through distinct mechanisms.
  • PPS can reduce pre-existing trait expression, while IP is ineffective on pre-finetuned models.
  • PPS shifts the activation gradient to attenuate trait acquisition, whereas IP's mechanism remains unclear.
  • IP reduces prediction loss on trait-expressing data, suggesting it 'explains away' the trait signal.
Read more
DARLING: Detection Augmented Reinforcement Learning with Non-Stationary Guarantees
Argyrios Gerogiannis, Yu-Han Huang, Venugopal V. Veeravalli
Reinforcement Learning Theory
  • DARLING is a modular framework for non-stationary reinforcement learning that operates without prior knowledge of changes.
  • It combines mean-shift detection techniques with tailored tests for identifying changes in rewards and transition dynamics.
  • The framework achieves improved dynamic regret bounds compared to existing methods, establishing it as nearly optimal.
  • Empirical evaluations demonstrate DARLING's superior performance across diverse non-stationary scenarios.
Read more
Lyapunov-Certified Direct Switching Theory for Q-Learning
Donghwan Lee
Reinforcement Learning Theory Optimization
  • Introduces a direct stochastic switching system representation for Q-learning errors.
  • Derives a finite-time final-iterate bound using a Lyapunov function induced by the joint spectral radius (JSR).
  • Demonstrates that the JSR can provide a more accurate convergence rate than the traditional row-sum rate.
  • Presents a computable quadratic-certificate version of the direct switching bound.
Read more
Multi-Level Temporal Graph Networks with Local-Global Fusion for Industrial Fault Diagnosis
Bibek Aryal, Gift Modekwe, Qiugang Lu
Graph Learning Time Series
  • Introduction of a multi-level temporal graph network for industrial fault diagnosis.
  • Dynamic construction of correlation graphs to capture sensor relationships.
  • Integration of local and global features to enhance fault diagnosis accuracy.
  • Experimental validation shows improved performance on the Tennessee Eastman Process.
Read more
Chronax: A Jax Library for Univariate Statistical Forecasting and Conformal Inference
Xan Carey, Yash Deshmukh, Aileen Huang, Sunit Jadhav, Omkar Tekawade, Lorraine Yang, Anvesha Tiwary, Gerardo Riano, Amy Greenwald, Denizalp Goktas
Time Series
  • Chronax is built on JAX, enabling functional purity and composable transformations for forecasting.
  • The library addresses scalability issues in forecasting large collections of time series data.
  • Chronax supports model-agnostic conformal inference for uncertainty quantification (see the sketch below).
  • The design allows for seamless integration with modern machine learning and scientific computing pipelines.
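"Model-agnostic conformal inference" is the split-conformal recipe: hold out a calibration set, score its absolute residuals, and widen any point forecast by the finite-sample-corrected quantile. A generic numpy sketch follows; it is not Chronax's API.

```python
import numpy as np

def split_conformal_interval(cal_y, cal_pred, test_pred, alpha=0.1):
    """Split conformal prediction: (1 - alpha)-coverage intervals around
    any point forecaster, calibrated on held-out absolute residuals."""
    scores = np.abs(cal_y - cal_pred)
    n = len(scores)
    # Finite-sample-corrected quantile of the calibration scores.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, level, method="higher")
    return test_pred - q, test_pred + q
```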
Read more
LLM as a Tool, Not an Agent: Code-Mined Tree Transformations for Neural Architecture Search
Masakazu Yoshimura, Zitang Sun, Yuiko Sakuma, Junji Otsuka, Atsushi Irie, Takeshi Ohashi
Large Language Models Optimization Computer Vision
  • Introduces LLMasTool, a hierarchical tree-based NAS framework.
  • Utilizes LLMs as tools for fine-tuning architecture rather than as autonomous agents.
  • Implements a diversity-guided evolutionary algorithm for efficient exploration.
  • Demonstrates significant performance improvements over existing NAS methods.
Read more
When Graph Structure Becomes a Liability: A Critical Re-Evaluation of Graph Neural Networks for Bitcoin Fraud Detection under Temporal Distribution Shift
Saket Maganti
Graph Learning
  • Random Forest on raw features outperforms all tested GNNs under strict inductive conditions.
  • A significant performance gap exists between transductive and inductive training methods for GNNs.
  • Randomly shuffled edges yield better performance than the actual transaction graph, indicating potential issues with the dataset's topology.
  • The study emphasizes the importance of evaluation protocols in assessing the effectiveness of machine learning models.
Read more
Ultrametric OGP - parametric RDT symmetric binary perceptron connection
Mihailo Stojnic
Theory
  • Introduces a rigorous upper bound for constraint densities in ultrametric OGPs.
  • Establishes a connection between parametric RDT and overlap gap properties.
  • Presents numerical evaluations that align closely with previous parametric RDT estimates.
  • Proposes conjectures regarding the relationships between ult-OGP and parametric RDT parameters.
Read more
Budgeted Online Influence Maximization
Pierre Perrault, Jennifer Healey, Zheng Wen, Michal Valko
Optimization Theory Graph Learning
  • Introduces a budgeted framework for online influence maximization, moving beyond fixed cardinality constraints.
  • Proposes a CUCB-style algorithm with logarithmic regret bounds for the budgeted OIM problem.
  • Demonstrates improvements over existing state-of-the-art methods in both budgeted and non-budgeted settings.
  • Validates the approach through theoretical proofs and experimental results.
Read more
Optimal Routing for Federated Learning over Dynamic Satellite Networks: Tractable or Not?
Yi Zhao, Di Yuan, Tao Deng, Suzhi Cao, Ying Dong
Federated Learning Optimization Theory
  • The paper presents a tractability analysis of routing optimization for in-orbit Federated Learning.
  • It distinguishes between tractable and NP-hard routing problems, providing efficient algorithms for the former.
  • The analysis covers various settings including model distribution, client selection, and flow splittability.
  • Insights into the inherent complexity of intractable cases are provided, guiding future research.
Read more
SaFeR-Steer: Evolving Multi-Turn MLLMs via Synthetic Bootstrapping and Feedback Dynamics
Haolong Hu, Hanyu Li, Tiancheng He, Huahui Yi, An Zhang, Qiankun Li, Kun Wang, Yang Liu, Zhigang Zeng
NLP Large Language Models Multimodal
  • Introduces SaFeR-Steer, a framework for improving multi-turn MLLM safety.
  • Combines synthetic bootstrapping with a tutor-in-the-loop reinforcement learning approach.
  • Releases the STEER dataset, specifically designed for multi-turn dialogue safety evaluation.
  • Demonstrates substantial improvements in safety and helpfulness metrics over existing methods.
Read more
Do LLM-derived graph priors improve multi-agent coordination?
Nikunj Gupta, Rajgopal Kannan, Viktor Prasanna
Reinforcement Learning Large Language Models Graph Learning
  • LLM-derived graph priors provide a semantic and data-efficient alternative to traditional coordination graph methods in MARL.
  • The integration of LLMs allows for zero-shot inference of coordination patterns from natural language descriptions.
  • The proposed method enhances agent coordination and adaptability in dynamic environments.
  • Evaluation on MPE scenarios shows significant performance improvements over baseline methods.
Read more
Reasoning on the Manifold: Bidirectional Consistency for Self-Verification in Diffusion Language Models
Jiaoyang Ruan, Xin Gao, Yinda Chen, Hengyu Zeng, Liang Du, Guanghao Li, Jie Fu, Jian Pu
NLP Large Language Models Generative Models
  • Introduction of a geometric perspective for reasoning in dLLMs.
  • Development of Bidirectional Manifold Consistency (BMC) as an unsupervised metric for solution validity.
  • Demonstration of BMC's versatility across diagnosis, inference, and alignment tasks.
  • Establishment of intrinsic geometric stability as a robust indicator of correctness.
Read more
LEPO: Latent Reasoning Policy Optimization for Large Language Models
Yuyan Zhou, Jiarui Yu, Hande Dong, Zhezheng Hao, Hong Wang, Jianqing Zhang, Qiang Lin
NLP Large Language Models Reinforcement Learning
  • LEPO introduces stochasticity into latent reasoning, enhancing exploration capabilities.
  • The framework applies reinforcement learning directly to continuous latent representations.
  • Extensive experiments show LEPO outperforms existing RL methods on various benchmarks.
  • Stochastic latent reasoning leads to higher entropy and better problem-solving distribution.
Read more