AI-generated summaries

Today's ML research,
without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

48 Papers today
8h Update frequency
7 Days of history
Haiku to Opus in Just 10 bits: LLMs Unlock Massive Compression Gains
Roy Rinberg, Annabelle Michael Carrell, Simon Henniger, Nicholas Carlini, Keri Warr
NLP Large Language Models Efficient ML
  • Domain-adapted LoRA adapters improve lossless compression by 2× over baseline models.
  • Lossy compression through succinct rewrites achieves a 2× improvement over original responses.
  • Question-Asking compression (QA) allows small models to recover significant performance gaps using interactive questioning.
  • Compression ratios achieved are significantly smaller than those of prior state-of-the-art methods.
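A minimal, back-of-the-envelope sketch of the bit accounting behind LLM-based lossless compression (not the authors' pipeline): each token ideally costs -log2 p(token | context) bits under the model, so a domain-adapted model that assigns higher probabilities compresses the same text into fewer bits. The per-token probabilities below are made-up stand-ins.

    import math

    def compressed_size_bits(token_log_probs):
        """Ideal code length under a model: each token costs -log2 p(token | context) bits."""
        return sum(-lp / math.log(2) for lp in token_log_probs)

    # Hypothetical per-token probabilities for the same 1000-token document under a
    # generic model vs. a domain-adapted (e.g., LoRA-tuned) model.
    generic = [math.log(0.05)] * 1000    # ~4.3 bits per token
    adapted = [math.log(0.22)] * 1000    # ~2.2 bits per token

    raw_bits = 1000 * 16                 # naive 16-bits-per-token baseline
    for name, lps in [("generic", generic), ("domain-adapted", adapted)]:
        bits = compressed_size_bits(lps)
        print(f"{name}: {bits:.0f} bits, ratio vs raw = {bits / raw_bits:.3f}")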
Read more
A Numerical Method for Coupling Parameterized Physics-Informed Neural Networks and FDM for Advanced Thermal-Hydraulic System Simulation
Jeesuk Shin, Donggyun Seo, Sihyeong Yu, Joongoo Jeon
Theory Efficient ML
  • Development of a hybrid framework (P2F) combining Parameterized PINNs and FDM for thermal-hydraulic simulations.
  • NA-PINN allows for data-free training and avoids retraining for different problem parameters.
  • The method ensures exact mass conservation and simplifies momentum solving in simulations.
  • Demonstrated high accuracy in a six-tank draining scenario with minimal error across various initial conditions.
Read more
Feature Weighting Improves Pool-Based Sequential Active Learning for Regression
Dongrui Wu
Optimization Theory Efficient ML
  • Introduces feature weighting in distance computation for active learning in regression.
  • Proposes five new active learning approaches that incorporate feature weighting.
  • Demonstrates improved performance of feature-weighted methods over traditional unweighted methods.
  • Extends the applicability of feature weighting to both single-task and multi-task regression problems.
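A hedged illustration of the feature-weighted distance idea above: scale each feature by an importance weight before running a greedy, diversity-based pool selection. The weighting scheme here (absolute least-squares coefficients) and the farthest-point heuristic are placeholders, not the five approaches proposed in the paper.

    import numpy as np

    def feature_weighted_greedy_selection(X_pool, weights, n_select):
        """Greedily pick points that are far, in feature-weighted distance, from those already chosen."""
        Xw = X_pool * weights                                  # scale each feature by its weight
        start = int(np.argmin(np.linalg.norm(Xw - Xw.mean(axis=0), axis=1)))
        selected = [start]                                     # begin near the weighted centroid
        while len(selected) < n_select:
            dists = np.linalg.norm(Xw[:, None, :] - Xw[None, selected, :], axis=2).min(axis=1)
            dists[selected] = -np.inf                          # never re-pick a chosen point
            selected.append(int(np.argmax(dists)))             # farthest-point heuristic
        return selected

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = 3.0 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(scale=0.1, size=200)

    # Crude importance weights from absolute least-squares coefficients
    # (an assumption for illustration, not the paper's weighting scheme).
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    weights = np.abs(coef) / np.abs(coef).sum()

    print(feature_weighted_greedy_selection(X, weights, n_select=5))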
Read more
Reflective Context Learning: Studying the Optimization Primitives of Context Space
Nikita Vassilyev, William Berrios, Ruowang Zhang, Bo Han, Douwe Kiela, Shikib Mehri
Optimization Reinforcement Learning Theory
  • Introduction of Reflective Context Learning (RCL) as a unified framework for context optimization.
  • Emphasis on reflection and iterative updates to context instead of traditional gradient-based methods.
  • Integration of classical optimization techniques to enhance context learning.
  • Demonstrated improvements in performance across multiple benchmarks.
Read more
Coupled Query-Key Dynamics for Attention
Barak Gahtan, Alex M. Bronstein
NLP Large Language Models Efficient ML
  • Introduces Coupled QK Dynamics, enhancing attention mechanisms by evolving queries and keys jointly.
  • Achieves significant improvements in language modeling perplexity with minimal additional parameters.
  • Structural ablation studies confirm that coupling is the key factor for performance gains.
  • Effectiveness varies by corpus, with benefits observed in domain-coherent texts but not in heterogeneous datasets.
Read more
DISCO-TAB: A Hierarchical Reinforcement Learning Framework for Privacy-Preserving Synthesis of Complex Clinical Data
Arshia Ilaty, Hossein Shirazi, Amir Rahmani, Hajar Homayouni
Reinforcement Learning Generative Models Large Language Models
  • DISCO-TAB synthesizes clinical data while preserving privacy and ensuring clinical validity.
  • The framework uses a hierarchical reinforcement learning approach to evaluate data quality at multiple granularities.
  • It incorporates techniques to preserve medical logic and address class imbalances in synthetic data.
  • DISCO-TAB shows significant improvements in clinical classifier utility and statistical fidelity compared to existing methods.
Read more
FedSQ: Optimized Weight Averaging via Fixed Gating
Cristian Pérez-Corral, Jose I. Mestre, Alberto Fernández-Hernández, Manuel F. Dolz, José Duato, Enrique S. Quintana-Ortí
Federated Learning
  • FedSQ decouples structural and quantitative knowledge in federated learning.
  • The method stabilizes aggregation under heterogeneous client data by fixing gating masks.
  • Empirical results show improved convergence efficiency compared to standard federated averaging.
  • FedSQ is particularly effective in cross-silo federated learning settings.
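Decoupling structural from quantitative knowledge, as summarized above, can be caricatured as federated averaging in which a gating mask is fixed once and only the unmasked coordinates are averaged; the mask below is a placeholder, not FedSQ's actual construction.

    import numpy as np

    def fedavg_with_fixed_gate(client_weights, gate):
        """Average client weights, then apply a fixed binary gating mask (structure stays fixed)."""
        stacked = np.stack(client_weights)        # shape (n_clients, n_params)
        return gate * stacked.mean(axis=0)

    rng = np.random.default_rng(0)
    clients = [rng.normal(size=8) for _ in range(4)]                 # heterogeneous client updates
    gate = np.array([1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0])        # placeholder structural mask
    print(np.round(fedavg_with_fixed_gate(clients, gate), 3))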
Read more
Conditional Sampling via Wasserstein Autoencoders and Triangular Transport
Mohammad Al-Jarrah, Michele Martino, Marcus Yim, Bamdad Hosseini, Amirhossein Taghvaei
Generative Models Theory Efficient ML
  • Introduction of Conditional Wasserstein Autoencoders (CWAEs) for conditional sampling.
  • Utilization of block-triangular decoders to exploit low-dimensional structures in data.
  • Demonstration of substantial error reductions in approximation compared to traditional methods.
  • Theoretical exploration of connections between CWAEs and conditional optimal transport.
Read more
Random Is Hard to Beat: Active Selection in online DPO with Modern LLMs
Giyeong Oh, Junghyun Lee, Jaehyun Park, Youngjae Yu, Wonho Bae, Junhyug Noh
NLP Large Language Models Optimization
  • Active Preference Learning (APL) shows minimal advantage over RANDOM sampling in online DPO.
  • Improvements in proxy win rates can occur alongside declines in general model capabilities.
  • The study highlights the inefficiency of active selection strategies in the presence of strong pre-trained priors.
  • The findings raise questions about the practical benefits of computationally intensive active selection methods.
Read more
Bridging Deep Learning and Integer Linear Programming: A Predictive-to-Prescriptive Framework for Supply Chain Analytics
Khai Banh Nghiep, Duc Nguyen Minh, Lan Hoang Thi
Time Series Optimization
  • The study systematically compares deep learning models with traditional statistical methods for demand forecasting.
  • N-BEATS outperforms MSTL in forecasting accuracy, making it the best-performing model on this dataset.
  • The proposed framework integrates forecasting with operational decision-making through integer linear programming.
  • The research demonstrates the practical application of improved forecasting in logistics planning.
Read more
Universal Hypernetworks for Arbitrary Models
Xuanfeng Zhou
Computer Vision Graph Learning NLP
  • UHN is a fixed-architecture generator that can produce weights for various models without redesigning the hypernetwork.
  • It supports multi-model generalization and multi-task learning across different architectures.
  • UHN allows for stable recursive generation of hypernetworks, enhancing flexibility in model creation.
  • Empirical results show UHN's competitive performance across diverse benchmarks.
Read more
HyperFitS -- Hypernetwork Fitting Spectra for metabolic quantification of ¹H MR spectroscopic imaging
Paul J. Weiser, Gulnur Ungan, Amirmohammad Shamaei, Georg Langs, Wolfgang Bogner, Malte Hoffmann, Antoine Klauser, Ovidiu C. Andronesi
Optimization Efficient ML
  • HyperFitS significantly reduces spectral fitting times from hours to seconds.
  • The method allows for flexible baseline corrections and water suppression adjustments.
  • Metabolite maps generated by HyperFitS show strong agreement with conventional fitting methods.
  • Baseline parametrization can substantially impact metabolic quantification results.
Read more
Causal-Audit: A Framework for Risk Assessment of Assumption Violations in Time-Series Causal Discovery
Marco Ruiz, Miguel Arana-Catania, David R. Ardila, Rodrigo Ventura
Time Series
  • Causal-Audit formalizes assumption validation as calibrated risk assessment.
  • The framework computes risk scores based on five assumption families and provides uncertainty intervals.
  • An abstention-aware decision policy recommends methods only when reliable inference is possible.
  • Evaluation shows high calibration accuracy (AUROC > 0.95) and significant false positive reduction.
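The abstention-aware policy above can be pictured as a simple gate over per-assumption risk scores: if any upper uncertainty bound crosses a threshold, the framework declines to recommend a causal-discovery method. The assumption names, scores, and threshold below are illustrative, not the paper's calibrated values.

    # Hypothetical per-assumption risk scores in [0, 1] with uncertainty intervals.
    risk_report = {
        "stationarity":       (0.12, (0.08, 0.18)),
        "causal_sufficiency": (0.41, (0.30, 0.55)),
        "faithfulness":       (0.07, (0.03, 0.12)),
        "no_instantaneous":   (0.22, (0.15, 0.31)),
        "correct_lag_order":  (0.18, (0.10, 0.27)),
    }

    def recommend(report, threshold=0.3):
        """Abstain whenever any assumption's upper risk bound crosses the threshold."""
        flagged = [name for name, (_, (_, upper)) in report.items() if upper > threshold]
        if flagged:
            return {"decision": "abstain", "flagged_assumptions": sorted(flagged)}
        return {"decision": "run_causal_discovery", "flagged_assumptions": []}

    print(recommend(risk_report))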
Read more
Robust Graph Representation Learning via Adaptive Spectral Contrast
Zhuolong Li, Boxue Yang, Haopeng Chen
Graph Learning Theory
  • Identifies a spectral dilemma in graph contrastive learning regarding the trade-off between high-frequency signal utility and noise sensitivity.
  • Introduces ASPECT, a framework that utilizes a reliability-aware spectral gating mechanism to improve robustness in graph representation learning.
  • Demonstrates that existing global spectral fusion strategies are suboptimal for mixed graphs with varying node-wise frequency preferences.
  • Achieves state-of-the-art performance on 8 out of 9 benchmarks, particularly on heterophilic graphs.
Read more
Batched Contextual Reinforcement: A Task-Scaling Law for Efficient Reasoning
Bangji Yang, Hongbo Ma, Jiajun Fan, Ge Liu
NLP Large Language Models Reinforcement Learning Efficient ML
  • Introduction of Batched Contextual Reinforcement (BCR) for efficient reasoning in LLMs.
  • Discovery of a task-scaling law where increasing concurrent problems reduces token usage while maintaining accuracy.
  • Demonstration of a 'free lunch' phenomenon where accuracy improves despite reduced verbosity.
  • Emergence of self-regulated efficiency in models, eliminating redundant reasoning loops.
Read more
LiME: Lightweight Mixture of Experts for Efficient Multimodal Multi-task Learning
Md Kowsher, Haris Mansoor, Nusrat Jahan Prottasha, Ozlem Garibay, Victor Zhu, Zhengping Ji, Chen Chen
Multimodal Efficient ML Theory
  • LiME reduces the number of trainable parameters significantly compared to traditional MoE-PEFT methods.
  • The approach allows for expert specialization without the need for separate adapters for each expert.
  • Zero-parameter routing is achieved by utilizing existing representations, eliminating the overhead of learned routers.
  • LiME is compatible with various PEFT methods, enhancing its versatility.
Read more
PRISM: LLM-Guided Semantic Clustering for High-Precision Topics
Connor Douglas, Utkucan Balci, Joseph Aylett-Bullock
NLP Large Language Models Interpretability
  • PRISM combines LLM capabilities with efficient topic modeling techniques.
  • The framework utilizes a student-teacher model to distill LLM supervision into a lightweight encoder.
  • Thresholded clustering allows for precise topic separation without over-partitioning.
  • PRISM shows improved performance over existing topic modeling methods across multiple corpora.
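Thresholded clustering, as described above, can be sketched with agglomerative clustering cut at a distance threshold over document embeddings; the random embeddings and threshold value below are stand-ins, not PRISM's distilled encoder or tuned settings.

    import numpy as np
    from sklearn.cluster import AgglomerativeClustering

    # Random stand-ins for document embeddings from a lightweight encoder.
    rng = np.random.default_rng(0)
    embeddings = np.vstack([rng.normal(loc=c, scale=0.1, size=(20, 16)) for c in (-1.0, 0.0, 1.0)])

    # n_clusters=None plus a distance threshold: merging stops once clusters sit farther
    # apart than the threshold, so the number of topics is not fixed in advance.
    clusterer = AgglomerativeClustering(n_clusters=None, distance_threshold=2.0, linkage="average")
    labels = clusterer.fit_predict(embeddings)
    print("topics found:", len(set(labels)))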
Read more
UQ-SHRED: uncertainty quantification of shallow recurrent decoder networks for sparse sensing via engression
Mars Liyao Gao, Yuxuan Bao, Amy S. Rude, Xinwei Shen, J. Nathan Kutz
Time Series Theory Efficient ML
  • UQ-SHRED provides a distributional learning framework for valid uncertainty quantification in sparse sensing.
  • The method combines noise injection with energy score minimization, maintaining computational efficiency.
  • Theoretical guarantees are established for the learned conditional distribution, supporting its use in uncertainty-aware applications.
  • UQ-SHRED is validated across multiple scientific datasets, showcasing its effectiveness in various domains.
Read more
Complex-Valued GNNs for Distributed Basis-Invariant Control of Planar Systems
Samuel Honor, Mohamed Abdelnaby, Kevin Leahy
Graph Learning Robotics Theory
  • Introduces a complex-valued GNN architecture that is invariant to local basis choices.
  • Enhances data efficiency and tracking performance in distributed control tasks.
  • Demonstrates improved generalization over traditional real-valued GNNs.
  • Addresses limitations of existing GNNs in GPS-denied and compass-denied environments.
Read more
Toward an Operational GNN-Based Multimesh Surrogate for Fast Flood Forecasting
Valentin Mercier, Serge Gratton, Corentin Lapeyre, Gwenaël Chevallet
Graph Learning Time Series Efficient ML
  • Development of a GNN-based surrogate model for flood forecasting.
  • Utilization of a projected-mesh strategy to enhance training efficiency.
  • Incorporation of multimesh connectivity to widen the model's spatial receptive field.
  • Significant reduction in prediction time from 180 minutes to 0.4 seconds.
Read more
Beyond Semantic Manipulation: Token-Space Attacks on Reward Models
Yuheng Zhang, Mingyue Huo, Minghao Zhu, Mengxue Zhang, Nan Jiang
Reinforcement Learning Large Language Models Optimization
  • Introduction of TOMPA, a framework for adversarial optimization in token space.
  • Demonstration of TOMPA's ability to exploit vulnerabilities in state-of-the-art reward models.
  • Adversarial token sequences earn higher rewards than GPT-5 reference answers despite being nonsensical.
  • Identification of a length-dependent effect in adversarial token patterns.
Read more
On the Geometric Structure of Layer Updates in Deep Language Models
Jun-Sik Yoo
NLP Large Language Models Interpretability
  • Introduces a functional decomposition of layer updates into a dominant tokenwise component and a residual.
  • Demonstrates a strong geometric separation between the full update and the tokenwise component.
  • Finds a significant correlation between approximation error and output perturbation, indicating the importance of the residual.
  • Validates findings across multiple architectures, offering a broad perspective on layerwise dynamics.
Read more
Towards Intrinsically Calibrated Uncertainty Quantification in Industrial Data-Driven Models via Diffusion Sampler
Yiran Ma, Jerome Le Ny, Zhichao Chen, Zhihuan Song
Theory Optimization
  • Introduces a diffusion-based framework for uncertainty quantification in industrial models.
  • Eliminates the need for post-hoc calibration by providing intrinsically calibrated predictive uncertainty.
  • Demonstrates significant improvements in uncertainty calibration and predictive accuracy over existing methods.
  • Evaluated on synthetic datasets and real-world industrial case studies.
Read more
MATA-Former & SIICU: Semantic Aware Temporal Alignment for High-Fidelity ICU Risk Prediction
Zhichong Zheng, Xiaohang Nie, Xueqi Wang, Yuanjin Zhao, Haitao Zhang, Yichao Tang
Time Series Multimodal
  • Introduction of MATA-Former, a transformer architecture that aligns clinical semantics with temporal dynamics.
  • Development of Plateau-Gaussian Soft Labeling (PSL) for continuous risk modeling instead of binary classification.
  • Creation of the SIICU dataset with over 506,000 expert-annotated clinical events to enhance evaluation of ICU risk prediction models.
  • Demonstration of superior performance in risk prediction from text-intensive, irregular clinical time series.
Read more
Koopman-Based Nonlinear Identification and Adaptive Control of a Turbofan Engine
David Grasev
Optimization Theory Robotics
  • Development of a physics-based component-level model for turbofan engine control.
  • Introduction of a meta-heuristic extended dynamic mode decomposition for accurate dynamic modeling.
  • Creation of two controllers: AKMPC and K-FBLC, with AKMPC showing superior robustness.
  • Demonstration of the Koopman model's flexibility across different control objectives.
Read more
Re-analysis of the Human Transcription Factor Atlas Recovers TF-Specific Signatures from Pooled Single-Cell Screens with Missing Controls
Arka Jain, Umesh Sharma
Theory
  • Developed a reproducible pipeline for analyzing pooled single-cell TF screens.
  • Successfully assigned TF identities to 79.2% of cells in the dataset.
  • Recovered TF-specific signatures for 59 out of 61 testable TFs, significantly improving upon previous analyses.
  • Identified key transcriptional remodelers and linked them to specific biological pathways.
Read more
Understanding Latent Diffusability via Fisher Geometry
Jing Gu, Morteza Mardani, Wonjun Lee, Dongmian Zou, Gilad Lerman
Generative Models Theory Efficient ML
  • Introduces a theoretical framework linking latent diffusability to Fisher Information Geometry.
  • Identifies and decouples three penalties of latent geometric distortion affecting diffusion performance.
  • Derives conditions for preserving Fisher Information Rate (FIR) to ensure stable diffusability.
  • Empirical validation shows the effectiveness of FI and FIR metrics in predicting latent diffusion performance.
Read more
Generalization Limits of Reinforcement Learning Alignment
Haruhi Shida, Koo Imai, Keigo Kansa
NLP Large Language Models Reinforcement Learning
  • RLHF primarily redistributes existing capabilities rather than acquiring new ones.
  • The introduction of 'compound jailbreaks' demonstrates significant vulnerabilities in LLM safety mechanisms.
  • Attack success rates increased from 14.3% with individual methods to 71.4% with combined approaches.
  • Safety mechanisms may fail against unknown attack patterns due to limited training data.
Read more
Extracting Money Laundering Transactions from Quasi-Temporal Graph Representation
Haseeb Tariq, Marwan Hassani
Graph Learning
  • Introduction of ExSTraQt, a supervised learning framework for detecting money laundering transactions.
  • Utilization of graph-based features tailored for AML detection.
  • Demonstrated significant improvements in detection accuracy over existing models.
  • Framework designed for scalability and simplicity in implementation.
Read more
Residuals-based Offline Reinforcement Learning
Qing Zhu, Xian Yu
Reinforcement Learning Optimization Theory
  • Introduces a residuals-based framework for offline reinforcement learning that addresses data coverage limitations.
  • Defines a residuals-based Bellman optimality operator that incorporates estimation errors into policy optimization.
  • Develops a residuals-based offline deep Q-learning algorithm and demonstrates its effectiveness in a stochastic environment.
  • Provides finite-sample guarantees and conditions for asymptotic optimality of the proposed methods.
Read more
Crystalite: A Lightweight Transformer for Efficient Crystal Modeling
Tin Hadži Veljković, Joshua Rosenthal, Ivor Lončarić, Jan-Willem van de Meent
Generative Models Graph Learning Efficient ML
  • Introduction of the Geometric Enhancement Module (GEM) for direct geometric biasing in Transformers.
  • Replacement of one-hot atom representations with a compact chemically informed tokenization.
  • Crystalite achieves state-of-the-art results in crystal structure prediction and generation.
  • Significantly faster sampling compared to traditional geometry-heavy models.
Read more
Malliavin Calculus for Counterfactual Gradient Estimation in Adaptive Inverse Reinforcement Learning
Vikram Krishnamurthy, Luke Snow
Reinforcement Learning Theory Optimization
  • Introduces a novel passive Langevin-based algorithm for adaptive inverse reinforcement learning.
  • Utilizes Malliavin calculus to efficiently estimate counterfactual gradients conditioned on measure-zero events.
  • Achieves optimal convergence rates without requiring trajectory resampling or kernel smoothing.
  • Provides a comprehensive algorithmic framework for counterfactual gradient estimation.
Read more
Auction-Based Online Policy Adaptation for Evolving Objectives
Guruprerana Shabadi, Kaushik Mallik
Reinforcement Learning Robotics Optimization
  • Introduces a modular framework for multi-objective reinforcement learning using auction-based policy adaptation.
  • Local policies compete through bids reflecting urgency, allowing for dynamic prioritization of objectives.
  • Demonstrates superior performance compared to monolithic policies in dynamic environments.
  • Enhances interpretability by allowing clear identification of active policies and objectives.
Read more
Self-Distilled RLVR
Chenxu Yang, Chuanyu Qin, Qingyi Si, Minghui Chen, Naibin Gu, Dingyu Yao, Zheng Lin, Weiping Wang, Jiaqi Wang, Nan Duan
Reinforcement Learning Large Language Models Theory
  • RLSD combines the advantages of OPSD and RLVR, addressing the limitations of each.
  • The paper identifies severe information leakage in OPSD, leading to unstable training.
  • RLSD decouples update direction from update magnitude, enhancing training stability.
  • Empirical results show RLSD achieves faster convergence and better performance than GRPO.
Read more
FourierMoE: Fourier Mixture-of-Experts Adaptation of Large Language Models
Juyong Jiang, Fan Wang, Hong Qi, Sunghun Kim, Jing Tang
NLP Large Language Models Efficient ML
  • FourierMoE integrates MoE architecture with inverse discrete Fourier transform (IDFT) for frequency-aware adaptation.
  • The method addresses task interference and representation deficiency in multi-task fine-tuning settings.
  • FourierMoE employs a frequency-adaptive router and learns complex coefficients to capture both phase and amplitude information.
  • Extensive evaluations show superior performance across various benchmarks with fewer trainable parameters compared to existing methods.
Read more
Neural network methods for two-dimensional finite-source reflector design
Roel Hacking, Lisa Kusch, Koondanibha Mitra, Martijn Anthonissen, Wilbert IJzerman
Optimization
  • Introduces a neural network parameterization for reflector design that addresses finite-source light distribution.
  • Develops two differentiable objective functions for optimizing reflector height.
  • Demonstrates superior performance of the neural network approach over traditional deconvolution methods.
  • Provides a comprehensive evaluation across multiple benchmarks, including height constraints.
Read more
Sven: Singular Value Descent as a Computationally Efficient Natural Gradient Method
Samuel Bright-Thonney, Thomas R. Harvey, Andre Lukas, Jesse Thaler
Optimization Efficient ML Theory
  • Sven optimizes neural networks by treating each data point's residual as a separate condition.
  • The algorithm approximates the Moore-Penrose pseudoinverse using truncated SVD, leading to lower computational costs.
  • Sven significantly outperforms Adam and other first-order methods in regression tasks.
  • The method is particularly suited for over-parameterized models and can be applied in scientific computing.
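A toy, linear-model version of the update described above: per-sample residuals are stacked as separate conditions, and the parameter step applies a rank-truncated SVD pseudoinverse of the Jacobian. The rank and iteration count are arbitrary choices for illustration, not the authors' settings.

    import numpy as np

    def svd_pseudoinverse_step(J, residuals, rank):
        """Solve J @ delta ≈ residuals using a rank-truncated Moore-Penrose pseudoinverse."""
        U, s, Vt = np.linalg.svd(J, full_matrices=False)
        U, s, Vt = U[:, :rank], s[:rank], Vt[:rank]
        return Vt.T @ ((U.T @ residuals) / s)

    rng = np.random.default_rng(1)
    X = rng.normal(size=(500, 20))
    w_true = rng.normal(size=20)
    y = X @ w_true + 0.01 * rng.normal(size=500)

    w = np.zeros(20)
    for _ in range(5):
        residuals = y - X @ w                                # one condition per data point
        w += svd_pseudoinverse_step(X, residuals, rank=10)   # X is the Jacobian of a linear model
    print("residual norm:", np.linalg.norm(y - X @ w))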
Read more
UI-Oceanus: Scaling GUI Agents with Synthetic Environmental Dynamics
Mengzhou Wu, Yuzhe Guo, Yuan Cao, Haochuan Lu, Songhe Zhu, Pingzhe Qu, Xin Chen, Kang Qin, Zhongpu Wang, Xiaode Zhang, Xinyi Wang, Wei Dai, Gang Cao, Yuetang Deng, Zhi Gong, Dezhi Ran, Linyi Li, Wei Yang, Tao Xie
Robotics Generative Models Reinforcement Learning
  • UI-Oceanus shifts the learning paradigm from high-level trajectory imitation to mastering interaction physics.
  • Forward dynamics is identified as the primary driver for scalability, outperforming traditional methods.
  • The framework enables low-cost autonomous exploration to yield high-density supervision for training.
  • Experimental results show significant performance improvements in both offline and real-world scenarios.
Read more
Modeling and Controlling Deployment Reliability under Temporal Distribution Shift
Naimur Rahman, Naazreen Tabassum
Optimization Time Series Theory
  • Introduces a dynamic state model for deployment reliability that separates average performance from temporal stability.
  • Formulates deployment adaptation as a multi-objective control problem with constraints on intervention costs.
  • Defines a class of drift-triggered intervention policies for managing reliability state and drift signals.
  • Demonstrates that selective interventions can reduce operational costs by approximately 73% while maintaining model performance.
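The drift-triggered intervention policies above can be pictured as a thresholded control rule over a drift signal and a reliability estimate; the thresholds, action names, and numbers below are assumptions for illustration only.

    def drift_triggered_policy(drift_signal, reliability, drift_threshold=0.6, reliability_floor=0.8):
        """Intervene only when drift is high or reliability has degraded; otherwise do nothing."""
        if reliability < reliability_floor:
            return "retrain"        # costly intervention, reserved for degraded reliability
        if drift_signal > drift_threshold:
            return "recalibrate"    # cheaper intervention triggered by the drift signal
        return "no_action"

    # Simulated deployment timeline of (drift score, reliability estimate) per period.
    timeline = [(0.10, 0.95), (0.30, 0.92), (0.70, 0.90), (0.65, 0.78), (0.20, 0.93)]
    print([drift_triggered_policy(d, r) for d, r in timeline])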
Read more
Test-Time Scaling Makes Overtraining Compute-Optimal
Nicholas Roberts, Sungjun Cho, Zhiqi Gao, Tzu-Heng Huang, Albert Wu, Gabriel Orlanski, Avi Trost, Kelly Buchanan, Aws Albarghouthi, Frederic Sala
Large Language Models Optimization Theory
  • Introduces Train-to-Test (T2) scaling laws that optimize pretraining and test-time decisions jointly.
  • Demonstrates that optimal pretraining strategies shift towards overtraining when factoring in inference costs.
  • Validates the T2 scaling approach by showing improved performance of overtrained models across various tasks.
  • Findings remain relevant even after post-training, suggesting practical implications for model deployment.
Read more
Graph Neural Operator Towards Edge Deployability and Portability for Sparse-to-Dense, Real-Time Virtual Sensing on Irregular Grids
William Howes, Jason Yoo, Kazuma Kobayashi, Subhankar Sarkar, Farid Ahmed, Souvik Chakraborty, Syed Bahauddin Alam
Graph Learning Efficient ML
  • VIRSO provides accurate sparse-to-dense reconstruction for irregular geometries.
  • The method integrates spectral and spatial analysis for improved performance.
  • Achieves mean relative L2 errors below 1% while reducing energy-delay product significantly.
  • Demonstrates edge-deployability with low power consumption and latency.
Read more
Enhancing the Reliability of Medical AI through Expert-guided Uncertainty Modeling
Aleksei Khalin, Ekaterina Zaychenkova, Aleksandr Yugay, Andrey Goncharov, Sergey Korchagin, Alexey Zaytsev, Egor Ershov
Computer Vision Interpretability Theory
  • Expert evaluations significantly enhance the quality of uncertainty estimates in medical AI.
  • The proposed two-ensemble method effectively separates epistemic and aleatoric uncertainty.
  • The framework shows substantial improvements in various medical tasks, outperforming state-of-the-art methods.
  • A simplified one-ensemble method offers comparable performance with greater efficiency.
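The separation of epistemic and aleatoric uncertainty mentioned above builds on the textbook variance decomposition for an ensemble of probabilistic predictors, sketched below; the two-ensemble and expert-guided components of the paper's framework are not reproduced.

    import numpy as np

    def decompose_uncertainty(ensemble_means, ensemble_vars):
        """Gaussian ensemble: aleatoric = mean predicted noise, epistemic = member disagreement."""
        aleatoric = ensemble_vars.mean(axis=0)
        epistemic = ensemble_means.var(axis=0)
        return aleatoric, epistemic

    # Toy ensemble of 5 members on 3 inputs; the numbers are illustrative only.
    means = np.array([[0.90, 0.40, 0.10],
                      [1.10, 0.50, 0.70],
                      [1.00, 0.45, 0.40],
                      [0.95, 0.50, 0.90],
                      [1.05, 0.42, 0.20]])
    noise_vars = np.full_like(means, 0.05)
    aleatoric, epistemic = decompose_uncertainty(means, noise_vars)
    print("aleatoric:", aleatoric)
    print("epistemic:", np.round(epistemic, 3))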
Read more
Physics Informed Reinforcement Learning with Gibbs Priors for Topology Control in Power Grids
Pantelis Dogoulis, Maxime Cordy
Reinforcement Learning Graph Learning Optimization
  • Introduces a physics-informed RL methodology for topology control in power grids.
  • Utilizes a Gibbs prior to select a small, state-dependent set of feasible actions.
  • Employs a graph neural network to predict overload risks for action evaluation.
  • Achieves significant improvements in reward and decision time compared to existing methods.
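One way to picture the Gibbs prior above: candidate topology actions are weighted by a Boltzmann factor over an estimated overload risk, and only the top-weighted few are exposed to the RL agent. The risk values and temperature below are stand-ins; in the paper the risks come from a graph neural network.

    import numpy as np

    def gibbs_action_subset(overload_risk, temperature=0.3, k=3):
        """Weight actions by exp(-risk / T) and keep only the k most probable ones."""
        weights = np.exp(-np.asarray(overload_risk) / temperature)
        probs = weights / weights.sum()
        top_k = np.argsort(probs)[::-1][:k]
        return top_k, probs[top_k]

    # Stand-in overload-risk estimates for 8 candidate topology actions.
    risk = [0.90, 0.20, 0.75, 0.10, 0.60, 0.15, 0.95, 0.40]
    actions, prior_mass = gibbs_action_subset(risk)
    print("feasible actions:", actions, "with prior mass", np.round(prior_mass, 3))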
Read more
SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization
Zhengxi Lu, Zhiyuan Yao, Jinyang Wu, Chengcheng Han, Qi Gu, Xunliang Cai, Weiming Lu, Jun Xiao, Yueting Zhuang, Yongliang Shen
Reinforcement Learning Large Language Models Robotics
  • SKILL0 is the first RL framework explicitly designed for skill internalization, enabling zero-shot autonomous behavior.
  • In-context reinforcement learning (ICRL) is introduced to transition from context-dependent execution to intrinsic competence.
  • Dynamic Curriculum adaptively withdraws skills based on their on-policy helpfulness, optimizing the learning process.
  • SKILL0 achieves substantial performance improvements over traditional RL baselines while maintaining a low token context size.
Read more
Low-Rank Compression of Pretrained Models via Randomized Subspace Iteration
Farhad Pourkamali-Anaraki
Efficient ML Theory Computer Vision
  • Establishes a theoretical link between low-rank approximation error and predictive performance.
  • Proposes randomized subspace iteration (RSI) as a superior alternative to RSVD for model compression.
  • Demonstrates that RSI improves approximation quality in scenarios with slow-decaying singular value spectra.
  • Evaluates the effectiveness of RSI on both convolutional and transformer-based architectures.
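The contrast above between randomized SVD and randomized subspace iteration comes down to extra power-iteration passes over the weight matrix; the sketch below follows the usual Halko-style recipe with arbitrary rank and matrix choices, not necessarily the paper's exact variant.

    import numpy as np

    def randomized_low_rank(W, rank, n_power_iters=0, seed=0):
        """Randomized range finder; n_power_iters > 0 turns plain RSVD into subspace iteration."""
        rng = np.random.default_rng(seed)
        Q = np.linalg.qr(W @ rng.normal(size=(W.shape[1], rank)))[0]
        for _ in range(n_power_iters):            # alternate W^T / W passes to sharpen the subspace
            Q = np.linalg.qr(W.T @ Q)[0]
            Q = np.linalg.qr(W @ Q)[0]
        U, s, Vt = np.linalg.svd(Q.T @ W, full_matrices=False)
        return Q @ U, s, Vt                       # W ≈ (Q U) diag(s) Vt

    # A synthetic weight matrix with slowly decaying singular values, the regime the paper targets.
    rng = np.random.default_rng(0)
    U0 = np.linalg.qr(rng.normal(size=(512, 512)))[0]
    V0 = np.linalg.qr(rng.normal(size=(512, 512)))[0]
    W = U0 @ np.diag(1.0 / np.arange(1, 513) ** 0.3) @ V0.T

    for q in (0, 2):
        Uq, s, Vt = randomized_low_rank(W, rank=64, n_power_iters=q)
        err = np.linalg.norm(W - Uq @ np.diag(s) @ Vt) / np.linalg.norm(W)
        print(f"power iterations={q}: relative Frobenius error {err:.3f}")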
Read more
Application of parametric Shallow Recurrent Decoder Network to magnetohydrodynamic flows in liquid metal blankets of fusion reactors
M. Lo Verso, C. Introini, E. Cervi, L. Savoldi, J. N. Kutz, A. Cammi
Time Series
  • SHRED effectively reconstructs MHD states from sparse measurements.
  • The integration of SVD with SHRED enhances computational efficiency.
  • The framework generalizes well to unseen magnetic field configurations.
  • SHRED can infer magnetic field dynamics from temperature data alone.
Read more
Mitigating Reward Hacking in RLHF via Advantage Sign Robustness
Shinnosuke Ono, Johannes Ackermann, Soichiro Nishimori, Takashi Ishida, Masashi Sugiyama
Reinforcement Learning Large Language Models Optimization
  • Introduces Sign-Certified Policy Optimization (SignCert-PO) to mitigate reward hacking in RLHF.
  • Focuses on the concept of advantage sign robustness to improve policy updates.
  • Operates without the need for multiple reward models or extensive training data.
  • Achieves superior performance on benchmark tasks compared to existing methods.
Read more
Hierarchical Planning with Latent World Models
Wancong Zhang, Basile Terver, Artem Zholus, Soham Chitnis, Harsh Sutaria, Mido Assran, Randall Balestriero, Amir Bar, Adrien Bardes, Yann LeCun, Nicolas Ballas
Reinforcement Learning Robotics Optimization
  • Introduces a hierarchical planning framework that operates on multiple temporal scales.
  • Achieves a 70% success rate in real-world robotic tasks with zero-shot control.
  • Reduces planning time by up to a factor of three compared to flat models.
  • Eliminates the need for inverse models or skill learning by using latent state matching.
Read more