AI-generated summaries
Today's ML research, without the noise.
Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.
Papers today: 48
Update frequency: 8h
Days of history: 7
CART: Context-Anchored Recurrent Transformer -- A Parameter-Efficient Architecture with Learned Stability
NLP
Large Language Models
Efficient ML
- CART achieves parameter efficiency by reusing a shared core block across multiple iterations.
- The architecture separates context encoding from iterative refinement, reducing computational overhead.
- A learned LTI gate stabilizes the recurrent computation, maintaining a consistent spectral radius.
- Empirical results show that the best configuration can vary significantly between training stages.
Summary
This paper introduces CART (Context-Anchored Recurrent Transformer), a novel language model architecture designed for parameter efficiency by reusing a single shared core block across multiple iterations. Unlike traditional looped transformers that recompute key-value (K, V) representations at each iteration, CART computes K and V once from a multi-layer prelude and reuses them in a recurrent core through multi-head latent attention (MLA) cross-attention. This approach separates context encoding from iterative refinement, thereby reducing computational costs and enhancing stability across iterations. A learned Linear Time-Invariant (LTI) gate ensures stability by maintaining a narrow range of spectral radius values during training. The evaluation of CART is conducted in two stages, involving a hyperparameter screening and full training across various configurations. The results reveal that the optimal configuration can change based on the training stage, highlighting the importance of prelude depth over loop count. The findings also indicate that excessive iterations during inference can degrade performance, suggesting a need for careful tuning of model parameters and training strategies.
Methodology
CART combines three components: a multi-layer prelude that computes K and V once, a recurrent core that attends to them through MLA cross-attention, and a learned LTI gate for stability. The evaluation is conducted in two stages: a hyperparameter screening followed by full training across various configurations, all run on consumer GPUs.
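The separation of one-time context encoding from a looped, gated core can be sketched as below. This is a toy illustration only: `prelude`, `cross_attend`, and the diagonal sigmoid gate are hypothetical stand-ins, since the summary does not give the paper's actual layers or gate parameterization. The point it shows is that keys and values are built once, and that a gate with entries in (0, 1) keeps the state update contractive.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def prelude(tokens):
    # Hypothetical stand-in for the multi-layer prelude: runs once per input.
    keys = [[math.sin(t), math.cos(t)] for t in tokens]
    values = [[t, 1.0] for t in tokens]
    return keys, values

def cross_attend(state, keys, values):
    # Softmax attention of the current state over the fixed keys/values.
    scores = [sum(s * k for s, k in zip(state, key)) for key in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w / total * v[i] for w, v in zip(weights, values)) for i in range(dim)]

def run_core(tokens, gate_logits, loops):
    keys, values = prelude(tokens)            # computed once, reused every loop
    gate = [sigmoid(g) for g in gate_logits]  # assumed diagonal gate, entries in (0, 1)
    state = [0.0] * len(gate)
    for _ in range(loops):
        update = cross_attend(state, keys, values)
        state = [a * s + (1.0 - a) * u for a, s, u in zip(gate, state, update)]
    # For a diagonal gate, the spectral radius is its largest entry.
    return state, max(gate)

state, spectral_radius = run_core([0.1, 0.5, 0.9], gate_logits=[0.0, 1.0], loops=6)
```

Because the gate is diagonal in this sketch, its spectral radius is below 1 by construction; a learned dense gate would need an explicit constraint or monitoring instead.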
Results
The evaluation identified P=6 as the best configuration across scales in Stage 1, but Stage 2 revealed that R=6 was optimal for larger dimensions (d ≥ 512), while R=8 performed better at d=256. The spectral radius of the LTI gate settled within a narrow band during training, and excessive iterations during inference consistently degraded performance on benchmark tasks.
Implications
CART's design could lead to more efficient language models suitable for deployment on memory-constrained hardware, with potential applications in real-time language processing and other NLP tasks. The findings on parameter tuning and inference optimization may inform future model architectures and training methodologies.
Expressivity of congruence-based architectures for DNNs on positive-definite matrices
Theory
- Congruence-like layers in DNNs for SPD matrices can lead to limited expressivity when weight matrices are constrained to be semi-orthogonal.
- The collapse in expressivity to a one-hidden-layer equivalent is linked to the loss of spectral diversity in the network.
- The study compares various Riemannian classifiers to assess their effectiveness with features extracted from congruence-like layers.
- The findings emphasize the importance of architectural design in maximizing the performance of DNNs on SPD data.
Summary
This paper investigates the expressivity of deep neural network (DNN) architectures designed for classifying symmetric positive-definite (SPD) matrices, particularly focusing on congruence-like layers. These layers, which involve multiplying the input matrix by a weight matrix and its transpose, are central to the SPDNet architecture. The authors demonstrate that imposing a semi-orthogonality constraint on the weight matrix significantly restricts the expressivity of the network, leading to a collapse into a one-hidden-layer equivalent for certain activation functions. This limitation arises from a loss of spectral diversity in the congruence-like layers, as dictated by Poincaré's separation theorem. The paper further explores various Riemannian classifiers and their compatibility with the feature maps generated by these layers, providing insights into the final classification step in the context of SPD data. Overall, the work highlights the need for careful consideration of architectural constraints to enhance the expressivity of DNNs applied to SPD matrices.
Methodology
The authors analyze the congruence-like transformation applied in DNNs for SPD matrices, examining the effects of semi-orthogonality constraints on weight matrices. They utilize theoretical frameworks, including Poincaré's separation theorem, to establish the relationship between these constraints and network expressivity. Additionally, they evaluate different Riemannian classifiers to determine their compatibility with the features produced by the congruence-like layers.
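The spectral restriction behind Poincaré's separation theorem can be checked by hand in the smallest case. The sketch below assumes a 2x2 SPD input and a semi-orthogonal weight that is a single unit vector, so the congruence output is a scalar. It is an illustration of the theorem, not the paper's experiment, and the helper names are made up for this example.

```python
import math

def eig_sym_2x2(a, b, c):
    # Eigenvalues of the symmetric matrix [[a, b], [b, c]], smallest first.
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    return mean - radius, mean + radius

def congruence(a, b, c, theta):
    # w^T X w for the unit vector w = (cos theta, sin theta),
    # i.e. a semi-orthogonal 2x1 weight applied to X = [[a, b], [b, c]].
    w0, w1 = math.cos(theta), math.sin(theta)
    return a * w0 * w0 + 2.0 * b * w0 * w1 + c * w1 * w1

low, high = eig_sym_2x2(2.0, 0.5, 1.0)
outputs = [congruence(2.0, 0.5, 1.0, k * math.pi / 16) for k in range(32)]
```

Every value in `outputs` lies between `low` and `high`: with a semi-orthogonal weight, the layer cannot produce eigenvalues outside the input's spectral range.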
Results
The study reveals that restricting the weight matrix to be semi-orthogonal results in a significant reduction in the expressivity of the DNN architecture, effectively limiting it to a one-hidden-layer model for certain activation functions. The analysis of various Riemannian classifiers indicates differing levels of compatibility with the feature maps generated by the congruence-like layers, suggesting that the choice of classifier can impact classification performance.
Implications
The findings of this research have implications for the design of neural architectures tailored for SPD matrices, particularly in fields where such data is prevalent, such as medical imaging and signal processing. By understanding the limitations imposed by architectural constraints, practitioners can better design DNNs that leverage the unique properties of SPD data for improved classification outcomes.
HARVE: Hacking-Aware Reward-Head Vector Editing for Robust Reward Models
NLP
Large Language Models
Interpretability
- Introduction of REWARDHACKBENCH, a benchmark for evaluating reward model robustness against hacking.
- Development of HARVE, a training-free method for reward-head vector editing to enhance robustness.
- Demonstration that HARVE significantly improves performance over traditional fine-tuning methods.
- Empirical evidence that reward hacking can be captured as a multidimensional subspace in reward-model representations.
Summary
This paper addresses the vulnerability of reward models used in large language model (LLM) alignment to reward hacking, where models exploit reward-correlated cues without improving the quality of outputs. The authors introduce REWARDHACKBENCH, a benchmark comprising 1,203 matched gold-hacked pairs across 13 reward-hacking patterns, revealing significant failures in existing reward models. To mitigate these issues, they propose HARVE, a training-free method that edits the reward-head vector to reduce sensitivity to hacking-related features. HARVE identifies a multi-directional hacking subspace from residual-stream directions and removes components aligned with that subspace using a small set of contrastive examples. Experimental results demonstrate that HARVE outperforms fine-tuning methods, improving robustness against reward hacking while maintaining overall model performance. The findings suggest that reward hacking is better represented as a multidimensional subspace rather than isolated cues, providing a new perspective for targeted mitigation strategies.
Methodology
The authors developed HARVE, which edits the reward-head vector of scalar reward models by identifying and removing components aligned with a multi-directional hacking subspace derived from residual-stream directions. This approach is training-free and utilizes a small set of contrastive examples to enhance robustness against reward hacking.
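The editing step described above amounts to projecting the reward-head vector onto the orthogonal complement of a subspace. A minimal sketch follows, assuming the hacking directions are already available as plain vectors. How HARVE extracts them from contrastive examples is not specified in this summary, and the function names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormalize(directions, tol=1e-12):
    # Gram-Schmidt: build an orthonormal basis for the span of the directions.
    basis = []
    for d in directions:
        r = list(d)
        for q in basis:
            c = dot(r, q)
            r = [ri - c * qi for ri, qi in zip(r, q)]
        norm = dot(r, r) ** 0.5
        if norm > tol:  # skip directions already in the span
            basis.append([ri / norm for ri in r])
    return basis

def edit_head(head, directions):
    # Remove every component of the head vector that lies in the subspace.
    edited = list(head)
    for q in orthonormalize(directions):
        c = dot(edited, q)
        edited = [e - c * qi for e, qi in zip(edited, q)]
    return edited

hack_dirs = [[1.0, 0.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]]  # illustrative directions
edited_head = edit_head([1.0, 2.0, 3.0, 4.0], hack_dirs)
```

After the edit, the head has zero dot product with each supplied direction, so features that vary only along those directions no longer change the scalar reward. No gradient step is involved, which is what makes the approach training-free.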
Results
HARVE achieved a 21.1-percentage-point improvement in gold-preference rates on target subcategories compared to the original reward model and a 13.7-point improvement over fine-tuning baselines. It also preserved performance on non-target subcategories and improved overall performance on RM-Bench by 2.4 points.
Implications
The findings suggest that HARVE can be a practical solution for enhancing the robustness of reward models in various applications, particularly in high-stakes domains where reward hacking can have significant consequences. The introduction of REWARDHACKBENCH also provides a valuable resource for future research in reward model evaluation and improvement.
Contrastive Neural Algorithmic Reasoning for Graph Coloring
Graph Learning
Optimization
Theory
- Introduces the first neural supervised learning approach to graph coloring with a colorability certificate.
- Proposes a contrastive learning framework that enhances interpretability and scalability in graph coloring tasks.
- Demonstrates that the proposed method achieves effective generalization across different graph families.
- Establishes a geometric understanding of node embeddings in relation to graph coloring.
Summary
This paper addresses the problem of graph coloring, specifically approximate k-coloring, where the objective is to assign one of k colors to each node so that as few edges as possible are monochromatic, that is, join two nodes of the same color. The authors propose a novel contrastive learning framework that learns transferable coloring geometry, aligning embeddings of same-color nodes while pushing adjacent nodes' representations apart. This approach contrasts with recent unsupervised graph neural network (GNN) methods that optimize each instance independently, limiting generalization across different graph sizes and distributions. The authors derive a population objective over bounded-size graphs and demonstrate that the optimal embeddings exhibit a line-prototype structure, where nodes of the same color collapse into a shared one-dimensional subspace. They also provide theoretical insights into the optimization process and establish conditions under which the proposed method yields effective colorings. Empirical results on synthetic and real-world graphs show that the contrastive GNN encoders generalize well and produce low-conflict colorings, often outperforming traditional greedy algorithms.
Methodology
The authors employ a contrastive learning framework where vertices assigned the same color are treated as positive pairs, while adjacent vertices are treated as negative pairs. They utilize an absolute-value variant of the InfoNCE loss to align unoriented lines representing color prototypes. The method involves training a GNN encoder to map vertices to unit-norm embeddings, followed by clustering to determine vertex colors based on the learned embeddings.
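The role of the absolute value can be seen in a small decoding sketch: with unoriented lines, an embedding and its negation belong to the same color. The code below assumes the color prototypes are already known and uses nearest-line assignment in place of the clustering step. The encoder and the loss itself are omitted, and all names are illustrative.

```python
def abs_similarity(u, v):
    # |<u, v>| treats v and -v as the same unoriented line.
    return abs(sum(a * b for a, b in zip(u, v)))

def assign_colors(embeddings, prototypes):
    # Give each vertex the color of the prototype line it is most aligned with.
    return [
        max(range(len(prototypes)), key=lambda c: abs_similarity(e, prototypes[c]))
        for e in embeddings
    ]

def count_conflicts(edges, colors):
    # Number of monochromatic edges, the quantity approximate k-coloring minimizes.
    return sum(1 for u, v in edges if colors[u] == colors[v])

# A 4-cycle with unit-norm embeddings that alternate between two lines.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
embeddings = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
prototypes = [(1.0, 0.0), (0.0, 1.0)]
colors = assign_colors(embeddings, prototypes)
conflicts = count_conflicts(edges, colors)
```

Vertices 0 and 2 have opposite embeddings but fall on the same line, so they share a color, and the 4-cycle is colored without conflicts.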
Results
The theoretical analysis reveals a sharp lower bound for the contrastive objective and characterizes optimizers that achieve it. The experiments demonstrate that the proposed contrastive GNN encoders effectively generalize across graph sizes and distributions, producing low-conflict colorings that match or improve upon greedy approaches.
Implications
The findings suggest that contrastive learning can significantly enhance the performance of graph coloring algorithms, making them more interpretable and applicable to various real-world scenarios such as scheduling and resource allocation. The geometric insights provided by the method could also inform future research in graph learning and optimization.
A Theoretical Framework for Self-Play Theorem Proving Algorithms
Theory
Large Language Models
Graph Learning
- Introduces a theoretical framework for self-play theorem proving algorithms.
- Formalizes the theorem set as a graph to analyze prover-conjecturer interactions.
- Demonstrates that a well-connected theorem graph allows for exponential growth of the prover's knowledge set.
- Proposes a diversity measure to enhance the quality of generated theorems.
Summary
This paper presents a theoretical framework for understanding self-play algorithms in the context of formal theorem proving using Large Language Models (LLMs). The authors formalize the set of theorems as a graph, where nodes represent theorems and edges indicate semantic similarity. They introduce primitive assumptions that characterize the capabilities of a prover model and the access a conjecturer has to the theorem graph structure. The study shows that a prover-conjecturer system, utilizing a reversible random walk for conjecturing, can exponentially grow the prover's knowledge set if the theorem graph is well-connected. To address the issue of conjecturers generating overly complex theorems, the authors propose a diversity measure for the training distribution of theorems and an improved conjecturing algorithm that maximizes this diversity by computing diffusion similarity between neighboring theorems. The paper also outlines a method for calculating diffusion similarity using contrastive learning to embed theorems into Euclidean space.
Methodology
The authors formalize the theorem proving process as a graph structure and analyze the interactions between a prover and a conjecturer. They introduce a conjecturing algorithm based on reversible random walks and propose a diversity measure for the generated theorems. The computation of diffusion similarity is achieved through contrastive learning techniques.
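As a concrete picture of a reversible walk on a theorem graph, the sketch below uses a simple random walk on a small undirected graph. Whether the paper's conjecturer uses exactly this walk is an assumption, and the graph and node names are invented. The code checks the detailed-balance condition that defines reversibility, with the stationary distribution proportional to node degree.

```python
import random

def transition_prob(graph, u, v):
    # Simple random walk: move to a uniformly chosen neighbor.
    return 1.0 / len(graph[u]) if v in graph[u] else 0.0

def stationary(graph):
    # For an undirected graph, the stationary mass of a node is proportional to its degree.
    total_degree = sum(len(neighbors) for neighbors in graph.values())
    return {u: len(neighbors) / total_degree for u, neighbors in graph.items()}

def walk(graph, start, steps, rng):
    node, path = start, [start]
    for _ in range(steps):
        node = rng.choice(graph[node])
        path.append(node)
    return path

# Hypothetical theorem graph: nodes are theorems, edges mark semantic similarity.
theorem_graph = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["C"]}
pi = stationary(theorem_graph)
path = walk(theorem_graph, "A", steps=10, rng=random.Random(0))
```

Detailed balance here means `pi[u] * P(u, v) == pi[v] * P(v, u)` for every pair of nodes, so the walk has no preferred direction along any edge.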
Results
The paper establishes that under certain connectivity conditions of the theorem graph, the prover's knowledge set can grow exponentially. It also provides a new conjecturing algorithm that effectively maximizes the diversity of theorems generated, thus improving the training process.
Implications
This framework could lead to more efficient training of theorem proving models, enhancing their ability to generate and prove diverse and fundamental theorems. It may also have broader applications in automated reasoning and AI-driven mathematical research.
Staying Alive: Uncensored Survival Analysis with Tabular Foundation Models
Time Series
- Introduces a training-free method for survival regression using Tabular Foundation Models.
- Constructs an Accelerated Failure Time model with minimal parameter fitting.
- Implements a non-parametric in-context estimator to handle right-censored data.
- Demonstrates competitive performance against traditional survival regression models.
Summary
This paper addresses the challenges of applying Survival Analysis (SA) to time-to-event data, particularly in the presence of right-censoring, using Tabular Foundation Models (TFMs). The author proposes a novel training-free method for survival regression that leverages TFMs to predict event times and iteratively impute right-censored data. The method constructs an Accelerated Failure Time (AFT) model that requires fitting only a single scalar parameter, thus simplifying the modeling process. Additionally, the paper introduces a non-parametric in-context estimator based on the Buckley-James estimator to handle right-censored data effectively. Through experiments on standard survival analysis benchmarks, the proposed method demonstrates competitive performance compared to traditional parametric and semi-parametric survival regression models, such as Cox regression and parametric AFT models. This work highlights the potential of TFMs in survival analysis, offering a new approach that circumvents the need for extensive training while still achieving robust predictive performance.
Methodology
The methodology involves framing survival regression as a prediction task using TFMs. The author constructs an AFT model with a single scalar parameter and employs a non-parametric in-context estimator to impute censored data iteratively. The approach leverages the strengths of TFMs for zero-shot survival prediction without requiring dataset-specific training.
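The imputation idea behind Buckley-James-style estimators can be illustrated without covariates: a right-censored time is replaced by the conditional mean of the event time given survival past the censoring point, using Kaplan-Meier probability masses. The sketch below is a simplification under that assumption. The paper's in-context estimator works with a TFM and an AFT model, both omitted here, and the function names are hypothetical.

```python
def km_masses(times, events):
    # Kaplan-Meier probability mass placed at each distinct observed event time.
    event_times = sorted({t for t, e in zip(times, events) if e})
    survival, masses = 1.0, []
    for t in event_times:
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e)
        next_survival = survival * (1.0 - deaths / at_risk)
        masses.append((t, survival - next_survival))
        survival = next_survival
    return masses

def impute_censored(times, events):
    # Replace each censored time c with an estimate of E[T | T > c].
    masses = km_masses(times, events)
    imputed = []
    for t, e in zip(times, events):
        if e:
            imputed.append(float(t))
            continue
        tail = [(s, m) for s, m in masses if s > t]
        weight = sum(m for _, m in tail)
        # With no later event, nothing is known beyond c, so keep c.
        imputed.append(sum(s * m for s, m in tail) / weight if weight > 0 else float(t))
    return imputed

times = [1, 2, 3, 4]
events = [1, 0, 1, 1]  # 1 = event observed, 0 = right-censored
imputed = impute_censored(times, events)
```

In the iterative scheme the summary describes, a step like this would alternate with refitting the predictor on the imputed targets until the imputations stabilize.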
Results
The proposed method achieves performance comparable to classical survival models on five widely used survival analysis benchmarks, indicating its effectiveness in handling right-censored data while maintaining simplicity in model training.
Implications
The findings suggest that TFMs can be effectively utilized in survival analysis, potentially transforming how time-to-event data is modeled in various domains such as healthcare and churn prediction. This approach may lead to more accessible and efficient survival analysis methodologies.
Hallucination Is Linearly Decodable from Mid-Layer Hidden States in Quantized LLMs
NLP
Large Language Models
- A comprehensive evaluation of hallucination detection methods on quantized LLMs.
- Evidence that truthful and hallucinated states are linearly separable in mid-to-late transformer blocks.
- Linear probes outperform sampling-based methods in detecting hallucinations.
- Consistent peak probing layers identified across different model families.
Summary
This paper investigates the presence of a linearly separable truthfulness signal in the hidden states of quantized large language models (LLMs) and identifies the optimal network depth for detecting this signal. The study focuses on three instruction-tuned models (Llama-3.1-8B, Mistral-7B, and Qwen2.5-7B) loaded in 4-bit NF4 quantization and evaluates their performance across four hallucination detection benchmarks: TruthfulQA, HaluEval-QA, FEVER, and a controlled synthetic dataset. The authors compare four detection methods: linear and MLP probes, INSIDE EigenScore, self-consistency, and attention entropy. The findings reveal that a linear probe applied to a mid-network layer achieves high AUROC scores (0.904–1.000), while sampling-based methods perform significantly worse (not exceeding 0.541 AUROC). The results indicate that the truthfulness signal is approximately linear, with peak probing layers consistently located in specific blocks across different model families. Additionally, first-block attention entropy provides a useful signal in knowledge-grounded contexts without incurring extra inference costs. The study highlights the structural mismatch between paired-label evaluation and the information accessed by sampling methods, suggesting that the limitations observed are not inherent to the methods themselves. The authors provide code and data for reproducibility on consumer hardware.
Methodology
The study employs a unified evaluation framework to analyze three quantized LLMs across four hallucination detection datasets. It utilizes linear and MLP probes on hidden states, INSIDE EigenScore, self-consistency, and attention entropy to assess the models' performance in detecting hallucinations.
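A linear probe and its AUROC can be sketched in a few lines. The example below assumes a mean-difference direction as the probe, which is one simple linear probe and not necessarily the one trained in the paper, and it uses invented two-dimensional "hidden states". It also scores the same points it was fit on, purely to keep the example short.

```python
def fit_mean_difference_probe(states, labels):
    # Direction from the mean of class 0 to the mean of class 1.
    dim = len(states[0])

    def mean(rows):
        return [sum(r[i] for r in rows) / len(rows) for i in range(dim)]

    positive = mean([s for s, y in zip(states, labels) if y == 1])
    negative = mean([s for s, y in zip(states, labels) if y == 0])
    return [p - n for p, n in zip(positive, negative)]

def probe_scores(states, direction):
    # Project each hidden state onto the probe direction.
    return [sum(a * b for a, b in zip(s, direction)) for s in states]

def auroc(scores, labels):
    # Probability that a random positive outscores a random negative (ties count half).
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))

# Toy data: label 1 marks the class the probe should score higher.
states = [[1.0, 0.2], [0.9, -0.1], [-1.0, 0.3], [-0.8, 0.0]]
labels = [1, 1, 0, 0]
direction = fit_mean_difference_probe(states, labels)
toy_auroc = auroc(probe_scores(states, direction), labels)
```

In a real evaluation the probe would be fit per layer on a training split and scored on held-out examples, with the best-scoring layer reported as the peak probing layer.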
Results
The linear probe achieved AUROC scores between 0.904 and 1.000, while sampling-based methods did not exceed 0.541 AUROC. The peak probing layers were consistently found in blocks 13–18 for Llama and Mistral, and in blocks 19–25 for Qwen. Attention entropy achieved AUROC scores ranging from 0.866 to 0.941.
Implications
The findings suggest that linear probing of hidden states can be an effective method for hallucination detection in LLMs, particularly in resource-constrained environments. This could lead to improved deployment strategies for LLMs in real-world applications, enhancing their reliability and trustworthiness.
FAiT: Frequency-Aware Inverted Transformer for Multivariate Time Series Forecasting
Time Series
- FAiT addresses the low-pass filtering bias of traditional Transformer architectures in time series forecasting.
- The model introduces Inverted Attention to recover high-frequency signals that are typically attenuated.
- Dynamic Temporal-Frequency Modulation allows for adaptive spectral energy calibration based on the input instance.
- FAiT outperforms existing state-of-the-art models on benchmark datasets while being computationally efficient.
Summary
The paper introduces FAiT, a Frequency-Aware Inverted Transformer designed to enhance multivariate time series forecasting (MTSF). Traditional Transformer architectures often exhibit a low-pass filtering effect due to their self-attention mechanism, which can obscure high-frequency signals critical for capturing local changes in time series data. Existing frequency-domain methods have attempted to address this issue but typically rely on fixed spectral bases and uniform modulation, which do not account for the dynamic nature of real-world time series. FAiT addresses these limitations by implementing Inverted Attention, which interprets the attention map as a learnable low-pass operator while creating a complementary high-pass branch to recover transient signals. Additionally, it employs Dynamic Temporal-Frequency Modulation (DTFM) to adaptively adjust the energy of spectral sub-bands based on instance-specific conditions, allowing for more precise control over evolving multi-scale patterns. The authors conducted extensive experiments on benchmark datasets, demonstrating that FAiT consistently outperforms state-of-the-art Transformer-based and frequency-enhanced models while maintaining computational efficiency.
Methodology
FAiT employs a novel architecture that includes Inverted Attention to create a high-pass branch for recovering high-frequency signals. It also integrates Dynamic Temporal-Frequency Modulation (DTFM) to adaptively adjust the spectral weights, allowing for fine-grained control over the temporal dynamics of the time series data.
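One way to read "attention as a low-pass operator with a complementary high-pass branch" is that the high-pass output is the input minus its attention-weighted average. The sketch below assumes exactly that construction, which the summary does not confirm, and uses made-up similarity scores. It relies only on the fact that softmax rows sum to one.

```python
import math

def softmax(row):
    top = max(row)
    exps = [math.exp(x - top) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def split_bands(attention, signal):
    # Low-pass: each output is a convex combination of the inputs.
    low = [sum(a * x for a, x in zip(row, signal)) for row in attention]
    # High-pass (assumed form): whatever the averaging removed.
    high = [x - l for x, l in zip(signal, low)]
    return low, high

# Hypothetical scores favoring nearby time steps.
n = 4
scores = [[-abs(i - j) for j in range(n)] for i in range(n)]
attention = [softmax(row) for row in scores]

flat_low, flat_high = split_bands(attention, [2.0] * n)
spike_low, spike_high = split_bands(attention, [0.0, 0.0, 5.0, 0.0])
```

A constant signal passes entirely through the low branch and leaves the high branch at zero, while a spike keeps most of its energy in the high branch. That spike is the kind of transient the summary says plain self-attention tends to smooth away.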
Results
The experiments conducted on widely used benchmarks show that FAiT consistently outperforms both traditional Transformer-based models and existing frequency-enhanced approaches, indicating its effectiveness in capturing complex temporal patterns in multivariate time series data.
Implications
FAiT has significant implications for various applications requiring accurate time series forecasting, such as financial risk management, intelligent traffic control, climate resilience planning, and energy grid optimization. Its ability to adaptively capture evolving dynamics can lead to improved decision-making in high-stakes environments.
Planar Symmetric Pattern Generation
Generative Models
Computer Vision
Optimization
- Introduces a symmetrization framework for generating symmetric 2D patterns.
- Maintains continuity in representations while enforcing planar group symmetry.
- Validates the approach through diverse design tasks, demonstrating versatility.
- Separates symmetry constraints from task-specific objectives for broader applicability.
Summary
This paper addresses the challenge of generating objects with specific planar symmetries, which is crucial in various fields such as visual arts and manufacturing. The authors propose a novel symmetrization framework that transforms any 2D continuous representation into a symmetric one while maintaining continuity, overcoming limitations of existing methods that fail to enforce symmetry without introducing discontinuities. The framework embeds planar groups into an affine reflection group, allowing for the construction of a continuous G-invariant field. The authors validate their approach through multiple design tasks, including pattern design, paper-cutting design, and topology design, demonstrating effective symmetry control and broader applicability. The proposed method separates symmetry constraints from other design objectives, enabling versatile applications without the need for symmetry-specific data or models. Experimental results confirm the framework's capability to generate symmetric designs under various constraints, showcasing its potential in both visual and material design.
Methodology
The authors develop a symmetric continuous representation framework that embeds planar groups into an affine reflection group. This approach constructs a continuous G-invariant field using high-symmetry coefficients and low-symmetry bases. A unified pipeline for controllable generation is established, optimizing parameters with respect to loss functions that separate symmetry constraints from other design objectives.
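For a group generated purely by reflections, continuity-preserving symmetrization can be done by folding coordinates into a fundamental domain. The sketch below assumes the wallpaper group pmm with mirror lines at integer x and y, a simple case chosen for illustration. It does not reproduce the paper's general construction with high-symmetry coefficients and low-symmetry bases.

```python
def fold(x):
    # Triangle wave with period 2: invariant under x -> -x and x -> x + 2,
    # and continuous everywhere.
    return abs((x + 1.0) % 2.0 - 1.0)

def symmetrize(field):
    # Compose any continuous 2D field with the folding map to get a
    # continuous field invariant under the assumed reflection group.
    return lambda x, y: field(fold(x), fold(y))

def raw_field(x, y):
    # Arbitrary, non-symmetric example field.
    return x + 2.0 * y * y

symmetric_field = symmetrize(raw_field)
reference = symmetric_field(0.3, 0.4)
```

Because the folding map is continuous, the symmetrized field has no seams along the mirror lines, which is the property the summary contrasts with methods that enforce symmetry at the cost of discontinuities.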
Results
The experimental validation shows that the proposed framework effectively generates symmetric designs across various tasks, including pattern design, paper-cutting, and topology design. The results indicate stable symmetry control under physical constraints and demonstrate the framework's ability to produce visually appealing and functionally relevant designs.
Implications
The findings suggest that the proposed symmetrization framework can significantly enhance design processes in visual arts and manufacturing by enabling the generation of symmetric patterns without the need for extensive symmetry-specific training data. This could lead to more efficient design workflows and innovative applications in areas such as architecture, product design, and material engineering.
Mapping the evolution of small reservoirs in Brazil from 1984 to 2025 using deep learning
Computer Vision
- The number of small reservoirs in Brazil increased nearly fourfold from 1984 to 2025.
- The total surface area of these reservoirs expanded significantly, particularly in the Amazon biome.
- The study provides the first country-wide annual dataset on small reservoir evolution over four decades.
- Deep learning techniques were successfully applied to segment small reservoirs from satellite imagery.
Summary
This paper addresses the significant yet often overlooked impact of small reservoirs on Brazil's water systems, particularly in agricultural contexts. The authors highlight the challenges in mapping these small, stream-fed reservoirs due to their size and the difficulty in distinguishing them from natural water bodies. To overcome this, they developed a deep learning model utilizing Landsat satellite imagery from 1984 to 2025 to accurately segment and map small reservoirs across Brazil. The model's application resulted in the creation of annual reservoir maps, revealing a dramatic increase in the number of small reservoirs from 263,913 in 1984 to 996,245 by 2025, with total surface area expanding from 3,510 km² to 8,550 km². This research provides the first comprehensive dataset on the evolution of small reservoirs in Brazil, offering critical insights into their cumulative impacts on freshwater ecosystems and water resource management.
Methodology
The authors trained a convolutional neural network (CNN) model on Landsat satellite imagery to segment small reservoirs. The model was evaluated for performance across different Landsat sensors and compared with existing datasets to ensure accuracy. This approach allowed for the generation of annual maps detailing the count, size, and distribution of small reservoirs from 1984 to 2025.
Results
The study found that the number of detected small reservoirs increased from 263,913 in 1984 to 996,245 by 2025, with the total surface area rising from 3,510 km² to 8,550 km². The Amazon biome experienced the most significant growth in reservoir area, indicating a substantial impact of agricultural expansion on water resources.
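The reported totals can be turned into growth factors and a mean surface area per reservoir with simple arithmetic. The calculation below uses only the four figures quoted above; the variable names are illustrative.

```python
count_1984, count_2025 = 263_913, 996_245
area_1984_km2, area_2025_km2 = 3_510, 8_550

# Growth factors over the study period.
count_growth = count_2025 / count_1984
area_growth = area_2025_km2 / area_1984_km2

# Mean surface area per reservoir in hectares (1 km^2 = 100 ha).
mean_area_1984_ha = area_1984_km2 / count_1984 * 100
mean_area_2025_ha = area_2025_km2 / count_2025 * 100
```

The count grows by a factor of roughly 3.8 and the area by roughly 2.4, so the mean surface area per detected reservoir falls over the period. That pattern is consistent with growth being driven by many additional small water bodies.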
Implications
The findings underscore the importance of monitoring small reservoirs for effective water resource management and environmental conservation. The dataset can inform policymakers and researchers about the cumulative impacts of agricultural practices on freshwater ecosystems, aiding in the development of sustainable water management strategies.
Multi-component Causal Tracing in Large Language Models
NLP
Large Language Models
Interpretability
- Introduces a unified framework for multi-component causal tracing in LLMs.
- Identifies critical subsets of model components affecting performance metrics.
- Employs an efficient algorithm that converts combinatorial problems into continuous optimization.
- Demonstrates superior performance compared to existing baseline methods.
Summary
This paper introduces a unified framework for multi-component causal tracing in large language models (LLMs), addressing the limitations of previous studies that focused on single components. The proposed framework systematically identifies critical subsets of model components, such as attention heads and multi-layer perceptron neurons, that influence specific performance metrics like accuracy and fairness. By employing flexible interventions and a novel algorithm that transforms the combinatorial search problem into a continuous optimization problem, the authors demonstrate the ability to efficiently select components that significantly impact desired metrics. Experimental results show that this method outperforms existing baseline approaches, revealing the importance of considering non-linear interactions among multiple components in LLMs. The findings highlight the need for interpretability in LLMs, particularly in mitigating biases and enhancing model performance through targeted interventions.
Methodology
The authors developed a framework for causal tracing that allows for systematic interventions on multiple components of LLMs. They designed an efficient algorithm that utilizes soft interventions and a metric transformation to address the combinatorial complexity of selecting components, enabling continuous optimization for identifying impactful subsets.
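The move from a combinatorial subset search to continuous optimization can be illustrated with a relaxed mask: each component gets a logit, a sigmoid turns it into a soft inclusion weight, and gradient ascent trades the metric against a sparsity penalty. Everything below is a toy under stated assumptions. The metric, the penalty weight, and the finite-difference gradients are stand-ins, not the paper's algorithm.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def toy_metric(mask):
    # Hypothetical metric with an interaction: components 0 and 1 matter
    # only jointly, while component 2 has a weak effect on its own.
    return mask[0] * mask[1] + 0.1 * mask[2]

def objective(logits, sparsity=0.2):
    mask = [sigmoid(t) for t in logits]
    return toy_metric(mask) - sparsity * sum(mask)

def select_components(n, steps=200, lr=1.0, eps=1e-5):
    logits = [0.0] * n  # start with every component half included
    for _ in range(steps):
        grads = []
        for i in range(n):
            up = logits[:i] + [logits[i] + eps] + logits[i + 1:]
            down = logits[:i] + [logits[i] - eps] + logits[i + 1:]
            grads.append((objective(up) - objective(down)) / (2 * eps))
        logits = [t + lr * g for t, g in zip(logits, grads)]
    # Threshold the soft mask to read off a discrete subset.
    return [i for i, t in enumerate(logits) if sigmoid(t) > 0.5]

selected = select_components(3)
```

The pair (0, 1) is selected even though neither component has any effect alone. This is the kind of non-linear interaction that one-component-at-a-time tracing would miss.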
Results
The experimental results indicate that the proposed multi-component causal tracing method effectively identifies subsets of components that have a significant impact on target performance metrics, outperforming traditional single-component approaches. The findings also reveal non-linear interactions among components, challenging previous assumptions about linearity in causal tracing.
Implications
This research has implications for improving the interpretability and performance of LLMs, particularly in addressing safety risks such as bias and misinformation. The framework can be applied to various tasks, including probing linguistic features, testing for bias, and monitoring factual correctness, facilitating more targeted model enhancements.
The Impact of Temporal Granularity on Socio-Demographic Inference from Household Load Profiles
Time Series
- Coarsening temporal granularity reduces predictive accuracy but reveals stable performance plateaus.
- Handcrafted and ts-fresh features are competitive with CNN-based embeddings, with XGBoost as the top performer.
- Static attributes can be inferred from coarse data, while dynamic attributes require fine granularity.
- The study highlights the privacy-utility trade-off in smart metering data usage.
Summary
This paper investigates how the temporal granularity of household load profiles affects the predictability of socio-demographic attributes, addressing a critical gap in understanding the privacy-utility trade-off in smart metering. Using a dataset of 1,589 households over one year, the authors analyze load profiles at granularities ranging from 15 minutes to 7 days. They introduce an evaluation framework that trains classifiers on year-round data but tests them on arbitrary weeks, ensuring generalization across seasonal and weekly variations. The study reveals that while coarsening granularity reduces predictive accuracy, stable performance plateaus exist between 15 minutes and 1 hour and again between 1 and 7 days, suggesting opportunities for data minimization without significant loss of utility. The research also finds that handcrafted and ts-fresh features are competitive with CNN-based autoencoder embeddings, with XGBoost consistently outperforming other classifiers. Furthermore, feature importance analysis indicates that static attributes like dwelling size can be inferred from coarse data, while dynamic attributes like swimming pool usage require fine-grained signals. Overall, the findings provide insights into balancing privacy and utility in smart metering, emphasizing the need for careful consideration of temporal resolution, feature extraction, and classifier choice.
Methodology
The authors employed a systematic evaluation framework that involved training classifiers on year-round load profile data and testing them on arbitrary weeks. They compared various feature extraction methods (handcrafted features, ts-fresh, CNN-based embeddings) and multiple classifiers to assess their performance across different temporal granularities.
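To make the notion of temporal granularity concrete, the sketch below coarsens a 15-minute load profile by averaging consecutive readings. The function name `coarsen`, the synthetic readings, and the choice of the mean as the aggregation rule are assumptions for illustration; the summary does not state how the authors aggregate.

```python
from statistics import mean

def coarsen(load_kw, factor):
    """Average consecutive readings in groups of `factor`.

    A trailing group with fewer than `factor` readings is dropped.
    """
    n_full = len(load_kw) // factor
    return [mean(load_kw[i * factor:(i + 1) * factor]) for i in range(n_full)]

# One day of hypothetical 15-minute readings (96 values, in kW).
quarter_hourly = [0.2 + 0.1 * (i % 4) for i in range(96)]

hourly = coarsen(quarter_hourly, 4)    # 15 min -> 1 h (24 values)
daily = coarsen(quarter_hourly, 96)    # 15 min -> 1 day (1 value)

print(len(hourly), len(daily))
```

In this toy series every hourly value is the same average, so the within-hour pattern that a classifier might use for a dynamic attribute is no longer visible after coarsening.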
Results
The analysis showed that predictive accuracy decreases with coarser granularity, but performance was stable between 15 minutes and 1 hour and again between 1 and 7 days. XGBoost outperformed other classifiers, and the importance of features varied between static and dynamic socio-demographic attributes.
Implications
The findings underscore the need for a balanced approach in smart metering deployments, where the resolution of load profiles must be carefully managed to protect household privacy while still enabling useful socio-demographic inferences. This has implications for policy-making, utility pricing, and targeted marketing strategies.
Local Guidance, Global Impact: Gaussian-Reshaped Trust Region Unlocks Behavior Transitions
Reinforcement Learning
Robotics
Optimization
- PPO's standard optimization struggles in non-stationary environments due to inefficient local updates.
- GTR introduces a Gaussian-shaped trust region that balances local stability and adaptability for policy transitions.
- The Mixture Gaussian Anchor reduces variance from stale policy references, improving robustness.
- GTR outperforms standard PPO across multiple benchmarks, showcasing its effectiveness in diverse applications.
Summary
The paper addresses the limitations of Proximal Policy Optimization (PPO) in continual and non-stationary environments, where it struggles to adapt due to inefficient local updates. The authors propose a new method called Gaussian Trust Region Policy Optimization (GTR), which reshapes the trust region using a Gaussian kernel. This approach allows for bounded and non-monotonic constraints that provide local stability while enabling larger policy deviations when necessary. Additionally, a Mixture Gaussian Anchor is introduced to adapt to recent policy trajectories, reducing variance from outdated references. GTR is shown to be architecture-agnostic and demonstrates strong performance across various tasks, including games, robotic control, open-world exploration, and language model post-training. The results indicate that geometry-aware trust-region design can significantly enhance reinforcement learning in complex, dynamic environments.
Methodology
The authors developed the Gaussian Trust Region Policy Optimization (GTR) method, which employs a Gaussian kernel to reshape the trust region for policy updates. This method allows for a flexible constraint that stabilizes local updates while permitting larger deviations when advantageous. The Mixture Gaussian Anchor is also introduced to dynamically adjust the reference policy based on recent trajectories, enhancing the learning process.
Results
GTR achieved superior performance compared to standard PPO in various tasks, including games, simulated robotic control, and language model post-training. The method demonstrated effective transitions between behaviors in non-stationary environments, validating the proposed geometry-aware trust-region design.
Implications
The findings suggest that incorporating geometry-aware mechanisms in reinforcement learning can lead to more robust and adaptable algorithms, particularly in dynamic settings. This could have significant implications for applications in robotics, game AI, and other areas requiring continual learning.
QUIVER: Quantum-Informed Views for Enhanced Representations in Large ML Models
Multimodal
Theory
Graph Learning
- QUIVER integrates quantum Fisher information into classical machine learning models to enhance feature representation.
- The method is architecture-agnostic, allowing for flexible integration into various model types, including transformers and graph neural networks.
- Experimental results show significant performance improvements on QM9 and JETCLASS datasets compared to classical baselines.
- The quantum Fisher view provides a complementary modality that captures higher-order correlations not easily accessible through classical methods.
Summary
The paper introduces QUIVER (QUantum-Informed Views for Enhanced Representations), a novel approach that enhances classical machine learning models by incorporating quantum Fisher information. This method enriches classical data-driven features with a quantum perspective, providing a basis-independent summary of higher-order correlations through a variational quantum circuit (VQC). Unlike traditional feature augmentation, the quantum Fisher information matrix captures the intrinsic geometry of the learned quantum state manifold, revealing statistical structures that classical methods may overlook. The authors demonstrate the effectiveness of QUIVER on two distinct benchmark datasets: QM9 for molecular property prediction and JETCLASS for jet flavor classification at the Large Hadron Collider (LHC). The core contribution is the architecture-agnostic nature of QUIVER, allowing it to be integrated into various model architectures through targeted modifications. The results indicate that quantum-geometric features can significantly enhance performance metrics in machine learning tasks, even in the absence of fault-tolerant quantum hardware.
Methodology
The methodology involves mapping classical data into a parameterized quantum state using a variational quantum circuit (VQC) and extracting the quantum Fisher information matrix (QFIM). This quantum representation is then fused with classical data through cross-attention mechanisms in transformer architectures and modulation of graph messages in graph neural networks.
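For readers unfamiliar with the QFIM, the sketch below computes it numerically for a pure state using the standard formula F_ij = 4 Re(⟨∂_i ψ|∂_j ψ⟩ − ⟨∂_i ψ|ψ⟩⟨ψ|∂_j ψ⟩). The one-qubit circuit RZ(phi) RY(theta)|0⟩ and the finite-difference gradients are assumptions chosen to keep the example small; they stand in for the paper's VQC, which the summary does not specify.

```python
import numpy as np

def state(params):
    """Toy one-qubit state RZ(phi) RY(theta) |0>, a stand-in for a VQC output."""
    theta, phi = params
    return np.array([
        np.exp(-0.5j * phi) * np.cos(theta / 2),
        np.exp(0.5j * phi) * np.sin(theta / 2),
    ])

def qfim(state_fn, params, h=1e-5):
    """Pure-state quantum Fisher information matrix via central finite differences."""
    params = np.asarray(params, dtype=float)
    psi = state_fn(params)
    grads = []
    for i in range(len(params)):
        step = np.zeros_like(params)
        step[i] = h
        grads.append((state_fn(params + step) - state_fn(params - step)) / (2 * h))
    n = len(params)
    fisher = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            # np.vdot conjugates its first argument, giving the inner product <a|b>.
            term = np.vdot(grads[i], grads[j]) - np.vdot(grads[i], psi) * np.vdot(psi, grads[j])
            fisher[i, j] = 4 * term.real
    return fisher

F = qfim(state, [np.pi / 3, 0.4])
print(np.round(F, 4))
```

For this particular toy state the matrix works out analytically to diag(1, sin²(theta)), which gives a quick sanity check on the numerical result.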
Results
The implementation of QUIVER led to consistent improvements in performance metrics on both QM9 and JETCLASS datasets, outperforming classical models such as the Particle Transformer and DimeNet++. The results highlight the effectiveness of incorporating quantum-geometric features into machine learning tasks.
Implications
The findings suggest that quantum-informed approaches can significantly enhance the capabilities of classical machine learning models, particularly in fields requiring the analysis of complex, high-dimensional data. This could lead to advancements in scientific analysis, particularly in high-energy physics and molecular chemistry, paving the way for future research that leverages quantum information theory in machine learning.
Regime-Arrival Uncertainty in Generalization Bounds under Distribution Shift
Theory
- Introduces a framework for analyzing generalization bounds under regime-switching environments.
- Quantifies the risk due to regime composition mismatch using a two-state Markov process.
- Establishes a connection between regime mismatch and future deployment risk through theoretical results.
- Empirical validation shows the framework's effectiveness in tracking deployment gaps, but highlights forecasting challenges.
Summary
This paper addresses the limitations of standard generalization bounds in machine learning, which typically assume static training and deployment distributions. It introduces a framework that accounts for regime-switching environments, specifically focusing on the mismatch between training and deployment distributions characterized by calm and crisis states. The author models the environment as a two-state Markov process and quantifies the additional risk introduced by regime composition mismatch. The paper provides an exact decomposition of future risk, linking it to differences in regime composition, and establishes a finite-sample upper bound on deployment risk. The analysis incorporates geometric beta-mixing dependence and introduces an effective sample size that adjusts for regime persistence. Empirical validation on synthetic and real equity index data demonstrates the framework's utility in diagnosing deployment failures, although it highlights the challenge of forecasting future regime compositions. The findings suggest that while the framework can help understand deployment risks, it does not serve as a forecasting tool, pointing to the need for improved methods to predict regime changes.
Methodology
The methodology involves modeling the training and deployment distributions as a mixture of regime-conditional distributions governed by a two-state Markov process. The paper derives theoretical results connecting regime mismatch to future risk, utilizing concepts from domain adaptation and dependent learning theory. It also employs the H∆H-divergence for quantifying regime discrepancy and introduces effective sample size adjustments based on geometric beta-mixing dependence.
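The regime model can be illustrated with a small simulation. The sketch below simulates a two-state (calm/crisis) Markov chain and evaluates the elementary mixture identity behind a regime-composition gap: if regime-conditional risks are fixed, the change in overall risk equals the change in crisis share times the risk difference. All transition probabilities and risk values are hypothetical, and the identity shown is the textbook mixture argument, not the paper's full decomposition or its bound.

```python
import random

def simulate_regimes(p_calm_to_crisis, p_crisis_to_calm, n_steps, seed=0):
    """Simulate a two-state Markov chain; 0 = calm, 1 = crisis."""
    rng = random.Random(seed)
    state, path = 0, []
    for _ in range(n_steps):
        path.append(state)
        flip_prob = p_calm_to_crisis if state == 0 else p_crisis_to_calm
        if rng.random() < flip_prob:
            state = 1 - state
    return path

def crisis_share(path):
    """Fraction of time steps spent in the crisis regime."""
    return sum(path) / len(path)

def mixture_risk(crisis_frac, risk_calm, risk_crisis):
    """Risk of a model when a fraction `crisis_frac` of the data comes from crisis."""
    return (1 - crisis_frac) * risk_calm + crisis_frac * risk_crisis

# Hypothetical transition probabilities.
train_path = simulate_regimes(0.02, 0.20, 5000, seed=1)
stationary_crisis = 0.02 / (0.02 + 0.20)   # long-run crisis share of this chain

# Hypothetical regime compositions and regime-conditional risks.
pi_train, pi_deploy = 0.10, 0.40
risk_calm, risk_crisis = 0.05, 0.30
gap = (mixture_risk(pi_deploy, risk_calm, risk_crisis)
       - mixture_risk(pi_train, risk_calm, risk_crisis))
# Algebraically, gap == (pi_deploy - pi_train) * (risk_crisis - risk_calm).
print(crisis_share(train_path), stationary_crisis, gap)
```

The simulated crisis share fluctuates around the stationary value more than independent draws would, because regimes persist; this is the kind of dependence an effective-sample-size adjustment is meant to account for.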
Results
The paper derives an exact decomposition of future risk related to regime mismatch, establishes a finite-sample upper bound on deployment risk, and provides a minimax lower bound demonstrating the fundamental nature of the mismatch penalty. Empirical results show a strong correlation between the proposed penalty and actual deployment gaps in equity index data.
Implications
The framework can serve as a diagnostic tool for understanding deployment failures in machine learning models operating in dynamic environments. It highlights the importance of considering regime shifts in model training and deployment, and underscores the need for advancements in forecasting future regime compositions.
A Geometric Lens on Physics-Aligned Data Compression
Theory
Efficient ML
- Introduces a local geometric theory for understanding trade-offs in physics-informed data compression.
- Establishes that misalignment of preferred directions in latent space leads to fundamental limits on preserving physical observables and standard fidelity.
- Develops a practical alignment diagnostic to assess the effectiveness of compression strategies.
- Validates the theoretical framework through experiments in multiple scientific fields.
Summary
This paper addresses the challenges of data compression in scientific contexts, particularly when using physics-informed losses to train learned compressors. The authors develop a local geometric theory that elucidates the trade-offs between preserving physical observables and standard reconstruction fidelity at fixed bitrates. They demonstrate that the interaction of latent-space sensitivities, induced by the entropy model, the physical observable, and the distortion metric, governs these trade-offs. The theory introduces a local tangent-space rate-distortion law and a practical alignment diagnostic based on dominant eigenspace overlap. Experimental validation across various scientific domains, including computational fluid dynamics and cosmological simulations, confirms the theory's predictions, highlighting the importance of alignment in achieving effective compression without sacrificing fidelity.
Methodology
The authors employ a theoretical framework based on local geometry to analyze the interactions between latent-space metrics related to rate, physical observables, and signal fidelity. They derive a tangent-space rate-distortion law and propose an alignment measure based on eigenspace overlap. Experiments are conducted across various scientific domains to test the theory's predictions.
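One plausible form of an eigenspace-overlap measure is sketched below: take the top-k eigenvectors of two symmetric sensitivity matrices and measure how much the two subspaces coincide. The normalization (squared Frobenius norm of U^T V divided by k) and the example matrices are assumptions; the summary does not give the authors' exact definition.

```python
import numpy as np

def dominant_eigenspace(matrix, k):
    """Return the k eigenvectors of a symmetric matrix with the largest eigenvalues."""
    _, eigvecs = np.linalg.eigh(matrix)   # eigh returns eigenvalues in ascending order
    return eigvecs[:, -k:]

def eigenspace_overlap(a, b, k):
    """Overlap in [0, 1]: 1 for identical top-k subspaces, 0 for orthogonal ones."""
    u = dominant_eigenspace(a, k)
    v = dominant_eigenspace(b, k)
    return float(np.linalg.norm(u.T @ v, "fro") ** 2 / k)

# Hypothetical latent-space sensitivity matrices (4 latent dimensions).
physics_sensitivity = np.diag([5.0, 4.0, 1.0, 0.5])
distortion_sensitivity = np.diag([0.5, 1.0, 4.0, 5.0])

print(eigenspace_overlap(physics_sensitivity, physics_sensitivity, 2))
print(eigenspace_overlap(physics_sensitivity, distortion_sensitivity, 2))
```

In the second call the two matrices are most sensitive along disjoint latent directions, which is the misaligned case where the summary says the trade-off between physical observables and standard fidelity is sharpest.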
Results
The study finds that at fixed bitrates, improving the preservation of physical observables often results in degraded standard reconstruction fidelity, particularly when the preferred directions for noise suppression are misaligned. The proposed alignment diagnostic correlates well with observed trade-offs in data and physics spaces, confirming the theoretical predictions.
Implications
This work has significant implications for the design of learned compression algorithms in scientific computing, suggesting that careful consideration of latent-space geometry can lead to more effective compression strategies that prioritize relevant physical observables without compromising overall fidelity.
ConTraIRL: Factorized Contrastive Abstractions for Transferable IRL
Reinforcement Learning
Robotics
- Introduces ConTraIRL, a framework for compositional reward transfer in IRL.
- Utilizes a dual-encoder architecture to factorize dynamics and goals into separate latent representations.
- Employs a dual contrastive objective to enhance the learning of invariant features.
- Demonstrates improved performance in few-shot transfer scenarios on continuous control tasks.
Summary
The paper presents ConTraIRL, a novel framework for Inverse Reinforcement Learning (IRL) that addresses the challenge of reward transfer in environments with unseen combinations of dynamics and goals. Traditional IRL methods struggle with compositional generalization, as they often model rewards as a single function of state, leading to unreliable performance when faced with new dynamics-goal pairings. ConTraIRL overcomes this limitation by factorizing the representation of dynamics and goals into separate latent spaces using a dual-encoder architecture. This architecture employs a dual contrastive objective to ensure that the dynamics encoder learns goal-invariant structures while the goal encoder captures dynamics-invariant features. The framework is designed to facilitate reward inference in new contexts by leveraging few-shot supervision from partial expert states. Experiments conducted on continuous control benchmarks demonstrate that ConTraIRL significantly improves sample efficiency and reward recovery compared to existing transfer IRL methods, showcasing its effectiveness in handling unseen dynamics-goal combinations.
Methodology
ConTraIRL employs a dual-encoder architecture that maps observations into two distinct latent spaces: one for dynamics and another for goals. The training process involves a dual contrastive objective that encourages the dynamics encoder to learn structures invariant to goals and vice versa. Temporal alignment is also incorporated to ensure that the representations reflect comparable progress within behaviors. The framework uses few-shot supervision by leveraging partial expert states from target environments to anchor reward recovery.
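A dual contrastive objective of this kind can be sketched with two InfoNCE terms, one per latent space. The sketch below is a generic InfoNCE implementation, not ConTraIRL's actual loss: the pairing scheme (dynamics embeddings paired across different goals, goal embeddings paired across different dynamics), the temperature, and the unweighted sum are all assumptions.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss: row i of `positives` is the positive for row i of `anchors`;
    every other row in the batch acts as a negative."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

def dual_contrastive_loss(dyn_a, dyn_b, goal_a, goal_b):
    """Sum of two InfoNCE terms.

    dyn_a[i], dyn_b[i]: dynamics embeddings of two trajectories that share
    dynamics but differ in goal (encourages goal-invariant dynamics codes).
    goal_a[i], goal_b[i]: goal embeddings of two trajectories that share a
    goal but differ in dynamics (encourages dynamics-invariant goal codes).
    """
    return info_nce(dyn_a, dyn_b) + info_nce(goal_a, goal_b)

# Hypothetical batch of 4 embeddings per view.
eye = np.eye(4)
print(info_nce(eye, eye))                        # matched pairs: low loss
print(info_nce(eye, np.roll(eye, 1, axis=0)))    # mismatched pairs: high loss
```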
Results
The experiments on MuJoCo benchmarks reveal that ConTraIRL consistently outperforms baseline methods in terms of reward recovery and transfer robustness, particularly in scenarios involving unseen dynamics-goal pairings. The results indicate significant improvements in sample efficiency and the ability to generalize across different contexts.
Implications
ConTraIRL has the potential to enhance the reliability of IRL in real-world applications where agents must adapt to new environments with varying dynamics and goals. This framework could be particularly beneficial in robotics and autonomous systems, where efficient learning from limited demonstrations is crucial.
A Practical Upper Bound on Selection Bias Effects in Medical Prediction Models
Theory
- The paper introduces a practical upper bound for assessing selection bias in medical prediction models.
- It emphasizes the importance of understanding model generalizability in high-stakes healthcare applications.
- The proposed method requires only partial observability of the selection mechanism and target distribution.
- Experiments demonstrate the method's validity using synthetic and real-world datasets.
Summary
This paper addresses the critical issue of selection bias in machine learning models, particularly in healthcare settings where biased data can lead to harmful consequences. The authors propose a novel method to estimate an upper bound on the worst-case performance of prediction models when trained on biased data. Unlike existing approaches that require unrealistic access to the target distribution or complete knowledge of the selection mechanism, this method operates under the more practical assumption of partial observability. The authors validate their approach through experiments on synthetic data, semi-synthetic data from the All of Us Research Program, and real-world data from the MIMIC-IV database. The proposed method allows practitioners to assess model generalizability and make informed deployment decisions, ultimately contributing to safer and more effective healthcare applications.
Methodology
The authors develop a method that estimates the upper bound on expected model loss in the target population by utilizing a moment-matching heuristic over aggregate statistics. This approach allows for the identification of selection variables and the calculation of the upper bound without needing full access to the target distribution or complete knowledge of the selection mechanism.
Results
The experiments show that the proposed method can estimate an upper bound on expected model loss under selection bias, giving practitioners a tool for evaluating the generalizability of their models before deployment.
Implications
This work has significant implications for the deployment of machine learning models in healthcare, as it provides a framework for assessing and mitigating the risks associated with selection bias. By enabling better-informed decisions, the method can help improve patient outcomes and reduce the potential for discrimination against underrepresented groups in medical predictions.
Richer Representations for Neural Algorithmic Reasoning via Auxiliary Reconstruction
Graph Learning
Theory
Optimization
- Introduces an auxiliary reconstruction module to enhance encoder representation learning.
- Proposes a more expressive encoder architecture tailored for neural algorithmic reasoning tasks.
- Implements a feature-level masking strategy to capture intra-state feature dependencies.
- Demonstrates improved performance on the CLRS benchmark across diverse algorithmic tasks.
Summary
This paper addresses the limitations of existing neural algorithmic reasoning (NAR) frameworks, particularly focusing on the encoder component within the encoder-processor-decoder architecture. While previous research has concentrated on enhancing the processor, the authors argue that the encoder's role in representation learning has been overlooked. They propose an auxiliary reconstruction module that encourages the encoder to recover input states from their encoded representations, thereby retaining critical information. This approach not only improves the encoder's performance but also enhances the overall reasoning capabilities of the model. The authors introduce a more expressive encoder architecture that incorporates a graph neural network and gated residual connections, allowing it to better capture the structural properties of algorithmic states. Additionally, they implement a feature-level masking strategy to further enrich the learned representations. The effectiveness of their method is evaluated on the CLRS benchmark, which presents diverse algorithmic tasks and challenges related to generalization across varying input sizes. The results demonstrate significant improvements in performance, indicating that richer representations lead to better algorithmic reasoning.
Methodology
The authors augment the standard encoder-processor-decoder framework by adding a reconstruction module and an auxiliary reconstruction objective. They design a new encoder architecture that utilizes graph neural networks and gated residual connections, along with a feature-level masking strategy to enhance representation learning. Experiments are conducted on the CLRS benchmark to evaluate the proposed methods.
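The feature-level masking idea can be sketched as follows: hide individual feature entries of each state rather than whole nodes, then score a reconstruction on the hidden entries. Zero-filling the masked entries, the masking probability, and restricting the loss to masked entries are assumptions; the summary does not describe the authors' exact masking or loss.

```python
import numpy as np

def mask_features(states, mask_prob, rng):
    """Zero out individual feature entries (not whole nodes) with probability `mask_prob`."""
    mask = rng.random(states.shape) < mask_prob
    return np.where(mask, 0.0, states), mask

def masked_reconstruction_loss(reconstructed, original, mask):
    """Mean squared error computed over the masked entries only."""
    if not mask.any():
        return 0.0
    return float(np.mean((reconstructed[mask] - original[mask]) ** 2))

rng = np.random.default_rng(0)
states = rng.normal(size=(5, 8))        # hypothetical: 5 nodes, 8 features each
masked_states, mask = mask_features(states, 0.3, rng)

# A perfect reconstruction scores 0; returning the masked input unchanged does not.
print(masked_reconstruction_loss(states, states, mask))
print(masked_reconstruction_loss(masked_states, states, mask))
```

Because only some features of a node are hidden, a reconstruction module has to use the node's remaining features to fill them in, which is one way to encourage the intra-state dependencies mentioned above.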
Results
The proposed methods lead to improved performance on the CLRS benchmark, showcasing the effectiveness of richer representations in enhancing the reasoning capabilities of neural networks. The results indicate that the new encoder architecture and auxiliary tasks significantly contribute to better generalization and performance across diverse algorithmic tasks.
Implications
The findings suggest that enhancing encoder representations can lead to more effective neural algorithmic reasoning, which has potential applications in algorithm-guided planning, neural program synthesis, and reasoning over structured inputs like graphs and sequences.
Reinforcement Learning with Pairwise Preferences in Long-Term Decision Problems
Reinforcement Learning
Theory
Large Language Models
- Introduces the Markov decision contest framework for RL with pairwise preferences.
- Proves that stationary Markov policies are optimal among all history-dependent policies.
- Establishes that a Markov decision contest can be solved in polynomial time (in P).
- Presents Hedged Policy Iteration (HPI) as an efficient approximate solution method.
Summary
This paper addresses the limitations of traditional reinforcement learning (RL) methods that rely on scalar reward functions by introducing a framework for RL with pairwise preferences. The authors propose the Markov decision contest, a problem model in which goals are specified through pairwise preferences, which are often easier to define than scalar rewards. The paper proves that stationary Markov policies are optimal among all history-dependent policies and that a Markov decision contest can be solved in polynomial time. The authors also present an approximate solution method, Hedged Policy Iteration (HPI), which converges to an optimal policy at a sublinear rate of 1/√K, where K is the number of iterations. HPI is validated through experiments on thirteen high-dimensional decision problems with long time horizons, where it is significantly more learning-efficient than existing methods. The findings suggest that reinforcement learning with pairwise preferences is more tractable than previously believed, especially for applications involving large language models and other long-term decision-making scenarios.
Methodology
The authors develop a new problem model called the Markov decision contest and prove theoretical properties regarding the optimality of stationary Markov policies. They also design an approximate solution method, Hedged Policy Iteration (HPI), and validate its performance through empirical experiments on various decision problems.
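The summary does not describe HPI's update rule. As loosely related background, the sketch below runs the classical Hedge (multiplicative weights) update in self-play on a single-state preference problem; Hedge-style no-regret updates typically give 1/√K rates. Treating HPI as related to Hedge is an assumption based only on its name, and the preference matrix, step size, and single-state setting are hypothetical.

```python
import math

def hedge_self_play(pref, n_iters=2000, eta=0.05):
    """Run Hedge (multiplicative weights) in self-play on a preference matrix.

    pref[i][j] is the probability that option i is preferred to option j.
    Returns the final and the time-averaged mixed strategies.
    """
    n = len(pref)
    p = [1.0 / n] * n
    avg = [0.0] * n
    for _ in range(n_iters):
        # Expected preference margin of each option against an opponent drawn from p.
        payoff = [sum((pref[i][j] - 0.5) * p[j] for j in range(n)) for i in range(n)]
        weights = [p[i] * math.exp(eta * payoff[i]) for i in range(n)]
        total = sum(weights)
        p = [w / total for w in weights]
        avg = [a + q / n_iters for a, q in zip(avg, p)]
    return p, avg

# Hypothetical preferences in which option 0 is preferred to every other option.
pref = [
    [0.5, 0.7, 0.6],
    [0.3, 0.5, 0.8],
    [0.4, 0.2, 0.5],
]
final, average = hedge_self_play(pref)
print([round(x, 3) for x in final])
```

With these preferences option 0 beats every alternative head to head, so the strategy concentrates on it; a full Markov decision contest would additionally involve states and transitions, which this sketch omits.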
Results
The paper shows that stationary Markov policies are optimal among all history-dependent policies and that the exact solution to a Markov decision contest can be computed in polynomial time. The HPI algorithm converges to an optimal policy at a sublinear rate, demonstrating improved learning efficiency in high-dimensional decision problems with long time horizons.
Implications
The findings have significant implications for the development of reinforcement learning algorithms that can effectively utilize pairwise preferences, particularly in applications such as fine-tuning large language models and other complex decision-making tasks that require long-term planning.
MedGym: A Unified Continuous-Time Benchmark for Dynamic Medical Treatment Reinforcement Learning
Reinforcement Learning
- MedGym models dynamic medical treatment recommendations in a continuous-time framework.
- It utilizes Physics-Informed Neural Networks to simulate patient evolution based on clinical data.
- The benchmark supports both offline and online reinforcement learning evaluations.
- MedGym allows for direct comparisons between discrete-time and continuous-time RL methods.
Summary
The paper introduces MedGym, a novel benchmark environment designed for dynamic medical treatment recommendations using reinforcement learning (RL). Traditional RL methods often rely on discrete-time models that fail to capture the complexities of real-world medical scenarios, where patient physiology evolves continuously and interventions occur at irregular intervals. MedGym addresses these challenges by modeling patient evolution in a continuous-time framework and utilizing Physics-Informed Neural Networks (PINNs) to create a configurable RL benchmark from clinical data. This benchmark allows for both offline and online RL evaluations, facilitating direct comparisons between discrete-time and continuous-time methods. MedGym emphasizes critical clinical perspectives such as personalization, safety during treatment trajectories, and the performance gap between offline learning and online deployment. By providing a standardized and realistic evaluation environment, MedGym aims to enhance the assessment of medical RL methods and their applicability in clinical settings.
Methodology
The methodology involves constructing a continuous-time simulation pipeline using clinical data, where patient state transitions are modeled through Physics-Informed Neural Networks (PINNs). This approach allows for irregularly timed and individualized treatment evaluations, enabling a more realistic representation of medical decision-making processes.
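To show what irregularly timed transitions look like in code, the sketch below advances a toy patient state between decision times of varying spacing. The one-compartment model dx/dt = -k x + u, its closed-form solution, and all numbers are assumptions for illustration; MedGym itself learns patient dynamics with PINNs from clinical data, which this sketch does not attempt.

```python
import math

def advance(concentration, dose_rate, dt, elimination_rate=0.3):
    """Exact solution of dx/dt = -k x + u over an interval of length dt with constant u."""
    decay = math.exp(-elimination_rate * dt)
    return concentration * decay + (dose_rate / elimination_rate) * (1.0 - decay)

# Irregularly spaced decision times (hours), each with a hypothetical dose rate.
events = [(0.0, 1.0), (0.5, 0.0), (3.0, 2.0), (3.25, 0.0)]
end_time = 8.0

x, trajectory = 0.0, []
for idx, (t, u) in enumerate(events):
    # Hold the chosen dose rate until the next decision time (or the end of the episode).
    t_next = events[idx + 1][0] if idx + 1 < len(events) else end_time
    x = advance(x, u, t_next - t)
    trajectory.append((t_next, x))

print(trajectory)
```

Because the transition takes the elapsed time as an input, stepping 1 hour and then 2 hours gives the same state as stepping 3 hours at once, a property that a fixed-step discrete-time model does not provide by construction.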
Results
MedGym provides a comprehensive evaluation framework that allows researchers to assess the effectiveness of RL methods in dynamic treatment scenarios. It highlights the differences in performance between discrete-time and continuous-time approaches and underscores the importance of individualized treatment strategies in healthcare.
Implications
The development of MedGym has significant implications for the field of medical reinforcement learning, as it offers a more realistic and informative benchmark for evaluating treatment policies. This can lead to improved patient outcomes by ensuring that RL methods are better aligned with the complexities of real-world medical practices.
Spectral-Progressive Thought Flow for Lightweight Multimodal Reasoning
Multimodal
Efficient ML
Computer Vision
- SpecFlow introduces a lightweight framework for multimodal spatial reasoning that reduces computational overhead.
- The framework utilizes a fixed-size discrete cosine space to represent intermediate visual thoughts, enhancing efficiency.
- Classifier-free guidance aligns visual updates with textual intent, allowing for stable memory usage during reasoning.
- Empirical results show up to a 2.1x reduction in computation and memory costs compared to traditional methods.
Summary
The paper introduces Spectral-Progressive Thought Flow (SpecFlow), a novel framework designed to enhance lightweight multimodal spatial reasoning. Traditional approaches often face challenges due to the extensive computational and memory demands associated with long chains of intermediate visual and textual thoughts. SpecFlow addresses these issues by representing intermediate visual thoughts in a fixed-size discrete cosine space, which allows for efficient energy compaction. This method preserves essential global layouts and relational structures while introducing high-frequency details only when necessary. The framework employs classifier-free guidance to align visual state evolution with linguistic intent, enabling updates to the visual workspace based solely on the current visual state and accumulated textual trace. This results in a bounded visual workspace that supports long-horizon inference with stable latency and memory usage, independent of reasoning depth. Empirical evaluations demonstrate that SpecFlow achieves competitive or superior reasoning performance while significantly reducing computation and key-value cache costs by up to 2.1 times.
Methodology
The methodology involves representing intermediate visual thoughts in a discrete cosine space to achieve energy compaction. Flow matching is employed to learn a velocity field for generating visual thoughts, with updates conditioned on the current textual thought and previous visual state. The framework maintains a fixed-size visual workspace, allowing for efficient multi-hop reasoning without appending visual tokens to the context.
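Energy compaction in a discrete cosine space can be illustrated as follows: transform an image with an orthonormal 2D DCT-II and keep only a fixed block of low-frequency coefficients. The workspace size, the synthetic image, and the function names are assumptions; the sketch covers only the fixed-size representation, not SpecFlow's flow matching or guidance.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n matrix (rows index frequencies)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    mat[0, :] /= np.sqrt(2.0)
    return mat

def compress_to_workspace(image, keep):
    """Keep only the lowest keep x keep DCT coefficients of a square image."""
    c = dct_matrix(image.shape[0])
    coeffs = c @ image @ c.T
    return coeffs[:keep, :keep]

def reconstruct(workspace, size):
    """Zero-pad the stored coefficients and invert the transform."""
    c = dct_matrix(size)
    coeffs = np.zeros((size, size))
    keep = workspace.shape[0]
    coeffs[:keep, :keep] = workspace
    return c.T @ coeffs @ c

size = 32
rows, cols = np.meshgrid(np.arange(size), np.arange(size), indexing="ij")
image = (rows + cols).astype(float)     # smooth hypothetical "visual thought"

workspace = compress_to_workspace(image, keep=8)    # fixed 8 x 8 workspace
retained = np.sum(workspace ** 2) / np.sum(image ** 2)
print(workspace.shape, retained)
```

The workspace stays 8 x 8 regardless of how many reasoning steps update it, which is the property that keeps memory bounded; smooth layouts lose very little energy, while fine detail would require keeping higher-frequency coefficients.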
Results
SpecFlow demonstrates competitive or superior reasoning performance in multimodal spatial tasks while reducing computation and key-value cache costs by up to a factor of 2.1 compared to existing methods.
Implications
SpecFlow could be applied in domains that require efficient multimodal reasoning, such as robotics, autonomous systems, and interactive AI applications, where memory and computational efficiency are critical.
An Exploratory Study into using Machine-Learning for Fast Step-by-step Emulation of Numerical Mechanical Thrombectomy Simulations for Ischemic Stroke
Efficient ML
Robotics
Theory
- Machine learning can significantly speed up the emulation of mechanical thrombectomy simulations.
- Two out of three tested models demonstrated accurate predictions for individual simulation steps.
- Data augmentation techniques enhanced model performance.
- Models struggled with stability in complex geometries over longer simulation durations.
Summary
This thesis investigates the potential of machine learning to emulate numerical mechanical thrombectomy simulations for treating ischemic stroke, aiming to enhance decision-making under time constraints. The study focuses on developing surrogate models that can predict the outcomes of mechanical thrombectomy simulations more rapidly than traditional numerical methods. Three machine learning models were trained on two different simulation datasets, which involved a simplified aspiration procedure with varying geometric complexities. The results indicate that two of the models successfully predicted individual simulation steps and achieved significant speed improvements, particularly when specific data augmentations were applied. However, the models exhibited instability when tasked with simulating complex geometries over extended periods. This research lays the groundwork for future advancements in creating stable and scalable machine learning methods for realistic numerical simulations in mechanical thrombectomy, potentially improving treatment outcomes for ischemic stroke patients.
Methodology
The study employed three different machine learning models trained on two simulation datasets related to mechanical thrombectomy. The models were evaluated based on their ability to predict simulation outcomes step-by-step, with a focus on performance improvements through data augmentation and generalization to unseen geometries.
Results
The findings revealed that two of the machine learning models could accurately predict individual steps of the simulations and provided substantial speedups compared to traditional methods. However, the models faced challenges in maintaining stability when simulating complex geometries over longer time frames.
Implications
This research suggests that machine learning can be a valuable tool in medical simulations, potentially leading to faster and more informed decision-making in emergency medical scenarios like ischemic stroke treatment. Future work could focus on improving model stability and scalability to enhance clinical applications.
When Hard Negatives Hurt: Bridging the Generative-Discriminative Gap in Hard Negative Synthesis for Retrieval
NLP
Large Language Models
Generative Models
- Hard negative mining has intrinsic limitations that affect retrieval performance.
- Naive incorporation of LLM-generated negatives can degrade retrieval outcomes.
- CausalNeg effectively bridges the generative-discriminative gap through targeted synthesis.
- The proposed methodology includes CoT-guided perturbation and entropy maximization.
Summary
This paper addresses the limitations of hard negative mining in training retrieval models, particularly in the context of large language models (LLMs). While hard negative mining has been effective, it suffers from issues such as lack of diversity, uncontrolled intentionality, and false negative risks. The authors propose a novel approach called CausalNeg, which bridges the generative-discriminative gap by synthesizing hard negatives that are both targeted and interpretable. CausalNeg consists of two main components: (1) CoT-guided counterfactual perturbation, which constructs negatives by explicitly violating information requirements of queries, and (2) query-view entropy maximization, which ensures that generated negatives are dispersed across the similarity spectrum to minimize shortcut exploitation. The study reveals that naive integration of generated negatives can degrade performance due to a mismatch between generative and discriminative objectives. Through experiments on four retrieval benchmarks, CausalNeg demonstrates superior performance compared to traditional mining and naive generation methods, validating the effectiveness of causally grounded synthesis and entropy-regularized training.
Methodology
The authors developed CausalNeg, which includes two key modules: CoT-guided counterfactual perturbation for constructing negatives by strategically violating query requirements, and query-view entropy maximization to disperse generated negatives and minimize shortcut exploitation during training.
Results
CausalNeg outperformed both mining-only and naive generation baselines across four retrieval benchmarks, demonstrating the effectiveness of the proposed methods in addressing the generative-discriminative gap.
Implications
The findings suggest that integrating causally grounded synthesis and entropy-regularized training can enhance the training of retrieval models, potentially leading to more effective information retrieval systems in various applications.
E4GEN: Event-level Explainable Extreme-Enhanced Time-series Generation
Generative Models
Time Series
- E4GEN shifts the focus of extreme-aware time-series generation from sample-level to event-level, capturing the temporal dynamics of extreme events.
- The framework consists of three core components: E-Activator, E-Predictor, and E-Control, each addressing different aspects of extreme-event generation.
- E4GEN outperforms state-of-the-art models across multiple dimensions, including overall fidelity and extreme-event fidelity.
- The methodology includes a novel Data-Conditioned Training and Noise-Initiated Sampling mechanism to handle unavailable training labels.
Summary
The paper introduces E4GEN, an innovative framework designed for generating time-series data with a focus on extreme events. Traditional methods often prioritize overall distribution fidelity but neglect the accurate representation of extreme events, which are crucial for understanding high-impact phenomena. E4GEN addresses this gap by providing a systematic approach to control extreme-event generation through three main components: E-Activator, E-Predictor, and E-Control. E-Activator learns when to activate extreme-control signals during the generation process without disrupting general temporal structures. E-Predictor determines what control signals to enforce using a self-driven semantic prediction mechanism, allowing each sample to derive its own control signal based on latent extreme-event information. E-Control specifies how to implement these control signals through a trainable Extreme Control Network that integrates semantic signals into the denoising process. The authors evaluate E4GEN on six datasets using 17 metrics, demonstrating its superiority over existing models in terms of overall fidelity, extreme-event fidelity, and downstream utility. This work shifts the paradigm of extreme-aware time-series generation from a sample-level to an event-level perspective, enhancing the understanding and controllability of extreme events in time-series data.
Methodology
E4GEN employs a diffusion framework that incorporates three main components: E-Activator for determining when to activate extreme-control signals, E-Predictor for defining what control signals to enforce through self-driven semantic prediction, and E-Control for specifying how to integrate these signals into the generation process using a trainable Extreme Control Network.
Results
The evaluation of E4GEN on six datasets against nine baseline models using 17 metrics shows that it achieves superior performance in overall generation fidelity, extreme-event fidelity, and downstream utility, indicating its effectiveness in generating realistic and contextually relevant time-series data.
Implications
E4GEN has significant implications for various applications, including simulations, data augmentation, and hypothesis testing, where accurate modeling of extreme events is essential for understanding complex temporal dynamics in real-world data.
Balancing Learning Rates Across Layers: Exact Two-Step Dynamics and Optimal Scaling in Linear Neural Networks
Theory
Optimization
- Exact closed-form expressions for gradients and test loss after one and two gradient descent steps.
- Unequal learning rates are optimal in the initial training phase, transitioning to equal rates in later steps.
- Identification of critical learning rate thresholds that qualitatively change gradient dynamics.
- Theoretical framework applicable to two-layer and three-layer linear networks under random orthogonal initialization.
Summary
This paper investigates the optimal selection of learning rates in two-layer and three-layer linear neural networks tasked with learning linear target functions. The authors derive exact closed-form expressions for gradients and test loss after one and two steps of gradient descent, allowing for a detailed analysis of early training dynamics. They demonstrate that learning rates should be scaled differently in the initial steps of training, with unequal learning rates being more effective initially, transitioning to equal learning rates as training progresses. This finding is supported by numerical experiments that validate the theoretical framework. The study emphasizes the significance of balancing layer-wise learning rates during early training to enhance convergence and generalization. The authors also identify critical thresholds for learning rates that significantly affect gradient dynamics and test loss behavior, providing insights that could inform the design of learning rate schedules in more complex neural architectures.
Methodology
The authors develop a framework for analyzing layer-wise learning rates in linear neural networks, utilizing a gradient decomposition approach that separates dominant components from residual terms. They derive closed-form expressions for test loss and gradients, allowing for a rigorous analysis of training dynamics across different layers.
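The setting described above can be sketched in a few lines of NumPy. This is only an illustration of gradient descent with separate learning rates per layer on a two-layer linear network with random orthogonal initialization; it does not reproduce the paper's closed-form expressions. The names `eta1`, `eta2`, `width`, and the helper functions are assumptions made for this example.

```python
import numpy as np

def random_orthogonal(rows, cols, rng):
    """Matrix with orthonormal columns (rows >= cols) or orthonormal rows (rows < cols)."""
    a = rng.standard_normal((max(rows, cols), min(rows, cols)))
    q, _ = np.linalg.qr(a)  # reduced QR: q has orthonormal columns
    return q if rows >= cols else q.T

def train_two_layer(X, y, eta1, eta2, steps=2, width=16, seed=0):
    """Gradient descent on f(x) = W2 W1 x with layer-wise learning rates.

    eta1 scales the update of the first layer W1, eta2 that of the second
    layer W2. Returns the loss before each step plus the final loss.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = random_orthogonal(width, d, rng)   # shape (width, d)
    W2 = random_orthogonal(1, width, rng)   # shape (1, width)
    losses = []
    for _ in range(steps):
        hidden = X @ W1.T                    # (n, width)
        resid = hidden @ W2.T - y[:, None]   # (n, 1)
        losses.append(0.5 * float(np.mean(resid ** 2)))
        # Gradients of the loss 0.5 * mean(resid^2), both taken at the old weights.
        grad_W2 = resid.T @ hidden / n       # (1, width)
        grad_W1 = W2.T @ (resid.T @ X) / n   # (width, d)
        W1 = W1 - eta1 * grad_W1
        W2 = W2 - eta2 * grad_W2
    resid = X @ W1.T @ W2.T - y[:, None]
    losses.append(0.5 * float(np.mean(resid ** 2)))
    return losses
```

Calling `train_two_layer` with `eta1 != eta2` for the first step or two, then with equal rates, is one way to probe the qualitative claim numerically on synthetic data.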
Results
The study finds that symmetric learning rates are suboptimal after the first update in two-layer networks, while they become locally optimal after two updates in sufficiently wide networks. In three-layer networks, the analysis captures complex cross-layer interactions and identifies distinct scaling regimes for learning rates. The results indicate critical thresholds for learning rates that influence gradient dynamics and test loss behavior.
Implications
The findings provide a theoretical basis for understanding how layer-wise learning rates influence early generalization in neural networks. This can inform the design of more effective learning rate schedules in both linear and more complex neural network architectures, potentially improving convergence rates and generalization performance.
Large-scale Uncertainty Quantification for Latent Variable Models Using Subsampling Markov Chain Monte Carlo
Theory
Optimization
Generative Models
- Developed a statistical scaling limit theory for SGLD–Gibbs in latent variable models.
- Showed that global parameters converge to a diffusion-type limit while latent variables converge to a jump process.
- Provided explicit guidance for hyperparameter tuning to ensure meaningful uncertainty quantification.
- Demonstrated improved performance of SGLD–Gibbs over stochastic variational inference in empirical tests.
Summary
This paper addresses the challenge of tuning hyperparameters for Stochastic Gradient Langevin Dynamics combined with Gibbs updates (SGLD–Gibbs) when it is applied to latent variable models (LVMs). The authors develop a statistical scaling limit theory for SGLD–Gibbs, providing a joint asymptotic limit for global parameters and latent variables under space-time rescaling. They demonstrate that global parameters converge to a diffusion-type limit, while latent variables converge to a jump process, highlighting the contribution of latent-variable randomness to the global parameters' stationary distribution. The findings lead to explicit guidance for hyperparameter tuning, ensuring meaningful uncertainty quantification. Empirical results indicate that SGLD–Gibbs with the proposed tuning guidance outperforms stochastic variational inference in terms of parameter estimation, uncertainty quantification, and predictive performance in applications such as mixture modeling and topic modeling.
Methodology
The authors employed a joint asymptotic analysis of global parameters and latent variables under appropriate space-time rescaling to derive a scaling limit theory for SGLD–Gibbs. They analyzed the convergence properties of the algorithm and used numerical experiments to validate their theoretical findings.
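To make the algorithm being analyzed concrete, here is a minimal sketch of an SGLD–Gibbs loop for a toy two-component Gaussian mixture with unit variances and equal weights. The component means play the role of global parameters and the cluster assignments are the latent variables. The model, the Gaussian prior, and the default values of `step` and `batch` are assumptions chosen for illustration; they are not the paper's tuning guidance.

```python
import numpy as np

def sgld_gibbs(x, n_iters=2000, batch=50, step=1e-3, prior_var=100.0, seed=0):
    """Toy SGLD-Gibbs sampler for the two means of a unit-variance mixture."""
    rng = np.random.default_rng(seed)
    N = x.shape[0]
    theta = np.array([-1.0, 1.0])        # global parameters (component means)
    z = rng.integers(0, 2, size=N)       # latent assignments
    samples = np.empty((n_iters, 2))
    for t in range(n_iters):
        idx = rng.choice(N, size=batch, replace=False)
        xb = x[idx]
        # Gibbs update: resample the minibatch assignments given theta.
        log_lik = -0.5 * (xb[:, None] - theta[None, :]) ** 2   # (batch, 2)
        diff = np.clip(log_lik[:, 0] - log_lik[:, 1], -50.0, 50.0)
        p_one = 1.0 / (1.0 + np.exp(diff))                     # P(z = 1 | x, theta)
        z[idx] = (rng.random(batch) < p_one).astype(int)
        # SGLD update: minibatch gradient rescaled by N / batch, plus injected noise.
        grad = -theta / prior_var                              # Gaussian prior term
        for k in range(2):
            mask = z[idx] == k
            grad[k] += (N / batch) * np.sum(xb[mask] - theta[k])
        theta = theta + 0.5 * step * grad + np.sqrt(step) * rng.standard_normal(2)
        samples[t] = theta
    return samples
```

The spread of the returned samples depends on both `step` and `batch`, which is why the choice of these hyperparameters matters for uncertainty quantification.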
Results
The study found that SGLD–Gibbs with the proposed hyperparameter tuning leads to better parameter estimates and more reliable uncertainty quantification compared to stochastic variational inference. The joint jump-diffusion structure revealed how latent-variable randomness affects the global parameters' stationary distribution.
Implications
The findings can enhance the application of SGLD–Gibbs in large-scale Bayesian inference for latent variable models, improving the reliability of uncertainty quantification in various fields such as machine learning, statistics, and data science.
EEG-FuseFormer: A Transformer-Driven Feature Fusion Framework for Seizure Onset Prediction
Time Series
- EEG-FuseFormer integrates CNN-LSTM and ResNet-18 for enhanced seizure prediction.
- Achieves a mean recall of 98.85%, surpassing many existing methods.
- Demonstrates improved performance in cross-patient scenarios with target adaptation.
- Evaluates computational complexity across diverse hardware platforms.
Summary
The paper presents EEG-FuseFormer, a novel transformer-based feature fusion framework aimed at improving seizure onset prediction in epilepsy patients. Given the unpredictability of seizures and the complexity of EEG signals, the authors propose a model that integrates features from both CNN-LSTM and ResNet-18 architectures. The CNN-LSTM captures spatial and temporal features directly from raw EEG signals, while ResNet-18 extracts features from the Short-Time Fourier Transform (STFT) representation of these signals. The fusion of these features is conducted through a transformer encoder, followed by predictions made via fully connected dense layers. The model was validated using the CHB-MIT dataset, achieving a mean recall of 98.85%, outperforming many existing state-of-the-art methods. The study also emphasizes the model's ability to generalize across different patients, demonstrating improved performance metrics such as recall, precision, and F1-score when employing target adaptation techniques in cross-patient validation scenarios. Additionally, the paper assesses the computational complexity of the model across various hardware platforms, highlighting the trade-off between performance and complexity.
Methodology
The methodology involves using a transformer-driven feature fusion model that combines features extracted from CNN-LSTM and ResNet-18 networks. The CNN-LSTM captures spatial and temporal dynamics from raw EEG signals, while ResNet-18 processes STFT representations. A transformer encoder is employed for feature fusion, followed by dense layers for final predictions. The model's performance is validated using the CHB-MIT dataset, focusing on cross-patient testing and adaptation techniques.
Results
The EEG-FuseFormer model achieved a mean recall of 98.85% on the CHB-MIT dataset, demonstrating superior performance compared to existing methods. The model also showed significant improvements in recall, precision, and F1-score metrics when using target adaptation in cross-patient validation, indicating its robustness and generalizability.
Implications
The findings suggest that EEG-FuseFormer could be a valuable tool for real-time seizure prediction, potentially improving the quality of life for epilepsy patients by providing timely alerts. The model's ability to generalize across patients may facilitate broader applications in clinical settings, enhancing personalized treatment strategies.
When Softmax Fails at the Top: Extreme Value Corrections for InfoNCE
Computer Vision
Theory
- The paper reveals the hidden statistical assumptions in the standard InfoNCE softmax formulation.
- It highlights the misalignment of these assumptions with normalized embedding spaces used in modern contrastive learning.
- WEINCE is introduced as a practical modification of InfoNCE that improves performance by addressing the treatment of hard negatives.
- The proposed method shows consistent improvements across multiple vision benchmarks.
Summary
The paper addresses the limitations of the InfoNCE loss function commonly used in contrastive learning, particularly its reliance on the softmax function, which imposes a statistical assumption about the selection of top-scoring examples. The authors utilize extreme value theory to demonstrate that this assumption is often misaligned with the normalized embedding settings prevalent in modern contrastive learning frameworks. To rectify this, they propose WEINCE (Weibull-Enhanced InfoNCE), a modification that incorporates anchor-wise online batch statistics to blend standard softmax logits with a shortfall correction, without introducing additional trainable parameters. The authors validate WEINCE across five vision benchmarks, showing that it consistently enhances performance in frozen-feature evaluations. The findings suggest that a more accurate statistical treatment of hard negatives can significantly improve contrastive learning objectives.
Methodology
The authors apply extreme value theory to analyze the softmax assumption in InfoNCE, leading to the development of WEINCE. This method uses online batch statistics to interpolate between standard softmax logits and Weibull shortfall corrections, effectively adapting the loss function to better handle the distribution of hard negatives.
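For reference, the baseline that WEINCE modifies can be written compactly. The sketch below implements only the standard InfoNCE loss on L2-normalized embeddings with in-batch negatives; the Weibull shortfall correction is not described in enough detail here to reproduce, so it is deliberately left out. The function name and the `temperature` default are assumptions.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """Standard InfoNCE with in-batch negatives on L2-normalized embeddings.

    Row i of `positives` is the positive for row i of `anchors`; all other
    rows in the batch act as negatives.
    """
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = (a @ p.T) / temperature             # cosine similarities in [-1, 1], scaled
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))
```

Because the embeddings are normalized, every similarity is bounded, so the top-scoring negatives sit near a hard upper limit rather than in an unbounded tail.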
Results
WEINCE demonstrated consistent performance improvements in frozen-feature evaluations across five vision benchmarks, indicating that the proposed method effectively addresses the limitations of the traditional InfoNCE loss function.
Implications
The findings suggest that enhancing the statistical treatment of hard negatives can lead to better representation learning in self-supervised settings, potentially benefiting various applications in computer vision and beyond.
Beyond ℓ2-norm and ℓ∞-norm: A Curvature-Inspired ℓp-Norm Scheme for Deep Neural Networks
Optimization
- Introduction of a dynamic ℓp-norm scheme for DNN optimization.
- LPSGD and LPSGDM optimizers outperform traditional ℓ2 and ℓ∞ based methods.
- The proposed method adapts to curvature changes during training, enhancing convergence rates.
- Theoretical guarantees support the efficacy of the new optimizers in nonconvex settings.
Summary
This paper addresses the limitations of existing optimizers for deep neural networks (DNNs) that rely solely on ℓ2-norm or ℓ∞-norm, which do not adapt well to the varying curvature of the loss landscape during training. The authors propose a novel ℓp-norm scheme with a dynamic value of p, integrated into stochastic gradient descent (SGD) and SGD with momentum (SGDM), resulting in two new optimizers: ℓp-SGD (LPSGD) and ℓp-SGDM (LPSGDM). The approach begins with a larger p (p > 2) to mitigate the influence of high-curvature directions in the early training phase, transitioning to a smaller p (approaching 2) for more stable updates in flatter regions. The paper establishes theoretical guarantees for these algorithms, demonstrating an O(T^{-1/2}) convergence rate in nonconvex settings. Extensive experiments on benchmark datasets like CIFAR-10, CIFAR-100, and ImageNet-1K with various DNN architectures (e.g., VGG-11, ResNet-18, ResNet-50) validate the effectiveness of the proposed optimizers, showcasing superior generalization performance compared to traditional methods.
Methodology
The authors developed a curvature-inspired ℓp-norm scheme that dynamically adjusts the value of p during training. This scheme was incorporated into SGD and SGDM, leading to the creation of LPSGD and LPSGDM. The methodology includes theoretical analysis of convergence rates and extensive empirical testing on various DNN architectures across multiple datasets.
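One way to picture an ℓp-norm update is through the steepest-descent direction under the ℓp norm, which reduces to the normalized gradient at p = 2 and to the sign of the gradient at p = ∞. The sketch below uses that textbook construction together with a linear schedule for p. Both the schedule and the exact form of the update are assumptions for illustration and may differ from LPSGD as defined in the paper.

```python
import numpy as np

def lp_steepest_direction(grad, p):
    """Direction d with unit lp norm that maximizes the inner product <grad, d>."""
    grad = np.asarray(grad, dtype=float)
    if np.isinf(p):
        return np.sign(grad)
    q = p / (p - 1.0)                      # Hoelder conjugate: 1/p + 1/q = 1
    norm_q = np.linalg.norm(grad, ord=q)
    if norm_q == 0.0:
        return np.zeros_like(grad)
    return np.sign(grad) * np.abs(grad) ** (q - 1.0) / norm_q ** (q - 1.0)

def p_schedule(t, total_steps, p_start=8.0, p_end=2.0):
    """Hypothetical linear decay of p from p_start (early) to p_end (late)."""
    frac = min(max(t / total_steps, 0.0), 1.0)
    return p_start + frac * (p_end - p_start)

def lp_sgd_step(weights, grad, lr, p):
    """One descent step along the lp steepest-descent direction."""
    return weights - lr * lp_steepest_direction(grad, p)
```

With a large p the direction is close to the sign of the gradient, which damps the influence of a few large coordinates; as p approaches 2 the step becomes an ordinary normalized gradient step.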
Results
The experimental results indicate that LPSGD and LPSGDM achieve significantly higher accuracy and faster convergence rates compared to traditional optimizers based on ℓ2 and ℓ∞ norms. The proposed optimizers effectively manage the challenges posed by curvature anisotropy during different training phases, leading to better generalization performance.
Implications
The findings suggest that adaptive norm schemes can enhance the training efficiency and performance of deep neural networks, potentially influencing future research in optimization techniques for machine learning. This approach may be applicable to various domains requiring robust DNN training, including computer vision and natural language processing.
Demystifying the Optimal Fair Classifier in Multi-Class Classification
Theory
Optimization
- Introduces an analytically tractable formulation for optimal fair classifiers in multi-class settings.
- Develops two algorithms: an in-processing method using a reduction approach and a post-processing method using plug-in estimation.
- Provides theoretical guarantees showing that both methods are statistically consistent with the optimal accuracy-fairness equilibrium.
- Demonstrates superior performance in balancing accuracy and fairness compared to existing methods through extensive experiments.
Summary
This paper addresses the challenges of achieving fairness in multi-class classification tasks, where existing bias mitigation techniques are primarily designed for binary classification. The authors focus on two main issues: characterizing the optimal accuracy-fairness frontier in multi-class settings and developing practical algorithms to achieve this optimum. They introduce a probabilistic formulation of the optimal classifier under fairness constraints and propose two attribute-blind algorithms: an in-processing method that modifies training objectives and a post-processing method that fine-tunes output probabilities. Theoretical analysis shows that both methods converge to the optimal accuracy-fairness Pareto frontier. Experimental results on various datasets demonstrate that the proposed methods outperform existing approaches in balancing accuracy and fairness, providing a flexible framework for fair classification in multi-class scenarios.
Methodology
The authors propose a novel group-fairness calibration framework named OptFair, which includes an in-processing method that reduces the task to cost-sensitive classification problems and a post-processing method that reformulates the problem as a convex optimization task using a plug-in estimator. They also introduce an entropic regularizer to make the optimal classifier analytically tractable.
Results
The experimental results indicate that the OptFair framework achieves a more controllable balance between accuracy and fairness, outperforming existing methods across multiple real-world datasets.
Implications
The findings suggest that the proposed methods can be effectively applied in high-stakes decision-making domains such as healthcare, finance, and criminal justice, where fairness is crucial. The framework allows practitioners to adjust the accuracy-fairness trade-off according to specific requirements.
How Neural Losses Shape VAE Latents
Generative Models
Theory
Optimization
- Neural reconstruction losses reduce the information content in VAE latents compared to pointwise squared error.
- The geometry of the latent space is altered by the choice of reconstruction loss, leading to more isotropic representations.
- Perceptual and adversarial losses encourage a uniform distribution of uncertainty across latent dimensions.
- The rate-distortion tradeoff is insufficient to fully understand VAE behavior; a more nuanced approach is necessary.
Summary
This paper investigates the impact of different reconstruction losses on the latent space dynamics of Variational Autoencoders (VAEs). It challenges the conventional use of pointwise likelihood in the β-VAE objective, highlighting that modern VAEs often incorporate perceptual and adversarial losses. The authors demonstrate that these neural losses alter the rate-distortion problem, leading to reduced information storage in latent representations and changing the geometry of the latent space. Specifically, they prove that augmenting pointwise reconstruction with neural losses results in weaker distortion measures, which consequently lowers the amount of information retained in the latents. Furthermore, they show that these neural losses promote a more isotropic distribution of uncertainty across latent dimensions, contrasting with the anisotropic profiles typically induced by pointwise squared error losses. The findings suggest that the choice of distortion metric significantly influences the optimization landscape of VAEs, advocating for a more mechanistic understanding of how different losses affect latent representation.
Methodology
The authors conducted both theoretical proofs and empirical evaluations to analyze the effects of various reconstruction losses on the latent space of VAEs. They compared the performance of traditional pointwise squared error loss against perceptual and adversarial losses, focusing on how these choices influence the rate-distortion characteristics and the geometry of the learned latent representations.
Results
The study found that neural losses, such as perceptual and adversarial objectives, lead to a reduction in the KL divergence at convergence, indicating that less information is stored in the latent representations. Additionally, the geometry of the latent space becomes more isotropic with these losses, resulting in a more uniform distribution of uncertainty across the latent dimensions.
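The KL term referred to above has a closed form for the usual diagonal-Gaussian encoder with a standard normal prior, and computing it per latent dimension gives a simple view of how information is spread across the latent space. The sketch below assumes that encoder family. The `isotropy_ratio` diagnostic is a hypothetical summary introduced for this example, not a metric from the paper.

```python
import numpy as np

def kl_per_dim(mu, log_var):
    """KL(N(mu, exp(log_var)) || N(0, 1)) per latent dimension, averaged over the batch.

    mu and log_var have shape (batch, latent_dim).
    """
    kl = 0.5 * (mu ** 2 + np.exp(log_var) - 1.0 - log_var)
    return kl.mean(axis=0)

def isotropy_ratio(kl_dims, eps=1e-12):
    """Hypothetical diagnostic: min/max of the per-dimension KL.

    Values near 1 mean the KL is spread evenly across dimensions; values near 0
    mean a few dimensions carry most of it.
    """
    kl_dims = np.asarray(kl_dims, dtype=float)
    return float(kl_dims.min() / (kl_dims.max() + eps))
```

Summing the output of `kl_per_dim` gives the total rate, so comparing that sum across models trained with different reconstruction losses is one way to check the reported reduction.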
Implications
These findings have significant implications for the design and training of VAEs, suggesting that practitioners should carefully consider the choice of reconstruction loss as it can fundamentally alter the latent space dynamics and the information captured by the model. This could influence applications in generative modeling, representation learning, and data compression.
TiWeaver: Unified Temporal Dynamics Modeling via Contextual Patching
Time Series
Graph Learning
- TiWeaver addresses the limitations of fixed patching strategies in multivariate time series forecasting.
- The G2AT method allows for adaptive segmentation of time series into coherent patches.
- FADE effectively models fine-grained asynchronous inter-channel dependencies.
- The framework achieves state-of-the-art performance across diverse datasets.
Summary
The paper presents TiWeaver, a novel framework for multivariate time series (MTS) forecasting that addresses the challenges posed by diverse temporal dynamics and irregularities in time series data. Traditional forecasting methods often rely on fixed patching strategies that fail to adapt to the unique characteristics of different channels, leading to inaccurate predictions. TiWeaver introduces a Graph-Guided Adaptive Tokenizer (G2AT) that segments time series into contextually coherent patches based on temporal density and representation consistency. Additionally, it employs a Fine-grained Asynchronous Dependency Extractor (FADE) to capture fine-grained asynchronous inter-channel dependencies while also considering long-term historical dependencies. The framework is evaluated on 12 real-world datasets, demonstrating its ability to outperform existing methods by up to 25%, showcasing its robustness and effectiveness across various domains and data characteristics.
Methodology
TiWeaver employs two main components: the Graph-Guided Adaptive Tokenizer (G2AT) for dynamic patching of time series based on contextual coherence, and the Fine-grained Asynchronous Dependency Extractor (FADE) for modeling inter-channel dependencies and long-term historical influences. This dual approach allows the model to adaptively capture the unique temporal dynamics of multivariate time series data.
Results
The evaluation of TiWeaver on 12 real-world time series datasets revealed that it outperformed existing forecasting methods by up to 25%, highlighting its effectiveness in handling diverse temporal dynamics and irregularities in the data.
Implications
The TiWeaver framework has significant implications for various applications that rely on accurate multivariate time series forecasting, such as weather prediction, stock market analysis, health monitoring, and transportation systems. Its ability to adaptively model complex temporal dependencies can enhance decision-making processes across these domains.
Conformal Language Modeling via Posterior Sampling
NLP
Large Language Models
Generative Models
- Introduces a novel calibration procedure for LLMs that influences the sampling distribution directly.
- Addresses the limitations of post-hoc filtering methods by ensuring outputs are coherent and useful.
- Empirical evaluations show significant improvements in downstream utility for tasks with strong claim interdependencies.
- Maintains statistical guarantees while enhancing the factuality of generated outputs.
Summary
This paper addresses the issue of hallucinations in Large Language Models (LLMs) by proposing a novel approach to conformal language modeling through posterior sampling. Traditional methods for reducing hallucinations often involve post-hoc filtering, which can lead to incoherent or incomplete outputs. The authors suggest a new calibration procedure that directly influences the sampling distribution of the LLM, moving probability mass towards more reliable and useful responses. By conditioning the sampling on high-confidence regions of the output space, the proposed method maintains fluency and coherence while controlling risk. The authors validate their approach through empirical studies on open-ended biography generation and mathematical problem solving, demonstrating that their method achieves higher downstream utility compared to existing post-hoc filtering techniques while maintaining statistical guarantees.
Methodology
The authors reformulate conformal language modeling as posterior sampling from a calibrated family of distributions. They develop an off-policy calibration procedure to select a posterior threshold that conditions the LLM's output on high-confidence regions. The calibration process involves modeling the calibration objective as an empirical distribution posterior and addressing challenges related to estimating conditional normalizing constants and ensuring monotonicity in the objective.
Results
The proposed method was evaluated in two case studies: open-ended biography generation and mathematical reasoning. Results indicate that the method effectively tracks the desired level of factuality while significantly improving the utility of the generated outputs compared to traditional post-hoc filtering methods, particularly in scenarios where claims are interdependent.
Implications
This work has significant implications for deploying LLMs in high-stakes applications, such as medical diagnosis and legal analysis, where the accuracy and reliability of generated information are critical. The approach could enhance the performance of LLMs in various domains by reducing hallucinations and improving the quality of generated content.
G2LoRA: Gradient Orthogonal Low-Rank Adaptation Framework for Graph Continual Learning on Text-Attributed Graphs
Graph Learning
NLP
Large Language Models
- G2LoRA effectively mitigates catastrophic forgetting in graph continual learning.
- The framework addresses the challenges of heterogeneous downstream tasks and differing encoder sensitivities.
- Category-aware gradient projection resolves conflicting updates and enhances knowledge transfer.
- G2LoRA shows superior performance compared to existing methods on benchmark datasets.
Summary
The paper presents G2LoRA, a novel framework designed to enhance graph continual learning specifically for Text-Attributed Graphs (TAGs). The authors identify significant issues with catastrophic forgetting and task interference when using existing models like LLM-as-Aligner, which align graph and text modalities through contrastive learning. G2LoRA addresses two primary challenges: the shifting optimization objectives due to heterogeneous downstream tasks and the differing sensitivities of graph and text encoders to adaptation. The framework unifies various task levels under a single graph-text alignment objective and employs category-aware gradient projection to mitigate task interference while promoting positive knowledge transfer. Additionally, G2LoRA introduces gradient magnitude modulation to synchronize update rates between graph and text encoders. Experimental results demonstrate that G2LoRA consistently outperforms baseline methods across different architectures, showcasing improved continual performance and transferability.
Methodology
G2LoRA employs a continual learning framework that integrates category-aware gradient projection and gradient magnitude modulation. This approach allows for consistent optimization across different task types while reducing interference and promoting positive transfer of knowledge. The framework is tested on benchmark datasets to evaluate its effectiveness against existing models.
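The general idea behind gradient projection can be shown with a generic rule: when a new task's gradient conflicts with a reference gradient from earlier tasks, remove the conflicting component. The sketch below is that generic projection only; how G2LoRA selects reference gradients per category and applies the projection to low-rank adapters is not specified here, and the function name is an assumption.

```python
import numpy as np

def project_conflicting(grad_new, grad_old):
    """Remove from grad_new the component that opposes grad_old.

    If the two gradients do not conflict (non-negative inner product),
    grad_new is returned unchanged.
    """
    grad_new = np.asarray(grad_new, dtype=float)
    grad_old = np.asarray(grad_old, dtype=float)
    dot = float(np.dot(grad_new, grad_old))
    if dot >= 0.0:
        return grad_new.copy()
    # Subtract the projection onto grad_old so the result is orthogonal to it.
    return grad_new - (dot / float(np.dot(grad_old, grad_old))) * grad_old
```

After projection the update no longer moves against the reference direction, which is the mechanism by which such methods limit interference with previously learned tasks.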
Results
Extensive experiments reveal that G2LoRA outperforms strong baseline models in terms of continual learning performance and transferability. The results indicate a significant reduction in catastrophic forgetting and improved alignment between graph and text representations.
Implications
The findings suggest that G2LoRA can be effectively applied in dynamic environments where graph data evolves over time, such as social networks and e-commerce systems. The framework's ability to maintain knowledge across tasks can enhance applications in various domains requiring continual learning.
MOSAIC: Efficient Mixture-of-Agent Scheduling via Adaptive Aggregation and Inference Concurrency
NLP
Large Language Models
Efficient ML
- MOSAIC optimizes Mixture-of-Agents scheduling to enhance efficiency on limited GPU resources.
- The framework employs an Integer Linear Program (ILP) for expert placement and prompt assignment.
- Confidence-aware adaptive aggregation reduces the need for a final aggregator LLM in consensus scenarios.
- MOSAIC achieves up to 2.54× speedup in the expert stage and 4.23× in the aggregator stage.
Summary
The paper presents MOSAIC, a novel scheduling framework designed to optimize the execution of Mixture-of-Agents (MoA) systems that utilize multiple expert large language models (LLMs) for improved reasoning accuracy. Traditional scheduling methods face challenges such as GPU idling and throughput collapse due to imbalanced loads and variability in output lengths from different models. MOSAIC addresses these issues by formulating an Integer Linear Program (ILP) that optimally assigns models to workers and manages prompt counts, while also implementing a confidence-aware adaptive aggregation mechanism that reduces reliance on a final aggregator LLM when experts converge on a majority answer. The framework is evaluated on a 4-GPU system and demonstrates significant speedups in processing times while maintaining accuracy, thus providing a more efficient solution for MoA workloads.
Methodology
MOSAIC utilizes an Integer Linear Program (ILP) to optimize the scheduling of expert models across available GPU workers, focusing on efficient prompt assignment and selective replication of heavy reasoning experts. Additionally, it incorporates a confidence-aware adaptive aggregation mechanism to streamline the consensus process among expert outputs.
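The idea of skipping the aggregator when experts already agree can be sketched as a simple vote check. This is an illustrative sketch only: the 0.6 agreement threshold, the function name `adaptive_aggregate`, and the `aggregator` callback are assumptions, and the paper's confidence measure may differ from a raw vote share.

```python
from collections import Counter


def adaptive_aggregate(answers, threshold=0.6, aggregator=None):
    """Return (final_answer, aggregator_was_called).

    If the most common expert answer reaches the agreement threshold, it is
    returned directly; otherwise the (hypothetical) aggregator is invoked.
    """
    top_answer, votes = Counter(answers).most_common(1)[0]
    if votes / len(answers) >= threshold:
        return top_answer, False
    if aggregator is None:
        raise ValueError("no consensus and no aggregator provided")
    return aggregator(answers), True


# Three of four experts agree, so the aggregator call is skipped.
consensus = adaptive_aggregate(["B", "B", "B", "A"])

# No majority: fall back to a stand-in aggregator that picks the first answer.
fallback = adaptive_aggregate(["A", "B", "C"], aggregator=lambda xs: xs[0])
```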
Results
In experiments conducted on a 4-GPU setup, MOSAIC achieved speedups of up to 2.54× in the expert stage and 4.23× in the aggregator stage, resulting in a 1.71-2.34× reduction in end-to-end processing time while maintaining accuracy levels comparable to baseline schedulers.
Implications
The findings suggest that MOSAIC can significantly enhance the efficiency of multi-expert systems in various applications, particularly in domains requiring high reasoning capabilities, such as medical and commonsense reasoning. This could lead to broader adoption of Mixture-of-Agents systems in real-world applications where computational resources are limited.
Training a Predictive Coding Network on ImageNet using Equilibrium Propagation
Computer Vision
Efficient ML
Theory
- First demonstration of Predictive Coding Networks and Equilibrium Propagation at ImageNet scale.
- Achieved a top-5 test error rate of 13.23%, close to the backpropagation baseline of 12.2%.
- Nudging-based perturbation method outperforms clamping-based methods on challenging datasets.
- Results challenge assumptions about the effectiveness of random and centered schemes in EP.
Summary
This paper presents a novel training method for Predictive Coding Networks (PCNs) using Equilibrium Propagation (EP), a physics-based training framework. EP has traditionally been limited to small-scale applications, but this work demonstrates its effectiveness at a larger scale by training a 10-layer convolutional PCN (VGG10) on the full ImageNet dataset. The authors combine a centered variant of EP with a new equilibration scheme tailored for PCNs, achieving a top-5 test error of 13.23%, which is competitive with the 12.2% baseline achieved by backpropagation. This marks the first successful application of both PCNs and EP at the scale of ImageNet, suggesting that the challenges in scaling EP may stem more from the computational properties of the systems than from limitations of the EP framework itself. The study also compares perturbation strategies for training PCNs and finds that a nudging-based approach significantly outperforms clamping-based methods on complex datasets like ImageNet.
Methodology
The authors developed an EP-based training method for PCNs, integrating a centered variant of EP with a novel equilibration scheme. They conducted extensive experiments on a 5-layer convolutional PCN across multiple vision datasets, performing a sensitivity analysis of EP hyperparameters, including perturbation methods and finite difference schemes.
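To illustrate why the choice of finite difference scheme matters, the sketch below compares a one-sided and a centered difference on a toy scalar function. It is not the authors' EP implementation: `math.exp` stands in for a quantity that depends on the nudging strength β, and the value β = 0.1 is an arbitrary assumption.

```python
import math


def one_sided(f, beta):
    """One-sided estimate of f'(0) using the points 0 and +beta."""
    return (f(beta) - f(0.0)) / beta


def centered(f, beta):
    """Centered estimate of f'(0) using the points -beta and +beta."""
    return (f(beta) - f(-beta)) / (2.0 * beta)


f = math.exp  # toy stand-in; its true derivative at 0 is exactly 1
beta = 0.1    # hypothetical nudging strength

err_one_sided = abs(one_sided(f, beta) - 1.0)
err_centered = abs(centered(f, beta) - 1.0)
```

For smooth functions the centered estimate cancels the leading error term, which is the general motivation for symmetric schemes at finite β.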
Results
The study found that the nudging-based perturbation method significantly improved performance on complex datasets, achieving a top-5 test error rate of 13.23% on full-resolution ImageNet. This performance is competitive with traditional backpropagation methods, indicating the potential of EP for large-scale applications.
Implications
The findings suggest that EP can be effectively scaled for larger datasets and models, potentially leading to more efficient training methods for neural networks. This could have implications for the development of energy-efficient machine learning systems and neuromorphic computing platforms.
Analyzing Stream Collapse in Hyper-Connections: From Diagnosis to Mitigation
NLP
Large Language Models
Theory
- Identified a failure mode in HC where multiple streams lead to reliance on a dominant stream.
- Demonstrated that residual mixing often remains close to identity, limiting effective multi-stream usage.
- Introduced Learned Stream Scaling (LSS) as a method to mitigate stream collapse and improve model performance.
- Showed that breaking symmetry at initialization can enhance the specialization of streams.
Summary
This paper investigates the phenomenon of stream collapse in Hyper-Connections (HC), a framework that replaces the single residual stream in Transformers with multiple parallel streams. The authors analyze how these streams specialize during training and whether they maintain balanced usage or favor a dominant stream. Through fine-grained diagnostics, they discover that after an initial seeding phase, the residual mixing often remains close to identity, leading to underutilization of the multi-stream architecture. The study identifies a failure mode where one stream becomes dominant, concentrating both signal and features, while the others remain less active. To mitigate this issue, the authors propose a method called Learned Stream Scaling (LSS), which introduces a controlled symmetry break at the stream initialization stage. This modification helps reduce the dominant behavior of a single stream and enhances performance across various HC model variants. The findings suggest that breaking symmetry can lead to more effective utilization of multiple streams in HC architectures.
Methodology
The authors conducted experiments using mHC-lite models with four streams, analyzing their performance on datasets such as OpenWebText, WikiText-103, and C4. They employed diagnostics to evaluate the residual mixing behavior and the distribution of signal and features across streams. The impact of the proposed LSS method was also assessed through comparative performance evaluations.
Results
The results indicated that residual mixing in trained models predominantly remained near identity, suggesting limited cross-stream information exchange. The introduction of LSS effectively reduced the dominance of a single stream and improved perplexity across HC variants, demonstrating the potential for enhanced multi-stream utilization.
Implications
The findings have significant implications for the design of Transformer architectures, particularly in improving the efficiency and effectiveness of multi-stream models. By addressing the issue of stream collapse, the proposed methods can lead to better performance in language modeling tasks and potentially other applications involving HC frameworks.
Auditable Climate Risk Intelligence from Fragmented ESG Data: Deterministic Orchestration and Imbalance-Aware Learning for Scope 1-3 Validation
Optimization
Interpretability
Time Series
- Introduction of a deterministic orchestration framework for ESG validation.
- Development of an imbalance-aware learning workflow that incorporates SMOTE and ensemble methods.
- Creation of a synthetic ESG validation benchmark for reproducibility and evaluation.
- Implementation of a governance-oriented explainability architecture for audit reconstruction.
Summary
This paper addresses the challenges corporations face in validating Environmental, Social, and Governance (ESG) data amidst fragmented reporting environments for Scope 1, 2, and 3 emissions. The authors propose a deterministic climate-risk intelligence framework that integrates various methodologies to enhance ESG validation infrastructure. Key components of the framework include single-source-of-truth orchestration, temporal anomaly detection, imbalance-aware ensemble learning, and explainability-oriented governance. The authors construct a synthetic ESG validation benchmark that reflects real-world reporting characteristics, which is combined with public climate-risk datasets to improve validation traceability. The methodology employs temporal drift analysis, SMOTE-based optimization for rare events, and TreeSHAP for interpretability. The framework is evaluated against traditional statistical classifiers and anomaly detection methods, demonstrating improved performance in terms of recall, F1 score, ROC-AUC, and governance-oriented metrics. The findings suggest a shift from passive ESG reporting to a more structured governance approach, enhancing operational auditability and reproducibility in regulated environments.
Methodology
The proposed methodology combines deterministic orchestration with temporal anomaly detection and imbalance-aware ensemble learning. It utilizes SMOTE for rare-event optimization and TreeSHAP for interpretability, ensuring governance-oriented audit infrastructure. The framework is validated through comparative experimentation against various baseline models using cross-validated evaluation metrics.
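SMOTE's core step, interpolating between a minority sample and one of its nearest minority neighbors, can be sketched in a few lines. This is a simplified standard-library illustration rather than the paper's pipeline; the function name `smote_sample`, the toy data, and the choice of k are assumptions.

```python
import random


def smote_sample(minority, k=1, rng=None):
    """Create one synthetic minority point on the segment between a randomly
    chosen minority sample and one of its k nearest minority neighbors."""
    rng = rng or random.Random(0)
    x = rng.choice(minority)
    # Sort the remaining minority points by squared distance to x.
    others = [p for p in minority if p is not x]
    others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
    neighbor = rng.choice(others[:k])
    lam = rng.random()  # interpolation weight in [0, 1)
    return [a + lam * (b - a) for a, b in zip(x, neighbor)]


# Hypothetical rare-event samples; the synthetic point lies between them.
rare_events = [[0.0, 0.0], [1.0, 2.0]]
synthetic = smote_sample(rare_events, k=1, rng=random.Random(42))
```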
Results
The framework outperformed traditional statistical classifiers and anomaly detection methods, achieving higher recall, F1 scores, and ROC-AUC metrics. The audit trace completeness metric demonstrated the ability to reconstruct provenance chains for flagged anomalies, indicating improved governance capabilities.
Implications
The findings have significant implications for corporations aiming for net-zero commitments, as they provide a structured approach to ESG data validation that enhances auditability and governance. This framework can be applied in regulated environments to ensure compliance and improve decision-making related to climate risk.
Right Makes Might: Aligning Verified Hidden States Empowers RL Reasoning
Reinforcement Learning
Large Language Models
Optimization
- Identifies a geometric phenomenon in RL-trained reasoning models where correct rollouts' hidden states converge at the anchor token.
- Proposes Hidden-Align, an auxiliary loss function that aligns hidden states of correct rollouts during RL training.
- Demonstrates significant performance improvements on mathematical reasoning benchmarks without additional training or inference costs.
- Provides systematic ablation studies to validate the design choices and effectiveness of Hidden-Align.
Summary
The paper introduces a novel approach called Hidden-Align, which enhances Reinforcement Learning from Verifiable Rewards (RLVR) for mathematical reasoning in large language models (LLMs). Traditional RLVR methods reduce correct rollouts to a single reward bit, neglecting the geometric relationships among hidden states. The authors identify that at the anchor token (the position before the answer marker), correct rollouts exhibit a high cosine similarity (≈0.84), indicating a convergence of reasoning paths. By aligning the last-layer hidden states of correct rollouts at this anchor token during RL training, Hidden-Align encourages the model to distill a unified representation of correct decisions. This method incurs no additional overhead during training or inference. The authors validate Hidden-Align on eight mathematical reasoning benchmarks, demonstrating significant improvements in performance over the DAPO baseline across various model sizes (Qwen3-1.7B, 4B, and 14B), with consistent gains in pass@1 and pass@k metrics. Systematic ablation studies confirm the effectiveness of their approach and its unique configuration.
Methodology
The authors analyze the pairwise cosine similarity of hidden states from correct rollouts during reasoning. They propose Hidden-Align, which maximizes this similarity at the anchor token through an auxiliary loss function during RL training. The method is designed to be efficient, adding no overhead to training or inference processes.
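One way to express "maximize pairwise cosine similarity" as a loss is one minus the mean pairwise similarity, sketched below on plain lists. The exact form and weighting of the paper's auxiliary loss are not given in this summary, so `align_loss` and the toy vectors are assumptions.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def align_loss(hidden_states):
    """1 - mean pairwise cosine similarity over the given hidden states.

    hidden_states is assumed to hold the anchor-token vectors of the
    correct rollouts only; fewer than two vectors yields zero loss.
    """
    pairs = list(combinations(hidden_states, 2))
    if not pairs:
        return 0.0
    mean_sim = sum(cosine(u, v) for u, v in pairs) / len(pairs)
    return 1.0 - mean_sim


aligned = align_loss([[1.0, 0.0], [2.0, 0.0]])     # same direction
orthogonal = align_loss([[1.0, 0.0], [0.0, 1.0]])  # unrelated directions
```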
Results
Hidden-Align improves average pass@1 scores over the DAPO baseline by 3.8, 6.2, and 5.4 percentage points for model sizes of 1.7B, 4B, and 14B, respectively. The method also shows consistent improvements in pass@k metrics across all scales, supported by comprehensive ablation studies.
Implications
The findings suggest that aligning hidden states in RL training can significantly enhance the reasoning capabilities of LLMs, potentially leading to better performance in tasks requiring mathematical reasoning. This approach could be applied to other domains where representation alignment is beneficial.
CARE-RL: Capability-Aware Reinforcement Learning for Mitigating Cross-Domain Conflicts
Reinforcement Learning
Large Language Models
Optimization
- CARE-RL combines protocol-aware reward generation with capability-aware optimization to mitigate cross-domain conflicts.
- The PA-GRM constructs adaptive evaluation protocols for non-verifiable tasks, enhancing reward reliability.
- DACSP modulates updates to preserve previously acquired capabilities while adapting to new domains.
- CARE-RL achieves superior performance compared to existing multi-domain RL methods.
Summary
The paper introduces CARE-RL, a novel framework designed to tackle the challenges of multi-domain reinforcement learning (RL), particularly focusing on non-verifiable tasks and capability interference across domains. Traditional reinforcement learning with verifiable rewards has shown success in reasoning-oriented large language models (LLMs), but extending this approach to multi-domain settings remains problematic due to unreliable rewards and conflicting capabilities. CARE-RL addresses these issues through two main components: the Protocol-Aware Generative Reward Model (PA-GRM) and Direction-Aware Capability Subspace Projection (DACSP). PA-GRM generates adaptive evaluation protocols for diverse open-ended tasks, ensuring that rewards are more aligned with the specific requirements of each task. DACSP modulates updates in the RL process by preserving beneficial capabilities while suppressing conflicting ones. The experimental results demonstrate that CARE-RL outperforms standard multi-domain RL baselines, achieving significant improvements in performance metrics across various benchmarks, including math, chat, and instruction-following tasks.
Methodology
The methodology involves two key components: PA-GRM, which constructs prompt-level evaluation protocols to generate trace-conditioned rewards for non-verifiable tasks, and DACSP, which extracts historical capability directions from prior RL stages to modulate updates, amplifying aligned components and suppressing conflicting ones.
Results
CARE-RL consistently outperformed standard multi-domain RL baselines, reaching Total Avg scores of 47.9 with Qwen2.5-7B and 50.7 with Qwen3-4B, indicating improved handling of multi-domain tasks.
Implications
The proposed CARE-RL framework has the potential to enhance the performance of large language models in multi-domain applications, making it applicable in areas such as automated reasoning, natural language understanding, and complex decision-making tasks.
Testing the Test: Score-Direction Instability in Class-Split Anomaly Detection
Computer Vision
Theory
- Identifies a failure mode in class-split evaluation for anomaly detection due to class-dependent score-direction instability.
- Introduces a training-free diagnostic tool (neighborhood class leakage) to predict when class-split benchmarking is unreliable.
- Empirically validates the diagnostic across various datasets and representations, highlighting the impact of class overlap.
- Suggests that current benchmarks may reward methods exploiting dataset-specific quirks rather than reflecting true anomaly detection capabilities.
Summary
This paper investigates the limitations of within-dataset class-split evaluation in anomaly detection (AD), particularly when the anomaly class overlaps with the normal class in representation space. The authors demonstrate that this overlap can lead to instability in anomaly scores, causing them to collapse towards chance or even invert, depending on the unknown anomaly class. They introduce a training-free diagnostic tool called neighborhood class leakage, which quantifies the degree of overlap between normal and anomalous classes in a given representation space. The study empirically validates this diagnostic across multiple datasets (Fashion-MNIST, CIFAR-10, and Imagenette) and representations (pixel space and VAE latent space), showing that high values of the resulting ill-posedness index predict unreliable class-split benchmarking outcomes. The findings suggest that class-split AD benchmarks should be viewed as geometry-dependent stress tests rather than definitive measures of anomaly detection capability.
Methodology
The authors propose a diagnostic index based on local neighborhood class leakage, which measures the extent of overlap between normal and anomalous classes in representation space. They analyze datasets using both pixel and VAE latent representations, employing multiple scoring methods (kNN, Isolation Forest, Local Outlier Factor) to evaluate the stability of anomaly scores.
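A leakage-style index can be sketched as the average share of anomalous points among each normal point's nearest neighbors. The paper's precise definition is not given here, so the function `neighborhood_leakage`, the choice of k, and the toy data below are assumptions meant only to show the flavor of a training-free, neighborhood-based diagnostic.

```python
def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def neighborhood_leakage(normal, anomalous, k=3):
    """Mean fraction of anomalous points among the k nearest neighbors
    of each normal point (0 = no overlap, 1 = fully surrounded)."""
    labeled = [(p, 0) for p in normal] + [(p, 1) for p in anomalous]
    total = 0.0
    for i, x in enumerate(normal):
        # Normal points come first in `labeled`, so index i is x itself.
        others = [(sq_dist(x, p), lab)
                  for j, (p, lab) in enumerate(labeled) if j != i]
        others.sort(key=lambda t: t[0])
        total += sum(lab for _, lab in others[:k]) / k
    return total / len(normal)


# Well-separated classes versus interleaved classes (toy data).
separated = neighborhood_leakage(
    [[0, 0], [0, 1], [1, 0], [1, 1]],
    [[10, 10], [10, 11], [11, 10], [11, 11]], k=3)
interleaved = neighborhood_leakage(
    [[0], [2], [4], [6]], [[1], [3], [5], [7]], k=2)
```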
Results
The study finds that high values of the neighborhood class leakage index correlate with instability in anomaly detection scores, including AUROC collapse and inversion. This indicates that class-split evaluations can yield misleading results when there is significant overlap between normal and anomalous classes.
Implications
The findings imply that researchers and practitioners should be cautious when interpreting results from class-split anomaly detection benchmarks. The proposed diagnostic can help identify unreliable evaluations, leading to more robust assessments of anomaly detection methods and potentially guiding the development of better evaluation protocols.
RMPrior: Bridging Propagation Priors and Diffusion Refinement for Efficient Radio Map Construction
Generative Models
Efficient ML
Theory
- Introduction of a mid-start diffusion sampling strategy that leverages propagation priors for radio map construction.
- Demonstrated a significant reduction in inference time (2.01× speedup) while improving reconstruction fidelity metrics.
- Theoretical analysis establishes conditions under which the proposed method enhances reconstruction quality.
- Prior quality significantly impacts reconstruction outcomes, with sensitivity increasing under aggressive truncation.
Summary
This paper presents RMPrior, a novel approach that integrates propagation priors with diffusion refinement to enhance the efficiency of radio map construction. Traditional diffusion models, while capable of high-fidelity radio map generation through iterative denoising, face challenges in practical applications due to their high sampling costs, especially in dynamic wireless environments where frequent updates are necessary. The authors propose a mid-start sampling strategy that utilizes a matched propagation prior, perturbing it to an intermediate diffusion timestep. This allows the pretrained diffusion model to focus on refining the radio map based on existing scene knowledge rather than starting from pure Gaussian noise. Theoretical analyses provide insights into the initialization gap and conditions under which the proposed method improves reconstruction fidelity. Experiments conducted on the IRT4HighRes dataset demonstrate that the RMPrior method achieves a 2.01× speedup in inference time while simultaneously enhancing various fidelity metrics (NMSE, RMSE, SSIM, PSNR) compared to traditional full-step approaches. Additionally, an ablation study highlights the importance of prior quality, revealing that reconstruction quality is closely tied to the fidelity of the propagation model used for initialization.
Methodology
The authors propose a mid-start sampling strategy that begins with a matched propagation prior, which is perturbed to an intermediate diffusion timestep. The pretrained diffusion model then performs reverse denoising only over the remaining trajectory, focusing on multipath-aware refinement rather than full reconstruction from noise. The method is evaluated on the IRT4HighRes dataset, and theoretical analyses are provided to support the findings.
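The perturbation step can be illustrated with the standard DDPM forward process, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, applied to the prior instead of to a clean sample. The sketch below assumes a linear beta schedule with T = 1000 and an arbitrary start step of 400; none of these values, nor the names `alpha_bar` and `mid_start`, come from the paper.

```python
import math
import random


def alpha_bar(t, T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_s) for s = 1..t on a linear schedule."""
    prod = 1.0
    for s in range(1, t + 1):
        beta = beta_start + (beta_end - beta_start) * (s - 1) / (T - 1)
        prod *= 1.0 - beta
    return prod


def mid_start(prior, t_start, rng, T=1000):
    """Noise the propagation prior up to timestep t_start.

    Reverse denoising would then run only from t_start down to 0,
    rather than from T down to 0.
    """
    ab = alpha_bar(t_start, T)
    return [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0)
            for x in prior]


# Hypothetical flattened prior map (e.g. normalized path-loss values).
prior_map = [0.2, -0.5, 0.9]
x_start = mid_start(prior_map, t_start=400, rng=random.Random(0))
```

A smaller `t_start` keeps more of the prior and shortens the reverse trajectory, which is consistent with the reported sensitivity to prior quality under aggressive truncation.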
Results
The proposed RMPrior method achieved a 2.01× speedup in average inference latency while improving NMSE, RMSE, SSIM, and PSNR metrics compared to the full-step baseline. The prior-quality ablation study confirmed that reconstruction quality correlates with prior fidelity, with increased sensitivity observed under shorter reverse trajectories.
Implications
The findings suggest that integrating propagation priors with diffusion models can significantly enhance the efficiency and accuracy of radio map construction, making it more feasible for dynamic wireless systems. This approach could be applied to various scenarios requiring frequent updates of spatial representations in wireless networks.
IdEst: Assessing Self-Supervised Learning Representations via Intrinsic Dimension
Computer Vision
Theory
Efficient ML
- IdEst provides an unsupervised criterion for evaluating SSL representations based on intrinsic dimension.
- The method shows strong correlation with downstream performance across multiple datasets and SSL objectives.
- IdEst enables efficient hyperparameter selection without requiring labeled data, reducing computational costs.
- Intrinsic dimensionality is highlighted as a significant geometric proxy for representation quality in SSL.
Summary
This paper introduces IdEst (ID Estimation for SSL using Minimum Spanning Tree), a novel method for evaluating self-supervised learning (SSL) representations through intrinsic dimension (ID) estimation. The authors argue that traditional evaluation methods, such as linear probing, are computationally expensive and provide limited insights into the geometric structure of representation spaces. By leveraging the Minimum Spanning Tree dimension estimator (dimMST), IdEst offers a robust and efficient way to estimate the intrinsic dimension of SSL representations. The study demonstrates that IdEst correlates strongly with downstream linear probe performance across various datasets and SSL pretraining objectives. Additionally, it facilitates hyperparameter selection without the need for labeled data, significantly reducing computational costs compared to supervised methods. The findings suggest that intrinsic dimensionality serves as a valuable geometric proxy for assessing the quality of SSL representations, complementing existing evaluation protocols.
Methodology
The authors propose IdEst, which estimates the intrinsic dimension of SSL representations using the Minimum Spanning Tree dimension estimator (dimMST). This method balances local and global information, making it robust to noise and variations in sampling density. The study evaluates IdEst's performance against traditional methods and demonstrates its effectiveness across diverse SSL architectures and datasets.
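MST-based dimension estimators typically rely on the fact that the total MST length over n points sampled from a d-dimensional set grows roughly like n^((d−1)/d). The sketch below fits that slope from two nested subsample sizes; it assumes this classical scaling law and is not the dimMST procedure used in the paper. The names `mst_length` and `mst_dimension` and the sample sizes are hypothetical.

```python
import math
import random


def mst_length(points):
    """Total edge length of the Euclidean MST (Prim's algorithm, O(n^2))."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known connection cost for each point
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        i = min((j for j in range(n) if not in_tree[j]), key=lambda j: best[j])
        in_tree[i] = True
        total += best[i]
        for j in range(n):
            if not in_tree[j]:
                d = math.dist(points[i], points[j])
                if d < best[j]:
                    best[j] = d
    return total


def mst_dimension(points, n_small, n_large):
    """Estimate d from the slope a of log(length) vs log(n): d = 1 / (1 - a)."""
    slope = ((math.log(mst_length(points[:n_large]))
              - math.log(mst_length(points[:n_small])))
             / (math.log(n_large) - math.log(n_small)))
    return 1.0 / (1.0 - slope)


# Points on a line embedded in 3-D: the intrinsic dimension is 1.
rng = random.Random(0)
line = [(t, 2 * t, -t) for t in (rng.random() for _ in range(200))]
d_line = mst_dimension(line, n_small=50, n_large=200)
```

Two sample sizes give a noisy estimate; a practical estimator would average over several sizes and resamples.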
Results
The results indicate that IdEst correlates strongly with downstream linear probe performance: the rank correlation coefficients (Spearman's ρ and Kendall's τ) are significantly negative across various benchmarks, meaning that lower intrinsic dimension is associated with better downstream accuracy. IdEst also proves to be a practical tool for hyperparameter selection, at a fraction of the computational cost of supervised alternatives.
Implications
IdEst offers a more efficient and geometrically informative way to assess the quality of SSL representations. It has potential applications in the various domains where SSL is employed, supporting model selection and optimization without the need for extensive labeled datasets.
Topology-Aware State Abstraction with Tangle Cores for Markov Decision Processes
Reinforcement Learning
Graph Learning
Theory
- Introduction of tangle-core abstraction for overlapping state representation in MDPs.
- Theoretical guarantees for value preservation and error decomposition in abstract MDPs.
- Empirical results demonstrate superior performance of tangle cores compared to traditional state abstraction methods.
- Identification of specific environments where tangle cores provide significant advantages.
Summary
This paper presents a novel approach to state abstraction in reinforcement learning, addressing the limitations of traditional methods that rely on non-overlapping partitions of states based on reward and transition similarities. The authors introduce 'tangle-core abstraction,' which utilizes graph tangles to create overlapping state abstractions that better represent shared interface states in navigation and decision-making problems. By constructing an empirical transition graph from trajectory data and identifying low-order tangles, the method allows for a more flexible representation of state spaces. The authors provide theoretical guarantees for value preservation in the induced abstract MDP and demonstrate the advantages of overlapping abstractions over hard partitions. Empirical evaluations show that tangle-core abstractions outperform several baselines in terms of compression and return tradeoffs across various domains, including tabular settings and procedurally generated mazes. The findings suggest that tangle cores are particularly effective in environments with coherent transition structures, while also identifying scenarios where traditional methods may still be preferable.
Methodology
The authors construct an empirical transition graph from trajectory data and compute low-order tangles to form abstract states known as tangle cores. They utilize a soft membership kernel to allow for overlapping states, contrasting this with traditional hard partitions. Theoretical analysis includes value-preservation guarantees and error decomposition, while empirical evaluations benchmark the performance of tangle-core abstractions against various baselines.
Results
Tangle-core abstractions achieve favorable compression-return tradeoffs compared to bisimulation, DeepMDP, topological maps, and graph partitioning methods across multiple domains. The paper identifies a failure regime where transition topology is uninformative, indicating that tangles are not universally superior but effective in specific scenarios.
Implications
The findings suggest that tangle-core abstraction can enhance decision-making processes in environments with complex state structures, potentially improving the efficiency of reinforcement learning algorithms. This approach may be applicable in robotics, navigation systems, and other areas where shared interface states are prevalent.
Rethinking Neural Width for Alternating Current Optimal Power Flow Proxies
Optimization
Efficient ML
Theory
- Introduces Loss-Guided Neural Densification (LG-ND) for optimizing neural network width.
- Achieves performance parity with existing models using significantly fewer neurons.
- Emphasizes the importance of architectural minimalism for formal verification in power systems.
- Addresses the limitations of over-parameterization in deep learning models for ACOPF.
Summary
This paper addresses the challenge of determining the appropriate architectural size for deep learning proxies used in solving the Alternating Current Optimal Power Flow (ACOPF) problem. The authors introduce a novel algorithm called Loss-Guided Neural Densification (LG-ND), which incrementally adjusts the width of the neural network based on its performance in approximating the ACOPF manifold. Instead of starting with a large model and pruning it, LG-ND begins with a minimal architecture and expands only when necessary, ensuring that the model remains compact and interpretable. Empirical evaluations on various IEEE systems demonstrate that LG-ND achieves comparable performance to existing models while utilizing up to ten times fewer neurons per layer. This architectural minimalism is crucial for formal verification in safety-critical grid operations, as smaller models are easier to certify and deploy. The findings suggest that a more systematic approach to model capacity can enhance the reliability and efficiency of machine learning applications in power systems.
Methodology
The authors propose the LG-ND algorithm, which starts with a minimal neural network architecture and incrementally increases its width based on validation loss improvements. The model is trained using a supervised regime with Mean Squared Error (MSE) loss, focusing on capturing the non-convex ACOPF manifold and satisfying physical constraints. The network's width is treated as a dynamic variable, expanding only when the current model fails to meet performance criteria.
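The grow-only-when-needed loop can be sketched as follows. This is a schematic illustration, not the LG-ND algorithm itself: `train_and_eval` is a hypothetical callback returning a validation loss for a given width, and the start width, step size, target loss, and cap are arbitrary assumptions.

```python
def grow_width(train_and_eval, start_width=4, step=4,
               target_loss=0.06, max_width=256):
    """Start small and widen the network only while the validation loss
    misses the target. Returns the final (width, validation_loss)."""
    width = start_width
    loss = train_and_eval(width)
    while loss > target_loss and width + step <= max_width:
        width += step
        loss = train_and_eval(width)
    return width, loss


# Toy stand-in for training: pretend validation loss falls as 1 / width.
final_width, final_loss = grow_width(lambda w: 1.0 / w)
```

Because the search starts from the smallest model, it stops at the first width that meets the criterion instead of pruning down from an over-parameterized network.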
Results
Empirical results indicate that the LG-ND algorithm can achieve similar performance levels to existing ACOPF proxy models while using up to ten times fewer neurons per layer. This demonstrates the effectiveness of the proposed approach in maintaining accuracy while reducing model complexity.
Implications
The findings have significant implications for the deployment of machine learning models in safety-critical applications, such as power grid management. By ensuring that models are compact and interpretable, the research supports the development of reliable and efficient solutions for real-time power flow optimization.
Cross-Modal Contrastive Learning of ECG and Angiography Representations for Severe Stenosis Classification
Multimodal
Time Series
- Introduces StenCE, a contrastive pretraining framework for ECG analysis.
- Demonstrates that severe stenosis can be classified using only ECG data.
- Achieves an AUC of 0.822 for severe stenosis classification.
- Shows consistent performance improvements across various ECG encoders.
Summary
This paper addresses the challenge of diagnosing severe coronary artery stenosis, a condition that can lead to heart attacks if left untreated. Traditional methods rely on invasive X-ray angiograms, which are not feasible for all patients, particularly asymptomatic ones. The authors propose a novel framework named StenCE, which utilizes cross-modal contrastive learning to extract relevant features from electrocardiograms (ECGs) and align them with angiography data. By training an ECG encoder to recognize stenosis signals based on features learned from angiography, the model can classify patients based solely on ECG data. The study demonstrates that this approach significantly improves classification performance, achieving an area under the curve (AUC) of 0.822 for severe stenosis cases. This advancement not only enhances the potential for early diagnosis but also broadens the applicability of ECGs in cardiovascular risk stratification.
Methodology
The authors employed a multi-modal contrastive learning approach in which a transformer-based ECG encoder was pretrained to align its feature representations with those from a frozen angiography encoder. The ECG encoder was then fine-tuned to classify coronary stenosis and additional cardiac abnormalities.
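The alignment step can be illustrated with a symmetric InfoNCE-style loss over paired embeddings. This is a sketch under assumptions: the summary does not state the exact loss, temperature, or embedding sizes, so `contrastive_alignment_loss`, the temperature of 0.1, and the random stand-in features below are all hypothetical.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise log-softmax, shifted by the row max for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def contrastive_alignment_loss(
    ecg_emb: np.ndarray, angio_emb: np.ndarray, temperature: float = 0.1
) -> float:
    """Symmetric InfoNCE loss (an assumed choice, not confirmed by the source).

    Row i of `ecg_emb` and row i of `angio_emb` are taken to come from the
    same patient; every other row in the batch serves as a negative.
    """
    ecg = l2_normalize(ecg_emb)
    angio = l2_normalize(angio_emb)
    logits = ecg @ angio.T / temperature
    idx = np.arange(len(logits))
    ecg_to_angio = -log_softmax(logits)[idx, idx].mean()
    angio_to_ecg = -log_softmax(logits.T)[idx, idx].mean()
    return float((ecg_to_angio + angio_to_ecg) / 2)


rng = np.random.default_rng(0)
# Stand-in for features from the frozen angiography encoder.
angio = rng.normal(size=(8, 16))
# ECG features that nearly match their paired angiography features.
aligned_loss = contrastive_alignment_loss(angio + 0.01 * rng.normal(size=(8, 16)), angio)
# ECG features unrelated to the angiography features.
random_loss = contrastive_alignment_loss(rng.normal(size=(8, 16)), angio)
print(aligned_loss < random_loss)
```

In an actual training loop only the ECG encoder would receive gradients from this loss, since the angiography encoder is described as frozen.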
Results
The proposed StenCE framework achieved an AUC of 0.822 for classifying severe stenosis cases, indicating a strong capability to detect stenosis signals from ECGs. The evaluations also showed consistent performance improvements across different ECG encoders and various severity thresholds.
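For readers unfamiliar with the metric, the sketch below computes an AUC on toy data, assuming the reported figure is the area under the ROC curve (the summary does not say which curve). The labels and scores are invented for illustration and have no connection to the paper's data.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form equals the area under
    the ROC curve.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = severe stenosis) and model scores.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
print(round(toy_auc, 3))
```

Read this way, an AUC of 0.822 would mean that a randomly chosen severe-stenosis case receives a higher score than a randomly chosen non-severe case about 82% of the time.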
Implications
The ability to diagnose severe coronary artery stenosis using non-invasive ECGs could lead to earlier detection and treatment of cardiovascular diseases, particularly in asymptomatic patients. This approach may reduce reliance on invasive procedures and improve patient outcomes in cardiovascular healthcare.
EST-PRM: Stress-Testing Process Reward Models Before They Become Load-Bearing
NLP
Large Language Models
Reinforcement Learning
- Introduces EST-PRM, a framework for stress-testing PRMs under structural perturbations.
- Identifies distinct vulnerability patterns across different PRMs.
- Demonstrates that robustness does not correlate with performance on natural reasoning chains.
- Proposes a formal framework for analyzing vulnerabilities in PRMs.
Summary
The paper introduces EST-PRM, a framework designed to stress-test Process Reward Models (PRMs) used in language model training. PRMs score intermediate reasoning steps, and these scores are critical for dense supervision. The authors argue that existing evaluations do not adequately assess the robustness of PRMs under label-preserving transformations that can alter the relationship between PRM scores and correctness. EST-PRM applies three transformations (step inflation, dependency-aware step reordering, and confidence markers) to evaluate vulnerabilities in five PRMs across 4,687 reasoning chains from various datasets. The study reveals distinct vulnerability patterns among models, indicating that robustness does not correlate with performance on natural inputs. The authors propose a formal vulnerability framework and evaluate mitigation strategies, highlighting trade-offs between robustness and false-positive rates. The findings emphasize the necessity of rigorous robustness evaluations for PRMs to ensure reliable deployment in real-world applications.
Methodology
The authors developed a stress-testing framework that applies three types of label-preserving transformations to the reasoning chains that PRMs score. They conducted empirical evaluations on five PRMs using 4,687 reasoning chains drawn from various datasets, analyzing the impact of these transformations on model performance and robustness.
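The three transformations are named in the summary but not defined, so the sketch below shows one plausible reading of each on a toy reasoning chain. Every function name, the filler and marker phrases, and the dependency map are assumptions made for illustration, not the paper's implementation.

```python
from typing import Dict, List, Set


def inflate_steps(steps: List[str]) -> List[str]:
    """Assumed reading of step inflation: add a content-free restatement
    after every step, lengthening the chain without changing its answer."""
    inflated = []
    for step in steps:
        inflated.append(step)
        inflated.append("Restating the previous step: " + step)
    return inflated


def add_confidence_markers(steps: List[str], marker: str = "Clearly, ") -> List[str]:
    """Assumed reading of confidence markers: prefix each step with an
    assertive phrase that carries no new information."""
    return [marker + step for step in steps]


def reorder_independent_steps(steps: List[str], deps: Dict[int, Set[int]]) -> List[str]:
    """Assumed reading of dependency-aware reordering: swap adjacent steps
    when the later one does not depend on the earlier one.

    `deps[i]` holds the indices of the steps that step i directly uses.
    """
    order = list(range(len(steps)))
    i = 0
    while i + 1 < len(order):
        earlier, later = order[i], order[i + 1]
        if earlier not in deps.get(later, set()):
            order[i], order[i + 1] = later, earlier
            i += 2  # move past the swapped pair
        else:
            i += 1
    return [steps[j] for j in order]


steps = ["x = 2 + 3 = 5", "y = 4 * 2 = 8", "x + y = 13"]
deps = {2: {0, 1}}  # the final step uses both earlier results

inflated = inflate_steps(steps)
marked = add_confidence_markers(steps)
reordered = reorder_independent_steps(steps, deps)
print(reordered)
```

Each variant leaves the final answer, and hence the correctness label, unchanged; a robust PRM would be expected to score the original and perturbed chains similarly.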
Results
The results showed significant differences in vulnerability patterns across the evaluated models. Math-Shepherd exhibited high sensitivity to position perturbations, while Qwen2.5-Math-PRM was most affected by step inflation. The study also revealed that confidence-based perturbations distorted reward calibration, leading to inconsistencies in correctness estimation. The evaluation of mitigation strategies highlighted the trade-offs between robustness coverage and false-positive rates.
Implications
The findings suggest that PRMs require thorough robustness evaluations beyond traditional performance metrics to ensure their reliability in practical applications. This work lays the groundwork for future research in enhancing the robustness of process-level supervision in language models.