gistml

By James Asher

Daily summaries of the latest Machine Learning research papers from arXiv.

2026-02-04 • Found 24 papers

APEX: Probing Neural Networks via Activation Perturbation

Tao Ren, Xiaoyu Luo, Qiongxiu Li
  • APEX perturbs hidden activations to probe neural networks, avoiding the limitations of input-space and parameter perturbations.
  • Small activation noise reveals sample-level regularity and semantic prediction transitions, aligning with established metrics.
  • Large activation noise exposes model-level biases, such as concentrated outputs in backdoored models, highlighting intrinsic structural biases.
  • APEX unifies existing probing methods by interpreting input perturbation as a constrained special case of activation perturbation.
  • The framework provides a novel perspective for understanding representation-level phenomena in neural networks.
Abstract
This paper introduces Activation Perturbation for EXploration (APEX), a novel inference-time probing framework for neural networks that perturbs hidden activations while keeping inputs and model parameters fixed. Unlike traditional input-space or parameter perturbation methods, APEX directly accesses intermediate representations, enabling a more comprehensive exploration of the structural information encoded in neural networks. The authors theoretically demonstrate that activation perturbation transitions model behavior from sample-dependent to model-dependent by suppressing input-specific signals and amplifying representation-level structures. Empirical studies reveal two distinct regimes: in the small-noise regime, APEX measures sample regularity and distinguishes structured from randomly labeled models, while in the large-noise regime, it exposes model-level biases, such as concentrated predictions in backdoored models. The findings suggest that APEX provides a semantically aligned perspective on neural network representations, surpassing the capabilities of input- or parameter-level perturbations.
Methodology
APEX operates by perturbing hidden activations during inference while keeping inputs and model parameters fixed. The authors analyze the effects of varying noise levels on model behavior, theoretically establishing the transition from sample-dependent to model-dependent behavior. Empirical case studies are conducted to evaluate APEX's ability to measure sample regularity, distinguish structured models, and reveal biases in backdoored models.
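The core operation can be sketched with a PyTorch forward hook that injects Gaussian noise into one hidden layer's output while the input and weights stay fixed; the toy model, layer choice, and noise scales below are illustrative assumptions rather than the authors' setup.

```python
# Minimal sketch of inference-time activation perturbation (assumed setup, not the
# authors' code): a forward hook adds Gaussian noise to one hidden layer's output,
# and sweeping sigma from small to large probes the two regimes described above.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10)).eval()

def make_perturb_hook(sigma):
    def hook(module, inputs, output):
        return output + sigma * torch.randn_like(output)   # perturb activations only
    return hook

x = torch.randn(8, 32)                          # inputs and parameters stay fixed
for sigma in [0.0, 0.1, 1.0, 10.0]:             # small-noise -> large-noise regime
    handle = model[1].register_forward_hook(make_perturb_hook(sigma))
    with torch.no_grad():
        preds = model(x).argmax(dim=-1)
    handle.remove()
    print(f"sigma={sigma:>5}: predictions {preds.tolist()}")
```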
Results
APEX demonstrates two distinct regimes: (1) In the small-noise regime, it efficiently measures sample regularity and reveals semantically coherent prediction transitions. (2) In the large-noise regime, predictions become input-independent, exposing model-level biases such as concentrated outputs in backdoored models. These results highlight APEX's ability to uncover representation-level structures and biases that are inaccessible through traditional perturbation methods.
Implications
APEX provides a powerful tool for understanding neural network representations, with applications in model interpretability, robustness analysis, and security. Its ability to expose biases in backdoored models could aid in detecting adversarial attacks, while its insights into sample regularity may inform training strategies for improved generalization and fairness.
View on arXiv

Anytime Pretraining: Horizon-Free Learning-Rate Schedules with Weight Averaging

Alexandru Meterez, Pranav Ajit Nair, Depen Morwani, Cengiz Pehlevan, Sham Kakade
  • Cosine decay schedules, while effective, are horizon-dependent and unsuitable for open-ended or continual learning setups.
  • The authors propose horizon-free learning rate schedules, such as constant rates with weight averaging and 1/t^γ schedules, which do not require prior knowledge of the training horizon.
  • Weight averaging (e.g., exponential moving averages) plays a critical role in achieving competitive performance with minimax convergence rates.
  • Empirical results on 150M and 300M parameter models show that these anytime schedules closely track the performance of cosine decay across various training durations.
  • The proposed methods provide a practical alternative to cosine decay for large-scale language model pretraining, particularly in scenarios with uncertain or evolving training horizons.
Abstract
This paper introduces 'anytime' learning rate schedules for pretraining large language models (LLMs) in open-ended or continual learning settings, where the total training horizon is unknown. Traditional cosine decay schedules, while widely used, are horizon-dependent and require prior knowledge of the training duration, making them unsuitable for such scenarios. The authors propose and analyze horizon-free alternatives, including constant learning rates with weight averaging and polynomially decaying schedules (e.g., 1/t^γ). They demonstrate theoretically and empirically that these methods, when combined with weight averaging, can achieve minimax convergence rates and match the performance of well-tuned cosine schedules across various training durations. Experiments on 150M and 300M parameter language models trained at up to 32× Chinchilla scale show that these anytime schedules are competitive with cosine decay in terms of final validation loss, offering a practical and effective alternative for LLM pretraining.
Methodology
The authors conduct a theoretical analysis of horizon-free learning rate schedules in overparameterized linear regression, showing that weight averaging enables minimax convergence rates. Empirical evaluations are performed on 150M and 300M parameter language models trained at up to 32× Chinchilla scale. They compare constant learning rates with weight averaging, 1/t^γ schedules with weight averaging, and the warmup-stable-decay (WSD) schedule against cosine decay. Validation loss is measured across various training durations to assess the competitiveness of the proposed methods.
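A minimal sketch of the two horizon-free ingredients, a 1/t^γ learning-rate schedule and an exponential moving average of the weights, is shown below; the toy model, γ, and EMA decay are placeholder assumptions, not the paper's training configuration.

```python
# Sketch of a horizon-free schedule: 1/t^gamma learning-rate decay plus an EMA of the
# weights, with no total training horizon assumed anywhere. Hyperparameters are placeholders.
import copy
import torch
import torch.nn as nn

gamma, base_lr, ema_decay = 0.5, 0.1, 0.999
model = nn.Linear(16, 1)
ema_model = copy.deepcopy(model)               # averaged weights, used for evaluation
opt = torch.optim.SGD(model.parameters(), lr=base_lr)
sched = torch.optim.lr_scheduler.LambdaLR(opt, lambda t: 1.0 / (t + 1) ** gamma)

for t in range(1000):                          # could run indefinitely (anytime setting)
    x, y = torch.randn(32, 16), torch.randn(32, 1)
    loss = nn.functional.mse_loss(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    sched.step()
    with torch.no_grad():                      # exponential moving average of every parameter
        for p_ema, p in zip(ema_model.parameters(), model.parameters()):
            p_ema.mul_(ema_decay).add_(p, alpha=1 - ema_decay)
```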
Results
The proposed anytime schedules, particularly constant learning rates with weight averaging and 1/t^γ schedules, achieve comparable final validation loss to well-tuned cosine decay schedules across all training durations. These methods effectively track the 'cosine envelope,' which represents the optimal performance of cosine schedules tuned for specific horizons. The results demonstrate that horizon-free schedules can serve as practical alternatives to cosine decay in large-scale language model pretraining.
Implications
The findings have significant implications for pretraining large language models in open-ended or continual learning scenarios, where the total training horizon is unknown or evolving. The proposed anytime schedules reduce the need for extensive tuning and horizon-dependent adjustments, simplifying the training process. This approach could enhance the efficiency and flexibility of training pipelines for large-scale models, particularly in dynamic or resource-constrained environments.
View on arXiv

Augmenting Parameter-Efficient Pre-trained Language Models with Large Language Models

Saurabh Anand, Shubham Malaviya, Manish Shukla, Sachin Lodha
  • Introduced 'CompFreeze,' a parameter-efficient fine-tuning method combining compacters with layer-freezing strategies for pre-trained language models.
  • Proposed two strategies to integrate large language models: data labeling for scarce datasets and fallback mechanisms for low-confidence predictions.
  • Demonstrated the effectiveness of combining domain-specific PLMs with the adaptability of LLMs for cybersecurity tasks.
  • Addressed challenges like data drift, scarcity of labeled data, and computational costs in cybersecurity AI applications.
  • Validated the approach through experiments on tasks such as spam detection, domain generation algorithm classification, and entity extraction.
Abstract
This paper addresses challenges in applying AI models to the dynamic and data-scarce domain of cybersecurity. The authors propose combining parameter-efficient pre-trained language models (PLMs) with large language models (LLMs) to improve model robustness, efficiency, and adaptability. The study introduces 'CompFreeze,' a parameter-efficient fine-tuning approach that integrates compacters (low-rank adapters) with layer-freezing strategies to reduce computational costs while maintaining performance. To further enhance these models, two novel strategies are proposed: (1) using LLMs to label unlabeled cybersecurity datasets, and (2) employing LLMs as fallback mechanisms for low-confidence predictions. Experimental evaluations on cybersecurity-specific downstream tasks, such as spam detection and entity extraction, demonstrate that the hybrid approach improves reliability and performance, making it suitable for real-world cybersecurity applications.
Methodology
The authors utilized parameter-efficient fine-tuning techniques by integrating compacters (low-rank adapters) into pre-trained language models and freezing specific layers to reduce computational overhead. They further enhanced these models by leveraging large language models in two ways: (1) generating labels for unlabeled cybersecurity data and (2) acting as fallback mechanisms for predictions with low confidence. The approach was tested on cybersecurity-specific PLMs (e.g., CyBERT, SecureBERT, CySecBERT) and evaluated on downstream tasks like spam detection and entity extraction.
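The fallback strategy reduces to a confidence gate: route an input to the LLM only when the fine-tuned PLM's softmax confidence falls below a threshold. The sketch below assumes a Hugging Face-style sequence classifier; the threshold and the `query_llm` stub are hypothetical placeholders.

```python
# Illustrative sketch of the low-confidence LLM fallback (assumed interface, not the
# authors' implementation).
import torch

CONF_THRESHOLD = 0.7                      # assumed cut-off; tuned per task in practice

def query_llm(text: str) -> int:
    """Hypothetical stub standing in for an LLM labelling / fallback call."""
    return 0

def classify_with_fallback(plm, tokenizer, text: str) -> int:
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = plm(**inputs).logits.softmax(dim=-1)
    conf, pred = probs.max(dim=-1)
    if conf.item() < CONF_THRESHOLD:      # low confidence: defer to the LLM
        return query_llm(text)
    return pred.item()                    # otherwise trust the parameter-efficient PLM
```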
Results
The proposed CompFreeze-based models, combined with LLMs, demonstrated improved robustness, reliability, and computational efficiency. The hybrid approach outperformed traditional fine-tuning methods in handling cybersecurity-specific tasks, particularly in scenarios with limited labeled data or data drift. The use of LLMs for data labeling and fallback mechanisms further enhanced the models' adaptability and performance.
Implications
This work has significant implications for the cybersecurity domain, where labeled data is scarce, and data drift is common. By combining parameter-efficient PLMs with LLMs, the proposed approach offers a scalable and cost-effective solution for real-world applications such as threat detection, spam filtering, and entity extraction. The methodology can also be extended to other domains facing similar challenges of data scarcity and dynamic environments.
View on arXiv

Automated Dysphagia Screening Using Noninvasive Neck Acoustic Sensing

Jade Chng, Rong Xing, Yunfei Luo, Kristen Linnemeyer-Risser, Tauhidur Rahman, Andrew Yousef, Philip A Weissbrod
  • The study introduces a noninvasive, machine learning-based approach to detect dysphagia using neck acoustic signals.
  • Data was collected from 49 participants undergoing FEES, with acoustic signals annotated using the penetration-aspiration scale (PAS).
  • The model achieved a high AUC-ROC of 0.904 across five train-test splits, demonstrating strong classification performance.
  • The system leverages domain-informed feature extraction and pre-trained audio embedding models, such as OPERA, for robust signal analysis.
  • This approach addresses limitations of current diagnostic methods, offering a scalable and cost-effective alternative for dysphagia screening.
Abstract
This paper presents a novel, noninvasive framework for automated dysphagia screening using neck acoustic sensing and machine learning. Dysphagia, a condition characterized by difficulty swallowing, affects a significant portion of the population, particularly older adults and individuals with neurological or oncological conditions. Current diagnostic methods, such as videofluoroscopic swallowing studies (VFSS) and fiberoptic endoscopic evaluation of swallowing (FEES), are invasive, costly, and require specialized equipment and trained personnel. To address these limitations, the authors developed a machine learning-based system that analyzes acoustic signals captured from the neck during swallowing tasks. The study collected data from 49 participants undergoing FEES, with acoustic signals annotated using the penetration-aspiration scale (PAS) to assess swallowing dysfunction. The proposed model achieved a high classification performance, with an AUC-ROC of 0.904 across five independent train-test splits. This work demonstrates the feasibility of using noninvasive acoustic sensing as a scalable, cost-effective, and practical tool for dysphagia screening and pharyngeal health monitoring.
Methodology
The study involved collecting neck acoustic signals from 49 participants during FEES, a gold-standard swallowing evaluation. Acoustic data was annotated using the penetration-aspiration scale (PAS) to classify swallowing events as normal or abnormal. The authors employed signal processing techniques for feature extraction and trained machine learning models, including pre-trained audio embedding models like OPERA, to classify swallowing abnormalities. The model's performance was evaluated using five independent train-test splits, with additional experiments to assess the impact of demographic features and model architectures.
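As a rough illustration of the evaluation protocol only (five independent train-test splits scored with AUC-ROC), the sketch below substitutes placeholder embeddings and a linear classifier for the paper's acoustic features and models.

```python
# Evaluation-protocol sketch with synthetic stand-ins for the OPERA embeddings and
# PAS-derived labels; only the five-split AUC-ROC bookkeeping mirrors the paper.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(490, 128))             # placeholder swallow embeddings
y = rng.integers(0, 2, size=490)            # 1 = abnormal swallow (from PAS annotations)

aucs = []
for seed in range(5):                       # five independent train-test splits
    Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.2, random_state=seed, stratify=y)
    clf = LogisticRegression(max_iter=1000).fit(Xtr, ytr)
    aucs.append(roc_auc_score(yte, clf.predict_proba(Xte)[:, 1]))
print(f"AUC-ROC: {np.mean(aucs):.3f} +/- {np.std(aucs):.3f}")
```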
Results
The proposed system achieved an AUC-ROC of 0.904 for detecting swallowing abnormalities, demonstrating high diagnostic accuracy. The OPERA pre-trained model outperformed other baseline models in feature extraction. Demographic features such as age and gender had minimal impact on model performance. The study also highlighted the system's ability to generalize across different bolus consistencies, addressing a key limitation of prior research.
Implications
This work has significant implications for healthcare, offering a noninvasive, portable, and cost-effective tool for dysphagia screening and pharyngeal health monitoring. The system could be deployed in clinical settings or as a point-of-care device, reducing reliance on invasive and resource-intensive diagnostic methods. Additionally, it has the potential to improve early detection and intervention for dysphagia, particularly in high-risk populations such as older adults and patients with neurological or oncological conditions.
View on arXiv

Bayesian Conformal Prediction as a Decision Risk Problem

Fanyi Wu, Veronika Lohmanova, Samuel Kaski, Michele Caprio
  • BCP optimises the conformal threshold to minimise prediction set size while ensuring valid coverage guarantees.
  • Bayesian posterior predictive densities are used as non-conformity scores, enhancing robustness under model misspecification.
  • Bayesian quadrature is employed for efficient estimation of prediction set sizes and coverage enforcement.
  • BCP achieves lower variability in prediction set sizes compared to classical split conformal prediction and Bayesian credible intervals.
  • Empirical evaluations demonstrate reliable coverage and efficiency across regression and classification tasks, including distribution-shifted scenarios.
Abstract
This paper introduces Optimised Bayesian Conformal Prediction (BCP), a novel framework that integrates Bayesian inference with conformal prediction to address efficiency and coverage in uncertainty quantification. BCP formulates conformal prediction as a decision-theoretic risk minimisation problem, where the conformal threshold is optimised to minimise the expected prediction set size while maintaining valid coverage guarantees. The framework leverages Bayesian posterior predictive densities as non-conformity scores and employs Bayesian quadrature for stable estimation of prediction set sizes. BCP operates within a split conformal prediction framework and demonstrates robust empirical coverage under model misspecification. Experimental evaluations on regression and classification tasks, including challenging distribution-shifted datasets like ImageNet-A, show that BCP achieves comparable prediction set sizes to classical split conformal prediction while significantly reducing variability in set sizes. Additionally, BCP outperforms Bayesian credible intervals in maintaining coverage under misspecified models.
Methodology
The authors propose a decision-theoretic optimisation framework for split conformal prediction, where the conformal threshold is treated as a decision variable. Bayesian posterior predictive densities are used as non-conformity scores, and Bayesian quadrature is applied to estimate the expected prediction set size. The framework enforces coverage guarantees using conformal risk control (CRC) and employs add-one-in (AOI) sampling for variance reduction in Bayesian score construction.
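The Bayesian-score half of the pipeline can be sketched with ordinary split conformal prediction, using the negative log posterior predictive density as the non-conformity score; the decision-theoretic threshold optimisation and Bayesian quadrature are the paper's contributions and are not reproduced here. The Gaussian predictive model is a placeholder.

```python
# Split conformal prediction with a (placeholder) Bayesian predictive density as the
# non-conformity score. The threshold here is the usual conformal quantile, not the
# optimised one proposed in the paper.
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.1                                          # target miscoverage

x_cal = rng.normal(size=200)                         # toy 1-D regression calibration set
y_cal = x_cal + rng.normal(scale=1.0, size=200)

def log_pred_density(y, x):                          # stand-in posterior predictive: N(x, 1)
    return -0.5 * (y - x) ** 2 - 0.5 * np.log(2 * np.pi)

scores = -log_pred_density(y_cal, x_cal)             # low density -> high non-conformity
n = len(scores)
q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n)

x_new = 0.3
y_grid = np.linspace(-5, 5, 2001)
pred_set = y_grid[-log_pred_density(y_grid, x_new) <= q]   # all y below the threshold
print(f"prediction interval ~ [{pred_set.min():.2f}, {pred_set.max():.2f}]")
```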
Results
BCP achieves valid empirical coverage under model misspecification; for example, it attains 81% coverage in a sparse regression task versus 49% for Bayesian credible intervals. It produces prediction sets of comparable size to classical split conformal prediction but with significantly lower run-to-run variability. The framework also demonstrates reliable performance in distribution-shifted settings, such as ImageNet-A.
Implications
BCP provides a robust approach to uncertainty quantification in machine learning, particularly in scenarios with model misspecification or distribution shifts. Its ability to optimise prediction set size while ensuring coverage guarantees makes it suitable for applications in safety-critical domains, such as healthcare, autonomous systems, and financial risk assessment.
View on arXiv

BlockRR: A Unified Framework of RR-type Algorithms for Label Differential Privacy

Haixia Liu, Yi Ding
  • BlockRR is a unified framework for RR-type mechanisms under label differential privacy, generalizing existing methods like RRWithPrior and RRonBins.
  • The framework partitions the label space into blocks and applies tailored randomized-response transformations to mitigate class imbalance and improve utility.
  • BlockRR satisfies ϵ-label DP and supports systematic benchmarking by reducing algorithm comparison to hyper-parameter analysis.
  • Empirical results on CIFAR-10 variants show that BlockRR outperforms existing methods in high- and moderate-privacy regimes (ϵ ≤ 3.0) while maintaining balanced per-class performance.
  • In low-privacy regimes (ϵ ≥ 4.0), BlockRR reduces to standard RR without additional performance loss.
Abstract
This paper introduces BlockRR, a novel and unified framework for randomized-response (RR) mechanisms under the label differential privacy (Label DP) paradigm. Label DP focuses on protecting the privacy of labels in datasets while allowing for better utility compared to standard differential privacy. BlockRR generalizes existing RR-type mechanisms, such as RRWithPrior and RRonBins, by partitioning the label space into blocks and applying tailored RR transformations. This unified approach eliminates the need for separate analyses of individual mechanisms and provides a common design space for systematic benchmarking. The authors prove that BlockRR satisfies ϵ-label DP and propose a partitioning method based on a weight matrix derived from label prior information. Empirical evaluations on two CIFAR-10 variants demonstrate that BlockRR achieves superior performance in high- and moderate-privacy regimes (ϵ ≤ 3.0) by balancing overall accuracy and per-class accuracy, while converging to standard RR in low-privacy regimes (ϵ ≥ 4.0).
Methodology
The authors propose a block-based randomized-response mechanism that partitions the label space into majority and minority subsets. Tailored RR transformations are applied to these blocks, using a block-uniform distribution to amplify majority labels and mitigate minority label effects. The framework is designed to unify existing RR-type algorithms under specific hyper-parameter settings, enabling systematic benchmarking. Theoretical proofs establish that BlockRR satisfies ϵ-label DP, and empirical evaluations are conducted on CIFAR-10 datasets with varying class imbalances.
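The underlying primitive is the classical k-ary randomized response over the label set, which is also what BlockRR reduces to in the low-privacy regime; the sketch below shows that primitive only, and omits the weight-matrix-based block partition and block-uniform perturbation.

```python
# k-ary randomized response on labels: keep the true label with probability
# e^eps / (e^eps + k - 1), otherwise report another label uniformly at random.
# This satisfies eps-label DP; BlockRR applies tailored RR within label blocks instead.
import numpy as np

def randomized_response(label: int, k: int, eps: float, rng) -> int:
    p_keep = np.exp(eps) / (np.exp(eps) + k - 1)
    if rng.random() < p_keep:
        return label
    others = [c for c in range(k) if c != label]
    return int(rng.choice(others))

rng = np.random.default_rng(0)
k, eps = 10, 1.0                                  # e.g. CIFAR-10 labels, a high-privacy regime
labels = rng.integers(0, k, size=10_000)
noisy = np.array([randomized_response(int(y), k, eps, rng) for y in labels])
print("fraction of labels kept:", (noisy == labels).mean())
```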
Results
BlockRR achieves superior performance in high- and moderate-privacy regimes (ϵ ≤ 3.0), balancing overall accuracy and per-class accuracy while addressing class collapse and variance issues. In low-privacy regimes (ϵ ≥ 4.0), all methods converge to standard RR, with no additional performance loss. The framework demonstrates its utility in guiding practitioners to select appropriate mechanisms for specific scenarios.
Implications
BlockRR provides a unified and flexible framework for label differential privacy, simplifying the design and comparison of RR-type algorithms. Its ability to balance utility and privacy makes it suitable for applications in sensitive domains such as healthcare, finance, and user behavior analysis, where label privacy is critical. The framework's systematic benchmarking capability could also accelerate the development of new privacy-preserving algorithms.
View on arXiv

Cross-Temporal Attention Fusion (CTAF) for Multimodal Physiological Signals in Self-Supervised Learning

Arian Khorasani, Théophile Demazure
  • CTAF introduces a time-aware cross-attention mechanism to model asynchronous multimodal physiological signals.
  • The method uses alignment-regularized contrastive objectives to improve representation robustness and generalization.
  • CTAF achieves competitive performance on classification tasks while requiring fewer labeled samples.
  • Evaluation includes a novel protocol that measures alignment quality and cross-modal retrieval performance.
  • The approach explicitly addresses the coupling and temporal dynamics between central and autonomic nervous systems.
Abstract
This paper introduces Cross-Temporal Attention Fusion (CTAF), a self-supervised learning module designed to address the challenges of multimodal physiological signal fusion, particularly when modalities like EEG and peripheral physiology are asynchronous. Unlike traditional methods that rely on costly time-warping or assume strict synchrony, CTAF learns soft, bidirectional alignments between modalities using time-aware cross-attention mechanisms. The approach incorporates alignment-regularized contrastive objectives and redundancy-reduction techniques to ensure robust embeddings that generalize across subjects and tasks. Evaluated on the K-EmoCon dataset, CTAF demonstrates improved cross-modal token retrieval and alignment quality, while maintaining competitive performance on classification tasks with minimal labeled data. The method accounts for the inherent temporal asynchrony between central (EEG) and autonomic (e.g., EDA, ECG) systems, providing a label-efficient and generalizable solution for multimodal affect modeling.
Methodology
CTAF employs a self-supervised learning framework that combines time-aware cross-attention mechanisms, lightweight fusion gates, and alignment-regularized contrastive objectives. It learns soft cross-temporal correspondences between EEG and peripheral physiological signals, producing robust clip-level embeddings. The module is mask-aware for missing tokens and supports weak supervision when labels are available. Evaluation is conducted using subject-wise leave-one-out cross-validation on the K-EmoCon dataset, comparing against strong baselines.
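A stripped-down version of the fusion step is sketched below: EEG tokens attend to peripheral-physiology tokens through cross-attention with a key padding mask for missing tokens, followed by a lightweight sigmoid fusion gate. Dimensions are arbitrary, and the time-aware attention bias and contrastive objectives are omitted.

```python
# Mask-aware cross-attention between two modality streams plus a simple fusion gate
# (simplified stand-in for the CTAF module; shapes and the gate are assumptions).
import torch
import torch.nn as nn

d_model = 64
eeg = torch.randn(4, 50, d_model)           # (batch, EEG tokens, features)
periph = torch.randn(4, 30, d_model)        # (batch, peripheral tokens, features)
periph_missing = torch.zeros(4, 30, dtype=torch.bool)
periph_missing[:, 25:] = True               # last tokens missing -> masked out

cross_attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
fused, _ = cross_attn(query=eeg, key=periph, value=periph,
                      key_padding_mask=periph_missing)   # ignore missing peripheral tokens

fuse_gate = nn.Linear(2 * d_model, d_model)               # lightweight fusion gate
gate = torch.sigmoid(fuse_gate(torch.cat([eeg, fused], dim=-1)))
clip_embedding = (gate * fused + (1 - gate) * eeg).mean(dim=1)
print(clip_embedding.shape)                 # (4, 64) clip-level embedding
```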
Results
CTAF improves cross-modal token retrieval within a one-second tolerance and achieves higher cosine margins for matched pairs. It performs competitively on three-bin classification accuracy and macro-F1 scores, despite using fewer labeled samples. The method also demonstrates enhanced alignment quality and generalization across subjects.
Implications
CTAF has significant potential for applications in affective computing, emotion recognition, and health monitoring systems that rely on multimodal physiological signals. Its label-efficient design makes it suitable for scenarios with limited annotated data, while its ability to handle asynchronous signals could improve real-world usability in wearable and sensor-based technologies.
View on arXiv

Every Bit Counts: A Theoretical Study of Precision-Expressivity Tradeoffs in Quantized Transformers

Sayak Chakrabarti, Toniann Pitassi, Josh Alman
  • The paper establishes a fine-grained theoretical tradeoff between numerical precision and expressivity in quantized Transformers.
  • A one-layer softmax Transformer can compute certain functions with p bits of precision but fails with p−1 bits, demonstrating a sharp threshold for expressivity loss.
  • Tasks involving equality-like comparisons (e.g., exact match, membership checks) are especially sensitive to quantization.
  • The authors use a combination of finite-precision Transformer constructions and communication-complexity lower bounds to prove their results.
  • The findings provide practical guidance for choosing quantization levels based on task-specific requirements.
Abstract
This paper investigates the theoretical tradeoff between numerical precision and expressivity in quantized Transformer models, which are widely used in natural language processing and other domains. Quantization, a technique that reduces the numerical precision of computations, is commonly employed to improve the efficiency of Transformer models by reducing memory and computational costs. However, the impact of quantization on the expressivity of these models has not been well understood. The authors provide a theoretical framework to analyze this tradeoff, demonstrating that reducing precision by even a single bit can significantly impair the model's ability to compute certain functions. Specifically, they construct a function, Γ, inspired by the equality function, and prove that a one-layer softmax Transformer can compute Γ with p bits of precision but not with p−1 bits. This result highlights the sensitivity of tasks requiring equality-like comparisons to quantization. The paper combines explicit constructions of finite-precision Transformers with communication-complexity lower bounds to establish a tight 'one-bit' threshold for expressivity loss. The findings offer practical insights for selecting appropriate quantization levels based on task requirements, particularly for tasks involving exact matches or membership checks.
Methodology
The authors construct a specific function, Γ, inspired by the equality function, and analyze its computability under different levels of numerical precision in a one-layer softmax Transformer. They use theoretical tools from communication complexity to establish lower bounds on the precision required to compute Γ. The study considers both fixed-point and floating-point precision formats, analyzing their respective tradeoffs in terms of expressivity and numerical representation.
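The paper's construction Γ is not reproduced here, but the flavour of the one-bit threshold can be seen in a toy fixed-point example: two inputs that remain distinguishable with p fractional bits collide at p−1 bits, so any equality-like comparison built on them must fail.

```python
# Toy illustration (not the paper's construction): fixed-point rounding with p bits
# keeps two nearby values distinct, while p-1 bits maps them to the same code word.
def quantize(x: float, frac_bits: int) -> float:
    scale = 2 ** frac_bits
    return round(x * scale) / scale          # round to frac_bits fractional bits

a, b, p = 0.40625, 0.41015625, 8             # a and b differ by exactly 2**-8
print("p bits  :", quantize(a, p), quantize(b, p), "equal:", quantize(a, p) == quantize(b, p))
print("p-1 bits:", quantize(a, p - 1), quantize(b, p - 1), "equal:", quantize(a, p - 1) == quantize(b, p - 1))
```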
Results
The paper demonstrates that reducing precision by a single bit can cross a critical threshold where certain functions, such as Γ, become uncomputable by a Transformer. This establishes a tight 'one-bit' threshold for expressivity loss in quantized Transformers. The results concretely explain the empirical observation that quantization can lead to significant expressivity loss, particularly for tasks requiring equality-like comparisons.
Implications
The findings have practical implications for the deployment of quantized Transformers in real-world applications. They suggest that practitioners should carefully select the level of quantization based on the specific requirements of the task, particularly for tasks involving exact matches or membership checks. The results also pave the way for developing heuristics and guidelines to optimize the tradeoff between efficiency and expressivity in quantized models.
View on arXiv

Fubini Study Geometry of Representation Drift in High Dimensional Data

Arturo Tozzi
  • The Fubini–Study metric provides a projective geometric perspective for analyzing representation drift, isolating intrinsic changes from gauge-induced variability.
  • Conventional metrics like Euclidean and cosine distances overestimate drift in the presence of projective ambiguity.
  • A novel computable quantity is introduced to quantify the contribution of gauge-induced variability to representation drift.
  • The framework is tested on empirical datasets, demonstrating its robustness and applicability in distinguishing intrinsic evolution from parametrization artifacts.
  • This approach connects data analysis with established geometric principles, offering a systematic method for assessing representation stability.
Abstract
This paper introduces a novel geometric framework for analyzing representation drift in high-dimensional data using the Fubini–Study metric. Traditional metrics like Euclidean and cosine distances often conflate intrinsic changes in data representations with variations caused by arbitrary parametrizations, such as rescaling or sign flips. The Fubini–Study metric, grounded in projective geometry, addresses this limitation by measuring distances in terms of equivalence classes, isolating intrinsic changes while remaining invariant to gauge transformations. The authors construct representation trajectories from empirical datasets and compare cumulative drift using Euclidean, cosine, and Fubini–Study metrics. Their results demonstrate that conventional metrics systematically overestimate representation drift when projective ambiguity is present. The paper also introduces a computable quantity that quantifies the contribution of gauge-induced variability, providing a diagnostic tool for distinguishing meaningful structural evolution from parametrization artifacts. This approach bridges a conceptual gap in high-dimensional data analysis and offers a robust framework for assessing representation stability.
Methodology
The authors use the Fubini–Study metric to analyze representation drift by treating data representations as rays in projective space rather than points in a fixed vector space. Representation trajectories are constructed using a sliding-window procedure on the handwritten digits dataset from scikit-learn. Cumulative drift is computed along these trajectories using Euclidean, cosine, and Fubini–Study metrics. The differences between these metrics are analyzed to isolate gauge-induced variability.
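The metric itself is just the angle between rays, so its gauge invariance is easy to verify numerically; the short sketch below (with a random vector standing in for a representation) shows Euclidean and cosine distances reporting large drift under a rescaling and sign flip that the Fubini–Study distance correctly ignores.

```python
# Fubini-Study distance between representations treated as rays in projective space,
# compared with Euclidean and cosine distances under a pure gauge transformation.
import numpy as np

def fubini_study(u, v):
    cos = abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(cos, 0.0, 1.0))      # angle between equivalence classes

rng = np.random.default_rng(0)
x = rng.normal(size=64)
x_gauge = -2.5 * x                                 # same ray: rescaled and sign-flipped

print("Euclidean drift   :", np.linalg.norm(x - x_gauge))                                         # large
print("cosine drift      :", 1 - (x @ x_gauge) / (np.linalg.norm(x) * np.linalg.norm(x_gauge)))   # 2.0
print("Fubini-Study drift:", fubini_study(x, x_gauge))                                            # ~0
```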
Results
The study shows that the Fubini–Study metric effectively isolates intrinsic changes in data representations, while Euclidean and cosine distances systematically overestimate drift in the presence of projective ambiguity. The introduced computable quantity successfully quantifies the contribution of gauge-induced variability, providing a robust diagnostic tool for representation stability.
Implications
This framework has significant implications for high-dimensional data analysis, particularly in fields like representation learning, dimensionality reduction, and latent variable modeling. By distinguishing intrinsic evolution from parametrization artifacts, the approach enhances the interpretability and reliability of representation drift analyses. It also establishes a connection between data analysis and geometric principles, paving the way for more robust and invariant methods in machine learning workflows.
View on arXiv

Learning to Repair Lean Proofs from Compiler Feedback

Evan Wang, Simon Chess, Daniel Lee, Siyuan Ge, Ajit Mallavarapu, Vasily Ilin
  • The authors introduce APRIL, a large-scale dataset of 260,000 examples for Lean proof repair, including erroneous proofs, compiler diagnostics, corrected proofs, and natural-language explanations.
  • A systematic mutation pipeline generates realistic proof failures by modifying tactics, lines of code, and theorems, creating a diverse set of training examples.
  • Training language models on APRIL improves single-shot proof repair accuracy, with a finetuned 4B-parameter model achieving 27.4% accuracy compared to 1.1% for the base model.
  • The study emphasizes the value of feedback-conditioned supervision for improving the ability of neural theorem provers to interpret diagnostics and perform targeted repairs.
  • The dataset and methodology provide a foundation for further research in feedback-driven automated theorem proving and debugging in formal systems.
Abstract
This paper addresses the challenge of repairing erroneous Lean proofs using compiler feedback, a critical capability for advancing neural theorem provers. The authors introduce APRIL (Automated Proof Repair in Lean), a dataset of 260,000 examples that pairs erroneous proofs, compiler diagnostics, and corresponding corrected proofs. The dataset also includes natural-language diagnoses and fix suggestions grounded in the same feedback. APRIL is created by systematically mutating correct proofs to generate plausible errors and leveraging the Lean compiler to extract error messages and proof states. The authors demonstrate that training language models on APRIL significantly improves their ability to repair proofs and interpret compiler feedback. A finetuned 4B-parameter model achieves a 27.4% single-shot repair accuracy, outperforming both the base model and a strong open-source baseline. This work highlights the importance of feedback-conditioned supervision for iterative proof refinement and provides a valuable resource for advancing automated theorem proving.
Methodology
The authors construct the APRIL dataset by collecting correct proofs from public Lean datasets and systematically generating erroneous proofs through controlled mutations, such as substituting similar theorems, swapping tactics, and introducing plausible errors using language models. The Lean compiler is used to extract error messages, error lines, and goal states for these erroneous proofs. Language models are then trained on this dataset to predict corrected proofs and generate natural-language diagnoses based on compiler feedback. The models are evaluated in a single-shot repair setting without search or iteration.
Results
The finetuned 4B-parameter language model achieved a 27.4% single-shot repair accuracy, significantly outperforming the base model (1.1%) and a strong baseline (26.8%). This demonstrates the effectiveness of feedback-conditioned supervision in improving proof repair capabilities. The APRIL dataset enables controlled evaluation and analysis of models' ability to interpret diagnostics and perform targeted repairs.
Implications
This work has significant implications for advancing automated theorem proving by enabling models to iteratively refine proofs based on compiler feedback, mimicking human proof development processes. The APRIL dataset provides a valuable resource for training and evaluating models on feedback-driven proof repair, potentially improving the efficiency and reliability of formal verification systems in mathematics, software engineering, and other domains requiring formal reasoning.
View on arXiv

Membership Inference Attacks from Causal Principles

Mathieu Even, Clément Berenfeld, Linus Bleistein, Tudor Cebere, Julie Josse, Aurélien Bellet
  • The paper introduces a causal framework for Membership Inference Attacks (MIAs), defining memorization as the causal effect of including a data point in training.
  • It identifies key biases in existing MIA evaluation methods, such as interference in one-run methods and confounding in zero-run methods.
  • The authors propose practical estimators for causal MIA metrics, offering non-asymptotic consistency guarantees.
  • The approach is validated through experiments on synthetic data and CIFAR-10, demonstrating its reliability under distribution shifts.
  • This work provides a principled foundation for privacy evaluation in large-scale AI systems, especially when retraining is infeasible.
Abstract
This paper reframes the evaluation of Membership Inference Attacks (MIAs) as a causal inference problem, addressing key biases and limitations in existing methods. MIAs are used to assess privacy risks by determining whether a specific data point was part of a model's training set. Traditional evaluation methods, such as multi-run retraining, are computationally expensive, while more practical one-run and zero-run approaches suffer from statistical biases. The authors propose a novel causal framework to define memorization as the causal effect of including a data point in the training set. They identify and formalize sources of bias in existing protocols, such as interference in one-run methods and confounding in zero-run evaluations. The paper introduces causal analogues of standard MIA metrics, develops principled estimators with consistency guarantees, and validates their approach through experiments on synthetic data and CIFAR-10. This work provides a robust foundation for privacy evaluation in modern AI systems, particularly in scenarios where retraining is impractical or data access is limited.
Methodology
The authors adopt a causal inference perspective, using the potential outcomes framework to define causal counterparts to traditional MIA metrics. They analyze multi-run, one-run, and zero-run MIA evaluation methods, identifying sources of bias such as interference and confounding. They propose practical estimators for causal MIA metrics, leveraging tools like algorithmic stability to address challenges such as random interference. The approach is validated through experiments on synthetic data and CIFAR-10 datasets, demonstrating its effectiveness in addressing biases and providing reliable memorization measurements.
Results
The proposed causal estimators effectively address biases in one-run and zero-run MIA evaluations, providing reliable memorization measurements even under distribution shifts. Experiments on synthetic data and CIFAR-10 demonstrate the practical utility and robustness of the approach, showing that it can achieve consistent and accurate results without requiring computationally expensive retraining.
Implications
This work has significant implications for privacy evaluation in modern AI systems, particularly in scenarios where retraining is impractical or data access is restricted, such as with large language models deployed by major tech companies. By providing a principled and computationally efficient framework for MIA evaluation, the proposed methods can help organizations and regulators assess privacy risks, ensure compliance with data protection regulations, and address concerns about data memorization and leakage in machine learning models.
View on arXiv

Most Convolutional Networks Suffer from Small Adversarial Perturbations

Amit Daniely, Idan Mehalel
  • Adversarial examples exist in random CNNs with perturbations in ℓ2-distance of order ∥x∥/√d, which is near-optimal.
  • A single step of gradient descent is sufficient to find adversarial examples with small perturbations.
  • The authors use Fourier decomposition to bound the singular values of random linear convolutional operators, a key technical contribution.
  • The results improve upon prior work by providing tighter bounds and constructive methods for finding adversarial examples.
  • The analysis assumes constant depth and limited width growth in CNNs, which is a limitation compared to some prior studies.
Abstract
This paper investigates the susceptibility of convolutional neural networks (CNNs) to adversarial examples, which are small, carefully crafted perturbations to inputs that cause incorrect model predictions. Extending prior work on adversarial examples in fully connected networks, the authors prove that adversarial examples exist in random CNNs with perturbations in ℓ2-distance of order ∥x∥/√d, which is essentially the smallest possible. They also demonstrate that such adversarial examples can be efficiently found using a single step of gradient descent. The authors leverage Fourier decomposition to derive bounds on the singular values of random linear convolutional operators, which are central to CNN layers. While their results require certain constraints on network depth and width, the findings provide new insights into the theoretical understanding of adversarial vulnerabilities in CNNs.
Methodology
The authors analyze random CNNs using a theoretical framework based on Fourier decomposition to derive bounds on the singular values of random linear convolutional operators. They prove the existence of adversarial examples with near-optimal perturbation distances and demonstrate that these examples can be found using a single step of gradient descent. The analysis assumes a specific CNN architecture with constant depth and limited width growth, and considers a wide family of activation functions, excluding ReLU but including its smooth variants.
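The constructive part of the result, one gradient step with an ℓ2 budget of ∥x∥/√d, can be sketched as below on a randomly initialised CNN whose scalar output's sign is taken as the prediction; the architecture and the GELU activation (a smooth ReLU variant, since ReLU itself is excluded) are illustrative choices, and whether the sign actually flips on any particular draw is not guaranteed by this toy.

```python
# One-step gradient attack on a random CNN with perturbation norm ||x|| / sqrt(d)
# (illustrative setup; the paper's architectural assumptions are only loosely mimicked).
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.GELU(),
    nn.Conv2d(16, 16, 3, padding=1), nn.GELU(),
    nn.Flatten(), nn.Linear(16 * 32 * 32, 1),
)

x = torch.randn(1, 3, 32, 32, requires_grad=True)
d = x.numel()
eps = x.detach().norm() / d ** 0.5              # perturbation budget ||x|| / sqrt(d)

out = net(x)
out.backward()                                  # gradient of the scalar output w.r.t. the input
direction = -torch.sign(out.detach())           # push the output toward the opposite sign
x_adv = x.detach() + direction * eps * x.grad / x.grad.norm()

flipped = torch.sign(net(x_adv)).item() != torch.sign(out).item()
print("sign flipped:", flipped,
      "| relative perturbation:", (eps / x.detach().norm()).item())   # = 1/sqrt(d)
```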
Results
The paper establishes that adversarial examples in random CNNs can be found with perturbations of ℓ2-distance ∥x∥/√d, which is the smallest possible under the given assumptions. Furthermore, the authors show that these adversarial examples can be efficiently identified using a single gradient descent step. The results improve upon previous work by providing tighter bounds and a constructive method for finding adversarial examples.
Implications
The findings highlight the inherent vulnerability of CNNs to small adversarial perturbations, even under random initialization. This has implications for the robustness of CNNs in real-world applications, particularly in domains like image classification and security-sensitive tasks. The theoretical insights and methods introduced in this paper could inform the development of more robust architectures and adversarial defense strategies.
View on arXiv

Notes on the Reward Representation of Posterior Updates

Pedro A. Ortega
  • Posterior identification constrains reward-value interactions to conditional pointwise mutual information (PMI), providing a sharp representation structure.
  • Absolute rewards remain ambiguous without a baseline convention, highlighting the limits of reward identifiability from observed conditionals.
  • A coherence constraint ensures consistency across update directions, coupling reward parametrizations for different conditioning orders.
  • The study isolates the boundary case where KL-regularized updates are literal Bayesian posteriors, rather than metaphorical approximations.
  • The findings have implications for bounded rationality, maximum-entropy control, and generalized Bayesian frameworks.
Abstract
This paper explores the theoretical underpinnings of decision-making as inference, focusing on the specific case where KL-regularized soft updates align exactly with Bayesian posterior updates under a single probabilistic model. The study imposes structural constraints to identify when behavioral changes are driven solely by evidence transmitted through a genuine information channel. The author demonstrates that posterior updates can represent relative incentive signals (via conditional pointwise mutual information) but cannot uniquely determine absolute rewards without a baseline convention. Additionally, the paper introduces a coherence constraint requiring a single reusable value function across different update directions, ensuring consistency in reward parametrizations. These findings contribute to the broader understanding of inference-optimization unification, reward identifiability, and agent design in reinforcement learning and control systems.
Methodology
The paper employs a theoretical approach, deriving algebraic identities and constraints under the assumptions of posterior identification and coherence across update directions. It uses KL-regularized optimization objectives and conditional pointwise mutual information to analyze reward-value interactions within a fixed probabilistic model.
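The central identification step can be written out in a few lines (our notation, with the temperature set to one and the conditioning context suppressed): once the KL-regularized soft update is required to equal a Bayesian posterior, the reward is pinned down only up to an additive baseline, i.e. as a pointwise mutual information.

```latex
% Sketch of the identification argument described above (notation ours).
\begin{align*}
\pi^{*}(x) &\propto p(x)\, e^{\,r(x)} && \text{(KL-regularized soft update)} \\
\pi^{*}(x) &= p(x \mid o) = \frac{p(o \mid x)\, p(x)}{p(o)} && \text{(posterior identification)} \\
\Rightarrow\quad r(x) &= \log \frac{p(o \mid x)}{p(o)} + c = \operatorname{PMI}(x; o) + c,
&& \text{with } c \text{ an arbitrary baseline.}
\end{align*}
```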
Results
Posterior updates are represented as conditional PMI, linking reward-value interactions to evidence reweighting. Rewards are identified only up to context-specific baselines, leaving absolute reward levels ambiguous, and requiring a single value function across update directions imposes integrability constraints on reward parametrizations.
Implications
The work provides a formal framework for understanding reward representation in inference-based decision-making systems. It highlights the limits of reward identifiability, which could inform the design of reinforcement learning algorithms and bounded rationality models, and it offers insights into the unification of inference and optimization, potentially influencing agent design in control systems and artificial intelligence.
View on arXiv

Periodic Regularized Q-Learning

Hyukjun Yang, Han-Dong Lim, Donghwan Lee
  • The authors propose a new algorithm, Periodic Regularized Q-Learning (PRQ), which combines regularization and periodic updates to stabilize Q-learning under linear function approximation.
  • A regularized projected Bellman equation (RP-BE) and its associated value iteration (RP-VI) are formulated, ensuring contraction properties and stable convergence.
  • Theoretical analysis establishes finite-time convergence guarantees and sample complexity bounds for PRQ under both i.i.d. and Markovian observation models.
  • Empirical results show that both periodic updates and regularization are necessary for stable learning, with counterexamples provided to demonstrate failure when either component is removed.
  • The method avoids reliance on restrictive assumptions like truncation or strong local convexity, making it applicable to a wide range of RL problems.
Abstract
This paper introduces Periodic Regularized Q-Learning (PRQ), a novel reinforcement learning (RL) algorithm designed to address the instability of Q-learning under linear function approximation. Traditional Q-learning, while effective in tabular settings, struggles with convergence when combined with function approximation due to the 'deadly triad' of off-policy learning, bootstrapping, and approximation. To mitigate these issues, the authors propose a regularization approach at the level of the projection operator, leading to a contraction mapping in the projected Bellman operator. This is extended into a stochastic setting to develop PRQ, which incorporates periodic parameter updates alongside regularization. The authors provide rigorous theoretical guarantees, including finite-time convergence and sample complexity bounds, and demonstrate empirically that both periodic updates and regularization are essential for stable learning. The proposed method does not rely on truncation, projection onto a ball, or strong local convexity assumptions, making it broadly applicable.
Methodology
The authors introduce regularization at the projection operator level, creating a regularized projected value iteration (RP-VI) that ensures contraction. This is extended to a stochastic setting to develop the PRQ algorithm, which employs periodic parameter updates to separate inner convex optimization from outer Bellman updates. Theoretical analysis is conducted to prove finite-time convergence and sample complexity bounds, and empirical experiments validate the necessity of both periodic updates and regularization.
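A rough sketch of the periodic structure is given below, with linear features, a frozen target refreshed every T steps, and an L2-regularized inner TD objective; the step sizes, regularization strength, and toy environment are placeholder assumptions, not the authors' RP-VI construction.

```python
# Loose sketch of periodic, regularized Q-learning with linear function approximation:
# inner TD-style updates on w against a frozen target theta, refreshed every T steps.
# The environment and hyperparameters are placeholders.
import numpy as np

rng = np.random.default_rng(0)
n_features, n_actions, gamma, lam, lr, T = 8, 3, 0.95, 0.1, 0.05, 200

def phi(s, a):                                   # fixed random linear features phi(s, a)
    feat_rng = np.random.default_rng(hash((int(s), int(a))) % (2**32))
    return feat_rng.normal(size=n_features)

w = np.zeros(n_features)                         # inner-loop weights
theta = np.zeros(n_features)                     # periodically refreshed target weights
s = 0
for t in range(5000):
    a = rng.integers(n_actions)                  # off-policy behaviour: uniform actions
    r, s_next = rng.normal(), rng.integers(10)   # placeholder reward and transition
    target = r + gamma * max(phi(s_next, b) @ theta for b in range(n_actions))
    grad = (phi(s, a) @ w - target) * phi(s, a) + lam * w    # regularized inner objective
    w -= lr * grad
    if (t + 1) % T == 0:                         # periodic outer (Bellman) update
        theta = w.copy()
    s = s_next
```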
Results
Theoretical results demonstrate finite-time convergence guarantees and sample complexity bounds for PRQ under linear function approximation. Empirical experiments confirm that the combination of periodic updates and regularization is crucial for stable learning, with counterexamples showing failure when either component is removed.
Implications
The proposed PRQ algorithm has significant implications for reinforcement learning, particularly in stabilizing Q-learning under function approximation. Its theoretical guarantees and broad applicability make it a promising approach for real-world RL problems, including robotics, game playing, and autonomous systems, where stability and convergence are critical.
View on arXiv

Prompt Augmentation Scales up GRPO Training on Mathematical Reasoning

Wenquan Lu, Hai Huang, Randall Balestriero
  • Introduces 'prompt augmentation,' a training strategy that mixes diverse reasoning templates and formats to enhance rollout diversity.
  • Addresses the entropy collapse issue in RL post-training, enabling stable and prolonged training without KL regularization.
  • Achieves state-of-the-art performance on mathematical reasoning benchmarks using the Qwen2.5-Math-1.5B model.
  • Demonstrates that prompt augmentation stabilizes training even in low-entropy regimes, allowing for extended training durations (up to 50 epochs).
  • Highlights the potential of diverse reasoning formats to improve reasoning diversity and prevent overfitting to a single reasoning style.
Abstract
This paper addresses the challenges of entropy collapse and training instability in reinforcement learning (RL) post-training of large language models (LLMs) for mathematical reasoning. The authors propose a novel training strategy called 'prompt augmentation,' which introduces diverse reasoning templates and formats during training to increase rollout diversity. This approach eliminates the need for KL regularization, enabling stable and prolonged training even in low-entropy regimes. Using the Qwen2.5-Math-1.5B model and training on the MATH Level 3–5 dataset, the authors demonstrate state-of-the-art performance on multiple mathematical reasoning benchmarks, including AIME24, AMC, MATH500, Minerva, and OlympiadBench. The method achieves significant improvements in both per-benchmark and per-question accuracy, outperforming existing GRPO-based methods. The paper highlights the effectiveness of prompt augmentation in stabilizing RL training, enhancing reasoning diversity, and enabling sustained policy improvement.
Methodology
The authors propose prompt augmentation, which involves mixing multiple reasoning templates (e.g., tagged reasoning-answer separation, free-form generation, chain-of-thought prompting, and reflection-based formats) within a single training run. Each template is paired with specific format rewards to ensure adherence during training. This approach eliminates the need for KL regularization and enables stable training over extended horizons. The method is evaluated using the Qwen2.5-Math-1.5B model on the MATH Level 3–5 dataset and several mathematical reasoning benchmarks.
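The mechanics amount to sampling a reasoning template per prompt and attaching a template-specific format reward; the templates, regular-expression check, and reward values in the sketch below are illustrative stand-ins rather than the paper's exact prompts.

```python
# Sketch of prompt augmentation: mix several reasoning templates within one training run
# and score completions with a template-specific format reward (illustrative templates).
import random
import re

TEMPLATES = {
    "tagged": ("Solve the problem. Put your reasoning in <think></think> and the final "
               "answer in <answer></answer>.\n{question}",
               lambda out: bool(re.search(r"<think>.*</think>.*<answer>.*</answer>", out, re.S))),
    "cot": ("Solve the problem step by step, then give the final answer after 'Answer:'.\n{question}",
            lambda out: "Answer:" in out),
    "free_form": ("{question}", lambda out: len(out.strip()) > 0),
}

def augment(question: str, rng=random):
    name = rng.choice(list(TEMPLATES))                     # rollout diversity via mixed templates
    prompt, checker = TEMPLATES[name]
    return name, prompt.format(question=question), checker

def format_reward(checker, completion: str) -> float:
    return 1.0 if checker(completion) else 0.0             # added to the task reward during GRPO

name, prompt, checker = augment("What is 17 * 24?")
print("sampled template:", name)
print("tagged reward:", format_reward(TEMPLATES["tagged"][1], "<think>...</think><answer>408</answer>"))
```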
Results
The proposed method achieves state-of-the-art performance on mathematical reasoning benchmarks, including 44.5% per-benchmark accuracy and 51.3% per-question accuracy. It outperforms both vanilla GRPO and DAPO baselines in terms of accuracy and training stability. Prompt augmentation enables training to continue for up to 50 epochs, significantly longer than the 5–20 epochs typically seen in prior work.
Implications
The findings suggest that prompt augmentation can be a powerful tool for improving the reasoning capabilities of LLMs in mathematical and other structured reasoning tasks. By stabilizing RL training and enhancing reasoning diversity, this approach could be applied to other domains such as coding, medical diagnosis, and tabular reasoning, where diverse reasoning paths are critical for performance.
View on arXiv

QuAIL: Quality-Aware Inertial Learning for Robust Training under Data Corruption

Mattia Sabella, Alberto Archetti, Pietro Pinoli, Matteo Matteucci, Cinzia Cappiello
  • QuAIL introduces a quality-aware training mechanism that integrates feature reliability priors into the learning process.
  • The method employs a learnable feature-modulation layer and a quality-dependent proximal regularizer to stabilize optimization under data corruption.
  • QuAIL eliminates the need for explicit data cleaning or instance-level reweighting, making it more practical for real-world applications.
  • Empirical evaluations on 50 datasets show consistent performance improvements over standard baselines under both random and value-dependent corruption.
  • The approach is particularly effective in low-data and systematically biased scenarios, demonstrating robust behavior across diverse settings.
Abstract
The paper introduces QuAIL (Quality-Aware Inertial Learning), a novel training mechanism designed to improve the robustness of machine learning models when dealing with tabular datasets affected by non-uniform data corruption. Unlike traditional approaches that rely on explicit data cleaning or instance-level quality annotations, QuAIL incorporates feature-level quality priors directly into the optimization process. It achieves this through a learnable feature-modulation layer combined with a quality-dependent proximal regularizer, which selectively constrains updates to unreliable features while allowing reliable ones to adapt freely. This approach stabilizes training under structured corruption without requiring data repair or sample reweighting. Empirical evaluations on 50 classification and regression datasets demonstrate that QuAIL consistently outperforms standard neural baselines and curriculum-based methods, particularly in scenarios with limited data or systematic biases. The results highlight the effectiveness of treating data quality as an integral component of the learning process, offering a practical solution for resilient tabular learning in real-world settings.
Methodology
QuAIL incorporates a learnable gating layer that modulates feature contributions based on quality priors. A proximal anchor regularizer selectively slows updates for low-quality features, introducing resistance to parameter drift. This allows reliable features to adapt freely while constraining unreliable ones. The method is evaluated on 50 classification and regression datasets under various corruption protocols, including random and value-dependent corruption.
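The two ingredients can be sketched as a gate initialised from per-feature quality priors plus a proximal penalty that anchors low-quality features more strongly; where exactly the anchor is applied and how strongly are our assumptions, not the paper's exact formulation.

```python
# Sketch of a quality-aware gate with a quality-weighted proximal anchor
# (simplified reading of QuAIL; penalty placement and strength are assumptions).
import torch
import torch.nn as nn

class QualityGate(nn.Module):
    def __init__(self, quality: torch.Tensor):
        super().__init__()
        self.register_buffer("quality", quality)      # per-feature reliability prior in [0, 1]
        self.gate = nn.Parameter(quality.clone())      # learnable modulation, initialised at the prior

    def forward(self, x):
        return x * torch.sigmoid(self.gate)            # down-weight unreliable features

    def proximal_penalty(self, strength: float = 1.0):
        # Low-quality features get a heavier anchor, so their gates drift more slowly.
        return (strength * (1.0 - self.quality) * (self.gate - self.quality) ** 2).sum()

quality = torch.tensor([1.0, 1.0, 0.2, 0.5])           # e.g. the third feature is heavily corrupted
model = nn.Sequential(QualityGate(quality), nn.Linear(4, 1))
x, y = torch.randn(64, 4), torch.randn(64, 1)
loss = nn.functional.mse_loss(model(x), y) + model[0].proximal_penalty(strength=0.1)
loss.backward()
```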
Results
QuAIL consistently outperformed standard neural baselines and curriculum-based methods across 50 datasets. It demonstrated robust performance gains under both random and value-dependent corruption, with particularly strong results in low-data and systematically biased settings. The method effectively stabilized training without requiring explicit data cleaning or sample reweighting.
Implications
QuAIL provides a practical and effective solution for improving the robustness of machine learning models in real-world tabular data scenarios, where data corruption and quality issues are common. Its ability to integrate feature reliability into the optimization process could reduce the need for costly data cleaning pipelines and enhance the deployment of machine learning systems in domains such as healthcare, finance, and sensor-based applications.
View on arXiv

SAGE-5GC: Security-Aware Guidelines for Evaluating Anomaly Detection in the 5G Core Network

Cristian Manca, Christian Scano, Giorgio Piras, Fabio Brau, Maura Pintor, Battista Biggio
  • Introduces SAGE-5GC, a framework for security-aware evaluation of anomaly detection in 5G Core networks.
  • Highlights the limitations of existing evaluation practices, such as reliance on IID data and static threat models.
  • Proposes a genetic algorithm-based optimization strategy to craft adversarial samples that evade detection.
  • Demonstrates that adversarial attacks can significantly degrade detection performance in realistic 5G scenarios.
  • Emphasizes the need for robust, adversarially informed evaluation methodologies for real-world deployments.
Abstract
This paper addresses the challenges of evaluating machine learning-based anomaly detection systems in the 5G Core (5GC) network, particularly under realistic and adversarial conditions. The authors propose SAGE-5GC, a set of Security-Aware Guidelines for Evaluating anomaly detection systems in the 5GC network. These guidelines incorporate domain knowledge of 5G protocols and explicitly account for adversarial threats. Using a realistic 5GC dataset, the study evaluates the baseline performance of various anomaly detection models against standard cyberattacks targeting PFCP-based control-plane services. The evaluation is then extended to adversarial scenarios, where attackers manipulate traffic features to evade detection while preserving the functionality of malicious activities. The authors introduce a model-agnostic optimization strategy based on genetic algorithms to craft adversarial samples. Experimental results reveal that adversarial attacks can significantly degrade the performance of anomaly detection systems, even those that perform well under conventional evaluation settings. The findings emphasize the importance of adopting security-aware and adversarially informed evaluation methodologies for robust anomaly detection in real-world 5G deployments.
Methodology
The authors evaluate several anomaly detection algorithms on a realistic 5G Core dataset, first under standard cyberattack scenarios and then under adversarial conditions. They use a genetic algorithm-based optimization strategy to craft adversarial samples by manipulating attacker-controllable features while preserving the functionality of malicious traffic. The robustness of the anomaly detection models is assessed by analyzing their sensitivity to both random and optimized adversarial perturbations.
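The attack loop is a plain genetic algorithm over the attacker-controllable features, as sketched below against a stand-in anomaly score; the detector, feature indices, and GA settings are toy placeholders, not the paper's configuration.

```python
# Toy genetic-algorithm evasion: evolve only the attacker-controllable features of a
# malicious flow so that a stand-in anomaly score drops (placeholder detector and features).
import numpy as np

rng = np.random.default_rng(0)

def anomaly_score(x):                          # stand-in detector: distance to a "benign" profile
    return np.linalg.norm(x - np.ones_like(x), axis=-1)

controllable = np.array([0, 2])                # indices the attacker may change (e.g. rates, sizes)
x_malicious = np.array([4.0, 9.0, 5.0, 1.0])   # original attack-flow features

pop = np.tile(x_malicious, (50, 1))
pop[:, controllable] += rng.normal(scale=0.5, size=(50, len(controllable)))

for _ in range(100):
    fittest = pop[np.argsort(anomaly_score(pop))[:10]]        # keep the most evasive candidates
    children = fittest[rng.integers(0, 10, size=40)].copy()
    children[:, controllable] += rng.normal(scale=0.2, size=(40, len(controllable)))   # mutate
    pop = np.vstack([fittest, children])

best = pop[np.argmin(anomaly_score(pop))]
print("score before:", anomaly_score(x_malicious), "after:", anomaly_score(best))
```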
Results
The study shows that adversarially crafted attacks can substantially degrade the performance of anomaly detection systems, even for models that perform well under conventional evaluation settings. This highlights the vulnerability of machine learning-based detectors to adversarial manipulation in realistic 5G network environments.
Implications
The findings underscore the necessity of adopting security-aware and adversarially informed evaluation frameworks for anomaly detection in 5G Core networks. The proposed SAGE-5GC guidelines can help researchers and practitioners design more robust and reliable anomaly detection systems, enhancing the security of 5G networks in real-world deployments. The genetic algorithm-based adversarial attack strategy also provides a practical tool for stress-testing detection models against adaptive threats.
View on arXiv

Scaled Dot-Product Attention implements projection of inputs onto a common surface

Terence D. Sanger
  • Scaled Dot-Product Attention (SDPA) is reinterpreted as a projection operation onto a common surface defined by input vectors.
  • This reformulation aligns SDPA with principles of signal processing and dynamic systems theory, providing a mathematically rigorous foundation.
  • The projection-based view highlights SDPA's ability to capture time-dependent, context-sensitive nonlinear dependencies in data.
  • The proposed framework simplifies the mathematical analysis of SDPA and suggests potential extensions for time-series applications.
  • Experimental results show that projection-based SDPA performs equivalently to standard SDPA in a translation task, with no loss of expressive power.
Read More
Abstract
This paper provides a novel mathematical interpretation of Scaled Dot-Product Attention (SDPA), a core mechanism in Transformer-based models. The author demonstrates that SDPA can be reformulated as a projection operation, where input vectors are projected onto a common surface defined by their relationships. This reformulation offers a new perspective on SDPA, moving away from the traditional 'query-key-value' framework and instead interpreting it as a mechanism for discovering time-dependent, context-sensitive nonlinear dependencies in input data. The projection-based view simplifies the mathematical understanding of SDPA, aligns it with principles of signal processing, and suggests potential extensions for time-series data. The paper also explores the implications of this interpretation for language modeling, where SDPA is seen as modifying token embeddings based on a local context surface. Experimental results on a Spanish-to-English translation task confirm that the projection-based SDPA performs equivalently to standard SDPA, while offering a more interpretable framework.
Methodology
The author reformulates the SDPA equation by expressing it as a projection operation. This involves rewriting the attention mechanism in terms of Gaussian-weighted distances between input vectors, rather than the traditional dot-product formulation. The reformulated equations are tested by modifying standard Transformer code to replace the original SDPA with the projection-based version. The performance of both versions is compared on a Spanish-to-English translation task using the Tatoeba dataset.
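A small numeric illustration of the distance-based reading (not the paper's exact derivation): when keys share a common norm, Gaussian-weighted squared distances and scaled dot products yield identical softmax weights, because the per-query and per-key norm terms cancel inside the softmax.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
Q = rng.normal(size=(5, d))
K = rng.normal(size=(7, d))
K /= np.linalg.norm(K, axis=1, keepdims=True)   # equal-norm keys
V = rng.normal(size=(7, d))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Standard scaled dot-product attention weights.
W_dot = softmax(Q @ K.T / np.sqrt(d))

# Gaussian-weighted squared distances: exp(-||q - k||^2 / (2 sqrt(d))), normalized.
dist2 = ((Q[:, None, :] - K[None, :, :]) ** 2).sum(-1)
W_dist = softmax(-dist2 / (2 * np.sqrt(d)))

print(np.allclose(W_dot, W_dist))   # True: norm terms are constant per row/column and cancel
print((W_dot @ V).shape)            # attention output, computed as usual
```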
Results
The projection-based SDPA was shown to be mathematically equivalent to the standard formulation, neither gaining nor losing expressive power. On a Spanish-to-English translation task, it matched the performance of standard SDPA, confirming its validity as an alternative formulation.
Implications
This work provides a new mathematical perspective on SDPA, which could lead to better theoretical understanding and potential extensions for applications involving time-series data. The projection-based interpretation may also inspire new architectures or modifications to attention mechanisms in machine learning models, particularly in contexts where local context and nonlinear dependencies are critical.
View on arXiv

Soft Sensor for Bottom-Hole Pressure Estimation in Petroleum Wells Using Long Short-Term Memory and Transfer Learning

M. A. Fernandes, E. Gildin, M. A. Sampaio
  • The paper introduces a soft sensor for BHP estimation using LSTM networks, addressing the limitations of physical PDGs.
  • Transfer Learning is applied to adapt models across different operational environments, enhancing generalizability.
  • The methodology achieves high accuracy (MAPE < 2%) on real-world offshore datasets, outperforming traditional methods like Multi-Layer Perceptron (MLP) and Ridge Regression.
  • The solution is cost-effective and applicable across diverse reservoir and flow conditions, particularly in steady-state scenarios.
  • The soft sensor can be integrated into digital twin systems for continuous monitoring and anomaly detection.
Read More
Abstract
This paper addresses the challenge of estimating flowing bottom-hole pressure (BHP) in petroleum wells, a critical variable for production optimization, safety, and emissions reduction. Permanent Downhole Gauges (PDGs), which provide real-time pressure data, are often unreliable or economically unfeasible, especially in mature fields or low-productivity wells. To overcome this limitation, the authors propose a machine learning-based soft sensor leveraging Long Short-Term Memory (LSTM) networks and Transfer Learning. The soft sensor uses topside and wellhead measurements to estimate BHP, providing a cost-effective alternative to physical sensors. The methodology is tested on real offshore datasets from Brazil's Pre-salt basin and achieves high accuracy, with Mean Absolute Percentage Error (MAPE) consistently below 2%. This approach is particularly suited for steady-state flow conditions and can be integrated into digital twin systems for anomaly detection and error monitoring.
Methodology
The authors developed a data-driven soft sensor using LSTM networks to estimate BHP based on topside and wellhead measurements. Transfer Learning was employed to adapt the model to different operational environments. The approach was compared against benchmarks such as Multi-Layer Perceptron (MLP) and Ridge Regression, and tested on real offshore datasets from Brazil's Pre-salt basin.
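An illustrative PyTorch sketch of the two-stage recipe: train an LSTM regressor on windows of topside/wellhead measurements, then transfer it to a new well by freezing the recurrent layers and fine-tuning only the output head. Feature count, window length, and layer sizes are assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn

class BHPSoftSensor(nn.Module):
    def __init__(self, n_features=6, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                 # x: (batch, window, n_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])      # predict BHP from the last time step

def fit(model, x, y, params, epochs=50, lr=1e-3):
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()

# Stage 1: train on the source well (synthetic placeholders for real measurements).
x_src, y_src = torch.randn(256, 30, 6), torch.randn(256, 1)
model = BHPSoftSensor()
fit(model, x_src, y_src, model.parameters())

# Stage 2: transfer to a new well with little data -- freeze the LSTM, tune the head.
x_new, y_new = torch.randn(32, 30, 6), torch.randn(32, 1)
for p in model.lstm.parameters():
    p.requires_grad = False
fit(model, x_new, y_new, model.head.parameters(), epochs=20)
```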
Results
The proposed LSTM-based soft sensor achieved a Mean Absolute Percentage Error (MAPE) consistently below 2%, outperforming traditional methods like MLP and Ridge Regression. The model demonstrated robustness and adaptability across diverse operational environments, validating its effectiveness in real-world scenarios.
Implications
This work provides a cost-effective and accurate alternative to physical sensors for BHP estimation, reducing reliance on expensive PDGs and wireline interventions. The methodology has broad applicability in the petroleum industry, particularly for mature fields and low-productivity wells. Additionally, it can be integrated into digital twin systems for enhanced monitoring, anomaly detection, and operational optimization.
View on arXiv

Step-Wise Refusal Dynamics in Autoregressive and Diffusion Language Models

Eliron Rahimi, Elad Hirshel, Rom Himelstein, Amit LeVi, Avi Mendelson, Chaim Baskin
  • The paper introduces a framework for analyzing step-wise refusal dynamics in autoregressive (AR) models and diffusion language models (DLMs), highlighting the role of sampling strategies in safety behavior.
  • The Step-Wise Refusal Internal Dynamics (SRI) signal is proposed as an interpretable safety representation that captures internal recovery dynamics.
  • Diffusion sampling enables iterative correction of harmful intermediate states, unlike AR sampling, which cannot revise harmful content once generated.
  • SRI signals can identify harmful generations through incomplete internal recovery, even when such issues are not observable at the text level.
  • Lightweight inference-time detectors based on SRI outperform existing defenses while reducing inference overhead by over 100×.
Read More
Abstract
This paper investigates the differences in refusal behavior and robustness to jailbreak attacks between autoregressive (AR) and diffusion language models (DLMs). The authors introduce a novel analytical framework to study step-wise refusal dynamics, revealing that the sampling strategy itself, rather than the underlying learned representations, plays a critical role in safety behavior. To address this, the authors propose the Step-Wise Refusal Internal Dynamics (SRI) signal, which provides an interpretable safety representation by analyzing step-wise generation trajectories. The SRI signal identifies anomalous behavior, such as harmful generations, as cases of incomplete internal recovery that are not observable at the text level. The paper demonstrates that SRI can be used to develop lightweight inference-time detectors that generalize to unseen attacks while maintaining high performance and significantly reducing computational overhead. This work provides new insights into the structural differences between AR and diffusion sampling mechanisms and their impact on safety and robustness.
Methodology
The authors developed an analytical framework to compare step-wise refusal dynamics in AR and DLMs by evaluating identical model weights under different sampling strategies. They introduced the SRI signal, which analyzes step-wise generation trajectories to capture internal safety dynamics. The SRI signal was used to train lightweight inference-time detectors to identify harmful generations and evaluate their performance against existing defenses.
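A hedged sketch of the detector side only: assuming a per-step refusal signal has already been extracted from each generation trajectory (the SRI construction itself is the paper's contribution and is not reproduced here), simple trajectory summaries feed a lightweight classifier. The synthetic data and feature choices below are placeholders.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Placeholder data: 200 generations x 32 decoding steps of a scalar refusal signal.
# "Harmful" trajectories (label 1) are simulated as ones that dip and never recover.
safe = np.clip(np.cumsum(rng.normal(0.05, 0.1, (100, 32)), axis=1), 0, 1)
harmful = np.clip(np.cumsum(rng.normal(-0.02, 0.1, (100, 32)), axis=1), -1, 1)
trajectories = np.vstack([safe, harmful])
labels = np.r_[np.zeros(100), np.ones(100)]

def featurize(traj):
    # Simple trajectory summaries: final level, amount of recovery, largest swing.
    return np.stack([traj[:, -1],
                     traj[:, -1] - traj.min(axis=1),
                     traj.max(axis=1) - traj.min(axis=1)], axis=1)

clf = LogisticRegression().fit(featurize(trajectories), labels)
print(clf.score(featurize(trajectories), labels))   # lightweight inference-time detector
```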
Results
The study found that diffusion sampling allows for iterative correction of harmful intermediate states, making DLMs more robust to jailbreak attacks compared to AR models. The SRI signal effectively identified harmful generations through incomplete internal recovery, even when such behavior was not observable at the text level. Detectors based on SRI generalized to unseen attacks and matched or outperformed existing defenses while reducing inference overhead by over 100×.
Implications
This work has significant implications for improving the safety and robustness of language models. The SRI signal provides a lightweight and interpretable tool for detecting harmful generations, which could be integrated into real-world applications to enhance the reliability of both AR and diffusion-based language models. Additionally, the findings highlight the potential of diffusion sampling for safer and more controllable text generation, paving the way for further research into hybrid or diffusion-based approaches for language modeling.
View on arXiv

TabPFN for Zero-shot Parametric Engineering Design Generation

Ke Wang, Yifan Tang, Nguyen Gia Hien Vu, Faez Ahmed, G. Gary Wang
  • Proposes a zero-shot generative framework for parametric engineering design using TabPFN, eliminating the need for retraining or fine-tuning.
  • Achieves conditional design generation by sequentially predicting design parameters based on target performance indicators.
  • Demonstrates competitive performance on three engineering design datasets, achieving low performance errors (e.g., <2% for ship hull designs).
  • Significantly reduces computational overhead and data requirements compared to diffusion-based generative models.
  • Supports flexible conditioning and partial design completion, enabling practical integration into engineering workflows.
Read More
Abstract
This paper introduces a novel zero-shot generative framework for parametric engineering design using TabPFN, a Prior-Data Fitted Network originally developed for tabular data. The proposed approach enables conditional design generation without task-specific training or fine-tuning, addressing limitations of traditional deep generative models such as high computational cost, large dataset requirements, and lack of adaptability to new design tasks. By sequentially generating design parameters conditioned on target performance indicators, the method provides a flexible and efficient alternative to conventional generative models like diffusion models. The framework is evaluated on three engineering design datasets—ship hull design, BlendedNet aircraft, and UIUC airfoil—and demonstrates competitive performance in terms of design diversity, robustness to parameter dimensionality, and low performance error. The results highlight the potential of this zero-shot, data-efficient approach for real-world engineering workflows, enabling rapid deployment and adaptation to new design settings.
Methodology
The method leverages TabPFN, a pre-trained Prior-Data Fitted Network, to generate design parameters sequentially, conditioning each parameter on the target performance indicators and the parameters already generated, in an autoregressive manner inspired by Recurrent Neural Networks (RNNs). The framework uses only a small reference dataset as in-context data and requires no additional training. It is evaluated against diffusion-based generative models on three datasets, focusing on diversity, robustness, and performance accuracy.
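A sketch of the sequential, zero-shot generation loop, assuming the `tabpfn` package's scikit-learn-style `TabPFNRegressor` (as in TabPFN v2); the reference data, parameter ordering, and conditioning scheme are placeholders rather than the authors' exact pipeline.

```python
import numpy as np
from tabpfn import TabPFNRegressor

rng = np.random.default_rng(0)

# Small reference dataset: performance indicators and corresponding design parameters.
y_perf = rng.normal(size=(200, 2))            # e.g., two performance targets per design
X_design = rng.normal(size=(200, 5))          # e.g., five parametric design variables

target = np.array([[0.3, -0.1]])              # desired performance for the new design
design = []
for j in range(X_design.shape[1]):
    # Condition each parameter on the targets plus the parameters generated so far.
    context = np.hstack([y_perf, X_design[:, :j]])
    query = np.hstack([target, np.array(design).reshape(1, -1)]) if design else target
    model = TabPFNRegressor()
    model.fit(context, X_design[:, j])        # in-context "fit"; no gradient training
    design.append(float(model.predict(query)[0]))

print("generated design parameters:", np.round(design, 3))
```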
Results
The proposed framework achieves competitive diversity across structured parametric design spaces, remains robust to variations in sampling resolution and parameter dimensionality, and achieves low performance errors (e.g., <2% for ship hull designs). It outperforms diffusion-based generative models in terms of computational efficiency and data requirements while maintaining reliable generation performance.
Implications
This zero-shot generative framework has significant potential for real-world engineering design workflows. It enables rapid deployment, flexible adaptation to new design tasks, and seamless integration into existing processes. The approach could accelerate design cycles, reduce computational costs, and expand the applicability of generative models in engineering domains such as aerospace, automotive, and product design.
View on arXiv

Trajectory Consistency for One-Step Generation on Euler Mean Flows

Zhiqi Li, Yuchen Sun, Duowen Chen, Jinjin He, Bo Zhu
  • Euler Mean Flows (EMF) introduces a trajectory-consistent framework for one-step and few-step generation using a linearized semigroup formulation.
  • The proposed surrogate loss enables direct supervision of long-range flow maps without explicit gradient computations, reducing memory and computational costs.
  • EMF supports both u-prediction and x1-prediction variants, enhancing flexibility in generative modeling tasks.
  • Experiments show improved optimization stability and sample quality under fixed sampling budgets.
  • Training time and memory consumption are reduced by approximately 50% compared to existing one-step generation methods.
Read More
Abstract
This paper introduces Euler Mean Flows (EMF), a novel flow-based generative framework designed for efficient one-step and few-step generation. EMF addresses the challenge of enforcing trajectory consistency in flow-based models, which is critical for ensuring coherent long-range dynamics in generative processes. The authors propose a linearized surrogate loss derived from the semigroup formulation of flow maps, enabling direct supervision of long-range flow-map compositions without requiring explicit gradient computations. This approach significantly reduces memory and computational overhead while improving optimization stability. EMF supports both u-prediction and x1-prediction variants, providing flexibility in generative modeling tasks. Experimental evaluations on image synthesis, particle-based geometry generation, and functional generation demonstrate that EMF achieves competitive sample quality while reducing training time and memory consumption by approximately 50% compared to existing one-step generation methods.
Methodology
The authors derive a linearized surrogate loss from the semigroup property of flow maps, enabling direct supervision of long-range trajectory consistency. The surrogate avoids explicit Jacobian computations, so training requires no differentiation through composed flow maps. The method is inspired by Euler time integration and supports both u-prediction and x1-prediction variants for generative modeling tasks.
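One way to picture a semigroup-style consistency objective — offered as an interpretation under stated assumptions, not the paper's surrogate loss or parameterization — is to require a single long jump of the learned flow map to match the composition of two shorter jumps, with the composed target detached so nothing is differentiated through twice. In a full trainer this term would be paired with a standard flow-matching/data objective.

```python
import torch
import torch.nn as nn

class FlowMap(nn.Module):
    """Predicts the state reached by flowing x from time t to time s (Euler-style jump)."""
    def __init__(self, dim=2, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 2, hidden), nn.SiLU(),
                                 nn.Linear(hidden, dim))

    def forward(self, x, t, s):
        h = torch.cat([x, t, s], dim=-1)
        return x + (s - t) * self.net(h)      # one Euler-like update over the interval

model = FlowMap()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for _ in range(100):
    x0 = torch.randn(256, 2)                          # noise endpoints
    t = torch.zeros(256, 1)
    r = torch.ones(256, 1)
    s = torch.rand(256, 1) * (r - t) + t              # random intermediate time
    # Long jump t -> r in one step ...
    long_jump = model(x0, t, r)
    # ... should agree with the composition of two shorter jumps (detached target),
    # so no Jacobian of the network is propagated through the composition.
    with torch.no_grad():
        mid = model(x0, t, s)
        target = model(mid, s, r)
    # Consistency term only; a full trainer would add a flow-matching/data term.
    loss = ((long_jump - target) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```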
Results
EMF demonstrates improved optimization stability and sample quality across tasks such as image synthesis, particle-based geometry generation, and functional generation. The framework achieves competitive generative performance while reducing training time and memory consumption by approximately 50% compared to existing one-step methods.
Implications
The proposed EMF framework has the potential to enhance the efficiency and scalability of generative modeling across various domains, including image synthesis, 3D geometry modeling, and functional generation. Its reduced computational requirements and improved stability make it suitable for applications in resource-constrained environments and large-scale generative tasks.
View on arXiv

UNSO: Unified Newton Schulz Orthogonalization

Chen Hu, Qianxi Zhao, Yuming Li, Mingyu Zhou, Xiyin Li
  • UNSO replaces the iterative structure of traditional Newton-Schulz methods with a unified framework, reducing computational overhead.
  • The method introduces a polynomial with learnable coefficients, optimized to improve convergence and stability.
  • UNSO eliminates insignificant terms in the matrix power expansion, further enhancing efficiency.
  • The approach achieves superior performance compared to existing NS-based methods, both in terms of stability and computational cost.
  • The method is applicable to a wide range of tasks requiring orthogonalization, such as neural networks and Riemannian optimization.
Read More
Abstract
This paper introduces Unified Newton-Schulz Orthogonalization (UNSO), a novel framework designed to improve the efficiency and stability of the Newton-Schulz (NS) iteration, a widely used method for orthogonalization in machine learning and optimization tasks. Traditional NS iterations suffer from inefficiencies due to repeated matrix multiplications and instability under certain conditions. UNSO addresses these issues by consolidating the iterative structure into a single unified operation, avoiding the computational overhead of iterative steps. The authors analyze the contributions of different matrix powers in the NS iteration, eliminate insignificant terms, and propose a polynomial with learnable coefficients to optimize performance. This approach not only simplifies the computation but also ensures stable convergence. The proposed method outperforms existing NS-based techniques in terms of efficiency and accuracy, as demonstrated through theoretical analysis and empirical results. The authors provide an open-source implementation of UNSO for further research and application.
Methodology
The authors propose a unified framework that consolidates the iterative steps of the Newton-Schulz method into a single operation. They analyze the role of each matrix power in the NS iteration, remove negligible terms, and introduce a polynomial with learnable coefficients. The coefficients are optimized to ensure stable convergence and efficient computation. The input matrix is preprocessed by scaling its singular values into the range (0, 1), and the unified operation is applied to achieve orthogonalization.
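An illustrative PyTorch sketch of the "single unified polynomial" idea: scale the input so its singular values lie in (0, 1), apply one odd matrix polynomial with learnable coefficients, and fit those coefficients so outputs are approximately orthogonal. The polynomial degree, initialization, and training objective here are assumptions, not the authors' choices.

```python
import torch

def unified_ns(X, coeffs):
    # Odd polynomial in X: c0*X + c1*X(X^T X) + c2*X(X^T X)^2 + ...
    X = X / X.norm()                       # Frobenius norm >= spectral norm, so sigma in (0, 1)
    A = X.T @ X
    out, P = coeffs[0] * X, X
    for c in coeffs[1:]:
        P = P @ A
        out = out + c * P
    return out

coeffs = torch.nn.Parameter(torch.tensor([3.0, -3.0, 1.0]))   # NS-flavored initialization
opt = torch.optim.Adam([coeffs], lr=1e-2)
for _ in range(500):
    X = torch.randn(64, 32)
    Y = unified_ns(X, coeffs)
    # Fit the coefficients so Y^T Y is close to the identity (near-orthogonal columns).
    loss = (Y.T @ Y - torch.eye(32)).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():
    Y = unified_ns(torch.randn(64, 32), coeffs)
    print((Y.T @ Y - torch.eye(32)).abs().max())   # residual orthogonality error after fitting
```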
Results
UNSO demonstrates significant improvements in efficiency and stability compared to traditional NS iterations and their variants. The proposed method reduces computational complexity by avoiding repeated matrix multiplications and achieves stable convergence across various scenarios. Empirical evaluations show that UNSO outperforms existing methods in terms of both accuracy and runtime.
Implications
UNSO has broad implications for tasks requiring efficient and stable orthogonalization, such as neural network training, Riemannian optimization, and other machine learning applications. By reducing computational overhead and improving stability, UNSO can enable faster and more reliable optimization in large-scale systems.
View on arXiv

Unveiling Covert Toxicity in Multimodal Data via Toxicity Association Graphs: A Graph-Based Metric and Interpretable Detection Framework

Guanzong Wu, Zihao Zhu, Siwei Lyu, Baoyuan Wu
  • Introduces Toxicity Association Graphs (TAGs) to model semantic associations and detect both overt and covert toxicity in multimodal data.
  • Proposes the Multimodal Toxicity Covertness (MTC) metric to quantify hidden toxicity in multimodal expressions.
  • Develops the Covert Toxic Dataset (CTD), the first benchmark dataset for high-covertness toxic multimodal content.
  • Demonstrates superior performance of the proposed framework compared to existing methods in detecting both overt and covert toxicity.
  • Ensures interpretability and transparency in toxicity detection through explicit reasoning pathways.
Read More
Abstract
This paper addresses the challenge of detecting covert toxicity in multimodal data, where harmful meanings emerge from subtle associations between modalities such as text and images. The authors propose a novel framework based on Toxicity Association Graphs (TAGs) to systematically model semantic associations between benign and toxic concepts. They introduce a new metric, Multimodal Toxicity Covertness (MTC), to quantify the degree of hidden toxicity in multimodal expressions. Additionally, the authors construct the Covert Toxic Dataset (CTD), the first benchmark dataset specifically designed to evaluate high-covertness toxic multimodal instances. Extensive experiments demonstrate that the proposed approach outperforms existing methods in detecting both overt and covert toxicity while maintaining interpretability and transparency in decision-making. This work represents a significant advancement in explainable multimodal toxicity detection and provides a foundation for future research in context-aware and interpretable AI systems.
Methodology
The authors propose a graph-based framework leveraging Toxicity Association Graphs (TAGs) to model semantic relationships between benign and toxic concepts in multimodal data. They introduce the Multimodal Toxicity Covertness (MTC) metric to quantify the degree of hidden toxicity. The framework is validated using the newly constructed Covert Toxic Dataset (CTD), which contains high-covertness toxic multimodal instances. Extensive experiments are conducted to compare the proposed method with existing approaches in terms of accuracy and interpretability.
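A hypothetical illustration of the graph-based reasoning (the concepts, graph, and scoring rule are invented and are not the paper's TAG construction or MTC definition): concepts extracted from an image-text pair are looked up in an association graph, and a covertness-style score grows with how many association hops separate them from a toxic concept.

```python
import networkx as nx

# Toy association graph: benign concepts linked to toxic ones directly or via bridges.
tag = nx.Graph()
tag.add_edges_from([
    ("benign_A", "toxic_X"),            # direct association: overt, 1 hop
    ("benign_B", "bridge_concept"),
    ("bridge_concept", "toxic_Y"),      # indirect association: covert, 2 hops
])
toxic = {"toxic_X", "toxic_Y"}

def covertness(concepts, max_hops=3):
    """0 = no toxic association found; otherwise higher = more indirect (more covert)."""
    hops = [nx.shortest_path_length(tag, c, t)
            for c in concepts if c in tag
            for t in toxic if nx.has_path(tag, c, t)]
    if not hops:
        return 0.0
    return min(hops) / max_hops          # normalize by the hop budget

print(covertness(["benign_B"]))          # reaches a toxic node in 2 hops -> more covert
print(covertness(["benign_A"]))          # direct association -> less covert
```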
Results
The proposed framework outperforms existing methods in detecting both overt and covert toxicity across various datasets, including the newly introduced Covert Toxic Dataset. The Multimodal Toxicity Covertness (MTC) metric effectively quantifies hidden toxicity, and the framework provides interpretable and transparent reasoning pathways for its decisions.
Implications
This work advances the field of multimodal toxicity detection by addressing the challenge of identifying covert toxicity, which has been underexplored in prior research. The proposed framework and dataset can be used to develop more robust and explainable AI systems for content moderation, online safety, and combating harmful content on social media platforms. The interpretability of the framework also enhances trust and accountability in automated decision-making systems.
View on arXiv