AI-generated summaries
Today's ML research, without the noise.
Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.
24 papers today · 8h update frequency · 7 days of history
Borrowed Geometry: Computational Reuse of Frozen Text-Pretrained Transformer Weights Across Modalities
Multimodal
Robotics
Large Language Models
- Frozen text-pretrained transformer weights can be reused across different modalities.
- Significant performance gains were observed in robotic manipulation tasks using the frozen weights.
- The study identifies specific attention heads that are crucial for task performance across modalities.
- The methodology demonstrates that pretrained weights can serve as a general computational substrate.
Summary
This paper investigates the potential of reusing frozen weights from a text-pretrained transformer model, specifically the Gemma 4 31B, across different modalities without modification. The author proposes that these frozen weights can serve as a general computational substrate that can be adapted for tasks outside of their original training domain. The study employs a thin trainable interface to facilitate this transfer. The results demonstrate significant performance improvements in various tasks, including robotic manipulation and decision-making in continuous control environments. The findings suggest that the pretrained weights carry transferable computational primitives that can be effectively utilized in non-text domains, supporting the hypothesis that deep learning architectures can generalize across different types of data. The paper also highlights the importance of specific attention heads in the model that are critical for task performance, indicating a structured approach to understanding the model's capabilities.
Methodology
The research employs a frozen version of the Gemma 4 31B transformer model pretrained on text data, utilizing a thin trainable interface to adapt the model for various tasks. The experiments include performance evaluations on robotic manipulation tasks and decision-making scenarios, with a focus on analyzing the contributions of specific attention heads through dual-measurement protocols.
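A minimal PyTorch sketch of this setup, assuming a generic frozen backbone wrapped in linear input/output adapters; the class name `FrozenBackboneAdapter`, the dimensions, and the stand-in `nn.TransformerEncoder` are illustrative choices, not details from the paper (which freezes a text-pretrained Gemma model):

```python
import torch
import torch.nn as nn

class FrozenBackboneAdapter(nn.Module):
    """Thin trainable interface around a frozen pretrained backbone."""

    def __init__(self, backbone, in_dim, model_dim, out_dim):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False                  # reuse the weights, never update them
        self.encode = nn.Linear(in_dim, model_dim)   # new modality -> backbone space
        self.decode = nn.Linear(model_dim, out_dim)  # backbone space -> task output

    def forward(self, x):                            # x: (batch, seq, in_dim)
        # Gradients still flow *through* the frozen backbone into the encoder.
        return self.decode(self.backbone(self.encode(x)))

# Stand-in backbone; the paper instead loads a frozen text-pretrained transformer.
layer = nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True)
model = FrozenBackboneAdapter(nn.TransformerEncoder(layer, num_layers=2),
                              in_dim=32, model_dim=256, out_dim=7)
opt = torch.optim.Adam((p for p in model.parameters() if p.requires_grad), lr=1e-4)
states = torch.randn(4, 16, 32)                      # e.g. robot proprioception tokens
actions = model(states)                              # (4, 16, 7)
```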
Results
The paper reports a +4.33 point improvement over the state-of-the-art in a robotic manipulation task, achieving parity with Decision-Transformer models while using significantly fewer trainable parameters. Additionally, a linear interface around the frozen weights achieved a per-bit error rate of 0.0505 in associative recall tasks, outperforming a transformer trained from scratch by an 8.7× margin.
Implications
The findings suggest that pretrained transformer models can be effectively adapted for a wide range of tasks beyond their original training scope, potentially reducing the need for extensive retraining and resource investment. This approach could lead to more efficient model deployment in various applications, particularly in robotics and decision-making systems.
What Physics do Data-Driven MoCap-to-Radar Models Learn?
Interpretability
- Introduction of a physics-based interpretability framework for MoCap-to-radar models.
- Development of two metrics to evaluate physical consistency without requiring ground truth radar data.
- Demonstration that low reconstruction error does not guarantee physical consistency.
- Identification of temporal attention as a critical factor for transformer models in learning physics.
Summary
This paper investigates whether data-driven models that convert motion capture (MoCap) data into radar micro-Doppler spectrograms genuinely learn the underlying physics of radar signals. The authors introduce a physics-based interpretability framework that employs two complementary metrics: one assesses the alignment of model predictions with physics-derived Doppler frequencies, while the other evaluates the preservation of the velocity-frequency relationship under velocity interventions. The study reveals that low reconstruction error does not necessarily indicate physical consistency, as some models with similar error rates can perform differently on the proposed metrics. Additionally, the research highlights the importance of temporal attention in transformer-based models for capturing the underlying physics of the data. The findings suggest that while data-driven models can produce visually plausible results, they may not always reflect a true understanding of the physical principles governing radar signal generation.
Methodology
The authors propose a physics-based interpretability framework that utilizes two metrics to assess the physical consistency of MoCap-to-radar models. These metrics compare model predictions against physics-derived Doppler frequencies and evaluate the preservation of the velocity-frequency relationship. The framework is applied across various model architectures, including transformer-based models, to analyze their performance in terms of physical understanding.
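The first metric can be sketched as follows, assuming a monostatic radar so the two-way Doppler shift is f_D = 2·v_r/λ; the wavelength, array shapes, and the `doppler_alignment` helper are illustrative assumptions rather than the paper's exact protocol:

```python
import numpy as np

WAVELENGTH = 0.0086  # metres; a ~35 GHz radar is an assumption for illustration

def physics_doppler(radial_velocity_mps):
    """Two-way Doppler shift f_D = 2 * v_r / lambda for a monostatic radar."""
    return 2.0 * radial_velocity_mps / WAVELENGTH

def doppler_alignment(pred_spectrogram, freq_axis_hz, radial_velocity_mps):
    """Mean absolute gap between the spectrogram's dominant frequency per frame
    and the physics-derived Doppler frequency from MoCap radial velocities."""
    dominant = freq_axis_hz[np.argmax(pred_spectrogram, axis=0)]   # (frames,)
    target = physics_doppler(radial_velocity_mps)                  # (frames,)
    return np.mean(np.abs(dominant - target))

# Toy check: an output whose energy tracks the physics line scores close to
# zero (up to frequency-bin quantization), regardless of reconstruction error.
freqs = np.linspace(-2000, 2000, 256)
v = np.linspace(-1.0, 1.0, 100)                                    # radial velocity per frame
spec = np.exp(-0.5 * ((freqs[:, None] - physics_doppler(v)[None, :]) / 50.0) ** 2)
print(doppler_alignment(spec, freqs, v))
```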
Results
The experiments reveal that models achieving low reconstruction error do not necessarily exhibit physical consistency. Some models demonstrate significant discrepancies in the proposed metrics despite comparable Mean Absolute Errors (MAE). Furthermore, ablation studies indicate that transformer models without temporal attention struggle to achieve physical consistency, underscoring its importance in learning the underlying physics.
Implications
The findings have significant implications for the development of data-driven models in radar applications, particularly in enhancing the interpretability and reliability of synthesized micro-Doppler signatures. This research could lead to improved methodologies for human activity recognition, health monitoring, and other radar-based sensing applications by ensuring that models not only produce plausible outputs but also adhere to the underlying physical laws.
A Dirac-Frenkel-Onsager principle: Instantaneous residual minimization with gauge momentum for nonlinear parametrizations of PDE solutions
Optimization
Theory
- Introduces the Dirac-Frenkel-Onsager principle to address non-uniqueness in parameter dynamics.
- Utilizes a history variable interpreted as momentum to promote smooth parameter evolution.
- Maintains the instantaneous residual minimization property of the Dirac-Frenkel principle.
- Demonstrates increased robustness in challenging computational regimes.
Summary
This paper introduces the Dirac-Frenkel-Onsager (DFO) principle, which addresses the challenges of non-unique parameter dynamics and ill-conditioning in the context of nonlinear parametrizations of partial differential equations (PDEs). The authors interpret the non-uniqueness of parameter dynamics as gauge freedom, allowing for the selection of better-conditioned parameter velocities. By incorporating a history variable that acts as a momentum term, the DFO principle promotes smooth temporal evolution of parameters while maintaining the instantaneous residual minimization characteristic of the Dirac-Frenkel principle. This approach contrasts with traditional regularization methods that may introduce bias. The authors demonstrate the effectiveness of the DFO principle through examples, showing increased robustness in singular and near-singular regimes, thereby enhancing the stability of neural network-based PDE solvers.
Methodology
The authors leverage the Dirac-Frenkel variational principle to minimize the PDE residual instantaneously while addressing the non-uniqueness of parameter dynamics by interpreting it as gauge freedom. They introduce a history variable that optimally updates parameter velocities using Onsager’s principle, ensuring smooth transitions and preserving the optimality condition.
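A caricature of the idea in NumPy, assuming the gauge freedom is resolved by pulling the parameter velocity toward the momentum variable within the set of (near-)minimizers of the residual; `dfo_velocity`, the tiny regularizer `mu`, and the exponential-average momentum update are illustrative choices, not the paper's exact scheme:

```python
import numpy as np

def dfo_velocity(J, f, momentum, mu=1e-8):
    """Gauge-fixed parameter velocity.

    Among (near-)minimizers of ||J v - f||^2 -- the Dirac-Frenkel condition --
    pick the one closest to the momentum/history variable. As mu -> 0 the
    residual minimization is untouched; mu only resolves the gauge freedom
    in the null space of J, instead of biasing toward zero as plain
    Tikhonov regularization would.
    """
    n = J.shape[1]
    A = J.T @ J + mu * np.eye(n)
    b = J.T @ f + mu * momentum
    return np.linalg.solve(A, b)

# Rank-deficient Jacobian: column 2 is redundant, so the velocity is non-unique.
J = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
f = np.array([1.0, 2.0])
m = np.zeros(3)
for step in range(5):
    v = dfo_velocity(J, f, m, mu=1e-6)
    m = 0.9 * m + 0.1 * v        # smooth history variable, interpreted as momentum
    print(step, v)
```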
Results
The proposed DFO dynamics successfully mitigate issues of ill-conditioning and non-uniqueness, leading to smoother and more stable parameter evolutions in neural network-based PDE solvers. The examples provided in the paper illustrate the robustness of the DFO approach in singular and near-singular scenarios.
Implications
The DFO principle has significant implications for the development of more reliable and efficient neural network models for solving PDEs, particularly in high-dimensional and complex systems. It may enhance the performance of various applications in computational physics, engineering, and other fields that rely on accurate PDE solutions.
Caracal: Causal Architecture via Spectral Mixing
NLP
Large Language Models
Efficient ML
- Caracal introduces a Multi-Head Fourier (MHF) module that replaces traditional attention mechanisms, achieving O(L log L) complexity.
- The architecture employs frequency-domain causal masking to enforce autoregressive capabilities, addressing a critical barrier for Fourier-based models.
- Caracal eliminates the need for explicit positional encodings by leveraging the inherent properties of the Fourier Transform.
- The model demonstrates competitive performance against Transformer and SSM baselines while maintaining portability and ease of implementation.
Summary
The paper introduces Caracal, a novel architecture designed to enhance the scalability of Large Language Models (LLMs) for long sequences by addressing the quadratic cost of attention and the limitations of positional encodings. Caracal replaces the traditional attention mechanism with a Multi-Head Fourier (MHF) module, which operates with O(L log L) complexity, leveraging the Fast Fourier Transform (FFT) for efficient sequence mixing. The authors propose a frequency-domain causal masking technique that enforces autoregressive capabilities through asymmetric padding and truncation, overcoming previous challenges faced by Fourier-based generative models. Unlike other efficient models that depend on hardware-specific implementations, Caracal utilizes standard library operators, ensuring robust portability. The architecture retains a small number of attention layers constrained to a sliding window mechanism, allowing for local feature extraction while maintaining global positional information. Evaluations show that Caracal performs competitively with state-of-the-art Transformer and State Space Model (SSM) baselines, providing a scalable and efficient approach for long-sequence modeling.
Methodology
The authors developed the Multi-Head Fourier (MHF) module to mix token information in the frequency domain, replacing the attention layer in Transformers. They implemented a frequency-domain causal masking technique to ensure autoregressive capabilities and retained a limited number of attention layers for local feature extraction, all while ensuring the architecture's overall computational efficiency.
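The padding-and-truncation trick behind frequency-domain causal masking can be shown in isolation. This is a generic causal global convolution via FFT, not the full MHF module; the per-head filter shapes below are assumptions:

```python
import torch

def causal_fft_mix(x, kernel):
    """Causal global convolution in O(L log L) via FFT.

    Zero-padding to 2L turns the FFT's circular convolution into a linear one,
    and truncating back to the first L outputs removes any leakage from the
    future: output t depends only on inputs <= t.
    """
    L = x.shape[-1]
    n = 2 * L                                     # pad so circular conv becomes linear
    X = torch.fft.rfft(x, n=n)
    K = torch.fft.rfft(kernel, n=n)
    return torch.fft.irfft(X * K, n=n)[..., :L]   # truncation enforces causality

# Sanity check: perturbing a future position never changes earlier outputs.
torch.manual_seed(0)
x = torch.randn(1, 8, 64)                         # (batch, heads, length)
k = torch.randn(1, 8, 64)                         # one learned mixing filter per head
y0 = causal_fft_mix(x, k)
x2 = x.clone(); x2[..., 40] += 10.0
y1 = causal_fft_mix(x2, k)
assert torch.allclose(y0[..., :40], y1[..., :40], atol=1e-5)
```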
Results
Caracal achieved performance that is competitive with state-of-the-art Transformer and SSM baselines across various benchmarks. Its O(L log L) complexity is more efficient than the O(L^2) of traditional Transformers and slightly above the O(L) scaling of SSMs, while avoiding hardware-specific optimization challenges.
Implications
Caracal's architecture offers a scalable and efficient pathway for long-sequence modeling in various applications, particularly in natural language processing and generative tasks, while ensuring ease of deployment across different hardware environments.
AlphaInventory: Evolving White-Box Inventory Policies via Large Language Models with Deployment Guarantees
Large Language Models
Reinforcement Learning
Optimization
- AlphaInventory integrates large language models with reinforcement learning for evolving inventory policies.
- The framework provides statistical safety guarantees for policy deployment in dynamic environments.
- A theoretical interface connects training, inference, and deployment, characterizing performance gaps.
- Empirical results show AlphaInventory outperforms traditional and deep learning inventory strategies.
Summary
This paper presents AlphaInventory, an innovative framework that leverages large language models (LLMs) to evolve inventory policies in dynamic, non-stationary environments. The authors address the limitations of existing inventory decision-making methods, which typically fall into one of two camps: interpretable classical theory or black-box data-driven approaches. AlphaInventory combines the interpretability of white-box methods with the adaptability of data-driven techniques. The framework utilizes reinforcement learning to train LLMs on historical demand data and additional features, generating statistically safe inventory policies for future deployment. A key contribution is the establishment of a theoretical interface that links training, inference, and deployment, allowing for the quantification of the performance gap between evolved policies and an oracle-safe benchmark. The framework is tested on both synthetic and real-world retail data, demonstrating superior performance compared to classical and deep learning methods. The results indicate that AlphaInventory not only improves upon existing benchmarks but also provides a pathway for discovering new inventory strategies, enhancing both academic research and practical applications in inventory management.
Methodology
The authors developed an end-to-end framework that trains a large language model using reinforcement learning, incorporating both demand data and additional features. The framework employs a confidence interval-based iterative reasoning and screening process to propose and evaluate candidate strategies, ensuring the final output is a white-box inventory policy with deployment guarantees.
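A toy rendering of the screening step, assuming candidate policies are base-stock levels evaluated on historical demand under linear holding and lost-sales costs; the cost model, the Gaussian confidence interval, and the `screen` function are stand-ins for the paper's procedure:

```python
import numpy as np

def cost(base_stock, demand, h=1.0, p=5.0):
    """Per-period cost of a base-stock policy: holding h, lost-sales penalty p."""
    leftover = np.maximum(base_stock - demand, 0.0)
    lost = np.maximum(demand - base_stock, 0.0)
    return h * leftover + p * lost

def screen(candidates, demand, benchmark_level, z=1.96):
    """Keep candidates whose cost upper confidence bound beats the benchmark.

    A crude stand-in for confidence-interval screening: a policy is treated as
    safe to deploy only if even its pessimistic cost estimate wins.
    """
    bench = cost(benchmark_level, demand).mean()
    safe = []
    for s in candidates:
        c = cost(s, demand)
        ucb = c.mean() + z * c.std(ddof=1) / np.sqrt(len(c))
        if ucb < bench:
            safe.append((s, c.mean()))
    return bench, safe

rng = np.random.default_rng(0)
demand = rng.poisson(20, size=500).astype(float)   # assumed historical demand
print(screen(candidates=[18, 22, 25, 30], demand=demand, benchmark_level=40))
```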
Results
AlphaInventory demonstrated superior performance on both synthetic and real-world retail datasets, outperforming classical inventory policies and deep learning methods. The framework successfully evolved new inventory strategies that improved upon existing benchmarks, particularly in a single-sourcing lost-sales system scenario.
Implications
The findings suggest that AlphaInventory can significantly enhance inventory management practices across various sectors by providing interpretable, adaptable, and statistically safe policies. The framework's ability to discover new inventory strategies may also contribute to advancing inventory theory and practices in operations management.
Learning physically grounded traffic accident reconstruction from public accident reports
Multimodal
- Introduces a multimodal learning framework for traffic accident reconstruction from public reports.
- Develops the CISS-REC dataset with 6,217 real-world accident cases.
- Achieves improved reconstruction fidelity, including accident point accuracy and collision consistency.
- Demonstrates the potential of using public accident reports as scalable data for traffic safety analysis.
Summary
This paper addresses the challenge of reconstructing traffic accidents using publicly available accident reports, which are often narrative in nature and lack detailed physical measurements. The authors propose a novel framework that formulates accident reconstruction as a parameterized multimodal learning problem. They introduce the CISS-REC dataset, comprising 6,217 real-world accident cases sourced from the NHTSA Crash Investigation Sampling System. The framework connects the semantics of accident reports to road topology and participant attributes, enabling the reconstruction of pre-impact vehicle trajectories and collision dynamics through localized geometric reasoning and temporal allocation. The proposed method demonstrates superior performance compared to existing baselines, achieving high fidelity in accident point accuracy and collision consistency. The findings suggest that public accident reports can be leveraged as scalable resources for quantitatively verifiable accident reconstruction, offering significant potential for enhancing traffic safety analysis, simulation, and autonomous driving research.
Methodology
The authors formulate accident reconstruction as a weakly supervised multimodal inverse problem, where accident reports serve as supervisory signals. The framework grounds report semantics to road topology and participant interactions, allowing for the reconstruction of vehicle motion prior to impact through geometric reasoning and temporal allocation.
Results
The proposed method outperforms representative baselines on the CISS-REC dataset, achieving the highest overall reconstruction fidelity. This includes significant improvements in the accuracy of accident point localization and the consistency of collision dynamics, demonstrating the effectiveness of the approach.
Implications
The ability to reconstruct traffic accidents from public reports has important implications for traffic safety analysis, enabling more effective simulations and research in autonomous driving. This approach could facilitate better understanding of accident dynamics and inform policy and design decisions in transportation systems.
Uniform-Correct Policy Optimization: Breaking RLVR's Indifference to Diversity
Reinforcement Learning
Optimization
Large Language Models
- Identifies a fundamental limitation in RLVR objectives regarding probability mass distribution among correct solutions.
- Proposes Uniform-Correct Policy Optimization (UCPO) to address diversity collapse in RLVR.
- Theoretically characterizes the optimal policy structure using robustness and entropy-regularized optimality criteria.
- Demonstrates significant improvements in Pass@K and diversity metrics without compromising Pass@1 performance.
Summary
This paper addresses a critical limitation in Reinforcement Learning with Verifiable Rewards (RLVR), where improvements in single-attempt accuracy (Pass@1) often lead to a decline in multi-sample coverage (Pass@K), indicating a collapse in diversity. The authors identify that common RLVR objectives, such as GRPO, do not specify how probability mass should be distributed among correct solutions, resulting in a self-reinforcing collapse where probability mass concentrates on a narrow subset of correct outputs. To tackle this issue, they propose Uniform-Correct Policy Optimization (UCPO), which introduces a conditional uniformity penalty to encourage a more uniform distribution of probability mass among correct solutions. The paper theoretically characterizes the optimal policy structure under robustness and entropy-regularized optimality criteria, identifying the Uniform-Correct Policy as the unique optimal solution. Empirical results demonstrate that UCPO significantly improves Pass@K and diversity while maintaining competitive Pass@1 across various models and benchmarks, achieving notable improvements in performance metrics.
Methodology
The authors analyze the limitations of existing RLVR objectives and propose UCPO, which incorporates a uniformity penalty to redistribute gradient signals towards underrepresented correct solutions. They validate their findings through theoretical analysis and empirical experiments across multiple models and reasoning benchmarks.
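A sketch of what such an objective could look like, assuming the conditional uniformity penalty is realized as the variance of sequence log-probabilities among verifier-approved samples; UCPO's actual penalty may differ, and `ucpo_loss` and its shapes are illustrative:

```python
import torch

def ucpo_loss(logps, rewards, lam=0.1):
    """GRPO-style group objective plus a conditional uniformity penalty.

    logps:   summed log-probabilities of G sampled solutions, shape (G,)
    rewards: verifier output in {0, 1}, shape (G,)
    The penalty pushes the log-probabilities of the *correct* samples toward a
    common value, so mass spreads across distinct correct solutions instead of
    collapsing onto one mode.
    """
    adv = rewards - rewards.mean()                     # group-relative advantage
    pg = -(adv.detach() * logps).mean()                # policy-gradient term
    correct = logps[rewards > 0.5]
    uniformity = correct.var() if correct.numel() > 1 else logps.sum() * 0.0
    return pg + lam * uniformity

logps = torch.tensor([-10.0, -14.0, -11.0, -30.0], requires_grad=True)
rewards = torch.tensor([1.0, 1.0, 1.0, 0.0])
ucpo_loss(logps, rewards).backward()
print(logps.grad)   # correct-but-unlikely samples get pulled up relative to the mode
```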
Results
UCPO consistently improves Pass@K and diversity metrics while maintaining competitive Pass@1 performance. Notably, it achieves up to +10% absolute improvement on AIME24 at Pass@64 and up to 45% higher equation-level diversity within the correct set.
Implications
The findings suggest that UCPO can enhance the performance of RLVR systems in reasoning tasks, potentially leading to more robust and diverse outputs in applications such as mathematical reasoning and code generation.
Deep Kernel Learning for Stratifying Glaucoma Trajectories
Time Series
NLP
Multimodal
- Introduction of a hybrid architecture combining clinical-BERT embeddings with a DKL algorithm for predicting glaucoma patient trajectories.
- Identification of three clinically distinct patient subgroups, emphasizing the importance of trajectory over current disease severity.
- Demonstration of superior performance compared to standard time-series forecasting methods.
- Provision of calibrated uncertainty estimates to support clinical decision-making.
Summary
This paper addresses the challenge of stratifying patient risk in chronic diseases, specifically glaucoma, using a novel deep kernel learning (DKL) architecture that integrates a Gaussian Process (GP) backend. The proposed method utilizes a transformer-based feature extractor applied to clinical-BERT embeddings to analyze multimodal electronic health records (EHRs). The model successfully identifies three distinct patient subgroups based on their trajectories, revealing that a high-risk group can exhibit a worsening trajectory despite better average visual acuity than a group that is stable but severely impaired. This decoupling of disease progression from current severity highlights the model's ability to predict future risks rather than merely assessing current states. The approach offers a significant advancement in clinical decision support, enabling targeted interventions for high-risk patients and improving glaucoma management. The methodology effectively handles the irregularity and sparsity of EHR data, outperforming traditional forecasting methods while providing calibrated uncertainty estimates, which can enhance clinical monitoring and intervention strategies.
Methodology
The authors developed a hybrid deep kernel learning architecture that processes EHR natural language features through clinical-BERT embeddings and employs a transformer-encoder feature extractor. This architecture is designed to predict patient visual acuity loss while addressing issues of data irregularity, sparsity, and high dimensionality without relying on imputation techniques.
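The deep-kernel construction itself is standard and can be sketched in plain PyTorch: an exact GP whose RBF kernel acts on learned features. The MLP feature extractor, dimensions, and hyperparameters below are placeholders for the paper's transformer-over-clinical-BERT pipeline:

```python
import torch
import torch.nn as nn

class DeepKernelGP(nn.Module):
    """Exact GP regression with an RBF kernel on learned features (deep kernel)."""

    def __init__(self, in_dim, feat_dim=8):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                      nn.Linear(64, feat_dim))
        self.log_ls = nn.Parameter(torch.zeros(()))      # kernel lengthscale
        self.log_noise = nn.Parameter(torch.tensor(-2.0))

    def kernel(self, a, b):
        d2 = torch.cdist(a, b).pow(2)
        return torch.exp(-0.5 * d2 / torch.exp(self.log_ls) ** 2)

    def nll(self, x, y):
        """Negative log marginal likelihood -log N(y | 0, K + sigma^2 I)."""
        z = self.features(x)
        K = self.kernel(z, z) + torch.exp(self.log_noise) * torch.eye(len(x))
        dist = torch.distributions.MultivariateNormal(torch.zeros(len(x)), K)
        return -dist.log_prob(y)

x, y = torch.randn(50, 16), torch.randn(50)             # toy EHR features / outcomes
model = DeepKernelGP(in_dim=16)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss = model.nll(x, y); loss.backward(); opt.step()
```

Because the predictive distribution is a GP, calibrated uncertainty estimates fall out of the same machinery rather than being bolted on afterwards.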
Results
The proposed method achieved better hold-out test metrics on the SOURCE dataset compared to traditional recurrent neural networks and transformer-based forecasting methods. It successfully identified three clinically meaningful patient subgroups and provided calibrated uncertainty estimates, achieving an accuracy of 53.06% within 0.1 logMAR.
Implications
The findings suggest that the model can significantly enhance clinical decision support by identifying high-risk patients based on their progression trajectories, leading to more targeted interventions. The methodology's ability to quantify uncertainty in predictions can improve monitoring strategies for glaucoma and potentially other chronic conditions characterized by irregular follow-up and heterogeneous trajectories.
CRADIPOR: Crash Dispersion Predictor
Theory
Optimization
- CRADIPOR addresses the issue of numerical dispersion in automotive crash simulations.
- The proposed method combines Rank Reduction Autoencoder with supervised classification.
- RRAE outperforms Random Forest in identifying regions sensitive to numerical dispersion.
- Wavelet-based and slope-based signal representations are most effective for classification.
Summary
The paper introduces CRADIPOR, a numerical tool designed to predict crash dispersion in automotive simulations. Traditional finite element (FE) crash models often yield inconsistent results due to numerical dispersion caused by parallel computations and model complexity. This inconsistency complicates engineering decisions, as performance criteria can vary significantly. The authors propose a solution that utilizes a Rank Reduction Autoencoder (RRAE) combined with supervised classification to identify areas sensitive to numerical dispersion without the need for repeated simulations. The study demonstrates that the RRAE framework outperforms a Random Forest baseline in detecting numerical dispersion. Among various signal representations tested, wavelet-based and slope-based inputs were found to be particularly effective, with slope variations yielding the best classification performance. The findings highlight the potential of structured latent representations to enhance numerical dispersion detection, thereby improving the reliability of crash simulation post-processing.
Methodology
The methodology involves using a Rank Reduction Autoencoder (RRAE) to reduce the dimensionality of the input data and supervised classification techniques to identify regions sensitive to numerical dispersion in crash simulations. The study compares the performance of the RRAE framework against a Random Forest baseline using various signal representations, including wavelet-based and slope-based inputs.
Results
The results indicate that the RRAE-based framework is more effective than the Random Forest approach in detecting numerical dispersion. Among the tested signal representations, slope variations provided the highest classification performance, suggesting that structured latent representations can significantly enhance the detection of numerical dispersion in crash simulations.
Implications
The implications of this research are significant for the automotive industry, as CRADIPOR can improve the reliability of crash simulation results, leading to better-informed design decisions. By quantifying numerical dispersion, engineers can avoid suboptimal design choices that may arise from relying on single simulation outcomes.
AdaMeZO: Adam-style Zeroth-Order Optimizer for LLM Fine-tuning Without Maintaining the Moments
Large Language Models
Optimization
Efficient ML
- AdaMeZO leverages Adam-style moment estimates without storing them, significantly reducing memory requirements.
- The optimizer achieves faster convergence compared to MeZO, requiring up to 70% fewer forward passes.
- Theoretical convergence bounds are established, showing AdaMeZO's effectiveness in non-convex scenarios.
- Extensive experiments validate AdaMeZO's superior performance across multiple LLM architectures.
Summary
The paper introduces AdaMeZO, a novel zeroth-order optimizer designed for fine-tuning large language models (LLMs) without the high memory costs associated with traditional backpropagation methods. Existing methods like MeZO reduce memory usage by relying solely on forward passes but suffer from slower convergence due to their insensitivity to loss landscapes. In contrast, AdaMeZO employs Adam-style first- and second-moment estimates to enhance convergence speed while avoiding the memory overhead of storing these moments. The authors provide a theoretical analysis of AdaMeZO, demonstrating its convergence properties and efficiency. Extensive experiments show that AdaMeZO outperforms MeZO, achieving up to 70% fewer forward passes while maintaining or improving performance across various LLMs, including RoBERTa, OPT, and LLaMa. The findings suggest that AdaMeZO effectively balances memory efficiency and optimization speed, making it a promising approach for fine-tuning LLMs in resource-constrained environments.
Methodology
AdaMeZO utilizes zeroth-order gradient estimates combined with truncated first- and second-moment approximations to guide optimization without retaining historical gradient information in memory. This is achieved through block-wise random gradient direction generation and a pseudo-random number generator (PRNG) for efficient updates.
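The underlying seed trick, inherited from MeZO, is worth seeing concretely: the Gaussian perturbation is never stored, only regenerated from a seed, so a zeroth-order step costs two forward passes and no optimizer state. The sketch below shows the plain MeZO/SPSA step; AdaMeZO's moment reconstruction is layered on top of this and is not reproduced here:

```python
import torch

def zo_step(model, loss_fn, batch, lr=1e-6, eps=1e-3, seed=0):
    """One MeZO-style zeroth-order step: two forward passes, no stored gradients."""

    def perturb(scale):
        torch.manual_seed(seed)                      # same seed => same z every call
        for p in model.parameters():
            p.data.add_(torch.randn_like(p), alpha=scale)

    perturb(+eps); loss_plus = loss_fn(model, batch).item()
    perturb(-2 * eps); loss_minus = loss_fn(model, batch).item()
    perturb(+eps)                                    # restore original weights
    g = (loss_plus - loss_minus) / (2 * eps)         # scalar projected gradient
    perturb(-lr * g)                                 # regenerate z once more to update

model = torch.nn.Linear(10, 1)
loss_fn = lambda m, b: ((m(b[0]) - b[1]) ** 2).mean()
zo_step(model, loss_fn, (torch.randn(32, 10), torch.randn(32, 1)))
```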
Results
AdaMeZO demonstrated superior performance in fine-tuning tasks, achieving convergence with significantly fewer forward passes compared to MeZO. In experiments, it reached similar termination conditions as MeZO while using up to 70% fewer forward passes across various models, including RoBERTa, OPT, and LLaMa.
Implications
The development of AdaMeZO has significant implications for fine-tuning large language models, particularly in scenarios where memory resources are limited. Its efficiency could enable broader accessibility and application of LLMs in various domains, including mobile and edge computing.
Aitchison Embeddings for Learning Compositional Graph Representations
Graph Learning
- Introduction of Aitchison Compositional Graph embeddings (AICoG) for graph representation learning.
- Use of isometric log-ratio (ILR) coordinates to preserve Aitchison distances and enable optimization in Euclidean space.
- Enhanced interpretability through a geometric notion of roles based on compositional latent space.
- Subcompositional coherence allows for principled component restriction and analysis of archetype influence.
Summary
This paper introduces Aitchison Compositional Graph embeddings (AICoG), a novel framework for graph representation learning that leverages Aitchison geometry to provide interpretable embeddings of graph nodes. Traditional graph embedding methods often lack interpretability, as they do not adequately represent the role-based nature of nodes in many real-world networks. AICoG addresses this by modeling nodes as compositions over latent archetypal factors, allowing for a role-mixture view of nodes. The embeddings are generated using isometric log-ratio (ILR) coordinates, which maintain Aitchison distances and facilitate optimization in Euclidean space. This approach not only enhances interpretability but also supports coherent behavior under component restrictions. The authors demonstrate that AICoG achieves competitive performance in node classification and link prediction tasks while providing intrinsic explainability. The framework also allows for subcompositional coherence, enabling the analysis of how different archetype groups influence node representations and predictions. The results indicate that AICoG can effectively balance predictive performance with interpretability, making it a significant advancement in graph learning methodologies.
Methodology
The authors propose a framework that models graph nodes as compositions over latent archetypal factors, utilizing Aitchison geometry for comparison. The embeddings are computed using isometric log-ratio (ILR) coordinates, which facilitate optimization while preserving the intrinsic geometry of the compositions. The framework supports both fixed and learnable ILR bases and incorporates subcompositional coherence to analyze the influence of archetype groups.
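ILR coordinates are standard in compositional data analysis and easy to compute; here is a minimal NumPy version using a Helmert-style orthonormal basis (one conventional fixed choice, since the paper also supports learnable bases):

```python
import numpy as np

def helmert_basis(d):
    """Orthonormal basis of the clr hyperplane (rows sum to zero), shape (d-1, d)."""
    V = np.zeros((d - 1, d))
    for i in range(1, d):
        V[i - 1, :i] = 1.0 / i
        V[i - 1, i] = -1.0
        V[i - 1] /= np.linalg.norm(V[i - 1])
    return V

def ilr(x):
    """Isometric log-ratio coordinates of a composition x (positive parts)."""
    logx = np.log(x)
    clr = logx - logx.mean(axis=-1, keepdims=True)   # centred log-ratio
    return clr @ helmert_basis(x.shape[-1]).T

# Aitchison distance between compositions == Euclidean distance in ilr space,
# which is what lets the embeddings be optimized with ordinary Euclidean tools.
a = np.array([0.2, 0.3, 0.5])
b = np.array([0.1, 0.6, 0.3])
print(np.linalg.norm(ilr(a) - ilr(b)))
```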
Results
AICoG demonstrates competitive predictive performance in node classification and link prediction tasks when compared to established baselines. The method provides intrinsic explainability by construction, rather than relying on post-hoc analysis, and allows for effective exploration of how different archetypes contribute to node representations.
Implications
The AICoG framework has potential applications in various domains involving graph-structured data, such as social networks, telecommunications, and bioinformatics. Its ability to provide interpretable embeddings while maintaining competitive performance could enhance the understanding of complex networks and improve decision-making processes based on graph data.
Unlearning Offline Stochastic Multi-Armed Bandits
Reinforcement Learning
Theory
Efficient ML
- First study of unlearning in offline stochastic multi-armed bandits.
- Formalization of privacy constraints and utility measurement in decision-making.
- Development of adaptive algorithms combining Gaussian mechanism and rollback.
- Theoretical performance guarantees and lower bounds established.
Summary
This paper addresses the challenge of machine unlearning in the context of offline stochastic multi-armed bandits (MAB), a foundational problem in sequential decision-making. The authors formalize a privacy constraint for offline MAB and propose a framework to measure utility based on post-unlearning decision quality. They systematically investigate both single- and multi-source unlearning scenarios under two data-generation models: the fixed-sample model and the distribution model. The proposed algorithms are built on two canonical base algorithms, the Gaussian mechanism and rollback, and adaptively switch between them based on the data regime and privacy constraints. The paper also introduces a mixing procedure to clarify the rationale behind these algorithms. Theoretical performance guarantees are provided, along with lower bounds for both dataset models. Experimental results validate the predicted trade-offs between privacy and decision quality, demonstrating the effectiveness of the proposed methods in achieving unlearning without full retraining.
Methodology
The authors propose a framework for (ε, δ)-unlearning in offline stochastic MAB, utilizing two base algorithms: the Gaussian mechanism and rollback. They adaptively switch between these algorithms based on the data regime and privacy constraints. The study includes theoretical analysis of performance guarantees and lower bounds, as well as empirical experiments to validate the proposed methods.
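A toy rendering of the Gaussian-mechanism arm, assuming the quantity to protect is an arm's empirical mean reward and that the deletion's sensitivity is bounded by the reward range over the retained sample count; the noise calibration below is the textbook Gaussian-mechanism formula, not the paper's adaptive algorithm:

```python
import numpy as np

def unlearn_arm(values, delete_mask, eps=1.0, delta=1e-5, value_range=1.0):
    """Gaussian-mechanism unlearning of one arm's empirical mean.

    Recompute the mean without the deleted samples, then add noise calibrated
    to the sensitivity so the output is hard to distinguish from full
    retraining. Rollback (recomputing from a pre-deletion checkpoint) is the
    natural alternative when too few samples remain.
    """
    kept = values[~delete_mask]
    if len(kept) == 0:
        return None                                   # rollback / drop the arm
    sensitivity = value_range / len(kept)             # one sample moves the mean this much
    sigma = sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / eps
    return kept.mean() + np.random.normal(0.0, sigma)

rng = np.random.default_rng(1)
values = rng.uniform(0, 1, size=200)                  # rewards observed for one arm
mask = np.zeros(200, dtype=bool); mask[:10] = True    # a user's 10 samples to forget
print(unlearn_arm(values, mask))
```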
Results
The paper provides upper and lower bounds for the performance of the unlearning algorithms under both fixed-sample and distribution models. The results indicate that the proposed methods effectively balance the trade-offs between privacy and decision quality, confirming the utility of the unlearning algorithms in practical scenarios.
Implications
The findings have significant implications for privacy-preserving machine learning, particularly in applications where data deletion requests are common, such as recommendation systems and online advertising. The proposed unlearning framework can enhance user privacy while maintaining decision-making quality in sequential systems.
The Power of Order: Fooling LLMs with Adversarial Table Permutations
NLP
Large Language Models
Optimization
- LLMs exhibit significant vulnerability to the layout of tabular data, leading to inconsistent outputs.
- The Adversarial Table Permutation (ATP) attack is introduced as a method to systematically identify harmful permutations.
- Extensive experiments show that ATP can degrade the performance of a wide range of LLMs, regardless of their size or architecture.
- The study reveals a fundamental weakness in the 'linearize-then-prompt' paradigm used in current TQA tasks.
Summary
This paper investigates the vulnerability of Large Language Models (LLMs) to the structural arrangement of tabular data, particularly in the context of Table Question Answering (TQA). The authors demonstrate that semantically-invariant permutations of rows and columns can lead to incorrect or inconsistent outputs from LLMs, despite the underlying information remaining unchanged. To systematically explore this issue, they introduce the Adversarial Table Permutation (ATP) attack, a gradient-based method that identifies the worst-case permutations that disrupt model performance. Through extensive experiments, the authors reveal that ATP significantly degrades the performance of various LLMs, highlighting a critical design flaw in how these models process structured data. The findings emphasize the need for developing more robust models capable of handling tabular inputs reliably in real-world applications.
Methodology
The authors formalize the vulnerability of LLMs to row and column permutations in tabular inputs and introduce the ATP attack, which employs a gradient-based approach to find the worst-case permutations that maximize disruption to model performance. The methodology includes a systematic exploration of the attack space for tabular inputs and extensive empirical testing across various LLMs.
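Since the gradient-based search itself must optimize over discrete permutations, a brute-force stand-in conveys the measurement more simply: sample semantics-preserving shuffles and keep the worst one. `answer_fn`, the toy table, and the deliberately fragile model below are all hypothetical:

```python
import random

def consistency_under_permutations(answer_fn, table, question, trials=50, seed=0):
    """Probe order sensitivity by re-asking under random row/column shuffles.

    A brute-force stand-in for ATP's gradient-based search: sample permutations,
    compare answers to the original layout, and return a layout that flips the
    answer if one is found. answer_fn(table, question) wraps the LLM call.
    """
    rng = random.Random(seed)
    header, rows = table[0], table[1:]
    baseline = answer_fn(table, question)
    worst = None
    for _ in range(trials):
        cols = list(range(len(header)))
        rng.shuffle(cols)
        shuffled_rows = rows[:]
        rng.shuffle(shuffled_rows)
        permuted = [[r[c] for c in cols] for r in [header] + shuffled_rows]
        if answer_fn(permuted, question) != baseline:
            worst = permuted              # semantics unchanged, answer changed
    return baseline, worst

# Toy 'model' that wrongly keys on the first data row -- order-sensitive by design.
table = [["name", "score"], ["ana", "3"], ["bo", "9"], ["cy", "5"]]
fragile = lambda t, q: f"top scorer: {t[1][0]}"
print(consistency_under_permutations(fragile, table, "who has the top score?"))
```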
Results
The results indicate that ATP consistently uncovers permutations that significantly degrade the performance of LLMs across different architectures and sizes. The findings demonstrate that even random shuffling of rows and columns can lead to substantial drops in prediction consistency and accuracy, revealing a pervasive vulnerability in current models.
Implications
The implications of this research are significant for the deployment of LLMs in critical applications involving tabular data. The findings underscore the necessity for developing more robust models that can maintain performance despite structural changes in input data, which is crucial for ensuring reliability in high-stakes environments.
RunAgent: Interpreting Natural-Language Plans with Constraint-Guided Execution
NLP
Large Language Models
Interpretability
- RunAgent integrates natural language processing with structured programming constructs for reliable task execution.
- The platform autonomously generates and validates constraints for each step of the workflow.
- RunAgent supports dynamic selection of execution strategies, enhancing flexibility and accuracy.
- Human-in-the-loop features allow for user specification and feedback, improving the auditing process.
Summary
The paper introduces RunAgent, a multi-agent platform designed to interpret and execute natural-language plans with a focus on structured workflow execution. Unlike traditional large language models (LLMs), which often struggle with reliability in executing complex tasks, RunAgent combines the expressiveness of natural language with the determinism of programming through an agentic language featuring explicit control constructs such as IF, GOTO, and FORALL. The platform autonomously derives and validates constraints for each task step, ensuring that outputs are syntactically and semantically verified. RunAgent dynamically selects execution strategies, including LLM-based reasoning, tool usage, and code generation, while incorporating mechanisms for error correction. The architecture supports human-in-the-loop operations, allowing users to specify workflows and provide feedback. Evaluations on the Natural-plan and SciBench datasets demonstrate that RunAgent significantly outperforms baseline LLMs and state-of-the-art plan generation methods, showcasing its effectiveness in executing complex workflows reliably.
Methodology
RunAgent employs a novel agentic language that combines natural language expressiveness with formal programming constructs. It autonomously generates constraints based on task descriptions, verifies outputs at each step, and selects appropriate execution strategies. The platform also filters context history to retain only relevant information during execution.
Results
RunAgent was evaluated against baseline LLMs and state-of-the-art plan generation methods on the Natural-plan and SciBench datasets. The results indicate that RunAgent outperforms these benchmarks, demonstrating its capability to execute complex workflows with higher reliability and accuracy.
Implications
RunAgent's approach can enhance the development of AI agents capable of executing complex tasks in various domains, including business automation, robotics, and interactive systems. Its ability to bridge natural language and structured programming may facilitate broader adoption of AI technologies by non-experts.
Possibilistic Predictive Uncertainty for Deep Learning
Theory
Efficient ML
- Introduction of a new framework for epistemic uncertainty modeling using possibility theory.
- Derivation of a tractable implementation with closed-form solutions for efficient computation.
- Demonstration of competitive performance against leading uncertainty quantification methods across diverse datasets.
Summary
This paper addresses the challenge of epistemic uncertainty in deep learning, which arises from the models' overconfidence in their predictions on unseen data. Traditional Bayesian methods for uncertainty quantification are often computationally expensive, while alternative second-order predictors lack rigorous theoretical foundations. To bridge this gap, the authors propose a novel framework called Dirichlet-approximated possibilistic posterior predictions (DAPPr), which utilizes possibility theory to model uncertainty. DAPPr defines a possibilistic posterior over model parameters and projects this posterior into the prediction space using supremum operators, approximating it with learnable Dirichlet possibility functions. This approach results in a training objective that is computationally efficient and provides closed-form solutions. The authors demonstrate the effectiveness of DAPPr through extensive experiments across various benchmarks, showing that it achieves competitive or superior uncertainty quantification compared to state-of-the-art evidential deep learning methods while maintaining theoretical rigor.
Methodology
The proposed DAPPr framework leverages possibility theory to define a possibilistic posterior over parameters, which is projected to the prediction space using supremum operators. The projected posterior is approximated with learnable Dirichlet possibility functions, resulting in a simple training objective that can be optimized efficiently.
Results
The experiments conducted show that DAPPr achieves competitive or superior performance in uncertainty quantification when compared to existing state-of-the-art methods. This includes evaluations on standard datasets, long-tailed distributions, distribution shift detection, and fine-grained classification tasks.
Implications
The DAPPr framework has significant implications for applications requiring reliable uncertainty quantification, such as autonomous driving and medical diagnosis, where understanding model confidence is critical for safe deployment.
Temporal Data Requirement for Predicting Unplanned Hospital Readmissions
Time Series
NLP
Efficient ML
- Shorter observation windows (3-6 months) are optimal for predicting readmissions using clinical notes.
- Structured data models improve with longer observation periods but plateau after 12 months.
- The study challenges the assumption that more historical data always leads to better predictive performance.
- Different data types (structured vs. unstructured) require distinct approaches for optimal model performance.
Summary
This study investigates the optimal observation window for predicting 30-day unplanned hospital readmissions following hip and knee arthroplasties using Electronic Health Record (EHR) data. The authors analyze the impact of varying time windows, ranging from the day of surgery to three years prior, on model performance. They utilize a dataset comprising over 4 million encounter records and 80,000 clinical notes from 7,174 patients, employing both structured and unstructured data. Various encoding techniques, including non-neural (Bag of Words, TF-IDF) and neural encoders (BERT, BiLSTM), are applied to extract meaningful information from clinical notes. The findings reveal that shorter observation windows (three to six months) yield better predictive performance for models using clinical notes, while structured data benefits from longer windows, plateauing after twelve months. This challenges the conventional belief that more historical data always enhances model accuracy, suggesting that the optimal time window varies based on data type.
Methodology
The study employs a retrospective analysis of EHR data from 10,534 patients, focusing on those who underwent hip or knee surgeries. After filtering, 7,174 patients' data were analyzed, including structured data and clinical notes. Various encoding techniques were used to extract features from the data, and machine learning models were developed to predict 30-day readmissions based on different observation windows.
Results
The results indicate that models using clinical notes perform best with a shorter observation window (3-6 months), while structured data models show improved performance with longer windows, plateauing after 12 months. These patterns were consistent across both non-neural and neural encoders, highlighting the importance of data type in determining optimal historical data length for predictive modeling.
Implications
The findings have significant implications for healthcare predictive modeling, suggesting that optimizing the observation window can enhance model accuracy while reducing data storage and processing requirements. This can lead to more efficient use of EHR data in predicting patient outcomes and improving healthcare delivery.
High-Probability Convergence in Decentralized Stochastic Optimization with Gradient Tracking
Optimization
Federated Learning
Theory
- Introduces GT-DSGD, a decentralized optimization method incorporating gradient tracking.
- Achieves high-probability convergence under relaxed assumptions compared to traditional DSGD.
- Establishes optimal HP convergence rates for non-convex and Polyak-Lojasiewicz costs.
- Provides the first HP guarantees for decentralized optimization methods with bias-correction.
Summary
This paper investigates high-probability (HP) convergence guarantees in decentralized stochastic optimization, particularly focusing on the Decentralized Stochastic Gradient Descent (DSGD) algorithm enhanced with gradient tracking (GT). Traditional HP results for decentralized optimization often rely on stringent assumptions, such as bounded data heterogeneity and strong convexity, which limit their applicability. The authors introduce GT-DSGD, which incorporates gradient tracking techniques to achieve HP convergence under relaxed conditions, specifically with noise that meets a sub-Gaussian criterion. They demonstrate that GT-DSGD achieves optimal HP convergence rates for both non-convex and Polyak-Lojasiewicz costs, with rates of O(log(1/δ)/√(nT)) and O(log(1/δ)/nT), respectively. This work is significant as it provides the first HP guarantees for decentralized optimization methods that utilize bias-correction techniques, showing that GT-DSGD converges under the same conditions as mean-squared error (MSE) methods while maintaining comparable transient times. The theoretical findings are supported by numerical experiments on both real and synthetic datasets, which highlight the superior performance of GT-DSGD and the benefits of bias-correction in achieving HP convergence.
Methodology
The authors develop the GT-DSGD algorithm, which combines decentralized stochastic gradient descent with gradient tracking techniques. They analyze the convergence properties of this method under relaxed assumptions, specifically focusing on noise conditions that satisfy a sub-Gaussian criterion. The theoretical framework is built upon existing literature on HP convergence and MSE results, allowing for a comprehensive comparison of performance metrics.
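The gradient-tracking recursion the analysis builds on is compact enough to show directly; a NumPy sketch on heterogeneous quadratics, where the per-agent problems, mixing matrix, and noise model are illustrative:

```python
import numpy as np

def gt_dsgd(grad, x0, W, steps=200, lr=0.05, noise=0.1, seed=0):
    """Decentralized SGD with gradient tracking on a mixing matrix W.

    x has one row per agent. The tracker y estimates the *network-average*
    gradient, removing the bias from data heterogeneity that plain DSGD
    suffers from.  Standard GT-DSGD form:
        x_{t+1} = W x_t - lr * y_t
        y_{t+1} = W y_t + g(x_{t+1}) - g(x_t)
    """
    rng = np.random.default_rng(seed)
    x = x0.copy()
    g = grad(x) + noise * rng.standard_normal(x.shape)
    y = g.copy()                                   # initialize tracker with local grads
    for _ in range(steps):
        x_new = W @ x - lr * y
        g_new = grad(x_new) + noise * rng.standard_normal(x.shape)
        y = W @ y + g_new - g
        x, g = x_new, g_new
    return x

# Heterogeneous quadratics f_i(x) = 0.5 * ||x - c_i||^2 on a 4-agent ring.
c = np.array([[4.0], [-2.0], [1.0], [-3.0]])       # per-agent optima (average = 0)
grad = lambda x: x - c
W = np.array([[.5, .25, 0, .25], [.25, .5, .25, 0],
              [0, .25, .5, .25], [.25, 0, .25, .5]])
print(gt_dsgd(grad, x0=np.zeros((4, 1)), W=W).ravel())  # all agents near the consensus optimum 0
```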
Results
GT-DSGD achieves high-probability convergence rates of O(log(1/δ)/√(nT)) for non-convex costs and O(log(1/δ)/nT) for Polyak-Lojasiewicz costs. These results indicate that GT-DSGD converges under the same conditions as MSE methods while achieving comparable transient times. The numerical experiments confirm the theoretical findings, showcasing the effectiveness of the proposed method in practical scenarios.
Implications
The findings of this paper have significant implications for decentralized learning applications, such as federated learning, where multiple agents collaborate to train models. The ability to achieve high-probability convergence under relaxed assumptions enhances the robustness and applicability of decentralized optimization methods in real-world scenarios, particularly in environments with data heterogeneity.
Technical Report: Activation Residual Hessian Quantization (ARHQ) for Low-Bit LLM Quantization
Large Language Models
Efficient ML
Optimization
- ARHQ effectively mitigates error propagation in low-bit quantization of LLMs.
- The method isolates error-sensitive weight directions using a residual Hessian approach.
- Experimental results show improved SNR and reasoning performance in quantized models.
- ARHQ adapts to specific quantization hardware and conditions, enhancing robustness.
Summary
This technical report introduces Activation Residual Hessian Quantization (ARHQ), a novel post-training weight splitting method aimed at reducing error propagation in low-bit quantization of large language models (LLMs). The authors identify that traditional low-rank splitting methods do not adequately address the amplification of quantization residuals from input activations, which is a significant source of output error. ARHQ constructs an input-side residual Hessian from activation quantization residuals and isolates error-sensitive weight directions into a high-precision low-rank branch. This is achieved through a closed-form truncated singular value decomposition (SVD) of the scaled weight matrix. Experimental evaluations on the Qwen3-4B-Thinking-2507 model demonstrate that ARHQ significantly enhances layer-wise signal-to-noise ratio (SNR) and maintains downstream reasoning performance even under aggressive quantization conditions. The authors provide a comprehensive analysis of the optimization landscape and justify the use of the residual covariance matrix, which adapts to specific quantization scenarios, thereby improving the robustness of LLMs under low-bit quantization.
Methodology
ARHQ employs a post-training weight splitting technique that constructs a residual Hessian from activation quantization residuals. It isolates weight directions that amplify quantization noise and assigns them to a high-precision low-rank branch. The method uses closed-form truncated SVD to achieve a low-rank approximation of the weight matrix, focusing on minimizing the amplification of activation quantization noise.
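A simplified version of the split, assuming the residual Hessian is approximated by a diagonal of per-channel activation scales; the rank, bit width, and round-to-nearest quantizer below are placeholder choices, and the paper's Hessian construction is richer:

```python
import torch

def split_weight(W, act_scale, rank=16, n_bits=4):
    """ARHQ-flavoured weight split: high-precision low-rank branch + quantized rest.

    act_scale: per-input-channel activation RMS (a diagonal H^{1/2} stand-in).
    The truncated SVD of the *scaled* weight keeps the directions along which
    activation quantization noise would be amplified most; the residual is
    quantized to n_bits.
    """
    Ws = W * act_scale[None, :]                      # emphasize high-energy inputs
    U, S, Vh = torch.linalg.svd(Ws, full_matrices=False)
    low_rank = (U[:, :rank] * S[:rank]) @ Vh[:rank] / act_scale[None, :]
    residual = W - low_rank
    scale = residual.abs().max() / (2 ** (n_bits - 1) - 1)
    q = torch.clamp(torch.round(residual / scale),
                    -2 ** (n_bits - 1), 2 ** (n_bits - 1) - 1)
    return low_rank, q, scale                        # W ~= low_rank + q * scale

W = torch.randn(256, 256)
act_scale = torch.rand(256) + 0.5
low_rank, q, scale = split_weight(W, act_scale)
err = (W - (low_rank + q * scale)).norm() / W.norm()
print(f"relative reconstruction error: {err:.4f}")
```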
Results
The experimental results indicate that ARHQ leads to a significant improvement in layer-wise SNR and preserves the reasoning performance of LLMs, even when subjected to aggressive quantization. The method demonstrates its effectiveness on the Qwen3-4B-Thinking-2507 model, showcasing its potential for practical applications in low-bit quantization scenarios.
Implications
The findings suggest that ARHQ can be utilized to enhance the performance of large language models in resource-constrained environments, making them more efficient without sacrificing fidelity. This has implications for deploying LLMs in real-world applications where computational resources are limited.
Revealing graph bandits for maximizing local influence
Graph Learning
Theory
Efficient ML
- Introduces a graph bandit framework that does not require prior knowledge of the graph structure.
- Proposes BARE, a bandit strategy that learns to identify influential nodes through limited feedback.
- Establishes a regret guarantee that scales with the detectable dimension rather than the number of nodes.
- Demonstrates the practical applicability of the method in marketing scenarios involving social networks.
Summary
This paper addresses a novel graph bandit problem where the objective is to identify the most influential node in a graph with minimal information. Unlike existing methods that require prior knowledge of the graph structure, the authors propose a sequential and active learning approach that allows the graph to be discovered progressively. The proposed method, BARE (Bandit Algorithm for Revealing influence), enables the learner to select nodes and receive stochastic feedback about the influence exerted by those nodes on their immediate neighbors. The authors establish a regret guarantee for BARE that scales with a new metric called the detectable dimension, which is often significantly smaller than the total number of nodes in the graph. This approach is particularly relevant for applications in marketing within social networks, where advertisers seek to identify and target influential users without extensive exploration of the entire graph. The paper highlights the advantages of leveraging graph structure to improve learning efficiency and reduce the number of required samples.
Methodology
The authors develop BARE, a bandit algorithm that operates in a setting where the graph is revealed incrementally. At each round, the learner selects a node and receives feedback on the set of influenced nodes. The algorithm is designed to optimize the selection process based on the local influence structure of the graph, allowing for efficient learning without needing to sample the entire graph.
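A generic optimism-under-uncertainty loop conveys the flavor of the selection rule, though BARE additionally exploits the incrementally revealed graph structure to prune nodes; the toy graph, Bernoulli influence model, and `pull` interface are assumptions:

```python
import numpy as np

def ucb_influence(neighbors, pull, horizon=2000, seed=0):
    """UCB-style selection of the most locally influential node.

    neighbors: dict node -> list of neighbor ids (revealed as nodes are pulled).
    pull(v):   stochastic feedback, the set of neighbors v influenced this round.
    """
    nodes = list(neighbors)
    counts = {v: 0 for v in nodes}
    sums = {v: 0.0 for v in nodes}
    for t in range(1, horizon + 1):
        def ucb(v):
            if counts[v] == 0:
                return float("inf")                  # explore unvisited nodes first
            return sums[v] / counts[v] + np.sqrt(2 * np.log(t) / counts[v])
        v = max(nodes, key=ucb)
        influenced = pull(v)                         # observed local influence
        counts[v] += 1
        sums[v] += len(influenced)
    return max(nodes, key=lambda v: sums[v] / max(counts[v], 1))

graph = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [], 4: [0, 2]}
rng = np.random.default_rng(1)
pull = lambda v: [u for u in graph[v] if rng.random() < 0.3]  # Bernoulli influence
print(ucb_influence(graph, pull))                    # node 0 has the largest expected reach
```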
Results
The paper provides theoretical guarantees for the BARE algorithm, showing that its regret scales with the detectable dimension, which is often much smaller than the total number of nodes. This indicates that BARE can effectively identify influential nodes with fewer samples compared to traditional methods that rely on complete graph knowledge.
Implications
The findings suggest that BARE can be effectively utilized in real-world applications such as targeted marketing and social network analysis, where understanding local influence is crucial. The ability to learn from limited information can lead to more efficient strategies in large-scale networks, reducing costs and improving outcomes for advertisers.
Generating Statistical Charts with Validation-Driven LLM Workflows
Multimodal
Large Language Models
- Introduces a structured workflow for generating statistical charts from tabular data.
- Emphasizes the importance of rendered-output validation to improve chart readability and semantic accuracy.
- Retains comprehensive multimodal representations for each generated chart, including code, context, and Q&A pairs.
- Demonstrates the workflow's effectiveness through the generation of 1,500 charts and evaluation of multimodal LLMs.
Summary
This paper addresses the challenge of generating diverse and readable statistical charts from tabular data using large language models (LLMs). The authors propose a structured workflow that decomposes the chart generation process into several stages: dataset screening, plot proposal, code synthesis, rendering, validation-driven refinement, description generation, and question-answer generation. This approach emphasizes the importance of validating rendered outputs to identify visualization-specific failures, such as readability and semantic mismatches, which are often not detectable from data or code alone. The workflow retains each generated chart along with its corresponding code, dataset context, descriptions, and question-answer pairs, allowing for a comprehensive multimodal representation. The authors applied this workflow to UCI datasets, producing 1,500 charts across 74 datasets and 24 chart families, paired with 30,003 question-answer pairs. The evaluation of 16 multimodal LLMs on these chart-question pairs revealed that while chart-syntax questions were nearly saturated, tasks involving value extraction, comparison, and reasoning remained challenging, highlighting the utility of the proposed workflow for diagnostic studies in chart-grounded multimodal reasoning.
Methodology
The authors developed a structured workflow that includes stages for dataset screening, plot proposal, code generation, rendering, validation-driven refinement, and the generation of descriptions and question-answer pairs. This iterative process allows for the inspection and correction of charts based on rendered outputs, ensuring higher quality visualizations.
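The distinguishing stage is validating the rendered output rather than the code. A minimal sketch of that loop, with two toy readability checks standing in for the paper's validators and a stub `plot_fn` in place of LLM-generated chart code:

```python
import matplotlib
matplotlib.use("Agg")                      # render off-screen
import matplotlib.pyplot as plt

def render_and_validate(plot_fn, path, max_rounds=3):
    """Validation-driven refinement: render, inspect the *output*, then retry."""
    hints = []
    for _ in range(max_rounds):
        fig, ax = plt.subplots()
        plot_fn(ax, hints)
        problems = []
        if not ax.get_xlabel() or not ax.get_ylabel():
            problems.append("label both axes")
        if len(ax.get_xticklabels()) > 20:
            problems.append("too many x ticks; rotate or thin them")
        if not problems:
            fig.savefig(path)
            plt.close(fig)
            return True
        hints = problems                   # would be fed back to the code-synthesis step
        plt.close(fig)
    return False

def demo_plot(ax, hints):
    ax.bar(["a", "b", "c"], [3, 7, 5])
    if hints:                              # the 'refined' version reacts to validator hints
        ax.set_xlabel("category"); ax.set_ylabel("count")

print(render_and_validate(demo_plot, "chart.png"))
```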
Results
The workflow successfully generated 1,500 statistical charts from 74 datasets, covering 24 chart families and producing 30,003 question-answer pairs. Evaluation of 16 multimodal LLMs indicated that while syntax-related questions were well-handled, more complex reasoning tasks posed challenges, demonstrating the workflow's potential for enhancing multimodal reasoning capabilities.
Implications
The proposed workflow can significantly improve the generation of statistical visualizations in various applications, including data analysis, educational tools, and automated reporting systems. It also provides a framework for further research into multimodal reasoning and the capabilities of LLMs in interpreting complex visual data.
Odysseus: Scaling VLMs to 100+ Turn Decision-Making in Games via Reinforcement Learning
Reinforcement Learning
Multimodal
Robotics
- Introduces Odysseus, a framework for training VLMs in long-horizon decision-making tasks.
- Proposes a lightweight turn-level critic in PPO to improve training stability and efficiency.
- Demonstrates the advantages of pretrained VLMs in providing strong action priors.
- Achieves at least 3× improvement in game progress compared to frontier models.
Read more
Odysseus: Scaling VLMs to 100+ Turn Decision-Making in Games via Reinforcement Learning
Summary
This paper presents Odysseus, a novel framework for training vision-language models (VLMs) to perform long-horizon decision-making in video games, specifically Super Mario Land, where episodes require more than 100 turns of interaction. The authors identify limitations in existing methods, which either rely on large-scale supervised fine-tuning or apply reinforcement learning (RL) only in short-horizon settings. They propose an adapted version of Proximal Policy Optimization (PPO) with a lightweight turn-level critic, which improves training stability and sample efficiency. The study demonstrates that pretrained VLMs provide strong action priors, significantly improving sample efficiency and reducing the need for manual action design. Odysseus integrates supervised initialization with multi-task RL, achieving substantial performance improvements over baseline models and outperforming state-of-the-art models by at least three times in game progress. The trained agents also generalize robustly across game levels while maintaining performance on general-domain tasks. The findings highlight the potential for stable and effective RL training in long-horizon, multimodal environments, paving the way for future advances in embodied agents.
Methodology
The authors conducted a systematic investigation of algorithmic components necessary for fine-tuning VLMs via RL in long-horizon settings. They adapted PPO with a lightweight turn-level critic and employed positive-advantage filtering to enhance stability. The framework combines supervised initialization with multi-task RL to facilitate training.
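To make the adaptation concrete, the sketch below shows one plausible form of a clipped PPO loss with a turn-level critic and positive-advantage filtering; tensor shapes, names, and the exact placement of the filter are assumptions based on this summary, not the Odysseus implementation.

```python
# Hedged sketch: turn-level PPO loss with positive-advantage filtering.
# Shapes and hyperparameters are illustrative; this is not the Odysseus code.
import torch

def ppo_turn_losses(logp_new, logp_old, values, returns, clip_eps=0.2):
    """Compute policy/critic losses over T turns of one episode.

    logp_new, logp_old: (T,) log-probs of each turn's action under the
                        current and behavior policies (summed over tokens)
    values:             (T,) turn-level critic estimates V(s_t)
    returns:            (T,) discounted returns observed from each turn
    """
    advantages = (returns - values).detach()           # turn-level advantage
    keep = (advantages > 0).float()                    # positive-advantage filter
    ratio = torch.exp(logp_new - logp_old)             # importance ratio per turn
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    policy_loss = -(torch.min(unclipped, clipped) * keep).sum() / keep.sum().clamp(min=1)
    value_loss = ((values - returns) ** 2).mean()      # regress critic to returns
    return policy_loss, value_loss
```

Keeping only positive-advantage turns reinforces actions that beat the critic's baseline, which is one plausible reading of how the method avoids destabilizing updates over 100+ turn episodes.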
Results
Odysseus achieved significant performance gains, with trained models showing at least 3× higher average game progress than leading models like GPT-5.4 and GLM-4.6V. The models also demonstrated consistent improvements in both in-game and cross-game generalization settings.
Implications
The findings suggest that RL can be effectively applied to train VLMs for complex decision-making tasks, which could lead to advances in embodied agents capable of interacting with dynamic environments. This work lays a foundation for future research on RL training of multimodal foundation models.
Scale-Aware Adversarial Analysis: A Diagnostic for Generative AI in Multiscale Complex Systems
Generative Models
Theory
Interpretability
- Introduction of a scale-aware diagnostic framework using Constrained Diffusion Decomposition (CDD).
- Demonstration of the limitations of traditional XAI methods in capturing physical causality.
- Evaluation of Denoising Diffusion Probabilistic Models (DDPM) under physical perturbations.
- Establishment of CDD-based scale-space continuity as a criterion for physically consistent deep learning.
Read more
Scale-Aware Adversarial Analysis: A Diagnostic for Generative AI in Multiscale Complex Systems
Summary
This paper addresses the limitations of modern machine learning architectures in capturing the multiscale dynamics of complex physical systems, such as turbulence and cosmic structures. The authors introduce a novel diagnostic framework based on Constrained Diffusion Decomposition (CDD), which allows for physically constrained data generation and model evaluation. Traditional Explainable AI (XAI) methods often rely on pixel-wise perturbations that can lead to unphysical artifacts, failing to respect the continuous nature of the underlying dynamics. The proposed CDD framework enables deterministic interventions within a continuous scale space, allowing for the evaluation of generative models like Denoising Diffusion Probabilistic Models (DDPM). The authors demonstrate that these models exhibit structural freezing and non-linear instability under moderate physical perturbations, indicating a failure to maintain cross-scale continuity. This work establishes a rigorous methodology for assessing the physical causality of generative models, providing a foundation for future architectures to better respect the multiscale causality inherent in natural systems.
Methodology
The authors use Constrained Diffusion Decomposition (CDD) to construct a scale-informed diagnostic framework that enables deterministic physical interventions in a continuous scale space, allowing a generative model's response to scale-specific perturbations to be evaluated. They apply this framework to a Denoising Diffusion Probabilistic Model (DDPM) using observationally derived data from a non-linear fluid manifold.
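CDD itself is a diffusion-based construction whose details are beyond this summary, but the shape of a scale-localized intervention can be illustrated with a simpler Gaussian band-pass decomposition; the band edges and perturbation below are illustrative assumptions, not the paper's operator.

```python
# Generic scale-space intervention sketch (not the paper's CDD): split a 2-D
# field into Gaussian band-pass layers, perturb one scale band, recompose.
import numpy as np
from scipy.ndimage import gaussian_filter

def band_decompose(field, sigmas=(1, 2, 4, 8)):
    """Split `field` into band-pass layers between successive Gaussian scales."""
    smoothed = [field] + [gaussian_filter(field, s) for s in sigmas]
    bands = [smoothed[i] - smoothed[i + 1] for i in range(len(sigmas))]
    return bands + [smoothed[-1]]                    # band-pass layers + coarse residual

def perturb_scale(field, band_idx, amplitude=0.1, sigmas=(1, 2, 4, 8)):
    """Amplify one scale band by `amplitude`, leaving the other scales untouched."""
    layers = band_decompose(field, sigmas)
    layers[band_idx] = layers[band_idx] * (1.0 + amplitude)
    return sum(layers)                               # layers telescope back exactly

rng = np.random.default_rng(0)
field = gaussian_filter(rng.standard_normal((128, 128)), 2)
perturbed = perturb_scale(field, band_idx=2)         # intervene at a single scale
print(float(np.abs(perturbed - field).max()))        # perturbation is scale-localized
```

Because the layers telescope back to the original field exactly, the perturbation is confined to one scale, which is the property a scale-aware diagnostic needs in order to attribute a model's response to a specific scale.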
Results
The study reveals that the unconstrained generative model exhibits localized structural freezing and non-linear instability when subjected to moderate physical perturbations. This indicates that the model fails to maintain cross-scale continuity and diverges when encountering unseen physical states, highlighting the inadequacies of current generative architectures in internalizing the governing physical laws.
Implications
The findings suggest that future generative AI architectures should incorporate physical constraints to better model complex systems. This work provides a rigorous framework for diagnosing and improving the physical fidelity of machine learning models, which could have significant implications in fields such as astrophysics, fluid dynamics, and other areas involving multiscale phenomena.
Towards Robust and Scalable Density-based Clustering via Graph Propagation
Graph Learning
Efficient ML
Theory
- CluProp reimagines density-based clustering as a graph propagation process, improving robustness and scalability.
- The framework is agnostic to distance metrics and employs a deterministic algorithm for efficient neighborhood identification.
- CluProp significantly outperforms existing clustering methods in both accuracy and runtime on large-scale datasets.
- The DANE algorithm propagates labels outward from local density peaks, improving clustering performance on heterogeneous data.
Read more
Towards Robust and Scalable Density-based Clustering via Graph Propagation
Summary
This paper introduces CluProp, a novel framework that enhances density-based clustering by reinterpreting it as a label propagation process over neighborhood graphs. CluProp addresses the limitations of traditional density-based methods, which are sensitive to parameters and struggle with datasets of varying densities. By employing a deterministic density-based propagation strategy, CluProp ensures efficient neighborhood identification and remains agnostic to the choice of distance metric. The framework leverages modularity-based propagation methods, such as Louvain and Leiden, and introduces DANE (Density-Aware Neighborhood Expansion) to propagate labels from local density peaks. CluProp demonstrates superior scalability, processing millions of data points in minutes while outperforming existing clustering methods in both accuracy and runtime.
Methodology
CluProp utilizes a label propagation approach over neighborhood graphs, employing a deterministic density-based propagation strategy (DANE) to identify clusters in high-dimensional spaces. The framework integrates efficient modularity-based methods for clustering and is designed to handle varied-density datasets without relying on rigid parameters.
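As a rough illustration of label propagation from density peaks over a kNN graph (in the spirit of DANE, though not the paper's algorithm; the value of k and the density estimate are our own choices):

```python
# Toy density-peak label propagation over a kNN graph. A generic sketch,
# not the CluProp/DANE implementation.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors

def density_peak_labels(X, k=10):
    """Label each point by propagating from a denser, already-labeled neighbor."""
    dist, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    density = 1.0 / (dist[:, 1:].mean(axis=1) + 1e-12)  # inverse mean kNN distance
    labels = np.full(len(X), -1)
    n_clusters = 0
    for i in np.argsort(-density):                    # visit densest points first
        for j in idx[i, 1:]:                          # neighbors, nearest first
            if labels[j] != -1 and density[j] >= density[i]:
                labels[i] = labels[j]                 # inherit a denser neighbor's label
                break
        if labels[i] == -1:
            labels[i] = n_clusters                    # local density peak: seed a cluster
            n_clusters += 1
    return labels

X, _ = make_blobs(n_samples=500, centers=3, random_state=0)
print(len(np.unique(density_peak_labels(X))))         # number of discovered clusters
```

Each point with no denser labeled neighbor seeds a new cluster, so the number of clusters emerges from the density landscape rather than from a preset parameter.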
Results
CluProp achieved a 90% Adjusted Mutual Information (AMI) score on the MNIST dataset in just 20 seconds, significantly outperforming the deep learning-based clustering method DCN, which took over 30 minutes to reach a 75% AMI score. On the larger MNIST8M dataset, CluProp completed clustering in under 15 minutes, achieving an 80% Normalized Mutual Information (NMI), while kernel k-means only reached 41% NMI.
Implications
The advancements presented in CluProp could lead to more effective clustering solutions in various applications, particularly in fields requiring the analysis of large and complex datasets with varied densities, such as image processing, social network analysis, and bioinformatics.
Distance metric learning for conditional anomaly detection
Theory
Optimization
Time Series
- Conditional anomaly detection allows for context-specific identification of anomalies.
- Instance-based approaches optimize predictive models for individual data instances.
- Standard distance metrics may be inadequate for anomaly detection tasks.
- Metric-learning methods (NCA and RCA) improve anomaly detection performance.
Read more
Distance metric learning for conditional anomaly detection
Summary
This paper presents a novel approach to conditional anomaly detection, particularly in the context of patient-management alert systems. Traditional anomaly detection methods identify unusual instances based on overall data patterns; this work extends the framework to consider context-specific anomalies. The authors propose a method that distinguishes between context attributes and target attributes, allowing anomalies in the target attributes to be identified conditioned on the context. The paper emphasizes the importance of selecting an appropriate distance metric for instance-based anomaly detection, as standard metrics may not adequately reflect the relevance of examples. The authors explore two metric-learning methods, Neighborhood Component Analysis (NCA) and Relevant Component Analysis (RCA), to adaptively learn distance metrics that enhance anomaly detection performance. The proposed methods are evaluated on the Pneumonia PORT dataset, focusing on identifying unusual hospitalization decisions for patients with community-acquired pneumonia. The results demonstrate that the metric-learning approaches significantly outperform standard distance metrics in detecting anomalies.
Methodology
The authors utilize instance-based approaches for anomaly detection, focusing on learning adaptive distance metrics through two methods: NCA, which optimizes nearest neighbor classification accuracy, and RCA, which maximizes mutual information while constraining distances between similar cases. The evaluation is conducted on a subset of the Pneumonia PORT dataset, where 100 patient cases are analyzed for unusual hospitalization decisions.
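To make the setup concrete, the sketch below approximates the pipeline with scikit-learn's NCA implementation on synthetic stand-in data; the anomaly score (neighborhood disagreement with the observed target under the learned metric) is our reading of the instance-based approach, not the paper's exact estimator.

```python
# Illustrative NCA-based conditional anomaly scoring on synthetic data.
# The scoring rule is our reading of the instance-based setup, not the
# paper's exact estimator.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, NeighborhoodComponentsAnalysis
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 8))                 # context attributes (e.g., findings)
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)     # target attribute (e.g., hospitalize)

# Learn a metric over context attributes that is predictive of the target,
# then classify with kNN in the transformed space.
model = make_pipeline(NeighborhoodComponentsAnalysis(random_state=0),
                      KNeighborsClassifier(n_neighbors=15))
model.fit(X, y)

# Conditional anomaly score: probability mass the learned-metric neighborhood
# assigns to targets other than the observed one. (A real evaluation would
# use leave-one-out so a case cannot vouch for itself.)
proba = model.predict_proba(X)
score = 1.0 - proba[np.arange(len(y)), y]
print(np.argsort(-score)[:5])                     # indices of most anomalous decisions
```

Under RCA, the metric would instead be fit from groups of known-similar cases, but the scoring step would stay the same.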
Results
The experimental results indicate that the metric-learning methods (NCA and RCA) outperform standard distance metrics in identifying anomalous hospitalization decisions. The performance is assessed through sensitivity and specificity metrics, demonstrating the effectiveness of the proposed approach in a clinical context.
Implications
The findings suggest that conditional anomaly detection can enhance patient-management systems by providing more accurate alerts for unusual treatment decisions. This has potential applications in healthcare monitoring and decision support systems, improving patient outcomes through timely interventions.