AI-generated summaries

Today's ML research,
without the noise.

Daily summaries of the latest machine learning papers from arXiv, processed every 8 hours.

45 Papers today
8h Update frequency
7 Days of history
Liquid Latent State Dynamics for Interpretable Turbofan Degradation Modeling
Weizhi Nie, Weijie Wang, Yuting Su
Time Series Interpretability
  • Introduces a liquid latent dynamics model for turbofan sensor forecasting.
  • Factorizes latent state into degradation and condition components for better interpretability.
  • Achieves lower sensor-forecasting RMSE than a GRU baseline, especially under complex operating conditions.
  • Provides a clearer temporal degradation axis, enhancing interpretability of health states.
Black-Box Inference of LLM Architectural Properties with Restrictive API Access
Christopher Ellis, Shreyas Chaudhari, Mei-Yu Wang, Leighton Barnes, Giulia Fanti, José M. F. Moura
Large Language Models NLP Theory
  • NightVision can infer architectural properties of LLMs even with restrictive API access.
  • The method combines common-set prompting and timing analysis to recover hidden dimension, depth, and parameter count.
  • Empirical results show a mean relative error of 23% for hidden dimension and 53% for depth and parameter count.
  • Current API restrictions are insufficient to fully obfuscate LLM architectural details.
BOUNDARY_SYNC: Measuring Communication-Induced Representational Coupling in Multi-Agent LLM Systems
Zewen Liu
Large Language Models NLP Multimodal
  • BOUNDARY_SYNC provides a standardized protocol for measuring communication-induced coupling in multi-agent LLM systems.
  • Text communication significantly homogenizes outputs (CAF = 0.803), and image communication has a similar effect (CAF = 0.834).
  • Group size influences the direction of coupling, with smaller groups potentially leading to diversification.
  • Coupling is stateless: it depends on immediate peer information and does not accumulate over time.
Regularized Variational and Spectral Log-Density-Ratio Estimation in the Gaussian Location Model
Francis Bach
Theory
  • Introduces ridge-regularized log-density-ratio estimation in a Gaussian location model.
  • Derives high-dimensional asymptotic equivalents for variational and spectral estimators.
  • Shows that the variational estimator outperforms the spectral estimator when many observations are available.
  • Identifies conditions under which the spectral estimator is preferable because of its lower variance.
Geometric Signatures of Reasoning: A Spectral Perspective on Task Hardness
Aria Masoomi, Mahsa Bazzaz, Adel Javanmard, Vahab Mirrokni
NLP Large Language Models Theory
  • Introduces a formal framework for analyzing the geometry of CoT reasoning in LLMs.
  • Defines an effective dimension of reasoning trajectories as a measure of task complexity.
  • Demonstrates that kinematic features can predict solution correctness early in the reasoning process.
  • Achieves high accuracy in distinguishing between easy and hard problems using geometric measures.
kNNGuard: Turning LLM Hidden Activations into a Training-Free Configurable Guardrail
Mahmoud Abdelfattah, Hamid Nasiri, Peter Garraghan
Large Language Models NLP
  • kNNGuard is a training-free guardrail framework that utilizes hidden activations from LLMs.
  • It achieves competitive F1 scores while being 2.7× faster than the best existing guardrails.
  • Domain adaptation is simplified, requiring only a small labeled bank and minimal setup time.
  • The framework combines activation-space and embedding-space scores for improved robustness.
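The nearest-neighbor scoring idea can be sketched in a few lines. This is a minimal illustration, not kNNGuard's actual scoring rule: it assumes each prompt has already been reduced to one activation vector and one embedding vector, and the cosine-similarity vote, the mixing weight `alpha`, and the toy vectors are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]

def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def knn_unsafe_score(query: Vector, bank: List[Tuple[Vector, int]], k: int = 3) -> float:
    """Fraction of the k most similar bank entries that are labeled unsafe (label 1)."""
    ranked = sorted(bank, key=lambda item: cosine(query, item[0]), reverse=True)
    top = ranked[:k]
    return sum(label for _, label in top) / len(top)

def guard_score(act_query: Vector, act_bank, emb_query: Vector, emb_bank,
                k: int = 3, alpha: float = 0.5) -> float:
    """Hypothetical combination: weighted mix of activation-space and embedding-space kNN scores."""
    return (alpha * knn_unsafe_score(act_query, act_bank, k)
            + (1 - alpha) * knn_unsafe_score(emb_query, emb_bank, k))

# Toy labeled banks (made-up 2-D vectors): label 1 = unsafe, 0 = safe.
act_bank = [([1.0, 0.1], 1), ([0.9, 0.2], 1), ([0.1, 1.0], 0), ([0.2, 0.9], 0)]
emb_bank = [([0.8, 0.0], 1), ([1.0, 0.3], 1), ([0.0, 1.0], 0), ([0.3, 1.0], 0)]
score = guard_score([0.95, 0.15], act_bank, [0.9, 0.1], emb_bank, k=2)
print(f"unsafe score = {score:.2f}")  # flag the prompt if above a chosen threshold
```

In this sketch, adapting to a new domain only means swapping the labeled bank; no weights are trained.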
Hybrid quantum-classical neural network for sentiment analysis
Giacomo Cappiello, Filippo Caruso, Xing Liang, Dimitrios Makris
NLP
  • Hybrid quantum-classical neural networks can effectively perform sentiment analysis on COVID-19-related tweets.
  • The study shows comparable accuracy between hybrid models and classical baselines, with improved learning dynamics.
  • Transfer learning experiments indicate a significant performance boost for hybrid models in spam classification tasks.
  • The research highlights the potential advantages of quantum machine learning in natural language processing.
Do LLMs Truly Generalize in the Molecular Domain? A Perturbation-Based Analysis
Jiatong Li, Weida Wang, Changmeng Zheng, Shufei Zhang, Yatao Bian, Xiao-yong Wei, Qing Li
Large Language Models Graph Learning
  • LLMs show fragility in generalizing molecular properties due to their reliance on local training distributions.
  • The Molecular Perturbation framework reveals that small structural changes can significantly degrade model performance.
  • In-Context Tuning (ICT) can improve robustness by anchoring predictions to structurally similar molecules.
  • The study emphasizes the need for models to align structural variations with chemically meaningful similarities.
Fourier Neural Operators for Rayleigh-Bénard Convection
Chelsea Maria John, Thibaut Lunet, Sebastian Götschel, Andreas Herten, Stefan Kesselheim, Daniel Ruprecht
Theory Efficient ML Time Series
  • Introduction of a lean FNO architecture that predicts increments for improved accuracy in modeling RBC.
  • Demonstrated faster inference times and reduced parameter count compared to existing models.
  • Ablation studies indicate that multi-layer 1D convolutional scaling operators outperform linear layers in accuracy.
  • Model generalizes well across spatial and temporal resolutions but is limited by training data resolution.
Beyond the Performance Illusion: Structure-Aware Stratified Partitioning and Curriculum Distributionally Robust Optimization for Spatially Correlated Domains
Prathamesh Patil, Arpit Jain, Aswanth Krishnan
Computer Vision Optimization
  • Identification of spatiotemporal leakage and hidden stratification as critical issues in standard evaluation methods.
  • Introduction of Structure-Aware Stratified Partitioning (SASP) to create more reliable validation splits.
  • Development of Curriculum Distributionally Robust Optimization (CDRO) to stabilize training under rigorous evaluation conditions.
  • Demonstration of improved generalization and confidence calibration across multiple domains.
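The summary does not spell out how SASP partitions data. As a hedged illustration of the leakage problem it targets, here is a generic spatial block split (not SASP itself): samples are bucketed into grid cells and each whole cell goes to one fold, so spatially adjacent, correlated samples cannot land on both sides of the train/validation boundary. The cell size and round-robin fold assignment are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]

def block_split(points: List[Point], cell: float, n_folds: int, val_fold: int):
    """Assign whole grid cells (spatial blocks) to folds; return (train, val) index lists."""
    blocks: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for i, (x, y) in enumerate(points):
        blocks[(int(x // cell), int(y // cell))].append(i)
    train: List[int] = []
    val: List[int] = []
    # Sort block keys for a deterministic assignment, then deal blocks round-robin to folds.
    for b, key in enumerate(sorted(blocks)):
        (val if b % n_folds == val_fold else train).extend(blocks[key])
    return sorted(train), sorted(val)

pts = [(0.1, 0.2), (0.3, 0.4), (1.2, 0.1), (1.4, 0.6), (0.2, 1.5), (1.7, 1.8)]
train_idx, val_idx = block_split(pts, cell=1.0, n_folds=2, val_fold=1)
print(train_idx, val_idx)
```

A random point-level split of the same data could put points 0 and 1 (same cell) on opposite sides; the block split cannot.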
Frequency Shift Physics-Informed Extreme Learning Machine for Solving High-Frequency Partial Differential Equations
Xiong Xiong, Ruonan Zhai, Zheng Zeng, Sheng Zhou, Rongchun Hu, Zichen Deng
Theory Efficient ML
  • Introduces FS-PIELM to mitigate spectral bias in high-frequency PDE solutions.
  • Utilizes a novel weight initialization mechanism that shifts the mean of the weight distribution.
  • Demonstrates significant accuracy improvements over existing physics-informed extreme learning machine variants.
  • Maintains computational efficiency with only a single linear solve required.
How to Allocate Your Tokens? Scaling Laws with Training Steps and Batch Size
Fabian Schaipp
Large Language Models Optimization Theory
  • Introduction of a three-term scaling law that incorporates model size, training steps, and batch size.
  • The law can be fitted using fewer training runs, specifically leveraging suboptimal batch sizes.
  • It provides a framework for deriving scaling laws for both optimal and suboptimal batch sizes.
  • The proposed model aligns with empirical findings regarding critical batch sizes and their scaling.
Gaming Consensus: Coordinated Manipulation in Crowdsourced Fact-Checking
Nikil Roashan Selvam, Jay Baxter, Sophie Hilgard, Brad Miller, Keith Coleman, Ellen Vitercik, Sanmi Koyejo
Theory
  • Demonstrates a coordinated manipulation strategy for crowdsourced fact-checking systems.
  • Empirical findings indicate that low-quality notes can be artificially elevated above consensus thresholds.
  • Identifies a counterintuitive property of the rating system: 'Not Helpful' ratings can increase helpfulness scores.
  • Develops a cost model for quantifying manipulation efforts.
Bayesian Sparse Low-Rank Adaptation for Large Language Model Uncertainty Estimation
Jijie Zhang, Zhe Ren, Quan Zhang, Dandan Guo
NLP Large Language Models Efficient ML
  • DALorRA introduces a new paradigm for uncertainty quantification in LLMs by focusing on low-rank adaptation.
  • The method employs stochastic masking to dynamically adjust model capacity, reducing overfitting risks.
  • Empirical results show that DALorRA provides excellent calibration and maintains high reasoning accuracy.
  • The framework combines principles from variational Bayesian estimation and ensemble methods for effective uncertainty quantification.
Spin-Weighted Spherical Harmonics Enable Complete and Scalable E(3)-Equivariant Networks
Chenxing Liang, Yuchao Lin, Andrii Kryvenko, Wendi Yu, Chuan Li, Jianwen Xie, Xiaofeng Qian, Shuiwang Ji
Theory Efficient ML
  • Introduces SpinGTP to overcome expressivity limitations of existing tensor products in E(3)-equivariant networks.
  • Utilizes Spin-Weighted Spherical Harmonics to capture antisymmetric interactions effectively.
  • Achieves a computational complexity of O(L^3) while maintaining high expressivity.
  • Demonstrates superior performance in tasks involving chiral materials and non-centrosymmetric geometries.
Revisiting Decentralized Online Convex Optimization with Compressed Communication
Hao Zhou, Xiaoyu Wang, Chang Yao, Mingli Song, Yuanyu Wan
Optimization Theory Efficient ML
  • Introduction of two FTRL-type algorithms for D-OCO with compressed communication.
  • First algorithm matches existing regret bounds in full-information settings.
  • Second algorithm significantly improves regret bounds and communication costs in bandit settings.
  • Simplified analysis and design compared to previous OGD-based approaches.
Program-as-Weights: A Programming Paradigm for Fuzzy Functions
Wentao Zhang, Liliana Hotsko, Woojeong Kim, Pengyu Nie, Stuart Shieber, Yuntian Deng
NLP Large Language Models Efficient ML
  • Introduction of Program-as-Weights (PAW) for fuzzy function programming.
  • PAW compiles natural language specifications into efficient neural binaries.
  • Demonstrated significant performance improvements with a smaller interpreter.
  • Five case studies illustrate practical applications of PAW in various fuzzy tasks.
On the Utility and Factual Reliability of Pruned Mixture-of-Experts Models in the Biomedical Domain
Atsuki Yamaguchi, Szymon Palucha, Léo Bijar, Aline Villavicencio, Nikolaos Aletras
NLP Large Language Models Efficient ML
  • First systematic study on the impact of expert pruning on factual reliability in high-stakes biomedical settings.
  • Moderate pruning preserves utility while extreme pruning increases hallucination risks.
  • Utility and reliability degrade more rapidly on general-domain tasks than on in-domain tasks.
  • Safe compression of MoE models is highly task- and domain-dependent.
HERMES: A Multi-Granularity Labeling Substrate for Pre-training Data Mixtures
Ziyun Qiao, Yue Min, Ruining Chen, Yujun Li
Large Language Models NLP Optimization
  • HERMES provides a hierarchical labeling substrate that allows for dynamic granularity control in data mixtures.
  • The methodology utilizes a Learned Semantic Transform and a three-stage RVQ for efficient document annotation.
  • Adjusting the sampling strategy to the chosen granularity improved performance, showing that granularity and sampling method interact.
  • HERMES allows for the annotation of approximately 50 million documents into up to 130,000 cells without the need for re-clustering.
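Residual vector quantization (RVQ), named above as HERMES's three-stage labeling step, can be sketched as follows: each stage quantizes whatever residual the previous stage left, and the tuple of chosen code indices acts as a hierarchical label whose prefixes are coarser cells. The tiny hand-written codebooks are purely illustrative (in practice they would be learned, e.g. by k-means on residuals), and the Learned Semantic Transform that precedes this step is not modeled.

```python
from typing import List, Sequence, Tuple

def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def rvq_encode(x: Sequence[float],
               codebooks: List[List[List[float]]]) -> Tuple[Tuple[int, ...], List[float]]:
    """Quantize x stage by stage; return the code indices and the final residual."""
    residual = list(x)
    codes = []
    for book in codebooks:
        idx = min(range(len(book)), key=lambda j: sq_dist(residual, book[j]))
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, book[idx])]  # pass the leftover to the next stage
    return tuple(codes), residual

codebooks = [
    [[0.0, 0.0], [4.0, 4.0]],    # stage 1: coarse
    [[1.0, 0.0], [0.0, 1.0]],    # stage 2: medium
    [[0.25, 0.0], [0.0, 0.25]],  # stage 3: fine
]
codes, res = rvq_encode([5.2, 4.3], codebooks)
print(codes)      # (1, 0, 1)
print(codes[:1])  # coarsest cell: first stage only
```

Truncating the code tuple to its first one or two entries gives coarser cells from the same encoding, which is one way to read the claim of granularity control without re-clustering.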
Conditional Inference Trees and Forests for Feature Selection
Robert Milletich, Justin Downes, Steve Goley, Newel Hirst
Theory Efficient ML
  • CIT and CIF effectively reduce split-selection bias in feature selection.
  • CIF ranks highly among various classification and regression methods in benchmark studies.
  • Adaptive stopping and threshold search parameters significantly influence runtime efficiency.
  • High-dimensional simulations reveal potential weaknesses in feature recovery using CIF.
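Conditional inference trees reduce split-selection bias by picking the splitting variable with a significance test of association with the response, instead of searching every split point for the best impurity reduction (a search that favors variables with many candidate splits). A minimal sketch of that selection step, assuming a numeric response, a permutation test on absolute Pearson correlation, and a Bonferroni adjustment. Reference implementations such as R's `partykit::ctree` use a more general conditional-inference statistic, and choosing the split point afterwards is not shown.

```python
import random
from typing import List, Optional, Sequence

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx and syy else 0.0

def perm_pvalue(x: Sequence[float], y: Sequence[float], n_perm: int = 999,
                rng: Optional[random.Random] = None) -> float:
    """Permutation p-value for the null hypothesis that x and y are independent."""
    rng = rng or random.Random(0)
    observed = abs(pearson(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(pearson(x, y_perm)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def select_split_variable(features: List[Sequence[float]], y: Sequence[float],
                          alpha: float = 0.05) -> Optional[int]:
    """Index of the most significant feature, or None if none passes (stop splitting)."""
    adjusted = [min(1.0, perm_pvalue(f, y) * len(features)) for f in features]  # Bonferroni
    best = min(range(len(features)), key=lambda j: adjusted[j])
    return best if adjusted[best] <= alpha else None

rng = random.Random(42)
y = [float(i) for i in range(30)]
informative = [v + rng.gauss(0, 2) for v in y]         # tracks the response
noise = [float(rng.random() < 0.5) for _ in y]         # unrelated binary feature
many_levels = [float(rng.randrange(1000)) for _ in y]  # unrelated, many candidate split points
print(select_split_variable([informative, noise, many_levels], y))  # 0
```

Because every feature is judged by a p-value rather than by its best split, the many-valued `many_levels` feature gets no advantage from its many candidate cut points.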
Ask the Right Comparison: Bias-Aware Bayesian Active Top-$k$ Ranking with LLM Judges
Jian Xu, Delu Zeng, John Paisley, Qibin Zhao
NLP Large Language Models Theory
  • Introduces a bias-aware Bayesian model for evaluating LLM judges that accounts for verbosity and position biases.
  • Develops a top-k-aware active acquisition rule that optimizes the selection of comparisons to identify the top-k items efficiently.
  • Demonstrates that naive aggregation methods can lead to incorrect top-k rankings due to inherent biases in LLM judges.
  • Shows significant improvements in recall rates for biased judges, with performance gains concentrated on lower-tier models.
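One common way to make position and verbosity bias explicit is to add bias terms to a Bradley-Terry comparison model. The parametrization below (a position offset `beta_pos` and a log-length term `beta_len`) is an assumption for illustration, not the paper's Bayesian model, priors, or acquisition rule. It shows why naive aggregation can mislead: averaging over both presentation orders cancels position bias but not verbosity bias.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def p_first_wins(s_first: float, s_second: float, len_first: int, len_second: int,
                 beta_pos: float, beta_len: float) -> float:
    """P(judge prefers the first-shown response) under a hypothetical biased Bradley-Terry model."""
    logit = (s_first - s_second) + beta_pos + beta_len * math.log(len_first / len_second)
    return sigmoid(logit)

def p_a_wins_order_balanced(s_a: float, s_b: float, len_a: int, len_b: int,
                            beta_pos: float, beta_len: float) -> float:
    """Average P(A preferred) over both presentation orders."""
    a_first = p_first_wins(s_a, s_b, len_a, len_b, beta_pos, beta_len)
    b_first = p_first_wins(s_b, s_a, len_b, len_a, beta_pos, beta_len)
    return 0.5 * (a_first + (1.0 - b_first))

# Equal quality, equal length: order balancing cancels the position term (about 0.5).
print(p_a_wins_order_balanced(1.0, 1.0, 200, 200, beta_pos=0.3, beta_len=0.5))
# Equal quality, but A is 4x longer: verbosity bias survives order balancing (about 0.66).
print(p_a_wins_order_balanced(1.0, 1.0, 400, 100, beta_pos=0.3, beta_len=0.5))
```

Fitting `beta_pos` and `beta_len` jointly with the quality scores is what lets a bias-aware model separate a judge's preference for longer answers from genuine quality differences.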
CALM: Interpretable Cross-Modal Alignment for Biomarker Discovery from Unpaired Data
Jueqi Wang, Zachary Jacokes, John Darrell Van Horn, Kevin A. Pelphrey, Michael C. Schatz, Archana Venkataraman
Multimodal
  • CALM enables biomarker discovery from unpaired neuroimaging and genetic datasets.
  • The framework uses linear projections for cross-modal alignment in a shared latent space.
  • It outperforms existing methods and shows stability in associations across validation folds.
  • CALM reveals significant immune and metabolic pathways associated with autism spectrum disorder.
Optimizing Visual Generative Models via Distribution-wise Rewards
Ruihang Li, Mengde Xu, Shuyang Gu, Leigang Qu, Fuli Feng, Han Hu, Wenjie Wang
Generative Models Reinforcement Learning Computer Vision
  • Introduction of distribution-wise rewards to improve generative model training.
  • Mitigation of reward hacking and mode collapse issues common in sample-wise reward systems.
  • Development of a subset-replace strategy for efficient reward computation.
  • Demonstrated significant improvements in FID scores across multiple models.
The Rollout Infrastructure Tax in Coding-Agent Reinforcement Learning
Daniel Thi Graviet, Lovre Pesut, Ivan Dagelic, Vedran Jukic, Ivan Burazin
Reinforcement Learning Efficient ML
  • Introduction of the 'rollout infrastructure tax' concept, highlighting the impact of execution substrate on coding-agent RL efficiency.
  • Significant variations in performance metrics (cold-start latency and worker-hours) across different execution substrates.
  • Argues that future coding-agent RL systems need to optimize the execution substrate as part of the training process.
  • Identification of specific design requirements for effective rollout-native substrates.
Rank-Then-Act: Reward-Free Control from Frame-Order Progress
Yuriy Maksyuta, George Bredis, Ruslan Rakhimov, Daniil Gavrilov
Reinforcement Learning Computer Vision Robotics
  • RTA enables learning control policies from video without environment rewards.
  • Introduces a correlation-based reward signal using Spearman rank correlation.
  • Demonstrates strong performance across various control tasks and environments.
  • Single pretrained progress scorer shows effective transferability across tasks.
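The correlation-based reward can be sketched as follows, assuming a progress scorer has already produced one scalar per frame (the scores below are made up). The rollout's reward is the Spearman rank correlation between frame index and predicted progress, so steadily increasing progress earns 1 and regressing progress earns -1. Ties get average ranks; how RTA converts this into per-step rewards is not modeled.

```python
from typing import List, Sequence

def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0

def progress_reward(progress_scores: Sequence[float]) -> float:
    """Reward in [-1, 1]: how well predicted progress follows frame order."""
    return spearman(list(range(len(progress_scores))), progress_scores)

print(progress_reward([0.1, 0.2, 0.4, 0.7, 0.9]))  # 1.0: monotone progress
print(progress_reward([0.9, 0.7, 0.4, 0.2, 0.1]))  # -1.0: regressing
```

Using ranks rather than raw scores makes the reward insensitive to how the progress scorer is calibrated, which is plausibly part of what lets one scorer transfer across tasks.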
Finite-Lag Operator Geometry of Recurrent Representations
Kanishka Reddy
Theory Time Series
  • Introduces finite-lag operator geometry for recurrent representations, emphasizing temporal dynamics.
  • Develops a conditional transport law and a source-centered transport tensor that captures the geometry of recurrent states.
  • Proves structural results including affine covariance and stability of estimators on bounded trajectory clouds.
  • Demonstrates the framework's effectiveness in detecting deterministic recurrent motion not visible to traditional methods.
Zeus: Towards Tuning-Free Foundation Model for Time Series Analysis
Yisong Fu, Zezhi Shao, Chengqing Yu, Yujie Li, Yongjun Xu, Xueqi Cheng, Fei Wang
Time Series
  • ZEUS is a unified time series foundation model (TSFM) that operates without task-specific fine-tuning.
  • It incorporates a multi-scale Transformer architecture to balance granularity and scalability.
  • MOTM enables ZEUS to accommodate diverse task-specific inductive biases.
  • ZEUS achieves competitive performance across five downstream tasks in a tuning-free manner.
The risk of KV cache compression
Lukas Haverbeck, Carmen Amo Alonso, Andres Felipe Posada-Moreno, Sebastian Trimpe, Marco Pavone
NLP Large Language Models Theory Efficient ML
  • Characterizes the minimax risk of KV cache compression, providing a theoretical foundation for its design.
  • Identifies the intrinsic compressibility of KV caches based on future query interactions.
  • Proposes novel design principles for efficient KV compression during autoregressive decoding.
  • Instantiates these principles in a practical algorithm that shows promising performance on LongBench.
Single-Channel EEG-Based Cognitive Load Assessment in Online Learning: A Hybrid Deep Learning Approach
Rowan Hussein, Mohamed Ouf
Time Series
  • Demonstrates the potential of single-channel EEG for cognitive load assessment in online learning.
  • Achieves up to 78.5% accuracy using a hybrid CNN+LSTM+Attention model, outperforming traditional classifiers.
  • Advocates for subject-independent evaluation to ensure model generalizability.
  • Provides a reproducible evaluation pipeline and an open visualization tool for educators.
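Subject-independent evaluation means no subject's recordings appear in both the training and test folds. A minimal leave-one-subject-out splitter, assuming each EEG window is tagged with a subject ID (the IDs below are made up):

```python
from typing import Dict, Iterator, List, Sequence, Tuple

def leave_one_subject_out(subject_ids: Sequence[str]) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject."""
    by_subject: Dict[str, List[int]] = {}
    for idx, sid in enumerate(subject_ids):
        by_subject.setdefault(sid, []).append(idx)
    for held_out in sorted(by_subject):
        test = by_subject[held_out]
        train = [i for i, sid in enumerate(subject_ids) if sid != held_out]
        yield held_out, train, test

window_subjects = ["s1", "s1", "s2", "s3", "s2", "s3"]
for subject, train, test in leave_one_subject_out(window_subjects):
    print(subject, train, test)
```

scikit-learn's `LeaveOneGroupOut` (in `sklearn.model_selection`) produces the same split when the subject IDs are passed as `groups`.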
EHHN: An Event-driven Heterogeneous Hypergraph Network for Object-Centric Next Activity Prediction
Jiaxing Wang, Kaitao Chen, Zhubin Han, Chenyu Hou, Bin Cao, Jing Fan, Ji Zhang
Graph Learning Time Series Optimization
  • Introduction of a heterogeneous hypergraph representation for object-centric next activity prediction.
  • Development of a micro-spatial encoder that models the asymmetric roles of events and objects.
  • Design of a macro-evolution encoder that captures inter-event timing and global execution patterns.
  • EHHN achieves state-of-the-art performance on OCEL benchmarks, outperforming nine baseline methods.
Self-explainable Operator Learning for Discovering Spatial Patterns in Functional Data
Mojgan Alishiri, Amirhossein Arzani
Interpretability
  • Introduces a self-explainable operator learning framework for functional data.
  • Enhances interpretability by linking input regions to output predictions.
  • Demonstrates effectiveness in fluid flow problems, revealing spatial feature importance.
  • Offers a transparent alternative to traditional opaque neural network-based models.
Denser $\neq$ Better: Limits of On-Policy Self-Distillation for Continual Post-Training
Meng Wang, Haohan Zhao, Wenzhuo Liu, Lu Yang, Geng Liu, Haiyang Guo, Guo-Sen Xie, Gaofeng Meng, Hongbin Liu, Fei Zhu
NLP Large Language Models Reinforcement Learning
  • On-policy self-distillation can enhance specialization but is fragile and prone to forgetting.
  • SDPO shows weaker retention compared to traditional on-policy reinforcement learning methods like GRPO.
  • Increased supervision density can lead to sensitivity and accumulated artifacts, complicating continual learning.
  • The study highlights the importance of teacher stability and token reliability in the effectiveness of self-distillation.
Multilayer Q-Matrix-Embedded Neural Network for Cognitive Diagnosis (M-QCDNet): Structure-Aware Deep Learning Architecture for Psychometric Interpretability
Yiyao Yang
Interpretability
  • M-QCDNet integrates Q-matrix structure into a neural network for enhanced interpretability and predictive accuracy in cognitive diagnosis.
  • The model introduces new evaluation metrics to quantify alignment between predicted skill activations and cognitive theory.
  • M-QCDNet supports practical applications in educational settings, enabling early detection of learning difficulties.
  • The architecture maintains psychometric meaning while leveraging deep learning capabilities, distinguishing it from prior models.
Towards Learning Representations of Policies in Two-Player Zero-Sum Imperfect-Information Games
Kevin Wang, Kevin Yang, Arjun Prakash, Amy Greenwald
Reinforcement Learning Theory
  • Introduction of methods for creating diverse datasets of policies in games.
  • Proposal of multiple techniques for learning policy representations, including weight autoencoders and functional encoders.
  • Evaluation of learned representations through downstream tasks, confirming the presence of useful behavioral embeddings.
  • Focus on two-player zero-sum imperfect-information games, particularly Kuhn and Leduc Poker.
Model Merging as Probabilistic Inference in Fine-Tuning Parameter Space
Long Minh Bui, Tuan Anh Le Van, Tung Phi Duc, Phi Le Nguyen, Jana Doppa, Trong Nghia Hoang
Optimization Theory Efficient ML
  • Introduces a probabilistic framework for model merging that improves upon traditional geometric approaches.
  • Formulates model merging as MAP inference under a product of task-specific energy-based experts.
  • Identifies the limitations of Gaussian assumptions in existing methods and proposes a heavy-tailed PoE design.
  • Demonstrates significant performance improvements over state-of-the-art methods in empirical evaluations.
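As a reference point for the MAP view of merging: if each task-specific expert is a diagonal Gaussian over parameters, centered at that task's fine-tuned weights (or task vector) with per-parameter precision, the MAP of the product is the precision-weighted average, the same form as Fisher-weighted merging. This is the Gaussian baseline the summary calls limiting, not the paper's heavy-tailed product-of-experts design; the toy weights and precisions are invented.

```python
from typing import List, Sequence

def gaussian_poe_merge(thetas: List[Sequence[float]],
                       precisions: List[Sequence[float]]) -> List[float]:
    """MAP of a product of diagonal Gaussians: per-parameter precision-weighted mean."""
    merged = []
    for k in range(len(thetas[0])):
        num = sum(p[k] * t[k] for t, p in zip(thetas, precisions))
        den = sum(p[k] for p in precisions)
        merged.append(num / den)
    return merged

task_a = [1.0, 0.0, 2.0]
task_b = [0.0, 1.0, 4.0]
prec_a = [3.0, 1.0, 1.0]  # task A is most confident about parameter 0
prec_b = [1.0, 1.0, 1.0]
print(gaussian_poe_merge([task_a, task_b], [prec_a, prec_b]))  # [0.75, 0.5, 3.0]
```

With equal precisions the formula reduces to plain weight averaging. A heavy-tailed expert, by contrast, penalizes large deviations less than quadratically, so one task's outlying parameter no longer pulls the merged value as strongly.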
SINA: A Fully Automated Circuit Schematic Image to Netlist Generator Using Artificial Intelligence
Saoud Aldowaish, Yashwanth Karumanchi, Kai-Chen Chiang, Mohammed Ayman Habib, Finn Murphy, Rishen Cao, Morteza Fayazi
Computer Vision NLP Multimodal
  • SINA achieves 96.67% netlist-generation accuracy, 2.72 times higher than existing methods.
  • The system can process both IC and PCB schematics, including hand-drawn and scanned images.
  • SINA combines deep learning, OCR, and a vision-language model (VLM) to improve component detection and connectivity inference.
  • The methodology addresses common pitfalls in existing automated conversion methods, such as misidentifying wire connections and reference designators.
Many Voices, One Reward: Multi-Role Rubric Generation for LLM Judging and Reward Modeling
Dazhi Fu, Jiuding Yang, Yiwen Guo, Jicong Fan
NLP Large Language Models Reinforcement Learning
  • Identification of dimensional blind spots as a critical failure mode in single-voiced rubric generation.
  • Introduction of Multi-Role Rubric Generation (MRRG) to elicit diverse evaluative perspectives.
  • MRRG consistently outperforms existing single-role rubric generation methods across multiple benchmarks.
  • Demonstrated improvements in reward signals for RLVR applications, enhancing LLM performance.
ART for Diffusion Sampling: Continuous-Time Control and Actor-Critic Learning
Yilie Huang, Wenpin Tang, Xun Yu Zhou
Generative Models Reinforcement Learning Optimization
  • Introduction of Adaptive Reparameterized Time (ART) for optimal timestep allocation in diffusion sampling.
  • Development of ART-RL, a reinforcement learning framework that learns sampling clock rates using Gaussian policies.
  • Establishment of a theoretical link between ART and ART-RL, ensuring optimality in policy learning.
  • Demonstration of ART's superior performance over traditional sampling schedules in various experimental settings.
Expander Sparse Autoencoders: Parameter-Efficient Dictionaries for Mechanistic Interpretability
Rodrigo Mendoza-Smith
Interpretability Efficient ML Large Language Models
  • Introduction of Expander SAEs, a parameter-efficient architecture for sparse autoencoders.
  • Demonstrates a storage-fidelity trade-off across multiple language models, using 293× fewer learned decoder values with minimal loss in fidelity.
  • Proposes a parallel implementation of Orthogonal Matching Pursuit (OMP) that improves decoding efficiency.
  • Theoretical proofs supporting the identifiability of k-sparse codes under specific conditions.
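Orthogonal Matching Pursuit (OMP) greedily recovers a k-sparse code: at each step it adds the dictionary atom most correlated with the current residual, then re-fits all selected coefficients by least squares. The sequential, pure-Python version below is a minimal sketch for small dictionaries (atoms stored as rows); it does not model the expander structure or the parallel implementation described above.

```python
from typing import List, Sequence, Tuple

def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def omp(atoms: List[List[float]], y: List[float], k: int) -> Tuple[List[int], List[float]]:
    """Greedy k-sparse code for y; returns (selected atom indices, coefficients)."""
    support: List[int] = []
    coeffs: List[float] = []
    residual = list(y)
    for _ in range(k):
        candidates = [j for j in range(len(atoms)) if j not in support]
        best = max(candidates, key=lambda j: abs(dot(atoms[j], residual)))
        support.append(best)
        # Least-squares refit on the support via the normal equations (D_S^T D_S) c = D_S^T y.
        gram = [[dot(atoms[i], atoms[j]) for j in support] for i in support]
        rhs = [dot(atoms[i], y) for i in support]
        coeffs = solve(gram, rhs)
        approx = [sum(c * atoms[j][d] for c, j in zip(coeffs, support)) for d in range(len(y))]
        residual = [yi - ai for yi, ai in zip(y, approx)]
    return support, coeffs

atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.6, 0.8, 0.0]]
support, coeffs = omp(atoms, [1.2, 1.6, 1.0], k=2)  # y = 2 * atom 3 + 1 * atom 2
print(support, coeffs)  # support [3, 2], coefficients close to [2.0, 1.0]
```

The refit after every selection is what distinguishes OMP from plain matching pursuit; it keeps the residual orthogonal to all selected atoms, so no atom is picked twice.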
Predictive Conformal Slip Monitoring: An Empirical Evaluation of Rolling Split Conformal Prediction for Pre-Incident Traction Loss Detection
Varshith Roy Kotla
Theory Time Series Optimization
  • The study evaluates Rolling Split Conformal Prediction for detecting pre-incident traction loss in motorsport.
  • Mean precision and recall were essentially 0.0, indicating that the method was ineffective for this task.
  • The high false-alarm rate (15.3%) suggests significant limitations in the current approach.
  • Methodological rigor is emphasized, with diagnostics revealing violations of underlying assumptions.
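A rolling split conformal monitor of the kind evaluated here can be sketched as follows. It keeps a window of recent absolute residuals |y - ŷ|, takes their split-conformal quantile, and raises an alarm when a new residual exceeds it. The forecaster is abstracted away, and the window length, miscoverage level `alpha`, and synthetic residual stream are assumptions, not the study's configuration.

```python
import math
from collections import deque
from typing import Iterable, List

def conformal_threshold(scores: List[float], alpha: float) -> float:
    """Split-conformal quantile: the ceil((n + 1)(1 - alpha))-th smallest score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[k - 1]

def rolling_alarms(residuals: Iterable[float], window: int = 20, alpha: float = 0.1) -> List[int]:
    """Indices t where |residual_t| exceeds the threshold calibrated on the previous `window` residuals."""
    calib: deque = deque(maxlen=window)
    alarms = []
    for t, r in enumerate(residuals):
        score = abs(r)
        if len(calib) == window and score > conformal_threshold(list(calib), alpha):
            alarms.append(t)
        calib.append(score)  # roll the calibration window forward
    return alarms

normal = [0.1 * ((i % 5) - 2) for i in range(30)]  # small, bounded residuals
stream = normal + [1.5] + normal[:5]               # a spike at index 30
print(rolling_alarms(stream))  # [30]
```

Split conformal's coverage guarantee assumes exchangeable calibration and test scores, which temporally correlated sensor data need not satisfy.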
SCAPE: Accurate and Efficient LLM Training with Extreme Sparse Communication
Mingkai Zheng, Junlin Chen, Haotian Xie, Zhao Zhang
Large Language Models Optimization Efficient ML
  • SCAPE enables extreme sparsification of communication in LLM training without sacrificing model performance.
  • The optimizer is built on AdamS, which shows improved robustness to high sparsity compared to AdamW.
  • The method achieves up to 43.3% reduction in pre-training wall-clock time while maintaining model quality.
  • SCAPE's approach allows for efficient synchronization of masks and computation, enhancing overall training efficiency.
Evolutionary Feature Engineering for Structured Data
Ege Onur Taga, Yilin Zhuang, M. Emrullah Ildiz, Petros Mol, Abhimanyu Das, Karthik Duraisamy, Samet Oymak
Time Series Optimization Interpretability
  • The EFE framework uses LLMs to evolve preprocessing transformations for structured data.
  • EFE-Time improves time-series forecasting accuracy with dataset-specific normalization.
  • EFE-Tab evolves compact feature programs, enhancing interpretability and performance.
  • The methodology integrates feedback from downstream performance to refine transformations.
SABER: A Semantic-Aligned Brain Network Analysis Framework via Multi-scale Hypergraphs
Yidan Xu, Xiangmin Han, Rundong Xue, Huihui Ye
Graph Learning NLP Large Language Models
  • SABER integrates LLM-derived semantics directly into the brain network classification process.
  • The framework employs multi-scale hypergraphs to capture complex interactions among brain regions.
  • A decision-level semantic alignment mechanism allows for patient-specific semantic information to influence predictions.
  • SABER outperforms existing methods on benchmark datasets, showcasing improved robustness and interpretability.
EPnG: Adaptive Expert Prune-and-Grow for Parameter-Efficient MoE Fine-tuning
Ahin Lee, Sehyun Yun, Taesik Gong
NLP Large Language Models Efficient ML
  • EPnG dynamically reallocates resources based on expert importance derived from routing dynamics.
  • The framework prunes under-utilized experts and expands high-importance experts, maintaining a fixed parameter budget.
  • EPnG outperforms traditional LoRA methods while updating significantly fewer parameters.
  • The approach aligns parameter-efficient fine-tuning with the unique characteristics of MoE architectures.
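A single prune-and-grow step under a fixed budget can be sketched as follows. Each expert's importance is estimated from how often the router selects it, then adapter capacity (here, a LoRA rank per expert) moves from the least-used experts to the most-used ones while the total stays unchanged. The rank-based accounting, the routing counts, and the `step` size are hypothetical choices for illustration; EPnG's actual importance measure and growth rule may differ.

```python
from typing import Dict

def prune_and_grow(ranks: Dict[str, int], routing_counts: Dict[str, int],
                   n_move: int = 1, step: int = 4, min_rank: int = 0) -> Dict[str, int]:
    """Move up to `step` rank units from each of the n_move least-routed experts to the most-routed ones."""
    assert 2 * n_move <= len(ranks), "donor and receiver sets must not overlap"
    total = sum(routing_counts.values())
    importance = {e: routing_counts[e] / total for e in ranks}  # routing frequency as importance
    by_importance = sorted(ranks, key=lambda e: importance[e])
    low, high = by_importance[:n_move], by_importance[-n_move:]
    new_ranks = dict(ranks)
    # Pair the least important donor with the most important receiver, and so on.
    for donor, receiver in zip(low, reversed(high)):
        freed = min(step, new_ranks[donor] - min_rank)  # rank 0 means the expert's adapter is pruned
        new_ranks[donor] -= freed
        new_ranks[receiver] += freed
    assert sum(new_ranks.values()) == sum(ranks.values())  # parameter budget is preserved
    return new_ranks

ranks = {"e0": 8, "e1": 8, "e2": 8, "e3": 8}
counts = {"e0": 900, "e1": 50, "e2": 300, "e3": 10}
print(prune_and_grow(ranks, counts, n_move=1, step=4))  # {'e0': 12, 'e1': 8, 'e2': 8, 'e3': 4}
```

Because every unit taken from a donor is given to a receiver, the number of trainable parameters is fixed by construction; only its distribution across experts changes.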
SA-HGNN: Sample-Adaptive Hyperbolic Graph Neural Network for EEG-Based Depression Recognition
Yang Li, Pan Hu, Yan Zhang, Wenfan Yang, Tao Wu, Lianbo Guo
Graph Learning
  • SA-HGNN dynamically constructs individualized brain network topologies to enhance representation accuracy.
  • The use of hyperbolic geometry allows for better modeling of hierarchical structures in brain connectivity.
  • An attention pooling mechanism effectively reduces noise in EEG signals, preserving essential topological features.
  • The model outperforms traditional GNNs based on Euclidean metrics in EEG-based depression recognition tasks.