publications
2025
- VaeDiff-DocRE: End-to-end Data Augmentation Framework for Document-level Relation Extraction. Khai Phan Tran, Wen Hua, and Xue Li. In Proceedings of the 31st International Conference on Computational Linguistics, 2025
Document-level Relation Extraction (DocRE) aims to identify relationships between entity pairs within a document. However, most existing methods assume a uniform label distribution, resulting in suboptimal performance on real-world, imbalanced datasets. To tackle this challenge, we propose a novel data augmentation approach that uses generative models to augment data in the embedding space. Our method leverages the Variational Autoencoder (VAE) architecture to capture all relation-wise distributions formed by entity pair representations and to augment data for underrepresented relations. To better capture the multi-label nature of DocRE, we parameterize the VAE's latent space with a Diffusion Model. Additionally, we introduce a hierarchical training framework to integrate the proposed VAE-based augmentation module into DocRE systems. Experiments on two benchmark datasets demonstrate that our method outperforms state-of-the-art models, effectively addressing the long-tail distribution problem in DocRE.
2024
- CDER: Collaborative Evidence Retrieval for Document-level Relation Extraction. Khai Phan Tran and Xue Li. 2024
Document-level Relation Extraction (DocRE) involves identifying relations between entities across multiple sentences in a document. Evidence sentences, which are crucial for precisely identifying the relationship of an entity pair, help models focus on essential text segments and thereby improve DocRE performance. However, existing evidence retrieval systems often overlook the collaborative nature among semantically similar entity pairs in the same document, which limits the effectiveness of evidence retrieval. To address this, we propose a novel evidence retrieval framework, namely CDER. CDER employs an attentional graph-based architecture to capture collaborative patterns and incorporates a dynamic sub-structure for additional robustness in evidence retrieval. Experimental results on the benchmark DocRE dataset show that CDER not only excels in the evidence retrieval task but also enhances the overall performance of existing DocRE systems.
2022
- Improving Traffic Load Prediction with Multi-modality: A Case Study of Brisbane. Khai Phan Tran, Weitong Chen, and Miao Xu. 2022
Fast and accurate traffic load prediction is a pivotal component of an Intelligent Transport System. It can reduce commuters' travel time and lower vehicle emissions. During the COVID-19 pandemic, people preferred private transportation, which made traffic load prediction even more critical. In recent years, researchers have developed several traffic load prediction models and applied them successfully to data from the US, China, and Europe. However, none of these models has been applied to traffic data in Australia. Because Australia's political, geographical, and climate conditions differ from those of other countries, these models may not be suitable for predicting traffic load in Australia. In this paper, we investigate this problem and propose a multi-modal method that uses Australia-specific data to assist traffic load prediction. Specifically, we use daily social media data together with traffic data to predict the traffic load. We present a protocol to pre-process raw traffic and social media data and then propose a multi-modal model, namely DM2T, which makes accurate time-series predictions by using both time-series data and other media data. We validate the effectiveness of our proposed method through a case study of Brisbane. The results show that, with the help of Australia-specific social media data, our proposed method predicts Brisbane's traffic load more accurately than conventional methods.