cv
Basics
Name | Khai P. Tran
Label | Ph.D. Candidate
Email | tranphan.khai@gmail.com
Url | https://khaitran22.github.io/
Summary | A Vietnam-born Ph.D. candidate in Australia researching Information Extraction in Natural Language Processing (NLP).
Education
-
2022 - Current Australia
Doctor of Philosophy
The University of Queensland
Information Extraction from Large-scale Low-quality Data
-
2020 - 2021 Australia
-
2015 - 2019 Vietnam
Bachelor of Business Administration
University of Economics Ho Chi Minh City (UEH)
Business Administration
Work
- 2022 - Now
Teaching Assistant
The University of Queensland, Australia
Teaching several courses related to Machine Learning and Software Engineering.
- COMP4702
- DATA7703
- CSSE7023
- 2019.04 - 2020.01
Junior .NET Developer
Hoozing Limited Liability Company
Maintained and developed features for the Hoozing Integrated Platform & Systems and the Hoozing website.
Awards
- 2022
Best Poster Presentation
The 34th Australasian Joint Conference on Artificial Intelligence
Awarded for the best poster presenting an accepted paper at the conference.
- 2022
UQ Earmarked Scholarship
The University of Queensland, Australia
Funded by the Australian Government to support excellent and innovative research projects that address a significant problem or gap in knowledge and represent value for money.
- 2021 . 2022
Dean’s Commendation for Academic Excellence
Faculty of Engineering, Architecture and Information Technology, The University of Queensland.
Awarded to students who have excelled academically and who have shown a strong commitment to their program of study.
Publications
-
2025.01.24 VaeDiff-DocRE: End-to-end Data Augmentation Framework for Document-level Relation Extraction
The 31st International Conference on Computational Linguistics
Document-level Relation Extraction (DocRE) aims to identify relationships between entity pairs within a document. However, most existing methods assume a uniform label distribution, resulting in suboptimal performance on real-world, imbalanced datasets. To tackle this challenge, we propose a novel data augmentation approach using generative models to enhance data from the embedding space. Our method leverages the Variational Autoencoder (VAE) architecture to capture all relation-wise distributions formed by entity pair representations and augment data for underrepresented relations. To better capture the multi-label nature of DocRE, we parameterize the VAE’s latent space with a Diffusion Model. Additionally, we introduce a hierarchical training framework to integrate the proposed VAE-based augmentation module into DocRE systems. Experiments on two benchmark datasets demonstrate that our method outperforms state-of-the-art models, effectively addressing the long-tail distribution problem in DocRE.
-
2024.07.16 CDER: Collaborative Evidence Retrieval for Document-Level Relation Extraction
The 16th Asian Conference on Intelligent Information and Database Systems
Document-level Relation Extraction (DocRE) involves identifying relations between entities across multiple sentences in a document. Evidence sentences, which are crucial for precisely identifying the relationship of an entity pair, focus the model on essential text segments and thereby improve DocRE performance. However, existing evidence retrieval systems often overlook the collaborative nature among semantically similar entity pairs in the same document, hindering the effectiveness of the evidence retrieval task. To address this, we propose a novel evidence retrieval framework, namely CDER. CDER employs an attentional graph-based architecture to capture collaborative patterns and incorporates a dynamic sub-structure for additional robustness in evidence retrieval. Experimental results on the benchmark DocRE dataset show that CDER not only excels in the evidence retrieval task but also enhances the overall performance of existing DocRE systems.
-
2022.03.19 Improving traffic load prediction with multi-modality: a case study of Brisbane
The 34th Australasian Joint Conference on Artificial Intelligence
Fast and accurate traffic load prediction is a pivotal component of the Intelligent Transport System. It can reduce the time commuters spend travelling and save our environment from vehicle emissions. During the COVID-19 pandemic, people preferred private transportation, which made predicting the traffic load even more critical. In recent years, researchers have developed several traffic load prediction models and have applied them successfully to data from the US, China or Europe. However, none of these models has been applied to traffic data in Australia. Considering that Australia has different political, geographical, and climate conditions from other countries, these models may not be suitable for predicting the traffic load in Australia. In this paper, we investigate this problem and propose a multi-modal method that is capable of using Australia-specific data to assist traffic load prediction. Specifically, we use daily social media data together with traffic data to predict the traffic load. We describe a protocol to pre-process raw traffic and social media data and then propose a multi-modal model, namely DM2T, which makes accurate time-series predictions by using both time-series data and other media data. We validate the effectiveness of our proposed method through a case study of Brisbane city. The results show that, with the help of Australia-specific social media data, our proposed method makes more accurate traffic load predictions for Brisbane than conventional methods.
Skills
Languages | Python, Java, JavaScript, HTML, CSS
Data Science & Machine Learning | PyTorch, Hugging Face Libraries, Deep Graph Library, TensorFlow
Languages
Vietnamese | Native speaker
English | Professional working proficiency