Chung-Yu Wang 💪

I am currently a graduate student in Computer Science (M.Sc.) at York University (YorkU) in Canada, supervised by Professor Hung Viet Pham. My research interests include, but are not limited to, Software Engineering (AI4SE), Natural Language Processing, and Machine Learning. More specifically, I focus on optimizing large language models (LLMs) and foundation models (FMs) for software engineering tasks, such as prompt engineering for code generation.

Resume

Education

  1. MSc in Computer Science

    York University, Canada
  2. BSc in Information Management

    National University of Kaohsiung, Taiwan
Featured Projects

Check out my featured projects below!

Publications
Selection of Prompt Engineering Techniques for Code Generation through Predicting Code Complexity
arXiv (under review) ∙ September 2024
PET-Select is a PET-agnostic model that improves code-generation accuracy by selecting the most appropriate prompt engineering technique (PET) for each query based on its predicted code complexity. It uses contrastive learning to separate simple from complex queries, enabling more effective PET selection. In our evaluations, PET-Select improves pass@1 accuracy by up to 1.9% while reducing token usage by 74.8% across benchmarks.
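To give a flavor of the routing idea, here is a minimal sketch (not the paper's implementation): embed the query, score its predicted complexity against simple/complex prototypes in a contrastively trained space, and route it to a cheap or an elaborate PET accordingly. `embed`, the prototype vectors, and the zero threshold are hypothetical stand-ins.

```python
# Sketch of complexity-based PET selection (hypothetical, for illustration).
import numpy as np

def embed(query: str) -> np.ndarray:
    """Hypothetical placeholder for a contrastively trained encoder."""
    rng = np.random.default_rng(abs(hash(query)) % (2**32))
    return rng.standard_normal(16)

def predicted_complexity(query: str, simple_proto: np.ndarray,
                         complex_proto: np.ndarray) -> float:
    """Score a query by its similarity to a 'complex' vs a 'simple'
    prototype, mimicking how a contrastive space separates the classes."""
    q = embed(query)
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return cos(q, complex_proto) - cos(q, simple_proto)

def select_pet(query: str, simple_proto: np.ndarray,
               complex_proto: np.ndarray) -> str:
    """Route simple queries to zero-shot prompting (fewer tokens) and
    complex ones to a heavier technique such as chain-of-thought."""
    score = predicted_complexity(query, simple_proto, complex_proto)
    return "chain-of-thought" if score > 0.0 else "zero-shot"
```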
Deep-Bench: Deep Learning Benchmark Dataset for Code Generation
arXiv (under review) ∙ Present
Deep-Bench is a new benchmark for function-level deep learning (DL) code generation, designed to cover the full DL pipeline across phases, tasks, and data types, unlike prior benchmarks such as DS-1000 that focus narrowly on pre- and post-processing. Leading LLMs such as GPT-4o achieve significantly lower accuracy on Deep-Bench (31% vs. 60% on DS-1000), highlighting its greater complexity. Our analysis reveals substantial performance variation across categories and common bugs in LLM-generated DL code, offering valuable insights into current limitations and future improvements.
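The accuracy figures above are pass@1 scores. As a quick illustration of that metric (a generic sketch, not Deep-Bench's actual harness; the `Task` fields and `generate` callable are hypothetical), scoring reduces to generating one completion per task and running the task's hidden tests:

```python
# Sketch of single-sample pass@1 scoring for a function-level benchmark.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Task:
    prompt: str                    # natural-language spec of the DL function
    check: Callable[[str], bool]   # runs hidden unit tests on generated code

def pass_at_1(tasks: List[Task], generate: Callable[[str], str]) -> float:
    """pass@1 with one sample per task: the fraction of tasks whose
    single generated completion passes all hidden tests."""
    passed = sum(1 for t in tasks if t.check(generate(t.prompt)))
    return passed / len(tasks)
```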
Task-oriented Prompt Enhancement via Script Generation
31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining Workshop (KDD'25 Workshop) ∙ April 2024
TITAN is a novel strategy that enhances large language models' (LLMs') performance on task-oriented prompts through a universal, zero-shot approach. It eliminates the need for task-specific instructions and manual effort by leveraging step-back and chain-of-thought prompting to refine the code-generation process. In our evaluations, TITAN outperforms existing zero-shot methods, achieving state-of-the-art performance on 8 of 11 tasks and offering a significant improvement in handling everyday task-oriented prompts.
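A minimal sketch of the two-stage prompting flow this summary describes, assuming a generic chat-completion client (`call_llm` is a hypothetical stand-in, and the prompt wording is illustrative rather than TITAN's actual templates):

```python
# Sketch of step-back + chain-of-thought prompt composition.

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for any chat-completion client."""
    raise NotImplementedError("plug in your LLM client here")

def step_back_then_cot(task: str) -> str:
    # Step-back: ask for the general procedure behind the concrete task.
    outline = call_llm(
        f"What general steps does the following task involve?\n{task}"
    )
    # Chain-of-thought: reason through those steps before emitting code.
    return call_llm(
        "Think step by step, following the outline below, then write a "
        f"Python script that performs the task.\n\nTask: {task}\n\n"
        f"Outline:\n{outline}"
    )
```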
Can ChatGPT Support Developers? An Empirical Evaluation of Large Language Models for Code Generation
21st International Conference on Mining Software Repositories (MSR’24) ∙ March 2024
Large language models (LLMs) have shown promise in code generation, but existing studies focus mainly on research settings, leaving gaps in understanding their real-world utility. An empirical analysis of developer conversations from the DevGPT dataset reveals that LLM-generated code is primarily used for demonstrating concepts or examples rather than as production-ready code. These findings highlight the need for further improvements before LLMs can play a significant role in modern software development.
High-efficiency classification of injured causes on agricultural jujubes using EfficientNet
26th International Conference on Technologies and Applications of Artificial Intelligence (TAAI) ∙ November 2021