Costa Huang's Website

Posted on Sat, Feb 15, 2020

👋 Hi, I'm Costa, a Machine Learning Engineer at Hugging Face 🤗. I passed my Computer Science Ph.D. defense at Drexel University, with a specialty in Reinforcement Learning. I enjoy doing research and conducting super cool experiments!

🐦 Twitter 👨‍💼 LinkedIn 🐙 GitHub 📜 Resume

I use Deep Reinforcement Learning (DRL) to train bots to play games autonomously!

Our trained agent learned to play Breakout!

My bot learns to play volleyball through self-play

My more recent work focuses on scaling DRL to Real-time Strategy (RTS) games, in collaboration with my advisor Santiago Ontañón.

Let's safely land our shuttle on the lunar surface. Gotta be steady and precise!

Have some fun with a car racing game!

A simulated RTS game. The white blocks, green blocks, and circles are bases, resources, and workers, respectively. The goal of the game is to gather resources, build an army, and destroy enemy forces.

You can easily use my DRL library CleanRL to train the agents to play all of the games above.

Feel free to get in touch with me at [email protected] 🎉.

🔧 My Projects

Feel free to click on the links for details

📖 My Publications


Huang, S., “Reproducible and Efficient Deep Reinforcement Learning”, 2023.


Huang, S., Dossa, R., Ye, C., Braga, J., “CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms”, Journal of Machine Learning Research, 2022.

Dossa, R., Huang, S., Ontañón, S., Matsubara, T., “An Empirical Investigation of Early Stopping Optimizations in Proximal Policy Optimization”, IEEE Access, 2021.


Weng, J., Lin, M., Huang, S., Liu, B., Makoviichuk, D., Makoviychuk, V., Liu, Z., Song, Y., Luo, T., Jiang, Y., Xu, Z., “EnvPool: A Highly Parallel Reinforcement Learning Environment Execution Engine”, NeurIPS 2022.

Huang, S., Dossa, R., Raffin, A., Kanervisto, A., Wang, W., “The 37 Implementation Details of Proximal Policy Optimization”, ICLR Blog Post Track, 2022.

Huang, S., Ontañón, S., “A Closer Look at Invalid Action Masking in Policy Gradient Algorithms”, FLAIRS-35, 2022.

Compton, R., Valmianski, I., Deng, L., Huang, C., Katariya, N., Amatriain, X., Kannan, A., “MEDCOD: A Medically-Accurate, Emotive, Diverse, and Controllable Dialog System”, Machine Learning for Health, 2021.

Huang, S., Ontañón, S., Bamford, C., Grela, L., “Gym-µRTS: Toward Affordable Full Game Real-time Strategy Games Research with Deep Reinforcement Learning”, IEEE Conference on Games, 2021.

Huang, S., Healy, C., “StreetTraffic: a Library for Traffic Flow Data Collection and Analysis”, ACMSE 2018 Conference, March 2018.

Workshop / Preprints:

Huang, S., Kanervisto, A., Raffin, A., Wang, W., Ontañón, S., Dossa, R.F., “A2C is a special case of PPO”, preprint, 2022.

Huang, S., Ontañón, S., “Measuring Generalization of Deep Reinforcement Learning Applied to Real-time Strategy Games”, AAAI 2021 Reinforcement Learning in Games Workshop.

Bamford, C., Huang, S., Lucas, S., “Griddly: A platform for AI research in games”, AAAI 2021 Reinforcement Learning in Games Workshop.

Huang, S., Ontañón, S., “Action Guidance: Getting the Best of Sparse Rewards and Shaped Rewards for Real-time Strategy Games”, AIIDE Workshop on Artificial Intelligence for Strategy Games, October 2020.

Huang, S., Ontañón, S., “Comparing Observation and Action Representations for Reinforcement Learning in µRTS”, AIIDE Workshop on Artificial Intelligence for Strategy Games, October 2019.

📜 My Resume