Ruilin (Henry) Xu

Ph.D. in Computer Science

I hold a Ph.D. in Computer Science from Columbia University, where I specialized in audio-visual deep learning. I build and study systems that work in real time, scale to deployment, and create meaningful user experiences. My work lies at the intersection of audio, vision, and human–AI interaction, advancing both research and applications.

My focus is on closing the theory-to-product gap: connecting algorithm design and deep models with practical concerns of data, latency, deployment, and usability. I believe in a pluralistic approach to system design that integrates classical signal processing, modern learning, and human-in-the-loop interaction, so that intelligent systems are not only effective in theory but also responsive to human needs and creativity.

My Research Interests

Deep Learning, Speech Enhancement, Audio-Visual Learning, Multimodal Representation, Cross-Modal Generation, Real-Time Systems, Human–AI Interaction

Education

Doctor of Philosophy in Computer Science

2018–2024
Columbia University, New York, NY

Master of Science in Computer Science

2015–2017
Columbia University, New York, NY

Bachelor of Science in Computer Science

2010–2014
University of Illinois at Urbana-Champaign, Champaign, IL

Industry / Research Experience

Ph.D. Researcher

2023–2025
Snap Inc., New York, NY
Columbia University, New York, NY
  • Designed and developed DanceCraft, a real-time, music-reactive 3D dance improv system that trades scripted choreography for spontaneous, engaging improvisation in response to live audio.
  • Built a hybrid pipeline: music descriptors (tempo/energy/beat) → graph-based selection of motion segments → state-of-the-art motion in-betweening network for seamless transitions and realism (a toy sketch of this flow follows the list).
  • Curated an 8+ hour 3D dance dataset spanning diverse genres, tempi, and energy levels, enriched with idle behaviors and facial expressions to enhance expressiveness.
  • Shipped production features for interactivity & personalization: users (or DJs) drive the dance with any live music; Bitmoji avatar support (used by 250M+ Snapchat users) for personal embodiment.
  • Deployed at Snap as a production-ready service, adaptable from kiosks to large-scale online events; showcased at Billie Eilish events (120M+ followers).
  • Evaluated through user studies, demonstrating engaging and immersive experiences.
  • Presented at ACM MOCO 2024.
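
To make the hybrid pipeline concrete, the following is a minimal Python sketch of the descriptor-driven selection step, assuming librosa for tempo/beat/energy extraction. MotionGraph, its segment annotations, and the inbetween callable are hypothetical placeholders for illustration, not the actual DanceCraft components.

import numpy as np
import librosa

def music_descriptors(audio: np.ndarray, sr: int) -> dict:
    """Extract tempo, beat times, and a coarse energy level from raw audio."""
    tempo, beat_frames = librosa.beat.beat_track(y=audio, sr=sr)
    rms = librosa.feature.rms(y=audio)[0]  # frame-wise energy envelope
    return {"tempo": float(np.atleast_1d(tempo)[0]),
            "beats": librosa.frames_to_time(beat_frames, sr=sr),
            "energy": float(rms.mean())}

class MotionGraph:
    """Toy stand-in for graph-based segment selection: pick the stored
    dance segment whose tempo/energy annotations best match the music."""
    def __init__(self, segments):
        self.segments = segments  # dicts with 'tempo', 'energy', 'clip' keys

    def select(self, desc: dict):
        def cost(seg):
            return (abs(seg["tempo"] - desc["tempo"]) / 60.0
                    + abs(seg["energy"] - desc["energy"]))
        return min(self.segments, key=cost)["clip"]

def improvise_step(audio, sr, graph, inbetween):
    """One reactive step: describe the live music, select a matching
    segment, and let an in-betweening model bridge from the current pose."""
    desc = music_descriptors(audio, sr)
    return inbetween(graph.select(desc))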

Ph.D. Researcher

2021–2023
Snap Inc., New York, NY
Columbia University, New York, NY
  • Proposed an environment- and speaker-specific dereverberation method with a one-time personalization step: measuring a representative room impulse response (RIR) and recording the user reading briefly while moving.
  • Designed a two-stage pipeline (classical Wiener filtering → neural refinement) for robust dereverberation while preserving high-frequency detail; a minimal sketch follows the list.
  • Outperformed classical and learned baselines on PESQ/STOI/SRMR; user studies showed a strong preference for our results.
  • Integrated components into Snap's internal audio enhancement pipeline for immersive/AR and creative tools.
  • Presented at INTERSPEECH 2023.
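
A minimal sketch of the two-stage idea, assuming the RIR from the one-time personalization step is available. wiener_deconvolve is textbook regularized deconvolution and refine_net stands in for the learned refinement stage; both names are hypothetical, not the published implementation.

import numpy as np
import torch
import librosa

def wiener_deconvolve(stft, H, lam=0.01):
    """Wiener-style regularized deconvolution: divide out the RIR's
    frequency response; lam keeps weak frequencies from blowing up."""
    return stft * np.conj(H)[:, None] / (np.abs(H)[:, None] ** 2 + lam)

def dereverberate(wav, rir, refine_net: torch.nn.Module, n_fft=1024, hop=256):
    # Stage 1: classical estimate using the one-time RIR measurement.
    stft = librosa.stft(wav, n_fft=n_fft, hop_length=hop)
    H = np.fft.rfft(rir, n=n_fft)  # RIR frequency response (n_fft//2+1 bins)
    dry = librosa.istft(wiener_deconvolve(stft, H),
                        hop_length=hop, length=len(wav))
    # Stage 2: neural refinement to restore detail the filter smears.
    with torch.no_grad():
        x = torch.from_numpy(dry.astype(np.float32))[None, None, :]
        return refine_net(x).squeeze().numpy()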

Ph.D. Researcher

2019–2022
SoftBank Group Corp., Tokyo, Japan
Columbia University, New York, NY
  • Conducted research on generative denoising and inpainting of everyday soundscapes, reconstructing missing or obscured audio to restore ambient context and temporal continuity.
  • Developed a deep generative model with a signal-processing front-end, capable of inferring plausible background textures and transients from partial or noisy inputs; designed and implemented dataset curation, training, and evaluation pipelines.
  • Achieved state-of-the-art naturalness and continuity over baselines (objective metrics + perceptual studies), with results showcased in public audio demos and project documentation.
  • Outcomes informed subsequent multimodal alignment work (e.g., music–motion synchronization in DanceCraft).
  • Presented at NeurIPS 2020.
  • Follow-up: extended the approach to real-time/streaming denoising, building a low-latency pipeline suitable for interactive use; presented at ICASSP 2022. A sliding-window sketch follows the list.
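
For the streaming follow-up, a hedged sketch of sliding-window inference: buffer incoming audio, run a denoising network on a short context window, and emit only the newest hop so latency stays bounded. StreamingDenoiser and denoise_net are placeholders assuming a (batch, channel, time) interface; the published Dynamic Sliding Window method is more involved.

import collections
import numpy as np
import torch

class StreamingDenoiser:
    """Buffer hop-sized chunks, run the network on the last `window`
    samples, and emit only the newest hop to bound latency."""
    def __init__(self, denoise_net: torch.nn.Module, window=4096, hop=1024):
        self.net, self.window, self.hop = denoise_net, window, hop
        self.buf = collections.deque(maxlen=window)

    def push(self, chunk: np.ndarray):
        """Feed one hop-sized chunk; returns a denoised chunk, or None
        until the context window has filled."""
        self.buf.extend(chunk.tolist())
        if len(self.buf) < self.window:
            return None
        x = torch.tensor(list(self.buf), dtype=torch.float32)[None, None, :]
        with torch.no_grad():
            y = self.net(x)[0, 0]  # assumes output shaped like the input
        return y[-self.hop:].numpy()  # newest hop only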

Ph.D. Researcher

2017–2019
Columbia University, New York, NY
  • Proposed a planar-mirror "light trap" combined with pulsed time-of-flight (ToF) and first-return measurements to induce multiple ray bounces, mitigate multipath, and enable single-scan, surround 3D capture of geometrically complex shapes (a toy 2D version of the ray tracing appears after the list).
  • Conducted extensive simulations and theoretical analysis, showing that light rays can reach 99.9% of surface area after a few bounces; pyramid trap configurations achieved 99% coverage across diverse objects with ~3 reflections.
  • Implemented a fully working hardware prototype (pulsed ToF + planar mirrors) with bespoke calibration and reconstruction, producing sharper edges and more accurate depth recovery in challenging scenes.
  • Presented at CVPR 2018.
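
A toy 2D illustration of the light-trap mechanism (not the paper's simulator): reflect a ray off planar mirrors with inward-facing normals and record successive bounce points; the multiple bounces are what let rays reach otherwise occluded surface regions. The geometry below is a made-up open-top trap.

import numpy as np

def reflect(d, n):
    """Mirror-reflect direction d about unit normal n."""
    return d - 2.0 * np.dot(d, n) * n

def trace(p, d, walls, bounces=3):
    """Walls are (point, inward unit normal) pairs; follow the ray through
    `bounces` successive mirror reflections and return the hit points."""
    p, d = np.asarray(p, float), np.asarray(d, float)
    hits = []
    for _ in range(bounces):
        best_t, best_n = np.inf, None
        for q, n in walls:
            denom = np.dot(d, n)
            if denom < -1e-9:  # ray is heading out through this wall
                t = np.dot(np.asarray(q, float) - p, n) / denom
                if 1e-9 < t < best_t:
                    best_t, best_n = t, np.asarray(n, float)
        if best_n is None:  # escaped through the open top
            break
        p = p + best_t * d
        d = reflect(d, best_n)
        hits.append(p.copy())
    return hits

# Open-top 2D "trap": a floor plus two vertical mirrors, normals inward.
walls = [((0.0, 0.0), (0.0, 1.0)),   # floor
         ((-1.0, 0.0), (1.0, 0.0)),  # left mirror
         ((1.0, 0.0), (-1.0, 0.0))]  # right mirror
print(trace((0.0, 0.5), (0.7, -0.2), walls, bounces=3))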

Student Researcher

2016
Columbia University, New York, NY
  • Developed anchor frame detection algorithms (C++/OpenCV) leveraging facial recognition, color histograms, and a novel adaptive background-based method to improve efficiency and accuracy; a simplified sketch follows the list.
  • Processed Chinese video metadata using Python (JieBa, TextBlob) to generate keyword tags with TF-IDF-based weighting; automated reporting with PrettyTable.
  • Presented at ICALIP 2016.
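
An illustrative Python/OpenCV sketch of the adaptive background idea (the original implementation was C++): anchor shots share a stable studio background, so frames whose background patch stays close to a slowly adapting model are flagged as anchor frames. The region, threshold, and adaptation rate are placeholder values, not the published ones.

import cv2
import numpy as np

def detect_anchor_frames(video_path, bg_region=(0, 0, 160, 120), threshold=12.0):
    """Return indices of frames whose background patch matches a running
    background model; the model adapts slowly to tolerate lighting drift."""
    x, y, w, h = bg_region
    cap = cv2.VideoCapture(video_path)
    bg_model, anchors, idx = None, [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        patch = cv2.cvtColor(frame[y:y + h, x:x + w],
                             cv2.COLOR_BGR2GRAY).astype(np.float32)
        if bg_model is None:
            bg_model = patch.copy()
        if float(np.mean(np.abs(patch - bg_model))) < threshold:
            anchors.append(idx)  # background matches: likely anchor shot
            bg_model = 0.95 * bg_model + 0.05 * patch  # slow adaptation
        idx += 1
    cap.release()
    return anchors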

Earlier Industry Experience

Software Engineer

2014–2015
Foxit Software, Fremont, CA

Business Analyst Co-op

2013
Monsanto Company, St. Louis, MO

Software Engineer Intern

2012
Foxit Software, Fremont, CA

Programming Languages

Python, JavaScript, C#, Java, C/C++, HTML/CSS, MATLAB, LaTeX, Docker

Machine Learning & Deep Learning

PyTorch, TensorFlow, Keras, torchaudio, librosa, SciPy, scikit-learn, Matplotlib, pandas, OpenCV

Projects

DanceCraft

A Music-Reactive Real-time Dance Improv System.

Personalized Dereverberation

Dereverberation of Speech via Personalization, Classical and Learning-based Approaches.

Dynamic Sliding Window

Real-Time Speech Denoising via Machine Learning.

Listening to Sounds of Silence

Speech Denoising via Machine Learning.

Light Trapping

Single-Scan Surround 3D Shape Capture via Time of Flight.

Harmonizing Audio and Human Interaction: Enhancement, Analysis, and Application of Audio Signals via Machine Learning Approaches

Ph.D. Dissertation

DanceCraft: A Music-Reactive Real-time Dance Improv System

Conference Paper

  • Authors: Ruilin Xu, Vu An Tran, Shree K. Nayar, and Gurunandan Krishnan
  • Published in: Proceedings of the 9th International Conference on Movement and Computing (MOCO 2024).
  • Link: ACM Digital Library

Neural-network-based approach for speech denoising

US Patent

  • Inventors: Changxi Zheng, Ruilin Xu, Rundi Wu, Carl Vondrick, and Yuko Ishiwaka
  • Patent Info: US Patent 11894012, 2024.
  • Link: Google Patents

Personalized Dereverberation of Speech

Conference Paper

  • Authors: Ruilin Xu, Gurunandan Krishnan, Changxi Zheng, and Shree K. Nayar
  • Published in: Proceedings of the 24th Annual Conference of the International Speech Communication Association (INTERSPEECH 2023).
  • Link: ISCA Archive

Dynamic Sliding Window for Realtime Denoising Networks

Conference Paper

  • Authors: Jinxu Xiang, Yuyang Zhu, Rundi Wu, Ruilin Xu, and Changxi Zheng
  • Published in: Proceedings of the 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2022).
  • Link: IEEE Xplore

Listening to Sounds of Silence for Speech Denoising

Conference Paper

  • Authors: Ruilin Xu, Rundi Wu, Yuko Ishiwaka, Carl Vondrick, and Changxi Zheng
  • Published in: Proceedings of the 34th International Conference on Neural Information Processing Systems (NeurIPS 2020).
  • Link: ACM Digital Library

Trapping Light for Time of Flight

Conference Paper

  • Authors: Ruilin Xu, Mohit Gupta, and Shree K. Nayar
  • Published in: Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2018).
  • Link: IEEE Xplore

News event understanding by mining latent factors from multimodal tensors

Conference Paper

  • Authors: Chun-Yu Tsai, Ruilin Xu, Robert E. Colgan, and John R. Kender
  • Published in: Proceedings of the 2016 ACM Workshop on Vision and Language Integration Meets Multimedia Fusion (iV&L-MM 2016).
  • Link: ACM Digital Library

An adaptive anchor frame detection algorithm based on background detection for news video analysis

Conference Paper

  • Authors: Ruilin Xu, Chun-Yu Tsai, and John R. Kender
  • Published in: Proceedings of the 2016 International Conference on Audio, Language and Image Processing (ICALIP 2016).
  • Link: IEEE Xplore

GitHub: henryxrl
Google Scholar: Ruilin Xu
Facebook: Henry Xu
Instagram: @henryxrl