Calvin’s Serial Experiments

Pinlin [Calvin] Xu

Computer Science student (AI focus) at Stanford University; passionate about languages, Classics, and East Asian studies; bridging the quantitative and the humane.

Recent Experience

Fullstack Engineer Intern

Sep 2024 –
Ideaflow Inc. Palo Alto, CA
  • Incoming intern working on web and iOS apps, a semantic knowledge graph, and product engineering

Undergraduate Research Fellow (machine learning, causal inference)

Jan 2024 – Aug 2024
Stanford Management Science and Engineering, Syrgkanis Lab
  • Researched LLM-powered causal graph discovery applied to genetic perturbation prediction
  • Augmented a generative model for gene expression profiles with a causal graph derived from PubMed abstracts using GPT-4 and a Neo4j graph database
  • Initiated collaboration with PaperToGraph project at Ideaflow; deployed experiments to SOAL GPU cluster

LLM AI Integration Project Lead

Jun 2023 – Sep 2023
JuniorKids Group / Le Groupe JuniorKids Montreal, QC
  • Led an intern team building LLM-powered applications to improve Shopify storefront performance
  • Contributed to first-ever August traffic growth and 37% YoY growth
  • Managed projects on Notion and GitHub, hosted meetings, and kept everyone and the boss in the loop

Head Programmer, Captain

Nov 2018 – May 2021
FIRST Tech Challenge, Team 358 Gaulbots
  • Built robot vision pipeline and motion planning with closed-loop controls in Java and Kotlin
  • Handled new member recruitment, team building, logistics, and outreach

Student App Developer

Sep 2017 – May 2018
Avon Old Farms School
  • Co-developed a full-stack attendance tracking application that quickly scans student RFID cards
  • Replaced faculty’s pencil-and-clipboard workflow in the freezing dark through User-Oriented Collaborative Design

Recent Projects

FLFL: Japanese Furigana Generation using Aligned Whisper Audiobook Transcription

2024

HuggingFace Trainer, axolotl, wandb, Modal.com

  • Released finetuning datasets from parsing 20+ GB of data released by the Japanese National Diet Library (Whisper transcriptions of public domain audiobooks)
  • Finetuned stockmark/gpt-neox-japanese-1.4b for furigana generation
  • Evaluated performance against few-shot GPT-4, MeCab/Unidic (writeup incoming)
View Model on HF View Training Data on HF
Non-Greedy Furigana String Generation

The Shades of Meaning: Investigating LLMs’ Cross-lingual Representation of Grounded Structures

2024, CS 224N: Natural Language Processing with Deep Learning

Python, PyTorch, HuggingFace transformers, SciPy

  • Led an outstanding CS 224N custom project
  • Collected and built a dataset of cross-lingual cultural color words
  • Performed a series of controlled experiments on how the quality of representations varies with language, model, context, and fine-tuning
  • Introduced a novel color-mapping experiment that visualizes languages’ color representation
Read the Report See the Poster

Predicting Hospital Length of Stay from Imbalanced Data

2024, CS 229: Machine Learning

Python, scikit-learn, XGBoost

  • Built and presented a strong classification-regression pipeline using Synthetic Minority Oversampling and ensemble learning
Read the Report See the Poster

Allegorical Lisp Machine

2023, CS107E: Computer Systems from the Ground Up

C, Lisp, ARMv6 Assembly

  • A freestanding graphical Lisp environment on Raspberry Pi A+
  • Implemented Lisp interpreter, system calls, exception handling, REPL, etc. from relevant papers
  • Implemented memory allocation, bitmapped graphics, serial IO, math library, etc. in baremetal C
  • Wrote specifications, tracked progress, and assigned tasks as co-dev and project manager
  • Name is a play on Symbolics
View Publicly Released Code

Flow Tree-Style-Tab Browser

2019

SwiftUI, UIKit, WKWebView, Combine

  • The first tree-style tab browser on iOS & iPadOS
  • Used the then-latest native APIs to enable features such as iCloud sync, ad blocking, dark and light mode, drag and drop, and multiwindow interactions
  • 89.8K impressions and 2.6K downloads while it was on the App Store
App Store link (currently delisted as I stopped paying for developer membership)

Hikari Ray Tracer

2022

Typed Racket

  • Implementation of The Ray Tracer Challenge in Typed Racket (Scheme, Lisp dialect)
  • Implements the features up to chapter 11 of the book, plus:
    • multithreaded rendering
    • focal blur
  • Source code is tangled from the (not very) literate raytracer-challenge.org document.
View on GitHub

math.c

2021

C, some Calculus

  • Naive freestanding implementation of math.h
View Gist

Other older things I worked on include solutions to SICP exercises (like 2.58), the iOS text encryption app Aenigmatis (delisted link), scripts for indexing and full-text search of media subtitles (link), automated building of Anki decks from Kindle Vocabulary Builder using MeCab and MDict, scanning dactylic hexameter for homework, contributing to and testing Memento, and more.

Writings

Delere imperium and animi imperio: the semantics of imperium in Cic. Cat. and Sall. Cat. Latin Philology, 2021 (link)

The Dawn of Modern Probability: A New Partial Translation of & Commentary on Christiaan Huygens’ De Ratiociniis in Ludo Aleae (1657) Latin, History of Science, 2023 (link)

About 🇨🇦

I am currently a (rising) third-year undergraduate at Stanford University studying Computer Science (AI track) as well as Math and Statistics. Reach me at pinlinxu [at] stanford [dot] edu, or any of the other links above. You can usually find me in the San Francisco Bay Area or British Columbia, Canada.

I am also the vice president of the Stanford Kendo Club (Japanese sword-fighting martial art) and a member of the Stanford Storyboard Club and the Stanford Amateur Radio Club (Call Sign KN6YCY, FCC General Class). I speak English, Mandarin Chinese, and Japanese (Latin not really).