Computer Science student (AI focus) at Stanford University. Passionate about languages, Classics, and East Asian studies. Bridging the quantitative and the humane.
Recent Experience
Fullstack Engineer (part-time)
Sep 2024 – Present
- Working on web (Next.js, React) and iOS apps, a semantic knowledge graph, and product engineering
Undergraduate Research Fellow (machine learning, causal inference)
Jan 2024 – Aug 2024
- Researched LLM-powered causal graph discovery applied to genetic perturbation prediction
- Augmented a generative model of gene expression profiles with a causal graph derived from PubMed abstracts using GPT-4 and a Neo4j graph database
- Initiated a collaboration with the PaperToGraph project at Ideaflow; deployed experiments to the SOAL GPU cluster
LLM AI Integration Project Lead
Jun 2023 – Sep 2023
- Led an intern team building LLM-powered applications to improve Shopify storefront performance
- Contributed to the store's first-ever August traffic growth and 37% YoY growth
- Managed projects on Notion and GitHub, hosted meetings, and kept the whole team and the boss in the loop
Head Programmer, Captain
Nov 2018 – May 2021
- Built the robot vision pipeline and motion planning with closed-loop control in Java and Kotlin
- Handled new member recruitment, team building, logistics, and outreach
Student App Developer
Sep 2017 – May 2018
- Co-developed a full-stack attendance tracking application that quickly scans student RFID cards
- Replaced the faculty’s pencil-and-clipboard workflow (used in the freezing dark) through user-oriented collaborative design
Recent Projects
FLFL: Japanese Furigana Generation using Aligned Whisper Audiobook Transcription
2024 · HuggingFace Trainer, axolotl, wandb, Modal.com
- Released fine-tuning datasets built by parsing 20+ GB of data published by the Japanese National Diet Library (Whisper transcriptions of public-domain audiobooks)
- Fine-tuned stockmark/gpt-neox-japanese-1.4b for furigana generation
- Evaluated performance against few-shot GPT-4 and MeCab/UniDic (write-up forthcoming)
Non-Greedy Furigana String Generation
The Shades of Meaning: Investigating LLMs’ Cross-lingual Representation of Grounded Structures
2024, CS 224N: Natural Language Processing with Deep Learning · Python, PyTorch, HuggingFace transformers, SciPy
- Led an outstanding CS 224N custom project
- Collected and built a dataset of cross-lingual cultural color words
- Performed a series of controlled experiments on how the quality of representations varies with language, model, context, and fine-tuning
- Introduced a novel color-mapping experiment that visualizes languages’ color representation
Predicting Hospital Length of Stay from Imbalanced Data
2024, CS 229: Machine Learning · Python, scikit-learn, XGBoost
- Built and presented a strong classification-regression pipeline using the Synthetic Minority Oversampling Technique (SMOTE) and ensemble learning
Allegorical Lisp Machine
2023, CS107E: Computer Systems from the Ground Up · C, Lisp, ARMv6 Assembly
- A freestanding graphical Lisp environment on Raspberry Pi A+
- Implemented the Lisp interpreter, system calls, exception handling, REPL, etc., following the relevant papers
- Implemented memory allocation, bitmapped graphics, serial I/O, a math library, etc. in bare-metal C
- Wrote specifications, tracked progress, and assigned tasks as co-developer and project manager
- The name is a play on Symbolics, the maker of Lisp machines
7GUIs with React + TypeScript + MobX
2024 · React, TypeScript, MobX, Node.js
- A concise and accurate implementation of the 7GUIs challenge
- Fully reactive with state management & value derivation in MobX
Flow Tree-Style-Tab Browser
2019 · SwiftUI, UIKit, WKWebView, Combine
- The first tree-style tab browser on iOS & iPadOS
- Used the (then) latest native APIs to enable features such as iCloud sync, an ad blocker, dark and light modes, drag and drop, and multi-window interactions
- 89.8K impressions and 2.6K downloads while it was on the App Store
Hikari Ray Tracer
2022 · Typed Racket
- An implementation of The Ray Tracer Challenge in Typed Racket (PLT Scheme, a Lisp dialect)
- Implements the features up to chapter 11 of the book, plus multithreaded rendering and focal blur
- Source code is tangled from the (not very) literate raytracer-challenge.org document
math.c
2021 · C, some calculus
- Naive freestanding implementation of math.h
Other older things I worked on include solutions to SICP exercises (e.g., 2.58), the iOS text-encryption app Aenigmatis (now delisted), contributions to and testing for Memento, and all kinds of scripts: indexing and tagging local media, automatically building Anki decks from Kindle Vocabulary Builder using MeCab and MDict, scanning dactylic hexameter for homework, etc.
Writings
About 🇨🇦
I am currently a third-year undergraduate at Stanford University studying Computer Science (AI track) as well as Math and Statistics. Reach me at pinlinxu [at] stanford [dot] edu, or via any of the other links above. You can usually find me in the San Francisco Bay Area or British Columbia, Canada.
I am also the vice president of the Stanford Kendo Club (a Japanese sword-fighting martial art) and a member of the Stanford Storyboard Club and the Stanford Amateur Radio Club (call sign KN6YCY, FCC General Class). I speak English, Mandarin Chinese, and Japanese (Latin, not really).