I am a PhD student at Harvard University.


I research computer systems by fighting with

James Mickens , adhering to MMA rules.


My research focuses on the intersection of

systems, security, and programming languages.


My projects involve serverless computing,

program analysis, and large-scale vector

search system.

I am a PhD student at Harvard University.

I am a PhD student at Harvard University.


I research computer systems by fighting with

James Mickens , adhering to MMA rules.


My research focuses on the intersection of

systems, security, and programming languages.


My projects involve serverless computing,

program analysis, and large-scale vector

search system.

I research computer systems by fighting with James Mickens, adhering to MMA rules.

My research focuses on the intersection of systems, security, and programming languages.

My projects involve serverless computing, program analysis, and

large-scale vector search systems.

[email protected]

SEC 4.433, 150 Western Ave, MA 02135

[email protected]

SEC 4.433, 150 Western Ave, MA 02135

PUBLICATIONS

SOSP 2023

SPFresh: Incremental In-Place Update for Billion-Scale Vector Search

Yuming Xu, Hengyu Liang, Jin Li, Shuotao Xu, Qi Chen, Qianxi Zhang, Cheng Li, Ziyue Yang, Fan Yang, Yuqing Yang, Peng Cheng, Mao Yang

Yuming Xu, Hengyu Liang, Jin Li, Shuotao Xu, Qi Chen, Qianxi Zhang, Cheng Li, Ziyue Yang, Fan Yang, Yuqing Yang, Peng Cheng, Mao Yang

Yuming Xu, Hengyu Liang, Jin Li, Shuotao Xu, Qi Chen, Qianxi Zhang, Cheng Li, Ziyue Yang, Fan Yang, Yuqing Yang, Peng Cheng, Mao Yang

Approximate Nearest Neighbor Search (ANNS) is widely used in various applications like information retrieval, question answering, and recommendation. As the amount of vector data grows continuously, it becomes crucial to support updates to vector index to maintain accuracy.

SPFresh is a system that supports in-place vector updates to billion-scale vector index. At the heart of SPFresh is LIRE, a lightweight incremental rebalancing protocol for handling data distribution shit during vector updating. SPFresh greatly outperforms existing systems in terms of query latency and accuracy, while using far fewer resources.

This work was initiated when I was an intern at Microsoft Research Asia.

IEEE TVCG 2022

GeodesicEmbedding (GE): A High-Dimensional Embedding Approach for Fast Geodesic Distance Queries

Qianwei Xia, Juyong Zhang, Zheng Fang, Jin Li, Mingyue Zhang, Bailin Deng, Ying He

Qianwei Xia, Juyong Zhang, Zheng Fang, Jin Li, Mingyue Zhang, Bailin Deng, Ying He

Qianwei Xia, Juyong Zhang, Zheng Fang, Jin Li, Mingyue Zhang, Bailin Deng, Ying He

GE is a novel method that expedites geodesic distance queries through the embedding of a mesh into a high-dimensional Euclidean space.

My primary role in this project was centered around systems optimization rather than algorithm design. I undertook the implementation of the algorithms, optimizing them for parallelism and IO efficiency, which culminated in a remarkable acceleration of the program by a factor of 100. Additionally, I ported a Windows library to Linux, further augmenting the efficiency and cross-platform operability of the application. These optimizations significantly enhanced the feasibility and efficiency of the approach, thereby demonstrating its practical potential in real-world applications.

Yes, I was introduced to computer science research in graphics but later found my interests in systems.

PROJECTS

MITScript: Dynamic Language Engineering

MITScript is a teaching dynamic language designed for MIT 6.1120.

The project encapsulates four critical components, forming a robust scripting environment. The journey begins with the development of a parser for syntactic analysis of the source code, followed by an interpreter for direct code execution. Advancing further, the project introduces a compiler to translate MITScript into a stack-machine based IR, alongside an interpreter for efficient execution of the compiled IR. The final component, a garbage collector, oversees dynamic memory allocation and deallocation, ensuring optimal memory utilization. The project also contains a series of performance optimizations that enhance execution speed and resource efficiency, demonstrating a holistic approach to language design and implementation.

rLSM: LSM-Tree Based Database In Rust

A Log-Structured Merge (LSM) engine written in Rust from scratch. It includes a comprehensive implementation of the storage layer, including block cache and memory index management. The engine incorporates Bloom filters and key compression to improve IO efficiency, and its data structures are designed with Rust features for high concurrency. Additionally, the engine supports multiple compaction strategies for the leveled SST structure and utilizes Write-Ahead Logging (WAL) to ensure reliable recovery after restart.

Based on this engine, I conducted a comparative analysis of different compaction strategies against various data distributions, aiming to bolster the engine's robustness and efficiency.

Rsync-K: Optimizing IO Path For Rsync

Rsync-K is a project aimed at optimizing I/O performance on HDDs during regular upstream pulling, utilizing the existing attribute tree created by Rsync-huai for mirror service providers. This optimization significantly reduces the bytes read, with testing showcasing a remarkable reduction by 99.83%. Rsync-K enhances data synchronization efficiency, thereby providing a robust and streamlined mirroring process, making it a substantial advancement in data mirroring services.

XV6 Operating System Labs On RISC-V

Xv6 is a simple Unix-like teaching operating system developed for MIT 6.1810. The project starts with implementing basic functionalities like system calls, page tables, and traps. Then it delves deep into the intricacies of the operating system by implementing copy-on-write fork, introducing multithreading, developing a NIC device driver, refining locking strategies, broadening file system functionalities, and integrating mmap for detailed memory management.

© 2023 Jin Li. All contact relies on chance encounters.