SOPHIE COMPUTER: Papers

This is where I log the papers I've read this year. My goal for 2026 is to read 300 papers. As of the last update (June 06, 2026), I have read 74 papers this year. To be on pace I should have read 128 papers by now, so I am 54 papers behind schedule.
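
A minimal sketch of the pace arithmetic above, assuming a constant pace across the calendar year and counting whole days elapsed since January 1. The formula and the function name `expected_papers` are assumptions for illustration; they were chosen because they are consistent with the 128 and 54 figures stated here, not because they are known to be how this page computes them.

```python
from datetime import date


def expected_papers(goal: int, today: date) -> int:
    """Papers that should be read by `today` at a constant pace (assumed formula)."""
    start = date(today.year, 1, 1)
    # Length of the year in days (365 or 366).
    days_in_year = (date(today.year + 1, 1, 1) - start).days
    # Whole days completed before `today`.
    days_elapsed = (today - start).days
    # Round down: a paper only counts once it is fully "due".
    return goal * days_elapsed // days_in_year


goal, read = 300, 74
expected = expected_papers(goal, date(2026, 6, 6))
behind = expected - read
print(f"expected {expected}, read {read}, behind by {behind}")
```

Rounding down is a design choice in this sketch; rounding to nearest, or counting the current day as elapsed, would shift the expected count by about one paper.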

6/3/26	PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU	Link	Shanghai Jiao Tong University, Haibo Chen, SOSP, 2024
6/2/26	Towards Understanding, Analyzing, and Optimizing Agentic AI Execution: A CPU-Centric Perspective	Link	Georgia Institute of Technology & Intel, Tushar Krishna, arXiv, 2026
6/2/26	DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale	Link	Microsoft, Samyam Rajbhandari & Yuxiong He, ICML, 2022
5/24/26	MoE-Lens: Towards the Hardware Limit of High-Throughput MoE LLM Serving Under Resource Constraints	Link	University of Michigan, arXiv, 2026
5/22/26	MoE-Infinity: Efficient MoE Inference on Personal Machines with Sparsity-Aware Expert Cache	Link	University of Edinburgh, arXiv, 2025
5/21/26	HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading	Link	Caltech, arXiv, 2025
5/20/26	Helios: Adaptive Model and Early-Exit Selection for Efficient LLM Inference Serving	Link	UT Austin, arXiv, 2025
5/14/26	CUCo: An Agentic Framework for Compute and Communication Co-design	Link	UT Austin, arXiv, 2026
5/14/26	Mirage Persistent Kernel: A Compiler and Runtime for Mega-Kernelizing Tensor Programs	Link	Carnegie Mellon, OSDI, 2025
5/9/26	Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve	Link	Georgia Tech/Microsoft, OSDI, 2024
5/8/26	MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU	Link	U. of Notre Dame, arXiv, 2026
5/7/26	FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU	Link	Stanford, ICML, 2023
5/6/26	H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models	Link	UT Austin/Carnegie Mellon, NeurIPS, 2023
5/5/26	Efficient Memory Management for Large Language Model Serving with PagedAttention	Link	UC Berkeley, SOSP, 2023
5/4/26	Orca: A Distributed Serving System for Transformer-Based Generative Models	Link	Seoul National University, OSDI, 2022
5/4/26	Measuring AI Agents' Progress on Multi-Step Cyber Attack Scenarios	Link	AI Security Institute, arXiv, 2026
4/30/26	Pie: Pooling CPU Memory for LLM Inference	Link	UC Berkeley, arXiv, 2024
4/29/26	Oneiros: KV Cache Optimization through Parameter Remapping for Multi-tenant LLM Serving	Link	UT Austin, SoCC, 2026
3/9/26	HiCCL: A Hierarchical Collective Communication Library	Link	Stanford University, IPDPS, 2025
3/8/26	MPM-LLM4DSE: Reaching the Pareto Frontier in HLS with Multimodal Learning and LLM-Driven Exploration	Link	Shantou University, DATE, 2026
3/5/26	Exploring GPU-to-GPU Communication: Insights into Supercomputer Interconnects	Link	University of Rome, SC, 2024
3/2/26	RDMA over Ethernet for Distributed Training at Meta Scale	Link	Meta, ACM SIGCOMM, 2024
3/1/26	big.VLITTLE: On-Demand Data-Parallel Acceleration for Mobile Systems on Chip	Link	Cornell University, MICRO, 2022
2/27/26	Synthesizing optimal collective algorithms	Link	Microsoft Research, PPoPP, 2021
2/20/26	Computing the Full Earth System at 1km Resolution	Link	Max Planck Institute for Meteorology, SC, 2025
2/18/26	The Memory Processing Unit: A Generalized Interface for End-to-End In-Memory Execution	Link	University of Illinois, HPCA, 2026
2/5/26	ATLAHS: An Application-centric Network Simulator Toolchain for AI, HPC, and Distributed Storage	Link	ETH Zurich, SC, 2025
2/5/26	Real-Time Object Detection and Recognition in FPGA-Based Autonomous Driving Systems	Link	Samsung, IJCTT, 2024
2/4/26	Harmonic CUDA: Asynchronous Programming on GPUs	Link	University of California/NVIDIA, PMAM, 2023
2/3/26	Tawa: Automatic Warp Specialization for Modern GPUs with Asynchronous References	Link	NVIDIA, CGO, 2026
2/1/26	BetterTogether: An Interference-Aware Framework for Fine-grained Software Pipelining on Heterogeneous SoCs	Link	University of California, IISWC, 2025
1/29/26	Optimizing Green Energy Consumption of Fog Computing Architectures	Link	University of Rennes (France), IEEE SBAC-PAD, 2020
1/28/26	An Online Fragmentation-Aware Scheduler for Managing GPU-Sharing Workloads on Multi-Instance GPUs	Link	Tsing Hua University (Taiwan)/IBM, arXiv, 2025
1/28/26	Collective Communication for 100k+ GPUs	Link	Meta, arXiv, 2026
1/26/26	Demystifying NCCL: An In-depth Analysis of GPU Communication Protocols and Algorithms	Link	ETH Zurich/NVIDIA, arXiv, 2025
1/19/26	The Landscape of GPU-Centric Communication	Link	Koç University, arXiv, 2024
1/18/26	Hot Regions in SPEC CPU2017	Link	UT Austin, IISWC, 2018
1/12/26	A Scheduling Framework for Efficient MoE Inference on Edge GPU-NDP Systems	Link	Nanjing University (China), arXiv, 2026
1/10/26	PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel	Link	Meta AI, Proceedings of the VLDB Endowment, 2023
1/8/26	MSCCL++: Rethinking GPU Communication Abstractions for Cutting-edge AI Applications	Link	Microsoft, arXiv, 2025
1/8/26	Design Space Exploration of DMA based Finer-Grain Compute Communication Overlap	Link	UT Austin/AMD, arXiv, 2025
1/7/26	GPGPU Power Modeling for Multi-Domain Voltage-Frequency Scaling	Link	UT Austin, IEEE Transactions on Computers, 2012
1/3/26	Phase-Based Frequency Scaling for Energy-Efficient Heterogeneous Computing	Link	University of Salerno (Italy), IPDPS, 2025
1/2/26	Runtime Power Monitoring in High-End Processors: Methodology and Empirical Data	Link	Princeton, MICRO, 2003, 12 pages
1/1/26	GPGPU Power Modeling for Multi-Domain Voltage-Frequency Scaling	Link	2018, 12 pages
12/31/25	Designing Spatial Architectures for Sparse Attention: STAR Accelerator via Cross-Stage Tiling	Link	2025, 15 pages
12/31/25	Optimal Software Pipelining and Warp Specialization for Tensor Core GPUs	Link	2025, 15 pages
12/27/25	Optimizing Distributed ML Communication with Fused Computation-Collective Operations	Link	2024, 17 pages
12/26/25	Defect graph neural networks for materials discovery in high-temperature clean-energy applications	Link	2023, 12 pages
12/24/25	Power Stabilization for AI Training Datacenters	Link	2025, 10 pages
12/23/25	Advancing Cloud Computing Capabilities on gem5 by Implementing the RISC-V Hypervisor Extension	Link	2024, 8 pages

Last updated: June 06, 2026 at 16:33 UTC.