2026

05-16 OpenSpace
05-16 CLI-Anything
05-14 AutoGrad in PyTorch
05-13 PBFT: Practical Byzantine Fault Tolerance
05-12 TMUX Useful Shortcuts
05-11 Rate Distortion Theory
05-11 Pratt Parsing
05-10 State Monad in Haskell
05-10 RL Infra Frameworks
05-09 nano-vllm Overview
05-08 Podman Network Configure
05-07 Spark Streaming
05-07 An Overview of Apache Spark
05-07 An Overview of Hadoop
05-05 Setup docker/podman for CUDA Development
05-05 Case Study: Facebook Photo Caching
05-05 2026 ICPC Asia Pacific Championship
05-04 Tseung Kwan O @ 1st May
05-04 Memory Consistency Models
05-04 Error Handling in Haskell
05-02 Concurrency in Haskell
05-02 Higher-Order Programming in OCaml
05-02 Unit Testing with OUnit in OCaml
05-02 Decoder-Only LLMs
05-02 Encoders and Decoders
05-01 Transformer for Vision (ViT)
04-29 Introduction to AI Agents
04-29 Introduction to RAG
04-29 LLM Fundamental Knowledge
04-29 Bank and Bank Conflict in GPU Programming
04-29 Introduction to Streaming Multiprocessors
04-29 An Introduction to AscendNPU IR
04-28 Quick Meson.Build Reference
04-28 Basic Grammar of the Zig Programming Language
04-27 Tucker Decomposition
04-26 Abel Summation
04-26 ST Monad in Haskell
04-26 Justice is Geometric
04-26 Managing Modules in Haskell
04-25 Generalized Algebraic Datatypes
04-24 Roofline Model & Optimization Roadmaps
04-24 Gluon: Linear Layout and Improvement of Triton
04-23 Avalanche Protocol for Blockchain
04-21 Differential Privacy
04-13 Records While Testing on SWE-Bench
04-11 2025 ICPC AsiaEC Shanghai
04-10 Profiling CUDA Kernels in PyTorch
04-10 Divide and Conquer Reduction with CUDA
04-09 CRTP: Static Polymorphism in C++
04-09 Function Calling, Tools, MCP and Skills
04-08 Skill Loading in Claude Code
04-08 Agent Memory
04-08 Design of Claude Code (1)
04-08 Agent Design & Agent Loop
04-06 Java RMI Overview
04-05 A Brief Overview of RISC-V Assembly
04-04 Design Pattern: Snapshot
04-02 Byzantine Fault Tolerance
04-01 Rust Async Programming and Coroutine
03-31 [Paper] SplitCom
03-31 Context Switch
03-30 RWLock (Read-Write Lock)
03-30 SpinLock and the Idea of LockGuard
03-30 Atomic Operations and Locks
03-30 Rust no_std Development
03-30 Data Center for AI
03-29 Behavioural Design Pattern: Template Method
03-29 Behavioural Design Pattern: Strategy
03-29 Overview of Large-Model Infrastructure Clusters and Communication
03-29 Codeforces Round 1088
03-28 PyTorch FX Framework
03-28 PyTorch FX IR
03-28 PyTorch v2: TorchDynamo
03-28 PyTorch TorchInductor
03-27 AtCoder Beginner Contest 450
03-26 MapReduce Architecture
03-25 CUDA Multiple GPU
03-25 CUDA Data Transmission
03-25 CUDA Multi Streaming
03-25 Latency Hiding: CUDA Async Pipeline Execution
03-25 Overlapping Compute and Memory Access: Double Buffering and Multi-Stage Pipelining
03-25 Commonly Used Official CUDA Libraries
03-24 Rust: Multi-Processing
03-24 Concurrency in OS: Instruction Reordering & Memory Model
03-23 Rust: Concurrent Programming
03-19 .pth Model Format of PyTorch
03-18 Sage Attention v1, v2, v3 Code Walkthrough (2): CUDA Implementation of SA
03-18 [Paper] HO-SFL: Hybrid-Order Split Federated Learning with BP-Free Client and Dimension-Free Aggregation
03-17 ReAct Agent Framework
03-17 Singleton Pattern
03-17 Git Snippets: Cleaning Up Complex History with hard reset + soft reset + merge
03-17 Git Snippets: Merging Upstream Branches Locally
03-16 Learning NVIDIA GPUs: Tensor Core
03-16 CUDA Kernel Optimization: Warp Divergence
03-16 CUDA Kernel Optimization: ILP
03-16 CUDA Kernel Optimization: Micro-Instruction Tuning
03-16 CUDA Kernel Optimization: PTX
03-16 CUDA Kernel Optimization: Quantization
03-16 Design Pattern: Factory Method
03-16 ninetoothed: CodeGenerator workflow
03-11 Rust Iterators
03-11 Rust Trait (3): TryFrom, TryInto
03-11 ninetoothed Project Notes
03-10 Rust STL (2): Vector
03-10 Rust STL (1): HashMap
03-10 Rust Trait (2): From, Into
03-10 Rust Trait (1): AsRef, AsMut
03-10 LLM Inference (1): Chat Server and Streaming Output
03-09 A Quick Guide to gflags: A C++ Command-Line Flag Parsing Library
03-09 Model Training Frameworks: Model Checkpoints
03-09 Distributed Training
03-08 PyTorch Extension: Operator Integration
03-08 Sage Attention v1, v2, v3 Code Walkthrough (1): INT8 Per-Block Quant Kernel
03-07 Bank Conflict
03-07 GPU Parallelism: PTX
03-07 Memory Alignment & Coalescing
03-07 SIMD Optimization
03-07 A Quick Guide to Nsight Compute
03-07 A Quick Guide to cuda-gdb
03-07 CUDA: Querying Device Information
03-07 CUDA Technique: Grid-Strided Loop
03-07 A Quick Guide to Nsight Systems
03-07 The CUDA Compilation Pipeline
03-07 GPU Architecture for CUDA
03-07 CUDA Optimization: Swizzling
03-07 CUDA Kernel: ArgMax
03-06 AI Infra Engineering: Abstraction
03-06 Git Snippets: Combining Commits
03-05 InfiniTensor AI Compiler v2.0 Notes: GraphBuilder
03-05 Raft Consensus Protocol
03-05 [Paper] Merge Then Compress
03-04 InfiniTensor AI Compiler v2.0 Notes
03-04 Mixed Python and C/C++ Development (2): Pybind11
03-04 Converting Data Between NumPy and PyTorch, and Binary Storage
03-03 Optimizing an NF4 Dequant CUDA Kernel (1)
03-03 Smart Pointers in Rust
03-01 Git Snippets: Clone First, Then Fetch Submodules
02-26 Git Snippets: Syncing a New Branch from the Original Repository into Your Fork
02-25 Building RISC-V Linux from Scratch on ArchLinux and Emulating It with QEMU
02-24 Bash Associative Array (Dictionary)
02-23 Rust: Crate & Package & Module
02-23 Graph Optimization in PyTorch
02-23 Introduction to CUDA Graph
02-22 Rust Generics
02-22 C++ Smart Pointers and Resource Management
02-22 Google C++ Style Guide
02-22 Python Decorator
02-22 The static Keyword in C++
02-21 Mixed Python and C/C++ Development (1): The ctypes Library
02-19 [Paper] Does Training with Synthetic Data Truly Protect Privacy?
02-19 Laziness and Evaluation Model of Haskell
02-19 Developing Smart Contracts with the Foundry Toolchain
02-19 Important Solidity Syntax
02-18 Important Types in OCaml
02-18 Basic Grammars in OCaml
02-15 Haskell Monads
02-15 Haskell Applicative
02-15 Haskell Functors
02-15 IO in Haskell
02-15 Introduction to Haskell's Type System
02-15 Basic Grammars in Haskell
02-15 Git Snippets: Branching Off an Old Commit
02-15 Writing Flash Attention in Triton
02-14 Writing a Flash Attention Kernel in CUDA
02-12 Mapping CapsLock to Escape on ArchLinux
02-12 Two-Phase Commit
02-08 Remote Procedure Call (RPC)
02-08 [Paper] Flash Attention
02-07 The Second Half of AI
02-07 Configuring HKU WiFi with nmcli