Archives - Arca's Blog

2026

05-23Compilation Units in OCaml

05-22Setup Emacs for OCaml Development

05-22Performance Optimization Techniques

05-22Design Pattern: Builder

05-22Web API Development Procedure

05-22Client Server Communication

05-22Software Development: MVVM

05-22Android UI Components

05-22Android Layouts

05-22Retrieving Results from Activity

05-22Activity Start Mode and Task Stack

05-21Android Development: Activity Navigation

05-21Android Development: Activity Lifecycle

05-21Pretty Printing

05-21Abstract Types

05-21Module System in OCaml

05-20Lexical Analysis

05-19Message Queue Explained

05-17Setup Haskell Language Server for Emacs

05-16Codeforces Round 1097 (Div. 2)

05-16CLI-Anything

05-13PBFT: Practical Byzantine Fault Tolerance

05-12TMUX Useful Shortcuts

05-10State Monad in Haskell

05-09nano-vllm Overview

05-08Podman Network Configure

05-07Spark Streaming

05-07An Overview of Apache Spark

05-07An Overview of Hadoop

05-05Setup docker/podman for CUDA Development

05-05Case Study: Facebook Photo Caching

05-05 2026 ICPC Asia Pacific Championship

05-04Tseung Kwan O @ 1st May

05-04Memory Consistency Models

05-04Error Handling in Haskell

05-02Concurrency in Haskell

05-02Higher-Order Programming in OCaml

05-02Unit Testing with OUnit in OCaml

04-29Introduction to AI Agents

04-29Introduction to RAG

04-29LLM Fundamental Knowledge

04-29Bank and Bank Conflict in GPU Programming

04-29Introduction to Streaming Multiprocessors

04-28Quick Meson.Build Reference

04-28Basic Grammar of Zig Programming Language

04-26Abel Summation

04-26ST Monad in Haskell

04-26Justice is Geometric

04-26Managing Modules in Haskell

04-25Generalized Algebraic Datatypes

04-24Roofline Model & Optimization Roadmaps

04-13Records While Testing on SWE-Bench

04-11 2025 ICPC AsiaEC Shanghai

04-10Profiling CUDA Kernels in PyTorch

04-10Divide and Conquer Reduction with CUDA

04-09CRTP: Static Polymorphism in C++

04-09Function Calling, Tools, MCP and Skills

04-08Skill Loading in Claude Code

04-08Agent Memory

04-08Design of Claude Code (1)

04-08Agent Design & Agent Loop

04-06Java RMI Overview

04-04Design Pattern: Snapshot

03-31[Paper] SplitCom

03-30RWLock: Read-Write Lock

03-30SpinLock and the Idea of LockGuard

03-30Rust no_std Development

03-29Behavioural Design Pattern: Template Method

03-29Behavioural Design Pattern: Strategy

03-29Codeforces Round 1088

03-28PyTorch FX Framework

03-28PyTorch FX IR

03-28PyTorch v2: TorchDynamo

03-28PyTorch TorchInductor

03-27AtCoder Beginner Contest 450

03-26MapReduce Architecture

03-25CUDA Multiple GPU

03-25CUDA Data Transmission

03-25CUDA Multi Streaming

03-25Latency Hiding: CUDA Async Pipeline Execution

03-25Overlapping Memory Access and Compute: Double Buffering and Multi-Stage Pipelining

03-23Rust: Concurrent Programming

03-19.pth Model Format of PyTorch

03-18Sage Attention v1,v2,v3 Code Walkthrough (2): CUDA Implementation of SA

03-18[Paper] HO-SFL: Hybrid-Order Split Federated Learning with BP-Free Client and Dimension-Free Aggregation

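03-17ReAct Agent Framework

03-17Singleton Pattern

03-17Git Snippets: Cleaning Up Complex History with hard reset + soft reset + merge

03-17Git Snippets: Merging an Upstream Branch Locally

03-16CUDA Operator Optimization: Warp Divergence

03-16CUDA Operator Optimization: ILP

03-16CUDA Operator Optimization: Micro-Instruction Tuning

03-16CUDA Operator Optimization: PTX

03-16CUDA Operator Optimization: Quantization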

03-16Design Pattern: Factory Method

03-11Rust Iterators

03-11Rust Trait (3): TryFrom, TryInto

03-10Rust STL (2): Vector

03-10Rust STL (1): HashMap

03-10Rust Trait (2): From, Into

03-10Rust Trait (1): AsRef, AsMut

03-09A Quick Guide to gflags: A C++ Command-Line Argument Parsing Library

03-09Model Training Frameworks: Model Checkpoints

03-08PyTorch Extension: Operator Integration

03-08Sage Attention v1,v2,v3 Code Walkthrough (1): INT8 Per-Block Quant Kernel

03-07Bank Conflict

03-07GPU Parallelism: PTX

03-07Memory Alignment & Coalescing

03-07SIMD Optimization

03-07A Quick Guide to Nsight Compute

03-07A Quick Guide to cuda-gdb

03-07Querying Device Information in CUDA

03-07CUDA Technique: Grid-Strided Loop

03-07A Quick Guide to Nsight Systems

03-07CUDA Compilation Pipeline

03-07GPU Architecture for CUDA

03-07CUDA Optimization: Swizzling

03-07CUDA Kernel: ArgMax

03-06AI Infra Engineering: Abstraction

03-06Git Snippets: Combining Commits

03-05InfiniTensor AI Compiler v2.0 Notes: GraphBuilder

03-05Raft Consensus Protocol

03-05[Paper] Merge Then Compress

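03-04InfiniTensor AI Compiler v2.0 Notes

03-04Mixed Python and C/C++ Development (2): Pybind11

03-03NF4 Dequant CUDA Kernel Optimization Process (1)

03-03Smart Pointers in Rust

03-01Git Snippets: Cloning First, Then Fetching Submodules

02-26Git Snippets: Syncing a New Branch from the Original Repository to Your Fork

02-25Building RISC-V Linux from Scratch on ArchLinux and Emulating It with QEMU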

02-24Bash Associative Array (Dictionary)

02-23Rust: Crate & Package & Module

02-23Graph Optimization in PyTorch

02-22Rust Generics

02-22C++ Smart Pointers and Resource Management

02-22Google C++ Style Guide

02-22Python Decorator

02-22The static Keyword in C++

02-21Mixed Python and C/C++ Development (1): The ctypes Library

02-19[Paper] Does Training with Synthetic Data Truly Protect Privacy?

02-19Laziness and Evaluation Model of Haskell

02-19Developing Smart Contracts with the Foundry Toolchain

02-19Important Solidity Syntax

02-18Important Types in OCaml

02-18Basic Grammars in OCaml

02-15Haskell Monads

02-15Haskell Applicative

02-15Haskell Functors

02-15IO in Haskell

02-15Introduction to Haskell's Type System

02-15Basic Grammars in Haskell

02-15Git Snippets: Branching Off from an Old Commit

02-15Writing Flash Attention in Triton

02-14Writing a Flash Attention Operator in CUDA

02-12Mapping CapsLock to Escape on ArchLinux

02-12Two-Phase Commit

02-08Remote Procedure Call (RPC)

02-08[Paper] Flash Attention

02-07The Second Half of AI

02-07Configuring HKU WiFi with nmcli