HTTP/3 and QUIC
HTTP/3 is the third major version of the HTTP protocol. Unlike its predecessors, it does not run over TCP — it runs over QUIC, a new transport protocol built on UDP.
The protocol stack
| HTTP/1.1 or HTTP/2 | HTTP/3 |
|---|---|
| HTTP | HTTP/3 (RFC 9114) |
| TLS 1.2/1.3 | QUIC (RFC 9000), with TLS 1.3 built in |
| TCP | UDP |
| IP | IP |
QUIC folds the transport handshake and TLS 1.3 into a single exchange, so the first request can be sent after 1 RTT (or 0 RTT on session resumption, with replay caveats).
What QUIC fixes compared to TCP
Head-of-line (HoL) blocking
HTTP/2 multiplexes many streams over one TCP connection. A single lost TCP segment stalls all streams until it is retransmitted — the kernel cannot deliver later segments out of order to the application.
QUIC tracks ordering per stream rather than per connection. A lost UDP datagram only blocks the streams whose data it carried; other streams continue unaffected.
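The difference can be illustrated with a toy model. This is a sketch, not a protocol implementation: the packet layout and function names are hypothetical, and each packet is assumed to carry data for exactly one stream. One packet is lost, and the question is which of the packets that did arrive can be handed to the application before the retransmission shows up.

```python
from typing import List, Tuple

# Each packet is (sequence_number, stream_id). Purely illustrative layout.
Packet = Tuple[int, str]

def deliverable_tcp(packets: List[Packet], lost_seq: int) -> List[Packet]:
    """TCP-like: one ordered byte stream, so nothing after the gap is delivered."""
    return [p for p in packets if p[0] < lost_seq]

def deliverable_quic(packets: List[Packet], lost_seq: int) -> List[Packet]:
    """QUIC-like: ordering is per stream, so only the lost packet's stream stalls."""
    lost_stream = next(s for seq, s in packets if seq == lost_seq)
    return [
        (seq, s)
        for seq, s in packets
        if seq != lost_seq and not (s == lost_stream and seq > lost_seq)
    ]

# Three streams (A, B, C) interleaved on one connection; packet 2 is lost.
packets = [(1, "A"), (2, "B"), (3, "C"), (4, "A"), (5, "B"), (6, "C")]
tcp = deliverable_tcp(packets, lost_seq=2)
quic = deliverable_quic(packets, lost_seq=2)
print("TCP-like delivers: ", tcp)
print("QUIC-like delivers:", quic)
```

In this model the TCP-like receiver can only release the packet before the gap, while the QUIC-like receiver holds back only the later data of stream B.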
Connection migration
TCP connections are identified by the 4-tuple (src IP, src port, dst IP, dst port). Changing networks (Wi-Fi → LTE) changes the source address, so the connection breaks and must be re-established.
QUIC connections are identified by an opaque Connection ID chosen by the endpoints. The underlying IP/port can change transparently — the logical connection survives.
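A minimal sketch of the two lookup strategies, assuming a server that keeps session state in a dictionary. The table names, addresses, and connection ID bytes are made up for illustration; a real QUIC stack would also validate the new path before sending much data to it.

```python
from typing import Dict, Optional, Tuple

FourTuple = Tuple[str, int, str, int]

# TCP-like demultiplexing: state is keyed by the address 4-tuple.
tcp_table: Dict[FourTuple, str] = {
    ("10.0.0.5", 50000, "203.0.113.1", 443): "session-1",
}

# QUIC-like demultiplexing: state is keyed by the connection ID in the packet.
quic_table: Dict[bytes, str] = {b"\x1f\x2e\x3d\x4c": "session-1"}

def lookup_tcp(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> Optional[str]:
    return tcp_table.get((src_ip, src_port, dst_ip, dst_port))

def lookup_quic(connection_id: bytes, src_ip: str, src_port: int) -> Optional[str]:
    # The source address is deliberately ignored for the lookup.
    return quic_table.get(connection_id)

# The client moves from Wi-Fi (10.0.0.5) to LTE (100.64.7.9).
after_move_tcp = lookup_tcp("100.64.7.9", 41000, "203.0.113.1", 443)
after_move_quic = lookup_quic(b"\x1f\x2e\x3d\x4c", "100.64.7.9", 41000)
print("TCP lookup after move: ", after_move_tcp)
print("QUIC lookup after move:", after_move_quic)
```

The 4-tuple lookup misses once the source address changes, whereas the connection ID lookup still finds the existing session.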
Handshake latency
| Protocol | New connection | Resumed connection |
|---|---|---|
| TCP + TLS 1.3 | 2 RTT (1 RTT TCP + 1 RTT TLS) | 1 RTT (1 RTT TCP + 0 RTT TLS) |
| QUIC | 1 RTT | 0 RTT |
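To see what these round trips cost on a real path, the setup delay is simply the round-trip count multiplied by the RTT. The sketch below assumes one round trip for the TCP handshake, one for a full TLS 1.3 handshake, and none for TLS 1.3 resumption with early data; the 150 ms RTT and the function name are hypothetical.

```python
# Round trips before the first request can be sent (assumed counts, see above).
SETUP_RTTS = {
    ("tcp+tls1.3", "new"): 2,      # 1 RTT TCP + 1 RTT TLS 1.3
    ("tcp+tls1.3", "resumed"): 1,  # 1 RTT TCP + 0 RTT TLS 1.3 early data
    ("quic", "new"): 1,            # combined transport + TLS handshake
    ("quic", "resumed"): 0,        # 0-RTT resumption
}

def setup_delay_ms(protocol: str, state: str, rtt_ms: float) -> float:
    """Time spent on connection setup before the first request leaves."""
    return SETUP_RTTS[(protocol, state)] * rtt_ms

rtt_ms = 150.0  # hypothetical intercontinental path
for protocol, state in SETUP_RTTS:
    delay = setup_delay_ms(protocol, state, rtt_ms)
    print(f"{protocol:11s} {state:8s} {delay:6.1f} ms")
```

On such a path, each saved round trip removes 150 ms before the first byte of the request is even sent, which is why the saving matters most for short exchanges.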
Header compression
HTTP/2 uses HPACK, which requires a shared, ordered state between encoder and decoder — a form of HoL blocking at the header level. HTTP/3 uses QPACK (RFC 9204), which tolerates out-of-order delivery.
Where HTTP/3 performs well
HTTP/3 shines when the bottleneck is latency or loss, not raw bandwidth:
- Many small parallel requests (web pages, API fan-out): 0-RTT resumption and no transport HoL blocking reduce tail latency.
- Mobile / lossy networks: per-stream loss isolation and connection migration avoid full reconnects on network changes.
- High-latency paths (intercontinental, satellite): fewer round trips to establish a connection matter more.
Real-world data confirms this: Wix reported up to 33% better connection setup times and improved LCP at the 75th percentile after enabling H3. (APNIC blog, 2023)
Where HTTP/3 is not yet mature: large transfers
For single, long, high-throughput transfers on clean, fast links, HTTP/3 can underperform TCP/H2. The root cause is structural, not incidental.
The receiver-side overhead problem
TCP's congestion control and ACK processing live in the kernel, which benefits from decades of optimization and hardware offloads (TSO, GRO, RSS). QUIC typically runs in user space: without batching, every packet costs a syscall, and ACK processing happens in the application process.
A 2024 study (arXiv 2310.09423) profiled this in detail and found up to ~45% goodput loss for QUIC/H3 vs TCP/H2 on fast links, with the gap widening at higher bandwidths. The receiver hot path — not the sender — was the bottleneck.
The numbers make the problem concrete. Downloading a 1 GB file on a 1 Gbps link with Chrome:
| Metric | HTTP/2 | HTTP/3 |
|---|---|---|
| Download time | 9.32 s | 18.60 s (+99%) |
| CPU usage | 77.1% | 97.4% |
| Packets received by OS | ~53K (GRO coalesced) | ~743K (no GRO) |
| netif_receive_skb calls | 15K | 231K |
| do_syscall_64 calls | 4K | 17K |
| Packet RTT (local link) | 1.9 ms | 16.2 ms |
The packet count difference is the key: TCP's GRO coalesces many segments into one before handing them to the stack; QUIC gets no such treatment, so the OS processes ~14× more packets. Each packet crosses the kernel/user-space boundary individually, and each QUIC ACK is generated in user space — whereas TCP delayed ACKs are handled entirely in the kernel.
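The ratio and the implied payload per delivered packet follow directly from the reported packet counts. This is back-of-the-envelope arithmetic, assuming a decimal gigabyte and ignoring headers and retransmissions.

```python
FILE_BYTES = 1_000_000_000  # 1 GB, assuming decimal gigabytes
h2_packets = 53_000         # ~53K packets seen by the OS (GRO coalesced)
h3_packets = 743_000        # ~743K packets seen by the OS (no GRO)

ratio = h3_packets / h2_packets
h2_bytes_per_packet = FILE_BYTES / h2_packets
h3_bytes_per_packet = FILE_BYTES / h3_packets

print(f"packet ratio: {ratio:.1f}x")
print(f"HTTP/2 payload per delivered packet: {h2_bytes_per_packet:.0f} bytes")
print(f"HTTP/3 payload per delivered packet: {h3_bytes_per_packet:.0f} bytes")
```

Under these assumptions the HTTP/3 receiver handles roughly one MTU-sized datagram per packet, while the HTTP/2 receiver sees coalesced units more than ten times larger.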
A breakdown of where Chromium's network thread spends its time (1 GB download):
| Stage | HTTP/2 | HTTP/3 |
|---|---|---|
| Read packets from socket | 0.037 s | 0.248 s |
| Process packets for payload | 0.084 s | 0.310 s |
| Decode encrypted packets | 0.814 s | 0.660 s |
| Parse frames | 3.182 s | 3.468 s |
| Generate ACKs | — (kernel) | 2.972 s |
QUIC spends ~3 s of network-thread time just generating ACKs, work that TCP performs in the kernel and that never reaches the application's threads.
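Summing the listed stages puts the ACK cost in proportion. The sketch below only adds up the five stages from the table, so the totals are not the full download time; the dictionary keys are shorthand labels chosen for this example.

```python
# Seconds per stage on Chromium's network thread (1 GB download), from the table.
h2 = {"read": 0.037, "process": 0.084, "decrypt": 0.814, "parse": 3.182}
h3 = {"read": 0.248, "process": 0.310, "decrypt": 0.660, "parse": 3.468, "acks": 2.972}

h2_total = sum(h2.values())
h3_total = sum(h3.values())
extra = h3_total - h2_total
ack_share = h3["acks"] / h3_total

print(f"HTTP/2 listed stages: {h2_total:.3f} s")
print(f"HTTP/3 listed stages: {h3_total:.3f} s")
print(f"extra time under HTTP/3: {extra:.3f} s")
print(f"ACK generation share of HTTP/3 time: {ack_share:.1%}")
```

ACK generation alone accounts for well over a third of the listed HTTP/3 time and for most of the difference between the two columns.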
TCP offloads that QUIC cannot yet use
| Offload | TCP | QUIC |
|---|---|---|
| TSO (TCP Segmentation Offload) | ✔ kernel + NIC | ✗ user space only |
| GRO (Generic Receive Offload) | ✔ kernel | Partial (UDP GRO, requires tuning) |
| Zero-copy send | ✔ sendfile, splice | Limited |
| RSS (Receive Side Scaling) | ✔ automatic | Requires QUIC-aware hashing |
State of improvements (2025–2026)
The ecosystem is actively closing the gap, but results remain implementation-dependent:
- UDP GSO/GRO: modern QUIC stacks (e.g., quic-go) batch UDP sends via Generic Segmentation Offload and recommend raising rmem_max/wmem_max to several MB. This directly reduces syscall overhead.
- Paced-GSO: a 2025 paper (arXiv 2505.09222) evaluates kernel-assisted pacing (FQ qdisc timestamping, paced-GSO patches) to reduce bursts and improve goodput on fast links.
- 10 Gb/s feasibility: a 2025 testbed study across stacks (quiche, ngtcp2, quic-go) showed that increasing packet size and stream count can double throughput, and some implementations can saturate a 10 Gb/s link — but only with careful tuning, and QUIC+HTTP throughput can still lag raw QUIC by ~27% with certain app-layer choices. (KIT 2025)
- In-kernel QUIC: there is active work to move the QUIC data path into the Linux kernel (handshake stays in user space), which would enable zero-copy and NIC crypto offload. Not mainline yet. See In-kernel QUIC for details.
Bottom line for large transfers (2026): TCP/H2 is still more CPU-efficient and often faster out-of-the-box for a single bulk stream on a clean, high-bandwidth path. H3 can reach line-rate with aggressive tuning (GSO, large datagrams, multiple streams, BBR), but it is not plug-and-play yet.
Tuning checklist for large transfers over H3
If you need H3 to carry big objects, these are the levers that matter:
- Enable UDP GSO/GRO in your QUIC stack and raise socket buffers:

      sysctl -w net.core.rmem_max=26214400
      sysctl -w net.core.wmem_max=26214400

- Use FQ qdisc for precise pacing (avoids burst-induced loss):

      tc qdisc replace dev eth0 root fq

- Maximize datagram size (approach path MTU) and use multiple concurrent streams rather than one large stream.
- Choose a modern congestion controller (BBRv3 when available; IETF draft).
- Profile the receiver, not just the sender: the 2024 study showed the receiver hot path is the decisive bottleneck.
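Raising the sysctl limit only sets a ceiling; the application still has to request a larger buffer, and it is worth checking what the kernel actually granted. The following is a small sketch of that check using the standard socket API, assuming Linux behavior (the kernel caps the request at rmem_max and reports roughly double the stored value). The helper name is hypothetical, and other platforms may reject an oversized request instead of capping it.

```python
import socket

def effective_rcvbuf(requested_bytes: int) -> int:
    """Ask for a UDP receive buffer and return the size the kernel reports."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_bytes)
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)

requested = 26_214_400  # same value as the sysctl example above
granted = effective_rcvbuf(requested)
print(f"requested {requested} bytes, kernel reports {granted} bytes")
if granted < requested:
    # On Linux this usually means net.core.rmem_max is still too low.
    print("buffer was capped; check net.core.rmem_max")
```

Running this before and after the sysctl change is a quick way to confirm that the larger limit is in effect for new sockets.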
Operational requirements
Deploying HTTP/3 requires changes beyond the application layer:
- UDP 443 must be open end-to-end. Some enterprise firewalls and middleboxes block or throttle UDP; always configure graceful fallback to H2.
- Load balancers must be QUIC-aware (or terminate QUIC themselves). Connection IDs must be routed consistently to the same backend.
- Observability: QUIC encrypts most transport metadata. Passive monitoring tools that inspect TCP headers need updating; use qlog or vendor-specific QUIC metrics instead.
- 0-RTT replay risk: 0-RTT data can be replayed by a network attacker. Only use it for idempotent, read-only requests.
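The 0-RTT rule can be expressed as a small policy check. This is a sketch with hypothetical function names: it treats GET, HEAD, and OPTIONS as the safe methods (a subset of those defined by HTTP semantics) and answers other early-data requests with 425 Too Early, which asks the client to retry after the handshake completes. A safe method does not guarantee that a given endpoint is free of side effects, so real deployments may need a stricter, per-route allowlist.

```python
# Methods treated as read-only for this sketch.
SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}

def allow_in_early_data(method: str) -> bool:
    """Return True if a request with this method may be sent as 0-RTT data."""
    return method.upper() in SAFE_METHODS

def early_data_status(method: str, arrived_in_early_data: bool) -> int:
    """Server-side policy: reject replayable, non-safe requests with 425."""
    if arrived_in_early_data and not allow_in_early_data(method):
        return 425  # Too Early: client should retry once the handshake is done
    return 200

for method, early in [("GET", True), ("POST", True), ("POST", False)]:
    print(method, "early" if early else "normal", early_data_status(method, early))
```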
Quick reference
| Property | HTTP/1.1 | HTTP/2 | HTTP/3 |
|---|---|---|---|
| Transport | TCP | TCP | QUIC (UDP) |
| Multiplexing | No | Yes (TCP HoL) | Yes (no transport HoL) |
| Handshake (new) | TCP + TLS (2–3 RTT) | TCP + TLS (2–3 RTT) | 1 RTT |
| Handshake (resumed) | TCP + TLS (1–2 RTT) | TCP + TLS (1–2 RTT) | 0 RTT |
| Connection migration | No | No | Yes |
| Header compression | None | HPACK | QPACK |
| Large transfer efficiency | Good | Good | Needs tuning |
| Kernel offload maturity | High | High | Low–Medium |
Sources
- RFC 9000 — QUIC
- RFC 9114 — HTTP/3
- RFC 9204 — QPACK
- arXiv 2310.09423 — "QUIC is not Quick Enough over Fast Internet" — receiver-side profiling, −45% goodput on fast links
- arXiv 2505.09222 — Pacing for QUIC — user-space vs kernel-assisted pacing
- KIT 2025 — Heterogeneous QUIC throughput — 10 Gb/s testbed across stacks
- APNIC blog — Measuring HTTP/3 real-world performance
- Cloudflare — HTTP/3 vs HTTP/2
- quic-go optimizations — GSO, UDP buffer sizing
- Linux segmentation offloads
- LWN — In-kernel QUIC
- BBRv3 IETF draft