HTTP/3 and QUIC

HTTP/3 is the third major version of HTTP. Unlike its predecessors, it does not run over TCP; it runs over QUIC, a newer transport protocol built on UDP.

The protocol stack

HTTP/1.1 or HTTP/2          HTTP/3
─────────────────────       ─────────────────────
HTTP                        HTTP/3  (RFC 9114)
TLS 1.2/1.3                 QUIC    (RFC 9000)  ← TLS 1.3 built-in
TCP                         UDP
IP                          IP

QUIC folds the transport handshake and TLS 1.3 into a single exchange, so the first request can be sent after 1 RTT (or 0 RTT on session resumption, with replay caveats).

What QUIC fixes compared to TCP

Head-of-line (HoL) blocking

HTTP/2 multiplexes many streams over one TCP connection. A single lost TCP segment stalls all streams until it is retransmitted — the kernel cannot deliver later segments out of order to the application.

QUIC carries each stream independently. A lost UDP datagram only stalls the stream(s) whose frames it carried; other streams continue unaffected.
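The difference can be made concrete with a toy simulation. This is not a model of any real stack. It assumes each packet carries one frame of one stream, and that the single lost packet's retransmission arrives after everything else. TCP-style delivery releases only the contiguous prefix of the byte stream; QUIC-style delivery enforces ordering per stream. All names are hypothetical.

```python
"""Toy model of head-of-line blocking: TCP-style vs QUIC-style delivery.

Illustrative assumptions: each packet carries one frame of one stream,
and the single lost packet's retransmission arrives after all others.
"""

from collections import defaultdict

# (packet_number, stream_id, offset_within_stream), in send order.
PACKETS = [(0, "A", 0), (1, "B", 0), (2, "C", 0),
           (3, "A", 1), (4, "B", 1), (5, "C", 1)]
LOST = 1  # packet 1 (stream B's first frame) is lost and retransmitted last


def arrival_order(packets, lost):
    """Packets in arrival order: everything else first, the retransmission last."""
    return [p for p in packets if p[0] != lost] + [p for p in packets if p[0] == lost]


def tcp_delivery(packets, lost):
    """One ordered byte stream: only the contiguous prefix can reach the app."""
    buffered, next_pn, log = {}, 0, []
    for step, pkt in enumerate(arrival_order(packets, lost)):
        buffered[pkt[0]] = pkt
        while next_pn in buffered:
            log.append((step, buffered.pop(next_pn)[1]))
            next_pn += 1
    return log


def quic_delivery(packets, lost):
    """Ordering is enforced per stream, so a gap in B never delays A or C."""
    buffered = defaultdict(set)
    next_off = defaultdict(int)
    log = []
    for step, (_, sid, off) in enumerate(arrival_order(packets, lost)):
        buffered[sid].add(off)
        while next_off[sid] in buffered[sid]:
            buffered[sid].remove(next_off[sid])
            log.append((step, sid))
            next_off[sid] += 1
    return log


def completion_step(log, stream):
    """Arrival step at which the stream's last frame reached the application."""
    return max(step for step, sid in log if sid == stream)


if __name__ == "__main__":
    for name, deliver in (("TCP", tcp_delivery), ("QUIC", quic_delivery)):
        log = deliver(PACKETS, LOST)
        print(name, {s: completion_step(log, s) for s in "ABC"})
    # TCP {'A': 5, 'B': 5, 'C': 5}
    # QUIC {'A': 2, 'B': 5, 'C': 4}
```

Only stream B lost data, yet with TCP-style delivery streams A and C also finish at the last step, when the retransmission fills the gap.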

Connection migration

TCP connections are identified by the 4-tuple (src IP, src port, dst IP, dst port). Changing networks (e.g., Wi-Fi → LTE) changes the client's IP address, so the 4-tuple no longer matches and the connection breaks.

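QUIC connections are identified by opaque Connection IDs, which each endpoint chooses for the packets it receives. When the client's IP address or port changes, the server still recognizes the connection by its Connection ID and validates the new path, so the logical connection survives.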
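A minimal sketch of the two lookup strategies, using hypothetical class names: one table is keyed by the 4-tuple (the way the kernel finds a TCP connection), the other by the Destination Connection ID in each packet. It deliberately omits path validation, which a real QUIC server performs before trusting a new address (RFC 9000, Section 9).

```python
"""Sketch: connection lookup by 4-tuple (TCP-style) vs by Connection ID (QUIC-style).

Hypothetical classes for illustration only; path validation is omitted.
"""

from dataclasses import dataclass
from typing import Dict, Optional, Tuple

FourTuple = Tuple[str, int, str, int]  # (src IP, src port, dst IP, dst port)


@dataclass
class Connection:
    name: str
    peer: Tuple[str, int]  # client address most recently seen


class TcpStyleTable:
    """Keyed by the 4-tuple: a new client address means 'unknown connection'."""

    def __init__(self) -> None:
        self.by_tuple: Dict[FourTuple, Connection] = {}

    def lookup(self, four_tuple: FourTuple, dcid: bytes) -> Optional[Connection]:
        return self.by_tuple.get(four_tuple)  # the CID plays no role here


class QuicStyleTable:
    """Keyed by the Destination Connection ID carried in every packet."""

    def __init__(self) -> None:
        self.by_cid: Dict[bytes, Connection] = {}

    def lookup(self, four_tuple: FourTuple, dcid: bytes) -> Optional[Connection]:
        conn = self.by_cid.get(dcid)
        if conn is not None:
            conn.peer = (four_tuple[0], four_tuple[1])  # candidate new path
        return conn


if __name__ == "__main__":
    wifi = ("192.0.2.10", 50000, "203.0.113.5", 443)
    lte = ("198.51.100.7", 61000, "203.0.113.5", 443)  # after switching networks
    cid = bytes.fromhex("8394c8f03e515708")

    tcp = TcpStyleTable()
    tcp.by_tuple[wifi] = Connection("c1", wifi[:2])
    quic = QuicStyleTable()
    quic.by_cid[cid] = Connection("c1", wifi[:2])

    print(tcp.lookup(lte, cid))        # None: the connection is lost
    print(quic.lookup(lte, cid).peer)  # ('198.51.100.7', 61000)
```

The IP addresses come from the documentation ranges; the Connection ID bytes are arbitrary.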

Handshake latency

Protocol	RTTs before first request (new)	RTTs before first request (resumed)
TCP + TLS 1.3	2 RTT (1 TCP + 1 TLS)	1 RTT (1 TCP + 0-RTT TLS)
QUIC	1 RTT	0 RTT
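To see what those round trips cost in wall-clock time, here is a back-of-envelope sketch. It assumes a 1-RTT TCP handshake, a 1-RTT full TLS 1.3 handshake, 0-RTT resumption adding no round trip, and it ignores processing time and loss. The RTT values are illustrative.

```python
"""Back-of-envelope handshake cost before the first HTTP request.

Illustrative assumptions: TCP handshake = 1 RTT, full TLS 1.3 handshake = 1 RTT,
0-RTT resumption adds no round trip; processing time and packet loss ignored.
"""

# (protocol, resumed?) -> round trips spent before the request can be sent
HANDSHAKE_RTTS = {
    ("TCP + TLS 1.3", False): 1 + 1,
    ("TCP + TLS 1.3", True): 1 + 0,
    ("QUIC", False): 1,
    ("QUIC", True): 0,
}


def request_send_delay_ms(protocol: str, resumed: bool, rtt_ms: float) -> float:
    """Time spent in handshakes before the first request leaves the client."""
    return HANDSHAKE_RTTS[(protocol, resumed)] * rtt_ms


def time_to_first_byte_ms(protocol: str, resumed: bool, rtt_ms: float) -> float:
    """Handshake time plus one RTT for the request/response itself."""
    return request_send_delay_ms(protocol, resumed, rtt_ms) + rtt_ms


if __name__ == "__main__":
    for rtt in (20, 150, 600):  # illustrative: metro, intercontinental, GEO satellite
        for protocol, resumed in HANDSHAKE_RTTS:
            ttfb = time_to_first_byte_ms(protocol, resumed, rtt)
            print(f"RTT {rtt:>3} ms  {protocol:<14} resumed={resumed!s:<5}  TTFB {ttfb:.0f} ms")
```

At a 150 ms RTT this model gives 450 ms versus 300 ms to the first response byte on a new connection. The saving grows linearly with RTT, which is why high-latency paths benefit most.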

Header compression

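HTTP/2 uses HPACK, whose dynamic table assumes header blocks are decoded in exactly the order they were encoded. TCP guarantees that order; QUIC streams do not, so HPACK over QUIC would reintroduce HoL blocking at the header level. HTTP/3 uses QPACK (RFC 9204), which sends dynamic-table updates on dedicated unidirectional streams, so a header block that references a not-yet-received entry blocks only its own stream.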
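QPACK's blocking rule can be modeled in a few lines. This is a simplified sketch of the concept, not the wire format or any library's API: each header block carries a Required Insert Count, and it can be decoded only once that many dynamic-table insertions have arrived on the encoder stream. In the real protocol the decoder also advertises SETTINGS_QPACK_BLOCKED_STREAMS to cap how many streams may block at once. Class and method names are hypothetical.

```python
"""Simplified model of QPACK's blocking rule (RFC 9204); not the wire format.

Each header block declares a Required Insert Count (RIC): how many dynamic-table
insertions the decoder must have received before it can decode that block.
Insertions travel on a separate encoder stream, so a block waits only if its
own RIC is not yet satisfied.
"""

from typing import Dict, List


class QpackDecoderModel:
    def __init__(self) -> None:
        self.insert_count = 0               # insertions received on the encoder stream
        self.blocked: Dict[int, int] = {}   # stream_id -> required insert count
        self.decoded: List[int] = []        # stream ids, in the order they decoded

    def on_header_block(self, stream_id: int, required_insert_count: int) -> None:
        """A HEADERS frame arrived on a request stream."""
        if required_insert_count <= self.insert_count:
            self.decoded.append(stream_id)
        else:
            self.blocked[stream_id] = required_insert_count  # only this stream waits

    def on_encoder_insert(self) -> None:
        """One insert instruction arrived on the encoder stream."""
        self.insert_count += 1
        ready = sorted(s for s, ric in self.blocked.items() if ric <= self.insert_count)
        for stream_id in ready:
            del self.blocked[stream_id]
            self.decoded.append(stream_id)


if __name__ == "__main__":
    dec = QpackDecoderModel()
    dec.on_header_block(0, required_insert_count=1)  # refers to an entry not yet received
    dec.on_header_block(4, required_insert_count=0)  # static table / literals only
    dec.on_header_block(8, required_insert_count=0)
    print(dec.decoded, dec.blocked)  # [4, 8] {0: 1}
    dec.on_encoder_insert()          # the delayed insertion finally arrives
    print(dec.decoded, dec.blocked)  # [4, 8, 0] {}
```

Streams 4 and 8 decode immediately even though stream 0 is waiting. An encoder that never wants to block can simply avoid referencing entries the decoder has not yet acknowledged.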

Where HTTP/3 performs well

HTTP/3 shines when the bottleneck is latency or loss, not raw bandwidth:

Many small parallel requests (web pages, API fan-out): 0-RTT resumption and no transport HoL blocking reduce tail latency.
Mobile / lossy networks: per-stream loss isolation and connection migration avoid full reconnects on network changes.
High-latency paths (intercontinental, satellite): fewer round trips to establish a connection matter more.

Real-world data supports this: Wix reported up to 33% better connection setup times and improved LCP at the 75th percentile after enabling H3. (APNIC blog, 2023)

Where HTTP/3 is not yet mature: large transfers

For single, long, high-throughput transfers on clean, fast links, HTTP/3 can underperform TCP/H2. The root cause is structural: it follows from where the transport runs (user space vs the kernel), not from a bug in any one implementation.

The receiver-side overhead problem

TCP's congestion control and ACK processing live in the kernel, which benefits from decades of optimization and hardware offloads (TSO, GRO, RSS). QUIC runs in user space: every received datagram must be copied across the kernel/user-space boundary (one syscall per datagram unless the stack batches with recvmmsg or UDP GRO), and ACK processing happens in the application process.

A 2024 study (arXiv 2310.09423) profiled this in detail and found up to ~45% goodput loss for QUIC/H3 vs TCP/H2 on fast links, with the gap widening at higher bandwidths. The receiver hot path — not the sender — was the bottleneck.

The numbers make the problem concrete. Downloading a 1 GB file on a 1 Gbps link with Chrome:

	HTTP/2	HTTP/3
Download time	9.32 s	18.60 s (+99%)
CPU usage	77.1%	97.4%
Packets received by OS	~53K (GRO coalesced)	~743K (no GRO)
`netif_receive_skb` calls	15K	231K
`do_syscall_64` calls	4K	17K
Packet RTT (local link)	1.9 ms	16.2 ms

The packet count difference is the key: GRO coalesces many TCP segments into one large packet before handing it to the stack; QUIC gets no such treatment here, so the OS processes ~14× more packets. Each of those packets traverses the kernel receive path separately (231K `netif_receive_skb` calls vs 15K), and each QUIC ACK is generated in user space, whereas TCP delayed ACKs are handled entirely in the kernel.
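The ratios quoted in this paragraph can be recomputed directly from the table. The sketch below takes "1 GB" as 10^9 bytes (an assumption), ignores headers and retransmissions, and treats the syscall count as a crude batching indicator because it includes every syscall, not only receive calls.

```python
"""Recompute the ratios behind the 1 GB download comparison.

Counter values are copied from the table; FILE_BYTES treats "1 GB" as 10**9
bytes (an assumption) and ignores headers and retransmissions.
"""

H2 = {"packets": 53_000, "netif_receive_skb": 15_000, "syscalls": 4_000}
H3 = {"packets": 743_000, "netif_receive_skb": 231_000, "syscalls": 17_000}
FILE_BYTES = 10**9


def ratio(metric: str) -> float:
    """How many times more work HTTP/3 did for a given counter."""
    return H3[metric] / H2[metric]


def avg_bytes_per_packet(stats: dict) -> float:
    """Average payload the stack handled per received packet."""
    return FILE_BYTES / stats["packets"]


def packets_per_syscall(stats: dict) -> float:
    """Crude batching indicator: counts every syscall, not just receive calls."""
    return stats["packets"] / stats["syscalls"]


if __name__ == "__main__":
    for metric in H2:
        print(f"{metric:<18} H3/H2 = {ratio(metric):.1f}x")
    for name, stats in (("HTTP/2", H2), ("HTTP/3", H3)):
        print(f"{name}: {avg_bytes_per_packet(stats):,.0f} B/packet, "
              f"{packets_per_syscall(stats):.1f} packets/syscall")
```

HTTP/3 works out to roughly 1.3 KB per received packet (about one datagram each), while each GRO-coalesced HTTP/2 packet carries about 19 KB. The crude packets-per-syscall figure is higher for HTTP/3, which suggests receives are already batched and the extra cost sits in per-packet processing rather than per-packet syscalls.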

A breakdown of where Chromium's network thread spends its time (1 GB download):

Stage	HTTP/2	HTTP/3
Read packets from socket	0.037 s	0.248 s
Process packets for payload	0.084 s	0.310 s
Decode encrypted packets	0.814 s	0.660 s
Parse frames	3.182 s	3.468 s
Generate ACKs	— (kernel)	2.972 s

QUIC spends ~3 s of network-thread time just generating ACKs, work that TCP performs in the kernel, off the application's thread.
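ACK work scales with packet count. RFC 9000 (Section 13.2.2) recommends that a receiver send an ACK after receiving at least two ack-eliciting packets. The sketch counts ACK frames for the 743K packets above under that ratio and under sparser, illustrative ratios of the kind the QUIC ACK-frequency extension (still an IETF draft) lets a sender request. It does not claim which ratio Chromium used in the measurement.

```python
"""How many ACK frames a QUIC receiver sends, by acknowledgment ratio.

Assumes every received packet is ack-eliciting. ack_every=2 follows the
RFC 9000 recommendation; larger values are illustrative only.
"""

PACKETS = 743_000  # QUIC packets received in the 1 GB download


def acks_needed(packets: int, ack_every: int) -> int:
    """ACK frames generated if one ACK covers `ack_every` packets (ceiling division)."""
    if ack_every < 1:
        raise ValueError("ack_every must be >= 1")
    return -(-packets // ack_every)


if __name__ == "__main__":
    for ack_every in (2, 10, 32):
        print(f"ACK every {ack_every:>2} packets -> {acks_needed(PACKETS, ack_every):,} ACKs")
```

Going from every 2nd to every 10th packet cuts ACK generation fivefold, which is why ACK frequency is one of the levers stacks tune.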

TCP offloads that QUIC cannot yet use

Offload	TCP	QUIC
TSO (TCP Segmentation Offload)	✔ kernel + NIC	✗ (TCP-specific; UDP GSO is the closest equivalent)
GRO (Generic Receive Offload)	✔ kernel	Partial (UDP GRO, requires tuning)
Zero-copy send	✔ `sendfile`, `splice`	Limited
RSS (Receive Side Scaling)	✔ automatic	Requires QUIC-aware hashing

State of improvements (2025–2026)

The ecosystem is actively closing the gap, but results remain implementation-dependent:

UDP GSO/GRO: modern QUIC stacks (e.g., quic-go) batch UDP sends via Generic Segmentation Offload, which cuts the number of send syscalls, and recommend raising rmem_max/wmem_max to multi-MB values so receive bursts are not dropped.
Paced-GSO: a 2025 paper (arXiv 2505.09222) evaluates kernel-assisted pacing (FQ qdisc timestamping, paced-GSO patches) to reduce bursts and improve goodput on fast links.
10 Gb/s feasibility: a 2025 testbed study across stacks (quiche, ngtcp2, quic-go) showed that increasing packet size and stream count can double throughput, and some implementations can saturate a 10 Gb/s link — but only with careful tuning, and QUIC+HTTP throughput can still lag raw QUIC by ~27% with certain app-layer choices. (KIT 2025)
In-kernel QUIC: there is active work to move the QUIC data path into the Linux kernel (the handshake stays in user space), which would enable zero-copy and NIC crypto offload. Not mainline yet. See the LWN article on in-kernel QUIC listed in Sources.
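Why GSO helps can be estimated with simple arithmetic. This sketch counts send calls with and without GSO under stated assumptions: fixed-size datagrams, no sendmmsg batching in the non-GSO case, and hypothetical per-call caps (`max_segments`, `max_gso_bytes`) that you should check against your kernel and QUIC stack.

```python
"""Rough count of send syscalls with and without UDP GSO.

Illustrative assumptions (check your kernel and QUIC stack): each datagram
carries `segment_size` bytes; with GSO one sendmsg() passes up to
`max_segments` equal-size segments, and the combined buffer must stay under
`max_gso_bytes`, since it is handed to the kernel as one large UDP datagram.
"""


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division without floating point."""
    return -(-a // b)


def send_calls_without_gso(total_bytes: int, segment_size: int) -> int:
    """One sendmsg() per datagram (no sendmmsg batching either)."""
    return ceil_div(total_bytes, segment_size)


def send_calls_with_gso(total_bytes: int, segment_size: int,
                        max_segments: int = 64, max_gso_bytes: int = 65_000) -> int:
    """One sendmsg() per GSO super-buffer; the kernel or NIC splits it up."""
    per_call = min(max_segments, max_gso_bytes // segment_size)
    if per_call < 1:
        raise ValueError("segment_size exceeds max_gso_bytes")
    return ceil_div(total_bytes, segment_size * per_call)


if __name__ == "__main__":
    total, seg = 10**9, 1350  # 1 GB of payload in 1,350-byte datagrams (assumed)
    plain = send_calls_without_gso(total, seg)
    gso = send_calls_with_gso(total, seg)
    print(f"without GSO: {plain:,} calls; with GSO: {gso:,} calls ({plain / gso:.0f}x fewer)")
```

Under these assumptions the byte cap, not the segment cap, limits each call to 48 datagrams, so the syscall count drops by roughly that factor.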

Bottom line for large transfers (2026): TCP/H2 is still more CPU-efficient and often faster out-of-the-box for a single bulk stream on a clean, high-bandwidth path. H3 can reach line-rate with aggressive tuning (GSO, large datagrams, multiple streams, BBR), but it is not plug-and-play yet.

Tuning checklist for large transfers over H3

If you need H3 to carry big objects, these are the levers that matter:

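Enable UDP GSO/GRO in your QUIC stack and raise the kernel's socket-buffer limits. These sysctls only raise the ceiling; the application must still request larger buffers via SO_RCVBUF/SO_SNDBUF (quic-go, for example, tries to raise its receive buffer automatically):
```
sysctl -w net.core.rmem_max=26214400   # 25 MiB
sysctl -w net.core.wmem_max=26214400   # 25 MiB
```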

Use FQ qdisc for precise pacing (avoids burst-induced loss):
```
tc qdisc replace dev eth0 root fq
```
Maximize datagram size (approach path MTU) and use multiple concurrent streams rather than one large stream.
Choose a modern congestion controller (e.g., BBRv3 where your QUIC stack implements it; its specification is still an IETF draft).
Profile the receiver, not just the sender — the 2024 study showed the receiver hot path is the decisive bottleneck.
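The numbers behind the first few levers can be sketched with rules of thumb. Treat them as heuristics, not requirements. The bandwidth-delay product (BDP) is a common sizing rule for socket buffers. The pacing gap is the spacing a pacer such as the fq qdisc must hold between datagrams. The packet rate shows why larger datagrams relieve the receiver. The link speeds, RTT, and datagram sizes are illustrative choices.

```python
"""Back-of-envelope numbers behind the tuning checklist (heuristics, not rules).

- bdp_bytes: bandwidth-delay product, a common rule of thumb for socket buffers.
- pacing_gap_us: spacing a pacer (such as the fq qdisc) must hold between datagrams.
- packets_per_second: datagram rate the receiver must sustain at a given bitrate.
"""


def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bytes that must be in flight (and bufferable) to keep the path full."""
    return bandwidth_bps * rtt_s / 8


def pacing_gap_us(datagram_bytes: int, rate_bps: float) -> float:
    """Microseconds between datagrams when sending smoothly at rate_bps."""
    return datagram_bytes * 8 / rate_bps * 1e6


def packets_per_second(rate_bps: float, datagram_bytes: int) -> float:
    """Datagrams per second needed to carry rate_bps of UDP payload."""
    return rate_bps / (datagram_bytes * 8)


if __name__ == "__main__":
    print(f"BDP at 1 Gbps, 200 ms RTT: {bdp_bytes(1e9, 0.2) / 2**20:.1f} MiB")
    print(f"Pacing gap, 1350 B at 1 Gbps: {pacing_gap_us(1350, 1e9):.1f} us")
    # 1200 B is QUIC's baseline datagram size; 1452 B = 1500 MTU - 40 (IPv6) - 8 (UDP)
    for size in (1200, 1452):
        print(f"{size} B datagrams at 10 Gbps: {packets_per_second(10e9, size):,.0f} pkt/s")
```

A 1 Gbps path with a 200 ms RTT has a BDP of 25 MB, the same order as the 25 MiB (26214400-byte) ceiling in the sysctl commands. At that rate a pacer must release a 1,350-byte datagram roughly every 10.8 µs, and moving from 1,200- to 1,452-byte datagrams cuts the packet rate at 10 Gbps by about 17%.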

Operational requirements

Deploying HTTP/3 requires changes beyond the application layer:

UDP 443 must be open end-to-end. Some enterprise firewalls and middleboxes block or throttle UDP; always configure graceful fallback to H2.
Load balancers must be QUIC-aware (or terminate QUIC themselves). Connection IDs must be routed consistently to the same backend.
Observability: QUIC encrypts most transport metadata. Passive monitoring tools that inspect TCP headers need updating; use qlog or vendor-specific QUIC metrics instead.
0-RTT replay risk: 0-RTT data can be replayed by a network attacker. Only use it for idempotent, read-only requests.
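One way to enforce the 0-RTT rule on the server is sketched below with hypothetical names throughout. It assumes the HTTP/3 server tells the application whether a request arrived in early data (the `in_early_data` flag is an assumption, not a real library field). RFC 8470 defines status 425 (Too Early), which asks the client to retry after the handshake completes, and the `Early-Data: 1` header that intermediaries add when forwarding a request received in early data.

```python
"""Sketch of a server-side 0-RTT guard (hypothetical names throughout).

Assumes the server exposes whether a request arrived in early data via the
hypothetical `in_early_data` flag. Status 425 (Too Early) and the
`Early-Data: 1` header come from RFC 8470.
"""

from dataclasses import dataclass, field
from typing import Dict

# Safe (read-only) methods per RFC 9110; TRACE deliberately left out.
SAFE_METHODS = frozenset({"GET", "HEAD", "OPTIONS"})
TOO_EARLY = 425


@dataclass
class Request:
    method: str
    path: str
    in_early_data: bool = False  # hypothetical flag from the QUIC/TLS layer
    headers: Dict[str, str] = field(default_factory=dict)  # lowercase names


def must_reject_as_too_early(req: Request) -> bool:
    """True if the request could be a replay and is not read-only."""
    maybe_replayed = req.in_early_data or req.headers.get("early-data") == "1"
    return maybe_replayed and req.method.upper() not in SAFE_METHODS


def handle(req: Request) -> int:
    """Return an HTTP status; the 200 stands in for real request handling."""
    if must_reject_as_too_early(req):
        return TOO_EARLY
    return 200


if __name__ == "__main__":
    print(handle(Request("GET", "/products", in_early_data=True)))   # 200
    print(handle(Request("POST", "/checkout", in_early_data=True)))  # 425
```

A method-based rule is only a first cut: a GET that changes state (for example, a hypothetical `/logout` link) must be excluded as well.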

Quick reference

Property	HTTP/1.1	HTTP/2	HTTP/3
Transport	TCP	TCP	QUIC (UDP)
Multiplexing	No	Yes (TCP HoL)	Yes (no transport HoL)
Handshake (new)	2 RTT (TCP + TLS 1.3)	2 RTT (TCP + TLS 1.3)	1 RTT
Handshake (resumed)	1 RTT (TCP + TLS 1.3 0-RTT)	1 RTT (TCP + TLS 1.3 0-RTT)	0 RTT
Connection migration	No	No	Yes
Header compression	None	HPACK	QPACK
Large transfer efficiency	Good	Good	Needs tuning
Kernel offload maturity	High	High	Low–Medium

Sources

RFC 9000 — QUIC
RFC 9114 — HTTP/3
RFC 9204 — QPACK
arXiv 2310.09423 — "QUIC is not Quick Enough over Fast Internet" — receiver-side profiling, −45% goodput on fast links
arXiv 2505.09222 — Pacing for QUIC — user-space vs kernel-assisted pacing
KIT 2025 — Heterogeneous QUIC throughput — 10 Gb/s testbed across stacks
APNIC blog — Measuring HTTP/3 real-world performance
Cloudflare — HTTP/3 vs HTTP/2
quic-go optimizations — GSO, UDP buffer sizing
Linux segmentation offloads
LWN — In-kernel QUIC
BBRv3 IETF draft