
HTTP/3 and QUIC

HTTP/3 is the third major version of the HTTP protocol. Unlike its predecessors, it does not run over TCP — it runs over QUIC, a new transport protocol built on UDP.

The protocol stack

HTTP/1.1 or HTTP/2          HTTP/3
─────────────────────       ─────────────────────
HTTP                        HTTP/3  (RFC 9114)
TLS 1.2/1.3                 QUIC    (RFC 9000)  ← TLS 1.3 built-in
TCP                         UDP
IP                          IP

QUIC folds the transport handshake and TLS 1.3 into a single exchange, so the first request can be sent after 1 RTT (or 0 RTT on session resumption, with replay caveats).
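The latency arithmetic can be sketched directly (an illustrative model with an assumed 80 ms round trip, not a measurement):

```python
def time_to_first_byte(rtt_ms: float, handshake_rtts: int) -> float:
    """Milliseconds until the first request can leave, ignoring processing time."""
    return rtt_ms * handshake_rtts

RTT = 80.0  # assumed round trip, ms

tcp_tls_new  = time_to_first_byte(RTT, 2)  # 1 RTT TCP + 1 RTT TLS 1.3
quic_new     = time_to_first_byte(RTT, 1)  # transport + TLS 1.3 in one exchange
quic_resumed = time_to_first_byte(RTT, 0)  # request rides in the first flight

print(tcp_tls_new, quic_new, quic_resumed)  # 160.0 80.0 0.0
```

On a resumed connection the entire handshake cost disappears, which is why 0-RTT matters most on high-latency paths.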

What QUIC fixes compared to TCP

Head-of-line (HoL) blocking

HTTP/2 multiplexes many streams over one TCP connection. A single lost TCP segment stalls all streams until it is retransmitted — the kernel cannot deliver later segments out of order to the application.

QUIC carries each stream independently. A lost UDP datagram only blocks the stream it belongs to; other streams continue unaffected.
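A toy simulation makes the difference concrete (an illustrative model of in-order delivery, not real TCP or QUIC reassembly logic):

```python
# Ten packets round-robin across three streams; packet 4 is lost.
# Compare what each transport can hand to the application before
# the retransmission arrives.
STREAMS = 3
packets = [(seq, seq % STREAMS) for seq in range(10)]  # (sequence, stream id)
lost = {4}
arrived = [p for p in packets if p[0] not in lost]

def deliverable(arrived, lost, per_stream):
    out = []
    for seq, stream in arrived:
        if per_stream:
            # QUIC: stalled only if an earlier packet of the SAME stream is missing
            gap = any(l < seq and l % STREAMS == stream for l in lost)
        else:
            # TCP: one ordered byte stream; stalled if ANY earlier packet is missing
            gap = any(l < seq for l in lost)
        if not gap:
            out.append(seq)
    return out

print(deliverable(arrived, lost, per_stream=False))  # [0, 1, 2, 3]
print(deliverable(arrived, lost, per_stream=True))   # [0, 1, 2, 3, 5, 6, 8, 9]
```

With one packet lost, TCP-style ordering stalls everything after the gap; per-stream ordering stalls only the one stream that lost data.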

Connection migration

TCP connections are identified by the 4-tuple (src IP, src port, dst IP, dst port). Changing network (Wi-Fi → LTE) tears the connection.

QUIC connections are identified by an opaque Connection ID chosen by the endpoints. The underlying IP/port can change transparently — the logical connection survives.
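The migration behavior boils down to what keys the connection lookup (a toy sketch with made-up addresses and connection ID):

```python
tcp_conns = {}   # keyed by (src_ip, src_port, dst_ip, dst_port)
quic_conns = {}  # keyed by an opaque connection ID

wifi = ("10.0.0.5", 51000, "203.0.113.7", 443)
lte  = ("100.64.1.9", 40021, "203.0.113.7", 443)
cid  = b"example-cid-01"

tcp_conns[wifi] = "session state"
quic_conns[cid] = "session state"

# After migrating from Wi-Fi to LTE the source address changes:
print(tcp_conns.get(lte))    # None: the TCP connection is gone
print(quic_conns.get(cid))   # 'session state': same logical connection
```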

Handshake latency

Protocol          New connection                       Resumed connection
TCP + TLS 1.3     1 RTT (TCP) + 1 RTT (TLS) = 2 RTT    1 RTT (TCP) + 0 RTT (TLS) = 1 RTT
QUIC              1 RTT                                0 RTT

Header compression

HTTP/2 uses HPACK, which requires a shared, ordered state between encoder and decoder — a form of HoL blocking at the header level. HTTP/3 uses QPACK (RFC 9204), which tolerates out-of-order delivery.
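The ordering difference can be sketched as a toy decoder model (illustrative only; `required_insert_count` mimics the role of QPACK's Required Insert Count field, not the real wire format):

```python
# HPACK header blocks implicitly mutate shared table state, so block N is
# undecodable until blocks 0..N-1 have arrived. QPACK moves table inserts to
# a separate encoder stream and has each block declare the state it needs.
arrived = {0, 2}   # header block 1 was lost

# HPACK: strictly ordered; decoding stops at the first missing block.
hpack_decodable = [i for i in range(3) if all(j in arrived for j in range(i + 1))]

# QPACK: blocks referencing only already-acknowledged (or static) table
# entries decode in any arrival order. 0 = "static table only".
required_insert_count = {0: 0, 1: 5, 2: 0}
acked_inserts = 0
qpack_decodable = sorted(i for i in arrived
                         if required_insert_count[i] <= acked_inserts)

print(hpack_decodable, qpack_decodable)  # [0] [0, 2]
```

This is why an encoder that avoids referencing unacknowledged dynamic entries trades some compression ratio for immunity to header-level HoL blocking.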

Where HTTP/3 performs well

HTTP/3 shines when the bottleneck is latency or loss, not raw bandwidth:

  • Many small parallel requests (web pages, API fan-out): 0-RTT resumption and no transport HoL blocking reduce tail latency.
  • Mobile / lossy networks: per-stream loss isolation and connection migration avoid full reconnects on network changes.
  • High-latency paths (intercontinental, satellite): fewer round trips to establish a connection matter more.

Real-world data confirms this: Wix reported up to 33% better connection setup times and improved LCP at the 75th percentile after enabling H3. (APNIC blog, 2023)

Where HTTP/3 is not yet mature: large transfers

For single, long, high-throughput transfers on clean, fast links, HTTP/3 can underperform TCP/H2. The root cause is structural, not incidental.

The receiver-side overhead problem

TCP's congestion control and ACK processing live in the kernel, which benefits from decades of optimization and hardware offloads (TSO, GRO, RSS). QUIC runs in user space: absent batching, every packet costs its own syscall, and ACK processing happens in the application process.

A 2024 study (arXiv 2310.09423) profiled this in detail and found up to ~45% goodput loss for QUIC/H3 vs TCP/H2 on fast links, with the gap widening at higher bandwidths. The receiver hot path — not the sender — was the bottleneck.

The numbers make the problem concrete. Downloading a 1 GB file on a 1 Gbps link with Chrome:

                            HTTP/2                 HTTP/3
Download time               9.32 s                 18.60 s (+99%)
CPU usage                   77.1%                  97.4%
Packets received by OS      ~53K (GRO coalesced)   ~743K (no GRO)
netif_receive_skb calls     15K                    231K
do_syscall_64 calls         4K                     17K
Packet RTT (local link)     1.9 ms                 16.2 ms

The packet count difference is the key: TCP's GRO coalesces many segments into one before handing them to the stack; QUIC gets no such treatment, so the OS processes ~14× more packets. Each packet crosses the kernel/user-space boundary individually, and each QUIC ACK is generated in user space — whereas TCP delayed ACKs are handled entirely in the kernel.
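The arithmetic behind that ratio is straightforward (approximate payload size assumed; the coalescing factor is implied by the table's own numbers):

```python
transfer = 1_000_000_000   # 1 GB download, bytes
mtu_payload = 1400         # assumed usable payload per wire packet

wire_packets = transfer // mtu_payload   # packets actually on the wire
gro_batch = 14                           # coalescing factor implied by ~743K vs ~53K
tcp_units = wire_packets // gro_batch    # units the TCP receive path processes

print(wire_packets, tcp_units)  # 714285 51020
```

The model lands in the same order of magnitude as the measured ~743K vs ~53K packets: without coalescing, the receive path runs once per wire packet.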

A breakdown of where Chromium's network thread spends its time (1 GB download):

Stage                         HTTP/2       HTTP/3
Read packets from socket      0.037 s      0.248 s
Process packets for payload   0.084 s      0.310 s
Decode encrypted packets      0.814 s      0.660 s
Parse frames                  3.182 s      3.468 s
Generate ACKs                 — (kernel)   2.972 s

QUIC spends ~3 s just generating ACKs that TCP handles for free in the kernel.

TCP offloads that QUIC cannot yet use

Offload                          TCP                QUIC
TSO (TCP Segmentation Offload)   ✔ kernel + NIC     ✗ user space only
GRO (Generic Receive Offload)    ✔ kernel           Partial (UDP GRO, requires tuning)
Zero-copy send                   sendfile, splice   Limited
RSS (Receive Side Scaling)       ✔ automatic        Requires QUIC-aware hashing

State of improvements (2025–2026)

The ecosystem is actively closing the gap, but results remain implementation-dependent:

  • UDP GSO/GRO: modern QUIC stacks (e.g., quic-go) batch UDP sends via Generic Segmentation Offload and recommend raising rmem_max/wmem_max to multi-MB. This directly reduces syscall overhead.
  • Paced-GSO: a 2025 paper (arXiv 2505.09222) evaluates kernel-assisted pacing (FQ qdisc timestamping, paced-GSO patches) to reduce bursts and improve goodput on fast links.
  • 10 Gb/s feasibility: a 2025 testbed study across stacks (quiche, ngtcp2, quic-go) showed that increasing packet size and stream count can double throughput, and some implementations can saturate a 10 Gb/s link — but only with careful tuning, and QUIC+HTTP throughput can still lag raw QUIC by ~27% with certain app-layer choices. (KIT 2025)
  • In-kernel QUIC: there is active work to move the QUIC data path into the Linux kernel (handshake stays in user space), which would enable zero-copy and NIC crypto offload. Not mainline yet. See In-kernel QUIC for details.
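The syscall savings from GSO batching are easy to estimate (illustrative numbers; the datagram size and 64 KB batch size are assumptions, not measurements):

```python
import math

transfer = 1_000_000_000   # 1 GB, bytes
datagram = 1350            # assumed QUIC packet size
gso_buffer = 65507         # max UDP payload handed to one sendmsg() with GSO

packets_per_batch = gso_buffer // datagram            # 48 datagrams per syscall
without_gso = math.ceil(transfer / datagram)          # one syscall per packet
with_gso = math.ceil(transfer / (packets_per_batch * datagram))

print(without_gso, with_gso)  # 740741 15433
```

Roughly a 48x reduction in send-side syscalls under these assumptions, which is why GSO is the first lever every tuning guide reaches for.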

Bottom line for large transfers (2026): TCP/H2 is still more CPU-efficient and often faster out of the box for a single bulk stream on a clean, high-bandwidth path. H3 can reach line rate with aggressive tuning (GSO, large datagrams, multiple streams, BBR), but it is not plug-and-play yet.

Tuning checklist for large transfers over H3

If you need H3 to carry big objects, these are the levers that matter:

  1. Enable UDP GSO/GRO in your QUIC stack and raise socket buffers:

    sysctl -w net.core.rmem_max=26214400
    sysctl -w net.core.wmem_max=26214400
    
  2. Use FQ qdisc for precise pacing (avoids burst-induced loss):

    tc qdisc replace dev eth0 root fq
    
  3. Maximize datagram size (approach path MTU) and use multiple concurrent streams rather than one large stream.

  4. Choose a modern congestion controller (BBRv3 when available; IETF draft).

  5. Profile the receiver, not just the sender — the 2024 study showed the receiver hot path is the decisive bottleneck.
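Item 3 can be made concrete with standard header arithmetic (IPv4 case; QUIC additionally requires paths to support at least 1200-byte UDP payloads, and stacks probe upward from there):

```python
path_mtu = 1500      # typical Ethernet MTU
ipv4_header = 20     # IPv4 header without options
udp_header = 8       # fixed UDP header size

max_udp_payload = path_mtu - ipv4_header - udp_header
print(max_udp_payload)  # 1472: the QUIC packet size ceiling on this path
```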

Operational requirements

Deploying HTTP/3 requires changes beyond the application layer:

  • UDP 443 must be open end-to-end. Some enterprise firewalls and middleboxes block or throttle UDP; always configure graceful fallback to H2.
  • Load balancers must be QUIC-aware (or terminate QUIC themselves). Connection IDs must be routed consistently to the same backend.
  • Observability: QUIC encrypts most transport metadata. Passive monitoring tools that inspect TCP headers need updating; use qlog or vendor-specific QUIC metrics instead.
  • 0-RTT replay risk: 0-RTT data can be replayed by a network attacker. Only use it for idempotent, read-only requests.
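The 0-RTT rule from the last bullet can be enforced with a trivial gate (a hypothetical helper, not part of any QUIC library):

```python
SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}

def allow_early_data(method: str, has_side_effects: bool = False) -> bool:
    """True only for requests that are harmless if a network attacker replays them."""
    return method.upper() in SAFE_METHODS and not has_side_effects

print(allow_early_data("GET"))   # True
print(allow_early_data("POST"))  # False: must wait for the full handshake
```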

Quick reference

Property                    HTTP/1.1    HTTP/2           HTTP/3
Transport                   TCP         TCP              QUIC (UDP)
Multiplexing                No          Yes (TCP HoL)    Yes (no transport HoL)
Handshake (new)             TCP + TLS   TCP + TLS        1 RTT
Handshake (resumed)         TCP + TLS   TCP + TLS        0 RTT
Connection migration        No          No               Yes
Header compression          None        HPACK            QPACK
Large transfer efficiency   Good        Good             Needs tuning
Kernel offload maturity     High        High             Low–Medium

Sources