Milo: A new HTTP parser for Node.js

By Paolo Insogna

As Node.js continues to evolve, Milo stands ready to facilitate seamless HTTP communication

The Hypertext Transfer Protocol (HTTP) stands as one of the oldest and most critical protocols in daily internet usage. Ensuring a solid and performant implementation of this protocol is fundamental for both user satisfaction and developer efficiency.

Let's explore how Node.js tackles this task and introduces Milo, a promising replacement for the current HTTP parser, llhttp.

A journey through HTTP versions

Originating in 1991 with HTTP/0.9, the HTTP protocol has evolved over the years. Despite its age, only three main versions exist today:

  1. HTTP/1.1: The most widely used version, defined in RFC 9112.

  2. HTTP/2.0: An evolution built upon SPDY, aiming to address TCP protocol limitations, detailed in RFC 9113.

  3. HTTP/3.0: The latest iteration, based on QUIC, which carries HTTP over UDP for the first time. Defined in RFC 9114.

While HTTP/0.9 and HTTP/1.0 were once widely used, they never achieved RFC standardisation and are now considered obsolete. The current versions of the protocol instead share common semantics, defined in RFC 9110.

Node.js currently supports HTTP/1.1 and, to some extent, HTTP/0.9 and HTTP/1.0 via its http and https modules. Although HTTP/2.0 is supported through the http2 module, active development is halted due to the relatively low adoption rate and the emergence of HTTP/3.0.

The state of HTTP parsing in Node.js

Node.js currently uses the llhttp parser, which replaced its predecessor, http_parser, because of the latter's performance and maintainability issues. Developed by Fedor Indutny in 2019, llhttp works by specifying parsing rules in TypeScript and transpiling them into a high-performance C parser using llparse. In the image below you can see a visual representation of the llhttp state machine.

A visual representation of the llhttp state machine

llhttp has been the default Node.js parser since version 12.0.0. It also has an extensive test suite, partially inherited from http_parser, which is run by transpiling Markdown files to C and then executing them.

Despite its efficiency, llhttp suffers from a lack of documentation of its unique architecture, posing challenges for maintenance, evolution and security.

Introducing Milo: A modern HTTP parser

In response to the limitations of llhttp, Paolo Insogna started Milo in 2023 — a new HTTP parser written in Rust. Milo retains a state-based architecture akin to llhttp but leverages Rust's powerful procedural macros to define states, resulting in more readable and maintainable code.

For instance, the snippet below is an actual Milo state definition.

state!(request_protocol, {
  match data {
    string!("HTTP/") | string!("RTSP/") => {
      callback!(on_protocol, 4);
      parser.position += 4;

      move_to!(request_version, 1)
    }
    otherwise!(5) =>
      fail!(UNEXPECTED_CHARACTER, "Expected protocol"),
    _ => suspend!(),
  }
});

When waiting for the protocol, which follows the URL in the first line of the request (example: GET /home HTTP/1.1), the parser will try to match the current data against three possible cases:

  1. If it starts with HTTP/ or RTSP/, then it will invoke the on_protocol callback, advance the position in the data by 4 characters, and then move the parser to the request_version state, skipping one more character (the slash).

  2. Otherwise, if at least 5 characters are available then it will mark the parsing as failed.

  3. Otherwise, if not enough characters are available (_ => ...) it will suspend the parsing.

And below is the actual Rust code after the macros have been expanded (expansion happens at compile time, so it carries no runtime performance penalty).

// This is part of a big "match" statement
STATE_REQUEST_PROTOCOL => {
  match data {
      [72u8,
      84u8,
      84u8,
      80u8,
      47u8,
      ..,
      ]
      | [82u8,
      84u8,
      83u8,
      80u8,
      47u8,
      ..,
      ] => {
          #[cfg(not(target_family = "wasm"))]
          (self.callbacks.on_protocol)(self, self.position, 4);
          self.position += 4;
          self.state = STATE_REQUEST_VERSION;
          self.position += 1;
      }
      [_u0, _u1, _u2, _u3, _u4, ..] => {
          self.fail(ERROR_UNEXPECTED_CHARACTER, "Expected protocol");
          break 'parser;
      }
      _ => {
          not_suspended = false;
          break 'state;
      }
  }
}

Even with limited Rust knowledge, it's relatively simple to map statements between the two snippets.
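To experiment with the same matching logic outside of Milo, the three outcomes can be reproduced with plain Rust slice patterns. The sketch below is not Milo code: the Step enum and the match_protocol function are hypothetical names introduced only for illustration, and the returned 5 simply adds the 4 characters of the protocol name to the 1 character skipped when moving to the next state.

```rust
/// Outcome of trying to match the protocol at the start of `data`.
/// (Hypothetical type, used only in this sketch.)
#[derive(Debug, PartialEq)]
enum Step {
    /// Protocol recognised: the caller may advance by this many bytes.
    Advance(usize),
    /// At least five bytes were available, but they did not match.
    Fail(&'static str),
    /// Fewer than five bytes are available: wait for more input.
    Suspend,
}

/// Mirrors the three arms of the expanded state shown above.
fn match_protocol(data: &[u8]) -> Step {
    match data {
        // b'H' is 72u8, b'T' is 84u8, and so on, as in the expanded code.
        [b'H', b'T', b'T', b'P', b'/', ..] | [b'R', b'T', b'S', b'P', b'/', ..] => {
            Step::Advance(4 + 1)
        }
        // Any other slice with at least five bytes is a parsing error.
        [_, _, _, _, _, ..] => Step::Fail("Expected protocol"),
        // Not enough data yet.
        _ => Step::Suspend,
    }
}

fn main() {
    println!("{:?}", match_protocol(b"HTTP/1.1"));
    println!("{:?}", match_protocol(b"HTTQ/1.1"));
    println!("{:?}", match_protocol(b"HTT"));
}
```

Note that a short input such as HTT is suspended rather than rejected, because the decision is only taken once five bytes are available.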

Using the macros and an eager approach, Milo was able to reduce the number of states to around 30 (while llhttp has around 80).

Unlike llhttp, Milo strictly parses only HTTP/1.1, reducing complexity and chances of introducing vulnerabilities.

How to utilise Milo

Milo offers three implementations: Rust, native (C++, via cbindgen), and WebAssembly (JavaScript). While the Rust implementation is straightforward, the native and WebAssembly versions provide flexibility across different platforms, ensuring compatibility with Node.js environments.

Regardless of the language, Milo ensures a consistent developer experience. The code below shows a sample parsing procedure in Rust with Milo.

use core::ffi::c_void;
use core::slice;
use milo::Parser;

fn main() {
  // Create the parser.
  let mut parser = Parser::new();

  // Prepare a message to parse.
  let message = String::from("HTTP/1.1 200 OK\r\nContent-Length: 3\r\n\r\nabc");
  parser.context = message.as_ptr() as *mut c_void;

  // Milo works using callbacks. All callbacks have the same signature.
  parser.callbacks.on_data = |p: &mut Parser, from: usize, size: usize| {
    let message = unsafe {
      std::str::from_utf8_unchecked(slice::from_raw_parts(p.context.add(from) as *const u8, size))
    };

    // Do something with the information.
    println!("Pos={} Body: {}", p.position, message);
  };

  // Now perform the main parsing using milo.parse.
  parser.parse(message.as_ptr(), message.len());
}

When executed via cargo run, it outputs the following:

Pos=38 Body: abc
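The value 38 is consistent with the byte offset at which the body starts in the sample message: the status line takes 17 bytes, the Content-Length header 19, and the empty line that ends the headers 2. As a quick check that does not involve Milo at all, the offset can be computed with the standard library; body_offset is a hypothetical helper written only for this sketch.

```rust
/// Returns the byte offset of the first body byte, that is, the position
/// right after the empty line ("\r\n\r\n") that terminates the headers.
/// (Hypothetical helper, not part of Milo.)
fn body_offset(message: &str) -> Option<usize> {
    message.find("\r\n\r\n").map(|index| index + 4)
}

fn main() {
    let message = "HTTP/1.1 200 OK\r\nContent-Length: 3\r\n\r\nabc";

    match body_offset(message) {
        Some(offset) => println!("Pos={} Body: {}", offset, &message[offset..]),
        None => println!("Headers are not complete yet"),
    }
}
```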

The code snippet below shows the equivalent in C++:

#include "milo.h"
#include "stdio.h"
#include "stdlib.h"
#include "string.h"

int main() {
  // Create the parser.
  milo::Parser* parser = milo::milo_create();

  // Prepare a message to parse.
  const char* message = "HTTP/1.1 200 OK\r\nContent-Length: 3\r\n\r\nabc";
  parser->context = (char*) message;

  // Milo works using callbacks. All callbacks have the same signature.
  parser->callbacks.on_data = [](milo::Parser* p, uintptr_t from, uintptr_t size) {
    // Copy the payload into a NUL-terminated buffer so that printf("%s") stops at its end.
    char* payload = reinterpret_cast<char*>(malloc(sizeof(char) * (size + 1)));
    strncpy(payload, reinterpret_cast<const char*>(p->context) + from, size);
    payload[size] = '\0';
    printf("Pos=%lu Body: %s\n", p->position, payload);
    free(payload);
  };

  // Now perform the main parsing using milo.parse. 
  // The method returns the number of consumed characters.
  milo::milo_parse(parser, reinterpret_cast<const unsigned char*>(message), strlen(message));

  // Cleanup used resources.
  milo::milo_destroy(parser);
}

And JavaScript:

import { milo } from "@perseveranza-pets/milo";

// Prepare a message to parse.
const message = Buffer.from("HTTP/1.1 200 OK\r\nContent-Length: 3\r\n\r\nabc");

// Allocate memory in the WebAssembly space.
// This speeds up data copying to the WebAssembly layer.
const ptr = milo.alloc(message.length);

// Create a buffer we can use normally.
const buffer = Buffer.from(milo.memory.buffer, ptr, message.length);

// Create the parser.
const parser = milo.create();

// Milo works using callbacks. All callbacks have the same signature.
milo.setOnData(parser, (p, from, size) => {
  console.log(
    `Pos=${milo.getPosition(p)} Body: ${message
      .slice(from, from + size)
      .toString()}`
  );
});

// Copy the message into the WebAssembly memory, then perform the main parsing using milo.parse.
// The method returns the number of consumed characters.
buffer.set(message, 0);
const consumed = milo.parse(parser, ptr, message.length);

// Cleanup used resources.
milo.destroy(parser);
milo.dealloc(ptr, message.length);

What lies ahead for Milo and Node.js?

Milo is currently feature-complete, boasting superior performance in its native version. However, optimisation efforts are underway for the WebAssembly version, which is the one targeted for Node.js integration. Once performance issues are resolved, Milo aims to replace llhttp, bringing Node.js into a new era of HTTP parsing characterised by robustness, performance, and maintainability. As Node.js continues to evolve, Milo stands ready to facilitate seamless HTTP communication, empowering developers to build efficient and reliable web applications.