shelving/markup module

A rule-based Markdown renderer that turns user-facing text into React nodes.

Concepts

This module converts Markdown-ish text into React nodes — suitable for rendering blog post bodies, user-written descriptions, or any rich text that originates outside your code.

Rendering is done by a MarkupParser instance. Each block element type — heading, paragraph, blockquote, fenced code, ordered list, unordered list (including [ ] / [x] todo items), table, separator — is handled by an independent MarkupRule. Inline elements — bold, italic, inserted/deleted/highlighted text, inline code, links, autolinks, line breaks — are a separate rule set applied within block content.

The engine groups rules into priority tiers and resolves the highest tier first. Once a tier claims a region of the text, that region is "masked" so that lower tiers cannot match into or across it. Each rule renders its match to a React element and recurses into its own children, optionally in a different context.
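The tier-and-mask behaviour can be pictured with a small, React-free sketch. It is not shelving's implementation: `SketchRule`, `Claim`, `claimRegions`, the two example rules, and their priority values are hypothetical, and rules of equal priority are simply tried in list order. It only shows how a region claimed by a higher priority rule blocks lower ones from overlapping it.

```typescript
// Hypothetical rule shape for this sketch only (not shelving's MarkupRule).
interface SketchRule {
	label: string;
	regexp: RegExp; // Needs the "g" flag so exec() scans forward.
	priority: number;
}

interface Claim {
	label: string;
	start: number;
	end: number;
}

// Resolve rules from highest priority to lowest.
// A match is dropped when it overlaps a region that is already claimed.
function claimRegions(input: string, rules: readonly SketchRule[]): Claim[] {
	const claims: Claim[] = [];
	const ordered = rules.slice().sort((a, b) => b.priority - a.priority);
	ordered.forEach(rule => {
		rule.regexp.lastIndex = 0;
		let match = rule.regexp.exec(input);
		while (match !== null) {
			const start = match.index;
			const end = start + match[0].length;
			const masked = claims.some(c => start < c.end && end > c.start);
			if (!masked) claims.push({ label: rule.label, start, end });
			// Step past zero-length matches so the loop always advances.
			if (match[0].length === 0) rule.regexp.lastIndex++;
			match = rule.regexp.exec(input);
		}
	});
	return claims.sort((a, b) => a.start - b.start);
}

// Inline code outranks bold here, so the asterisks inside the code span stay literal.
const demoClaims = claimRegions("`*not bold*` but *bold*", [
	{ label: "strong", regexp: /\*[^*]+\*/g, priority: 0 },
	{ label: "code", regexp: /`[^`]+`/g, priority: 10 },
]);
console.log(demoClaims.map(c => c.label + "@" + c.start).join(", ")); // code@0, strong@17
```

The bold rule does find `*not bold*`, but that span lies inside the region the code rule already claimed, so it is discarded and only the second `*bold*` survives.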

The default ruleset intentionally diverges from CommonMark:

  • *asterisk* is always <strong>, _underscore_ is always <em> — no ambiguity between the two.
  • A single \n newline is always a <br /> — no trailing double-space needed.
  • Literal HTML tags and character entities (such as &amp;) are not supported; they render as plain text.
  • All bare URLs are autolinked.
  • Whitespace is not fussy: there is no four-space indented code block, and nesting (e.g. a sub-list) uses a single tab per level — see Input normalisation.

Usage

Rendering markup

Create a MarkupParser and call MarkupParser.parse():

ts
import { MarkupParser } from "shelving/markup";

const parser = new MarkupParser();
const node = parser.parse("# Hello\n\nThis is *bold* and _italic_.");
// `node` is a `ReactNode` — render it directly in JSX.

parse() returns a ReactNode: a single element, a string, an array of those, or null.

For the common case, the shared MARKUP_PARSER instance (a MarkupParser with default options) saves constructing one:

ts
import { MARKUP_PARSER } from "shelving/markup";

const node = MARKUP_PARSER.parse(content);

Options

MarkupParser is constructed with MarkupOptions:

| Option  | Type         | Description                                                          |
| ------- | ------------ | -------------------------------------------------------------------- |
| rules   | MarkupRules  | Rules to apply. Defaults to MARKUP_RULES (all block + inline rules). |
| rel     | string       | rel attribute applied to every rendered link, e.g. "nofollow ugc".   |
| url     | ImmutableURL | Current page URL; base for resolving relative refs (./foo, #x).      |
| root    | ImmutableURL | Site root URL; base for resolving site-absolute paths (/foo).        |
| schemes | URISchemes   | Allowed URI schemes for links. Defaults to ["http:", "https:"].      |
| context | string       | Default starting context. Defaults to "block".                       |

ts
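import { MarkupParser } from "shelving/markup";
// Assumption: requireURL is exported from "shelving/util"; adjust the path if it lives elsewhere.
import { requireURL } from "shelving/util";

const parser = new MarkupParser({
  rel: "nofollow ugc",
  url: requireURL("https://example.com/page/"),
  root: requireURL("https://example.com/"),
  schemes: ["http:", "https:", "mailto:"],
});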

Link href resolution goes through getLink() — site-absolute paths resolve against root, relative refs against url, scheme-prefixed URIs (mailto:, tel:, …) pass through, URL instances are emitted as-is.
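The resolution order described above can be approximated with the standard URL constructor. This is a sketch under assumptions, not the body of getLink(): `resolveHref` and its parameter names are hypothetical, joining a site-absolute path onto the root is one plausible reading of "resolve against root", and returning `undefined` for a disallowed scheme is a choice made for this example only.

```typescript
// Hypothetical helper for illustration; shelving's getLink() may differ in detail.
function resolveHref(
	href: string,
	pageUrl: URL,
	rootUrl: URL,
	allowedSchemes: readonly string[],
): string | undefined {
	// "/foo" is site-absolute; "//host/foo" is protocol-relative and is not.
	const siteAbsolute = href.charAt(0) === "/" && href.charAt(1) !== "/";
	try {
		const resolved = siteAbsolute
			? new URL(href.slice(1), rootUrl) // Assumption: join the path onto the site root.
			: new URL(href, pageUrl); // Relative refs use the page; absolute URIs ignore the base.
		// URL.protocol includes the trailing colon, e.g. "https:".
		return allowedSchemes.indexOf(resolved.protocol) >= 0 ? resolved.href : undefined;
	} catch (error) {
		return undefined; // Not parseable as a URL.
	}
}

const demoPage = new URL("https://example.com/page/");
const demoRoot = new URL("https://example.com/");
const demoSchemes = ["http:", "https:", "mailto:"];

console.log(resolveHref("/foo", demoPage, demoRoot, demoSchemes)); // https://example.com/foo
console.log(resolveHref("./bar", demoPage, demoRoot, demoSchemes)); // https://example.com/page/bar
console.log(resolveHref("mailto:a@b.com", demoPage, demoRoot, demoSchemes)); // mailto:a@b.com
```

Comparing against `URL.protocol` is why the schemes in the options table carry a trailing colon.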

Block-only or inline-only rendering

MarkupParser.parse() takes an optional second argument — the starting context. Rules declare which contexts they apply in, so "inline" skips every block-level rule:

ts
// Inline only — no block wrappers like <p> or <h1>.
const inline = MARKUP_PARSER.parse("Some *bold* text", "inline");

MARKUP_RULES_BLOCK and MARKUP_RULES_INLINE expose the block and inline rule sets separately if you want a parser with only one.
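Context selection amounts to filtering the rule list by the active context. The sketch below is illustrative only: `ContextRule`, `rulesForContext`, and the contexts assigned to each example rule are hypothetical and are not taken from the built-in rules.

```typescript
// Hypothetical, minimal stand-in for a rule; only the fields needed here.
interface ContextRule {
	label: string;
	contexts: [string, ...string[]]; // Same tuple shape as MarkupContexts.
}

// Keep only the rules that declare the given context.
function rulesForContext(rules: readonly ContextRule[], context: string): ContextRule[] {
	return rules.filter(rule => rule.contexts.indexOf(context) >= 0);
}

const sketchRules: ContextRule[] = [
	{ label: "heading", contexts: ["block"] },
	{ label: "paragraph", contexts: ["block"] },
	{ label: "strong", contexts: ["inline", "list", "link"] },
	{ label: "link", contexts: ["inline", "list"] },
];

console.log(rulesForContext(sketchRules, "inline").map(r => r.label).join(", ")); // strong, link
console.log(rulesForContext(sketchRules, "link").map(r => r.label).join(", ")); // strong
```

In this sketch the "link" context omits the link rule itself, which is one way a rule set can prevent links from nesting inside links.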

Custom rules

Build custom rules with createMarkupRule() and combine them with the defaults:

tsx
import { createMarkupRule, MARKUP_RULES, MarkupParser } from "shelving/markup";

const HIGHLIGHT_RULE = createMarkupRule<{ text: string }>(
  /==(?<text>[^=]+)==/,
  (key, { text }) => <mark key={key}>{text}</mark>,
  ["inline"],
);

const parser = new MarkupParser({ rules: [...MARKUP_RULES, HIGHLIGHT_RULE] });

createMarkupRule() takes a regexp (named captures are typed), a render function (key, data, parser) => ReactElement, the contexts the rule applies in, and an optional priority (default 0; higher priorities form earlier-resolved tiers). See shelving/markup for the full built-in rule set.
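To see how priorities turn into tiers, the following sketch groups rules by priority, highest first. `PrioritisedRule`, `groupIntoTiers`, and the priority numbers are hypothetical; the built-in rules may use different values and a different grouping mechanism.

```typescript
// Hypothetical, minimal stand-in for a rule; only the fields needed here.
interface PrioritisedRule {
	label: string;
	priority: number;
}

// Sort by descending priority, then collect runs of equal priority into tiers.
function groupIntoTiers(rules: readonly PrioritisedRule[]): PrioritisedRule[][] {
	const ordered = rules.slice().sort((a, b) => b.priority - a.priority);
	const tiers: PrioritisedRule[][] = [];
	ordered.forEach(rule => {
		const last = tiers[tiers.length - 1];
		const head = last !== undefined ? last[0] : undefined;
		if (last !== undefined && head !== undefined && head.priority === rule.priority) last.push(rule);
		else tiers.push([rule]);
	});
	return tiers;
}

const demoTiers = groupIntoTiers([
	{ label: "paragraph", priority: -10 },
	{ label: "strong", priority: 0 },
	{ label: "code", priority: 10 },
	{ label: "em", priority: 0 },
]);

// Tiers are listed in the order they would be resolved.
console.log(demoTiers.map(tier => tier.map(r => r.label).join("+")).join(" > ")); // code > strong+em > paragraph
```

A rule added with the default priority of 0 would land in the same tier as `strong` and `em` in this example.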

Input normalisation

The parser expects normalised input and deliberately does not try to absorb every whitespace variation. Normalised text means:

  • Indentation is tabs — one tab per nesting level. Leading spaces are not significant.
  • No trailing whitespace at the end of a line.
  • No runs of more than two \n newlines; \r\n / \r line endings are normalised to \n.
  • Control characters are removed.

Keeping this guarantee out of the rules is what keeps them simple: nesting — a sub-list inside a list item, a quote inside a quote — is always exactly one extra tab, never an ambiguous run of spaces.

Normalisation is not performed by the parser — give it text that is already clean. sanitizeMultilineText() from shelving/util/string produces exactly this form: it converts four-space indents to tabs, strips sub-tab leading spaces, trims trailing whitespace, and collapses three-or-more newlines to two. StringSchema runs sanitizeMultilineText() automatically when validating any multi-line field (rows > 1), so text stored through a schema is already normalised.

ts
import { MARKUP_PARSER } from "shelving/markup";
import { sanitizeMultilineText } from "shelving/util";

// Raw / untrusted text: normalise before parsing.
const node = MARKUP_PARSER.parse(sanitizeMultilineText(untrustedInput));

If the text reaches you straight from a StringSchema-validated field it is already normalised, and you can parse it directly.
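For readers who want to see what the normalised form implies, here is a rough, dependency-free approximation of the steps listed in this section. `normaliseSketch` is a hypothetical stand-in written from the prose above; it is not sanitizeMultilineText() and may differ from it on edge cases, so prefer the real function in application code.

```typescript
// Hypothetical approximation of the normalisation steps; not shelving's implementation.
function normaliseSketch(raw: string): string {
	return raw
		.replace(/\r\n?/g, "\n") // \r\n and lone \r become \n.
		.replace(/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/g, "") // Drop control characters, keeping \t and \n.
		.split("\n")
		.map(line => {
			const indent = /^[\t ]*/.exec(line);
			const lead = indent ? indent[0] : "";
			// Each tab or run of four spaces counts as one level; leftover spaces are dropped.
			const levels = (lead.match(/\t| {4}/g) || []).length;
			const body = line.slice(lead.length).replace(/[\t ]+$/, ""); // Trim trailing whitespace.
			return body === "" ? "" : "\t".repeat(levels) + body;
		})
		.join("\n")
		.replace(/\n{3,}/g, "\n\n"); // Collapse three or more newlines to two.
}

const messy = "a  \r\n\r\n\r\n\r\n    b\r  c";
console.log(JSON.stringify(normaliseSketch(messy))); // "a\n\n\tb\nc"
```

Trailing whitespace is trimmed before newlines are collapsed, so whitespace-only lines become empty lines first and are then folded into the surrounding blank run.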

Functions

createBlockRegExp() function

Create a RegExp that matches a block of content, wrapped between block-start and block-end boundaries.

createBlockRegExp(pattern: NamedRegExp<T>, start?: PossibleRegExp, end?: PossibleRegExp): NamedRegExp<T>
createBlockRegExp(pattern: PossibleRegExp, start?: PossibleRegExp, end?: PossibleRegExp): T extends NamedRegExpData ? NamedRegExp<T> : RegExp

createLineRegExp() function

Create a RegExp that matches a single line of content, wrapped between line-start and line-end boundaries.

createLineRegExp(pattern: NamedRegExp<T>, start?: PossibleRegExp, end?: PossibleRegExp): T extends NamedRegExpData ? NamedRegExp<T> : RegExp
createLineRegExp(pattern: PossibleRegExp, start?: PossibleRegExp, end?: PossibleRegExp): T extends NamedRegExpData ? NamedRegExp<T> : RegExp

createWordRegExp() function

Create a RegExp that matches a word of content, wrapped between word-start and word-end boundaries.

createWordRegExp(pattern: NamedRegExp<T>, start?: PossibleRegExp, end?: PossibleRegExp): T extends NamedRegExpData ? NamedRegExp<T> : RegExp
createWordRegExp(pattern: PossibleRegExp, start?: PossibleRegExp, end?: PossibleRegExp): T extends NamedRegExpData ? NamedRegExp<T> : RegExp
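The three create…RegExp() helpers above share one idea: splice a content pattern between a start boundary and an end boundary. A minimal sketch of that idea for the line case follows. `wrapPattern`, `SKETCH_LINE_START`, and `SKETCH_LINE_END` are hypothetical and written from the prose descriptions of the boundary constants; the real helpers accept PossibleRegExp boundaries and preserve named-capture typing, which this sketch does not attempt.

```typescript
// Hypothetical boundary sources modelled on the descriptions of
// LINE_START_REGEXP and LINE_END_REGEXP; the real constants may differ.
const SKETCH_LINE_START = "(?:^|\\n)"; // Start of string, or one linebreak.
const SKETCH_LINE_END = "[^\\S\\n]*(?:\\n|$)"; // Optional trailing spaces, then a linebreak or end of string.

// Wrap a pattern between a start and an end boundary, keeping its flags.
function wrapPattern(pattern: RegExp, start: string, end: string): RegExp {
	return new RegExp(start + "(?:" + pattern.source + ")" + end, pattern.flags);
}

// A heading-like line: "# " followed by the shortest run of non-newline characters.
const sketchHeading = wrapPattern(/# ([^\n]+?)/, SKETCH_LINE_START, SKETCH_LINE_END);

const headingMatch = sketchHeading.exec("intro\n# Title  \nbody");
console.log(headingMatch ? headingMatch[1] : null); // Title
console.log(sketchHeading.exec("see # not a heading")); // null
```

Because the content group is lazy and the end boundary absorbs trailing spaces, the captured title comes out already trimmed.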

createMarkupRule() function

Create a typed MarkupRule from a NamedRegExp, a renderer, its contexts, and an optional priority.

createMarkupRule(regexp: NamedRegExp<T>, render: (key: string, data: T, parser: MarkupParser) => ReactElement, contexts: MarkupContexts, priority?: number): MarkupRule
createMarkupRule(regexp: RegExp, render: (key: string, data: EmptyDictionary, parser: MarkupParser) => ReactElement, contexts: MarkupContexts, priority?: number): MarkupRule

Classes

Parser class

Base class for a parser that converts an input of type I into an output of type O.

new Parser<I, O>()

MarkupParser class

Parses a Markdownish markup string and renders it as a React node using a tiered, masking rule engine.

new MarkupParser({ rules = MARKUP_RULES, rel, url, root, schemes = HTTP_SCHEMES, context = "block" }: MarkupOptions = {})

Interfaces

MarkupRule interface

A single markup rule: a regular expression that matches a span of input plus a renderer that turns the match into an element.

{
	regexp: RegExp;
	render(key: string, data: NamedRegExpData | undefined, parser: MarkupParser): ReactElement;
	contexts: MarkupContexts;
	priority: number;
}

Types

MarkupContexts type

One or more named contexts a markup rule renders in (e.g. ["block"], ["inline", "list", "link"]).

[string, ...string[]]

MarkupRules type

An immutable list of MarkupRule instances applied by a MarkupParser.

readonly MarkupRule[]

MarkupOptions type

Options configuring a MarkupParser (represents the current state of the parsing).

{
	readonly rules?: MarkupRules | undefined;
	readonly rel?: string | undefined;
	readonly url?: ImmutableURL | undefined;
	readonly root?: ImmutableURL | undefined;
	readonly schemes?: URISchemes | undefined;
	readonly context?: string;
}

Constants

BLOCK_CONTENT_REGEXP constant

Regular expression source matching block content — the shortest run of any character.

BLOCK_SPACE_REGEXP constant

Regular expression source matching block whitespace — any single whitespace character.

BLOCK_START_REGEXP constant

Regular expression source matching the start of a block — the start of the string, or one linebreak.

BLOCK_END_REGEXP constant

Regular expression source matching the end of a block — the end of the string, or two linebreaks, with trailing whitespace trimmed.

LINE_CONTENT_REGEXP constant

Regular expression source matching line content — the shortest run of any character except newline.

LINE_SPACE_REGEXP constant

Regular expression source matching line whitespace — any single whitespace character except newline.

LINE_START_REGEXP constant

Regular expression source matching the start of a line — the start of the string, or one linebreak.

LINE_END_REGEXP constant

Regular expression source matching the end of a line — the end of the string, or one linebreak, with trailing whitespace trimmed.

WORD_CONTENT_REGEXP constant

Regular expression source matching word content — at least one letter or number character.

WORD_START_REGEXP constant

Regular expression source matching the start of a word — a zero-width assertion that the previous character is not a letter or number.

WORD_END_REGEXP constant

Regular expression source matching the end of a word — a zero-width assertion that the next character is not a letter or number.
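The word boundaries described above are zero-width assertions, which can be written with a negative lookbehind and a negative lookahead over Unicode letters and numbers. The sketch below is an assumption about their shape, not the actual constant values: `SKETCH_WORD_START`, `SKETCH_WORD_END`, `sketchEmphasis`, and `findEmphasis` are hypothetical, and applying the boundaries to an underscore-emphasis pattern is only an example of where such boundaries are useful.

```typescript
// Hypothetical sources modelled on the descriptions of WORD_START_REGEXP and WORD_END_REGEXP.
const SKETCH_WORD_START = "(?<![\\p{L}\\p{N}])"; // Previous character is not a letter or number.
const SKETCH_WORD_END = "(?![\\p{L}\\p{N}])"; // Next character is not a letter or number.

// The "u" flag is required for \p{…} Unicode property escapes.
const sketchEmphasis = new RegExp(SKETCH_WORD_START + "_([^_]+)_" + SKETCH_WORD_END, "u");

// Return the emphasised text, or null when no word-bounded match exists.
function findEmphasis(text: string): string | null {
	const found = sketchEmphasis.exec(text);
	return found !== null && found[1] !== undefined ? found[1] : null;
}

console.log(findEmphasis("an _italic_ word")); // italic
console.log(findEmphasis("snake_case_name")); // null
```

Without the boundaries, the underscores inside `snake_case_name` would pair up and turn `case` into emphasis.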

MARKUP_PARSER constant

Shared MarkupParser instance configured with the default markup rules and behaviour.

TABLE_RULE constant

Table.

ORDERED_RULE constant

Ordered list.

PARAGRAPH_RULE constant

Paragraph.

HEADING_RULE constant

Heading. Single line only; multiline headings are not allowed.

CODE_RULE constant

Inline code.

LINK_RULE constant

Markdown-style link.

AUTOLINK_RULE constant

Autolinked URL. Starts with a scheme (any scheme in MarkupOptions.schemes) and then matches any number of non-space characters.

UNORDERED_RULE constant

Unordered list.

MARKUP_RULES_BLOCK constant

Default markup rules that render in a block context — fenced code, headings, separators, lists, blockquotes, tables, and paragraphs.

MARKUP_RULES_BLOCK: MarkupRules

MARKUP_RULES_INLINE constant

Default markup rules that render in an inline context — inline code, links, autolinks, emphasis, and hard linebreaks.

MARKUP_RULES_INLINE: MarkupRules

MARKUP_RULES constant

Default markup rules — the combined block and inline rules MarkupParser uses when none are supplied.

MARKUP_RULES: MarkupRules

SEPARATOR_RULE constant

Separator (horizontal rule / thematic break).

LINEBREAK_RULE constant

Hard linebreak (<br /> tag).

BLOCKQUOTE_RULE constant

Blockquote block.

INLINE_RULE constant

Inline strong, emphasis, insert, delete, highlight.

FENCED_RULE constant

Fenced code block.