Text, Regex, Markdown, And Templates

String Methods

Strings are immutable values. Every method returns a new string or a derived value - the original is unchanged.

Inspection

Method	Returns	Description
`length()`	`int`	Number of Unicode code points
`isEmpty()`	`bool`	`true` when the string has no characters
`isBlank()`	`bool`	`true` when empty or only whitespace
`get(index)`	`string`	Single character at `index` (negative = from end)
`chars()`	`list<string>`	All characters as a list
`codePointAt(index)`	`int`	Unicode code point at `index`, or `null` if out of range (the "ord" of one character)
`codePoints()`	`list<int>`	All Unicode code points as a list
`graphemes()`	`list<string>`	Grapheme clusters (user-perceived characters)
`graphemeLength()`	`int`	Number of grapheme clusters
`truncateGraphemes(n)`	`string`	First `n` grapheme clusters

import io;

let s = "hello";
io.println(s.length());     # 5
io.println(s.isEmpty());    # false
io.println(s.get(0));       # h
io.println(s.get(-1));      # o
io.println(s.chars());      # [h, e, l, l, o]
io.println(s.codePointAt(0)); # 104

Graphemes vs code points

length(), chars(), and codePoints() work in Unicode code points. A user-perceived character (a "grapheme cluster") can span several code points: a base letter plus combining marks, or an emoji built from a ZWJ sequence. Use the grapheme methods (UAX #29 segmentation) when you mean what the reader sees, for example when truncating text for display or stepping a cursor.

import io;

let family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}";  # man+ZWJ+woman+ZWJ+girl
io.println(family.length());          # 5  (code points)
io.println(family.graphemeLength());  # 1  (one perceived character)

let accented = "e\u{301}llo";          # e + combining acute = "éllo"
io.println(accented.length());         # 5
io.println(accented.graphemes());      # [é, l, l, o]

io.println("héllo wörld".truncateGraphemes(5));  # héllo
io.println("geblang".graphemes().reverse().join("")); # reverse by grapheme
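The difference matters most when cutting text. A minimal sketch, using only methods from the Inspection and Slicing tables and an illustrative sample word, of how a code-point slice can split a cluster while truncateGraphemes keeps it whole:

```geblang
import io;

let word = "e\u{301}cole";               # "école" written with a decomposed é
io.println(word.length());               # 6  (code points)
io.println(word.graphemeLength());       # 5  (perceived characters)
io.println(word.substring(0, 1));        # e  (code-point slice drops the combining accent)
io.println(word.truncateGraphemes(1));   # é  (base letter and mark stay together)
```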

Searching

Method	Returns	Description
`contains(needle)`	`bool`	`true` when `needle` appears anywhere in the string
`startsWith(prefix)`	`bool`	`true` when the string begins with `prefix`
`endsWith(suffix)`	`bool`	`true` when the string ends with `suffix`
`indexOf(needle)`	`int`	First index of `needle`, or `-1` if not found
`lastIndexOf(needle)`	`int`	Last index of `needle`, or `-1` if not found
`search(needle)`	`list<int>`	Start index (in code points) of every occurrence of `needle`; if `needle` is a callable, every character index for which it returns `true`
`searchPattern(regex)`	`list<int>`	Start index (in code points) of every match of `regex`
`count(needle)`	`int`	Number of non-overlapping occurrences of `needle`
`equalsIgnoreCase(other)`	`bool`	Case-insensitive equality
`containsIgnoreCase(needle)`	`bool`	Case-insensitive substring test

import io;

let s = "hello world";
io.println(s.contains("world"));   # true
io.println(s.startsWith("hello")); # true
io.println(s.endsWith("world"));   # true
io.println(s.indexOf("l"));        # 2
io.println(s.lastIndexOf("l"));    # 9
io.println(s.count("l"));          # 3
io.println(s.equalsIgnoreCase("HELLO WORLD"));   # true
io.println(s.containsIgnoreCase("WORLD"));       # true
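search and searchPattern are not exercised by the example above. The sketch below assumes searchPattern accepts its pattern as a plain string, the way splitRegex does; the commented results follow from the table descriptions rather than from a recorded run:

```geblang
import io;

let s = "hello world";
io.println(s.search("l"));                 # [2, 3, 9]  (every start position of "l")
io.println(s.searchPattern("[aeiou]"));    # [1, 4, 7]  (start of every vowel match)
io.println(s.search("zz"));                # no occurrences, so an empty list is expected
```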

Slicing And Substrings

substring(start[, end]) and slice(start[, end]) are aliases - both extract a sub-sequence by code-point index. Negative indices count from the end.

Method	Returns	Description
`substring(start[, end])`	`string`	Characters from `start` up to (not including) `end`
`slice(start[, end])`	`string`	Same as `substring`

import io;

let s = "hello world";
io.println(s.substring(6));      # world
io.println(s.substring(0, 5));   # hello
io.println(s.slice(-5));         # world
io.println(s.slice(0, -6));      # hello
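indexOf pairs naturally with substring when splitting around a separator. A small sketch with a hypothetical key=value entry:

```geblang
import io;

let entry = "timeout=30";
let eq = entry.indexOf("=");              # 7
io.println(entry.substring(0, eq));       # timeout  (up to, not including, the "=")
io.println(entry.substring(eq + 1));      # 30       (everything after the "=")
io.println(entry.slice(-2));              # 30       (same result, counted from the end)
```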

Transformation

Method	Returns	Description
`lower()`	`string`	All characters lower-cased
`upper()`	`string`	All characters upper-cased
`capitalize()`	`string`	First character upper-cased, the rest lower-cased
`title()`	`string`	Each whitespace-separated word title-cased
`trim()`	`string`	Leading and trailing whitespace removed
`trimStart()`	`string`	Leading whitespace removed
`trimEnd()`	`string`	Trailing whitespace removed
`replace(old, new[, n])`	`string`	Replace occurrences of `old` with `new`; `n` limits replacements
`reverse()`	`string`	Characters in reversed order
`repeat(n)`	`string`	String repeated `n` times
`padStart(len[, pad])`	`string`	Pad to at least `len` characters on the left
`padEnd(len[, pad])`	`string`	Pad to at least `len` characters on the right
`removePrefix(p)`	`string`	Strip prefix `p` if present, else unchanged
`removeSuffix(s)`	`string`	Strip suffix `s` if present, else unchanged

import io;

let s = "  Hello, World!  ";
io.println(s.trim());                       # Hello, World!
io.println(s.lower());                      # "  hello, world!  "
io.println(s.upper());                      # "  HELLO, WORLD!  "
io.println("abc".repeat(3));               # abcabcabc
io.println("hello".reverse());             # olleh
io.println("7".padStart(4, "0"));          # 0007
io.println("hi".padEnd(5, "."));           # hi...
io.println("hello world".replace("o", "0")); # hell0 w0rld
io.println("hello world".replace("o", "0", 1)); # hell0 world
io.println("hELLO wORLD".capitalize());     # Hello world
io.println("hELLO wORLD".title());          # Hello World
io.println("/usr/bin".removePrefix("/"));   # usr/bin
io.println("report.txt".removeSuffix(".txt")); # report
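Because every method returns a new string, calls chain left to right. A short sketch that normalises a hypothetical file name using only the methods listed above:

```geblang
import io;

let raw = "  Quarterly-Report.TXT  ";
let name = raw.trim().lower().removeSuffix(".txt");
io.println(name);                              # quarterly-report
io.println(name.replace("-", " ").title());    # Quarterly Report
io.println(name.padEnd(20, ".") + "|");        # quarterly-report....|
io.println(raw);                               # the original is unchanged
```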

Splitting And Joining

Method	Returns	Description
`split(sep)`	`list<string>`	Split on `sep`; returns list of parts
`lines()`	`list<string>`	Split on line boundaries (LF and CRLF); no trailing empty line
`format(...)`	`string`	`printf`-like formatting: each positional `{}` placeholder is replaced by the next argument, in order

import io;

let csv = "a,b,c,d";
let parts = csv.split(",");
io.println(parts);          # [a, b, c, d]
io.println(parts.length()); # 4

io.println("line1\nline2\nline3".lines()); # [line1, line2, line3]

let msg = "Hello, {}! You have {} messages.".format("Ada", 3);
io.println(msg);  # Hello, Ada! You have 3 messages.
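Joining is the list-side counterpart of split: join, used earlier on the result of graphemes(), stitches the parts back together. A sketch with illustrative data that combines split, lines, join, and format:

```geblang
import io;

let parts = "2024-05-17".split("-");
io.println(parts.join("/"));                  # 2024/05/17

# lines() drops the trailing empty line, so the loop sees exactly two rows.
let rows = "ada,36\ngrace,45\n".lines();
for (row in rows) {
    let cols = row.split(",");
    io.println("{} is {}".format(cols[0], cols[1]));   # ada is 36, then grace is 45
}
```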

Conversion

Method	Returns	Description
`toString()`	`string`	Returns the string itself (identity)
`isInt()`	`bool`	`true` exactly when `toInt()` would succeed (same parse: signs, `0b`/`0o`/`0x` bases, `_` separators)
`isDecimal()`	`bool`	`true` exactly when `toDecimal()` would succeed
`isNumeric()`	`bool`	`true` when the string parses as an int or a decimal

These predicates never throw, so you can test a string before converting instead of wrapping the cast in try/catch. They reuse the exact toInt / toDecimal parse, so s.isInt() is true if and only if s.toInt() does not raise.

Cast with as int, as decimal, as float, or as bool where needed. Since 1.0.2, as bytes encodes the string as UTF-8, and a bytes value cast back with as string decodes UTF-8 (the cast raises a catchable RuntimeError if the byte sequence is not valid UTF-8).

import io;

let b = "résumé" as bytes;
io.println(b.length);     # 8 (two two-byte runes plus four ASCII)
io.println(b as string);  # résumé
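The isInt / isDecimal / isNumeric predicates work as a guard in front of a conversion. A minimal sketch with illustrative inputs; which of them count as ints depends on the toInt() parse described in the table, so no output is shown:

```geblang
import io;

let inputs = ["42", "-7", "0x1F", "1_000", "4.5", "abc"];
for (s in inputs) {
    if (s.isInt()) {
        io.println(s.toInt());    # safe: isInt() is true exactly when toInt() does not raise
    }
}
```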

String Factories: `string`

Import string. The module is a small namespace for static / factory functions that don't belong on a string instance (you can't ask a non-existent string for its codepoint). Everything else string-related is an instance method - see String Methods above.

Function	Returns	Description
`fromCodePoint(n)`	`string`	Single-character string for the Unicode codepoint `n` (this is "chr"). Rejects negative values, values above U+10FFFF, and the UTF-16 surrogate range U+D800..U+DFFF.
`fromCodePoints(list<int>)`	`string`	Multi-character string built from a list of codepoints. Same validation per element.
`compare(a, b)`	`int`	Three-way comparison returning -1 / 0 / +1. Pass it straight to `xs.sort(string.compare)` (sort accepts a three-way comparator). Compares the underlying UTF-8 bytes, which agrees with codepoint order.
`equalsFold(a, b)`	`bool`	Case-insensitive equality respecting Unicode case folding. `string.equalsFold("CafÉ", "café")` is `true`.

import string;
import io;

io.println(string.fromCodePoint(65));               # A
io.println(string.fromCodePoint(8364));             # €
io.println(string.fromCodePoints([72, 105, 33]));   # Hi!
io.println(string.compare("apple", "banana"));      # -1
io.println(string.equalsFold("Hello", "HELLO"));    # true

Geblang has no separate chr / ord: string.fromCodePoint(n) is chr (codepoint to character) and s.codePointAt(i) is ord (character to codepoint). s.codePoints() and string.fromCodePoints convert a whole string to and from a list<int> of codepoints.
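A short round trip through both directions, as a sketch that uses only the functions named above:

```geblang
import string;
import io;

let a = "A".codePointAt(0);                  # 65  ("ord")
io.println(string.fromCodePoint(a + 1));     # B   ("chr" of the next code point)

let cps = "Hi!".codePoints();
io.println(cps);                             # [72, 105, 33]
io.println(string.fromCodePoints(cps));      # Hi!
```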

For timing-attack-safe string equality (HMAC verification, token comparison, etc.) use secrets.constantTimeEqual(a, b) from the security module - see Security. string.equalsFold and string.compare are not constant-time.

Regex string-method variants

Three convenience methods route through the re module without requiring import re:

Method	Returns	Description
`splitRegex(pattern)`	`list<string>`	Split by a regex pattern.
`replaceRegex(pattern, replacement)`	`string`	Replace every regex match. `$1` / `$2` capture-group references work in the replacement.
`matchesRegex(pattern)`	`bool`	True when the string contains a match.

let parts = "foo, bar; baz".splitRegex("[,;] *");          # ["foo","bar","baz"]
let normalised = "John Smith".replaceRegex("(\\w+) (\\w+)", "$2, $1"); # "Smith, John"
let ok = "foo123".matchesRegex("[a-z]+[0-9]+");            # true

The pattern compile cache (introduced in 1.0.5 for the re module) applies here too, so repeated calls with the same pattern skip the recompile.

Builder: `strings.StringBuilder`

Import strings. StringBuilder is a mutable accumulator for building a string from many fragments. Use it in tight loops that append repeatedly: internally a single strings.Builder grows in amortised O(n) total time, whereas repeated acc = acc + fragment allocates a fresh string every iteration and costs O(n²) overall.

import strings;
import io;

let sb = strings.StringBuilder();
for (int i = 0; i < 10; i++) {
    sb.append("part-");
    sb.append(i as string);
    sb.appendLine("");
}
io.println(sb.build());
sb.dispose();

Method	Returns	Description
`StringBuilder(initial = "")`	`StringBuilder`	Construct a new builder, optionally pre-seeded with `initial`.
`append(s)`	`StringBuilder`	Append a fragment. Returns `this` for chaining.
`appendLine(s)`	`StringBuilder`	Append a fragment followed by `\n`. Returns `this`.
`build()`	`string`	Materialise the accumulated content.
`length()`	`int`	Current byte length.
`clear()`	`StringBuilder`	Reset the buffer to empty. Returns `this`.
`dispose()`	`void`	Release the underlying handle. Safe to call multiple times. Call in long-running processes to free the builder.

For the common acc = acc + "literal" idiom inside a loop, the bytecode compiler automatically swaps the local to a builder-backed representation behind the scenes, then materialises it back to a string on the next read. No source change required:

import io;

string acc = "";
for (int i = 0; i < 10000; i++) {
    acc = acc + "x";          # compiler emits builder-backed append
}
io.println(acc.length());     # 10000 - acc materialises here

Reach for the explicit StringBuilder when the auto-rewrite doesn't apply: dynamic (non-literal) RHS, accumulator written through a class field, or when you want chained writes (sb.append("a").append("b")).
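A sketch of that explicit form, with a dynamic fragment and chained writes. The field names are illustrative; the methods are the ones in the table above:

```geblang
import strings;
import io;

let fields = ["id", "name", "email"];
let sb = strings.StringBuilder("cols:");     # pre-seeded with an initial value
for (f in fields) {
    sb.append(" ").append(f);                # non-literal fragment, chained appends
}
io.println(sb.build());                      # cols: id name email
io.println(sb.length());                     # 19
sb.clear();                                  # reuse the same builder
io.println(sb.length());                     # 0
sb.dispose();
```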

Low-level primitives: `strbuilder`

StringBuilder is implemented in stdlib/strings.gb on top of the strbuilder native module. The handle-based primitives are available directly for advanced uses:

Function	Returns	Description
`strbuilder.new(initial = "")`	handle	Create a new builder; returns an opaque handle.
`strbuilder.append(h, s)`	handle	Append `s` to the builder; returns `h`.
`strbuilder.appendLine(h, s)`	handle	Append `s` followed by `\n`.
`strbuilder.build(h)`	`string`	Materialise the current content.
`strbuilder.length(h)`	`int`	Current byte length.
`strbuilder.clear(h)`	handle	Reset the buffer.
`strbuilder.dispose(h)`	`null`	Release the handle.
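The same accumulation written against the handle-based functions, as a minimal sketch built only from the table above:

```geblang
import strbuilder;
import io;

let h = strbuilder.new("log:");
strbuilder.appendLine(h, " start");      # appends " start" plus a newline
strbuilder.append(h, "done");
io.println(strbuilder.length(h));        # 15  (4 + 6 + 1 + 4 bytes)
io.println(strbuilder.build(h));         # log: start / done on two lines
strbuilder.dispose(h);                   # release the handle when finished
```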

Regex: `re`

Import re. The module is a thin wrapper over Go's regexp package (RE2 syntax: no backreferences or lookaround, but full Unicode support, anchors, and alternation).

test(pattern, text) - returns bool.
find(pattern, text) - returns the first match as a string, or null.
findAll(pattern, text) - returns every non-overlapping match as list<string>.
match(pattern, text) - returns a dict with the first match plus capture groups (see below), or null.
matchAll(pattern, text) - returns list<dict> with one entry per non-overlapping match.
replace(pattern, replacement, text) - returns a string. Use $1, $2, ${name} in replacement to reference capture groups.
split(pattern, text) - returns a list<string>.
compile(pattern) - validates the pattern eagerly and returns a reusable Pattern object.
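A quick tour of the string-returning functions, as a sketch over an illustrative input (match and matchAll are covered under Match results below):

```geblang
import re;
import io;

let text = "order 12, order 345";
io.println(re.test("[0-9]+", text));            # true
io.println(re.find("[0-9]+", text));            # 12
io.println(re.findAll("[0-9]+", text));         # [12, 345]
io.println(re.replace("[0-9]+", "N", text));    # order N, order N
io.println(re.split(", *", text));              # [order 12, order 345]
```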

Compiled patterns

re.compile(pattern) returns a Pattern that carries the compiled expression, so a loop states the pattern once and its methods drop the pattern argument:

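import re;
import io;
let tokens = ["abc123", "42", "x9"];   # sample input
let id = re.compile("[a-z]+[0-9]+");   # compiled once, reused below
for (token in tokens) {
    if (id.test(token)) { io.println(token); }   # abc123, x9
}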

Pattern has the same surface as the module functions without the leading pattern: test(text), find(text), findAll(text), match(text), matchAll(text), replace(replacement, text), split(text). Invalid patterns raise at compile time rather than at first use. Performance is on par with the cached module functions for a single hot pattern, and steadier when several patterns are used in the same loop (each compiled form is retained, where the plain functions share one most-recent-pattern cache slot).
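The argument order of the Pattern methods is the module order minus the leading pattern. A small sketch with an illustrative whitespace pattern:

```geblang
import re;
import io;

let ws = re.compile(" +");
io.println(ws.split("a  b   c"));          # [a, b, c]
io.println(ws.replace(" ", "a  b   c"));   # a b c   (replacement first, then text)
io.println(ws.test("no-spaces"));          # false
```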

Match results

re.match and re.matchAll return dicts in the same shape:

Field	Type	Description
`text`	`string`	The whole match (alias for `groups[0]`).
`groups`	`list<string>`	Every group in order. `groups[0]` is the whole match; `groups[1]`, `groups[2]`, ... are the parenthesised subexpressions.
`named`	`dict<string, string>`	Named capture groups (`(?P<name>...)`) keyed by name.

import re;
import io;

let m = re.match("(?P<word>[A-Za-z]+)([0-9]+)", "Ada123");
io.println(m["text"]);              # Ada123
io.println(m["groups"][1]);         # Ada      (numbered group 1)
io.println(m["groups"][2]);         # 123      (numbered group 2)
io.println(m["named"]["word"]);     # Ada      (named group)

# Extract every name=value pair from a free-form string.
let pairs = re.matchAll("(?P<k>\\w+)=\"(?P<v>[^\"]*)\"",
                       "user=\"ada\" role=\"admin\"");
for (pair in pairs) {
    io.println(pair["named"]["k"] + " -> " + pair["named"]["v"]);
}

Anchors and flags

Geblang regexes follow Go's RE2 syntax. Anchors ^/$ match at the start/end of the input by default; prefix the pattern with (?m) to make them match at line boundaries. Other useful inline flags:

(?i) - case-insensitive
(?s) - dot matches newline
(?U) - swap greedy and non-greedy quantifiers

import re;
import io;

io.println(re.test("(?i)^hello",  "Hello World"));   # true
io.println(re.test("(?s)foo.bar", "foo\nbar"));      # true
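The remaining two flags are easiest to see side by side with the default behaviour. A sketch with illustrative inputs; the results follow from RE2's documented semantics:

```geblang
import re;
import io;

let text = "alpha\nbeta\ngamma";
io.println(re.findAll("^[a-z]+", text));        # [alpha]               (^ only at start of input)
io.println(re.findAll("(?m)^[a-z]+", text));    # [alpha, beta, gamma]  (^ at every line start)

io.println(re.find("a.+a", "banana"));          # anana  (greedy by default)
io.println(re.find("(?U)a.+a", "banana"));      # ana    (quantifiers become non-greedy)
```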

PCRE-compatible regex: `pcre`

Import pcre. pcre runs a PCRE-style engine (backed by .NET's regex syntax) that supports the features RE2 omits: lookahead, lookbehind, backreferences, atomic groups, possessive quantifiers, and named captures via either (?P<name>...) (PHP / Python) or (?<name>...) (.NET / PCRE2) syntax. Use it when porting PHP code or when the pattern needs features RE2 can't express.

re and pcre coexist. Prefer re for hot paths or any input that may be user-controlled (RE2 has linear-time matching and no catastrophic backtracking); reach for pcre when you need the richer syntax.

Every function accepts an optional flags string as the last argument:

Flag	Meaning
`i`	Case-insensitive
`m`	Multiline (`^` / `$` match line boundaries)
`s`	Dotall (`.` matches newlines)
`x`	Extended (whitespace ignored, `#` comments allowed)

Functions

test(pattern, text, flags = "") - returns bool.
find(pattern, text, flags = "") - first match as a string, or null.
findAll(pattern, text, flags = "") - every non-overlapping match as list<string>.
match(pattern, text, flags = "") - dict with text / groups / named (same shape as re.match), or null.
matchAll(pattern, text, flags = "") - list<dict>.
replace(pattern, replacement, text, flags = "") - returns a string. Use $1, $2, ${name} in replacement to reference capture groups.
split(pattern, text, flags = "") - returns a list<string>.
compile(pattern, flags = "") - returns a reusable Pattern that carries the pattern and flags; its methods mirror the functions without the pattern/flags arguments (e.g. pcre.compile("^foo$", "im").test(text)).
quote(text) - escapes regex metacharacters in a literal string.

Examples

import pcre;
import io;

# Lookahead: PCRE-only.
io.println(pcre.find('\w+(?=ing\b)', "swimming and running"));  # swimm

# Lookbehind: PCRE-only.
io.println(pcre.find('(?<=\$)\d+', "price is $42"));            # 42

# Backreferences: PCRE-only.
io.println(pcre.test('(\w+)\s+\1', "hello hello"));             # true

# PHP-style (?P<name>...) syntax works unchanged.
let m = pcre.match('(?P<word>[a-z]+)(?P<num>\d+)', "abc123");
io.println(m["named"]["word"]);                                  # abc

# Numbered backreference in replacement.
io.println(pcre.replace('(\w+) (\w+)', "$2 $1", "hello world")); # world hello

# Case-insensitive via flags.
io.println(pcre.test("hello", "HELLO", "i"));                    # true

# Escape user input before splicing into a pattern.
let needle = pcre.quote("a.b+c");
io.println(pcre.test(needle, "x a.b+c y"));                      # true
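The flags argument, compile, and split do not appear in the examples above. A brief sketch with illustrative inputs, using the same single-quoted pattern style:

```geblang
import pcre;
import io;

# "m" makes ^ match at the start of every line.
io.println(pcre.findAll('^\w+', "one\ntwo", "m"));      # [one, two]

# A compiled Pattern carries its flags; methods take only the text.
let foo = pcre.compile("^foo$", "im");
io.println(foo.test("bar\nFOO\nbaz"));                  # true

io.println(pcre.split('\s*,\s*', "a , b,c"));           # [a, b, c]
```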

Markdown: `markdown`

Import markdown. The module supports full GitHub Flavored Markdown (GFM) - tables, strikethrough, task lists, autolinks, ordered lists, blockquotes, horizontal rules, setext headings, and raw HTML passthrough.

renderHtml(source) - render to HTML string.
parse(source) - returns a list<dict> of block nodes. Each dict has a "type" key; additional keys depend on the type (see below).
stripText(source) - extract all plain text, stripping markup.

Block types returned by parse:

`type`	Additional keys
`"heading"`	`level: int`, `text: string`
`"paragraph"`	`text: string`
`"list"`	`items: list<string>`
`"ordered_list"`	`items: list<string>`
`"task_list"`	`items: list<dict>` - each `{text: string, checked: bool}`
`"code"`	`lang: string`, `code: string`
`"table"`	`headers: list<string>`, `rows: list<list<string>>`
`"blockquote"`	`text: string`
`"hr"`	(no extra keys)
`"html"`	`html: string`

import markdown;
import io;

let src = "## Hello\n\n| col1 | col2 |\n|------|------|\n| a | b |\n\n- [x] done\n- [ ] todo";
io.println(markdown.renderHtml(src));

let blocks = markdown.parse(src);
io.println(blocks[0]["type"]);          # heading
io.println(blocks[1]["headers"][0]);    # col1
io.println(blocks[2]["items"][0]["checked"]);  # true
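Since every block carries a "type" key, a document can be walked generically. A minimal sketch over an illustrative source; the stripText result is printed without a stated output because its exact whitespace is not specified here:

```geblang
import markdown;
import io;

let src = "# Title\n\nIntro text.\n\n## Part one\n\n- a\n- b";
let blocks = markdown.parse(src);
for (block in blocks) {
    io.println(block["type"]);        # heading, paragraph, heading, list (one per line)
}
io.println(blocks[0]["level"]);       # 1
io.println(blocks[2]["text"]);        # Part one
io.println(blocks[3]["items"]);       # [a, b]
io.println(markdown.stripText(src));  # plain text with the markup removed
```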

Unicode normalisation: `unicode` (1.6.0)

The unicode module exposes the four Unicode normalisation forms via unicode.normalize(s, form). form is the standard name of the form: "NFC", "NFD", "NFKC", or "NFKD".

import unicode;
import io;

let nfd = "e\u{301}";          # e + U+0301 combining acute (2 code points)
let nfc = unicode.normalize(nfd, "NFC");
io.println(nfc.length());          # 1 - now a single code point
io.println(unicode.normalize("ﬁ", "NFKC"));   # fi - ligature decomposed

Function	Returns	Description
`unicode.normalize(s, form)`	`string`	A copy of `s` normalised under `form`. Throws on an unknown form.
`unicode.isNormalized(s, form)`	`bool`	True when `s` is already in `form`. Cheap; does not allocate a normalised copy.

When to use which form

Form	Effect	Typical use
NFC	Canonical composition. Combining marks fold into precomposed code points where one exists.	Storage, display, equality comparison of "the same character" inputs. The Web's standard.
NFD	Canonical decomposition. Precomposed characters split into base + combining marks.	Sorting that respects diacritics, accent-insensitive search after stripping marks.
NFKC	Compatibility composition. Compatibility equivalents (ligatures, full-width, superscripts) fold to their base form, then canonical composition is applied.	Search across visually-similar characters; input sanitisation.
NFKD	Compatibility decomposition. Same compatibility folding as NFKC but no recomposition.	The fully decomposed compatibility form; rarely needed directly.

Normalising untrusted input before storing or comparing is good defensive practice: it defeats bypass attacks that rely on canonically equivalent but byte-different strings (a precomposed "é" vs "e" plus a combining accent, for example). It does not collapse genuinely distinct look-alikes: "admin" vs "admın" with a Turkish dotless i stays different even under NFKC, but normalising at least makes equality checks reliable.
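A sketch of normalise-then-compare, using string.compare from String Factories (documented as a byte-wise comparison) so the effect of normalisation is visible:

```geblang
import unicode;
import string;
import io;

let decomposed  = "e\u{301}";     # e + combining acute
let precomposed = "\u{E9}";       # single code point é

io.println(string.compare(decomposed, precomposed));     # -1  (different bytes)
let a = unicode.normalize(decomposed, "NFC");
let b = unicode.normalize(precomposed, "NFC");
io.println(string.compare(a, b));                        # 0   (equal after NFC)

io.println(unicode.isNormalized(decomposed, "NFC"));     # false
io.println(unicode.isNormalized(precomposed, "NFC"));    # true
```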

Templates: `template`

The template module is backed by Go's html/template: a full templating engine with data binding, conditionals, loops, and pipelines, plus contextual auto-escaping - interpolated values are HTML-escaped for the position they appear in (element text, attribute, URL, script), so the engine is XSS-safe by default. (For escaping a single string outside a template, see encoding.htmlEscape; for sanitizing untrusted HTML, encoding.sanitizeHtml.)

Module functions:

renderString(source, data) - render a template string against data, returning the result string.
Template(source[, name]) - compile a reusable Template value.
load(path) - read and compile a template from a file.
Engine(dir) - a TemplateEngine rooted at a directory; accepts a string path or an options dict ({"dir": ...}).

Template methods: render(data), name(), path(), toString(). TemplateEngine methods: render(name, data) (loads <dir>/<name> and renders), load(name) (returns a Template), dir().

Syntax

Data is supplied as a dict (or any value); fields are referenced with a leading dot:

import template;
import io;

io.println(template.renderString("Hello {{.name}}", {"name": "Ada"}));
io.println(template.renderString("{{.user.email}}",
    {"user": {"email": "ada@example.com"}}));

Common actions:

Conditionals: {{if .admin}}Admin{{else}}Guest{{end}}
Iteration: {{range .items}}<li>{{.}}</li>{{end}} (inside range, . is the current element; {{range $i, $v := .items}} binds index and value).
Scoping: {{with .profile}}{{.bio}}{{end}}
Pipelines: {{.price | printf "%.2f"}}
Comments: {{/* not rendered */}}

let tmpl = template.Template(
    "<ul>{{range .todos}}<li>{{.title}}</li>{{end}}</ul>");
io.println(tmpl.render({"todos": [{"title": "ship"}, {"title": "rest"}]}));
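Conditionals and indexed iteration combine in the same way. A sketch with illustrative data; it is expected to render "Guest" followed by one index=value pair per item, but the exact output depends on how Geblang values are handed to the engine, so none is shown:

```geblang
import template;
import io;

let page = template.Template(
    "{{if .admin}}Admin{{else}}Guest{{end}}: " +
    "{{range $i, $v := .items}}{{$i}}={{$v}} {{end}}");
io.println(page.render({"admin": false, "items": ["a", "b"]}));
```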

Auto-escaping means untrusted data is safe to interpolate directly; the engine escapes <, >, &, quotes, and URL/script context as needed. To emit pre-trusted HTML verbatim, mark it with the engine's standard mechanisms rather than disabling escaping.
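A minimal sketch of the escaping in action, with a hypothetical hostile comment as data. The commented output is what html/template's element-text escaping produces for this input:

```geblang
import template;
import io;

let comment = "<script>alert(1)</script>";
io.println(template.renderString("<p>{{.comment}}</p>", {"comment": comment}));
# <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
```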

A directory-backed engine keeps templates on disk:

let engine = template.Engine("templates");
io.println(engine.render("welcome.html", {"name": "Ada"}));
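The options-dict form and load can be sketched the same way. This assumes a templates/welcome.html file exists, as in the example above; no output is shown because it depends on that file's contents:

```geblang
import template;
import io;

let engine = template.Engine({"dir": "templates"});   # options-dict form of the constructor
let welcome = engine.load("welcome.html");            # compile once, returns a Template
io.println(engine.dir());
io.println(welcome.name());
io.println(welcome.render({"name": "Ada"}));          # render repeatedly with different data
io.println(welcome.render({"name": "Grace"}));
```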
