ivan-magda/kotlin-textmate

A Kotlin/JVM port of vscode-textmate — TextMate grammar tokenizer for syntax highlighting on Android and JVM.

Why

Existing TextMate engines for the JVM are either abandoned or tied to specific ecosystems. codroid-textmate has been dormant since 2022. tm4e targets Eclipse and requires Java 21. Neither provides a Compose UI layer.

The alternative is Highlights, which uses hand-written regex and supports 17 languages.

TextMate grammars (the same format VS Code uses) cover 600+ languages and are actively maintained. KotlinTextMate ports vscode-textmate to Kotlin, adds Compose integration, and publishes conformance tests and benchmarks.

Features

  • Loads standard .tmLanguage.json grammars — the same files VS Code uses
  • VS Code JSON theme support (Dark+, Light+, or any tokenColors-based theme)
  • Jetpack Compose CodeBlock composable with AnnotatedString output
  • Joni (Java Oniguruma) regex engine with graceful fallback for unsupported patterns
  • Line-by-line tokenization with persistent state across lines
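
The "graceful fallback" idea can be pictured with a small sketch. It uses `java.util.regex` as a stand-in for Joni, and `compileOrNeverMatch` and `NEVER_MATCHES` are hypothetical names, not part of this library: when a pattern fails to compile, it is replaced by a pattern that can never match, so the rest of the grammar keeps working.

```kotlin
import java.util.regex.Pattern
import java.util.regex.PatternSyntaxException

// Empty negative lookahead: it fails at every position, so it never matches.
val NEVER_MATCHES: Pattern = Pattern.compile("(?!)")

/**
 * Compiles [source], or returns a never-matching pattern if the engine
 * rejects it. Assumption: java.util.regex stands in for Joni here.
 */
fun compileOrNeverMatch(source: String): Pattern =
    try {
        Pattern.compile(source)
    } catch (e: PatternSyntaxException) {
        NEVER_MATCHES
    }
```

The trade-off is silent degradation: a rule backed by the sentinel simply stops contributing tokens, which is usually preferable to rejecting the whole grammar.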

Quick start

// Load a grammar
val rawGrammar = assets.open("grammars/kotlin.tmLanguage.json")
    .use { GrammarReader.readGrammar(it) }
val grammar = Grammar(rawGrammar.scopeName, rawGrammar, JoniOnigLib())

// Load a theme (base + overlay, same as VS Code)
val theme = assets.open("themes/dark_vs.json").use { base ->
    assets.open("themes/dark_plus.json").use { overlay ->
        ThemeReader.readTheme(base, overlay)
    }
}

// Render in Compose
CodeBlock(
    code = sourceCode,
    grammar = grammar,
    theme = theme,
)

For custom rendering without CodeBlock:

val highlighted = rememberHighlightedCode(code, grammar, theme)
Text(text = highlighted)

Or tokenize directly:

var state: StateStack? = null
for (line in code.lines()) {
    val result = grammar.tokenizeLine(line, state)
    state = result.ruleStack
    for (token in result.tokens) {
        val style = theme.match(token.scopes)
        // style.foreground (ARGB), style.fontStyle
    }
}
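
When rendering tokens yourself, neighbouring tokens often resolve to the same style, and merging them keeps the number of spans small. The sketch below shows one way to do that. `TokenStyle`, `StyledToken`, and `mergeRuns` are hypothetical stand-ins for illustration, and the assumption that each token carries start and end offsets is not confirmed by this README.

```kotlin
// Hypothetical stand-ins for illustration; these are not library types.
data class TokenStyle(val foreground: Int, val bold: Boolean = false)
data class StyledToken(val start: Int, val end: Int, val style: TokenStyle)

/** Merges consecutive tokens that touch and share a style into one run. */
fun mergeRuns(tokens: List<StyledToken>): List<StyledToken> {
    val merged = mutableListOf<StyledToken>()
    for (token in tokens) {
        val last = merged.lastOrNull()
        if (last != null && last.end == token.start && last.style == token.style) {
            // Extend the previous run instead of emitting a new one.
            merged[merged.lastIndex] = last.copy(end = token.end)
        } else {
            merged.add(token)
        }
    }
    return merged
}
```

Each resulting run maps to a single styled span over `start until end`.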

Project structure

| Module | Description |
| --- | --- |
| core | Grammar parsing, rule compilation, tokenizer, theme engine (pure JVM) |
| compose-ui | CodeBlock composable, CodeHighlighter, AnnotatedString bridge |
| sample-app | Android demo app: 3 languages, 2 themes, soft wrap toggle |
| benchmark | JMH performance benchmarks via kotlinx-benchmark |

Benchmarks

| Grammar | Lines/sec | ms per 1k lines |
| --- | --- | --- |
| Kotlin | 79,300 | 12.6 |
| JSON | 457,600 | 2.2 |
| Markdown | 95,700 | 10.4 |
| JavaScript | 10,300 | 97.1 |

Throughput is competitive with vscode-textmate (~5.6–18.3k lines/sec on jQuery) and syntect (~13k lines/sec). See BENCHMARK.md for details and methodology.
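
The two numeric columns in the table are two views of the same measurement. As a quick sanity check, this small sketch converts lines per second to milliseconds per 1,000 lines. The helper names are made up for illustration.

```kotlin
import kotlin.math.round

/** Converts a throughput in lines per second to milliseconds per 1,000 lines. */
fun msPer1kLines(linesPerSec: Double): Double = 1_000.0 / linesPerSec * 1_000.0

/** Rounds to one decimal place, matching the precision used in the table. */
fun roundTo1(value: Double): Double = round(value * 10) / 10
```

For example, `roundTo1(msPer1kLines(79_300.0))` gives 12.6, which agrees with the Kotlin row.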

Known limitations

This is a proof-of-concept port with the following limitations:

  • No injection grammars. RawGrammar.injections is parsed but never evaluated. Grammars that use injections for scope-specific overrides or embedded languages will miss those rules. The scope matcher (matcher.ts) is not ported.
  • Joni regex limitation. Backreferences inside lookbehind assertions (e.g., (?<=_\1)) cannot compile in Joni. Such patterns fall back to a never-matching sentinel (?!). Tracked in #9.
  • JVM/Android only. The regex layer uses Joni (Java Oniguruma). iOS/Desktop would require an expect/actual abstraction with a native Oniguruma binding.
  • No incremental tokenization. There is no built-in line-level state cache for partial re-tokenization. The bundled CodeHighlighter retokenizes the entire file on every call. Consumers can implement their own caching on top of Grammar.tokenizeLine()'s prevState parameter.
  • Not thread-safe. Grammar holds mutable compilation state. Do not call tokenizeLine() concurrently on the same instance. Theme is safe to share. See ARCHITECTURE.md for details.
  • Per-token background color not rendered. CodeHighlighter applies foreground color and font style from theme rules but drops per-token background. Only the theme's default background is used as the container color.
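
The missing line-level cache can be layered on top by the consumer. The sketch below is one minimal approach, not the library's implementation: `LineStateCache` is a hypothetical class that remembers the state after each line and, on an edit, retokenizes only from the first changed line to the end. It is generic so it stays self-contained. With this library, the `tokenizeLine` lambda would presumably wrap `grammar.tokenizeLine(line, state)` and return `result.tokens to result.ruleStack`, with `StateStack?` as the state type.

```kotlin
/**
 * Hypothetical consumer-side cache: keeps per-line tokens and the state
 * reached after each line, so unchanged leading lines are never retokenized.
 */
class LineStateCache<S, T>(
    private val initialState: S,
    private val tokenizeLine: (line: String, prevState: S) -> Pair<T, S>,
) {
    private val lines = mutableListOf<String>()
    private val tokens = mutableListOf<T>()
    private val endStates = mutableListOf<S>()

    /** Applies the new text and returns how many lines were retokenized. */
    fun update(newLines: List<String>): Int {
        // Find the first line that differs from the cached text.
        var first = 0
        while (first < lines.size && first < newLines.size && lines[first] == newLines[first]) {
            first++
        }
        // Drop cached entries from the first changed line onward.
        lines.subList(first, lines.size).clear()
        tokens.subList(first, tokens.size).clear()
        endStates.subList(first, endStates.size).clear()

        // Resume from the state that followed the last unchanged line.
        var state = if (first == 0) initialState else endStates[first - 1]
        for (i in first until newLines.size) {
            val (lineTokens, nextState) = tokenizeLine(newLines[i], state)
            lines.add(newLines[i])
            tokens.add(lineTokens)
            endStates.add(nextState)
            state = nextState
        }
        return newLines.size - first
    }

    fun tokensFor(lineIndex: Int): T = tokens[lineIndex]
}
```

This version always retokenizes through the end of the file. Stopping early once the state after a line matches the previously cached one would save more work, but it relies on comparing states for equality, which this README does not document. Because a grammar instance is not thread-safe, such a cache should be confined to a single thread as well.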

Acknowledgments

  • vscode-textmate — the TypeScript implementation this project ports
  • Joni — Java port of the Oniguruma regex engine
  • TextMate — the original grammar format

License

MIT
