Syntax Highlighting System Architecture
MyLittleContentEngine features a sophisticated multi-layered syntax highlighting system that provides server-side highlighting for optimal performance while maintaining broad language compatibility. The system uses a cascading approach with four different highlighting engines, each optimized for specific use cases.
Note
Architecture Overview
The syntax highlighting system follows a priority-based cascade that maximizes accuracy and performance:
graph TD
    A[Code Block] --> B{Language Detection}
    B --> C{Roslyn Available?}
    C -->|Yes + C#/VB| D[Roslyn Highlighter]
    C -->|No| E{Custom Highlighter?}
    E -->|GBNF/Shell/Text| F[Custom Highlighter]
    E -->|No| G{TextMate Grammar?}
    G -->|Available| H[TextMateSharp]
    G -->|Unavailable| I[Client-side Highlight.js]
  
    D --> J[Server-side HTML with CSS Classes]
    F --> J
    H --> J
    I --> K[Client-side Highlighting]
This layered approach ensures that:
- C# and VB.NET get the most accurate highlighting via Roslyn
 - Common languages are highlighted server-side for better performance
 - Specialized languages receive custom tokenization
 - Any language can be handled via client-side fallback
 
Layer 1: Roslyn Highlighter (Highest Priority)
The Roslyn highlighter provides the most accurate syntax highlighting for .NET languages by leveraging Microsoft's actual C# and VB.NET compilers.
Special Roslyn Features
XML Documentation Integration
```csharp:xmldocid
T:System.Collections.Generic.List<T>.Add
```
This extracts the actual implementation of List<T>.Add from the loaded assemblies and highlights it with full semantic information.
File Path Loading
```csharp:path
examples/MinimalExample/Program.cs
```
Loads and highlights external files from your solution, ensuring documentation stays in sync with actual code.
Tip
Body-Only Documentation
```csharp:xmldocid,bodyonly
M:MyNamespace.MyClass.MyMethod
```
Shows only the method body without class declaration context.
Implementation Details
The Roslyn highlighter operates by:
- Loading Solutions: Scans referenced assemblies and XML documentation
 - Semantic Analysis: Uses full compiler semantic information
 - Classification: Applies precise token classification (keywords, types, literals, etc.)
 - HTML Generation: Outputs semantic HTML with CSS classes matching Highlight.js conventions
 
Layer 2: Custom Highlighters
Custom highlighters provide specialized tokenization for domain-specific languages that benefit from purpose-built parsers.
GBNF (Grammar Backus-Naur Form)
Language: gbnf
Purpose: Highlighting grammar definitions for language parsers
root ::= expr
expr ::= term (("+" | "-") term)*
term ::= factor (("*" | "/") factor)*
factor ::= number | "(" expr ")"
number ::= [0-9]+
Features:
- Rule Detection: Identifies grammar rule definitions
 - Operator Highlighting: Highlights GBNF operators (
::=,|,*,+,?) - String Literals: Recognizes quoted terminal symbols
 - Character Ranges: Highlights bracket notation 
[a-z] - Comments: Supports 
//line comments 
Shell/Bash Highlighter
Languages: bash, shell
Purpose: Command-line script highlighting with semantic understanding
#!/bin/bash
# Install dependencies
curl -sfL https://example.com/install.sh | tar -xzf -
dotnet build --configuration Release
Features:
- Command Recognition: Identifies shell commands and built-ins
 - Flag Highlighting: Recognizes command-line flags (
-f,--verbose) - String Detection: Handles single and double-quoted strings
 - Comment Support: Highlights 
#andREMcomments - Shebang Support: Recognizes script interpreters
 
Plain Text Handler
Languages: text, `` (empty string)
Purpose: Renders code without syntax highlighting
Simply wraps content in <pre><code> tags with HTML escaping, providing a fallback for content that shouldn't be highlighted.
Layer 3: TextMateSharp Integration
TextMateSharp provides broad language support using Visual Studio Code's grammar definitions, offering server-side highlighting for 49+ programming languages.
Language Coverage
Web Technologies
- Frontend: HTML, CSS, SCSS, Less, JavaScript, TypeScript
 - Data: JSON, XML, YAML
 - Templates: Handlebars, Pug, Razor
 
Systems Programming
- Native: C, C++, Rust, Go, Swift, Objective-C
 - Managed: Java, Kotlin, Dart, Pascal
 
Scripting & Dynamic
- Python: Full Python 3 syntax support
 - Ruby: Including Rails-specific highlighting
 - PHP: Web-focused syntax highlighting
 - PowerShell: Windows scripting support
 - Lua: Lightweight scripting language
 - R: Statistical computing language
 
Functional Programming
- F#: .NET functional language
 - Clojure: JVM-based Lisp dialect
 - Julia: Scientific computing language
 
Documentation & Markup
- Markdown: Including extensions and frontmatter
 - LaTeX: Mathematical typesetting
 - AsciiDoc: Technical documentation
 - Typst: Modern markup language
 
Specialized Languages
- HLSL/ShaderLab: Graphics programming
 - SQL: Database queries
 - Groovy: JVM scripting
 - Dockerfile: Container definitions
 
Fallback Behavior
When a requested language isn't found:
- Scope Name Generation: Tries 
source.{language}pattern - Alias Resolution: Attempts common language aliases
 - Plain Text Fallback: Renders as plain code block if no grammar is found
 
Layer 4: Client-Side Highlight.js (Fallback)
When server-side highlighting isn't available, the system falls back to client-side Highlight.js for maximum language compatibility.
Pre-loaded Languages (23)
High-frequency languages loaded immediately:
- Web: 
javascript,typescript,css,html,xml,json,yaml - Systems: 
cpp,c,rust,go,java,kotlin,swift - Scripting: 
python,php,ruby,bash,shell - Data: 
sql,markdown - .NET: 
csharp 
Dynamic Loading
For uncommon languages, Highlight.js loads additional grammars from CDN:
// Lazy load additional languages from CDN
const response = await fetch(`https://cdn.jsdelivr.net/npm/highlight.js@11/lib/languages/${language}.min.js`);
This provides access to 190+ languages without increasing initial bundle size.
Summary
MyLittleContentEngine's syntax highlighting system provides comprehensive language support through a multi-layered architecture with 52+ server-side languages and 190+ client-side languages for maximum compatibility.