Friday, November 9, 2007

PowerShell syntax highlighting with HTML

When I decided to start this blog, I thought it would be nice to be able to display PowerShell code examples with nice formatting and syntax highlighting. I tried a few freely available tools that advertised PowerShell syntax support, but they all fell short in a category or two. None of them correctly handled multi-line strings or here-strings, and none of them correctly highlighted PowerShell variables enclosed in curly braces, e.g. "${this is a variable}".
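
For anyone who has not run into them, here is a small sketch of the two constructs in question. The variable name and the text are made up purely for illustration.

```powershell
# A braced variable name may contain spaces and punctuation.
${this is a variable} = 42

# A double-quoted here-string spans several lines and still expands variables.
$expanded = @"
The answer is ${this is a variable}.
"quotes" and # signs inside are just text.
"@

# A single-quoted here-string is taken literally.
$literal = @'
No expansion here: ${this is a variable}
'@

$expanded
$literal
```

A highlighter has to treat everything between the here-string delimiters as one string, even though it contains quotes and a "#" that would otherwise start a comment.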

I thought it would be fun to try to write my own syntax highlighting tool with PowerShell. It was a little more difficult than I originally thought it would be, but it really was fun.

The script takes a string parameter that can be a code snippet or a path to a PowerShell script file. A switch parameter can be provided if line numbers are wanted in the output. The script highlights strings, comments, operators, numbers, keywords (including things kind of like keywords), types (specifically the shortcut types available in PowerShell, like [string] and [regex]), variables, and Cmdlet names. The colors used to highlight each of these items, along with the background color, default foreground color, and line number color can be customized by changing the values of the variables declared at the top of the script.

Here is the script (highlighted with itself):

# Highlight-Syntax.ps1
# version 1.0
# by Jeff Hillman
#
# this script uses regular expressions to highlight PowerShell
# syntax with HTML.

param( [string] $code, [switch] $LineNumbers )

if ( Test-Path $code -ErrorAction SilentlyContinue )
{
    $code = Get-Content $code | Out-String
}

$backgroundColor = "#DDDDDD"
$foregroundColor = "#000000"
$stringColor     = "#800000"
$commentColor    = "#008000"
$operatorColor   = "#C86400"
$numberColor     = "#800000"
$keywordColor    = "#C86400"
$typeColor       = "#404040"
$variableColor   = "#000080"
$cmdletColor     = "#C86400"
$lineNumberColor = "#404040"

filter Html-Encode( [switch] $Regex )
{
    # some regular expressions operate on strings that have already
    # been through this filter, so the patterns need to be updated
    # to look for the encoded characters instead of the literal ones.
    # we do it with this filter instead of directly in the regular 
    # expression so the expressions can be a bit more readable (ha!)

    $_ = $_ -replace "&", "&amp;"
    
    if ( $Regex )
    {
        $_ = $_ -replace "(?<!\(\?)<", "&lt;"
        $_ = $_ -replace "(?<!\(\?)>", "&gt;"
    }
    else
    {
        $_ = $_ -replace "\t", "    "
        $_ = $_ -replace " ", "&nbsp;"
        $_ = $_ -replace "<", "&lt;"
        $_ = $_ -replace ">", "&gt;"
    }
    
    $_
}

# regular expressions

$operatorRegex =  @"
((?x:
 (?# assignment operators)
 =|\+=|-=|\*=|/=|%=|
 (?# arithmetic operators)
 (?<!\de)
 (\+|-|\*|/|%)(?![a-z])|
 (?# unary operators)
 \+\+|\-\-|
 (?# logical operators)
 (-and|-or|-not)\b|!|
 (?# bitwise operators)
 (-band|-bor)\b|
 (?# redirection and pipeline operators)
 2>>|>>|2>&1|1>&2|2>|>|<|\||
 (?# comparison operators)
 (
  -[ci]? (?# case and case-insensitive variants)
  (eq|ne|ge|gt|lt|le|like|notlike|match|notmatch|replace|contains|notcontains)\b
 )|
 (?# type operators)
 (-is|-isnot|-as)\b|
 (?# range and miscellaneous operators)
 \.\.|(?<!\d)\.(?!\d)|&|::|:|,|``|
 (?# string formatting operator)
 -f\b
))
"@ | Html-Encode -Regex

$numberRegex = @"
((?x:
 (
  (?# hexadecimal numbers)
  (\b0x[0-9a-f]+)|
  (?# regular numbers)
  (?<!&)
  ((\b[0-9]+(\.(?!\.))?[0-9]*)|((?<!\.)\.[0-9]+))
  (?!(>>|>&[12]|>))
  (?# scientific notation)
  (e(\+|-)?[0-9]+)?
 )
 (
  (?# type specifiers)
  (l|ul|u|f|ll|ull)?
  (?# size shorthand)
  (b|kb|mb|gb)?
  \b
 )?
))
"@ | Html-Encode -Regex

$keyWordRegex = @"
((?x:
 \b(
 (?# don't match anything that looks like a variable or a parameter)
 (?<![-$])
 (
  (?# condition keywords)
  if|else|elseif|(?<!\[)switch(?!\])|
  (?# loop keywords)
  for|(?<!\|</span>&nbsp;)foreach(?!-object)|in|do|while|until|default|break|continue|
  (?# scope keywords)
  global|script|local|private|
  (?# block keywords)
  begin|process|end|
  (?# other keywords)
  function|filter|param|throw|trap|return
 )
 )\b
))
"@

$typeRegex = @"
((?x:
 \[
 (
  (?# primitive types and arrays of those types)
  ((int|long|string|char|bool|byte|double|decimal|float|single)(\[\])?)|
  (?# other types)
  regex|array|xml|scriptblock|switch|hashtable|type|ref|psobject|wmi|wmisearcher|wmiclass
 )
 \]
))
"@

$cmdletNames = Get-Command -Type Cmdlet | Foreach-Object { $_.Name }

function Highlight-Other( [string] $code )
{
    $highlightedCode = $code | Html-Encode
    
    # operators
    $highlightedCode = $highlightedCode -replace 
        $operatorRegex, "<span style='color: $operatorColor'>`$1</span>"

    # numbers
    $highlightedCode = $highlightedCode -replace 
        $numberRegex, "<span style='color: $numberColor'>`$1</span>"

    # keywords
    $highlightedCode = $highlightedCode -replace 
        $keyWordRegex, "<span style='color: $keywordColor'>`$1</span>"

    # types
    $highlightedCode = $highlightedCode -replace 
        $typeRegex, "<span style='color: $typeColor'>`$1</span>"

    # Cmdlets
    $cmdletNames | Foreach-Object {
        $highlightedCode = $highlightedCode -replace 
            "\b($_)\b", "<span style='color: $cmdletColor'>`$1</span>"
    }

    $highlightedCode
}

$RegexOptions = [System.Text.RegularExpressions.RegexOptions]

$highlightedCode = ""

# we treat variables, strings, and comments differently because we don't 
# want anything inside them to be highlighted.  we combine the regular 
# expressions so they are mutually exclusive

$variableRegex = '(\$(\w+|{[^}`]*(`.[^}`]*)*}))'

$stringRegex = @"
(?x:
 (?# here strings)
 @[`"'](.|\n)*?^[`"']@|
 (?# double-quoted strings)
 `"[^`"``]*(``.[^`"``]*)*`"|
 (?# single-quoted strings)
 '[^'``]*(``.[^'``]*)*'
)
"@

$commentRegex = "#[^\r\n]*"

[regex]::Matches( $code, 
                  "(?<before>(.|\n)*?)" + 
                  "((?<variable>$variableRegex)|" + 
                  "(?<string>$stringRegex)|" + 
                  "(?<comment>$commentRegex))",
                  $RegexOptions::MultiLine ) | Foreach-Object {
    # highlight everything before the variable, string, or comment    
    $highlightedCode += Highlight-Other $_.Groups[ "before" ].Value

    if ( $_.Groups[ "variable" ].Value )
    {
        $highlightedCode += 
            "<span style='color: $variableColor'>" + 
            ( $_.Groups[ 'variable' ].Value | Html-Encode ) + 
            "</span>"
    }
    elseif ( $_.Groups[ "string" ].Value )
    {
        $string = $_.Groups[ 'string' ].Value | Html-Encode
        
        $string = "<span style='color: $stringColor'>$string</span>"

        # we have to highlight each piece of multi-line strings
        if ( $string -match "\r\n" )
        {
            # highlight any line continuation characters as operators
            $string = $string -replace 
                "(``)(?=\r\n)", "<span style='color: $operatorColor'>``</span>"

            $string = $string -replace 
                "\r\n", "</span>`r`n<span style='color: $stringColor'>"
        }

        $highlightedCode += $string
    }
    else
    {
        $highlightedCode += 
            "<span style='color: $commentColor'>" + 
            $( $_.Groups[ 'comment' ].Value | Html-Encode ) + 
            "</span>"
    }

    # we need to keep track of the last position of a variable, string, 
    # or comment, so we can highlight everything after it
    $lastMatch = $_
}

if ( $lastMatch )
{
    # highlight everything after the last variable, string, or comment   
    $highlightedCode += Highlight-Other $code.SubString( $lastMatch.Index + $lastMatch.Length )
}
else
{
    $highlightedCode = Highlight-Other $code
}

# add line breaks
$highlightedCode = 
    [regex]::Replace( $highlightedCode, '(?=\r\n)', '<br />', $RegexOptions::MultiLine )

# put the highlighted code in the pipeline
"<div style='width: 100%; " + 
            "/*height: 100%;*/ " +
            "overflow: auto; " +
            "font-family: Consolas, `"Courier New`", Courier, mono; " +
            "font-size: 12px; " +
            "background-color: $backgroundColor; " +
            "color: $foregroundColor; " + 
            "padding: 2px 2px 2px 2px; white-space: nowrap'>"

if ( $LineNumbers )
{
    $digitCount = 
        ( [regex]::Matches( $highlightedCode, "^", $RegexOptions::MultiLine ) ).Count.ToString().Length

    $highlightedCode = [regex]::Replace( $highlightedCode, "^", 
        "<li style='color: $lineNumberColor; padding-left: 5px'><span style='color: $foregroundColor'>",
        $RegexOptions::MultiLine )

    $highlightedCode = [regex]::Replace( $highlightedCode, "<br />", "</span><br />",
        $RegexOptions::MultiLine )
    
    "<ol start='1' style='border-left: " +
                         "solid 1px $lineNumberColor; " +
                         "margin-left: $( ( $digitCount * 10 ) + 15 )px; " +
                         "padding: 0px;'>"
}

$highlightedCode

if ( $LineNumbers )
{
    "</ol>"
}

"</div>"


As you might have guessed, most of the work with this script was getting the regular expressions right. I have always loved the support for regular expressions offered by the .NET Framework, and PowerShell makes them even easier to use. It turns out that I was able to reuse the expressions in a grammar file for my new favorite text editor, Intype. I like that my code examples look absolutely identical to what I see in my editor.
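
The central trick in the script is the combined expression: a lazy "before" group followed by an alternation of the variable, string, and comment patterns, so whichever of the three starts first claims its text and nothing inside it gets examined again. Here is a stripped-down sketch of that idea. The three patterns are deliberately simplified stand-ins (no here-strings, no backtick escapes, no braced variables), and the sample line is made up.

```powershell
# Simplified stand-ins for the real patterns used in the script.
$variablePattern = '\$\w+'
$stringPattern   = '"[^"]*"|''[^'']*'''
$commentPattern  = '#[^\r\n]*'

# Same shape as the script: lazy "before" text, then exactly one of the three.
$pattern = "(?<before>(.|\n)*?)" +
           "((?<variable>$variablePattern)|" +
           "(?<string>$stringPattern)|" +
           "(?<comment>$commentPattern))"

$sample = '$name = "a # b" # trailing $comment'

$pieces = [regex]::Matches( $sample, $pattern ) | ForEach-Object {
    foreach ( $kind in "variable", "string", "comment" )
    {
        # only one of the three named groups succeeds per match
        if ( $_.Groups[ $kind ].Success )
        {
            "{0}: {1}" -f $kind, $_.Groups[ $kind ].Value
        }
    }
}

$pieces
```

With this input the sketch yields three pieces: the variable $name, the string "a # b", and the comment starting at the second "#". The "#" inside the string and the $comment inside the comment are never seen on their own, which is exactly the mutual exclusion the real script depends on.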

The script obviously relies heavily on these regular expressions, which can contribute to a higher potential for problems, but it seems to do a pretty good job. With all of the matching and string processing, the script can also be fairly slow.

Then along came the CTP for Windows PowerShell 2.0. One of the new classes available to developers is the System.Management.Automation.PsParser class, which can be used to tokenize PowerShell code. As you might imagine, a task like highlighting syntax becomes much easier.

Below is an equivalent highlighting script that makes use of the System.Management.Automation.PsParser class. It is used in the same way as the PowerShell version 1.0 script.

#requires -version 2.0

# Highlight-Syntax.ps1
# version 2.0
# by Jeff Hillman
#
# this script uses the System.Management.Automation.PsParser class
# to highlight PowerShell syntax with HTML.

param( [string] $code, [switch] $LineNumbers )

if ( Test-Path $code -ErrorAction SilentlyContinue )
{
    $code = Get-Content $code | Out-String
}

$backgroundColor = "#DDDDDD"
$foregroundColor = "#000000"
$lineNumberColor = "#404040"

$PSTokenType = [System.Management.Automation.PSTokenType]

$colorHash = @{ 
#    $PSTokenType::Unknown            = $foregroundColor; 
    $PSTokenType::Command            = "#C86400";
#    $PSTokenType::CommandParameter   = $foregroundColor;
#    $PSTokenType::CommandArgument    = $foregroundColor;
    $PSTokenType::Number             = "#800000";
    $PSTokenType::String             = "#800000";
    $PSTokenType::Variable           = "#000080";
#    $PSTokenType::Member             = $foregroundColor;
#    $PSTokenType::LoopLabel          = $foregroundColor;
#    $PSTokenType::Attribute          = $foregroundColor;
    $PSTokenType::Type               = "#404040";
    $PSTokenType::Operator           = "#C86400";
#    $PSTokenType::GroupStart         = $foregroundColor;
#    $PSTokenType::GroupEnd           = $foregroundColor;
    $PSTokenType::Keyword            = "#C86400";
    $PSTokenType::Comment            = "#008000";
    $PSTokenType::StatementSeparator = "#C86400";
#    $PSTokenType::NewLine            = $foregroundColor;
    $PSTokenType::LineContinuation   = "#C86400";
#    $PSTokenType::Position           = $foregroundColor;
    
}

filter Html-Encode
{
    $_ = $_ -replace "&", "&amp;"
    $_ = $_ -replace " ", "&nbsp;"
    $_ = $_ -replace "<", "&lt;"
    $_ = $_ -replace ">", "&gt;"

    $_
}

# replace the tabs with spaces
$code = $code -replace "\t", ( " " * 4 )

if ( $LineNumbers )
{
    $highlightedCode = "<li style='color: $lineNumberColor; padding-left: 5px'>"
}
else
{
    $highlightedCode = ""
}

$parser = [System.Management.Automation.PsParser]
$lastColumn = 1
$lineCount = 1

foreach ( $token in $parser::Tokenize( $code, [ref] $null ) | Sort-Object StartLine, StartColumn )
{
    # get the color based on the type of the token
    $color = $colorHash[ $token.Type ]
    
    if ( $color -eq $null ) 
    { 
        $color = $foregroundColor
    }

    # add whitespace
    if ( $lastColumn -lt $token.StartColumn )
    {
        $highlightedCode += ( "&nbsp;" * ( $token.StartColumn - $lastColumn ) )
    }

    switch ( $token.Type )
    {
        $PSTokenType::String {
            $string = "<span style='color: {0}'>{1}</span>" -f $color, 
                ( $code.SubString( $token.Start, $token.Length ) | Html-Encode )

            # we have to highlight each piece of multi-line strings
            if ( $string -match "\r\n" )
            {
                # highlight any line continuation characters as operators
                $string = $string -replace "(``)(?=\r\n)", 
                    ( "<span style='color: {0}'>``</span>" -f $colorHash[ $PSTokenType::Operator ] )

                $stringHtml = "</span><br />`r`n"
                
                if ( $LineNumbers )
                {
                     $stringHtml += "<li style='color: $lineNumberColor; padding-left: 5px'>"
                }

                $stringHtml += "<span style='color: $color'>"

                $string = $string -replace "\r\n", $stringHtml
            }

            $highlightedCode += $string
            break
        }

        $PSTokenType::NewLine {
            $highlightedCode += "<br />`r`n"
            
            if ( $LineNumbers )
            {
                $highlightedCode += "<li style='color: $lineNumberColor; padding-left: 5px'>"
            }
            
            $lastColumn = 1
            ++$lineCount
            break
        }

        default {
            if ( $token.Type -eq $PSTokenType::LineContinuation )
            {
                $lastColumn = 1
                ++$lineCount
            }

            $highlightedCode += "<span style='color: {0}'>{1}</span>" -f $color, 
                ( $code.SubString( $token.Start, $token.Length ) | Html-Encode )
        }
    }

    $lastColumn = $token.EndColumn
}

# put the highlighted code in the pipeline
"<div style='width: 100%; " + 
            "/*height: 100%;*/ " +
            "overflow: auto; " +
            "font-family: Consolas, `"Courier New`", Courier, mono; " +
            "font-size: 12px; " +
            "background-color: $backgroundColor; " +
            "color: $foregroundColor; " + 
            "padding: 2px 2px 2px 2px; white-space: nowrap'>"

if ( $LineNumbers )
{
    $digitCount =  $lineCount.ToString().Length

    "<ol start='1' style='border-left: " +
                         "solid 1px $lineNumberColor; " +
                         "margin-left: $( ( $digitCount * 10 ) + 15 )px; " +
                         "padding: 0px;'>"
}

$highlightedCode

if ( $LineNumbers )
{
    "</ol>"
}

"</div>"


Besides being much faster, the PsParser technique provides much more potential for customization. This script highlights the same types of things as the 1.0 version of the script, but other token types are available, including CommandParameter, CommandArgument (these two types would be very difficult to define with a regular expression), and Member. All of the token types are listed in the script; those that I ignore are commented out.
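
To get a feel for what the tokenizer hands back, here is a minimal sketch that lists the type and text of each token in a one-liner. The sample command is made up, and I am only using the Type and Content properties of each token here.

```powershell
# Tokenize a made-up one-liner and list each token's type and text.
$sample = 'Get-ChildItem -Path C:\Temp | Sort-Object Length'

# Tokenize reports parse errors through its second argument.
$errors = $null
$tokens = [System.Management.Automation.PSParser]::Tokenize( $sample, [ref] $errors )

$tokens | ForEach-Object { "{0,-16} {1}" -f $_.Type, $_.Content }
```

The parameter "-Path" and the arguments that follow the commands come back as their own token types, so giving them a color is just a matter of uncommenting the matching entries in $colorHash.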

As an extra bonus, here is a little script that highlights PowerShell commands in the console:

# Highlight-Commands.ps1
# by Jeff Hillman
#
# this script highlights PowerShell commands with HTML.

param( [string] $commands )

$backgroundColor = "#000000"
$foregroundColor = "#FFC400"

filter Html-Encode( [switch] $Regex )
{
    $_ = $_ -replace "&", "&amp;"
    $_ = $_ -replace "\t", "    "
    $_ = $_ -replace " ", "&nbsp;"
    $_ = $_ -replace "<", "&lt;"
    $_ = $_ -replace ">", "&gt;"
    
    $_
}

# add line breaks
$highlightedCommands = $commands | Html-Encode

$highlightedCommands = [regex]::Replace( $highlightedCommands, "^", 
    "<span style='font-weight: bold;'>",
    [System.Text.RegularExpressions.RegexOptions]::MultiLine )

$highlightedCommands = [regex]::Replace( $highlightedCommands, "(?=\r\n)", "</span><br />",
    [System.Text.RegularExpressions.RegexOptions]::MultiLine )


# put the highlighted commands in the pipeline
"<div style='width: 100%; " + 
            "/*height: 100%;*/ " +
            "overflow: auto; " +
            "font-family: `"Courier New`", Courier, mono; " +
            "font-size: 12px; " +
            "background-color: $backgroundColor; " +
            "color: $foregroundColor; " + 
            "padding: 2px 2px 2px 2px; white-space: nowrap'>"

$highlightedCommands

"</div>"

C:\Users\hillman\Documents\WindowsPowerShell\Utilities

PSH$ ls


    Directory: Microsoft.PowerShell.Core\FileSystem::C:\Users\hillman\Documents\WindowsPowerShell\Utilities


Mode                LastWriteTime     Length Name
----                -------------     ------ ----
-a---         07-Nov-07   4:05 PM      38117 Compile-Help.ps1
-a---         10-Nov-07   2:53 PM       8047 Highlight-1.0Syntax.ps1
-a---         10-Nov-07   3:09 PM       5182 Highlight-2.0Syntax.ps1
-a---         10-Nov-07   3:27 PM       1296 Highlight-Commands.ps1
-a---         09-Nov-07   2:49 PM      14741 Utilities.ps1


Well, I hope these scripts come in handy for someone else out there.

Thursday, November 1, 2007

PowerShell and Subversion

I try to use PowerShell for everything I can. Because we use Subversion for source control at the shop where I work, I have written a few PowerShell functions to make life a little easier when using Subversion at the command line.

Some might argue that most of this stuff can be done for you by TortoiseSVN or something similar. That may be true, but where's the fun in that? I was using TortoiseSVN when I first started playing around with PowerShell, but I found that using it forced me to keep a Windows Explorer window open a lot, which kept me away from the command line. I wanted to force myself to use PowerShell for as much as possible, so I uninstalled TortoiseSVN and I've never looked back.

The first function is called Get-SvnStatus. It uses the Subversion "status" command with the "--xml" switch and displays the status of versioned files and directories a little more nicely. More importantly, Status and Path are properties on the objects output by this function. This means they can be used farther down the pipeline.

function Get-SvnStatus( [string] $filter = "^(?!unversioned)", [switch] $NoFormat )
{
    # powershell chokes on "wc-status" and doesn't like two definitions of "item"
    [xml]$status = ( ( svn status --xml ) -replace "wc-status", "svnstatus" ) `
        -replace "item=", "itemstatus="
    
    $statusObjects = $status.status.target.entry | Where-Object { 
        $_.svnstatus.itemstatus -match $filter 
    } | Foreach-Object {
        $_ | Select-Object @{ Name = "Status"; Expression = { $_.svnstatus.itemstatus } }, 
                           @{ Name = "Path";   Expression = { Resolve-Path $_.path } }
    } | Sort-Object Status
    
    if ( $NoFormat )
    {
        $statusObjects
    }
    else
    {
        $statusObjects | Format-Table -Auto
    }
}


D:\Subversion\projects

PSH$ Get-SvnStatus

Status   Path
------   ----
modified D:\Subversion\projects\KickButtApp\KickButtApp.cpp
modified D:\Subversion\projects\KickButtApp\KickButtApp.h


The filter can be any regular expression to match against the status of the item. The default filter doesn't allow unversioned files through. The "NoFormat" switch is there in case the Status or Path properties of the objects created need to be used down the pipeline.
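
To show the filtering without needing a working copy, here is a small sketch that runs the same renaming and filtering steps over a hand-written sample in the shape of "svn status --xml" output. The sample XML, its paths and statuses, and the Select-SampleStatus helper are all made up for illustration, and the sketch skips Resolve-Path because the files do not exist.

```powershell
# Hand-written sample in the shape of "svn status --xml" output;
# the paths and statuses are invented for illustration.
$sampleXml = @'
<status>
  <target path=".">
    <entry path="KickButtApp.cpp"><wc-status props="none" item="modified" revision="12" /></entry>
    <entry path="notes.txt"><wc-status props="none" item="unversioned" /></entry>
    <entry path="KickButtApp.h"><wc-status props="none" item="conflicted" revision="12" /></entry>
  </target>
</status>
'@

# Hypothetical helper: the same steps as Get-SvnStatus, minus svn and Resolve-Path.
function Select-SampleStatus( [string] $filter = "^(?!unversioned)" )
{
    # same renaming trick as Get-SvnStatus
    [xml] $status = ( $sampleXml -replace "wc-status", "svnstatus" ) `
        -replace "item=", "itemstatus="

    $status.status.target.entry | Where-Object {
        $_.svnstatus.itemstatus -match $filter
    } | ForEach-Object {
        $_ | Select-Object @{ Name = "Status"; Expression = { $_.svnstatus.itemstatus } },
                           @{ Name = "Path";   Expression = { $_.path } }
    } | Sort-Object Status
}

# default filter: everything except unversioned items
$default = Select-SampleStatus

# a specific status, with the Path property used down the pipeline
$conflicted = Select-SampleStatus "conflicted" | ForEach-Object { $_.Path }
```

With the default filter the unversioned notes.txt is dropped and the other two entries come back sorted by status. Passing "conflicted" leaves only KickButtApp.h, whose Path is then available to whatever comes next in the pipeline.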

The next function is Compare-SvnRevision. It uses Subversion's "cat" command to get a copy of a file at a specified revision to compare with your current working copy. The default value for the revision is "HEAD", which will get the latest version in the repository.

function Compare-SvnRevision( [string] $path, [string] $revision = "HEAD" )
{
    $url = Get-SvnUrl $path

    $fileInfo = New-Object System.IO.FileInfo $path

    svn cat -r $revision $url > "TEMP - $($fileInfo.Name)"

    WinMerge $path "TEMP - $($fileInfo.Name)"

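    # Get-Process writes an error when no matching process exists yet,
    # so suppress it and keep polling until WinMerge shows up
    $winMerge = Get-Process WinMerge -ErrorAction SilentlyContinue

    while ( $winMerge -eq $null )
    {
        $winMerge = Get-Process WinMerge -ErrorAction SilentlyContinue
    }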

    $winMerge.WaitForExit()

    Remove-Item "TEMP - $($fileInfo.Name)"
}


This function uses WinMerge, my favorite two-way merge tool, to perform the comparison. It also assumes WinMerge is in $env:Path.

The next function, Resolve-SvnConflicts, uses Get-SvnStatus to get all the files in a "conflicted" state after an update, commit, or merge. It then uses DiffMerge to do a three-way merge of the base revision, your working copy, and the head revision. You are prompted to indicate if you were able to resolve conflicts, and if you have, the "resolved" command is performed on the file. This function assumes DiffMerge is in $env:Path.

function Resolve-SvnConflicts
{
    Get-SvnStatus "conflicted" -NoFormat | Foreach-Object { 
        $file = ( Resolve-Path $_.Path )

        Write-Output "Merging $( $file )..."

        $baseRevision, $headRevision = ( Get-ChildItem "$file.r*" | Sort-Object )

        DiffMerge /t1 "Base Revision" /t2 "Working Copy" /t3 "Head Revision" `
            $baseRevision, "$file.mine", $headRevision

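        # Get-Process writes an error when no matching process exists yet,
        # so suppress it and keep polling until DiffMerge shows up
        $diffMerge = Get-Process DiffMerge -ErrorAction SilentlyContinue

        while ( $diffMerge -eq $null )
        {
            $diffMerge = Get-Process DiffMerge -ErrorAction SilentlyContinue
        }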

        $diffMerge.WaitForExit()

        Write-Output "Conflicts resolved? [yes, no]"

        $resolved = Read-Host

        if ( $resolved -imatch "^y(es)?$" )
        {
            Copy-Item "$file.mine" $file -Force
            svn resolved $file
        }
    }
}


These next two functions just use the Subversion "info" command with the "--xml" switch to get the URL or revision for a versioned file. They both have a switch parameter to indicate if you want the result to be put on the clipboard. To put these items on the clipboard, I use a Cmdlet I wrote myself, but the PowerShell Community Extensions have a Cmdlet with the same name that will do the same thing and, apparently, more.

function Get-SvnUrl( [string] $path = ".", [switch] $Clipboard )
{
    $url = ( [xml]( svn info --xml $path ) ).info.entry.url

    if ( $Clipboard )
    {
        Set-Clipboard $url  
    }

    $url
}

function Get-SvnRevision( [string] $path = ".", [switch] $Clipboard )
{
    $revision = ( [xml]( svn info --xml $path ) ).info.entry.revision

    if ( $Clipboard )
    {
        Set-Clipboard $revision  
    }

    $revision
}
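
For anyone curious what those two property lookups are reading, here is a small sketch against a hand-written sample in the shape of "svn info --xml" output. The URL and revision number are made up for illustration.

```powershell
# Hand-written sample in the shape of "svn info --xml" output;
# the URL and revision are invented for illustration.
$sampleInfo = @'
<info>
  <entry kind="dir" path="." revision="42">
    <url>http://svn.example.com/repos/projects/trunk</url>
  </entry>
</info>
'@

$entry = ( [xml] $sampleInfo ).info.entry

$entry.url       # child element, read as a string
$entry.revision  # attribute, read as a string
```

PowerShell's XML adapter exposes both child elements and attributes as properties, which is why the URL and the revision can be read with the same dotted syntax.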


Well, there you have it. I use these functions every day, so I hope sharing them will make someone else's life a little easier.