C Syntax highlighting in HTML

Last changed on May 19, 2001

The script highlight.tbz2 (4569 bytes) generates syntax-highlighted HTML output from a given C source file. It is the only tool I know of that can correctly highlight various levels of nested #if [0|1]/#endif combinations.

At the moment it's just a design idea on how to handle syntax highlighting correctly. It works roughly like this:

  • Scan the source code, starting in a given state (normal C code at the beginning).
  • Find the first occurrence of a regular expression that is valid in the current state; e.g. in normal C code, the regular expression '"' starts a normal string.
  • Depending on the current state and the first valid regular expression found, push the current state onto a stack and/or change to another state and/or pop a state from the stack.
  • This way we get chunks of source code that each have a "state", e.g. normal code, string, C-style comment, C++-style comment, ...
  • Depending on the state of a chunk, highlight some of its parts differently, e.g. show (0[xX][0-9a-fA-F]*) as a hexadecimal number when in normal code.

  • This approach should be easily portable to other languages, though it lacks some capabilities, such as showing matching braces or highlighting Perl's regular expressions like s!abc!def!g (the scanner can't remember the character that started a regular expression).
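
The steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the script itself: the state names, the RULES table, and the functions scan and to_html are all hypothetical, and the hexadecimal pattern uses + instead of * so that a bare "0x" is not matched. Preprocessor lines are not treated specially here.

```python
import html
import re

# Hypothetical state names; the original script's names are not known.
CODE, STRING, CHAR = "code", "string", "char"
C_COMMENT, CPP_COMMENT = "c-comment", "cpp-comment"

# Per state: (regex, action, target state).
#   "push": save the current state on the stack and enter the target state
#   "pop":  close the current chunk and return to the saved state
#   "stay": consume the match without changing state (e.g. an escaped quote)
RULES = {
    CODE: [
        (re.compile(r'"'), "push", STRING),
        (re.compile(r"'"), "push", CHAR),
        (re.compile(r"/\*"), "push", C_COMMENT),
        (re.compile(r"//"), "push", CPP_COMMENT),
    ],
    STRING: [(re.compile(r"\\.", re.S), "stay", None),
             (re.compile(r'"'), "pop", None)],
    CHAR: [(re.compile(r"\\.", re.S), "stay", None),
           (re.compile(r"'"), "pop", None)],
    C_COMMENT: [(re.compile(r"\*/"), "pop", None)],
    CPP_COMMENT: [(re.compile(r"\n"), "pop", None)],
}


def scan(src, state=CODE):
    """Split src into (state, text) chunks using a stack of states."""
    stack, chunks = [], []
    start = pos = 0  # start of the current chunk, current search position
    while pos < len(src):
        # Pick the earliest match among the regexes valid in this state.
        best = None
        for regex, action, target in RULES[state]:
            m = regex.search(src, pos)
            if m and (best is None or m.start() < best[0].start()):
                best = (m, action, target)
        if best is None:
            break
        m, action, target = best
        if action == "push":
            # The opening delimiter belongs to the new chunk.
            if m.start() > start:
                chunks.append((state, src[start:m.start()]))
            stack.append(state)
            state, start = target, m.start()
        elif action == "pop":
            # The closing delimiter belongs to the chunk being closed.
            chunks.append((state, src[start:m.end()]))
            state = stack.pop() if stack else CODE
            start = m.end()
        pos = m.end()
    if start < len(src):
        chunks.append((state, src[start:]))
    return chunks


HEX = re.compile(r"\b0[xX][0-9a-fA-F]+\b")


def to_html(chunks):
    """Wrap non-code chunks in a span; mark hex numbers inside code chunks."""
    out = []
    for state, text in chunks:
        text = html.escape(text, quote=False)
        if state == CODE:
            text = HEX.sub(r'<span class="hex">\g<0></span>', text)
        else:
            text = f'<span class="{state}">{text}</span>'
        out.append(text)
    return "<pre>" + "".join(out) + "</pre>"
```

For example, scan('"//"') yields a single string chunk, because '//' is only a valid regular expression in the code state and is therefore ignored while the scanner is inside a string.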

Here is some of the script's output:

    #include <stdlib.h>
    #include "file.h"
    
    #define SOME_MACRO(A)	\
    	qwe(A) + asd(A)
    
    // There's a space at the end of next line
    #define SOME_FALSE_MACRO(A)	\ 
    	qwe(A) + asd(A)
    
    #if 1
    char c[07] = "abc";
    #if 0
    int b;
    #endif
    int c; // another c
    #endif
    
    #if 1
    char c[0x03] = "abc";
    #if 1
    int b;
    #endif
    int c; // another c
    #endif
    
    int d;
    
    #if 1
    int c;
    #else
    #include "qwe"
    #endif
    
    char efg[] = "This is a string";
    void* f = &efg[7];
    
    // And this is a C++ style comment
    
    int main(int argc, char** argv) {
        printf( " Hello, world\n " , getuid('qw'));
        /* Here's a normal good ole C style comment ?*/
    
        return 0;
    }
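
The nested #if 0 / #if 1 blocks in the sample above are the case the script is meant to get right. How the script does this is not described on this page. One possible approach, sketched here in Python under that caveat, is to keep a stack with one entry per open conditional and treat a line as disabled when any enclosing branch is known to be dead. The names DIRECTIVE and disabled_lines are hypothetical. Only the literal conditions 0 and 1 are evaluated; any other condition, and any #elif branch, is simply treated as live.

```python
import re

DIRECTIVE = re.compile(r"^\s*#\s*(if|ifdef|ifndef|else|elif|endif)\b\s*(.*)$")
COMMENT = re.compile(r"//.*|/\*.*?\*/")


def disabled_lines(src):
    """Return one bool per line: True if the line is inside a dead branch."""
    stack = []  # per open conditional: [dead, known]
    flags = []
    for line in src.splitlines():
        m = DIRECTIVE.match(line)
        if not m:
            flags.append(any(dead for dead, _ in stack))
            continue
        kind = m.group(1)
        cond = COMMENT.sub("", m.group(2)).strip()
        if kind in ("if", "ifdef", "ifndef"):
            # The directive line itself belongs to the enclosing region.
            flags.append(any(dead for dead, _ in stack))
            known = kind == "if" and cond in ("0", "1")
            stack.append([known and cond == "0", known])
            continue
        # #else, #elif and #endif also belong to the enclosing region.
        flags.append(any(dead for dead, _ in stack[:-1]))
        if not stack:
            continue  # unbalanced directive: ignore it
        if kind == "endif":
            stack.pop()
        elif kind == "else" and stack[-1][1]:
            stack[-1][0] = not stack[-1][0]  # flip a known 0/1 branch
        elif kind == "elif":
            stack[-1] = [False, False]  # simplification: treat as live
    return flags
```

Applied to the first #if 1 block of the sample, only the line "int b;" inside the inner #if 0 is reported as disabled; applied to the #if 1 / #else block, only the #include "qwe" line is.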