Contrib: GtkSourceView Haskell syntax highlighting file



All,

Attached is an initial version of haskell.lang, a GtkSourceView syntax
highlighting file for the programming language Haskell. My aim is
eventually to have this distributed with gtksourceview.

GNOME already has icons for .hs and .lhs files and the corresponding MIME
types, text/x-haskell and text/x-literate-haskell. (The attached
haskell.lang covers only text/x-haskell.)

I'm having a bit of trouble getting things to look just right, however.
I was hoping someone might have some advice, or could point me to an
expert or to some documentation.

1. Character constants interfere with variable names that end in a
single quote character, e.g.:

let foo' = bar

Haskell allows a variable name to end in any number of single quote
characters; foo' is read as "foo prime". I tried adding a variable regexp
that matched trailing quote characters, but that didn't work because
<string>s get priority over <pattern-item>s. So I changed the start regex
of character constants so that the opening quote cannot directly follow
a letter or digit:
<start-regex>[^A-Za-z0-9]&apos;</start-regex>

This seems to do what I want.
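
As a rough check of that start regex outside GtkSourceView, here is a
small sketch using Python's re module. Python is only my choice for
illustration, the helper name is made up, and GtkSourceView's own regex
engine may behave differently. It reports the position of the quote that
would open a character constant, if any:

```python
import re

# Same pattern as the <start-regex> above, with &apos; written as a plain quote.
CHAR_START = re.compile(r"[^A-Za-z0-9]'")

def starts_char_constant(line):
    """Return the index of the quote that would open a character
    constant on this line, or None if the pattern never matches."""
    m = CHAR_START.search(line)
    return None if m is None else m.end() - 1

# Hypothetical sample lines, chosen to probe the edge cases.
samples = ["let foo' = bar", "let c = 'a'", "let foo'' = bar", "'a' : rest"]
for s in samples:
    print(repr(s), starts_char_constant(s))
```

On these samples the pattern leaves foo' alone and finds 'a' after a
space. It also fires on the second quote of foo'' (the first quote is not
a letter or digit), and it misses a character constant at the very start
of a line, since the pattern needs a preceding character. Both cases may
need further work.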

2. A trickier problem concerns reserved operators. Haskell allows
user-defined operators made up of strings of operator
characters:  $% #/*-+&|^!?><=
However, a number of reserved operators are used for special syntactic
constructs such as type signatures and definitions, e.g.:

foo :: Bar m => a -> m b
foo x | x == 3 = do
                   x' <- x
      | x == 4 = someThingElse

These operators are reserved:  => -> <- = | ::

So how do I make the lexer distinguish these? I tried using a
<keyword-list> section for the reserved operators but the <pattern-item>
section seems to get higher priority. It's very hard to construct two
<pattern-item> regexps that just recognise the reserved and unreserved
operators without overlap (since ":", "==", and "||" are valid
operators). If I do the obvious thing:

<pattern-item _name = "Keysymbols" style = "Keyword">
	<regex>(::|-&gt;|&lt;-|=&gt;|=|;|~|\||\[\||\|\])</regex>
</pattern-item>

<pattern-item _name = "Operators" style = "Operators">
	<regex>[:*$+&gt;&lt;=~.!?/# %&amp;|\^\-]+</regex>
</pattern-item>

Then sometimes the reserved operators are coloured as keysymbols and
sometimes as operators. There doesn't seem to be a priority of one over
the other.
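
To make the rule I am after concrete, here is a minimal Python sketch
(illustration only; I do not know whether the .lang format can express
it, and the function name is made up). It takes each maximal run of
operator characters and treats it as a keysymbol only when the whole run
is one of the reserved operators. I left the space out of the character
class here: with the space included, a run such as " = " matches as a
single operator starting before the "=", which may be part of what I am
seeing.

```python
import re

# Operator characters from the list above, without the space.
OP_CHARS = r":*$+><=~.!?/#%&|^\-"
OP_RUN = re.compile("[" + OP_CHARS + "]+")

# The reserved operators listed earlier in this message.
RESERVED = {"::", "->", "<-", "=>", "=", "|"}

def classify_operators(line):
    """Return (text, kind) for each maximal run of operator characters.
    A run is a Keysymbol only if the entire run is a reserved operator."""
    return [
        (m.group(), "Keysymbol" if m.group() in RESERVED else "Operator")
        for m in OP_RUN.finditer(line)
    ]

print(classify_operators("foo :: Bar m => a -> m b"))
print(classify_operators("foo x | x == 3 = do"))
```

With this rule "==" is an ordinary operator while the "=" and "|" around
it stay keysymbols, because the decision is made on the whole run rather
than on its first matching prefix.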

Any suggestions?

Duncan
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE language SYSTEM "language.dtd">
<language _name="Haskell" version="1.0" _section="Sources" mimetypes="text/x-haskell">

	<escape-char>\</escape-char>

	<line-comment _name = "Line Comment" style= "Comment">
		<start-regex>--</start-regex>
	</line-comment>

	<block-comment _name = "Block Comment" style = "Comment">
		<start-regex>\{-</start-regex>
		<end-regex>-\}</end-regex>
	</block-comment>

	<syntax-item _name = "Include/Pragma" style = "Preprocessor">
		<start-regex>\{-#</start-regex>
		<end-regex>#-\}</end-regex>
	</syntax-item>

	<pattern-item _name = "Type or Constructor" style = "Data Type">
		<regex>\b[A-Z][0-9a-zA-Z.]*(&apos;|#)*</regex>
	</pattern-item>

	<pattern-item _name = "Keysymbols" style = "Keyword">
		<regex>(::|-&gt;|&lt;-|=&gt;|=|\|)</regex>
	</pattern-item>

	<pattern-item _name = "Operators" style = "Others">
		<regex>[:*$+&gt;&lt;=~.!?/# %&amp;|\^\-]+</regex>
	</pattern-item>
<!--
	<keyword-list _name="Keysymbols" style = "Keyword">
		<keyword>::</keyword>
		<keyword>-&gt;</keyword>
		<keyword>&lt;-</keyword>
		<keyword>=&gt;</keyword>
		<keyword>=</keyword>
		<keyword>|</keyword>
	</keyword-list>
-->
	<string _name = "String" style = "String" end-at-line-end = "true">
		<start-regex>&quot;</start-regex>
		<end-regex>&quot;</end-regex>
	</string>

	<string _name = "Character Constant" style = "String" end-at-line-end = "true">
		<start-regex>[^A-Za-z0-9]&apos;</start-regex>
		<end-regex>&apos;</end-regex>
	</string>

	<pattern-item _name = "Decimal" style = "Decimal">
		<regex>\b[0-9]+\b</regex>
	</pattern-item>

	<pattern-item _name = "Floating Point Number" style = "Floating Point">
		<regex>\b([0-9]+[Ee][-+]?[0-9]+|([0-9]*\.[0-9]+|[0-9]+\.)([Ee][-+]?[0-9]+)?)</regex>
	</pattern-item>

	<pattern-item _name = "Hex Number" style = "Base-N Integer">
		<regex>\b0[xX][0-9a-fA-F]+\b</regex>
	</pattern-item>

	<keyword-list _name = "Keywords" style = "Keyword" case-sensitive="true">
		<keyword>type</keyword>
		<keyword>data</keyword>
		<keyword>let</keyword>
		<keyword>in</keyword>
		<keyword>case</keyword>
		<keyword>of</keyword>
		<keyword>module</keyword>
		<keyword>class</keyword>
		<keyword>where</keyword>
		<keyword>instance</keyword>
		<keyword>import</keyword>
		<keyword>qualified</keyword>
		<keyword>as</keyword>
		<keyword>do</keyword>
		<keyword>deriving</keyword>
		<keyword>if</keyword>
		<keyword>then</keyword>
		<keyword>else</keyword>
		<keyword>newtype</keyword>
		<keyword>hiding</keyword>
		<keyword>infix</keyword>
		<keyword>infixl</keyword>
		<keyword>infixr</keyword>
		<keyword>with</keyword>
		<keyword>forall</keyword>
	</keyword-list>

	<keyword-list _name = "Preprocessor Definitions" style = "Preprocessor" case-sensitive="true"
		match-empty-string-at-beginning = "false"
		match-empty-string-at-end = "true"
		beginning-regex = "^[ \t]*#[ \t]*">
		<keyword>if</keyword>
		<keyword>ifdef</keyword>
		<keyword>ifndef</keyword>
		<keyword>else</keyword>
		<keyword>elif</keyword>
		<keyword>define</keyword>
		<keyword>endif</keyword>
		<keyword>undef</keyword>
		<keyword>error</keyword>
	</keyword-list>

</language>

