Making Linq to XML Usable with F#

Type providers are often the preferred mechanism for dealing with textual data in F#, but Linq to XML is still a very nice API when you need things to be a bit more dynamic or don't want to pull in a type provider package. However, due to its reliance on implicit conversions, it can be somewhat awkward to work with in F#.

For instance, consider the following XML list of contacts:

<contacts>
    <contact first="John" last="Smith"/>
    <contact first="Susan" last="Jones"/>
</contacts>

In C#, we could parse it and print out the result thusly:

var doc = XDocument.Parse(xml);
foreach (var contact in doc.Element("contacts").Elements("contact"))
{
    Console.WriteLine("First: {0}, Last: {1}",
        contact.Attribute("first").Value,
        contact.Attribute("last").Value);
}

A direct port to F#, however, does not compile:

let doc = XDocument.Parse(xml)
for contact in doc.Element("contacts").Elements("contact") do
    printfn "First: %s, Last: %s" 
            (contact.Attribute("first").Value) 
            (contact.Attribute("last").Value)

The reason it doesn't compile is that the XLinq methods we're calling don't actually take strings; they take an XName. The XName type represents a qualified XML name, i.e. the local name and (optionally) the namespace. Rather than provide two overloads for every method, one taking a string and one taking an XName, the XLinq classes provide an implicit conversion from string to XName. Unfortunately, this doesn't work in F# because the language doesn't apply custom conversions (implicit or explicit) automatically.
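To make the role of XName a little more concrete, here is a minimal sketch of how names with and without a namespace are built. The namespace URI and the variable names are invented for illustration. It also shows that the conversion C# applies silently is an ordinary static method that can be called by hand:

```fsharp
open System.Xml.Linq

// A name with no namespace: only the local part is set.
let plain = XName.Get "contact"

// A qualified name: local part plus a (hypothetical) namespace URI.
let qualified = XName.Get("contact", "http://example.com/contacts")

printfn "%s | %s" plain.LocalName plain.NamespaceName
printfn "%s | %s" qualified.LocalName qualified.NamespaceName

// The string-to-XName conversion is exposed as a static method,
// so it can be invoked explicitly even when it is not applied for us.
let converted : XName = XName.op_Implicit "contact"
printfn "%b" (converted = plain)
```

Calling XName.op_Implicit is no shorter than calling XName.Get, so it does not really solve the verbosity problem; it only illustrates where the C# convenience comes from.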

Thus, to get the F# version to work, we need to modify it as follows:

let doc = XDocument.Parse(xml)
for contact in doc.Element(XName.Get "contacts").Elements(XName.Get "contact") do
    printfn "First: %s, Last: %s" 
            (contact.Attribute(XName.Get "first").Value) 
            (contact.Attribute(XName.Get "last").Value)

Typing XName.Get everywhere is a bit tedious and something which I would prefer to avoid. Fortunately, this is easy to rectify with a type extension. Type extensions allow us to define extension methods on a type like in C# and VB.NET but also support extension properties (and apparently extension events, since events are just properties in F#). Of course, these special extension members can only be consumed from F#, but that is all we care about for this purpose.

Because I have wanted to use XLinq in several projects, I made a little module that you can import into any F# project where you want to use it. I started by providing equivalent string overloads for each method taking an XName in the XObject hierarchy:

[<AutoOpen>]
module System.Xml.Linq.FSharpExtensions

open System
open System.Runtime.CompilerServices
open System.Xml.Linq

type XNode with
    member this.Ancestors(name) = this.Ancestors(XName.Get name)
    member this.ElementsAfterSelf(name) = this.ElementsAfterSelf(XName.Get name)
    member this.ElementsBeforeSelf(name) = this.ElementsBeforeSelf(XName.Get name)    

type XContainer with
    member this.Descendants(name) = this.Descendants(XName.Get name)
    member this.Element(name) = this.Element(XName.Get name)
    member this.Elements(name) = this.Elements(XName.Get name)

type XElement with
    member this.AncestorsAndSelf(name) = this.AncestorsAndSelf(XName.Get name)
    member this.Attribute(name) = this.Attribute(XName.Get name)
    member this.Attributes(name) = this.Attributes(XName.Get name)
    member this.DescendantsAndSelf(name) = this.DescendantsAndSelf(XName.Get name)
    member this.SetAttributeValue(name, value) = this.SetAttributeValue(XName.Get name, value)
    member this.SetElementValue(name, value) = this.SetElementValue(XName.Get name, value)

Notice that this module has the AutoOpen attribute and is placed directly in the System.Xml.Linq namespace, so that any time that namespace is opened these methods are automatically available.

After adding the module, my first F# snippet compiles and runs, but there is still something missing. XLinq also provides extension methods to various closed types of IEnumerable<_> so that you can query over collections of XML objects. For instance, suppose we just wanted to retrieve all the last names from our document. We could do the following:

doc.Root.Elements().Attributes(XName.Get "last") 
|> Seq.map (fun a -> a.Value)

This depends on an Attributes() extension method being defined on IEnumerable<XElement>, but we would like to provide our own method that takes a string rather than an XName. However, F# does not (yet?) allow us to define type extensions for closed generic types. Fortunately, there is a workaround.

In addition to supporting type extensions, F# supports standard extension methods like you have in C# and VB.NET. This is mainly a compatibility feature, and there is no special syntax for defining these in F#. However, if we want to define them we need only provide the Extension attributes that C# and VB.NET add to indicate extension methods. Thus, we expand our module as follows:

[<Extension>]
type XLinqSeqExtensions =
    [<Extension>] static member Ancestors(source:seq<XNode>, name) = source.Ancestors(XName.Get name)
    [<Extension>] static member AncestorsAndSelf(source:seq<XElement>, name) = source.AncestorsAndSelf(XName.Get name)
    [<Extension>] static member Attributes(source:seq<XElement>, name) = source.Attributes(XName.Get name)
    [<Extension>] static member Descendants(source:seq<XContainer>, name) = source.Descendants(XName.Get name)
    [<Extension>] static member DescendantsAndSelf(source:seq<XElement>, name) = source.DescendantsAndSelf(XName.Get name)
    [<Extension>] static member Elements(source:seq<XContainer>, name) = source.Elements(XName.Get name)

Now the code snippet given above will work without the call to XName.Get.

So now that we can read XML much more easily, what about writing? For example, what if we wanted to produce the above XML document using XLinq? Without making any changes, we could do the following:

XElement(XName.Get "contacts",
    XElement(XName.Get "contact",
        XAttribute(XName.Get "first", "John"),
        XAttribute(XName.Get "last", "Smith")),
    XElement(XName.Get "contact",
        XAttribute(XName.Get "first", "Susan"),
        XAttribute(XName.Get "last", "Jones")))

Getting rid of the XName.Get calls in this case is more difficult, because we would need to define an extension constructor. I tried the following:

type XElement with
    new(name, [<ParamArray>] content) = XElement(XName.Get name, content)

type XAttribute with
    new(name, value) = XAttribute(XName.Get name, value)

This doesn't actually compile, but it parses. The compiler understands what we're trying to do, but won't allow such an extension on a type not defined in the same file. It gives the error FS0871: Constructors cannot be defined for this type.

But there is still hope! What if we just defined functions named XElement and XAttribute? It's possible to have a type and function or value with the same name; for example you have the type string (actually an alias for System.String) and the function string that converts a value to that type. This is permissible in F# because type names and function or value names are used in different contexts.
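As a tiny illustration of that point, the snippet below uses string both as a type annotation and as a conversion function in the same line (the binding name is arbitrary):

```fsharp
// In type position, `string` names the type (an alias for System.String);
// in expression position, `string` names the conversion function.
let text : string = string 42
printfn "%s" text
```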

This means we can just rewrite our constructors as functions:

let XElement(name, [<ParamArray>] content) = new XElement(XName.Get name, content)
let XAttribute(name, value) = new XAttribute(XName.Get name, value)

Alas, this doesn't quite do what we want. The XAttribute one works fine, but the ParamArray attribute we used with XElement doesn't do anything. IntelliSense actually shows the word params in front of the content argument, but we can't specify multiple arguments; we still have to pass an array. It would be surprising if this did work, since XElement is just a function that takes a tuple, not a member.

So we'll remove the ParamArray attribute, which will force us to call our function with a collection. This means we can also change the argument type to a sequence type, so we're not restricted to passing just arrays:

let XElement(name, content) = new XElement(XName.Get name, Seq.toArray content)
let XAttribute(name, value) = new XAttribute(XName.Get name, value)

Now we can write the following:

XElement("contacts", 
    [ 
        XAttribute("test", "2")
        XElement("contact", 
            [ 
                XAttribute("first", "John")
                XAttribute("last", "Smith")
            ])
        XElement("contact", 
            [ 
                XAttribute("first", "Susan")
                XAttribute("last", "Jones")
            ])
    ])

That's probably more whitespace than I'd like, but I couldn't find a better way to format it (I wish F# were more tolerant of irregular indentation when nesting delimiters like [ and ] are used).

One last thing: functions can't be overloaded. The original constructors for the XElement and XAttribute were overloaded, but we've shadowed them with our new functions. Fortunately there are two syntaxes for calling constructors in F#. Constructors can be called as functions, or they can be prefixed with the new keyword as in a more traditional object-oriented language. If we use the latter syntax, then F# knows that we want to call a constructor rather than a function and the original constructors (that we shadowed) are still available to us.
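A rough sketch of the two call styles side by side follows. The explicit type annotations on the helper and the element names are additions for this illustration, not part of the module described above:

```fsharp
open System.Xml.Linq

// Shadowing helper in the style described above; the annotations are an
// assumption added here to keep overload resolution unambiguous.
let XElement (name: string, content: seq<obj>) =
    new XElement(XName.Get name, Seq.toArray content)

// Without `new`, the name resolves to the helper function.
let viaFunction = XElement("contact", [ box "John Smith" ])

// With `new`, the name resolves to the type, so the original overloaded
// constructors are reachable, here the one taking a single content object.
let viaConstructor = new XElement(XName.Get "note", "plain text content")

printfn "%O" viaFunction
printfn "%O" viaConstructor
```

The cost of this design is that the original constructors need both the new keyword and an explicit XName again, which seems acceptable since the helpers cover the common case.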

So everything worked more or less perfectly. Now you can use Linq to XML from your F# code with less annoyance. The source for this module is available here.

Luke Sandell
