XML Transformations with F#

Recently I needed to come up with a way to define transformations for importing different XML formats into an application. The traditional way to transform one XML format to another is with XSLT. However, XSLT is somewhat awkward and verbose, and few people understand it well. Sometimes it’s better to attack a problem with a full-fledged programming language rather than a specialized tool. So I decided to see if the data processing and scripting powers of F# could be better suited to the task.

The main F# feature that I was interested in (perhaps the killer feature, along with active patterns) was type providers. If you don’t already know, type providers are an extensible mechanism whereby you feed F# a data source, either a URL or an example document, and it generates in-memory, Intellisense-enabled types on the fly so that you can write queries against that source.

For this example, consider the problem of flattening a data set. You have a file that looks like this:

<?xml version="1.0" encoding="utf-8" ?>  
<Customers>  
  <Customer name="ACME">
    <Order Number="A012345">
      <OrderLine Item="widget" Quantity="1"/>
    </Order>
    <Order Number="A012346">
      <OrderLine Item="trinket" Quantity="2"/>
    </Order>
  </Customer>
  <Customer name="Southwind">
    <Order Number="A012347">
      <OrderLine Item="skyhook" Quantity="3"/>
      <OrderLine Item="gizmo" Quantity="4"/>
    </Order>
  </Customer>
</Customers>  

And you want to transform it to look like this:

<?xml version="1.0" encoding="utf-8" ?>  
<OrderLines>  
  <OrderLine Customer="ACME" Order="A012345" Item="widget" Quantity="1"/>
  <OrderLine Customer="ACME" Order="A012346" Item="trinket" Quantity="2"/>
  <OrderLine Customer="Southwind" Order="A012347" Item="skyhook" Quantity="3"/>
  <OrderLine Customer="Southwind" Order="A012347" Item="gizmo" Quantity="4"/>
</OrderLines>  

To use the XML type provider we first need to reference the FSharp.Data assembly, which is available via NuGet. We can then declare our root XML type as follows:

type InputXml = XmlProvider<"input_sample.xml">  

Here input_sample.xml is the example XML file, given above, that the type provider will use to infer the schema and generate the corresponding types. This allows us to then write the following code with type safety and Intellisense:

let input = InputXml.Load("input_sample.xml")

for customer in input.GetCustomers() do  
for order in customer.GetOrders() do  
for line in order.GetOrderLines() do  
    printfn "Customer: %s, Order: %s, Item: %s, Quantity: %d"
            customer.Name order.Number line.Item line.Quantity

That results in the following output:

Customer: ACME, Order: A012345, Item: widget, Quantity: 1  
Customer: ACME, Order: A012346, Item: trinket, Quantity: 2  
Customer: Southwind, Order: A012347, Item: skyhook, Quantity: 3  
Customer: Southwind, Order: A012347, Item: gizmo, Quantity: 4  

Now we just need to figure out how to output XML instead of text. One idea I had was to use the XML type provider again to generate types for the output model and use those to construct and serialize the result. Unfortunately, this doesn’t work. For example, the following snippet won’t compile:

let lines = [  
    for customer in input.GetCustomers() do
    for order in customer.GetOrders() do
    for line in order.GetOrderLines() do
        yield OutputXml.DomainTypes.OrderLine(
            Customer = customer.Name,
            Order = order.Number,
            Item = line.Item,
            Quantity = line.Quantity)
]

It won’t compile because the generated OutputXml.DomainTypes.OrderLine type lacks a constructor. It might be possible for type providers to generate types with constructors, but the current version of XmlProvider doesn’t seem to.

So if we want to have serialization types for our output model we’ll have to create them ourselves. This is a reasonable approach if we plan to create many transforms with the same output model.

Here’s how we can create a record type for use with XmlSerializer:

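open System.Xml.Serialization

// CLIMutable gives the record the parameterless constructor
// and property setters that XmlSerializer requires.
[<CLIMutable>]
[<XmlType("OrderLine")>]
type OrderLine = {  
     [<XmlAttribute>] Customer: string
     [<XmlAttribute>] Order: string
     [<XmlAttribute>] Item: string
     [<XmlAttribute>] Quantity: int
}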

And we could use it as follows:

XmlSerializer(typeof<OrderLine[]>, XmlRootAttribute("OrderLines"))  
    .Serialize(stdout,
    [|
        for customer in input.GetCustomers() do
        for order in customer.GetOrders() do
        for line in order.GetOrderLines() do
            // Records are built with a record expression, not a constructor call
            yield {
                Customer = customer.Name
                Order = order.Number
                Item = line.Item
                Quantity = line.Quantity }
    |])
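
To try the serializer on its own, without the type provider, here is a minimal sketch that feeds it a couple of hard-coded rows and captures the result in a StringWriter. The rows and the names rows, serializer, writer and xml are made up for illustration; the CLIMutable attribute is there on the assumption that XmlSerializer needs a parameterless constructor and settable properties, which plain F# records don't provide.

```fsharp
open System.IO
open System.Xml.Serialization

// CLIMutable: assumed necessary so XmlSerializer can construct the record.
[<CLIMutable>]
[<XmlType("OrderLine")>]
type OrderLine = {
    [<XmlAttribute>] Customer: string
    [<XmlAttribute>] Order: string
    [<XmlAttribute>] Item: string
    [<XmlAttribute>] Quantity: int
}

// Hypothetical hard-coded rows standing in for the type provider output.
let rows = [|
    { Customer = "ACME"; Order = "A012345"; Item = "widget"; Quantity = 1 }
    { Customer = "Southwind"; Order = "A012347"; Item = "gizmo"; Quantity = 4 }
|]

let serializer = XmlSerializer(typeof<OrderLine[]>, XmlRootAttribute("OrderLines"))

// Serialize into a string instead of stdout so the result can be inspected.
let writer = new StringWriter()
serializer.Serialize(writer, rows)
let xml = writer.ToString()

printfn "%s" xml
```

Writing to a StringWriter rather than stdout makes it easy to compare the result against the target format before wiring in the real input.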

Another option is to forget about the output types and generate the XML dynamically. We can do this with the LINQ to XML classes:

XElement(XName.Get "OrderLines", seq {  
    for customer in input.GetCustomers() do
    for order in customer.GetOrders() do
    for line in order.GetOrderLines() do
        yield XElement(XName.Get "OrderLine",
            [
                XAttribute(XName.Get "Customer", customer.Name)
                XAttribute(XName.Get "Order", order.Number)
                XAttribute(XName.Get "Item", line.Item)
                XAttribute(XName.Get "Quantity", line.Quantity)
            ])
}).Save(stdout)

I may choose to go this dynamic route, since for my purposes the column names don’t matter much. However, it is still a bit verbose with all that XLinq noise. We can clean things up by creating some helper functions for creating XML nodes:

let element name children =  
    XElement(XName.Get name, (children:XObject seq)) :> XObject

let attribute name value =  
    XAttribute(XName.Get name, value) :> XObject

let text value =  
    XText(value:string) :> XObject

This allows us to rewrite the previous example very cleanly as follows:

element "OrderLines" [  
    for customer in input.GetCustomers() do
    for order in customer.GetOrders() do
    for line in order.GetOrderLines() do
        yield element "OrderLine" [
            attribute "Customer" customer.Name
            attribute "Order" order.Number
            attribute "Item" line.Item
            attribute "Quantity" (string line.Quantity)
        ]
] |> printfn "%A"
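
If you want to experiment with the helpers without installing FSharp.Data, the same flattening can be sketched with plain LINQ to XML on the input side as well. Note that this swaps the type provider for XDocument.Parse, so attribute access is by string name and loses the type safety discussed above. The inline sample, the xn shorthand and the flattened binding are made up for this illustration.

```fsharp
open System.Xml.Linq

// Helpers in the same spirit as the ones above, repeated so the sketch stands alone.
let element name (children: XObject seq) =
    XElement(XName.Get name, children) :> XObject

let attribute name (value: string) =
    XAttribute(XName.Get name, value) :> XObject

// Hypothetical shorthand for building an XName.
let xn name = XName.Get name

// Cut-down inline version of the input document.
let sample = """<Customers>
  <Customer name="ACME">
    <Order Number="A012345">
      <OrderLine Item="widget" Quantity="1"/>
    </Order>
  </Customer>
  <Customer name="Southwind">
    <Order Number="A012347">
      <OrderLine Item="skyhook" Quantity="3"/>
      <OrderLine Item="gizmo" Quantity="4"/>
    </Order>
  </Customer>
</Customers>"""

// Assumption: plain XDocument stands in for the type provider here.
let doc = XDocument.Parse sample

let flattened =
    element "OrderLines" [
        for customer in doc.Root.Elements(xn "Customer") do
        for order in customer.Elements(xn "Order") do
        for line in order.Elements(xn "OrderLine") do
            yield element "OrderLine" [
                attribute "Customer" (customer.Attribute(xn "name").Value)
                attribute "Order" (order.Attribute(xn "Number").Value)
                attribute "Item" (line.Attribute(xn "Item").Value)
                attribute "Quantity" (line.Attribute(xn "Quantity").Value)
            ]
    ]

// ToString on an XElement renders the XML markup.
printfn "%s" (string flattened)
```

The shape of the nested loops is identical to the type provider version; only the way each value is fetched differs.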

So those are the many possibilities and alternatives for transforming XML with F# and type providers. The other nice thing about using F# is that it doesn’t need to be compiled to an assembly; you can simply deploy the script and run it via fsi.exe. The one gotcha with type providers is that the example document you provide must be representative of the documents you will receive at runtime. This is easy enough to achieve by tweaking the document until you get the type projection that you expect. That is also the reason why I chose to include letters in my order numbers in this example: to avoid having them projected as integers.

Luke Sandell
