Dealing with Dynamic Data (almost) without Reflection

minute read.
Share this article

Often you need to manipulate a data structure without knowing it’s shape at compile time i.e. when authoring classes that expose generic interfaces or dealing with data described by runtime metadata (database schema, JSON files etc.).

For example, consider a method that is designed to generate a hashcode from the public properties of an arbitrary type. Firstly, let’s look at a model and a typical hashcode implementation:

Amido needs the contact information you provide to us to contact you about our products and services. You may unsubscribe from these communications at any time. For information on how to unsubscribe, as well as our privacy practices and commitment to protecting your privacy, please review our privacy policy.
Oops! Something went wrong while submitting the form.

Things you will learn

How to use the Expression framework to build an abstract syntax tree.

How to manipulate a data structure without knowing it’s shape at compile time

Dealing with dynamic data by using the Expression framework.

If we want to use a similar hashcode approach but against a generic type we might use reflection to implement something like the following:

This gets the properties from the type and then generates a hashcode in the same way. While this implementation will work, reflection is notoriously expensive. Generating 1 million hashcodes for our simple model takes around 4.5 seconds on my laptop but more complex examples can quickly escalate into a poor performance and the more complex the model the worse the performance. I’ve certainly worked on systems where the cost was simply too high and we’ve been forced to consider alternative approaches: eliminating reflection was definitely not a micro-optimisation.

One option is to use the Expression framework to build an abstract syntax tree and compile it into IL at runtime. While most people are familiar with the Func<> and Action<> support in C# for functional style programming, an Expression Tree framework introduced alongside it seems to have gone much more under the radar. Allowing you to compile code at runtime, the Expression Tree framework is incredibly useful as, when used appropriately, it can lead to massive performance improvements in data driven designs.

A syntax tree is, as the name suggests, a tree representation of source code with each node of the tree corresponding to a source code construct, for example, calling a method or a comparison operator. Consider the following C# code:

The _GetAddFunc()_ function returns a typical Func<int>, but one that is compiled at runtime. To do this we first build a simple tree comprised of an addition operator and two constant nodes. The addition operator is a kind of binary expression – a node that has left and right child nodes. Whereas the constant nodes are simple value nodes that have no children.

Finally, we wrap the addition node in a lambda expression of type Func<int>. Lambda expressions represent an entry point for the execution of an expression and as such support compilation.

Following this we can use the compiled function exactly as we would a normal function we’ve compiled from source code.

What if we want to be able to pass the values to add into our function? We can use an expression node called a ParameterExpression to allow us to declare placeholders for values that are passed into the expression, and these will take the form of parameters on our func:

There are a wide variety of expression nodes and amongst many other things we can call methods, get from properties, and perform comparisons – if you can write the code you can build the equivalent in a syntax tree.

Going back to our data driven hashcode problem, we can combine the Expression framework with our reflection example to inspect the C# types and compile code that generates a hashcode specifically for our type. Once we’ve undertaken the initial compilation we really are just using compiled C# code to process our runtime discovered type. It’s a bit of a leap in complexity and will introduce some further node types but an example implementation of this technique looks like this:

Firstly, let’s compare the performance – does this yield the performance improvement I promised? On the same laptop the performance of this is much improved – while the reflection based example took 4.2 seconds for generating 1 million hashcodes this new approach gives exactly the same hashcode for a given object but generates 1 million hashcodes in only 0.5 seconds, so around an 8x improvement in performance. Repeated reruns give fairly consistent results. It’s worth also noting that the bigger the model the bigger the savings. If I run the same tests with a 10 property model the reflection approach takes 8 seconds while the expression based approach still clocks in at around 0.5 seconds.

If we start to break down the above, you can see how the structure roughly follows that of our reflection implementation but instead of directly evaluating the hashcode it expresses each step as part of a syntax tree, finally bundling it into a lambda and compiling it into a Func<> we can call directly.

One of the key new node types introduced above is the block expression. This basically represents a scoped set of lines of code – essentially the equivelant of a { } brace block in C#. As such it’s factory method (Expression.Block) accepts an array of expression nodes that will be run in sequence, and these are passed as the second parameter to the factory. As we are declaring a scope we can also state any variables that belong to it and in this case, we are making use of an integer variable to store our hashcode.

We also make use of conditional expression nodes created with the Expression.Condition factory method. These take an expression to be evaluated (the first parameter), if true will return the evaluation of the expression used as the second parameter, and if false the third parameter. We’re also doing casting to object as part of our conditional check so we can consistently look at value types and reference types. If we don’t cast this and compare a value type to a reference type then during compilation of the expression, at runtime, we will receive an error. Given we’re using code to write code and know if the type is a value type, we could always check to see if this is the case and emit the comparison to null – however for the sake of keeping the example clear I’ve left this step out.

Finally, we make use of methods and properties through expression nodes through the Expression.Property and Expression.Call factories respectively. In each case the first parameter is an expression that returns the object to execute against, and the second parameter is PropertyInfo or MethodInfo respectively.

It’s worth noting there is a compile cost to this approach, this is nominal (around 0.004 seconds for our example on my laptop), but given we’re likely considering this approach to deal with a performance sensitive scenario it’s best to buffer this by using a lazy caching factory pattern. Otherwise, if we build the expression and compile each time we run into the model type then our savings will be lost. An example of this is shown in the sample below:

We’ve now got a dictionary lookup to account for but testson my laptop show that this only increases the time to generate 1 millionhashes to 0.6 seconds, a 0.1 second overhead, and still nearly 10x faster thanthe original reflection approach. I’ve used a concurrent dictionary in theabove as it seems likely that a class like this would be registered as asingleton in your IoC container and need to be thread safe.

It is, I hope, easy to see how this kind of approach can beused for compiling code to do all sorts of things wherever metadata isavailable – for example instead of using reflection we could use the sameapproach for compiling code to manipulate JSON based on a JSON schema.

As a final aside, if you run the sample app and have doneprevious work with hashcodes in .Net, something that might strike you as odd isthat the hashcode will change on each execution. This is because the hashcodealgorithm has changed with .Net Core. Whereas in Framework the hashcode wouldonly change “between frameworks and domain” (so between frameworks andpotentially between compilations), in Core it will change on every execution.While it’s dangerous to on hashcodes in this manner I still suspect this issomething that will catch people out.

Related content

No items found.

Need help plotting a route to the cloud?

We can help you define your digital strategy and turn it into a technical roadmap, achieving momentum to quickly deliver business value, whilst minimising risk.

Ask a question

If you consent to receiving communications from Amido, please subscribe using the checkbox below. If at any point you'd like to unsubscribe, you can do so using the links provided in our newsletters. You can review how your data is handled in our privacy policy.
Thank you, your submission has been received!
Oops! Something went wrong while submitting the form.