
Making array lookups faster



This post is about making lookups in arrays as fast as possible. The array can have many properties or few; it really does not matter. The only requirement is a unique value that identifies each row of data.
From time to time I need to make lookups fast, usually after importing a huge CSV file or something similar.



Sample data

First we have to create some dummy sample data which we can run some tests against. We will create an array of 10001 objects with a few properties. The unique property that identifies each row is called ID:


(sample data script)
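The original script is not reproduced here, so the following is only a minimal sketch of what such sample data could look like. The ID property and the count of 10001 come from the text; the Name and Value properties are invented for illustration.

```powershell
# Hypothetical sample data: 10001 objects with IDs 0 through 10000.
# Only the ID property is taken from the post; Name and Value are made up.
$csvObjects = foreach ($i in 0..10000) {
    [pscustomobject]@{
        ID    = $i
        Name  = "Item$i"
        Value = $i * 2
    }
}

$csvObjects.Count   # 10001
```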



How to test performance?


A couple of things affect performance measurements in PowerShell. For instance, running the same Measure-Command expression several times will yield quite different results. Normally the first run is slower than the second, and the standard deviation is quite large for subsequent runs. To decrease the standard deviation, I use a static call to the .NET garbage collector with [gc]::Collect(). I feel that the results are more comparable with this approach.
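As a rough sketch of that approach, the loop below forces a garbage collection and then times a script block, collecting the ticks for each run. The workload inside Measure-Command and the number of runs are arbitrary placeholders, and calling [gc]::Collect() right before each measurement is an assumption about where the call goes.

```powershell
# Force a garbage collection, then time the same expression several times.
# The workload (doubling 1000 numbers) is only a placeholder.
$ticks = foreach ($run in 1..5) {
    [gc]::Collect()
    (Measure-Command { 1..1000 | ForEach-Object { $_ * 2 } }).Ticks
}

# Summarize the runs; the first run is often the slowest.
$ticks | Measure-Object -Average -Minimum -Maximum
```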



First contender Where-Object

There are two ways you can query an array with the Where keyword. You can pipe the array to the Where-Object cmdlet, or you can use the Where method on the array. The Where method will always be faster than the cmdlet/pipeline approach, since you avoid moving the objects through the pipeline. For our test, we will therefore use the Where method as the baseline against which we measure performance.
We are going to run 11 different queries, each finding 2 unique elements in the array. The time is measured in ticks. I have created a collection of IDs which we will use when we query the data ($CollectionOfIDs):


(Measure the Where method)
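The measurement script itself is not shown, so here is a minimal sketch of how the Where method could be timed. The shape of $CollectionOfIDs (11 pairs of random IDs) and the sample data are assumptions; only the variable names $csvObjects and $CollectionOfIDs come from the text.

```powershell
# Assumed sample data; only the ID property is taken from the post.
$csvObjects = foreach ($i in 0..10000) { [pscustomobject]@{ ID = $i; Name = "Item$i" } }

# Assumed shape: 11 pairs of distinct IDs, one pair per query.
# The unary comma keeps each pair together as a nested array.
$CollectionOfIDs = foreach ($n in 1..11) {
    , (Get-Random -InputObject (0..10000) -Count 2)
}

# Time the Where method once per pair and collect the ticks.
$whereTicks = foreach ($pair in $CollectionOfIDs) {
    [gc]::Collect()
    (Measure-Command {
        $found = $csvObjects.Where({ $_.ID -in $pair })
    }).Ticks
}

$whereTicks | Measure-Object -Average
```

The cmdlet form of the same query would be `$csvObjects | Where-Object { $_.ID -in $pair }`, which can be swapped in to compare the two styles.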


That is about 85 ms on average to query the collection for two unique IDs. Baseline ready.



There is a fast knock at the door

We have a new contender and he calls himself Hashtable. He claims he can do even better than 85 ms on average. Challenge accepted.
First we need to create a hashtable representation of the $csvObjects collection/array. That should be pretty straightforward. We let the unique identifier (ID) become the key and the object itself the value:

(hashtable of csv)
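A minimal sketch of that conversion could look like the following. The variable name $hashtable and the sample data are assumptions made for illustration; the key/value layout (ID as key, object as value) is the one described in the text.

```powershell
# Assumed sample data; only the ID property is taken from the post.
$csvObjects = foreach ($i in 0..10000) { [pscustomobject]@{ ID = $i; Name = "Item$i" } }

# Build the lookup table: ID becomes the key, the whole object the value.
$hashtable = @{}
foreach ($object in $csvObjects) {
    $hashtable[$object.ID] = $object
}

$hashtable.Count      # 10001
$hashtable[42].Name   # Item42
```

Two things to keep in mind with this sketch: assigning through the indexer silently overwrites an earlier entry if an ID is not unique, and hashtable keys are type sensitive. Properties read with Import-Csv are strings, so in that case the lookup key would have to be '42' rather than 42.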

Now I know you have a question. What is the performance penalty of converting that array to a hashtable? Good question, and I am happy you asked. It converts the 10000 objects into a hashtable in approximately 53 milliseconds.


I would say that is a small price to pay.
Using the same ($CollectionOfIDs) as we did for the where method, let’s run the same test against the hashtable:

(Measure the hashtable)
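As with the Where method, the measurement script is not shown, so this is only a sketch under the same assumptions: made-up sample data, a hypothetical $hashtable variable, and $CollectionOfIDs modeled as 11 pairs of random IDs.

```powershell
# Assumed sample data and lookup table; only ID is taken from the post.
$csvObjects = foreach ($i in 0..10000) { [pscustomobject]@{ ID = $i; Name = "Item$i" } }
$hashtable = @{}
foreach ($object in $csvObjects) { $hashtable[$object.ID] = $object }

# Assumed shape: 11 pairs of distinct IDs, one pair per query.
$CollectionOfIDs = foreach ($n in 1..11) {
    , (Get-Random -InputObject (0..10000) -Count 2)
}

# Time two key lookups per pair and collect the ticks.
$hashTicks = foreach ($pair in $CollectionOfIDs) {
    [gc]::Collect()
    (Measure-Command {
        $found = foreach ($id in $pair) { $hashtable[$id] }
    }).Ticks
}

$hashTicks | Measure-Object -Average
```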


Okay, so the first one is quite slow at about 11 ms, but it improves quite dramatically to 0.038 ms. If we use the average numbers (in ticks) to be fair, we have increased the performance by a factor of about 650 (837265 / 1289 ≈ 649.5).
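Since the averages are given in ticks, it can help to convert them. One tick is 100 nanoseconds, so there are 10000 ticks per millisecond. The short sketch below uses the two averages quoted above; the variable names are made up for illustration.

```powershell
$whereAverageTicks = 837265   # average for the Where method, from the post
$hashAverageTicks  = 1289     # average for the hashtable, from the post

# Convert ticks to milliseconds via TimeSpan.
[timespan]::FromTicks($whereAverageTicks).TotalMilliseconds   # about 83.7 ms
[timespan]::FromTicks($hashAverageTicks).TotalMilliseconds    # about 0.13 ms

# Speedup factor, rounded to one decimal.
[math]::Round($whereAverageTicks / $hashAverageTicks, 1)      # 649.5
```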



Implications

I have only tested this on WMF 5.1 (5.1.14393.103). To use the Where query method on arrays, you need version 4 or later. Converting the collection to a hashtable gives you the ability to perform super fast queries. If you are querying a collection frequently, it makes sense to use a hashtable.


Code for speed if you need it, otherwise write beautiful code!

Cheers

Tore
