Currently I am working on a big new module. In this module, I need to persist data to disk and reprocess it at some point, even if the module or the PowerShell session has been closed. That meant serializing objects and saving them to disk, efficiently enough to support a high volume of objects. So I decided to turn this serializer into a module called HashData.
Other Serializing methods
In PowerShell we have several ways to serialize objects. Two of them are built-in cmdlets:
- Export-CliXml
- ConvertTo-JSON
Both are excellent options if you do not care about the size of the file. In my case I needed something lean and mean in terms of the size on disk of the serialized object. Let's do some tests to compare the different types:
(Hashdata.Object.ps1)
You might be curious why I use the [System.Management.Automation.PSSerializer]::Serialize static method instead of the Export-CliXml cmdlet. The static method generates the same XML but returns it as a string, so we do not need to read back the content of the file that the cmdlet would create.
If we compare the length of the string we get this:
As you can see, the XML serialization is very bloated with metadata, while the JSON serialization is much better. The winner is the HashData module, whose output is 30% smaller than the JSON string.
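For readers who want to reproduce the built-in half of that comparison, here is a minimal sketch. The sample object and its property names are my own assumption, not the object from the original test script, and the HashData output is left out, so the numbers will differ from the ones discussed above.

```powershell
# Hypothetical sample object; the original test object is not shown here.
$object = [pscustomobject]@{
    Name    = 'Tore'
    Items   = 42
    Enabled = $true
    Created = [datetime]::new(2017, 1, 1)
}

# Serialize to CliXml in memory, without writing a file first.
$xml = [System.Management.Automation.PSSerializer]::Serialize($object)

# Serialize to JSON without indentation to keep the string as small as possible.
$json = $object | ConvertTo-Json -Compress

# Emit one row per format so the lengths can be compared side by side.
[pscustomobject]@{ Format = 'CliXml'; Length = $xml.Length }
[pscustomobject]@{ Format = 'Json'; Length = $json.Length }
```

The exact lengths depend on the PowerShell version and on the object, so treat them as a rough indication only.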
HashData module
Currently the module implements these cmdlets:
- Assert-ScriptString
- ConvertTo-HashString
- ConvertTo-Hashtable
- Export-HashData
- Import-HashData
- New-Date
As with Import-CliXml and Export-CliXml, the logic for serialization and deserialization is implemented in Import-HashData and Export-HashData. I chose to also export the helper functions ConvertTo-Hashtable and ConvertTo-HashString from the module, since they could be useful in other scenarios as well. The New-Date function is probably the smallest function I have ever published. Its purpose is to convert values back to datetime objects when deserializing.
Let's inspect the object we created above and look at its string representation:
(HashTextObject.ps1)
As you can see, datetime objects are converted to a [long] ticks value, which the New-Date function converts back to a datetime object on deserialization.
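To make the idea concrete, here is a rough sketch of a round trip through a hashtable string. It is not the module's code: ConvertTo-HashStringSketch is a hypothetical helper of mine, the New-Date shown here is only a stand-in with an assumed signature, and the sketch handles just string, integer, boolean and datetime properties with simple property names.

```powershell
# Stand-in for the module's New-Date (assumed signature): ticks in, datetime out.
function New-Date {
    param([long]$Ticks)
    [datetime]::new($Ticks)
}

# Hypothetical helper: turns a flat object into a hashtable literal string.
function ConvertTo-HashStringSketch {
    param([Parameter(Mandatory)][psobject]$InputObject)

    $pairs = foreach ($property in $InputObject.PSObject.Properties) {
        $value = $property.Value
        $literal = if ($value -is [datetime]) {
            # Store the ticks and let New-Date rebuild the datetime on import.
            "(New-Date $($value.Ticks))"
        } elseif ($value -is [bool]) {
            if ($value) { '$true' } else { '$false' }
        } elseif ($value -is [string]) {
            # Double any single quote so the literal stays valid.
            "'" + $value.Replace("'", "''") + "'"
        } elseif ($value -is [int] -or $value -is [long]) {
            $value.ToString([cultureinfo]::InvariantCulture)
        } else {
            throw "Unsupported property type on '$($property.Name)'"
        }
        "$($property.Name)=$literal"
    }
    '@{' + ($pairs -join ';') + '}'
}

$original = [pscustomobject]@{
    Name     = "O'Neil"
    Quantity = 3
    Enabled  = $true
    Created  = [datetime]::new(2017, 1, 1)
}

$text = ConvertTo-HashStringSketch -InputObject $original

# Reading back means executing the string, which yields a hashtable.
$restored = & ([scriptblock]::Create($text))
```

Note that the last line executes whatever is in the string, which is exactly why the module validates the text before importing it (see the Security section).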
Currently implemented property-types
In this version, your object may have properties of the following types:
- String
- Integer
- Boolean
- Double
- DateTime
- Array of String
- Array of Integers
The currently supported and tested object depth is 1. That might change in the future. You may pipe or supply an array of PSCustomObject to the Export-HashData function.
I have deliberately chosen not to convert the objects returned by Import-HashData to PSCustomObject in this release. Depending on feedback and need, I will consider adding this at a later stage.
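Until then, the conversion is easy to do yourself. A minimal sketch, assuming the import hands back plain hashtables (the sample data below is made up for illustration):

```powershell
# Stand-in for an imported item: a plain hashtable (assumed shape).
$imported = @{ Name = 'Tore'; Quantity = 42 }

# Casting a hashtable to [pscustomobject] turns each key into a property.
$object = [pscustomobject]$imported

# The same cast works per element for a collection of hashtables.
$many = @(@{ Name = 'A' }, @{ Name = 'B' }) | ForEach-Object { [pscustomobject]$_ }
```

Keep in mind that a plain hashtable does not guarantee key order, so the property order of the resulting objects may differ from the original.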
Security
The Assert-ScriptString function is a security boundary and is used by the Import-HashData function. The reason is that when you serialize an object as a hashtable string, you are in essence generating a script file, which in this case behaves like a scriptblock. When you import something, you are invoking that scriptblock, and Assert-ScriptString makes sure nothing evil gets executed. The only function currently allowed in the serialized object is New-Date.
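One way such a check could work is to parse the text and look at the commands it contains before anything runs. The sketch below illustrates that idea with the PowerShell parser. It is my own assumption of an approach, not the actual implementation of Assert-ScriptString, and Test-ScriptStringSketch is a hypothetical name. A real security boundary would need to reject more constructs than this one does.

```powershell
# Hypothetical check: returns $true only if the text parses cleanly,
# calls no command outside the allow list, and contains no method calls.
function Test-ScriptStringSketch {
    param(
        [Parameter(Mandatory)][string]$Text,
        [string[]]$AllowedCommand = @('New-Date')
    )

    $tokens = $null
    $errors = $null
    # Parsing only builds a syntax tree; nothing in the text is executed.
    $ast = [System.Management.Automation.Language.Parser]::ParseInput($Text, [ref]$tokens, [ref]$errors)
    if ($errors.Count -gt 0) { return $false }

    # Every command invocation must be on the allow list.
    $commands = $ast.FindAll({ param($node) $node -is [System.Management.Automation.Language.CommandAst] }, $true)
    foreach ($command in $commands) {
        $name = $command.GetCommandName()
        if ($null -eq $name -or $AllowedCommand -notcontains $name) { return $false }
    }

    # Method calls such as [System.IO.File]::Delete(...) are not commands, so reject them separately.
    $calls = $ast.FindAll({ param($node) $node -is [System.Management.Automation.Language.InvokeMemberExpressionAst] }, $true)
    if (@($calls).Count -gt 0) { return $false }

    return $true
}

$good = Test-ScriptStringSketch -Text "@{Name='Tore';Created=(New-Date 636188256000000000)}"
$bad  = Test-ScriptStringSketch -Text "@{Name=(Remove-Item C:\temp\file.txt)}"
```

Here $good should be $true and $bad should be $false, because Remove-Item is not on the allow list.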
The Import-HashData function has a switch parameter (UnsafeMode) that lets you override this security feature. Use it with care.
PowerShell Gallery and GitHub
The module is published to the PowerShell Gallery at https://www.powershellgallery.com/packages/hashdata and the GitHub repo is at https://github.com/torgro/HashData.
Please reach out to me on Twitter or leave a comment. I love feedback, both good and bad.
Cheers
Tore