At the PowerShell and DevOps Global Summit 2017 I had two community lightning talks available. Due to how many awesome talks there were, there wasn’t enough time to do my talk on Neo4j and PowerShell. So instead, I decided to write this blog post.
Graph all the PowerShell things!
I am currently a Senior Software Developer on the Modules with Puppet, specialising in Windows. I am also a contributor to Neo4j which is a Graph Database. Neo4j is a highly scalable, native graph database purpose-built to leverage not only data but also its relationships, but this seems a very odd place to find PowerShell.
Strangely enough, there is a lot you can do with PowerShell and Neo4j.
Chocolatey and PSake
I maintain the Chocolatey packages for the community edition of Neo4j, and of course being Chocolatey it’s all PowerShell. However I’m also lazy. Neo4j were doing a lot of releases and I had trouble keeping up, so PowerShell to the rescue. I created a PSake build task which would “screen scrape” the Neo4j Downloads webpage and gather which versions were the latest and their relevant download links. The task would then see if I had not packaged this version yet. If not, it would create the packaging template, commit the changes the github repository, create the chocolatey package and then publish it.
So now I had it automated, but I still had to remember to actually run the script and I needed a computer to run the script on. Appveyor to the rescue!! Appveyor is cloud based Continuous Integration/Build Service using Windows worker agents. It’s also free to use with Open Source projects which was perfect. I created an AppVeyor configuration which runs my PSake task once a day. So now I have Chocolatey packages, automatically created and published and I don’t need to lift a finger! I also get alerts if the script fails during the process, so I know when I need to update my templates for a new Neo4j release.
Links
PowerShell Module and Pester
Previously Neo4j used some simple batch files to manage Neo4j on Windows, hoever they were far from optimal. Neo4j asked me to help them out and rewrite the management scripts in PowerShell. This meant that we could have far more intelligent scripts but also test them using Pester, a unit testing framework for PowerShell.
So when you use Neo4j on Windows, you’re actually using a PowerShell module, called Neo4j-Management
. In fact, if you want to, you can import this module directly instead of using the wrapper batch files!
Links
Graphing PowerShell Help
The PowerShell help system is one of the jewels of the system, and like most documentation systems has many relationships between help documents. This is perfect for a graph database, in fact Neo4j was first developed by Emil Eifren as a solution to enterprise documentation management.
So as an example in how to use Neo4j with PowerShell, I set out to graph the help documents in PowerShell and the relationships between them. PowerShell help items have the ability to link to other items, in particular the RELATED LINKS
section, for example:
C:\> get-help get-item
NAME
Get-Item
...
RELATED LINKS
Online Version: http://go.microsof ...
Clear-Item
Copy-Item
Invoke-Item
Move-Item
...
Using this information we can construct a simple graph schema:
-
A Module has one or more Commands (also known as cmdlet or function). This is the
HAS_COMMAND
relationship. -
A Command can be related to zero or more other Commands. This is the
RELATED_LINK
relationship.
Getting ready to create the graph
Installing Neo4j
Use the Neo4j Chocolatey package:
PS> choco install neo4j-community
This will install the a Neo4j Graph Database. When you first browse to http://localhost:7474
you will be asked to setup the default Neo4j user. By default this is username and password of neo4j
. The code for this blog post assumes you change the the password of the neo4j user to Password1
. Of course, you can use your own username and password, just remember to change that in the scripts for this blog post.
Getting the nuget packages
- Get the source code from code-glennsarti.github.io
- Change the directory to
graphing-powershell-help
- Create a new directory called
nuget
and change directory into that - Install the Neo4j.Driver nuget package. This blog post specifically uses version 1.0.2 as version 2.0+ has different dependencies which are more difficult to resolve in PowerShell.
PS> git clone https://github.com/glennsarti/code-glennsarti.github.io.git
...
PS> cd code-glennsarti.github.io
PS> New-Item nuget -ItemType Directory
PS> cd nuget
PS> nuget install Neo4j.Driver -version 1.0.2
Note - If you do not have nuget installed, download it from nuget.org.
Creating the graph
The Import-PowerShell-Help.ps1
script imports all of the help information into the Neo4j graph database. This is a two pass import script, which could be improved to be a lot faster but would make it harder to explain. The first pass imports all of the modules and commands, and the second pass adds the relationships between commands.
Source Code - Import-PowerShell-Help.ps1
Calls Update-Help
so that all of the latest help is available on the local computer
$ErrorActionPreference = 'Stop'
# Update all of the help files
Update-Help -Confirm:$false
Imports all of the modules into the PowerShell session.
This can take a while and may consume a lot of memory
# Import the modules into the session
Get-Module -List | Select-Object -Unique | % {
Write-Host "Importing $($_.Name)"
$_ | Import-Module -ErrorAction Continue
}
Imports the Neo4j.Driver DLLs and connects to the Neo4j database
# Import DLLs
Add-Type -Path "$PSScriptRoot\nuget\Neo4j.Driver.1.0.2\lib\dotnet\Neo4j.Driver.dll"
Add-Type -Path "$PSScriptRoot\nuget\rda.SocketsForPCL.1.2.2\lib\net45\Sockets.Plugin.Abstractions.dll"
Add-Type -Path "$PSScriptRoot\nuget\rda.SocketsForPCL.1.2.2\lib\net45\Sockets.Plugin.dll"
Function Invoke-Cypher($query) {
$session.Run($query)
}
$authToken = [Neo4j.Driver.V1.AuthTokens]::Basic('neo4j','Password1')
$dbDriver = [Neo4j.Driver.V1.GraphDatabase]::Driver("bolt://localhost:7687",$authToken)
$session = $dbDriver.Session()
You can optionally only query certain modules or commands. By default it consumes everything
$moduleFilter = '.+'
$commandFilter = '.+'
Deletes all previous nodes and relationships
try {
# Kill everything ...
$result = Invoke-Cypher("MATCH ()-[r]-() DELETE r")
$result = Invoke-Cypher("MATCH (n) DELETE n")
Enumerates all of the commands for each module and creates the Module
and Command
nodes in the graph database.
# Create all the cmdlets and modules
Get-Module | ? { $_.Name -match $moduleFilter } | ForEach-Object -Process {
$ModuleName = $_.name
Write-Progress -Activity "Parsing $ModuleName" -Status "Importing Commands"
Invoke-Cypher("CREATE (:Module { name:'$ModuleName'})")
Get-Command -Module $_ | ? { $_.Name -match $commandFilter } | ForEach-Object -Process {
$CommandName = $_.Name
$query = "MATCH (m:Module { name:'$ModuleName'})`n" + `
"CREATE (com:Command { name:'$CommandName'})`n" + `
" SET com.commandtype = '$($_.CommandType)'`n" + `
"WITH m,com`n" + `
"CREATE (m)-[:HAS_COMMAND]->(com)"
Invoke-Cypher($query)
}
}
Enumerates the help for each command, ignoring anything that has a URL associated (Online Help). The script the creates a RELATED_LINK
relationship between the nodes.
# Create all the cmdlets and modules
Get-Module | ? { $_.Name -match $moduleFilter } | ForEach-Object -Process {
$ModuleName = $_.name
Write-Progress -Activity "Parsing $ModuleName"
Get-Command -Module $_ | ? { $_.Name -match $commandFilter } | ForEach-Object -Process {
$ThisCommandName = $_.Name
Write-Progress -Activity "Parsing $ModuleName" -Status "Creating links for $ThisCommandName"
$thisURI = $null
(Get-Help $_).relatedLinks.navigationLink | ForEach-Object -Process {
$HelpLink = $_
# Ignore anything that is a real URI
if ($HelpLink.uri -eq '') {
$ThatCommandName = $HelpLink.linkText
$query = "MATCH (this:Command { name:'$ThisCommandName'})`n" + `
"MATCH (that:Command { name:'$ThatCommandName'})`n" + `
"CREATE (this)-[:RELATED_LINK]->(that)"
Invoke-Cypher($query)
} else {
$thisURI = $HelpLink.uri
}
}
if ($thisURI -ne $null) {
$query = "MATCH (this:Command { name:'$ThisCommandName'})`n" + `
"SET this.uri = '$thisURI'`n"
Invoke-Cypher($query)
}
}
}
} finally {
$session = $null
$dbDriver = $null
}
This can take a while and may consume a lot of memory
Querying the graph
The Example-Queries.ps1
script then queries the graph database for interesting information. The Invoke-Cypher
function while strictly not needed, makes outputing the results of cypher queries eaier and look prettier.
Source Code - Example-Queries.ps1
Here’s what the output from my local computer looks like;
There were 79 modules and 1656 commands imported
C:\Source\code-glennsarti.github.io\graphing-powershell-help [add-neo4j-ps +1 ~0 -0 !]> .\Example-Queries.ps1
List the entities
-----------------
MATCH (m:Module)
WITH COUNT(m) AS ModuleCount
MATCH (c:Command)
WITH ModuleCount,COUNT(c) AS CommandCount
RETURN ModuleCount,CommandCount, SIZE(()-[:RELATED_LINK]->()) AS LinkCount
ModuleCount LinkCount CommandCount
----------- --------- ------------
79 5181 1656
The Hyper-V modules had the most number of commands.
List all modules
-----------------
MATCH (m:Module)-[:HAS_COMMAND]->(com:Command)
RETURN m.name AS ModuleName, Count(com) AS Commands
ORDER BY Count(com) DESC
LIMIT 10
ModuleName Commands
---------- --------
Hyper-V 232
Storage 154
Microsoft.PowerShell.Utility 107
Microsoft.PowerShell.Management 89
netsecurity 85
NetAdapter 68
Dism 43
MsDtc 41
NetEventPacketCapture 35
SmbShare 35
There were many commands that linked to themselves. This seems like a bad user experience e.g. For more help about Get-Process, see Get-Process.
Commands that link themselves
-----------------
MATCH (this:Command)-[:RELATED_LINK]->(that:Command)
WHERE this = that
RETURN this.name
LIMIT 10
this.name
---------
Get-DnsClientNrptPolicy
Debug-Process
Get-Process
Limit-EventLog
New-EventLog
Remove-Computer
Remove-EventLog
Stop-Process
Wait-Process
Write-EventLog
Many commands had no inbound or outbound links which again, may be a bad user experience.
Commands that have no links at all (Hermits)
-----------------
MATCH (this:Command)
WHERE NOT ( (this)-[:RELATED_LINK]-(:Command) )
RETURN this.name
LIMIT 10
this.name
---------
Use-WindowsUnattend
Resolve-DnsName
New-EtwTraceSession
Grant-HgsKeyProtectorAccess
Revoke-HgsKeyProtectorAccess
Get-HgsAttestationBaselinePolicy
Add-VMAssignableDevice
Add-VMDvdDrive
Add-VMFibreChannelHba
Add-VMGpuPartitionAdapter
Surprisingly the Save-NetGPO and Open-NetGPO were the most popular commands, whereby they had vastly more commands linking to them, than they did linking to others.
Popular Commands. More inbound links than outbound
-----------------
MATCH (this:Command)
WHERE SIZE( (this)<-[:RELATED_LINK]-(:Command) ) >
SIZE( (this)-[:RELATED_LINK]->(:Command) )
RETURN
this.name,
SIZE( (this)<-[:RELATED_LINK]-(:Command) ) AS InboundLinks,
SIZE( (this)-[:RELATED_LINK]->(:Command) ) AS OutboundLinks,
(SIZE( (this)<-[:RELATED_LINK]-(:Command) ) - SIZE( (this)-[:RELATED_LINK]->(:Command) )) AS LinksDiff
ORDER BY LinksDiff DESC
LIMIT 10
LinksDiff OutboundLinks this.name InboundLinks
--------- ------------- --------- ------------
57 3 Save-NetGPO 60
55 5 Open-NetGPO 60
15 4 Get-NetFirewallProfile 19
14 0 Get-VM 14
14 12 Get-NetFirewallAddressFilter 26
13 6 Get-NetFirewallPortFilter 19
12 1 Disable-BC 13
12 1 Reset-BC 13
12 7 Get-Disk 19
12 7 Set-NetIPsecRule 19
Also surprising was that were many cases of commands linking to commands in a different module.
Links across Modules
-----------------
MATCH (thismodule:Module)-[:HAS_COMMAND]->(this:Command)-[r:RELATED_LINK]->(that:Command)<-[:HAS_COMMAND]-(thatmodule:Module)
WHERE (this <> that) AND (thismodule <> thatmodule)
RETURN
("(" + thismodule.name + ") " + this.name) AS From,
("(" + thatmodule.name + ") " + that.name) AS To
LIMIT 10
From To
---- --
(Microsoft.PowerShell.Utility) Get-Culture (International) Set-Culture
(PKI) Switch-Certificate (Microsoft.PowerShell.Management) Set-Location
(PKI) Import-PfxCertificate (Microsoft.PowerShell.Management) Set-Location
(PKI) Import-Certificate (Microsoft.PowerShell.Management) Set-Location
(PKI) Get-Certificate (Microsoft.PowerShell.Management) Set-Location
(PKI) Test-Certificate (Microsoft.PowerShell.Management) Get-ChildItem
(PKI) Switch-Certificate (Microsoft.PowerShell.Management) Get-ChildItem
(PKI) Import-PfxCertificate (Microsoft.PowerShell.Management) Get-ChildItem
(PKI) Import-Certificate (Microsoft.PowerShell.Management) Get-ChildItem
(PKI) Get-Certificate (Microsoft.PowerShell.Management) Get-ChildItem
To show the recommendation engine side the Graph Database, inside of the database I created a module called Glenn
with a single command called Invoke-Glenn
which had a link to the New-JobTrigger
command. I then used a simple recommendation query to suggest other commands I should link too as well.
MATCH (that:Command {name: 'New-JobTrigger'})
MERGE (mod:Module { name: 'Glenn'})-[:HAS_COMMAND]->(glenn:Command {name: 'Invoke-Glenn'})
MERGE (glenn)-[:RELATED_LINK]->(that)
Recommendation Engine (New-JobTrigger)
-----------------
MATCH
(this:Command)-[:RELATED_LINK*2..4]->(recommend:Command)
WHERE
(this.name = 'Invoke-Glenn')
AND (this <> recommend)
AND NOT ( (this)-[:RELATED_LINK]->(recommend) )
WITH
recommend, COUNT(recommend) AS Popularity
RETURN
recommend.name AS CommandName,Popularity
ORDER BY Popularity DESC
LIMIT 5
Popularity CommandName
---------- -----------
270 Set-ScheduledJobOption
270 Get-JobTrigger
270 Disable-ScheduledJob
270 Remove-JobTrigger
270 Register-ScheduledJob
Closing thoughts
The point of this blog/presentation was not really about how Neo4j and PowerShell play together. It was more that people can use their PowerShell skills for more than just system administration. There are many open source community projects that lack Windows skills which we in the PowerShell community can help with, and hopefully bring awesome products, like Neo4j into the Windows fold.
Comments