Docker, so hot right now

Docker Docker Docker!

With the release of Windows Server 2016 came Windows Containers. This is not Docker for Windows, which still uses a Linux-based operating system to run containers; it is containers running on Windows with a Windows kernel.

As a bit of an excuse to dive into this new technology from Microsoft, I decided to try running the Neo4j Graph Database within a Windows Container. Neo4j already releases official Docker images but, as yet, there are no Windows-based images available.

Disclaimer - This is my first real attempt at using containers, so what I created should only be considered Proof-of-Concept quality.

So what are Windows Containers?

Windows Containers are just like their Linux-based counterparts, but run on a Windows kernel.

What are Containers?

They are an isolated, resource controlled, and portable operating environment.

Basically, a container is an isolated place where an application can run without affecting the rest of the system and without the system affecting the application. Containers are the next evolution in virtualization.

If you were inside a container, it would look very much like you were inside a freshly installed physical computer or a virtual machine. And, to Docker, a Windows Server Container can be managed in the same way as any other container.

Source - Windows Containers

High level steps

  • Install Windows Containers

  • Create a container for Neo4j (Enterprise v3.0.6)

  • Create a three node cluster of Neo4j containers

All source code for this blog is available at:

https://github.com/glennsarti/code-glennsarti.github.io/tree/master/neo4j-windows-containers

Install Windows Containers

I tried installing Windows Containers on my local Windows 10 laptop, but given I already use a Beta version of the Docker for Windows application, I decided to go the less risky route of quickly creating a Server 2016 VM on my laptop using Hyper-V. Once I had a VM I merely followed the installation instructions by Microsoft at https://msdn.microsoft.com/en-us/virtualization/windowscontainers/deployment/deployment.

My Hyper-V internal network uses 192.168.200.0/24 for its address range. You will see this throughout the examples below. The Server 2016 VM was given an IP address of 192.168.200.50.

I could then use the docker client on my laptop to access the docker host in the VM simply by setting the environment variable before doing any docker calls.

PS> $ENV:DOCKER_HOST = 'tcp://192.168.200.50:2375'
PS> docker info
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 2
Server Version: 1.12.2-cs2-ws-beta
Storage Driver: windowsfilter
 Windows:
Logging Driver: json-file
Plugins:
 Volume: local
 Network: nat null overlay transparent
Swarm: inactive
Security Options:
Kernel Version: 10.0 14393 (14393.447.amd64fre.rs1_release_inmarket.161102-0100)
Operating System: Windows Server 2016 Standard
OSType: windows
Architecture: x86_64
CPUs: 2
Total Memory: 767 MiB
Name: WIN-07K1J1J7NLN
ID: ********
Docker Root Dir: C:\ProgramData\docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Insecure Registries:
 127.0.0.0/8

There are 2 images because I had already pulled down the microsoft/nanoserver and microsoft/windowsservercore images.

The docker CLI is the same whether it is accessing a Windows or Linux based container host.
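
As an aside, before pointing the client at a remote daemon it can help to check that the address in DOCKER_HOST is well formed and that something is listening there. The following is a minimal Python sketch, not part of the original setup; the function names are my own, and it assumes the plain tcp://host:port form used above.

```python
import socket
from urllib.parse import urlparse


def parse_docker_host(value):
    """Split a DOCKER_HOST value such as 'tcp://192.168.200.50:2375' into (host, port)."""
    parsed = urlparse(value)
    if parsed.scheme != "tcp" or parsed.hostname is None or parsed.port is None:
        raise ValueError("expected tcp://host:port, got %r" % value)
    return parsed.hostname, parsed.port


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

With the value used above, parse_docker_host('tcp://192.168.200.50:2375') returns ('192.168.200.50', 2375), and is_reachable can then be called with that host and port before running docker info.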

Creating a Dockerfile for Neo4j in a Windows Container

I used the Dockerfile from Neo4j as the basis for building an image for Windows, but used microsoft/nanoserver as the base image instead. However, I came across an issue: when running docker build, it chose a network which did not have access to the internet, so I couldn’t download the required files from within the container.

Fortunately this was an easy problem to solve. Using the PowerShell code in the Neo4j Chocolatey Packages as a base, I created a quick PowerShell script which downloaded and extracted a Neo4j Enterprise source Zip file and a Java JRE tarball. I also added an entrypoint PowerShell script (docker-entrypoint.ps1) which is run when a container starts. These were all placed into a context directory and could then be consumed by docker during a build.

build-neo4jent-context.ps1

Source

This script needs to be run before the docker image can be built. As stated earlier, it is responsible for downloading the Neo4j Enterprise and Java distributions. It also sets up the Dockerfile and entrypoint scripts.
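
To illustrate the "extract into a context directory" step, here is a rough Python sketch of the idea. It is not the PowerShell script itself: the function name is hypothetical, and the download step is left out because the archive locations are not listed in this post. It assumes the archives are already on disk.

```python
import tarfile
import zipfile
from pathlib import Path


def extract_into_context(archive, context_dir):
    """Extract a zip or tar archive below context_dir and return that directory."""
    archive_path = str(archive)
    target = Path(context_dir)
    target.mkdir(parents=True, exist_ok=True)
    if zipfile.is_zipfile(archive_path):
        # For example, the Neo4j Enterprise Zip file.
        with zipfile.ZipFile(archive_path) as zf:
            zf.extractall(str(target))
    elif tarfile.is_tarfile(archive_path):
        # For example, the Java JRE tarball (compression is detected automatically).
        with tarfile.open(archive_path) as tf:
            tf.extractall(str(target))
    else:
        raise ValueError("unsupported archive type: %s" % archive_path)
    return target
```

Calling it once per archive with the same context directory leaves everything docker build needs in one place, which is the point of preparing the context outside the container.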

Dockerfile

Source

The base image is microsoft/nanoserver because, while there are some Windows Containers which already contain Java available on the Docker Hub, I wanted to build an image from scratch to better understand how they work.

FROM microsoft/nanoserver

MAINTAINER glennsarti

LABEL Description="Neo4j Enterprise" Vendor="Neo Technologies" Version="3.0.6"

ENTRYPOINT ["powershell.exe","C:/neo4j/docker-entrypoint.ps1"]
CMD ["neo4j"]

COPY neo4j C:/neo4j

# Note - As we're using a transparent network, exposing ports isn't really required.  They are here
# for when we will use a different network driver.

# 7474 = HTTP Neo4j Connector
# 7473 = HTTPS Neo4j Connector
# 7687 = Bolt Connector
# 1337 = Neo4j Shell Port

EXPOSE 7474 7473 7687 1337

This is a fairly simple Dockerfile as most of the logic is contained in the build-neo4jent-context.ps1 and docker-entrypoint.ps1 scripts.

As this image is based on Microsoft Nano Server, it runs PowerShell Core inside the container.

docker-entrypoint.ps1

Source

This entry point file somewhat resembles the docker-entrypoint.sh bash script used in the official Neo4j Docker container.

  • You pass in the neo4j command to run Neo4j

  • You pass in the dump-config command to display the current Neo4j configuration

  • You pass in any other string to run an arbitrary command in the image e.g. cmd.exe or powershell.exe

  • You modify the behaviour of the image by using environment variables:

    Neo4j HA variables

    • NEO4J_dbms_mode
    • NEO4J_ha_serverId
    • NEO4J_ha_host_data
    • NEO4J_ha_host_coordination
    • NEO4J_ha_initialHosts

    Windows specific

    • NEO4J_startup_delay - Delays the start of the Neo4j Server for the specified number of seconds

For example:

PS> docker run -it -e NEO4J_startup_delay=10 neo4j_enterprise neo4j

This would run the container and start a single Neo4j server (not HA) after a delay of 10 seconds.
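
The behaviour described above can be sketched in a few lines of Python. This is only an illustration of the dispatch logic, not a port of docker-entrypoint.ps1: the function names are hypothetical, and the Neo4j setting names on the right-hand side of the mapping are my assumption of what each variable controls.

```python
# Hypothetical mapping from the container's environment variables to Neo4j
# settings. The setting names are assumptions, not copied from the real script.
ENV_TO_SETTING = {
    "NEO4J_dbms_mode": "dbms.mode",
    "NEO4J_ha_serverId": "ha.server_id",
    "NEO4J_ha_host_data": "ha.host.data",
    "NEO4J_ha_host_coordination": "ha.host.coordination",
    "NEO4J_ha_initialHosts": "ha.initial_hosts",
}


def settings_from_env(env):
    """Return the Neo4j settings implied by the recognised variables in env."""
    return {setting: env[name] for name, setting in ENV_TO_SETTING.items() if name in env}


def startup_delay(env):
    """Seconds to wait before starting Neo4j; 0 when NEO4J_startup_delay is unset."""
    return int(env.get("NEO4J_startup_delay", "0"))


def plan(command, env):
    """Describe what the entry point would do for the given command."""
    if command == "neo4j":
        return ("start-neo4j", startup_delay(env), settings_from_env(env))
    if command == "dump-config":
        return ("dump-config", 0, settings_from_env(env))
    # Anything else is treated as an arbitrary command, e.g. cmd.exe.
    return ("run-command", 0, command)
```

For the docker run example above, plan("neo4j", {"NEO4J_startup_delay": "10"}) gives ("start-neo4j", 10, {}): start a single server with no HA settings after a 10 second delay.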

Container network sadness :-(

Windows Container Networking

Problem #1 - NAT Networks

Creating a container was simple. Running a single image and then connecting to the Neo4j Browser was simple. Creating two containers and getting them to form a Neo4j Cluster was a frustrating experience. But to be fair it was mainly my fault, not Neo4j’s or Windows Containers’.

The default networking method in Windows Containers is to use a NAT based gateway. This means that the internal and external IP addresses of a Neo4j container are different. Unfortunately, Neo4j HA has only one setting for the cluster address: ha_host_coordination. This single setting tells Neo4j both which network interface to bind to when starting up and the address that other cluster members will use to contact it.

Neo4j 3.1 addresses this issue and has different settings for binding and advertising interfaces.

Because the inside and outside addresses differ in a NAT setup, I had to use a different network type. Fortunately, Windows Containers also have a Transparent network type, which basically creates a virtual network adapter connected to the host’s external interface. The adapter can then be given an IP address via DHCP or statically assigned via Docker.

Problem #2 - DHCP

I already have a DHCP server running on my local laptop for my Hyper-V Virtual Machines (Doesn’t everyone?) so that bit was done. I created a new docker network, started a new container on the network and ran ipconfig:

PS> docker network create -d transparent TransparentNetwork

PS> docker run -it --network=TransparentNetwork neo4j_enterprise:3.0.6 cmd.exe

Microsoft Windows [Version 10.0.14393]
(c) 2016 Microsoft Corporation. All rights reserved.

C:\>ipconfig

Windows IP Configuration

Ethernet adapter vEthernet (Container NIC 22c364f7):

   Connection-specific DNS Suffix  . : localdomain
   Link-local IPv6 Address . . . . . : fe80::75c2:3322:7679:a0f4%23
   IPv4 Address. . . . . . . . . . . : 192.168.200.53
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . : 192.168.200.1

Success, my container had an external IP address!

So I then inspected the docker container to make sure it knew what the IP was:

PS> docker inspect <containerid>

...
            "Networks": {
                "TransparentNetwork": {
                    "IPAMConfig": null,
                    "Links": null,
                    "Aliases": [
                        "0d5faefd4669"
                    ],
                    "NetworkID": "18a299ba8b47bfed4245028612bb6207de6803bcce77aae4f2ba24f4c8ed599f",
                    "EndpointID": "eb7fc3a83c728f85cf44dfe72ab86dc38a134e7f212d08dbd3b7755eb3d2aea7",
                    "Gateway": "",
                    "IPAddress": "",
                    "IPPrefixLen": 0,
                    "IPv6Gateway": "",
...

Well, it didn’t. Fortunately you can configure the transparent network to use static IP Addresses. So I removed the network I just made and created a new network:

PS> docker network rm TransparentNetwork

PS> docker network create -d transparent --subnet=192.168.200.0/24 --gateway=192.168.200.1 TransparentNetwork

And then tried the container on this new network, with a static IP of 192.168.200.100.

PS> docker run -it --network=TransparentNetwork --ip=192.168.200.100 neo4j_enterprise:3.0.6 cmd.exe

C:\> ipconfig

Windows IP Configuration

Ethernet adapter vEthernet (Container NIC a5797484):

   Connection-specific DNS Suffix  . :
   Link-local IPv6 Address . . . . . : fe80::c079:766:6979:ee05%23
   IPv4 Address. . . . . . . . . . . : 192.168.200.100
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . : 192.168.200.1

And then checked the container network in Docker:

PS> docker inspect <containerid>

...
          "Networks": {
                "TransparentNetwork": {
                    "IPAMConfig": {
                        "IPv4Address": "192.168.200.100"
                    },
                    "Links": null,
                    "Aliases": [
                        "8f1bb6d34fc3"
                    ],
                    "NetworkID": "adc501bbb3b2972f8f87c0b13c38f05d7568b5dc4b2b86bb42e2227f95ab6dac",
                    "EndpointID": "c89a76e6099d1d8224fe02ddbbafeba9c4b4541d66f5d708d78b0d4edaa3e26a",
                    "Gateway": "",
                    "IPAddress": "192.168.200.100",
                    "IPPrefixLen": 24,
                    "IPv6Gateway": "",
...

Success!
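
Rather than reading through the whole docker inspect output by eye, the address Docker has recorded can be pulled out programmatically. Below is a small Python sketch; the function name is my own, and it assumes the usual docker inspect layout of a JSON array whose first element has NetworkSettings.Networks (the excerpts above show only the Networks part).

```python
import json


def container_ips(inspect_json):
    """Map network name -> IPAddress for the first container in `docker inspect` output."""
    containers = json.loads(inspect_json)
    networks = containers[0]["NetworkSettings"]["Networks"]
    return {name: details.get("IPAddress", "") for name, details in networks.items()}
```

Fed the DHCP case above it returns an empty string for TransparentNetwork, and for the statically addressed container it returns 192.168.200.100, which is the difference Docker cares about.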

Enter docker-compose

So now I had a docker image to run Neo4j and a network to attach it to. However, I really wanted to create a three node cluster. While I could just run three docker commands, like Neo4j does in its documentation, surely there should be a way to define this. A quick search and I found docker-compose.

Compose is a tool for defining and running multi-container Docker applications. With Compose, you use a Compose file to configure your application’s services. Then, using a single command, you create and start all the services from your configuration.

Source - Docker Compose

This sounded like exactly what I needed! As I had already been playing with Docker for Windows, I already had docker-compose installed. If you don’t have it, you can install it from the GitHub releases page. I quickly created a docker-compose configuration file.

docker-compose.yml

Source

This simple configuration file:

  • Uses version 2.1 of the Compose file format, which is compatible with Windows Containers
  • Creates a three node Neo4j cluster, with one member acting as an arbiter instance
  • Uses statically assigned IP addresses
  • Builds the neo4j_enterprise image if it doesn’t already exist
  • Uses the docker network called TransparentNetwork
  • Adds dependency information so that the first cluster member is started before the remaining cluster instances

version: '2.1'

services:

  # Initial cluster member
  neo4j_1:
    build: ./context
    image: neo4j_enterprise:3.0.6
    entrypoint: "powershell C:/neo4j/docker-entrypoint.ps1"
    environment:
      NEO4J_dbms_mode: "HA"
      NEO4J_ha_serverId: "1"
      NEO4J_ha_initialHosts: "192.168.200.201:5001"
    networks:
      neo4jcluster:
        ipv4_address: "192.168.200.201"

  # Additional cluster member
  neo4j_2:
    depends_on:
      - neo4j_1
    image: neo4j_enterprise:3.0.6
    entrypoint: "powershell C:/neo4j/docker-entrypoint.ps1"
    environment:
      NEO4J_startup_delay: "15"
      NEO4J_dbms_mode: "HA"
      NEO4J_ha_serverId: "2"
      NEO4J_ha_initialHosts: "192.168.200.201:5001"
    networks:
      neo4jcluster:
        ipv4_address: "192.168.200.202"

  # Arbiter instance only
  neo4j_3:
    depends_on:
      - neo4j_1
    image: neo4j_enterprise:3.0.6
    entrypoint: "powershell C:/neo4j/docker-entrypoint.ps1"
    environment:
      NEO4J_startup_delay: "15"
      NEO4J_dbms_mode: "ARBITER"
      NEO4J_ha_serverId: "3"
      NEO4J_ha_initialHosts: "192.168.200.201:5001"
    networks:
      neo4jcluster:
        ipv4_address: "192.168.200.203"

# Externally defined transparent network
networks:
  neo4jcluster:
    external:
      name: TransparentNetwork
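
The static addresses in the compose file only work if they fall inside the subnet given to docker network create and don’t clash with the gateway or each other. A quick way to sanity check a plan like this is sketched below in Python; the function name is hypothetical and the values are simply the ones used in this post.

```python
import ipaddress


def check_static_ips(subnet, gateway, addresses):
    """Return a list of problems with the planned static addresses (empty if none)."""
    network = ipaddress.ip_network(subnet)
    gateway_ip = ipaddress.ip_address(gateway)
    problems = []
    seen = set()
    for text in addresses:
        ip = ipaddress.ip_address(text)
        if ip not in network:
            problems.append("%s is outside %s" % (ip, network))
        if ip == gateway_ip:
            problems.append("%s is the gateway address" % ip)
        if ip in (network.network_address, network.broadcast_address):
            problems.append("%s is reserved in %s" % (ip, network))
        if ip in seen:
            problems.append("%s is assigned twice" % ip)
        seen.add(ip)
    return problems
```

For the three cluster members, check_static_ips("192.168.200.0/24", "192.168.200.1", ["192.168.200.201", "192.168.200.202", "192.168.200.203"]) returns an empty list. Note that this cannot tell whether a DHCP server on the same network might hand out one of those addresses.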

To create the cluster it is simply a matter of running docker-compose up. I’ve truncated the output so it’s easier to read:

PS> docker-compose up

Building neo4j_1
Step 1/7 : FROM microsoft/nanoserver
 ---> 787d9f9f8804
...
Step 6/7 : COPY neo4j C:/neo4j
Step 7/7 : EXPOSE 7474 7473 7687 1337
 ---> Running in b59d5c78cb77
 ---> 811e3a6e8e18
Removing intermediate container b59d5c78cb77
Successfully built 811e3a6e8e18
WARNING: Image for service neo4j_1 was built because it did not already exist. To rebuild this image you must use `docker-compose build` or `docker-compose up --build`.
Creating neo4jwindowscontainers_neo4j_1_1
Creating neo4jwindowscontainers_neo4j_2_1
Creating neo4jwindowscontainers_neo4j_3_1
Attaching to neo4jwindowscontainers_neo4j_1_1, neo4jwindowscontainers_neo4j_2_1, neo4jwindowscontainers_neo4j_3_1
neo4j_2_1  | Container IP Address is 192.168.200.202
neo4j_3_1  | Container IP Address is 192.168.200.203
neo4j_1_1  | Container IP Address is 192.168.200.201
neo4j_2_1  | Detected environment variable NEO4J_dbms_mode
...
neo4j_3_1  | 2016-12-04 04:32:50.522+0000 INFO  [o.n.c.c.ClusterJoin] Attempting to join cluster of [192.168.200.201:5001]
neo4j_1_1  | 2016-12-04 04:32:50.770+0000 INFO  Could not join cluster of [192.168.200.201:5001]
neo4j_1_1  | 2016-12-04 04:32:50.832+0000 INFO  Creating new cluster with name [neo4j.ha]...
neo4j_1_1  | 2016-12-04 04:32:50.963+0000 INFO  Instance 1 (this server)  joined the cluster
neo4j_1_1  | 2016-12-04 04:32:51.256+0000 INFO  Instance 1 (this server)  was elected as coordinator
neo4j_1_1  | 2016-12-04 04:32:51.488+0000 INFO  Instance 1 (this server)  was elected as coordinator
neo4j_1_1  | 2016-12-04 04:32:51.614+0000 INFO  I am 1, moving to master
neo4j_2_1  | 2016-12-04 04:32:52.020+0000 INFO  Starting...
neo4j_1_1  | 2016-12-04 04:32:52.112+0000 INFO  Instance 3  joined the cluster
neo4j_3_1  | 2016-12-04 04:32:52.133+0000 INFO  [o.n.c.c.ClusterJoin] Joined cluster: Name:neo4j.ha Nodes:{1=cluster://192.168.200.201:5001, 3=cluster://192.168.200.203:5001} Roles:{coordinator=1}
neo4j_1_1  | 2016-12-04 04:32:52.568+0000 INFO  I am 1, successfully moved to master
neo4j_1_1  | 2016-12-04 04:32:52.754+0000 INFO  Instance 1 (this server)  is available
...
neo4j_1_1  | 2016-12-04 04:33:02.597+0000 INFO  Instance 1 (this server)  was elected as coordinator
neo4j_2_1  | 2016-12-04 04:33:02.605+0000 INFO  Instance 1  was elected as coordinator
neo4j_1_1  | 2016-12-04 04:33:02.747+0000 INFO  Instance 1 (this server)  is available as master at ha://192.168.200.201:6001?serverId=1 with StoreId{creationTime=1480825964492, randomId=1207142648701028915, storeVersion=15531981201765894, upgradeTime=1480825964492, upgradeId=1}
neo4j_1_1  | 2016-12-04 04:33:02.853+0000 INFO  Instance 1 (this server)  is available as backup at backup://127.0.0.1:6362 with StoreId{creationTime=1480825964492, randomId=1207142648701028915, storeVersion=15531981201765894, upgradeTime=1480825964492, upgradeId=1}
neo4j_2_1  | 2016-12-04 04:33:02.892+0000 INFO  Instance 1  is available as master at ha://192.168.200.201:6001?serverId=1 with StoreId{creationTime=1480825964492, randomId=1207142648701028915, storeVersion=15531981201765894, upgradeTime=1480825964492, upgradeId=1}
neo4j_2_1  | 2016-12-04 04:33:03.054+0000 INFO  Instance 1  is available as backup at backup://127.0.0.1:6362 with StoreId{creationTime=1480825964492, randomId=1207142648701028915, storeVersion=15531981201765894, upgradeTime=1480825964492, upgradeId=1}
neo4j_2_1  | 2016-12-04 04:33:03.565+0000 INFO  ServerId 2, moving to slave for master ha://192.168.200.201:6001?serverId=1
neo4j_1_1  | 2016-12-04 04:33:03.634+0000 INFO  Started.
neo4j_2_1  | 2016-12-04 04:33:04.070+0000 INFO  Checking store consistency with master
neo4j_2_1  | 2016-12-04 04:33:04.080+0000 INFO  The store does not represent the same database as master. Will remove and fetch a new one from master
neo4j_1_1  | 2016-12-04 04:33:04.218+0000 INFO  Instance 2  is unavailable as backup
neo4j_2_1  | 2016-12-04 04:33:04.219+0000 INFO  Instance 2 (this server)  is unavailable as backup
...
neo4j_2_1  | 2016-12-04 04:33:06.048+0000 INFO  Copying schema\label\lucene\labelStore\1\segments_1
neo4j_2_1  | 2016-12-04 04:33:06.626+0000 INFO  Copied schema\label\lucene\labelStore\1\segments_1 71.00 B
neo4j_2_1  | 2016-12-04 04:33:06.637+0000 INFO  Done, copied 17 files
neo4j_1_1  | 2016-12-04 04:33:07.753+0000 INFO  Remote interface available at http://0.0.0.0:7474/
neo4j_2_1  | 2016-12-04 04:33:10.236+0000 INFO  Finished copying store from master
neo4j_2_1  | 2016-12-04 04:33:10.307+0000 INFO  Checking store consistency with master
...
neo4j_2_1  | 2016-12-04 04:33:16.086+0000 DEBUG Mounting servlet at [/db/manage]
neo4j_2_1  | 2016-12-04 04:33:16.126+0000 DEBUG Mounting servlet at [/db/data]
neo4j_2_1  | 2016-12-04 04:33:16.182+0000 DEBUG Mounting servlet at [/]
neo4j_2_1  | 2016-12-04 04:33:17.887+0000 INFO  Remote interface available at http://0.0.0.0:7474/

One of the unexpected things with docker-compose is that the output from all three servers is aggregated, so you can easily see what happens when a node joins the cluster.

Simulating a cluster node failure

We can easily emulate a cluster node failure by stopping one of the containers.

PS> docker ps
CONTAINER ID        IMAGE                    COMMAND                  CREATED             STATUS              PORTS                               NAMES
4efbfaf7683b        neo4j_enterprise:3.0.6   "powershell.exe C:..."   6 minutes ago       Up 6 minutes        1337/tcp, 7473-7474/tcp, 7687/tcp   neo4jwindowscontainers_neo4j_3_1
19941e8d0ea1        neo4j_enterprise:3.0.6   "powershell.exe C:..."   6 minutes ago       Up 6 minutes        1337/tcp, 7473-7474/tcp, 7687/tcp   neo4jwindowscontainers_neo4j_2_1
fa179e91dab3        neo4j_enterprise:3.0.6   "powershell.exe C:..."   6 minutes ago       Up 6 minutes        1337/tcp, 7473-7474/tcp, 7687/tcp   neo4jwindowscontainers_neo4j_1_1
PS> docker stop neo4jwindowscontainers_neo4j_1_1
neo4jwindowscontainers_neo4j_1_1
PS>

In the docker-compose output we see:

neo4jwindowscontainers_neo4j_1_1 exited with code 1067
neo4j_2_1  | 2016-12-04 04:38:16.907+0000 INFO  Instance 1  has failed
neo4j_2_1  | 2016-12-04 04:38:17.009+0000 INFO  Instance 2 (this server)  was elected as coordinator
neo4j_2_1  | 2016-12-04 04:38:17.031+0000 INFO  I am 2, moving to master
neo4j_2_1  | 2016-12-04 04:38:17.347+0000 INFO  I am 2, successfully moved to master
neo4j_2_1  | 2016-12-04 04:38:17.371+0000 INFO  Instance 2 (this server)  is available as master at ha://192.168.200.202:6001?serverId=2 with StoreId{creationTime=1480825964492, randomId=1207142648701028915, storeVersion=15531981201765894, upgradeTime=1480825964492, upgradeId=1}
neo4j_2_1  | 2016-12-04 04:38:18.309+0000 INFO  Instance 2 (this server)  is available as backup at backup://127.0.0.1:6362 with StoreId{creationTime=1480825964492, randomId=1207142648701028915, storeVersion=15531981201765894, upgradeTime=1480825964492, upgradeId=1}

The container stopped and then neo4j_2_1 was elected as the new master.
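
Because docker-compose prefixes every line with the service name, the aggregated output is easy to filter for master changes. The following Python sketch is an assumption-laden illustration: the function and pattern names are mine, and the regular expression is based only on the log lines shown in this post.

```python
import re

# Matches lines such as:
# neo4j_2_1  | 2016-12-04 04:38:17.347+0000 INFO  I am 2, successfully moved to master
MOVED_TO_MASTER = re.compile(
    r"^(?P<service>\S+)\s+\|\s+(?P<timestamp>\S+ \S+)\s+INFO\s+"
    r"I am (?P<server_id>\d+), successfully moved to master"
)


def master_changes(lines):
    """Yield (timestamp, service, server_id) each time an instance becomes master."""
    for line in lines:
        match = MOVED_TO_MASTER.match(line)
        if match:
            yield (
                match.group("timestamp"),
                match.group("service"),
                int(match.group("server_id")),
            )
```

Run over the output above, it would report instance 1 becoming master at startup and instance 2 taking over after the container was stopped.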

We can then restart the failed node:

PS> docker restart neo4jwindowscontainers_neo4j_1_1
neo4jwindowscontainers_neo4j_1_1
PS>

In the docker-compose output we see:

neo4j_1_1  | Container IP Address is 192.168.200.201
neo4j_1_1  | Detected environment variable NEO4J_dbms_mode
neo4j_1_1  | Detected environment variable NEO4J_ha_serverId
neo4j_1_1  | Detected environment variable NEO4J_ha_initialHosts
neo4j_1_1  | VERBOSE: Neo4j Root is 'C:\neo4j'
neo4j_1_1  | VERBOSE: Neo4j Server Type is 'Enterprise'
neo4j_1_1  | VERBOSE: Neo4j Version is '3.0.6'
...
neo4j_1_1  | 2016-12-04 04:32:50.963+0000 INFO  Instance 1 (this server)  joined the cluster
neo4j_1_1  | 2016-12-04 04:32:51.256+0000 INFO  Instance 1 (this server)  was elected as coordinator
neo4j_1_1  | 2016-12-04 04:32:51.488+0000 INFO  Instance 1 (this server)  was elected as coordinator
neo4j_1_1  | 2016-12-04 04:32:51.614+0000 INFO  I am 1, moving to master
neo4j_1_1  | 2016-12-04 04:32:52.112+0000 INFO  Instance 3  joined the cluster
neo4j_1_1  | 2016-12-04 04:32:52.568+0000 INFO  I am 1, successfully moved to master
...
neo4j_1_1  | 2016-12-04 04:33:07.753+0000 INFO  Remote interface available at http://0.0.0.0:7474/
neo4j_1_1  | 2016-12-04 04:33:10.663+0000 INFO  Instance 2  is available as slave at ha://192.168.200.202:6001?serverId=2 with StoreId{creationTime=1480825964492, randomId=1207142648701028915, storeVersion=15531981201765894, upgradeTime=1480825964492, upgradeId=1}

The container restarted and was re-elected as master.

Cleaning up

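Once we have finished with our containers, simply press Ctrl-C and docker-compose will stop all containers. Running docker-compose down will then clean up the containers and any networks that docker-compose created (volumes are only removed if you pass the -v flag). It will not remove images though, and external networks are left alone.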

Gracefully stopping... (press Ctrl+C again to force)
Stopping neo4jwindowscontainers_neo4j_3_1 ... done
Stopping neo4jwindowscontainers_neo4j_2_1 ... done
Stopping neo4jwindowscontainers_neo4j_1_1 ... done
PS> docker-compose down
Removing neo4jwindowscontainers_neo4j_3_1 ... done
Removing neo4jwindowscontainers_neo4j_2_1 ... done
Removing neo4jwindowscontainers_neo4j_1_1 ... done
Network TransparentNetwork is external, skipping
PS>

Conclusion

After a bit of a battle with networking, I can now quickly spin up a three node Neo4j cluster, with an arbiter instance, all on Windows and then destroy it easily when I’m done!

Thanks to Ben Butler-Cole and Mark Needham for helping me out with some Neo4j issues.
