bonding, bridging, and port density

Today is the day. You finally got 10GbE networking! Boy, is it expensive though. The price-per-port is very high, $500-$1000 on the low end. Add to that, “production-grade” deployments usually need to be highly available. This raises the question: how can we best take advantage of this costly networking?

Enter bonding, also known as NIC teaming, channel teaming, or link aggregation. Bonding can solve the high-availability problem with several different modes of operation. The most common ones are active-backup and LACP. As far as making full use of ports goes, active-backup is the worst: it provides only high availability, leaving an entire port unused until a failure happens. Given the price-per-port, this is not ideal. A better option is LACP, since it will use both interfaces to send and receive. But that requires switch support and carries protocol overhead, so bonding two interfaces with LACP does not yield 2x the bandwidth.

For me, the solutions above were less than ideal. I have a home lab with a small 8-port 10GbE switch (XS708E) and every port counts. I have three servers with two 10GbE NICs each (Intel X540-T2). I simply can’t afford (figuratively and literally) to let a single interface sit unused with active-backup bonding, and LACP support on the switch was difficult to use and configure. This led me to using all of my ports without high availability, without bonding of any kind. But I had an idea….

I have figured out a way to have each interface essentially in two active-backup bonds. This allows me to use each interface 100% without affecting the other unless an interface has failed, at which point the traffic is merged together into a single physical port. It looks like this:


In this example I have two physical interfaces, eth2 and eth3. The rest of the interfaces we will be creating now.

I start by creating two bridges:

# ip l a br0 type bridge
# ip l a br1 type bridge

I then create two bonds:

# ip l a bond0 type bond mode active-backup
# ip l a bond1 type bond mode active-backup

I then have to create my veth pairs for linking the bonds to the bridges.

# ip l a veth00 type veth peer name veth01
# ip l a veth10 type veth peer name veth11

Finally, we need to plug in all the veth pairs and add our interfaces to the bonds.

# ifenslave bond0 eth2 veth00
# ifenslave bond1 eth3 veth10
# brctl addif br0 bond0
# brctl addif br0 veth11
# brctl addif br1 bond1
# brctl addif br1 veth01
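Before testing, remember that nothing forwards until every device in the chain is administratively up. A minimal sketch using the interface names from this example:

```shell
# Bring up the physical NICs, bonds, bridges, and both ends of each veth pair
for dev in eth2 eth3 bond0 bond1 br0 br1 veth00 veth01 veth10 veth11; do
    ip link set "$dev" up
done
```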

And there you have it. Fancy networking. You’ll want to address the bridges as if they are the physical interfaces. In my case, br0 == eth2, and br1 == eth3. Things to check on if you are having issues:

  • bonds are in active-backup mode
  • all bonds, interfaces, bridges, and veth pairs are in state UP
  • the physical interface is set as the primary interface in each bond
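The last two checklist items can be verified and, if needed, set through the bonding driver's sysfs interface. A sketch for bond0 (repeat for bond1 with eth3):

```shell
# The mode can only be changed while the bond is down and has no members
echo active-backup > /sys/class/net/bond0/bonding/mode
# Make the physical NIC the primary member so the veth link is only
# used after a failure
echo eth2 > /sys/class/net/bond0/bonding/primary
cat /sys/class/net/bond0/bonding/primary    # should print eth2
```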

I am not sure whether I will stick with this configuration in the long run, but that is what I have been running for a little while now and it is working as well as I could hope. There is no noticeable delay when either interface drops and the failover occurs as it would in a normal bonding setup (polling defaults to 500ms). There is also no measurable performance degradation when failing over and having to traverse two bridges. Overall, I am very happy with the entire setup.

Bonus: Here are the commands used to implement this in openvswitch. One caveat is that there is no way I could find to enforce a default or primary bond member, so you must run the `ovs-appctl` command to set the active slave in the bond after a failover. I solved this with a simple timer/cron job in systemd set to run every minute. Not ideal, but certainly functional.

# ovs-vsctl add-br br0
# ovs-vsctl add-br br1
# ovs-vsctl add-bond br0 bond0 eth2 veth00 -- set port bond0 bond_mode=active-backup -- set interface veth00 type=patch options:peer=veth01
# ovs-vsctl add-bond br1 bond1 eth3 veth10 -- set port bond1 bond_mode=active-backup -- set interface veth10 type=patch options:peer=veth11
# ovs-vsctl add-port br0 veth11 -- set interface veth11 type=patch options:peer=veth10
# ovs-vsctl add-port br1 veth01 -- set interface veth01 type=patch options:peer=veth00
# ovs-appctl bond/set-active-slave bond0 eth2
# ovs-appctl bond/set-active-slave bond1 eth3

letsencrypt, haproxy, and auto-renewal

If you have not heard about letsencrypt, it is an amazing, and free, certificate authority. It provides free (as in beer) ssl certificates to anyone who can prove they own the domain. There is a little helper utility called certbot that you can use to get a cert. The concept is pretty simple; a quick breakdown looks like this:

  1. Client says to Server “I want a cert for x.y.z domain”
  2. Server says verify you own this domain by serving file “/.well-known/acme-challenge/1234567890abcdef” from x.y.z domain
  3. Client sets up the file as requested
  4. Server verifies file exists
  5. Server issues certificate for x.y.z domain

I have glossed over the massive amount of research and security involved in doing all of this, but that is the general concept. Of note, the certificate is only valid for 3 months and cannot be a wildcard cert.

Now let’s talk about the issue. Once you receive your certificate and put it to use on your webserver or application, you are consuming the ports letsencrypt expects to use, namely 80 and 443. How do you renew the certificate without stopping those services? Enter haproxy. If you happen to be loadbalancing through haproxy, you are in luck! You can host your site _and_ still do proper renewals with no downtime. The way it works is quite simple: haproxy can check certain things about the request and trigger conditions based on that. In this case we will be testing whether the URI begins with “/.well-known/acme-challenge”. If it does, we know to forward that request to our certbot client. Here is how the haproxy config looks.

frontend ssl_redirector
    bind ssl crt /etc/haproxy/ssl/
    http-request del-header X-Forwarded-Proto
    http-request set-header X-Forwarded-Proto https if { ssl_fc }

    # Check if this is a letsencrypt request based on URI
    acl letsencrypt-request path_beg -i /.well-known/acme-challenge/
    # Send to letsencrypt-backend if it is a letsencrypt-request
    use_backend letsencrypt_backend if letsencrypt-request

    default_backend website_backend

frontend http_redirect
    # Check if this is a letsencrypt request based on URI
    acl letsencrypt-request path_beg -i /.well-known/acme-challenge/
    # Redirect to HTTPS if this is not a letsencrypt-request
    redirect scheme https code 301 if !letsencrypt-request

    # Send to letsencrypt-backend if it is a letsencrypt-request
    use_backend letsencrypt_backend if letsencrypt-request

backend letsencrypt_backend
    server letsencrypt

backend website_backend
    server server01
    server server02

Let’s go through each section. The first section, ‘ssl_redirector’, listens on the public IP on port 443. It has all the certs it can serve in /etc/haproxy/ssl/. It sets X-Forwarded-Proto to https (some applications may require this). The next part is the meat of the issue we are solving: the acl checks whether the path begins with “/.well-known/acme-challenge/” and, if it does, sends the request to the ‘letsencrypt_backend’ section. All of this is the same for the ‘http_redirect’ section. If haproxy doesn’t detect anything letsencrypt related, it forwards the request to one of the ‘website_backend’ servers like normal.
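A quick way to sanity-check the routing from outside (example.com and the token path here are placeholders):

```shell
# A challenge-path request should be proxied through to certbot (a 404 is
# normal when no challenge is active); any other path should get the 301
curl -s -o /dev/null -w '%{http_code}\n' http://example.com/.well-known/acme-challenge/test
curl -s -o /dev/null -w '%{http_code}\n' http://example.com/
```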

So that’s it. Haproxy will now detect and forward letsencrypt requests to a server located at “”. Now it’s time to set up that server.

I wrote a little bash script to do a cert renewal using a docker container I created. The docker container is samyaple/certbot, based on the GitHub repo SamYaple/certbot. It builds automatically on DockerHub when I push changes to the GitHub repo, which is pretty sweet. That’s a subject for another time, though. The script I use for autorenewing is here. I’ve left comments throughout the script to explain why certain code gets run.


#!/bin/bash
set -o errexit

# The domain is passed as the first argument
FQDN=$1

# This should only run when fetching a new cert
# NOTE: the -p port mapping was reconstructed to match the challenge port;
# adjust it to whatever port haproxy forwards challenges to
function http_failback {
    docker run --rm -v /etc/letsencrypt:/etc/letsencrypt -p 49494:49494 samyaple/certbot:v0.8.1 --standalone --standalone-supported-challenges http-01 --http-01-port 49494 -d ${FQDN}
}

function fetch_certs {
    # If SNI fails, fail back to http authorization
    docker run --rm -v /etc/letsencrypt:/etc/letsencrypt -p 49494:49494 samyaple/certbot:v0.8.1 --standalone --standalone-supported-challenges tls-sni-01 --tls-sni-01-port 49494 -d ${FQDN} || http_failback
}

function install_certs {
    # Only install the cert if certbot actually produced one
    if [[ -e "/etc/letsencrypt/live/${FQDN}/fullchain.pem" ]]; then
        cat /etc/letsencrypt/live/${FQDN}/{fullchain.pem,privkey.pem} > /etc/haproxy/ssl/${FQDN}.pem
    fi
}

fetch_certs
install_certs

systemctl reload haproxy

You execute this script with the parameter of your domain name and you are golden. It should create/renew your ssl cert and then reload haproxy. This can be made into a cronjob or simply run every 2 months to ensure renewal before the cert expires.
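As an example, a crontab entry along these lines (script name and domain are placeholders) renews well inside the 3-month validity window:

```shell
# m h dom mon dow  command -- run at 03:00 on the 1st of every other month
0 3 1 */2 * /usr/local/bin/renew-cert.sh example.com
```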

Running the script will produce a file that haproxy will be able to use. Magic!

bcache, partitions, and DKMS

bcache is a fantastic way to speed up your system. The idea is simple: You have a fast-but-small SSD and a slow-but-large HDD and you want to cache the HDD with the SSD for better performance. This is what bcache was designed for and it does its job very, very well.

It has been in the kernel since 3.10 and according to the author, Kent Overstreet, it has been stable since 2012. I have personally been using it since it was announced as “stable” and haven’t ever had any data corruption issues with it. I even wrote a how-to on it a few years back. I highly recommend it and I am actually a bit shocked it hasn’t gotten more attention from the community at large. My guess is that the reason for that is it requires you to reformat your disk (or use a tool to convert the disk by shuffling metadata blocks) and this turns some people off of the idea.

One of the pain points I just ran into, at the time of this writing, is that bcache devices are not set up to allow partitions. This is because the driver allocates only one minor number per device when it is created in the kernel. Normally this would be OK; you could use tools like kpartx to generate a device-mapper target for a partition. But in this case I needed a true partition, predictable and autogenerated by the kernel, for use with some existing tooling that expected to take a raw block device, create a partition table, and then address the partitions on it. That tool was ceph-disk, which I was using while playing around with the new bluestore OSD backend in Jewel.

The fix for this only-allocates-one-minor issue is really quite simple: two lines of code. The real issue is compiling the new module and using it. I long ago stopped rolling my own kernel, and nowadays I only do so when I am testing a new feature (like bcachefs). That leaves me in a situation where even if I did patch this partition “issue” upstream, I wouldn’t be able to consume it on a stable kernel for a few years.

DKMS saves the day! You have likely had some experience with DKMS in the past, perhaps when compiling zfs.ko or some proprietary graphics module. It is normally a smooth procedure and you don’t even notice you are actually compiling anything. Let’s start by installing dkms and the headers for your target kernel.

# apt-get install dkms linux-headers-$(uname -r)

Thanks to DKMS we can build a new, updated bcache module with the small changes needed to support partitions and use it with your preferred packages and stable kernel. The process was pleasantly simple. Since bcache is an in-tree Linux kernel module, we need to start by grabbing the source for the running kernel. On Debian-based systems this can be done with the following command:

# apt-get install linux-source

Now you will see the source tarball for your particular kernel in /usr/src/, which we need to extract. After extraction we copy the bcache source files to a new directory for dkms to use. The version number here is made up; you should be able to use anything. I chose the kernel version I was working with.

# cd /usr/src
# tar xvf linux-source-3.16.tar.xz
# mkdir bcache-3.16
# cp -av linux-source-3.16/drivers/md/bcache/* bcache-3.16/

At this point we have copied all we need out of the kernel source tree for this particular module. If you are adapting these instructions for a different module you may need additional files from other locations. Now that we have the source, we can go ahead and make our changes to the code. Here is a patch of the two lines I have changed. Like I said, a very simple code change.

# diff -up a/super.c b/super.c
--- a/super.c   2016-03-31 21:03:25.189901913 +0000
+++ b/super.c   2016-03-31 21:03:08.205513288 +0000
@@ -780,9 +780,10 @@ static int bcache_device_init(struct bca
        minor = ida_simple_get(&bcache_minor, 0, MINORMASK + 16, GFP_KERNEL);
        if (minor < 0)
                return minor;
+        minor = minor * 16;
        if (!(d->bio_split = bioset_create(4, offsetof(struct bbio, bio))) ||
-           !(d->disk = alloc_disk(1))) {
+           !(d->disk = alloc_disk(16))) {
                ida_simple_remove(&bcache_minor, minor);
                return -ENOMEM;

Now we need to create a dkms.conf file and use dkms to build and install our new module. Finally, we update our initramfs to pull in the new module on boot.

# cat << 'EOF' > bcache-3.16/dkms.conf
# Minimal example dkms.conf; the MAKE/CLEAN lines may need adjusting
PACKAGE_NAME="bcache"
PACKAGE_VERSION="3.16"
BUILT_MODULE_NAME[0]="bcache"
DEST_MODULE_LOCATION[0]="/updates"
MAKE[0]="make -C $kernel_source_dir M=$dkms_tree/$PACKAGE_NAME/$PACKAGE_VERSION/build CONFIG_BCACHE=m modules"
CLEAN="make -C $kernel_source_dir M=$dkms_tree/$PACKAGE_NAME/$PACKAGE_VERSION/build clean"
AUTOINSTALL="yes"
EOF
# dkms add -m bcache -v 3.16
# dkms build -m bcache -v 3.16
# dkms install -m bcache -v 3.16
# update-initramfs -u -k all

And there you have it. We are now using our version of bcache with a slight twist, without a custom kernel! Here are the fruits of our labor, bcache0p1:

# ls -lh /dev/bcache*
brw-rw---- 1 root disk 254,  0 Mar 31 20:17 /dev/bcache0
brw-rw---- 1 root disk 254,  1 Mar 31 20:17 /dev/bcache0p1
brw-rw---- 1 root disk 254, 16 Mar 31 20:17 /dev/bcache16
brw-rw---- 1 root disk 254, 17 Mar 31 20:17 /dev/bcache16p1
brw-rw---- 1 root disk 254, 32 Mar 31 20:17 /dev/bcache32
brw-rw---- 1 root disk 254, 33 Mar 31 20:17 /dev/bcache32p1
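With the extra minors available, the kernel registers partitions on bcache devices like any other disk, so standard tooling just works. For example, partitioning one of them:

```shell
# Create a GPT label and one partition spanning the device; the kernel
# then registers /dev/bcache0p1 automatically
parted /dev/bcache0 -s -- mklabel gpt mkpart primary 1 -1
partprobe /dev/bcache0
```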

OpenStack Neutron, OpenVSwitch, and Jumbo frames

MTU has always been a touchy subject in Neutron. Who manages it? Should instances have info about the underlying infrastructure? Should this all be on the operator to configure properly? Luckily these questions appear to have been answered for the most part in the Mitaka release of OpenStack. This bug for OpenVSwitch (and this one for linuxbridge) more or less solve this issue for us.

Now we can configure MTU in Neutron and Neutron will be intelligent about how to use it. My end goal is to have an infrastructure with a 9000 MTU, while the instances themselves can live with the standard 1500 MTU. To achieve that I did have to pull in one patch early, though it looks like it will make it into Neutron’s mitaka-rc2 release. The patch applies the global_physnet_mtu value to br-int and br-tun so the operator doesn’t have to. Beyond that, it was all just a matter of Neutron config options, which is fantastic!

Here are the changes I had to make to properly get Neutron using my larger 9000 MTU without my intervention.

# /etc/neutron/neutron.conf
global_physnet_mtu = 9000

# /etc/neutron/plugins/ml2/ml2_conf.ini
physical_network_mtus = physnet1:9000
# The default value for path_mtu is 1500, if you want your instances to have
# larger mtus you should adjust this to <= global_physnet_mtu
path_mtu = 1500
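As a sanity check on those numbers: for tunnelled networks, Neutron derives the instance-visible MTU by subtracting the encapsulation overhead from path_mtu. For VXLAN over IPv4 that overhead is 50 bytes (outer IP 20 + UDP 8 + VXLAN 8 + inner Ethernet 14):

```shell
# Instances on a VXLAN network backed by path_mtu=1500 see a 1450-byte MTU
path_mtu=1500
vxlan_overhead=50
instance_mtu=$(( path_mtu - vxlan_overhead ))
echo "${instance_mtu}"    # 1450
```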

That was it! My interfaces were properly configured.

160: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN group default 
    link/ether a0:36:9f:67:32:c6 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::a236:9fff:fe67:32c6/64 scope link tentative 
       valid_lft forever preferred_lft forever
161: br-int: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN group default 
    link/ether 66:fd:95:70:37:4a brd ff:ff:ff:ff:ff:ff
    inet6 fe80::64fd:95ff:fe70:374a/64 scope link tentative 
       valid_lft forever preferred_lft forever
162: br-tun: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN group default 
    link/ether 76:37:eb:d7:3b:48 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::7437:ebff:fed7:3b48/64 scope link tentative 
       valid_lft forever preferred_lft forever

A shoutout to Sean Collins (sc68cal) for fixing the br-int/br-tun issue and Armando Migliaccio (armax) for targeting the bug for inclusion in mitaka-rc2 if all goes well!

Glance, Ceph, and raw images

If you are a user of Glance and Ceph right now, you know that using raw images is a requirement for Copy-on-Write cloning (CoW) to the nova or cinder pools. Without that, a thick conversion must occur, which wastes time, bandwidth, and IO. The problem is that raw images can take up quite a bit of space even if the image is _all_ zeros, because raw images cannot be sparse images. This becomes even more visible when doing a non-CoW clone from Cinder or Nova to Glance, whose volumes can be quite large.

Enter fstrim. For the uninitiated, fstrim tells the underlying storage which blocks are no longer in use, so that the space can be reclaimed.

Basically the idea is we want to tell that image to drop all of the unused space and thus reclaim that space on the Ceph cluster itself. Please note, this only works on images that do not have any CoW snapshots on them. So if you do have anything booted/cloned from that image you’ll need to flatten or remove those instances/volumes.
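If you are unsure whether an image still has dependents, rbd can list the clones of its protected snapshot; a sketch with a placeholder image name:

```shell
# List CoW children of the glance image snapshot; each child must be
# flattened (or deleted) before the snapshot can be unprotected
rbd -p glance children <image-id>@snap
# For each child listed, detach it from the parent:
# rbd flatten <pool>/<child-image>
```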

We will start with a “small” 80GiB raw image and upload that to Glance.

# ls -l 
total 82416868
-rw-r--r-- 1 root root 85899345920 Mar 21 15:34 ubuntu-trusty-80GB.raw
# openstack image create --disk-format raw --container-format bare --file ubuntu-trusty-80GB.raw ubuntu-trusty-80GB
| Field            | Value                                                                                                    |
| checksum         | 9ca30159fcb4bb48fbdf876493d11677                                                                         |
| container_format | bare                                                                                                     |
| created_at       | 2016-03-21T15:38:50Z                                                                                     |
| disk_format      | raw                                                                                                      |
| file             | /v2/images/bb371f84-2a61-47a0-ab22-f4dbf8467070/file                                                     |
| id               | bb371f84-2a61-47a0-ab22-f4dbf8467070                                                                     |
| min_disk         | 0                                                                                                        |
| min_ram          | 0                                                                                                        |
| name             | ubuntu-trusty-80GB                                                                                       |
| owner            | 762565b94f314ec6b370d978db902a78                                                                         |
| properties       | direct_url='rbd://20d283ba-5a51-49b0-9be7-9220bcc9afd0/glance/bb371f84-2a61-47a0-ab22-f4dbf8467070/snap' |
| protected        | False                                                                                                    |
| schema           | /v2/schemas/image                                                                                        |
| size             | 85899345920                                                                                              |
| status           | active                                                                                                   |
| tags             |                                                                                                          |
| updated_at       | 2016-03-21T15:53:54Z                                                                                     |
| virtual_size     | None                                                                                                     |
| visibility       | private                                                                                                  |

NOTE: If you are running with a cache tier you will need to evict the cache to get the proper output from the command below. It is not a requirement to evict the cache to make this work, just if you want valid output from the command below.

# rados -p glance-cache cache-flush-evict-all

Now if we look at Ceph we should see it consuming 80GiB of space as well.

# rbd -p glance info bb371f84-2a61-47a0-ab22-f4dbf8467070
rbd image 'bb371f84-2a61-47a0-ab22-f4dbf8467070':
 size 81920 MB in 10240 objects
 order 23 (8192 kB objects)
 block_name_prefix: rbd_data.10661586915
 format: 2
 features: layering, striping
 stripe unit: 8192 kB
 stripe count: 1
# rados -p glance ls | grep rbd_data.10661586915 | wc -l
10240

Sure enough, 10240 * 8MiB per object equals 80GiB. So Ceph is clearly using 80GiB of data on this one image, but it doesn’t have to be!
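That multiplication, spelled out:

```shell
# 10240 objects at 8 MiB per object, converted to GiB
objects=10240
object_mib=8
echo "$(( objects * object_mib / 1024 )) GiB"    # 80 GiB
```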

First step is to remove the snapshot associated with the glance image. This is the reason you cannot have any CoW clones based on this image.

# rbd -p glance snap unprotect bb371f84-2a61-47a0-ab22-f4dbf8467070@snap                                                                                                                                                                                                                    
# rbd -p glance snap rm bb371f84-2a61-47a0-ab22-f4dbf8467070@snap

The process to reduce this usage is to map the rbd to a linux host and mount the filesystem. At that point we can run fstrim on the filesystem(s) and tell Ceph it is ok to free up some space. Finally, we need to re-snapshot it so that moving forward Glance is CoW cloning the sparse rbd. Those steps are as follows:

# rbd -p glance map bb371f84-2a61-47a0-ab22-f4dbf8467070
# mount /dev/rbd0p1 /mnt/
# df -h /mnt/
Filesystem      Size  Used Avail Use% Mounted on
/dev/rbd0p1      79G  780M   74G   2% /mnt
# fstrim /mnt/
# umount /mnt/
# /usr/bin/rbd unmap /dev/rbd0

If you are running a cache tier now would be the time to evict your cache again.

# rbd -p glance info bb371f84-2a61-47a0-ab22-f4dbf8467070                                                                                                                                                                                                                 
rbd image 'bb371f84-2a61-47a0-ab22-f4dbf8467070':
        size 81920 MB in 10240 objects
        order 23 (8192 kB objects)
        block_name_prefix: rbd_data.10661586915
        format: 2
        features: layering, striping
        stripe unit: 8192 kB
        stripe count: 1
# rados -p glance ls | grep rbd_data.10661586915 | wc -l
895

And there it is! Now we are only using 895 * 8MiB objects (~7GiB). That is a 90% reduction in usage. Now why are we using 7GiB and not 780M as the df output shows? fstrim is a quick tool, not a perfect one. If you want even more efficiency you can use zerofree, or write out a large file full of zeros on the filesystem and delete it before running fstrim. That will further reduce the size of this RBD.
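The zero-and-delete trick mentioned above, applied while the filesystem is still mounted, looks something like this:

```shell
# Fill free space with zeros, then remove the file; dd exits non-zero once
# the filesystem is full, which is expected here
dd if=/dev/zero of=/mnt/zerofill bs=1M || true
rm /mnt/zerofill
sync
fstrim /mnt/
```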

Finally, to make this image usable to glance again you need to recreate and protect the snapshot we removed previously.

# rbd -p glance snap create bb371f84-2a61-47a0-ab22-f4dbf8467070@snap
# rbd -p glance snap protect bb371f84-2a61-47a0-ab22-f4dbf8467070@snap

All in all, this can be done fairly quickly and reduces usage a great deal in some cases. If you have glance and ceph and large images, this may be something to consider doing.

Deploying OpenStack Mitaka with Kolla, Docker, and Ansible

It’s true, Mitaka is not quite released yet. That said, these instructions haven’t changed since Liberty and will stay relevant once Mitaka is officially tagged.

The requirements and steps to build Kolla images are covered in the Kolla documentation. Those steps have already been done, and the Docker images exist in my private registry.

A bit about my environment before we begin.

3 identical custom servers with the following specs:

These servers are interconnected at 10Gb using a Netgear XS708E switch. I have one 10Gb interface (eth3) dedicated to VM traffic for Neutron. The other is in a bond (bond0) with one of my 1Gb NICs for HA.

I will be deploying ceph, haproxy, keepalived, rabbitmq, mariadb w/ galera, and memcached alongside the other OpenStack services with Kolla. To start, we need to do some prep work on the physical disks so that Kolla picks them up in the ceph bootstrap process. This is also the same procedure you would use to add new disks in the future.

The disks I will be using are /dev/sde, /dev/sdf, and /dev/sdg, with external journals on my pcie ssd located at /dev/nvme0n1. I will also be setting up an OSD on the ssd to use as a cache tier with ceph.

In order for the bootstrap process to tie the appropriate devices together, we use GPT partition names. For /dev/sde I create a fresh partition table with a new partition labeled KOLLA_CEPH_OSD_BOOTSTRAP_1. This explicit naming is so Kolla never, ever messes with a disk it shouldn’t touch.

# parted /dev/sde -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP_1 1 -1
# parted /dev/sde print
Model: ATA ST4000DM000-1F21 (scsi)
Disk /dev/sde: 4001GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags:

Number Start End Size File system Name Flags
1 1049kB 4001GB 4001GB btrfs KOLLA_CEPH_OSD_BOOTSTRAP_1

The same method is used for /dev/sdf and /dev/sdg, but with labels KOLLA_CEPH_OSD_BOOTSTRAP_2 and KOLLA_CEPH_OSD_BOOTSTRAP_3 respectively. Now we have to setup the external journals for each of those OSDs (you can co-locate the journals as well by using the label KOLLA_CEPH_OSD_BOOTSTRAP).

The external journal labels are simply the bootstrap label with ‘_J’ appended. For example, the journal for /dev/sde would be KOLLA_CEPH_OSD_BOOTSTRAP_1_J. Once those labels are in place, the Kolla bootstrap process will happily set up ceph on those disks. If you mess up any of the labels, all that happens is the Kolla bootstrap won’t pick up those disks, and you can rerun the playbooks after correcting the issue.
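Sketched out for my three OSDs (the NVMe partition boundaries here are illustrative, not the exact sizes I used):

```shell
# One journal partition per OSD on the shared NVMe device, labeled by
# appending _J to the matching bootstrap label
parted /dev/nvme0n1 -s -- mkpart KOLLA_CEPH_OSD_BOOTSTRAP_1_J 100GB 110GB
parted /dev/nvme0n1 -s -- mkpart KOLLA_CEPH_OSD_BOOTSTRAP_2_J 110GB 120GB
parted /dev/nvme0n1 -s -- mkpart KOLLA_CEPH_OSD_BOOTSTRAP_3_J 120GB 130GB
```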

The final look of my disks with the cache tier osd and journals is as follows:

Model: ATA ST4000DM000-1F21 (scsi)
Disk /dev/sde: 4001GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags: 

Number Start End Size File system Name Flags
 1 1049kB 4001GB 4001GB btrfs KOLLA_CEPH_OSD_BOOTSTRAP_1

Model: ATA ST4000DM000-1F21 (scsi)
Disk /dev/sdf: 4001GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags: 

Number Start End Size File system Name Flags
 1 1049kB 4001GB 4001GB btrfs KOLLA_CEPH_OSD_BOOTSTRAP_2

Model: ATA ST4000DM000-1F21 (scsi)
Disk /dev/sdg: 4001GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags: 

Number Start End Size File system Name Flags
 1 1049kB 4001GB 4001GB btrfs KOLLA_CEPH_OSD_BOOTSTRAP_3

Model: Unknown (unknown)
Disk /dev/nvme0n1: 400GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: 

Number Start End Size File system Name Flags
 1 1049kB 100GB 100GB btrfs docker

Next up is configuring my inventory. Normally you won’t need to configure more than the first 4 sections of your inventory if you have copied it from ansible/inventory/multinode, and that is all I have changed in this case as well. My inventory for my three hosts, ubuntu1, ubuntu2, and ubuntu3, is as follows:

# /etc/kolla/inventory





Once that was finished I modified my globals.yml for my environment. The final result, with all options that I configured, is below (comment sections removed for brevity).

config_strategy: "COPY_ALWAYS"
kolla_base_distro: "ubuntu"
kolla_install_type: "source"
kolla_internal_vip_address: ""
kolla_internal_fqdn: ""
kolla_external_vip_address: ""
kolla_external_fqdn: ""
kolla_external_vip_interface: "bond0.10"
kolla_enable_tls_external: "yes"
kolla_external_fqdn_cert: "/etc/kolla/haproxy.pem"
docker_registry: ""
network_interface: "bond0.10"
tunnel_interface: "bond0.200"
neutron_external_interface: "eth3"
openstack_logging_debug: "True"
enable_ceph: "yes"
enable_cinder: "yes"
ceph_enable_cache: "yes"
enable_ceph_rgw: "yes"
ceph_osd_filesystem: "btrfs"
ceph_osd_mount_options: "defaults,compress=lzo,noatime"
ceph_cinder_pool_name: "cinder"
ceph_cinder_backup_pool_name: "cinder-backup"
ceph_glance_pool_name: "glance"
ceph_nova_pool_name: "nova"

And finally, the /etc/kolla/passwords.yml file. This contains, you guessed it, passwords. At the time of this writing it has very bad defaults of “password” as the password. By the time of the Mitaka release this patch will have merged and you will be able to run kolla-genpwd to populate this file with random passwords and uuids.

Once all of that was completed, I ran the pull playbooks to fetch all of the proper images to the proper hosts with the following command:

# time kolla-ansible -i /etc/kolla/inventory pull
Pulling Docker images : ansible-playbook -i /etc/kolla/inventory -e @/etc/kolla/globals.yml -e @/etc/kolla/passwords.yml -e action=pull /root/kolla/ansible/site.yml 

PLAY [ceph-mon;ceph-osd;ceph-rgw] ********************************************* 

GATHERING FACTS *************************************************************** 
ok: [ubuntu1]
ok: [ubuntu2]
ok: [ubuntu3]

TASK: [common | Pulling kolla-toolbox image] ********************************** 
changed: [ubuntu1]
changed: [ubuntu3]
changed: [ubuntu2]

PLAY RECAP ******************************************************************** 
ubuntu1 : ok=55 changed=36 unreachable=0 failed=0 
ubuntu2 : ok=55 changed=36 unreachable=0 failed=0 
ubuntu3 : ok=55 changed=36 unreachable=0 failed=0 

real 5m2.662s
user 0m8.068s
sys 0m2.780s

After the images were pulled I ran the actual OpenStack deployment where the magic happens. After this point it was all automated (including all the ceph cache tier and galera clustering) and I didn’t have to touch a thing!

# time ~/kolla/tools/kolla-ansible -i /etc/kolla/inventory deploy
Deploying Playbooks : ansible-playbook -i /etc/kolla/inventory -e @/etc/kolla/globals.yml -e @/etc/kolla/passwords.yml -e action=deploy /root/kolla/ansible/site.yml 

PLAY [ceph-mon;ceph-osd;ceph-rgw] ********************************************* 

GATHERING FACTS *************************************************************** 
ok: [ubuntu1]
ok: [ubuntu3]
ok: [ubuntu2]

TASK: [common | Ensuring config directories exist] **************************** 
changed: [ubuntu1] => (item=heka)
changed: [ubuntu2] => (item=heka)
changed: [ubuntu3] => (item=heka)
changed: [ubuntu1] => (item=cron)
changed: [ubuntu2] => (item=cron)
changed: [ubuntu3] => (item=cron)
changed: [ubuntu1] => (item=cron/logrotate)
changed: [ubuntu2] => (item=cron/logrotate)
changed: [ubuntu3] => (item=cron/logrotate)

PLAY RECAP ******************************************************************** 
ubuntu1 : ok=344 changed=146 unreachable=0 failed=0 
ubuntu2 : ok=341 changed=144 unreachable=0 failed=0 
ubuntu3 : ok=341 changed=143 unreachable=0 failed=0 

real 7m32.476s
user 0m48.436s
sys 0m9.584s

And that’s it! OpenStack is deployed and good to go. In my case, I could access horizon with full ssl set up thanks to the haproxy.pem I supplied. With a 7-and-a-half-minute run-time, it is hard to beat the speed of this deployment tool.

Bonus: Docker running containers on ubuntu1 host

# docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
b3a0400b2502 "kolla_start" 8 minutes ago Up 8 minutes horizon
bded510db134 "kolla_start" 8 minutes ago Up 8 minutes heat_engine
388f4c6c1cd3 "kolla_start" 8 minutes ago Up 8 minutes heat_api_cfn
6d73a6aba1e5 "kolla_start" 8 minutes ago Up 8 minutes heat_api
8648565cdc50 "kolla_start" 8 minutes ago Up 8 minutes cinder_backup
73cb05710d46 "kolla_start" 8 minutes ago Up 8 minutes cinder_volume
92c4b7890bb7 "kolla_start" 8 minutes ago Up 8 minutes cinder_scheduler
fec67e07a216 "kolla_start" 8 minutes ago Up 8 minutes cinder_api
d22abb2f75fb "kolla_start" 9 minutes ago Up 9 minutes neutron_metadata_agent
12cd372d0804 "kolla_start" 9 minutes ago Up 9 minutes neutron_l3_agent
6ada0dd5eff6 "kolla_start" 9 minutes ago Up 9 minutes neutron_dhcp_agent
cd89ac90384a "kolla_start" 9 minutes ago Up 9 minutes neutron_openvswitch_agent
4eac98222be5 "kolla_start" 9 minutes ago Up 9 minutes neutron_server
1f44c676f39d "kolla_start" 9 minutes ago Up 9 minutes openvswitch_vswitchd
609adb430b0f "kolla_start" 9 minutes ago Up 9 minutes openvswitch_db
96881dbecf8a "kolla_start" 9 minutes ago Up 9 minutes nova_compute
9c3d58d59f3d "kolla_start" 9 minutes ago Up 9 minutes nova_libvirt
ab09c12c0d4d "kolla_start" 10 minutes ago Up 10 minutes nova_conductor
0d381b7f3757 "kolla_start" 10 minutes ago Up 10 minutes nova_scheduler
58bc728e30ef "kolla_start" 10 minutes ago Up 10 minutes nova_novncproxy
c49c7703bbf0 "kolla_start" 10 minutes ago Up 10 minutes nova_consoleauth
799b7da9fac3 "kolla_start" 10 minutes ago Up 10 minutes nova_api
fd367be42634 "kolla_start" 10 minutes ago Up 10 minutes glance_api
34c69911d5bc "kolla_start" 10 minutes ago Up 10 minutes glance_registry
6adc4580aab3 "kolla_start" 11 minutes ago Up 11 minutes keystone
38e57a6b8405 "kolla_start" 11 minutes ago Up 11 minutes rabbitmq
4e5662f74414 "kolla_start" 12 minutes ago Up 12 minutes mariadb
52d766774cab "kolla_start" 13 minutes ago Up 13 minutes memcached
02c793ecff9f "kolla_start" 13 minutes ago Up 13 minutes keepalived
feaeb72eaca5 "kolla_start" 13 minutes ago Up 13 minutes haproxy
806c4d9f9db8 "kolla_start" 13 minutes ago Up 13 minutes ceph_rgw
fe1ddb781fef "kolla_start" 13 minutes ago Up 13 minutes ceph_osd_9
02d64b83b197 "kolla_start" 13 minutes ago Up 13 minutes ceph_osd_7
82d705e92421 "kolla_start" 13 minutes ago Up 13 minutes ceph_osd_5
dea36b30c249 "kolla_start" 13 minutes ago Up 13 minutes ceph_osd_1
c7c65ad2f377 "kolla_start" 15 minutes ago Up 15 minutes ceph_mon
407bcb0a393f "kolla_start" 15 minutes ago Up 15 minutes cron
b696b905ac23 "/bin/sleep infinity" 15 minutes ago Up 15 minutes kolla_toolbox
ceca142fb3be "kolla_start" 15 minutes ago Up 15 minutes heka