Apple HomeKit Across Subnets with Catalyst 3650

Introduction

I’ve been working on a home automation project that involves a lot of Apple HomeKit accessories, and thought I would share what I learned about the HomeKit protocol, and specifically how I set it up to work across subnet boundaries.

Let’s start with a few basics about Apple HomeKit. HomeKit is Apple’s solution in the home automation market. In a nutshell, it allows you to add a number of different home automation accessories to your home network via WiFi, and then control those accessories with your iOS devices and Apple Watch. If you want to communicate with your accessories while you are off your home network, you will need either an Apple TV or an always-on iPad. Either device will act as a HomeKit hub, and be the “middle man” between you and your accessories when you are off network.

The Problem

HomeKit was designed to be super easy to use, and as such, it was designed to work in networks where the HomeKit accessories, hub (Apple TV/iPad), and clients (iOS devices) are all on the same network. Let’s dig into that a little, so that we can understand why it won’t work across subnet boundaries by default.

HomeKit utilizes something called Bonjour for service discovery. Bonjour is Apple’s implementation of multicast DNS (mDNS). This is a fancy way of saying that Bonjour/mDNS is the mechanism that clients use to discover the HomeKit accessories on the network. Without going into too much detail, basically what happens is this:

  • Accessory comes online and gets an IP address via DHCP
  • Accessory starts its service and advertises itself on the network by publishing several different mDNS records
  • Client devices “see” the advertisements, and do a few different mDNS queries to resolve the IP address of the accessory
  • Client then communicates via IP directly with the accessory

A fantastic article outlining details with a great example is here

So, what’s the problem? The problem is that these mDNS advertisements and subsequent queries are sent to the multicast address 224.0.0.251 on UDP port 5353. Let’s say you want your iOS devices on one VLAN, and you want your HomeKit accessories on another VLAN. Even if you enable multicast routing between subnets, this won’t work. Why? Because 224.0.0.251 is a link-local multicast address. It sits in the 224.0.0.0/24 block, and routers never forward packets destined to that block, regardless of the TTL. This means if your clients are on a different subnet than your accessories, the clients will never see the mDNS advertisements, and their mDNS queries will never reach the accessories. The end result is that the client can’t “see” any devices.
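To make the addressing concrete, here is a minimal Python sketch that builds the kind of mDNS question a client would multicast to browse for HomeKit accessories, and checks that the destination group falls inside the non-routable 224.0.0.0/24 block. The helper names (encode_name, build_ptr_query) are made up for illustration, and the sketch only constructs the packet; it does not send anything on the network.

```python
import ipaddress
import struct

MDNS_GROUP = "224.0.0.251"   # mDNS IPv4 multicast group
MDNS_PORT = 5353             # mDNS UDP port


def encode_name(name: str) -> bytes:
    """Encode a DNS name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("utf-8")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_ptr_query(service_type: str) -> bytes:
    """Build a one-question DNS query for PTR records of service_type."""
    # Header: ID=0, flags=0 (standard query), 1 question, no other sections.
    header = struct.pack("!6H", 0, 0, 1, 0, 0, 0)
    # Question: name, QTYPE=12 (PTR), QCLASS=1 (IN).
    question = encode_name(service_type) + struct.pack("!2H", 12, 1)
    return header + question


packet = build_ptr_query("_hap._tcp.local")

# 224.0.0.0/24 is reserved for traffic that stays on the local link.
link_local_block = ipaddress.ip_network("224.0.0.0/24")
is_link_local = ipaddress.ip_address(MDNS_GROUP) in link_local_block

print(len(packet), is_link_local)  # 33 True
```

A real client would send these bytes in a UDP datagram to 224.0.0.251 port 5353, and that datagram is exactly what never makes it past the router.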

My Goal & Network Environment

Here is what I wanted to achieve:

  • Clients join one SSID that is tied to VLAN 10 and subnet 10.1.10.0/24
  • Apple TV is also on VLAN 10
  • HomeKit accessories join a different SSID specifically for home automation that is tied to VLAN 11 and subnet 10.1.11.0/24

In my network, I have 5 Cisco 3702i lightweight access points, and a Cisco virtual WLC managing them. Additionally, all the access points run in FlexConnect mode and are doing FlexConnect local switching for all WLANs. On the wired side, I am dealing with a Cisco Catalyst 3650 switch.

Solution

What we need to solve this problem is a way to get the mDNS advertisements and queries across subnets. A solution for this is known as an mDNS gateway.

First, I looked at solving this with the WLC. The WLC does indeed implement a nice mDNS gateway. However, I didn’t have much luck with it when using FlexConnect local switched WLANs. Fortunately, the Catalyst 3650 also implements an mDNS gateway. In a nutshell, the switch will act as an mDNS gateway by proxying mDNS packets between subnets. So, in my situation, if there is a HomeKit accessory on VLAN 11, and it sends out an mDNS announcement to advertise its service, that announcement will be “absorbed” by the 3650 switch, and then sent back out to VLAN 10, this time sourced from the switch VLAN 10 SVI interface. The same idea will work the other way around from VLAN 10 to 11.

Here is the basic configuration I implemented on the 3650. First, we create something called a service-list. A service-list is like an ACL, and we use it to permit or deny certain types of mDNS traffic. We need this because when we enable the mDNS gateway, everything is denied by default. Next, we enable the mDNS gateway functionality. This can be done globally, or at the interface specific level. I started with the global configuration, but then migrated to interface specific once I got the hang of it, so that only the specific messages I need are permitted to only the subnets I need.

service-list mdns-sd HomeKit permit 10
 match message-type announcement
 match service-type _hap._tcp.local
!
service-list mdns-sd HomeKit permit 20
 match message-type announcement
 match service-type _homekit._tcp.local
!
service-list mdns-sd HomeKit permit 30
 match message-type query
!
service-list mdns-sd permit-all permit 10
!
service-list mdns-sd HomeKit query
 service-type _hap._tcp.local
 service-type _homekit._tcp.local
!
service-routing mdns-sd
!
interface Vlan10
 description iOS Client VLAN
 ip address 10.1.10.1 255.255.255.0
 service-routing mdns-sd
  service-policy-query HomeKit 100
  service-policy HomeKit IN
  service-policy permit-all OUT
  redistribute mdns-sd
!
interface Vlan11
 description HomeKit Accessory VLAN
 ip address 10.1.11.1 255.255.255.0
 ip access-group ACL-VL11-IN in
 ip access-group ACL-VL11-OUT out
 service-routing mdns-sd
  service-policy-query HomeKit 100
  service-policy HomeKit IN
  service-policy permit-all OUT
  redistribute mdns-sd

OK, let’s break down some of these commands, starting with the service-list called HomeKit. HomeKit uses two mDNS service-types that I could find. Accessories utilize _hap._tcp.local (HomeKit Accessory Protocol) and the Apple TV seems to utilize _homekit._tcp.local. So, in sequence 10 of the list (permit 10), we permit any mDNS announcements that are the HAP service-type. Similarly, in sequence 20, we permit any mDNS announcements using the HomeKit service-type. Finally, in sequence 30 we permit any mDNS queries. This service-list will be used for our inbound filter on both VLAN 10 and 11.
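If it helps to think of the service-list the way you would think of an ACL, here is a small Python model of the HomeKit list. This is only a mental model with made-up names (Rule, evaluate), assuming first-match evaluation in sequence order and an implicit deny at the end; it is not how the switch is implemented internally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    seq: int
    action: str                          # "permit" or "deny"
    message_type: Optional[str] = None   # None matches any message type
    service_type: Optional[str] = None   # None matches any service type


# Mirrors "service-list mdns-sd HomeKit permit 10/20/30" from the config.
HOMEKIT = [
    Rule(10, "permit", "announcement", "_hap._tcp.local"),
    Rule(20, "permit", "announcement", "_homekit._tcp.local"),
    Rule(30, "permit", "query"),
]


def evaluate(rules, message_type, service_type):
    """Return the action of the first matching rule, or "deny" if none match."""
    for rule in sorted(rules, key=lambda r: r.seq):
        if rule.message_type not in (None, message_type):
            continue
        if rule.service_type not in (None, service_type):
            continue
        return rule.action
    return "deny"  # assumed implicit deny, like an ACL


print(evaluate(HOMEKIT, "announcement", "_hap._tcp.local"))      # permit
print(evaluate(HOMEKIT, "announcement", "_airplay._tcp.local"))  # deny
print(evaluate(HOMEKIT, "query", "_airplay._tcp.local"))         # permit
```

The second call uses _airplay._tcp.local purely as an example of a service-type the list does not mention, so its announcements fall through to the deny.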

Next, we have another service-list, permit-all, which as you might guess, just permits everything. This will be used as our outbound filter on both interfaces. Remember, by default, the switch drops everything.

After that we have service-list mdns-sd HomeKit query. A service-list of type query is used to define what specific mDNS services you want to regularly query for from the switch. It is called later on in the interface level configuration. When a device advertises itself via mDNS, the switch hears that advertisement, and caches it. When a device on another subnet does an mDNS query, the switch can then respond with information it has in the mDNS cache. Problem is, some devices are not that chatty. They might only advertise themselves on the network when they first start up. The service-list mdns-sd query command allows us to proactively query for services every x seconds, so that we keep the cache full at all times. In my case, I found that most of the mDNS record TTLs for my HomeKit accessories are 120 seconds, so I set an active query for every 100 seconds to make sure the cache is always full.

My configuration is admittedly confusing at this point, because I named both the service-list and the service-list query “HomeKit”, but they are different things. In retrospect, I should rename the service-list query to something like HomeKit-Query, but it works right now, and I’m lazy.
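Here is a rough Python sketch of why the active query interval needs to be shorter than the record TTL. It assumes the worst case described above: the accessory never announces on its own after startup, and it answers every active query immediately. The function name cache_gaps and the 150 second comparison value are invented for the example.

```python
def cache_gaps(ttl: int, query_interval: int, duration: int) -> int:
    """Count the seconds during which the cached record has expired."""
    expires_at = ttl               # record learned at t=0
    empty_seconds = 0
    for t in range(duration):
        if t > 0 and t % query_interval == 0:
            expires_at = t + ttl   # active query refreshes the record
        if t >= expires_at:
            empty_seconds += 1     # cache has no valid record this second
    return empty_seconds


# One hour with a 120 second TTL.
print(cache_gaps(ttl=120, query_interval=100, duration=3600))  # 0
print(cache_gaps(ttl=120, query_interval=150, duration=3600))  # 720
```

With a 100 second query the record is always refreshed before it expires. With a 150 second query the cache sits empty for 30 seconds out of every 150, which is when a client query would go unanswered.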

Next, we have the service-routing mdns-sd global command, which enables the mDNS gateway feature. Even if you are doing the interface specific configuration, you need this enabled first.

Finally, we have the interface level configurations. On each interface, we have service-routing mdns-sd, which drops us into a different prompt and allows us to configure the per interface mDNS features. Next, we have service-policy-query HomeKit 100. This is calling the service-list of type query we created before. Effectively, this tells the switch “Every 100 seconds, send an mDNS query out to see if there are any HomeKit or HAP devices.” Next, we apply our HomeKit service-list inbound, and we apply our permit-all service-list outbound.

Verifications

Let’s do a little verification. The first command we want to look at is show mdns service-types. This will show us any mDNS service-type advertisements the switch has heard about. As you can see, I learned about service type _homekit._tcp.local from the Apple TV on VLAN 10 and _hap._tcp.local from some accessories on VLAN 11. The 4500/4424 and 120/44 are the TTL and TTL remaining respectively.

Clapton#sh mdns service-types | i hap|homekit
_homekit._tcp.local 4500/4424 Vl10
_hap._tcp.local 120/44 Vl11

Next, let’s have a look at the actual mDNS cache. The service-lists we applied inbound and outbound determine what information is allowed into the actual cache, and what information is relayed out of the cache across subnets. My actual cache is quite busy looking at the moment, because I have over 20 different HomeKit accessories in there, each with multiple mDNS records. So, for purposes of example and the blog, let’s just look at one device. We’ll take a look at one of my Leviton smart dimmer switches here, and dig into the records so that we can understand the mDNS picture a little better.

Clapton#sh mdns cache | i 0007.a60c.8d16
_hap._tcp.local PTR IN 120/88 3 Vl11 0007.a60c.8d16 Leviton Dimmer-0C8D16._hap._tcp.local
Leviton Dimmer-0C8D16._hap._tcp.local SRV IN 120/88 2 Vl11 0007.a60c.8d16 0 0 80 WICED-hap-0C8D16.local
WICED-hap-0C8D16.local A IN 120/88 1 Vl11 0007.a60c.8d16 10.1.11.157
Leviton Dimmer-0C8D16._hap._tcp.local TXT IN 120/88 2 Vl11 0007.a60c.8d16 (90)'id=5E:5A:C8:E1:04:FC''md=Leviton Dimmer-0C8D16''pv=1.1''s#=1''c#=1''sf=0''f~'~

OK, so the first thing we see here in the cache is that this particular smart device actually goes ahead and registers 4 different mDNS records. We have a service record (SRV), a PTR record, a TXT record, and finally an A record. Let’s dig into each.

Let’s start with the SRV record, because this is what I like to think of as the “main” record, and everything else hangs off of it. The SRV record is named after the actual service instance, and its job is to map that service instance name to the information the client needs when it wants to actually utilize that service. In this example, the SRV record is for “Leviton Dimmer-0C8D16._hap._tcp.local”, the name of the service instance on the device. Let’s see what this record looks like.

Clapton#sh mdns cache name "Leviton Dimmer-0C8D16._hap._tcp.local" type SRV detail
mDNS CACHE
Name : Leviton Dimmer-0C8D16._hap._tcp.local
Type : SRV
Class : IN
TTL/Remaining : 120/26
Accessed : 10
Interface : Vlan11
MAC Address : 0007.a60c.8d16
Record Data : 0 0 80 WICED-hap-0C8D16.local
Access Type : Wireline

Take a look at the record data. Basically, what this SRV record tells us is “map the service instance name Leviton Dimmer-0C8D16._hap._tcp.local to the hostname WICED-hap-0C8D16.local, and by the way, you can hit this service on TCP port 80”. So again, we are mapping the name of a service to a hostname and port number. When you add accessories into HomeKit on your iOS device, the device likely remembers the service instance name of the accessory. The service instance name is how the accessory is known on the network. From the SRV record, the client can gather that it needs to open a TCP connection on port 80 directly with WICED-hap-0C8D16.local in this case.

So, how do we then resolve the IP address of WICED-hap-0C8D16.local? That is where the previously seen A record comes into play. The A record will map a hostname to an IP address, just like in regular old DNS. Let’s look and see.

Clapton#sh mdns cache name WICED-hap-0C8D16.local detail
mDNS CACHE
Name : WICED-hap-0C8D16.local
Type : A
Class : IN
TTL/Remaining : 120/108
Accessed : 1
Interface : Vlan11
MAC Address : 0007.a60c.8d16
Record Data : 10.1.11.157
Access Type : Wireline

There we have it. The hostname resolves to 10.1.11.157 in our case. To reiterate, when this accessory comes online, it registers the SRV record and announces it out on the network via mDNS using multicast. In our case, this happens on VLAN 11. The 3650 switch allows this service-type message (because of our service-list), and thus adds it to the cache, and relays it down to VLAN 10 where the clients are. At that point, the client knows the hostname and protocol/port it needs to connect to. Next, it does an mDNS query to resolve the hostname to an IP address via the A record. At that point, it has everything it needs.

So, we are in good shape. What about the other two records though? Let’s see the PTR record next. The PTR record is a pointer record. Its job is to map the name of a type of service to a specific instance of that service. So, in other words, the record says “I have a service here called Leviton… and it’s of the service-type _hap._tcp.local”. Every accessory publishes a PTR record with the same name but different Record Data. Why is this useful? This is useful so that we can query for a specific service-type, and get a list of all the devices on the network of that service-type. For example, a device could do an mDNS query for PTR records with the name _hap._tcp.local, and get a list back of every HomeKit accessory on the network. The Record Data then gives it the SRV record name, and we already saw how from that information, the device could find out the hostname and port/protocol of the accessory, and ultimately the IP address via an A record. Here is what the PTR record for the dimmer looks like on the same switch we’ve been dealing with. The command you need is sh mdns cache name _hap._tcp.local type PTR detail

Name : _hap._tcp.local
Type : PTR
Class : IN
TTL/Remaining : 120/57
Accessed : 13
Interface : Vlan11
MAC Address : 0007.a60c.8d16
Record Data : Leviton Dimmer-0C8D16._hap._tcp.local
Access Type : Wireline
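To tie the PTR, SRV, and A records together, here is a small Python sketch that walks the same chain a client would, using a dictionary in place of real mDNS queries. The record values are copied from the sh mdns cache output for the Leviton dimmer shown earlier; the function names browse and resolve are just for illustration.

```python
# (record name, record type) -> list of record data, copied from the switch cache.
CACHE = {
    ("_hap._tcp.local", "PTR"): ["Leviton Dimmer-0C8D16._hap._tcp.local"],
    ("Leviton Dimmer-0C8D16._hap._tcp.local", "SRV"): ["0 0 80 WICED-hap-0C8D16.local"],
    ("WICED-hap-0C8D16.local", "A"): ["10.1.11.157"],
}


def browse(service_type):
    """PTR lookup: service type -> list of service instance names."""
    return CACHE.get((service_type, "PTR"), [])


def resolve(instance):
    """SRV lookup, then A lookup: instance name -> (hostname, port, ip)."""
    # SRV record data is "priority weight port target".
    priority, weight, port, target = CACHE[(instance, "SRV")][0].split()
    ip = CACHE[(target, "A")][0]
    return target, int(port), ip


for instance in browse("_hap._tcp.local"):
    print(instance, "->", resolve(instance))
```

With 20 accessories in the cache, the PTR entry would simply hold 20 instance names, and the loop would resolve each one the same way.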

Finally, we have the TXT record. The TXT record isn’t really used for anything magical. It’s basically just used to store any extra information the manufacturer might want us to know about the service instance. It may even be empty. Let’s look at the TXT record for the same smart dimmer switch we’ve been playing with here so far.

Clapton#sh mdns cache name "Leviton Dimmer-0C8D16._hap._tcp.local" type TXT detail
mDNS CACHE
Name : Leviton Dimmer-0C8D16._hap._tcp.local
Type : TXT
Class : IN
TTL/Remaining : 120/71
Accessed : 2
Interface : Vlan11
MAC Address : 0007.a60c.8d16
Record Data : (90)'id=5E:5A:C8:E1:04:FC''md=Leviton Dimmer-0C8D16''pv=1.1''s#=1''c#=1''sf=0''f~'~
Access Type : Wireline

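So, we have an id field in there that looks like a MAC address. It is the accessory’s HomeKit device ID, and notice that it is not the same as the real MAC address (0007.a60c.8d16) the switch learned for the device. The other fields are defined by the HomeKit Accessory Protocol as well: md is the model name, pv is the protocol version, s# and c# are the state and configuration numbers, and sf holds the status flags.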
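If you want to pull those key/value pairs out of the switch output programmatically, here is a minimal Python sketch. It parses the display format the switch prints (quoted strings back to back), not the DNS wire format, and it assumes the trailing 'f~'~ is just the switch truncating the last entry, so anything without an "=" is skipped. The function name parse_txt is made up.

```python
import re


def parse_txt(display: str) -> dict:
    """Extract key=value pairs from the quoted strings in the TXT display."""
    pairs = {}
    for item in re.findall(r"'([^']*)'", display):
        key, sep, value = item.partition("=")
        if sep:                  # skip entries with no "=" (assumed truncated)
            pairs[key] = value
    return pairs


record = ("(90)'id=5E:5A:C8:E1:04:FC''md=Leviton Dimmer-0C8D16'"
          "'pv=1.1''s#=1''c#=1''sf=0''f~'~")
txt = parse_txt(record)

for key, value in txt.items():
    print(key, "=", value)
```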

What I found is that as long as you can keep the mDNS cache full at all times, you won’t run into any problems with devices going into the dreaded “not responding” state unless you have some other networking problem. The service-type query we configured every 100 seconds (called an active query) is really key to making sure this happens.

To see some more information about your mDNS deployment on the 3650 switch, you can also check out show mdns statistics and play around with the options.

Bonus Round: Security

That about does it for the basic setup. You might have noticed in the configuration that I also applied inbound and outbound ACLs on VLAN 11, because what fun would it be if you didn’t try to break it by allowing ONLY the necessary traffic to and from your automation network. I didn’t get too overly hardcore here, because these devices *do* need access to the internet. For example, the smart dimmers and smart switches can do over the air firmware updates. My thermostat needs constant access to a server at Honeywell. Sure, you could probably do packet sniffs and lock it down further to specific internet IPs, or something more sophisticated if you have something upstream that can do URL filtering and such. For me, I kept it kind of simple in that regard. Here is what I came up with.

In the inbound direction (packets coming IN to VLAN 11 SVI), we have the following:

  • Permit ICMP from the accessories
  • Permit mDNS from the accessories. We NEED this for the switch to “hear” and cache the messages, similar to how we might allow for EIGRP or OSPF
  • Permit return HTTP traffic from the accessories back to the clients
  • Permit traffic from the accessories to Apple TV on TCP 49155. This was the port published in the Apple TV HomeKit SRV record.
  • Deny any other traffic from the accessories back to RFC 1918 space
  • Allow DNS traffic from the accessories to Comcast and Google DNS
  • Allow NTP, DHCP, HTTP and HTTPS
  • Deny and log everything else

You should notice that the outbound direction is very similar, just mirrored.

Clapton#show access-list ACL-VL11-IN
Extended IP access list ACL-VL11-IN
5 permit icmp 10.1.11.0 0.0.0.255 any (5 matches)
10 permit udp 10.1.11.0 0.0.0.255 host 224.0.0.251 eq 5353 (77953 matches)
20 permit tcp 10.1.11.0 0.0.0.255 eq www 10.1.10.0 0.0.0.255
40 permit tcp 10.1.11.0 0.0.0.255 10.1.10.0 0.0.0.255 eq 49155
50 deny ip 10.1.11.0 0.0.0.255 10.0.0.0 0.255.255.255 log
60 deny ip 10.1.11.0 0.0.0.255 172.16.0.0 0.15.255.255 log
70 deny ip 10.1.11.0 0.0.0.255 192.168.0.0 0.0.255.255 log
80 permit udp 10.1.11.0 0.0.0.255 host 75.75.75.75 eq domain
90 permit udp 10.1.11.0 0.0.0.255 host 75.75.76.76 eq domain
100 permit udp 10.1.11.0 0.0.0.255 host 8.8.8.8 eq domain
110 permit udp 10.1.11.0 0.0.0.255 any eq ntp
120 permit udp any eq bootpc any eq bootps
130 permit tcp 10.1.11.0 0.0.0.255 any eq www
140 permit tcp 10.1.11.0 0.0.0.255 any eq 443
150 deny tcp any any log
160 deny udp any any log (5572 matches)

Clapton#show access-list ACL-VL11-OUT
Extended IP access list ACL-VL11-OUT
5 permit icmp any 10.1.11.0 0.0.0.255
10 permit udp host 10.1.11.1 host 224.0.0.251 eq 5353
20 permit tcp 10.1.10.0 0.0.0.255 10.1.11.0 0.0.0.255 eq www
25 permit tcp 10.1.10.0 0.0.0.255 eq 49155 10.1.11.0 0.0.0.255
30 deny ip 10.0.0.0 0.255.255.255 10.1.11.0 0.0.0.255 log (2111 matches)
40 deny ip 172.16.0.0 0.15.255.255 10.1.11.0 0.0.0.255 log
50 deny ip 192.168.0.0 0.0.255.255 10.1.11.0 0.0.0.255 log
60 permit udp any eq domain 10.1.11.0 0.0.0.255
70 permit udp any eq ntp 10.1.11.0 0.0.0.255
80 permit tcp any eq www 10.1.11.0 0.0.0.255
90 permit tcp any eq 443 10.1.11.0 0.0.0.255
100 deny udp any any log
110 deny tcp any any log
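Since these ACLs lean heavily on wildcard masks, here is a short Python sketch of how a Cisco-style wildcard match works, in case the inverted masks are unfamiliar. The function name wildcard_match is made up, and 10.1.10.25 is a hypothetical client address on VLAN 10.

```python
import ipaddress


def wildcard_match(address: str, base: str, wildcard: str) -> bool:
    """Return True if address matches base under a Cisco-style wildcard mask.

    Bits set to 1 in the wildcard are "don't care" bits; every other bit
    must equal the corresponding bit of base.
    """
    addr = int(ipaddress.IPv4Address(address))
    net = int(ipaddress.IPv4Address(base))
    care = ~int(ipaddress.IPv4Address(wildcard)) & 0xFFFFFFFF
    return (addr & care) == (net & care)


# The dimmer falls inside the accessory subnet.
print(wildcard_match("10.1.11.157", "10.1.11.0", "0.0.0.255"))     # True
# Covering all of 10.0.0.0/8 needs a 0.255.255.255 wildcard ...
print(wildcard_match("10.1.10.25", "10.0.0.0", "0.255.255.255"))   # True
# ... because 0.0.0.255 only covers 10.0.0.0 through 10.0.0.255.
print(wildcard_match("10.1.10.25", "10.0.0.0", "0.0.0.255"))       # False
# 172.16.0.0 with 0.15.255.255 covers 172.16.0.0 through 172.31.255.255.
print(wildcard_match("172.20.1.1", "172.16.0.0", "0.15.255.255"))  # True
```

It is worth running any RFC 1918 deny entry through a check like this, because a wildcard that is too narrow silently lets traffic fall through to the permits further down the list.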
