Issue and Background
First and foremost, I am writing this article for myself, and if it helps others, that is a secondary benefit. I do not profess to be a network architect. After 20-odd years in the field, I know enough to be dangerous, and arguably enough to be effective in my role where it touches networking integration points.
With this in mind, I trip up once or twice a year in certain Citrix ADC (NetScaler) architectures whose multi-arm configurations and client access topologies result in return route issues needing additional configuration within the appliance. No matter how many times I run into this easily resolved issue, I forget it because frankly, the method to configure the resolution seems counter-intuitive to me (as again, I am not a network architect).
Under certain topology designs, traffic from client networks hitting VIPs (vServers) in another network on the Citrix ADCs may be sent back along a different path from the one it arrived on, otherwise known as asymmetric routing. I often find this with multi-arm inner and outer DMZ subnet/VLAN designs, multi-armed Citrix ADC VPX designs, etc. Static routes can play a role (for example, one added to ensure an inside leg’s SNIP is used when the ADC accesses backend servers on internal networks).
The most common symptom I’ll come across is clients on an internal network needing to hit VIPs located in a DMZ: the firewalls are open to allow for this, yet the clients can’t successfully reach the VIPs. If we run a trace on the ADC (from the shell, nstcpdump.sh) we’ll see traffic coming into the ADC from the client, confirming this isn’t a firewall ACL issue outside of the ADC.
Although one might be tempted to turn on MAC-based forwarding (MBF), nine times out of ten it’s a cop-out and ill-advised. There are few scenarios where MBF is appropriate, and this is definitely not one of them.
The resolution lies in creating a policy-based route (PBR). That part I never forget; it’s the logic in its definition that can trip people up. And as a public service announcement, PBRs are not a performance concern on an ADC. Unlike an L3 switch, where they are often frowned upon and may add load, every packet traversing an ADC is already processed through the packet engines, so PBRs don’t matter much. The packet will be processed in a single pass regardless.
We need the PBR to override whatever behaviour the appliance’s routing table would otherwise dictate. The packet is routed to the ADC through one network path, but the ADC will try to send the return traffic back through a different path, which causes the issue. We need to craft a PBR that, based on defined conditions, routes all return traffic from the VIP (or VIP range) in our subnet out the proper gateway. Without it, the traffic will just get lost in the cosmic background noise of your network.
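To make the return route problem concrete, here is a minimal Python sketch of an ordinary longest-prefix-match route lookup. It is only a model of routing table behaviour in general, not of the ADC's internals, and every address in it is hypothetical: a DMZ gateway at 192.0.2.1 acting as the default route, and a static route sending 10.0.0.0/8 out an inside leg via 10.0.0.1.

```python
import ipaddress

# Hypothetical routing table: (destination network, next hop).
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "192.0.2.1"),   # default route via the DMZ gateway
    (ipaddress.ip_network("10.0.0.0/8"), "10.0.0.1"),   # static route via the inside leg
]

def lookup_next_hop(dest_ip, routes=ROUTES):
    """Return the next hop of the most specific route containing dest_ip."""
    dest = ipaddress.ip_address(dest_ip)
    matches = [(net, hop) for net, hop in routes if dest in net]
    if not matches:
        return None
    # Longest prefix wins, as in a conventional routing table.
    _, hop = max(matches, key=lambda m: m[0].prefixlen)
    return hop

# An internal client reached the DMZ VIP through the DMZ gateway...
arrived_via = "192.0.2.1"
# ...but the reply to that client follows the more specific static route.
return_via = lookup_next_hop("10.20.30.40")

print(return_via)                 # 10.0.0.1
print(return_via != arrived_via)  # True: the return path is asymmetric
```

In this model the static route that exists for backend access also captures the reply to the internal client, which is the kind of mismatch the PBR is there to correct.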
Time for a meme break…
In the example below, I have a few key parameters:
Source IP. This can be a specific VIP (which is the case in this example) or a range of IPs entered in the low and high fields. This condition must use the = (equals) operator.
Destination IP. This is typically a range, and should be the full range of the local L2 segment. This condition must use the != (not equals) operator.
Next Hop. This is the gateway that matching traffic should route through, and it should be the local gateway of the VIP’s subnet. In my example, the subnet was smaller than a /24, hence the odd gateway IP and the odd range in the destination IP.
Together these conditions have the effect of the ADC routing traffic out the correct gateway so long as the destination is NOT the local L2 segment (as routing local traffic through a gateway would be pointless and will probably break other traffic flows, so the != inclusion is very important).
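The same logic can be written out as a minimal Python sketch. This only models the match conditions described above; it is not ADC configuration syntax, and the VIP, the /28-sized segment range, and the gateway are hypothetical values chosen for illustration.

```python
import ipaddress

# Hypothetical values for illustration only.
VIP = ipaddress.ip_address("192.0.2.10")        # Source IP, matched with =
DEST_LOW = ipaddress.ip_address("192.0.2.1")    # Destination IP range (low),
DEST_HIGH = ipaddress.ip_address("192.0.2.14")  # and (high), matched with !=
NEXT_HOP = "192.0.2.1"                          # local gateway of the VIP's subnet

def pbr_next_hop(src_ip, dest_ip):
    """Return the PBR next hop if the packet matches, else None.

    None means the PBR does not apply and the normal routing
    table decides where the packet goes.
    """
    src = ipaddress.ip_address(src_ip)
    dest = ipaddress.ip_address(dest_ip)
    src_matches = src == VIP                              # source = VIP
    dest_matches = not (DEST_LOW <= dest <= DEST_HIGH)    # destination != local segment
    return NEXT_HOP if src_matches and dest_matches else None

# Reply from the VIP to a client on another network: forced out the local gateway.
print(pbr_next_hop("192.0.2.10", "10.20.30.40"))  # 192.0.2.1
# Reply from the VIP to a host on the same L2 segment: PBR does not apply.
print(pbr_next_hop("192.0.2.10", "192.0.2.5"))    # None
# Traffic from some other source address: PBR does not apply.
print(pbr_next_hop("192.0.2.11", "10.20.30.40"))  # None
```

The second case is the one the != condition protects: local traffic stays on the segment instead of being pushed through the gateway.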
Written out logically, this makes perfect sense, and hopefully, I’ll never again spend more than 5 minutes trying to resolve such simple matters.