The Looking Glass (LG) emerged as a means of troubleshooting BGP routing issues on routers in other autonomous systems to which one did not have full access via SSH or Telnet. To accommodate this need, LINX has in the past offered direct Telnet access to its route collectors via a read-only account, as well as web interfaces for both route collectors and route servers. Over time these various legacy looking glass systems have shown not to
Adoption of BIRD on All Route Servers
Originally, all of LINX's route servers were based on Quagga. To provide better resiliency against implementation-specific bugs in the route server software, half of the route servers were eventually migrated to BIRD. Because Quagga increasingly struggled to keep up with new requirements such as RPKI and BGP enhancements, all remaining route servers were subsequently migrated to BIRD as well.
Retirement of Cisco 7200 Based Route Collectors
For about two decades, LINX used Cisco 7200 route collectors. These have been end-of-service since 2015 and therefore needed to be retired. Given that we had recently invested in automating the BIRD route server configuration and in Alice LG, migrating all route collectors to BIRD was the obvious way forward.
Legacy LG Technical Debt
For our route servers, we used an in-house LG that was originally developed in 2003 for Quagga-based route servers. It became more and more difficult for a shrinking pool of engineers familiar with this now two-decade-old software to implement new features and reduce technical debt. On the route collector side, we used mlrg, which was a popular LG for Cisco routers but is no longer under active development. A new LG was therefore required: one that has an easy-to-use GUI, supports modern BGP enhancements and RPKI, offers an API and is actively maintained. Alice LG and the accompanying Birdwatcher API fulfilled all our requirements and were therefore chosen to succeed the legacy route server and route collector LGs.
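To illustrate what "offer an API" means in practice, here is a minimal Python sketch of how a client might query a Birdwatcher instance over HTTP and decode its JSON response. The host name `rs1.example.net`, the port `29184` and the endpoint paths `status` and `protocols/bgp` are assumptions made for illustration and do not describe the LINX deployment; check them against the Birdwatcher version and configuration in use.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Hypothetical Birdwatcher address (assumption, not a LINX host).
# Port 29184 is assumed here; use whatever your instance listens on.
BIRDWATCHER_URL = "http://rs1.example.net:29184/"


def endpoint_url(base_url: str, path: str) -> str:
    """Join a base URL and an endpoint path such as 'status'."""
    return urljoin(base_url.rstrip("/") + "/", path.lstrip("/"))


def fetch_json(base_url: str, path: str, timeout: float = 5.0) -> dict:
    """Send a GET request to the endpoint and decode the JSON body."""
    with urlopen(endpoint_url(base_url, path), timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_json(BIRDWATCHER_URL, "status")` against a reachable instance would return the decoded response as a dictionary; the exact fields depend on the Birdwatcher version, so none are assumed here.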
Alice LG Rollout on Route Servers
After much-needed work on the route servers to enable RPKI and upgrade the route server operating system to Ubuntu, work finally started on the Alice LG replacement in February 2020, just as the COVID lockdown set in. Testing was initially carried out using a VM on one route server emulating LON1 RS1 with Birdwatcher. This worked fine until we decided to see how things worked with all LINX route servers connected to the LG in a test environment. It became clear that we needed more memory, so we decided to deploy the LG on a physical device with more memory, which seemed to allow Alice to work better.
BIRD & Alice LG Rollout on Route Collectors
In the fall of 2020, work started to retire the legacy Cisco 7200 route collectors and migrate the collector functionality to so-called “captain” servers. These captain servers are normally used for certain troubleshooting and monitoring tasks, and there is one captain server per site. Co-locating the collector functionality on those captain servers saved us from installing physical servers in each LAN or acquiring VM licenses. After we upgraded the memory on our captain servers, all collectors (apart from the LON1 collector, which runs on a dedicated server) are now co-located on captain servers and run on BIRD using the same configuration automation as the route servers. The route collectors use separate Alice LG instances, which are configured similarly to the route server LG, minus some of the functionality regarding RPKI and route filtering that is not implemented on the route collectors.
Written by Senior Network Engineers Mo Shivji and Jan Kayser and Systems Reliability Engineer Ariel Smutkochorn.