Internet access for organizations today is no longer just about connectivity for email and web browsing. A stable Internet connection is a vital component in the chain of IT systems required to conduct business. In the past, the focus around Internet connectivity has typically been on cost, with vendors providing solutions that allow organizations to spread their traffic across consumer and enterprise products. This approach is sound and can provide significant cost savings, especially when employee traffic is directed over low-cost consumer products such as ADSL; however, when conducting B2B business through front-end servers hosted in your DMZ, resilience becomes a major concern. In this scenario, a dead Internet link can mean a loss of revenue and, potentially more serious, brand damage. In this paper, we discuss several methods that can be used to improve the resilience of an Internet link. While this sounds like it should be a simple case of connecting to multiple Internet Service Providers, the devil, as they say, is in the detail.
Mission critical Internet
Business networks have been mission critical for some time now, and the focus on resilience and business continuity has always been top of any CIO’s mind; however, this focus was generally restricted to internal networks and systems. With more and more business being conducted directly via the web or B2B over Internet links to systems hosted in DMZs, it is no longer acceptable for an Internet link to be down. Loss of access to the Internet can directly impact revenue generation, especially today, as business operating models shift towards offsite cloud computing and software as a service.
Multihoming is a method whereby a company can connect to multiple ISPs simultaneously. The concept was born out of the need to protect Internet access in the event of either an ISP link failure or an ISP internal failure. In the earlier days of Internet access, most traffic was outbound, except email. An Internet link failure left internal users with no browsing capability and with email backing up on inbound ISP mail gateways. Once the link was restored, browsing and email delivery were also restored. The direct impact on the business was relatively small and mostly non-revenue-affecting. Early solutions to this problem were to connect multiple links to the same ISP, but while this offered some level of link resilience, it could provide no safeguards against an internal ISP failure.
Today, however, most organizations deploy on-site Internet services such as VPN, voice services, webmail, and secure internal system access while also using business-critical offsite services such as software as a service (SaaS) and other cloud-based solutions. Furthermore, while corporate front-end websites are traditionally hosted offsite with web hosting firms, the real-time information on the corporate websites and B2B sites is provided by back-end systems based in the corporate data center or DMZ. Without a good quality Internet connection, these vital links would be severed.
Varied requirements and complexity
That said, the requirements for multihoming are varied and could range from the simple need for geographic link diversity (single ISP) to full link and ISP resilience, where separate links are run from separate data centers to different ISPs. While the complexity varies for each option, the latter forms the most complex deployment option but affords the highest availability, with the former providing some degree of protection but requiring a higher grade of ISP.
A major component of the complexity is IP addressing. The Internet IP addressing system works because each ISP applies for a range of addresses from the central Internet registrar in their region. They then allocate a range of IP addresses, called an address space, to their customers from this pool. No two ISPs can issue the same address space to a customer.
Why would this be a problem? Simply put, it’s all about routing. Routing is how the Internet finds out how to get traffic to your particular server. It’s a bit like Google Maps for the Internet. For somebody to find your server, a “route” or path needs to exist to the IP address of your server. Since you are getting your Internet service, and hence your IP address space, from your ISP, they are responsible for publishing the route to your server across the entire Internet. They are effectively the source of your route; nobody else can do that for your particular address space. You can see how things can go wrong if the ISP suffers an internal failure. If your specific route disappeared, your server would vanish from the Internet even if your Internet link were still up. This is precisely the kind of issue multihoming tries to solve, but for completeness, we will start with simpler options and work our way up.
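To make the point about routes concrete, here is a minimal sketch in Python of a toy routing table. It is not how real routers are implemented; the class name, the "ISP-A" label, and the 203.0.113.0/24 prefix (a range reserved for documentation) are all assumptions chosen for illustration. It shows that once the ISP's route for the address space is withdrawn, a lookup for the server finds nothing, regardless of whether the physical link is still up.

```python
import ipaddress


class RoutingTable:
    """Toy routing table mapping announced prefixes to the ISP that originates them."""

    def __init__(self):
        self.routes = {}

    def announce(self, prefix, origin):
        # The originating ISP publishes a route for this address space.
        self.routes[ipaddress.ip_network(prefix)] = origin

    def withdraw(self, prefix):
        # An internal ISP failure can make the route disappear.
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, address):
        """Return the origin of the most specific matching prefix, or None."""
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None
        best = max(matches, key=lambda net: net.prefixlen)
        return self.routes[best]


table = RoutingTable()
table.announce("203.0.113.0/24", "ISP-A")   # hypothetical address space and ISP
server = "203.0.113.10"                     # hypothetical server address

before = table.lookup(server)               # a route exists, the server can be found
table.withdraw("203.0.113.0/24")            # the ISP loses the route internally
after = table.lookup(server)                # no route: the server has "vanished"

print("before failure:", before)
print("after failure:", after)
```

The lookup uses the most specific matching prefix, which is how Internet routing generally chooses between overlapping routes; everything else about real routing is deliberately left out.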
Single Link, Single ISP, Multiple address spaces
While not a multihoming solution in the strictest sense, the single link, multiple address option can be useful for small sites. In this scenario, the publicly accessible host is assigned two IP addresses from two different address spaces. You would, of course, need two address spaces from your ISP for this to work. Thus, theoretically, if a routing issue impacts one of the address spaces, the other may still be available. However, the single physical ISP link remains a single point of failure, so this option offers little in the way of real resilience.
Multiple links, Single ISP, Single address spaces per link
This scenario, generally called multi-attached, is a variation of the above. The site now connects through multiple links, each with a different IP address space, but via a single ISP. If one of the links fails, its IP addresses become unreachable; however, the IP address on the remaining link is still available, so your server remains reachable. Internet service providers use a control protocol called Border Gateway Protocol (BGP) to manage their IP routes, and this protocol is used to manage the re-routing of traffic over the live link. BGP can be complex and demands a lot from the equipment it runs on. Of course, with complexity comes cost; however, the BGP deployment for this scenario is not as demanding as that of a fully multihomed site and should not attract too much attention from the CFO. While the deployment is a simpler version of full multihoming, it does restrict the corporation to a single ISP, which may not fit the business’s strategic intent.
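The multi-attached behaviour can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not a BGP implementation: the link names, the two documentation prefixes, and the server addresses are hypothetical. It shows that when one link fails, only the address belonging to that link's address space is lost.

```python
import ipaddress

# Two links to the same ISP, each carrying its own address space (hypothetical values).
links = {
    "link-1": {"prefix": ipaddress.ip_network("198.51.100.0/24"), "up": True},
    "link-2": {"prefix": ipaddress.ip_network("203.0.113.0/24"), "up": True},
}

# The public server holds one address from each address space.
server_addresses = ["198.51.100.10", "203.0.113.10"]


def reachable_addresses(links, addresses):
    """Return the server addresses whose address space sits on a live link."""
    live_prefixes = [link["prefix"] for link in links.values() if link["up"]]
    return [
        address
        for address in addresses
        if any(ipaddress.ip_address(address) in prefix for prefix in live_prefixes)
    ]


all_up = reachable_addresses(links, server_addresses)

links["link-1"]["up"] = False   # simulate a failure of the first link
after_failure = reachable_addresses(links, server_addresses)

print("both links up:", all_up)
print("link-1 down:  ", after_failure)
```

In this model the server stays reachable only through its second address, which is why clients (or DNS) need to know about both addresses for the scheme to help.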
Multiple Links, Multiple ISPs, Single address space
This scenario is generally what is meant when discussing multihoming. BGP is used to manage the visibility of the single address space across the multiple links and ISPs and, thus, maintain the routes. BGP runs between the corporate routers and those of the two ISPs, and it can detect a link failure and divert traffic to the functioning link, even if this is via a different ISP network.
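As a rough illustration of the failover described here, the following Python sketch models a remote network choosing between two paths to the same address space. It is heavily simplified: real BGP best-path selection involves many more attributes and tie-breakers, and here only the AS path length is compared. The ISP labels and AS numbers (taken from the range reserved for documentation) are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    via_isp: str      # ISP through which the corporate address space is reached
    as_path: tuple    # sequence of AS numbers traversed, ending at the corporate AS


def best_path(paths, failed_isps=()):
    """Pick the usable path with the shortest AS path (a simplified BGP rule)."""
    usable = [path for path in paths if path.via_isp not in failed_isps]
    if not usable:
        return None
    return min(usable, key=lambda path: len(path.as_path))


# Hypothetical view from a remote network: the same address space is visible
# through both ISPs. 64500 stands in for the corporate AS.
paths = [
    Path(via_isp="ISP-A", as_path=(64501, 64500)),
    Path(via_isp="ISP-B", as_path=(64510, 64502, 64500)),
]

normal = best_path(paths)
failover = best_path(paths, failed_isps={"ISP-A"})
outage = best_path(paths, failed_isps={"ISP-A", "ISP-B"})

print("normal:  ", normal.via_isp)
print("failover:", failover.via_isp)
print("outage:  ", outage)
```

The key property is that the address space itself never changes: when the path through one ISP disappears, traffic for the same addresses simply arrives through the other.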
What’s the catch?
There is always a catch; in this case, there are several. To run true dual-ISP multihoming and BGP as a corporation, you would need your own provider-independent (PI) IP address space, and you would need to apply for a unique BGP Autonomous System Number (ASN). The AS number identifies your site as a valid Internet location in the eyes of BGP. While applying for an ASN is not arduous, it places significant responsibility squarely on you instead of the ISP. Deploying BGP effectively brings your organization closer to the Internet, making you responsible for advertising your public IP address spaces and, thus, your routes. It also means that any operational mistakes you make can ripple through the entire Internet in spectacular fashion.
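The need for a unique ASN can be illustrated with a small Python helper that separates AS numbers set aside for private, documentation, or reserved use from those that can be globally unique. The function name is hypothetical, and the ranges reflect my understanding of the IANA special-purpose AS number allocations (for example, RFC 6996 for private use and RFC 5398 for documentation); check the current registry before relying on them.

```python
def classify_asn(asn):
    """Classify an AS number by range (a sketch; verify ranges against the IANA registry)."""
    if not 0 <= asn <= 4294967295:
        raise ValueError("AS numbers are 32-bit unsigned integers")
    if asn in (0, 23456, 65535, 4294967295):
        return "reserved"
    if 65552 <= asn <= 131071:
        return "reserved"
    if 64496 <= asn <= 64511 or 65536 <= asn <= 65551:
        return "documentation"
    if 64512 <= asn <= 65534 or 4200000000 <= asn <= 4294967294:
        return "private"
    # Everything else lies in space from which registries assign unique ASNs.
    # This does not mean the number has actually been assigned to anyone.
    return "public"


for candidate in (64500, 64512, 4200000001, 1000):
    print(candidate, classify_asn(candidate))
```

A private ASN can be sufficient when peering with a single ISP, but advertising routes through two different ISPs is the case where a registry-assigned, globally unique ASN is normally expected.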
Address space considerations
Most large organizations that operate true multihoming already have their own provider-independent address space. They requested this address space directly from the local Internet registrar some time ago, before IP version 4 (IPv4) addresses started running out. It is virtually impossible to obtain PI address space from the IPv4 pool today. It is possible to run a multihomed scenario using ISP-provided IP address spaces; still, the network configurations become considerably more complex and, at some point, start defeating the goal of increasing resilience. In the real world, increased complexity seldom equates to improved resilience.
Scaling
A true BGP-enabled multihoming deployment (often known as running defaultless, that is, carrying the full Internet routing table rather than relying on a default route) requires hardware capable of storing IP routing tables of Internet scale. This is desirable as it protects the organization from an internal ISP failure. However, it requires the on-site routers to be “carrier grade”; in other words, big and beefy. The Internet routing tables are massive, and a vast amount of processing power and memory is required to run defaultless. It is possible to run in a reduced route mode where only local prefixes are stored on the routers; still, given the effort and expense of deploying a full multihomed solution, compromise should not be part of the conversation.
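Why carrying the full routing table protects against an internal ISP failure can be sketched as follows. This Python model is an illustration under assumptions, not a router implementation: the reduced mode is simplified to a single default route towards a preferred ISP, and the partner address, prefixes, and ISP labels are hypothetical. ISP-A is assumed to have suffered an internal failure, so it can no longer deliver traffic to the partner network even though the link to ISP-A is still up.

```python
import ipaddress


def next_hop(routes, destination):
    """Longest-prefix match over (prefix, isp) pairs; returns the chosen ISP or None."""
    addr = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), isp)
        for prefix, isp in routes
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    return max(matches, key=lambda match: match[0].prefixlen)[1]


partner = "198.51.100.25"                           # hypothetical B2B partner address
can_deliver = {"ISP-A": False, "ISP-B": True}       # ISP-A has failed internally

# Full table: ISP-A has withdrawn its route to the partner, so only ISP-B's remains.
full_table = [("198.51.100.0/24", "ISP-B")]

# Reduced mode: the link to ISP-A is still up, so the default route towards it stays.
default_only = [("0.0.0.0/0", "ISP-A")]

full_choice = next_hop(full_table, partner)
default_choice = next_hop(default_only, partner)

print("full table   ->", full_choice, "delivered:", can_deliver[full_choice])
print("default only ->", default_choice, "delivered:", can_deliver[default_choice])
```

With the full table, the router sees that only one ISP still offers a route and uses it; with only a default route, it has no way of knowing that the preferred ISP is discarding the traffic.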
Summary
While there are definite advantages to full multihoming, there are also some significant caveats. Complexity and scaling aside, the real reasons for multihoming, and its costs, should be weighed carefully before committing to it.