
Minutes of January 15th, 2009 ITAC-NI Meeting:


AGENDA:

Approve prior minutes
Bridges Infrastructure Group merging into CNS
CNS Wall-Plate SLO update
Update on IPv6 plans
Review of anycast project to improve availability of DNS service

CALL TO ORDER:

This meeting was scheduled in CSE E507 at 1:00 pm on Thursday, January 15th and was made available via videoconference with live-streaming and recording for future playback. Prior announcement was made via the Net-Managers-L list. The meeting was called to order by ITAC-NI chairman, Dan Miller, Network Coordinator of CNS Network Services.

ATTENDEES: Thirteen people attended this meeting locally. There were no attendees via Polycom videoconference, and there is no record of how many may have listened in to the stream via the web interface.

Ten members were present: Clint Collins, Dan Cromer, Tim Fitzpatrick, Shawn Lander, Steve Lasley, Chris Leopold, Tom Livoti, Bernard Mair, Dan Miller, Handsford (Ty) Tyler.

Four members were absent: Charles Benjamin, Erik Deumens, Craig Gorme and Stephen Kostewicz.

Three visitors participated as well: Todd Hester, John Madey and Marcus Morgan.

Viewing the recording

You may view the recording via the web at http://128.227.156.84:7734. Currently, you will need to click on the "Top-level folder" link, then the "watch" link next to the "ITAC-NI Meeting_15Jan09_13.19" item. This will likely be moved into the ITAC-NI folder shortly. Cross-platform access may not be available; on the Windows platform you will have to install the Codian codec.

Audio archive

An archive of audio from the meeting is available.

1) Approve prior minutes

No corrections or additions were offered and the minutes were approved without further comment.

2) Bridges Infrastructure Group merging into CNS

This item was moved up in the agenda because Tim Fitzpatrick had to leave early. As reported January 6th via the DDD memo "Administrative Changes in PeopleSoft/Bridges and CIO Office", the PeopleSoft Infrastructure Group now reports to Tim as Director of CNS and that group will be relocated to SSRB in the near future.

2-1) Enterprise systems support at UF has been housed in multiple places

Tim stated that the distinctions between infrastructure level, applications level, business process level and user service level are a mish-mash across multiple organizations even though they are in some sense all central organizations.

CNS provides many services to enterprise systems at UF, including hardware, software and system admin support for the Student Records system. Likewise, Bridges has provided the same sorts of things for their own enterprise systems via the Bridges Infrastructure Group. That group involves ten out of the roughly one hundred overall FTE within the Bridges organization. The majority of Bridges staff is composed of applications and business process people rather than system admins.

While there has been a difference in where infrastructure sits for HR/Finance/Portal enterprise systems and the Student Records enterprise system, these two groups have many needs in common including processor, storage, vendors, products and architecture.

2-2) The purpose behind this move

Tim explained that the purpose in moving Chris Easley and the nine folks that work with him (Colin Hines, Curtis Weldon, Eric Bolinger, Frank Collada, Jay Jarboe, Mark Brumm, Mark Palmer, Phil Padgett, and Stuart Thomas) was to consolidate like functions. We are putting people who do the same kind of thing for the same generic purpose (enterprise systems at UF) into a common organization. The objectives are standardization of architecture, cross-pollination of skill sets, shared platforms, economies of scale and so forth.

2-3) How this will come to be

Chris Easley and Tim have met three times in the last week and one-half and Tim has sat in on one of his team meetings. Tim has been trying to get a sense of the inventory of platforms, annual budget, etc. It is hoped that these people will physically relocate into SSRB within two months. This is not considered urgent, however, and they don't have space to accommodate them currently; while CNS might have ten vacant spaces, they are scattered here and there. Those details are yet to be worked out.

2-4) Why now?

Tim said he didn't have a definitive answer to this question as that would have to come from Dr. Frazier, but believed that the concept of consolidating like functions to leverage an economy of scale was the driving reason.

2-5) Questions for Tim from the committee

2-5-1) Are there any implications for e-mail? (Dan Miller)

Dan Miller asked if the only implications for e-mail were that Exchange is now solely Mike Conlon's. Tim responded that e-mail is a question mark in the longer-term sorting out of things. E-mail is an application but it is also somewhat of a middleware service connected to other applications. Exchange is closely tied to Active Directory and if we only had Exchange for e-mail those two would clearly fit together as they do now.

Tim said he views student e-mail as a commodity and that it should be first among the many things that should sooner or later be outsourced into the cloud. Other than that he didn't have much to say about e-mail at this time.

2-5-2) Did this merger arise from the UF IT Action Plans process? (Clint Collins)

Tim responded that such was not the case, rather this merger is viewed as something already planned for between Chuck Frazier and Ed Poppell. The initial recommendations of the ITAP Task Force have gone to the Steering Committee which will meet next Tuesday to provide feedback on that initial draft. The Task Force will then create a final draft for the President within the next two weeks.

2-5-3) Will this merger result in a cutback of total FTE devoted to the UF enterprise? (Handsford Tyler)

Tim responded that this has not yet been determined. He did say that CNS has been through a number of CNS consolidations and every one of those consolidations has ultimately resulted in an FTE reduction.

2-5-4) Where are the economies of scale in this? (Handsford Tyler)

If all we are doing is moving the same number of people to a different place to do the same thing, where will we see economies of scale? Tim replied that he hasn't gotten into enough of the details yet, but he knows there are common resources between the Bridges Infrastructure Group and CNS. Both have EMC storage, SAN networking components and networking personnel.

When Ty asked if the plan then was to do more things with the same number of people, Tim replied that he hoped to leverage the dollars spent on common platforms first. However, there have been a number of consolidations--often followed by people moving on or with people coincidentally retiring; in a number of instances we have then seized the opportunity not to replace those individuals.

3) CNS Wall-Plate SLO update

Todd Hester reported that we were currently about a year and one-half into the Wall-Plate project and there have been a number of new or expanded addenda added to the Wall-Plate Service Level Objectives document.

3-1) Addendum 7: Wall-Plate Firewall Services

Todd said that the SLO document had previously had a brief mention of firewall services which has now been expanded as addendum 7. This defines a free level of service which provides three firewalls per building handled through VLANs and ACLs. If more sophisticated needs exist then they go to a hardware-based firewall solution which involves an added fee. These will all be custom solutions, but the addendum does provide some price range examples.

3-2) Addendum 8: Requirements for Establishing a Non-Wall-Plate Network Zone within a Wall-Plate Building

Todd said that this addendum comes mostly from the research community, which has expressed a need to accommodate special requirements. Initially we had said that a building has to be entirely Wall-Plate. This addendum provides a definition which allows the establishment of a non-Wall-Plate environment within a Wall-Plate building.

The concern here is in maintaining the integrity of the port security on the Wall-Plate network. The specifications included in this addendum are meant to protect that. If the customer can perform within those specifications, then CNS feels they can maintain the integrity of port security.

3-2-1) Specifications for non-Wall-Plate zones

The customer must clearly define the geographical area which will be non-Wall-Plate.
The Wall-Plate and non-Wall-Plate networks must utilize separate closets and cannot share horizontal cabling. Creating the non-Wall-Plate zone can interfere with the ability of CNS to detect loops, and keeping things separate is meant to prevent an inadvertent cross-connect.
The customer needs to provide a firewall. This provides a single MAC address connection to the non-Wall-Plate network so that the existing port security feature can still work.
Changes must be coordinated with CNS.
The non-Wall-Plate network must comply with all UF IT Policies and Standards.
The customer must accept all responsibility for problems which occur on the non-Wall-Plate network.
VoIP can be delivered to non-Wall-Plate areas, but the phones will be on Wall-Plate ports and locked down to the single MAC address of the specific phone. Additionally, the phone data port will be disabled.
If it is discovered that the non-Wall-Plate network has been inappropriately extended into a Wall-Plate space, the Wall-Plate ports in that space will be disconnected.
The firewall will either have to be configured to permit UF Network Security scanning or the customer will have to arrange with UF Network Security to do that scanning themselves and provide reports on a regular basis.
CNS will troubleshoot only up to the Wall-Plate port feeding the non-Wall-Plate network.
If the customer wants to rejoin an area to the Wall-Plate after it has been non-Wall-Plate, they can buy back the ports at the expansion port pricing.

3-2-2) Why permit a non-Wall-Plate zone?

Ty asked what sorts of activities people were engaging in that required a separate physical network. Todd responded that some need to be able to change their own firewall capabilities. They work late at night or otherwise need immediate change control and don't believe CNS will be as responsive as they require. Todd indicated that CNS doesn't necessarily endorse this, but does acknowledge that it may be necessary for the purposes of research in some instances.

3-2-3) Can server rooms be accommodated as non-Wall-Plate zones?

Chris Leopold asked if this addendum might be applicable to a server room--specifically, the server room (rm 208) in IFAS building 120. Todd responded that CNS provides an option of defining specific ports as "server ports" for machines in a maintained server room; port security would then be disabled from those. Chris responded that IFAS has a SAN in their server room and maintain their own management VLAN which has no access to the core. He wishes to maintain control of that. Todd responded that this could be considered a gateway situation that could fall under this addendum.

Chris believed that the server room scenario should be better addressed within the SLO--perhaps as a separate item. Chris asked about the firewall requirement as it might then apply to their server room; he was concerned about the extra cost that would entail. Todd said that a firewall would be required because CNS could not otherwise maintain the integrity of the network. Chris suggested that a separate port/feed on the router might be an option for server rooms. It was decided that Todd and Chris would take these discussions off-line in order to develop a custom solution that fit IFAS's needs.

3-3) Addendum 9: Withdrawing from the Wall-Plate Program

The third and final new SLO addendum concerns Wall-Plate exit procedures. Some units were concerned that entering the Wall-Plate program was a one-way street. Addendum 8 can be used as a mid-term exit strategy, but if exit occurs during a refresh cycle (roughly every five years) then a unit could potentially take over the switches that were already in-place. This addendum documents that process.

Shawn Lander asked if exiting would require the placement of a firewall between the building and the router. Todd responded that no firewall would be necessary. Shawn pointed out that this was the sort of connection that Chris Leopold had been wanting for their server room.

3-4) There is no mandate for Wall-Plate and CNS is being responsive to customer needs

Tom said that this sounds like a retreat from a mandate for Wall-Plate. Tim responded that there is no mandate; Wall-Plate started on an opt-in basis and remains so at present. However, many individuals have said that they can envision a mandate coming. Others have expressed concern that this is free today but will be charged back tomorrow--or that it's a one-way street and they're scared. This latter addendum puts the exit procedures in writing as an assurance.

Shawn added that this also represents CNS adjusting to customer needs. Via these minimal concessions, a number of buildings will go Wall-Plate that might not otherwise have done so--and that is a good thing. These addenda were developed, in part, to accommodate the concerns of certain Engineering units on campus.

3-5) Addendum 8 might be misconstrued to permit VoIP in a non-Wall-Plate building

Steve realized that this was not the intention, but there is nothing in this addendum that would stop a building from being essentially non-Wall-Plate (let's say one with a single Wall-Plate room as the extreme example) while still having VoIP service throughout. Tim responded that in such cases they would retreat to the intent rather than the letter of the addendum. Todd added that while there is the potential for abuse of the concept, this is intended for small areas within an otherwise Wall-Plate building.

Shawn asked if this could be revised to specify that if you have Wall-Plate with VoIP service and withdraw from Wall-Plate then you lose VoIP service as well. Todd responded that retaining VoIP in such a scenario would mean providing alternate wiring closets for their own network in any case; those costs alone would make it unlikely that anyone would go that route.

3-6) Is the Parallel Computing exception separate from Addendum 8?

Dan Miller asked about this. Todd had thought that this had been added in separately, but upon looking did not see it there. In discussing this further, Dan and Todd both seemed to think this addendum was general enough to accommodate those needs as well.

3-7) Can a Wall-Plate switch be housed within a server room?

Shawn asked this question and Todd replied that this could be and has been done for true server-room situations requiring a high-density of network connections. In such cases, a Wall-Plate switch is rack mounted in the machine room with a fiber connection run back to the BPOP. Tom added that this is how such things are handled within the HSC as well.

4) Update on IPv6 plans

IPv6 had previously been discussed briefly in October 2007 and again in passing in April 2008.

4-1) The IPv6 space is huge

Marcus Morgan related that IPv6 is starting to move a little bit after creeping at a glacial pace for the past twelve years; he still remains skeptical about how fast it will proceed. IPv6 is a 128-bit address space and hence beyond astronomically large. You can discount the last 64 bits because they are a host part consisting of globally-unique encoded MAC addresses; this allows easy portability of devices without readdressing--or at least that is the intent. Thus IPv6 effectively has a 64-bit address--which is still extremely large.
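As a rough illustration of the host part described above, the following Python sketch derives a 64-bit interface identifier from a 48-bit MAC address using the modified EUI-64 convention (insert ff:fe between the two halves of the MAC and flip the universal/local bit). The MAC address and the 2001:db8::/64 documentation prefix in the usage note are made-up examples, not values discussed at the meeting, and the function names are hypothetical.

```python
import ipaddress


def eui64_interface_id(mac: str) -> int:
    """Return the modified EUI-64 interface identifier for a 48-bit MAC."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address such as 00:1a:2b:3c:4d:5e")
    octets[0] ^= 0x02  # flip the universal/local bit of the first octet
    eui = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    return int.from_bytes(eui, "big")


def address_from_mac(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Combine a /64 network prefix with the MAC-derived host part."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("this sketch assumes a /64 network prefix")
    return ipaddress.IPv6Address(int(network.network_address) | eui64_interface_id(mac))
```

For example, address_from_mac("2001:db8::/64", "00:1a:2b:3c:4d:5e") combines the illustrative prefix with the host part 021a:2bff:fe3c:4d5e. Moving the same device to another /64 changes only the first 64 bits, which is the portability Marcus described.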

4-2) Allocation of IPv6 space

FLR has applied for, obtained and is starting to put into service what is called a "/32" where we have a fixed value in the first 32 bits of the address. That leaves 32 bits of address space which can be used for FLR's purposes.

What can be done for institutions such as UF is to allocate a "/48" where the last 16 bits are available for their use. This is still a huge space because it provides the equivalent of 2 to the 16th power subnets. So the current plan is to allocate "/48" spaces to institutions. If a particular institution requires more, then they could potentially get multiple "/48" allocations.

Because we have such a vast space and one of the problems with IPv6 is route aggregation, we are actually going to assign a "/44" for each instance of a "/48" which we allocate. This provides room for future growth within a given allocation, with 16 x "/48"s being available within each space. This was done due to concerns about the routing aggregation; should an institution outgrow its "/48" allocation, that can be expanded simply by changing a mask rather than actually allocating another space. That would provide the extra space while keeping the aggregation the same.
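The prefix arithmetic above can be checked with Python's standard ipaddress module. This is only a sketch: 2001:db8::/32 is the reserved documentation prefix standing in for FLR's actual "/32" (which is not given in these minutes), and the variable names are illustrative.

```python
import ipaddress

# 2001:db8::/32 is the documentation prefix, standing in for FLR's real /32.
flr_block = ipaddress.IPv6Network("2001:db8::/32")

NETWORK_BITS = 64  # the remaining 64 bits are the host part


def subnet_count(prefixlen: int) -> int:
    """Number of /64 subnets inside a prefix of the given length."""
    return 2 ** (NETWORK_BITS - prefixlen)


# Reserve a /44 for an institution and hand out the first /48 inside it.
reserved = next(flr_block.subnets(new_prefix=44))
allocated = next(reserved.subnets(new_prefix=48))
slash48s_per_reservation = len(list(reserved.subnets(new_prefix=48)))

# Growing the allocation later is only a mask change:
# the base address stays the same and the prefix gets shorter.
grown = allocated.supernet(new_prefix=44)
```

Here subnet_count(32) is 2 to the 32nd power, subnet_count(48) is 2 to the 16th power (65,536), and slash48s_per_reservation is 16, matching the figures above. Because grown equals reserved, an institution that outgrows its "/48" still appears as a single route.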

4-3) IPv6 architecture issues

Dan Miller mentioned that some of us might have seen the notice that IPv6 has now been configured on UF's external WAN routers; on January 11th a new "/48" of IPv6 space assigned by FLR was routed to the UFL campus. We are now on the IPv6 routing map for the world.

4-3-1) Rough draft plan for implementation

Now comes the question of what we should do inside the campus. Dan received a very rough plan today via e-mail from Chris Griffin which outlined the following points:

In the middle of February we will roll out IPv6 to all core routers, the Nexus routers, the research WAN and the lab. That testing will last two or three weeks.
In March it will be rolled out to the DC POPs and the CNS VLAN. CNS will be the first production use of IPv6. That test would run through the end of Spring Term. During that time we would take requests for IPv6 and evaluate whether or not the local networks were ready. We will also be working on design standards for building networks and plan for all the issues which arise.
One of the issues is throughput. We have a number of switches out there on building networks which perform very well currently. Once one begins routing IPv6 from a layer-3 switch, you could have some performance issues if you apply ACLs.
Assuming everything goes well and we come up with some good answers for the problems which arise, we will roll it out in a slow fashion to a few networks around campus over the summer. Then in Fall 2009, again assuming things proceed well, we will have wider availability for IPv6 services.
At this point IPv6 is only going to cover the main campus. We will talk with HealthNet depending on their interest in IPv6 and how the testing goes.
Recent investigations of code and feature sets indicate that the Cisco 3750 switches, the primary class of switch we use for the layer-3 BPOPs, have some good IPv6 functionality. We are optimistic that they will work well.

4-4) What is driving adoption of IPv6?

Tom asked about the current benefits of this implementation. Marcus responded that the benefit, other than as a solution for the need for public addresses, is not clear. We do know, however, that we eventually have to go to IPv6 and the question is more one of how to do that effectively. If we get into another NAT (network address translation) situation, that could be pretty difficult and ugly. There is some consideration that dual-stack, where devices have the capability of addressing via both v4 and v6, can get us through a transition. The real concern is that public IPv4 space is projected to be gone by as early as the end of 2010.

4-4-1) Speed of world-wide IPv6 adoption is uncertain

Dan Miller added that he was not quite as pessimistic as Marcus about how slowly IPv6 may develop. UF doesn't want to be a late adopter and perhaps we should at least have a beachhead of early adopters testing out the service. It might become more useful and a better alternative to NAT in the not-too-distant future.

Marcus pointed out that we need a critical mass of users world-wide, however. Right now there aren't that many places to go that can be reached via IPv6 addressing.

4-4-2) Routing aggregation for such a large address space will be a problem

There are plenty of addresses to go around with IPv6, but the real question is how we want to handle the few billion routes which you could potentially have. Marcus doesn't know that there is really a good solution to the routing aggregation problem. The size of route servers will have to really explode; currently we don't have the horsepower to handle that and it will translate to cost as well.

4-4-3) IPv6 has potential for removing the public vs. private issue

Clint asked if IPv6 could remove the issue of public vs. private space. Marcus said that the allocations will be so big that he believes that issue will indeed go away. You might have an interior routing policy which would give you the equivalent of a private space if you want that. Regarding security concerns, Marcus believes we will see networks in the future which are simply not connected and that have addressability just to themselves. If those are connected it will be done very carefully through proxies. Those are the situations where you will see internal routing policies.

4-4-4) Federal incentives may hasten roll-out in the USA

Another aspect is that the federal government is making a substantial push for IPv6. They may well provide financial incentives for rolling that out, or work requirements for connecting via IPv6 into NIH and NSF grants. As a large institution with a significant amount of federal tie-in, UF has additional incentive to begin moving forward with this reasonably expeditiously. Dan Miller added that a number of other countries are investing heavily in IPv6 as well; he speculated that we might reach a critical mass within the next few years.

4-5) Coordination of planning at UF

Ty asked if CNS had a planning group for this. Dan Miller responded that at this point it is the network engineers within CNS talking with Todd; they will probably talk with Tim a bit more on this as well. Tom said that HealthNet would be interested in being kept informed and Chris Stowe of Shands should be brought up-to-speed as well.

Ty said that it would be very important that HSC be involved if IPv6 was indeed going to be tied to federal grant requirements; "65% of the grants at UF are hooked up to that Centrex router". Marcus cautioned that grant requirements were only speculation at this point, but Tom suggested that such tie-ins were extremely likely in his opinion. IT requirements on grants are not a new thing; 10 Gbps connectivity has made it into some already, for example. Marcus added that the feds are beginning to push out DNS security, so these sorts of things are not uncommon.

Tom suggested that a sub-committee of ITAC-NI might be a good place for such information to be disseminated--since this is a significant change to network infrastructure at UF. Dan Miller responded that would be a possibility, but he wasn't sure that the issues were sufficiently complicated that a sub-committee would be warranted. Ty said that this is not a matter of the complexities of the issues, but rather one of all being on the same page; it is easy to diverge if we don't talk. Dan asked for a few days to consider their own plan in more detail and then he would contact Tom about how to proceed from there.

5) Review of anycast project to improve availability of DNS service

Marcus Morgan reported that he is currently in the process of implementing a set of anycast DNS servers to provide multiple instances of the same IP address at various places on campus. Under such a scheme a given device would be directed to the closest DNS server and, in the case of failure, to the next closest.

5-1) Anycast is used by DNS root servers

This is a pretty common practice used by root name servers. There are thirteen apparent root servers [addresses] but over a hundred physical servers providing DNS for the root zone. Those physical servers are distributed geographically using the process called anycast, which is really a pretty simple process.

5-2) How anycast works

With anycast, multiple small subnets are defined with internal routes to the same IP address. Those routes carry preferences within the routing protocol, so the target address is the same everywhere but has a different preference value at each routing point. Thus, depending on where you are, you get the "closest" DNS server.

This anycast approach has been in pretty common use since about 1992. It doesn't involve any trickery in the routing process; multiple routes have been supported for a long time. It does require a routing protocol on the server; those servers must be route aware. It also requires pretty careful monitoring because the routing protocol is aware of whether the host is up or down, but doesn't know about the service. If the service fails and the host is up, then you have a black hole--which is not particularly good for something like DNS. However, this has been done by root servers and it has been done by several major universities.
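The selection and failure behavior described above can be modeled in a few lines of Python. This is a toy simulation, not the routing protocol itself: the cost values, the Instance class and the health_checked flag are assumptions made for illustration, and only the three site names come from the deployment plan in section 5-3.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Instance:
    site: str
    cost: int                     # hypothetical route preference; lower is "closer"
    host_up: bool = True
    service_up: bool = True
    health_checked: bool = False  # True if a monitor withdraws the route when DNS fails

    def route_announced(self) -> bool:
        # The routing protocol only knows whether the host is up. A health
        # check is what ties the announcement to the DNS service itself.
        if not self.host_up:
            return False
        return self.service_up or not self.health_checked


def nearest_instance(instances: List[Instance]) -> Tuple[str, bool]:
    """Return (site reached, whether the query is answered)."""
    announced = [i for i in instances if i.route_announced()]
    if not announced:
        return ("unreachable", False)
    chosen = min(announced, key=lambda i: i.cost)
    return (chosen.site, chosen.service_up)
```

With instances at CSE (cost 10), SSRB (cost 20) and Centrex (cost 30), a client reaches CSE. If the CSE host goes down, its route disappears and the client reaches SSRB. If only the DNS service at CSE fails, the route is still announced and the result is ("CSE", False), the black hole; setting health_checked to True models a monitor that withdraws the route so that traffic moves to SSRB.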

5-3) Spring testing with summer deployment is planned

Marcus expects to be actively testing in April or May. By the middle of June he expects to have rolled out three servers with the same address. The real motivation here is that the IP address of the main UF DNS server has been hard coded into the configuration files of many machines across campus. If it fails, then DNS response is very slow because clients have to try that address first and time out before trying the secondary server. This happens for every single TCP/IP request; the client-side resolvers are not very intelligent in the way they handle that process. The last survey showed about 4000 hosts tied to that one DNS server. We should be able to substantially improve their responsiveness in the event of a failure somewhere.
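To make the cost of that time-out concrete, here is a small worked example in Python. The 5-second resolver time-out and the 20 ms answer time are assumed figures chosen for illustration; the minutes do not state the actual values, and the function name is hypothetical.

```python
def lookup_delay_ms(primary_answers: bool,
                    timeout_ms: int = 5000,
                    answer_ms: int = 20) -> int:
    """Delay for one lookup when the client always tries the primary first."""
    if primary_answers:
        return answer_ms
    # The client waits out the full time-out before asking the secondary.
    return timeout_ms + answer_ms
```

Under these assumptions a lookup takes 20 ms while the primary address answers and 5,020 ms when it does not. With anycast, the primary address keeps answering from another instance, so clients stay on the fast path.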

Three DNS servers with the address 128.227.128.124 will be implemented: one in CSE, one in SSRB, and one in Centrex. That will be a lot more than we need in terms of capacity, but it should cover just about any single point failure we can envision. There have been a couple of previous network outages, one in November of 2007 and one in March of 2008, for which this sort of configuration would have improved the situation across campus significantly. Dan Miller said that this is in fact an official remedial action to that latter event.

5-4) Weathering DoS attacks is another potential advantage

Another advantage of this technique is that it tends to funnel a denial of service attack on a particular routable point to a single physical host. This has been a great help in withstanding distributed denial of service attacks on root servers, for example.

5-5) Monitoring needs

Steve asked about the monitoring needs. Marcus responded that he monitors name servers very closely in any case. He monitors that from inside the server to make sure that processes are working correctly and he has restart capabilities there. He also monitors them externally through Nagios and via other scripting methods so he can determine that they are functioning correctly.

Close monitoring is necessary because it can be a little tricky to determine which of the physical servers one gets to depending on where you are trying to monitor from. Root name servers usually put in a special definition so that you can send a query to the server and determine which one you got; consequently you must distribute your monitoring to some extent. Marcus will probably have another instance in the same physical location that will help monitor that name server.
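The "special definition" mentioned above is commonly a TXT record in the CHAOS class under a name such as hostname.bind (BIND's convention) or id.server (the implementation-neutral name). The Python sketch below only builds the query packet; whether the UF servers will answer such a query is an assumption, and the function name and query ID are illustrative.

```python
import struct

QTYPE_TXT = 16   # TXT record type
QCLASS_CH = 3    # CHAOS class, used for server identity queries


def build_identity_query(name: str = "hostname.bind", query_id: int = 0x1234) -> bytes:
    """Build a DNS query asking a server to identify itself."""
    # Header: ID, flags (plain query, no recursion), one question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    # Question name: each label is prefixed by its length, ending with a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in name.split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", QTYPE_TXT, QCLASS_CH)
```

A monitoring script could send this packet over UDP to port 53 of the anycast address from several vantage points and compare the identities returned, which is why the monitoring has to be distributed to some extent.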

Action Items

Subscribe Dan Miller, ITAC-NI chair, to all other ITAC committee lists for collaboration purposes (still pending from previous meeting).
Update our official membership list (still pending from previous meeting).

Next Meeting

The next regular meeting is tentatively scheduled for Thursday, February 12th.

last edited 12 March 2009 by Steve Lasley