
Minutes of December 13, 2007 ITAC-NI Meeting:


AGENDA:

Approve prior minutes
CNS Network Edge Protection plan

CALL TO ORDER:

This meeting was held in CSE E507 at 1pm on Thursday, December 13, and was made available via videoconference with live streaming and a recording for later playback. Prior announcement was made via the Net-Managers-L list. The meeting was called to order a few minutes late by ITAC-NI chairman Dan Miller, Network Coordinator of CNS Network Services.

ATTENDEES: Seven people attended this meeting locally. There was one attendee via Polycom videoconference, but there is no record of how many may have listened in to the stream via the web interface. We may want to clarify and advertise that latter option, as there may be considerable interest in viewing the meeting in real time via a simple web browser.

Five members were present: Tim Fitzpatrick, Mark Hill, Steve Lasley, Tom Livoti and Dan Miller.

Nine members were absent: Clint Collins, Dan Cromer, Erik Deumens, Craig Gorme, Stephen Kostewicz, Shawn Lander, Chris Leopold, John Sabin and Handsford (Ty) Tyler.

Three visitors were present as well: Dennis Brown (via Polycom), Dan Hawn and Ryan Vaughn.

Viewing the recording

You may view the recording via the web at http://128.227.156.84:7734. You will need to click on the "Top-level folder" link, then the "watch" link next to the "ITAC-NI Meeting 12/13/07" item. Cross-platform access may not be available; on the Windows platform you will have to install the Codian codec.

Audio archive

An archive of audio from the meeting is available, though the first few minutes were inadvertently omitted.

Approve prior minutes

Mark Hill noted that he had been listed as a visitor. He is actually a member representing the Department of Housing and Residence Education. Steve apologized for the error and said it would be corrected, noting that the oversight occurred because Mark was not listed on the ITAC-NI committee website. Mark mentioned that he will shortly be stepping down and will be replaced by Charles Benjamin. No other corrections or additions were offered, and the minutes were approved without further comment.

CNS Network Edge Protection plan

This meeting focused almost entirely on Network Edge Protection, a topic that had been extensively previewed at our October meeting. Dan Miller provided a handout detailing the various components of the CNS plan and outlining how CNS hopes to proceed. The first 15 minutes were spent reviewing the plan outline, and most of the remaining time was devoted to discussing an ITAC-NI recommendation on the matter.

Review of Plan Outline

Dan went through his handout systematically, discussing UF CNS's approach to protecting the network at the edge.

Rationale and Approach

Why is this Important?

Dan said that the goals are to improve overall network availability, to reduce the manpower required to maintain an enterprise network (projected to be 36,000 ports when the Wallplate rollout is complete), and to identify small problems that may otherwise go unnoticed but can create unfavorable impressions of network performance. This is also a precursor to an eventual replacement of the BlueSocket system based on 802.1x or another out-of-band authentication mechanism, which would provide greater robustness and scalability for authenticated ports.

Early Experience

As background, Dan described how deployment began on a pilot basis in late Fall 2006 and continued until September 2007. Individual local administrators were consulted on these changes as they occurred during the Wall-Plate rollout. The project turned out to be more difficult than anticipated; local administrators were supportive, but they did raise some concerns.

Technology Description

Dan then outlined the basic details of the technologies involved. Network Edge Protection encompasses several related technologies as listed below.

Port Security (see October discussion)

Dan mentioned that the most questionable of these protection measures is Port Security. One key issue is whether hubs and switches beyond the CNS-managed Wall-Plate will be allowed. The details of this proposed control measure were outlined as follows.

Port Security may be used to prevent loops that spanning tree would not detect. One common example is a misconfigured notebook with wireless bridging enabled; another is a switch that filters and does not generate BPDUs (e.g., Linksys). Such loops can go undetected for a while if their effects are minimal, or they can be severe enough to affect multiple buildings across the core.
Port Security may also be used to prevent abuse at the edge. Dan asked Ryan Vaughn to elaborate on this point, and Ryan related that such abuse consists mainly of DoS attacks, misconfigured workstations, or other added equipment that causes a problem.
Under the current scheme, a Port Security Violation occurs when:
- More than 16 MAC addresses are present on the interface, or when
- the same MAC address is received on more than one port. This second condition provides MAC-address-level anti-spoofing protection for the router and other important systems. Dan mentioned that spoofing could be an important issue were it to occur; however, CNS does not currently see it, and this feature of Port Security actually causes more difficulties than it resolves. Unfortunately, the command set lacks the granularity to report events of particular interest while ignoring others. It is hoped that feature requests may eventually encourage Cisco to address this shortcoming.
Port Security is set to recover automatically 15 minutes after being tripped. This could be lowered to 5 or 10 minutes, though there are trade-offs. Tim asked for confirmation that, if one of these events is sensed, the port involved is automatically shut down. Dan confirmed that this would happen for 15 minutes and added that, if the condition remained after automatic recovery, the shutdown would recur. Dan said this is one of the considerations: in a major loop scenario, how often do you want to take a hit while the switch determines that the loop is still there? If this happens late at night, it may be a while before staff can get out to inspect the issue and decide that the port should be shut down hard.
Recent switch software allows shutdown on a per-VLAN basis. This will maintain VoIP phone connections when the host connected through the phone's switch port trips Port Security. Dan said this is still being tested in the lab, but CNS is hopeful it will work as advertised.
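The two violation conditions and the 15-minute automatic recovery described above can be illustrated with a toy model. This is a minimal Python sketch under stated assumptions, not Cisco's implementation: the class and method names are hypothetical, time is a plain count of seconds, and a tripped port simply ignores traffic until its recovery time.

```python
from dataclasses import dataclass, field

MAX_MACS = 16           # violation when more than 16 MACs appear on one port
RECOVERY_SECONDS = 900  # automatic recovery 15 minutes after tripping


@dataclass
class EdgeSwitch:
    """Toy model of the Port Security rules in these minutes (hypothetical)."""
    macs_by_port: dict = field(default_factory=dict)    # port -> set of MACs
    port_of_mac: dict = field(default_factory=dict)     # MAC -> port it is secured on
    disabled_until: dict = field(default_factory=dict)  # port -> recovery time (s)

    def is_up(self, port, now):
        """A port is up unless it is still inside its recovery period."""
        return now >= self.disabled_until.get(port, 0)

    def _trip(self, port, now):
        # Shut the port down and forget what it had learned; it comes back
        # on its own once RECOVERY_SECONDS have passed.
        for mac in self.macs_by_port.get(port, set()):
            self.port_of_mac.pop(mac, None)
        self.macs_by_port[port] = set()
        self.disabled_until[port] = now + RECOVERY_SECONDS
        return "violation"

    def frame_seen(self, port, mac, now):
        """Handle a frame from `mac` arriving on `port` at time `now`."""
        if not self.is_up(port, now):
            return "down"
        owner = self.port_of_mac.get(mac)
        if owner is not None and owner != port:
            # Same MAC already secured on another port (the anti-spoof rule).
            return self._trip(port, now)
        macs = self.macs_by_port.setdefault(port, set())
        macs.add(mac)
        self.port_of_mac[mac] = port
        if len(macs) > MAX_MACS:
            return self._trip(port, now)
        return "ok"
```

If the loop or other condition persists, the port re-trips each time it recovers, so the recovery interval sets how often the network takes a hit: roughly 32 times over an 8-hour night at 15 minutes, versus 96 times at 5 minutes.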

Tim again asked for clarification in order to better understand the procedures. Since this is all happening automatically, Tim asked, "When does an actual person become aware that this is going on?" Dan responded that he would later talk about other steps needed to go forward, one of which is notifying the local administrator. The current model would have the system logs generate an automatic e-mail notifying the local administrator of a problem on a particular port. CNS can also generate a non-critical ticket for an engineer, or even a critical ticket; that determination has not yet been made. Tim asked if Port Security was currently in use. Dan responded that it is in some places, but automatic notification has not yet been set up.
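The notification model Dan described (a system log entry in, an e-mail to the local administrator out) might look roughly like the following minimal Python sketch. The log pattern is modeled on Cisco's port-security violation message, but its exact wording is an assumption and may vary between software releases; the switch name, port-to-administrator table and e-mail addresses are hypothetical placeholders. The message is built but not sent; in practice it could be handed to smtplib.SMTP(...).send_message().

```python
import re
from email.message import EmailMessage

# Assumed log format, modeled on the Cisco IOS port-security violation message.
PSECURE_RE = re.compile(
    r"%PORT_SECURITY-2-PSECURE_VIOLATION: .*?"
    r"MAC address (?P<mac>[0-9a-fA-F]{4}\.[0-9a-fA-F]{4}\.[0-9a-fA-F]{4}) "
    r"on port (?P<port>\S+?)\.?\s*$"
)

# Hypothetical lookup table: (switch, port) -> local administrator's address.
LOCAL_ADMINS = {
    ("edge-sw-01", "GigabitEthernet1/0/5"): "admin@dept.example.edu",
}


def parse_violation(line):
    """Return (mac, port) for a port-security violation line, else None."""
    m = PSECURE_RE.search(line)
    if m is None:
        return None
    return m.group("mac").lower(), m.group("port")


def build_notice(switch, line):
    """Build (but do not send) an e-mail to the port's local administrator."""
    parsed = parse_violation(line)
    if parsed is None:
        return None
    mac, port = parsed
    recipient = LOCAL_ADMINS.get((switch, port))
    if recipient is None:
        return None  # unknown port: a candidate for an engineer ticket instead
    msg = EmailMessage()
    msg["To"] = recipient
    msg["From"] = "network-alerts@example.edu"
    msg["Subject"] = f"Port Security violation on {switch} {port}"
    msg.set_content(
        f"Port {port} on {switch} was shut down after MAC {mac} "
        "triggered Port Security. It will recover automatically after "
        "15 minutes if the condition has cleared."
    )
    return msg
```

Keeping the port-to-administrator mapping as data rather than code reflects the fact that the hard part in practice is knowing who owns each port, not sending the mail.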

Storm Control (see October discussion)

Dan mentioned that this control measure is a bit more straightforward than Port Security.

Storm Control is used to prevent loops with excessive packets per second (pps) that spanning tree would not detect.
Storm Control is used to protect networks against faulty equipment.
Storm Control is used to protect networks against DoS attack.
We began with lower values but have now settled on having violations occur when
- Broadcast traffic exceeds 5000 pps
- Multicast traffic exceeds 75000 pps
- Unicast traffic exceeds 75000 pps
The port is set to recover automatically 15 minutes after being tripped, just as with Port Security. Ryan mentioned that this is actually a global timer affecting about 10 different items, all dealing with some aspect of switch security.
The desire is to target this at edge workstation ports, but the challenge is distinguishing workstation ports from server ports. There has been some discussion that this could be applied to most server ports as well; however, doing so would require working with various server administrators to locate machines on which it would need to be turned off. It is difficult to set thresholds that are meaningful but do not cause false positives. This is another feature request that CNS will be making to Cisco: more control over the measurement window. The window is currently too short and catches brief spikes, when the goal is to detect high levels sustained over a longer duration.
Imaging (such as Ghost) and VMware migration are known to cause false hits due to brief bursts of traffic.
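The thresholds above, and the measurement-window problem, can be illustrated with a short Python sketch. It is not how the switch measures traffic (that happens in hardware over its own fixed interval); it simply assumes one packet count per second per traffic type and compares the average over a chosen window against each threshold. The window_s parameter is a hypothetical stand-in for the longer measurement window CNS would like Cisco to offer, and the traffic figures are made up.

```python
# Storm Control thresholds from the minutes, in packets per second.
THRESHOLDS_PPS = {"broadcast": 5_000, "multicast": 75_000, "unicast": 75_000}


def violations(samples, window_s=1):
    """Return the traffic types whose average rate exceeds its threshold
    over any run of `window_s` consecutive one-second samples.

    `samples` maps a traffic type to a list of per-second packet counts.
    """
    tripped = set()
    for kind, counts in samples.items():
        limit = THRESHOLDS_PPS[kind]
        for start in range(len(counts) - window_s + 1):
            average = sum(counts[start:start + window_s]) / window_s
            if average > limit:
                tripped.add(kind)
                break
    return tripped


# A brief imaging burst versus a sustained broadcast storm (made-up numbers).
imaging_burst = {"unicast": [10_000] * 5 + [200_000] * 2 + [10_000] * 5}
broadcast_storm = {"broadcast": [20_000] * 12}

print(violations(imaging_burst, window_s=1))     # {'unicast'}
print(violations(imaging_burst, window_s=10))    # set()
print(violations(broadcast_storm, window_s=10))  # {'broadcast'}
```

With a 1-second window the two-second imaging burst trips Storm Control; averaged over 10 seconds the same burst stays at 48,000 pps, under the 75,000 pps limit, while the sustained broadcast storm still trips.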

DHCP Snooping (see October discussion)

Should an unauthorized person plug in a device that offers DHCP services, it can cause widespread negative effects on a local network. DHCP snooping helps control that.

This is recommended but not mandatory. One of the reasons this is optional is that CNS does not supply DHCP on every Wallplate network.
This works very well if CNS is informed of any DHCP server changes.
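The core idea of DHCP snooping is that only ports known to face a legitimate DHCP server may send server replies. A minimal Python sketch of that forwarding decision follows; the port names are hypothetical, and real implementations also build a binding table of leases, which is omitted here.

```python
# DHCP message types (option 53, RFC 2132) that only a server should send.
SERVER_ONLY_TYPES = {2: "DHCPOFFER", 5: "DHCPACK", 6: "DHCPNAK"}


def snoop(port, msg_type, trusted_ports):
    """Return True to forward a DHCP message, False to drop it.

    Server replies are accepted only from trusted ports (those facing a known
    DHCP server); client messages such as DISCOVER (1) and REQUEST (3) are
    forwarded from any port.
    """
    if msg_type in SERVER_ONLY_TYPES:
        return port in trusted_ports
    return True


trusted = {"Gi1/0/48"}                # hypothetical port toward the real server
print(snoop("Gi1/0/48", 2, trusted))  # legitimate OFFER: forwarded (True)
print(snoop("Gi1/0/12", 2, trusted))  # rogue OFFER from an edge port: dropped (False)
```

This is also why CNS needs to be told about DHCP server changes: if a legitimate server moves to a port outside the trusted set, its own offers are dropped just like a rogue server's.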

802.1x (see October discussion)

This is still in planning stages.
It could provide Gatorlink authentication to the network without routing user traffic through authentication devices such as BlueSocket.
That would shift load away from those devices.
It does not replace good host security, however.
Finally, there are considerable challenges to its deployment, which is why we still view this as a longer-term topic.
- Windows supplicants do not work well. UF may need to buy a third-party supplicant, which would have a major cost impact.
- Supplicants on other OSes are also a concern.
- Nightly automated activities directed at edge hosts (backups, security updates) would need a way to authenticate to the network if 802.1x were implemented.

Problem Details

Dan noted that CNS has accumulated examples of actual problems encountered during their extensive field trials. They are grouped here according to the cause.

External Network Devices (Hubs and Switches)

Most Port Security errors were due to external network devices allowing multiple hosts on a single edge port. The majority of these occurred when the MAC address limit was low (i.e., 3-5). CNS has since settled on a value of 16 and now sees relatively few alerts for this particular problem.

Rapid physical changes on the same edge switch

There is a 5-minute window (a fixed timer that is not adjustable) during which duplicate MAC addresses are prevented from appearing on the network. If a user moves a host from the port on a VoIP phone to another port on the same edge switch, Port Security will trip because a duplicate MAC address is detected. Cisco may solve these issues someday. In our feature requests to them, we will emphasize that the phone port needs to be treated as an extension of the hard-wired switch port, and that better statistics/diagnostics and switch link traps are needed. The same behavior occurs when hosts are moved between ports on external network devices, but that problem will never be solved.
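The phone-port move problem can be sketched as a simple aging timer. This Python model assumes the fixed 5-minute window is measured from the last time the host was seen on its old port; the minutes only state that the window is 5 minutes and not adjustable, so actual switch behavior may differ. The class and port names are hypothetical.

```python
MOVE_WINDOW_SECONDS = 300  # the fixed 5-minute window described above


class MacMoveTracker:
    """Flags a host that reappears on a different port of the same switch
    before its old entry has aged out (a toy model, not Cisco's code)."""

    def __init__(self, window=MOVE_WINDOW_SECONDS):
        self.window = window
        self.last_seen = {}  # MAC -> (port, time in seconds)

    def frame_seen(self, mac, port, now):
        """Record a frame and report whether it looks like a duplicate MAC."""
        previous = self.last_seen.get(mac)
        self.last_seen[mac] = (port, now)
        if previous is None:
            return "ok"
        old_port, old_time = previous
        if old_port != port and now - old_time < self.window:
            return "violation"  # old entry still live: treated as a duplicate
        return "ok"
```

Under these assumptions, moving a laptop from a phone's pass-through port to a wall port one minute later returns "violation", while making the same move after more than five minutes returns "ok".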

Marvell NICs

Marvell LAN cards trip Port Security by putting duplicate MAC addresses on the network. This is broken behavior in the NIC, not a Cisco bug. It can be circumvented by turning off 802.1x on these hosts, which can be done via GPO. A future NIC driver may also fix this.

VMware Servers

A VMware server configuration that is copied to create a new server must have its configuration altered to change the MAC address or the address will show up twice, thus tripping Port Security. This would likely cause serious issues even if Port Security was not in place.

VMware ESX servers with two NICs for failover will trip Port Security, since the same MAC address is used on both ports during failover. Port Security must be disabled for these redundant servers. Since this normally occurs only in major server rooms, we don't see this as a big problem, but we do need to identify locations where it may be a concern.

A VMware server running more than 15 instances would trip Port Security, but this should not normally happen.
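The VMware cases above reduce to counting and de-duplicating MAC addresses on one edge port. The Python sketch below assumes each virtual machine presents its own MAC on the host's port alongside the host's own MAC, so 15 instances plus the host reach the 16-address limit exactly; the function name and addresses are hypothetical.

```python
from collections import Counter

MAX_MACS = 16  # Port Security limit per edge port, from the minutes


def check_vm_host(host_mac, vm_macs):
    """List the Port Security problems one VMware host would cause."""
    problems = []
    # A VM copied without changing its MAC shows up twice on the network.
    dupes = sorted(mac for mac, n in Counter(vm_macs).items() if n > 1)
    if dupes:
        problems.append(f"duplicate MACs from copied VM configs: {dupes}")
    # The host's own MAC plus every distinct VM MAC counts toward the limit.
    distinct = {host_mac, *vm_macs}
    if len(distinct) > MAX_MACS:
        problems.append(f"{len(distinct)} MACs exceeds the limit of {MAX_MACS}")
    return problems
```

Under this assumption, check_vm_host returns an empty list for a host with 15 uniquely addressed VMs and reports one problem once a 16th VM is added.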

Walkup and other General Use Ports

There are occasional port security errors at Walkup ports which could not be verified due to the transient nature of the users. These may have been laptops with bridging enabled.

What are the Next Steps?

The above was a summary of our current understanding regarding network edge protection technology and its associated problems. The question raised to ITAC-NI is "What do we do now?"

Deployment has been Paused for Now

CNS has paused Port Security deployment for now, but has enough ports deployed in the field to constitute a good cross section of UF's networks and the nature of edge network events. These buildings may have Port Security removed pending further discussion here and with the CIO or we may decide to move forward.

Continue Troubleshooting Problems to Increase Understanding

CNS will issue feature requests to Cisco for more granularity in Storm Control and Port Security, and will also watch for overall technology improvements, including Marvell NIC driver updates and new supplicants.

User Education and Outreach

The user community needs to be educated about proper ways to connect to the network and made aware of specific pitfalls. This will be supported by point-of-install consultations as well as e-mails and public web pages.

A network user manual is needed for local administrators. This will cover why edge protection is beneficial, what not to do, and how to report or work around problems.
Automatic e-mails should be sent to local administrators from logging alerts. This should be possible within the next few months.

Discussion Regarding an ITAC-NI Recommendation

At this point Dan introduced two specific questions of concern regarding the proposed UF CNS Network Edge Protection plan. CNS is seeking a recommendation from ITAC-NI regarding these issues prior to proceeding. He began with a quick overview of the questions along with some of the related pros and cons.

A specific ITAC-NI recommendation is needed on two main points:

Should external network devices (hubs and switches) be allowed beyond the Wall-Plate?

There are local budget implications, since building wiring is the local unit's responsibility. What should CNS's response be once a prohibited device is detected? If local units are allowed to have these devices, they should be reminded that any security event seen on that CNS edge port may lead to disabling all traffic through that port.

Should Port Security be deployed once user notification systems are ready or should we wait for more improvements from vendors?

Are the benefits worth the inconveniences? CNS could use a data mining technique to detect external network devices once they are connected, but this will fail to protect the network from the largest impacts.

HealthNet

Dan asked Tom Livoti how HealthNet currently handles these issues. Tom responded that they specifically disallow the deployment of external network devices. Regarding the Wallplate project, Tom realizes that there will be a period of transition; his view, however, is "if you are managing the network, then you should manage the network". He mentioned that they do allow such devices on rare occasions. For example, if someone is setting up a classroom temporarily and HealthNet knows about it, then they will make sure the ports are being monitored.

Tom mentioned that such devices are found continually, despite a clear policy disallowing them. When they discover those they confiscate them and send a corresponding note to the dean. Tom said that they find wireless access points (WAPs) all the time, including WAPs with DHCP enabled. At the same time, Tom realizes that honest mistakes happen; however, they have found instances where people hid devices up in the ceiling. In such cases, the individuals involved obviously knew the policy and were deliberately trying to circumvent it. Then there is the matter of "third-time offenders". The solution is really an administrative matter, not an IT technical issue. Tom feels there must be punitive measures incorporated if compliance is to be improved.

Housing

Dan then asked Mark Hill how this is handled within DHNet. Mark responded that such devices are strictly disallowed by policy; when discovered, the port is automatically turned off and an e-mail is sent to the individual (though they won't be able to receive that via the disabled port, naturally). That individual must then contact IT to resolve the matter. Third-time offenders have their port disabled for the semester and have to appeal that action through the Student Judicial Process.

Mark said that they simply must have these strict policies and enforcements in place due to the nature of their clientele (students). Tom described the residence halls as the "Wild West" of network management.

Individual Academic Unit Perspective

Steve Lasley offered his perspective as the IT support person for a (relatively) small academic unit, IFAS Entomology and Nematology. Steve is concerned with the unit-borne costs associated with disallowing external network devices, costs both in dollars and in local flexibility. Steve assumes that other units may have similar concerns.

Managed switches at Entomology

Entomology's current network infrastructure covers four main buildings/wings, each of which contains a rack of managed switch ports connected to the APOP via gigabit fiber. These provide a current total of 288 available managed ports via HP 2650 and 2625 ProCurve Switches. The authenticated VLAN is available on any of those ports and is currently used to support two Cisco WAPs, one within the Administrative Wing and a second within one of the Research/Teaching Wings. Wireless coverage is fairly complete over the Administrative Wing, but covers only a portion of one of our other wings. Our managed ports are supported under lifetime warranty and cost (GBICs included) a total of about $5000 in the summer of last year. The Wallplate project proposes to replace the active portion of these managed ports and provide future maintenance and upgrades via central funding. The Wallplate does not, however, cover the considerably greater cost of wiring.

Workgroup switches at Entomology

When our wiring was converted to Cat5 in 1999 after almost a decade of thinnet use, each faculty member was provided two drops at departmental cost and offered the option of paying for additional drops at that time. Most opted for a single drop to their offices and one to their labs; the overall cost for the 108 drops installed at that time was about $14,000 (roughly $21,000 in today's dollars). Since then, the cost of additional network connections has been the responsibility of each faculty member.

Over time, computer use in the labs has increased greatly. When an additional computer was needed next to an existing drop, professors were offered two options: we could either run a drop from the managed switch rack or install a workgroup switch. While the use of managed ports was encouraged, faculty often opted for the less expensive workgroup switch, though it was explained that these switches might prove less reliable and that a problem on a single machine connected to such a switch might temporarily disable network access for every computer connected to it. Over the last eight years, our number of managed ports and workgroup switches has steadily increased. Currently we have 24 workgroup switches deployed, providing roughly 120 "unmanaged" ports, of which about 80 are in use.

Unit-borne costs may present barrier to joining Wallplate

Should external network devices be disallowed under the Wallplate, Entomology will be faced with approximately $16,000 of remedial wiring as an entry fee to participation. Additionally, unit-level flexibility for temporary and ad hoc networking solutions is a concern; the details of how exceptions could be arranged via CNS (ease and timeliness) have yet to be elaborated.
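The cost figures above can be put on a per-drop basis with simple arithmetic. The minutes do not say how the $16,000 remedial-wiring estimate was derived; the Python sketch below merely assumes, for comparison, roughly one new drop per in-use workgroup-switch port (about 80), and shows that this works out close to the 1999 per-drop cost expressed in today's dollars.

```python
def per_drop(total_cost, drops):
    """Average cost per network drop, rounded to cents."""
    return round(total_cost / drops, 2)


# Figures from the minutes.
drops_1999 = 108
cost_1999, cost_today = 14_000, 21_000
unmanaged_in_use = 80        # assumption: one new drop per in-use port
remedial_estimate = 16_000

print(per_drop(cost_1999, drops_1999))                  # 129.63 (1999 dollars)
print(per_drop(cost_today, drops_1999))                 # 194.44 (today's dollars)
print(per_drop(remedial_estimate, unmanaged_in_use))    # 200.0
```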

Would disallowing external network devices provide reasonable ROI at this time?

Tom mentioned that allowing the connection of these switches means that the rest of the network is essentially subsidizing their costs. Those ports are not counted in the $1.5 million central funding but still benefit from the infrastructure that is provided. HealthNet, which relies on per-port charges to maintain its entire infrastructure, is forced to draw a harder line, since allowing such switches basically means that someone is getting something for nothing.

Tom also mentioned that someone can plug into these unmanaged switch ports without anyone knowing about it. Steve allowed that, but also pointed out that our current situation doesn't really protect any port from that; without more sophisticated access control measures than are currently in place, anyone can basically borrow any port to which they have physical access. Until more sophisticated access control is economically feasible we must live with that situation anyway in Steve's opinion.

Dan mentioned that we currently can't prevent the connection of workgroup switches because we don't have policies to support doing that. If such a policy were in place, we could tighten our controls. Dan went on to stress the potential damage to overall network reliability that workgroup switches could introduce via inadvertent loops.

Steve asked whether workgroup switches weren't just one possible cause of network edge issues; even if those were controlled perfectly, problems would continue.

Using wireless to obviate the need for remedial wiring

Tim asked what wireless might cost as an alternative to rewiring, should workgroup switches be disallowed. That would naturally depend on the number of WAPs required, and Steve estimates that Entomology would need to provide at least several additional WAPs for complete coverage, even if one WAP per concentration point is allowed under the Wallplate. There would also be the cost of wireless NICs, and Steve expects a considerable increase in configuration support due to our UFAD-joined managed machines and the need to log on via a "dial-up" VPN connection.

Tim wondered what the throughput hit might be for wireless as opposed to wired. Dan said that there would be some, but it might not be noticeable. He also mentioned that improvements in wireless (802.11n) should soon remove that obstacle.

Server Rooms at Units

Dennis Brown related that Horticultural Science has a server room with a gigabit switch in a rack that is fed off a closet switch. He certainly would not want to put a server room on wireless and asked how such a circumstance might be handled under the Wallplate. Dan said that this would be handled when the Wallplate team comes in to consult on the design of your network. Perhaps that would be a valid place for a telecommunications room or a special server exception. It depends on the server density. Dan agreed that servers shouldn't be on wireless, but does not feel that they should be on external network devices either. If they are important enough to be treated as servers, then we would recommend hard-wired ports. Thus there would be wiring costs to the unit as well as extra cost for gigabit ports on the managed switch.

Tom mentioned that HealthNet handles machine rooms by providing a local gigabit switch which is connected via fiber directly to the BPOP--especially if there are a lot of servers. Dan agreed that high-density situations would call for a server switch being designated.

Current 802.1x Deployments

Tim asked Dan if 802.1x was intended to provide a different authentication method as opposed to BlueSocket. Dan responded that it could indeed push the authentication off to LDAP, RADIUS, or some other central authentication server technology. The traffic then would not have to go through an authentication device the way it does currently with BlueSocket. However, there are some major supplicant issues to using 802.1x.

Tim was curious about what HealthNet and DHNet were doing about the matter. Tom responded that they really don't know what to do yet. For wireless they have gone to a Cisco NAT, but it is a non-secure environment once you get on. There are many issues and decisions to resolve before 802.1x may be deployed widely.

Mark said that DHNet is piloting a major wireless rollout for Maguire UVS. They currently have 18 WAPs and intend to expand that, but they have found many issues with everyone's ability to log in. They will probably use some sort of fiber connection from the WAPs directly to the switch, with Port Security on the switch. All of their authentication is Gatorlink at this point.

Compromise Solution?

Dan asked if we should perhaps decouple the cost issue from the first question and ask whether the committee would recommend that the proper way to manage an enterprise network is to not allow these devices. As a corollary, we could add that there is concern about the costs to units that are forced to do things on the cheap.

Steve responded that while he does believe there are disadvantages to workgroup switches, he is not certain that now is the time to decide that we can afford to disallow them--at least those which are under the control of local IT support. (Note: the extension of network infrastructure by end users is prohibited by UF AUP -- see section on Network Infrastructure/Routing.) Another approach might be to require such connections be registered centrally. That would permit closer monitoring and allow CNS to present units with the details of specific issues as they occur. It would be much easier to convince a unit to invest in direct wiring once it was shown that their use of external switches caused a specific local outage. If broader outages occurred affecting other units or the core itself, this same data could provide justification for more stringent measures.

Steve remains concerned about the costs both in dollars and flexibility. Regarding the flexibility issue, Dan mentioned that accommodations could likely be made for temporary situations such that CNS could even supply such a switch temporarily as needed. Steve then asked if we could develop the procedures for how that might work. If such a service were managed in a responsive fashion, that could greatly alleviate such flexibility concerns.

CNS to Develop Recommendation

Tim mentioned that a quorum was not present for the committee to be able to pass a recommendation at this time. What he believed Dan could do is take the input from today and develop a recommendation to the committee. In that recommendation we could include the associated procedures for how port disabling and notification would occur. It would be useful to identify exceptions within those procedures as well, since matters such as the flexibility issues were raised. We would like to list the sorts of exceptions which people could request on an occasional or ongoing basis.

As one of the potential exceptions, Steve wanted to know more about the proposed bench switches in IT staff locations which would be "provided for dynamic activities such as new PC deployment and workbench trouble-shooting". Steve mentioned that he currently builds machines behind an inexpensive router as recommended by the UF Security Team. These are some of the kinds of special needs which Steve would like addressed. Dan mentioned that a separately ACL-ed VLAN could replace the need for such a router and added that if there was one thing which would concern him as a network administrator more than an external switch, it would be an external router.

Steve asked Tom if HealthNet included the cost of a new wire pull in the monthly port charge, or if that would be an extra cost at installation time. Tom replied that, first of all, they only accept work orders from people who understand the financial burden. A physician, for example, cannot ask for a port directly, but must go through their LAN manager or business manager. The requester is then charged for the wire pull, which includes one active connection. Additional connections incur additional monthly charges. Tom noted that the port charge includes a calculation for continuing maintenance and theoretically supports eventual wire replacement. Thus port charges do support building wiring upgrades provided by HealthNet, something that is not currently part of the Wallplate model.

Other Discussion

Tom asked if there had been any further information on the proposal for changing UF's second-level domain to allow both ufl.edu and uf.edu and jokingly suggested that we change the University logo to UFL. It was noted that Christine Schoaff had incorporated IT staff feedback into the project strategy document and that a new December 12, 2007 draft version is available. Feedback for the revisions came from various places, including efforts organized by Allen Rout via the CCC list and this committee.

Steve would like to note that, from a purely IT technical viewpoint, the very best possible result of attempting such a project is that everything keeps working and we can advertise "UF.EDU" via our e-mail and service-host addresses. Multiple cascades of interrelated issues make even that outcome unlikely, however. Whatever gains a domain name change might promise, such a project creates the need for a huge expenditure of IT resources at a time when those resources could arguably be allocated more effectively elsewhere.

It has generally been agreed that any introduction of a "UF.EDU" domain would require "UFL.EDU" to remain in parallel indefinitely. A complete switch-over from one domain to the other is seen as too difficult to seriously contemplate. Yet the overall costs of a "parallel" vs. a "switch" approach would likely be of the same order of magnitude and involve similar levels of effort.

Via either method, the difficulties would not all be technical. Neither approach would be complete until essentially everyone affected understood the consequences and knew how to conduct their business accordingly. The success of either would thus depend on an informational campaign without precedent, both internally and externally. The overall costs and benefits in turn would be directly related to how well this "non-technical" aspect was handled. Steve just hopes this is weighed wisely before any attempts are made to move ahead.

Action Items

Schedule videoconferencing on a continuing basis for 1-2pm on the second Thursday of each month. Provide advertisement of Polycom interactive access for ITAC-NI members and broader announcement of web-based view-only access via Net-Managers-L.
Subscribe Dan Miller, ITAC-NI chair, to all other ITAC committee lists for collaboration purposes (pending from previous meeting).
Draft a committee position statement on our need for a multi-year plan for reclaiming IPv4 address space (pending from previous meeting).
Develop a recommendation to the committee based on input from today's meeting. That recommendation should include the associated procedures of how port disabling and notification would occur. It would be useful to identify exceptions within those procedures as well, as there are matters such as the flexibility issues which were raised. We would like to list the sorts of exceptions which people could request on an occasional or on-going basis.
Schedule times for miscellaneous agenda topics including: 2nd-site redundancy plans, the MRI network, UF Exchange project, wireless, IPv6, IPv4 space usage, heavy users of the network, and site blocking.
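Assuming the 1-2pm slot on the second Thursday of each month is adopted, meeting dates can be computed with Python's standard calendar module. This is only a convenience sketch; the function name is illustrative.

```python
import calendar
from datetime import date


def second_thursday(year, month):
    """Return the date of the second Thursday of the given month."""
    # monthcalendar() lists weeks Monday-first by default, with 0 for days
    # that fall outside the month.
    thursdays = [
        week[calendar.THURSDAY]
        for week in calendar.monthcalendar(year, month)
        if week[calendar.THURSDAY] != 0
    ]
    return date(year, month, thursdays[1])


print(second_thursday(2007, 12))  # 2007-12-13, the date of this meeting
print(second_thursday(2008, 1))   # 2008-01-10
```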

Next Meeting

The next regular meeting is tentatively scheduled for Thursday, January 10th.

last edited 17 December 2007 by Steve Lasley