Minutes of November 8, 2007 ITAC-NI Meeting
AGENDA:

CALL TO ORDER:
This meeting was again held in CSE E507 to accommodate, on a trial basis, a proposal to videoconference, stream, and record our meetings for future playback. This meeting was called for Thursday, November 8th, at 1 PM and was run by ITAC-NI chairman, Dan Miller, Network Coordinator of CNS Network Services.

ATTENDEES:
Seventeen people attended this meeting locally. There were no attendees via Polycom videoconference, though Dave Pokorney did connect from the meeting itself via his Macintosh laptop. Ten members were present: Clint Collins, Dan Cromer, Erik Deumens, Tim Fitzpatrick, Mark Hill, Shawn Lander, Steve Lasley, Tom Livoti, Dan Miller and Handsford (Ty) Tyler. Four members were absent: Craig Gorme, Stephen Kostewicz, Chris Leopold and John Sabin. Six visitors were present as well: Jeff Capehart, Todd Hester, Marc Hoit, John Madey, Dave Pokorney and Christine Schoaff.

Viewing the recording
You may view the recording via the web at http://128.227.156.84:7734. You will need to click on the "Top-level folder" link, then the "watch" link next to the "cdmcu-2_08Nov07_12.45" item. Cross-platform access may not be available; on the Windows platform you will have to install the Codian codec.

Audio archive
An archive of audio from the meeting is available.

Approve prior minutes
No corrections or additions were offered and the prior minutes were approved without comment.

November 7th UPnP induced network outage
Dan Miller inserted this agenda item at the meeting in order to briefly discuss this outage, which affected almost everyone at UF. He noted that many had likely seen the e-mails to the CCC list on this under the subject line "Network Routing". Dan mentioned that they were still learning more details and developing a plan of action on the matter.
The root cause
Dan reported that a certain series of Ricoh printers (some of which are also branded as "Lanier") have occasionally been noted to send out very large streams of multicast packets with a short TTL (time-to-live). Depending on where that comes from and who is on the other end, when the TTL expires you can see some pretty serious effects (because when a router sees a packet with an expired TTL value, it should report the problem back to the sending network device--generating yet more traffic). The local subnet will certainly see a mini-DoS from the volume of the traffic alone. We do not currently know what triggers these multicasts, but one such printer caused yesterday's problems.

Reactive measures
Dan said that the particular printer involved has now been re-enabled, and that the owners of that unit have been informed to expect possible future local issues due to it. Filters have been applied on the core router which will prevent a future recurrence from affecting the rest of the network. Dan also mentioned that a couple more problem printers were noticed today and that two previous, but more minor, incidents had been noted prior to yesterday's major problems. Consequently, UPnP multicast filters will be applied on the core boundary routers beginning with today's 5pm maintenance window. At some future date they will probably extend those out further to layer 3 BPOPs and they are also looking at extending what they did yesterday with OSPF routing advertisements between routers--changing those from multicast (the default) to unicast. This will probably be done on Sunday.

Outage details
Yesterday's impact was pretty severe. In addition to the list of services which were announced as affected (DNS, Gatorlink Email, WebCT, all web sites hosted at SSRB, UF AD and UF VOIP), it should be mentioned that the DNS outage affected very many people. DNS resolvers are slow and when your primary DNS goes out, resolution doesn't work very well at all.
That same effect trickled over to Bridges and the portal and had some secondary effects within the ERP infrastructure; consequently, the portal was largely impacted by the outage, which lasted about one hour.

Dan Cromer proposed that various units consider having their client DNS settings configured to utilize DNS servers from separate subnets. IFAS currently uses DHCP to set 128.227.0.242 as the primary and 128.227.0.243 as the secondary, and both of those were unavailable during this outage. Dan Miller pointed out that the problem there is that the DNS resolvers are so slow, with something like a 30 second timeout, that a requested connection is dropped prior to the secondary resolving the address anyway; this makes that solution unworkable--at least for most applications.

Proactive measures
Tim stated that we did see impacts on DNS and VoIP and that they do intend to try to address those issues to see what improvements can be made. We might be able to implement better monitoring with more proactive alerts. We have a ton of that currently, but nothing which would have alerted us to this particular sort of incident. Tim expected that about 2-3 weeks of diagnostics would be performed to try to improve our future situation.

Erik wondered if there might be other protocols, similar in nature to UPnP, which might be proactively identified and treated in the same manner--rather than just being in a reactive mode. Dan Miller didn't know whether that was really possible, though they do try to anticipate threats to the core network.

Marc Hoit asked about the abilities of the routers to monitor and control such traffic bursts. Dan Miller said that they do indeed have such capabilities and then went into more detail about what actually happened. There was a large stream of multicast going out. Multicast is much like broadcast in that, though it doesn't have to be interpreted by every host on the network, it still has to hit all the core routers and therefore has a wide reach.
The stream of multicasts was large enough that it did trip a mechanism within the core router to rate-limit multicast traffic. The good effect there is that the routers did not crash. That allowed them to use out-of-band management pathways, which were much slower, but still allowed them to diagnose the problem more quickly. At least we didn't have routers crashing randomly. The downside is that this same rate-limit was what killed our router advertisements via multicasts; some of them got through, but many didn't. Some links "flapped" a little bit, while others went down hard. In the case of SSRB, Dan believes the problem was caused by the fact that they were hit by two different redundant paths with the same TTL, and that is what caused the major effect there. Thus, Tim summarized, two protective measures (the redundant paths and the rate-limiting) conspired to isolate SSRB and all its various services from the rest of the UF network. Dan Miller added that this was an unfortunate and unforeseen side effect of our complex network design.

Ty asked that instructions for properly configuring these Ricoh printers be distributed and Dan Miller replied that this was being done. Tim sought to clarify that the steps being taken by CNS are specifically to protect the core network. Units with such printers may continue to see local effects until they disable that protocol on the printer or otherwise resolve the issue.

Emergency notification
Someone pointed out that people outside of SSRB could not get to Net-Services in their attempts to discover the source of the problem. Dave Pokorney and Christine Schoaff noted that they had tried to get to where the http://www.ufl.edu website could provide information, but there were difficulties in doing that. Dave Pokorney said that the switch-over problem will be solved and we should be more ready for such an event in the future. Marc Hoit stated that the goal is to have an automatic fail-over to an off-site website.
Someone noted that such a site might best be advertised by IP# in order to still allow access for times when DNS was not functioning or reachable.

Health Center routing issues
Dan Cromer asked if the routing problems at the Health Center overlapped with the Ricoh-induced problem. Ty responded that their issue was resolved just prior. The root of the Health Science problem started on August 8th when a security engineer intercepted a hack on two IP addresses and, in order to correct that, instituted a rule on the firewall blocking 256 IP#s which belonged to Akamai. It didn't cause any problems until about 2 weeks ago when people saw some intermittent problems with web sites. Troubleshooting was very difficult because of how intermittent the issues were. It came to a head yesterday and, with the much appreciated help of the CNS engineers, they found the problem and removed the rule from the firewall. Tom mentioned that their attempts to correct it as a DNS error (e.g., clearing caches) just made things worse. It was a real learning experience.

Discussion on project to convert UF second level domain from ufl.edu to uf.edu
Christine Schoaff introduced herself as the director of the Web Administration group. She noted that she was on our agenda because a request has come through the University Relations group and her boss, Marc Hoit, to consider a project to change from "UFL.EDU" to "UF.EDU". Christine sought input from the ITAC-NI committee members and their constituents on this project proposal, whose first stage involves the drafting of a project plan. This draft document is currently kept on the Web Administration Project site and Christine would appreciate comments which can be applied directly to fleshing out that document.

Plan overview
Christine noted that she is currently trying to identify the various components which would have to be involved in the successful implementation of such a transition, and what the difficulties and costs of those might be.
The intention is to incorporate those into this project plan so that the decision makers can make an informed decision about moving forward with the proposal or not.

Marc Hoit stated that this is purely a "branding" issue--just as has been done with the logo--and they would like to have "UF.EDU" as our address. The president likes the idea and is very interested; all the PR and marketing folks think it is a great idea. Consequently, our job is to figure out if that is plausible, what it would cost, and how to make it work. Once sufficient input has been accumulated and organized, upper management can then make the decision on whether to move ahead with such a plan.

Mixed mode expected
Marc said that it was his "gut feeling" that we would always have two addresses, though we may publicize the UF.EDU version. It seems unlikely that UFL.EDU could ever go away because of how broad its usage is and has been. If duplication is necessary, then the question is whether that solves the problem and would allow business cards and advertising to say "UF.EDU", while allowing the rest of our processes to continue to function properly.

Dave Pokorney responded that any pursuit of this would definitely require a mixed mode. However, utilizing aliasing (whereby requests for UF.EDU get pointed to UFL.EDU) will undoubtedly cause confusion for our users and clients--they will not understand where and when they can or should use UF.EDU versus UFL.EDU.

Marc mentioned that he had a discussion with Marcus Morgan about a year ago and asked him if MX records couldn't be changed so that anything addressed as "@UF.EDU" could be delivered to "@UFL.EDU". Dave responded that the SMTP receiver on that needs to be taught to accept mail for that domain. It doesn't happen without something taking place on our 275 mail servers. Marc countered that this still would not take months of work and that the e-mail portion of this should be fairly easy to implement.
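Dave's point that a DNS change alone does not deliver mail can be illustrated with a minimal sketch. It assumes a receiver that keeps its own list of domains it accepts mail for; the function names, the alias table, and the sample address are hypothetical and do not describe any actual UF mail server.

```python
# Hypothetical illustration only; not the configuration of any real mail server.

def split_address(address):
    """Split 'local@domain' into (local, lowercased domain)."""
    local, at, domain = address.rpartition("@")
    if not at or not local or not domain:
        raise ValueError(f"not a mailbox address: {address!r}")
    return local, domain.lower()


def accepts(address, local_domains):
    """True when the receiver treats the recipient's domain as its own."""
    _, domain = split_address(address)
    return domain in local_domains


def canonical_recipient(address, aliases):
    """Rewrite an aliased domain to the domain the mailbox actually lives in."""
    local, domain = split_address(address)
    return f"{local}@{aliases.get(domain, domain)}"


# Before the change the receiver only knows the existing domain, so mail
# routed to it for the new domain would be refused even with MX records set.
before = {"ufl.edu"}
after = {"ufl.edu", "uf.edu"}
aliases = {"uf.edu": "ufl.edu"}

print(accepts("someone@UF.EDU", before))               # False
print(accepts("someone@UF.EDU", after))                # True
print(canonical_recipient("someone@UF.EDU", aliases))  # someone@ufl.edu
```

In this sketch the acceptance list and the alias table are separate on purpose: the first decides whether a message is taken at all, the second decides which mailbox it lands in, and each receiving server would need both updated.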
Marc's greater concern is that if he puts "@UF.EDU" on his business card, then somebody might assume they can substitute UF.EDU for UFL.EDU in other contexts as well. For that to work with the web is more problematic; DNS changes will be necessary but not sufficient on their own.

Is this a marketing whim?
Someone brought up the issue that branding efforts always have a finite lifetime and questioned how often similar changes might be desired. It would behoove the UF IT community to take this opportunity to elucidate why our domain address is perhaps more fundamental than recurring branding efforts and the changing of university logo designs. Marc Hoit responded that he would not second guess administration's reasons for wanting this domain name change; rather, his job is to figure out what it would take to do this.

Reverse DNS issues
Erik responded that there are two aspects to consider. He believed it a hopeless task to expect that all addresses would truly change with an eventual goal of having UFL.EDU disappear. Marc Hoit countered that this was not the goal. Erik continued that if we are instead considering having all the DNS managers of all the domains at UF change their databases so that whenever you type "something.UF.EDU" you are directed to "something.UFL.EDU", then that would be something you could actually make a plan for and accomplish. There is one technical difficulty, however, in that there is currently a one-to-one mapping between the name and the address in the DNS table. We will have to decide whether to return "UF.EDU" or "UFL.EDU" responses for reverse lookups. Since reverse DNS is often used as an anti-spam measure, we would have to be concerned with mismatches between addresses used and those reported via reverse DNS. Any mismatches would amount to self-inflicted denials for outgoing mail delivery to locations utilizing reverse lookups.
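The mismatch Erik describes can be sketched with two small lookup tables standing in for DNS. This is an illustration under stated assumptions: the host names are hypothetical, 192.0.2.25 is a documentation-only address, and the two checks shown are generic examples of how a receiving site might use reverse lookups, not a description of any particular site's filter.

```python
# Hypothetical data standing in for DNS; no real lookups are performed.

def ptr_name_matches(announced_name, address, reverse):
    """Strict check: the PTR name for the address must equal the announced name."""
    ptr = reverse.get(address)
    return ptr is not None and ptr.lower() == announced_name.lower()


def forward_confirmed(address, forward, reverse):
    """Looser check: the PTR name must resolve forward to the same address."""
    ptr = reverse.get(address)
    if ptr is None:
        return False
    return address in forward.get(ptr.lower(), set())


# Both names point at the same address, but an address has a single PTR answer.
forward = {
    "mail.dept.ufl.edu": {"192.0.2.25"},
    "mail.dept.uf.edu": {"192.0.2.25"},
}
reverse = {"192.0.2.25": "mail.dept.ufl.edu"}

# A server announcing itself under the new name fails the strict check...
print(ptr_name_matches("mail.dept.uf.edu", "192.0.2.25", reverse))   # False
# ...while the old name still passes it.
print(ptr_name_matches("mail.dept.ufl.edu", "192.0.2.25", reverse))  # True
# The forward-confirmed check passes regardless of the announced name.
print(forward_confirmed("192.0.2.25", forward, reverse))             # True
```

Under these assumptions, whichever name the reverse table returns, a host that announces the other name fails any strict comparison, which is the self-inflicted denial noted above.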
The effects of dual-homing on page rankings
Christine Schoaff added the concern that maintaining parallel domain names can hurt our Google search rankings due to external links being split between the two. She is researching what effects we might expect from that and what our options there might be for mitigating those. Marc Hoit responded that the potentially adverse effect from dual-homing was perhaps his biggest overall concern with the proposal.

Dave Pokorney said (ironically) that this would affect the UF brand. He then asked what pain threshold would need to be reached for us to report back that we have investigated the matter and that this proposal is just not workable--that the pain would be too great. Marc responded that he wouldn't know until he acquired more details about what it would take to do this. It would depend on the overall costs and issues, and whether those could dissuade administration from proceeding.

Other universities have done this
Marc Hoit noted that other universities have migrated their names to different domains. Christine reported that Virginia and Ohio were two examples. Note: Dave Pokorney later found references to Northwestern and UIUC having done this as well. Marc feels that there must be a workaround for the search ranking issues; the question is "How much effort is that?". Consequently, Marc is seeking answers to the question "How CAN we do this, and what will it take?". Listing of the impediments is important, but we don't want to begin from the preconceived notion that this is simply undoable, as it obviously isn't.

Marc would appreciate each of us sitting down and spending an hour or so listing the problems we foresee and then forwarding those to Christine. She would then involve the appropriate people to determine what would be needed to overcome those issues.
Then we can provide a response to administration summarizing what it will cost in time and money to accomplish this and leave the final decision on whether to proceed or not to them.

Long-term effects and other painful consequences
Clint Collins expressed his concern about the difficulty in estimating the overall cost of doing this. He felt that there are so many possible issues and points for confusion that it would be easy to underestimate true overall costs.

Shawn Lander asked if the plan would be for all new domains (for example, a new department) to get both "UFL.EDU" and "UF.EDU" designations. Shawn pointed out that this would be something which everyone would have to remain aware of and which we would have to maintain from here on out. Christine responded that they get approximately 1-2 new third-level domain requests per month; it wouldn't have to be a new department or center, but we do get a standard trickle of new requests.

Christine said that whether we would have to maintain dual-homing forever would depend on the overall costs. We might look at eventually archiving the "UFL.EDU" (say) 10 years down the road--or we might find that there are too many problems with doing that for it to be feasible. It is on Christine's list to contact other universities who have done this and to find out why they did it, how they went about it, and what the problems were.

Tom Livoti was curious as to how IRB would handle these changes, because he could recall a problem several years ago when they had changed IP#s. Doing so had required that researchers re-submit everything to have it re-approved by IRB for NSF or NIH grants. Tom mentioned being aware of the rule of unintended consequences on matters like this; you don't necessarily know what is going to blow up. Marc responded that getting a list of such issues is important, and that this is where we are in the process currently.
How best to provide input
Christine noted that the current plan document is her first attempt to get something on paper to give to Joe Hice. People can supply comments in whatever form they wish, but she will eventually have to incorporate those into this document. Consequently, contributions to the updating of that document might be the most useful way for individuals to present their opinions on this issue. Plugging suggestions directly into that document where you feel they best fit would be most helpful to her. Once that document is fully developed, Christine will append an executive summary and present it to administration for their decision on whether to proceed or not.

What is the intended value?
Ty questioned why this project was even proposed. He wanted to know who was confused by our use of "UFL.EDU". Marc responded that he understands that most find university resources via search rather than direct address entry. He is not a marketing specialist, however, and his task here is only to provide a plan for doing this, along with its associated costs. The decision to proceed will be up to higher administration. Christine noted that administration has not supplied details regarding the value they expect to gain from successful completion of this project; that is outside the scope of her involvement.

Ty countered that conversations he has heard have been about how stupid this idea is--not about what it would take to do it. Marc responded that the same things were said about branding and he understands that. Ty asked why administration didn't provide information on why they wanted to do this, what the value is, and actually try to get people on board with the idea. Marc said he didn't want to go very public with this until he knew whether this was even feasible and whether we were doing it or not.
Clint suggested to Christine that, when talking to other universities who have done this, she might want to ask whether the changes gave them the benefits which they were hoping for. Do they have any data to show that the improvements seen justified the costs which were involved?

Timeframe of response
Christine mentioned that there is no official deadline on this. She does know that Joe Hice expects to be asked at each board meeting regarding progress. He is telling them that it will take a long time, but he is using numbers like six months to a year to begin testing. Ignoring this isn't going to make it go away, so the quicker we get the information together the better.

Dan Cromer stated that he could understand the value from the standpoint of public relations. He was concerned, however, that confusion during a prolonged transition period would make the payback period too long for it to be worthwhile.

Proposed unadvertised testing
Erik suggested, rather than having a long discussion, that we might simply go ahead and create the shadow mapping without advertisement. We could do this for all sites on campus and see how it goes. Shawn agreed that this should be easy to accomplish, and Christine noted that this has already been done for http://www.uf.edu. The big question is what all the unintended consequences of doing that would be. Since we wouldn't be advertising this, we could easily back off should problems occur.

Shawn noted that this will be a problem with their e-mail, however. Administrators would either have to touch each user object and add the "UF.EDU" smtp address to it, or get UFAD to add a recipient policy which did that AD-wide.

Summary
Please consider problems and solutions along with their respective costs. Determine where best those might plug into our draft project document and get those to Christine.

Update on Wall-Plate roll out progress and schedule
Todd Hester reported that they started the wallplate rollout on July 1st.
When we started, we had about 12,000 data ports and about 1,600 VoIP phones. As of today, we have about 14,500 data ports and 2,100 phones. This is a growth of about 2,500 data ports and 500 VoIP phones over the past four months. We have completed Marston, Maquire, Powell, O'Connell, Architecture, Norman Gym, and data rollout for Weimer.

The above graph shows the original projections. We thought we would have about 6,000 VoIP phones and 2,500 data ports when we were finished. After getting into it, we are now projecting about 8,500 phones and 36,000 data ports. So the project is growing on us. The other factor that is hitting us is new buildings and major renovation projects. Those are difficult to anticipate and generally cannot wait, but must be added into the middle of the project.

The graph projects out with what they believe are real data, based on surveys, all the way out to the end of the project. The timing noted on the graph is based on the projected schedule. The original survey of our analog phones is what is producing the VoIP phone estimates (red line on graph). Todd mentioned that, interestingly enough, every time they roll out VoIP phones, people seem to want more of them.

Changes in project approach
Originally we thought we would have nine building projects active at any one time. It turns out that there are currently about twenty-eight active projects. Six of those are buildings where they are making their initial contact and talking to them about the project. Nine buildings are in the plan and design phase, and thirteen buildings are actually being deployed currently. Thus the original nine buildings-at-a-time plan has gone right out the window.

Initially we thought we would design an entire building and then order the equipment for it. That introduced a six-week delay just waiting for electronics to arrive.
Consequently, they have started making educated guesses and preordering about two-to-three months in advance, so we are pipelining the switches and the phones in order to have a continual supply coming in. That change has shortened the installation time considerably. Hopefully we can find a few more ways to increase our efficiency, but we have a challenging schedule for these first three years. It won't quit after that time either, because we will begin the refresh all over again.

Tim mentioned that some additional front-end difficulties are involved in translating a department's decision to go with VoIP into the actual counts and location details necessary to get that done. Getting customer commitment to supplying the precise details of what kinds of phones and exactly where they are to be located has been difficult. Todd mentioned that they are consequently trying to get with units earlier than originally planned in order to give people a little more time to make their decisions. Todd mentioned that this is an on-going process which we are tuning on a day-to-day basis, with the goal of becoming more efficient.

Preparing units for anticipated costs
Steve asked Todd how far ahead of time units were being contacted. Steve made the point that, in some cases, there may be considerable unit-side costs for remedial wiring and for VoIP, should that option be chosen. It seems to Steve that units would appreciate as much lead time on that as possible. Todd referred him to the "Planned Customer Commitment date" listed in the project schedule. They try to make contact at least one month prior to that date.

Marc asked whether rough cost estimates were available to help units get an idea of what costs they might expect. It was mentioned that the handset costs were posted, but the costs are currently being communicated to each unit at first contact.

Tim mentioned that there are three teams run by Mary Byrd, Sheard Goodwin, and Plant Rodgers.
Each team consists of five people handling various installations. Also, there are several outside groups that may need to be involved, depending on the situation. If remedial wiring is required within the building, this is termed "inside plant" and that involves Judy Hulton as a sub-contractor to all of these teams. If you need fiber between the buildings, that is "outside plant" and Marvin Sawyer is the sub-contractor there. John Madey is the person who makes initial contact with each unit, makes the customer presentation, and hopefully closes the deal. There are also the customer service reps, led by Rosa Jackson, who collect the unit requirements and VoIP details. Todd Hester is sort of the "ringleader" of this circus. Todd mentioned that there is a team of roughly 50 people standing behind him. They have a weekly meeting of just the department heads and team leaders which involves 14 people, so there is a lot of coordination involved.

Ty suggested placing an article in their monthly newsletter that would mention who to contact if departments wanted some budgetary estimates for planning. Shawn mentioned that he was hoping to host an early meeting for all of Engineering's network administrators, with CNS representatives available to answer such questions. That way, even those who may be three years out can get some idea of what will be involved and can be talking to their chairs now. It was mentioned that this had already been arranged for portions of CLAS by Erik and that it would be good to get other campus groups involved similarly. Ty also mentioned that it would be good to provide a spreadsheet which could be used to assist in estimating both costs and, potentially, cost savings.

Todd noted that buildings which have been scheduled for later in the project are those which are expected to have the greater remedial wiring needs, and therefore greater unit costs. It is hoped that the longer lead-time will help them in making the necessary decisions.
Are sufficient resources committed?
Erik noted that the Provost has committed to pay for this project based on initial projections. Now that deployment numbers are exceeding those predictions, are our project costs going to have to increase as well? Tim responded that the three deployment teams have been funded as permanent positions and we have a permanent commitment to a 3-year replacement cycle to continue indefinitely. He has challenged his people with finding cost and implementation efficiencies which will make this project viable at current funding levels, even given the increased port counts. It is their hope that this can be accomplished. We are funded at 25,000 ports and we are anticipating 35,000 ports. This is a 40% increase, not merely in capital equipment, but in productivity of the staff. So far, so good.

At the prompting of Marc, Tim explained that the price of the handset was one of the major barriers in migrating to VoIP. The price of the electronics is basically standard regardless of where you buy it, but Tim had argued that our ongoing replacement commitment should warrant further price considerations. Extensive negotiations with Cisco eventually resulted in preferential pricing which Tim hopes can be maintained beyond the initial rollout of this project. That really helped and Tim feels good about our persistence in getting that done.

Ty asked about the possibility of analog phone costs rising. John Madey responded that our current contract with AT&T requires only a single Centrex phone. He did say, however, that AT&T is likely to start pricing things differently as they start to see their business going away. The current contract for that ends in four years.

Ty asked if any organizations had declined VoIP so far. John responded that a couple had, due to having recently purchased key systems and wishing to enjoy returns from that considerable investment.
VoIP is a trickier sale for PBX/keyset customers and the cost analyses involved there are more difficult to present.

Should we continue videoconferencing the ITAC-NI meetings?
Dan Miller asked if anyone had any feedback on the usefulness of videoconferencing these meetings. Steve responded that he had not heard if any IFAS folks had taken advantage of viewing the archived stream. Dave Pokorney commented that he had attended a similar meeting via videoconferencing that took place in this room just two hours ago. Using his laptop, it was just like being there in person.

Steve asked Dan Miller if notice was made to the Net-managers list of the live stream of this meeting being available via the web today. Dan responded that it wasn't really his intent to do that. Steve pointed out that we had discussed doing so at the last meeting. Dan said that he had it in his head, for some reason, that making notice there would not be a good idea. However, if the committee agrees, he can do that next time. Dan then asked if people would like to continue things next month on this same meeting schedule and no one objected.

Call for Agenda items

Edge Protection
Dan Miller noted that we have a preliminary agenda topic next meeting for a more complete presentation from CNS on the edge protection features which we previewed at our previous meeting. They hope to get some recommendations from the committee to move forward on those.

Second-site redundancy plans
Dan asked for suggestions for other agenda topics for next month. Erik mentioned that there has been some discussion of redundant/split locations for ERP. Maybe we could have an update on those plans at some future meeting. Tim said he would talk to Mike Conlon about that; he was aware that Mike has some ideas that are becoming plans for a step along the path to supplying redundancy. Tim has similar ideas as it relates to our mail services, WebCT, and so forth.
Both Tim and Mike have applications on servers and both have to confront the second-site redundancy issue. Tim is not sure that this will be ready by our next meeting, but it can certainly be discussed sooner or later.

The MRI network
Erik said that he could give a presentation on the architecture and features of the MRI Network, which is a 10 GigE research network connecting the HPC cluster to the Florida Lambda Rail. Dan Miller suggested that he might delay that until a meeting early next year, but would plan for that in the near future.

Miscellaneous upcoming agenda items
Dan Cromer suggested getting an update on the UF Exchange project. Dan Miller responded that this is on the list of future topics. Other topics on the list include wireless, IPv6, IPv4 space usage, and heavy users of the network (including perhaps DHnet, Shands and HealthNet, College of Business, etc.).

Site blocking
Marc Hoit proposed the topic of site blocking as a future agenda item. He just found out that some other schools block known pornographic and hacker sites. Tom reported that Shands is doing that currently and apparently Housing is as well.

Action Items

Next Meeting
The next regular meeting is tentatively scheduled for Thursday, December 13th.
last edited 13 November 2007 by Steve Lasley