Minutes of March 13, 2008 ITAC-NI Meeting
AGENDA:

CALL TO ORDER:
This meeting was scheduled in CSE E507 at 1:00 pm on Thursday, March 13th and was made available via videoconference with live-streaming and recording for future playback. Prior announcement was made via the Net-Managers-L list (late afternoon of the day prior). The meeting was called to order a bit late by ITAC-NI chairman Dan Miller, Network Coordinator of CNS Network Services, because we had trouble gaining access to the room initially.

ATTENDEES:
Twenty-three people attended this meeting locally. There was one attendee via Polycom videoconference, but there are no records of how many may have listened to the stream via the web interface.

Twelve members (or their proxies) were present: Charles Benjamin, Dan Cromer, Erik Deumens, Tim Fitzpatrick, Stephen Kostewicz, Shawn Lander, Steve Lasley, Chris Leopold, Tom Livoti, Allan West (proxy for CLAS), Dan Miller, and Handsford (Ty) Tyler.

Two members were absent: Clint Collins and Craig Gorme.

Eleven visitors were present as well: Elwood Aust, Dennis Brown (via Polycom), Jeff Capehart, Will Chaney, Jeff Chorlog, David Gagné, Todd Hester, Andy Olivenbaum, Skip Rockwell (left very early), Jamie Serrato, and Patricia Zabriskie.

Viewing the recording
You may view the recording via the web at http://128.227.156.84:7734. You will need to click on the "Top-level folder" link, then the "watch" link next to the "cdmcu-2_13Mar08_13.06" item. Cross-platform access may not be available; on the Windows platform you will have to install the Codian codec.

Audio archive
An archive of audio from the meeting is available.

1) Approve prior minutes
No corrections or additions were offered and the minutes were approved without further comment.
2) Review of network support for Building Automation and the PPD Lenel Project, which offers security services such as electronic locks, cameras, and alarms

2-1) Introduction
Dan Miller began by mentioning that there are several building automation systems on campus. The HVAC controllers were the first of those with which CNS was involved; CNS worked with Skip Rockwell of PPD on that. Dan mentioned that Patricia Zabriskie was visiting us today along with Jeff Chorlog to talk about the Lenel security system and its suite of applications and devices. PPD also has an automatic meter-reading project which CNS is supporting, again in conjunction with Skip. To support these devices, CNS provides one VLAN per core router which is then sub-divided into address space ranges for each type of application. This provides an easy way to maintain standard security across the network and to simplify adds, moves, and changes. Lenel is the more complex of these examples. Dan supplied a handout of a "Typical LENEL Install" which detailed the variety of devices and applications which that system supports. CNS supplies different address space ranges for burglar alarm panels, video recorders, cameras, and access panels. The automatic meter-reading project includes a separate energy management range as well.
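As an illustration only of the addressing scheme described above (one block per core-router VLAN, carved into a range per device type), a minimal Python sketch follows. The address block, the prefix lengths, the device-type labels, and the function names are all assumptions made for the example; the minutes do not record the actual CNS allocations.

```python
import ipaddress

# Hypothetical labels: the first four are the Lenel device ranges named in the
# minutes, the last is the meter-reading project's energy management range.
DEVICE_TYPES = [
    "burglar-alarm-panels",
    "video-recorders",
    "cameras",
    "access-panels",
    "energy-management",
]


def plan_ranges(vlan_cidr, device_types, new_prefix):
    """Carve one VLAN's address block into equal ranges, one per device type."""
    block = ipaddress.ip_network(vlan_cidr)
    subnets = list(block.subnets(new_prefix=new_prefix))
    if len(subnets) < len(device_types):
        raise ValueError("block too small for the requested ranges")
    # Ranges are assigned in order; any leftover subnets stay unallocated.
    return dict(zip(device_types, subnets))


def classify(address, plan):
    """Return the device type whose range contains the address, or None."""
    addr = ipaddress.ip_address(address)
    for device_type, subnet in plan.items():
        if addr in subnet:
            return device_type
    return None


# Assumed example block in RFC 1918 private space, not a real UF allocation.
PLAN = plan_ranges("10.20.0.0/22", DEVICE_TYPES, new_prefix=25)

if __name__ == "__main__":
    for device_type, subnet in PLAN.items():
        print(f"{device_type:22s} {subnet}")
    print(classify("10.20.1.17", PLAN))
```

Because each device type lives in its own contiguous range, a single filter rule per range can apply the same policy on every core router, which is the "standard security" and easy adds, moves, and changes benefit Dan described.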
With that brief introduction, Dan handed things off to Patricia, who deferred to Jeff for a non-technical background and overview.

2-2) Background and overview
Jeff Chorlog presented a brief history of the electronic access system and how PPD inherited responsibility for that, along with where they are currently. For about the last seven years, Jeff has been the Associate Director of Resource Management for PPD, overseeing all their internal support functions, one of which is their IT support. Before Jeff even came to PPD, they had begun to have issues with various entities around campus installing their own little proprietary electronic access systems with keypads etc. It quickly became apparent that PPD and UPD wouldn't have access to some of these areas any longer; that, of course, is a big no-no--particularly for UPD. Consequently, the guys in the key shop and one of PPD's superintendents who was in charge of the key shop did fairly good research over a couple of years in an attempt to find the best replacement for mechanical locks and keys to provide electronic access. They did some site visits to places like Georgia Tech and FSU and ended up with a recommendation for using the Lenel system. Jeff thinks this was a good choice, but noted that they did not consider the much broader implications of security, monitoring, video, police response, etc. Our needs really entailed a campus enterprise system rather than the simple replacement of a mechanical key with a card swipe.

2-4) This is going to get out-of-hand quickly
Right at the end of their research project, when they were getting ready to make a recommendation, the research team brought Patricia and Jeff into the loop. Those two hadn't been aware of the research being done prior, but they said it looked good and the decision was made to purchase Lenel's product. Once it arrived they then asked Patricia to load it on the server so they could begin to use it.
Jeff then realized that this was going to grow quickly into a very large system, as it's more than just an electronic access system. Jeff said they originally planned to manage just the database and make sure that was validated to ensure it only contained active students, staff and faculty. They would then allow units to access the system and maybe provide a little help with that aspect as well. That turned out to be a little naive and this has quickly grown into quite a large system. Jeff originally politicked to have the entire system at UPD but they didn't want to hear of it. Consequently, PPD has it. PPD was not funded for this and has taken it out of its maintenance budget. Jeff had to politic for a while to get a dedicated position to even support what they are doing. Jeff introduced Jamie Serrato, who was hired a few weeks ago into a position which had been vacant six to eight months. They are struggling to keep up with the central administration of the system. The server they purchased resides at CNS and it is quickly becoming obvious that this system is a big deal. People are looking at it as a security system. UPD has agreed to monitor certain things. It is a critical system that needs to remain running continually and it is not just a casual slide-card access system.

2-5) PPD can't be project managers
PPD had allocated $100,000 of their budget to help people with start-up projects and agreed to pay for the first ADA-compliant electronic access point in a given building. In the last couple of years PPD's available funds have deteriorated to the point where they can't afford to do that anymore. There have been quite a few retrofit projects and now they are getting entire buildings with very complex systems coming on-line. Because PPD didn't realize what they were getting themselves into, they somehow became responsible for project management of installations.
Consequently, their dedicated support person had become a full-time project manager, doing something that PPD really wasn't even well versed in, rather than being the manager of the central system as they had envisioned. Currently, Patricia has been working to specify all the things one needs to do in order to bring one of these systems up and have it actually work. PPD is also working with a consultant to do the construction aspect of the specification. When finished, PPD will have a full master specification that any project manager, be they Facilities Planning, IFAS or whoever, can use to learn all the rules and other considerations which one must take into account to get a system installed and running properly. They are doing this because PPD has neither the resources nor the expertise to be project managers for installations. So PPD is working diligently to relieve themselves of the project management portion so they can dedicate Jamie to the administration and customer support role for the central system.

2-6) Interim questions and comments
Allan West commented that he was a happy Lenel user from Pugh Hall. Jeff said he was sure Patricia could take the credit for that success and Allan responded that he had "bothered" Patricia quite a bit during the installation. Someone asked how many systems are currently in place and Patricia Zabriskie, who is in charge of all the PPD-specific applications, reported that they were at around 40 buildings currently, with multiple installations in a number of buildings. Ty suggested that IG might want to take a look at the importance of this system to campus and the security of some particular systems and assets like server rooms and lab animals, and take a look at the distributed way in which it is managed and maintained. They could perhaps come up with a suggestion.
2-7) Can or has this system been certified in any fashion?
Dan Cromer asked if this system had been blessed by any high authority within UF to say that this system meets the security requirements of UF. Jeff responded that it had been blessed by Ed Poppell and Chuck Frazier back when he was Interim CIO. It has been declared the UF standard for any new installations. Jeff said he didn't believe it had been "certified" per se and Ty responded that he didn't believe that a certification program or process existed for such a thing. Dan Cromer's concern was that we not wait until we are involved in a lawsuit to look into that aspect. Patricia said that they had looked into some of the legal issues involved, for example, with mounting cameras at building entrances and knowing what the inference was there with regard to monitoring and response--whether signs are required as a disclaimer. They have explored this and other concerns with the General Counsel's office. They have discussed what the security clearance should be for administrators of the central system. They have gone through all the FBI background checks for people who are involved in that. Patricia believes they have conformed to the very minimal requirements which the General Counsel mentioned. Patricia has a whole series of concerns about the availability of information about these systems, including how easy it is for people to request floor plans either from Facilities Planning or Architecture and Engineering--especially now that these floor plans are going to include the wiring for security systems.

2-8) Standardizing the methodology around installations
While Patricia thinks there are still several pending issues that need to be resolved, she believes the most urgent of those are being addressed in the Standards and Procedures document, a draft of which Dan Miller e-mailed out to the committee prior to this meeting.
PPD is using that as a means of standardizing the methodology that they use for installing new Lenel systems. The goal is to make sure that the appropriate IT people are involved to protect the integrity of the network and not to introduce devices into it of which local IT are unaware. We also need to make sure that the systems get patched as needed and need to ensure that the expectations of the user are fulfilled regarding the liability of the system.

2-9) Server hardware aspects
PPD spent quite a bit of time along with Operations Analysis investigating what kind of a server should be used for this system. There were some licensing restrictions that made it cost prohibitive to buy two copies of the software in order to run a hot standby in a failover system. Consequently they ended up having to go with an NEC fault-tolerant, internally-redundant hardware server; NEC partnered with Lenel in this effort. Homeland Security had the first installation of that and we have the second in the USA. We spent $58,000 on that one server which, obviously, has to be renewed every few years. Maintenance is now due on that and Patricia is awaiting a quote to see how much it will cost to recertify the NEC server. Patricia believes that a better long-term solution would be to go to a blade system supported by CNS in-house and have a failover. Journalism has a live system that has the same level of licensing that PPD has; they are planning on moving to PPD's system, which would then provide UF two full licenses. At that point PPD could get to a failover system without any additional expense.

2-10) Installation timing and coordination issues addressed
Some of the aspects which were neglected in the past include not knowing exactly who would be occupying a particular area or building. The Cancer and Genetics building is one such example. Once that project had started and they had begun the plan reviews on that, they didn't know who the local IT support would end up being.
The system ended up getting installed before anyone was actually in the building. PPD had devices in there that ran on Windows with no one responsible for patching them at that point. That generated some issues with Kathy Bergsma's office when one of the digital video recorders was breached. Patricia believes they have fully addressed that particular situation in the draft Standards and Procedures document.

2-11) Networking issues addressed
This document has also addressed many of the network connectivity issues, including the allocation of IP addresses. The server currently is on a public IP; that needs to be corrected by moving it to a private IP. Because CNS didn't necessarily have a presence in every one of the buildings involved, compromises to the plan were made and we have somewhat of a mish-mash of things out there currently. The current plan is to move buildings to this system as they come onto Wall-Plate and each Lenel device will be moved to the VLAN.

2-12) Funding models
Jeff commented on what Ty was implying previously. In many other institutions they take a big chunk of money off the top at the university and implement this system virtually university-wide. Jeff believed that USF put something like $2 million into doing that a few years ago. Some other universities that PPD has talked to have implemented a monthly charge for every device in order to support these things. Jeff has discussed that with his boss and, considering all the budget cuts which we are facing currently, now does not seem to be a good time to implement that. In essence, however, PPD is running this on a shoestring. The system is growing and PPD's resources are either fixed or shrinking just like everyone else's. Supporting this system is obviously going to be a squeeze.

2-13) An enterprise-friendly system
Patricia mentioned that another aspect of the issue Ty raised is the central administration of granting access to rooms vs. the distributed management of that.
One of the important criteria in the selection of the Lenel system when it was being evaluated years ago was the ability to segment the hardware so that administrators in one organizational unit could grant access to people while not being able to view the accesses granted for those same people in another unit. That was one of the things that differentiated Lenel from the other systems out there at the time.

2-14) Proper system support and maintenance is critical
Ty wanted it clear that he felt distributed management of devices is a really good thing. He also thinks it is a bad thing that, when a device malfunctions, it is not like a mechanical key lock, where he can put in a work order and somebody handles it, or like a toilet, where if it is running he has somebody to call to fix it. If you have to call the contractor to take care of such issues then it is not being treated as an important university system. Someday, as a result, we are going to be reading in the paper about how somebody got into some animal care facility and killed or stole a bunch of valuable rats, or set something on fire, because somebody didn't know how to fix this lock that was broken. That's the kind of thing we are subject to here and Ty believes that is a really serious problem. All the new buildings going up have this in them. It protects server rooms and animals. It protects patient information down in the HSC. It protects high-value labs. And, as you say, it is run on a shoestring. That is just a bad thing. Jeff didn't disagree and Patricia absolutely agreed. Dan Miller added that when we talk about network security, one of its basic tenets is that if an intruder has physical access then the game is over. So this matter is critical at a very fundamental level.

2-15) Difficulties acquiring a suitable maintenance agreement
Patricia agreed and said that there have been many requests for campus-wide maintenance agreements.
She has spoken with the vendor many times about this and just recently was given some scenarios which are used at other universities that might be interesting to contemplate a bit. For example, at FSU they run a different software package, but in setting up their campus-wide support agreement they have two full-time people from the vendor who are at FSU all the time; they basically pay the salaries of two employees for the vendor. Those individuals are available to do nothing but support the end devices at FSU; they don't go around the rest of Tallahassee servicing other clients. PPD's own vendor's technicians who come to Gainesville also service other clients across northern Florida. Consequently, we can't always get them right when we need them; they have a limited number of staff as well. Patricia has explored the option of having a local Ingersoll-Rand representative from Hamilton Lock and Safe, but apparently this and all like organizations are under a different business unit, have nothing to do with electronic access, and don't want to have anything to do with it. Exploration of that route in the past has been a dismal failure; locksmiths have no interest in getting involved with electronics and they weren't good at it, so they broke a lot of things. The vendor had offered to situate a person here if we signed a campus-wide service agreement. Patricia's problem with that is the unrealistic expectation of having a single person here on call 24/7/365. The idea of having two full-time staff people from the vendor at least gives some backup and redundancy plus the ability to respond to two separate incidents at once.

2-16) In-house support does not appear to be an option
Another option which has fallen flat is to train some of PPD's own staff to do this. Lenel will not train an end-user organization to do troubleshooting and repair; they have to be a value-added reseller of Lenel.
That is a bit of a showstopper, besides the fact that we don't have the money to staff two people out of ENG funds. Running it as an auxiliary and charging back a monthly or yearly fee to the users of the systems has come up as well. Patricia suspects that the best option in that area is to do something like what FSU is doing on a fee/device basis and have an agreement with vendors to dedicate one or two people to just service UF.

2-17) Acquiring data on service call types and volumes
What has kept PPD from doing that to this point is that they didn't really have a feel for what volume of service calls they might expect. As part of all this reorganization and documentation of the system, Patricia is trying to institute a way of logging every time one of the vendor's technicians gets dispatched to campus so that they have a record of the volume and the types of calls that are being made. From that they can deduce patterns that may not be convenient for the vendors to disclose on their own, such as "wow, we have really had a lot of failures of this particular kind of lock." The vendor is not apt to come to Patricia with that type of information. If PPD requires them to log in with work management at each visit, Patricia thinks they would get a better idea of what is going on. She can trace things back through purchase order payments and she has been trying to do that by commodity code to see what kind of activity is going on, but people can still use their P-cards to pay for these service calls and that can make things difficult. At least one of the vendors offered to try to generate some of those statistics for us.

2-18) Lenel system vendor details
We have two preferred vendors for UF, both of whom are out of Jacksonville. They are both certified Lenel resellers and their technicians are certified. That is in our UF Design & Construction Standards, section 13700. Three years ago PPD published those standards.
They are a little bit out-of-date now so PPD will be revamping those with new hardware specs and new references. Patricia thinks it is becoming apparent, even without those statistics, that we are reaching a critical level of service calls. We need the reliability of having someone available in less than a four-hour response time.

2-19) Workarounds and best options for installation and maintenance
Ty suggested that if they set themselves up as an auxiliary with the purpose of installing and servicing Lenel systems, that should make them a reseller. Patricia supposed that would be up to Lenel as to whether they would consider doing that, since we would be selling to ourselves and not retailing. Ty responded that there is no restriction on that; you could have the ability to sell to outside customers, set up your price list and say here it is. Patricia suspected that it would be better to get someone who is already trained. Jeff replied that he would tend to write a contract like we do for elevators, with a fifteen-minute rather than a four-hour response time. Patricia said that this would be unique for Ingersoll-Rand, who has the largest presence here on campus currently; they do not provide anything less than a four-hour response time even in Jacksonville where they are headquartered. Gainesville is much smaller, however, and is without the traffic problems there, so Patricia thinks we could certainly negotiate a shorter response time. Patricia mentioned that Stanley Security Systems, a subsidiary of Stanley Tools, was the second vendor. Stanley used to be known as Best Access for those familiar with the older systems here at UF; they were bought out by Stanley a couple of years ago. The Ingersoll-Rand representative used to be called Security One; they were bought out by Ingersoll-Rand at around the same time. All the names have changed, but the faces are the same--we have the same sales reps.
2-20) Questions on installation costs and procedures
Dan Cromer asked for a ballpark figure for installing a single lock in a building to secure the data center. Patricia said that it usually runs between $4,000 and $5,000 for the first door because that requires the controller and the entire infrastructure. Adding a door after that, until you reach the capacity of the controller, is incremental. There are also other types of devices like video recorders, biometric readers, hand scanners and fingerprint readers. Dan Cromer asked if these could utilize our Gator 1 cards and Patricia responded that they could. She added that the standard for ADA entrances requires a long-range proximity reader (which is rather large); if you wanted to use proximity readers for internal doors, those wouldn't have to be long-range (which is quite pricey). Ty related that it runs the HSC about $3,000/door for the prox-card setup. Jeff responded that it is not a straight ramp-up. You have to buy a controller to handle "X" number of devices. Patricia mentioned that one of the other advantages of the Lenel system over the others they had looked at was that the controllers had interchangeable controller cards; you can upgrade the card to get more reader inputs and outputs. Ty mentioned that, for retrofit of a door, you have to go into the doorframe, so wiring is a big part of the expense. Dan Cromer asked if this was something for which one can put in a work order to PPD and they will provide a subcontractor who can do that. Patricia replied that those procedures are all part of the new procedure outlined in today's handout. PPD will be setting up a web site which will be the entry point for all requests for electronic access. From that point people will be directed to the appropriate project management area, which will be either Facilities Planning or Architecture and Engineering depending on the size and location of the project.
They will follow the appropriate checklist of items and be able to give you a quote, help you analyze the quote and decide whether or not to move forward--hopefully giving you a couple of different options. Jeff said that these are just like other projects except they are a little bit more complex on the IT side. That is why they have to generate a more complex document for the managers to follow through.

2-21) Questions on access management
Dan Cromer asked about access management and Patricia responded that the unit would specify an individual who would manage access to areas within the facility. Certainly the local IT staff has to be involved in the installation process, acquiring IP addresses and determining the location and power sources for the devices. They would have responsibility for keeping the video recorders patched and updating the client workstations.

2-22) Recurring license fees currently absorbed by PPD
Dan then asked if there was a recurring license fee for devices. Jeff responded that there was a recurring license, but that PPD is absorbing that cost currently. They do not like that because it gets bigger every year. Patricia expounded on that, saying that their maintenance costs are a function of the number of devices we have. So that will continue to grow as we add on buildings, and each time we collect another 64 readers there is an incremental maintenance cost increase in their license fees. The same thing applies for video licenses. New construction usually includes the head-in expense for adding the additional devices. With any large retrofit installation they are having the vendor include that head-in licensing upgrade cost as well.

2-23) Proper project management will raise costs
One of the consequences of going to a formal project management approach, which is the way it should have been done from the start, is that we are going to have general contractor (GC) expenses added on to what we have been paying up until now for a particular installation.
So the price is going to go up because you are now going to have a general contractor with their concomitant markup. The reason GCs need to be involved is that we often have additional electric work that is required, we have network services with cable runs, and sometimes there is some masonry work to do moving or installing pedestals, or elevator work. There are often a lot of additional subcontractors that may be involved and need to be coordinated. Jeff followed by saying that this is why it is a project; PPD staff are not project managers and that is not their business.

2-24) Quick run-through of the Standards and Procedures draft document
Handout: DRAFT of the UF IT Standards and Procedures for Lenel Electronic Access Control Systems

Patricia explained that, with the help of Operations Analysis and CNS, PPD has come up with a procedure to use which is included in the Standards and Procedures document. Section 11 summarizes those retrofit procedures; Patricia will be working with Facilities Planning to modify those for the new construction procedures. The sections leading up to the step-by-step procedures are mostly background identifying all the different players involved in a project, the types of Lenel devices, responsibility for maintenance, responsibility for controlling access to the installation, determining the locations for the devices and what the power source is going to be for the device. There is an important item that was added in one of the last draft reviews, which was to make sure that the power source for all IP cameras, and basically all the Lenel devices, be approved by the network provider. The document begins with links to other reference documentation, most of which is already in existence. These include the Design and Construction Standards. Keep in mind these are a bit out-of-date and will have to be updated as part of this process. The Key and Lock Policy goes into a bit more detail.
That will also be updated and perhaps both of those will be consolidated into the third document, the Electronic Access Control Master Specifications (not yet available). This is what PPD is working with the consultant on developing. This would be a complete guideline with standard drawings for each device type installation along with operation guidelines as well. Obviously, all the relevant portions of the Telecommunications standards, IT Policies and Standards, and IT Security Regulations would apply and should be respected. Establishing a functional and consistent review process, incorporated with our installation procedures and long-term maintenance and support procedures, is critical to assuring that all these standards actually get applied. Everything may seem obvious, but getting it to actually happen is the real trick. Patricia expressed her appreciation for all the input which they have received towards the effort so far and said they would certainly appreciate further input should we see something which could be changed or added to this or which they have forgotten to contemplate. One item which came up just this morning was the need to make sure local administrators configure notifications of any interruption of service on switches in any areas where any of the Lenel system devices are installed and connected to a locally managed switch. It is important that the switch manager and the user of the Lenel system get outage notifications because we have burglar alarms attached to this. We had one instance where the switch went down on a Friday afternoon and the local switch support person didn't find out about it until Monday morning. That meant the burglar alarm was disconnected from UPD for the entire weekend. This occurred in a critical area and notifications had not been configured. We can't assume that anyone managing a switch is doing so perfectly; we need to be explicit about that.
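The notification requirement described above lends itself to a simple inventory check. The following Python sketch is purely illustrative: the `Switch` record, the role labels, and the sample inventory are hypothetical and are not taken from the draft document or from any CNS tool.

```python
from dataclasses import dataclass, field

# Hypothetical role labels for the two parties the minutes say must be notified.
REQUIRED_ROLES = {"switch-manager", "lenel-user"}


@dataclass
class Switch:
    name: str
    lenel_devices: int                          # Lenel devices attached to this switch
    notify: set = field(default_factory=set)    # roles that receive outage alerts


def missing_notifications(switches):
    """Map each switch carrying Lenel devices to the roles it fails to notify."""
    gaps = {}
    for sw in switches:
        if sw.lenel_devices == 0:
            continue  # no Lenel devices attached, so the requirement does not apply
        missing = REQUIRED_ROLES - sw.notify
        if missing:
            gaps[sw.name] = sorted(missing)
    return gaps


# Made-up inventory for illustration only.
INVENTORY = [
    Switch("bldg-a-sw1", 3, {"switch-manager", "lenel-user"}),
    Switch("bldg-b-sw1", 2, {"switch-manager"}),
    Switch("bldg-c-sw1", 0),
    Switch("bldg-d-sw1", 1),
]

if __name__ == "__main__":
    for name, roles in missing_notifications(INVENTORY).items():
        print(f"{name}: no outage notification for {', '.join(roles)}")
```

A check of this kind only confirms that notification recipients are recorded; it does not verify that alerts are actually delivered, which would still need to be tested on the switch itself.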
Consequently, an addition has been made to the document which specifically states that outage notifications need to be set up. Section 11 goes into a lot of verbiage covering the step-by-step procedures which a project manager would use, providing basically a checklist with milestone points detailing all the things which need to be done at a particular point in the process as well as the information that needs to be gathered and documented and the people who need to be notified. There is a lot more documentation that PPD is working on, including alarm monitoring agreements with each end-user and between PPD and UPD, so that expectations remain clear regarding UPD's response, if any, to alarms that may be generated by the system. Another agreement that PPD will be starting is between PPD's Building Services and the user, to assure that the custodial staff are aware of the proper procedure for gaining access to any area. There is a lot of documentation which PPD will be putting together that is not necessarily IT related; Patricia wanted us to understand that there are many different important aspects of this matter and some will require tweaking over time. PPD has learned quite a bit over the last three years about what can go wrong. They have quite a few happy customers now too. It has been a learning process for them but Patricia said they appreciated the opportunity to share with this committee today a little bit of the background of the project and to get our feedback. Patricia thanked the committee for their time.

2-25) What can this committee do to show our support?
Dan Cromer asked if there was any action which the committee might take that could lend support to PPD's efforts on this. Dan believes this effort requires support at the highest level of administration.
Dan Miller commented that Ty's suggestion about raising this to IG might be more suitable; he pointed out that we are an advisory committee focused on networking and that the Lenel system had far broader consequences than just the networking aspects. While Dan's motion for a committee recommendation that the Lenel project be properly funded was seconded, the question then arose of where our recommendation would go. Dan Cromer suggested it go to ITAC since we are a subcommittee thereof. When Ty pointed out that ITAC hadn't met in something like three or four years, some people suggested Marc Hoit should be the recipient of any recommendation.

2-26) Perhaps involving the auditors would get action
Then it was asked if we would be expressing support or concern. Ty digressed at this point with a comment expressing his frustration with being part of a committee which reports to a committee that hasn't existed for quite some time. Returning to the motion, Ty said that he again would recommend raising this issue to the IG. Ty said that people actually pay attention to those guys and get upset when there are audit recommendations which they can't meet and have to explain that to the Florida Board of Governors. Jeff said that they didn't come here to complain, but PPD's position is that they are providing an alternative to a physical key to get into an area, and the only people who really understand that there is a whole lot more involved with that are the three of them here today. It is very complex and involves security and response along with safety and protection of high-value assets. The mechanical lock replacement is just a small part of this, but that is all PPD had really been tasked to do.

2-27) Is money available through Homeland Security?
Somebody asked if there might be any funding opportunities via Homeland Security.
Patricia responded that some of the installations on campus have been funded in part by Homeland Security and the grant writing was done by UPD in that instance. She said she would confirm this with Tony Dunn at UPD, but it was her understanding that those monies had dried up.

2-28) Committee action withheld pending security and liability review
Chris Leopold asked if the ITAC-ISM Committee had reviewed this and Patricia said that was where they were presenting next. Dan Miller then recognized the general support of this committee for what PPD was trying to do and deferred any specific action by the committee until this had been heard by the other relevant committees. Patricia mentioned that they might come back when the other committees had provided input and after, perhaps, this was run by the General Counsel's Office with liability questions as well.

3) Review of plan for improvements to the CSE machine room and its network
Dan Miller said that he would talk a bit about the network aspects later, but first handed things over to Andy Olivenbaum. Andy was joining us to talk about the facilities.

3-1) The CSE machine room
Andy introduced himself as the CNS Associate Director for Data, saying that he is responsible for Computer Operations and that he manages about 24 staff as well as being responsible for the power, cooling, and other infrastructure of the machine room itself. About 18 months ago CNS took over management of the old CIRCA computer room three floors below us here today. That room is about 1500 square feet with a raised floor, chilled-water cooling and a power distribution unit but not a large UPS. Their intention at the time was to promote that for use by departmental servers that were squirreled away in closets and inadequately cooled and monitored. CNS has had some success with that and currently has 15-20 servers from 5-6 departmental groups there.
3-2) Building redundancy among our datacenters
Since then, CNS (Tim Fitzpatrick) and Bridges (Mike Conlon) have been looking at the state-of-the-art planning for equipment at CNS. For as long as Andy has been here the UF approach has been "we don't have any money for that." Consequently, the plan was to harden the facility at CNS, which is something they have been working on. What we are looking at doing now, as step one of a larger plan, is to put some equipment in CSE 209 that is actually live for production use such that if SSRB goes down for some reason we would have storage servers, etc. already hot and running on the network and we could deploy other applications there. With the networking which was put in for departmental services they wanted to get the best connectivity they could for the least money, so they were basically attached in the same manner as servers in SSRB, which means they were dependent on the network feeding SSRB. They were fed from the same dual BPOPs from SSRB, which gave them good connectivity, but also gave them a dependency on SSRB. Andy said that needs to be changed and Dan Miller would be speaking to us about that aspect shortly. All this is a result of the change in direction for this room.

3-3) Improving cooling and power
The other thing they are doing for the room involves an engineering project with Moses Associates to add an additional chilled water cooling unit for redundancy. This will also involve a backup cooling system which will ensure the room is not totally dependent on chilled water. They are also getting a large group-size UPS instead of being dependent on a little rack. In front of that they will install a transfer switch anticipating the day when they will have real generator backup for the facility; or in the case of a planned outage they will be able to borrow the big PPD trailer generator and have a place to wheel that up and connect it.
Those improvements are being done to the room in general to bring it closer to a datacenter standard with reasonable redundancy. It remains to be seen from the proposal whether CNS has the money to do that, but that is what they want to do.

3-4) Failover proof of concept for future DR plans
Tim added that the Disaster Recovery (DR) game plan is to take systems that run in a single datacenter at SSRB and distribute the processing across two datacenters on campus. That is first of all a proof of concept, but secondly it is a tough build with all the bits and pieces necessary for doing that. Once we have distributed those across two datacenters on campus, then in theory we could distribute that capability to another datacenter off-campus, around the state, or just about anywhere. So to prove that they can do it and operate and maintain it, CNS is going to try to do that at low cost and do it close by; then maybe someday we could relocate that to actual DR at a distance, which is the more conventional approach.

3-5) Networking aspects of failover plans
Dan Miller then took over from Andy in order to discuss the networking aspects. He provided the committee a diagram of the "Proposed CSE COLO Network" and began by recalling the problem which we had last November where SSRB was off the net for a couple of hours. That also helped raise awareness of the need for improvement in the CSE network. Based on the need for as complete redundancy as possible from SSRB as well as the need to share IP addresses, subnets and VLANs between the CSE and SSRB machine rooms, network services came up with a plan for three BPOPs, all of the Cisco 6509 class with full routing topology and several HSRP (Hot Standby Router Protocol) domains. The bottom line is that CNS has invested a little over $100,000 in improving the network electronics for CSE.
They also invested significant money to increase the fiber availability between the two machine rooms and those new fiber runs are physically diverse. CNS is investing as much as they can to ensure that the network will stay up and meet the needs of the major players who are looking at DR scenarios.

3-6) Questions about this network topology
Erik Deumens asked about the BPOPs. Dan Miller confirmed that there are two in SSRB today; the one which is going in CSE is on order, should be here in a few weeks and will hopefully be installed a few weeks after that. Dan also confirmed that the bottom part of the diagram already exists. Someone asked if the idea was that if the two BPOPs in SSRB failed we would still have the network on campus and Internet connectivity. Dan replied that this was true. If the entire building powered down, CSE should still work through the CSE BPOP and the CSE Core POP; it would be able to get out to campus and to the Internet. Ty asked if the CTS Core had a single connection to SSRB as it appeared in the diagram. Dan responded that there is a single connection there, but the redundancy for that is on the opposite side with the connection to the SSRB Core. Dan said that the costs for this were relatively high because they are going to use the same platform in CSE as is used in SSRB and because they are using 10G connections all around to provide necessary bandwidth. Hopefully CNS will soon have something in place which will support growth for several years to come. In response to a question, Dan mentioned that campus has two separate connections to the Internet: one in SSRB and a second in Centrex. During the failure last November, for example, the rest of campus could get out to the outside world. There was the little sticking point of DNS, however, and CNS has a separate project planned to address that which will be discussed at some future meeting.
The network aspects should be in place in about 4-6 weeks barring any delays. Charles Benjamin mentioned that CNS may require a third location which actually monitors the other two. Dan Miller responded that CNS is talking about monitoring enhancements as well, but the reliability or failover should happen regardless of monitoring. The way things are today, Charles is correct; if there were an outage at SSRB our monitoring is essentially blind. Charles responded that he wasn't talking about that kind of monitoring. Depending on your applications, there is a potential that CSE will not realize that SSRB is down. Dan said that this is true and that he believed that this is what Tim was getting at regarding the DR aspect for the systems that are ultimately going to rely on this--it is a work in progress. There are a number of problems there that need to be resolved--and they aren't easy problems.

3-7) Final comment
Tim mentioned that when we speak of DR in this instance, what we really mean is failover. This project is the first step toward eventually building more robust DR systems.

4) Brief mention of security concerns with Go Daddy's SSL certificate requests
Dan Miller passed along a preliminary warning from Marcus Morgan. He suggests that you use a certificate vendor other than Go Daddy. Go Daddy has recipient verification processes that make it difficult to ensure to whom a certificate is delivered. As an alternative, IPSCA works very well and is free at this point to EDU users. Marcus plans to follow up with a more official e-mail later on about Go Daddy and concerns there.

Action Items

Next Meeting
The next regular meeting is tentatively scheduled for Thursday, April 10th.
last edited 10 April 2008 by Steve Lasley