ICC Meeting:
IFAS COMPUTER COORDINATORS
I am sorry for any confusion, loss of data, wasted man-hours,
and any inconveniences that our outage caused. I want to be
completely honest with everyone and leave no doubt. This is
what we know about our downtime on Monday:
Of the ten operational SharePoint sites we currently have in
use, four were wiped out on Monday: My IFAS home page, Admin,
Centers, & Research. We are not sure what caused them to be
deleted. The databases for each of these sites were still
present, but the data contained in those databases
was gone. The backup procedures we had in place failed.
This is what we think happened:
1. The SharePoint environment was set up with a feature
known as Site Use and Confirmation. This is an automatic
process that monitors each site and, if it finds the site
has had no changes in 365 days, sends an email notification
to the owners of the site (one a day for 28 consecutive
days) asking whether the site should be kept or deleted. If
the Owner wants to keep the site, they click a link in the
email, which confirms that the site should be kept. If the
Owner does not reply, the site is automatically deleted
after the 28th notification. I am listed as the Owner;
however, I never received an email requiring me to confirm
the site use. We checked all deleted files, Junk folders,
SPAM & Barracuda folders, and the blocked email folders but
could not find any email notifications. We suspected this
feature because I had received 4 such emails on 10/24/2008
for four of the sites that did not go down, and I confirmed
them. We wanted to make sure there were no emails that I
had missed. We could not find any. Had notifications gone
unanswered, the sites & data would have been deleted just
as ours were. That was why we questioned this first. This
feature was enabled to automatically clean up our
SharePoint environment so that unused subsites or webpages
would not build up.
2. The backups were in place on our original SQL server,
and the SQL DBA had recently moved the SQL environment to a
new server. When he did this, the backup processes were
supposed to migrate with the databases. Apparently they did
not. So the most recent backup we had was from my own
backups, made when I migrated our environment from Windows
Server 2003 to Windows Server 2008 last month. As a result,
we lost all changes made since 10/20/2008.
3. In response to this calamity, these are the
measures we have taken to ensure this does not happen again:
a. The Site Use and Confirmation feature has been
turned OFF. It will still monitor the sites, but they
will not be automatically deleted. The Owners will
have to do that.
b. The SharePoint backup has been reconfigured to
back up our environment every Wednesday night at 12:00 AM.
I am also manually backing up each site every Wednesday
& Friday night until I am positive we have a valid
backup schedule that runs automatically. This gives us
a backup every two to three days. Once we have
determined we can depend on these backup procedures,
the backups will take place once a week.
c. The SQL DBA has reinstated the backup procedures on
his SQL server and is monitoring every run to ensure
it has completed successfully.
d. We have gotten our AD administrator involved and he
is testing a SharePoint backup procedure through our
Veritas backup server. This feature he is testing will
cost our dept. roughly $500 but if it works, will be well
worth the expense.
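As a rough illustration of the Site Use and Confirmation behavior described in item 1 and the change made in item 3a, the following Python sketch models one daily pass over a site. It is not SharePoint's implementation: the names `Site` and `daily_check` are hypothetical, and treating an Owner's confirmation as resetting the 365-day clock is an assumption. Only the 365-day threshold and the 28 daily notices come from the description above.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

INACTIVITY_DAYS = 365   # threshold taken from the description above
MAX_NOTICES = 28        # one notice per day, taken from the description above


@dataclass
class Site:
    """Hypothetical record of what the monitor tracks for one site."""
    name: str
    last_changed: date
    notices_sent: int = 0
    confirmed_on: Optional[date] = None


def daily_check(site: Site, today: date, auto_delete: bool) -> str:
    """Return the action for one daily pass: 'ok', 'notify', 'delete' or 'flag'."""
    # Assumption: a content change or an Owner confirmation both reset the clock.
    last_activity = max(site.last_changed, site.confirmed_on or site.last_changed)
    if (today - last_activity).days < INACTIVITY_DAYS:
        site.notices_sent = 0
        return "ok"
    if site.notices_sent < MAX_NOTICES:
        site.notices_sent += 1
        return "notify"
    # All 28 notices went unanswered. With auto_delete off (item 3a),
    # the site is only flagged and the Owner must delete it by hand.
    return "delete" if auto_delete else "flag"
```

With `auto_delete=True` a site whose notices all go unanswered is deleted on the pass after the 28th notice; with `auto_delete=False` the same site is merely flagged.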
That was the bad news. The only positive thing I can share
with everyone is that the design of our environment
actually worked as planned.
Each site has a separate database so that it can remain
usable if another site goes down. Our home page,
http://my.ifas.ufl.edu, was down but the subsites were still
functional (such as http://my.ifas.ufl.edu/sites/depts/AEC).
In a normal website, that would not be so. And no site
that remained operational lost any data.
I know that this will cause users such as yourselves to
question the security and dependability of this SharePoint
implementation, as well as my administration capabilities.
All I can say in defense of it is: this is the first time
we have had anything go down in our environment since it
was started two years ago. We actually set up
http://my.ifas.ufl.edu in January of last year (2007) and
had been testing and designing it until we started
rolling it out in January of this year (2008), and never
had any data or site lost in that entire time. It did go
down twice, but those were DNS server issues, not SharePoint.
Once the DNS server was brought back online, the SharePoint
sites were working.
I am confident we will not have this problem happen again.
Again, I am sorry for any trouble and inconvenience.
Andrew added that they are investigating the use of Backup Exec for SharePoint and that option looks to be a good one.
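Whichever backup product is chosen, the letter above shows that a backup job can stop silently after a migration. A minimal sketch of a freshness check is given below in Python, assuming the time of each database's last successful backup can be collected into a dictionary. The function name `stale_backups` and the example timestamps are hypothetical; the seven-day limit simply mirrors the weekly schedule mentioned in the letter.

```python
from datetime import datetime, timedelta


def stale_backups(latest_backup, now, max_age=timedelta(days=7)):
    """Return the sorted names of databases whose newest backup is missing
    or older than max_age.

    latest_backup maps a database name to the datetime of its last
    successful backup, or to None if no backup is known.
    """
    stale = []
    for name, finished_at in latest_backup.items():
        if finished_at is None or now - finished_at > max_age:
            stale.append(name)
    return sorted(stale)


# Hypothetical example data, not taken from any real backup log.
example = {
    "Admin": datetime(2008, 12, 10, 0, 0),
    "Research": datetime(2008, 10, 20, 0, 0),
    "Centers": None,
}
print(stale_backups(example, now=datetime(2008, 12, 12, 9, 0)))
```

Running a check like this daily and mailing the result to the administrators would surface a missing backup within a day rather than at restore time.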
Public folder file deletion policies and procedures status
Steve raised issues he saw with using SharePoint to replace the public folder. He is concerned that we would just be moving and propagating the problem. Since there currently isn't any SharePoint location for file sharing across all our various branches, individual units would have to create those. Then we would have a growing number of places where we would have to be concerned regarding inappropriate sharing of protected data.
Andrew responded that the planned web interface is likely a better solution, where a file would be uploaded and available temporarily via an obfuscated URL. The problem there is once again finding the staff time for implementation.
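To make the idea of a temporary, obfuscated URL concrete, here is a minimal in-memory sketch in Python. It is not the planned web interface: the class `TempLinks`, its method names, and the seven-day lifetime are all assumptions for illustration. It shows the two properties discussed, an unguessable random token in place of a readable path and automatic expiry.

```python
import secrets
from datetime import datetime, timedelta


class TempLinks:
    """Hypothetical registry mapping random tokens to uploaded files."""

    def __init__(self, lifetime=timedelta(days=7)):
        self.lifetime = lifetime
        self._links = {}  # token -> (file path, expiry time)

    def publish(self, path, now):
        """Register a file and return the token to embed in its URL."""
        token = secrets.token_urlsafe(16)  # 16 random bytes, URL-safe text
        self._links[token] = (path, now + self.lifetime)
        return token

    def resolve(self, token, now):
        """Return the file path for a token, or None if unknown or expired."""
        entry = self._links.get(token)
        if entry is None:
            return None
        path, expires = entry
        if now >= expires:
            del self._links[token]  # forget expired links
            return None
        return path
```

Because links expire on their own, shared files do not accumulate the way they do in a public folder, which is the concern Steve raised.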
Dan Cromer mentioned that Patrick Pettus continues to work on the Tandberg Management software. It has the potential to save a lot of leg work in monitoring things and keeping them updated.
Microsoft
There were six critical and two important Microsoft patches for December.
Third-party apps
Version 6 update 11 of Java was released, and it does indeed correctly patch update 10 in place. Update 11 addresses a number of recently reported vulnerabilities. Third-party vulnerabilities remain the biggest concern, and a growing one. Secunia's vulnerability scanning is highly recommended to locate and patch such issues. Their online scan may be used here at work, but Secunia PSI 1.0 should be recommended to folks for use on home machines.
MS Office News update
There were no new updates to give at this time.
Job Matrix Update status
Steve wants to leave this matter as a standing agenda item for future discussion.
The meeting was adjourned rather early at about 10:55 AM.