
HP Server Killer Firmware Update On the Loose

timothy posted about 3 months ago | from the just-this-one-little-problem dept.

OffTheLip (636691) writes "According to a Customer Advisory released by HP and reported on by the Channel Register website, a recently released firmware update for the ubiquitous HP ProLiant server line can disable the network capability of affected systems. Broadcom NICs in G2-G7 servers are identified as potentially vulnerable. The firmware's release date was April 18, so expect the number of systems affected to go up. HP has not said how many systems are vulnerable to the update."

100 comments

What? (0)

Anonymous Coward | about 3 months ago | (#46840009)

So, don't upgrade the firmware?

Re: What? (0)

Anonymous Coward | about 3 months ago | (#46840047)

I hope they have taken down the bad firmware patch already, and if so, what's the real story here? Don't upgrade to a non-existent patch?

Re: What? (2)

NatasRevol (731260) | about 3 months ago | (#46840311)

That was available for a week.

Hot firmware (4, Interesting)

DigiShaman (671371) | about 3 months ago | (#46840053)

And this is why I wait at least two months before installing firmware updates (unless it's a major security issue). It's not uncommon for a firmware update to be pulled shortly after being published. A two-month delay is generally more than enough time to ensure an update is solid.

Re:Hot firmware (4, Funny)

SJHillman (1966756) | about 3 months ago | (#46840467)

The company budget committee helps us avoid issues like this. The majority of our gear is old enough that there hasn't been a firmware update in four or five years. And there are no plans to replace it any time soon.

Re:Hot firmware (0)

Anonymous Coward | about 3 months ago | (#46842891)

What happens when it breaks?

Re:Hot firmware (3, Funny)

SJHillman (1966756) | about 3 months ago | (#46843327)

They approved two rolls of duct tape for this fiscal year, so we feel well prepared.

Re:Hot firmware (1)

RockDoctor (15477) | about 3 months ago | (#46848417)

What? No increase in the size of hammers available? You're fucked!

Re:Hot firmware (1)

Culture20 (968837) | about 3 months ago | (#46841079)

But this firmware is killer, which in software terms is the same as bodacious!
P.S. I find it sad that bodacious isn't in my browser's or OS' dictionary files.

Re:Hot firmware (2)

Lumpy (12016) | about 3 months ago | (#46841435)

I wait until I HAVE TO install a firmware update. Unless there is a major problem that will cause the server to explode, you don't update the firmware.

Re:Hot firmware (1)

DigiShaman (671371) | about 3 months ago | (#46843279)

I would say the diciest of them all is HDD (generally SAS) firmware updates. To borrow a car analogy, HDD firmware updates are like changing the automatic transmission fluid. If you apply HDD firmware updates as they become available, you prolong the life of what would otherwise be prematurely failing hardware. But if you never did any maintenance and then pushed an update through toward the end of the drives' life, you may find that a drive fails shortly after. Again, this *only* applies to HDD firmware updates.

Re:Hot firmware (0)

Anonymous Coward | about 3 months ago | (#46849817)

We swap out HDDs every 5 years, whether they need it or not; very easy to do in a RAID 60. We believe that uptime and stability are more important than saving a couple of bucks on hard drives. HDD firmware never gets updated. EVER.

Re:Hot firmware (1)

idontgno (624372) | about 3 months ago | (#46843863)

How 'bout it's a major problem because the company won't honor its warranty and support contract until you slavishly install every update?

The good news, though, is that after this update you'll probably have a better reason to open a ticket than the piddly-ass one you had before.

Re:Hot firmware (1)

AmiMoJo (196126) | about 3 months ago | (#46842103)

But didn't you hear? It's on the loose, it could be anywhere and upgrade your servers at any time!

Proliant Nightmare (2)

emil (695) | about 3 months ago | (#46842243)

I deployed a DL380p Gen8 last year, and it gave me heart failure.

Under Red Hat, I needed to change the IP address, so I modified the file /etc/sysconfig/network-scripts/ifcfg-eth0, then did a "service network restart".

Alas, the box did not come up on the new IP. I got to the console, which was blank and unresponsive. Power cycled, and the RAID array was GONE (and let's just say this was EXTREMELY inconvenient timing).

Support was able to walk us through some BIOS disk recovery that (thankfully) worked. But I'll never change the IP address on a Proliant without a full reboot.
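
For the record, the whole sequence was the standard older-RHEL one; a minimal sketch, with the addresses purely hypothetical:

    # RHEL 5/6 style: put the new address in the interface config...
    vi /etc/sysconfig/network-scripts/ifcfg-eth0
    #   e.g. change IPADDR=192.0.2.10 to IPADDR=192.0.2.20
    # ...then bounce the network service to apply it
    service network restart

Nothing in that sequence should be able to touch a RAID controller, which is what made it so alarming.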

Re:Proliant Nightmare (0)

Anonymous Coward | about 3 months ago | (#46843477)

Um, do you even work on computers? Yeah, thought not. That had to be easily the silliest thing I've heard this week. Changing an IP deleted your RAID array. Uh-huh...I hope your boss was completely non-technical and bought that line of shinola.

Does not appear to affect the G7 (1)

nimbius (983462) | about 3 months ago | (#46840055)

We pushed a firmware update this morning to the firewall and its been smooth sai#*($$#[NO CARRIER]

Re:Does not appear to affect the G7 (1)

operagost (62405) | about 3 months ago | (#46840729)

Fantastic old-school post from someone with a high six-digit ID. Although I guess a sixer counts as an old-timer now.

Re:Does not appear to affect the G7 (2)

UnknownSoldier (67820) | about 3 months ago | (#46842361)

> I guess a sixer counts as an old-timer now

Nah, just middle-aged.

Re:Does not appear to affect the G7 (1)

morcego (260031) | about 3 months ago | (#46842647)

6 digits? Barely.

And yes, thank you, 5-digiters, for making me feel young again...

Re:Does not appear to affect the G7 (1)

blogan (84463) | about 3 months ago | (#46844427)

Get off my lawn!

Re:Does not appear to affect the G7 (1)

petteyg359 (1847514) | about 3 months ago | (#46848309)

Here, have a 7-digiter to make you stop feeling young, you wrinkly decrepit fossil.

Re:Does not appear to affect the G7 (1)

idontgno (624372) | about 3 months ago | (#46843951)

[NO CARRIER]

There's your problem. You flashed your firewall with the firmware package for a mid-90s Hayes-compatible telephone modem. I hope you have a spare; that firewall is hung up good.

If it ain't broke... (4, Insightful)

Rich0 (548339) | about 3 months ago | (#46840057)

...don't flash it.

Do admins routinely flash firmware updates in the absence of some identified need? I could see flashing an update if I was suffering from a known problem, or if the vendor identified a security flaw in a previous release. I could see flashing it if necessary to install new hardware.

I just don't see why a server admin would flash a firmware update as if it were Patch Tuesday. In the absence of a security vulnerability or production issue, there is no reason to treat a firmware change as an expedited change and skip full testing before deploying it. That isn't to say that doing some testing of security patches and the like isn't wise too - but I can see why those get rushed.

Re:If it ain't broke... (2)

Charliemopps (1157495) | about 3 months ago | (#46840113)

And when your entire site goes down on a Monday morning because one of your vendors applied an update to some connecting hardware? And their response, when asked for the reason for the outage, is "Your hardware was 3 years out of date. Your sysadmin said it wasn't broke, so he didn't fix it"? What's your boss going to say after he gets done telling you how many years of your salary the outage cost the company?

I delay updates, but I get that shit approved by executive officers first. I always make sure I have a very good reason to delay it as well.

Re:If it ain't broke... (2)

bravecanadian (638315) | about 3 months ago | (#46840131)

And when your entire site goes down on a Monday morning because one of your vendors applied an update to some connecting hardware? And their response, when asked for the reason for the outage, is "Your hardware was 3 years out of date. Your sysadmin said it wasn't broke, so he didn't fix it"? What's your boss going to say after he gets done telling you how many years of your salary the outage cost the company?

I delay updates, but I get that shit approved by executive officers first. I always make sure I have a very good reason to delay it as well.

Ah yes, Sys Admin, the damned if you do and damned if you don't profession.

ITIL (0)

Anonymous Coward | about 3 months ago | (#46840237)

That's why you implement AT LEAST the change management part of ITIL. Some executive/manager will have to sign off on it, either in the "do" or "don't do" checkbox.

And you fucking test these things on non-critical boxes first!

Re:ITIL (3, Insightful)

NatasRevol (731260) | about 3 months ago | (#46840329)

Unless the executives don't give you 'non-critical boxes' for every piece of infrastructure to test updates on.

"Why do you need an additional SAN at $100k? We'll deal with that if it happens. It happened? It's all your fault!"

Re:ITIL (1)

DigiShaman (671371) | about 3 months ago | (#46840473)

It's why you should always have your résumé up to date and another potential job in your pocket. Easier said than done, I know. Sysadmins are always inherently the sacrificial lambs for when shit hits the fan. Deal with it, or find another industry to work in. Just some friendly sage advice.

Re:ITIL (2)

gstoddart (321705) | about 3 months ago | (#46840639)

You know, if that's how your company is being run, you should already be looking for another job.

Where I work, we've got proper test equipment, a CAB to review the proposed changes, and an expectation that you will test before deploying. When we schedule outages, we have to have a backout plan, and we're expected to have applied the updates in either the lab or a test environment.

The admins aren't considered sacrificial lambs, but they are expected to apply due diligence, test, and identify any risks. But once you've done that and made sure people know what you're doing and why, what the results of your testing are, and what you've done to mitigate any risks... a bunch of senior people in IT have signed off on it, and people have had a chance to voice their concerns. The people overseeing this tend to be department heads with a lot of industry experience, so they understand there is always risk, but they also understand what you need to do to minimize it.

If your company refuses to give you what you need to do your job without being able to do these things, your company is sailing straight towards a major disaster with or without you.

If your company is treating it as "stop talking and do it" combined with "but if you do it wrong you're SOL" ... your company is being managed by people who don't understand what is involved in your job, and will always have unrealistic expectations.

Companies which don't plan for these things, don't build a proper process around it, and don't fund being able to ensure things get tested are just being penny wise and pound foolish.

And, from a certain perspective... I would never even consider applying a patch that had only just been released by the vendor to a production environment. At least a month's wait, maybe as much as two. If someone wanted to put a firmware update that had only just been released by the vendor on my production systems, the answer would be a firm "no bloody way". And my manager, and his manager, and all of the other people at that level would be saying the same thing and would back me on that position.

You have to have a company culture which owns the process, takes responsibility for it, and actually takes the time to understand the impact of it and plan for it.

Now, if a system admin does any of these things without going through all of the process, and things go wrong ... then you likely will be neck deep in crap pretty quickly. But if you have followed the process, and something goes wrong, the process shifts to remediating what went wrong, and understanding what can be done better next time. It has to be a continuous process, and it has to actually have some institutional memory, and companies have to take the process seriously.

Re:ITIL (1)

DigiShaman (671371) | about 3 months ago | (#46840669)

I agree with you. But...

If your company refuses to give you what you need to do your job without being able to do these things, your company is sailing straight towards a major disaster with or without you.

Which describes the majority of SMB companies out there. They also employ the majority of the workforce.

Must be nice to work for a Fortune 500 company and have those resources available...

Re:ITIL (2)

gstoddart (321705) | about 3 months ago | (#46840879)

Must be nice to work for a Fortune 500 company and have those resources available...

You don't need to be a Fortune 500 company to apply this level of rigor. I'm quite sure we're not one.

Yes, you need resources to do it. Yes, you need corporate will to do it. And you also need to have a company whose culture includes actively assessing risk against their needs, as well as understanding how the risks translate into business risk. If the systems affect the actual production of your business, you need to treat it as Very Important.

If you stand to lose millions of dollars per hour in the event of an outage, the cost of screwing up gets pretty high. Which means the expense is absorbed. If you have much less exposure due to an outage, your tolerance to risk is going to be much higher.

My wife does outsourced/leveraged IT ... and some of her clients, if some environments are down, basically have to halt all production, shut down equipment, and go through an expensive restart process.

Even at the SMB scale, you need to understand your risks, and have management be partly responsible for the decision making process, as well as having people who can provide the information needed to make decisions. These shops may not have the resources to test and deploy everything to a lab, which means, if anything, they should be staying away from applying a brand new patch as soon as it's released.

Re:ITIL (0)

Anonymous Coward | about 3 months ago | (#46844245)

"You don't need to be a Fortune 500 company to apply this level of rigor."..."If you stand to lose millions of dollars per hour". Two million dollars per hour x24x7x52 is 17 billion dollars per year. That's around #160 on the Fortune 500.

Of course if you're that big, you're also much more likely to actually encounter every obscure bug and to have duplicated environments. For actual SMBs, the risk side of the calculation involves probability of encountering a bug, not just its impact, and the probability of causing problems by applying every patch when not all equipment is available in duplicate becomes another factor.

Re:ITIL (1)

ewieling (90662) | about 3 months ago | (#46844079)

If an SMB does not have the resources to operate, then it will not be successful. It doesn't matter what the resource is: office space, electricity, phone lines, etc. People who run SMBs seem to think electricity is important, but IT is not.

I see this all the time: an SMB customer buys telephone or internet service from the company I work for, and then the customer thinks we have become their IT department. They are desperate for IT help, but not desperate enough to actually hire someone to take care of it. I have little sympathy for such companies.

Re: ITIL (0)

Anonymous Coward | about 3 months ago | (#46845177)

Yup, been there, buddy.

Re:ITIL (-1)

Anonymous Coward | about 3 months ago | (#46843053)

And, from a certain perspective... I would never even consider applying a patch that had only just been released by the vendor to a production environment. At least a month's wait, maybe as much as two.

So, I guess you get pwned by every security exploit then.

Fixing heartbleed would be next month, right?

Re:ITIL (0)

Anonymous Coward | about 3 months ago | (#46850731)

And, from a certain perspective... I would never even consider applying a patch that had only just been released by the vendor to a production environment. At least a month's wait, maybe as much as two. If someone wanted to put a firmware update that had only just been released by the vendor on my production systems, the answer would be a firm "no bloody way". And my manager, and his manager, and all of the other people at that level would be saying the same thing and would back me on that position.

If everybody worked that way, there'd be no guinea pigs to test new firmware updates. Everybody would wait two months, and everybody would hit the same bug, just two months after the release date.

Re:ITIL (0)

Anonymous Coward | about 3 months ago | (#46843529)

Maybe just do your job, do it well, and communicate, rather than look for a way out of every troubling situation. Better update your résumé again; it's almost May!

Re:ITIL (1)

hackus (159037) | about 3 months ago | (#46841663)

Which is exactly why I do not do ITIL.

Born of a bunch of technocrats with nothing to do all day long but twiddle their thumbs, ITIL gives any spy organization an exact behaviour profile of your organization's entire operations.

Implement ITIL and you're already halfway to having your infrastructure compromised.

Re:ITIL (1)

FaxeTheCat (1394763) | about 3 months ago | (#46844139)

We use ITIL, and the CM part definitely has saved us from major issues.
And some of the problems we see would have been avoided if the teams responsible had followed the CM process.
The more complex the organization is (we are a truly global company), the more you need structure.

Re:ITIL (1)

hackus (159037) | about 3 months ago | (#46844275)

Structure is fine.

But a canned structure isn't, and no organization benefits from using a canned structure that its competitors, or its enemies, also use.

Really, this has been demonstrated so many times that I am quite frankly puzzled why people and organizations feel the need to copy a plan for organizing their infrastructure when, more than likely, they had exactly ZERO input into it.

Google realized this when it created its infrastructure and came up with its own plan.

Everyone should do the same; it isn't that hard, really. Are people such simpletons that they can't figure out the best plan for structuring security and software updates in their own organization, and instead have to defer to the bunch of inexperienced government bureaucrats who wrote ITIL?

I think not.

Re:ITIL (1)

FaxeTheCat (1394763) | about 3 months ago | (#46844563)

ITIL is a recommendation. As you mention, implementing ITIL to the letter is simply wrong. But inventing a system from scratch is beyond the capability of most organizations (believe me, our company tried and failed miserably before adopting ITIL). Why should everybody reinvent the wheel?

Re:If it ain't broke... (2)

SJHillman (1966756) | about 3 months ago | (#46840477)

This is why I believe in half-assing everything. If it's only half done, you can only be half damned either way. Right?

Re:If it ain't broke... (1)

Lumpy (12016) | about 3 months ago | (#46841449)

My boss will be screaming at the vendor, they are not allowed to push any updates until we approve them.

Re:If it ain't broke... (1)

Charliemopps (1157495) | about 3 months ago | (#46843487)

My boss will be screaming at the vendor, they are not allowed to push any updates until we approve them.

Allowed? hahahaha...
Your vendor has a contractual obligation to... well... follow the contract. The contract does what the contract says... if they screw up, you get money off the bill. It doesn't stop their lead tech from going on a drunken binge after his wife leaves him and taking a golf club to the equipment.

Re:If it ain't broke... (1)

Lumpy (12016) | about 3 months ago | (#46849819)

What moron would sign a contract with a vendor like that? Do you have drooling morons working as management at your company?

Re:If it ain't broke... (5, Informative)

MrNemesis (587188) | about 3 months ago | (#46840209)

TBH, I suspect this is just getting publicity because it's the first super-dodgy HP firmware patch since they adopted their "no updates for YOU!" mentality - the explanation for which, from HP, was that they'd sunk a lot of money into their patching process and people shouldn't get to use it for free, I guess. This won't be the last time this happens, either.

As a sysadmin that's dealt with dozens of these "killer firmwares", there's often an identified need. We make extensive use of the HP SPPs at work, and they come with a list of fixes and known issues as long as your arm; it's part of my job to go through the advisories to see if we're at risk, and if we are, to analyse the risk of updating versus not updating. Many of them aren't security vulns or emergency fixes and are often extremely obscure, but once in a while you'll encounter something like a NIC locking up on receiving a certain type of packet, or the BIOS repeatedly saying a DIMM has failed when it hasn't, or an array that gets dropped if you mix hard drives with firmware X and firmware Y on RAID controller Z running firmware... er... A - lots of little issues that can severely impact running systems if left unchecked. And when you upgrade one component you'll frequently have to upgrade others to stay within the compatibility support matrix, until eventually you just run the damned SPP to make sure everything in that server is at a "known good compatible" level.

Sure, we don't just flash as if it were Patch Tuesday, and no one ever should - we wait for at least two months of testing on non-production boxes before we patch any prod kit with firmware unless it's an emergency fix - but lots of people use the HP SPP to automatically download the latest updates. We've had enough problems with them that we'd never do this (and in any case 97% of our servers have no net access). But the whole point of the SPP is meant to be that HP has already done most of the regression testing for you.

That said, we've had nothing but trouble with Broadcom NICs for ages, and I'm sure there are many admins here who have fond memories of the G6 blades, Broadcom NICs, ESX and Virtual Connect from a few years back. I think HP switched much of their kit to Emulex after that debacle. Also, the latest web-based HP SPP (as opposed to the last one, where you just ran a binary) is a complete train wreck on Windows for ad-hoc updates, largely due to the interface being handed over to people who seemed to want to make it a User eXperience rather than a tool.

Re:If it ain't broke... (1)

NatasRevol (731260) | about 3 months ago | (#46840331)

Don't try to autodeploy ESXi on Broadcom NICs.

Why do you keep blinking RIGHT before you're about to be handed a DHCP address? And then, shockingly, fail 45 seconds later.

Re:If it ain't broke... (0)

Anonymous Coward | about 3 months ago | (#46842925)

I've had similar troubles even with other hardware (non-HP, non-Broadcom).

Some switches are absolutely abysmal with spanning tree: they see the DHCP broadcast and put the port into listening mode for what feels like three years, and sometimes the switch's schedule and the DHCP server's schedule are bad news for one another. At least the network is safe from your server. :-) This can often be fixed with portfast, or edge mode, on the switch.
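
In Cisco-style syntax the fix looks roughly like this (the interface name is made up, and other vendors call the same feature an "edge" port):

    ! on the access port facing the server, skip the
    ! spanning-tree listening/learning delay
    interface GigabitEthernet1/0/1
     spanning-tree portfast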

Re:If it ain't broke... (0)

Anonymous Coward | about 3 months ago | (#46840775)

This. HP has a history of this kind of thing. When they sacked most of the U.S. VMS developers and outsourced the work to India, the next major release (V8.4) was horribly dodgy and buggy (for VMS; still a damn sight better than Windows). It took months to get things corrected.

Then they blocked free patch/ECO access for VMS and started releasing patches that caused problems.

Now they've moved ProLiant and Integrity firmware behind a paywall, and they've done it again.

Hopelessly pathetic.

Re:If it ain't broke... (1)

cthulhu11 (842924) | about 3 months ago | (#46855623)

HP's onboard NICs are Broadcom on the G8's.

Re:If it ain't broke... (1)

alen (225700) | about 3 months ago | (#46840231)

Once or twice a year.

What happens is that if you get a bad hard drive, or something routine like that, HP will make a big deal of it if your RAID firmware or HDD firmware is not up to date or is too many versions behind. So I run these service packs once or twice a year.

Re:If it ain't broke... (1)

Rich0 (548339) | about 3 months ago | (#46842713)

Once or twice a year.

What happens is that if you get a bad hard drive, or something routine like that, HP will make a big deal of it if your RAID firmware or HDD firmware is not up to date or is too many versions behind. So I run these service packs once or twice a year.

Well, that falls into the category of fixing things if they're broken (if you keep getting RAID failures, check your RAID firmware).

BUT...

Why the heck can't they get the firmware right in the first place? I appreciate the value of being able to update the firmware in the event of a rare problem. What I don't get is when the problems aren't rare. I ran into an HP server whose RAID controller kept failing drives until we updated its firmware. For whatever reason, the unpaid mdadm guys are able to build a software RAID implementation that doesn't randomly fail drives.

I could see needing a firmware update to take advantage of some new SCSI standard that didn't exist when the controller was first made (though it would be optional since drives should support the older standards as well). There is really no excuse for buggy firmware.

If HP's firmware is buggy, and it causes HP to do unnecessary drive replacements, then that should be HP's problem. Sure, I can do them a favor and update the firmware, but there should be an acknowledgement that it was their initial incompetence that drove the need to do that.

Re:If it ain't broke... (0)

Anonymous Coward | about 3 months ago | (#46843565)

Why don't all those operating system providers get it right in the first place, or car makers, or hell, just about anything? Did you really, really just say that? Can you possibly be any dumber? No...but at least we know how to view your posts now.

Re:If it ain't broke... (1)

Rich0 (548339) | about 3 months ago | (#46843957)

Why don't all those operating system providers get it right in the first place, or car makers, or hell, just about anything?

Operating systems tend to be far more complex than RAID controllers. Generally speaking, there is a higher expectation of right-first-time when it comes to hardware and firmware.

Sure, cars have recalls, but they tend to be expensive and rare.

Re:If it ain't broke... (0)

Anonymous Coward | about 3 months ago | (#46840333)

But plenty of admins would flash to the latest when they get a system to be deployed....

Re:If it ain't broke... (1)

Rich0 (548339) | about 3 months ago | (#46842723)

But plenty of admins would flash to the latest when they get a system to be deployed....

Sure, it only makes sense to start out with a current set of firmware if you haven't started testing the thing yet. If the update breaks it, I'd call that a warranty issue.

Re:If it ain't broke... (0)

Anonymous Coward | about 3 months ago | (#46840357)

Heartbleed scanning broke some iLOs by freezing them up, requiring a power cycle.

Re:If it ain't broke... (1)

normaldotcom (1521757) | about 3 months ago | (#46840573)

...that might explain why my iLO locked up recently. It responds to pings, but is completely inaccessible from the web, ssh, and the host OS. Time to call my datacenter and have them power-cycle my server I guess...

Re:If it ain't broke... (0)

Anonymous Coward | about 3 months ago | (#46840933)

Power cycling is not enough. On rack servers you have to pull the power cord out; on blades you can do 'reset server X' on the OA, which uses the e-Fuse to do the same thing remotely. We only lost about 400 out of 4000 blades, but HP did come out with a new firmware yesterday...
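
For reference, that reset is issued from the enclosure's Onboard Administrator CLI; a rough sketch, with the OA hostname and bay number made up:

    # SSH to the enclosure's Onboard Administrator...
    ssh Administrator@oa-enclosure-01
    # ...then e-fuse reset the blade in bay 3 (like pulling its power, but remote)
    reset server 3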

Re:If it ain't broke... (1)

normaldotcom (1521757) | about 3 months ago | (#46840833)

For anyone with iLO 2-equipped servers that have frozen due to Heartbleed vulnerability scanning, HP has released a (thankfully) free update to the iLO 2 firmware to work around the issue. A physical power cycle of each server is required before the update can be applied, however. http://h20566.www2.hp.com/port... [hp.com]

Re:If it ain't broke... (1)

Rich0 (548339) | about 3 months ago | (#46842797)

Heartbleed scanning broke some iLOs by freezing them up, requiring a power cycle.

Lovely. I occasionally run vulnerability scans that freeze up printers and the like.

The correct solution to this problem is for vendors to actually implement the protocols correctly. The device should accept arbitrary data without locking up - short of that data including a valid password and an instruction to do something that is supposed to cause it to lock up.

Re:If it ain't broke... (1)

datapharmer (1099455) | about 3 months ago | (#46841157)

That is a terrible policy. I spent a long night at an office of a Fortune 500 company for that very reason. They didn't see any reason to apply BIOS patches, because the patches were just to add support for newer hardware, not to fix any sort of vulnerability. Fair enough.

Several years went by, and their terminal server had a processor go finicky on them. They determined the available spares included compatible processors. I asked, "Has the BIOS been updated to support the newer processors?" I was assured that they do regular patching and it would not be a problem. I arrive on site, install the new processors, and get no POST. A bit of troubleshooting and we determine it doesn't recognize the processors because the BIOS was out of date.

Really long story shortened: we had to shut down another server, pull its processors, install them in the problem server, boot, patch the BIOS, shut down, move the processors back into the donor server, and then reinstall the new processors. Of course, this was in a server room that was an overstuffed shoe box, so a number of acrobatics were required to get the servers extended to a point where they could be worked on.

So what should have been a 10-15 minute processor replacement ended up causing several hours of downtime and the unscheduled shutdown of another server.

Don't be lazy!

That said, as someone else stated, I usually wait a couple of months to patch (especially HP) unless it is considered a critical issue or I have a straightforward fail-over plan. HP has screwed up my arrays, etc. more than once with their quality updates.

Re:If it ain't broke... (1)

Rich0 (548339) | about 3 months ago | (#46842767)

So: a server failure shouldn't be a big deal, since everything is redundant in the first place. And anybody maintaining servers for which downtime is critical should have the necessary replacement parts ready to go. They shouldn't be relying on other production servers for configuration work.

I don't think the solution to these problems is to just always keep reflashing firmware.

Re:If it ain't broke... (0)

Anonymous Coward | about 3 months ago | (#46841937)

A few years back, I made the mistake of updating the BIOS on my main system and bricked it, and I learned my lesson the hard way. I refuse to update the BIOS or drivers unless I am actually experiencing a problem that the upgrade fixes. Otherwise, I stick with the stock firmware as much as possible because of the potential for bricking things. Another reason is that most times the damn drivers on the disc tend to be more stable than the update is.

Re:If it ain't broke... (0)

Anonymous Coward | about 3 months ago | (#46843499)

You and emil work at the same place don't you? Neither of you understand firmware releases and reasoning. You should not do them, ever. Safer that way.

Re:If it ain't broke... (1)

FaxeTheCat (1394763) | about 3 months ago | (#46844109)

If our company is typical, we update firmware only when needed.
Most servers keep most of their firmware for their entire lifespan.
An exception would be if the OS is upgraded (but even that is unusual).
A common exception is the HP iLO controllers. One reason we sometimes update them is that there are real improvements, and it does not require downtime.

This would be why.. (1)

bravecanadian (638315) | about 3 months ago | (#46840093)

You don't flash firmware unless it is for an important issue.

Or at least not until it has been out quite some time so that other people have done your testing for you.

Re:This would be why.. (4, Insightful)

oodaloop (1229816) | about 3 months ago | (#46840319)

You don't flash firmware unless it is for an important issue. Or at least not until it has been out quite some time so that other people have done your testing for you.

Your advice isn't really a general solution if, in order for it to work for anyone, some people must not follow it.

Re:This would be why.. (1)

gstoddart (321705) | about 3 months ago | (#46840689)

Your advice isn't really a general solution if, in order for it to work for anyone, some people must not follow it.

And companies which choose to make themselves test subjects to allow the rest of us to wait for the dust to settle must live with the consequences.

In some organizations, they are willing to assume the risk. In other organizations, not so much.

There will always be companies who fall on the side of bleeding edge, and companies who fall on the side of a lot more caution.

And for those of us who fall on the side of caution, to the ones on the bleeding edge, we say good luck, and thanks. ;-)

I've known (of) people who will patch a production system in the middle of the day with very little notice. I figure those people are either small shops or just open to risk. Me, if I have a patch for production, it's going to take me the better part of a month to go through the process.

And, having worked on systems where lives could be at stake, I will stick with a more conservative approach.

What's really terrifying is when someone who is risk-friendly ends up in a shop which is risk-averse, and has to be reined in so they understand "no, that's not one of our options, and if you do it you'll likely be sacked". For some of us, the cowboy is someone who needs to be retrained, someone who doesn't understand the stakes involved. Because they stand a good chance of doing a fair bit of damage if they can't be made to understand the severity of the issue.

Re:This would be why.. (1)

bill_mcgonigle (4333) | about 3 months ago | (#46841165)

Me, if I have a patch for production, it's going to take me the better part of a month to go through the process.

That's going to limit your nimbleness and opportunity. Obviously that's a business decision that can go either way.

And, having worked on systems where lives could be at stake, I will stick with a more conservative approach.

If your architecture will put lives at risk when a bad firmware update for a subset of the servers comes in, then it's too brittle. If safety is paramount, fixing that should be pretty close to the top of the list.

Re:This would be why.. (1)

gstoddart (321705) | about 3 months ago | (#46842445)

That's going to limit your nimbleness and opportunity.

Well, in some industries, there is little opportunity in nimbleness. In some, risk is OK and pays off in terms of what that gets you. Depends on what you do, and what the systems are used for.

If your architecture will put lives at risk when a bad firmware update for a subset of the servers comes in, then it's too brittle.

Who said anything about architecture? I'm talking about end-to-end systems. If a production system went offline because the NICs were all hosed, and even a few hours of outage can have a knock-on effect on company operations, that's a pretty significant thing.

Do you know where I learned rigor in change management? On systems which scheduled maintenance on aircraft. You can't go to the FAA and say "ooops, we had a little issue and might have forgotten to re-attach the wings".

I currently work in another heavily regulated industry, where outages can be very costly and have legal implications. If you stop production of the company's core business, you quickly rack up large losses. I'm not prepared to be the one to cause that from lack of due diligence.

I work in environments where risk tolerance can be exceedingly small, and unless the risk of not applying a steaming-fresh patch outweighs the risk of applying it, it can wait until it's been tested properly. So we apply changes in a lab, then we apply them on smaller test systems, then we move to a full-scale test system, and then we move on to the Production instances.

We have systems which we can be a little more 'adventurous' with because they're not mission critical. But, for things which are mission critical, we apply a level of rigor which some people would find oppressive.

If your systems touch your finances, your production, or anything involving human safety, the cost of fixing failures can very quickly outweigh the cost of testing, and you test the bejeezus out of it.

Me, I come down heavily on the side of managing the hell out of risks, minimizing as much as you possibly can, and making damned sure the decision makers understand what's left. And you know what? My employers periodically say "in this case we're okay with it, you're being too paranoid this time". But generally they understand precisely why I do it, and even if I periodically sound like a broken record, they accept that it's part of my job. They understand I'm doing it to maximize the chance of success and minimize disruptions.

I knew a developer once who realized there was a bug in his code and applied a "quick fix" to a live environment, despite having previously been told that was almost a hanging offense. And then it was a much larger undertaking to correct the issue and put things back to where they needed to be.

I knew a sysadmin once who would apply changes to environments in the middle of a workday without telling anybody, and then all of a sudden people would start popping up from their cubicles, looking around to see if anybody else suddenly couldn't reach something. In a few cases, an entire small office was dead in the water because of this until it could be fixed.

It all comes down to "what is this system used for, and what are the consequences of an outage". Not every system has the same risks and costs associated, but it is important to know which ones do.

Isn’t this one limited anyway? (3, Funny)

Movi (1005625) | about 3 months ago | (#46840111)

Aren’t they also the ones who limit their firmware updates to customers who have support contracts? I guess you get what you pay for...

Re:Isn’t this one limited anyway? (1)

jaak (1826046) | about 3 months ago | (#46840359)

Actually, HP offers free lifetime warranties on a lot of their professional networking gear (this warranty includes support, software upgrades, even includes fans/power supplies and is transferable). All in all, it's a pretty good deal.

Re:Isn’t this one limited anyway? (0)

Anonymous Coward | about 3 months ago | (#46840855)

And a ProLiant server is considered networking gear? In what reality?

As I recall, the ProLiant server line is, as the GP stated, updates only for companies with support contracts.

HP's networking stuff is an entirely different line of business.

Well at least it only affects paying customers ;) (3, Funny)

Hohlraum (135212) | about 3 months ago | (#46840151)

http://slashdot.org/story/14/02/05/0258244/hp-to-charge-for-service-packs-and-firmware-for-out-of-warranty-customers

Yes, Hewlett Packard. A Genuine Legend. (1)

SpzToid (869795) | about 3 months ago | (#46840161)

Known for reliable oscillators and calculators, and then they made a line of laser printers that lasted for a while; great engineers behind all that stuff, too. Yes, I remember them. How are they doing now, post-Carly? (HP's calculators put Rockwell's to shame. I can still remember the Rockwell jingle from the radio: "big green numbers, and little rubber feet.")

Re:Yes, Hewlett Packard. A Genuine Legend. (2)

kevmatic (1133523) | about 3 months ago | (#46841627)

I imagine they're doing fine, working for the company you're talking about, Agilent.

Make no mistake: the only thing HP has to do with the company that was founded in the '40s is the name. The company that Bill Hewlett and Dave Packard founded still exists, making great stuff. It's just called Agilent. And they still support scopes and multimeters that say HP on them.

Re:Yes, Hewlett Packard. A Genuine Legend. (1)

petermgreen (876956) | about 3 months ago | (#46842279)

I remember reading recently that Agilent is planning to split again, so those scopes and multimeters will get yet another new badge on them.

i got hit by this (3, Interesting)

alen (225700) | about 3 months ago | (#46840205)

It didn't brick my server, but it screwed up the device list in Windows and caused the cluster not to see the one node where I upgraded the drivers/firmware. It put a null device driver into Device Manager; I had to delete it, and then all was OK. Just $250 to MS to figure this out, since I didn't think it was an HP issue.

On the server the network worked and all, but the NICs weren't "seen" by Windows, so the clustering was screwed up.

Re:i got hit by this (2)

TechyImmigrant (175943) | about 3 months ago | (#46842921)

You're running Windows on a server?

The world is a stranger place than one can imagine.

Re:i got hit by this (0)

Anonymous Coward | about 3 months ago | (#46843561)

Hmm... don't you think that's a bit of a silly comment to make? Windows is being run on a massive number of servers and provides its users great value, especially with modern tools such as Azure and PowerShell.

Re:i got hit by this (1)

TechyImmigrant (175943) | about 3 months ago | (#46844087)

>don't you think that is a bit silly comment to make?

That was the intent.

Re:i got hit by this (0)

Anonymous Coward | about 3 months ago | (#46859157)

It didn't brick my server, but it screwed up the device list in Windows

You can blame Microsoft for that. NIC enumeration is broken by design in Windows. It isn't based on something predictable like PCI bus order, and it doesn't use MAC addresses to maintain associations between physical NICs and logical Windows networks.

Oh, HP... (2, Funny)

Greyfox (87712) | about 3 months ago | (#46840297)

It's probably a mercy killing. Some of those poor servers were probably forced to run HP/UX. I'd want to die, if I had to run HP/UX...

Ah, this will help network throughput...... maybe. (2)

data plumber (1696008) | about 3 months ago | (#46840561)

As a network engineer, I can see being dragged into the arguments between the server platform support teams (read: off-shore) and the network engineering teams (read: on-shore). It'll go like this:

"We need network support on a call."
"Hello, what's wrong?"
"The entire network is down for everyone!!!!! You need to fix this!!!!! The support we get from you is horrible!!!! AAHHHHhhhhhhh!!!!!!"
"OK. What changed? What was being done at the time the entire network disappeared for everyone?"
"We (15 people on the call - it apparently takes that many) were doing nothing (to do nothing)."
"OK, well, I'm on the cores and I can see a lot of traffic, other servers, the outside world, etc. You need to define the 'everything being down' part."
"Well, we were in the middle of doing a firmware update on server xxx01 and..."
"OK, so you lied to me about doing nothing. What did you update?"
"The NIC card, to improve performance for..."
"And now you're wondering why the network is down..."

It'll go this way for some time until the next couple of layers of management get involved... lots of yelling, me sending pictures of the network working. I should write a script for this call. I know it'll be coming.

happened to me, yesterday (0)

Anonymous Coward | about 3 months ago | (#46840581)

I had a server's network die on me yesterday.
I did a reinstall and, as usual, used the latest SPP to upgrade everything before putting the DL380 G7 back in production.
After the flash, no network anymore on the onboard dual network cards.

Always have a fallback ready (0)

Anonymous Coward | about 3 months ago | (#46840693)

This is about as bad as it gets, but still should not be the end of the world.

Always have a reasonably quick way to revert if stuff like this happens.

[I know none of the sysadmins in here need to hear this...]

Re:Always have a fallback ready (1)

HiThere (15173) | about 3 months ago | (#46845455)

Pardon me, I've never flashed firmware, but... "Always have a reasonably quick way to revert if stuff like this happens"? How is that possible with a firmware upgrade?

What a non story (0)

Anonymous Coward | about 3 months ago | (#46840819)

Company X releases a patch, the patch gets retracted... how did this make the front page?

Re:What a non story (0)

Anonymous Coward | about 3 months ago | (#46844709)

As a warning to people here who might want to know about it. Updates that kill servers are both news for nerds and stuff that matters.

Testing updates? (2)

Hamsterdan (815291) | about 3 months ago | (#46840859)

Don't manufacturers test their updates? It's not like they couldn't keep some of the stuff they sell around for said testing...

Re:Testing updates? (1)

dont_jack_the_mac (2882103) | about 3 months ago | (#46841225)

Yes, but management will ship with known bugs. Development will keep developing until the very end, making it impossible for QA to keep up.

Users (0)

Anonymous Coward | about 3 months ago | (#46844203)

That's what users are for.

I'm confused now. (1)

Minwee (522556) | about 3 months ago | (#46840969)

Is the article suggesting that the Broadcom NICs that HP used in the old Proliants actually _did_ work before this update?

That goes against years of experience in the field with those things.

what money can't do (1)

Dimitry G. Verkholashin (3589231) | about 3 months ago | (#46841019)

With HP, only one problem persists: namely, what money can't do.

Ouch, not good, but not surprising either (1)

ErichTheRed (39327) | about 3 months ago | (#46841201)

This is one of the first rules of administering servers -- unless it's an absolute necessity, let someone else find these firmware bugs.

This is especially true now that firmware controls so much in modern hardware. I've had business PCs that have gone through more than 10 EFI revisions in their 18-month lifecycle, and the release notes show that they fix surprisingly low-level things.

The unfortunate trend is that these firmware bugs are more and more prevalent. It seems like manufacturers are skimping on QA and testing. I'm not surprised that HP is affected - their maintenance applications and documentation look like they're now written by an offshore team. So I wouldn't be surprised if the EEs and SEs sitting in Houston have to write specs and have their offshore counterparts hack up the firmware changes. Worse, since they're getting the NICs from Broadcom, it's engineers --> offshore team --> Broadcom --> Broadcom's offshore team, making it even more likely that confusion will be introduced.

Dell has the same issue with broadcom (0)

Anonymous Coward | about 3 months ago | (#46841389)

Funnily enough, Dell has the same issue. Using their "lifecycle controller" or other in-OS methods to update the Broadcom firmware will disable the network card. The only supported way to do the upgrade is to boot a Dell-provided OMSA live CD ISO image, then put the firmware on a USB stick (or virtual floppy) and install from there. I tried it with the lifecycle controller, and the interface flat out vanished. They said there is some sort of recovery procedure involving downgrading back to the previous firmware and then upgrading again, but I haven't tried it yet.

Not again! (1)

wezelboy (521844) | about 3 months ago | (#46842761)

Their last update bricked Broadcom blades of the G1 variety.

Why this happened (0)

Anonymous Coward | about 3 months ago | (#46859393)

TL;DR version: A perfectly good firmware update was turned into a disaster by a Program Manager checking the wrong boxes on a release form.

Long version...

(1) The firmware is developed by Broadcom. HP doesn't have the source code. They may not even have anybody left on their staff who would be qualified to work on it (HP laid off most of their NIC engineering team years ago).

(2) HP uses a lot of different Broadcom NICs in various ProLiant servers. This firmware update was intended for a subset of those servers, and was tested on that intended subset.

(3) HP is responsible for packaging the firmware for release. Somebody screwed up during that process, and set the package metadata so the update got installed on a much wider range of servers.
