Brocade DCX webtools authentication problem

Recently DCFM suddenly started marking all virtual switches on all of our DCX directors with a nice “Product status unknown” tag. Solving it was not hard in the end, but it took some time going through support. In this post I will explain what the problem looked like and what the solution was.
On day one, all virtual switches defined on one of our DCXs, and the chassis itself, were marked with this unknown status. Since we use the command line most of the time, were busy with other things, and it didn’t disrupt operation, we didn’t look into the problem right away. The next day another director did the same, and the day after that another two followed. DCFM was constantly spitting out messages about security login violations. The day after the first director showed the problem, we started looking into it and found these errors in the logs:
datestamp, [FW-1342], 23858, SLOT 6 | FID id, WARNING, vFabricName, Sec Login Violation, is above high boundary(High=2, Low=1). Current value is 6 Violation(s)/minute.
datestamp, [SEC-1193], 23859, SLOT 6 | FID id, INFO, vFabricName, Security violation: Login failure attempt via HTTP. IP Addr: violating ipaddress
datestamp, [SEC-1193], 23860, SLOT 6 | FID id, INFO, vFabricName, Security violation: Login failure attempt via HTTP. IP Addr: violating ipaddress
datestamp, [SEC-1193], 23861, SLOT 6 | FID id, INFO, vFabricName, Security violation: Login failure attempt via HTTP. IP Addr: violating ipaddress
datestamp, [SEC-1193], 23862, SLOT 6 | FID id, INFO, vFabricName, Security violation: Login failure attempt via HTTP. IP Addr: violating ipaddress
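
For anyone wanting to confirm which host is behind the violations, a small script can tally the SEC-1193 entries per source address. This is only a sketch in Python: the regular expression is an assumption based on the excerpt above (real log layouts may differ per FOS release), the function name `count_login_failures` is made up for illustration, and the sample lines use the documentation address 192.0.2.10 instead of real data.

```python
import re
from collections import Counter

# Pattern assumed from the log excerpt in this post; adjust it if your
# FOS release formats SEC-1193 messages differently.
SEC_1193 = re.compile(
    r"\[SEC-1193\].*Login failure attempt via (?P<proto>\w+)\. "
    r"IP Addr: (?P<ip>\S+)"
)


def count_login_failures(lines):
    """Count SEC-1193 login failures per (source IP, protocol) pair."""
    counts = Counter()
    for line in lines:
        match = SEC_1193.search(line)
        if match:
            counts[(match.group("ip"), match.group("proto"))] += 1
    return counts


# Hypothetical sample lines modelled on the excerpt; not real log data.
sample = [
    "datestamp, [FW-1342], 23858, SLOT 6 | FID id, WARNING, vFabricName, "
    "Sec Login Violation, is above high boundary(High=2, Low=1). "
    "Current value is 6 Violation(s)/minute.",
    "datestamp, [SEC-1193], 23859, SLOT 6 | FID id, INFO, vFabricName, "
    "Security violation: Login failure attempt via HTTP. IP Addr: 192.0.2.10",
    "datestamp, [SEC-1193], 23860, SLOT 6 | FID id, INFO, vFabricName, "
    "Security violation: Login failure attempt via HTTP. IP Addr: 192.0.2.10",
]

for (ip, proto), n in count_login_failures(sample).items():
    print(f"{ip} via {proto}: {n} failed logins")
```

A single address dominating the count, as the DCFM server did here, points at one client retrying with credentials the switch rejects rather than at a scattered attack.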

The violating IP address was the address of the DCFM server. So I fired up a browser from my workstation and connected to Webtools. Webtools came up fine with the authentication screen. We logged in with the user defined for DCFM, it started authenticating, and after a second or two we got an invalid password error.

This happened when starting Webtools from the DCFM server, from workstations, from every VLAN, and with every user we tried. We’re not using RADIUS authentication; this is normal local switch authentication. All of the users we tried worked fine when logging in through SSH. My guess at that time was that, due to a bug, the authentication between the HTTP server on the directors and the local switch user database was broken. I contacted a Brocade engineer directly; he made some calls, but no one had ever seen this strange behaviour. I then logged a call with IBM support (the directors are OEM’d by IBM), and the usual hassle of sending logs and dumps and answering standard L1 questions followed. They focused purely on DCFM losing its password in the discovery setup screen. To me it was obvious why it was gone there: DCFM was told it was an invalid user, so it cleared the field. IBM L1 support, however, kept insisting that this was where the problem was. After I persuaded them to dispatch the call to Brocade, things sped up a bit.

The first item on the action plan was to upgrade our Java plugin. For FOS 6.3.0 the plugin should be at 1.6.0.13 or later. Of course that didn’t help, because it already was at 1.6.0.13. After I told them the complete story (apparently things got dropped in the conversation between IBM and Brocade), they came up with an HA failover, effectively rebooting the CPs.

So we did:

SwitchName:vFabricName:username> hashow
Local CP (Slot 6, CP0): Active, Warm Recovered
Remote CP (Slot 7, CP1): Standby, Healthy
HA enabled, Heartbeat Up, HA State synchronized
SwitchName:vFabricName:username> hafailover
Local CP (Slot 6, CP0): Active, Warm Recovered
Remote CP (Slot 7, CP1): Standby, Healthy
HA enabled, Heartbeat Up, HA State synchronized
Warning: This command is being run on a redundant control processor(CP)
system, and this operation will cause the active CP to reset.
Therefore all existing telnet sessions are required to be restarted.

Are you sure you want to fail over to the standby CP [y/n]? y
Forcing Failover ...

SwitchName:vFabricName:username> hashow
Local CP (Slot 7, CP1): Active, Warm Recovered
Remote CP (Slot 6, CP0): Non-Redundant
SwitchName:vFabricName:username> hashow
Local CP (Slot 7, CP1): Active, Warm Recovered
Remote CP (Slot 6, CP0): Standby, Healthy
HA enabled, Heartbeat Up, HA State not in sync
SwitchName:vFabricName:username> hashow
Local CP (Slot 7, CP1): Active, Warm Recovered
Remote CP (Slot 6, CP0): Standby, Healthy
HA enabled, Heartbeat Up, HA State synchronized
SwitchName:vFabricName:username>
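
If you would rather script the wait for the CPs to resync than re-run hashow by hand, the output is easy to parse. Below is a minimal sketch in Python, assuming the output format shown in the transcript above. `parse_hashow` is a hypothetical helper, not part of FOS or any Brocade tool, and collecting the output (for example over SSH) is left out.

```python
import re

# Matches lines such as "Local CP (Slot 7, CP1): Active, Warm Recovered".
CP_LINE = re.compile(r"^(Local|Remote) CP \(Slot (\d+), CP(\d+)\): (.+)$")


def parse_hashow(text):
    """Parse hashow output into a dict; format assumed from this post."""
    state = {"local": None, "remote": None, "synchronized": False}
    for raw in text.splitlines():
        line = raw.strip()
        match = CP_LINE.match(line)
        if match:
            role, slot, cp, status = match.groups()
            state[role.lower()] = {
                "slot": int(slot),
                "cp": int(cp),
                "status": status,
            }
        elif line.startswith("HA enabled"):
            # "HA State not in sync" must not count as synchronized.
            state["synchronized"] = line.endswith("HA State synchronized")
    return state


# The three states seen after the failover in the transcript above.
just_failed_over = """Local CP (Slot 7, CP1): Active, Warm Recovered
Remote CP (Slot 6, CP0): Non-Redundant"""

resyncing = """Local CP (Slot 7, CP1): Active, Warm Recovered
Remote CP (Slot 6, CP0): Standby, Healthy
HA enabled, Heartbeat Up, HA State not in sync"""

back_in_sync = """Local CP (Slot 7, CP1): Active, Warm Recovered
Remote CP (Slot 6, CP0): Standby, Healthy
HA enabled, Heartbeat Up, HA State synchronized"""

for name, output in [("just failed over", just_failed_over),
                     ("resyncing", resyncing),
                     ("back in sync", back_in_sync)]:
    print(name, "->", parse_hashow(output)["synchronized"])
```

Only the last of the three states reports the CPs as synchronized, which is the point at which it should be safe to consider a second failover.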

After this we tried Webtools, and it worked. DCFM picked it up immediately and rediscovered the failed-over DCX without any changes. Interestingly, the password in discovery setup was still blank for this DCX, although it had been rediscovered automatically. We just filled in the password there again, and it was accepted. The field is now filled.

Problem solved, although there’s no answer from Brocade yet explaining why this happened. I’m expecting a note somewhere in upcoming release notes :)

FYI:
This happened on:
Brocade DCX running FOS code 6.3.0b
DCFM 10.3.3 build 11

Categories: SAN
  1. Jesse
    February 24th, 2011 at 22:03 | #1

    Just wanted to say I just came across this exact situation in our environment. The Brocade support guy really had no clue about what the issue was, nor did he really have a clue, period. Anyway, I am glad I came across this because we have been scratching our heads all week. We are going to do the failover next Wednesday. I am curious though: if you failed back to the original CP, would the issue still exist?

    • February 24th, 2011 at 22:11 | #2

      Glad I helped you out with my post :)
      You can safely fail back to the CP that had the problem. It’ll work fine after the initial failover. When you fail over, the previously active CP is rebooted after the standby becomes active, and that solves the problem. We tested that and it worked fine.

  2. Jesse
    February 25th, 2011 at 18:03 | #3

    Awesome. I think this is the only post on the internet about this problem. Hopefully it will help others.

  3. July 7th, 2011 at 22:02 | #4

    Can you please tell me how much time is required to fail over the CP blade?

    • UltraSub
      July 7th, 2011 at 22:08 | #5

      After entering the command, your SSH (or Webtools) session drops after about 5 seconds. Don’t worry, the IP is being switched. Data flow is not interrupted at all. When the session drops, you can immediately reconnect to the same IP. After another 10-20 seconds everything is in sync again.

  4. DatacenterMgr
    September 6th, 2011 at 18:38 | #6

    Just found your post. Had the exact same problem with our DCX. Brocade didn’t have a clue what was going on! Thanks for the help, you saved me from a week or more of no sleep!

  5. UltraSub
    September 6th, 2011 at 19:28 | #7

    Glad I could be of any help. :)

  6. phil
    February 8th, 2012 at 22:56 | #8

    Have the exact same problem on a 7500B router running FOS 6.1.0d. I’m glad someone has figured this out. It’s been an issue for months now.
