Archive for February, 2010

OpsMgr 2007 DMZ-Based Agents Fail to Report to Gateway with Event ID 20070

February 19th, 2010

We recently noticed that most of our DMZ-based OpsMgr agents were not connecting to their gateway server. On each affected agent we saw the following event:

Event Type: Error
Event Source: OpsMgr Connector
Event Category: None
Event ID: 20070
Computer: <Computer>
Description: The OpsMgr Connector connected to <domain>, but the connection was closed immediately after authentication occurred.  The most likely cause of this error is that the agent is not authorized to communicate with the server, or the server has not received configuration.  Check the event log on the server for the presence of 20000 events, indicating that agents which are not approved are attempting to connect.

On the gateway server the following event was being logged:

Event Type: Information
Event Source: OpsMgr Connector
Task Category: None
Event ID: 20000
Description: A device which is not part of this management group has attempted to access this Health Service.  Requesting Device Name : <computer>

The strange thing is that these agents had been working fine, and a few still were! We checked the usual things and went through the usual recovery steps:
  • Is TCP 5723 open to the gateway server?
  • Restart the HealthService
  • Is the agent in Pending Management?
  • Restart the HealthService
  • Wait 5 minutes
  • Restart the HealthService
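The first check in the list, whether TCP 5723 is open to the gateway, is easy to script from the agent side. Below is a minimal sketch in Python, assuming Python is available on the agent; `port_open` is a hypothetical helper, not part of OpsMgr. Note that a successful handshake only proves the port is reachable. It says nothing about OpsMgr authentication, which is exactly why it passed here: the 20070 event is logged after the connection and authentication have already happened.

```python
import socket


def port_open(host: str, port: int = 5723, timeout: float = 3.0) -> bool:
    """Return True if a TCP handshake with host:port succeeds within timeout.

    This only tests network reachability (e.g. the DMZ firewall rule for
    TCP 5723); it does not test whether the gateway will accept the agent.
    """
    try:
        # create_connection resolves the name and performs a full TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and DNS failures are all OSError subclasses
        return False
```

For example, `port_open("gateway.example.com")` (a hypothetical gateway name) run on an agent returns `True` when the firewall allows 5723 through, even if the agent would still fail with 20070 afterwards.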

All to no avail. We found http://blogs.technet.com/operationsmgr/archive/2009/02/17/opsmgr-2007-port-requirements-for-scom-agents-in-a-dmz.aspx, which suggested opening ports 88 (Kerberos) and 389 (LDAP) from the agent to the RMS. That did not make sense to us, since some agents were still working. So we used Netmon 3.3 to trace the client while the HealthService started. It never used any port other than 5723.

We even enabled verbose diagnostic tracing (http://support.microsoft.com/kb/942864) and reviewed the logs. They showed where the 20070 event was being generated, but little else of interest:

5412.5956::02/19/2010-10:46:56.978 [Common] [] [Verbose] :Common::EventLogUtil::LogEvent{EventLogUtil_cpp311}Logging error event 20070 with args "<servername>", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL"

5412.5956::02/19/2010-10:46:56.978 [Common] [] [Information] :Common::EventLogUtil::LogEvent{EventLogUtil_cpp397}Logging event 20070 from source "OpsMgr Connector" with severity Error and description "The OpsMgr Connector connected to <GatewayServer>, but the connection was closed immediately after authentication occurred.  The most likely cause of this error is that the agent is not authorized to communicate with the server, or the server has not received configuration.  Check the event log on the server for the presence of 20000 events, indicating that agents which are not approved are attempting to connect.".
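Verbose traces are large, so pulling out just the lines that log a given event ID saves some scrolling. The sketch below is a hypothetical Python filter; its regular expression is inferred only from the two sample lines above (process.thread prefix, `::`, a timestamp, then "Logging [error ]event <id>"), so real formatted trace output may need a looser pattern.

```python
import re
from typing import Iterable, List, Tuple

# Assumed line shape, inferred from the two sample trace lines only:
#   <pid>.<tid>::MM/DD/YYYY-HH:MM:SS.mmm [...] ...Logging [error ]event <id> ...
TRACE_EVENT = re.compile(
    r"::(\d{2}/\d{2}/\d{4}-\d{2}:\d{2}:\d{2}\.\d{3}) .*?Logging (?:\w+ )?event (\d+)"
)


def find_events(lines: Iterable[str], event_id: int = 20070) -> List[Tuple[str, str]]:
    """Return (timestamp, line) pairs for trace lines that log event_id."""
    hits = []
    for line in lines:
        m = TRACE_EVENT.search(line)
        if m and int(m.group(2)) == event_id:
            hits.append((m.group(1), line.rstrip("\n")))
    return hits
```

Passing an open trace file (any iterable of lines works) lists every timestamp at which the connector logged 20070, which makes it easier to line the trace up with the Event Log and a Netmon capture.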

Solution…

We finally had to call Microsoft. After about 30 minutes of troubleshooting, the engineer noticed that the OpsMgrConnector.Config.xml file in the C:\Program Files\System Center Operations Manager 2007\Health Service State\Connector Configuration Cache\<MgmtGrpName> folder on the gateway server had last been modified several weeks earlier. He had us rename the Health Service State folder under C:\Program Files\System Center Operations Manager 2007 and restart the HealthService. After the restart, a new Health Service State folder was created, and OpsMgrConnector.Config.xml had a much more recent last-modified date. We then restarted the HealthService on the agents, and they reported in to the gateway server correctly.
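The clue the engineer spotted, a stale last-modified date on OpsMgrConnector.Config.xml, can be checked on the gateway with a short script. This is a minimal Python sketch, assuming Python is installed on the gateway and the default install path shown above; `config_age`, `is_stale` and the one-day threshold are hypothetical choices, not anything documented by Microsoft. It only reports; the folder rename was done under Microsoft's guidance and is not automated here.

```python
import datetime as dt
from pathlib import Path

# Default install path from the post; pass the real management group name
# (the <MgmtGrpName> folder) as mgmt_group.
CACHE_ROOT = Path(
    r"C:\Program Files\System Center Operations Manager 2007"
    r"\Health Service State\Connector Configuration Cache"
)


def config_age(mgmt_group: str, root: Path = CACHE_ROOT) -> dt.timedelta:
    """Time elapsed since OpsMgrConnector.Config.xml was last modified."""
    cfg = root / mgmt_group / "OpsMgrConnector.Config.xml"
    modified = dt.datetime.fromtimestamp(cfg.stat().st_mtime)
    return dt.datetime.now() - modified


def is_stale(mgmt_group: str,
             max_age: dt.timedelta = dt.timedelta(days=1),
             root: Path = CACHE_ROOT) -> bool:
    """Heuristic: flag the connector cache if it has not changed within max_age.

    The one-day default is an arbitrary assumption; tune it to how often
    configuration changes in your management group.
    """
    return config_age(mgmt_group, root) > max_age
```

A file that is "several weeks" old, as in our case, would be flagged by any reasonable threshold. A recent timestamp does not prove the gateway is healthy, but a very old one is worth investigating before agents start logging 20070.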