Earlier this week I configured a Juniper SRX210 for testing. The configuration consisted of several security zones, IDP, UAC (layer 3 enforcement), and Application Firewall and Identification. The Junos version I used was JUNOS 12.1X46-D15.3.
This setup worked until today. Today, the SRX was unresponsive. No ICMP reply, no SSH access, nothing. Accessing the SRX via the serial console showed me the Amnesiac login. This means that the configuration is gone, or rather, that the configuration I created was replaced by the factory default config. A typical WTF!!! moment.
Fortunately, I had configured logging to an external source (Splunk). So I went to investigate. Turned out that the SRX stopped sending syslog messages around 01:30PM. Further investigation showed that the config was actually reset (UI_FACTORY_OPERATION event), and checking the event-codes, it was (probably) done by pressing the reset button on the device.
(the following log excerpt should be read from bottom to top)
May 28 13:28:30 10.0.0.1 1 2014-05-28T13:28:30.808+02:00 srx210 mgd 6211 UI_COMMIT_PROGRESS [junos@2636.1.1.1.2.36 message="finished copying juniper.db to juniper.data+"]
May 28 13:28:30 10.0.0.1 1 2014-05-28T13:28:30.369+02:00 srx210 mgd 6211 UI_COMMIT_PROGRESS [junos@2636.1.1.1.2.36 message="copying juniper.db to juniper.data+"]
May 28 13:28:30 10.0.0.1 1 2014-05-28T13:28:30.369+02:00 srx210 mgd 6211 UI_COMMIT_PROGRESS [junos@2636.1.1.1.2.36 message="finished loading commit script changes"]
May 28 13:28:30 10.0.0.1 1 2014-05-28T13:28:30.368+02:00 srx210 mgd 6211 UI_COMMIT_PROGRESS [junos@2636.1.1.1.2.36 message="no transient commit script changes"]
May 28 13:28:30 10.0.0.1 1 2014-05-28T13:28:30.368+02:00 srx210 mgd 6211 UI_COMMIT_PROGRESS [junos@2636.1.1.1.2.36 message="no commit script changes"]
May 28 13:28:30 10.0.0.1 1 2014-05-28T13:28:30.367+02:00 srx210 mgd 6211 UI_COMMIT_PROGRESS [junos@2636.1.1.1.2.36 message="start loading commit script changes"]
May 28 13:28:30 10.0.0.1 1 2014-05-28T13:28:30.133+02:00 srx210 rmopd 1350 PING_TEST_COMPLETED [junos@2636.1.1.1.2.36 test-owner="XS4ALL" test-name="testsvr"]
May 28 13:28:30 10.0.0.1 1 2014-05-28T13:28:30.355+02:00 srx210 mgd 6211 - - auto-snapshot is not configured
May 28 13:28:28 10.0.0.1 1 2014-05-28T13:28:28.814+02:00 srx210 mgd 6211 UI_LOAD_JUNOS_DEFAULT_FILE_EVENT [junos@2636.1.1.1.2.36 pathname="/etc/config//srx210h-defaults.conf"]
May 28 13:28:27 10.0.0.1 1 2014-05-28T13:28:27.823+02:00 srx210 mgd 6211 UI_LOAD_JUNOS_DEFAULT_FILE_EVENT [junos@2636.1.1.1.2.36 pathname="/etc/config//jsrxsme-series-defaults.conf"]
May 28 13:28:27 10.0.0.1 1 2014-05-28T13:28:27.217+02:00 srx210 mgd 6211 UI_LOAD_JUNOS_DEFAULT_FILE_EVENT [junos@2636.1.1.1.2.36 pathname="/etc/config//junos-defaults.conf"]
May 28 13:28:26 10.0.0.1 1 2014-05-28T13:28:26.808+02:00 srx210 mgd 6211 - - WARNING: activating factory configuration
May 28 13:28:25 10.0.0.1 1 2014-05-28T13:28:25.378+02:00 srx210 mgd 6211 - - WARNING: removing all configurations
May 28 13:28:25 10.0.0.1 1 2014-05-28T13:28:25.232+02:00 srx210 mgd 6211 UI_FACTORY_OPERATION -
May 28 13:28:25 10.0.0.1 1 2014-05-28T13:27:35.000+02:00 srx210 rmopd 1350 PING_TEST_COMPLETED [junos@2636.1.1.1.2.36 test-owner="XS4ALL" test-name="testsvr"]
May 28 13:27:19 10.0.0.1 1 2014-05-28T13:26:39.941+02:00 srx210 - - - - last message repeated 11 times
May 28 13:17:19 10.0.0.1 1 2014-05-28T13:16:34.309+02:00 srx210 - - - - last message repeated 11 times
May 28 13:07:19 10.0.0.1 1 2014-05-28T13:06:28.346+02:00 srx210 rmopd 1350 PING_TEST_COMPLETED [junos@2636.1.1.1.2.36 test-owner="XS4ALL" test-name="testsvr"]
May 28 13:05:33 10.0.0.1 1 2014-05-28T13:05:33.297+02:00 srx210 rmopd 1350 PING_TEST_COMPLETED [junos@2636.1.1.1.2.36 test-owner="XS4ALL" test-name="testsvr"]
May 28 13:04:38 10.0.0.1 1 2014-05-28T13:04:38.248+02:00 srx210 rmopd 1350 PING_TEST_COMPLETED [junos@2636.1.1.1.2.36 test-owner="XS4ALL" test-name="testsvr"]
May 28 13:04:19 10.0.0.1 1 2014-05-28T13:04:19.548+02:00 srx210 utmd 1380 AV_PATTERN_UPDATED [junos@2636.1.1.1.2.36 version="05/28/2014 12:36 GMT, virus records: 522178" file-size="18635751"]
May 28 13:03:42 10.0.0.1 1 2014-05-28T13:03:42.934+02:00 srx210 rmopd 1350 PING_TEST_COMPLETED [junos@2636.1.1.1.2.36 test-owner="XS4ALL" test-name="testsvr"]
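For anyone who wants to hunt for the same event in an exported syslog file instead of clicking through Splunk, here is a minimal Python sketch. It is not what I actually ran. It assumes the log lines look like the excerpt above (an ISO timestamp followed by hostname, process, PID and event tag), and the function name `find_factory_resets` is made up for this example.

```python
import re

# Event tag as it appears in the excerpt above.
RESET_EVENT = "UI_FACTORY_OPERATION"

# ISO timestamp as used in the excerpt, e.g. 2014-05-28T13:28:25.232+02:00
TIMESTAMP = re.compile(
    r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+[+-]\d{2}:\d{2}"
)


def find_factory_resets(lines):
    """Return (timestamp, host, process) for every factory-reset event.

    Hypothetical helper: assumes the field order seen in the excerpt
    (timestamp, hostname, process, pid, event tag).
    """
    hits = []
    for line in lines:
        if RESET_EVENT not in line:
            continue
        match = TIMESTAMP.search(line)
        if match is None:
            continue
        # Everything after the timestamp: hostname, process, pid, tag, ...
        fields = line[match.end():].split()
        if len(fields) < 2:
            continue
        hits.append((match.group(0), fields[0], fields[1]))
    return hits


# Three lines copied from the excerpt above.
sample = [
    "May 28 13:28:26 10.0.0.1 1 2014-05-28T13:28:26.808+02:00 srx210 mgd 6211 - - WARNING: activating factory configuration",
    "May 28 13:28:25 10.0.0.1 1 2014-05-28T13:28:25.232+02:00 srx210 mgd 6211 UI_FACTORY_OPERATION -",
    'May 28 13:28:25 10.0.0.1 1 2014-05-28T13:27:35.000+02:00 srx210 rmopd 1350 PING_TEST_COMPLETED [junos@2636.1.1.1.2.36 test-owner="XS4ALL" test-name="testsvr"]',
]

for timestamp, host, process in find_factory_resets(sample):
    print(f"{timestamp} {host} {process}: factory reset detected")
```

To scan a real export, pass an open file object instead of `sample`; the function only needs something that yields one log line at a time.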
This is strange, since there was no one around at the time. So it must have been some sort of bug that caused this.
Thankfully I had a backup of the config, so the device was up and running again within 10 minutes. So now I have to keep an eye out for this, especially over the next couple of days.
UPDATE: There is definitely something wrong with the hardware. I tried different Junos versions (also stable recommended versions), and different configs, but for some reason the device detects a 'Config button pressed' event and goes back to the factory default config. This happens within 12 hours.
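The device keeps going back to the factory default config. Today I changed the behavior of the reset button. With the following line in the config, the button no longer reacts to physical interaction: no-clear stops a long press from loading the factory default config, and no-rescue stops a short press from loading the rescue config: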
root@srx210# set chassis config-button no-clear no-rescue
Let's see if that helps. If it does, it means that the hardware reset button (mechanism) is malfunctioning.
UPDATE 2: looks like the config button setting did the trick (so far). The device has been up and running for nearly 24 hours. -keepingfingerscrossed-
UPDATE 3: and we have a winner. The SRX210 is still operational. Just need to remember to add the command when I reconfigure it. Perhaps a sticker on top as a visual reminder :-)