WSRT stop-day activities August 15, 2018


Coordinator Teun Grit roadmin@astron.nl
Software Support Boudewijn hut@astron.nl
Science, Operations and Support None
Observer Jur Sluman observer@astron.nl

Actions

  • Update and reboot all systems (incl. LCUs and data writers)
  • CentOS 7 systems will be updated and rebooted using SpaceWalk (Jasmin)
  • SLES11_SP4 systems will be updated by Teun
  • We will NOT reboot wcudata1 (no update)
  • We will update wcudata2 and reboot (Ubuntu 14.04 LTS)
  • We will update lcu-rt2 first, test it, and then update the rest of the LCUs
  • Enable write cache of the RAID controller of wop85
  • Try to isolate the memory errors on wop61 (switch off ECC in the BIOS and run a memory test). There is a spare system from which memory can be taken.
  • Connect lcu-rt0 (both IPMI and eth0), install the latest Ubuntu 14.04, run the Ansible playbook
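
The ordering in the lcu-rt2 item (update one LCU, test it, then continue with the rest) can be sketched as a small staged-rollout helper. This is only an illustration, not the procedure actually used on the stop day: the `update` and `healthy` callables are hypothetical placeholders for the real update and test steps, and the host list in the demo is illustrative.

```python
from typing import Callable, List, Sequence


def staged_update(
    hosts: Sequence[str],
    canary: str,
    update: Callable[[str], None],
    healthy: Callable[[str], bool],
) -> List[str]:
    """Update `canary` first; touch the other hosts only if it tests healthy.

    Returns the hosts that were updated, in the order they were updated.
    """
    if canary not in hosts:
        raise ValueError(f"canary {canary!r} is not in the host list")

    update(canary)
    updated = [canary]
    if not healthy(canary):
        # Stop here so a bad update stays confined to one machine.
        return updated

    for host in hosts:
        if host != canary:
            update(host)
            updated.append(host)
    return updated


if __name__ == "__main__":
    # Hypothetical stand-ins: a real run would call the actual update/test steps.
    demo_hosts = ["lcu-rt2", "lcu-rt3", "lcu-rt4"]
    order = staged_update(
        demo_hosts,
        canary="lcu-rt2",
        update=lambda h: print(f"updating {h}"),
        healthy=lambda h: True,
    )
    print(order)
```

If the health check on the first host fails, the function returns after a single update, which mirrors the intent of testing lcu-rt2 before touching the other LCUs.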

Results

All the systems above were updated and rebooted. We ran into several issues:

  • wop59: Supervisor did not start; reinstalling it worked. It had to be started by hand (on all systems), although it was “enabled”
  • Hypervisors: ZFS needed to be reinstalled, since the latest kernel includes some ZFS parts.
  • wop63, wop75: The network configuration needed to be changed, because it was starting bridge one (“br1”) first
  • wop54: The disk check took quite some time
  • wop61: We tested the memory with “ecc off”. A number of errors appeared. (We have spare memory for this one)
  • wop61: Added Zabbix check on memory errors
  • lcu-rt2..lcu-rtd: Supervisor does not start after reboot. Started by hand, although it was enabled
  • lcu-rt2..lcu-rtd: qpidd reinstalled with “apt-get -y install --reinstall qpidd” with parallel shell
  • jip updated and rebooted after 489 days
  • After an LCU update, the qpidd federation on ccu-corr needs to be started
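
The two lcu-rt2..lcu-rtd items above (Supervisor started by hand, qpidd reinstalled) both amount to running one command on a range of hosts. A minimal sketch of how such a command list could be generated follows. It assumes that the range lcu-rt2..lcu-rtd uses single hexadecimal suffixes (2 to 9, then a to d), that the hosts are reached with plain `ssh`, and that Supervisor is started with `service supervisor start`; none of this is confirmed by the notes above, and the helper names are made up. The sketch only builds the commands and does not execute anything.

```python
import shlex
import string
from typing import List

# Assumption: host suffixes are single lowercase hex digits.
HEX_DIGITS = string.digits + "abcdef"

# Command as quoted in the notes; the service command is an assumption.
REINSTALL_QPIDD = "apt-get -y install --reinstall qpidd"
START_SUPERVISOR = "service supervisor start"


def expand_hex_range(prefix: str, first: str, last: str) -> List[str]:
    """Expand e.g. ("lcu-rt", "2", "d") into lcu-rt2 ... lcu-rt9, lcu-rta ... lcu-rtd."""
    lo = HEX_DIGITS.index(first)
    hi = HEX_DIGITS.index(last)
    if lo > hi:
        raise ValueError("first suffix must not come after last suffix")
    return [prefix + digit for digit in HEX_DIGITS[lo : hi + 1]]


def ssh_commands(hosts: List[str], remote_command: str) -> List[List[str]]:
    """Build one ssh argument list per host; nothing is run here."""
    return [["ssh", host, remote_command] for host in hosts]


if __name__ == "__main__":
    lcus = expand_hex_range("lcu-rt", "2", "d")
    for argv in ssh_commands(lcus, REINSTALL_QPIDD):
        # Print a copy-pasteable command line for review before running it.
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Generating the list first and reviewing it before execution keeps the step repeatable for a next stop day; the argument lists could be passed to `subprocess.run` or fed to whatever parallel shell is in use.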

For a complete overview, see https://www.astron.nl/wsrt/wiki/doku.php?id=cni:workstations&#at_westerbork

