This shows you the differences between two versions of the page.
projects:bpm-sis18:status [2012/07/11 17:13] rhaseitl |
projects:bpm-sis18:status [2012/07/12 19:00] rhaseitl |
||
---|---|---|---|
Line 2: | Line 2: | ||
Errors occurring sporadically: | Errors occurring sporadically: | ||
- | * the Liberas lose their connection: they appear red in the detailed status panel, giving the status " | + | * the Liberas lose their connection: they appear red in the detailed status panel, giving the status " |
In at least one case, I also had to restart the FESA classes on the CCCPs to get the system working again. | In at least one case, I also had to restart the FESA classes on the CCCPs to get the system working again. | ||
- | This sometimes happens during beamtime, or after the system was not used for a while and is started again. | + | This sometimes happens during beamtime, or after the system was not used for a while and is started again (= the GUI was closed for a while and is started again). |
- | (we could also include other errors: | + | |
- | Debugging ideas: | + | |
- | Every idea to track down the problems are appreciated. | + | |
- | == Suggestions: == | + | Other errors which could be related: |
+ | * the saving of raw data sometimes leads to a timeout | ||
+ | * aux BPM confuses the whole system (Start and Stop is triggered "out of itself" | ||
+ | * low system performance (switching modes takes several seconds) | ||
+ | |||
+ | == Debugging ideas: == | ||
+ | Suggestions (every idea for tracking down the problems is appreciated): | ||
* add debug output into the Libera generic servers and the FESA Classes (see below) | * add debug output into the Libera generic servers and the FESA Classes (see below) | ||
* make testcases like: set a defined set of calibration values and check if they are set in the generic server/FPGA registers | * make testcases like: set a defined set of calibration values and check if they are set in the generic server/FPGA registers | ||
+ | \\ | ||
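The calibration test case suggested above could be sketched roughly as follows. This is only an illustration: `FakeRegisterBank` is a hypothetical in-memory stand-in for the generic server/FPGA registers, and the register names and values are invented. A real test would replace it with whatever access path the generic servers actually offer.

```python
class FakeRegisterBank:
    """Hypothetical in-memory stand-in for the generic server/FPGA registers."""

    def __init__(self):
        self._regs = {}

    def write(self, name, value):
        self._regs[name] = value

    def read(self, name):
        return self._regs[name]


def check_calibration_roundtrip(bank, calibration):
    """Write every calibration value, read it back, and return the mismatches.

    The result maps register name -> (expected, actual); an empty dict
    means every value arrived in the registers unchanged.
    """
    for name, value in calibration.items():
        bank.write(name, value)

    mismatches = {}
    for name, expected in calibration.items():
        actual = bank.read(name)
        if actual != expected:
            mismatches[name] = (expected, actual)
    return mismatches


# Invented register names and values, for illustration only.
CALIBRATION = {"gain_ch_a": 1021, "gain_ch_b": 998, "offset_x": -3}
result = check_calibration_roundtrip(FakeRegisterBank(), CALIBRATION)
```

Returning the full list of mismatches instead of stopping at the first one makes it easier to see whether a single register or the whole write path is affected.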
Log in the FESA classes: | Log in the FESA classes: | ||
* version number on startup | * version number on startup | ||
* connection to BPM established/ | * connection to BPM established/ | ||
* debug output at every status change of the system (Initializing, | * debug output at every status change of the system (Initializing, | ||
+ | * logging should use the Log4j framework (from within the FESA class possible with SDLog (HBr)) | ||
+ | * the GUI should **not** encapsulate exceptions thrown by cmw / rda into its own Exception class (HBr) | ||
+ | * detailed documentation of the meaning and reasons for each error message, exception etc. should be written (HBr) | ||
+ | \\ | ||
Log on the generic servers (with timestamps!): | Log on the generic servers (with timestamps!): | ||
* version number (or similar) at startup | * version number (or similar) at startup | ||
- | * internal register values (when changed, on start, on stop) | + | * internal register values (when changed, on start trigger, on stop trigger) |
* when a start or stop trigger arrives | * when a start or stop trigger arrives | ||
* when the ring buffer is full | * when the ring buffer is full | ||
- | * operating mode (raw, bunch to bunch) | + | * operating mode (raw, bunch to bunch, calibrations, |
* log any other useful events | * log any other useful events | ||
+ | * log buffer overflows | ||
+ | * there seem to be logs on the Liberas under /var/log, but without timestamps. When a separate network for the Liberas is used, the time can't be queried from a global NTP server. -> Set up a " |
+ | \\ | ||
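As a language-neutral illustration of the timestamped event log listed above, here is a minimal sketch using Python's standard `logging` module. The logger name, the version string and the logged register contents are placeholders, not values from the real generic servers. Timestamps come from the local clock, so they are only comparable across machines if the clocks are synchronised.

```python
import io
import logging
import time


def make_event_logger(stream, name="genserver"):
    """Return a logger that prefixes every event with a UTC timestamp."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False      # keep events out of the root logger
    logger.handlers.clear()       # avoid duplicate handlers on repeated calls

    formatter = logging.Formatter(
        "%(asctime)s.%(msecs)03dZ %(levelname)s %(message)s",
        datefmt="%Y-%m-%dT%H:%M:%S",
    )
    formatter.converter = time.gmtime  # UTC instead of local time

    handler = logging.StreamHandler(stream)
    handler.setFormatter(formatter)
    logger.addHandler(handler)
    return logger


# Log to an in-memory buffer here; a real setup would log to a file.
buffer = io.StringIO()
log = make_event_logger(buffer)
log.info("startup version=%s", "0.0.0-example")            # placeholder version
log.debug("trigger=start registers=%s", {"mode": "raw"})   # placeholder registers
log.warning("ring buffer full")
lines = buffer.getvalue().splitlines()
```

Using UTC with millisecond resolution keeps the entries sortable and makes it possible to line up events from different components afterwards.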
Network considerations: | Network considerations: | ||
* put the Liberas into their own network (not the GSI/ACC network) | * put the Liberas into their own network (not the GSI/ACC network) | ||
* bootfile on CCCPs, static IPs, nfs mount to store debug logs | * bootfile on CCCPs, static IPs, nfs mount to store debug logs | ||
+ | * is this a lot of work? does it require changes in the gen servers / FPGA code? | ||
+ | |||
+ | \\ | ||
+ | Connection to the PTIF: | ||
+ | * Display the connection status and if a command which has been sent, was " | ||
+ | |||
+ | Have a standalone tool to see ALL system components directly: FESA server classes, CCCPs, Liberas (pingable), the gen servers (are up and running?), show error flags. It might be a good idea to have a button for each component to perform a check. E.g. perform a ping for the Liberas and CCCPs. Or perform a data query from the FESA classes. | ||
+ | Some of this information is provided by the detailed status panel, but you have to be an expert to interpret some errors. | ||
- | Would it make sense to have a standalone tool to see if the FESA servers, the BPMs, the gen Servers are up and running and without an error flag?! | ||
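The core of such a standalone check tool could look roughly like the sketch below: one named check per component, each run independently, so that one failing component does not hide the state of the others. The component names are invented, and `tcp_reachable` is only one assumed way of testing reachability (a TCP connect rather than an ICMP ping); the real checks would depend on what the FESA classes, CCCPs and Liberas expose.

```python
import socket


def tcp_reachable(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def run_checks(checks):
    """Run each named check; a check that raises is reported as failed."""
    report = {}
    for name, check in checks.items():
        try:
            ok = bool(check())
            detail = "ok" if ok else "check returned failure"
        except Exception as exc:
            ok, detail = False, f"{type(exc).__name__}: {exc}"
        report[name] = (ok, detail)
    return report


def format_report(report):
    """Render one status line per component."""
    return "\n".join(
        f"{'OK  ' if ok else 'FAIL'} {name}: {detail}"
        for name, (ok, detail) in report.items()
    )


def _broken_query():
    # Stand-in for a data query that gets no answer.
    raise TimeoutError("no reply")


# Stub checks with invented component names, for illustration only.
demo = run_checks({
    "libera-1": lambda: True,
    "cccp-1": lambda: False,
    "fesa-class": _broken_query,
})
```

A real check could be registered as, for example, `lambda: tcp_reachable(host, port)` with the actual host and port filled in. Each entry in the report corresponds to what a per-component button in a GUI would trigger.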
== Goals == | == Goals == | ||
- | Add logging output to all system components to know what is going on in each component. | + | Add logging output to all system components to know what is going on in each component |
Provide tools to observe the health status of the system components. | Provide tools to observe the health status of the system components. | ||
+ | |||
+ | |||
+ | |||
+ | Some more considerations (MSchw): | ||
+ | * I strongly support having as much logging information as possible, e.g. written to a text file | ||
+ | * From my point of view it is very important that we clearly understand what SHOULD happen, e.g. when the user presses a button, BEFORE we try to understand why something we intend to do DOES NOT happen. | ||
+ | * I would recommend having one or several (very basic, synoptic, NOT code-level) diagrams of the internal process flow. This/these should be created by SD (Rainer?) together with Cosylab. The diagram/ |
+ | * I also support the idea of a stand-alone diagnostic tool as described above; just make sure the displayed information is clearly defined and leaves little room for misinterpretation. | ||
+ | |||