This shows you the differences between two versions of the page.
Both sides previous revision Previous revision Next revision | Previous revision | ||
projects:bpm-sis18:status [2012/07/11 17:23] rhaseitl |
projects:bpm-sis18:status [2012/07/13 09:11] (current) klang |
||
---|---|---|---|
Line 2: | Line 2: | ||
Errors occurring sporadically: | Errors occurring sporadically: | ||
- | * the Liberas lose their connection: they appear red in the detailed status panel, giving the status " | + | * the Liberas lose their connection: they appear red in the detailed status panel, giving the status " |
In at least one case, I also had to restart the FESA classes on the CCCPs to get the system working again. | In at least one case, I also had to restart the FESA classes on the CCCPs to get the system working again. | ||
This sometimes happens during beamtime, or after the system has not been used for a while and is started again (= the GUI was closed for a while and is started again). | This sometimes happens during beamtime, or after the system has not been used for a while and is started again (= the GUI was closed for a while and is started again). | ||
Line 10: | Line 10: | ||
* aux BPM confuses the whole system (Start and Stop is triggered "out of itself" | * aux BPM confuses the whole system (Start and Stop is triggered "out of itself" | ||
* low system performance (switching modes takes several seconds) | * low system performance (switching modes takes several seconds) | ||
+ | |||
+ | Bunch detection / Measurement data errors (KL): | ||
+ | * Under certain beam/signal conditions, Liberas deliver erroneous measurement data. This causes TOPOS to display results that are hard to use or unusable. | ||
== Debugging ideas: == | == Debugging ideas: == | ||
Line 22: | Line 25: | ||
* connection to BPM established/ | * connection to BPM established/ | ||
* debug output at every status change of the system (Initializing, | * debug output at every status change of the system (Initializing, | ||
+ | * logging should use the Log4j framework (from within the FESA class possible with SDLog (HBr)) | ||
+ | * the GUI should **not** encapsulate exceptions thrown by cmw / rda into its own Exception class (HBr) | ||
+ | * detailed documentation of the meaning of and reasons for each error message, exception etc. should be written (HBr) | ||
\\ | \\ | ||
Log on the generic servers (with timestamps!): | Log on the generic servers (with timestamps!): | ||
* version number (or similar) at startup | * version number (or similar) at startup | ||
- | * internal register values (when changed, on start, on stop) | + | * internal register values (when changed, on start trigger, on stop trigger) |
* when a start or stop trigger arrives | * when a start or stop trigger arrives | ||
* when the ring buffer is full | * when the ring buffer is full | ||
- | * operating mode (raw, bunch to bunch) | + | * operating mode (raw, bunch to bunch, calibrations, |
* log any other useful events | * log any other useful events | ||
+ | * log buffer overflows | ||
+ | * there seem to be logs on the Liberas under /var/log, but without timestamps. When a separate network for the Liberas is used, the time can't be queried from a global NTP server. -> Set up a " | ||
\\ | \\ | ||
Line 36: | Line 44: | ||
* put the Liberas into their own network (not the GSI/ACC network) | * put the Liberas into their own network (not the GSI/ACC network) | ||
* bootfile on CCCPs, static IPs, nfs mount to store debug logs | * bootfile on CCCPs, static IPs, nfs mount to store debug logs | ||
- | * is this a lot of work? does it require changes in the gen servers / FPGA code? | + | * is this a lot of work? does it require changes in the gen servers / FPGA code (Change of FPGA code won't be necessary for this issue (KL))? |
- | **Would it make sense to have a simple | + | \\ |
+ | Connection to the PTIF: | ||
+ | | ||
+ | |||
+ | Have a standalone tool to see ALL system components directly: | ||
+ | Some of this information is provided by the detailed | ||
+ | |||
+ | \\ | ||
+ | GUI improvements for wrong measurement data (KL): | ||
+ | | ||
== Goals == | == Goals == | ||
Line 44: | Line 61: | ||
Add logging output to all system components to know what is going on in each component for each status change. There should be a flag to en-/disable logging at startup. | Add logging output to all system components to know what is going on in each component for each status change. There should be a flag to en-/disable logging at startup. | ||
Provide tools to observe the health status of the system components. | Provide tools to observe the health status of the system components. | ||
+ | |||
+ | |||
+ | |||
+ | Some more considerations (MSchw): | ||
+ | * I strongly support having as much logging information as possible, e.g. written to a text file | ||
+ | * From my point of view it is very important that we clearly understand what SHOULD happen, e.g. when the user presses a button, BEFORE we try to understand why something we intend to do DOES NOT happen. | ||
+ | * I would recommend having one or several (very basic, synoptic, NOT on code basis) diagrams of the internal process flow. This/these should be created by SD (Rainer?) together with Cosylab. The diagram/ | ||
+ | * I also support the idea of a stand-alone diagnostic tool as described above; just make sure the displayed information is clearly defined and leaves little room for misinterpretation. | ||
+ | |||
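
The logging goal above (timestamped output at every status change, plus a flag to en-/disable logging at startup) could look roughly like the following. This is only a minimal sketch in Python using the standard `logging` module, not the actual GUI, FESA or generic server code. The names `build_logger`, `Component`, `parse_args`, the `--log` flag, the `VERSION` string and the state name "Running" are hypothetical and chosen for illustration only.

```python
import argparse
import logging

VERSION = "0.0-example"  # hypothetical version string, logged once at startup


def build_logger(enabled, stream=None, name="bpm.component"):
    """Return a logger that writes timestamped lines, or nothing when disabled."""
    logger = logging.getLogger(name)
    logger.handlers.clear()
    logger.propagate = False
    if not enabled:
        # Logging switched off at startup: swallow every record.
        logger.addHandler(logging.NullHandler())
        logger.setLevel(logging.CRITICAL + 1)
        return logger
    handler = logging.StreamHandler(stream)  # stream=None means stderr
    # %(asctime)s puts a timestamp in front of every line.
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    return logger


class Component:
    """Hypothetical system component that logs each status change."""

    def __init__(self, logger, state="Initializing"):
        self.logger = logger
        self.state = state
        self.logger.info("startup version=%s state=%s", VERSION, state)

    def set_state(self, new_state):
        # Only real transitions are logged, so the log stays readable.
        if new_state != self.state:
            self.logger.debug("status change %s -> %s", self.state, new_state)
            self.state = new_state


def parse_args(argv):
    """Parse the (hypothetical) startup flag that enables logging."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--log", action="store_true", help="enable logging")
    return parser.parse_args(argv)
```

A component would then be started with something like `Component(build_logger(parse_args(sys.argv[1:]).log))` (with `import sys`). The same idea carries over to Log4j in the GUI: one switch read at startup, and one timestamped line per status change.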