public:lta_faq

public:lta_faq [2020-11-04 15:10] Bernard Asabere
public:lta_faq [2025-01-23 06:48] (current) – [Troubleshoot] Sander ter Veen
Line 22: Line 22:
   * **Q**: [[:public:lta_faq#i_got_an_email_that_says_my_staging_request_was_only_partially_successful_what_s_going_on|I got an email that says my staging request was only partially successful! What's going on]]
   * **Q**: [[:public:lta_faq#oops_i_made_a_mistake_how_can_i_stop_a_request|Oops! I made a mistake! How can I stop a request]]
 +  * **Q**: [[:public:lta_faq#i_get_a_time-out_when_searching_around_a_declination_of_zero|I get a time-out when searching around a declination of zero]]
   * **Q**: [[:public:lta_faq#my_files_only_contain_some_error_message_instead_of_data|My files only contain some error message instead of data]]
   * **Q**: [[:public:lta_faq#my_data_files_are_corrupted|My data files are corrupted / I cannot unpack my data]]
Line 32: Line 33:
   * **Q**: [[:public:lta_faq#srm_grid_commands_fail_with_error_ac_validation_failed_or_no_trusted_path_can_be_constructed|SRM/Grid commands fail with error 'AC validation failed!' or 'No trusted path can be constructed']]
   * **Q**: [[:public:lta_faq#srm_grid_commands_fail_and_i_cannot_figure_out_why|SRM/Grid commands fail and I cannot figure out why!]]
 +
 +
 ===== Answers =====
  
Line 60: Line 63:
 === What is all this SRM / 'staging' stuff about? ===

-These are technical terms that refer to the storage backend of the LTA. Each of the three LTA sites (in Amsterdam, Juelich and Groningen) operates an SRM (Storage Resource Management) system. Each SRM system consists of magnetic tape storage and hard disk storage. Both are addressed by a common file system, where each file has a specific locality: it can be either on disk ('online') or on tape ('nearline') or both. The usual case for LTA data is, that it is on tape only. Since the tape is not directly accessible but placed in a library shelf, the data on it first has to be copied from tape to disk, in order to retrieve it. This process is called 'staging'. Only while the data is (also) on disk, you will be able to download it. (In physics terms, think of it as an excited state.) To save cost, the disk pool is of limited capacity and only meant for temporary caching data that a user wants to access right now. After 7 days, all data is automatically 'released', which means that it may be deleted from the disk storage, as soon as the space is required for other data. It then has to be staged again in order to become accessible again.
+These are technical terms that refer to the storage backend of the LTA. Each of the three LTA sites (in Amsterdam, Juelich and Poznan) operates an SRM (Storage Resource Management) system. Each SRM system consists of magnetic tape storage and hard disk storage. Both are addressed by a common file system, where each file has a specific locality: it can be either on disk ('online') or on tape ('nearline') or both. LTA data is usually on tape only. Since tape is not directly accessible but sits on a library shelf, the data first has to be copied from tape to disk before it can be retrieved. This process is called 'staging'. You can download data only while it is (also) on disk. (In physics terms, think of it as an excited state.)
+To save cost, the disk pool is of limited capacity and only meant for temporarily caching data that a user wants to access right now. After 7 days, all data is automatically 'released', which means that it may be deleted from the disk storage as soon as the space is required for other data. It then has to be staged again in order to become accessible again.

 Usually, you don't have to worry about the details. But be aware that data retrieval is a two-step procedure: 1) preparation for download ('staging') and 2) the download itself. Also, take care not to request [[:public:lta_faq#what_is_an_appropriate_amount_of_data_to_retrieve|too much data at the same time.]]
Line 107: Line 110:
 === I did not receive a mail notification that my data is ready for retrieval! Has my request gone lost? ===

-After you got a notification that your requests was scheduled, it is in our database and there's hardly a possibility that it got lost. Staging requests can take up to a day or two, but will finish a lot sooner in most cases. This depends on your request's size but also on how busy the storage systems are by other user's requests at the moment. Sometimes, the LTA storage systems are down for maintenance and this can delay the whole procedure. You can [[http://web.grid.sara.nl/cgi-bin/lofar.py|check for downtimes here]].
+Once you have received a notification that your request was scheduled, it is in our database and it is very unlikely to get lost. Staging requests can take up to a day or two, but will finish a lot sooner in most cases. This depends on your request's size but also on how busy the storage systems are with other users' requests at the moment. Sometimes, the LTA storage systems are down for maintenance and this can delay the whole procedure. You can [[https://ganglia.grid.surfsara.nl/cgi-bin/lofar.py|check for downtimes here]].

 It is not alarming if your request does not finish within 24 hours, even if your last request finished within 10 minutes. In urgent cases or if you did not receive a notification after 48 hours, please contact the [[https://support.astron.nl/rohelpdesk|ASTRON helpdesk]].
Line 117: Line 120:
 If you used the xmlrpc interface to submit your request, please first check whether you made a mistake and e.g. entered the wrong SURLs.

-**Note:**  We get notified of these issues as well and will usually re-schedule failed requests due to server issues after the problem was solved. So please first check whether you got a 'Data ready for retrieval' notification for the same request id after the error notification. If you did, the problem was already resolved.
+**Note:** We get notified of these issues as well and will usually re-schedule failed requests due to server issues after the problem was solved. So please first check whether you got a 'Data ready for retrieval' notification for the same request id after the error notification. If you did, the problem was already resolved.
 === I got an email that says my staging request was only partially successful! What's going on? ===
  
Line 125: Line 127:
 If you used the xmlrpc interface to submit your request, please first check whether you made a mistake and e.g. entered the wrong SURLs.

-**Note:**  We get notified of these issues as well and will usually re-schedule failed requests due to server issues after the problem was solved. So please first check whether you got a 'Data ready for retrieval' notification for the same request id after the error notification. If you did, the problem was already resolved.
+**Note:** We get notified of these issues as well and will usually re-schedule failed requests due to server issues after the problem was solved. So please first check whether you got a 'Data ready for retrieval' notification for the same request id after the error notification. If you did, the problem was already resolved.
 === Oops! I made a mistake! How can I stop a request? ===

 Unfortunately, this is currently not possible for you as a user. Stay calm and ask [[https://support.astron.nl/rohelpdesk|ASTRON helpdesk]] to stop the request for you.
  
 +=== I get a time-out when searching around a declination of zero ===
 +
 +The system has trouble finding data around a declination of zero. Such a search can take a long time (minutes), and if it still does not succeed you will get an nginx time-out error. To keep the search small enough to succeed, either lower the search radius or limit the **Observing date** start and stop time. For example, we have been able to find data with a radius of 1.2 in a two-year window. LOFAR observations start around 2011.
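
One way to apply the date-window advice is to split the period of interest into consecutive windows and run one search per window. The following is only a minimal sketch under stated assumptions: ''date_windows'' is a hypothetical helper, not part of any LTA tool, the two-year default simply mirrors the example above, and the sketch does not talk to the LTA at all. Each resulting pair would be entered by hand as the **Observing date** start and stop time.

```python
from datetime import date


def date_windows(start, stop, years=2):
    """Yield consecutive (window_start, window_stop) pairs covering start..stop.

    Hypothetical helper: each window spans at most `years` calendar years.
    """
    current = start
    while current < stop:
        try:
            nxt = current.replace(year=current.year + years)
        except ValueError:
            # 29 February has no counterpart in a non-leap target year.
            nxt = current.replace(year=current.year + years, day=28)
        nxt = min(nxt, stop)
        yield current, nxt
        current = nxt


if __name__ == "__main__":
    # Assumed period: from the start of LOFAR observations to mid-2016.
    for begin, end in date_windows(date(2011, 1, 1), date(2016, 6, 1)):
        print(f"Observing date start: {begin}  stop: {end}")
```

Searching window by window trades one slow query for several faster ones; if a single window still times out, a smaller ''years'' value or a smaller search radius can be tried.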
 === My files only contain some error message instead of data ===

 Most errors should result in a 404/50x return code. However, some error messages are still returned as a message. Please read the error message carefully. In many cases, it should give you some indication of what went wrong. If this does not help you, please contact the [[https://support.astron.nl/rohelpdesk|ASTRON helpdesk]] or retry after a few hours.
  
-**Important:**  If you use wget with option '-c', please note the following: wget does not check the contents of an existing file, so when restarting wget with option '-c' (continue) to retrieve the failed files, it will append the later data chunk to the existing file that contains the error message (and not the first section of you data). Make sure to delete the existing error files (should be obvious by the small file size) before calling 'wget -ci' again, to avoid corrupted data. If you already ended up with a corrupted file, you have to delete that and re-retrieve the whole file.
+**Important:** If you use wget with option '-c', please note the following: wget does not check the contents of an existing file, so when wget is restarted with option '-c' (continue) to retrieve the failed files, it appends the later data chunk to the existing file that contains the error message (and not the first section of your data). To avoid corrupted data, make sure to delete the existing error files (they should be obvious by their small file size) before calling 'wget -ci' again. If you already ended up with a corrupted file, you have to delete it and re-retrieve the whole file.
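
The clean-up step before re-running 'wget -ci' could be scripted roughly as follows. This is a sketch, not an official tool: the function names are hypothetical, and both the 10 kB size threshold and the ''*.tar'' file pattern are assumptions that should be adjusted to your own downloads. By default it only lists the suspect files and deletes nothing.

```python
from pathlib import Path


def find_suspect_files(directory, max_bytes=10_000, pattern="*.tar"):
    """Return files matching `pattern` that are smaller than `max_bytes`.

    Such small files probably hold an error message instead of data.
    The threshold and the pattern are assumptions, not LTA-defined values.
    """
    return sorted(
        p for p in Path(directory).glob(pattern)
        if p.is_file() and p.stat().st_size < max_bytes
    )


def remove_suspect_files(directory, max_bytes=10_000, pattern="*.tar", dry_run=True):
    """List the suspect files and delete them unless `dry_run` is True."""
    suspects = find_suspect_files(directory, max_bytes, pattern)
    for path in suspects:
        if not dry_run:
            path.unlink()
    return suspects


if __name__ == "__main__":
    # Dry run on the current directory: show what would be deleted.
    for path in remove_suspect_files(".", dry_run=True):
        print(f"would delete {path} ({path.stat().st_size} bytes)")
```

Inspecting the dry-run list first (and reading one of the small files to confirm that it contains an error message) is the safer order; only then call ''remove_suspect_files'' with ''dry_run=False'' and restart wget.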
 === My data files are corrupted ===
  
Line 147: Line 150:
 === My downloads don't start / time out ===

-Maybe the SRM system is down for maintenance, please check [[http://web.grid.sara.nl/cgi-bin/lofar.py|http://web.grid.sara.nl/cgi-bin/lofar.py]]. If there is nothing going on, there is probably something wrong with the download service. Please try again a bit later and submit a support request to the [[https://support.astron.nl/rohelpdesk|ASTRON helpdesk]], if the issue persists.
+The SRM system may be down for maintenance; please check [[https://ganglia.grid.surfsara.nl/cgi-bin/lofar.py|https://ganglia.grid.surfsara.nl/cgi-bin/lofar.py]]. If there is nothing going on, there is probably something wrong with the download service. Please try again a bit later and, if the issue persists, submit a support request to the [[https://support.astron.nl/rohelpdesk|ASTRON helpdesk]].

 === Http downloads randomly fail with "503 Service Temporarily Unavailable" ===
Line 169: Line 172:
 Ensure you have run ''voms-proxy-init'' to generate an up-to-date proxy file. If the error persists: the SRM tools apparently do not always use the default proxy file location $HOME/.proxy, or you used a non-standard proxy location in ''voms-proxy-init''.
  
-  * Either set the X509_USER_PROXY environment variable to your .proxy file, e.g.
+   * Either set the X509_USER_PROXY environment variable to your .proxy file, e.g.
 <code> <code>
  
  • Last modified: 2020-11-04 15:10
  • by Bernard Asabere