Differences
This shows you the differences between two versions of the page.
public:lta_tricks [2013-08-21 08:25] – Adriaan Renting | public:lta_tricks [2023-07-17 08:51] (current) – Robbie Luijben | ||
Line 5: | Line 5: | ||
===== Queries ===== | ===== Queries ===== | ||
- | * You can use wildcards | + | * You can use colons |
- | {{: | + | |
+ | {{ : | ||
+ | |||
+ | In textual entries, wildcards can be used. {{ : | ||
* You can put a list of SAS/ | * You can put a list of SAS/ | ||
- | {{: | + | {{ : |
+ | |||
+ | ===== Viewing data ===== | ||
+ | |||
+ | When you are looking at the results of a query you might see something like this: | ||
+ | |||
+ | {{ : | ||
+ | |||
+ | This means that the observation is known in the LTA and the LTA knows what data it produced. That data was not archived itself, but the raw data was processed further and the results of some of those pipelines were archived. If you click on the zero, you will see something like this: | ||
+ | |||
+ | {{ : | ||
+ | |||
+ | This allows you to navigate from a pipeline back to the original observation, | ||
===== Retrieving data ===== | ===== Retrieving data ===== | ||
Line 16: | Line 31: | ||
* You can retrieve data on the Observation and Pipeline level, you don't have to select all files individually. | * You can retrieve data on the Observation and Pipeline level, you don't have to select all files individually. | ||
- | {{: | + | {{ : |
* If you have a query with more than 1000 results, you can open the multiple pages each in a separate tab/window. | * If you have a query with more than 1000 results, you can open the multiple pages each in a separate tab/window. | ||
- | {{: | + | {{ : |
* With the small triangle next to a list, you can fold or unfold the list to get a better overview. | * With the small triangle next to a list, you can fold or unfold the list to get a better overview. | ||
- | ==== Folded entries ==== | ||
- | {{: | ||
+ | == Folded entries == | ||
+ | |||
+ | {{ : | ||
+ | |||
+ | == Unfolded entries == | ||
+ | |||
+ | {{ : | ||
+ | |||
+ | ===== DBView ===== | ||
+ | |||
+ | There is a server on which you can run your own queries on the database: [[https:// |
+ | |||
+ | A useful query is the following, which returns all files for a given Obs ID (SAS VIC tree ID). | ||
+ | < | ||
+ | |||
+ | SELECT fo.URI, dp." | ||
+ | | ||
+ | FROM AWOPER." | ||
+ | | ||
+ | | ||
+ | WHERE dp." | ||
+ | AND pr." | ||
+ | AND fo.data_object = dp." | ||
+ | AND dp." | ||
+ | |||
+ | </ | ||
+ | |||
+ | In this ' | ||
+ | |||
+ | **Example** | ||
+ | |||
+ | < | ||
+ | SELECT fo.URI, fo.hash_md5, | ||
+ | | ||
+ | FROM AWOPER." | ||
+ | | ||
+ | | ||
+ | WHERE dp." | ||
+ | AND pr." | ||
+ | AND fo.data_object = dp." | ||
+ | AND dp." | ||
+ | |||
+ | </ | ||
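The example query above also returns ''fo.hash_md5'' for each file. As a minimal sketch, assuming ''hash_md5'' is the MD5 digest of the file as stored in the archive, you can verify a downloaded file locally with Python's standard ''hashlib''. The file name and hash in the usage comment are placeholders, not values from the LTA.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_lta_hash(path, expected_md5):
    """Compare a local file with an MD5 value copied from the query result.

    Surrounding whitespace and upper/lower case in the copied value are
    ignored, since values pasted from a web page may carry either.
    """
    return md5_of_file(path) == expected_md5.strip().lower()


# Hypothetical usage; replace both arguments with your own file and hash:
# print(matches_lta_hash("downloaded_file.tar", "<hash_md5 from the query>"))
```

Reading in chunks keeps memory use constant, which matters for the large tar files typically retrieved from the LTA.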
+ | |||
+ | ===== AstroWise Python Interface ===== | ||
+ | |||
+ | There is a Python client library for accessing the LTA. With this library, you can script your own queries. The installation description can be found here: [[: | ||
+ | |||
+ | Once you have installed the client, set up your user name and password. These are the same as for MoM. Remember that this is just a different interface to the LTA catalogue: you will need the same credentials as for the web interface. | ||
+ | |||
+ | After installing the LTA client, the file .awe/ | ||
+ | < | ||
+ | |||
+ | [global] | ||
+ | database_user | ||
+ | database_password | ||
+ | |||
+ | </ | ||
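Before running the test script, you can check that the configuration file is complete. A minimal sketch, assuming the file uses standard INI syntax that Python's ''configparser'' can read; pass it the path of the configuration file under ''.awe/'' mentioned above. Because the file contains your password, the sketch also flags it when other users can read it.

```python
import configparser
import os
import stat


def check_awe_config(path):
    """Return a list of problems found in the config file; empty means OK.

    Assumes INI syntax with a [global] section holding database_user and
    database_password.
    """
    # interpolation=None so that a '%' in a password is not treated specially
    parser = configparser.ConfigParser(interpolation=None)
    try:
        found = parser.read(path)
    except configparser.Error as exc:
        return ["cannot parse %s: %s" % (path, exc)]
    if not found:
        return ["cannot read %s" % path]
    if not parser.has_section("global"):
        return ["missing [global] section"]
    problems = []
    for key in ("database_user", "database_password"):
        if not parser.get("global", key, fallback="").strip():
            problems.append("missing or empty %s" % key)
    # The file holds a password, so it should not be readable by others.
    mode = os.stat(path).st_mode
    if mode & (stat.S_IRGRP | stat.S_IROTH):
        problems.append("file is readable by group or others")
    return problems
```

If the check reports that the file is readable by others, ''chmod 600'' on the file restricts it to your own user.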
+ | |||
+ | The following script can be used to test your installation: | ||
+ | |||
+ | <code python> | ||
+ | # Python3 code | ||
+ | from pprint import pprint | ||
+ | from awlofar.main.aweimports import Observation, | ||
+ | from common.database.Context import context | ||
+ | result = {} | ||
+ | for project in sorted(context.get_projects()) : | ||
+ | print(" | ||
+ | ok = context.set_project(project) | ||
+ | # do your query | ||
+ | obs_ids = set() | ||
+ | query = (Pointing.rightAscension > 95) & \ | ||
+ | (Pointing.rightAscension < 105) & \ | ||
+ | (Pointing.declination | ||
+ | (Pointing.declination | ||
+ | print(" | ||
+ | for pointing in query : | ||
+ | print(" | ||
+ | query_subarr = SubArrayPointing.pointing == pointing | ||
+ | for subarr in query_subarr: | ||
+ | query_obs = Observation.subArrayPointings.contains(subarr) | ||
+ | for obs in query_obs : | ||
+ | obs_ids.add(obs.observationId) | ||
+ | result[project] = sorted(list(obs_ids)) | ||
+ | print(result[project]) | ||
+ | |||
+ | pprint(result) | ||
+ | |||
+ | </ | ||
+ | |||
+ | It should print out a list of pointings (note that in this example the library was installed in $HOME/tmp): | ||
+ | |||
+ | < | ||
+ | $ env PYTHONPATH=$HOME/ | ||
+ | Project ALL | ||
+ | Total Pointings 202 | ||
+ | Pointing found RA 95.003499 DEC 24.838742 | ||
+ | Pointing found RA 95.174754 DEC 28.660087 | ||
+ | Pointing found RA 95.220000 DEC 29.140000 | ||
+ | Pointing found RA 95.546250 DEC 23.331750 | ||
+ | Pointing found RA 95.561458 DEC 24.584056 | ||
+ | ..etc.. | ||
+ | |||
+ | </ | ||
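Because the test script writes plain text, you can save its output and post-process it. A minimal sketch, assuming the ''Pointing found RA ... DEC ...'' line format shown above and coordinates in degrees (as the RA limits of 95 and 105 in the query suggest), that extracts the coordinates as floats:

```python
import re

# Matches lines such as "Pointing found RA 95.003499 DEC 24.838742".
POINTING_RE = re.compile(
    r"Pointing found RA\s+(?P<ra>[-+]?\d+(?:\.\d+)?)"
    r"\s+DEC\s+(?P<dec>[-+]?\d+(?:\.\d+)?)"
)


def parse_pointings(lines):
    """Return (ra, dec) tuples for every pointing line in the given lines."""
    pointings = []
    for line in lines:
        match = POINTING_RE.search(line)
        if match:
            pointings.append((float(match.group("ra")), float(match.group("dec"))))
    return pointings


# Usage, with the output redirected to a file first:
# with open("pointings.log") as log:
#     print(parse_pointings(log))
```

Lines that do not match, such as the ''Project'' and ''Total Pointings'' headers, are skipped.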
+ | |||
+ | You may need to interrupt the script (for example with Ctrl-C), because it loops over all projects and prints every pointing and observation archived in the LTA in the selected patch of sky. | ||
+ | |||
+ | If you get errors, you may need to open a port in your institution's firewall. Specifically, |
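To find out whether the firewall is the problem, you can test whether a TCP connection to the database server can be opened at all. A minimal sketch using Python's standard ''socket'' module; ''DB_HOST'' and ''DB_PORT'' in the usage comment are placeholders for the host and port of the LTA database server, which are not given here.

```python
import socket


def port_is_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution errors.
        return False


# Hypothetical usage; DB_HOST and DB_PORT are placeholders:
# print(port_is_reachable(DB_HOST, DB_PORT))
```

A ''False'' result from inside your institution, combined with ''True'' from an outside network, points at the local firewall.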
+ | |||
+ | ==== Examples ==== | ||
+ | |||
+ | Once you have tested that your connection to the catalogue works, you are ready to browse the archive and stage the data you need. Below are a few examples of Python scripts that access the LTA. All of them need the following imports: | ||
+ | |||
+ | <code python> | ||
+ | from datetime import datetime | ||
+ | from awlofar.database.Context import context | ||
+ | from awlofar.main.aweimports import CorrelatedDataProduct, | ||
+ | FileObject, \ | ||
+ | Observation | ||
+ | from awlofar.toolbox.LtaStager import LtaStager, LtaStagerError | ||
+ | |||
+ | |||
+ | </ | ||
+ | |||
+ | The lines above must be added to each of the scripts below for these to work. | ||
+ | |||
+ | === Ex: get staging URIs === | ||
+ | |||
+ | This script finds all data within a single project, here LC2_035. Replace the project name with the code of one of your own projects. To also stage the data found, set the ''do_stage'' variable to ''True''. Be careful with the number and size of the files you stage: the same limits as for the web interface apply. | ||
+ | |||
+ | <code python> | ||
+ | # Should the found files be staged ? | ||
+ | do_stage = False | ||
+ | # The project to query, LC2_035 has public data | ||
+ | project = ' | ||
+ | # The class of data to query | ||
+ | cls = CorrelatedDataProduct | ||
+ | # Query for private data of the project; you must be a member of the project | ||
+ | private_data = False | ||
+ | |||
+ | # To see private data of this project, you must be a member of this project | ||
+ | if private_data : | ||
+ | context.set_project(project) | ||
+ | if project != context.get_current_project().name: | ||
+ | raise Exception(" | ||
+ | |||
+ | query_observations = Observation.select_all().project_only(project) | ||
+ | uris = set()  # All URIs to stage | ||
+ | for observation in query_observations : | ||
+ | print(" | ||
+ | # Instead of querying on the Observations of the DataProduct, | ||
+ | dataproduct_query = cls.observations.contains(observation) | ||
+ | # isValid = 1 means there should be an associated URI | ||
+ | dataproduct_query &= cls.isValid == 1 | ||
+ | for dataproduct in dataproduct_query : | ||
+ | # This DataProduct should have an associated URI | ||
+ | fileobject = ((FileObject.data_object == dataproduct) & (FileObject.isValid > 0)).max(' | ||
+ | if fileobject : | ||
+ | print(" | ||
+ | uris.add(fileobject.URI) | ||
+ | else : | ||
+ | print(" | ||
+ | |||
+ | print(" | ||
+ | |||
+ | if do_stage : | ||
+ | stager = LtaStager() | ||
+ | stager.stage_uris(uris) | ||
+ | |||
+ | |||
+ | </ | ||
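Because the same staging limits as for the web interface apply, you may want to split a large set of URIs into several smaller requests. A minimal sketch; the batch size in the usage comment is an arbitrary assumption, not the actual LTA limit, and it also assumes ''stage_uris'' accepts a list as well as a set.

```python
def batched(items, batch_size):
    """Split items into sorted lists of at most batch_size elements each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    ordered = sorted(items)  # a deterministic order makes reruns reproducible
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


# Hypothetical usage with the 'uris' set and LtaStager from the script above;
# check the current staging limits before choosing a batch size:
# stager = LtaStager()
# for batch in batched(uris, 500):
#     stager.stage_uris(batch)
```

Sorting the URIs means that files belonging to the same observation tend to end up in the same request.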
+ | |||
+ | === Ex: filter on subbands === | ||
+ | |||
+ | The following script will find subbands 301 and 302 for all targets within two different projects. | ||
+ | |||
+ | Pay attention to the difference between the keys subband and stationSubband; | ||
+ | |||
+ | As a general piece of advice, before performing a search make sure you **thoroughly understand the meaning of the keywords you are using and where their values are stored**; otherwise you may not find the data you are looking for. | ||
+ | |||
+ | <code python> | ||
+ | do_stage = False | ||
+ | project1 = ' | ||
+ | project2 = ' | ||
+ | subband1 = 301 | ||
+ | subband2 = 302 | ||
+ | cls = CorrelatedDataProduct | ||
+ | # Query for private data of the project; you must be a member of the project | ||
+ | private_data = False | ||
+ | |||
+ | # All URIs to stage | ||
+ | uris = { | ||
+ | project1: set(), | ||
+ | project2: set(), | ||
+ | } | ||
+ | |||
+ | for project in (project1, project2) : | ||
+ | print(" | ||
+ | if private_data : | ||
+ | context.set_project(project) | ||
+ | if project != context.get_current_project().name: | ||
+ | raise Exception(" | ||
+ | query_observations = Observation.select_all().project_only(project) | ||
+ | for observation in query_observations : | ||
+ | print(" | ||
+ | dataproduct_query = cls.observations.contains(observation) | ||
+ | # isValid = 1 means there should be an associated URI | ||
+ | dataproduct_query &= cls.isValid == 1 | ||
+ | dataproduct_query &= ((cls.subband == subband1) | (cls.subband == subband2)) | ||
+ | # Or for stationSubband do : | ||
+ | # | ||
+ | for dataproduct in dataproduct_query : | ||
+ | # This DataProduct should have an associated URI | ||
+ | fileobject = ((FileObject.data_object == dataproduct) & (FileObject.isValid > 0)).max(' | ||
+ | if fileobject : | ||
+ | print(" | ||
+ | uris[project].add(fileobject.URI) | ||
+ | else : | ||
+ | print(" | ||
+ | |||
+ | for project in (project1, project2) : | ||
+ | print(" | ||
+ | |||
+ | stager = LtaStager() | ||
+ | if do_stage : | ||
+ | for project in (project1, project2) : | ||
+ | stager.stage_uris(uris[project]) | ||
+ | |||
+ | |||
+ | </ | ||
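When choosing which subband numbers to filter on, it can help to know the corresponding frequencies. A minimal sketch of the standard LOFAR relation between a station subband number n and its centre frequency, f = (z - 1) * c/2 + n * c/1024, with z the Nyquist zone and c the sampling clock in MHz (200 MHz by default here). Whether a given LTA key holds the station subband number is an assumption to check against the note on ''subband'' and ''stationSubband'' above.

```python
def subband_frequency_mhz(station_subband, nyquist_zone, clock_mhz=200.0):
    """Return the centre frequency in MHz of a LOFAR station subband.

    With the 200 MHz clock, zone 1 covers the LBA bands, zone 2 the
    HBA 110-190 MHz band and zone 3 the HBA 210-250 MHz band.
    """
    if not 0 <= station_subband < 512:
        raise ValueError("station subband must be in the range 0..511")
    subband_width = clock_mhz / 1024.0  # 0.1953125 MHz for the 200 MHz clock
    return (nyquist_zone - 1) * clock_mhz / 2.0 + station_subband * subband_width


# Station subbands 301 and 302 in the HBA 110-190 MHz band:
# print(subband_frequency_mhz(301, 2), subband_frequency_mhz(302, 2))
```

For example, station subband 301 in Nyquist zone 2 with the 200 MHz clock lies at 158.7890625 MHz.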
+ | |||
+ | === Ex: filter on frequency and observation date === | ||
+ | |||
+ | Here, we find data with frequencies between ''freq1'' and ''freq2'', taken within one project between ''day1'' and ''day2''. | ||
+ | |||
+ | <code python> | ||
+ | do_stage = False | ||
+ | project = ' | ||
+ | freq1 = 172.0 | ||
+ | freq2 = 178.0 | ||
+ | day1 = datetime(2014, | ||
+ | day2 = datetime(2014, | ||
+ | # DataProduct class to query; CorrelatedDataProduct, | ||
+ | cls = CorrelatedDataProduct | ||
+ | # Query for private data of the project; you must be a member of the project | ||
+ | private_data = False | ||
+ | |||
+ | # To see private data of this project, you must be a member of this project | ||
+ | if private_data : | ||
+ | context.set_project(project) | ||
+ | if project != context.get_current_project().name: | ||
+ | raise Exception(" | ||
+ | |||
+ | query_observations = ( | ||
+ | (Observation.startTime >= day1) & | ||
+ | (Observation.endTime | ||
+ | |||
+ | uris = set() | ||
+ | for observation in query_observations : | ||
+ | print(" | ||
+ | dataproduct_query = cls.observations.contains(observation) | ||
+ | # isValid = 1 means there should be an associated URI | ||
+ | dataproduct_query &= cls.isValid == 1 | ||
+ | dataproduct_query &= cls.minimumFrequency >= freq1 | ||
+ | dataproduct_query &= cls.maximumFrequency < freq2 | ||
+ | for dataproduct in dataproduct_query : | ||
+ | # This DataProduct should have an associated URI | ||
+ | fileobject = ((FileObject.data_object == dataproduct) & (FileObject.isValid > 0)).max(' | ||
+ | if fileobject : | ||
+ | print(" | ||
+ | uris.add(fileobject.URI) | ||
+ | else : | ||
+ | print(" | ||
+ | |||
+ | print(" | ||
+ | |||
+ | if do_stage : | ||
+ | stager = LtaStager() | ||
+ | stager.stage_uris(uris) | ||
+ | |||
+ | |||
+ | </ | ||
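The selection above combines a time window on the observation with a frequency window on each data product. The same logic can be sketched in plain Python, for example to re-check results you have already saved locally. The ''Product'' record and its fields are hypothetical stand-ins for the catalogue attributes used above, frequencies are assumed to be in MHz, and the end-time condition (the observation must end by ''day2'') is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Product:
    """Hypothetical local record of a data product and its observation."""
    uri: str
    start_time: datetime
    end_time: datetime
    minimum_frequency: float  # MHz (assumed)
    maximum_frequency: float  # MHz (assumed)


def in_window(product, day1, day2, freq1, freq2):
    """Mirror the query: observed within [day1, day2], frequencies within [freq1, freq2)."""
    # Assumption: the observation must also end by day2.
    in_time = product.start_time >= day1 and product.end_time <= day2
    # Same comparisons as the catalogue query: minimum >= freq1, maximum < freq2.
    in_freq = product.minimum_frequency >= freq1 and product.maximum_frequency < freq2
    return in_time and in_freq
```

Note that the upper frequency bound is exclusive, as in the query, so a product whose maximum frequency equals ''freq2'' is not selected.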
+ | |||
+ | === Ex: query public data === | ||
+ | |||
+ | To query public data in projects you are not a member of, first set the project to ALL, then construct a query and optionally restrict it to a single project: | ||
+ | |||
+ | <code python> | ||
+ | context.set_project(' | ||
+ | query = CorrelatedDataProduct.select_all() | ||
+ | query &= query.project_only(' | ||
+ | print(len(query)) | ||
+ | # 1800 | ||
+ | |||
+ | |||
+ | </ | ||
+ | |||
+ | === Ex: get release dates === | ||
+ | |||
+ | <code python> | ||
+ | from awlofar.main.aweimports import Observation, | ||
+ | from common.database.Context import context | ||
+ | |||
+ | project = ' | ||
+ | |||
+ | # Query for private data of the project; you must be a member of the project | ||
+ | private_data = True | ||
+ | |||
+ | # To see private data of this project, you must be a member of this project | ||
+ | if private_data : | ||
+ | context.set_project(project) | ||
+ | if project != context.get_current_project().name: | ||
+ | raise Exception(" | ||
+ | |||
+ | # Observations | ||
+ | query_observations = Observation.select_all().project_only(project) | ||
+ | for observation in query_observations : | ||
+ | print(" | ||
+ | |||
+ | # Pipelines | ||
+ | query_pipelines = PipelineRun.select_all().project_only(project) | ||
+ | for pipeline in query_pipelines : | ||
+ | print(" | ||
+ | |||
+ | # Data products | ||
+ | query_products = DataProduct.select_all().project_only(project) | ||
+ | query_products &= DataProduct.isValid == 1 | ||
+ | for product in query_products : | ||
+ | print(" | ||
+ | |||
+ | |||
+ | </ | ||
+ | |||
+ | ===== Python Module for Staging ===== | ||
+ | |||
+ | The Python interface to the LTA catalogue can be complemented with a dedicated module that gives users more control over their staging requests. | ||
+ | |||
+ | Current released version 2.0 ([[http:// | ||
+ | |||
+ | * User documentation for __//stageit//__ can be found at: [[https:// | ||
+ | * Version 2.0 release can be found at: [[https:// | ||
+ | |||
+ | ==== Version 2.0 usage notes ==== | ||
+ | |||
+ | The module is made available [[http:// | ||
+ | |||
+ | __Notes:__ | ||
+ | |||
+ | * You need an access token for the stageit API. Please refer to the user guide linked above to sign up and log in to stageit. After logging in, a token can be obtained in one of two ways: | ||
+ | * Visit [[https:// | ||
+ | * From anywhere in the application, | ||
+ | * The token is valid indefinitely. Requesting a token multiple times will yield the same token. | ||
+ | * Make sure the token is available in your ~/ | ||
+ | * api_token=YOUR_TOKEN_HERE | ||
+ | * remove the old username and password from the '' | ||
+ | * The script is Python2 compatible, there is a Dockerfile available for Python2 testing in '' | ||
+ | * The requests library is a required dependency. If you care about Python2 compatibility, |
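A quick way to check that the token line is in place is to parse the file yourself. A minimal sketch, assuming the file consists of simple ''key=value'' lines as in the ''api_token=YOUR_TOKEN_HERE'' example above; the path is a parameter because it has to point at your own configuration file.

```python
def read_key_value_file(path):
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    values = {}
    with open(path) as handle:
        for raw in handle:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip()
    return values


def get_api_token(path):
    """Return the api_token value, or raise KeyError if it is missing or empty."""
    token = read_key_value_file(path).get("api_token", "")
    if not token:
        raise KeyError("api_token not set in %s" % path)
    return token
```

''str.partition'' splits only on the first ''='', so a token that itself contains ''='' is kept intact.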
+ | |||
+ | Also note that some functions are no longer supported by the new LTA stager. Because the states a request can be in have been simplified, these functions are no longer needed; calling them displays an error stating that the function is deprecated. Please look at the '' |
+ | |||
+ | ==== Functionality ==== | ||
+ | |||
+ | The available functions are listed below. | ||
+ | |||
+ | **stage(surls)** \\ It takes a list of SURLs, queues a staging request for them, and outputs the ID of the request. | ||
+ | |||
+ | **get_status(stageid)** \\ It tells the user if a request is queued, in progress or finished (success). Possible statuses: " | ||
+ | |||
+ | **abort(stageid)** \\ It allows users to end a staging request. | ||
+ | |||
+ | **get_surls_online(stageid)** \\ It gives the list of SURLs that have been staged for the given request. The list is updated whenever a new SURL comes online. | ||
+ | |||
+ | **get_srm_token(stageid)** \\ It gives the SRM token of the request, which is useful for interacting directly with the SRM site through GRID/SRM tools. | ||
+ | |||
+ | **reschedule(stageid)** \\ If a request failed, it can be rescheduled. | ||
+ | |||
+ | **get_progress()** \\ No input needed. It returns the statuses of all the requests owned by the user. | ||
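Because a staging request is processed asynchronously, a script typically calls ''stage(surls)'' once and then polls ''get_status(stageid)'' until the request finishes. A minimal sketch of such a loop; the status function is passed in as a parameter so the loop can be tested without the module, and the set of final statuses is an assumption that you should take from the list of possible statuses above (you will probably want to include the failure status too).

```python
import time


def wait_for_request(get_status, stageid, final_statuses=("success",),
                     poll_seconds=60.0, timeout_seconds=24 * 3600.0,
                     sleep=time.sleep):
    """Poll get_status(stageid) until it returns one of final_statuses.

    Returns the final status, or raises TimeoutError after timeout_seconds.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status(stageid)
        if status in final_statuses:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("request %s still %r after %.0f s"
                               % (stageid, status, timeout_seconds))
        sleep(poll_seconds)


# Hypothetical usage; 'stager' stands for however you imported the module:
# stageid = stager.stage(surls)
# print(wait_for_request(stager.get_status, stageid))
```

A generous polling interval avoids putting needless load on the staging service, since large requests can take hours.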
+ | |||
+ | Below is an example of how to use this: | ||
+ | < | ||
+ | > python | ||
+ | Python 2.7.10 (default, Oct 23 2015, 19:19:21) | ||
+ | [GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.5)] on darwin | ||
+ | Type " | ||
+ | |||
+ | 2016-11-24 16: | ||
+ | 2016-11-24 16: | ||
+ | |||
+ | + 12227 | ||
+ | - File count | ||
+ | - Files done -> 40 | ||
+ | - Flagged abort -> false | ||
+ | - Location | ||
+ | - Percent done -> 40 | ||
+ | - Status | ||
+ | - User id -> 1919 | ||
- | ==== Unfolded entries ==== | + | </ |
- | {{: | + | |