Update UCD section relative to last discussion with Semantics WG#61
loumir wants to merge 11 commits into
Clarified the usage of Particle Data Group Identifier for particle detection and the electron case.
showing the result of recent discussions with Semantics WG
loumir left a comment
Thanks for your careful reading.
Many places changed due to the UCD-related elements.
The note should make clear the different roles of the column names, which use the dedicated language of the specific domain (here high energy), and of the UCD terms, which use a more general language to foster crosswalks between the various usages in other spectral domains.
iannevans left a comment
Added a few comments and recommended changes where I thought they made sense.
| Observations of the universe at the highest energies are based on techniques that are radically different compared to the UV through radio domains. \gls{HEA} observatories\footnote{For example, Chandra, XMM-Newton, Fermi, H.E.S.S., MAGIC, VERITAS, HAWC, LHAASO, IceCube, ANTARES, Auger, and soon CTAO, KM3NeT, and SWGO.} are generally designed to detect particles ({\em e.g.\/}, individual photons, cosmic-rays, or neutrinos) with the ability to estimate multiple observables for those particles. These detection techniques all rely on {\em event counting\/}\footnote{As opposed to signal integrating ({\em e.g.\/}, using a detector that accumulates the total photon signal during an exposure).}, where an event has some probability of being due to the interaction of a particle from an astrophysical source with the detectors, but also has some probability of being from instrumental or background effects. The data corresponding to an event are first an instrumental signal, which is then calibrated and processed to estimate physical quantities such as a time of arrival, point-of-origin on the sky, and an energy proxy associated with the event. Several other intermediate and qualifying characteristics may be associated with a detected event, depending on the detection technique. The ensemble of events detected over a given time interval and spatial field-of-view is referred to as an {\em event list\/}, which we designate an {\bf event-list} in this document. | ||
| Though {\bf event-list}s {\em may\/} include estimators for calibrated physical values, they typically still have to be corrected for the photometric, spectral, spatial, and/or temporal responses of the telescope and detector combination to yield scientifically interpretable information. The mappings between physical measurements of the source properties and the observables are called Instrument Response Functions (\glspl{IRF}\footnote{We try to avoid using the term \gls{IRF} in a normative sense since historical usage across the broad \gls{HEA} community (and from facility to facility) varies. In some cases, \gls{IRF} has been used to mean specifically the product of the \gls{ARF} and \gls{RMF}, whereas in other cases \gls{IRF} has been used more generally to mean any instrumental response function regardless of type.}). Some \glspl{IRF} are probabilistic in nature\footnote{For example, the energy matrix is a probability density function.}, and in addition may depend on the set of events selected for analysis by the end user. They are usually not invertible, so methods such as forward-folding fitting (using source models with any combination of spectral, spatial, temporal, and/or polarization components that are estimated) are needed to estimate physical properties, such as the true flux of particles from a source arriving at the instrument, given the measured observable quantities. The \glspl{IRF} generally evolve over time with the instrument and observation characteristics, and are usually defined for a specific time interval and may be decomposed into a standard set of independent components (see \S~3.1.5 of \citealt{2024ivoa.note.heig}), such as the spatial point-spread function or the energy-migration matrix or different messenger particle types, where each component may be stored or computed separately. 
Since both \glspl{IRF} and {\bf event-list}s are required to analyze \gls{HEA} data, some \gls{IVOA} standards must be modified in order to expose both of them via the \gls{VO}. | ||
| Though {\bf event-list}s {\em may\/} include estimators for calibrated physical values, they typically still have to be corrected for the photometric, spectral, spatial, and/or temporal responses of the telescope and detector combination to yield scientifically interpretable information. The mappings between physical measurements of the source properties and the observables are called Instrument Response Functions (\glspl{IRF}\footnote{We try to avoid using the term \gls{IRF} in a normative sense since historical usage across the broad \gls{HEA} community (and from facility to facility) varies. In some cases, \gls{IRF} has been used to mean specifically the product of the \gls{ARF} and \gls{RMF}, whereas in other cases \gls{IRF} has been used more generally to mean any instrumental response function regardless of type.}). Some \glspl{IRF} are probabilistic in nature\footnote{For example, the energy matrix is a probability density function.}, and in addition may depend on the set of events selected for analysis by the end user. They are usually not invertible, so methods such as forward-folding fitting (using source models with any combination of spectral, spatial, temporal, and/or polarization components that are estimated) are needed to estimate physical properties, such as the true flux of particles from a source arriving at the instrument, given the measured observable quantities. The \glspl{IRF} generally evolve over time with the instrument and observation characteristics, and are usually defined for a specific time interval and may be decomposed into a standard set of independent components (see \S~3.1.5 of \citealt{2024ivoa.note.heig}), such as the spatial point-spread function or the energy-migration matrix or different messenger particle types, where each component may be stored or computed separately. 
Since both \glspl{IRF} and {\bf event-list}s are required to analyze \gls{HEA} data, some \gls{IVOA} standards must be modified expose both of them via the \gls{VO}. |
| Though {\bf event-list}s {\em may\/} include estimators for calibrated physical values, they typically still have to be corrected for the photometric, spectral, spatial, and/or temporal responses of the telescope and detector combination to yield scientifically interpretable information. The mappings between physical measurements of the source properties and the observables are called Instrument Response Functions (\glspl{IRF}\footnote{We try to avoid using the term \gls{IRF} in a normative sense since historical usage across the broad \gls{HEA} community (and from facility to facility) varies. In some cases, \gls{IRF} has been used to mean specifically the product of the \gls{ARF} and \gls{RMF}, whereas in other cases \gls{IRF} has been used more generally to mean any instrumental response function regardless of type.}). Some \glspl{IRF} are probabilistic in nature\footnote{For example, the energy matrix is a probability density function.}, and in addition may depend on the set of events selected for analysis by the end user. They are usually not invertible, so methods such as forward-folding fitting (using source models with any combination of spectral, spatial, temporal, and/or polarization components that are estimated) are needed to estimate physical properties, such as the true flux of particles from a source arriving at the instrument, given the measured observable quantities. The \glspl{IRF} generally evolve over time with the instrument and observation characteristics, and are usually defined for a specific time interval and may be decomposed into a standard set of independent components (see \S~3.1.5 of \citealt{2024ivoa.note.heig}), such as the spatial point-spread function or the energy-migration matrix or different messenger particle types, where each component may be stored or computed separately. 
Since both \glspl{IRF} and {\bf event-list}s are required to analyze \gls{HEA} data, some \gls{IVOA} standards must be modified expose both of them via the \gls{VO}. | |
| Though {\bf event-list}s {\em may\/} include estimators for calibrated physical values, they typically still have to be corrected for the photometric, spectral, spatial, and/or temporal responses of the telescope and detector combination to yield scientifically interpretable information. The mappings between physical measurements of the source properties and the observables are called Instrument Response Functions (\glspl{IRF}\footnote{We try to avoid using the term \gls{IRF} in a normative sense since historical usage across the broad \gls{HEA} community (and from facility to facility) varies. In some cases, \gls{IRF} has been used to mean specifically the product of the \gls{ARF} and \gls{RMF}, whereas in other cases \gls{IRF} has been used more generally to mean any instrumental response function regardless of type.}). Some \glspl{IRF} are probabilistic in nature\footnote{For example, the energy matrix is a probability density function.}, and in addition may depend on the set of events selected for analysis by the end user. They are usually not invertible, so methods such as forward-folding fitting (using source models with any combination of spectral, spatial, temporal, and/or polarization components that are estimated) are needed to estimate physical properties, such as the true flux of particles from a source arriving at the instrument, given the measured observable quantities. The \glspl{IRF} generally evolve over time with the instrument and observation characteristics, and are usually defined for a specific time interval and may be decomposed into a standard set of independent components (see \S~3.1.5 of \citealt{2024ivoa.note.heig}), such as the spatial point-spread function or the energy-migration matrix or different messenger particle types, where each component may be stored or computed separately. 
Since both \glspl{IRF} and {\bf event-list}s are required to analyze \gls{HEA} data, some \gls{IVOA} standards must be modified to expose both of them via the \gls{VO}. |
| \subsection{{\em energy\_min\/}/{\em energy\_max\/}} | ||
| The existing attributes {\em em\_min\/} and {\em em\_max\/} that define the coverage of the spectral axis (defined as wavelength expressed in units of m) are not user friendly for \gls{HEA} where datasets are generally selected according to an energy range ({\em i.e.\/}, inverse wavelength) in units of eV (or scaled units of eV, for example keV, MeV, GeV, TeV, PeV). Unlike the radio domain where $\lambda = c/\nu$, where $c$ is an almost universally remembered physical constant, the conversion $\lambda = hc/E$ is not simple for the user to express. As the spectral range covered by \gls{HEA} data is many decades larger than for other wavebands, the accurate numerical representations of typical \gls{HEA} spectral ranges as {\em em\_min\/}/{\em em\_max\/} requires quantities with many digits of precision and exponents ranging from $\sim\!10^{-5}$--$10^{-22}$, and are misleading when used for energy ranges of massive particles. Since specification of the spectral range is largely fundamental to data discovery in the \gls{HEA} regime, we propose to add attributes {\em energy\_min\/} and {\em energy\_max\/} that specify the minimum and maximum spectral range values in units of eV\null. Note that the sense of these attributes is {\em opposite\/} that of {\em em\_min\/} and {\em em\_max\/} because of the inverse wavelength relationship between energy and wavelength, so numerical comparisons must be transposed ({\em e.g.\/}, $E>E_{\rm thresh}$ becomes $\lambda<hc/E_{\rm thresh}$). (An alternate approach would be to add attributes {\em em\_min\_energy\/} and {\em em\_max\_energy\/} that represent the energies corresponding to {\em em\_min\/} and {\em em\_max\/} in units of eV\null. This is less desirable since queries on an energy would need to be specified as {\em em\_max\_energy\/}${}\leq E <{}${\em em\_min\_energy\/}, which is likely confusing. This approach is not recommended when describing massive particles, including neutrinos.) | ||
| The existing attributes {\em em\_min\/} and {\em em\_max\/} that define the coverage of the spectral axis (defined as wavelength expressed in units of m) are not user friendly for \gls{HEA} where datasets are generally selected according to an energy range ({\em i.e.\/}, inverse wavelength) in units of eV (or scaled units of eV, for example keV, MeV, GeV, TeV, PeV). Unlike the radio domain where $\lambda = c/\nu$, where $c$ is an almost universally remembered physical constant, the conversion $\lambda = hc/E$ is not simple for the user to express. As the spectral range covered by \gls{HEA} data is many decades larger than for other wavebands, the accurate numerical representations of typical \gls{HEA} spectral ranges as {\em em\_min\/}/{\em em\_max\/} requires quantities with many digits of precision and exponents ranging from $\sim\!10^{-5}$--$10^{-22}$, and are misleading when used for energy ranges of massive particles. Since specification of the spectral range is largely fundamental to data discovery in the \gls{HEA} regime, we propose to add attributes {\em energy\_min\/} and {\em energy\_max\/} that specify the minimum and maximum spectral range values in units of eV\null. Note that the sense of these attributes is {\em opposite\/} that of {\em em\_min\/} and {\em em\_max\/} because of the inverse wavelength relationship between energy and wavelength, so numerical comparisons must be transposed ({\em e.g.\/}, $E>E_{\rm thresh}$ becomes $\lambda<hc/E_{\rm thresh}$). (An alternate approach would be to add attributes {\em em\_min\_energy\/} and {\em em\_max\_energy\/} that represent the energies corresponding to {\em em\_min\/} and {\em em\_max\/} in units of eV\null. This is less since queries on an energy would need to be specified as {\em em\_max\_energy\/}${}\leq E <{}${\em em\_min\_energy\/}, which is likely confusing. This approach is not recommended when describing massive particles, including neutrinos.) |
| The existing attributes {\em em\_min\/} and {\em em\_max\/} that define the coverage of the spectral axis (defined as wavelength expressed in units of m) are not user friendly for \gls{HEA} where datasets are generally selected according to an energy range ({\em i.e.\/}, inverse wavelength) in units of eV (or scaled units of eV, for example keV, MeV, GeV, TeV, PeV). Unlike the radio domain where $\lambda = c/\nu$, where $c$ is an almost universally remembered physical constant, the conversion $\lambda = hc/E$ is not simple for the user to express. As the spectral range covered by \gls{HEA} data is many decades larger than for other wavebands, the accurate numerical representations of typical \gls{HEA} spectral ranges as {\em em\_min\/}/{\em em\_max\/} requires quantities with many digits of precision and exponents ranging from $\sim\!10^{-5}$--$10^{-22}$, and are misleading when used for energy ranges of massive particles. Since specification of the spectral range is largely fundamental to data discovery in the \gls{HEA} regime, we propose to add attributes {\em energy\_min\/} and {\em energy\_max\/} that specify the minimum and maximum spectral range values in units of eV\null. Note that the sense of these attributes is {\em opposite\/} that of {\em em\_min\/} and {\em em\_max\/} because of the inverse wavelength relationship between energy and wavelength, so numerical comparisons must be transposed ({\em e.g.\/}, $E>E_{\rm thresh}$ becomes $\lambda<hc/E_{\rm thresh}$). (An alternate approach would be to add attributes {\em em\_min\_energy\/} and {\em em\_max\_energy\/} that represent the energies corresponding to {\em em\_min\/} and {\em em\_max\/} in units of eV\null. This is less since queries on an energy would need to be specified as {\em em\_max\_energy\/}${}\leq E <{}${\em em\_min\_energy\/}, which is likely confusing. This approach is not recommended when describing massive particles, including neutrinos.) | |
| The existing attributes {\em em\_min\/} and {\em em\_max\/} that define the coverage of the spectral axis (defined as wavelength expressed in units of m) are not user friendly for \gls{HEA} where datasets are generally selected according to an energy range ({\em i.e.\/}, inverse wavelength) in units of eV (or scaled units of eV, for example keV, MeV, GeV, TeV, PeV). Unlike the radio domain where $\lambda = c/\nu$, where $c$ is an almost universally remembered physical constant, the conversion $\lambda = hc/E$ is not simple for the user to express. As the spectral range covered by \gls{HEA} data is many decades larger than for other wavebands, the accurate numerical representations of typical \gls{HEA} spectral ranges as {\em em\_min\/}/{\em em\_max\/} requires quantities with many digits of precision and exponents ranging from $\sim\!10^{-5}$--$10^{-22}$, and are misleading when used for energy ranges of massive particles. Since specification of the spectral range is largely fundamental to data discovery in the \gls{HEA} regime, we propose to add attributes {\em energy\_min\/} and {\em energy\_max\/} that specify the minimum and maximum spectral range values in units of eV\null. Note that the sense of these attributes is {\em opposite\/} that of {\em em\_min\/} and {\em em\_max\/} because of the inverse wavelength relationship between energy and wavelength, so numerical comparisons must be transposed ({\em e.g.\/}, $E>E_{\rm thresh}$ becomes $\lambda<hc/E_{\rm thresh}$). (An alternate approach would be to add attributes {\em em\_min\_energy\/} and {\em em\_max\_energy\/} that represent the energies corresponding to {\em em\_min\/} and {\em em\_max\/} in units of eV\null. This is less desirable since queries on an energy would need to be specified as {\em em\_max\_energy\/}${}\leq E <{}${\em em\_min\_energy\/}, which is likely confusing. This approach is not recommended when describing massive particles, including neutrinos.) |
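As an illustration of the transposition described in this paragraph, here is a minimal Python sketch of the photon conversion $\lambda = hc/E$ between an energy range in eV and the existing {\em em\_min\/}/{\em em\_max\/} values in metres. The function and constant names are hypothetical and are not part of the note or of any IVOA standard; the conversion applies to photons only, not to massive particles.

```python
# Exact SI values of the physical constants (2019 SI redefinition).
H_PLANCK = 6.62607015e-34      # Planck constant, J s
C_LIGHT = 299792458.0          # speed of light, m / s
EV_IN_JOULE = 1.602176634e-19  # one electronvolt in joules

# Product h*c expressed in eV m (about 1.2398e-6 eV m).
HC_EV_M = H_PLANCK * C_LIGHT / EV_IN_JOULE


def energy_range_to_em(energy_min_ev, energy_max_ev):
    """Convert a photon energy range in eV to (em_min, em_max) in metres.

    The bounds swap roles: the highest energy gives the shortest
    wavelength, so energy_max maps to em_min and energy_min to em_max.
    """
    if not 0 < energy_min_ev <= energy_max_ev:
        raise ValueError("expected 0 < energy_min <= energy_max")
    em_min = HC_EV_M / energy_max_ev
    em_max = HC_EV_M / energy_min_ev
    return em_min, em_max


def energy_threshold_to_wavelength(e_thresh_ev):
    """Return the wavelength bound for an energy threshold.

    A selection E > e_thresh is equivalent to lambda < returned value.
    """
    if e_thresh_ev <= 0:
        raise ValueError("energy threshold must be positive")
    return HC_EV_M / e_thresh_ev


# Example: a 1 keV to 10 keV X-ray band.
em_min, em_max = energy_range_to_em(1.0e3, 1.0e4)
```

For the 1 keV to 10 keV example, both wavelength bounds are of order $10^{-10}$ to $10^{-9}$ m, which shows why a query written directly in eV is easier to read than one written in metres.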
| \subsubsection{Instrument-related Quantities} | ||
| We propose to add a new UCD {\em instr.event\/} as the base of the hierarchy to describe instrument-related properties of particle events detected by \gls{HEA} detectors. Initially, we propose a small set of event-related UCDs that identify key properties that are particularly important for \gls{HEA} data analysis. | ||
| We propose to add a new UCD {\em instr.detection\/} as the base of the hierarchy to describe instrument-related properties of particle events detected by \gls{HEA} detectors. Initially, we propose a small set of event-related UCDs that identify key properties that are particularly important for \gls{HEA} data analysis. |
As I've argued before, instr.detection is in my view worse than instr.event because "detection" has a real and widely used meaning.
| \paragraph{Event Grade} | ||
| For imaging X-ray instruments (especially those based on CCD detectors), detected events typically deposit charge into more than a single detector pixel. The events are assigned a ``grade'' based on how charge is deposited into the central pixel and surrounding pixels, and the grade information is essential for data analysis since typically only a subset of grades will correspond to valid events. We propose to add a new UCD {\em instr.event.grade\/} that identifies event grades. | ||
| For imaging X-ray instruments (especially those based on CCD detectors), detected events typically deposit charge into more than a single detector pixel. The events are assigned a ``grade'' based on how charge is deposited into the central pixel and surrounding pixels, and the grade information is essential for data analysis since typically only a subset of grades will correspond to valid events. We propose to add a new UCD {\em meta.code.class;instr.detection\/} that identifies event grades. |
The last sentence doesn't make sense any more. You are suggesting meta.code.class;instr.detection for both event-grade and event-type, so the proposed UCDs do NOT "identify event grade" or "identify event type" as you can't differentiate them.
Thanks, that is true.
Identification of an event property like grade is made via the event_grade column name, and event_type identifies the type of event, as decided by the reduction pipeline.
Having the same UCD used more than once in a VOTable happens regularly in VO datasets.
I will correct the text.
Thanks, that is true. Identification of an event property like grade is made via the event_grade column name, and event_type identifies the type of event, as decided by the reduction pipeline. Having the same UCD used more than once in a VOTable happens regularly in VO datasets. I will correct the text.
I understand that, but using the same UCDs does not help interoperability because the names of these significant data columns (recall that we have specifically called out these few quantities in the HEIG note from among at least several hundred different columns actually used in HEA datasets) are not standardized across facilities. As a thought experiment, suppose that the two spatial world coordinate axes (ra and dec) did not have standardized names from dataset to dataset and you only have a single UCD like pos.spatialaxis rather than pos.eq.ra, pos.eq.dec to describe them both. Wouldn't interoperability be significantly compromised in this circumstance? That is the equivalent of the case here for these data quantities.
| The shape of any statistical distribution is an essential quantity for interpreting the meaning of any statistical properties. Too often a Gaussian distribution or a distribution that can be characterized by a simple set of moments ({\em e.g.\/}, mean, variance, skewness, kurtosis) are assumed, but in the extreme Poisson regime common in \gls{HEA} these assumptions are often invalid. We propose adding a UCD {\em stat.distribution\/} to identify a quantity that defines the distribution of a statistical variable such as a likelihood profile. | ||
| % mireille This assumes that a column will contain a full set of data values. Is this ? | ||
| In fact it would make the UCD work as a term compatible to a data product type , so ambiguous in terms of role . |
| In fact it would make the UCD work as a term compatible to a data product type , so ambiguous in terms of role . | |
| %In fact it would make the UCD work as a term compatible to a data product type , so ambiguous in terms of role . |
| P & {\em stat.lowerlimit\/} & Lower limit \cr | ||
| P & {\em stat.upperlimit\/} & Upper limit \cr | ||
| % Mireille we cannot have these terms in the UCD tree ; that would imply importing all possible encoding of any kind | ||
| %S & {\em phys.particle.pdgid\/} & Particle Data Group Identifier \cr |
| %S & {\em phys.particle.pdgid\/} & Particle Data Group Identifier \cr | |
| S & {\em phys.particle.pdgid\/} & Particle Data Group Identifier \cr |
The UCD combination 'meta.id;phys.particle' is useful in this case and is fully general if data providers use a different particle identifier.
The column can be named 'messenger_pid' if this will be the adopted standard for every High Energy archive.
Is there a PDG ID for all cases mentioned in the messenger table?
The UCD combination 'meta.id;phys.particle' is useful in this case and is fully general if data providers use a different particle identifier. The column can be named 'messenger_pid' if this will be the adopted standard for every High Energy archive. Is there a PDG ID for all cases mentioned in the messenger table?
I wouldn't suggest changing 'messenger' to 'messenger_pdgid'. I don't think PDG ID is something that would be recognized by the vast majority of astrophysicists, even X-ray or gamma-ray astrophysicists. It's primarily used by particle physicists, but there is cross-over for particle astrophysics and cosmic ray astrophysics. That's why we proposed 'messenger' to allow common messenger names but also the option of specifying pdgid{+|-}NN, so that 'messenger' can handle all cases (I believe that there are over 500 different PDG IDs defined, although many of them are exotics).
If meta.id;phys.particle can be applied to (e.g., a column containing a list of) PDG IDs, then phys.particle.pdgid may not be necessary. But in that case, if the particle is identified only by PDG ID, what UCD should be used in the cases where (e.g.) phys.particle.neutrino or phys.particle.proton would otherwise be used?
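To make the 'messenger' proposal in this comment concrete, here is a minimal Python sketch of how a client might resolve a 'messenger' value to a PDG ID. It assumes the value is either a common name or a string of the form pdgid{+|-}NN, as described above; the exact grammar, the name table, and the function name are hypothetical and are not defined by the note. The numeric codes are the standard PDG Monte Carlo particle numbers for those particles.

```python
import re

# Hypothetical table of common messenger names. The keys are an assumption;
# the values are the standard PDG Monte Carlo numbering codes.
COMMON_MESSENGERS = {
    "photon": 22,
    "electron": 11,
    "positron": -11,
    "proton": 2212,
    "antiproton": -2212,
    "neutron": 2112,
}

# Assumed grammar for explicit identifiers: "pdgid", a sign, then digits.
_PDGID_RE = re.compile(r"pdgid([+-])(\d+)")


def messenger_to_pdgid(value):
    """Return the PDG ID for a messenger string, or None if unresolved.

    Accepts either a common name from COMMON_MESSENGERS or an explicit
    'pdgid+NN' / 'pdgid-NN' string. Matching is case-insensitive.
    """
    text = value.strip().lower()
    if text in COMMON_MESSENGERS:
        return COMMON_MESSENGERS[text]
    match = _PDGID_RE.fullmatch(text)
    if match is None:
        # Generic names (e.g. "neutrino") have no single PDG ID.
        return None
    sign = -1 if match.group(1) == "-" else 1
    return sign * int(match.group(2))
```

A generic label such as "neutrino" deliberately resolves to None here, since it covers several PDG IDs; how such cases should be tagged is exactly the open question raised in this thread.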
| P & {\em stat.error.positive\/} & Positive statistical error \cr | ||
| P & {\em stat.lowerlimit\/} & Lower limit \cr | ||
| P & {\em stat.upperlimit\/} & Upper limit \cr | ||
| % Mireille we cannot have these terms in the UCD tree ; that would imply importing all possible encoding of any kind |
I disagree about phys.particle.pdgid. phys.particle.pdgid is a UCD that identifies a Particle Data Group Identifier, for example a column in a FITS file that records the pdgid for each event. It's not the same as UCDs that encode all possible pdgids as part of the UCD name.
| S & {\em phys.particle.antiproton\/} & Related to anti-proton \cr | ||
| S & {\em phys.particle.cosmicray\/} & Related to cosmic-ray particles \cr | ||
| S & {\em phys.particle.electron\/} & Related to electron \cr | ||
| %S & {\em phys.particle.electron\/} & Related to electron \cr |
| %S & {\em phys.particle.electron\/} & Related to electron \cr | |
| S & {\em phys.particle.electron\/} & Related to electron \cr |
Ok for me Co-authored-by: Ian Nigel Evans <ievans@cfa.harvard.edu>
+1 Co-authored-by: Ian Nigel Evans <ievans@cfa.harvard.edu>
Updated UCD descriptions for event grade, type, and physical quantities in HighEnergyObsCoreExt.tex.
loumir left a comment
I hope I have filled in the changes. I am not used to the GitHub online editor, so it is not clear to me what is taken into account for the PDF compilation.
I would like the results of the Semantics WG discussions to appear in this note. This was part of the work throughout the year.
Therefore, in the listed requirements for UCDs we should use the terms that have been recently discussed and approved by the Semantics group.
I have updated various parts of this section and included %comments to explain the changes.