Data handling with the Data Class¶
Note: This module is still a work-in-progress and the usage of these classes and/or functions may change in the future.
The following contains examples of how to use the various features of the Data class, whose purpose is to simplify the handling of neutron scattering data and provide basic analysis and visualization functions.
Initialization: loading data¶
There are a couple of options to initialize the Data object. If your data is in SPICE (HFIR, ORNL), ICE, or ICP (NCNR) format then you can use load_data; otherwise you will need to load the data and pass it to Data manually.
Load from file¶
First, let's assume that you have a data file, 'scan.dat', from a HFIR instrument (SPICE format):
>>> from neutronpy.fileio import load_data
>>> data = load_data('scan.dat')
This builds the data object automatically and loads all of the data columns from the file.
If you want to load more than one file at a time, simply add the file names as a list or a tuple, e.g.
>>> data = load_data(('scan1.dat', 'scan2.dat'))
By default, load_data attempts to determine the format of the input files automatically, but you can specify the filetype if desired. Valid filetypes are currently:
- 'auto' : Default; attempt to automatically determine the file type
- 'SPICE' : HFIR files
- 'ICE' : NCNR files
- 'ICP' : NCNR files
- 'DCS_MSLICE' : ASCII files exported from DCS MSLICE in DAVE
- 'MAD' : ILL files
- 'GRASP' : ASCII and HDF5 files exported from GRASP
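For example, if auto-detection does not work for a given file, you can force the SPICE loader explicitly:
>>> data = load_data('scan.dat', filetype='SPICE')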
Pass pre-loaded data¶
If your data is not in a format supported by load_data, you will need to load the data yourself, pass it to the Data class, and build Data.Q using Data.build_Q. To build Data.Q you must have defined h, k, l, e, and temp.
>>> data = Data(h=h, k=k, l=l, e=e, temp=temp, detector=detector,
monitor=monitor, time=time)
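With these columns defined, Data.Q can then be assembled (a minimal sketch, assuming Data.build_Q requires no arguments once h, k, l, e, and temp are present):
>>> data.build_Q()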
However, if you do not need to build Data.Q and only want to use some subset of the predefined columns (h, k, l, e, temp, monitor, detector, time), the undefined columns will be filled with np.zeros of the same length as the data passed.
>>> import numpy as np
>>> data = Data(h=np.arange(0, 1, 0.1))
Note
In the future, it may be possible to pass arbitrary data to build a bare Data object with the Data.data attribute. It is already technically possible to do this by building an empty Data object and reassigning the .data attribute.
>>> data = Data()
>>> data.data = dict(angle1=np.arange(47, 49, 0.25))
Data properties¶
Below I outline some of the most common properties of the Data class that you will want to use.
Intensity and error¶
Intensity, i.e. detector / monitor * m0, and the square-root error are obtained, respectively, by
>>> data.intensity
>>> data.error
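Equivalently (a rough sketch, assuming detector, monitor, and m0 are the array and scalar attributes described in this section), the intensity can be computed by hand:
>>> inten_manual = data.detector / data.monitor * data.m0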
Monitor normalization¶
If you want to normalize to a particular monitor m0 then you will need to define it, e.g.
>>> data.m0 = 1e5
If you do not choose an m0, then when you call Data.intensity one will be defined for you based on the monitor already defined in data.
Time normalization¶
If you want to normalize to a particular time t0 then you will need to set time_norm to True and define t0 in minutes, e.g.
>>> data.time_norm = True
>>> data.t0 = 5
If you do not choose a t0, then when you call Data.intensity one will be defined for you based on the time already defined in data.
The Q vector¶
In this case, Q is a collection of column arrays defined as [h, k, l, e, temp], with data.Q.shape = (N, 5). Typically, one would not expect temp to be included in Q, but for the purposes of rebinning it is currently included. In the future, rebinning may be expanded to include other arbitrary dimensions, rather than just these five. If the data has been loaded from one of the supported file formats, or Data.build_Q has been used, then these variables can also be accessed separately by:
>>> h = data.h
>>> k = data.k
>>> l = data.l
>>> e = data.e
>>> temp = data.temp
Data operations¶
Combining data is as easy as adding multiple Data objects together, e.g.
>>> data1 = load_data('scan1.dat', filetype='SPICE')
>>> data2 = load_data('scan2.dat', filetype='SPICE')
>>> data = data1 + data2
This will combine monitor and detector counts for existing points and concatenate unique points in the two objects to create a new data object.
Subtraction works in a similar way, but keep in mind that in its current form it does not interpolate, so if Q differs between the two data variables you will end up with negative intensities at positions where there is no overlapping Q. Proper background subtraction will be implemented in the future.
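For instance, a measured background scan can be subtracted directly (the file name 'background.dat' here is hypothetical):
>>> background = load_data('background.dat')
>>> subtracted = data - background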
The *, / and ** operators only act on the detector variable. This is useful, for example, if you want to apply the detailed balance factor obtained from Data.detailed_balance_factor.
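As a sketch of that use case (assuming detailed_balance_factor returns an array of the same length as the data and that the * operator accepts an array operand):
>>> corrected = data * data.detailed_balance_factor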
Quick analysis¶
Often you will want to know the integrated intensity, peak position, and mean-squared width for some part of your data, without relying on fitting. This is easily accomplished with Data.integrate, Data.position, and Data.width.
It is possible to specify the bounds inside which you want to perform these analyses by forming a boolean expression. For example, below is the definition of the bounds of a 1x1 square around (100) at 4 meV:
>>> bounds = ((np.abs(data.h - 1) <= 0.5) & (np.abs(data.k) <= 0.5) &
(np.abs(data.e - 4) <= 0.25))
>>> int_inten = data.integrate(bounds=bounds)
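The peak position and width can presumably be obtained in the same way (a sketch, assuming Data.position and Data.width accept the same bounds keyword as Data.integrate):
>>> peak_pos = data.position(bounds=bounds)
>>> peak_width = data.width(bounds=bounds)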
Binning data¶
Often data are on an irregular grid with some arbitrary step size, but you will want to put your data onto a regular grid in some way. You can do this using Data.bin. First you need to define the bin parameters as a dictionary of lists in the form [start, end, bins]. Let's say that we want to bin our data into a hk0-e volume with 0.025 r.l.u. steps in h and k between -2 and 2 r.l.u., 0.25 meV steps in e between -10 and 10 meV, and a single temperature bin spanning 290-310 K around a relatively stable 300 K. We would form the bin parameters as follows:
>>> to_bin = {'h': [-2, 2, 161], 'k': [-2, 2, 161], 'l': [-0.2, 0.2, 1],
'e': [-10, 10, 81], 'temp': [290, 310, 1]}
>>> binned_data = data.bin(to_bin)
The output is a new Data object, so your original data are still maintained in the original data object variable.
Visualizing data¶
Note 1: Data.plot is still relatively experimental. 1-D data plotting and fitting work as intended in the current form, but higher-dimensional plotting is still very much a work in progress.
Note 2: For publication-quality figures, even for 1-D data, it is not recommended to use Data.plot, since some of the more advanced plot configuration options from matplotlib are not easily available to the user. Instead, Data.plot is currently intended for quickly plotting data for easy visualization.
Basic plotting¶
Plotting requires at least two parameters to be defined, x and y, for a line scan plot. By defining z and w (or not) you control what type of plot is generated. x, y, z, and w are defined by assigning one of the following strings: 'h', 'k', 'l', 'temp', 'e', or 'intensity'. For example, for a scatter plot with error bars of a line scan, a contour plot of a slice, and a scatter plot of a volume, you can do the following, respectively:
>>> data.plot('h', 'intensity')
>>> data.plot('h', 'k', z='intensity')
>>> data.plot('h', 'k', z='e', w='intensity')
Options¶
There are several options that can currently be used to enhance the plots, including rebinning, fitting and smoothing. More options will be added in the future to make the plotting more extensible.
Binning¶
Binning can be achieved by passing the bin dictionary, defined in the manner described above in the binning section. For example,
>>> to_bin = {'h': [0.5, 1.5, 41], 'k': [-0.1, 0.1, 1], 'l': [-0.1, 0.1, 1],
'e': [3.5, 4.5, 1], 'temp': [290, 310, 1]}
>>> data.plot('h', 'intensity', bin=to_bin)
If bin is not defined, then the raw data are plotted, meaning that if you have multidimensional data that you are trying to plot as a line scan, all of the data will be projected onto the line you want to plot.
Fitting¶
Fitting to arbitrary functions, only applicable to line scan plots, can be performed by passing the fit_options dictionary. At a minimum, the initial parameters p and the function must be defined. Additionally, if holding a parameter fixed is desired, fixp must be defined as a list of the same length as p, where 1 indicates fixed and 0 indicates released. For example,
>>> from neutronpy.functions import gaussian
>>> data.plot('h', 'intensity', fit_options={'p': [0, 0, 1, 0.9, 0.06],
'function': gaussian, 'fixp': [1, 1, 0, 0, 0]})
Smoothing¶
Smoothing using a multidimensional Gaussian filter can be enabled by passing the smooth_options dictionary with at least a non-zero sigma value. Other appropriate options can be found in the scipy.ndimage.filters.gaussian_filter documentation. For example,
>>> data.plot('h', 'intensity', smooth_options={'sigma': 1.0})
Plot options¶
Matplotlib plot options may be passed as a dictionary plot_options to Data.plot for the appropriate plot type:
- Line scan : errorbar
- Slice : pcolormesh
- Volume : scatter
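For example, to change the color of a line scan (a sketch, assuming the plot_options keywords are forwarded directly to matplotlib's errorbar for line scans):
>>> data.plot('h', 'intensity', plot_options={'color': 'red'})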
Miscellaneous¶
- show_plot : If False, plt.show() will not be executed inside the Data.plot method and will have to be executed separately. Useful if overplotting.
- output_file : If defined, a file with the plot will be saved, in the format specified by the file extension. The file type must be supported by the active matplotlib backend.
- show_err : If False, error bars will not be plotted on the line scan plot.
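These can be combined, for example, to write a figure to disk without opening a window (the file name 'scan_plot.pdf' here is hypothetical):
>>> data.plot('h', 'intensity', show_plot=False, output_file='scan_plot.pdf')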