A convenience function for returning the current time as an ISO 8601 string or as a Unix timestamp.

Return type:
  Union[float, str]
headers (Optional[list]) – An array of strings. If spec is not supplied, must contain either “uts” (float) or “timestep” (str) (conforming to ISO 8601).
spec (Optional[TimestampSpec]) – A specification of timestamp elements with associated column indices and optional formats. Currently accepted combinations of keys are: “uts”; “timestamp”; “date” and / or “time”.
tz – Timezone to use for conversion. By default, UTC is used.

If the optional format is specified, the timestamp string is processed using the datetime.datetime.strptime() function; if no format is supplied, an ISO 8601 format is assumed and an attempt to parse using dateutil.parser.parse() is made.
Parameters:
  timestamp (str) – A string containing the timestamp.
  format (Optional[str]) – Optional format string for parsing of the timestamp.
  timezone (str) – Optional timezone of the timestamp. By default, “UTC”.
  strict (bool) – Whether to re-raise any parsing errors.

Returns:
  uts – Returns the POSIX timestamp if successful, otherwise None.
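The described behaviour can be sketched as follows; `parse_timestamp` is a hypothetical stand-in for the documented function, and `datetime.fromisoformat()` stands in for the `dateutil.parser.parse()` fallback used by the real code:

```python
from datetime import datetime, timezone
from typing import Optional

def parse_timestamp(timestamp: str, fmt: Optional[str] = None) -> Optional[float]:
    """Hypothetical sketch: strptime() with an explicit format, otherwise an
    ISO 8601 parse; fromisoformat() stands in for dateutil.parser.parse()."""
    try:
        if fmt is not None:
            dt = datetime.strptime(timestamp, fmt)
        else:
            dt = datetime.fromisoformat(timestamp)
    except ValueError:
        # with strict=True the real function would re-raise here
        return None
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # "UTC" default timezone
    return dt.timestamp()
```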
A helper function ensuring that the Dataset ds contains a dimension "uts", and that the timestamps in "uts" are completed as instructed in the externaldate specification.
This sanitizer should be used where user-supplied units are likely to occur, such as in yadg.extractors.basic.csv. Currently, only two replacements are done:
Extractors for data files generated by various proprietary Agilent software.

Extractor of Agilent OpenLab binary signal trace files (.ch and .it). Currently supports version “179” of the files. Version information is defined in the magic_values (parameters & metadata) and data_dtypes (data) dictionaries.
datatree.DataTree:
  {{detector_name}}:
    coords:
      uts:           !!float               # Unix timestamp
      elution_time:  !!float               # Elution time
    data_vars:
      signal:        (uts, elution_time)   # Signal data
Extractor of Agilent Chemstation Chromtab tabulated data files. This file format may include multiple timesteps consisting of several traces each in a single CSV file. It contains a header section for each timestep, followed by a detector name, and a sequence of “X, Y” datapoints, which are stored as elution_time and signal.
Warning

It is not guaranteed that the X-axis of the chromatogram (i.e. elution_time) is consistent between the timesteps of the same trace. The traces are expanded to the length of the longest trace, and the shorter traces are padded with NaNs.
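The padding described in the warning can be illustrated with a short NumPy sketch (not the actual implementation):

```python
import numpy as np

def pad_traces(traces: list) -> np.ndarray:
    """Expand unequal-length traces to the length of the longest one,
    padding the shorter traces with NaNs."""
    longest = max(len(t) for t in traces)
    padded = np.full((len(traces), longest), np.nan)
    for i, trace in enumerate(traces):
        padded[i, : len(trace)] = trace
    return padded
```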
datatree.DataTree:
  {{detector_name}}:
    coords:
      uts:           !!float               # Unix timestamp
      elution_time:  !!float               # Elution time
    data_vars:
      signal:        (uts, elution_time)   # Signal data
Extractor of Agilent OpenLab DX archives. This is a wrapper parser which unzips the provided DX file, and then uses the yadg.extractors.agilent.ch extractor to parse every CH file present in the archive. The IT files in the archive are currently ignored.
Note

Currently the timesteps from multiple CH files (if present) are appended in the timesteps array without any further sorting.
datatree.DataTree:
  {{detector_name}}:
    coords:
      uts:           !!float               # Unix timestamp
      elution_time:  !!float               # Elution time
    data_vars:
      signal:        (uts, elution_time)   # Signal data
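The wrapper logic can be sketched like this; `extract_ch` is a placeholder for the real yadg.extractors.agilent.ch extractor:

```python
import tempfile
import zipfile
from pathlib import Path

def extract_dx(path: str, extract_ch) -> list:
    """Unzip the DX archive and run the CH extractor on every .ch file;
    .it files are skipped, mirroring the behaviour described above."""
    results = []
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(path) as zf:
            zf.extractall(tmp)
        for chfile in sorted(Path(tmp).rglob("*.ch")):
            results.append(extract_ch(chfile))
    return results
```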
Handles the reading and processing of any tabular files, as long as the first line contains the column headers. The columns of the table must be separated using a separator such as “,”, “;”, or “\t”.
Note

By default, the second line of the file should contain the units. Alternatively, the units can be supplied using extractor parameters, in which case the second line is considered to be data.
Since yadg-5.0, the basic.csv extractor handles sparse tables (i.e. tables with missing data) by back-filling empty cells with np.NaNs.
The basic.csv extractor attempts to deduce the timestamps from the column headers, using yadg.dgutils.dateutils.infer_timestamp_from(). Alternatively, the column(s) containing the timestamp data and their format can be provided using extractor parameters.
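A minimal sketch of the table handling described above (headers on the first line, units on the second, empty cells back-filled with NaN); the real extractor is considerably more featureful:

```python
import numpy as np

def parse_table(text: str, sep: str = ",") -> tuple:
    """Split a tabular file into headers, units and a data array,
    back-filling empty cells with NaNs."""
    lines = text.strip().splitlines()
    headers = lines[0].split(sep)
    units = lines[1].split(sep)
    rows = [
        [float(cell) if cell.strip() else np.nan for cell in line.split(sep)]
        for line in lines[2:]
    ]
    return headers, units, np.array(rows)
```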
Extractors for files from MesaLabs DryCal Pro software for Defender flow meters.

This module includes shared functions for the drycal extractor, including functions for parsing the files, processing the tabulated data, and ensuring timestamps are increasing.
Given a table with headers and units in the first line, and data in the following lines, this function returns the headers, units, and data extracted from the table. The returned values are always of (str) type; any post-processing is done in the calling routine.
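One way such a header line could be split into headers and units, assuming a "Name (unit)" convention (the actual DryCal layout may differ, so treat this purely as an illustration):

```python
import re

def split_header(line: str, sep: str = ",") -> tuple:
    """Split a header line of 'Name (unit)' cells into separate
    headers and units lists; everything stays a str."""
    headers, units = [], []
    for cell in line.split(sep):
        match = re.fullmatch(r"\s*(.+?)\s*\(([^)]*)\)\s*", cell)
        if match:
            headers.append(match.group(1))
            units.append(match.group(2))
        else:
            headers.append(cell.strip())
            units.append("")
    return headers, units
```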
xarray.Dataset:
  coords:
    uts:          !!float   # Unix timestamp, without date
  data_vars:
    DryCal:       (uts)     # Standardised flow rate
    DryCal Avg.:  (uts)     # Running average of the flow rate
    Temp.:        (uts)     # Measured flow temperature
    Pressure:     (uts)     # Measured flow pressure
Handles the reading and processing of volumetric flow meter data exported from the MesaLabs DryCal software as an rtf file.
Note

The date information is missing in the timestamps of the exported files and has to be supplied externally. The timestamp in the header of the rtf file corresponds to the timestamp of export / report generation, not measurement.
xarray.Dataset:
  coords:
    uts:          !!float   # Unix timestamp, without date
  data_vars:
    DryCal:       (uts)     # Standardised flow rate
    DryCal Avg.:  (uts)     # Running average of the flow rate
    Temp.:        (uts)     # Measured flow temperature
    Pressure:     (uts)     # Measured flow pressure
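Completing such date-less timestamps with an externally supplied date could look like this (an illustrative sketch, not the yadg externaldate implementation):

```python
from datetime import datetime, timezone

def complete_timestamps(uts_without_date: list, date: str) -> list:
    """Shift time-of-day values (seconds since midnight) by the Unix
    timestamp of the externally supplied date, assumed to be in UTC."""
    midnight = datetime.strptime(date, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    offset = midnight.timestamp()
    return [t + offset for t in uts_without_date]
```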
Convert a supplied key of a certain parameter to its string or float value.

The function uses the map defined in param_map to convert between the entries in the tuples, which contain the str value of the parameter (present in .mpt files), the int value of the parameter (present in .mpr files), and the corresponding float value in SI units.
Parameters:
  param (str) – The name of the parameter, a key within the param_map. If param is not present in param_map, the supplied key is returned back.
  key (Union[int, str]) – The key of the parameter that is to be converted to a different representation.
  to_str (bool) – A switch between str and float output.

Returns:
  key – The key converted to the requested format.

Return type:
  Union[str, float, int]
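The conversion can be sketched as below; the `param_map` contents here are made up for illustration and do not reproduce the real table:

```python
# Hypothetical map: each tuple holds the str value (.mpt files),
# the int value (.mpr files), and the float value in SI units.
param_map = {
    "I_range": [("10 mA", 9, 1e-2), ("100 mA", 10, 1e-1)],
}

def param_from_key(param, key, to_str=True):
    """Look up `key` among the str/int entries of each tuple and return
    either the str or the float representation."""
    if param not in param_map:
        return key  # unknown parameter: return the key unchanged
    for as_str, as_int, as_float in param_map[param]:
        if key in (as_str, as_int):
            return as_str if to_str else as_float
    raise ValueError(f"Unknown key {key!r} for parameter {param!r}.")
```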
yadg.extractors.eclab.common.techniques.get_resolution(name, value, unit, Erange, Irange)

Function that returns the resolution of a property based on its name, value, E-range and I-range.

The values used here are hard-coded from VMP-3 potentiostats. Generally, the resolution is returned, however in some cases only the accuracy is specified (currently freq and Phase).

Return type:
  float
eclab: For BioLogic data files
The mpr files contain many columns that vary depending on the electrochemical technique used. Below is shown a list of columns that can be expected to be present in a typical mpr file.
xarray.Dataset:
  coords:
    uts:    !!float   # Unix timestamp, without date
  data_vars:
    Ewe:    (uts)     # Potential of the working electrode
    Ece:    (uts)     # Potential of the counter electrode, if present
    I:      (uts)     # Instantaneous current
    time:   (uts)     # Time elapsed since the start of the experiment
    <Ewe>:  (uts)     # Average Ewe potential since last data point
    <Ece>:  (uts)     # Average Ece potential since last data point
    <I>:    (uts)     # Average current since last data point
    ...
Note

Note that in most cases, either the instantaneous or the averaged quantities are stored - only rarely are both available!
.mpr files are structured in a set of “modules”, one concerning settings, one for actual data, one for logs, and an optional loops module. The parameter sequences can be found in the settings module.
This code is partly an adaptation of the galvani module by Chris Kerr, and builds on the work done by the previous civilian service member working on the project, Jonas Krieger.
These are the implemented techniques for which the technique parameter sequences can be parsed:
CA      Chronoamperometry / Chronocoulometry
CP      Chronopotentiometry
CV      Cyclic Voltammetry
GCPL    Galvanostatic Cycling with Potential Limitation
GEIS    Galvano Electrochemical Impedance Spectroscopy
LOOP    Loop
LSV     Linear Sweep Voltammetry
MB      Modulo Bat
OCV     Open Circuit Voltage
PEIS    Potentio Electrochemical Impedance Spectroscopy
WAIT    Wait
ZIR     IR compensation (PEIS)
At a top level, .mpr files are made up of a number of modules, separated by the MODULE keyword. In all the files I have seen, the first module is the settings module, followed by the data module, the log module and then an optional loop module.
After splitting the entire file on MODULE, each module starts with a header that is structured like this (offsets from start of module):
0x0000  short_name  # Short name, e.g. VMP Set.
0x000A  long_name   # Longer name, e.g. VMP settings.
0x0023  length      # Number of bytes in module data.
0x0027  version     # Module version.
0x002B  date        # Acquisition date in ASCII, e.g. 08/10/21.
...                 # Module data.
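Reading such a header with struct could look like this; the exact field widths (10- and 25-byte names, little-endian 32-bit integers, 8-byte date) are assumptions inferred from the offsets above:

```python
import struct

def parse_module_header(blob: bytes) -> dict:
    """Decode the module header fields at the offsets listed above;
    field widths are assumed from the offset table, not verified."""
    return {
        "short_name": blob[0x0000:0x000A].decode("ascii").rstrip("\x00 "),
        "long_name": blob[0x000A:0x0023].decode("ascii").rstrip("\x00 "),
        "length": struct.unpack_from("<I", blob, 0x0023)[0],
        "version": struct.unpack_from("<I", blob, 0x0027)[0],
        "date": blob[0x002B:0x0033].decode("ascii"),
    }
```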
The contents of each module’s data vary wildly depending on the used technique, the module and perhaps the software version, the settings in EC-Lab, etc. Here is a quick overview (offsets from start of module data).
Settings Module
0x0000  technique_id           # Unique technique ID.
...                            # ???
0x0007  comments               # Pascal string.
...                            # Zero padding.
                               # Cell Characteristics.
0x0107  active_material_mass   # Mass of active material
0x010B  at_x                   # at x =
0x010F  molecular_weight       # Molecular weight of active material
0x0113  atomic_weight          # Atomic weight of intercalated ion
0x0117  acquisition_start      # Acquisition started at: xo =
0x011B  e_transferred          # Number of e- transferred
0x011E  electrode_material     # Pascal string.
...                            # Zero padding.
0x01C0  electrolyte            # Pascal string.
...                            # Zero padding, ???.
0x0211  electrode_area         # Electrode surface area
0x0215  reference_electrode    # Pascal string.
...                            # Zero padding.
0x024C  characteristic_mass    # Characteristic mass
...                            # ???
0x025C  battery_capacity       # Battery capacity C =
0x0260  battery_capacity_unit  # Unit of the battery capacity.
...                            # ???
# Technique parameters can randomly be found at 0x0572, 0x1845 or
# 0x1846. All you can do is guess and try until it fits.
0x1845  ns                     # Number of sequences.
0x1847  n_params               # Number of technique parameters.
0x1849  params                 # ns sets of n_params parameters.
...                            # ???
Data Module
0x0000  n_datapoints  # Number of datapoints.
0x0004  n_columns     # Number of values per datapoint.
0x0005  column_ids    # n_columns unique column IDs.
...
# Depending on module version, datapoints start at 0x0195 or 0x0196.
# Length of each datapoint depends on number and IDs of columns.
0x0195  datapoints    # n_datapoints points of data.
Log Module
...                          # ???
0x0009  channel_number       # Zero-based channel number.
...                          # ???
0x00AB  channel_sn           # Channel serial number.
...                          # ???
0x01F8  Ewe_ctrl_min         # Ewe ctrl range min.
0x01FC  Ewe_ctrl_max         # Ewe ctrl range max.
...                          # ???
0x0249  ole_timestamp        # Timestamp in OLE format.
0x0251  filename             # Pascal string.
...                          # Zero padding, ???.
0x0351  host                 # IP address of host, Pascal string.
...                          # Zero padding.
0x0384  address              # IP address / COM port of potentiostat.
...                          # Zero padding.
0x03B7  ec_lab_version       # EC-Lab version (software)
...                          # Zero padding.
0x03BE  server_version       # Internet server version (firmware)
...                          # Zero padding.
0x03C5  interpreter_version  # Command interpretor version (firmware)
...                          # Zero padding.
0x03CF  device_sn            # Device serial number.
...                          # Zero padding.
0x0922  averaging_points     # Smooth data on ... points.
...                          # ???
Loop Module
0x0000  n_indexes  # Number of loop indexes.
0x0004  indexes    # n_indexes indexes at which loops start in data.
...                # ???
The metadata will contain the information from the Settings module. This should include information about the technique, as well as any explicitly parsed cell characteristics data specified in EC-Lab.
The mapping of metadata parameters between .mpr and .mpt files is not yet complete. In .mpr files, some technique parameters in the settings module correspond to entries in drop-down lists in EC-Lab. These values are stored as single-byte values in .mpr files.
The metadata also contains the information from the Log module, which contains more general parameters, like software, firmware and server versions, channel number, host address, and an acquisition start timestamp in Microsoft OLE format.
Note

If the .mpr file contains an ExtDev module (containing parameters of any external sensors plugged into the device), the log is usually not present and therefore the full timestamp cannot be calculated.
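A Microsoft OLE timestamp counts days since 1899-12-30; converting it to a Unix timestamp is straightforward (timezone handling simplified to UTC in this sketch):

```python
from datetime import datetime, timedelta, timezone

def ole_to_uts(ole_timestamp: float) -> float:
    """Convert an OLE automation date (days since 1899-12-30)
    to a Unix timestamp."""
    ole_epoch = datetime(1899, 12, 30, tzinfo=timezone.utc)
    return (ole_epoch + timedelta(days=ole_timestamp)).timestamp()
```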
Puts together column info from a list of data column IDs.
Note

The binary layout of the data in the .mpr file is described by a sequence of column IDs. Some column IDs relate to flags, which are all packed into a single byte.
Parameters:
  column_ids (list[int]) – A list of column IDs.

Returns:
  The column names, dtypes, units, and a dictionary of flag names and bitmasks.
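Unpacking several flag columns from one flags byte could then proceed as below; the flag names and bitmasks are illustrative, not the real table:

```python
# Hypothetical flag names with their bitmasks, all packed in one byte.
flag_masks = {
    "mode": 0b00000011,
    "ox/red": 0b00000100,
    "error": 0b00001000,
}

def unpack_flags(flags_byte: int) -> dict:
    """Extract each flag value by masking and shifting down
    to the position of the mask's lowest set bit."""
    values = {}
    for name, mask in flag_masks.items():
        shift = (mask & -mask).bit_length() - 1  # lowest set bit position
        values[name] = (flags_byte & mask) >> shift
    return values
```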
version (int) – Module version from the data module header.

Returns:
  Processed data ([{column -> value}, …, {column -> value}]). If the column unit is set to None, the value is an int. Otherwise, the value is a dict with value (“n”), sigma (“s”), and unit (“u”).
The .mpt files contain many columns that vary depending on the electrochemical technique used. Below is shown a list of columns that can be expected to be present in a typical .mpt file.
xarray.Dataset:
  coords:
    uts:    !!float   # Unix timestamp, without date
  data_vars:
    Ewe:    (uts)     # Potential of the working electrode
    Ece:    (uts)     # Potential of the counter electrode, if present
    I:      (uts)     # Instantaneous current
    time:   (uts)     # Time elapsed since the start of the experiment
    <Ewe>:  (uts)     # Average Ewe potential since last data point
    <Ece>:  (uts)     # Average Ece potential since last data point
    <I>:    (uts)     # Average current since last data point
    ...
Note

Note that in most cases, either the instantaneous or the averaged quantities are stored - only rarely are both available!
These human-readable files are sectioned into headerlines and datalines. The header part of the .mpt files is made up of information that can be found in the settings, log and loop modules of the binary .mpr file.
If no header is present, the timestamps will instead be calculated from the file’s mtime().
lines (list[str]) – The data lines, starting right after the last header section. The first line is an empty line; the column names can be found on the second line.
Returns:
  A dictionary containing the datapoints in the format ([{column -> value}, …, {column -> value}]). If the column unit is set to None, the value is an int. Otherwise, the value is a dict with value (“n”), sigma (“s”), and unit (“u”).
xarray.Dataset:
  coords:
    uts:             !!float          # Unix timestamp
    species:         !!str            # Species name
  data_vars:
    height:          (uts, species)   # Peak height
    area:            (uts, species)   # Integrated peak area
    concentration:   (uts, species)   # Peak area with calibration applied
    retention time:  (uts, species)   # Position of peak maximum
Extractors for data files generated by Agilent’s EZChrom software.

Handles files created using the ASCII export function in the EZChrom software. This file format includes one timestep with multiple traces for each ASCII file. It contains a header section, and a sequence of Y datapoints (signal) for each detector. The X-axis (elution_time) is assumed to be uniform between traces, and its units have to be deduced from the header.
datatree.DataTree:
  {{detector_index}}:
    coords:
      uts:           !!float               # Unix timestamp
      elution_time:  !!float               # Elution time
    data_vars:
      signal:        (uts, elution_time)   # Signal data
A set of custom extractors for processing files generated by the MCPT instrument at FHI, now in the Risse group at FU Berlin.
This parser handles the reading and processing of the legacy log files created by the LabView interface for the MCPT instrument at FHI, now FU Berlin. These files contain information about the timestamp, temperatures, and inlet / process flows.
Used to process files generated using Agilent PNA-L N5320C via its LabVIEW driver. This file format includes a header, with the values of bandwidth and averaging, and three tab-separated columns containing the frequency \(f\), and the real and imaginary parts of the complex reflection coefficient \(\Gamma(f)\).
Note that no timestamps are present in the file and have to be supplied externally, e.g. from the file name. One trace per file. As the MCPT set-up for which this extractor was designed always uses the S11 port, the node name is hard-coded to this value.
datatree.DataTree:
  S11:  !!xarray.Dataset
    coords:
      freq:       !!float   # An array of measurement frequencies
    data_vars:
      Re(G):      (freq)    # Real part of Γ
      Im(G):      (freq)    # Imaginary part of Γ
      average:    (None)    # Number of traces averaged
      bandwidth:  (None)    # Filter bandwidth
For processing Inficon Fusion csv export format (csv). This is a tabulated format, including the concentrations, mole fractions, peak areas, and retention times.
Warning

As also mentioned in the csv files themselves, the use of this filetype is discouraged, and the json files (or a zipped archive of them) should be parsed instead.
For processing Inficon Fusion zipped data. This is a wrapper parser which unzips the provided zip file, and then uses the yadg.extractors.fusion.json extractor to parse every fusion-data file present in the archive.
Contains both the data from the raw chromatogram and the post-processed results.
These are xml-formatted files, which we here parse using the xml.etree library into a Python dict.
The angle returned from this parser is based on a linear interpolation of the start and end point of the scan, and is the \(2\theta\). The values of \(\omega\) are discarded.
These files basically just contain the [Scanpoints] part of Panalytical csv files. As a consequence, no metadata is recorded, and the format does not have an associated timestamp.
These binary files actually contain an ASCII file header, delimited by “SOFH\n” and “EOFH\n”.
The binding energies corresponding to the datapoints in the later part of the file can be found from the “SpectralRegDef” entries in this header. Each of these entries looks something like:
After the file header, the binary part starts with a short data header (offsets given from start of data header):
0x0000  group              # Data group number.
0x0004  num_traces         # Number of traces in file.
0x0008  trace_header_size  # Combined lengths of all trace headers.
0x000c  data_header_size   # Length of this data header.
After this follow num_traces trace headers that are each structured something like this:
0x0000  trace_number        # Number of the trace.
0x0004  bool_01             # ???
0x0008  bool_02             # ???
0x000c  trace_number_again  # Number of the trace. Again?
0x0010  bool_03             # ???
0x0014  num_datapoints      # Number of datapoints in trace.
0x0018  bool_04             # ???
0x001c  bool_05             # ???
0x0020  string_01           # ???
0x0024  string_02           # ???
0x0028  string_03           # ???
0x002c  int_02              # ???
0x0030  string_04           # ???
0x0034  string_05           # ???
0x0038  y_unit              # The unit of the datapoints.
0x003c  int_05              # ???
0x0040  int_06              # ???
0x0044  int_07              # ???
0x0048  data_dtype          # Data type for datapoints (f4 / f8).
0x004c  num_data_bytes      # Unsure about this one.
0x0050  num_datapoints_tot  # This one as well.
0x0054  int_10              # ???
0x0058  int_11              # ???
0x005c  end_of_data         # Byte offset of the end-of-data.
After the trace headers follow the datapoints. After the number of datapoints there is a single 32-bit float with the trace’s dwelling time again.
The uncertainties of "E" are taken as the step-width of the linearly spaced energy values.
The uncertainties "s" of "y" are currently set to a constant value of 12.5 counts per second, as all the signals in the files seen so far only seem to take on values in those steps.
datatree.DataTree:
  {{trace_index}}:
    coords:
      uts:             !!float                 # Unix timestamp
      mass_to_charge:  !!float                 # M/Z ratio
    data_vars:
      fsr:             (None)                  # Full scale range
      y:               (uts, mass_to_charge)   # Signal data
Uncertainties in mass_to_charge are set to one step in M/Z spacing.
Uncertainties in the signal y are either based on the analog-to-digital conversion (i.e. using the full scale range), or from the upper limit of contribution of neighboring M/Z points (50 ppm).
This module parses the files generated by the dummy and biologic devices within tomato-0.2. As the dummy device has been mainly used for testing, the below discusses the output of a biologic device.
xarray.Dataset:
  coords:
    uts:           !!float   # Unix timestamp
  data_vars:
    Ewe:           (uts)     # Potential of the working electrode
    Ece:           (uts)     # Potential of the counter electrode, if present
    I:             (uts)     # Instantaneous current
    technique:     (uts)     # Technique name
    loop number:   (uts)     # Loop number (over techniques)
    cycle number:  (uts)     # Cycle number (within technique)
    index:         (uts)     # Technique index
The files generated by the dummy driver do not contain the technique, and all values present in the json files are simply copied over assuming an uncertainty of 0.0.
For the biologic driver, each tomato data file contains the following four sections:
technique section, describing the current technique,
previous section, containing status information of the previous file,
current section, containing status information of the current file,
data section, containing the timesteps.
The reason why both previous and current are required is that the device status is recorded at the time of data polling, which means the values in current might be invalid (after the run has finished) or not in sync with the data (if a technique change happened). However, previous may not be present in the first data file of an experiment.
To determine the measurement errors, the values from the BioLogic manual are used: for measured voltages (\(E_{\text{we}}\) and \(E_{\text{ce}}\)) this corresponds to a constant uncertainty of 0.004% of the applied E-range with a maximum of 75 uV, while for currents (\(I\)) this is a constant uncertainty of 0.0015% of the applied I-range with a maximum of 0.76 uA.
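One possible reading of those rules is sketched below; whether the quoted "maximum" acts as a cap, as assumed here, would need checking against the actual implementation:

```python
def uncertainty_Ewe(e_range: float) -> float:
    """Assumed reading: 0.004 % of the applied E-range, capped at 75 uV."""
    return min(0.004e-2 * e_range, 75e-6)

def uncertainty_I(i_range: float) -> float:
    """Assumed reading: 0.0015 % of the applied I-range, capped at 0.76 uA."""
    return min(0.0015e-2 * i_range, 0.76e-6)
```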
The data is returned as a xarray.Dataset or a datatree, and is stored in
a NetCDF file. The output location can be configured using the outfile
argument, by default this is set to the stem of infile with a .nc suffix.
Optionally, an export of just the metadata can be requested by setting the
adding their implementation in a separate Python package under yadg.parsers
Each parser should be documented by adding a structured docstring into the __init__.py file of each new sub-module of yadg.parsers. This documentation should describe the application and usage of the parser, and refer to the Pydantic autodocs via DataSchema to discuss the features exposed via the parameters dictionary. Generally, code implementing the parsing of specific filetypes should be kept separate from the main parser function in the module.
New filetypes should be implemented as sub-modules of their parser. They should be documented using a top-level docstring in the relevant sub-module. If the filetype is binary, a description of the file structure should be provided in the docstring. Every new filetype will have to be added into the filetype module as well.
New extractors can be registered using a shim in the yadg.extractors module, referring to the filetype. The __init__.py of each extractor should expose:
In the resulting NetCDF files, the unit annotations are stored in .attrs["units"] on each xarray.DataArray, that is within each “column” of each “node” of the datatree.DataTree. If an entry does not contain .attrs["units"], the quantity is dimensionless.
Warning
A special pint.UnitRegistry was exposed in yadg-4.x under yadg.dgutils.ureg. Use of this pint.UnitRegistry is deprecated as of yadg-5.0, and it will be removed in yadg-6.0.
yadg is a set of tools and parsers aimed to extract and standardise data from raw files generated by scientific instruments. The supported types of files that can be extracted are listed in the sidebar. The data (or metadata) extracted from the supplied file is returned as a xarray.Dataset or a NetCDF file.
For extracting and combining data from multiple files, yadg can be used to process a special configuration file called dataschema. The combined data is returned as a datatree.DataTree or a NetCDF file. This allows reproducible processing of structured experimental data, and takes care of issues such as timezone resolution, unit annotation, uncertainty determination, and keeps track of provenance.
The top level datatree.DataTree contains the following metadata stored in its attributes:
The contents of the attribute fields for each step will vary depending on the parser used to create the corresponding xarray.Dataset. The following conventions are used:
a coord field uts contains a Unix timestamp (float),
The metadata are returned as a .json file, and are generated using the to_dict() function of xarray.Dataset. They contain a description of the data coordinates (coords), dimensions (dims), and variables (data_vars), and include their names, attributes, dtypes, and shapes.
The list of supported filetypes that can be extracted using yadg can be found in the left sidebar. For more information about the extractor concept, see MaRDA Metadata Extractors WG.
if peak integration data is present in the raw data file, this is now included
in the "raw" key directly. The included quantities are height, area,
the PEIS/GEIS data is split into timesteps, not cycles.
NaN and Inf in the metadata of some input formats should now be handled
the chromtrace parser now focuses on parsing chromatography traces only, use chromdata for parsing post-processed chromatographic data;
the flowdata parser now no longer creates a default "flow" entry in derived data;
data post-processing within yadg, including chromatographic trace integration,
reflection coefficient processing, and calibration functionality is deprecated in favour
Data post-processing within yadg has been removed, following its deprecation in yadg-4.2. All previously included post-processing functionality should be available in dgpost-2.0. If you find functionality that has been broken since yadg-4.2 and which cannot be implemented in dgpost, please file an issue on GitHub with example files.
The yadg update functionality is now only for updating dataschemas; the ability to update datagrams has been removed.
The parameter transpose from the electrochem parser is no longer available; all electrochemistry data is returned as plain timesteps.
The valve number in the fusion-json extractor of chromtrace is now stored as data instead of metadata.
Bug fixes include:
the electrochem parser now properly parses files with the WAIT technique;
the electrochem parser understands more versions of the MB technique in the biologic.mpr filetype;
the electrochem parser can handle localized versions of data in the biologic.mpt filetype;
the chromtrace parser now properly unzips data when using the agilent.dx filetype.