A file format for publication and exchange of data from experiments in single-molecule biophysics
The single-molecule dataset (SMD) format has been jointly developed in the groups of Dan Herschlag (Stanford) and Ruben Gonzalez (Columbia) to facilitate publication and exchange of data and analysis results obtained in single-molecule studies. This repository contains Matlab utility functions for creating, validating, saving and loading SMD structures in Matlab.
The smd-python repository contains similar Python utility functions for creating, validating, saving and loading SMD structures.

The representation of an SMD structure in Matlab is as follows:
- dataset : struct
  - .id : string
    Unique identifier for the collection of traces (e.g. a hash)
  - .desc : string
    Human-readable descriptor for the dataset
  - .types : struct
    - .index : "bool" | "float" | "double" | "int" | "long" | "string"
      Data type for the index
    - .values : struct
      Data types for the column values. Each field .column_name contains a format string, as in .index
  - .attr : struct
    Dataset-level features (e.g. descriptors of experimental conditions)
  - .data : 1 x N struct
    - .id : string
      Unique identifier for the trace (e.g. a hash)
    - .attr : struct
      Any trace-specific features that are not series
    - .index : 1 x T vector
      Row index for the trace data (e.g. acquisition times)
    - .values : struct
      Column values. Each field .column_name holds a 1 x T vector
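To make the layout concrete, here is a minimal sketch that builds an SMD-shaped struct by hand with plain Matlab structs, without calling any smd functions. The variable name demo, the identifiers, the attribute and all numbers are placeholders chosen for illustration; in practice smd.create builds the structure from raw data. The final loop checks the invariant implied by the specification: each column vector has the same length T as the trace's index.

```matlab
% Hand-built SMD-shaped struct (illustration only: ids and values are
% placeholders). Two traces with columns 'state' and 'observation'.
demo = struct();
demo.id = 'demo-dataset';
demo.desc = 'hand-built example with two traces';
demo.types = struct('index', 'double', ...
                    'values', struct('state', 'int', 'observation', 'float'));
demo.attr = struct('description', 'placeholder dataset-level attribute');

% Each trace has its own 1 x T index and one 1 x T vector per column
trace1 = struct('id', 'trace-1', 'attr', struct(), 'index', [0 1 2], ...
                'values', struct('state', [1 1 2], 'observation', [0.12 0.08 0.49]));
trace2 = struct('id', 'trace-2', 'attr', struct(), 'index', [0 1], ...
                'values', struct('state', [3 3], 'observation', [0.71 0.68]));
demo.data = [trace1, trace2];   % 1 x N struct array, here N = 2

% Check that every column listed in .types.values has the same length
% as the index of each trace
columns = fieldnames(demo.types.values);
for n = 1:numel(demo.data)
    T = numel(demo.data(n).index);
    for c = 1:numel(columns)
        assert(numel(demo.data(n).values.(columns{c})) == T);
    end
end
```

Because .data is a struct array, all traces must share the same field names, while the length T can differ from trace to trace.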
1. Download this repository from
   https://github.com/smdata/smd-matlab/archive/master.zip
2. Unzip master.zip to some location (e.g. c:\path\)
3. Add the smdata directory to the Matlab path by typing

   addpath(genpath('c:\path\smd-matlab\'))

   where c:\path\ is the directory where master.zip was unpacked.
- smd.create(data, types, varargin): Creates an SMD structure from supplied data.
- smd.write_json(filename, dataset): Saves an SMD structure as JSON (.json) or compressed JSON (.json.gz).
- smd.read_json(filename): Loads an SMD structure from JSON (.json) or compressed JSON (.json.gz).
- smd.isvalid(dataset): Checks whether the supplied struct is a valid SMD instance.
- smd.filter(dataset, ...): Returns a filtered dataset by matching id and attr values, or by applying a custom function with boolean output to each trace.
- smd.merge(data1, data2, ...): Returns a merged dataset containing all traces from multiple datasets.
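The filter-by-function and merge behaviour described above can be pictured with two hypothetical anonymous functions, keep_traces and merge_traces, operating on plain structs. This is a sketch of the semantics only, not the smd package's implementation: merge_traces just concatenates the .data arrays, does not reconcile .types or .attr, and assumes all traces share the same field names. The helper make_trace is likewise hypothetical.

```matlab
% Hypothetical helpers, NOT the smd package implementation: they only
% mimic the described behaviour of smd.filter (custom function) and
% smd.merge on plain structs.
% keep_traces: keep the traces for which func returns true
keep_traces = @(ds, func) setfield(ds, 'data', ds.data(logical(arrayfun(func, ds.data))));
% merge_traces: concatenate the .data arrays (assumes identical trace fields)
merge_traces = @(a, b) setfield(a, 'data', [a.data, b.data]);

% Small example datasets built with the hypothetical make_trace helper
make_trace = @(id, T) struct('id', id, 'attr', struct(), 'index', 1:T, ...
                             'values', struct('observation', rand(1, T)));
ds1 = struct('id', 'ds1', 'data', [make_trace('a', 30), make_trace('b', 80)]);
ds2 = struct('id', 'ds2', 'data', make_trace('c', 60));

long_only = keep_traces(ds1, @(d) numel(d.index) > 50);  % keeps trace 'b'
merged = merge_traces(ds1, ds2);                          % traces 'a', 'b', 'c'
```

The custom function receives one trace at a time, so conditions on trace length are naturally written in terms of d.index.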
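Generate some fake data: a mixture of 3 Gaussian distributions
% means and standard deviations of the three states
state_mean = [0.1, 0.5, 0.7];
state_noise = [0.05, 0.10, 0.05];
num_traces = 10;
max_length = 100;
data = cell(1, num_traces);  % one T x 2 matrix [state, observation] per trace
for n = 1:num_traces
    % random trace length between 1 and max_length
    T = ceil(max_length * rand());
    % draw a state (1, 2 or 3) for each time point
    states = ceil(length(state_mean) * rand(T,1));
    % add Gaussian noise around the mean of each state
    observations = state_mean(states)' + state_noise(states)' .* randn(T,1);
    data{n} = [states, observations];
end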
Create an SMD structure
% initialize smd structure
dataset = smd.create(data, {'state', 'int', 'observation', 'float'})
% add global attributes
dataset.attr.description = 'example data: mixture of 3 gaussians with equal occupancy';
dataset.attr.state_mean = state_mean;
dataset.attr.state_noise = state_noise;
dataset.attr.max_length = max_length;
Save data to disk
% save as Matlab data
save('example.mat', '-struct', 'dataset');
% save as plain text JSON (uncompressed)
smd.write_json('example.json', dataset);
% save as plain text JSON (with gzip compression)
smd.write_json('example.json.gz', dataset);
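For a rough idea of what a dataset looks like as JSON, the sketch below encodes a small SMD-shaped struct with Matlab's built-in jsonencode and decodes it again with jsondecode (both available since R2016b). This is an illustration under that assumption, not smd.write_json itself, whose output may differ in details; the variable names and values are placeholders.

```matlab
% Sketch only: built-in jsonencode/jsondecode (R2016b+), not smd.write_json.
% Two traces so that .data encodes as a JSON array (a 1 x 1 struct array
% would be encoded as a single JSON object instead).
trace1 = struct('id', 'trace-1', 'attr', struct(), 'index', [0 1 2], ...
                'values', struct('state', [1 1 2], 'observation', [0.12 0.08 0.49]));
trace2 = struct('id', 'trace-2', 'attr', struct(), 'index', [0 1], ...
                'values', struct('state', [3 3], 'observation', [0.71 0.68]));
demo = struct('id', 'demo-dataset', 'desc', 'two-trace example', ...
              'types', struct('index', 'double', ...
                              'values', struct('state', 'int', 'observation', 'float')), ...
              'attr', struct(), ...
              'data', [trace1, trace2]);

% Encode to a JSON string and print it
txt = jsonencode(demo);
disp(txt)

% Decode again; numeric arrays come back as column vectors
decoded = jsondecode(txt);
```

Because jsondecode returns numeric arrays as column vectors, a round trip through the built-in functions alone does not restore the 1 x T orientation given in the specification.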
Load data from disk
% read Matlab data
example = load('example.mat');
% read plain text JSON (uncompressed)
example = smd.read_json('example.json');
% read plain text JSON (with gzip compression)
example = smd.read_json('example.json.gz');
Filter data
% filter out traces with <= 50 data points
% (d is a single trace, so numel(d.index) is its length T)
filtered = smd.filter(example, 'func', @(d) numel(d.index) > 50);