Commit 3ff7a5c

Keep .md files in the docs/
1 parent 3bb637b commit 3ff7a5c

4 files changed

+139
-0
lines changed

docs/Model.md

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
1+
# Model builder
2+
3+
Depends on Keras and the model loader.
4+
5+
Contains two classes: `ModelBuilder` and `LossHistory` (a utility).
6+
7+
`ModelBuilder` takes `conf`, builds a model using the Keras library, and provides methods to manipulate the model (save, load, etc.).
8+
9+
## MPI builder
10+
11+
Serves a similar purpose and provides a set of MPI wrapper classes. Uses Keras SGD with the Theano backend.
12+
13+
# Model runner
14+
15+
Depends on the model loader and the performance utils.
16+
17+
Contains a set of standalone functions which, given a shot list, perform training, make predictions, run evaluations, and produce plots.
18+
19+
20+
# Targets
21+
22+
Defines a class hierarchy of targets, specifying the loss, activation functions, and other parameters for the neural networks.
23+
24+
25+
# Loader
26+
27+
Depends on `primitives.shots`.
28+
29+
Given `conf` and a shot list, provides tools to load the shot list, get batches, and construct and manipulate patches.
30+
31+
It is the way preprocessed data is delivered to the model and prepared for training.

docs/Preprocessing.md

Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
1+
# Raw data
2+
3+
The raw 0D data comes in a plain structured text format:
4+
5+
1. Shot list: a 2-column CSV file with a unique shot identifier column and a disruption time column (-1 for non-disruptive shots).
6+
1. Individual shot files: 2-column CSV files with a time column and a plasma current value column, one file per shot in the shot list. The time grid is common to all shots, but shot lengths differ considerably.
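
As a rough illustration of the two file layouts, the sketch below parses an in-memory shot list and one shot file with Python's standard `csv` module. The sample values, the comma delimiter, and the helper names `read_shot_list` and `read_shot_file` are assumptions made for this example; they are not taken from `plasma.jet_signals`.

```python
import csv
import io

# Hypothetical sample data; the real files may use a different delimiter.
SHOT_LIST_CSV = "1001,-1\n1002,45.5\n"
SHOT_FILE_CSV = "0.0,1.2\n0.001,1.3\n0.002,1.1\n"


def read_shot_list(text):
    """Return (shot_number, t_disrupt) pairs; t_disrupt is -1 for non-disruptive shots."""
    rows = csv.reader(io.StringIO(text))
    return [(int(number), float(t_disrupt)) for number, t_disrupt in rows]


def read_shot_file(text):
    """Return parallel lists of times and plasma current values."""
    times, currents = [], []
    for t, current in csv.reader(io.StringIO(text)):
        times.append(float(t))
        currents.append(float(current))
    return times, currents


shots = read_shot_list(SHOT_LIST_CSV)
times, currents = read_shot_file(SHOT_FILE_CSV)
# A non-negative disruption time marks a disruptive shot.
disruptive = [number for number, t_disrupt in shots if t_disrupt >= 0]
```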
7+
8+
See `plasma.jet_signals` for more details.
9+
10+
# Preprocessing
11+
12+
The goal of the preprocessing step is to go from the raw data to the higher-level primitives: Shots and ShotLists.
13+
In addition, the signal is cut, clipped, and resampled (using a univariate linear spline and a log transformation of the signal), and a `ttd` (time-to-disruption) variable is introduced.
14+
15+
Certain shots are marked invalid depending on the magnitude of the plasma current.
16+
17+
Preprocessed results are saved in a NumPy binary `npz` file.
18+
19+
The core methods are:
20+
1. `plasma.preprocessor.preprocess.get_signals_and_times_from_file`
21+
1. `plasma.preprocessor.preprocess.cut_and_resample_signals`
22+
1. `plasma.utils.processing.cut_and_resample_signal`
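
The resampling, log transformation, and `ttd` steps described in this section can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the function names `resample_linear` and `time_to_disruption`, the base-10 logarithm, the clipping value `t_max`, and the constant `ttd` for non-disruptive shots are all hypothetical and are not taken from the `plasma.preprocessor` code.

```python
import math
from bisect import bisect_right


def resample_linear(times, values, new_times):
    """Linearly interpolate (times, values) onto new_times; times must be increasing."""
    out = []
    for t in new_times:
        # Index of the right-hand neighbour, clamped so edge points use the end segments.
        i = min(max(bisect_right(times, t), 1), len(times) - 1)
        t0, t1 = times[i - 1], times[i]
        v0, v1 = values[i - 1], values[i]
        out.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return out


def time_to_disruption(times, t_disrupt, t_max):
    """ttd per sample, clipped to [0, t_max].

    Assumed convention: non-disruptive shots (t_disrupt < 0) get the constant t_max.
    """
    if t_disrupt < 0:
        return [t_max for _ in times]
    return [min(max(t_disrupt - t, 0.0), t_max) for t in times]


times = [0.0, 1.0, 3.0, 4.0]
current = [1.0, 3.0, 7.0, 5.0]
grid = [0.0, 1.0, 2.0, 3.0, 4.0]  # common time grid

resampled = resample_linear(times, current, grid)
log_signal = [math.log10(v) for v in resampled]  # assumes strictly positive values
ttd = time_to_disruption(grid, t_disrupt=4.0, t_max=100.0)
```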
23+
24+
25+
# Normalization
26+
27+
Shot normalization addresses the problem of plasma signals having different scales, which could negatively affect neural network training and inference.
28+
29+
Normalizers are trained on the training shots, which requires one pass over the data before the RNN training. Training a normalizer essentially means extracting a set of statistics about the shots (mean, std, min-max) and incorporating them into the shot.
30+
As in the preprocessing step, an entire ShotList is split into sublists and a random sublist is picked; stats are then extracted on a shot-by-shot basis and saved in a normalizer object.
31+
32+
Example:
33+
34+
```python
35+
class MeanVarNormalizer(Normalizer):
36+
    def __init__(self, conf):
37+
        Normalizer.__init__(self, conf)
38+
        self.means = None
39+
        self.stds = None
40+
```
41+
42+
The `means` and `stds` attributes will contain lists of the means and standard deviations of the signals in the training shot list.
43+
44+
Normalization is implemented as a class hierarchy, with a base `plasma.preprocessor.Normalizer` class defining how stats are extracted and how training is performed. A set of specific normalization classes, e.g. `MeanVarNormalizer` and `VarNormalizer`, is derived from it, implementing different methods of shot normalization.
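
To make the idea of normalizer training concrete, here is a minimal standalone sketch of collecting per-shot means and standard deviations and then applying them. The class `SketchMeanVarNormalizer` and its `train` and `apply` methods are hypothetical and do not reproduce the interface of `plasma.preprocessor.Normalizer`; in particular, averaging the per-shot statistics before applying them is an assumption made for this example.

```python
import statistics


class SketchMeanVarNormalizer:
    """Hypothetical stand-in; not the plasma.preprocessor implementation."""

    def __init__(self):
        self.means = None
        self.stds = None

    def train(self, training_shots):
        # One pass over the training shots, collecting per-shot statistics.
        self.means = [statistics.fmean(signal) for signal in training_shots]
        self.stds = [statistics.pstdev(signal) for signal in training_shots]

    def apply(self, signal):
        # Assumption: normalize with the average of the per-shot statistics.
        mean = statistics.fmean(self.means)
        std = statistics.fmean(self.stds)
        return [(x - mean) / std for x in signal]


training_shots = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
normalizer = SketchMeanVarNormalizer()
normalizer.train(training_shots)
normalized = normalizer.apply([3.0, 3.0])
```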

docs/Primitives.md

Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
1+
## Shot
2+
3+
Each shot is a measurement of plasma current as a function of time. A Shot object contains the following attributes:
4+
5+
1. number - integer, unique identifier of a shot
6+
1. t_disrupt - double, disruption time in milliseconds (second column in the shotlist input file)
7+
1. ttd - array of doubles, time profile of the shot converted to time-to-disruption values
8+
1. valid - boolean, whether plasma current reaches a certain value during the shot
9+
1. is_disruptive - boolean, whether the shot is disruptive
10+
11+
12+
For 0D data, each shot is modeled as a 2D array: time vs. plasma current.
13+
14+
## ShotList
15+
16+
A wrapper around a list of shots. It is therefore a list of 2D arrays.
17+
18+
## Sublist
19+
20+
A shot list is split into sublists, each holding `num_at_once` shots from the entire dataset contained in the ShotList.
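
A minimal sketch of such a split, using plain Python lists in place of the real ShotList class. The function name `split_into_sublists` is hypothetical, and letting the last sublist be shorter than `num_at_once` is an assumption of this example.

```python
def split_into_sublists(shots, num_at_once):
    """Split shots into consecutive sublists of at most num_at_once items."""
    return [shots[i:i + num_at_once] for i in range(0, len(shots), num_at_once)]


shot_numbers = [101, 102, 103, 104, 105]
sublists = split_into_sublists(shot_numbers, num_at_once=2)
```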
21+
22+
## Patch
23+
24+
The length of shots varies by a factor of 20. For data-parallel synchronous training, it is essential that the amounts of training data passed to each model replica are about the same size.
25+
26+
Patches are subsets of shot time/signal profiles of equal length. The patch size is approximately equal to the minimum shot length (more precisely, the largest number less than or equal to the minimum shot length that is divisible by the LSTM model length).
27+
28+
Since shot lengths are generally not multiples of the minimum shot length, some non-deterministic fraction of patches is created.
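
The patch size rule can be sketched as below. The helper names `patch_length` and `split_into_patches` are hypothetical. Note one deliberate simplification: this sketch drops the leftover tail of each shot deterministically, whereas the text above says that the leftover part is handled non-deterministically.

```python
def patch_length(shot_lengths, model_length):
    """Largest multiple of model_length that does not exceed the shortest shot."""
    min_len = min(shot_lengths)
    return (min_len // model_length) * model_length


def split_into_patches(signal, length):
    """Cut a shot into consecutive full-length patches; the remainder is dropped here."""
    n = len(signal) // length
    return [signal[i * length:(i + 1) * length] for i in range(n)]


shot_lengths = [103, 250, 2000]
length = patch_length(shot_lengths, model_length=10)
patches = split_into_patches(list(range(250)), length)
```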
29+
30+
## Chunk
31+
32+
A subset of `patch` defined as:
33+
```
34+
num_chunks = length_of_patch / num_timesteps
35+
```
36+
where `num_timesteps` is the sequence length fed to the RNN model.
37+
38+
## Batch
39+
40+
Mini-batch gradient descent is used to train the neural network model.
41+
`num_batches` represents the number of *patches* per mini-batch.
42+
43+
### Batch input shape
44+
45+
The data in batches fed to the Keras model should have shape:
46+
47+
```
48+
batch_input_shape = (num_chunks*batch_size,num_timesteps,num_dimensions_of_data)
49+
```
50+
51+
where `num_dimensions_of_data` is the signal dimensionality. For the 0D dataset we only have a time profile of the plasma current,
52+
so `num_dimensions_of_data = 1`
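
As a worked example of the shape formula, the sketch below computes `num_chunks` and the resulting `batch_input_shape` for illustrative values. The function name and the numbers are assumptions, and integer division is used on the assumption that the patch length is a multiple of `num_timesteps`.

```python
def batch_input_shape(patch_length, num_timesteps, batch_size, num_dimensions_of_data=1):
    """Return (num_chunks * batch_size, num_timesteps, num_dimensions_of_data)."""
    # Assumes patch_length is a multiple of num_timesteps.
    num_chunks = patch_length // num_timesteps
    return (num_chunks * batch_size, num_timesteps, num_dimensions_of_data)


# A patch of 100 samples cut into sequences of 10 timesteps gives 10 chunks.
shape = batch_input_shape(patch_length=100, num_timesteps=10, batch_size=4)
```

With these values `shape` is `(40, 10, 1)`: 10 chunks times a batch size of 4, sequences of 10 timesteps, and one signal dimension.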

docs/Targets.md

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
1+
# Understanding targets
2+
3+
Targets are implemented as an abstract base class, built with Python's `abc` module, and a set of classes derived from it.
4+
5+
## Data members
6+
7+
`activation` and `loss`, both of type string.
8+
9+
10+
## Static methods
11+
12+
`remapper` and `threshold_range`.
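
The overall structure might look like the sketch below. Only the names `activation`, `loss`, `remapper`, and `threshold_range` come from this document; the class names, the method signatures, the method bodies, and the string values are all hypothetical and chosen purely to show the pattern of an abstract base class with static methods.

```python
from abc import ABC, abstractmethod


class Target(ABC):
    """Hypothetical base class; attribute values are illustrative only."""

    activation = "linear"
    loss = "mse"

    @staticmethod
    @abstractmethod
    def remapper(ttd, threshold):
        """Map time-to-disruption values to training targets (assumed signature)."""

    @staticmethod
    @abstractmethod
    def threshold_range(t_warning):
        """Return candidate decision thresholds (assumed signature)."""


class BinaryTarget(Target):
    activation = "sigmoid"
    loss = "binary_crossentropy"

    @staticmethod
    def remapper(ttd, threshold):
        # Label a sample 1 when the disruption is closer than the threshold.
        return [1 if t < threshold else 0 for t in ttd]

    @staticmethod
    def threshold_range(t_warning):
        # Eleven evenly spaced thresholds between 0 and 1; t_warning is unused here.
        return [i / 10 for i in range(11)]


labels = BinaryTarget.remapper([5.0, 0.5], threshold=1.0)
```

Because the static methods are marked abstract, `Target` itself cannot be instantiated; only subclasses that implement both methods can.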
