<p><strong>The notebooks in this directory demonstrate the "Ten Rules for Reproducible Research in Jupyter Notebooks". Throughout the notebooks we refer to some the rules we applied.</strong></p>
11786
+
<p><strong>The notebooks in this directory demonstrate and apply the "Ten Rules for Reproducible Research in Jupyter Notebooks". Throughout the notebooks we refer to some the rules we applied.</strong></p>
11787
11787
<p><strong>For example, this notebook demonstrates:</strong></p>
11788
11788
<hr>
11789
-
<p><strong>Rule 1: Tell a Story for an Audience.</strong> This notebook was developed for biologists to learn how to apply a simple machine learning model to protein sequences.</p>
11789
+
<p><strong>Rule 1: Tell a Story for an Audience.</strong> This notebook was developed to learn how to apply a simple machine learning model to predict protein features based on protein sequences.</p>
11790
11790
<p><strong>Rule 3: Build a Pipeline.</strong> This notebook describes the entire workflow from data preparation, feature calculation, model fitting, to prediction. The modularity makes it easy to replace one of the steps, for example, use a different method to calculate features or apply a different machine learning model.</p>
11791
-
<p><strong>Rule 5: Use Cell, Section adn Notebook Divisions to Make Steps Clear.</strong> We broke the workflow into separate notebooks and use this top-level notebook to explain and orchestrate the workflow.</p>
11791
+
<p><strong>Rule 5: Use Cell, Section and Notebook Divisions to Make Steps Clear.</strong> We broke the workflow into separate notebooks and use this top-level notebook to explain and organize the workflow.</p>
<p>Protein chains fold in regular patterns. Secondary structure describes the geometry of segments of a protein chain. The most common secondary structure elements are</p>
11809
+
<p>Proteins have four different levels of structure – primary, secondary, tertiary and quaternary. Secondary structure describes the geometry of segments of a protein chain. The most common secondary structure elements are:</p>
<h2 id="Goal">Goal<a class="anchor-link" href="#Goal">¶</a></h2><p>This notebook demonstrates how to create a reproducible record to create a machine learning model. We train a simple model to predict the fold class of a protein given its protein sequence using a representative set of 3D structures from the Protein Data Bank.</p>
11836
+
<h2 id="Goal">Goal<a class="anchor-link" href="#Goal">¶</a></h2><p>This notebook demonstrates how to create a reproducible record using a machine learning model. We train the model to predict the fold class of a protein given its amino acid sequence using a representative set of 3D structures from the Protein Data Bank.</p>
11837
11837
<p><strong>Run the following notebooks and explore how we applied the Ten Simple Rules.</strong></p>
<p>Protein sequences cannot be directly used for machine learning. Here use the Word2vec method to calculate a fixed-sized feature vector for each protein sequence.</p>
11890
+
<p>Protein sequences cannot be directly used for machine learning. Here we use the Word2vec method to calculate a fixed-sized feature vector for each protein sequence.</p>
11891
11891
<p>Run the following notebook to calculate feature vectors.</p>
11892
11892
11893
11893
</div>
@@ -12044,7 +12044,7 @@ <h2 id="Version-and-Hardware-Information">Version and Hardware Information<a cla
File: example1/0-Workflow.ipynb
8 additions & 8 deletions
@@ -11,17 +11,17 @@
11
11
"cell_type": "markdown",
12
12
"metadata": {},
13
13
"source": [
14
-
"**The notebooks in this directory demonstrate the \"Ten Rules for Reproducible Research in Jupyter Notebooks\". Throughout the notebooks we refer to some the rules we applied.**\n",
14
+
"**The notebooks in this directory demonstrate and apply the \"Ten Rules for Reproducible Research in Jupyter Notebooks\". Throughout the notebooks we refer to some the rules we applied.**\n",
15
15
"\n",
16
16
"**For example, this notebook demonstrates:**\n",
17
17
"\n",
18
18
"---\n",
19
19
"\n",
20
-
"**Rule 1: Tell a Story for an Audience.** This notebook was developed for biologists to learn how to apply a simple machine learning model to protein sequences.\n",
20
+
"**Rule 1: Tell a Story for an Audience.** This notebook was developed to learn how to apply a simple machine learning model to predict protein features based on protein sequences.\n",
21
21
"\n",
22
22
"**Rule 3: Build a Pipeline.** This notebook describes the entire workflow from data preparation, feature calculation, model fitting, to prediction. The modularity makes it easy to replace one of the steps, for example, use a different method to calculate features or apply a different machine learning model.\n",
23
23
"\n",
24
-
"**Rule 5: Use Cell, Section adn Notebook Divisions to Make Steps Clear.** We broke the workflow into separate notebooks and use this top-level notebook to explain and orchestrate the workflow.\n",
24
+
"**Rule 5: Use Cell, Section and Notebook Divisions to Make Steps Clear.** We broke the workflow into separate notebooks and use this top-level notebook to explain and organize the workflow.\n",
25
25
"\n",
26
26
"---"
27
27
]
@@ -37,7 +37,7 @@
37
37
"cell_type": "markdown",
38
38
"metadata": {},
39
39
"source": [
40
-
"Protein chains fold in regular patterns. Secondary structure describes the geometry of segments of a protein chain. The most common secondary structure elements are\n",
40
+
"Proteins have four different levels of structure – primary, secondary, tertiary and quaternary. Secondary structure describes the geometry of segments of a protein chain. The most common secondary structure elements are:\n",
41
41
"* Alpha helices\n",
42
42
"* Beta sheets"
43
43
]
@@ -46,7 +46,7 @@
46
46
"cell_type": "markdown",
47
47
"metadata": {},
48
48
"source": [
49
-
"We can classify proteins into three major fold classes based on their predominant secondary structure content\n",
49
+
"We can classify proteins into three major fold classes based on their predominant secondary structure content:\n",
"* alpha+beta: contains alpha helices and beta sheets"
@@ -57,7 +57,7 @@
57
57
"metadata": {},
58
58
"source": [
59
59
"## Goal\n",
60
-
"This notebook demonstrates how to create a reproducible record to create a machine learning model. We train a simple model to predict the fold class of a protein given its protein sequence using a representative set of 3D structures from the Protein Data Bank.\n",
60
+
"This notebook demonstrates how to create a reproducible record using a machine learning model. We train the model to predict the fold class of a protein given its amino acid sequence using a representative set of 3D structures from the Protein Data Bank.\n",
61
61
"\n",
62
62
"**Run the following notebooks and explore how we applied the Ten Simple Rules.**"
63
63
]
@@ -103,7 +103,7 @@
103
103
"cell_type": "markdown",
104
104
"metadata": {},
105
105
"source": [
106
-
"Protein sequences cannot be directly used for machine learning. Here use the Word2vec method to calculate a fixed-sized feature vector for each protein sequence.\n",
106
+
"Protein sequences cannot be directly used for machine learning. Here we use the Word2vec method to calculate a fixed-sized feature vector for each protein sequence.\n",
107
107
"\n",
108
108
"Run the following notebook to calculate feature vectors. "
109
109
]
@@ -230,7 +230,7 @@
230
230
"source": [
231
231
"---\n",
232
232
"\n",
233
-
"**Authors:** Peter W. Rose, Shih-Cheng Huang, UC San Diego, October 1, 2018\n",
233
+
"**Authors:** [Peter W. Rose](mailto:pwrose.ucsd@gmail.com), Shih-Cheng Huang, UC San Diego, October 1, 2018\n",