From 303c2fa60f30bdc9e096b81b7ebe3cdb1afa3aa1 Mon Sep 17 00:00:00 2001
From: Oleg Serikov
Date: Thu, 1 Nov 2018 17:41:03 +0300
Subject: [PATCH 1/4] [WIP] project proposal. no timeline, abstract, intro, references yet

---
 2018-komp-ling/projects/serikov.md | 75 ++++++++++++++++++++++++++++++
 1 file changed, 75 insertions(+)
 create mode 100644 2018-komp-ling/projects/serikov.md

diff --git a/2018-komp-ling/projects/serikov.md b/2018-komp-ling/projects/serikov.md
new file mode 100644
index 00000000..cde64b09
--- /dev/null
+++ b/2018-komp-ling/projects/serikov.md
@@ -0,0 +1,75 @@
+# Compling project proposal
+
+RE: extracting phonological rules from neural networks paper: Jennifer Rodd (1997) "Recurrent Neural-Network Learning of Phonological Regularities in Turkish". CoNLL97: Computational Natural Language Learning, http://www.aclweb.org/anthology/W97-1012
+
+## Abstract
+
+## Introduction
+the background of the problem to be solved or topic to be studied, with references for the low-background reader.
+
+## Proposed goals
+
+### MVP
+* re-implement the paper
+* release the code and documentation showing how to run it, and extract the graphs that she shows.
+
+### EP
+Run it with other languages in the Turkic family to see if their networks split the same way. She tries it for Turkish, but other Turkic languages have different vowel harmony systems, and Uzbek does not have vowel harmony at all.
+
+### HAP
+Apply the received knowledge of computational phonology and turkic languages in particular to solve the task of phonetic embeddings alignment without having good parallel corpus.
+
+## Requirements
+breaks each goal into sub-goals. Each sub-goal should be labeled with a brief list of anticipated skills required, with references for skills not already possessed by Applicants.
+
+**NB!** Each sub-goal is accompanied with the documentation composing and sources publishing where needed.
+
+**NB!** Goals may be changed on the run if necessary (e.g. smth unexpected discovered). All the possible changes should be discussed with mentors.
+
+### MVP requirements
+#### skills required
+Brief knowledge of actual NN creating techniques.
+Data preprocessing skills
+Vizualization skill.
+
+#### sub-goals
+* Reproduce the dataset used in original paper
+* Reproduce the NNs used in original paper
+* Reproduce the training process used in original paper
+* Reproduce the NNs analysis described in original paper
+* Compare the reproduced and original results
+* Draft the results report
+* Discuss the results with mentors
+* Report the results of the MVP stage online (e.g. repo readme)
+
+### EP requirements
+
+#### sub-goals
+* Collect the data to repeat the research on different languages data
+* Accomplish the ML part of the research
+* Accomplish the analysis part of the research
+* Interpret and visualize the analysis results if needed
+* Draft the results report
+* Discuss the results with mentors
+* Report the results of the EP stage online (e.g. repo readme)
+
+### HAP requirements
+Knowledge of the embeddings theory and its mathematical backend
+
+
+#### sub-goals
+* Build embeddings for a pair of languages
+* Apply alignment technique described in [] (link) replacing the original idea of minimal-parallel-words-vocabulary with parallel phonemes vocabulary built on the idea of similarity of phonemes having
+    * the same glosses
+    * similar hidden layer units activation levels
+* Evaluate the results
+* Analyze and interpret the results, visualize smth if needed
+* Draft the results report
+* Discuss the results with mentors
+* Report the results of the HAP stage in the paper format
+
+## Data policy
+The work is continously published via GitHub repo under the MIT license.
+
+## References
+

From 600df73a0881b40666e5831e0f85dc429713eee4 Mon Sep 17 00:00:00 2001
From: Oleg Serikov
Date: Wed, 7 Nov 2018 02:14:20 +0300
Subject: [PATCH 2/4] added time consumption info, fixed typos.

---
 2018-komp-ling/projects/serikov.md | 82 +++++++++++++++---------------
 1 file changed, 41 insertions(+), 41 deletions(-)

diff --git a/2018-komp-ling/projects/serikov.md b/2018-komp-ling/projects/serikov.md
index cde64b09..735fe061 100644
--- a/2018-komp-ling/projects/serikov.md
+++ b/2018-komp-ling/projects/serikov.md
@@ -1,75 +1,75 @@
 # Compling project proposal
 
-RE: extracting phonological rules from neural networks paper: Jennifer Rodd (1997) "Recurrent Neural-Network Learning of Phonological Regularities in Turkish". CoNLL97: Computational Natural Language Learning, http://www.aclweb.org/anthology/W97-1012
-
 ## Abstract
+RE: extracting phonological rules from neural networks paper: Jennifer Rodd (1997) "Recurrent Neural-Network Learning of Phonological Regularities in Turkish". CoNLL97: Computational Natural Language Learning, http://www.aclweb.org/anthology/W97-1012
 
 ## Introduction
-the background of the problem to be solved or topic to be studied, with references for the low-background reader.
+**TO-DO** the background of the problem to be solved or topic to be studied, with references for the low-background reader.
 
 ## Proposed goals
 
 ### MVP
-* re-implement the paper
-* release the code and documentation showing how to run it, and extract the graphs that she shows.
+* Re-implement the paper
+* Release the code and documentation showing how to run it
+* Extract the graphs that are contained in the original paper
 
 ### EP
-Run it with other languages in the Turkic family to see if their networks split the same way. She tries it for Turkish, but other Turkic languages have different vowel harmony systems, and Uzbek does not have vowel harmony at all.
+Run the NNs explored in the original paper with other languages in the Turkic family to see if their networks split the same way (other Turkic languages have different vowel harmony systems, and Uzbek does not have vowel harmony at all).
 
 ### HAP
-Apply the received knowledge of computational phonology and turkic languages in particular to solve the task of phonetic embeddings alignment without having good parallel corpus.
+Apply the received knowledge of computational phonology and Turkic family languages phonology in particular to solve the task of phonetic embeddings alignment without having big parallel corpus.
 
 ## Requirements
-breaks each goal into sub-goals. Each sub-goal should be labeled with a brief list of anticipated skills required, with references for skills not already possessed by Applicants.
 
 **NB!** Each sub-goal is accompanied with the documentation composing and sources publishing where needed.
 
 **NB!** Goals may be changed on the run if necessary (e.g. smth unexpected discovered). All the possible changes should be discussed with mentors.
 
 ### MVP requirements
-#### skills required
-Brief knowledge of actual NN creating techniques.
-Data preprocessing skills
-Vizualization skill.
-
-#### sub-goals
-* Reproduce the dataset used in original paper
-* Reproduce the NNs used in original paper
-* Reproduce the training process used in original paper
-* Reproduce the NNs analysis described in original paper
-* Compare the reproduced and original results
-* Draft the results report
-* Discuss the results with mentors
-* Report the results of the MVP stage online (e.g. repo readme)
+#### Skills required
+* Brief knowledge of actual NN creating techniques.
+* Data preprocessing skills
+* Vizualization skill.
+
+#### Sub-goals
+* 1 week| Reproduce the dataset used in original paper
+* 1/2 week| Reproduce the NNs used in original paper
+* 1/2 week| Reproduce the training process used in original paper
+* 1/2 week| Reproduce the NNs analysis described in original paper
+* 1/4 week| Compare the reproduced and original results
+* 1/4 week| Draft the results report
+* 1/4 week| Discuss the results with mentors
+* 1/4 week| Report the results of the MVP stage online (e.g. repo readme)
 
 ### EP requirements
 
-#### sub-goals
-* Collect the data to repeat the research on different languages data
-* Accomplish the ML part of the research
-* Accomplish the analysis part of the research
-* Interpret and visualize the analysis results if needed
-* Draft the results report
-* Discuss the results with mentors
-* Report the results of the EP stage online (e.g. repo readme)
+#### Sub-goals
+* 1 week| Collect the data to repeat the research on different languages data
+* 1/4 week| Accomplish the ML part of the research
+* 1/2 week| Accomplish the analysis part of the research
+* 1/2 week| Interpret and visualize the analysis results if needed
+* 1/4 week| Draft the results report
+* 1/2 week| Discuss the results with mentors
+* 1/2 week| Report the results of the EP stage online (e.g. repo readme)
 
 ### HAP requirements
-Knowledge of the embeddings theory and its mathematical backend
+#### Skills required
+* Knowledge of the embeddings theory and its mathematical backend
 
 
-#### sub-goals
-* Build embeddings for a pair of languages
-* Apply alignment technique described in [] (link) replacing the original idea of minimal-parallel-words-vocabulary with parallel phonemes vocabulary built on the idea of similarity of phonemes having
+#### Sub-goals
+* 3/2 week| Build embeddings for a pair of languages
+* 2 week| Apply alignment technique described in [] (link) replacing the original idea of minimal-parallel-words-vocabulary with parallel phonemes vocabulary built on the idea of similarity of phonemes having
     * the same glosses
     * similar hidden layer units activation levels
-* Evaluate the results
-* Analyze and interpret the results, visualize smth if needed
-* Draft the results report
-* Discuss the results with mentors
-* Report the results of the HAP stage in the paper format
+* 1 week| Evaluate the results
+* 1 week| Analyze and interpret the results, visualize smth if needed
+* 1/2 week| Draft the results report
+* 1 week| Discuss the results with mentors
+* 1 week| Report the results of the HAP stage in the paper format
 
 ## Data policy
-The work is continously published via GitHub repo under the MIT license.
+The work should be continously published via GitHub repo under the MIT license.
 
 ## References
-
+**TO-DO**

From f15f304b1a3971eb354266c6da43a6bb85a8021c Mon Sep 17 00:00:00 2001
From: oserikov
Date: Tue, 27 Nov 2018 03:12:54 +0300
Subject: [PATCH 3/4] wip: introduction

---
 2018-komp-ling/projects/serikov.md | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/2018-komp-ling/projects/serikov.md b/2018-komp-ling/projects/serikov.md
index 735fe061..5ad7ae5a 100644
--- a/2018-komp-ling/projects/serikov.md
+++ b/2018-komp-ling/projects/serikov.md
@@ -1,10 +1,17 @@
 # Compling project proposal
 
 ## Abstract
-RE: extracting phonological rules from neural networks paper: Jennifer Rodd (1997) "Recurrent Neural-Network Learning of Phonological Regularities in Turkish". CoNLL97: Computational Natural Language Learning, http://www.aclweb.org/anthology/W97-1012
+**TO-DO** one-paragraph summary of the rest of the proposal.
 
 ## Introduction
 **TO-DO** the background of the problem to be solved or topic to be studied, with references for the low-background reader.
+RE: extracting phonological rules from neural networks paper: Jennifer Rodd (1997) "Recurrent Neural-Network Learning of Phonological Regularities in Turkish". CoNLL97: Computational Natural Language Learning, http://www.aclweb.org/anthology/W97-1012
+
+Interpreting neural networks is a good, worthwhile task. Applying machine learning to extract linguistically interesting data is a good, worthwhile task.
+
+The paper (paper) shows an approach to these tasks that was tested on Turkish. It would be interesting to check the approaches described in the paper on other languages, especially ones related to Turkish, the language the paper considers.
+
+It would be interesting to try applying the parallels in phonetic properties between related languages to the study of phonetic embeddings in related languages.
 
 ## Proposed goals
 

From 6806fcd3a3f11fea5351bea7357699b79c260769 Mon Sep 17 00:00:00 2001
From: oserikov
Date: Tue, 27 Nov 2018 03:26:07 +0300
Subject: [PATCH 4/4] added timings for mvp

---
 2018-komp-ling/projects/serikov.md | 50 +++++++++++++++---------------
 1 file changed, 25 insertions(+), 25 deletions(-)

diff --git a/2018-komp-ling/projects/serikov.md b/2018-komp-ling/projects/serikov.md
index 5ad7ae5a..a98ac8e0 100644
--- a/2018-komp-ling/projects/serikov.md
+++ b/2018-komp-ling/projects/serikov.md
@@ -38,42 +38,42 @@ Apply the received knowledge of computational phonology and Turkic family langua
 * Data preprocessing skills
 * Vizualization skill.
 
-#### Sub-goals
-* 1 week| Reproduce the dataset used in original paper
-* 1/2 week| Reproduce the NNs used in original paper
-* 1/2 week| Reproduce the training process used in original paper
-* 1/2 week| Reproduce the NNs analysis described in original paper
-* 1/4 week| Compare the reproduced and original results
-* 1/4 week| Draft the results report
-* 1/4 week| Discuss the results with mentors
-* 1/4 week| Report the results of the MVP stage online (e.g. repo readme)
+#### Sub-goals (~ 21 nov – 15 dec 2018)
+* 1 week (21 – 27 nov 2018) Reproduce the dataset used in original paper
+* 1/2 week (28 – 30 nov 2018) Reproduce the NNs used in original paper
+* 1/2 week (1 – 4 dec 2018) Reproduce the training process used in original paper
+* 1/2 week (5 – 7 dec 2018) Reproduce the NNs analysis described in original paper
+* 1/4 week (8 – 9 dec 2018) Compare the reproduced and original results
+* 1/4 week (10 – 11 dec 2018) Draft the results report
+* 1/4 week (12 – 13 dec 2018) Discuss the results with mentors
+* 1/4 week (14 – 15 dec 2018) Report the results of the MVP stage online (e.g. repo readme)
 
 ### EP requirements
 
-#### Sub-goals
-* 1 week| Collect the data to repeat the research on different languages data
-* 1/4 week| Accomplish the ML part of the research
-* 1/2 week| Accomplish the analysis part of the research
-* 1/2 week| Interpret and visualize the analysis results if needed
-* 1/4 week| Draft the results report
-* 1/2 week| Discuss the results with mentors
-* 1/2 week| Report the results of the EP stage online (e.g. repo readme)
+#### Sub-goals (~ 16 dec 2018 – 15 jan 2019)
+* 1 week (d-d m y) Collect the data to repeat the research on different languages data
+* 1/4 week (d-d m y) Accomplish the ML part of the research
+* 1/2 week (d-d m y) Accomplish the analysis part of the research
+* 1/2 week (d-d m y) Interpret and visualize the analysis results if needed
+* 1/4 week (d-d m y) Draft the results report
+* 1/2 week (d-d m y) Discuss the results with mentors
+* 1/2 week (d-d m y) Report the results of the EP stage online (e.g. repo readme)
 
 ### HAP requirements
 #### Skills required
 * Knowledge of the embeddings theory and its mathematical backend
 
 
-#### Sub-goals
-* 3/2 week| Build embeddings for a pair of languages
-* 2 week| Apply alignment technique described in [] (link) replacing the original idea of minimal-parallel-words-vocabulary with parallel phonemes vocabulary built on the idea of similarity of phonemes having
+#### Sub-goals (~ 16 jan 2019 – 15 mar 2019)
+* 3/2 week (d-d m y) Build embeddings for a pair of languages
+* 2 week (d-d m y) Apply alignment technique described in [] (link) replacing the original idea of minimal-parallel-words-vocabulary with parallel phonemes vocabulary built on the idea of similarity of phonemes having
     * the same glosses
     * similar hidden layer units activation levels
-* 1 week| Evaluate the results
-* 1 week| Analyze and interpret the results, visualize smth if needed
-* 1/2 week| Draft the results report
-* 1 week| Discuss the results with mentors
-* 1 week| Report the results of the HAP stage in the paper format
+* 1 week (d-d m y) Evaluate the results
+* 1 week (d-d m y) Analyze and interpret the results, visualize smth if needed
+* 1/2 week (d-d m y) Draft the results report
+* 1 week (d-d m y) Discuss the results with mentors
+* 1 week (d-d m y) Report the results of the HAP stage in the paper format
 
 ## Data policy
 The work should be continously published via GitHub repo under the MIT license.