diff --git a/_posts/labnews/2011-02-21-fire-ant-genome-out.markdown b/_posts/labnews/2011-02-21-fire-ant-genome-out.markdown index f7dc050c..94d35294 100644 --- a/_posts/labnews/2011-02-21-fire-ant-genome-out.markdown +++ b/_posts/labnews/2011-02-21-fire-ant-genome-out.markdown @@ -13,7 +13,7 @@ tags: - work --- -Two papers just out! Our [Solenopsis invicta fire ant genome ](http://www.pnas.org/cgi/doi/10.1073/pnas.1009690108) paper is out in PNAS. Win! And a study [on fire ant Odorant Binding Proteins](http://www.plosone.org/article/info:doi/10.1371/journal.pone.0016289) in PLoS ONE. [Anurag Priyam](http://yeban.in) and are developing a [generic BLAST web interface](http://www.sequenceserver.com) in ruby. It's already super useful for our [fourmidable ant genome database](http://www.antgenomes.org), and I'm sure will be for others working with non-model organisms. (easy to use; less of a hassle to set up than gmod...). Using the server, you can [blast ant genome sequences](http://www.antgenomes.org/blast) (and predicted genes). +Two papers just out! Our [Solenopsis invicta fire ant genome ](http://www.pnas.org/cgi/doi/10.1073/pnas.1009690108) paper is out in PNAS. Win! And a study [on fire ant Odorant Binding Proteins](http://www.plosone.org/article/info:doi/10.1371/journal.pone.0016289) in PLoS ONE. Anurag Priyam and are developing a [generic BLAST web interface](http://www.sequenceserver.com) in ruby. It's already super useful for our [fourmidable ant genome database](http://www.antgenomes.org), and I'm sure will be for others working with non-model organisms. (easy to use; less of a hassle to set up than gmod...). Using the server, you can [blast ant genome sequences](http://www.antgenomes.org/blast) (and predicted genes). @@ -23,4 +23,4 @@ Two papers just out! 
Our [Solenopsis invicta fire ant genome ](http://www.pnas.o -Photo of fire ants on their genome (C) [Romain Libbrecht](http://www.unil.ch/dee/page50472_en.html) & [Yannick Wurm](http://www.sbcs.qmul.ac.uk/staff/yannickwurm.html) +Photo of fire ants on their genome (C) Romain Libbrecht & [Yannick Wurm](https://www.qmul.ac.uk/sbbs/staff/yannickwurm.html) diff --git a/_posts/labnews/2011-06-09-june-update.markdown b/_posts/labnews/2011-06-09-june-update.markdown index 8d3cde65..0f811493 100644 --- a/_posts/labnews/2011-06-09-june-update.markdown +++ b/_posts/labnews/2011-06-09-june-update.markdown @@ -6,12 +6,12 @@ layout: post slug: june-update title: May Taiwan Conf & June genome updates wordpress_id: 29 -categories: +categories: - labnews --- -Had a great two weeks visiting [John Wang's lab at Academia Sinica, Taiwan](http://biodiv.sinica.edu.tw/en2007/index.php?pi=157), and join National Taiwan University's [International Symposium on Social Insects](http://twentomolsoc.blogspot.com/2011/03/international-symposium-on-social.html) for wonderfully stimulating talks by [Jo Billen](http://bio.kuleuven.be/ento/), [Lars Chittka](http://chittkalab.sbcs.qmul.ac.uk/), [James Nieh,](http://www-biology.ucsd.edu/labs/nieh/) [Kenji Matsuura](http://www.agr.okayama-u.ac.jp/LIECO/englishpage.html) & [Bob Vander Meer](http://ars.usda.gov/pandp/people/people.htm?personid=5796). The symposium gave me the opportunity to share some thoughts about [sequencing genomes with high throughput technologies](http://yannick.poulet.org/publications/wurm2011antGenomeBehindTheScenes.pdf) in the journal of the Taiwan Entomological Society, [Formosan Entomologist](http://140.112.100.38/english.htm). 
+Had a great two weeks visiting John Wang's lab at Academia Sinica, Taiwan, and join National Taiwan University's [International Symposium on Social Insects](http://twentomolsoc.blogspot.com/2011/03/international-symposium-on-social.html) for wonderfully stimulating talks by [Jo Billen](http://bio.kuleuven.be/ento/), [Lars Chittka](http://chittkalab.sbcs.qmul.ac.uk/), [James Nieh,](http://www-biology.ucsd.edu/labs/nieh/) Kenji Matsuura & [Bob Vander Meer](http://ars.usda.gov/pandp/people/people.htm?personid=5796). The symposium gave me the opportunity to share some thoughts about [sequencing genomes with high throughput technologies](http://yannick.poulet.org/publications/wurm2011antGenomeBehindTheScenes.pdf) in the journal of the Taiwan Entomological Society, Formosan Entomologist. @@ -21,9 +21,9 @@ Had a great two weeks visiting [John Wang's lab at Academia Sinica, Taiwan](http - -In genomic news, the _Acromyrmex echinatior _leafcutter ant genome, led by [Sanne Nygaard](http://www1.bio.ku.dk/english/research/oe/cse/personer/sanne/) & [Koos Boosma](http://www1.bio.ku.dk/english/research/oe/cse/personer/koos/) is _in press_! The data are already on [Fourmidable](http://www.antgenomes.org); and Fourmdiable's [ant genome BLAST interface](http://www.antgenomes.org) was updated to the latest [SequenceServer](http://www.sequenceserver.com). + +In genomic news, the _Acromyrmex echinatior _leafcutter ant genome, led by Sanne Nygaard & Koos Boosma is _in press_! The data are already on [Fourmidable](http://www.antgenomes.org); and Fourmdiable's [ant genome BLAST interface](http://www.antgenomes.org) was updated to the latest [SequenceServer](http://www.sequenceserver.com). 
diff --git a/_posts/labnews/2011-07-03-shenzhen-social-insect-conference.markdown b/_posts/labnews/2011-07-03-shenzhen-social-insect-conference.markdown index cfd70f73..c8439634 100644 --- a/_posts/labnews/2011-07-03-shenzhen-social-insect-conference.markdown +++ b/_posts/labnews/2011-07-03-shenzhen-social-insect-conference.markdown @@ -6,12 +6,12 @@ layout: post slug: shenzhen-social-insect-conference title: Social insect genomics conference 2011 wordpress_id: 41 -categories: +categories: - labnews - genomics --- -Many interesting talks and stimulating discussions during [Shenzhen's Social Insect Genomics Conference](http://ldl.genomics.org.cn/event/conference.jsp?conId=31) which coincided with the release of Sanne Nygaard's [_Acromyrmex echinatior_ leaf-cutter ant genome paper](http://www.genome.org/cgi/doi/10.1101/gr.121392.111) showing adaptations linked to fungal farming. More excitement is on its way with next generation sociogenetics projects bubbling up around the world & across the phylogeny! +Many interesting talks and stimulating discussions during Shenzhen's Social Insect Genomics Conference which coincided with the release of Sanne Nygaard's [_Acromyrmex echinatior_ leaf-cutter ant genome paper](http://www.genome.org/cgi/doi/10.1101/gr.121392.111) showing adaptations linked to fungal farming. More excitement is on its way with next generation sociogenetics projects bubbling up around the world & across the phylogeny! diff --git a/_posts/labnews/2011-12-01-Solenopsis-invicta-fire-ant-genome-paper.markdown b/_posts/labnews/2011-12-01-Solenopsis-invicta-fire-ant-genome-paper.markdown index 6ca45cfd..9991a5c5 100644 --- a/_posts/labnews/2011-12-01-Solenopsis-invicta-fire-ant-genome-paper.markdown +++ b/_posts/labnews/2011-12-01-Solenopsis-invicta-fire-ant-genome-paper.markdown @@ -24,7 +24,7 @@ categories:
+
diff --git a/_posts/labnews/2014-01-27-reference.markdown b/_posts/labnews/2014-01-27-reference.markdown
index e67c7881..01ffe1d3 100644
--- a/_posts/labnews/2014-01-27-reference.markdown
+++ b/_posts/labnews/2014-01-27-reference.markdown
@@ -3,49 +3,49 @@ layout: post
title: Reference Letters
date: 2014-01-27
comments: true
-categories:
+categories:
- labnews
- teaching
- writing
---
-Current or former students *very regularly* ask me for a reference to help them apply for a job or a new study program. The process is facilitated & the letter is improved by the following advice.
+Current or former students *very regularly* ask me for a reference to help them apply for a job or a new study program. The process is facilitated & the letter is improved by the following advice.
-If you need a reference letter from me, I need you to write a first draft. First, you are best positioned to know what makes you great for what you're applying for. Second, you'll end up with a better letter if my time is spent revising something than if I try to create something from scratch.
+If you need a reference letter from me, I need you to write a first draft. First, you are best positioned to know what makes you great for what you're applying for. Second, you'll end up with a better letter if my time is spent revising something than if I try to create something from scratch.
-Your draft should in the form of a letter from me about you (yes, it can feel awkward to write like this). You basically need to say that you are a great and justify why). Some general tips:
+Your draft should be in the form of a letter from me about you (yes, it can feel awkward to write like this). You basically need to say that you are great and justify why. Some general tips:
* Please respect the style guidelines given by Strunk & White's "The Elements of Style".
-* Keep things concise.
+* Keep things concise.
* Use a spell-checker and a grammar-checker (on strict mode!).
* It's better if the examples you use are relevant to the degree you're applying to.
-* Don't highlight weaknesses. E.g. if you have a "C" in something don't mention it.
-* Whatever you do, don't lie. Any lies will come back to hurt you 1000-fold (karma).
+* Don't highlight weaknesses. E.g. if you have a "C" in something don't mention it.
+* Whatever you do, don't lie. Any lies will come back to hurt you 1000-fold (karma).
* Send it as a document I can edit (not a PDF).
** 2015 update: a much more [exhaustive list of writing tips here]({% post_url /labnews/2015-02-05-scientific-writing %}).**
** 2020 update: ** please use an automatic grammar and style checker such as [Grammarly](https://grammarly.go2cloud.org/SH2na) or Microsoft Word's grammar checker. They aren't magical solutions, but can help you a lot!
-### Structure
+### Structure
-Introductory paragraph. This should include:
+Introductory paragraph. This should include:
* Why I am writing
- * Why I know you well (e.g., I am your academic advisor/tutor/supervisor/lecturer since at Queen Mary since xxx when you started your degree in XX).
- * Which degree you are doing and when you are expected to graduate.
- * The last sentence should be a small list of ideas (see below), summarizing why you are great for the opportunity you're applying to. This also announces the structure of the subsequent pre-conclusion paragraphs (i.e., it should end with a list of 2 or 3 or 4 items as below).
+ * Why I know you well (e.g., I have been your academic advisor/tutor/supervisor/lecturer at Queen Mary since xxx when you started your degree in XX).
+ * Which degree you are doing and when you are expected to graduate.
+ * The last sentence should be a small list of ideas (see below), summarizing why you are great for the opportunity you're applying to. This also announces the structure of the subsequent pre-conclusion paragraphs (i.e., it should end with a list of 2 or 3 or 4 items as below).
-One paragraph per idea (no ping-ponging back and forth!). Some examples of ideas:
+One paragraph per idea (no ping-ponging back and forth!). Some examples of ideas:
* academic achievements (e.g., coursework or overall grades, predicted final grade ("first?"))
- * evidence that you are dedicated/serious/hardworking/intelligent/creative (e.g., based on your project, punctuality, behavior in tutorials).
- * evidence that you have a good personality (e.g., social intelligence, teamwork, helping others).
+ * evidence that you are dedicated/serious/hardworking/intelligent/creative (e.g., based on your project, punctuality, behavior in tutorials).
+ * evidence that you have a good personality (e.g., social intelligence, teamwork, helping others).
* extra-curricular activities (jobs, volunteering)
-Conclusion: a quick summary stating that you're great for the degree/program/job because of the 3 or 4 ideas.
+Conclusion: a quick summary stating that you're great for the degree/program/job because of the 3 or 4 ideas.
-Overall, the reference should not take more than 1 page - people are unlikely to read anything that is longer.
+Overall, the reference should not take more than 1 page - people are unlikely to read anything that is longer.
---
-Thanks to Rob Hammond for telling me about The Elements of Style years ago.
+Thanks to Rob Hammond for telling me about The Elements of Style years ago.
diff --git a/_posts/labnews/2015-06-02-avoidgenomicsretractions.md b/_posts/labnews/2015-06-02-avoidgenomicsretractions.md
index 883d6baa..7955c1cb 100644
--- a/_posts/labnews/2015-06-02-avoidgenomicsretractions.md
+++ b/_posts/labnews/2015-06-02-avoidgenomicsretractions.md
@@ -23,7 +23,7 @@ categories:
## Biology is a data-science
-The dramatic [plunge in DNA sequencing costs](http://www.genome.gov/images/content/cost_megabase_.jpg) means that a single MSc or PhD student can now generate data that would have cost $15,000,000 only ten years ago. We are thus leaping from lab-notebook-scale science to research that requires extensive programming, statistics and high performance computing.
+The dramatic plunge in DNA sequencing costs means that a single MSc or PhD student can now generate data that would have cost $15,000,000 only ten years ago. We are thus leaping from lab-notebook-scale science to research that requires extensive programming, statistics and high performance computing.
This is exciting & empowering – in particular for small teams working on emerging model organisms that lacked genomic resources. But with great powers come great responsibilities... and risks of doing things wrong. These risks are far greater for genome biologists than, say physicists or astronomers who have strong traditions of working with large datasets. In particular:
@@ -83,13 +83,13 @@ Additionally, the essentials of experimental design are long established: ensuri
There is no way around it: analysing large datasets is hard.
-When genomics projects involved tens of millions of $, much of this went to teams of dedicated data scientists, statisticians and bioinformaticians who could ensure data quality and analysis rigor. As sequencing has gotten cheaper the challenges [and costs](http://genomebiology.com/2011/12/8/125/figure/F1?highres=y) have shifted even further towards data analysis. For large scale human resequencing projects this is well understood. Despite the challenges being even greater for organisms with only few genomic resources, surprisingly many PIs, researchers and funders focusing on such organisms suppose that individual researchers with little formal training will be able to perform all necessary analysis. This is worrying and suggests that important stakeholders who still have limited experience of large datasets underestimate how easily mistakes with major negative consequences occur and go undetected. We may have to see additional publication retractions for awareness of the risks to fully take hold.
+When genomics projects involved tens of millions of $, much of this went to teams of dedicated data scientists, statisticians and bioinformaticians who could ensure data quality and analysis rigor. As sequencing has gotten cheaper the challenges and costs have shifted even further towards data analysis. For large scale human resequencing projects this is well understood. Despite the challenges being even greater for organisms with only few genomic resources, surprisingly many PIs, researchers and funders focusing on such organisms suppose that individual researchers with little formal training will be able to perform all necessary analysis. This is worrying and suggests that important stakeholders who still have limited experience of large datasets underestimate how easily mistakes with major negative consequences occur and go undetected. We may have to see additional publication retractions for awareness of the risks to fully take hold.
Thankfully, multiple initiatives are improving visibility of the data challenges we face (e.g., [1](http://www.nature.com/news/core-services-reward-bioinformaticians-1.17251), [2](https://www.epsrc.ac.uk/funding/calls/rsefellowships/), [3](http://www.nature.com/nature/journal/v498/n7453/full/498255a.html), [4](http://www.nytimes.com/2011/12/01/business/dna-sequencing-caught-in-deluge-of-data.html?_r=0), [5](http://ivory.idyll.org/blog/2015-docker-and-replicating-papers.html), [6](http://www.software.ac.uk)). Such visibility of the risks – and of how easy it is to implement practices that will improve research robustness – needs to grow among funders, researchers, PIs, journal editors and reviewers. This will ultimately bring more people to do better, more trustworthy science that will never need to be retracted.
## Acknowledgements
-*This post came together thanks to the [SSI Collaborations workshop](http://software.ac.uk), [Bosco K Ho's post on Geoffrey Chang](http://boscoh.com/protein/a-sign-a-flipped-structure-and-a-scientific-flameout-of-epic-proportions.html), discussions in [my lab](http://wurmlab.github.io) and through interactions with colleagues at the [social insect genomics conference](https://meetings.cshl.edu/meetings/2015/insect15.shtml) and the [NESCent Genome Curation group](http://genomecuration.github.io). YW is funded by the Biotechnology and Biological Sciences Research Council [BB/K004204/1], the Natural Environment Research Council [NE/L00626X/1, [EOS Cloud](http://environmentalomics.org/portfolio/big-data-infrastructure/)] and is a fellow of the [Software Sustainablity Institute](http://software.ac.uk).*
+*This post came together thanks to the [SSI Collaborations workshop](http://software.ac.uk), [Bosco K Ho's post on Geoffrey Chang](http://boscoh.com/protein/a-sign-a-flipped-structure-and-a-scientific-flameout-of-epic-proportions.html), discussions in [my lab](http://wurmlab.github.io) and through interactions with colleagues at the social insect genomics conference and the [NESCent Genome Curation group](http://genomecuration.github.io). YW is funded by the Biotechnology and Biological Sciences Research Council [BB/K004204/1], the Natural Environment Research Council [NE/L00626X/1, EOS Cloud] and is a fellow of the [Software Sustainability Institute](http://software.ac.uk).*
[Please cite The Winnower version of this article](https://thewinnower.com/papers/avoid-having-to-retract-your-genomics-analysis)
diff --git a/_posts/labnews/2016-02-01-sequenceserverpaper.markdown b/_posts/labnews/2016-02-01-sequenceserverpaper.markdown
index 950c54f7..c6565955 100644
--- a/_posts/labnews/2016-02-01-sequenceserverpaper.markdown
+++ b/_posts/labnews/2016-02-01-sequenceserverpaper.markdown
@@ -11,12 +11,12 @@ categories:
---
-
+
Happy to announce that we now have a manuscript describing the rationale and current features of SequenceServer - our easy to setup BLAST frontend. Importantly, the manuscript also provides extensive detail about the sustainable software development and user-centric design approaches we used to build this software. The full bioRxiv reference is:
Sequenceserver: a modern graphical user interface for custom BLAST databases 2015. Priyam, Woodcroft, Rai, Munagala, Moghul, Ter, Gibbins, Moon, Leonard, Rumpf and Wurm. bioRxiv doi: 10.1101/033142 [PDF].
-Be sure to check out the interactive figure giving a guided tour of Sequenceserver's BLAST results. +Be sure to check out the interactive figure giving a guided tour of Sequenceserver's BLAST results. Finally, I'll note that Sequenceserver arose from our own needs; these are clearly shared by many as Sequenceserver has already been cited in ≥20 publications and has been downloaded ≥30,000 times! Thanks to all community members who have made this tool successful. diff --git a/_posts/labnews/2016-04-25-GoogleSummerOfBioinformaticsCode.markdown b/_posts/labnews/2016-04-25-GoogleSummerOfBioinformaticsCode.markdown index 882f9d80..db229020 100644 --- a/_posts/labnews/2016-04-25-GoogleSummerOfBioinformaticsCode.markdown +++ b/_posts/labnews/2016-04-25-GoogleSummerOfBioinformaticsCode.markdown @@ -13,5 +13,5 @@ categories: Congratulations to our 2016 [Google Summer of Code](https://en.wikipedia.org/wiki/Google_Summer_of_Code) students! We are pround & excited to host them: * Hiten Chowdhary (Indian Institue of Technology, Karaghpur) will create a **BLAST result visualization methods** for [BioRuby](http://bioruby.org) and [SequenceServer](http://www.sequenceserver.com). This work should significantly facilitate the interpretation of results produced with our Sequenceserver custom BLAST-ing tool (see
-
- * After reviewing in detail the [strengths and weaknesses of bash, make, snakemake and nextflow as biological analysis pipelines](//github.com/thejmazz/jmazz.me/blob/master/content/post/ngs-workflow.md), [Julian Mazzitelli](//www.jmazz.me) created [Bionode waterwheel](//github.com/bionode/bionode-watermill), a tool demonstrating the capabilities of javascript streams for real-time analysis of biological data. [Read more about how it works.](//github.com/bionode/bionode-watermill/blob/master/README.md)
+
+ * After reviewing in detail the [strengths and weaknesses of bash, make, snakemake and nextflow as biological analysis pipelines](https://github.com/thejmazz/jmazz.me/blob/master/_posts/NGS-Workflows.md), Julian Mazzitelli created [Bionode watermill](//github.com/bionode/bionode-watermill), a tool demonstrating the capabilities of JavaScript streams for real-time analysis of biological data. [Read more about how it works.](//github.com/bionode/bionode-watermill/blob/master/README.md)
As the finishing touches are implemented, we look forward to being able to deploy the work of these students into production releases of [SequenceServer](//www.sequenceserver.com) and [Bionode](//bionode.io).
-
diff --git a/_posts/labnews/2017-01-30-BriefNewYearsUpdate.markdown b/_posts/labnews/2017-01-30-BriefNewYearsUpdate.markdown
index 4e6912df..8bb9da2f 100644
--- a/_posts/labnews/2017-01-30-BriefNewYearsUpdate.markdown
+++ b/_posts/labnews/2017-01-30-BriefNewYearsUpdate.markdown
@@ -13,6 +13,4 @@ Just a brief update to:
* congratulate Emeline Favreau, Carlos Martinez-Ruiz and Eckart Stolle on their great presentations at the [London NW-Europe IUSSI meeting](http://www.iussi.org/NWEurope/meetings.htm) and at [Popgroup 50 in Cambridge](http://populationgeneticsgroup.org.uk).
* congratulate [Anurag Priyam](/team/priyam) who is *finally* joining us to begin a PhD.
- * congratulate [Bruno Vieira](/team/bmpvieira.html) on his [Mozilla Science Fellowship](https://science.mozilla.org/programs/fellowships/fellows).
-
-
+ * congratulate [Bruno Vieira](/team/bmpvieira.html) on his Mozilla Science Fellowship.
diff --git a/_posts/labnews/2017-02-17-social-supergene-evolution.markdown b/_posts/labnews/2017-02-17-social-supergene-evolution.markdown
index 2aec38b9..079a9bcd 100644
--- a/_posts/labnews/2017-02-17-social-supergene-evolution.markdown
+++ b/_posts/labnews/2017-02-17-social-supergene-evolution.markdown
@@ -24,7 +24,7 @@ queens. The team had previously discovered that colony type is
determined by a chromosome that carries one of two variants of a
‘supergene’ region containing more than 500 genes.
In a new research paper, published in the journal Molecular Ecology, the team from QMUL’s School of Biological and Chemical +"https://www.qmul.ac.uk/sbbs/">School of Biological and Chemical Sciences sequenced the DNA and compared the genomes of two types of individuals: those carrying the supergene version responsible for colonies with a single queen, and those carrying @@ -35,9 +35,9 @@ homogeneously over the entire length of the supergene. This suggests that a single event, such as a large chromosomal rearrangement, was responsible for the origin of this remarkable system for determining social organisation,” said lead author -Dr +Dr Yannick Wurm from QMUL’s School of Biological and Chemical +"https://www.qmul.ac.uk/sbbs/">School of Biological and Chemical Sciences.
The team also discovered a large number of unfavourable @@ -50,9 +50,7 @@ advantages of having several queens in the colony outweigh the costs of the unfavourable mutations in the supergene region.”
This finding can help scientists understand how chromosomes evolve over time.
-Rodrigo -Pracana, a PhD student at QMUL and first author of the study, +
Rodrigo Pracana, a PhD student at QMUL and first author of the study,
said: “We know that the Y chromosome in mammals has also been
affected by unfavourable mutations. It is exciting to see that the
fire ant social chromosome has evolved in a similar way to the
diff --git a/_posts/labnews/2017-02-21-scientists_explore_the_evolution_of_a_social_supergene_in_the_red_fire_ant.md b/_posts/labnews/2017-02-21-scientists_explore_the_evolution_of_a_social_supergene_in_the_red_fire_ant.md
index dc37977d..12242691 100644
--- a/_posts/labnews/2017-02-21-scientists_explore_the_evolution_of_a_social_supergene_in_the_red_fire_ant.md
+++ b/_posts/labnews/2017-02-21-scientists_explore_the_evolution_of_a_social_supergene_in_the_red_fire_ant.md
@@ -20,9 +20,9 @@ Red fire ants are found in two different types of colonies: some colonies have a
{: width="307" height="197" style="max-width:100%; height: auto"}
-In a new research paper, published in the journal [*Molecular Ecology*](//onlinelibrary.wiley.com/journal/10.1111/(ISSN)1365-294X){:target="_blank"}, the team from QMUL’s [School of Biological and Chemical Sciences](//www.sbcs.qmul.ac.uk/){:target="_blank"} sequenced the DNA and compared the genomes of two types of individuals: those carrying the supergene version responsible for colonies with a single queen, and those carrying the supergene variant responsible for colonies with multiple queens.
+In a new research paper, published in the journal [*Molecular Ecology*](//onlinelibrary.wiley.com/journal/10.1111/(ISSN)1365-294X){:target="_blank"}, the team from QMUL’s [School of Biological and Chemical Sciences](https://www.qmul.ac.uk/sbbs/){:target="_blank"} sequenced the DNA and compared the genomes of two types of individuals: those carrying the supergene version responsible for colonies with a single queen, and those carrying the supergene variant responsible for colonies with multiple queens.
-“We found that the two versions of the chromosome differ homogeneously over the entire length of the supergene. This suggests that a single event, such as a large chromosomal rearrangement, was responsible for the origin of this remarkable system for determining social organisation,” said lead author [Dr Yannick Wurm](//www.sbcs.qmul.ac.uk/staff/yannickwurm.html){:target="_blank"} from QMUL’s [School of Biological and Chemical Sciences](//www.sbcs.qmul.ac.uk/){:target="_blank"}.
+“We found that the two versions of the chromosome differ homogeneously over the entire length of the supergene. This suggests that a single event, such as a large chromosomal rearrangement, was responsible for the origin of this remarkable system for determining social organisation,” said lead author [Dr Yannick Wurm](https://www.qmul.ac.uk/sbbs/staff/yannickwurm.html){:target="_blank"} from QMUL’s [School of Biological and Chemical Sciences](https://www.qmul.ac.uk/sbbs/){:target="_blank"}.
#### Evolutionary advantage?
@@ -32,7 +32,7 @@ Dr Wurm added: “It is likely that only a few genes among the hundreds present
This finding can help scientists understand how chromosomes evolve over time.
-[Rodrigo Pracana](//www.sbcs.qmul.ac.uk/staff/rodrigopracana.html){:target="_blank"}, a PhD student at QMUL and first author of the study, said: “We know that the Y chromosome in mammals has also been affected by unfavourable mutations. It is exciting to see that the fire ant social chromosome has evolved in a similar way to the human Y chromosome, although it controls social organisation and not sex.”
+Rodrigo Pracana, a PhD student at QMUL and first author of the study, said: “We know that the Y chromosome in mammals has also been affected by unfavourable mutations. It is exciting to see that the fire ant social chromosome has evolved in a similar way to the human Y chromosome, although it controls social organisation and not sex.”
#### A real pest
@@ -47,4 +47,4 @@ Rodrigo Pracana added: “Our discoveries could help to develop novel pest contr
- [The Wurm lab study](//wurmlab.github.io/){:target="_blank"} the lives of social insects including ants and bees. They combine behavioural experiments with genomics and bioinformatics approaches.
-- Find out more about studying [postgraduate Ecological and Evolutionary Genomics MSc](//www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/121430.html){:target="_blank"} at QMUL's [School of Biological and Chemical Sciences](//www.sbcs.qmul.ac.uk/){:target="_blank"}.
+- Find out more about studying [postgraduate Ecological and Evolutionary Genomics MSc](//www.qmul.ac.uk/postgraduate/taught/coursefinder/courses/121430.html){:target="_blank"} at QMUL's [School of Biological and Chemical Sciences](https://www.qmul.ac.uk/sbbs/){:target="_blank"}.
diff --git a/_posts/labnews/2017-02-21-supergene-diversity-accepted.markdown b/_posts/labnews/2017-02-21-supergene-diversity-accepted.markdown
index 56566835..c4061ff0 100644
--- a/_posts/labnews/2017-02-21-supergene-diversity-accepted.markdown
+++ b/_posts/labnews/2017-02-21-supergene-diversity-accepted.markdown
@@ -16,7 +16,7 @@ The fire ant social chromosomes carry a supergene that controls the number of qu
* There is a large number non-synonymous substitutions between the two variants.
* The never recombining variant Sb is almost fixed in the North American population.
-You can check out [the press release](http://www.qmul.ac.uk/media/news/items/se/192904.html), which covers some of the details about our work.
+You can check out the press release, which covers some of the details about our work.
The full reference is:
R Pracana, A Priyam, I Levantis, RA Nichols and Y Wurm. (2017) *The fire ant social chromosome supergene variant Sb shows low diversity but high divergence from SB* Molecular Ecology. DOI: 10.1111/mec.14054
diff --git a/_posts/labnews/2018-02-15-iussi_symposium_evolution_of_social_organization.markdown b/_posts/labnews/2018-02-15-iussi_symposium_evolution_of_social_organization.markdown
index e2d26fc3..9dfc9c39 100644
--- a/_posts/labnews/2018-02-15-iussi_symposium_evolution_of_social_organization.markdown
+++ b/_posts/labnews/2018-02-15-iussi_symposium_evolution_of_social_organization.markdown
@@ -13,12 +13,12 @@ Join us in Guarujá!
We (Emeline, Carlos & Yannick) are excited to host a symposium on the evolution of social organisation at the upcoming [IUSSI conference](http://iussi2018.com/).
* [Tim Linksvayer](http://www.bio.upenn.edu/people/timothy-linksvayer) will give a plenary talk.
- * We invite abstract submissions for talks and posters (deadline March 2nd!).
+ * We invite abstract submissions for talks and posters (deadline March 2nd!).
-We welcome a diversity of approaches and study systems. If you're unsure about the relevance of your work, don't hesitate to get in touch.
+We welcome a diversity of approaches and study systems. If you're unsure about the relevance of your work, don't hesitate to get in touch.
-Full symposium title and abstract below:
+Full symposium title and abstract below:
### Evolution of social organization
@@ -29,7 +29,4 @@ Understanding how and when changes in social lifestyle occur is central to the s
Encompassing the complexities of such multifaceted topics requires interdisciplinary discussion. This symposium will thus include both theoretical and empirical research addressing the topic from a variety of scales and angles.
-
-
-
-
+
diff --git a/_posts/labnews/2018-10-10-better_genomics_analysis_code_at_IUSSI.markdown b/_posts/labnews/2018-10-10-better_genomics_analysis_code_at_IUSSI.markdown
index 5cc248f8..8f87b121 100644
--- a/_posts/labnews/2018-10-10-better_genomics_analysis_code_at_IUSSI.markdown
+++ b/_posts/labnews/2018-10-10-better_genomics_analysis_code_at_IUSSI.markdown
@@ -24,7 +24,7 @@ This disruptive shift is largely due to the **50,000-fold drop in DNA sequencing
A major challenge for small research labs now wielding large genomic datasets is that it is easy to make a small mistake that [has](http://science.sciencemag.org/content/314/5807/1856.full) [high](http://science.sciencemag.org/content/351/6275/aaf3945) [costs](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0649-6).
-In light of this, as part of a [workshop on genomics approaches](https://www.iussi2018.com/news) organised with Tim Linksvayer and Alex Mikheyev, I gave an overview of some of the lessons we can transfer from the worlds of "other" data sciences to our expanding world of social insect genomics. This includes:
+In light of this, as part of a workshop on genomics approaches organised with Tim Linksvayer and Alex Mikheyev, I gave an overview of some of the lessons we can transfer from the worlds of "other" data sciences to our expanding world of social insect genomics. This includes:
- writing analysis code for humans;
- respecting style guides for code (e.g., [R style guide](http://adv-r.had.co.nz/Style.html)), and for [how to structure a genomic analysis](http://wurmlab.github.io/news/2018-10-01-project_structures/);
- benefits of peer-reviewing code, and of peer-coding sessions;
@@ -47,5 +47,3 @@ It is worth highlighting three additional, important points raised during the co
A fun and highly stimulating conference.
-
-
diff --git a/_posts/labnews/2019-02-27-phd_studentship.markdown b/_posts/labnews/2019-02-27-phd_studentship.markdown
index 2f8040fe..4ce88840 100644
--- a/_posts/labnews/2019-02-27-phd_studentship.markdown
+++ b/_posts/labnews/2019-02-27-phd_studentship.markdown
@@ -9,28 +9,28 @@ categories:
- labnews
---
-We have an exciting PhD position open through the London NERC DTP.
+We have an exciting PhD position open through the London NERC DTP.
-[**Apply by March 18th**](https://www.qmul.ac.uk/sbcs/postgraduate/phd-programmes/projects/display-title-655614-en.html) on the QMUL website.
+**Apply by March 18th** on the QMUL website.
The studentship, funded by the London NERC DTP, will cover tuition fees and provide an annual tax-free maintenance allowance for 4 years at the Research Council rate (£17,009 in 2019/20). Candidates must meet RCUK eligibility criteria (I think this means ok for UK citizens and medium-term residents).
The project is *highly* interdisciplinary.
-Great candidates fulfill at least 3 of the following 4 criteria:
+Great candidates fulfill at least 3 of the following 4 criteria:
* smart
* hard working
* understands genomes or social insects
* not scared of data analysis or coding.
-We can adapt the project to the students’ interests and background.
+We can adapt the project to the students’ interests and background.
If you have any questions regarding prerequisites, scope or nature of the project, please don't hesitate to get in touch with me (Yannick).
## Research context
-We have two main lines of research, in collaboration with national and international colleagues and stakeholders.
+We have two main lines of research, in collaboration with national and international colleagues and stakeholders.
**Genetics of social behaviour**. Social animals exhibit a broad range of behaviors, and some theoretical understanding exists of the tradeoffs between different forms of social organisation. However, we know little about the genes and processes underpinning social organisation or how it evolves. The diversity of social behaviors across the 20,000 species of ants represents a unique opportunity to empirically understand the mechanisms and tradeoffs involved in social change. We use highly molecular approaches, including genomics and bioinformatics but also potentially behavioural or field work to address major questions about social evolution. We aim to generate exciting new insights into genes and processes underpinning a major social transition, with implications on understanding evolution of complex phenotypes.
@@ -39,4 +39,3 @@ We have two main lines of research, in collaboration with national and internati
## Training
The student will receive extensive training in big data bioinformatics, phylogenomics, data visualisation, and experimental research approaches in evolution and genomics. Furthermore, they will receive hands-on training in interdisciplinary project management, communicating science in writing and verbally, including by presenting at workshops and conferences.
-
diff --git a/_posts/labnews/2020-11-01-student_controlled_unix_cloud_servers.markdown b/_posts/labnews/2020-11-01-student_controlled_unix_cloud_servers.markdown
index 0d8d7df3..eab29a7f 100644
--- a/_posts/labnews/2020-11-01-student_controlled_unix_cloud_servers.markdown
+++ b/_posts/labnews/2020-11-01-student_controlled_unix_cloud_servers.markdown
@@ -13,7 +13,7 @@ Getting into big data science can be a big leap if you're a biologist who is new
We try to cut that down into a series of smaller, more manageable steps.
-As part of that, we run a hands-on [genome bioinformatics course](http://wurmlab.github.io/genomicscourse/practicals) that introduces students to UNIX, and covers topics from Illumina read cleaning to genome assembly, annotation, population genomics and genome-wide association mapping.
+As part of that, we run a hands-on genome bioinformatics course that introduces students to UNIX, and covers topics from Illumina read cleaning to genome assembly, annotation, population genomics and genome-wide association mapping.
For obvious 2020 reasons, we needed to do this online in a manner that:
- has **manageable costs but sufficient power for genomics analyses**;
@@ -58,4 +58,3 @@ We can potentially deploy our solution for other courses. If you're interested,
{: width="499" height="397" style="max-width:100%; height: auto"}
{: width="1141" height="532" style="max-width:100%; height: auto"}
-
diff --git a/_posts/oldblogarchive/2004-12-09-fire-ants-whats-the-point.markdown b/_posts/oldblogarchive/2004-12-09-fire-ants-whats-the-point.markdown
index 2c883a56..3d9e9557 100644
--- a/_posts/oldblogarchive/2004-12-09-fire-ants-whats-the-point.markdown
+++ b/_posts/oldblogarchive/2004-12-09-fire-ants-whats-the-point.markdown
@@ -10,7 +10,7 @@ categories:
- oldblogarchive
---
-[Red Fire Ants](http://en.wikipedia.org/wiki/Red_Imported_Fire_Ant) are natives of South America where they occupy an ecologic niche, under pressure of predators and competitors. In other places, such as the southern [United States](http://www.invasivespecies.gov/profiles/fireant.shtml) or [Australia](http://www.dpi.qld.gov.au/fireants/), fire ants are considered an _invasive species_: given almost no predators or competitors, their proliferation is unlimited. They have become a considerable agricultural and thus **economic pest** as well as a significant **health hazard**.
+[Red Fire Ants](http://en.wikipedia.org/wiki/Red_Imported_Fire_Ant) are natives of South America where they occupy an ecologic niche, under pressure of predators and competitors. In other places, such as the southern United States or Australia, fire ants are considered an _invasive species_: given almost no predators or competitors, their proliferation is unlimited. They have become a considerable agricultural and thus **economic pest** as well as a significant **health hazard**.
Understanding these guys could contribute to solving these issues. It might also help understand how a useful social insect, the honey bee, works. We could get **better honey**!
@@ -18,19 +18,19 @@ Other issues which could be of interest for ants as well as generally concerning
-
+
* Eggs laid by a queen are practically identical. How do the environment (temperature variations...) and handling (by nurses) determine that one larva will become a queen while another will become a worker? Which worker will become a soldier? a nurse? a scout?
-
+
* A queen can live a long time - maybe 20 or 40 years. But a worker's life lasts only one or two years. And a male only one or two weeks. And yet they carry identical genetic information. Could we also live longer?
-
+
* How do ants form alliances with other colonies? How do they use slavery, propaganda, deception, appeasement, spying? How does an individual know what it should do and communicate the result?
-
+
* To what extent is it possible to use ideas from social insects to solve our problems? A large number of cooperating small interchangeable robots might solve certain issues better than one big robot...
-
+
* ...
diff --git a/_posts/oldblogarchive/2005-03-02-refined-nucleotide-blast-matrix.markdown b/_posts/oldblogarchive/2005-03-02-refined-nucleotide-blast-matrix.markdown
index 8ffc82ea..a95f8893 100644
--- a/_posts/oldblogarchive/2005-03-02-refined-nucleotide-blast-matrix.markdown
+++ b/_posts/oldblogarchive/2005-03-02-refined-nucleotide-blast-matrix.markdown
@@ -18,13 +18,13 @@ blastn is not good at finding these sequence's homologues:
-
+
* blastn searches for homologous sequences by trying to identify windows of 12 identical nucleotides.
-
+
* for blastn, a C-T mismatch is just like any other mismatch. For bisulfite treated sequences, we know that many Ts are in fact Cs which have been modified by chemical treatment. Thus we should penalize them less.
-
+
* blastn is optimized for speed, not flexibility. That means the window-size and scoring matrix are hard-coded - the user cannot edit them.
@@ -41,13 +41,13 @@ Poking around on the internet for alternatives did not turn anything up, so I as
>
>6. Remember that your scores will be making some wrong assumptions about using proteins. You should still find the hits you are looking for.
-Contacting NCBI confirmed this... Wayne Matten pointed me towards a METHODS [paper](http://blast.wustl.edu/doc/ntmats.pdf) describing *The Use of BlastP For Nucleic Acid Searches*. He also indicated [example matrices](ftp://ftp.ncbi.nlm.nih.gov/blast/matrices/).
+Contacting NCBI confirmed this... Wayne Matten pointed me towards a METHODS paper describing *The Use of BlastP For Nucleic Acid Searches*. He also indicated [example matrices](ftp://ftp.ncbi.nlm.nih.gov/blast/matrices/).
So the next step was downloading and compiling [NCBI Blast](http://www.ncbi.nlm.nih.gov/BLAST/) sources, and getting [Apple-Genentech's G5-optimized Blastall](http://www.apple.com/acg/). Then for each nucleotide sequence database I wanted to blast against, I had to:
-
+
* call formatdb (supplied with ncbi's Blast): `~/bin/blast-2.2.10/bin/formatdb -i Group10_20050120.fa -l Group10.formatdb.log -t "Apis Contig Group10"`
-
+
* blast my sequences against this database: `~/bin/blastall-2.2.9-apple-genentech -p blastp -d genomes/Amel20050120-freeze/contigs/Group10_20050120.fa -i ~/treatedSequence.fasta -o /Users/admin/Documents/Perl/generated\ data/heleneTest.2005-feb-25-mini -M BLOSUM80 -F F`
@@ -59,22 +59,22 @@ This let me test the custom scoring matrix to give an increased difference in sc
-
+
* not penalizing Ns or other non-ACGT bases
-
+
* giving increased importance to conserved C-C alignments (rare since in a lightly methylated sequence, most Cs are transformed to Ts)
-
+
* not penalizing C-T alignments when C is in a "normal" sequence and T is in bisulfite-treated sequence.
-
+
* reducing positive influence of T-T alignments (in bisulfite-treated sequence, T could really be a modified C).
-
+
* Avenues not explored include:
-
+
* modifying influence of transversions and transitions, since the probability of their occurring differs, especially between related species.
diff --git a/_posts/oldblogarchive/2006-11-18-development.markdown b/_posts/oldblogarchive/2006-11-18-development.markdown
index e388e830..6d83adcb 100644
--- a/_posts/oldblogarchive/2006-11-18-development.markdown
+++ b/_posts/oldblogarchive/2006-11-18-development.markdown
@@ -36,9 +36,9 @@ iConvert Images
Bebetes Project
Along with 5 others...
Les Fourmis
-
+
Projet Regulation
- Ok. This isn't code. But Amélie Véron and I spent a lot of time on the computer for it! It's an attempt at modeling part of _E. coli_'s global regulation, using a tool called [Genetic Network Analyzer](http://www.inrialpes.fr/helix/logic_GNA_mn.html), developed at Helix Inria. We did this as part of a fourth-year project at Insa de Lyon. (more info in the pdf file's intro). [This is it](/attic/dev/gna2003.pdf).
+ Ok. This isn't code. But Amélie Véron and I spent a lot of time on the computer for it! It's an attempt at modeling part of _E. coli_'s global regulation, using a tool called Genetic Network Analyzer, developed at Helix Inria. We did this as part of a fourth-year project at Insa de Lyon. (more info in the pdf file's intro). [This is it](/attic/dev/gna2003.pdf).
[Timepark](/attic/dev/timepark)
2003-2004 first semester project: an ODE-based modeling framework (C++) and graphical end-user app (Obj-C).
diff --git a/_posts/oldblogarchive/2006-11-18-timepark.markdown b/_posts/oldblogarchive/2006-11-18-timepark.markdown
index e8266461..3c4730c2 100644
--- a/_posts/oldblogarchive/2006-11-18-timepark.markdown
+++ b/_posts/oldblogarchive/2006-11-18-timepark.markdown
@@ -16,7 +16,7 @@ categories:
## Timepark
-Development report (for my school, Insa de Lyon). [Timepark-report January 2004](http://yannick.poulet.org/dev/timepark_report-jan2004.pdf).
+Development report (for my school, Insa de Lyon).
@@ -46,18 +46,16 @@ An open source modeling and simulation framework, used as the backend for Timepa
-
+
* An object's position is defined by its (x,y,z) properties; object classes may inherit these and define additional properties.
-
+
* A property's value can be defined by ordinary differential equations (ODEs).
-
+
* A property can evolve differently depending on the system's state through the use of control statements which are functions of any of the system's objects' properties (e.g., if _light is green_ then _d(x)/dt = 10_ else _d(x)/dt = 0_).
Technologies: C++ STL, Flex/Yacc, Xerces, OpenGL
Download Source and documentation soon...
-
-
diff --git a/data/supergene_introgression/gt.vcf.gz/index.html b/data/supergene_introgression/gt.vcf.gz/index.html
index 05274b4f..49b4a694 100644
--- a/data/supergene_introgression/gt.vcf.gz/index.html
+++ b/data/supergene_introgression/gt.vcf.gz/index.html
@@ -9,7 +9,7 @@
DAY 1 EXERCISES
First - of all let's login to Vital-IT infrastructure. Each of you received a + of all let's login to Vital-IT infrastructure. Each of you received a user name to use in the secure connection command below.
$ ssh username@prd.vital-it.ch
Now we are connected to a front-end node (prd.vital-it.ch) that can only be used to submit jobs to the Vital-IT cluster. For this practical we will be using - 2 big-memory machines of the UNIL Department of Ecology and Evolution -(dee-serv01 and dee-serv02). Half of you will connect to one of these -machines and half to the other. Use the machine name you were assigned + 2 big-memory machines of the UNIL Department of Ecology and Evolution +(dee-serv01 and dee-serv02). Half of you will connect to one of these +machines and half to the other. Use the machine name you were assigned below.
$ ssh dee-serv0X
Once you get there check where you are with the command printing the current directory.
$ pwd
Alternatively, - use the "echo" command to find out the address of your home directory. -If you did not change your directory upon login this address should be + use the "echo" command to find out the address of your home directory. +If you did not change your directory upon login this address should be the same as when you did "pwd".
$ echo $HOME
If you forgot which username you used, you can always check with the command below.
$ whoami
List file and directories in your location.
$ ls
Wherever - you are located there is always an easy way to get back to home + you are located there is always an easy way to get back to home directory. Just type "cd" (change directory) without arguments.
How to get home?
$ cd
To see which programs you can run, simply press the TAB key twice.
$ TAB TAB
Possible - options are the program files found in locations specified in $PATH. + options are the program files found in locations specified in $PATH. What are these locations? Look at $PATH with "echo" command:
$ echo $PATH
Right now your $PATH is missing the software we will need. For this one would typically do the following:
$ export PATH="$PATH:/some/additional/software/location"
Vital-IT - have prepared something that does this for you. So that it happens + have prepared something that does this for you. So that it happens automatically when you login, you’ll need to edit the .bashrc in your home directory.
To see your .bashrc file with the command "ls" we need to add additional arguments, as this file is hidden by default.
$ ls -lah
Browse the contents of .bashrc using command "less".
$ less .bashrc
Does your .bashrc contain the following lines?
source /mnt/common/R-BioC/R-BioC.bashrc
source /mnt/common/DevTools/DevTools.bashrc
source /mnt/common/UHTS/UHTS.bashrc
These - lines add to your $PATH the locations and configurations of R, and of - Ultra High Throughput Sequencing (UHTS) applications. If they are + lines add to your $PATH the locations and configurations of R, and of + Ultra High Throughput Sequencing (UHTS) applications. If they are missing from your .bashrc append them to .bashrc using ">>":
$ echo "source /mnt/common/R-BioC/R-BioC.bashrc" >> .bashrc
$ echo "source /mnt/common/DevTools/DevTools.bashrc" >> .bashrc
$ echo "source /mnt/common/UHTS/UHTS.bashrc" >> .bashrc
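Running the three `echo ... >> .bashrc` commands a second time would add the same lines again. A minimal sketch of a guard against that, assuming a POSIX shell with `grep`; the helper name `append_once` is hypothetical and not part of the Vital-IT setup:

```shell
# Hypothetical helper (not provided by Vital-IT): append a line to a file
# only if that exact line is not already present.
append_once() {
    line="$1"
    file="$2"
    # -q: quiet, -x: match the whole line, -F: fixed string (no regex)
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}
```

For example, `append_once "source /mnt/common/UHTS/UHTS.bashrc" ~/.bashrc` can be repeated safely; the line ends up in the file only once.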
Now please logout and then ssh to the server again. Check how your $PATH has changed:
$ echo $PATH
If you are not familiar with file and folder operations try the following commands.
Make a new empty folder.
$ mkdir newfolder
Go to this folder.
$ cd newfolder
Go one folder back.
$ cd ..
$ rmdir newfolder
Make the folder again
$ mkdir newfolder
Create a file in that folder by redirecting printed output of "echo" command to a file.
$ echo "Some text" > newfolder/file.txt
Try removing the folder again
$ rmdir newfolder
Q: Why does this not work?
Try command "rm" (removes files and folders)
$ rm newfolder
This still does not work. Check in the manual (type "man" before the command) to see which parameters to put.
$ man rm
Files - can be edited locally on Vital-IT (using nano or vi or emacs), or on -your laptop using a text editor of your choice (Aquamacs, -TextWrangler... NOT Microsoft Word!). To edit a file locally you must + can be edited locally on Vital-IT (using nano or vi or emacs), or on +your laptop using a text editor of your choice (Aquamacs, +TextWrangler... NOT Microsoft Word!). To edit a file locally you must first download it from Vital-IT. For this you can use scp (or Cyberduck or something like FileZilla).
For example, if you choose scp do the following locally on your computer:
$ mkdir Scripts
$ scp username@prd.vital-it.ch:/location/of/Scripts/MyScript.rb Scripts/
Once the script has been modified, upload it back to Vital-IT.
$ scp Scripts/MyScript.rb username@prd.vital-it.ch:/location/of/Scripts/
We will be working on Solenopsis invicta (the red fire ant) using Illumina DNA and RNA-seq reads. For the official release of de novo genome assembly (Wurm et al. 2011, PNAS) we combined Illumina and 454 technologies in a hybrid assembly approach. For this course we will use only Illumina because it has become clear that it is possible to perform de novo genome assembly using only this technology. Furthermore, we are only considering for this practical a very small subset (less than 5%) of the fire ant genome.
Please form groups of two (one more computational; one less computational). From now on you will use only one access to Vital-IT per group.
Login to Vital-IT. The - command below connects first to the prd server and then to the dee + command below connects first to the prd server and then to the dee server. We are only allowed to use ssh to connect to prd, but cannot use it for calculations.
$ ssh -t username@prd.vital-it.ch ssh dee-serv0X
According - to the Vital-IT rules, no calculations can be carried out in the home -directory. We will use /scratch/cluster/weekly/ for all practicals -(files will be deleted after one week!). Create there a directory named -according to your user-name and change from your home directory to this + to the Vital-IT rules, no calculations can be carried out in the home +directory. We will use /scratch/cluster/weekly/ for all practicals +(files will be deleted after one week!). Create there a directory named +according to your user-name and change from your home directory to this new directory.
$ mkdir /scratch/cluster/weekly/username
$ cd /scratch/cluster/weekly/username
Extract the files required for today's practical.
$ unzip /scratch/cluster/monthly/oribagro/summer2012_Oksana.zip
We will use the FastQC package - installed locally on your computer. We will analyse files located in -DNA-seq/Raw/ and DNA-seq/Nr/. They are big. So please take a copy of these files from provided USB key or from the local web server (zipped as DNA-seq.zip). [probably http://192.168.167.32 ]
$ ls DNA-seq/Raw
$ ls DNA-seq/Nr
Our - goal is to make decisions on de novo assembly strategy based on FastQC + installed locally on your computer. We will analyse files located in +DNA-seq/Raw/ and DNA-seq/Nr/. They are big. So please take a copy of these files from provided USB key or from the local web server (zipped as DNA-seq.zip). [probably ]
$ ls DNA-seq/Raw
$ ls DNA-seq/Nr
Our + goal is to make decisions on de novo assembly strategy based on FastQC quality report. Open FastQC and open files to analyse. Process first Raw - files, which are subsets of Illumina Hi-seq lanes as they came out of + files, which are subsets of Illumina Hi-seq lanes as they came out of the sequencer.
Q: What do the file names mean?
Q: Do you think both lanes should be used for assembly?
Q: Do we need to trim or filter reads?
Q: Which information is important to take the decision about trimming/filtering?
Q: How can you explain a significant drop in the quality in the beginning of the reads of 2nd pair members of lane 7?
Process Nr reads now. Nr means “Non-redundant”: reads were processed to remove exact duplicates.
Q: Do you see a big change in duplications levels?
Q: What can be the reason for that (consider information given during presentation)?
We - will use FASTX-Toolkit to implement filtering/trimming decided in the -previous step. This part takes place at Vital-IT. Process Raw or Nr -(non-redundant) reads and output "clean" files to -/scratch/cluster/weekly/username/DNA-seq/Clean. Toolkit allows us to do + will use FASTX-Toolkit to implement filtering/trimming decided in the +previous step. This part takes place at Vital-IT. Process Raw or Nr +(non-redundant) reads and output "clean" files to +/scratch/cluster/weekly/username/DNA-seq/Clean. Toolkit allows us to do different types of filtering/trimming.
Find out how to trim reads based on coordinates:
$ fastx_trimmer -h
Find out how to trim/filter reads based on quality:
$ fastq_quality_trimmer -h
Use - fastx_trimmer or fastq_quality_trimmer on each file individually + fastx_trimmer or fastq_quality_trimmer on each file individually (replace first base -f and/or last base -l values to desired trimming in - the command below). Or alternatively use a provided launch script that + the command below). Or alternatively use a provided launch script that will do the job on all files at once.
$ fastx_trimmer -f xxxx -l yyyy -i DNA-seq/Nr/101104_s_7_1.subset.fastq -o DNA-seq/Clean/101104_s_7_1.subset.fastq
$ fastx_trimmer -f xxxx -l yyyy -i DNA-seq/Nr/101104_s_7_2.subset.fastq -o DNA-seq/Clean/101104_s_7_2.subset.fastq
Ideally we want to remember what exactly was done to Raw data. A good way to achieve this is by doing - all processing using bash wrapper scripts. You have an example of such -script in Scripts folder. Another advantage of using + all processing using bash wrapper scripts. You have an example of such +script in Scripts folder. Another advantage of using scripts that execute the command on each input file automatically, is to handle large data sets comprised of multiple files.
$ less Scripts/run_fastx_trimmer.sh
This - script should be launched in project directory containing DNA-seq + script should be launched in project directory containing DNA-seq directory in it. You can use vi or nano to modify this script on Vital-IT, - or if you do not know how to use these, you can download this file to -your local computer and modify using your preferred text editor to do -what you judge necessary as trimming/filtering. Hint: you + or if you do not know how to use these, you can download this file to +your local computer and modify using your preferred text editor to do +what you judge necessary as trimming/filtering. Hint: you need to modify the line that launches fastx_trimmer command by adjusting - trimming coordinates (-f xxx and -l yyy specifying the first and the + trimming coordinates (-f xxx and -l yyy specifying the first and the last base respectively). If you forgot how to edit files, check the end of the UNIX introduction.
To launch the script you should go to the following folder on Vital-IT:
$ cd /scratch/cluster/weekly/username
$ Scripts/run_fastx_trimmer.sh
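The contents of Scripts/run_fastx_trimmer.sh are not reproduced here, so the following is only a rough sketch of what such a wrapper might look like, not the course script itself. It assumes the DNA-seq/Nr and DNA-seq/Clean layout used above and the `-f`, `-l`, `-i` and `-o` options shown earlier; the function name `trim_all`, the `TRIMMER` variable and the log file name are made up for illustration.

```shell
# Hypothetical sketch; NOT the course's Scripts/run_fastx_trimmer.sh.
# Usage: trim_all FIRST_BASE LAST_BASE [IN_DIR] [OUT_DIR]
trim_all() {
    first="$1"
    last="$2"
    in_dir="${3:-DNA-seq/Nr}"
    out_dir="${4:-DNA-seq/Clean}"
    trimmer="${TRIMMER:-fastx_trimmer}"   # overridable, e.g. for a dry run
    mkdir -p "$out_dir" || return 1
    for fq in "$in_dir"/*.fastq; do
        [ -e "$fq" ] || continue          # no matching files: skip the literal pattern
        out="$out_dir/$(basename "$fq")"
        # Record the exact command so the processing can be retraced later
        echo "$trimmer -f $first -l $last -i $fq -o $out" >> "$out_dir/trim_commands.log"
        "$trimmer" -f "$first" -l "$last" -i "$fq" -o "$out" || return 1
    done
}
```

Called from the project directory as, say, `trim_all 5 90`, this would trim every FASTQ file in DNA-seq/Nr with the same coordinates and leave a record of the commands next to the cleaned files. The values 5 and 90 are placeholders, not recommended settings.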
A few definitions are important for de novo assembly: contigs (contiguous sequences) and scaffolds, illustrated in the figure - below. A genome assembly consists of hundreds to thousands of + below. A genome assembly consists of hundreds to thousands of scaffolds.

We - will use SOAPdenovo for genome assembly. Depending on genome -characteristics different software might be the most appropriate. For + will use SOAPdenovo for genome assembly. Depending on genome +characteristics different software might be the most appropriate. For the red fire ant data in 2009 SOAPdenovo was the best performing - assembler of Illumina reads. To keep track of our actions and ensure -the reproducibility of all steps we will rely on bash scripts to run -assemblies. A usual approach to Illumina assembly is to do multiple -assemblies using different combinations of data quality -trimming/filtering and different assembler parameters. Due to time and + assembler of Illumina reads. To keep track of our actions and ensure +the reproducibility of all steps we will rely on bash scripts to run +assemblies. A usual approach to Illumina assembly is to do multiple +assemblies using different combinations of data quality +trimming/filtering and different assembler parameters. Due to time and resource constraints, each pair of students will only perform 2 or 3 assemblies.
Illumina - assemblers rely on de Bruijn graph constructed from K-mers (K-length + assemblers rely on de Bruijn graph constructed from K-mers (K-length words) of all reads. This makes K-mer length a key software parameter to optimise.
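To make "K-length words" concrete: a read of length L contains L - K + 1 overlapping K-mers. The sketch below lists them for a single read using `awk`; the function name `kmers` is made up for illustration and has nothing to do with how SOAPdenovo builds its graph internally.

```shell
# Illustrative only: print every K-mer (K-length word) of one read.
# Usage: kmers READ K
kmers() {
    awk -v read="$1" -v k="$2" 'BEGIN {
        # Slide a window of width k along the read, one base at a time
        for (i = 1; i + k - 1 <= length(read); i++)
            print substr(read, i, k)
    }'
}
```

For instance, `kmers ACGTAC 3` prints ACG, CGT, GTA and TAC, one per line. A larger K yields fewer, more specific words per read.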
SOAPdenovo package consists of four programs: pregraph, contig, map and scaff. Assembly with paired end reads involves the use of all four programs. - Command "all" allows executing full SOAPdenovo package easily. We -basically need to specify "SOAPdenovo all -s config_file -o + Command "all" allows executing full SOAPdenovo package easily. We +basically need to specify "SOAPdenovo all -s config_file -o output_prefix".
$ cd /scratch/cluster/weekly/username/SOAPdenovo/Assembly
This folder contains two files “conf01-lanes47_maplen60” and “Run_conf01-RL200D.sh”.
$ less conf01-lanes47_maplen60
$ less Run_conf01-RL200D.sh
Don’t run anything yet! Config - file is created according to a format defined in SOAPdenovo -requirements. Run_conf01-RL200D.sh will use config file to run -SOAPdenovo with the specified parameters and config files and output -results to folders named according to parameter values.
Check the explanations of commands and config file specifications at http://soap.genomics.org.cn/soapdenovo.html .
Modify conf01-lanes47_maplen60. You need at least to specify the correct location of input fastq files that you want to use (replace username). Optionally you can change some of the parameters.
Q: Can you trim reads within SOAPdenovo config file?
Modify Run_conf01-RL200D.sh. Use K-mer values of at least 1/3 of read length. Do not put more than three K-mer values as it will increase run time (Please do not run more than one assembly at a time - we are a big group of students sharing few compute resources).
Q: What are the different parameters used to run SOAPdenovo in Run_conf01-RL200D.sh
After you have edited the script, make sure you are in the correct folder and launch it:
$ ./Run_conf01-RL200D.sh
After the assembly is finished examine the contents of output folders. Look at the contents of the LOG file from the assembly
$ less LOG
Files out.contig and out.scafSeq respectively contain FASTA format contig and scaffold sequences.
Q: What does asm_flags=3 in the config file mean?
There are multiple versions of SOAPdenovo. To see them, do:
$ SOAPdenovo TAB TAB
Q: Why are there several versions? When is it dangerous to use SOAPdenovo-127mer?
A - common way to select the optimal assembly strategy is to look at -various types of statistics, like total number of assembled base pairs, + file is created according to a format defined in SOAPdenovo +requirements. Run_conf01-RL200D.sh will use config file to run +SOAPdenovo with the specified parameters and config files and output +results to folders named according to parameter values.
Check the explanations of commands and config file specifications at soap.genomics.org.cn/soapdenovo.html .
Modify conf01-lanes47_maplen60. You need at least to specify the correct location of input fastq files that you want to use (replace username). Optionally you can change some of the parameters.
Q: Can you trim reads within SOAPdenovo config file?
Modify Run_conf01-RL200D.sh. Use K-mer values of at least 1/3 of read length. Do not put more than three K-mer values as it will increase run time (Please do not run more than one assembly at a time - we are a big group of students sharing few compute resources).
Q: What are the different parameters used to run SOAPdenovo in Run_conf01-RL200D.sh
After you have edited the script, make sure you are in the correct folder and launch it:
$ ./Run_conf01-RL200D.sh
After the assembly is finished examine the contents of output folders. Look at the contents of the LOG file from the assembly
$ less LOG
Files out.contig and out.scafSeq respectively contain FASTA format contig and scaffold sequences.
Q: What does asm_flags=3 in the config file mean?
There are multiple versions of SOAPdenovo. To see them, do:
$ SOAPdenovo TAB TAB
Q: Why are there several versions? When is it dangerous to use SOAPdenovo-127mer?
A + common way to select the optimal assembly strategy is to look at +various types of statistics, like total number of assembled base pairs, number of scaffolds, N50 length, maximum scaffold length etc.
A script distributed with the assembly software Abyss called "abyss-fac" allows generating this kind of statistics.
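For orientation, N50 is commonly defined as the length of the sequence at which, going from longest to shortest, the running total first reaches half of all assembled bases. A minimal sketch of that definition using `awk` and `sort` follows. The function name `n50` is hypothetical, and real tools may differ in details such as length cutoffs, so do not expect identical numbers.

```shell
# Sketch of the usual N50 definition; not a replacement for abyss-fac.
# Usage: n50 FASTA_FILE [MIN_LENGTH]
n50() {
    # Step 1: one length per sequence (sequences may span several lines)
    awk -v min="${2:-0}" '
        /^>/ { if (len >= min && len > 0) print len; len = 0; next }
             { gsub(/[ \t\r]/, ""); len += length($0) }
        END  { if (len >= min && len > 0) print len }
    ' "$1" |
    # Step 2: longest first
    sort -rn |
    # Step 3: accumulate until half of the total is reached
    awk '
        { lens[NR] = $1; total += $1 }
        END {
            running = 0
            for (i = 1; i <= NR; i++) {
                running += lens[i]
                if (running * 2 >= total) { print lens[i]; exit }
            }
        }
    '
}
```

With sequences of 8, 6, 4 and 2 bases (20 in total), the running total reaches 10 at the second sequence, so the function prints 6.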
Run abyss-fac script on results on one of your scaffolds files (replace the last assembly folder with the one you generated).
$ cd /scratch/cluster/weekly/username/SOAPdenovo/Assembly/conf01_K35_R_L200_D
$ abyss-fac out.scafSeq
Now run program called seqstat to generate statistics.
$ seqstat out.scafSeq
Q: Why do some numbers of seqstat differ from those generated with abyss-fac?
Q: Which do you find the most appropriate to use in selecting best assembly?
We - can compare statistics between contig fasta and scaffold fasta using a + can compare statistics between contig fasta and scaffold fasta using a cutoff equal to the value of -L parameter used in the assembly.
Run - abyss-fac script on scaffolds again specifying -t 200 (the value of -L + abyss-fac script on scaffolds again specifying -t 200 (the value of -L used in the assembly) as a cutoff for minimum scaffold length.
$ abyss-fac -t 200 out.scafSeq
Do the same for contigs
$ abyss-fac -t 200 out.contig
Q: Why is the number of bp not the same?
In order to submit a newly assembled genome to a public database, one needs to submit contig sequences and an AGP file specifying how these sequences are arranged in scaffolds. Scaffolds are built from contigs (contiguous sequences) that are joined by stretches of N bases using the information from paired reads with known insert size. Unfortunately SOAPdenovo, like most other de Bruijn graph based programs, does not provide such output and reports a contig file with all sequences (regardless of inclusion in scaffolds). We can generate an AGP file containing only contigs that were used for scaffold construction by defining contigs based on stretches of N sequences within scaffolds.
$ /scratch/cluster/weekly/username/Scripts/fasta2agp.pl -f out.scafSeq -p out
This will generate files out.contigs.fa and out.agp. Run abyss-fac on the new contigs file:
$ abyss-fac -t 200 out.contigs.fa
Now the total number of bp is the same as in scaffold fasta file.
Have a look at AGP file. Read about format specifications at the following link: http://www.ncbi.nlm.nih.gov/projects/genome/assembly/agp/AGP_Specification.shtml
$ less out.agp
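The idea of defining contigs from stretches of N within scaffolds can be sketched in a few lines of awk. This is an illustration only and not the code of fasta2agp.pl: the function name `scaffold_pieces` and its five-column output are assumptions, and a real AGP file has more columns (see the specification linked above). Each scaffold is cut into alternating pieces, labelled W for contig sequence and N for a gap, with 1-based coordinates on the scaffold.

```shell
# Hypothetical helper: split scaffolds at runs of N.
# Output columns: scaffold, piece number, start, end, type (W = contig, N = gap)
# Usage: scaffold_pieces FILE   (reads stdin when no file is given)
scaffold_pieces() {
    awk '
        function emit_pieces(   pos, part, rest, type, stop) {
            if (name == "") return
            pos = 1; part = 0; rest = toupper(seq)
            while (length(rest) > 0) {
                # the piece at the front is either a run of N or a run of non-N
                if (match(rest, /^N+/)) type = "N"
                else { match(rest, /^[^N]+/); type = "W" }
                part++
                stop = pos + RLENGTH - 1
                printf "%s\t%d\t%d\t%d\t%s\n", name, part, pos, stop, type
                pos = stop + 1
                rest = substr(rest, RLENGTH + 1)
            }
        }
        /^>/ { emit_pieces(); name = substr($1, 2); seq = ""; next }
             { gsub(/[ \t\r]/, ""); seq = seq $0 }
        END  { emit_pieces() }
    ' "${1:--}"
}

# Toy scaffold: two 4 bp contigs joined by a gap of four N
printf '>s1\nACGTNNNNAC\nGT\n' | scaffold_pieces
```

Summing the lengths of the W pieces gives the bp that sit in contigs actually used for scaffolding. The repeated `substr` calls make this slow on very long scaffolds, so treat it as a way to understand the output rather than a replacement for the script.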
Often one would generate multiple assemblies to refine the strategy (many more than three). Although we have generated 2-3 assemblies, we will act as if we had a lot and process statistics from these files into R-readable format. We will save the file in folder called SOAPdenovo/Statistics.
$ cd /scratch/cluster/weekly/username/SOAPdenovo/Assembly
$ ./get.stats.sh > stats
Now use scp or other means to get file “stats” locally on your computer to use with R. If you obtained no statistics, modify the read.table command in what follows by replacing “stats” with “http://www.antgenomes.org/~yannickwurm/tmp/manyAssemblyStats.txt”.
Let's use R to find the best assembly in terms of quantitative metrics. Locally on your computer:
$ R
stats <- read.table("stats", header=T, sep="\t")
## get K-mer values from file name
myKmers <- substr(stats$file, 8,10)
## get config names
myConf <- as.numeric(substr(stats$file, 5,6))
## get all color names containing string "dark"
allColors <- colors()[grep("dark",colors())]
## randomly sample the necessary number of colors
myColors <- sample(allColors, length(unique(myConf)))
## make vector of colors per assembly setting
confColors <- myColors[as.factor(myConf)]
configName <- paste("config", as.character(unique(myConf)), sep="")
## Let's plot this:
myTitle <- "N50 vs Number of contigs >= 200bp\nSize is proportional to total assembly bp"
plot(x = stats$n.200,
y = stats$N50,
cex = (stats$sum/1000000), # circle size
col = confColors,
pch = 19, # symbol "filled circle"
main = myTitle,
xlab = "Number of Contigs >= 200 bp",
ylab = "N50")
legend("topright", configName, col=unique(confColors), pch=19)
text(stats$n.200, stats$N50, myKmers, cex=0.7, col="white")
myTitle <- "N50 vs Number of contigs >= N50\nSize is proportional to total assembly bp"
plot(x = stats$n.N50,
y = stats$N50,
cex = (stats$sum/1000000),
col = confColors,
pch = 19,
main = myTitle,
xlab = "Number of Contigs >= N50",
ylab ="N50")
legend("topright", configName, col=unique(confColors), pch=19)
text(stats$n.N50, stats$N50, myKmers, cex=0.7, col="white")
## Finally make a plot with the values we expect to get.
## The same subset of data represented 3066758 bp across eight
## scaffolds with N50 of 1867187
myTitle <- "N50 vs Number of contigs >= 200bp\nSize is proportional to total assembly bp"
plot(x = c(stats$n.200, 8),
y = c(stats$N50, 1867187),
cex = c(stats$sum, 3066758)/1000000,
col = c(confColors, "red"),
pch = 19,
main = myTitle,
xlab = "Number of Contigs >= 200 bp",
ylab ="N50")
legend("topright",
c(configName, "Official"),
col=unique(c(confColors,"red")),
pch=19)
Q: Which assembly is the best in terms of quantitative metrics?
Q: Why is there a significant difference from the official release?
Independently obtained information can provide the most reliable measures of whether or not your assembly is accurate.
We will use the Trinity software to assemble a transcriptome from single-end Illumina reads:
$ cd /scratch/cluster/weekly/username/Trinity
First, check that your .bashrc contains the following line (if not, please add it)
source /mnt/common/DevTools/DevTools.bashrc
Have a look at options of Trinity.pl:
$ Trinity.pl -h
We have RNA-seq data for 3 conditions: males, queens and workers, prefixed M, Q and W, respectively. Each condition is present in 4 replicates; numbers in file names indicate the ant colony from which the sample was taken. Due to time constraints we will restrict de novo assembly to a single RNA-seq file. Choose one RNA-seq file in /scratch/cluster/weekly/username/RNA-seq/Raw/ and run Trinity on it. You can store the file name in the variable rawRNAseqFile:
$ rawRNAseqFile=/scratch/cluster/weekly/username/RNA-seq/Raw/Q415.subset.fastq
$ Trinity.pl --seqType fq --kmer_method jellyfish --max_memory 25G --CPU 1 --bflyCalculateCPU --single $rawRNAseqFile > stdout 2> stderr
If you do not manage to run Trinity, then copy its output as described below (make sure you are in the directory called “Trinity”):
$ mkdir -p trinity_out_dir
$ cp /scratch/cluster/monthly/oribagro/summer2012/Trinity/trinity_out_dir/Trinity.fasta trinity_out_dir/
Find out how many transcripts were assembled using “seqstat” or “abyss-fac”.
$ seqstat trinity_out_dir/Trinity.fasta
Q: How many genes are represented in the fasta file? Are you sure?
Q: What are the different steps of the Trinity pipeline?
You will align transcripts assembled with Trinity to the genomic assembly that is best according to quantitative quality metrics. We will use a script that uses the BLAT aligner to generate alignment statistics and to construct network files. Network files include all cases where a transcript is partially aligned on different scaffolds (non-overlapping, complementary parts of the transcript). Subsequently we will visualise the order inferred from such transcript alignments and the order of contigs in scaffolds. See more details at https://github.com/ksanao/TGNet .
Generate an AGP file with the scaffold structure (if you haven't already done so for the best assembly):
$ bestAssembly=conf01_K35_R_L200_D
$ runDir=/scratch/cluster/weekly/username
$ cd "$runDir"/SOAPdenovo/Assembly/"$bestAssembly"
$ "$runDir"/Scripts/fasta2agp.pl -f out.scafSeq -p out
$ cd "$runDir"
We will use a script that wraps the necessary commands: scaffold and contig files will be processed individually and result in two networks. Have a look in the script. The second paragraph of this script specifies that it will not execute unless provided a directory of best assembly. Try launching it without arguments.
$ less Scripts/run_TGNet.sh
$ Scripts/run_TGNet.sh
Finally launch the script providing it the directory with best assembly.
$ Scripts/run_TGNet.sh "$bestAssembly"
$ cd SOAPdenovo/Validation/"$bestAssembly"
The log.contigs and log.scaffolds files summarize what was done.
These files mention additional output files that will help you examine the following:
$ less out.contig.blat.stat
$ less out.scaffold.blat.stat
Q: How do you explain that not all transcripts match over their full length?
Q: How do the statistics of mapping to scaffold and mapping to contigs differ? Why? What could be done to improve these numbers?
Choose one of the transcripts in the out.scaffold.nw network file (column 2, tab delimited) and look up the alignments for this transcript in the filtered and non-filtered BLAT output:
$ myTranscript=`tail -1 out.scaffold.nw | cut -f2`
$ echo $myTranscript
$ grep $myTranscript out.scaffold.psl
$ grep $myTranscript out_filtered.scaffold.psl
Q: What does it mean when a transcript aligns to multiple scaffolds or contigs? How can this information be useful?
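Rather than inspecting one transcript at a time, you can list every transcript that has alignments on more than one scaffold or contig. The sketch below assumes the .psl files follow the standard tab-delimited PSL layout, in which column 10 holds the query (transcript) name and column 14 the target (scaffold or contig) name. The function name `multi_target_queries` is hypothetical and not part of TGNet.

```shell
# Hypothetical helper: transcripts aligned to more than one target.
# Output columns: transcript name, number of distinct targets
# Usage: multi_target_queries FILE.psl
multi_target_queries() {
    awk -F '\t' '
        # alignment lines start with a number; this skips an optional PSL header
        $1 ~ /^[0-9]+$/ && NF >= 21 {
            key = $10 SUBSEP $14            # query name + target name
            if (!(key in seen)) {           # count each query/target pair once
                seen[key] = 1
                targets[$10]++
            }
        }
        END {
            for (q in targets)
                if (targets[q] > 1) print q "\t" targets[q]
        }
    ' "$1" | sort
}

# Example (run inside the Validation folder of your best assembly):
# multi_target_queries out_filtered.scaffold.psl
```

Comparing the list obtained from the filtered and the non-filtered file gives a quick impression of how much the filtering step removes.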
Now download the whole /scratch/cluster/weekly/username/SOAPdenovo/Validation folder (including the network files you just created in "$bestAssembly" and TGNet.props) to your local computer for Cytoscape visualisation.
Run Cytoscape and import network files and visual properties file first for contigs network, then for scaffolds as described below.
Importing network
Importing node attributes
Importing visual style
Network Layout
Save your session (file.cys).
Q: Can transcript alignments confirm scaffolding in contigs network?
Q: Can your scaffolds be extended based on transcript alignment displayed in scaffold network?
Q: Do you have any problematic regions, potentially misassembled?
Q: What else can Cytoscape be used for?