Linux Command Line for NGS - Planning

This page is for planning the course Linux_Command_Line_for_NGS.

Documentation from linux course given by Holger and others at EMBL: File:LinuxCourse.pdf

Course material is being assembled in a GitHub repository available at GitHub IntroCommandLineNGS or from the command line:

git clone

How we'll teach

  • Participants will split into groups to research file types (e.g. by googling them). Each group will then present a file type chosen at random (pull from a hat?). This is a good chance to ask questions and get discussion going among participants.
  • Large emphasis on how to read manual/help pages
  • What the command expects as input – required parameters and optional
  • What will the output of the command be.
  • Each section will include commands that do not work. These could range from “The next command will not work. Why?” to “This will not work for one of 3 reasons, which are a) b) c) – which one is it?” Participants then make the command work.
  • Round up: quiz style summary of day, where participants could create the questions/create an exercise.


  1. Commands should be given as images, so participants cannot copy and paste.
  2. Make sure they know that they should get support to install and set up their computer. Emphasise that they should get in contact with the relevant people at their institutions.
  3. Point out that Galaxy and Taverna are very useful tools: they offer many unix commands in graphical form (merging, concatenating, extracting columns from text files, search and replace), and let you build pipelines to share with others. Stress the importance of reproducible research.
  4. Can ask people to bring cake/luncheon material. Getting people involved HUB style :-)
  5. If weather permits could have bbq afterwards.
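
The text operations named in point 3 (concatenating, extracting columns, merging, search and replace) each map onto a short command. The following is only a sketch of how they might be shown to participants: the file names and the example data are made up, and "merging" is taken here to mean placing columns side by side with paste.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two small tab-separated tables (made-up example data).
printf 'geneA\t10\ngeneB\t20\n' > counts1.tsv
printf 'geneA\t12\ngeneB\t18\n' > counts2.tsv

# Concatenating: stack the files one after the other.
cat counts1.tsv counts2.tsv > stacked.tsv

# Extracting a column: keep only the second field (cut splits on tabs by default).
cut -f 2 counts2.tsv > col2.tsv

# Merging: put the extracted column alongside the first table.
paste counts1.tsv col2.tsv > merged.tsv

# Search and replace: change the "gene" prefix at the start of every line.
sed 's/^gene/transcript/' merged.tsv > renamed.tsv

cat renamed.tsv
```

Each step writes to a new file rather than overwriting its input, so participants can inspect every intermediate result.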

Task List

Note: 'person responsible' means the person who makes sure the task is done - delegation is allowed :-)


Task                                               | Person Responsible | Done
Create email advertisement                         |                    | Yes
Send email advertisement                           |                    | Yes
Create Poster                                      |                    | (Not needed.)
Hang-up Poster                                     |                    | (Not needed.)
Choose participants after registration deadline    |                    | Done
Email participants after registration deadline     | Matt               | Done
Ensure required programs are installed and working | Matt               |
Open Room                                          | Matt               |
Clear up afterwards                                | All of us          |
Close Room                                         | Matt               |

Required programs

(List programs we need to use in the course, so we can check that they're installed and running beforehand.)

Extra info boxes

Task Person Responsible Done
ftp to get a lot of data at once
file with a corrupted end - is there any way to fix it?
sed (to get e.g. the sample name or version number from a file name)
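
For the sed info box, a minimal sketch of pulling a sample name and a version number out of a file name. The naming pattern (`<sample>_v<version>.fastq.gz`), the example file names and the two function names are hypothetical, chosen only for illustration; real data will need a pattern that matches its own naming scheme.

```shell
# Hypothetical naming convention (an assumption): <sample>_v<version>.fastq.gz
# e.g. liver01_v2.fastq.gz

# Print the sample name: the file name with the "_v<digits>.fastq.gz" suffix removed.
sample_name() {
    basename "$1" | sed -E 's/_v[0-9]+\.fastq\.gz$//'
}

# Print the version number: the digits between "_v" and ".fastq.gz".
sample_version() {
    basename "$1" | sed -E 's/^.*_v([0-9]+)\.fastq\.gz$/\1/'
}

sample_name data/liver01_v2.fastq.gz      # prints liver01
sample_version data/liver01_v2.fastq.gz   # prints 2
```

basename strips the directory first, so the sed patterns only ever see the file name itself. A file name that does not follow the assumed pattern is printed unchanged, which is a useful point to raise with participants.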


Please add these to the outline below in the appropriate place, and/or add others.

Interactive Sessions

HUB-style. Please add ideas and content here, including the rough amount of time needed.


(Grainne's draft email, slightly edited, and added the deadlines and something about participant selection, so they don't think it's first come, first served.)

Subject: 'Introduction to the linux command line for NGS analysis', Friday September 20

Often when presented with data generated from high throughput sequencing experiments,
we are expected to use analysis programs/methods/workflows which assume a familiarity
with basic command line usage. This assumption is not always correct, even for a
basic analysis.

To address this, we are organising the one-day course: "Introduction to the command line
for NGS analysis", on Friday September 20 at BioQuant (

What will be covered?

After covering elementary unix operating concepts, you will learn how to apply them to
sequence analysis. You will learn how to run commands on the command line, troubleshoot
when things don't go according to plan, download, view and understand the data formats
required for NGS analysis, manipulate files, and run a basic workflow.
The goal of this course is for you to learn basic concepts that you can take forward and
apply to your own projects.

Who are you?

You have, or will have in the course of your research, data from high throughput sequencing
experiments. You know what the command line is and are keen to learn more, but are getting
frustrated by the seemingly constant stumbling blocks and weird error messages.

Who are we?

This course is put together by participants of Heidelberg Unseminars in Bioinformatics
(HUB,, a participant-driven meeting where people with an interest in
bioinformatics come together to discuss hot topics and exchange ideas.


Please register here by September 1st: The course will be limited to
20 participants, selected to ensure a broad mix of institutes and groups, and to ensure those
who will receive the most benefit can take part. Confirmation of your place on the course
will be sent by September 3rd.

On behalf of Heidelberg Unseminars in Bioinformatics.

Application Accepted

Subject: application accepted - 'Introduction to the linux command line for NGS analysis', Friday September 20

We're happy to inform you that your application for our course 'Introduction to the linux
command line for NGS analysis' has been accepted.

Date: Friday September 20
Time: 10:00 - 18:00
Place: Room 037, BioQuant, Im Neuenheimer Feld 267, 69120 Heidelberg,

Please note that lunch will not be provided. There should be enough of us with a campus card to
help out those who want to eat at the mensa, or there are other options where you can pay in cash,
or of course you can bring your own food.

After the course we will go to Cafe Botanik on the university campus (, for
more informal discussions including about the potential for more advanced NGS analysis courses
in the future.

Best wishes,

on behalf of Heidelberg Unseminars in Bioinformatics (

Application Rejected

Subject: application not accepted - 'Introduction to the linux command line for NGS analysis', Friday September 20

We're sorry to inform you that your application for our course 'Introduction to the linux
command line for NGS analysis' has not been accepted. 

From your responses on the application form, your current level of linux experience means
this course will be too basic for you. However, for this reason we'd be very happy if you
are willing to give up some of your time to act as a teaching assistant, to help out one-on-one
with any linux questions that arise. Please let us know if you're willing to do this. However,
we realise that your time is valuable, so please feel free to say no and still join us
after the course in Cafe Botanik (, around 18:00 (September 20). This will
give you the chance to meet other people doing NGS analysis in the Heidelberg area.

Based on how this course goes, and on the level of interest, we're also considering giving
more advanced NGS courses in the future. Let us know if this is something in which you would
be interested.

Best wishes,

on behalf of Heidelberg Unseminars in Bioinformatics (

Planning Meetings

(Some of these are on the HUBTraining page, mostly those from before the specific topic was decided.)

Tuesday September 3, 2013


  1. Categorise applicants based on our spreadsheet votes: accepted, rejected (too experienced), rejected.
  2. Additional trainers
    1. Should we invite those with too much experience as assistant trainers?
    2. Any other specific HUB people that we can ask? (Agnes, Gideon, ?)
    3. General email to HUB asking for helpers?
  3. Fill out programme.
    1. Drinks in Cafe Botanik rather than having a barbecue? (Much less hassle.)
  4. Assign tasks.


Present: Matt, Holger, Grainne, Jon

Things for particular people to do are indicated by names in bold.

  • choosing course participants
    • will accept all 22 with one or more votes (i.e. not including Peter Beardsley :-))
    • Matt to draft two letters on the wiki
      • registration accepted
      • registration rejected because of too much linux experience for this course. These people will be invited to come along as teaching assistants. We initially thought this might not work if they're just milling around with nothing to do, and because we don't have much time to prep them. But then we decided even just simple linux advice would help, as long as the email accurately represents what we're after and what they'll get out of helping. The email should also mention the possibility of more advanced courses in the future.
  • Additional trainers
    • In general, the four of us should be enough to manage ca. 20 students. But we can ask others if they are happy to help with general linux without too much prep from us (and if we are clear that this is what we're after).
    • Will invite applicants with too much experience for the course to come along as teaching assistants (see above)
    • Matt to ask Agnes and Gideon (HUBers with NGS experience) if they would like to help
    • Decided not to send a general email to HUB to ask for helpers as the lack of concrete reasons beyond being helpful and the possibility of misunderstanding this as a normal or full HUB event might risk putting people off future events where their participation will be more useful.
  • After course barbecue / picnic - because of the hassle of organising barbecue equipment, access to bioquant after 7pm (requires a card), and the lack of fridge space to keep food from 10am to 6pm, we decided to go to Cafe Botanik (on campus) for beer / coffee / food. The course info / emails should emphasise that this is not just for 'a beer' (so that we don't put off non-drinkers), and we should invite rejected applicants (even if they don't want to help during the day).
  • Holger asked if there is a Germany-wide DFG compute cluster that we might be able to use, for this or future courses. Jon said there's one based in Munich, one in Dresden, and regional ones in Karlsruhe etc. These are apparently difficult to access: you need to apply in advance, and software is difficult to set up. Holger suggested that it'd be good if they had an Amazon-like model, for easy scaling of virtual machines based on expected demand. Jon to ask a colleague about this.
  • We should use Markdown to write any text documents we put on github (Jon).
  • Matt to:
    • upload Holger's linux course material to wiki (done: File:LinuxCourse.pdf)
    • check software on relevant bioquant computers and install if necessary.
    • check where programs will run, in case we'll be overloading one computer
      • can students use the bioquant cluster?
      • check bw_grid
      • we'll possibly allow time for checking intensive jobs at the end of the day.
    • may also preload necessary data files, either on the network share or on a publicly accessible ftp site (Grainne pointed out that GitHub only allows 1 GB.)
    • check room and computers on the Wednesday morning (Sept 18), then let the others know if it's worth meeting that afternoon to run through the course. This will give us some time to fix problems and also give us an incentive to have course material finished by then.
  • Grainne to make Github writeable.
  • Matt, Jon and Holger to send Grainne our GitHub credentials.
  • When we test examples, we should note ways that it didn't work (if any), as possible source of error questions for students.
  • Holger will give a short presentation for each section of the linux part, and will lead this section. (Based on an NGS-example version of his existing linux course.)
  • Grainne will prepare most of the exercises for the NGS section (since she has the most NGS experience), for the rest of us to test.
  • For the section on investigation of file formats:
    • split the students into ca. four groups
    • get each group to research one of the 5-6 most popular / useful formats
    • then have each group present what they found to everyone.
    • Should use a flipboard for presentation.
    • aim is to get them away from thinking that files are a black box of data.
    • it is important that at least one of the formats is binary
  • maybe aim to finish by 5, to allow breathing room for overruns and time for discussion.
  • We should give out a questionnaire (anonymous feedback) at the end of the course, but also get verbal feedback (maybe at the pub afterwards)
    • Matt to ask Aidan if he has an existing general evaluation form we can use as a basis for this

Summary of Questionnaire Responses

Generated from the R script in GitHub IntroCommandLineNGS

R --slave --no-save -f questionnaire_responses.R > questionnaire_responses.txt

Key to Numeric Responses

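0 = no response, 1 = not at all, 5 = completely

In each table below, Var1 is the response given and Freq is the number of participants who gave it; the unlabelled first column is just the row number.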

The course fulfilled the advertised objectives

  Var1 Freq
1    4    5
2    5   15

I found the course useful

  Var1 Freq
1    4    6
2    5   14

The course was well organised

  Var1 Freq
1    3    1
2    4    6
3    5   13

Communication before the course was clear and timely

  Var1 Freq
1    3    1
2    4    3
3    5   16

The teaching standard on the course was high

  Var1 Freq
1    3    2
2    4    4
3    5   14

The course content was relevant to my needs

  Var1 Freq
1    3    2
2    4    6
3    5   12

I would recommend the course to colleagues

  Var1 Freq
1    4    6
2    5   14

I enjoyed the course

  Var1 Freq
1    3    2
2    4    4
3    5   14

I would be interested in taking a more advanced course to further improve my NGS skills

  Var1 Freq
1  yes   20

I would be willing to spend X days to attend an advanced course

  Var1 Freq
1  >=1    1
2    2    2
3  2-3    6
4    3    4
5  3-4    1
6  3-5    3
7    4    1
8    5    2

I would like to further improve my command line Linux skills

  Var1 Freq
1  yes   17
2 <NA>    3

I would like to better understand the theory behind NGS analysis

  Var1 Freq
1  yes   18
2 <NA>    2

Holger Helpfulness

  Var1 Freq
1    0    3
2    4    1
3    5   16

Holger Understandability

  Var1 Freq
1    0    3
2    4    1
3    5   16

Matt Helpfulness

  Var1 Freq
1    0    6
2    5   14

Matt Understandability

  Var1 Freq
1    0    7
2    4    1
3    5   12

Grainne Helpfulness

  Var1 Freq
1    0    3
2    4    2
3    5   15

Grainne Understandability

  Var1 Freq
1    0    2
2    3    1
3    4    3
4    5   14

Jon Helpfulness

  Var1 Freq
1    0    7
2    4    1
3    5   12

Jon Understandability

  Var1 Freq
1    0    8
2    4    2
3    5   10

How did you find out about the course

  • by email
  • Internet
  • Through email from EMBL staff
  • My boss forwarded me a mail, which he got via DKFZ mail list
  • email
  • email
  • Colleague
  • internal email
  • email alert
  • Mailing list bioquant
  • email announcement
  • Mail was forwarded by a friend working at Bioquant
  • email
  • Very good
  • email
  • email

Please describe an aspect of the course that you found particularly useful

  • Everything, it's just what I wanted
  • The handout is excellent
  • help of people around
  • Getting understanding of linux
  • I could learn basic of bash/shell programming and aquainting different software packages for NGS analysis was also useful.
  • File formats; script is really useful! Best: running the programs
  • Found out, that what I did until now was OK!
  • Trying fastq, tophat, samtools on data -> very useful to demonstrate practically
  • Introduction to NGS file formats and explanation of their differences
  • Basic introduction to command line combined with some useful algorithms. Advanced course would be great
  • Detailed explanation of all necessary commands. Very good handout!
  • Step by step walkthrough RNAseq analysis. Very informative and well-designed handout
  • Pipe and export. Handout
  • everything was useful
  • Being in a room away from my own computer and lab members so I could concentrate. (Under supervision!)

Please describe an aspect of the course that you did not find useful

  • I know it is not easy, but would be nice to present things in a more interesting way... not easy to stay awake
  • focus on eukaryotes
  • Just nothing
  • TIME! It was too slow in the intro, then too fast at the harder part!
  • In general, everything was useful. However, I did some basic unix command line courses before, so I would prefer we spent less time on that and more on the actual NGS part.
  • Too much time spent on file formats; go through that more quickly and spend more time on the commands, e.g. tophat etc.
  • Interactive presentation about NGS-related data types. Too long and too little information which will stick. Short overview about most important ones w/i the script would probably be more useful
  • The file formats part was a bit dry because we weren't using them yet.

Additional Comments

  • Didn't like the presentations part, focusing on my task intensively really makes it hard to listen and understand other people's presentations
  • Run the commans from one of the terminals to check if it works. :p
  • Very useful, maybe need more time and information
  • It was awesome course. Thanks a lot
  • As I has a course before some infos were redundant - nevertheless it is good to repeat the basics as well. I would be really interested in doing more data analysis
  • Really great course, you just need to spread it out over several days! But since you do it in your free time, some material benefit wouldn't hurt! :-)
  • Printed summary of file formats after presentations would be nice
  • All in all, very nice course! Looking forward to the more advanced one :) Thanks!
  • A short introduction to the course organisers and what and where the work would have been nice. Thanks for put the course together!
  • It would have been nice if you had introduced yourselves in the beginning =)