Revision as of 13:43, 11 May 2015 by Khoueiry (talk | contribs) (Report)

Sign up for the event Let us know whether you plan to come by filling in this Doodle poll

What? Open Data - debate the pros and cons

When? Tuesday 5th May, 2015, 18:30-20:30

Where? BioQuant room 043, Im Neuenheimer Feld 267, 69120 Heidelberg (How to get there.)

Who? A team of people working at several different life-science organisations within Heidelberg

To keep track of people you've met at HUB, join the HUB LinkedIn group

Get involved in organising this and other HUBs!


Rough time | Activity | MC
18:30-18:35 | Welcome and introduction | Florian
18:35-18:40 | Flash talk on Open Data | Andy Hufton, Managing Editor, Scientific Data
18:40-18:45 | Flash talk about the SourceData initiative | Sara El-Gebali, SourceData Biocurator
18:45-18:55 | Ice-breaker | Matt
18:55-19:20 | Deciding important topics in Open Data using Open Space Technology | Florian & Pierre
19:20-19:50 | Group debates on the defined topics | Florian & Pierre
19:50-20:10 | Short summary per group to everyone | Florian
20:30 | Drinks in Cafe Botanik |

Task List

A list of tasks that need to be done on or before the meeting day. Just sign your name to say that you'll do a task, and the same again when the task is done.

To arrange before evening | Person responsible | Done
List of registered people | |
Drinking water and cups? | Matt | Matt
Signs to help people find the room | Matt |
Create the poster | Florian | Florian
Print and display poster where relevant | BioQuant: Matt; EMBL: Holger; DKFZ: ; ZMBH: ; Mensa: | BioQuant: Matt; EMBL: Holger
Advertise on LinkedIn | Matt | Matt
Projector | Already in room |
Email advertising the event | Florian | Florian
Send reminder email the day of the event | Matt | Matt

List of stationery/things for organisers to bring to the meeting
stickers for name badges
lots of (if possible thick) pens for writing on the badges
HUB signs (HUB logo with an arrow)
sellotape to put up signs
post-it notes

To do on the evening | Person responsible
Open the room | Matt
Arrange tables/chairs | Pierre
Hang up signs to the room | Matt
Set up and welcome attendees | Pierre
Tidy and close room | Matt + Pierre
Take participants to Cafe Botanik | Holger

Planning Meetings

Tuesday April 21, 08:00 in Cafe Fresco

Present: Florian, Pierre, Matt


  • focus on open data in science (not, e.g., open government data), including:
    • what experiences do people already have?
    • journal requirements and what these will mean for, e.g., authors and referees
    • we include code and file formats as well as data in 'open data'
  • we'd like to get people to debate the pros and cons
  • suggested programme:
    • Florian to MC, inc. introducing HUB and the programme at the beginning
    • would be nice to have an invited speaker (e.g. Thomas Lemberger from MSB/EMBO, or Andy Hufton, formerly at EMBO and currently at 'Scientific Data'), but this may be too short notice, in which case Florian will give a short talk (5 min)
    • Matt to organise the ice-breaker (10 mins), probably the familiar 'standing on a line'. Start with a general question ('how many ice creams did you eat last week?'), then 'how much experience do you have of open data?', then 'do you think data should always be open?'. This will lead into the rest of the evening.
    • We will then use post-its / 'open-space' (15 + 15 mins) to define questions or positions for debating.
    • Form groups of roughly five people each to debate the questions defined earlier. Each group will have one moderator / time-keeper, with the other people split into a 'pro' and a 'con' side and arguing the point accordingly. We thought this would be a good way to get people to think of all points, rather than just arguing from the side that they're on already. It will require some briefing of the moderators, but we can do this on the night. We will do this in approximately three rounds of ten minutes each. After each round, everybody apart from the moderator should swap to another group (world-cafe style).
    • The moderator should frame the question, keep people on topic, be time-keeper and make sure the pro and con sides have time to speak. (Whilst realising that it's OK if people just want to discuss rather than debate.)
    • At the end, the moderator of each group (or anyone else) should briefly summarise the discussion for everyone else (1 min per group).
  • Logistics:
    • Florian to check if there's a room available at ZMBH. If not, Matt to check BioQuant and then COS
    • Florian to check with Aidan re. suggested invited speakers
    • Florian to be overall MC, inc. introduction to HUB and the evening's programme, and a 5 min talk on open data if no invited speaker is available
    • Matt to MC the ice-breaker
    • Florian and Pierre to MC / time-keep the open-space discussions and the small-group debates.
    • Florian to create the poster
    • Matt to set up the wiki page
    • We decided next week (April 28) is too short notice, and to go for Tuesday May 5 instead (which gives a reasonable gap, 2+ weeks, until the HFYLS/HUB on May 21).

Report


The flash talks came first. Andy took it easy, reading from the back of an envelope to get across his main message: as scientists, we should not ask whether or not our data should be free, because eventually they are all going to be anyway. Rather, we should prepare and present them with public availability in mind. He also talked about the challenges publishers face in presenting research as the amount of data grows, and spoke out in favour of Open Access publishing.

Sara talked about the highly interesting SourceData initiative and its implementation. The basic idea is to make figures searchable by attaching metadata to them: what type of experiment was conducted, which things were observed and which parameters were manipulated. The SourceData initiative aims to make metadata attachment a standard step in the publication process, so that figures as well as text can be searched, which will be a powerful complement to text-mining approaches.
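
As a rough illustration of the idea, searching figures by their attached metadata could look something like the sketch below. This is only a toy under stated assumptions, not SourceData's actual schema or API: the `FigurePanel` class, its field names, the `search_panels` helper and the example records are all hypothetical and made up for this page.

```python
from dataclasses import dataclass, field


@dataclass
class FigurePanel:
    """Hypothetical metadata record for one figure panel (not SourceData's real schema)."""
    figure_id: str
    assay: str                                   # type of experiment conducted
    measured: set = field(default_factory=set)   # things that were observed
    perturbed: set = field(default_factory=set)  # parameters that were manipulated


def search_panels(panels, assay=None, measured=None, perturbed=None):
    """Return the panels matching every criterion that is given (None means 'any')."""
    hits = []
    for panel in panels:
        if assay is not None and panel.assay != assay:
            continue
        if measured is not None and measured not in panel.measured:
            continue
        if perturbed is not None and perturbed not in panel.perturbed:
            continue
        hits.append(panel)
    return hits


# Made-up example records, for illustration only.
panels = [
    FigurePanel("paperA-fig1b", "western blot", {"protein X"}, {"drug Y"}),
    FigurePanel("paperB-fig3a", "microscopy", {"protein X"}, {"gene Z knockdown"}),
    FigurePanel("paperC-fig2c", "western blot", {"protein W"}, {"drug Y"}),
]

# "Which figures show protein X being measured while drug Y was manipulated?"
for panel in search_panels(panels, measured="protein X", perturbed="drug Y"):
    print(panel.figure_id)
```

Keeping "measured" and "perturbed" as separate fields is the point of the sketch: a plain text search for "protein X" and "drug Y" could not tell which of the two was observed and which was manipulated.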

Next came the icebreaker and I must say, I was really surprised at the amount of data people deal with on a regular basis!

When we asked people to put (provocative) statements about Open Data on post-its, it seemed that the previous points had fired people up enough to fill the whole blackboard in no time. This was an ideal start for our debates (actually, rather peaceful discussions), which filled the rest of the evening. At the end, the discussion groups' 'moderators' summed up for the audience the main points and problems that people had raised:

  • Everyone finds the idea of open data in science good. However, there is a lack of motivation, funding and credit for the ones who have to go through the hassle of cleaning up the data, making them accessible, hosting them on the internet etc.
  • One solution that was suggested was that journals demand that data be accessible in a useful manner. Only then will funding bodies understand that a certain amount of money has to be dedicated for this purpose.
  • People also pointed out that open standards and file formats are in many circumstances more important than making all data public.
  • Another question was whether there is such a thing as "bad data" that cannot be used for publication. The bioinformaticians usually said that good data are data that are reproducible or that give good QC scores when run through their pipelines.

I think everyone enjoyed this HUB. The mood was good, and the discussions touched upon many issues that will remain a "hot topic" in bioinformatics and biology.

Here are the 4 categories defined and the different topics discussed in each of them:

[Images: Category 2 | Category 3 | Category 4 | Category 1]