What? Open Data - debate the pros and cons

When? Tuesday 5th May, 2015, 18:30-20:30

Where? BioQuant room 043, Im Neuenheimer Feld 267, 69120 Heidelberg (How to get there.)

Who? A team of people working at several different life-science organisations within Heidelberg

| Rough time | Activity | MC |
|---|---|---|
| 18:30-18:35 | Welcome and introduction | Florian |
| 18:35-18:40 | Flash talk on Open Data | Andy Hufton, Managing Editor, Scientific Data |
| 18:40-18:45 | Flash talk about the SourceData initiative | Sara El-Gebali, SourceData Biocurator |
| 18:45-18:55 | Ice-breaker | Matt |
| 18:55-19:20 | Deciding important topics in Open Data using Open Space Technology | Florian & Pierre |
| 19:20-19:50 | Group debates on the defined topics | Florian & Pierre |
| 19:50-20:10 | Short summary per group to everyone | Florian |
| 20:30 | Drinks in cafe Botanik | |

The flash talks came first. Andy took it easy, reading from the back of an envelope to get across his main message: as scientists, we should not ask whether or not our data should be free, because eventually they are all going to be anyway. Rather, we should prepare and present them with public availability in mind. He also talked about the challenges a publisher faces in presenting research in the light of increasing amounts of data, and spoke out in favour of Open Access publishing.

Sara talked about the highly interesting SourceData initiative and its implementation. The basic idea is to make figures searchable by attaching metadata to them: what type of experiment was conducted, which things were observed and which parameters were manipulated. The SourceData initiative aims to make metadata attachment a standard step in the publication process, so that not only text but also figures can be searched, which will be a powerful complement to text-mining approaches.
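To make the idea a bit more concrete, here is a minimal sketch of what searching figures by attached metadata could look like. It is only an illustration of the principle described in the talk: the `FigureMetadata` class, its field names, the `search` function and the example records are all made up for this post and are not SourceData's actual schema or tools.

```python
from dataclasses import dataclass, field


@dataclass
class FigureMetadata:
    """Hypothetical metadata record attached to one published figure."""
    figure_id: str
    experiment_type: str                            # what type of experiment was conducted
    observed: set = field(default_factory=set)      # which things were observed
    manipulated: set = field(default_factory=set)   # which parameters were manipulated


def search(figures, experiment_type=None, observed=None, manipulated=None):
    """Return the ids of figures matching every criterion that was given."""
    results = []
    for fig in figures:
        if experiment_type is not None and fig.experiment_type != experiment_type:
            continue
        if observed is not None and observed not in fig.observed:
            continue
        if manipulated is not None and manipulated not in fig.manipulated:
            continue
        results.append(fig.figure_id)
    return results


# Invented example records, purely for illustration.
figures = [
    FigureMetadata("paper1-fig2", "western blot", {"protein A level"}, {"drug X dose"}),
    FigureMetadata("paper2-fig1", "microscopy", {"cell size"}, {"drug X dose"}),
    FigureMetadata("paper3-fig4", "western blot", {"protein B level"}, {"temperature"}),
]

# Find every figure in which the dose of "drug X" was the manipulated parameter.
hits = search(figures, manipulated="drug X dose")
```

The point of the sketch is that such a query works on structured fields rather than on the wording of captions, which is what would let figure search complement text mining.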

Next came the icebreaker and I must say, I was really surprised at the amount of data people deal with on a regular basis!

When we asked people to put (provocative) statements about Open Data on post-its, it seemed as if the previous points had heated up people's minds enough to fill the whole blackboard in no time. This was an ideal start for our debates (actually, rather peaceful discussions), which filled the rest of the evening. At the end, the discussion groups' 'moderators' summed up for the audience the main points and problems that people had identified:

  • Everyone finds the idea of open data in science good. However, there is a lack of motivation, funding and credit for the ones who have to go through the hassle of cleaning up the data, making them accessible, hosting them on the internet etc.
  • One suggested solution was for journals to demand that data be accessible in a useful manner. Only then will funding bodies understand that a certain amount of money has to be dedicated to this purpose.
  • People also pointed out that open standards and file formats are in many circumstances more important than making all data public.
  • Another question was whether there is such a thing as "bad data" that cannot be used for publication. The bioinformaticians usually said that good data are data that are reproducible or that give good QC scores when they run them through their pipelines.

I think everyone enjoyed this HUB. There was a good mood, and the many different things that were discussed touched upon issues that will remain "hot topics" in bioinformatics and biology.

Here are the 4 categories defined and the different topics discussed in each of them:

