edoc-Server


Frequently Asked Questions

Electronic publishing in general

Sorry, this website is still incomplete. For general questions on electronic publishing, please switch to the German version (DE).

 

Publication of research data


What are research data?

Research data are data that are generated or collected during the research process and serve as a basis for research results. This includes, among others, measurement data, field diaries, images, audio and video files, digitized materials, interview transcripts, 3D simulations and applied software.

 

Which types of research data can be published on the edoc server?

The edoc server does not technically restrict the types of research data. With a view to long-term archiving and reusability, we recommend using open and well-established file formats for research data publications. The following categories are available for selection:

  • audio
  • images
  • records
  • research data collections
  • models
  • software
  • video
  • other research data

 

Can research data be published with a blocking period (embargo)?

Yes. The maximum duration of an embargo is five years. During the embargo, the metadata of the research data are published, but the research data themselves are neither visible nor downloadable. During the embargo, the author may grant access to certain persons. After the embargo expires, the research data publication is released to the public.

 

How is a blocking period (embargo) agreed?

Please use the field "Public readable note".


How long will research data be kept on the edoc server?

The aim is to archive all publications and keep them accessible for as long as possible. A minimum of 10 years is guaranteed. Dissertations, as well as the research data that are indispensable for understanding them, are also subject to a legal collecting mandate and thus to long-term archiving on behalf of the German National Library.


How is the visibility and discoverability of the research data ensured?

Via an OAI interface, the contents are harvested by external reference services and can thus be found worldwide through relevant search engines such as Google Scholar or BASE. All published content has persistent identifiers (URN and DOI) and can therefore be cited unambiguously, linked permanently and found quickly.
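
The OAI interface mentioned above presumably follows the OAI-PMH protocol, in which a harvester sends HTTP GET requests with a `verb` parameter. The following is a minimal sketch of how such a request URL is built. The base URL is a placeholder and not the actual edoc endpoint, and the helper name `build_oai_request` is hypothetical.

```python
from urllib.parse import urlencode

# Placeholder endpoint: an assumption for illustration, not the real edoc address.
BASE_URL = "https://repository.example.org/oai/request"


def build_oai_request(verb, **params):
    """Return an OAI-PMH request URL for the given verb and arguments."""
    query = {"verb": verb}
    query.update(params)
    return BASE_URL + "?" + urlencode(query)


# Ask for all records in simple Dublin Core (oai_dc), the metadata format
# that every OAI-PMH repository has to support.
list_records_url = build_oai_request("ListRecords", metadataPrefix="oai_dc")
print(list_records_url)
```

A harvesting service would fetch this URL and read the XML response; the sketch stops at building the request so that it runs without network access.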

 

How can research data be reused?

The terms of use for third parties are governed by licenses. In accordance with the Research Data Policy of Humboldt-Universität zu Berlin, the open standard Creative Commons (CC) licenses are offered.

 

How to deal with very large data sets?

If you have a large number of files, bundle meaningful units into zip files where possible. In this case, the persistent identifier and the descriptive metadata are assigned to the entire zip file as one publication unit. If you want to publish large files (> 500 MB), please contact the author's service in advance.
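
Bundling a folder into one zip file can be done with standard tools. Here is a minimal sketch using Python's `zipfile` module. The function names are hypothetical, and reading "500 MB" as 500 × 1024 × 1024 bytes is an assumption.

```python
import zipfile
from pathlib import Path

# Threshold from the text above; the exact byte count is an assumption.
LARGE_FILE_BYTES = 500 * 1024 * 1024


def bundle_directory(source_dir, zip_path):
    """Pack every file below source_dir into one zip archive.

    Write the archive outside source_dir, otherwise it would try to
    include itself. Returns the archived paths relative to source_dir.
    """
    source = Path(source_dir)
    archived = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.rglob("*")):
            if path.is_file():
                relative = path.relative_to(source).as_posix()
                archive.write(path, arcname=relative)
                archived.append(relative)
    return archived


def is_large(path):
    """Return True if the file exceeds the 500 MB threshold."""
    return Path(path).stat().st_size > LARGE_FILE_BYTES
```

Calling `is_large` on the finished archive shows whether it falls under the size limit mentioned above.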

 

How to select relevant research data for publication?

Which research data are published, and to what extent, depends on the objective of the publication. Research data can serve an illustrative function by demonstrating certain findings. In this case, the research data should be selected to suit this purpose. For the reproduction of research results, however, it may be necessary to publish entire data sets and to disclose the analysis tools or algorithms.
Another goal of a research data publication may be to provide data for reuse and further research. In this case, the subject-specific and interdisciplinary relevance should be assessed, taking into account the effort involved in collecting the data and how easily they could be reproduced. Whether raw data or already selected and edited data are suitable for this purpose depends on the research context. In any case, we recommend making the selection criteria for the published research data transparent in the documentation.


How to submit additional research data?

The submission of the text publication and the submission of research data are separate processes. We recommend submitting the research data first so that they can be cited in the text publication. Start the submission by logging in or registering as described in "Submission". Then select a suitable collection in the "Forschungsdaten" (research data) section.


How are the research data described?

Upon submission, an entry form opens in which you enter all the required information. The form is largely self-explanatory. Required fields are marked with *. This information (metadata) is primarily used to make your publication visible and searchable.
To reproduce and reuse research data, it is necessary to document their context of origin and the tools used for data collection, processing or analysis. The documentation of research data can be very subject-specific and should follow disciplinary guidelines. In addition to the descriptive metadata, we recommend uploading a research data documentation as a *.txt file.

 

What should be in a readme file?

To enable potential reuse, it is important to provide a description of the research data. We therefore recommend including a README file in .txt or .md format. This file may contain the following information:

1. Details about the collection of the research data:

  • Who collected the data?
  • For what purpose was the data (originally) collected?
  • What method was used to collect the data?
  • When and where was the data collected?
  • What equipment was used and what settings were made?
  • Further details depending on the type of research data, e.g. precise information on materials/samples used or on the approach and instruction of test subjects.

2. Information on the further processing of the data:

  • How was the data validated, both in content and technically?
  • How was the data cleansed?

3. Details on the structure of the available data set:

  • How was the data selected for the publication? Which of the originally collected data is not included and why?
  • How is the dataset structured, e.g. different types of data, folders, files?
  • What do labels, codes, variables, abbreviations mean?
  • What are the differences between different file versions?

Please note that this list is not exhaustive. Depending on the type of research data, further information may be required. Providing comprehensive and clear documentation makes it easier for other researchers to use and interpret your data. A template as well as an example of a readme file can be found on the HU research data management website.
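
The checklist above can be turned into a skeleton file that is then filled in by hand. The following is a minimal sketch with a plain-text layout of our own choosing. It is not the HU template mentioned above, the questions are abridged, and the names `README_SECTIONS` and `render_readme` are hypothetical.

```python
# Section headings and questions, abridged from the checklist above.
README_SECTIONS = {
    "1. Collection of the research data": [
        "Who collected the data?",
        "For what purpose was the data (originally) collected?",
        "What method was used to collect the data?",
        "When and where was the data collected?",
        "What equipment was used and what settings were made?",
    ],
    "2. Further processing of the data": [
        "How was the data validated?",
        "How was the data cleansed?",
    ],
    "3. Structure of the data set": [
        "How was the data selected for the publication?",
        "How is the dataset structured?",
        "What do labels, codes, variables, abbreviations mean?",
        "What are the differences between different file versions?",
    ],
}


def render_readme(title):
    """Return the text of a README skeleton with one placeholder per question."""
    lines = [title, "=" * len(title), ""]
    for heading, questions in README_SECTIONS.items():
        lines.append(heading)
        lines.append("")
        for question in questions:
            lines.append(f"- {question}")
            lines.append("  TODO")
        lines.append("")
    return "\n".join(lines)


# Write the skeleton next to the data, e.g.:
# with open("README.txt", "w", encoding="utf-8") as handle:
#     handle.write(render_readme("My dataset"))
```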


How are research data linked to the text publication?

Both the text publication and the associated digital research data receive persistent identifiers (DOI, Digital Object Identifier). With the aid of these persistent identifiers, different publication objects can be related to each other or referenced. When submitting a publication on the edoc server, you can specify the relations "Has Part" or "Is Part Of" (for an integral part) and "Has Supplement" or "Is Supplement To" (for a supplement). A research data publication can also be related to several forms of publication (e.g. a dissertation and journal articles). The text publication and the research data can also be published at different times.
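
The relations named above come in pairs, so one data set can point to several text publications and each of them can point back. Here is a minimal sketch of that idea as a data structure. The DOIs are placeholders, the function `link` is hypothetical, and whether the edoc server records the inverse relation automatically is not stated here; the sketch does so only for illustration.

```python
# Relation labels as named in the text above; each maps to its inverse.
INVERSE_RELATION = {
    "Has Part": "Is Part Of",
    "Is Part Of": "Has Part",
    "Has Supplement": "Is Supplement To",
    "Is Supplement To": "Has Supplement",
}


def link(relations, source_doi, relation, target_doi):
    """Record a relation and its inverse so both objects refer to each other."""
    relations.setdefault(source_doi, []).append((relation, target_doi))
    relations.setdefault(target_doi, []).append(
        (INVERSE_RELATION[relation], source_doi)
    )
    return relations


# Placeholder DOIs: one data set supplementing a dissertation and an article.
relations = {}
link(relations, "10.1234/example-dissertation", "Has Supplement", "10.1234/example-dataset")
link(relations, "10.1234/example-article", "Has Supplement", "10.1234/example-dataset")
print(relations["10.1234/example-dataset"])
```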


How is the reuse of the research data regulated?

The terms of use for third parties are governed by licenses. Select a license in the entry form. The same license must also be selected in the first publication contract. If multiple files are uploaded under one submission, the same license terms apply to all files.


Which data formats are suitable?

To ensure long-term archiving and availability of research data publications, you should use open, non-proprietary, and well-established data formats. The files should not be password-protected or encrypted. The following table gives you an overview of the compatibility of common file formats for long-term archiving.

If a certain proprietary format is common in your discipline, you may also publish your dataset in this format. However, we recommend additionally providing the data in a format that is suitable for long-term archiving.

Preferred Accepted Not suitable
Text
  • PDF/A (*.pdf)
  • unformatted text (e.g. *.txt, *.asc, *.xml)
  • OpenDocument Text (*.odt)
  • PDF (*.pdf)
  • MS Word (*.docx)
  • Rich Text Format (*.rtf)
  • HTML (*.htm, *.html)
  • LaTeX and TeX
  • MS Word (*.doc)
Statistical data
  • Text file, comma-separated values (*.csv) 
  • Excel (*.xlsx)
  • OpenDocument formats (*.odm, *.odt, *.odg, *.odc, *.odf)
  • Commonly used (proprietary) formats of statistical packages, such as SPSS Portable (*.por), SPSS (*.sav), SAS Transport (*.sas), Stata (*.dta) and R (*.R)
  • Excel (*.xls, *.xlsb)
Images and graphics
  • TIFF uncompressed (*.tif)
  • PNG (*.png)
  • SVG (*.svg)
  • TIFF compressed (*.tif)
  • GIF (*.gif)
  • Bitmap graphic (*.bmp)
  • JPEG (*.jpg)
  • PDF/A, PDF (*.pdf)
  • Photoshop (*.psd)
  • Illustrator (*.ai)
  • Encapsulated PostScript (*.eps)
Audio
  • WAV (*.wav)
  • Advanced Audio Coding (*.mp4)
  • MP3 (*.mp3)
 
Video
  • FFV1 codec in Matroska container (*.mkv)
  • QuickTime Movie (*.mov)
  • Motion JPEG 2000 (*.mj2, *.mjp2)
  • MPEG-4 (*.mp4)
  • Audio Video Interleave (*.avi)
  • Windows Media Video (*.wmv)

Here you will find a comprehensive list of file formats and their suitability for long-term archiving.


How do you name the files?

Research data should be organized and saved using a clear directory structure and clear file names. Before starting data collection, we recommend defining a set of conventions for organizing and naming your files. Renaming is not possible after publication. Each file should be named according to its content or function.

The following criteria should be considered when naming files:

  • Do not use spaces, special characters (\ / ? : * " > < | # % { } ^ [ ] ` ~), extended Latin alphabet characters (ä ö ü ß etc.), characters with diacritical marks (à é ô etc.), or any other non-ASCII characters

  • Use the underscore (_), the hyphen (-) or capitalization of the first letter to separate the parts of a file name

  • Use a date/time stamp or a distinct version ID (e.g. v1.0.0) for each version

  • Document any naming conventions or abbreviations used (e.g., in the data management plan)

  • The date should be at the beginning or at the end of the filename to make sorting easier

  • Avoid generic file names such as "manuscript.pdf," "result1.csv," "image12.png"
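
Some of these criteria can be checked automatically before submission. The following is a minimal sketch. Allowing dots inside the name (as in v1.0.0) and the choice of generic stems, taken from the examples above, are assumptions, and the function `check_filename` is hypothetical.

```python
import re

# Allowed: ASCII letters, digits, underscore, hyphen and dot.
ALLOWED_NAME = re.compile(r"[A-Za-z0-9_.-]+")

# Generic stems from the examples above, optionally followed by a number.
GENERIC_STEM = re.compile(r"(manuscript|result|image)\d*", re.IGNORECASE)


def check_filename(name):
    """Return a list of problems found in a file name (empty if none)."""
    problems = []
    if not ALLOWED_NAME.fullmatch(name):
        problems.append("contains spaces, special or non-ASCII characters")
    stem = name.rsplit(".", 1)[0]  # name without its last extension
    if GENERIC_STEM.fullmatch(stem):
        problems.append("generic name")
    return problems


for candidate in ["2023-05-01_soil-moisture_v1.0.0.csv", "result1.csv", "Messung März.csv"]:
    print(candidate, check_filename(candidate))
```

The check does not cover the remaining points, such as documenting the naming convention or placing the date consistently; these still need a manual look.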

What will be checked before publishing?

There is only a formal review of metadata and files. In case of problems, you will receive a message and the opportunity for correction.
    

Does an additional publication contract have to be concluded?

For the publication of research data, a first publication contract must be concluded. Use the form "Erstveröffentlichungsvertrag für Forschungsdaten" (first publication contract for research data).