Module Title: Data annotation and curation
Module Description:
Data preparation, developing standardized quality assurance processes and pipelines
Team Lead(s): Nicole Vasilevsky Team Members: Nicole Vasilevsky
At the completion of this component, the learner will be able to:
- Apply data preparation and planning best practices
- Describe data annotation and biocuration
- Apply data standards to research data sets using manual methods
Module Prerequisites: None
Description: This unit describes best practices for data preparation and planning, including choosing the best formats for storing data, directory and file naming conventions, basic metadata considerations, and data sharing considerations.
Unit 1 Slides: BDK12-1.pptx
Unit 1 Audio: BDK12-1.mp3 - Full lecture, Audio File - Individual Slides
Example: online presentation
Description: This unit describes best practices for digital file and directory naming.
Unit 2 Slides: BDK12-2.pptx
Unit 2 Audio: BDK12-2.mp3 - Full lecture, Audio File - Individual Slides
Unit 1 & 2 Exercise: BDK12_Exercise01.docx
Example: online presentation
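As a rough illustration of the file naming practices covered in Units 1 and 2, the sketch below checks file names against one possible convention. The convention itself (YYYYMMDD_project_description_vNN.ext, lowercase, no spaces) and the function names are assumptions made for this example; they are not taken from the lecture materials, and your lab or institution may prefer a different pattern.

```python
import re

# Hypothetical convention (an assumption for illustration only):
#   YYYYMMDD_project_description_vNN.ext
#   e.g. 20150312_bd2k_mouse-liver-rnaseq_v02.csv
# Note: the date part is only checked for being 8 digits, not for being a real date.
FILENAME_PATTERN = re.compile(
    r"(?P<date>\d{8})_"
    r"(?P<project>[a-z0-9]+)_"
    r"(?P<description>[a-z0-9]+(?:-[a-z0-9]+)*)_"
    r"v(?P<version>\d{2})"
    r"\.(?P<ext>[a-z0-9]+)"
)


def check_filename(name):
    """Return the parsed fields if `name` follows the convention, else None."""
    match = FILENAME_PATTERN.fullmatch(name)
    if match is None:
        return None
    return match.groupdict()


def find_nonconforming(names):
    """Return the file names that do not follow the convention."""
    return [name for name in names if check_filename(name) is None]


if __name__ == "__main__":
    files = [
        "20150312_bd2k_mouse-liver-rnaseq_v02.csv",
        "Final data (new).xlsx",
    ]
    for name in find_nonconforming(files):
        print("Does not follow the convention:", name)
```

A check like this can be run over a project directory before sharing data, so that inconsistent names are caught early rather than after files have been distributed.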
Description: This unit describes professional biocuration and how researchers can better annotate their data to become biocurators themselves.
Unit 3 Slides: BDK12-3.pptx
Unit 3 Audio: BDK12-3.mp3 - Full lecture, Audio File - Individual Slides
Unit 3 Exercise: BDK12_Exercise02.docx (in BDK12_exercises.zip)
Unit 3 Exercise: Read the blog post “Ontological Annotation of Data” and complete BDK12_Exercise03.docx
Example: online presentation
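To make the idea of annotating data with a controlled vocabulary more concrete, here is a minimal sketch that maps free-text values to term identifiers. The vocabulary, its identifiers (EX:0001 and so on), and the function names are placeholders invented for this example; in practice the terms would come from a published ontology or vocabulary chosen for your domain.

```python
# Hypothetical controlled vocabulary: identifiers, labels and synonyms are
# placeholders for illustration, not terms from a real ontology.
VOCABULARY = {
    "EX:0001": {"label": "liver", "synonyms": {"hepatic tissue"}},
    "EX:0002": {"label": "heart", "synonyms": {"cardiac tissue"}},
}


def build_lookup(vocabulary):
    """Map each normalized label and synonym to its term identifier."""
    lookup = {}
    for term_id, term in vocabulary.items():
        for name in {term["label"], *term["synonyms"]}:
            lookup[name.strip().lower()] = term_id
    return lookup


def annotate(values, vocabulary):
    """Pair each free-text value with a term identifier, or None if unmatched."""
    lookup = build_lookup(vocabulary)
    return [(value, lookup.get(value.strip().lower())) for value in values]


if __name__ == "__main__":
    for value, term_id in annotate(["Liver", " cardiac tissue ", "lvr"], VOCABULARY):
        status = term_id if term_id is not None else "needs manual curation"
        print(repr(value), "->", status)
```

Values that come back unmatched, such as the misspelled "lvr" above, are the ones a curator would review by hand; exact matching is deliberately conservative so that no value is silently assigned the wrong term.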
Exercises: BDK12_exercises.zip Glossary: BDK12_GlossaryTerms.pdf
References & Resources: BDK12_Ref.pdf
- 10 Simple Rules for the Care and Feeding of Scientific Data: http://arxiv.org/pdf/1401.2134v1.pdf
- Preparing the Workforce for Digital Curation: http://www.ncbi.nlm.nih.gov/books/NBK293667/#sec_13
References cited in lecture:
- Hirschman J, Berardini TZ, Drabkin HJ, et al. A MOD(ern) perspective on literature curation. Mol. Genet. Genomics 2010;283:415-425.
- Howe D, Costanzo M, Fey P, et al. Big data: the future of biocuration. Nature 2008;455:47-50.
- http://en.wikipedia.org/wiki/Annotation
- https://en.wikipedia.org/wiki/Data_curation
- Tenopir, C. and King, D.W. (2007), “Perceptions of value and value beyond perceptions: measuring the quality and value of journal article readings”, Serials, Vol. 20 No. 3, pp. 199-207.
- http://www.getty.edu/research/publications/electronic_publications/intro_controlled_vocab/what.pdf
- Vasilevsky et al. (2013), “On the reproducibility of science: unique identification of research resources in the biomedical literature.” PeerJ 1:e148; DOI 10.7717/peerj.148
Nothing makes a learning session more engaging than fabulous visuals. While many in the education realm are accustomed to using a variety of rich images under the educational use exception, materials presented as open educational resources (OERs), which are freely available and which users may remix, tweak and build upon, present a unique problem. Images used in these circumstances must carry the stringent CC BY-NC-SA (Creative Commons: Attribution – Non-Commercial – Share Alike) license.
As a result, the materials provided here have limited imagery, as we intend for users to remix, tweak and make these modules their own. At points in this module I have suggested inserting images of your choosing, not only to help create visual interest, but also to help tailor the educational experience to your audience. For example, images produced by researchers on your campus or in your department will drive a point home more effectively than generic or stock photos.
How does all of this copyright stuff work? For more information on copyright and fair use, I recommend a couple of resources.
- The Stanford Copyright and Fair Use page (http://fairuse.stanford.edu) is very straightforward. Of particular use is the “Academic and Education Permissions” section.
- Dr. Kenneth Crews is an internationally known copyright expert who established the Copyright Advisory Office at Columbia University. The Copyright Advisory Office provides excellent explanations and worksheets: https://copyright.columbia.edu/basics.html
- For more information on Creative Commons licenses, please see https://creativecommons.org/licenses/
When should you look to add additional images? When you see the clipboard icon, please consider identifying images relevant to the presentation. Suggested images may be hyperlinked, but not embedded in the presentation. Use your creativity when identifying images!
Where do I find images? There are several sources that might be available to you. Depending on how you plan to use the BD2K modules, you may have more flexibility in locating images. Once you have identified the license that you wish to use, you can search with those restrictions in mind.
- Google Images: Head to Google Advanced Image Search and under the “usage rights” filter, select the filter that matches your requirements.
- Flickr Creative Commons: Many users of Flickr have elected to allow their photographs to be reused. To browse or search for CC licensed images, head to https://www.flickr.com/creativecommons/
- Institutional licenses: Depending on your home institution, your library may subscribe to an image database that may be useful. Please consult with your librarian to see if such assets are available to you.