From d6d4290def6de57e31300b9ba4cdc834570c6cc4 Mon Sep 17 00:00:00 2001
From: "Zhao, Xin" <xin.zhao@pnnl.gov>
Date: Thu, 14 Mar 2024 23:33:37 -0400
Subject: [PATCH] Address Manuscript Suggestions from Klau506

---
 paper.md                 | 44 ++++++++++++++++++++--------------------
 vignettes/references.bib | 41 +++++++++++++++++++++++++++++++++++++
 2 files changed, 63 insertions(+), 22 deletions(-)

diff --git a/paper.md b/paper.md
index 2a77f1a4..6f8e09f2 100644
--- a/paper.md
+++ b/paper.md
@@ -43,9 +43,9 @@ The **`gcamfaostat`** R package is designed for the preparation, processing, and
 
 # Statement of need
 
-Global economic and multisector dynamic models have become pivotal tools for investigating complex interactions between human activities and the environment, as evident in recent research [@Doelman2022Quantifying;@Fujimori2022Land-based;@IPCC2022Annex;@Ven2023multimodel]. Agriculture and land use (AgLU) plays a critical role in these models, particularly when used to address key agroeconomic questions [@Graham2023Agricultural;@Yarlagadda2023Trade;@Zhang2023Agriculture;@Zhao2021Global;@Zhao2020critical]. Sound economic modeling hinges significantly upon the accessibility and quality of data [@Bruckner2019FABIO;@Calvin2022GMD;@Chepeliev2022JGEA]. The FAOSTAT serves as one of the key global data sources, offering open-access data on country-level agricultural production, land use, trade, food consumption, nutrient content, prices, and more [@FAOSTAT2023FAOSTAT]. However, the raw data from FAOSTAT requires cleaning, balancing, and synthesis, involving assumptions such as interpolation and mapping, which can introduce uncertainties. In addition, some of the core datasets reported by FAOSTAT, such as FAO’s Food Balance Sheets (FBS), are compiled at a specific level of aggregation, combining together primary and processed commodities (e.g., wheat and flour), which creates additional data processing challenges for the agroeconomic modeling community [@Chepeliev2022JGEA]. It is noteworthy that each agroeconomic modeling team typically develops its own assumptions and methods to prepare and process FAOSTAT data [@bond2019gcamdata]. While largely overlooked, the uncertainty in the base data calibration approach likely contribute to the disparities in model outcomes [@Lampe2014AgMIP;@zhao2021role]. Hence, our motivation is to create an open-source tool (**`gcamfaostat`**) for the preparation, processing, and synthesis of FAOSTAT data for global agroeconomic modeling. This tool bridges a crucial gap in the literature by offering several key features and capabilities.
+Global economic and multisector dynamic models have become pivotal tools for investigating complex interactions between human activities and the environment, as evident in recent research [@Doelman2022Quantifying;@Fujimori2022Land-based;@Ven2023multimodel]. Agriculture and land use (AgLU) plays a critical role in these models, particularly when used to address key agroeconomic questions [@Graham2023Agricultural;@Yarlagadda2023Trade;@Zhang2023Agriculture;@Zhao2021Global;@Zhao2020critical]. Sound economic modeling hinges significantly upon the accessibility and quality of data [@Bruckner2019FABIO;@Calvin2022GMD;@Chepeliev2022JGEA]. FAOSTAT serves as one of the key global data sources, offering open-access data on country-level agricultural production, land use, trade, food consumption, nutrient content, prices, and more [@FAOSTAT2023FAOSTAT]. However, the raw data from FAOSTAT requires cleaning, balancing, and synthesis, involving assumptions such as interpolation and mapping, which can introduce uncertainties. In addition, some of the core datasets reported by FAOSTAT, such as FAO’s Food Balance Sheets (FBS), are compiled at a specific level of aggregation, combining primary and processed commodities (e.g., wheat and flour), which creates additional data processing challenges for the agroeconomic modeling community [@Chepeliev2022JGEA]. It is noteworthy that each agroeconomic modeling team typically develops its own assumptions and methods to prepare and process FAOSTAT data [@bond2019gcamdata]. While largely overlooked, the uncertainty in the base data calibration approach likely contributes to the disparities in model outcomes [@Lampe2014AgMIP;@zhao2021role]. Hence, our motivation is to create an open-source tool (**`gcamfaostat`**) for the preparation, processing, and synthesis of FAOSTAT data for global agroeconomic modeling. This tool bridges a crucial gap in the literature by offering several key features and capabilities.
 
-1.	**Transparency and Reproducibility**: **`gcamfaostat`** incorporates functions for downloading, cleaning, synthesizing, and balancing agroeconomic datasets in a traceable, transparent, and reproducible manner. This enhances the credibility of the processing and allows for better scrutiny of the methods. We have documented and demonstrated the use of the package in generating and updating agroeconomic data needed for the GCAM.  
+1.	**Transparency and Reproducibility**: **`gcamfaostat`** incorporates functions for downloading, cleaning, synthesizing, and balancing agroeconomic datasets in a traceable, transparent, and reproducible manner [@wilkinson_fair_2016]. This enhances the credibility of the processing and allows for better scrutiny of the methods. We have documented and demonstrated the use of the package in generating and updating agroeconomic data needed for GCAM v7 [@bond_lamberty_2023].  
 2.	**Expandability and Consistency**: **`gcamfaostat`** can be used to flexibly process and update agroeconomic data for any agroeconomic model. The package framework can be also easily expanded to include new modules for consistently processing new data.          
 3.	**Community Collaboration and Efficiency**: The package provides an open-source platform for researchers to continually enhance the processing methods. This collaborative approach, which establishes a standardized and streamlined process for data preparation and processing, carries benefits that extend to all modeling groups. By reducing the effort required for data processing and fostering harmonized base data calibration, it contributes to a reduction in modeling uncertainty and enhances the overall research efficiency.    
 4.	**User Accessibility**: Where applicable, the processed data can be mapped and aggregated to user-specified regions and sectors for agroeconomic modeling. However, beyond the modeling community, **`gcamfaostat`** can be valuable to a broader range of users interested in understanding global agriculture trends and dynamics, as it provides user-friendly data processing and visualization tools. 
@@ -55,28 +55,28 @@ Global economic and multisector dynamic models have become pivotal tools for inv
 ## Bridging the gap between FAOSTAT and global economic modeling
 
 
-\autoref{fig:Fig1} shows a standard framework of using FAOSTAT data in GCAM. GCAM is a widely recognized global economic and multisector dynamic model complemented by the gcamdata R package, which serves as its data processing system. Particularly, gcamdata includes modules (data processing chunks) and functions to convert raw data inputs into hundreds of XML input files used by GCAM [@bond2019gcamdata]. As an illustration, in the latest GCAM version, GCAM v7 [@bond_lamberty_2023], about 280 XML files, with a combined size of 4.1 GB, are generated. Although AgLU-related XMLs represent only about 10% of the total number of files, they contribute over 50% in size (~2.1 GB). The majority of AgLU-related data, whether directly or indirectly, rely on raw data sourced from the FAOSTAT. 
+\autoref{fig:Fig1} shows a standard framework of using FAOSTAT data in GCAM. GCAM is a widely recognized global economic and multisector dynamic model complemented by the `gcamdata` R package, which serves as its data processing system. In particular, `gcamdata` includes modules (data processing chunks) and functions to convert raw data inputs into hundreds of XML input files used by GCAM [@bond2019gcamdata]. As an illustration, in the latest GCAM version, GCAM v7 [@bond_lamberty_2023], about 280 XML files, with a combined size of 4.1 GB, are generated. Although AgLU-related XMLs represent only about 10% of the total number of files, they contribute over 50% in size (~2.1 GB). The majority of AgLU-related data, whether directly or indirectly, relies on raw data sourced from FAOSTAT. 
 
-Nonetheless, the FAOSTAT data employed within gcamdata has traditionally involved manual downloads and may have undergone preprocessing. In light of the increasing data needs, maintaining the FAOSTAT data processing tasks in gcamdata has become increasingly challenging. In addition, the processing of FAOSTAT data in the AgLU modules of gcamdata is tailored specifically for GCAM. Consequently, the integration of FAOSTAT data updates has proven to be a non-trivial task, and the data processed by the AgLU module has limited applicability in other modeling contexts [@zhao_cmp360]. The **`gcamfaostat`** package aims to address these limitations (\autoref{fig:Fig2}). The targeted approach incorporates data preparation, processing, and synthesis capabilities within a dedicated package, gcamfaostat, while regional and sectoral aggregation functions in the model data system are implemented using standalone routines within the gcamdata package. This strategy not only ensures the streamlined operation of **`gcamfaostat`** but also contributes to keeping model data system lightweight and more straightforward to maintain.  
+Nonetheless, the FAOSTAT data employed within `gcamdata` has traditionally involved manual downloads and may have undergone preprocessing. In light of the increasing data needs, maintaining the FAOSTAT data processing tasks in `gcamdata` has become increasingly challenging. In addition, the processing of FAOSTAT data in the AgLU modules of `gcamdata` is tailored specifically for GCAM. Consequently, the integration of FAOSTAT data updates has proven to be a non-trivial task, and the data processed by the AgLU module has limited applicability in other modeling contexts [@zhao_cmp360]. The **`gcamfaostat`** package aims to address these limitations (\autoref{fig:Fig2}). The targeted approach incorporates data preparation, processing, and synthesis capabilities within a dedicated package, **`gcamfaostat`**, while regional and sectoral aggregation functions in the model data system are implemented using standalone routines within the `gcamdata` package. This strategy not only ensures the streamlined operation of **`gcamfaostat`** but also contributes to keeping the model data system lightweight and more straightforward to maintain.  
   
-![The original framework of utilizing FAOSTAT data in GCAM and similar large-scale models. Note that FAOSTAT data is mainly processed in the AgLU modules in gcamdata while there could be interdependency across data processing modules. \label{fig:Fig1}](./man/figures/Fig_FAOSTAT_gcamdata.jpg){width=70%}  
+![Original framework of utilizing FAOSTAT data in GCAM and similar large-scale models. Note that FAOSTAT data is mainly processed in the AgLU modules in gcamdata while there could be interdependency across data processing modules. \label{fig:Fig1}](./man/figures/Fig_FAOSTAT_gcamdata.jpg){width=70%}  
 
 
-![The new framework of utilizing FAOSTAT data in GCAM and similar large-scale models through gcamfaostat. Modules with identifier "_xfaostat_" only exist in gcamfaostat. The AgLU-related modules ("_aglu_") that rely on outputs from gcamfaostat can run in both packages. Other gcamdata modules that process data in such areas as energy, emissions, water, and socioeconomics only exist in gcamdata. \label{fig:Fig2}](./man/figures/Fig_gcamfaostat_and_gcamdata.jpg){width=70%} 
+![New framework of utilizing FAOSTAT data in GCAM and similar large-scale models through gcamfaostat. Modules with identifier "_xfaostat_" only exist in gcamfaostat. The AgLU-related modules ("_aglu_") that rely on outputs from gcamfaostat can run in both packages. Other gcamdata modules that process data in such areas as energy, emissions, water, and socioeconomics only exist in gcamdata. \label{fig:Fig2}](./man/figures/Fig_gcamfaostat_and_gcamdata.jpg){width=70%} 
 
 
 ## Key functions 
 
-In this section we describe key functions included in gcamfaostat. More details about the functions and documentations can be found in the online [**User Guide**](https://jgcri.github.io/gcamfaostat/index.html). 
+In this section we describe key functions included in **`gcamfaostat`** (v1.0.0). More details about the functions and documentation can be found in the online [**User Guide**](https://jgcri.github.io/gcamfaostat/index.html). 
 
 ### Data preparation 
 
-`gcamfaostat` includes functions to generate metadata (`gcamfaostat_metadata`) and download FAOSTAT raw data from either a remote archive (`FF_download_RemoteArchive`) or directly from FAOSTAT (`FF_download_FAOSTAT`).  
+**`gcamfaostat`** includes functions to generate metadata (`gcamfaostat_metadata`) and download FAOSTAT raw data from either a remote archive (`FF_download_RemoteArchive`) or directly from FAOSTAT (`FF_download_FAOSTAT`).  
 
 
 [`gcamfaostat_metadata()`](https://jgcri.github.io/gcamfaostat/reference/gcamfaostat_metadata.html)  
 
-* The function accesses both the latest FAOSTAT metadata and local data information and returns a summary table including the dataset information needed for gcamfaostat (see [Table 1](#Tab1) below).
+* The function accesses both the latest FAOSTAT metadata and local data information and returns a summary table including the dataset information needed for **`gcamfaostat`** (see [Table 1](#Tab1) below).
 * The function will save the latest FAOSTAT metadata to the [metadata_log](https://github.com/JGCRI/gcamfaostat/tree/main/inst/extdata/aglu/FAO/FAOSTAT/metadata_log)
 * The dataset code needed were specified in the function to get a subset of the FAOSTAT metadata. The function will return only dataset code required when setting `OnlyReturnDatasetCodeRequired = FALSE`. 
 * The function will check whether FAOSTAT raw data exists locally (`Exist_Local`) and in [Prebuilt Data](https://github.com/JGCRI/gcamfaostat/blob/main/data/PREBUILT_DATA.rda) (`Exist_Prebuilt`). If `Exist_Prebuilt` is `TRUE` for all dataset, the package is ready to be built based on the Prebuilt package data.
@@ -93,7 +93,7 @@ In this section we describe key functions included in gcamfaostat. More details
 * The function downloads the latest raw data from FAOSTAT.
 
 
-Table 1. FAOSTAT dataset processed in gcamfaostat v1.0.0. 
+Table 1. FAOSTAT datasets processed in **`gcamfaostat`** v1.0.0. 
 
 | Dataset Code | Dataset Name                                                | Exist_Local | Exist_Prebuilt | FAO update date | FAO size |
 |:------------:|:----------------------------------------------------------:|:-----------:|:--------------:|:--------------:|:--------:|
@@ -115,10 +115,10 @@ Table 1. FAOSTAT dataset processed in gcamfaostat v1.0.0.
 
 **Module structure**
 
-The architecture of gcamfaostat processing modules is depicted in \autoref{fig:Fig3}. This framework currently comprises eight preprocessing modules and nine processing and synthesizing modules, generating twelve output files tailored for
+The architecture of **`gcamfaostat`** processing modules is depicted in \autoref{fig:Fig3}. This framework currently comprises eight preprocessing modules and nine processing and synthesizing modules, generating twelve output files tailored for
 [GCAM v7](https://github.com/JGCRI/gcam-core/releases/tag/gcam-v7.0). Each module is essentially an `R` function with well-defined inputs and outputs. To showcase the flexibility and expandability of our package, we also incorporated two AgLU modules (from `gcamdata`) that exemplify the data aggregation processes, e.g., across regions, sectors, and time. Moreover, the `driver_drake` function plays a pivotal role by executing all available data processing modules, thereby generating both intermediate and final outputs, which are vital components of our comprehensive data processing pipeline. 
 
-![The architecture of data processing modules in gcamfaostat. \label{fig:Fig3}](./man/figures/Fig_data_processing_flow.jpg){width=100%}
+![Data processing architecture in gcamfaostat. \label{fig:Fig3}](./man/figures/Fig_data_processing_flow.jpg){width=100%}
 
 
 **Data synthesizing in a key module**
@@ -128,7 +128,7 @@ Of particular significance is the `module_xfaostat_L105_DataConnectionToSUA`, wh
 As an illustrative example, the first tier comprises 168 commodities, generated by combining production data from QCL, trade data from TM, and other essential balancing elements (such as opening and closing stocks, food and feed uses, and other industrial uses) from SCL. For a more comprehensive understanding of these procedures, we encourage an interested user to explore the mapping file, `FAO_items`. It is crucial to underscore the importance of these processing procedures, as raw FAOSTAT data often contains duplicated elements and inconsistencies among different datasets. For instance, trade data can be found in TCL, TM, SCL, and FBS, while production data exists in QCL and SCL (please see Table 1 for the corresponding dataset codes).
 
 
-![FAOSTAT agricultural supply utilization data synthesis in module_xfaostat_L105_DataConnectionToSUA. Note that the nine tiers of data, distinguished by commodities (or items in FAOSTAT terms) included, have different sources for generating agricultural supply utilization accounts. \label{fig:Fig4}](./man/figures/Fig_KeyModule_xfaostat_L105.jpg){width=100%}  
+![FAOSTAT agricultural supply utilization data synthesis in `module_xfaostat_L105_DataConnectionToSUA`. Note that the nine tiers of data, distinguished by commodities (or items in FAOSTAT terms) included, have different sources for generating agricultural supply utilization accounts. \label{fig:Fig4}](./man/figures/Fig_KeyModule_xfaostat_L105.jpg){width=100%}  
 
 
 
@@ -136,14 +136,14 @@ As an illustrative example, the first tier comprises 168 commodities, generated
 
 [`driver_drake()`](https://jgcri.github.io/gcamfaostat/reference/driver_drake.html) 
 
-* The function runs data processing modules sequentially to generate intermediate data outputs and final output (e.g., csv or other files) for GCAM (gcamdata) or other models.
-* The function is inherited from gcamdata and it uses the drake [@Landau2018] pipeline framework, which simplifies module updates, data tracing, and results visualization process. 
+* The function runs data processing modules sequentially to generate intermediate data outputs and final output (e.g., csv or other files) for GCAM (`gcamdata`) or other models.
+* The function is inherited from `gcamdata` and it uses the drake [@Landau2018] pipeline framework, which simplifies module updates, data tracing, and the visualization of results. 
 * It stores the outputs in a drake cache so that when the function is run again, it skips the steps that are up-to-date.
-* In constants.R, users can set `OUTPUT_Export_CSV = TRUE` and specify the output directory (`DIR_OUTPUT_CSV`) to export and store the output csv files (currently the default option for GCAM v7). 
+* In `constants.R`, users can set `OUTPUT_Export_CSV = TRUE` and specify the output directory (`DIR_OUTPUT_CSV`) to export and store the output csv files (currently the default option for GCAM v7). 
 
 ### Data tracing
 
-As gcamfaostat is built upon the foundation of gcamdata and leverages the powerful drake framework, inheriting functions designed for tracking data flows. Here we describe several key functions. 
+As **`gcamfaostat`** is built upon the foundation of `gcamdata` and leverages the powerful drake framework, it inherits functions designed for tracking data flows. Here we describe several key functions. 
 
 
 [`info()`](https://jgcri.github.io/gcamfaostat/reference/info.html)  
@@ -183,7 +183,7 @@ To update the output data by including new data years, e.g., for model base year
 
 **Generating output for a new agroeconomic model**
 
-If all the necessary FAOSTAT raw data is already incorporated into gcamfaostat, users can directly produce output for a new agroeconomic model. This can be achieved by either adding an output exporting module (e.g., `module_xfaostat_L199_CSVExportAgSUA`) or adapting an existing module (e.g., `module_xfaostat_L201_Forestry`) to export data in the required format. Notably, gcamfaostat presently includes a function, `output_csv_data`, for exporting data to CSV files. Additionally, users have the flexibility to expand the functionality by incorporating new functions to export data in alternative formats as needed. In cases when the required data is not readily available, users should proceed by introducing new processing modules.  
+If all the necessary FAOSTAT raw data is already incorporated into **`gcamfaostat`**, users can directly produce output for a new agroeconomic model. This can be achieved by either adding an output exporting module (e.g., `module_xfaostat_L199_CSVExportAgSUA`) or adapting an existing module (e.g., `module_xfaostat_L201_Forestry`) to export data in the required format. Notably, **`gcamfaostat`** presently includes a function, `output_csv_data`, for exporting data to CSV files. Additionally, users have the flexibility to expand the functionality by incorporating new functions to export data in alternative formats as needed. In cases when the required data is not readily available, users should proceed by introducing new processing modules.  
 
 
 **Country aggregation and disaggregation**
@@ -198,8 +198,8 @@ Since the 1970s, the number of countries in the world has increased due to the d
 
 [`FAOSTAT_AREA_RM_NONEXIST()`](https://jgcri.github.io/gcamfaostat/reference/FAOSTAT_AREA_RM_NONEXIST.html)  
 
-* The function removes nonexistent FAO region using area_code, e.g., USSR after 1991.
-* All nonexistent countries due to dissolution are removed by default.
+* The function removes nonexistent FAO regions (e.g., USSR after 1991) using the FAO `area_code` ID defined in the function.
+* All nonexistent countries due to dissolutions are removed by default.
 * Small regions/areas with low data quality can also be removed using this function.  
 
 
@@ -213,10 +213,10 @@ Data development is never a once and for all task, and continued efforts are nee
 1.	**Sustain processing functions for updated raw data**: ensuring that our processing functions remain up-to-date when raw data undergoes revisions is imperative.  
 2.	**Evaluate and enhance assumptions**: a critical examination of the assumptions utilized in processes like interpolation, extrapolation, aggregation, disaggregation, and mapping is essential and should be an ongoing endeavor.  
 3.	**Revise assumptions in low-quality data zones**: regions and sectors with little or low-quality data require careful consideration. We will need to adjust our assumptions when improved data becomes available.  
-4.	**Promoting broader applications**: leveraging data processed by gcamfaostat can significantly contribute to harmonizing input data in global agroeconomic modeling. Encouraging the utilization of this data and fostering collaboration to enhance data processing is crucial.  
+4.	**Promoting broader applications**: leveraging data processed by **`gcamfaostat`** can significantly contribute to harmonizing input data in global agroeconomic modeling. Encouraging the utilization of this data and fostering collaboration to enhance data processing is crucial.  
 5.	**Assess sensitivity in downstream applications**: understanding the sensitivity of downstream data applications, e.g., global agroeconomic projections, to upstream data processing assumptions is crucial. This awareness empowers us to make informed decisions and refinements.  
   
-We welcome and value community contributions to gcamfaostat.Through collective and collaborative efforts, we hope to improve the interface between raw data, modeling community, and broader audience. We would be grateful for the feedback and suggestions on potential improvements of the developed data processing framework.
+We welcome and value community contributions to **`gcamfaostat`**. Through collective and collaborative efforts, we hope to improve the interface among raw data, the modeling community, and a broader audience. We would be grateful for feedback and suggestions on potential improvements to the data processing framework.
 
 # Acknowledgements
 
diff --git a/vignettes/references.bib b/vignettes/references.bib
index 34825795..6edee9a3 100644
--- a/vignettes/references.bib
+++ b/vignettes/references.bib
@@ -345,3 +345,44 @@ @article{wilkinson2016fair
   publisher={Nature Publishing Group},
   DOI={10.1038/sdata.2016.18}
 }
+@article{zhaoLandCDR2024,
+	title = {Trade-offs in land-based carbon removal measures under 1.5 °{C} and 2 °{C} futures},
+	volume = {15},
+	copyright = {2024 The Author(s)},
+	issn = {2041-1723},
+	url = {https://www.nature.com/articles/s41467-024-46575-3},
+	doi = {10.1038/s41467-024-46575-3},
+	abstract = {Land-based carbon removals, specifically afforestation/reforestation and bioenergy with carbon capture and storage (BECCS), vary widely in 1.5 °C and 2 °C scenarios generated by integrated assessment models. Because underlying drivers are difficult to assess, we use a well-known integrated assessment model, GCAM, to demonstrate that land-based carbon removals are sensitive to the strength and scope of land-based mitigation policies. We find that while cumulative afforestation/reforestation and BECCS deployment are inversely related, they are both typically part of cost-effective mitigation pathways, with forestry options deployed earlier. While the CO2 removal intensity (removal per unit land) of BECCS is typically higher than afforestation/reforestation over long time horizons, the BECCS removal intensity is sensitive to feedstock and technology choices whereas the afforestation/reforestation removal intensity is sensitive to land policy choices. Finally, we find a generally positive relationship between agricultural prices and removal effectiveness of land-based mitigation, suggesting that some trade-offs may be difficult to avoid.},
+	language = {en},
+	number = {1},
+	urldate = {2024-03-14},
+	journal = {Nature Communications},
+	author = {Zhao, Xin and Mignone, Bryan K. and Wise, Marshall A. and McJeon, Haewon C.},
+	month = mar,
+	year = {2024},
+	note = {Publisher: Nature Publishing Group},
+	keywords = {Climate-change mitigation, Climate-change policy, Environmental impact, Geography},
+	pages = {2297},
+	file = {Full Text PDF:C\:\\Users\\zhao752\\Zotero\\storage\\I9GGJKZ8\\Zhao et al. - 2024 - Trade-offs in land-based carbon removal measures u.pdf:application/pdf},
+}
+
+@article{wilkinson_fair_2016,
+	title = {The {FAIR} {Guiding} {Principles} for scientific data management and stewardship},
+	volume = {3},
+	copyright = {2016 The Author(s)},
+	issn = {2052-4463},
+	url = {https://www.nature.com/articles/sdata201618},
+	doi = {10.1038/sdata.2016.18},
+	abstract = {There is an urgent need to improve the infrastructure supporting the reuse of scholarly data. A diverse set of stakeholders—representing academia, industry, funding agencies, and scholarly publishers—have come together to design and jointly endorse a concise and measureable set of principles that we refer to as the FAIR Data Principles. The intent is that these may act as a guideline for those wishing to enhance the reusability of their data holdings. Distinct from peer initiatives that focus on the human scholar, the FAIR Principles put specific emphasis on enhancing the ability of machines to automatically find and use the data, in addition to supporting its reuse by individuals. This Comment is the first formal publication of the FAIR Principles, and includes the rationale behind them, and some exemplar implementations in the community.},
+	language = {en},
+	number = {1},
+	urldate = {2024-03-15},
+	journal = {Scientific Data},
+	author = {Wilkinson, Mark D. and Dumontier, Michel and Aalbersberg, IJsbrand Jan and Appleton, Gabrielle and Axton, Myles and Baak, Arie and Blomberg, Niklas and Boiten, Jan-Willem and da Silva Santos, Luiz Bonino and Bourne, Philip E. and Bouwman, Jildau and Brookes, Anthony J. and Clark, Tim and Crosas, Mercè and Dillo, Ingrid and Dumon, Olivier and Edmunds, Scott and Evelo, Chris T. and Finkers, Richard and Gonzalez-Beltran, Alejandra and Gray, Alasdair J. G. and Groth, Paul and Goble, Carole and Grethe, Jeffrey S. and Heringa, Jaap and ’t Hoen, Peter A. C. and Hooft, Rob and Kuhn, Tobias and Kok, Ruben and Kok, Joost and Lusher, Scott J. and Martone, Maryann E. and Mons, Albert and Packer, Abel L. and Persson, Bengt and Rocca-Serra, Philippe and Roos, Marco and van Schaik, Rene and Sansone, Susanna-Assunta and Schultes, Erik and Sengstag, Thierry and Slater, Ted and Strawn, George and Swertz, Morris A. and Thompson, Mark and van der Lei, Johan and van Mulligen, Erik and Velterop, Jan and Waagmeester, Andra and Wittenburg, Peter and Wolstencroft, Katherine and Zhao, Jun and Mons, Barend},
+	month = mar,
+	year = {2016},
+	note = {Publisher: Nature Publishing Group},
+	keywords = {Publication characteristics, Research data},
+	pages = {160018},
+	file = {Full Text PDF:C\:\\Users\\zhao752\\Zotero\\storage\\AEIVY5BT\\Wilkinson et al. - 2016 - The FAIR Guiding Principles for scientific data ma.pdf:application/pdf},
+}