Results Reproduction (R-squared)

Results Reproduction (R-Squared) is a service that computationally reproduces the results of your research to ensure Reproducibility and Transparency – think of it as enhanced proofreading for your Data and Code.

Diagram of Results Reproduction Workflow

The client deposits research materials with R-Squared. After CCSS Research Support receives the submission, we check it for completeness, run the code, and compare the results to make sure they are identical. We discuss any differences with the client and fix them. Then we archive the package either in a journal-specified repository or in the CISER Data & Reproduction Archive, where people can access it online.

  • Outline of R-Squared Service

      • Reproduces output precisely

      • Provides suggested code edits to facilitate reproduction

      • Ensures

        • Transparency and Reproducibility of your research

        • Ease of access to data and code for others

        • Independent understandability by others

      • Creates an archive package with all information in one place (example: https://doi.org/10.6077/zrvc-pc31)

      • Check your research methodology

      • Question or check conclusions or theories

      • Share anything before the article is published

      • Independent party review provides higher tier confirmation of

        • Transparency – understandability of your code and variables by laypeople

        • Reproducibility – ease for others, or for you years later

      • Increased discoverability and accessibility of your study

      • Ability to promptly and confidently provide reproduction materials

      • Comparative advantage in the journal review process

      • Increases value of your publication for citation by others

      • Trusted source – CCSS-RS (formerly CISER) is CoreTrustSeal certified (see our certification)

      • Experts with years of experience in data curation, analysis, and results reproduction

      • Proven results reproduction workflow

      • Average first review takes 7 – 15 hours

        • Suggested steps taken ahead of time can reduce the time

        • Long papers with significant additional output/other factors can increase the time

      • We are working with CCSS-RS leadership to scale up the R2 program and to develop a sustainable funding model

      1. Follow the post-manuscript writing review of reproduction materials instructions found in the Reproduction Materials Review Recommendations

      2. Then share your reproduction materials via Cornell Box or sft.cornell.edu to Florio Arguillas (foa2@cornell.edu)

    • Each semester CCSS-RS provides free workshops which include How to Make Your Research Transparent and Reproducible.

      For more information about this workshop or the R-Squared process, contact Florio Arguillas at foa2@cornell.edu or (607)255-7838.

Reproduction Materials Review Recommendations

To make the best use of CCSS Research Support’s Results Reproduction (R-squared) service, we recommend that you complete the steps below before working with us. They are optional in the sense that, if you don’t have time, we will help you get them done. However, that will cost more than doing them yourself, and they must be completed before we can finish the project.

Recommendations Guide

  • Part I: Manuscript

    Please do the following:

    1. Highlight the numbers, figures, and charts derived from the output generated by your code(s). These could be in-text numbers (mentioned within a paragraph), entire tables of numbers, or figures/charts. For in-text numbers, highlight only those that do not correspond to (or are not found in) tables or figures/charts produced by your code.

      Importance of highlighting:
      1. These are the tables/figures/in-text numbers that you will verify against the outputs produced by the code. It is normal to find inconsistencies, in which case you update the code, the manuscript, or both to reflect the changes.
      2. After correcting all inconsistencies between the outputs and the manuscript, highlight the numbers again, because this time you will compare them to the proof and, assuming no inconsistencies are found, later to the published version.
         
    2. Complete the Code-Output Checklist: code-output_checklist.xlsx. Use the code-output_checklist.xlsx file as a template. It is pre-populated with examples for reference. Remove those examples when done.
       
      • List on the spreadsheet all Figures, Tables, and any in-text numbers that you have identified in 1 above. Identify the program and program line number(s) that created them. In-text numbers that reference numbers in tables and figures mentioned above do not need to be listed. Only in-text numbers produced by the code that correspond to no table or figure need to be listed. Later, you will run the code (following instructions in Part III – step 4 below) and indicate on this spreadsheet whether the code reproduces your results.

        Importance of the Code-Output checklist:
        • This is your checklist to ensure that your codes are able to account for and reproduce all tables/figures in your manuscript.
        • This is also your checklist to ensure that all in-text numbers produced by your code, but not referenced in your tables/figures above, are accounted for and reproduced by your code.
        • Some of you have sophisticated programming skills or use commands that other researchers may be unfamiliar with, and this checklist will help them and us understand your code.
        • By completing this checklist, you are also addressing the requirements of some journals that use this checklist when verifying the reproducibility of a study.
           
    3. Highlight statements on how variables were operationalized, so we can check if they were correctly implemented in the code.
       
    4. Highlight claims relating to testing of model assumptions (if any), so we can verify these tests were implemented in the code.
       
    5. Highlight claims of additional analyses that you performed but did not show and that you promise to make available upon request, so we can verify that these were implemented in the code.
       
    6. Highlight in the manuscript the first occurrence/mention of each dataset used in the study. Make sure they are cited in the reference section as well as the Readme file.
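
The Code-Output Checklist described in item 2 can also be sanity-checked mechanically before you submit. The following is a minimal Python sketch, not part of the R-Squared service. It assumes the checklist has been exported to CSV with the hypothetical column names `item`, `program`, and `lines` (the columns in the actual template may differ), and it reports entries whose program file is missing or whose line range falls outside the file.

```python
import csv
import io
from pathlib import Path


def parse_line_range(text):
    """Turn '12' or '12-40' into a (first, last) tuple of ints."""
    first, _, last = text.strip().partition("-")
    return int(first), int(last or first)


def read_checklist(csv_text):
    """Parse checklist CSV text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def check_checklist(rows, code_dir):
    """Return a list of problems found in the checklist rows.

    rows     -- iterable of dicts with the hypothetical keys
                'item', 'program', and 'lines'
    code_dir -- folder that holds the program files
    """
    problems = []
    for row in rows:
        program = Path(code_dir) / row["program"]
        if not program.is_file():
            problems.append(f"{row['item']}: {row['program']} not found")
            continue
        n_lines = len(program.read_text(encoding="utf-8").splitlines())
        first, last = parse_line_range(row["lines"])
        if first < 1 or last > n_lines or first > last:
            problems.append(
                f"{row['item']}: lines {row['lines']} outside "
                f"{row['program']} (1-{n_lines})"
            )
    return problems
```

A check like this only confirms that each listed program and line range exists. Whether those lines actually produce the table or figure still has to be confirmed by running the code.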

  • Part II. Data

    Please do the following:

    1. Make sure all variables and values in your dataset are labeled and that documentation related to the dataset is provided (e.g., source, codebook, and questionnaire if it is a survey).
       
    2. Provide a Data Availability Statement. A data availability statement (sometimes referred to as a ‘Data Access Statement’) is crucial in signposting where the data associated with a paper is available, and under what conditions the data can be accessed, including links (where applicable) to the dataset. Here are sites with examples of Data Availability Statements: It is possible, depending on the data provider's policy on sharing data for reproducibility, that your reproducibility package may not include your data. In that case, be explicit about where to download the files and where to put them in your reproducibility package, so researchers can download the data themselves and know where to put them. For example: “Download this file from this website and put it in this folder of the reproducibility package.”
       
    3. Specify the license of the data. Without a license, the default is that re-users do not have permission to use, modify, or share your data, even if you (the creator) shared it (https://choosealicense.com/no-permission/). A common license for open data is CC-BY 4.0: https://creativecommons.org/licenses/by/4.0/
       
    4. Do not forget to cite your data sources (including yourself if you created the dataset) in your bibliography or reference section for proper attribution. Some data providers have suggested citation language. Here are sites with examples for citing data:
       
    5. Identify the creators of the dataset and specify their contact e-mail, affiliation, and ORCID numbers.
       
    6. For each dataset used in the study, create a sheet in the Data Metadata file (https://ciser.cornell.edu/wp-content/uploads/2021/01/Data_Metadata.xlsx) and complete all applicable fields.
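
The labeling check in item 1 can be partly automated. The sketch below is illustrative only: it assumes the dataset is a CSV file whose first row holds the variable names and that the codebook is available as a plain mapping from variable name to label. Both formats are assumptions, and labels stored inside a Stata, SPSS, or R data file would need that software's own tools. The function lists the variables that have no label.

```python
import csv
import io


def unlabeled_variables(csv_text, codebook):
    """Return dataset column names that have no non-empty label.

    csv_text -- contents of a CSV file whose first row holds variable names
    codebook -- dict mapping variable name -> label (hypothetical format)
    """
    header = next(csv.reader(io.StringIO(csv_text)))
    return [name for name in header if not codebook.get(name, "").strip()]


# Example with made-up data: 'inc' has no entry in the codebook.
data = "id,age,inc\n1,34,52000\n"
labels = {"id": "Respondent ID", "age": "Age in years"}
print(unlabeled_variables(data, labels))  # ['inc']
```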

  • Part III. Code

    Please do the following:

    1. If you have more than one program file, specify the sequence of execution of your program files (or code), either through a readme file or, preferably, by prefixing each program name with Step##_.
       
    2. Create a master file to run all your codes in sequence (if you have multiple codes).
       
    3. Make sure your code has a command that opens/reads the data, so it is clear which dataset is being used (even if you only have one dataset). Use relative paths, instead of absolute paths, in your code when calling or saving files.
       
    4. AFTER YOU HAVE WRITTEN YOUR MANUSCRIPT, run your code(s) again from beginning to end in one sitting. If possible, use a base (fresh) installation of the statistical software package. A fresh installation lets you detect the packages and libraries your code needs to run correctly. You may not detect these requirements in your usual computing environment, because the required packages have long been installed there.
       
    5. Address all errors in your code found during this process.
       
    6. Ensure that all variables and values in your dataset, including those created while processing your code, are labeled.
       
    7. Check to ensure that the items from steps I.3, I.4, and I.5, if present, were implemented correctly in your code.
       
    8. Using comments, map in your code the section that created the table/figure/in-text number in your manuscript. For example, “This block of code creates Table 1”.
       
    9. Check your code comments for typos. If comments signpost output to locations in the manuscript, make sure those locations still exist and are correct. Tables may have been moved, re-arranged, or re-numbered in the manuscript during the editing process, so comments in the code should be updated as well.
       
    10. If it is safe to re-organize your code(s) without altering your results so that outputs appear in the same order as they are displayed in the manuscript, please do so. It makes entering the results in your manuscript significantly easier, less confusing, and less prone to error. If you re-organize your code(s), you may have to edit the line numbers in your code-output checklist to reflect the new locations.
       
    11. Ensure that all outputs produced are saved by the code. If possible, make the code produce and save log files, especially for programs that produce lots of output, because results can be buried deep in the results window of your statistical software package and be difficult to find.
       
    12. If you have hard-coded values in the code (i.e., values that are fixed/static or cannot be changed directly), explain why they are in the code and how they were derived. These could be values you calculated manually because you did not know how to implement the formula in your statistical software of choice.
       
    13. Comment out or remove command lines that are not necessary or not relevant in the study, especially those that produce output as they would only increase the size of the output file and the amount of output to check.
       
    14. If your code saves outputs to a file, prefix the file name with its placement or location in the manuscript. For example, if you name the output file as summary_statistics.tex, and this output is Table 1 of your manuscript, then rename the output file as Table_1_Summary_Statistics.tex. Depending on their location in the manuscript some files could be named Table_2_Panel_A_Column_2_Regression_model_2.tex. This would make it easy for researchers to spot the output file associated with the result in your manuscript. Notice that the filenames do not have blank spaces. Avoid naming files or folders with blank spaces.
       
    15. Identify the creators of the code and specify their contact e-mail, affiliation, and ORCID numbers.
       
    16. Specify the license for your code. Without a license, the default is that re-users do not have permission to use, modify, or share your data or code, even if you (the creator) shared it (https://choosealicense.com/no-permission/). A common license for code is BSD-3-Clause: https://opensource.org/licenses/BSD-3-Clause
       
    17. Specify the date you last updated the code and the date you last ran your analysis. These dates indicate when your analysis was conducted, which is important in determining the versions of the software and packages/libraries/ados you used. Then complete step 18 below.
       
    18. Specify in detail the computing environment you have used to run your analysis:
       
      1. Operating System and version (e.g., Windows 10, Ubuntu 18.04, etc.).
      2. Number of CPUs/Cores.
      3. Size of memory.
      4. Statistical software package(s) used in the analysis and their version.
      5. Packages, ado files, or libraries used in the analysis and their version.
         
        • In R: use sessionInfo() and packageVersion("nameofpackage")

          Note: Do these after the program finishes or at the point in the code where all required packages have been installed. Do not do these at the start of the session, as some packages required by the analysis may not have been installed yet.

        • In R for Windows, use the checkpoint() package with a date to ensure that all required packages, in the versions used at the time the authors did the analysis, are being used. Specify a folder in checkpoint to redirect the libraries to a specific folder that can be included in the reproducibility package and can be called by the code using the .libPaths("") command.
        • In Stata, use the sysdir set PLUS command to point to a specific folder so that all required ado files are stored there and can be included in the reproducibility package.

         
      6. File Encoding. Specify the file encoding if necessary. Example: UTF-8. 
      7. Estimate of the time to complete processing (from beginning to end) using this computing environment.
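
Several of the steps above (the Step##_ prefix in item 1, the master file in item 2, relative paths in item 3, log files in item 11, and the environment details in item 18) can be combined in one master script. The following is a rough Python sketch under stated assumptions: the step programs are Python scripts stored in a hypothetical `code` folder, logs go to a hypothetical `logs` folder, and all function names are illustrative. The same pattern applies to a master do-file in Stata or a master script in R. Memory size is not reported here because the Python standard library has no portable call for it.

```python
import os
import platform
import re
import subprocess
import sys
from importlib import metadata
from pathlib import Path

# Matches names such as Step01_import.py and captures the step number.
STEP_PATTERN = re.compile(r"^Step(\d+)_.*\.py$")


def ordered_steps(filenames):
    """Return the Step##_ program names sorted by their step number."""
    numbered = []
    for name in filenames:
        match = STEP_PATTERN.match(name)
        if match:
            numbered.append((int(match.group(1)), name))
    return [name for _, name in sorted(numbered)]


def environment_report(packages=()):
    """Collect some of the environment details listed in item 18."""
    report = {
        "os": f"{platform.system()} {platform.release()}",
        "cpu_count": os.cpu_count(),
        "python": platform.python_version(),
        "packages": {},
    }
    for name in packages:
        try:
            report["packages"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report["packages"][name] = "not installed"
    return report


def run_all(code_dir="code", log_dir="logs"):
    """Run every step in order, saving each step's output to a log file.

    Both folders are relative paths, resolved against the current
    working directory (assumed to be the root of the package).
    """
    Path(log_dir).mkdir(exist_ok=True)
    for name in ordered_steps(os.listdir(code_dir)):
        log_path = Path(log_dir) / (Path(name).stem + ".log")
        with open(log_path, "w", encoding="utf-8") as log:
            # check=True stops the sequence at the first failing step.
            subprocess.run(
                [sys.executable, str(Path(code_dir) / name)],
                stdout=log, stderr=subprocess.STDOUT, check=True,
            )
```

Calling `run_all()` from the package root and then saving the result of `environment_report([...])` after the last step mirrors the note above: package versions are recorded only once everything the analysis needs has been loaded.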

  • Part IV. Output

    Please do the following:

    1. Compare the output produced by the code with that of the manuscript. Address all inconsistencies found during this process.
       
    2. Ensure that all highlighted items on the manuscript that are listed in the Code-Output Checklist are in the output generated by the code. Some values in the manuscript may have been entered directly in the manuscript and not produced by the code. Explain how they were derived.
       
    3. If the software or procedure allows, make sure all output tables and figures display labeled variable names and values (if categorical).
       
    4. Highlight in the output file(s) the numbers, figures, or charts used in the manuscript. This will allow you to review and show where you got the results entered in the manuscript. Then save the output file.

      Importance of the output file:
       
      • Some researchers want to see the output produced by the code.
      • Some researchers do not know how to use the software used by the authors, and want to just see the output.
      • It serves as a comparison file, so researchers feel comfortable that they executed the code correctly.
      • Some reproducibility packages do not contain data because of Data Provider access conditions such as registration before permission to use the data is granted. Having a comparison output file makes researchers feel comfortable that they have used the same dataset as the authors and implemented the instructions correctly.
      • The highlighted numbers in the output file let researchers know exactly where the results are being pulled from.
      • Some commands or libraries used in the code might be deprecated in the future due to software updates. When the code gets executed in the future, it may produce an error. Thus, the comparison output file is proof that at some point in the past the code worked and produced the output.

       
    5. Remove extraneous outputs that are not necessary and not used in the study.
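
The comparison in step 1 can be partly scripted once the highlighted manuscript values and the corresponding output values are typed into two lookups. The sketch below is one possible approach, not the procedure R-Squared uses. It assumes both sets of values are keyed by hypothetical labels such as "Table 1, mean age", and it treats a value as reproduced when the output, rounded half-up to the number of decimals printed in the manuscript, equals the manuscript value. The rounding rule is an assumption, since statistical packages may round differently.

```python
from decimal import Decimal, ROUND_HALF_UP


def matches_reported(reported, computed):
    """True if `computed` rounds to the value `reported` in the manuscript.

    reported -- the number as printed in the manuscript, e.g. "0.35"
    computed -- the full-precision number from the output, e.g. "0.3462"
    Both are strings so the reported number of decimals is preserved.
    """
    reported_dec = Decimal(reported)
    # Round the computed value to the same number of decimal places
    # (ROUND_HALF_UP is an assumption; your software may round differently).
    rounded = Decimal(computed).quantize(reported_dec, rounding=ROUND_HALF_UP)
    return rounded == reported_dec


def find_inconsistencies(manuscript, output):
    """Return labels whose manuscript value is missing from or differs in output."""
    return [
        label
        for label, reported in manuscript.items()
        if label not in output or not matches_reported(reported, output[label])
    ]
```

Any label returned by `find_inconsistencies` points to a number to re-check by hand, either in the code or in the manuscript.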

  • Part V. Submission

    Zip the manuscript, dataset, output, code, readme, and other documentation relevant to this study and send it to foa2@cornell.edu via sft.cornell.edu (formerly Cornell Dropbox).

    • If you are using Cornell Box, share the Cornell Box folder with foa2@cornell.edu
    • If you are using a Shared folder on the CCSS-RS Research Server, request CCSS-RS to give foa2_ts read-write access to the folder.
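
If you prefer to build the zip file with a script, the following Python sketch shows one way to do it. It assumes all submission materials sit under a single folder (the folder and file names in the comment are hypothetical) and that the zip file is written outside that folder, so the archive does not include itself.

```python
import zipfile
from pathlib import Path


def zip_package(package_dir, zip_path):
    """Zip every file under package_dir, keeping paths relative to it."""
    package_dir = Path(package_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(package_dir.rglob("*")):
            if path.is_file():
                # Store e.g. code/Step01_import.py, not an absolute path.
                archive.write(path, path.relative_to(package_dir).as_posix())
    return zip_path


# Example call with hypothetical names:
# zip_package("my_study", "my_study_submission.zip")
```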

    While working with you, we may request additional materials (such as questionnaires, data metadata, or other documentation), but what we are asking for now is the baseline that we require to begin checking your work; without it, we can’t ensure that we are checking the right values. Once we’ve verified and created your reproducibility package, we will ask you to review and validate it. The highlighted output file will be preserved and used as a Comparison Output File that researchers can compare against when they use your reproducibility package.

  • Contact

    Should you have additional questions, please contact:

    Florio Arguillas, Jr. Ph.D.
    Research Associate
    CCSS Research Support
    Cornell University
    foa2@cornell.edu
    607-255-7838
