Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feedback #1

Open
wants to merge 48 commits into
base: feedback
Choose a base branch
from
Open

Feedback #1

wants to merge 48 commits into from

Conversation

github-classroom[bot]
Copy link

@github-classroom github-classroom bot commented Sep 10, 2024

👋! GitHub Classroom created this pull request as a place for your teacher to leave feedback on your work. It will update automatically. Don’t close or merge this pull request, unless you’re instructed to do so by your teacher.
In this pull request, your teacher can leave comments and feedback on your code. Click the Subscribe button to be notified if that happens.
Click the Files changed or Commits tab to see all of the changes pushed to the default branch since the assignment started. Your teacher can see this too.

Notes for teachers

Use this PR to leave feedback. Here are some tips:

  • Click the Files changed tab to see all of the changes pushed to the default branch since the assignment started. To leave comments on specific lines of code, put your cursor over a line of code and click the blue + (plus sign). To learn more about comments, read “Commenting on a pull request”.
  • Click the Commits tab to see the commits pushed to the default branch. Click a commit to see specific changes.
  • If you turned on autograding, then click the Checks tab to see the results.
  • This page is an overview. It shows commits, line comments, and general comments. You can leave a general comment below.
    For more information about this pull request, read “Leaving assignment feedback in GitHub”.

Subscribed: @sean310066 @derek1249241692 @WangXinyuan01 @Lemonade-ait

Copy link

@ctessum ctessum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good Job!

Title: 10/10
Description: 28/30
Proposal: 30/30
Clarity etc: 30/30


__Data description__

The dataset we plan to use is the meteorological records of Champaign, Illinois. We want to predict corn yield by analyzing precipitation and temperature. Data will be obtained from wunderground.com (Savoy, IL Weather History | Weather Underground). And daily temperature and the annual precipitation amount would be needed. The format would be primarily in CSV. The four columns will be temperature (including max, avg and min) and precipitation every day, while the rows will be the date for a whole year.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We haven't really discussed this in class so I'm not going to take many points off here, but in this case the dataset will be generated from your "Field Crop Cultivation" model. Basically, what you will need to do is:

  1. figure out what the inputs to the model are (e.g. temperature)
  2. Figure out what a reasonable range is for each input variable (e.g. temperature between 0 and 40 C)
  3. Randomly sample from the ranges of input variables to produce the independent variable observations of the dataset (e.g. for 1000 observations, just sample each variable 1000 times)
  4. Run your "field crop cultivation" model with each set of input variables to get the dependent variable that you want to predict.

I'd be happy to discuss more about how to do this!

Copy link

@ctessum ctessum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good start so far, but in general it will be good to increase the focus on data analysis for future deliverables.

Plots and tables: 25/30
Narrative description: 25/30
Predictive modeling plan: 9/10
Clarity, formatting, etc: 25/30

__1. Background and Research Proposal__
Crop models are computational tools that assess the effects of environmental variation and cultivation strategies on crop yield (Chapagain et al., 2022; Huang et al., 2019). By incorporating factors such as precipitation, humidity, temperature, fertilization, and soil properties, crop models establish relationships between input parameters and agricultural yield outcomes. From a structural perspective, crop models can be either empirical or mechanistic. Empirical models create statistical relationships based on existing data, while mechanistic models aim to explain relationships by exploring physiological mechanisms and causal connections (Reynolds and Acock, 1985). From a parameter standpoint, crop models generally include weather, soil, and crop-specific parameters to estimate crop biomass. Weather parameters cover solar radiation, precipitation, temperature, and more, while soil parameters focus on humus content, organic matter content, and other soil characteristics. Crop-specific parameters include maximum crop yield, specific nitrogen uptake rate, and related factors.

In this project, we aim to replicate Hartmut Bossel's 'Field Crop Cultivation' simulation model as a white-box reference and develop a black-box model using SVD, PCA, and/or Fourier series. The original model is a parsimonious one, primarily focusing on the dynamic effects of precipitation on crop yield across a spectrum of crops in Germany. Initially created in BASIC (Hartmut, 1985) and later in Vensim (Hartmut, 2007) for educational purposes, the model simulates the impact of water and nutrient (nitrogen) availability on plant growth dynamics. Built from first principles, it captures complex interactions between water and nutrient dynamics and can be adapted to different scenarios by applying specific plant and soil parameters.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually avoid repeating the abstract word-for-word.


__3.1 Selecting Climate Zone and Sampling Points__

![image](https://github.com/user-attachments/assets/7be65a57-758d-4047-a86e-b4c41ae70b67){width=70%}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each figure should have a figure number and a caption, and should be referred to by number in the text.

Also, it is not clear whether you created these figures for this project, or if you got them from somewhere else.

If you got the image (or parts of the image) from somewhere else, you need to cite the source where you got them from.

And in general, keep in mind that the goal here is for you to be generating the figures yourself using software tools, it's okay to include a couple from other places, but most of the figures should be your analysis of your dataset.

__3.1 Selecting Climate Zone and Sampling Points__

![image](https://github.com/user-attachments/assets/7be65a57-758d-4047-a86e-b4c41ae70b67){width=70%}
![image](https://github.com/user-attachments/assets/fe99328d-13a8-4363-bab3-ce82132e9209){width=70%}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, these need numbers and captions.

![image](https://github.com/user-attachments/assets/2364a33a-967a-4ad3-93f9-29d43bd26aca){width=70%}
![image](https://github.com/user-attachments/assets/bdbdb62e-3e8a-401f-94ee-00474315e772){width=70%}

In this study, three typical U.S. crops, corn, cotton, and wheat, were selected as examples in our analysis. These crops are grown in different climate zones and play a vital role in U.S. agriculture. To scientifically analyze the impact of climate on crop yields, the study first chose climate data sampling points based on the United States crop production maps (USDA United States - Crop Production Maps), which clearly shows the key production areas for different crops. Climate data, including monthly average temperature and monthly precipitation, were collected from three different weather monitoring stations within each key production area. Climate data from 2004 to 2024, within a 20-year period of time, were collected and used in this study for model analysis.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this 20-year climate data what is being summarized in these figures?

There should also be here a quantitative analysis and description of the data. How, specifically, do these figures, and the data behind them, help you to understand the problem you are trying to solve? What is the point you that you want to make that these figures are providing evidence for?

__4. Reference Model Results Analysis and Questions__

__4.1 Exploratory Data Analysis on Reference Model Results, Humid Subtropical Climate__
![image](https://github.com/user-attachments/assets/d9aff41d-6891-4189-b007-3cd3ae2e0c01){width=80%}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The text in the figures needs to be large enough to read. It should be a similar size to the text in your narrative, maybe somewhat smaller, but not a lot smaller.


Figure 1. shows the mechanistic model results for soil water and nutrient dynamics in a humid subtropical climate (Stoneville, Mississippi) under varying precipitation scenarios (maximum, mean, and minimum). The results are shown in two sets of plots. The x-axis represents time within one year, ranging from 0 to 1. Plots in the first row (a, b, c) display the changes in soil nutrients over time, specifically total biomass (red), nitrogen available to plants (green), and organic matter fraction in soil (blue). Plots in the second row (d, e, f) illustrate the in precipitation with randomized weather event (blue line), groundwater levels (red line), and soil water content (blue line) across three precipitation scenarios.

From the soil nutrient data, we can observe the seasonal dynamics of biomass levels as well as plant-available nitrogen in the soil. At the beginning of cultivation, with the application of fertilizer, nitrogen levels reach their peak and then decrease as the crop continues to grow. From the soil water data, we can see that a water surplus exists in both the max and mean rainfall scenarios, leading to a significant rise in groundwater levels (d, e). Additionally, since the reference model did not account for surface runoff, there could be a significant overestimation of precipitation's contribution to groundwater levels.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should include a quantitive description as well as a qualitative description. It's good that you mention that you can observe the dynamics, but what, specifically and quantitatively, are the dynamics you are observing?

For example, you could include a quantitative or statistical description of how the different precipitation scenarios are different.

Figure 2. shows the mechanistic model results for soil water and nutrient dynamics in a humid subtropical climate (Arnold, Iowa) under varying precipitation scenarios (maximum, mean, and minimum).
The similarity of soil nutrient results is probably cause by model error. In soil water, we can see three precipitation cases covers highwater excess, minor water deficit and large water deficit, indication that the locational conditions can be a good setting for us to use this reference model to explain the situation. Also, notice that the precipitation pattern (blue) is different from Stoneville, and hence causing different dynamics in soil water content (green), for example, not significant seasonal variation .

__4.4 Further Questions on Reference Model Results, Humid Subtropical Climate__
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a data science class, we want to be focusing our analysis on the data that is generated by the model, rather than the model itself. So it's okay to be making these conclusions about the model, but you should be supporting these conclusions with numerical analysis of the data that is generated by the model.


__5 Predictive Modeling Plan__
The aim of this project is to create a statistical model to produce similar estimates as the mechanistic model. The benefit of this simplification is to reduce computational costs. Another potential outcome of this approach is that, by using regression analysis, we can test correlations and rank the inputs that have the most significant effect on the output, thereby helping to determine the dominant factors influencing crop growth and decision-making in crop management.
To do this, we will first ensure that the reference mechanistic model functions correctly, making it capable of generating predictive yield based on the precipitation data. Then, to create sufficient data, we can randomly generate 1,000 (or more) precipitation curves for each scenario using the mean value and standard deviation obtained from the data in section 3. Third, the generated precipitation curves will be stored as CSV files, ready for model input. Fourth, we will translate the current Python model into Julia and automatically run simulations to obtain the corresponding yield for each precipitation scenario. Finally, we will compile a new DataFrame that includes both the precipitation and yield data, allowing us to apply SVD, PCA, and/or Fourier series to identify the dominant eigenvectors and underlying patterns in the data.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good plan, but ideally you would have generated the 1,000 precipitation curves already and then analyzed them as part of this exploratory data analysis.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants