Data Science for Ecology

WEC33806

About this course

Advancements in technology and information processing are rapidly changing many fields of plant sciences, animal sciences and ecology, including research, agriculture and conservation. For example, distributed sensor networks now make it possible to acquire huge volumes of data on many relevant aspects, ranging from soil and vegetation characteristics and abiotic conditions such as weather to the behaviour of animals. The availability of unprecedented amounts of data is unlocking new potential; however, it also creates a major challenge: processing and analysing these data effectively. In the current data-centred digital era, driven by technological change, the volume of data will continue to skyrocket as the costs of data collection, storage and processing decrease. Fostered by these developments, researchers and many branches of business are increasingly embracing data science: an approach that unifies data processing, statistics, artificial intelligence and their related algorithms to extract knowledge from data. Data science is therefore becoming an integral part of decision making in many fields, including precision agriculture, livestock management and nature conservation, because it enables automated prediction and classification (e.g., Is this animal ill? Is this plant a weed? Is this apple ready to pick? When should we harvest?).

To keep up with these technological developments, students need to become acquainted with the terms, concepts and methodology that accompany them. This is especially important because data science can require a different approach to using data and conducting science than the one students are familiar with. For example, large volumes of data usually come from various sources, each with its own characteristics, uncertainties and measurement errors. Data from these different sources need to be integrated, and their inherent heterogeneity must be accounted for. Moreover, raw sensor data are generally not immediately fit for analysis, so pre-processing is needed. After initial pre-processing, engineering informative and discriminating features (i.e., measurable properties of the phenomenon being observed) is a crucial step in creating effective algorithms. Furthermore, the collection of large volumes of data leads to a shift away from frequentist hypothesis testing towards analytics that are more focussed on prediction, classification, pattern recognition or anomaly detection. To this end, machine learning techniques are often used, typically supported by high-performance computing.

This course covers the main elements of using a data science approach to solving agricultural or ecological problems. The students will be guided through the main conce

Learning outcomes

  • Explain important concepts in data science needed to solve typical ecological problems

  • Explain how key features of ecological data influence the selection, training, validation and evaluation of algorithms

  • Identify and select machine learning algorithms appropriate to specific ecological problems

  • Create a reproducible workflow (loading raw data, data processing, feature engineering, and machine learning algorithms) to efficiently analyse ecological datasets

  • Critically evaluate the reliability and adequacy of trained algorithms

  • Generate ecological insight from data using a data science approach

  • Communicate the key elements and findings of a data science project clearly and concisely

Assessment

  • Performance (30%): acquired skills in applying data science methods to ecological problems
  • Oral presentation assignment (40%): a group-based assessment of the group project (project execution, data analysis and presentation)
  • Written test with open and closed questions (30%): general principles of data science for ecological applications, as covered in the lectures

Prior knowledge

Experience with programming in R is needed to follow and successfully complete this course. For example, students who have completed a course in which R is used heavily, such as CSA40306 Ecological Modelling and Data Analysis in R, will likely have sufficient background knowledge to participate. We strongly urge students without prior experience in R programming to learn R before the start of the course, either by:

We advise students who are unsure about their level of R skills to work through the first two parts of the online book 'Hands-On Programming with R' (the last URL above). Students who understand most of the material in these two parts have sufficient R programming skills to participate in this course.

We assume a general understanding of ecology, mathematics and statistics. Familiarity with the concept of data science (e.g., INF34306 Data Science Concepts), the application of statistical methods to ecological data (e.g., CSA40306 Ecological Modelling and Data Analysis in R) and algorithms used in data science (e.g., MAT32806 Statistics for Data Scientists; FTE35306 Machine Learning; AIN31306 Deep Learning in Data Science) is helpful but not required.

Resources

  • The book R for Data Science by Wickham and Grolemund (available in print or free online) is used throughout the course, together with a collection of supplied book chapters and journal articles on relevant topics covered during the course.

Additional information

  • Type
    Course
  • Credits
    6 ECTS
  • Level
    Master
  • Mode of instruction
    On campus
If anything is still unclear, please consult the Wageningen University FAQ.

Start dates

  • 9 Mar 2026 to 3 May 2026

    Language of instruction: English
    Period: P5 (Period 5, morning)