Loss of instructional time due to off-task behavior is a well-established problem in educational settings, recognized both by
researchers and practitioners for over a hundred years. The link between the quality of attention and performance has been
demonstrated in the cognitive psychology literature. In this analysis challenge, you will build a set of detectors that
predict students’ on-task behavior in classrooms, and examine whether specific instructional strategies are
associated with the incidence of off-task behavior in elementary school children.
Data Source
The data for this challenge come from 22 classrooms in 5 local charter schools. Students in grades K-4 were
observed in their classrooms, and each observation was coded as on-task or off-task behavior.
Here are the variables in this dataset:
General variables
UNIQUEID: The unique id for each observation
SCHOOL: School name. Five schools in total.
Class: Classroom name
GRADE: Grade level, 0 = Kindergarten; 1 = First Grade;…
STUDENTID: Student identifier. 1226 unique students in total.
Gender: 0 = Female, 1 = Male
Observation variables
CODER: The coder who coded on/off task behavior.
OBSNUM: The sequence number of the observation made on one student. 1 = the first observation on the student;… 32 = the 32nd
observation on the student.
Activity: Six different formats of activity: (1) individual work, (2) small-group or partner work, (3) whole-group instruction
at desks, (4) whole-group instruction while sitting on the carpet, (5) dancing, and (6) testing
ONTASK: Y = On-task; N = Off-task
Total Time: The duration of each activity in seconds. 0 means the instruction was given but the activity did not
actually happen.
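One possible way to prepare the observation variables for a classifier is to turn Activity into indicator features and flag activities whose Total Time is 0. The sketch below is only an illustration: it assumes the Activity column stores the integer codes 1 to 6 listed above and that each row arrives as a dictionary (for example from csv.DictReader). The helper name encode_observation and the feature names are hypothetical and not part of the dataset.

```python
# Assumed mapping from Activity codes to short feature names (names are illustrative).
ACTIVITY_FORMATS = {
    1: "individual",
    2: "small_group",
    3: "whole_group_desks",
    4: "whole_group_carpet",
    5: "dancing",
    6: "testing",
}


def encode_observation(row):
    """Return indicator features for one observation row given as a dict."""
    code = int(row["Activity"])  # assumes integer codes 1-6 in the file
    if code not in ACTIVITY_FORMATS:
        raise ValueError(f"unexpected Activity code: {code}")

    # One indicator per activity format; exactly one of them is 1.
    features = {
        f"activity_{name}": int(code == c) for c, name in ACTIVITY_FORMATS.items()
    }

    # Total Time of 0 means the activity was announced but did not happen.
    total_time = float(row["Total Time"])
    features["happened"] = int(total_time > 0)
    features["total_time_sec"] = total_time
    return features


if __name__ == "__main__":
    print(encode_observation({"Activity": "3", "Total Time": "0"}))
```

Whether rows with a Total Time of 0 should be dropped or kept with a flag is a modeling choice; the flag keeps both options open.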
Class session variables
totalobs-forsession: Total number of observations made per session.
NumACTIVITY: The number of activities in one session.
TRANSITIONS: The number of times the activity changed in one session; TRANSITIONS = NumACTIVITY – 1.
Transitions were noted every time the teacher paused instruction to change from one activity to another (e.g.,
transitioning from working on a math problem to listening to a short story).
NumFORMATS: The number of activity formats used in one session.
FORMATchanges: The number of times the format of instruction changed in one session; FORMATchanges =
NumFORMATS – 1.
Obsv/act: The average duration of an instructional activity (sec): the total duration of an observation session divided by
the number of activities.
Transition/Durations: The average rate of transitions in a session: the total number of activities divided by the duration of an
observation session (sec).
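As a worked illustration of how the session-level variables relate to one another, the sketch below derives them from three raw quantities using the formulas stated in the definitions above. The function name session_features and its arguments are hypothetical; the dataset already contains these columns, so a helper like this would mainly serve as a sanity check.

```python
def session_features(num_activities, num_formats, session_seconds):
    """Derive session-level variables from raw counts (illustrative helper)."""
    if num_activities < 1 or num_formats < 1 or session_seconds <= 0:
        raise ValueError("counts must be >= 1 and duration must be positive")
    return {
        # One fewer transition than activities.
        "TRANSITIONS": num_activities - 1,
        # One fewer format change than formats.
        "FORMATchanges": num_formats - 1,
        # Average seconds per activity.
        "Obsv/act": session_seconds / num_activities,
        # Activities per second of session time, as defined in the variable list.
        "Transition/Durations": num_activities / session_seconds,
    }


if __name__ == "__main__":
    # A 20-minute session with 4 activities in 2 formats.
    print(session_features(num_activities=4, num_formats=2, session_seconds=1200))
```

For a 1200-second session with 4 activities in 2 formats, this gives 3 transitions, 1 format change, and an average of 300 seconds per activity.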
The data set you will use in this assignment, available on the course website, is very similar (but not identical) to the data set
used in:
Godwin, K.E., Almeda, M.V., Petroccia, M., Baker, R.S., & Fisher, A.V. (2013). Classroom activities and off‐task behavior in
elementary school children. Proceedings of the Annual Meeting of the Cognitive Science Society, 2428‐2433. [pdf]
(https://www.upenn.edu/learninganalytics/ryanbaker/CEDP_A_1894324.pdf)
Your task
1. Build a classifier that predicts on-task or off-task behavior using the aca2_dataset_training.csv data.
2. You can include any one or more features in your classifier.
3. You can use any one or more algorithms to build your classifier.
4. You may need to be strategic in terms of selecting variables, recoding some of the variables, or making reasonable
transformations.
5. Descriptive analysis (e.g., mean, sd, correlation) is highly recommended.
6. Be sure to report your model performance on aca2_dataset_validation.csv.
7. Make sure the process (both data cleaning and analysis) is clearly documented, and your code is reproducible.
8. Write one or two brief paragraphs on your interpretation of the result. What does the result mean to you?
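When reporting performance on the validation file, it can help to compare the classifier against a trivial baseline. The sketch below is one possible approach and is not prescribed by the assignment: it uses only the Python standard library, a majority-class baseline, and two metrics (accuracy and Cohen's kappa) chosen here as an assumption. The labels "on" and "off" and all function names are illustrative placeholders.

```python
from collections import Counter


def majority_baseline(train_labels):
    """Return the most frequent label in the training data."""
    return Counter(train_labels).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohens_kappa(y_true, y_pred):
    """Agreement between predictions and truth, corrected for chance."""
    n = len(y_true)
    observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    labels = set(y_true) | set(y_pred)
    # Chance agreement from the marginal label frequencies.
    expected = sum(true_counts[l] * pred_counts[l] for l in labels) / (n * n)
    if expected == 1.0:
        return 0.0  # only one label present; kappa is undefined, report 0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Toy labels standing in for the training and validation targets.
    train = ["on", "on", "on", "off"]
    valid_true = ["on", "on", "off", "on", "off"]

    guess = majority_baseline(train)
    valid_pred = [guess] * len(valid_true)

    print("accuracy:", accuracy(valid_true, valid_pred))
    print("kappa:", cohens_kappa(valid_true, valid_pred))
```

In this toy example the baseline reaches 60% accuracy while its kappa is 0, which is why a chance-corrected metric is worth reporting alongside accuracy when one class dominates.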
Submission
You can choose to work on this assignment individually or in a team (team size <= 3). If you want to work in a larger group,
email Lukas.
Your submission should be in .html or .pdf format; .ipynb or .Rmd files will not be accepted. This will
demonstrate how you communicate your code/analysis with others who may not have access to your data. If you
choose to work in a team, only one of the team members needs to submit the assignment.
Your work will be evaluated on three simple criteria: (a) the implementation process of the classifier (it does not have to
be the best-performing model), (b) the clarity of your documentation (be as clear as possible), and (c) the insightfulness of your
interpretation (it does not have to be very long).