Alexander
 
A Beginning in Environmental Data Science

The first major project Alexander was assigned to when he joined the Air Quality Modeling and Exposure Laboratory (AQMD) was titled "Hyperlocal Monitoring of Traffic-Related Air Pollution to Assess Near-Term Impacts of Sustainable Transportation Interventions." The aim of the project is to evaluate the relationship between temporal concentrations of the air pollutants NO2 and PM2.5 and the smart transportation technologies implemented along the Innovation Corridor testbed. A successful outcome could lead to real improvements in air quality for the City of Riverside, which recently received an "F" grade for ozone and PM2.5 pollution from the American Lung Association.

A series of six low-cost sensors placed along the testbed collects high-resolution measurements of different environmental pollutants at 15-minute intervals. Combining this time-series data with onboard GPS and monitoring equipment in the test vehicles produces datasets that can be analyzed to establish relationships demonstrating the effect of the various sustainable traffic interventions on local air pollutants.
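As a rough illustration of how the two data streams might be joined, the sketch below aligns each vehicle GPS fix with the nearest 15-minute sensor reading using pandas. The column names ("timestamp", "no2", "pm25", "lat", "lon") and the sample values are assumptions for illustration, not the project's actual schema.

```python
# Minimal sketch: align GPS fixes with the nearest 15-minute sensor reading.
import pandas as pd

sensors = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-05-01 08:00", "2024-05-01 08:15"]),
    "sensor_id": ["S1", "S1"],
    "no2": [21.4, 23.1],   # ppb (illustrative values)
    "pm25": [9.8, 11.2],   # ug/m3 (illustrative values)
})

gps = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-05-01 08:07", "2024-05-01 08:19"]),
    "vehicle_id": ["V1", "V1"],
    "lat": [33.9806, 33.9811],
    "lon": [-117.3755, -117.3760],
})

# Match each GPS fix to the closest sensor reading within one 15-minute window.
merged = pd.merge_asof(
    gps.sort_values("timestamp"),
    sensors.sort_values("timestamp"),
    on="timestamp",
    direction="nearest",
    tolerance=pd.Timedelta("15min"),
)
print(merged)
```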

Prior to starting this project, Alexander had no data science knowledge and no experience working with data. His initial tasks therefore consisted of basic conversions of temporal data to account for geospatial differences in the data downloaded from the sensors. After a few iterations of processing the data with the old script, which required values inside the script to be constantly adjusted to match the current dataset, he became proficient enough with Python to write a new script that automated all of those setting changes, either by reading them from the data provided or through functions with parameters that scale easily to larger datasets. Script adjustments that could take half an hour to an hour after debugging were no longer necessary, and the total processing now runs in under 5 minutes, saving the PIs and graduate students hours of time over the data collection period.
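A hedged sketch of the kind of parameterized helper described above: instead of hard-coding each dataset's timezone handling in the script, the conversion is driven by arguments, so a new download needs no manual edits. The function name, column names, and timezones here are illustrative assumptions, not the project's actual API.

```python
import pandas as pd

def localize_sensor_times(df, time_col="timestamp",
                          source_tz="UTC", target_tz="America/Los_Angeles"):
    """Convert raw sensor timestamps from the download timezone to local time."""
    out = df.copy()
    out[time_col] = (
        pd.to_datetime(out[time_col])
          .dt.tz_localize(source_tz)   # mark raw times with their origin zone
          .dt.tz_convert(target_tz)    # shift to the study-area timezone
    )
    return out

# Usage: the same call works for any sensor dataset, regardless of size.
raw = pd.DataFrame({
    "timestamp": ["2024-05-01 15:00:00", "2024-05-01 15:15:00"],
    "no2": [21.4, 23.1],   # illustrative values
})
print(localize_sensor_times(raw))
```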

Currently, Alexander is developing a full Python library for the project that will automate its core data-processing steps, which has major implications for the overall success of the research. Automation allows for higher-quality analysis, because data collected over the course of months makes it difficult to pinpoint the independent factors that can affect the surrounding NO2 and PM2.5 measurements.

Beyond data manipulation, the module aims to provide environmental data calibration support and to perform statistical analysis of input datasets. His current development work is focused on finalizing a function that calculates the correlation of multivariate data using regression models, automatically determines the best model based on statistical tests, and returns the associated results. This will save the principal investigators and data analysts hours of time and effort, and ideally the entirety of Tasks 1 and 3 of the proposed research project can be completed automatically with functions from the module.
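The sketch below shows one way such a model-selection function could work: fit several candidate regression models and keep the one with the best score. The candidate formulas, the synthetic data, and the use of AIC as the selection criterion are assumptions for illustration, not the module's actual implementation.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic example data (illustrative only).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "traffic_volume": rng.uniform(100, 1000, 200),
    "wind_speed": rng.uniform(0, 10, 200),
    "temperature": rng.uniform(10, 35, 200),
})
df["no2"] = 0.02 * df["traffic_volume"] - 0.8 * df["wind_speed"] + rng.normal(0, 2, 200)

def best_regression(data, response, candidate_formulas):
    """Fit each candidate OLS model and return the one with the lowest AIC."""
    fits = {f: smf.ols(f"{response} ~ {f}", data=data).fit() for f in candidate_formulas}
    best = min(fits, key=lambda f: fits[f].aic)
    return best, fits[best]

formulas = [
    "traffic_volume",
    "traffic_volume + wind_speed",
    "traffic_volume + wind_speed + temperature",
]
name, model = best_regression(df, "no2", formulas)
print(name, round(model.rsquared_adj, 3))
```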
