Generator Data & Toolbox


The generator data toolbox is a set of programs that combine US generator data from different sources, combine US and Canadian data, impute missing values, place generators on a transmission network model, and aggregate similar generators. Users working with data other than that used by the E4ST team, for any part of the world, may still find parts of the code useful.

Power grid simulation modeling that includes costs and emissions requires several types of data about the generators on the system. The US federal government collects and publishes a considerable amount of data about generators, but those data are spread across multiple datasets published by the EIA and the EPA.

Several factors make it difficult to match up those datasets. First, the datasets number the generators differently. Second, sometimes one dataset reports a pair of generators as one, while the other dataset does not. Third, the EPA dataset omits some small generators and all non-emitting generators, and includes some generators that are not in the EIA dataset (perhaps because they are for self-generation but not for supplying the grid).
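The three difficulties above can be illustrated with a minimal sketch. It is not the toolbox's actual method: the records, field names, and crosswalk table are hypothetical, and splitting a combined EPA value among EIA generators in proportion to capacity is an assumption made only for illustration.

```python
# Hypothetical records; field names and values are illustrative, not real EIA/EPA data.
eia_units = [
    {"plant_id": 100, "gen_id": "1", "capacity_mw": 200.0},
    {"plant_id": 100, "gen_id": "2", "capacity_mw": 200.0},
    {"plant_id": 200, "gen_id": "GT1", "capacity_mw": 50.0},
    {"plant_id": 300, "gen_id": "W1", "capacity_mw": 30.0},   # non-emitting, absent from EPA
]
epa_units = [
    {"plant_id": 100, "unit_id": "U12", "co2_tons": 900000.0},  # two generators reported as one
    {"plant_id": 200, "unit_id": "CT-1", "co2_tons": 40000.0},  # numbered differently
    {"plant_id": 400, "unit_id": "B1", "co2_tons": 10000.0},    # not in the EIA data
]
# Hypothetical crosswalk: (plant, EPA unit) -> EIA generator IDs at that plant.
crosswalk = {
    (100, "U12"): ["1", "2"],
    (200, "CT-1"): ["GT1"],
}


def match_units(eia_units, epa_units, crosswalk):
    """Attach EPA emissions to EIA generators and report what could not be matched."""
    eia_index = {(u["plant_id"], u["gen_id"]): u for u in eia_units}
    matched = {}        # (plant_id, gen_id) -> allocated CO2 tons
    unmatched_epa = []  # EPA units with no EIA counterpart
    for epa in epa_units:
        key = (epa["plant_id"], epa["unit_id"])
        targets = [
            eia_index[(epa["plant_id"], gen_id)]
            for gen_id in crosswalk.get(key, [])
            if (epa["plant_id"], gen_id) in eia_index
        ]
        if not targets:
            unmatched_epa.append(key)
            continue
        total_cap = sum(t["capacity_mw"] for t in targets)
        for t in targets:
            # Assumption: split by capacity share, or equally if capacity is unknown.
            share = t["capacity_mw"] / total_cap if total_cap else 1.0 / len(targets)
            matched[(t["plant_id"], t["gen_id"])] = epa["co2_tons"] * share
    unmatched_eia = [k for k in eia_index if k not in matched]
    return matched, unmatched_eia, unmatched_epa


matched, unmatched_eia, unmatched_epa = match_units(eia_units, epa_units, crosswalk)
```

Generators left in `unmatched_eia` are the ones that would need values from another source or from imputation, while `unmatched_epa` collects units such as self-generators that have no entry in the EIA data.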

As part of the generator data toolbox, we have developed methods for identifying which data in each dataset correspond to which generator in the EIA’s annual basic dataset of currently operating generation units, for imputing missing values, and for adding the matched and imputed information to that basic dataset.
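As a rough illustration of imputing missing values, the sketch below fills a missing heat rate with the median of generators that share the same fuel. The grouping variable, the use of the median, and all field names are assumptions made for this example; the documentation accompanying the toolbox describes the methods the E4ST team actually used.

```python
from statistics import median


def impute_by_group(records, group_key, value_key):
    """Return copies of records with missing values (None) filled by the group median.

    Falls back to the median over all records when a group has no observed values.
    Each returned record gets an 'imputed' flag so filled values stay identifiable.
    """
    observed = {}
    for r in records:
        if r[value_key] is not None:
            observed.setdefault(r[group_key], []).append(r[value_key])
    all_values = [v for values in observed.values() for v in values]
    overall = median(all_values) if all_values else None

    filled_records = []
    for r in records:
        filled = dict(r)
        if filled[value_key] is None:
            group_values = observed.get(filled[group_key])
            filled[value_key] = median(group_values) if group_values else overall
            filled["imputed"] = True
        else:
            filled["imputed"] = False
        filled_records.append(filled)
    return filled_records


# Hypothetical generators; heat rates in Btu/kWh are illustrative only.
generators = [
    {"gen_id": "A", "fuel": "coal", "heat_rate": 10000.0},
    {"gen_id": "B", "fuel": "coal", "heat_rate": 10400.0},
    {"gen_id": "C", "fuel": "coal", "heat_rate": None},
    {"gen_id": "D", "fuel": "gas", "heat_rate": 7500.0},
]
completed = impute_by_group(generators, "fuel", "heat_rate")
```

Keeping the `imputed` flag alongside each value lets later analysis distinguish reported figures from filled-in ones.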

Generator Data

The result of applying parts of the generator data toolbox to a set of databases from the EIA and EPA is a database with the capacities, heat rates, emission rates, and various other characteristics of the generators in the contiguous United States that supply the wholesale electricity markets.
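Aggregating similar generators, one of the toolbox tasks, has to combine per-unit characteristics like these into a single representative unit. The sketch below shows one plausible way to do that: it sums capacity and takes a capacity-weighted average heat rate. Defining "similar" as sharing a network bus and a fuel, along with every field name, is an assumption for illustration and not necessarily how the toolbox groups generators.

```python
def aggregate_generators(gens, keys=("bus", "fuel")):
    """Combine generators that share the given keys into one row per group.

    Capacity is summed; heat rate is averaged with capacity as the weight.
    """
    groups = {}
    for g in gens:
        group_id = tuple(g[k] for k in keys)
        agg = groups.setdefault(
            group_id, {"capacity_mw": 0.0, "weighted_heat_rate": 0.0, "n_units": 0}
        )
        agg["capacity_mw"] += g["capacity_mw"]
        agg["weighted_heat_rate"] += g["capacity_mw"] * g["heat_rate"]
        agg["n_units"] += 1

    rows = []
    for group_id, agg in groups.items():
        row = dict(zip(keys, group_id))
        row["capacity_mw"] = agg["capacity_mw"]
        # Leave the heat rate undefined if the group has zero total capacity.
        row["heat_rate"] = (
            agg["weighted_heat_rate"] / agg["capacity_mw"] if agg["capacity_mw"] else None
        )
        row["n_units"] = agg["n_units"]
        rows.append(row)
    return rows


# Hypothetical units; values are illustrative only.
units = [
    {"bus": 1, "fuel": "gas", "capacity_mw": 100.0, "heat_rate": 8000.0},
    {"bus": 1, "fuel": "gas", "capacity_mw": 300.0, "heat_rate": 10000.0},
    {"bus": 2, "fuel": "coal", "capacity_mw": 500.0, "heat_rate": 10500.0},
]
aggregated = aggregate_generators(units)
```

Weighting by capacity keeps a large unit from being averaged on equal terms with a small one; weighting by annual generation would be another reasonable choice.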

The data file available here is the 2011 Energy Information Administration (EIA) generation unit dataset (GeneratorsY2011.xls), augmented with additional columns from files obtained from the EIA and EPA, as described in the documentation that accompanies the data. The documentation accompanying the generator data and the associated toolbox summarizes how the E4ST team produced this dataset.


We request that those using the generator data or toolbox for their research please cite the following reference:

Daniel L. Shawhan, John T. Taber, Di Shi, Ray D. Zimmerman, Jubo Yan, Charles M. Marquet, Yingying Qi, Biao Mao, Richard E. Schuler, William D. Schulze, and Daniel J. Tylavsky, “Does a Detailed Model of the Electricity Grid Matter? Estimating the Impacts of the Regional Greenhouse Gas Initiative,” Resource and Energy Economics, Volume 36, Issue 1, January 2014, pp. 191–207.