singularity-energy / open-grid-emissions
Tools for producing high-quality hourly generation and emissions data for U.S. electric grids
License: MIT License
The eGRID2020 technical guide notes:
The CO2 emissions for units with a fuel cell prime mover are also assumed to be zero.
However, as noted by several sources, fuel cells do not necessarily have zero emissions if they use natural gas as a fuel.
Currently our data pipeline does not set fuel cell emissions to zero, but uses the default natural gas combustion emission factor to calculate emissions - it is unclear if this is appropriate.
To do:
Also low priority for now, but it would be nice to include types on all arguments and return types.
I propose that we wait until everything is refactored and in a more stable state, and then go through and add the types. I imagine you'll be doing a pass through the code to add more documentation/cleanup before the release anyways.
Packaging `open-grid-emissions` will allow us (and others) to use code snippets in other projects.
hourly-egrid
As the EIA-930 data about page notes,
Entities occasionally stop performing the BA role because their electric system is incorporated into another BA's system or they have made other arrangements. Five BAs retired after July 1, 2015, the first date of EIA-930 data availability:
- Gila River Power, LLC (GRMA) – retired May 3, 2018
- Ohio Valley Electric Corporation (OVEC) – retired December 1, 2018
- Utilities Commission of New Smyrna Beach (NSB) – retired January 8, 2020
- Electric Energy, Inc. (EEI) – retired February 29, 2020
- PowerSouth Energy Cooperatives (AEC) – retired September 1, 2021
We need to implement a filter to remove retired BAs from final outputs if they retired prior to the reporting year. This leads to two additional questions:
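The filter itself could be sketched as follows. This is a minimal sketch, assuming the retirement dates quoted above are kept in a lookup keyed by BA code; the names `BA_RETIREMENT_DATES` and `active_bas` are hypothetical and not part of the existing pipeline.

```python
from datetime import date

# Retirement dates as listed in the EIA-930 note quoted above.
BA_RETIREMENT_DATES = {
    "GRMA": date(2018, 5, 3),
    "OVEC": date(2018, 12, 1),
    "NSB": date(2020, 1, 8),
    "EEI": date(2020, 2, 29),
    "AEC": date(2021, 9, 1),
}


def active_bas(ba_codes, year):
    """Return the BA codes that had not retired before the start of `year`."""
    year_start = date(year, 1, 1)
    return [
        ba
        for ba in ba_codes
        # BAs without a retirement date are treated as still active.
        if ba not in BA_RETIREMENT_DATES or BA_RETIREMENT_DATES[ba] >= year_start
    ]


print(active_bas(["GRMA", "OVEC", "NSB", "CISO"], 2019))  # ['NSB', 'CISO']
```

A BA that retires partway through the reporting year is kept by this sketch, since it still has data for part of that year.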
Our data pipeline currently calculates CO2, CH4, and N2O emissions but does not combine these into a single CO2e value. Part of the reason for this is that the global warming potentials (GWPs) used to calculate CO2e vary based on the IPCC assessment report from which they come, and whether they are for a 20-year horizon or 100-year horizon.
Functionality to do this was added in #25, but there remain some outstanding questions:
The AR6 GWPs can be found here. According to this summary, there are now different GWPs for methane depending on whether the methane is of fossil or non-fossil origin.
data/manual/ipcc_gwp.csv
It is my understanding that GWPs change over time because as the atmospheric concentrations of GHG changes over time, the GWP of newly emitted GHGs also changes. Thus, each time the GWPs are updated, the new values should be used for all emissions starting in that year, but the values from previous years should not be retroactively changed.
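To make the CO2e calculation concrete, here is a minimal sketch that combines the three gases using 100-year GWPs. The function name and dictionary layout are hypothetical, and the AR4 and AR5 values shown should be checked against `data/manual/ipcc_gwp.csv` before being relied on. AR6 is left out because of the open fossil/non-fossil methane question noted above.

```python
# 100-year GWPs by IPCC assessment report (verify against data/manual/ipcc_gwp.csv).
GWP_100_YEAR = {
    "AR4": {"co2": 1, "ch4": 25, "n2o": 298},
    "AR5": {"co2": 1, "ch4": 28, "n2o": 265},
}


def calculate_co2e_mass_lb(co2_lb, ch4_lb, n2o_lb, assessment_report="AR5"):
    """Combine CO2, CH4 and N2O masses (lb) into a single CO2e mass (lb)."""
    gwp = GWP_100_YEAR[assessment_report]
    return co2_lb * gwp["co2"] + ch4_lb * gwp["ch4"] + n2o_lb * gwp["n2o"]


print(calculate_co2e_mass_lb(1000, 2, 1, "AR5"))  # 1321
print(calculate_co2e_mass_lb(1000, 2, 1, "AR4"))  # 1348
```

If GWPs should not be applied retroactively, the `assessment_report` argument could be chosen from a lookup keyed by data year rather than fixed for the whole dataset.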
When cleaning hourly CO2 data in CEMS, we decided that in any hour where a unit reported zero CO2 emissions but non-zero fuel consumption, the zero should be treated as a missing value and imputed using the fuel consumption and a fuel-specific emission factor. This seems to affect only a small number of hours and units. Because CO2 mass is reported in tons in CEMS, many of these zero values may be rounding errors. Because our pipeline works in lb instead of tons, this imputation should result in non-zero CO2 values.
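The rule described above could be sketched like this. It is a minimal illustration using plain dictionaries rather than the pipeline's dataframes; the function name, the `co2_imputed` flag, and the emission factor of 100.0 lb/MMBtu are placeholders, not values from the pipeline.

```python
def impute_zero_co2(hours, emission_factor_lb_per_mmbtu):
    """Treat zero CO2 with non-zero fuel consumption as missing and impute it."""
    cleaned = []
    for hour in hours:
        hour = dict(hour)  # copy so the input records are not mutated
        if hour["co2_mass_lb"] == 0 and hour["fuel_consumed_mmbtu"] > 0:
            hour["co2_mass_lb"] = (
                hour["fuel_consumed_mmbtu"] * emission_factor_lb_per_mmbtu
            )
            hour["co2_imputed"] = True
        else:
            hour["co2_imputed"] = False
        cleaned.append(hour)
    return cleaned


example_hours = [
    {"co2_mass_lb": 0.0, "fuel_consumed_mmbtu": 2.0},    # imputed
    {"co2_mass_lb": 0.0, "fuel_consumed_mmbtu": 0.0},    # a true zero, left alone
    {"co2_mass_lb": 350.0, "fuel_consumed_mmbtu": 3.0},  # reported, left alone
]
# 100.0 lb/MMBtu is a placeholder, not a real fuel-specific factor.
print(impute_zero_co2(example_hours, emission_factor_lb_per_mmbtu=100.0))
```

The same structure would apply to NOx and SO2 by swapping in the relevant column and emission factor.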
This may be even less of an issue for reported NOx/SO2 emissions, but we should implement the same imputation method used for CO2 for these emissions as well.
The eGRID2020 technical guide notes that:
The emission factors are primarily from the default CO2 emission factors from the EPA Mandatory Reporting of Greenhouse Gases Final Rule (EPA, 2009, Table C-1). For fuel types that are included in eGRID2020 but are not in the EPA Mandatory Reporting of Greenhouse Gases Final Rule, additional emission factors are used from the 2006 Intergovernmental Panel on Climate Change (IPCC) Guidelines for National Greenhouse Gas Inventories and the EPA Inventory of U.S. Greenhouse Gas Emissions and Sinks: 1990-2015 (IPCC, 2007a; EPA, 2017).
However, it is unclear whether there might be more up to date emissions factors that should be used:
The technical guide also notes:
Several fuel types do not have direct reported emission factors, so emission factors from similar fuel types are used:
• The emission factor for natural gas is used to estimate emissions from process gas and other gas;
• The emission factor for anthracite, bituminous, and lignite coal are used to estimate emissions from refined coal and waste coal; and
• The emission factor for other biomass liquids is used to estimate emissions from sludge waste and liquid wood waste
Currently, subplant IDs are only created for units that exist in both CEMS and EIA-923, meaning that certain generators/units have a subplant ID of `NaN`.
- Ensure that merges using `subplant_id` as one of the keys are not dropping observations with missing subplant values.
- Update the `pudl.analysis.epa_crosswalk` code to generate subplant IDs for all boilers/generators that exist in the EIA data, regardless of whether data exists in CEMS.
There is a set of subplants for which we have incomplete hourly data from CEMS. Instead of using EIA-930 residual profiles to assign an hourly profile to the EIA-923 data for these subplants, we can use the hourly profiles for the subplant units that do report hourly data as the hourly profile for the entire subplant.
Once we assign an hourly profile to these subplants, we will need to concatenate these profiles with the CEMS hourly data and the EIA-923 that was distributed using EIA-930 data.
eGRID reports mercury (Hg) emissions, although the technical guide notes that:
However, while electric generating units started to report mercury data to CAMD’s Power Sector Emissions Data in 2015, the data are incomplete. We have included the unit-level emissions, but since only a subset of the units at one plant may list mercury emissions, we have not summed these emissions to the plant-level. Therefore, we have retained these fields in anticipation of being able to report plant-level mercury emissions and emission rates in a future edition of eGRID
Currently eGRID only reports those mercury emissions that are reported in CAMD, and does not attempt to calculate mercury emissions based on reported EIA-923 data.
To do:
Very low priority issue, but at some point we should do a `pipenv install` for all of the dependencies so that they get saved to `Pipfile`. Some users (like myself) might be using `pip` instead of `conda`, so we should support that type of environment.
Any time we use a hard-coded dictionary to update/fix values from one of our data input sources, we should instead be using a csv table located in `data/manual`. This will help:
In certain limited cases, some CEMS generators report heat input, but no gross generation or steam load in an hour, which seems to suggest that the gross generation data might be missing (if there is fuel consumption, there should in theory be some gross output).
We had previously implemented a function `data_cleaning.impute_missing_hourly_net_generation(cems, eia923_allocated)`, but this is currently removed from the data pipeline.
To determine if we should address this we need to:
If/when we focus on this issue, one of the first changes we would need to implement to the existing function would be to perform the matching on the unit or subplant level rather than the plant level. One example of why this is important is the Ivanpah concentrating solar plant (plant id 57075), which primarily consumes solar energy, but also runs some fossil generation at night to keep the thermal storage warm.
To convert hourly gross generation in CEMS to hourly net generation, we use a combination of five different methods applied in hierarchical order to the data based on the quality of the conversion factors. However, the order in which we apply these methods could affect the final outcome. Thus, we should test how the ordering of these methods affects the results, and set the method hierarchy based on those results.
One validation test we could do is to compare the annual sum of our calculated net generation values to the annual sum of the reported EIA-923 generation for each plant. Whichever approach minimizes the residual between the two should be the one we use.
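As a rough sketch of this validation test for a single plant, the annual residual could be computed for each candidate method (or method ordering) and the smallest one selected. The function names and the method labels in the example are hypothetical.

```python
def annual_residual(hourly_net_mwh, reported_annual_net_mwh):
    """Absolute difference between summed hourly net generation and EIA-923."""
    return abs(sum(hourly_net_mwh) - reported_annual_net_mwh)


def best_method(estimates_by_method, reported_annual_net_mwh):
    """Return the method whose hourly estimates best match the annual total."""
    return min(
        estimates_by_method,
        key=lambda method: annual_residual(
            estimates_by_method[method], reported_annual_net_mwh
        ),
    )


# Toy three-hour "year" with two hypothetical conversion methods.
estimates = {
    "method_a": [10, 10, 10],  # sums to 30, residual 2
    "method_b": [9, 9, 9],     # sums to 27, residual 1
}
print(best_method(estimates, reported_annual_net_mwh=28))  # method_b
```

Repeating this over all plants would show whether one ordering consistently wins or whether the best method depends on plant type.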
Although hourly measured NOx and SO2 emissions are reported for the majority of generation in CEMS, in certain cases we must impute missing hourly NOx/SO2 emissions or calculate these emissions based on reported EIA-923 fuel consumption.
NOx and SO2 emissions depend on not only the fuel being combusted, but also the prime mover, boiler firing type, and air emissions control equipment. Currently our data pipeline only includes information about the generator fuel type and prime mover, so when performing these imputations, we average the NOx or SO2 emissions factors by fuel and prime mover type. To improve NOx and SO2 emissions calculations, we should incorporate information about the boiler firing type and emissions control equipment into the data pipeline.
Information about these characteristics exists in EIA-860 and EIA-923, but is currently not included in the PUDL ETL pipeline. See this issue in the PUDL repository. The preferred method to fix this would be to integrate this into the PUDL data pipeline, although as a temporary fix we could consider loading this data directly from the raw EIA-860 and 923 files.
Information about the boiler firing type is located in EIA-860 Schedule 6C, 'Boiler Information - Design Parameters'. Once we have this information, it could be merged into our intermediate data files and be used as a merge key for the NOx and SO2 emissions factors.
EIA-923 Schedule 8C, 'Air Emissions Control Info' also reports ozone season and non-ozone season-specific NOx emissions factors for each unit based on the NOx control equipment used. While this data does not specifically identify a unit or boiler number, it reports these emissions factors for each "NOx control ID" at each plant, which appear to line up with boilers or units. If these unit and season specific NOx emissions factors are reported, they should probably be used in place of the generic NOx emissions factors.
The eGRID2020 Technical Guide notes that:
For some units, EIA reports unit-level NOx emission rates (lb/MMBtu) for both annual and ozone season
emissions, from EIA Form 923, Schedule 8C. These unit-level emissions rates are multiplied by the
unit-level heat input used to estimate annual and ozone season NOx emissions. For all other units that
report to EIA but are not included in CAMD’s Power Sector Emissions Data, the unit-level heat input
is multiplied by a prime mover- and fuel-specific emission factor from EPA’s AP-42 Compilation of
Air Pollutant Emission Factors or the EIA Electric Power Annual (EPA, 1995; EIA, 2021f, Table A-2)
The eGRID2020 Technical Guide notes that:
For some units for which we calculated SO2 emissions with an emission factor, EIA reports SO2
control efficiencies. For these units the estimated SO2 emissions are multiplied by (1 – control
efficiency) to estimate the controlled emissions. Units that do not have unit-level control efficiency
data are assumed to be uncontrolled. The control efficiencies are not used for units where the
emissions data are from CAMD’s Power Sector Emissions Data, because these emissions already take
controls into account.
These SO2 control efficiencies are reported in EIA-923 Schedule 8C, 'Air Emissions Control Info'.
For some combined cycle plants, it is possible that fuel input or net generation data is allocated to one portion of a combined cycle but not another, meaning that unit or generator-level statistics might appear to be outliers. Most of these issues should be addressed by aggregating the data at the subplant level (which should aggregate all parts of a combined cycle plant together), but this is something that we should look into.
Generally, our linear equation for regressing gross to net generation seems to fit the data very well (in some cases almost exactly). However, in the future we may want to consider refining our gross to net generation regression to see if any additional factors help explain some of the variation in the data. These factors could include:
In some BAs, wind generation is reported in EIA-923, but there is no hourly wind generation reported in EIA-930. In those cases, we have implemented a method that imputes this missing generation data by averaging together the reported wind generation profiles for directly-interconnected BAs located in the same time zone and using that as a proxy. However, we have not yet validated whether this is a reasonable approach.
To do so, we can evaluate the correlation between regional wind generation profiles, and also cross-validate by estimating profiles using this method for regions that do not have missing wind generation data and comparing the estimate to the actual reported value.
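The averaging and cross-validation steps could look roughly like the sketch below. It assumes each neighboring BA's profile is first normalized to hourly shares of its own total so that large BAs do not dominate the average. That normalization is an assumption of this sketch, not necessarily what the pipeline does, and all function names are hypothetical.

```python
def normalize(profile):
    """Convert hourly generation into shares of the period total."""
    total = sum(profile)
    return [value / total for value in profile]


def proxy_wind_profile(neighbor_profiles):
    """Average the normalized wind profiles of neighboring BAs, hour by hour."""
    normalized = [normalize(profile) for profile in neighbor_profiles]
    count = len(normalized)
    return [sum(hour_values) / count for hour_values in zip(*normalized)]


def mean_absolute_error(estimate, actual):
    """Cross-validation metric: average hourly gap between estimate and actual."""
    return sum(abs(e - a) for e, a in zip(estimate, actual)) / len(actual)


# Hold out a BA that does report wind, estimate it from neighbors, then compare.
neighbors = [[1, 1, 2], [2, 2, 4]]
held_out_actual = normalize([2, 1, 1])
estimate = proxy_wind_profile(neighbors)
print(estimate, mean_absolute_error(estimate, held_out_actual))
```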
When calculating residual profiles, negative data might not always represent “bad” data that needs to be scaled. There might actually be negative net generation if all of the power plants are idling and consuming more electricity than they generate. In this case, we might want to check the monthly data that we plan to distribute and see if there is any negative net generation represented there.
If a plant has reported negative net generation for an entire month, we cannot simply multiply this by a profile, since multiplying the profile by a negative number will invert the shape of the profile.
If a plant has negative net generation and no fuel consumption, it likely didn’t generate at all - we can probably assign a flat profile to this to represent a flat house load.
If a plant has negative net generation but some fuel consumption, it might have generated in certain hours. Thus, we might want to shift the residual profile such that some hourly values are greater than zero, and some are less than zero, and the sum of all these values adds up to the total negative net generation. One way to do this could be how this was implemented here. Instead of using a scaling factor, use a shift factor to shift the profile up or down.
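The flat-profile and shift-factor ideas above can be sketched as follows; the function names are hypothetical. The shift is the constant that, added to every hour, makes the profile sum to the reported monthly total.

```python
def flat_profile(monthly_total_mwh, n_hours):
    """Spread a monthly total evenly over all hours (e.g. a constant house load)."""
    return [monthly_total_mwh / n_hours] * n_hours


def shift_profile(profile, monthly_total_mwh):
    """Shift a profile up or down so that it sums to the monthly total.

    Unlike scaling, adding a constant preserves the shape of the profile even
    when the monthly total is negative.
    """
    shift = (monthly_total_mwh - sum(profile)) / len(profile)
    return [value + shift for value in profile]


shifted = shift_profile([0, 2, 4, 2], monthly_total_mwh=-4)
print(shifted)       # [-3.0, -1.0, 1.0, -1.0]
print(sum(shifted))  # -4.0
```

In the example, the peak hour stays positive while the month as a whole still sums to the negative reported total.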
Context: Currently, the `clean_eia923` function returns (1) `gen_fuel_allocated` and (2) `primary_fuel_table`. For each plant/generator/month row in `gen_fuel_allocated`, I've been using `primary_fuel_table` to look up the energy source code for that plant and generator.
Problem: For a small number of generators in `gen_fuel_allocated`, there is no corresponding data in `primary_fuel_table`. Usually, that generator "appears" in the `primary_fuel_table` in a later year, which has allowed me to manually fix those generators.
Any idea why this might be @grgmiller ? Maybe this is a problem you've already found? I can either create a manual data file to assign fuel types to these generators, or we can dig deeper into why this is happening.
For reference, here is my manual workaround:
```python
# Manually assigned energy source codes, keyed by plant ID, then generator ID.
MANUAL_PLANT_GENERATOR_ENERGY_SOURCE_CODE = {
    '6058': {'2': 'NG'},
    '54224': {'GEN6': 'BIT'},
    '6190': {'3': 'PC'},
    '7790': {'2': 'BIT'},  # Default to the plant primary fuel type.
    '10612': {'GEN2': 'NG'},
    '54690': {'6000': 'SUB'},
    '55821': {'BCT': 'NG', 'BST': 'NG'},
    '645': {'GT4': 'NG'},
    '7652': {'1': 'BIT'},
    '54408': {'2': 'WDS'},
    '1904': {'6': 'NG'},
    '10562': {'GEN5': 'WDS'},
    '54851': {'SOL1': 'LFG', 'SOL2': 'LFG', 'SOL3': 'LFG'},
    # IC1/IC2 generators are both DFO, so assume the same for the rest.
    '676': {'IC3': 'DFO', 'IC4': 'DFO', 'IC5': 'DFO', 'IC6': 'DFO', 'IC7': 'DFO'},
}
```
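If this workaround were kept, it could live in a csv table instead of a hard-coded dictionary, in line with the preference for `data/manual` expressed elsewhere in these issues. The sketch below assumes a hypothetical three-column layout (`plant_id_eia`, `generator_id`, `energy_source_code`) and reads from an in-memory string so that it runs on its own; the three rows are copied from the dictionary above.

```python
import csv
import io

# Hypothetical csv layout; in practice this would be a file under data/manual.
EXAMPLE_CSV = """plant_id_eia,generator_id,energy_source_code
6058,2,NG
55821,BCT,NG
55821,BST,NG
"""


def load_manual_energy_source_codes(csv_file):
    """Build a {plant_id: {generator_id: energy_source_code}} mapping from csv."""
    mapping = {}
    for row in csv.DictReader(csv_file):
        plant_generators = mapping.setdefault(row["plant_id_eia"], {})
        plant_generators[row["generator_id"]] = row["energy_source_code"]
    return mapping


codes = load_manual_energy_source_codes(io.StringIO(EXAMPLE_CSV))
print(codes)
```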
When adjusting emissions for CHP and biomass, eGRID first makes the biomass adjustment, then adjusts for CHP.
However, in our data pipeline, we first adjust for CHP, then for biomass. We use a slightly different method for these adjustments because we are working with hourly unit-level data rather than monthly plant-level data (like eGRID), but the question remains:
In `validation.test_for_outlier_heat_rates()` we currently test for outliers within each fuel type, although we may want to consider refining this to also filter by prime mover, since this may significantly impact the heat rate of a generator.
As the EIA-930 data about page notes,
Generation-only BAs consist of a power plant or group of power plants and do not directly serve retail customers. Therefore, they only report net generation and interchange and do not report demand or demand forecasts.
Eleven active BAs are generation-only:
- Avangrid Renewables, LLC (AVRN)
- Arlington Valley, LLC – AVBA (DEAA)
- GridLiance (GLHB)
- Gridforce Energy Management, LLC (GRID)
- Griffith Energy, LLC (GRIF)
- Gila River Power, LLC (GRMA)
- NaturEner Power Watch, LLC (GWA)
- New Harquahala Generating Company, LLC – HGBA (HGMA)
- Southeastern Power Administration (SEPA)
- NaturEner Wind Watch, LLC (WWA)
- Alcoa Power Generating, Inc. – Yadkin Division (YAD)
The EIA also notes that there are "limited generation balancing authorities":
Most BAs produce electricity within their BA area. However, the following active BA has a small number of local generators that do not always produce electricity, therefore it will not always have net generation to report
EIA notes that these BAs (as well as HST, CPLW, and NSB) may have zero or even negative net generation during some hours because they might not be running their generators during all hours.
We should:
- Update `data/manual/ba_reference.csv` to indicate those BAs which the EIA reports as generation-only or limited-generation BAs
In certain cases, it is possible for net generation values to be negative if a generator (or an entire fleet) was consuming more electricity than it was generating. However, these negative values have the potential to result in counterintuitive or strange results. Thus, we should ensure that we have a consistent approach to handling negative values throughout our pipeline.
The eGRID technical guide mentions digester gas (DG) several times, although this energy source code does not seem to appear in any static tables or EIA data.
In addition to results files in U.S. units, we should include outputs in metric units.
Data type | US unit | Metric unit |
---|---|---|
Emissions mass | lb | kg |
Electricity | MWh | MWh |
Heat content | mmbtu | GJ |
Percentages | Decimal between 0 and 1 | Decimal between 0 and 1 |
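The conversions in the table could be applied as in the sketch below. The pound-to-kilogram factor is exact by definition; the MMBtu-to-GJ factor is rounded and assumes the International Table Btu. The field names and the `to_metric` function are hypothetical.

```python
LB_TO_KG = 0.45359237   # exact, by definition of the international pound
MMBTU_TO_GJ = 1.055056  # rounded; assumes the International Table Btu


def to_metric(record):
    """Convert one result record from U.S. units to metric units."""
    return {
        "emissions_mass_kg": record["emissions_mass_lb"] * LB_TO_KG,
        "electricity_mwh": record["electricity_mwh"],  # unchanged
        "heat_content_gj": record["heat_content_mmbtu"] * MMBTU_TO_GJ,
    }


us_record = {
    "emissions_mass_lb": 100.0,
    "electricity_mwh": 5.0,
    "heat_content_mmbtu": 10.0,
}
print(to_metric(us_record))
```

Note that emission rates change by the same factor as their numerator: lb/MWh becomes kg/MWh by multiplying by `LB_TO_KG`.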
The eGRID 2020 technical guide notes:
Emissions adjustments for NOx , SO2 , CH4 , and N2O emissions are only conducted for landfill gas in eGRID. This adjustment is based on the assumption that in many cases landfills would flare the gas if they did not combust it for electricity generation. Therefore, we assume that, at a minimum, the gas would have been combusted in a flare and would have produced some emissions of NOx , SO2 , CH4 , and N2O anyway.
It also notes:
For NOx emissions from landfill gas, an emission factor for flaring of landfill gas, 0.02 tons per MMBtu,
is used (EPA, 1995). Note that this factor was converted from units of lb/standard cubic foot (scf) to tons/MMBtu based on a value
of 500 Btu/scf (EPA, 2016).
Using eGRID table C-2 and C-3, we should be able to compute NOx and SO2 emissions from the EIA-923 `fuel_consumed_units` and `fuel_consumed_for_electricity_units` columns. I thought it would make more sense to implement this within Hourly eGRID so that I can use the cleaning steps you've already established for 923.
Although we do not yet have hourly storage profiles integrated into this pipeline (see #59), once we do have hourly charging and discharging profiles, the question becomes how we should treat energy storage and stored emissions in our output emission factor calculations.
Although energy storage does not generally have any direct emissions, if it charges using electricity with associated emissions, you could say that the discharged electricity might have an emissions intensity associated with the carbon intensity of the stored electricity. I am not sure if formalized rules have yet been standardized for how to do this.
Some generators have a reported `energy_source_code` of "OTH", or other. The challenge with this code is that there are no emissions factors for "OTH" fuels, so any emissions imputation results in missing values. To address this, we will need to manually replace the `energy_source_code` for any generators with OTH as the code with another fuel code that best matches the fuel actually burned at the generator.
We are currently doing this using the function `data_cleaning.update_energy_source_codes`, which manually replaces these values for three plants. However, to be more systematic about this we should:
- Move these replacements to a csv table in `data/manual`, such as `updated_energy_source_codes.csv`
We do not currently calculate nonbaseload emission rates, which are a type of marginal emission factor estimate.
In eGRID, nonbaseload emission rates are calculated based on the plant-level capacity factor.
All generation and emissions at plants with a low capacity factor (less than 0.2) are considered nonbaseload and are assigned a nonbaseload factor of 1. Plants with a capacity factor greater than 0.8 are considered baseload and are assigned a nonbaseload factor of 0. For plants with a capacity factor between 0.2 and 0.8, we use a linear relationship to determine the percent of generation and emissions that is nonbaseload:
Nonbaseload_Factor = -5/3 * (Capacity_Factor) + 4/3
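The piecewise rule described above can be written directly as a small function. This is only a sketch of the stated thresholds and linear relationship; the function name is hypothetical.

```python
def nonbaseload_factor(capacity_factor):
    """Share of a plant's generation and emissions treated as nonbaseload."""
    if capacity_factor < 0.2:
        return 1.0  # low capacity factor: entirely nonbaseload
    if capacity_factor > 0.8:
        return 0.0  # high capacity factor: entirely baseload
    # Linear interpolation between (0.2, 1) and (0.8, 0).
    return -5 / 3 * capacity_factor + 4 / 3


for cf in (0.1, 0.2, 0.5, 0.8, 0.9):
    print(cf, round(nonbaseload_factor(cf), 3))
```

The linear segment meets the two flat segments at the thresholds: it evaluates to 1 at a capacity factor of 0.2 and to 0 at 0.8, so the factor is continuous.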
It is unclear whether nonbaseload factors would make sense at the hourly resolution or if an alternate methodology would need to be developed. We could consider publishing monthly and annual resolution nonbaseload factors even if hourly factors do not make sense.
This could also be an opportunity to consider whether the nonbaseload methodology could be improved.
If we publish these, they should probably be separated as a different use case in results, such as results/marginal emissions
We should implement some sort of outlier detection and screening for the hourly values reported in CEMS. This outlier detection could use a combination of statistical methods and physics-based methods (e.g. gross generation should not exceed nameplate capacity).
This should probably be implemented after loading the CEMS data but before any missing data imputation steps.
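As one example of a physics-based check, hourly gross generation (MWh) can be compared with nameplate capacity (MW), since a unit running at full output for one hour produces at most its nameplate capacity in MWh. The function name and the optional `tolerance` parameter are hypothetical.

```python
def flag_generation_above_capacity(
    hourly_gross_mwh, nameplate_capacity_mw, tolerance=0.0
):
    """Flag hours where gross generation exceeds what nameplate capacity allows.

    `tolerance` is a fractional allowance above nameplate (0.25 means 25%),
    in case brief exceedances should not be treated as outliers.
    """
    limit_mwh = nameplate_capacity_mw * (1 + tolerance)
    return [value > limit_mwh for value in hourly_gross_mwh]


print(flag_generation_above_capacity([50, 100, 120], nameplate_capacity_mw=100))
# [False, False, True]
```

Flagged hours could then be set to missing so that the later imputation steps fill them in, rather than being dropped outright.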
Certain named columns should always have a certain data type in order to work properly in the code. For example, `plant_id_eia` should always be of dtype `int`, and `report_date` should always be a datetime. We should probably implement this when loading data from csvs (explicitly set the dtype) or immediately after loading the data.
See the `apply_pudl_dtype` function in the pudl repository as a potential example of this.
There are several ways that the CHP adjustment (used to calculate `emission_mass_lb_for_electricity`) could be improved.
- When calculating `useful_thermal_output`, `data_cleaning.calculate_electric_allocation_factor()` uses an assumed efficiency factor of 0.8, because this is what is used in the eGRID methodology. We should investigate whether this assumption can be improved.
- When calculating `electric_allocation_factor`, `data_cleaning.calculate_electric_allocation_factor()` uses an additional assumed efficiency factor of 0.75, because this is what is used in the eGRID methodology. We should investigate whether this assumption can be improved.
The eGRID technical support document notes regarding their CHP adjustment methodology that:
This assumes that the CHP units generate electricity first and use the waste heat for other purposes, also
known as “topping.” While there are some units that generate and use heat first and then use the waste heat
to generate electricity, also known as “bottoming,” data from the EIA shows that the vast majority of CHP
facilities are topping facilities
However, the EIA-860 generator table contains information about whether each plant uses a topping or bottoming cycle, so we could incorporate this information to create a different calculation for bottoming cycle plants.
According to the data for 2020, of the 72,337 MW of operable capacity for CHP generators, 67,571 MW (93%) uses a topping cycle, while the remaining 7% uses a bottoming cycle.
Because CEMS reports data by the unit, our current understanding is that each unit either only produces steam (heat), or only produces electricity, but not both. If this is the case, it simplifies the calculation because we can simply exclude steam-only units from the calculation of emissions for electricity production. However, we need to investigate this further to understand whether this is the case, and whether there would be any reason that any emissions from these plants should be allocated to electricity generation.
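To show where the two assumed efficiency factors (0.8 and 0.75) enter, here is a sketch of a topping-cycle allocation. The formulas reflect our reading of the eGRID approach and are an assumption; the actual `data_cleaning.calculate_electric_allocation_factor()` may differ. The function and argument names below are hypothetical.

```python
MWH_TO_MMBTU = 3.412142  # heat equivalent of 1 MWh of electricity


def useful_thermal_output_mmbtu(
    fuel_consumed_mmbtu, fuel_consumed_for_electricity_mmbtu, efficiency=0.8
):
    """Assumed: useful heat is 80% of the fuel not used for electricity."""
    return efficiency * (fuel_consumed_mmbtu - fuel_consumed_for_electricity_mmbtu)


def electric_allocation_factor(
    net_generation_mwh, useful_thermal_output, thermal_efficiency=0.75
):
    """Assumed: share of emissions attributed to electricity at a CHP unit."""
    electric_output_mmbtu = net_generation_mwh * MWH_TO_MMBTU
    denominator = electric_output_mmbtu + thermal_efficiency * useful_thermal_output
    if denominator <= 0:
        return 1.0  # assumption: with no measurable output, do not adjust
    # Clamp to [0, 1] so negative net generation cannot produce odd factors.
    return min(1.0, max(0.0, electric_output_mmbtu / denominator))


uto = useful_thermal_output_mmbtu(1000.0, 600.0)
print(uto, electric_allocation_factor(100.0, uto))
```

A bottoming-cycle variant would need a different formula, which is the open question raised above about using the EIA-860 topping/bottoming flag.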
We want to make it as easy as possible for people to understand and contribute to the code, so we want to make the code as readable and easy to follow as possible.
- Order the functions in `data_cleaning` and other modules in the order they are called.
- (… `load_data`)
- Split `load_data` into `download_data` and `load_data`, or split `data_cleaning` into multiple modules based on the functions that are used for cleaning, and the functions that are calculating new data.
Historically, any generators that burned municipal solid waste (MSW) reported this fuel consumption under a single fuel code. In recent years, however, EIA-923 began reporting these data under two separate codes for the biogenic portion (MSB) and non-biogenic portion (MSN). This is important because each portion has different emission rates. We should ensure that when this data is available, the more specific fuel codes are being used instead of MSW.
We do not currently have a method for identifying hourly charging/discharging profiles for energy storage, which primarily consists of battery energy storage (`energy_source_code == MWH` and `prime_mover_code == BA`) or pumped storage hydro (`energy_source_code == WAT` and `prime_mover_code == PS`).
EIA-923 only reports net generation (net discharge) for energy storage technologies, but we do not have any information about the total charging and total discharging from these storage resources.
Furthermore, EIA-930 does not include energy storage as one of the fuel types for net generation, but instead this data is theoretically spread between reported demand, hydro generation, and other generation. The EIA-930 instructions note:
Pumped storage: Pumped storage is included in net generation only when there is net output to the system during the hour. During hours when electricity from the system is used on net to store energy, this electricity is to be included in actual demand.
The EIA-930 instructions do not include any instructions related to other energy storage, but if energy storage is reported consistently with the rules for pumped storage, then discharge would likely be reported as “other” net generation, whereas charging would be reported as increased net demand.
See #37
As of 2020, there were 230 utility-scale batteries reported in EIA-860, a majority of which are located within the territories of the major RTOs/ISOs in the US. If each of these ISOs report timeseries data for energy storage dispatch separate from the data they report to EIA-930, we could use that data to assign a profile. This means that we would need to ingest data from these sources separately. To do this, we could potentially pull data from the singularity API, pyiso, or potentially ElectricityMap.
If our only option were to interpolate storage profiles, EIA-860, schedule 3-4 also reports the various applications that an energy storage plant served (e.g. load following, excess wind and solar generation, system peak shaving, arbitrage). Using this information, we could develop synthetic storage dispatch profiles based on how we assume these batteries would operate.
When we are validating our final outputs against the published annual eGRID values, we should be sure to filter out plants where there are known data quality issues with the published eGRID data.
When imputing missing CEMS emissions data, we use a fuel-specific emission rate based on the fuel type identified in the power sector data crosswalk. However, there is a chance that these assigned values are incorrect or at least represent only the primary fuel type for a multi-fuel plant. A more robust approach could be to calculate a month-specific weighted average emission factor based on the proportion of each type of fuel actually burned in that unit in a given month (as reported in EIA-923), assuming that there is a 1:1 or 1:m mapping between the EPA unit and the EIA boiler/generator.
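A month-specific weighted average could be computed as in the sketch below, weighting each fuel's emission factor by the heat input burned in that month. The function name is hypothetical and the emission factors in the example are placeholders, not real values.

```python
def weighted_emission_factor(fuel_consumed_mmbtu_by_fuel, emission_factor_by_fuel):
    """Heat-input-weighted average emission factor (lb/MMBtu) for one unit-month."""
    total_mmbtu = sum(fuel_consumed_mmbtu_by_fuel.values())
    if total_mmbtu == 0:
        return None  # no fuel burned, so there is nothing to weight by
    weighted_sum = sum(
        mmbtu * emission_factor_by_fuel[fuel]
        for fuel, mmbtu in fuel_consumed_mmbtu_by_fuel.items()
    )
    return weighted_sum / total_mmbtu


# Placeholder factors for illustration only.
factors = {"NG": 100.0, "DFO": 200.0}
january_fuel = {"NG": 75.0, "DFO": 25.0}
print(weighted_emission_factor(january_fuel, factors))  # 125.0
```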
As noted in this PR, `data_pipeline.py` contains the argument `--small`, which filters out 95% of plants so that the pipeline runs faster for testing.
A few follow up ideas to improve the functionality of this would be to:
- … `data_cleaning.clean_cems(year)`, it still takes 10+ minutes. If it were faster, we could run it in a commit hook to guarantee that `data_cleaning` is always functional. We should enable filtering in `data_cleaning.clean_cems(year)` right after loading the CEMS data from parquet, but before cleaning it.
One of the validation tests we should implement for the outputs of `data_cleaning.clean_eia923()` is to make sure that the prime mover assigned to each generator in the allocation process matches the prime mover reported in EIA-860.
Currently, in our gross to net generation methodology hierarchy, if all other methods fail, the final method is to assume a gross to net generation ratio of `0.85`, which was based on an approximate average of fleetwide GTN ratios. However, this assumption is not necessarily very robust and should be improved. For the 2020 data, this method is currently used for about 2.2% of total net generation.
One simple fix could be to at least calculate prime mover and fuel specific gross to net ratios.
Certain functions in `data_pipeline`, specifically those which download data from the internet or take a long time to run (like the gross to net generation calculations), are implemented such that they will not be re-run if the data already exists in the directory. However, in certain cases (new source data is released, or the GTN calculations need to be re-generated) the user currently has to manually delete these directories. Instead, we should include an option like `--clobber` in these functions, which, if used, would overwrite the existing data even if it already exists. We could also make this a command line argument if it’s a common use case.
Although the primary purpose of this data is to provide accurate hourly emissions data, some users may wish to still use monthly or annual averages for their purposes. Thus, we should include outputs at these aggregations for users as well.
The table `data/manual/ba_reference.csv` lists all of the BA codes and relevant metadata about each BA for the pipeline. This file was created based on the BAs that exist in the EIA-930 reference tables (https://www.eia.gov/electricity/930-content/EIA930_Reference_Tables.xlsx). However, my understanding is that this data starts as of 2015, so BAs that retired before then might not be represented.
However, FERC maintains a spreadsheet of "Allowable Entries for Balancing Authorities and Hubs" on its EQR website which seems to be a more complete list. We should use this to improve the coverage of our ba_reference spreadsheet. We will have to manually identify the local time zone for each BA in this spreadsheet.
As the eGRID technical guide notes:
some generator-level net generation data are missing or not reported for various generators in the 2020 EIA-923. EIA aggregates these missing data to the state level by fuel type, but it is not possible to distribute them back to the generator level accurately.
This imputed state-level data is reported in EIA-923 as a "State-Level Fuel Increment" using plant ID 99999. We currently exclude this imputed state-level data from our calculations, and it is unclear if/how eGRID incorporates these data.
Thus we should
When assigning an hourly profile to monthly hydroelectric data, the current method is to use the cleaned hydro profiles from EIA-930 if available, and assign a flat hourly profile to each month otherwise. In exploring the data, there are a couple of ways that this methodology should be improved.
Currently conventional hydroelectric and pumped storage hydroelectric (PSH) are grouped together, both in our cleaned EIA-923 values and in the EIA-930 data. It appears that at least in some cases, certain BAs are reporting net negative hydro generation in certain hours, which would reasonably represent PSH charging. Things to investigate:
- Conventional hydro is identified with the `HY` prime mover code, and PSH is identified with the `PS` prime mover code.
Although hydroelectric generation often displays a significant amount of seasonal variation, many hydro generators (especially reservoirs/dams) also exhibit significant variation in generation across hours of a day. We might want to consider how we could estimate this hourly variation if we do not have direct data for the hydro facilities operating in that BA. Several options:
We are currently using gridemissions for physics-based data cleaning https://github.com/jdechalendar/gridemissions
The current process is to manually export a file from that package and copy it to our data folder, but it should get run from the rest of our data pipeline.