Nrel atb usa costs #160

Open
wants to merge 111 commits into
base: master

finozzifa
Contributor

@finozzifa finozzifa commented Dec 12, 2024

Goals

This pull request contains the changes performed by @danielelerede-oet and myself (@finozzifa) as agreed with @martacki and @euronion.

Proposed final goal: this pull request is the first (intermediate) step in a set of changes that aim to let model users work with country-specific cost assumptions.

Goal of this pull request: as an intermediate step, this work creates a sub-folder outputs/US, into which the outputs/costs_yyyy.csv files are copied and updated with NREL/ATB data.

Input and output schemas

NREL/ATB input values and schema

The NREL/ATB electricity data source is available here.

We require the cost assumptions for the years 2020, 2025, 2030, 2035, 2040, 2045 and 2050. The cost assumptions for 2020 are taken from the atb_e_2022 dataset, whereas those for the other years are taken from the atb_e_2024 dataset. Unfortunately, the schemas of the two files differ slightly:

Schema of atb_e_2022

 #   Column                 Non-Null Count   Dtype  
---  ------                 --------------   -----  
 0   atb_year               286998 non-null  int64  
 1   core_metric_key        286812 non-null  object 
 2   core_metric_parameter  286998 non-null  object 
 3   core_metric_case       286998 non-null  object 
 4   crpyears               286998 non-null  object 
 5   technology             286998 non-null  object 
 6   technology_alias       285510 non-null  object 
 7   techdetail             286998 non-null  object 
 8   display_name           285510 non-null  object 
 9   default                285510 non-null  float64
 10  scenario               286998 non-null  object 
 11  core_metric_variable   286998 non-null  object 
 12  units                  286998 non-null  object 
 13  value                  286998 non-null  float64

Schema of atb_e_2024

 #   Column                 Non-Null Count   Dtype  
---  ------                 --------------   -----  
 0   atb_year               572232 non-null  int64  
 1   core_metric_key        572232 non-null  object 
 2   core_metric_parameter  572232 non-null  object 
 3   core_metric_case       572232 non-null  object 
 4   tax_credit_case        212388 non-null  object 
 5   crpyears               572232 non-null  object 
 6   technology             572232 non-null  object 
 7   technology_alias       572232 non-null  object 
 8   techdetail             572232 non-null  object 
 9   techdetail2            506772 non-null  object 
 10  resourcedetail         488328 non-null  object 
 11  display_name           572232 non-null  object 
 12  default                572232 non-null  int64  
 13  scale                  506772 non-null  object 
 14  maturity               506772 non-null  object 
 15  scenario               572232 non-null  object 
 16  core_metric_variable   572232 non-null  int64  
 17  units                  572232 non-null  object 
 18  value                  572232 non-null  float64

We consider only a subset of these columns, which can be configured in config.yaml through config["nrel_atb"]["nrel_atb_columns_to_keep"]. For this pull request, the columns kept are

["atb_year", "core_metric_parameter", "core_metric_case", "core_metric_variable", "technology", "technology_alias", "display_name", "scenario", "units", "value"]

where in particular

  • atb_year equals the year in the file name. For example atb_year = 2022 for atb_e_2022 or atb_year = 2024 for atb_e_2024
  • core_metric_variable equals the year for which the cost assumption is made
  • core_metric_parameter has various values. We consider
    Column value      Unit
    ------------      ----
    CAPEX             $/kW
    CF
    Fixed O&M         $/KW-yr
    Variable O&M      $/MWh
    Fuel              $/MWh
    Additional OCC    $/KW
    WACC Real
  • scenario equals Moderate, Conservative, Advanced
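The selection described above can be sketched as follows. This is a minimal illustration and not the code in scripts/compile_cost_assumptions_nrel.py: the function name filter_atb and the constant names are hypothetical, and the cast of core_metric_variable to int assumes that the column holds year-like values in both files (it is stored as object in atb_e_2022 and as int64 in atb_e_2024).

```python
import pandas as pd

# Columns listed in the pull request description.
COLUMNS_TO_KEEP = [
    "atb_year", "core_metric_parameter", "core_metric_case",
    "core_metric_variable", "technology", "technology_alias",
    "display_name", "scenario", "units", "value",
]

# Values of core_metric_parameter that are considered.
PARAMETERS = [
    "CAPEX", "CF", "Fixed O&M", "Variable O&M",
    "Fuel", "Additional OCC", "WACC Real",
]


def filter_atb(atb_df: pd.DataFrame, year: int) -> pd.DataFrame:
    """Keep the configured columns, the rows for `year` and the parameters of interest."""
    # core_metric_variable is the year the cost assumption refers to.
    # Assumption: its values can be cast to int in both ATB files.
    year_mask = atb_df["core_metric_variable"].astype(int) == year
    parameter_mask = atb_df["core_metric_parameter"].isin(PARAMETERS)
    selected = atb_df.loc[year_mask & parameter_mask, COLUMNS_TO_KEEP]
    return selected.reset_index(drop=True)
```

In practice the input frame would come from pd.read_parquet on one of the two ATB files, and the column list from the configuration rather than from a module-level constant.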

outputs/US/costs_yyyy.csv values and schema

NREL/ATB column          outputs/US/costs_yyyy.csv column   Notes
---------------          --------------------------------   -----
display_name             technology
core_metric_parameter    parameter
units                    unit
-                        source                             taken from config["nrel_atb"]["nrel_atb_source_link"]
-                        further description                left blank
atb_year                 currency_year
scenario                 scenario
core_metric_case         financial_case
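As an illustration of the mapping in the table above, here is a small sketch. The function name to_cost_schema, the output column order and the handling of the value column (carried over unchanged) are assumptions. The ATB columns technology and technology_alias are dropped before renaming, because display_name becomes the new technology column and the names would otherwise clash.

```python
import pandas as pd

# NREL/ATB column -> outputs/US/costs_yyyy.csv column, as in the table above.
COLUMN_RENAMING = {
    "display_name": "technology",
    "core_metric_parameter": "parameter",
    "units": "unit",
    "atb_year": "currency_year",
    "scenario": "scenario",
    "core_metric_case": "financial_case",
}

# Assumed column order of the output file.
OUTPUT_COLUMNS = [
    "technology", "parameter", "value", "unit", "source",
    "further description", "currency_year", "financial_case", "scenario",
]


def to_cost_schema(atb_df: pd.DataFrame, source_link: str) -> pd.DataFrame:
    """Rename NREL/ATB columns to the cost-file schema and add the missing columns."""
    # Drop the ATB "technology" columns first to avoid duplicate column names.
    out = atb_df.drop(columns=["technology", "technology_alias"], errors="ignore")
    out = out.rename(columns=COLUMN_RENAMING)
    out["source"] = source_link       # from config["nrel_atb"]["nrel_atb_source_link"]
    out["further description"] = ""   # left blank
    return out[OUTPUT_COLUMNS]
```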

Changes

Changes in the workflow

The workflow has been updated as follows:

  1. Step 1: rule compile_cost_assumptions generates the outputs/costs_yyyy.csv files
  2. Step 2: rule compile_cost_assumptions_nrel takes the outputs/costs_yyyy.csv files, reads in the NREL/ATB inputs, processes them and writes a dedicated set of costs for the US to outputs/US/costs_yyyy.csv

The "high level" description of what compile_cost_assumptions_nrel.py does

At a high level, the script does the following:

  • it loops through the years 2020, 2025, 2030, 2035, 2040, 2045 and 2050
  • for each year, it reads the corresponding outputs/costs_yyyy.csv file
  • for each year, it reads the corresponding ATB input file, filters the rows for that year and extracts the necessary columns
  • for each year, it normalizes Fixed O&M by Additional OCC (for retrofit technologies) or by CAPEX (for any other technology) and changes its unit from $/KW-yr to %-yr
  • for each year, it renames the technologies as in the table below. NREL/ATB contains many more technologies than those listed; these are still included in the final outputs/US/costs_yyyy.csv files, but they are not renamed.
    display_name                                  source NREL/ATB file                    PyPSA technology name
    ------------                                  --------------------                    ---------------------
    Coal-new -> 2nd Gen Tech                      atb_e_2022.parquet                      coal
    Coal-new                                      atb_e_2022.parquet, atb_e_2024.parquet  coal
    NG F-Frame CT                                 atb_e_2022.parquet                      CCGT
    NG Combustion Turbine (F-Frame)               atb_e_2024.parquet                      CCGT
    Hydropower - NPD 1                            atb_e_2022.parquet, atb_e_2024.parquet  hydro
    Hydropower - NSD 1                            atb_e_2022.parquet, atb_e_2024.parquet  ror
    Pumped Storage Hydropower - National Class 1  atb_e_2022.parquet, atb_e_2024.parquet  PHS
    Nuclear - Large                               atb_e_2024.parquet                      nuclear
    Nuclear - AP1000                              atb_e_2022.parquet                      nuclear
    Geothermal - Hydro / Flash                    atb_e_2022.parquet, atb_e_2024.parquet  geothermal
    Land-Based Wind - Class 1                     atb_e_2022.parquet                      onwind
    Land-Based Wind - Class 1 - Technology 1      atb_e_2024.parquet                      onwind
    Offshore Wind - Class 1                       atb_e_2022.parquet, atb_e_2024.parquet  offwind
    Utility PV - Class 1                          atb_e_2022.parquet, atb_e_2024.parquet  solar-utility
    Commercial PV - Class 1                       atb_e_2022.parquet, atb_e_2024.parquet  solar-rooftop
    Utility-Scale Battery Storage - 6Hr           atb_e_2022.parquet, atb_e_2024.parquet  battery storage
    Biopower                                      atb_e_2022.parquet                      biomass
    Biopower - Dedicated                          atb_e_2022.parquet, atb_e_2024.parquet  biomass
    CSP - Class 2                                 atb_e_2022.parquet, atb_e_2024.parquet  csp-tower
  • for each year, it updates the cost values in the corresponding outputs/US/costs_yyyy.csv file. In doing so, the script
    • appends NREL/ATB technology entries not present in the original file outputs/costs_yyyy.csv
    • leaves untouched technology entries not present in the NREL/ATB dataset
    • updates the technology entries present in both the NREL/ATB dataset and outputs/costs_yyyy.csv with the NREL/ATB values
    • appends the discount_rate values from the input file discount_rates_usa.csv. This step is necessary because, for WACC Real, the NREL/ATB dataset provides "grouped names" instead of single technology names. For example, for the parameters CAPEX, CF and Fixed O&M the technologies are Land-Based Wind - Class 1, Land-Based Wind - Class 2, etc., whereas for WACC Real there is just Land-Based Wind
    • appends the fuel cost values from the input file fuel_costs_usa.csv. This step is necessary because the NREL/ATB dataset only provides fuel costs for nuclear and biomass. Oil and gas US-specific fuel costs are fetched from the World Bank's annual prices with projections up to 2030 (based on World Bank's 2026 estimations). The coal fuel cost is fetched from the EIA Annual Coal Report 2023.
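Two of the steps above, the Fixed O&M normalization and the append / leave untouched / update logic, can be sketched as below. This is a simplified illustration under stated assumptions, not the script itself: the function names are hypothetical, the frames are assumed to already use the cost-file column names (technology, parameter, value, unit), each (technology, parameter) pair is assumed to occur once, the frame index is assumed unique, and matching on the (technology, parameter) pair is an assumption about how "present in both" is decided. The discount rate and fuel cost steps are not covered.

```python
import pandas as pd

KEYS = ["technology", "parameter"]


def normalize_fixed_om(atb_df: pd.DataFrame, retrofit_techs: set) -> pd.DataFrame:
    """Express Fixed O&M as a percentage of Additional OCC (retrofits) or CAPEX (others)."""
    out = atb_df.copy()
    for tech, group in atb_df.groupby("technology"):
        base_name = "Additional OCC" if tech in retrofit_techs else "CAPEX"
        base = group.loc[group["parameter"] == base_name, "value"]
        fom_rows = group[group["parameter"] == "Fixed O&M"].index
        if base.empty or len(fom_rows) == 0:
            continue  # nothing to normalize for this technology
        out.loc[fom_rows, "value"] = out.loc[fom_rows, "value"] / base.iloc[0] * 100.0
        out.loc[fom_rows, "unit"] = "%-yr"
    return out


def update_costs(cost_df: pd.DataFrame, atb_df: pd.DataFrame) -> pd.DataFrame:
    """Replace entries present in both frames with NREL/ATB values and append new ones."""
    atb_keys = pd.MultiIndex.from_frame(atb_df[KEYS])
    cost_keys = pd.MultiIndex.from_frame(cost_df[KEYS])
    # Entries that NREL/ATB does not provide stay untouched.
    untouched = cost_df[~cost_keys.isin(atb_keys)]
    # Entries in both frames are replaced; entries only in NREL/ATB are appended.
    return pd.concat([untouched, atb_df], ignore_index=True)
```

Building the result with a single concat keeps the three cases (append, leave untouched, update) visible in one place.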

Other noteworthy changes

Aligning the technology names of atb_e_2022 to the names of atb_e_2024

The technologies listed below have different names in atb_e_2022.parquet and atb_e_2024.parquet. The names on the left-hand side are therefore renamed to the names on the right-hand side.

atb_e_2022 display_name                                     atb_e_2024 display_name
-----------------------                                     -----------------------
Land-Based Wind - Class 2                                   Land-Based Wind - Class 2 - Technology 1
Land-Based Wind - Class 3                                   Land-Based Wind - Class 3 - Technology 1
Land-Based Wind - Class 4                                   Land-Based Wind - Class 4 - Technology 1
Land-Based Wind - Class 5                                   Land-Based Wind - Class 5 - Technology 1
Land-Based Wind - Class 6                                   Land-Based Wind - Class 6 - Technology 1
Land-Based Wind - Class 7                                   Land-Based Wind - Class 7 - Technology 1
Land-Based Wind - Class 8                                   Land-Based Wind - Class 8 - Technology 2
Land-Based Wind - Class 9                                   Land-Based Wind - Class 9 - Technology 3
Land-Based Wind - Class 10                                  Land-Based Wind - Class 10 - Technology 4
NG F-Frame CC                                               NG 2-on-1 Combined Cycle (F-Frame)
NG H-Frame CC                                               NG 2-on-1 Combined Cycle (H-Frame)
NG combined cycle 95% CCS (F-frame basis -> 2nd Gen Tech)   NG 2-on-1 Combined Cycle (F-Frame) 95% CCS
NG combined cycle 95% CCS (H-frame basis -> 2nd Gen Tech)   NG 2-on-1 Combined Cycle (H-Frame) 95% CCS
NG combined cycle Max CCS (F-frame basis -> 2nd Gen Tech)   NG 2-on-1 Combined Cycle (F-Frame) 97% CCS
NG combined cycle Max CCS (H-frame basis -> 2nd Gen Tech)   NG 2-on-1 Combined Cycle (H-Frame) 97% CCS
Coal-CCS-95% -> 2nd Gen Tech                                Coal-95%-CCS
Coal-Max-CCS -> 2nd Gen Tech                                Coal-99%-CCS
Coal-IGCC                                                   Coal - IGCC
CSP - Class 7                                               CSP - Class 8
Nuclear - Small Modular Reactor                             Nuclear - Small
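A renaming of this kind could be sketched with an exact-match replacement on the display_name column. The dictionary below contains only a few of the pairs from the table, and the names ATB_2022_TO_2024 and align_display_names are hypothetical.

```python
import pandas as pd

# A few of the pairs from the table above; a full mapping would list all of them.
ATB_2022_TO_2024 = {
    "Land-Based Wind - Class 2": "Land-Based Wind - Class 2 - Technology 1",
    "NG F-Frame CC": "NG 2-on-1 Combined Cycle (F-Frame)",
    "Coal-IGCC": "Coal - IGCC",
    "Nuclear - Small Modular Reactor": "Nuclear - Small",
}


def align_display_names(atb_2022_df: pd.DataFrame) -> pd.DataFrame:
    """Rename atb_e_2022 display names to their atb_e_2024 counterparts."""
    out = atb_2022_df.copy()
    # Series.replace with a dict matches whole values exactly (no regex),
    # so names not listed in the mapping are left unchanged.
    out["display_name"] = out["display_name"].replace(ATB_2022_TO_2024)
    return out
```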

Discarded technologies

The following technologies are present in the input file atb_e_2022.parquet but not in atb_e_2024.parquet. They are therefore discarded from the final cost output files:

  • Coal-CCS-95% -> Transformational Tech
  • Coal-Max-CCS -> Transformational Tech
  • Coal-new -> Transformational Tech
  • NG combined cycle 95% CCS (F-frame basis -> Transformational Tech)
  • NG combined cycle 95% CCS (H-frame basis -> Transformational Tech)
  • NG combined cycle Max CCS (F-frame basis -> Transformational Tech)
  • NG combined cycle Max CCS (H-frame basis -> Transformational Tech)
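Discarding these entries amounts to a row filter on display_name. A minimal sketch, with hypothetical names DISCARDED_TECHNOLOGIES and drop_discarded:

```python
import pandas as pd

# Technologies present in atb_e_2022 only, as listed above.
DISCARDED_TECHNOLOGIES = [
    "Coal-CCS-95% -> Transformational Tech",
    "Coal-Max-CCS -> Transformational Tech",
    "Coal-new -> Transformational Tech",
    "NG combined cycle 95% CCS (F-frame basis -> Transformational Tech)",
    "NG combined cycle 95% CCS (H-frame basis -> Transformational Tech)",
    "NG combined cycle Max CCS (F-frame basis -> Transformational Tech)",
    "NG combined cycle Max CCS (H-frame basis -> Transformational Tech)",
]


def drop_discarded(atb_df: pd.DataFrame) -> pd.DataFrame:
    """Remove the rows of technologies that have no atb_e_2024 counterpart."""
    keep = ~atb_df["display_name"].isin(DISCARDED_TECHNOLOGIES)
    return atb_df[keep].reset_index(drop=True)
```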

environment.yaml

We chose to use the input datasets atb_e_2022 and atb_e_2024 in parquet format because the corresponding csv files are significantly larger. This choice adds the following dependencies to the environment.yaml file

  - pyarrow
  - fastparquet

unit tests

We added a test folder to include unit tests for the functions included in scripts/compile_cost_assumptions_nrel.py

inputs/manual_input.csv

New technologies have been added to manual_input.csv.

Checklist

  • Code changes are sufficiently documented; i.e. new functions contain docstrings and further explanations may be given in doc.
  • Data source for new technologies is clearly stated.
  • Newly introduced dependencies are added to environment.yaml (if applicable).
  • A note for the release notes doc/release_notes.rst of the upcoming release is included.
  • I consent to the release of this PR's code under the GPLv3 license.

finozzifa and others added 30 commits October 7, 2024 14:52
@pz-max
Contributor

pz-max commented Jan 10, 2025

@finozzifa @danielelerede-oet looks good.

Just some thoughts and observations -- no changes required:
atb_e_2022 is used for 2020
atb_e_2024 is used for 2025, 2030, 2035, 2040, 2045 and 2050
You could have used only atb_e_2024 and applied a simple interpolation. This is how we did it for the energy storage case.

Alternatively, we could keep both ATB versions but make sure that each ATB version can provide all years (2020, 2025, 2030, 2035, 2040, 2045 and 2050, or any arbitrary year in between). This could actually be quite nice because people can see how model results change with evolving technology cost assumptions @euronion

@danielelerede-oet

Hi @pz-max, thank you for giving it a look. Actually, the idea was to use atb_e_2022 for 2020 only and atb_e_2024 for future years. Indeed, atb_e_2024 does not contain any data for 2020 and that would obviously cause issues in the model, while we would prefer to only have the most recent data for future years (so the ones coming from atb_e_2024). Does it sound good to you?

@euronion
Collaborator

Hi @pz-max, thank you for giving it a look. Actually, the idea was to use atb_e_2022 for 2020 only and atb_e_2024 for future years. Indeed, atb_e_2024 does not contain any data for 2020 and that would obviously cause issues in the model, while we would prefer to only have the most recent data for future years (so the ones coming from atb_e_2024). Does it sound good to you?

I prefer your solution. Historic values stay historic, future values are based on the most recent numbers.
If there is a use case and need for another approach, we can raise an issue and adapt in the future. I'd prefer to keep it for another iteration though :)

Regarding dependencies (curiosity): why pyarrow and fastparquet? One parquet backend for pandas should be enough?

Collaborator

Can you please clearly document where you got this file or the information for this file from?

Collaborator

Can you please clearly document where you got this file or the information for this file from?

Collaborator

Can you please clearly document where you got this file or the information for this file from? Or alternatively have an automatic download rule to pull from NREL's website. (same applied to all the other parquet files)

Collaborator

Why the modifications to this file?

Collaborator

This is the update to snakemake > 8.0, right?

test/__init__.py Outdated
Collaborator

Any reason we now have a shebang in some Python files? It seems superfluous to me.

from _helpers import mock_snakemake


def get_convertion_dictionary(flag):
Collaborator

Fix typo in function name.

Could you also please add type hints?

Comment on lines +563 to +570
if year_val == 2020:
    # choose atb_e_2022
    input_atb_path = input_file_list_atb[0]
elif year_val in year_list[1:]:
    # choose atb_e_2024
    input_atb_path = input_file_list_atb[1]
else:
    raise Exception(f"{year_val} is not a considered year")
Collaborator

This will fail if you remove a year from config["nrel_atb"]["nrel_atb_input_years"] .

If you don't want to make it flexible, you can also hard code 2022 and 2024 numbers as different input files, e.g. snakemake.input["nrel_atb_historic"] and snakemake.input["nrel_atb_future"].
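The hard-coded variant suggested here could look roughly like the sketch below. The input keys nrel_atb_historic and nrel_atb_future come from the comment above; the helper name select_atb_input and the HISTORIC_YEARS constant are hypothetical. The selection no longer depends on the position of a year in the configured list, so removing a year from the configuration does not break it.

```python
# Assumption: 2020 is the only year served by the historic (atb_e_2022) file.
HISTORIC_YEARS = frozenset({2020})


def select_atb_input(year: int, historic_path: str, future_path: str) -> str:
    """Return the ATB input file to use for the given year."""
    if year in HISTORIC_YEARS:
        return historic_path  # e.g. snakemake.input["nrel_atb_historic"]
    return future_path        # e.g. snakemake.input["nrel_atb_future"]
```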

Comment on lines +537 to +540
if len(year_list) != len(cost_file_list):
    raise Exception(
        f"The cost files {year_list} are more than the considered years {cost_file_list}"
    )
Collaborator

Suggested change
if len(year_list) != len(cost_file_list):
    raise Exception(
        f"The cost files {year_list} are more than the considered years {cost_file_list}"
    )

Since both lists are created by snakemake, let's trust it does a good job in keeping them consistent. :)

Comment on lines +555 to +560
if len(input_cost_path_list) == 1:
    input_cost_path = input_cost_path_list[0]
else:
    raise Exception(
        "Please verify the list of cost files. It may contain duplicates."
    )
Collaborator

The only way this could happen is if config["years"] contains duplicates, which would cause many more issues. If you are concerned about this, I suggest you catch it right at the beginning of the script (e.g. len(set(config["years"])) < len(config["years"])).

That's simpler and allows you to keep the list comprehension here to a minimum.

Alternatively for snakemake a nice way to handle input files is to define

rule ...:
    input:
        **{f"cost_file_to_modify_{year}": f"outputs/costs_{year}.csv" for year in config["years"]},
    ....

and then access them in here through e.g. snakemake.input["cost_file_to_modify_2020"].
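Both suggestions can be sketched in plain Python, outside of Snakemake. The function names check_unique_years and named_cost_inputs are hypothetical; the dictionary mirrors the comprehension in the rule above.

```python
def check_unique_years(years: list) -> None:
    """Fail early if the configured years contain duplicates."""
    if len(set(years)) < len(years):
        raise ValueError(f"config['years'] contains duplicates: {years}")


def named_cost_inputs(years: list) -> dict:
    """Build the named inputs, one key per year, as in the rule above."""
    return {
        f"cost_file_to_modify_{year}": f"outputs/costs_{year}.csv" for year in years
    }
```

With the check done once at the start and one named input per year, the per-year "exactly one matching path" checks are no longer needed.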

Comment on lines +652 to +661
output_cost_path_list = [
    path for path in snakemake.output if str(year_val) in path
]
if len(output_cost_path_list) == 1:
    output_cost_path = output_cost_path_list[0]
    updated_cost_df.to_csv(output_cost_path, index=False)
else:
    raise Exception(
        "Please verify the list of cost files. It may contain duplicates."
    )
Collaborator

See comment above, checks become redundant if you catch this earlier

).reset_index(drop=True)

# Cast "value" from float
updated_cost_df["value"] = updated_cost_df["value"].astype(float)
Collaborator

Why is this necessary? Is the datatype of "value" not well defined?

Comment on lines +644 to +646
updated_cost_df = pd.concat(
    [updated_cost_df.query(query_string_fuel_cost), fuel_costs_year_df]
).reset_index(drop=True)
Collaborator

Since the individual dataframes shouldn't overlap, I think it would be easier to read if we concat all of them only once at the end, like so:

updated_cost_df = pd.concat([
    updated_cost_df,  # without the entries that are substituted, i.e. with the `query`-ies applied
    fuel_costs_year_df,
    discount_rate_year_df,
    atb_e_df,
], ignore_index=True)

@euronion
Collaborator

Thanks @finozzifa for the PR - I haven't had a full look yet. Just a few codestyle comments from a first pass. I'm trying to simplify it a bit to make it easier for me to make sense of it.

I noticed you're making heavy use of .casefold() - is this really necessary, or might it even create more issues downstream? I'm thinking of occasions where the case does not match, .casefold() catches it and allows the code to run, but then the (wrongly) cased words get added to the file later, which could mess up the capitalisation. I'd rather catch wrongly cased input right at the start and ask the user to fix it. ("In the face of ambiguity, refuse the temptation to guess.")

I'll try to run it in the next days.
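The stricter alternative to .casefold() raised above could be sketched as follows: names that match a known name only after case folding are reported instead of being silently accepted. The function names and the idea of a list of canonical names are assumptions made for illustration.

```python
def find_case_mismatches(names, canonical_names) -> dict:
    """Map each wrongly cased name to the canonical spelling it resembles."""
    canonical_by_folded = {name.casefold(): name for name in canonical_names}
    mismatches = {}
    for name in names:
        canonical = canonical_by_folded.get(name.casefold())
        # Only names that differ from a canonical name by case are flagged;
        # unknown names are left for other checks.
        if canonical is not None and canonical != name:
            mismatches[name] = canonical
    return mismatches


def require_exact_case(names, canonical_names) -> None:
    """Raise early and ask the user to fix wrongly cased input."""
    mismatches = find_case_mismatches(names, canonical_names)
    if mismatches:
        raise ValueError(f"Please fix the capitalisation of these inputs: {mismatches}")
```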

Comment on lines +53 to +54
conda:
    "environment.yaml"
Collaborator

I don't know why, but snakemake failed over this line (my conda version is outdated), despite NOT setting the --use-conda for snakemake (bug?). But we can keep it.

@euronion
Collaborator

Ok, seems to be running smoothly!

@lkstrp : Are there any code conventions you'd like to see for newly added code? The repo has been wild west in the past, so as long as it is working, I consider it good enough.

@finozzifa If you could just address the following things:

  • Code simplifications mentioned
  • Source indications
  • Documentation update (mention NREL ATB option and output for US, how to run, and added keys to config file)

That would be great!
