-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Nrel atb usa costs #160
base: master
Are you sure you want to change the base?
Nrel atb usa costs #160
Conversation
…chnology-data into nrel_atb_usa_costs
@finozzifa @danielelerede-oet looks good. Just some thoughts and observations -- no changes required: Alternatively, we could keep both ATB versions but make sure that each ATB can do all years (2020, 2025, 2030, 2035, 2040, 2045 and 2050 - or any arbitrary in between). This could be actually quite nice because people can see how model results will change with evolving technology cost assumptions @euronion |
Hi @pz-max, thank you for giving it a look. Actually, the idea was to use atb_e_2022 for 2020 only and atb_e_2024 for future years. Indeed, atb_e_2024 does not contain any data for 2020 and that would obviously cause issues in the model, while we would prefer to only have the most recent data for future years (so the ones coming from atb_e_2024). Does it sound good to you? |
I prefer your solution. Historic values stay historic, future values are based on the most recent numbers. Regarding dependencies (curiosity): why |
inputs/fuel_costs_usa.csv
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please clearly document where you got this file or the information for this file from?
inputs/discount_rates_usa.csv
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please clearly document where you got this file or the information for this file from?
inputs/atb_e_2024.parquet
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please clearly document where you got this file or the information for this file from? Or alternatively have an automatic download rule to pull from NREL's website. (same applied to all the other parquet files)
inputs/manual_input.csv
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the modifications to this file?
scripts/_helpers.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the update to snakemake > 8.0, right?
test/__init__.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason we now have a shebang in some Python files? Seem superfluous to me.
from _helpers import mock_snakemake | ||
|
||
|
||
def get_convertion_dictionary(flag): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix typo in function name.
Could you also please add type hints?
if year_val == 2020: | ||
# choose atb_e_2022 | ||
input_atb_path = input_file_list_atb[0] | ||
elif year_val in year_list[1:]: | ||
# choose atb_e_2024 | ||
input_atb_path = input_file_list_atb[1] | ||
else: | ||
raise Exception(f"{year_val} is not a considered year") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will fail if you remove a year from config["nrel_atb"]["nrel_atb_input_years"]
.
If you don't want to make it flexible, you can also hard code 2022 and 2024 numbers as different input files, e.g. snakemake.input["nrel_atb_historic"]
and snakemake.input["nrel_atb_future"]
.
if len(year_list) != len(cost_file_list): | ||
raise Exception( | ||
f"The cost files {year_list} are more than the considered years {cost_file_list}" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if len(year_list) != len(cost_file_list): | |
raise Exception( | |
f"The cost files {year_list} are more than the considered years {cost_file_list}" | |
) |
Since both lists are created by snakemake
, let's trust it does a job in keeping them consistent. :)
if len(input_cost_path_list) == 1: | ||
input_cost_path = input_cost_path_list[0] | ||
else: | ||
raise Exception( | ||
"Please verify the list of cost files. It may contain duplicates." | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only way this would happen is, for config["years"]
to contain duplicates. This will cause many more issues. If you are concerned about this, I suggest you catch it right at the beginning of the script (e.g. len(set(config["years"]))<len(config["years"])
).
That's simpler and allows you to keep the list comprehension here to a minimum.
Alternatively for snakemake
a nice way to handle input files is to define
rule ...:
input:
**{f"cost_file_to_modify_{year}": f"outputs/costs_{year}.csv" for year in config["years"]},
....
and then access them in here through e.g. snakemake.input["cost_file_to_modify_2020"]
.
output_cost_path_list = [ | ||
path for path in snakemake.output if str(year_val) in path | ||
] | ||
if len(output_cost_path_list) == 1: | ||
output_cost_path = output_cost_path_list[0] | ||
updated_cost_df.to_csv(output_cost_path, index=False) | ||
else: | ||
raise Exception( | ||
"Please verify the list of cost files. It may contain duplicates." | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment above, checks become redundant if you catch this earlier
).reset_index(drop=True) | ||
|
||
# Cast "value" from float | ||
updated_cost_df["value"] = updated_cost_df["value"].astype(float) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this necessary? Is the datatype of "value" not well defined?
updated_cost_df = pd.concat( | ||
[updated_cost_df.query(query_string_fuel_cost), fuel_costs_year_df] | ||
).reset_index(drop=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the individual dataframes shouldn't overlap, i think it would be easier to read if we concat all only once at the end, like so:atb_e_df
updated_cost_df = pd.concat([
updated_cost_df, # w/o the the entries that are substituted, i.e. with the `query`-ies applied
fuel_costs_year_df,
discount_rate_year_df,
atb_e_df,
], ignore_index=True)
Thanks @finozzifa for the PR - I haven't had a full look yet. Just a few codestyle comments from a first pass. I'm trying to simplify it a bit to make it easier for me to make sense of it. I noticed you're making heavy use of I'll try to run it in the next days. |
conda: | ||
"environment.yaml" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know why, but snakemake
failed over this line (my conda
version is outdated), despite NOT setting the --use-conda
for snakemake
(bug?). But we can keep it.
Ok, seems to be running smoothly! @lkstrp : Is there any code conventions you'd like to see for newly added code? The repo has been wildwest in the past, so I as long it is working, I consider it good enough. @finozzifa If you could just address the following things:
That would be great! |
Goals
This pull request contains the changes performed by @danielelerede-oet and myself (@finozzifa) as agreed with @martacki and @euronion.
Proposed final goal: this pull request is the first (intermediate) step of a set of changes that aim at granting the possibility to the model users to use country-specific cost assumptions.
Goal of this pull request: This work in particular proposes an intermediate step and creates a sub-folder outputs/US, where the outputs/costs_yyyy.csv files are copied and updated with NREL/ATB data.
Input and output schemas
NREL/ATB input values and schema
The NREL/ATB electricity data source is available here.
We require the cost assumptions for the years 2020, 2025, 2030, 2035, 2040, 2045 and 2050. The cost assumptions for 2020 are obtained from atb_e_2022 dataset, whereas those for the other years from the atb_e_2024 dataset. The schema of these files is unfortunately slightly different. Namely:
Schema of atb_e_2022
Schema of atb_e_2024
We consider a subset of such columns. This can be configured with the configuration from the
config.yaml
fromconfig["nrel_atb"]["nrel_atb_columns_to_keep"]
. For this pull request, the columns taken arewhere in particular
atb_year
equals the year in the file name. For exampleatb_year = 2022
for atb_e_2022 oratb_year = 2024
for atb_e_2024core_metric_variable
equals the year for which the cost assumption is madecore_metric_parameter
has various values. We considerscenario
equalsModerate, Conservative, Advanced
outputs/US/costs_yyyy.csv values and schema
Changes
Changes in the workflow
The workflow has been updated as follows:
rule compile_cost_assumptions
generatesoutputs/costs_yyyy.csv
filesrule compile_cost_assumptions_nrel
takes theoutputs/costs_yyyy.csv
files, reads-in the nrel/atb inputs, processes them and outputs a dedicated set of costs for the US inoutputs/US/costs_yyyy.csv
The "high level" description of what compile_cost_assumptions_nrel.py does
The "high level" description of what the script does is:
outputs/cost_yyyy.csv
fileretrofits
technologies) or CAPEX (for any other technology) and changes its unit from$/KW-yr
to%-yr
nuclear
andbiomass
.Oil
andgas
US-specific fuel costs are fetched from the World Bank's annual prices with projections up to 2030 (based on World Bank's 2026 estimations).Coal
fuel cost is fetched from the EIA Annual Coal Report 2023.Other noteworthy changes
Aligning the technology names of atb_e_2022 to the names of atb_e_2024
The technologies listed below have different name between atb_e_2022.parquet and atb_e_2024.parquet. Therefore the technologies on the left-hand side are renamed to the names on the right-hand side.
Discarded technologies
The following technologies are present in the input file atb_e_2022.parquet. They are however not present in atb_e_2024.parquet. They are therefore discarded from the final cost output files. They are:
environment.yaml
We choose to take the the input datasets atb_e_2022 and atb_e_2024 in parquet format. This is because the corresponding csv files have a size which is significantly larger. This choice brings about the following addition to the environment.yaml file
unit tests
We added a
test
folder to include unit tests for the functions included inscripts/compile_cost_assumptions_nrel.py
inputs/manual_input.csv
New technologies have been added to
manual_inputs.csv
.Checklist
doc
.environment.yaml
(if applicable).doc/release_notes.rst
of the upcoming release is included.