DOC: Additional items for the cheat sheet #40680
Comments
I am a first-time contributor and would like to take up this issue.
@kunal21sinha I've assigned it to you, but I know that @OliEfr is possibly working on this, based on the discussion from the PR he created.
Hello,
Okay sure, will work on it.
Maybe you want to add other uses of loc and iloc: iloc for reassigning a row, and loc for reassigning a row and appending a new row.
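For reference, a minimal sketch of the loc/iloc uses suggested above. The frame, labels, and values are made up for illustration and are not taken from the cheat sheet.

```python
import pandas as pd

# Hypothetical example data, labelled so that loc and iloc differ visibly.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=["x", "y", "z"])

# iloc: reassign an existing row by integer position.
df.iloc[0] = [10, 40]

# loc: reassign an existing row by label.
df.loc["y"] = [20, 50]

# loc: assigning to a label that does not exist yet appends a new row.
df.loc["w"] = [30, 60]
```

Note that iloc cannot append this way, since it only addresses positions that already exist.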
Hi, this is the very first time I am contributing. According to the wiki, I should figure out whether this is taken or not. @AlexGCas, what is the status of this?
@Dr-Irv Can you let me know how I can get started with contributing to this cheat sheet? For example, how do I make changes to it? I learned data analysis recently and pandas was a great help. I can add more plotting functions from a pandas DataFrame.
@rishitbhojak
@rishitbhojak Add a third page with respect to plotting. Move the examples at the bottom of page 2 to page 3, but find something to fill the empty space on page 2 that will be created by the move of those 2 examples. Probably a list of the frequently used options would fit nicely there. Follow the instructions for contributing to pandas here: https://pandas.pydata.org/docs/development/contributing.html
@rishitbhojak that's fine. Try to keep the graphics smaller. For the
Got it! Thank you sir
@rishitbhojak Any review will occur in a pull request, so when you are all done with your changes, create the PR, and I'll provide more feedback there.
In which branch should I make the pull request? I am done with the pie chart plotting portion, and in place of the scatter plot and the histogram, I made a section on dropping a DataFrame column. I will make the PR this week.
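As a rough illustration of the kind of "drop a column" entry mentioned above (the actual wording on the cheat sheet may differ, and the column names here are hypothetical):

```python
import pandas as pd

# Hypothetical frame with three columns.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# drop returns a new DataFrame without the named column;
# the original frame is left unchanged.
dropped = df.drop(columns=["b"])
```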
Create your own branch off of master. See https://pandas.pydata.org/docs/development/contributing.html#working-with-the-code
Is this closed, or can I contribute to it?
There is a PR #43036 that was just submitted that I need to review. Once that is done, I will update the list above with the open items.
@Dr-Irv Hello. I'd like to do my bit here. May I take up this issue?
Absolutely! I kind of realised that #43036 might still be in the queue, so I went through its contents and made sure I wasn't repeating anything :)
Hi. I have added a fourth page to the cheat sheet after #43036. Please have a look @Dr-Irv and let me know what you think. I will update the contribution accordingly. See #45347.
Some additional suggestions to consider for this round or the next:
@MichaelTiemannOSC could you explain how
Good question. melt is "more general" in that it can handle multilevel indexes. wide_to_long is more user-friendly in that, in the single-index case, it can both reshape and rename columns, whereas melt only concerns itself with reshaping data.
Hrm - to be honest I've always just used
import pandas as pd

# Sample data: two height measurements (ht_one, ht_two) per child.
df = pd.DataFrame(
    {
        "famid": [1, 1, 1, 2, 2, 2, 3, 3, 3],
        "birth": [1, 2, 3, 1, 2, 3, 1, 2, 3],
        "ht_one": [2.8, 2.9, 2.2, 2, 1.8, 1.9, 2.2, 2.3, 2.1],
        "ht_two": [3.4, 3.8, 2.9, 3.2, 2.8, 2.4, 3.3, 3.4, 2.9],
    }
)

# wide_to_long: reshape and derive the "age" level from the column suffix.
pd.wide_to_long(
    df,
    stubnames="ht",
    i=["famid", "birth"],
    j="age",
    sep="_",
    suffix=r"\w+",
)

# melt: a comparable reshape, deriving "age" from the column name by hand.
(
    df.melt(id_vars=["famid", "birth"], value_name="ht")
    .assign(age=lambda d: d["variable"].str.split("_").str[-1])
    .drop("variable", axis=1)
    .sort_values(["famid", "birth", "age"])
    .set_index(["famid", "birth", "age"])
)
For the record, I use melt 90% of the time. But there's lots of financial data of the form xyz_2016, xyz_2017, ... and wide_to_long is great for that.
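A small sketch of the xyz_2016, xyz_2017 case described above. The company labels and numbers are invented; only the column naming pattern comes from the comment.

```python
import pandas as pd

# Hypothetical wide financial data: one column per year, sharing the stub "xyz".
df = pd.DataFrame(
    {
        "company": ["A", "B"],
        "xyz_2016": [1.0, 2.0],
        "xyz_2017": [1.5, 2.5],
    }
)

# wide_to_long splits each column name at sep into stub and suffix.
# With the default numeric suffix pattern, the suffix becomes an integer "year" level.
long_df = pd.wide_to_long(
    df, stubnames="xyz", i="company", j="year", sep="_"
).sort_index()
```

The result has a (company, year) MultiIndex and a single xyz column, so a value can be looked up with something like long_df.loc[("A", 2017), "xyz"].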
@Dr-Irv I have worked on and updated the cheat sheet in PDF format. Can I contribute here? It will be my first open-source contribution.
@ashishbalti4 You should create a pull request that includes the updated PowerPoint and PDF.
Hello @Dr-Irv, I am a first-time contributor. Can I take up this issue? Please allow me.
Yes, just provide your edits in a PR and I will review. The PR should update the PowerPoint and the PDF.
I suggest specifying the minimum version of pandas that's needed.
I think, but I'm not sure, that all of the examples in the cheat sheet work with version 1.x on up. If you see otherwise, let me know.
Hi,
Could you link to the PR...I'd be interested. The list of suggestions I made much earlier in this thread (#40680 (comment)) is not being ticked off (though one topic, melt vs. wide_to_long, was discussed in detail). split-transform-combine remains a major concept in data science and is nowhere referenced or exemplified in the Cheat Sheet.
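To make the split-transform-combine idea concrete, here is a minimal sketch using groupby with transform. The data and column names are hypothetical, and this is only one way such an example could look on the cheat sheet.

```python
import pandas as pd

# Hypothetical scores for two teams.
df = pd.DataFrame(
    {
        "team": ["a", "a", "b", "b"],
        "score": [1.0, 3.0, 10.0, 20.0],
    }
)

# Split by team, compute each group's mean, and broadcast it back
# to the original rows (transform keeps the input's shape and index).
df["team_mean"] = df.groupby("team")["score"].transform("mean")

# Combine: use the aligned result to center each score within its team.
df["centered"] = df["score"] - df["team_mean"]
```

Unlike an aggregation, transform returns one value per original row, which is what makes the row-wise subtraction possible.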
Here is the PR. I just started on the third page and added the frequently used options from the initial list.
Location of the documentation
https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf
Potential Cheat Sheet Improvements
Per discussion at #39806 (review), add a third page to the cheat sheet: