DOC: Additional items for the cheat sheet #40680

Dr-Irv · 2021-03-29T16:18:42Z

Location of the documentation

https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf

Potential Cheat Sheet Improvements

Per the discussion at #39806 (review), add a third page to the cheat sheet:

More visualization examples that use pandas plotting (no dependence on third party libraries)
List of frequently used options shown here: https://pandas.pydata.org/pandas-docs/stable/user_guide/options.html#frequently-used-options
I/O: A whole section showing a variety of popular IO usage (CSV, Excel, SQL, HTML), and also output file formats (feather, parquet, HDF)
An Apply Functions section
The new Extension Types (String, Integer, Float) and how pd.NA works
Anything else that could fill up the space (if needed)
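
For the "frequently used options" item, a minimal sketch of what such a panel might show follows. It assumes only the `display.max_rows` and `display.precision` options from the linked options page; the variable names are illustrative and not taken from the cheat sheet.

```python
import pandas as pd

# Set an option globally, then read it back.
pd.set_option("display.max_rows", 20)
changed_max_rows = pd.get_option("display.max_rows")

# Change an option only inside a block; the previous value is restored on exit.
with pd.option_context("display.precision", 2):
    inside_precision = pd.get_option("display.precision")
    rendered = repr(pd.Series([3.14159]))  # floats are displayed with 2 decimals

# Put a single option back to its built-in default.
pd.reset_option("display.max_rows")
```

`option_context` is probably the safer form to show on a cheat sheet, since it does not leave global state changed.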

kunal21sinha · 2021-04-08T04:58:31Z

I am a first-time contributor and would like to take up this issue.
Could you assign it to me?

Dr-Irv · 2021-04-08T12:31:41Z

@kunal21sinha I've assigned it to you but I know that @OliEfr is possibly working on this based on the discussion from the PR he created.

OliEfr · 2021-04-08T12:36:25Z

Hello,
yes, I was working on the cheatsheet before. However, I did not start working on this precise issue. I think you can go ahead and create something according to your thoughts @kunal21sinha. I'll probably also make some contributions later.

kunal21sinha · 2021-04-12T00:49:57Z

Okay sure, will work on it.

AlexGCas · 2021-05-31T16:35:36Z

Maybe you want to add other uses of loc and iloc: iloc for reassigning a row, and loc for reassigning a row or appending a new one.
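
A minimal sketch of the loc/iloc uses suggested here, assuming a small made-up DataFrame (the column names and index labels are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])

# Reassign an existing row by integer position.
df.iloc[0] = [10, 30]

# Reassign an existing row by label.
df.loc["y"] = [20, 40]

# Assigning to a label that does not exist yet appends a new row
# ("setting with enlargement").
df.loc["z"] = [5, 6]
```

Note that iloc cannot append: assigning to a position past the end raises an IndexError, which is why only loc is shown for adding a row.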

amaru-g · 2021-06-14T20:29:33Z

Hi, this is the very first time I am contributing. According to the wiki, I should figure out whether this is taken or not. @AlexGCas, what is the status of this?
Regards,

rishitbhojak · 2021-06-23T10:59:39Z

@Dr-Irv Can you let me know how I can get started contributing to this cheatsheet? That is, how do I make changes to it? I learned data analysis recently and pandas was a great help. I can add more plotting functions that work from a pandas DataFrame.

OliEfr · 2021-06-23T12:10:47Z

@rishitbhojak
Well, I can help you with that. Go to this folder: https://github.com/OliEfr/pandas/tree/master/doc/cheatsheet
Here you will find the PowerPoint files, which are used to create the PDF files. It's also useful to read the README.txt.
Besides that, of course, it is useful to know how Git and GitHub work.

rishitbhojak · 2021-06-23T12:46:52Z

@Dr-Irv @OliEfr Can I start with pie chart plots and subplots? It seems like a good idea because we want to add more visualization examples. What's your opinion? Can I add the third page and move forward with it?

Dr-Irv · 2021-06-23T13:24:57Z

@rishitbhojak Add a third page with respect to plotting. Move the examples at the bottom of page 2 to page 3, but find something to fill the empty space on page 2 that will be created by the move of those 2 examples. Probably a list of the frequently used options would fit nicely there.

Follow the instructions for contributing to pandas here: https://pandas.pydata.org/docs/development/contributing.html

rishitbhojak · 2021-06-23T13:39:07Z

Alright sir. Can you let me know whether I have done it correctly or not? I am attaching a sample screenshot

Dr-Irv · 2021-06-23T13:42:00Z

@rishitbhojak that's fine. Try to keep the graphics smaller. For the subplots example, ideally there would be titles on each subplot, so you should show how to do that.
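
One way to get a title on each subplot using only pandas plotting is sketched below. It assumes matplotlib is installed (pandas plotting depends on it); the data and titles are made up for illustration, and this is not the code from the screenshot.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless

import pandas as pd

# Hypothetical data: one pie per column, one wedge per index label.
df = pd.DataFrame(
    {"q1": [3, 5, 2], "q2": [4, 1, 5]},
    index=["north", "south", "east"],
)

# With subplots=True, `title` may be a list with one entry per subplot.
axes = df.plot.pie(
    subplots=True,
    title=["Q1 sales", "Q2 sales"],
    figsize=(6, 3),  # a small figure, in line with keeping the graphics small
    legend=False,
)
```

The call returns one matplotlib Axes per column, so each title can also be changed afterwards with `ax.set_title(...)`.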

rishitbhojak · 2021-06-23T13:57:11Z

Got it! Thank you sir

rishitbhojak · 2021-06-26T11:02:54Z

I moved the visualization portion to the third page and will put some regularly used functions in its place. I also added titles to both subplots of the pie chart.

Dr-Irv · 2021-06-28T13:30:03Z

@rishitbhojak Any review will occur in a pull request, so when you are all done with your changes, create the PR, and I'll provide more feedback there.

rishitbhojak · 2021-07-06T11:17:30Z

Against which branch should I make the pull request? I am done with the pie chart plotting portion, and in place of the scatter plot and the histogram, I made a section on dropping a DataFrame column. I will make the PR this week.

Dr-Irv · 2021-07-06T13:36:16Z

Create your own branch off of master. See https://pandas.pydata.org/docs/development/contributing.html#working-with-the-code

NishitaPatnaik21 · 2021-08-18T06:24:36Z

Is this closed, or can I contribute to it?

Dr-Irv · 2021-08-19T21:53:24Z

There is a PR #43036 that was just submitted that I need to review. Once that is done, I will update the list above with the open items.

KeeratKG · 2022-01-12T07:00:40Z

@Dr-Irv Hello. I'd like to do my bit here. May I take up this issue?

Dr-Irv · 2022-01-12T15:28:43Z

Thanks for offering your help. I just realized from your note that I had forgotten to review #43036 . So let me get that done and merged, and then you could do additional things from there.

KeeratKG · 2022-01-12T15:31:45Z

Absolutely! I kind of realised that #43036 might still be in the queue, so I went through its contents and made sure I wasn't repeating anything :)

KeeratKG · 2022-01-13T15:10:06Z

Hi. I have added a fourth page to the cheatsheet after #43036. Please have a look @Dr-Irv and let me know what you think. I will update the contribution accordingly. See #45347.
I wasn't sure what 'The new Extension Types (String, Integer, Float) and how pd.NA works' referred to. Tried to cover everything else.
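
For readers unsure what that item covers, here is a minimal sketch of the nullable extension dtypes and `pd.NA`. It assumes pandas 1.2 or later, where `"Float64"` is available; the variable names are illustrative.

```python
import pandas as pd

# Nullable extension dtypes hold missing values as pd.NA
# instead of forcing a cast to float or object.
s_int = pd.Series([1, 2, None], dtype="Int64")
s_flt = pd.Series([1.5, None], dtype="Float64")
s_str = pd.Series(["a", None, "c"], dtype="string")

# Comparisons propagate pd.NA, giving a nullable "boolean" result.
mask = s_int > 1  # False, True, <NA>

# Reductions skip missing values by default.
total = s_int.sum()

# pd.NA follows three-valued (Kleene) logic.
na_or_true = pd.NA | True     # True: the result does not depend on the NA
na_and_false = pd.NA & False  # False, for the same reason

# Missing values can be filled while keeping the nullable dtype.
filled = s_int.fillna(0)
```

Because `bool(pd.NA)` raises a TypeError, a mask that contains `<NA>` usually needs a `fillna(False)` before it is used in a plain Python `if`.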

MichaelTiemannOSC · 2022-01-14T15:46:56Z

Some additional suggestions to consider for this round or the next:

wide_to_long: more general and more powerful than melt and important to know it exists
groupby box should at least mention the concept of split-apply-combine
groupby.first() operator: important for dealing with missing data
Group by: split-apply-combine (which brings in the whole concept of operations on data). The elaboration of these concepts will easily fill half to a full page with high-quality content.
The confusing topic of using conditionals in pandas (what to do with a.any, a.all, etc and when they can be avoided entirely)
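
The split-apply-combine and `groupby.first()` suggestions above can be sketched as follows. The data is made up, and this is only one possible way a cheat sheet panel could illustrate the idea.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "team": ["a", "a", "b", "b"],
        "score": [1.0, 3.0, None, 4.0],
    }
)

# Split by "team", apply an aggregation, combine into one row per group.
means = df.groupby("team")["score"].mean()

# transform() applies per group but returns a result aligned to the
# original rows, so it can be combined with the original column.
centered = df["score"] - df.groupby("team")["score"].transform("mean")

# first() returns the first non-null value in each group, which is why
# it is handy with missing data: team "b" yields 4.0, not NaN.
first_valid = df.groupby("team")["score"].first()
```

`agg` collapses each group to one row, while `transform` keeps the original shape; showing both side by side is probably the shortest way to convey the "combine" step.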

geo7 · 2022-03-29T11:18:39Z

@MichaelTiemannOSC could you explain how wide_to_long is "more general" and "more powerful" than melt? I thought the opposite was true, and the documentation (https://pandas.pydata.org/docs/reference/api/pandas.wide_to_long.html) seems to support that: "Less flexible but more user-friendly than melt". Maybe there's something I'm unaware of though.

MichaelTiemannOSC · 2022-03-29T11:26:15Z

Good question. melt is "more general" in that it can handle multilevel indexes. wide_to_long is more user-friendly in that in the single-index case, it can both reshape and rename columns, whereas melt only concerns itself with reshaping data.

geo7 · 2022-03-29T11:41:35Z

Hrm, to be honest I've always just used melt, but wide_to_long does have a couple of things packed in that I might use other methods for. I find melt clearer, perhaps because that's what I typically use. Sorry to derail the thread. I'll leave an example from the wide_to_long documentation with a melt version in case that's of use to anyone in future.

data

import pandas as pd

df = pd.DataFrame(
    {
        "famid": [1, 1, 1, 2, 2, 2, 3, 3, 3],
        "birth": [1, 2, 3, 1, 2, 3, 1, 2, 3],
        "ht_one": [2.8, 2.9, 2.2, 2, 1.8, 1.9, 2.2, 2.3, 2.1],
        "ht_two": [3.4, 3.8, 2.9, 3.2, 2.8, 2.4, 3.3, 3.4, 2.9],
    }
)

wide_to_long

pd.wide_to_long(
    df,
    stubnames="ht",
    i=["famid", "birth"],
    j="age",
    sep="_",
    suffix=r"\w+",
)

melt

(
    df.melt(id_vars=["famid", "birth"], value_name="ht")
    .assign(age=lambda df: df["variable"].str.split("_").str[-1])
    .drop("variable", axis=1)
    .sort_values(["famid", "birth", "age"])
    .set_index(["famid", "birth", "age"])
)
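
As a quick check that the two versions above agree, the sketch below rebuilds the same data, sorts both results by index, and compares them. Only index labels and values are compared, not dtypes, which is a choice made here to keep the check simple.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "famid": [1, 1, 1, 2, 2, 2, 3, 3, 3],
        "birth": [1, 2, 3, 1, 2, 3, 1, 2, 3],
        "ht_one": [2.8, 2.9, 2.2, 2, 1.8, 1.9, 2.2, 2.3, 2.1],
        "ht_two": [3.4, 3.8, 2.9, 3.2, 2.8, 2.4, 3.3, 3.4, 2.9],
    }
)

# wide_to_long version, sorted so row order does not matter.
wide = pd.wide_to_long(
    df, stubnames="ht", i=["famid", "birth"], j="age", sep="_", suffix=r"\w+"
).sort_index()

# melt version: reshape, derive "age" from the old column name, re-index.
melted = (
    df.melt(id_vars=["famid", "birth"], value_name="ht")
    .assign(age=lambda d: d["variable"].str.split("_").str[-1])
    .drop(columns="variable")
    .set_index(["famid", "birth", "age"])
    .sort_index()
)

same_index = list(wide.index) == list(melted.index)
same_values = wide["ht"].tolist() == melted["ht"].tolist()
```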

MichaelTiemannOSC · 2022-03-29T12:33:15Z

For the record, I use melt 90% of the time. But there's lots of financial data of the form xyz_2016, xyz_2017, ... and wide_to_long is great for that.
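
A minimal sketch of that year-suffix case, using made-up column names (`rev_2016`, `rev_2017`) in place of real financial data:

```python
import pandas as pd

# Hypothetical wide table: one revenue column per year.
wide = pd.DataFrame(
    {
        "company": ["A", "B"],
        "rev_2016": [1.0, 2.0],
        "rev_2017": [1.5, 2.5],
    }
)

# The default suffix pattern matches digits, so the year is split off
# the column name and converted to a number in the new "year" level.
long = pd.wide_to_long(wide, stubnames="rev", i="company", j="year", sep="_")

rev_a_2017 = long.loc[("A", 2017), "rev"]
```

The result is indexed by (company, year) with a single `rev` column, so the reshape and the rename happen in one call.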

suryakapurothu · 2022-10-05T16:48:40Z

Is it open now?
@geo7 @Dr-Irv
A bit of guidance would help me add value to the cheatsheet that is in progress.

Dr-Irv · 2022-10-06T14:22:14Z

We always welcome improvements to the cheatsheet, so feel free to create a pull request with your suggested changes.

ashishbalti4 · 2022-10-15T14:06:03Z

@Dr-Irv I have worked on and updated the cheat sheet in PDF format. Can I contribute it here? It will be my first open-source contribution.

Dr-Irv · 2022-10-16T02:31:51Z

@ashishbalti4 You should create a pull request that includes the updated powerpoint and PDF.

Aryabhatt1234 · 2024-03-10T14:20:08Z

Hello @Dr-Irv, I am a first time contributor. Can I take up this issue? Please allow me.

Dr-Irv · 2024-03-11T13:44:57Z

Yes, just provide your edits in a PR and I will review. The PR should update the Powerpoint and the PDF.

rootsmusic · 2024-06-01T21:36:11Z

I suggest specifying the minimum version of pandas that's needed.

Dr-Irv · 2024-06-03T13:22:20Z

I think, but I'm not sure, that all of the examples in the cheat sheet work with version 1.x on up. If you see otherwise, let me know

samuel-davidson · 2024-08-10T05:48:03Z

Hi,
I am new to open source software and would love to contribute. I have added a third page to the cheatsheet and submitted a PR. Thank you!

MichaelTiemannOSC · 2024-08-10T14:12:00Z

Could you link to the PR? I'd be interested. The list of suggestions I made much earlier in this thread (#40680 (comment)) is not being ticked off (though one topic, melt vs. wide_to_long, was discussed in detail). split-transform-combine remains a major concept in data science and is nowhere referenced or exemplified in the Cheat Sheet.

samuel-davidson · 2024-08-10T17:40:10Z

Here is the PR. I just started on the third page and added the frequently used options from the initial list.

Dr-Irv added the Docs label Mar 29, 2021

Dr-Irv mentioned this issue Mar 29, 2021

DOC: update cheatsheet #39806

Merged

lithomas1 added the good first issue label Apr 1, 2021

lithomas1 added this to the Contributions Welcome milestone Apr 1, 2021

Dr-Irv assigned kunal21sinha and unassigned kunal21sinha Apr 8, 2021

alimcmaster1 mentioned this issue Aug 31, 2021

DOC: Additional Items for the Cheat Sheet #43036

Closed

mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022

Dr-Irv mentioned this issue Nov 3, 2022

Added a New Page in Cheatsheet with some extra functions. #49227

Closed

rootsmusic mentioned this issue Jun 2, 2024

DOC: add cheat sheet numpy/numpy#26593

Open

deveshbervar added a commit to deveshbervar/pandas that referenced this issue Jan 4, 2025

Added improvements to the cheat sheet based on issue pandas-dev#40680

50ca17c

deveshbervar linked a pull request Jan 4, 2025 that will close this issue

Added improvements to the cheat sheet based on issue #40680 #60658

Open

