Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[stdlib] Add List.map(fn(mut T)->None) #3839

Open
wants to merge 1 commit into
base: nightly
Choose a base branch
from

Conversation

rd4com
Copy link
Contributor

@rd4com rd4com commented Dec 5, 2024

Hello, the pr is an implementation for map,
we could have used parameters, but it is for a first pythonic experience 🐍
(progressive exposure to complexity and build on current python skills already learned)

  • 🆕 map
    Apply a function to every elements of self and returns a new List
    fn MyFunc(mut e: Int):
        e+=1
    
    var MyList = List(0, 1, 2).map(MyFunc)
    
    print(
        MyList[0] == 1,
        MyList[1] == 2,
        MyList[2] == 3,
    )

@rd4com rd4com requested a review from a team as a code owner December 5, 2024 21:57
@rd4com rd4com force-pushed the list_map_function branch from 9d20a4a to 786e580 Compare December 5, 2024 21:59
@rd4com rd4com changed the title Add List.map(fn(mut T)->None) [stdlib] Add List.map(fn(mut T)->None) Dec 5, 2024
@owenhilyard
Copy link
Contributor

In order to get feature parity with python, we probably want lazy iterators. My vote would be for copying the Rust API where you need to do a .collect[collection: FromIterator](), and then you can implement that trait for all collections. This also means the slightly more pythonic version (List([1,2,3,4].map(lambda x: x ** 2))) will work fine. The other reason I want laziness is to allow for the compiler to do iterator fusion, which lets a long series of maps, filters, and other iterator ops compile down into the same code as the equivalent for loop.

Could we do something like making that return a Map[origin, _ListIter, fn] and which itself is an iterator (which may be doable to implement using aliases in traits)? At the moment, I don't see a good way to abstract over owned vs read/mut cleanly unless we force read/mut to Pointer, which may be sub-optimal but start a conversation about first class references. I think we 100% want to use parameters so that the API can be compiled more efficiently, because if we do it using runtime function pointers we end up leaning on different parts of LLVM which I don't think are quite as guaranteed to work (leading to fragile optimizations).

@martinvuyk
Copy link
Contributor

martinvuyk commented Dec 8, 2024

I think it's a good idea to deviate form Python and add map, filter, sum, join, reduce to List (one of the first things I did for my Array implementation).

we probably want lazy iterators

+1 I think it's worth making it an iterator return and not a copy of the data.

We should also aim to implement Python's builtin map function (also sum) at some point. These methods are very good because they're generic, but the API is not at all achievable until we have the Iterator traits and APIs defined.

My vote would be for copying the Rust API where you need to do a .collectcollection: FromIterator

I think this will dovetail nicely with #3653 (the part about implicit constructors from iterators), which would mean for people who want to just declare the variable type they can, and those who want to use functional style collect also can :)


The approach I'll propose can become ugly very fast once we do it for many types, but I think it's worth trying to implement it and testing it out:

What if we implement one specific type of iterator for each op, for this PR the case would be _ListMapIter that yields the elements. We could then do the same for sum, filter, etc. (or just implement map and test it out first)

Or, we could just let the function return a new List and come back to iterate and fix it once we have Iterators better defined. WDYT?

@owenhilyard
Copy link
Contributor

We should also aim to implement Python's builtin map function (also sum) at some point. These methods are very good because they're generic, but the API is not at all achievable until we have the Iterator traits and APIs defined.

I agree, the builtin map can take an iterable (remember that iterators are iterable and the .iter() is a noop)

The approach I'll propose can become ugly very fast once we do it for many types, but I think it's worth trying to implement it and testing it out:

What if we implement one specific type of iterator for each op, for this PR the case would be _ListMapIter that yields the elements. We could then do the same for sum, filter, etc. (or just implement map and test it out first)

I'd rather have a generic iterator, so that List[T].map returns IterMap[ListIter[T], map_fn]. If we really need the extra performance, we can also specialize map on a bunch of things like the C++ ranges concepts, for instance contiguous buffers, and potentially have traits for iterators that can efficiently yield SIMD chunks. Ideally, I'd like something like Rust's impl Iterator<Output=T>, where we can make very light guarantees about the behavior to the user but the compiler can see it's actually an _ListIter[T], which implements Iterator[T] + FixedSizeSIMDIterator[T] + ContigiousBufferIterator[T] + ... and use that when determining which specializations to choose.

@lsh lsh added the needs-discussion Need discussion in order to move forward label Dec 17, 2024
@@ -930,6 +930,35 @@ struct List[T: CollectionElement, hint_trivial_type: Bool = False](
"""
return self.data

fn map(ref self, func: fn (mut T) -> None) -> Self:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: The API you're describing seems more like for_each than map.
fn(mut T) -> None is a common for_each signature, whereas map tends to be fn[U: CollectionElement](T) -> U.

Examples:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I think it was just a mistake in assuming the output will be the same type as the input collection type

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not exactly. for_each tends to mutate in place, while map tends to create a new allocation. Users associate both APIs with those characteristics in my opinion.

Copy link
Contributor

@martinvuyk martinvuyk Dec 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant that I think the OP meant map as in map a func to return a new instance with copied values changed, just that he forgot the use case where the collection type of the result might be different.

personal note: I'd actually like us to call for_each apply because it's a name used in the pandas and numpy ecosystem and is IMO much more intuitive (see PR #3877)

@lsh
Copy link
Contributor

lsh commented Dec 17, 2024

@owenhilyard I don't want to get too into the weeds on this and derail the main topic of this PR, but I tend to think the cleanest API would be a constructor for List from Iterator. Due to the cultural linkage to Python, there will probably be a skew in favor of using comprehensions over map/filter. I also personally think List[Int](x ** 2 for x in [1,2,3] if x <= 4) looks nicer than List[Int](1,2,3).map[lambda x: x ** 2]().filter[lambda x: x <= 4](), but that's just a personal preference without taking this to the team.

edit: To be clear this is just a hypothetical lambda syntax.

@owenhilyard
Copy link
Contributor

@owenhilyard I don't want to get too into the weeds on this and derail the main topic of this PR, but I tend to think the cleanest API would be a constructor for List from Iterator.

+1 to List from Iterator, I think it's useful to have both that and a .collect() postfix option.

Due to the cultural linkage to Python, there will probably be a skew in favor of using comprehensions over map/filter. I also personally think List[Int](x ** 2 for x in [1,2,3] if x <= 4) looks nicer than List[Int](1,2,3).map[lambda x: x ** 2]().filter[lambda x: x <= 4](), but that's just a personal preference without taking this to the team.

I agree short list comprehensions are cleaner, but I find they get messy once you want to do [expensive_function(j[1]) for j in [get_filter_key(i) for i in foo] if filter_function(j[0])] or similar longer chains. Python's lambda syntax does make it worse, and I hope we can find an alternative since I don't think I've ever talked to someone who likes it.

@rd4com
Copy link
Contributor Author

rd4com commented Dec 18, 2024

The for_each suggestion makes sense @lsh 👍
after checking python, maybe map could be a function to match python, wdyt?

items = [1, 2, 3, 4, 5]
inc = list(map(lambda x: x+1, items))

That way, there is more time to think about the chaining thing ?

@owenhilyard
Copy link
Contributor

The for_each suggestion makes sense @lsh 👍 after checking python, maybe map could be a function to match python, wdyt?

items = [1, 2, 3, 4, 5]
inc = list(map(lambda x: x+1, items))

That way, there is more time to think about the chaining thing ?

I prefer to do method chaining, since it means that I'm not reading my code inside-out. In terms of this particular PR, I agree it should be for_each, map is a transformation function.

@martinvuyk
Copy link
Contributor

I'd actually like us to call for_each apply because it's a name used in the pandas and numpy ecosystem and is IMO much more intuitive (see PR #3877)

I want to repeat this point, I strongly dislike this being called for_each. Either call it map and fix the func: fn[T2](T1) -> T2 or consider using apply and make the operation inplace. I use numpy and pandas quite heavily, I've used Javascript a bunch as well, and I very much prefer thinking of applying a function than forEach. There is also the not-trivial problem that the abstraction of for_each falls appart when dealing with matrices, tensors, and vectors; with apply it's easy to think about axes. It is also a word that is much more used in the ML space.

@owenhilyard
Copy link
Contributor

I want to repeat this point, I strongly dislike this being called for_each. Either call it map and fix the func: fn[T2](T1) -> T2 or consider using apply and make the operation inplace. I use numpy and pandas quite heavily, I've used Javascript a bunch as well, and I very much prefer thinking of applying a function than forEach. There is also the not-trivial problem that the abstraction of for_each falls appart when dealing with matrices, tensors, and vectors; with apply it's easy to think about axes. It is also a word that is much more used in the ML space.

for_each or foreach has very old roots in FP, I think more people are going to find issues with apply than for_each.

@martinvuyk
Copy link
Contributor

for_each or foreach has very old roots in FP, I think more people are going to find issues with apply than for_each.

I'd argue that many more people use Python (pandas, numpy, ML frameworks) and R (it's a builtin) than Haskell...

@owenhilyard
Copy link
Contributor

for_each or foreach has very old roots in FP, I think more people are going to find issues with apply than for_each.

I'd argue that many more people use Python (pandas, numpy, ML frameworks) and R (it's a builtin) than Haskell...

The other side of the fence is JS, TS, Rust, C++, Java, C#, most FP languages, Scala, Kotlin, and Swift, among others. Ocaml uses for_all. Ruby uses .each.

@martinvuyk
Copy link
Contributor

for_each or foreach has very old roots in FP, I think more people are going to find issues with apply than for_each.

I'd argue that many more people use Python (pandas, numpy, ML frameworks) and R (it's a builtin) than Haskell...

The other side of the fence is JS, TS, Rust, C++, Java, C#, most FP languages, Scala, Kotlin, and Swift, among others. Ocaml uses for_all. Ruby uses .each.

Ok that's a long list 😆. But I still stand by my opinion, I think for_each just comes from the procedural mentality of for loops and it does not adjust well to tensors. I also don't agree with it returning a copy of the data and then modifying it, it should be inplace. With map and apply we have much better readability IMO:

fn add_two(value: Int) -> Int:
    return value + 2

fn bigger_smaller_or_eq(value: Int) -> String:
    return "bigger" if value > 4 else "smaller" if value != 4 else "equal"

fn main():
    items = List[Int](1, 2, 3)
    print(items.map(add_two)) # 3 4 5
    items.apply(add_two)
    print(items) # 3 4 5
    # items.apply(bigger_or_smaller) # won't compile
    print(items.map(bigger_smaller_or_eq)) # smaller equal bigger

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-discussion Need discussion in order to move forward
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants