[stdlib] Add `List.map(fn(mut T)->None)` #3839

rd4com · 2024-12-05T21:57:35Z

Hello, the pr is an implementation for map,
we could have used parameters, but it is for a first pythonic experience 🐍
(progressive exposure to complexity and build on current python skills already learned)

🆕 map
Apply a function to every elements of self and returns a new List

fn MyFunc(mut e: Int):
    e+=1

var MyList = List(0, 1, 2).map(MyFunc)

print(
    MyList[0] == 1,
    MyList[1] == 2,
    MyList[2] == 3,
)

Signed-off-by: rd4com <[email protected]>

owenhilyard · 2024-12-07T21:55:04Z

In order to get feature parity with python, we probably want lazy iterators. My vote would be for copying the Rust API where you need to do a .collect[collection: FromIterator](), and then you can implement that trait for all collections. This also means the slightly more pythonic version (List([1,2,3,4].map(lambda x: x ** 2))) will work fine. The other reason I want laziness is to allow for the compiler to do iterator fusion, which lets a long series of maps, filters, and other iterator ops compile down into the same code as the equivalent for loop.

Could we do something like making that return a Map[origin, _ListIter, fn] and which itself is an iterator (which may be doable to implement using aliases in traits)? At the moment, I don't see a good way to abstract over owned vs read/mut cleanly unless we force read/mut to Pointer, which may be sub-optimal but start a conversation about first class references. I think we 100% want to use parameters so that the API can be compiled more efficiently, because if we do it using runtime function pointers we end up leaning on different parts of LLVM which I don't think are quite as guaranteed to work (leading to fragile optimizations).

martinvuyk · 2024-12-08T14:14:32Z

I think it's a good idea to deviate form Python and add map, filter, sum, join, reduce to List (one of the first things I did for my Array implementation).

we probably want lazy iterators

+1 I think it's worth making it an iterator return and not a copy of the data.

We should also aim to implement Python's builtin map function (also sum) at some point. These methods are very good because they're generic, but the API is not at all achievable until we have the Iterator traits and APIs defined.

My vote would be for copying the Rust API where you need to do a .collectcollection: FromIterator

I think this will dovetail nicely with #3653 (the part about implicit constructors from iterators), which would mean for people who want to just declare the variable type they can, and those who want to use functional style collect also can :)

The approach I'll propose can become ugly very fast once we do it for many types, but I think it's worth trying to implement it and testing it out:

What if we implement one specific type of iterator for each op, for this PR the case would be _ListMapIter that yields the elements. We could then do the same for sum, filter, etc. (or just implement map and test it out first)

Or, we could just let the function return a new List and come back to iterate and fix it once we have Iterators better defined. WDYT?

owenhilyard · 2024-12-08T20:04:16Z

We should also aim to implement Python's builtin map function (also sum) at some point. These methods are very good because they're generic, but the API is not at all achievable until we have the Iterator traits and APIs defined.

I agree, the builtin map can take an iterable (remember that iterators are iterable and the .iter() is a noop)

The approach I'll propose can become ugly very fast once we do it for many types, but I think it's worth trying to implement it and testing it out:

What if we implement one specific type of iterator for each op, for this PR the case would be _ListMapIter that yields the elements. We could then do the same for sum, filter, etc. (or just implement map and test it out first)

I'd rather have a generic iterator, so that List[T].map returns IterMap[ListIter[T], map_fn]. If we really need the extra performance, we can also specialize map on a bunch of things like the C++ ranges concepts, for instance contiguous buffers, and potentially have traits for iterators that can efficiently yield SIMD chunks. Ideally, I'd like something like Rust's impl Iterator<Output=T>, where we can make very light guarantees about the behavior to the user but the compiler can see it's actually an _ListIter[T], which implements Iterator[T] + FixedSizeSIMDIterator[T] + ContigiousBufferIterator[T] + ... and use that when determining which specializations to choose.

lsh · 2024-12-17T18:58:25Z

stdlib/src/collections/list.mojo

@@ -930,6 +930,35 @@ struct List[T: CollectionElement, hint_trivial_type: Bool = False](
        """
        return self.data

+    fn map(ref self, func: fn (mut T) -> None) -> Self:


thought: The API you're describing seems more like for_each than map.
fn(mut T) -> None is a common for_each signature, whereas map tends to be fn[U: CollectionElement](T) -> U.

Examples:

Rust for_each and map

Swift for_each and map

C++ ranges::for_each and ranges::transform

Yes I think it was just a mistake in assuming the output will be the same type as the input collection type

Not exactly. for_each tends to mutate in place, while map tends to create a new allocation. Users associate both APIs with those characteristics in my opinion.

I meant that I think the OP meant map as in map a func to return a new instance with copied values changed, just that he forgot the use case where the collection type of the result might be different.

personal note: I'd actually like us to call for_each apply because it's a name used in the pandas and numpy ecosystem and is IMO much more intuitive (see PR #3877)

lsh · 2024-12-17T19:06:26Z

@owenhilyard I don't want to get too into the weeds on this and derail the main topic of this PR, but I tend to think the cleanest API would be a constructor for List from Iterator. Due to the cultural linkage to Python, there will probably be a skew in favor of using comprehensions over map/filter. I also personally think List[Int](x ** 2 for x in [1,2,3] if x <= 4) looks nicer than List[Int](1,2,3).map[lambda x: x ** 2]().filter[lambda x: x <= 4](), but that's just a personal preference without taking this to the team.

edit: To be clear this is just a hypothetical lambda syntax.

owenhilyard · 2024-12-18T16:53:24Z

@owenhilyard I don't want to get too into the weeds on this and derail the main topic of this PR, but I tend to think the cleanest API would be a constructor for List from Iterator.

+1 to List from Iterator, I think it's useful to have both that and a .collect() postfix option.

Due to the cultural linkage to Python, there will probably be a skew in favor of using comprehensions over map/filter. I also personally think List[Int](x ** 2 for x in [1,2,3] if x <= 4) looks nicer than List[Int](1,2,3).map[lambda x: x ** 2]().filter[lambda x: x <= 4](), but that's just a personal preference without taking this to the team.

I agree short list comprehensions are cleaner, but I find they get messy once you want to do [expensive_function(j[1]) for j in [get_filter_key(i) for i in foo] if filter_function(j[0])] or similar longer chains. Python's lambda syntax does make it worse, and I hope we can find an alternative since I don't think I've ever talked to someone who likes it.

rd4com · 2024-12-18T17:54:14Z

The for_each suggestion makes sense @lsh 👍
after checking python, maybe map could be a function to match python, wdyt?

items = [1, 2, 3, 4, 5]
inc = list(map(lambda x: x+1, items))

That way, there is more time to think about the chaining thing ?

owenhilyard · 2024-12-18T18:46:31Z

The for_each suggestion makes sense @lsh 👍 after checking python, maybe map could be a function to match python, wdyt?
items = [1, 2, 3, 4, 5]
inc = list(map(lambda x: x+1, items))
That way, there is more time to think about the chaining thing ?

I prefer to do method chaining, since it means that I'm not reading my code inside-out. In terms of this particular PR, I agree it should be for_each, map is a transformation function.

martinvuyk · 2024-12-18T19:16:10Z

I'd actually like us to call for_each apply because it's a name used in the pandas and numpy ecosystem and is IMO much more intuitive (see PR #3877)

I want to repeat this point, I strongly dislike this being called for_each. Either call it map and fix the func: fn[T2](T1) -> T2 or consider using apply and make the operation inplace. I use numpy and pandas quite heavily, I've used Javascript a bunch as well, and I very much prefer thinking of applying a function than forEach. There is also the not-trivial problem that the abstraction of for_each falls appart when dealing with matrices, tensors, and vectors; with apply it's easy to think about axes. It is also a word that is much more used in the ML space.

owenhilyard · 2024-12-18T19:40:35Z

I want to repeat this point, I strongly dislike this being called for_each. Either call it map and fix the func: fn[T2](T1) -> T2 or consider using apply and make the operation inplace. I use numpy and pandas quite heavily, I've used Javascript a bunch as well, and I very much prefer thinking of applying a function than forEach. There is also the not-trivial problem that the abstraction of for_each falls appart when dealing with matrices, tensors, and vectors; with apply it's easy to think about axes. It is also a word that is much more used in the ML space.

for_each or foreach has very old roots in FP, I think more people are going to find issues with apply than for_each.

martinvuyk · 2024-12-18T19:48:02Z

for_each or foreach has very old roots in FP, I think more people are going to find issues with apply than for_each.

I'd argue that many more people use Python (pandas, numpy, ML frameworks) and R (it's a builtin) than Haskell...

owenhilyard · 2024-12-18T20:12:47Z

for_each or foreach has very old roots in FP, I think more people are going to find issues with apply than for_each.

I'd argue that many more people use Python (pandas, numpy, ML frameworks) and R (it's a builtin) than Haskell...

The other side of the fence is JS, TS, Rust, C++, Java, C#, most FP languages, Scala, Kotlin, and Swift, among others. Ocaml uses for_all. Ruby uses .each.

martinvuyk · 2024-12-18T20:35:54Z

for_each or foreach has very old roots in FP, I think more people are going to find issues with apply than for_each.

I'd argue that many more people use Python (pandas, numpy, ML frameworks) and R (it's a builtin) than Haskell...

The other side of the fence is JS, TS, Rust, C++, Java, C#, most FP languages, Scala, Kotlin, and Swift, among others. Ocaml uses for_all. Ruby uses .each.

Ok that's a long list 😆. But I still stand by my opinion, I think for_each just comes from the procedural mentality of for loops and it does not adjust well to tensors. I also don't agree with it returning a copy of the data and then modifying it, it should be inplace. With map and apply we have much better readability IMO:

fn add_two(value: Int) -> Int:
    return value + 2

fn bigger_smaller_or_eq(value: Int) -> String:
    return "bigger" if value > 4 else "smaller" if value != 4 else "equal"

fn main():
    items = List[Int](1, 2, 3)
    print(items.map(add_two)) # 3 4 5
    items.apply(add_two)
    print(items) # 3 4 5
    # items.apply(bigger_or_smaller) # won't compile
    print(items.map(bigger_smaller_or_eq)) # smaller equal bigger

rd4com requested a review from a team as a code owner December 5, 2024 21:57

[stdlib] Add List.map(fn(mut T)->None)

786e580

Signed-off-by: rd4com <[email protected]>

rd4com force-pushed the list_map_function branch from 9d20a4a to 786e580 Compare December 5, 2024 21:59

rd4com changed the title ~~Add List.map(fn(mut T)->None)~~ [stdlib] Add List.map(fn(mut T)->None) Dec 5, 2024

JoeLoser assigned lsh Dec 17, 2024

lsh added the needs-discussion Need discussion in order to move forward label Dec 17, 2024

lsh reviewed Dec 17, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[stdlib] Add `List.map(fn(mut T)->None)` #3839

[stdlib] Add `List.map(fn(mut T)->None)` #3839

rd4com commented Dec 5, 2024

owenhilyard commented Dec 7, 2024

martinvuyk commented Dec 8, 2024 •

edited

Loading

owenhilyard commented Dec 8, 2024

lsh Dec 17, 2024

martinvuyk Dec 17, 2024

lsh Dec 17, 2024

martinvuyk Dec 17, 2024 •

edited

Loading

lsh commented Dec 17, 2024 •

edited

Loading

owenhilyard commented Dec 18, 2024

rd4com commented Dec 18, 2024

owenhilyard commented Dec 18, 2024

martinvuyk commented Dec 18, 2024

owenhilyard commented Dec 18, 2024

martinvuyk commented Dec 18, 2024

owenhilyard commented Dec 18, 2024

martinvuyk commented Dec 18, 2024

[stdlib] Add List.map(fn(mut T)->None) #3839

Are you sure you want to change the base?

[stdlib] Add List.map(fn(mut T)->None) #3839

Conversation

rd4com commented Dec 5, 2024

owenhilyard commented Dec 7, 2024

martinvuyk commented Dec 8, 2024 • edited Loading

owenhilyard commented Dec 8, 2024

lsh Dec 17, 2024

Choose a reason for hiding this comment

martinvuyk Dec 17, 2024

Choose a reason for hiding this comment

lsh Dec 17, 2024

Choose a reason for hiding this comment

martinvuyk Dec 17, 2024 • edited Loading

Choose a reason for hiding this comment

lsh commented Dec 17, 2024 • edited Loading

owenhilyard commented Dec 18, 2024

rd4com commented Dec 18, 2024

owenhilyard commented Dec 18, 2024

martinvuyk commented Dec 18, 2024

owenhilyard commented Dec 18, 2024

martinvuyk commented Dec 18, 2024

owenhilyard commented Dec 18, 2024

martinvuyk commented Dec 18, 2024

[stdlib] Add `List.map(fn(mut T)->None)` #3839

[stdlib] Add `List.map(fn(mut T)->None)` #3839

martinvuyk commented Dec 8, 2024 •

edited

Loading

martinvuyk Dec 17, 2024 •

edited

Loading

lsh commented Dec 17, 2024 •

edited

Loading