Geeks With Blogs

Under The Influence(of code) Abhijeet Patel's blog
Chunking a List

As I mentioned last time, I'm knee deep in Python these days. I come from a statically typed background, so it's definitely a mental adjustment. List comprehensions are BIG in Python, and having worked with a few of them I can see why. Let's say we need to chunk a list into sublists of a specified size.
Here is how we'd do it in C#:

    using System;
    using System.Collections.Generic;
    using System.Linq;

    static class Extensions
    {
        public static IEnumerable<List<T>> Chunk<T>(this List<T> l, int chunkSize)
        {
            // A chunkSize of zero would never advance the loop, so reject it too.
            if (chunkSize <= 0)
            {
                throw new ArgumentException("chunkSize must be positive", "chunkSize");
            }
            for (int i = 0; i < l.Count; i += chunkSize)
            {
                // Skip the i elements already emitted, then take the next chunk.
                yield return new List<T>(l.Skip(i).Take(chunkSize));
            }
        }
    }

    static class Program
    {
        static void Main(string[] args)
        {
            var l = new List<string> { "a", "b", "c", "d", "e", "f", "g" };

            foreach (var list in l.Chunk(7))
            {
                string str = list.Aggregate((s1, s2) => s1 + "," + s2);
                Console.WriteLine(str);
            }
        }
    }

A little wordy, but still pretty concise thanks to LINQ. On each iteration we skip the first i elements (i advances by chunkSize each time) and yield a new List of up to chunkSize elements.

The Python implementation is a bit more terse.

    # Python 2
    def chunkIterable(iter, chunkSize):
        '''Chunks an iterable object into lists of the specified chunkSize.'''
        assert hasattr(iter, "__iter__"), "iter is not an iterable"
        # i is the start index of each chunk: 0, chunkSize, 2 * chunkSize, ...
        for i in xrange(0, len(iter), chunkSize):
            yield iter[i:i + chunkSize]

    if __name__ == '__main__':
        l = ['a', 'b', 'c', 'd', 'e', 'f']
        generator = chunkIterable(l, 2)
        try:
            while True:
                print generator.next()
        except StopIteration:
            pass

xrange takes a start value, a stop value and a step, and produces the integers in that range lazily instead of building the whole list up front. The object it returns can be used in a for loop (much like using a C# iterator in a foreach loop).
Since chunkIterable contains a yield statement, it is a generator function: calling it returns a generator.
iter[i:i + chunkSize] slices the list based on the current start index and chunkSize, creating a new list that we yield out to the caller one at a time.
A generator, much like an iterator, is a state machine: each subsequent call to it remembers where the last call left off and resumes execution from that point.
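
The "resumes where it left off" behaviour can be seen by driving a generator by hand. This is a minimal sketch, assuming Python 3 (range and the built-in next() rather than xrange and .next()); chunk_sequence is a hypothetical name for the same slicing logic as above.

```python
def chunk_sequence(seq, chunk_size):
    """Yield successive slices of seq, each at most chunk_size long."""
    for start in range(0, len(seq), chunk_size):
        yield seq[start:start + chunk_size]


letters = ['a', 'b', 'c', 'd', 'e']
gen = chunk_sequence(letters, 2)

# The function body does not run until the first next() call;
# each later call resumes right after the previous yield.
first = next(gen)    # ['a', 'b']
second = next(gen)   # ['c', 'd']
third = next(gen)    # ['e'], the final, shorter chunk

# One more next() would raise StopIteration. A for loop (or list())
# handles that for us, so nothing is left to collect here.
remaining = list(gen)
```

In everyday code a plain "for chunk in chunk_sequence(letters, 2)" loop is usually enough; the explicit next() calls are only there to make the saved state visible.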

The caveat to keep in mind is that, since variables are not explicitly typed, we need to check at runtime that the object passed in is iterable, using hasattr(iter, "__iter__"). This is very similar to accepting an IEnumerable in .NET land. Note, though, that the function also calls len() on its argument and slices it, so in practice it needs a sequence such as a list, not just any iterable.
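
The difference between "iterable" and "sequence" is easy to check. The following is a rough illustration, assuming Python 3, where the abstract base classes live in collections.abc; describe is a hypothetical helper, not something from the post.

```python
from collections.abc import Iterable, Sequence


def describe(obj):
    """Return (is_iterable, is_sequence) for obj."""
    return (isinstance(obj, Iterable), isinstance(obj, Sequence))


a_list = ['a', 'b', 'c']
a_generator = (ch for ch in 'abc')

list_kinds = describe(a_list)            # iterable and a sequence
generator_kinds = describe(a_generator)  # iterable, but not a sequence

# A hasattr check for "__iter__" accepts the generator...
passes_check = hasattr(a_generator, "__iter__")

# ...but len() (and slicing) are not available on it.
try:
    len(a_generator)
    len_failed = False
except TypeError:
    len_failed = True
```

So a check for "__iter__" lets through objects that a len()-and-slice implementation cannot handle.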

Posted on Sunday, March 7, 2010 12:36 PM C#, Python


Comments on this post: Chunking a List - .NET vs Python

# re: Chunking a List - .NET vs Python
Checking if 'iter' is an iterable is not correct. You should be checking if it is a sequence instead. Or better still, try to slice with no type-checks, then handle the exception appropriately if it fails.

Iterables are not necessarily sliceable or indexable. Your function would be better called "chunkSequence". A Python list is a special case that is both sliceable and iterable.

If you must check types in advance, use the Abstract Base Classes in the collections module:

i.e.

    from collections import Sequence
    if isinstance(val, Sequence):
        chunkSequence(val)
    else:
        raise TypeError("arg is not a sequence")
Left by bc on Oct 04, 2010 3:46 AM

# re: Chunking a List - .NET vs Python
wtf?

    def chunk_it(list, chunk_size):
        for i in xrange(0, len(list), chunk_size):
            yield list[i:i + chunk_size]

    l = ['a', 'b', 'c', 'd', 'e', 'f']
    gen = chunk_it(l,2)
    try:
        while True:
            print gen.next()
    except:
        pass

just put it that way, alright? and now compare again? where do you type more faking boilerplate code?
and if you wouldn't use that shitty generator function it would have only 7 lines of code.

let alone how you camelcase your variable names in python. that's faking ugly dude.
Left by elmo on Oct 10, 2010 7:02 AM

# re: Chunking a List - .NET vs Python
@elmo-
assert hasattr(iter, "__iter__"), "iter is not an iterable"

Is this too much boilerplate code? Your code is exactly the same as what I have in the post, minus the assertion.

Why would you not use a generator? If your list is huge, materializing the chunked lists up front is not very performant, now is it?
Pardon the camel casing; I haven't gotten the casing quite down yet, since I switch between C# and Python a LOT!
Left by Abhijeet P on Oct 10, 2010 10:06 AM

# re: Chunking a List - .NET vs Python
You're losing the whole point of using a generator with that len(iter)... sometimes you can't tell the size of an iterable until you reach its end (e.g. a database query set).


try these:


    def chunkIterable(sequence, chunkSize):
        '''Chunks an iterable into lists of the specified chunkSize.'''
        assert hasattr(sequence, "__iter__"), "sequence is not an iterable"
        iterator = iter(sequence) # iter is actually a built-in function
        while True:
            # xrange goes first so that zip stops before pulling an extra
            # item from the iterator and silently dropping it.
            chunk = [c for _, c in zip(xrange(chunkSize), iterator)]
            if not chunk:
                return
            yield chunk




    def chunkIterable(sequence, chunkSize):
        '''Chunks an iterable into lists of the specified chunkSize.'''
        assert hasattr(sequence, "__iter__"), "sequence is not an iterable"
        chunk = []
        for c in sequence:
            chunk.append(c)
            if chunkSize == len(chunk):
                yield chunk
                chunk = []
        # Yield the final, shorter chunk when the input does not divide evenly.
        if chunk:
            yield chunk


Left by koreno on Feb 14, 2011 5:03 PM

# re: Chunking a List - .NET vs Python
The best Python answer is actually shorter and more powerful:

https://gist.github.com/1275417

It does the same thing but allows you to use an iterable of unknown length (like when you download streams over the Web), and you can decide the type of each chunk by passing it as a parameter (list, tuple, iter, etc).

Good rule of thumb: when you deal with iterables, always import itertools. It will be useful at one time or another.
Left by ksamuel on Oct 10, 2011 7:12 AM
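
The linked gist is not reproduced here. As a rough sketch of the approach the comment describes (iterables of unknown length, a caller-chosen chunk type), something along these lines should work in Python 3. The name chunked and its parameters are assumptions for illustration, and this version only supports concrete container types such as list or tuple for chunk_type, not lazy ones like iter.

```python
from itertools import islice


def chunked(iterable, chunk_size, chunk_type=list):
    """Yield chunk_type groups of up to chunk_size items from any iterable."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    iterator = iter(iterable)
    while True:
        # islice pulls at most chunk_size items and never needs len().
        chunk = chunk_type(islice(iterator, chunk_size))
        if not chunk:
            return  # the source is exhausted
        yield chunk


# A generator has no len() and cannot be sliced, yet it chunks fine.
stream = (n for n in range(7))
groups = list(chunked(stream, 3, tuple))
```

Because the function never calls len() or slices its argument, it also works on file objects, database cursors and other one-pass sources.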



Copyright © Abhijeet Patel | Powered by: GeeksWithBlogs.net