Geeks With Blogs

Bill Osuch - Random geek notes

If you're working with large chunks of data, eventually you'll probably use a foreach loop to iterate through an enumerable data source and execute the same actions on each item (for example, doing something to every DataRow in a DataSet). With the new Task Parallel Library (TPL) in .NET 4, you can run these loops in parallel and often get a noticeable improvement in speed.

.NET has supported parallel programming since version 1.0, but the developer often had to do extensive work to create, manage, and monitor the threads. With the TPL, you can often make a loop multi-threaded by changing a single statement.
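For a sense of what that "extensive work" looks like, here is a minimal sketch of the manual approach next to the TPL one. ManualSum splits an array into one chunk per processor, starts a Thread for each chunk, and joins them all; TplSum hands the same work to Parallel.ForEach. The sum-of-squares workload and both function names are hypothetical, and the sketch uses C# top-level statements, so it assumes C# 9 (.NET 5) or later rather than the Visual Studio 2010 toolchain.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

int[] items = new int[1000];
for (int i = 0; i < items.Length; i++) items[i] = i;

// Manual threading: partition, start, and join the threads yourself
long ManualSum(int[] source)
{
    long total = 0;
    int threadCount = Environment.ProcessorCount;
    int chunkSize = (source.Length + threadCount - 1) / threadCount;
    var threads = new Thread[threadCount];

    for (int t = 0; t < threadCount; t++)
    {
        // Declared inside the loop so each thread captures its own bounds
        int start = t * chunkSize;
        int end = Math.Min(start + chunkSize, source.Length);
        threads[t] = new Thread(() =>
        {
            for (int i = start; i < end; i++)
                Interlocked.Add(ref total, (long)source[i] * source[i]);
        });
        threads[t].Start();
    }

    foreach (Thread thread in threads)
        thread.Join();   // wait for every chunk to finish
    return total;
}

// TPL: partitioning, thread management, and waiting are handled for you
long TplSum(int[] source)
{
    long total = 0;
    Parallel.ForEach(source, value => Interlocked.Add(ref total, (long)value * value));
    return total;
}

long manual = ManualSum(items);
long tpl = TplSum(items);
Console.WriteLine($"{manual} {tpl}");
```

Both calls should print the same total; the manual version also has to get chunk boundaries and closure captures right, which is the bookkeeping the TPL takes over.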

Let's start with a simple example: I'm going to create a string array of 900,000 six-digit numbers (from "000000" to "899999"), then loop through the array with a foreach loop and compute an MD5 hash of each string. It's just a simple task that I know will take a few seconds to execute.

First, I created two console apps, one called "NonParallelForEach" and the other called "ParallelForEach". Both use the same method to generate the list of numbers:

        // Returns "000000" through "899999" as zero-padded strings
        private static string[] GenerateNumberList()
        {
            string[] numbers = new string[900000];
            for (int x = 0; x < 900000; x++)
            {
                numbers[x] = x.ToString("000000");
            }
            return numbers;
        }

In the non-parallel test, I then loop through the list and hash each string:

        // Requires: using System; using System.Diagnostics;
        //           using System.Security.Cryptography; using System.Text;
        static void Main(string[] args)
        {
            // The timing covers building the list as well as the loop
            Stopwatch stopwatch = new Stopwatch();
            stopwatch.Start();

            Console.WriteLine("Building numbers list");
            string[] numbers = GenerateNumberList();

            Console.WriteLine("Starting foreach loop");
            foreach (string currentString in numbers)
            {
                MD5 md5Hasher = MD5.Create();
                byte[] data = md5Hasher.ComputeHash(Encoding.UTF8.GetBytes(currentString));
                StringBuilder sBuilder = new StringBuilder();

                // Format the hash as hex; the string is never used, since the
                // point is simply to give each iteration some CPU work
                for (int i = 0; i < data.Length; i++)
                    sBuilder.Append(data[i].ToString("x2"));
            }

            stopwatch.Stop();
            Console.WriteLine("Time elapsed: {0}", stopwatch.Elapsed);
            Console.ReadLine();
        }

On average, this test took 12.8 seconds.
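The loop body is easier to reason about (and to test) when pulled out into its own function. The sketch below is a hypothetical Md5Hex helper that does the same work as the body above (UTF-8 encode, hash, format each byte with "x2"), disposes the MD5 instance with a using statement, and is checked against the test vectors published in RFC 1321. It uses top-level statements, so it assumes C# 9 or later.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: the same work as the foreach body, returned as a string
string Md5Hex(string input)
{
    using (MD5 md5Hasher = MD5.Create())
    {
        byte[] data = md5Hasher.ComputeHash(Encoding.UTF8.GetBytes(input));
        var sBuilder = new StringBuilder();
        for (int i = 0; i < data.Length; i++)
            sBuilder.Append(data[i].ToString("x2"));   // two lowercase hex digits per byte
        return sBuilder.ToString();
    }
}

string emptyHash = Md5Hex("");
string abcHash = Md5Hex("abc");
Console.WriteLine(abcHash);   // RFC 1321 gives 900150983cd24fb0d6963f7d28e17f72
```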

Now I'm going to change a couple of lines so this runs in parallel. Parallel lives in the System.Threading.Tasks namespace, so you'll need a using directive for it, and the simplest overload's signature looks like this:

 public static ParallelLoopResult ForEach<TSource>(
  IEnumerable<TSource> source,
  Action<TSource> body
 )
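Because body is just an Action<TSource>, a lambda isn't required: any method that takes a single TSource argument will do, and you can spell out the type argument if you prefer. This sketch passes a hypothetical CountIfEven method group and checks the ParallelLoopResult that ForEach returns. It uses top-level statements, so it assumes C# 9 or later.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

string[] numbers = { "000000", "000001", "000002", "000003", "000004" };
int evenCount = 0;

// Hypothetical per-item method; its signature matches Action<string>
void CountIfEven(string currentString)
{
    if (int.Parse(currentString) % 2 == 0)
        Interlocked.Increment(ref evenCount);   // shared counter, so update it atomically
}

// Explicit type argument and a method group instead of a lambda
ParallelLoopResult result = Parallel.ForEach<string>(numbers, CountIfEven);

Console.WriteLine($"Completed: {result.IsCompleted}, even: {evenCount}");
```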


So you'll be changing this:

 foreach (string currentString in numbers)
 {
 }

to this:

 Parallel.ForEach(numbers, currentString =>
 {
 });
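One catch when converting a loop this way: the body is now a lambda, so `continue` and `break` no longer compile inside it. A `return` plays the role of `continue`, and the overload whose body also receives a ParallelLoopState lets you call Break() in place of `break`. In this hypothetical sketch, odd values are skipped with `return`, and the item 50 calls Break(), after which every iteration before it is still guaranteed to run. It uses top-level statements, so it assumes C# 9 or later.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

int[] values = new int[1000];
for (int i = 0; i < values.Length; i++) values[i] = i;   // values[i] == i, so items double as indexes

bool[] visited = new bool[values.Length];
int evenVisited = 0;

// The (item, loopState) overload gives access to ParallelLoopState
ParallelLoopResult result = Parallel.ForEach(values, (value, loopState) =>
{
    visited[value] = true;       // each slot is written by exactly one iteration

    if (value == 50)
    {
        loopState.Break();       // instead of `break`: lower iterations still run
        return;
    }

    if (value % 2 != 0)
        return;                  // instead of `continue`: skip odd values

    Interlocked.Increment(ref evenVisited);
});

Console.WriteLine($"IsCompleted: {result.IsCompleted}, LowestBreakIteration: {result.LowestBreakIteration}");
```

Iterations after the break index may or may not have run by the time Break() takes effect, so only the lower ones are guaranteed.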

The whole thing looks almost identical except for the loop:

        // Requires: using System; using System.Diagnostics;
        //           using System.Security.Cryptography; using System.Text;
        //           using System.Threading.Tasks;
        static void Main(string[] args)
        {
            Stopwatch stopwatch = new Stopwatch();
            stopwatch.Start();

            Console.WriteLine("Building numbers list");
            string[] numbers = GenerateNumberList();

            Console.WriteLine("Starting Parallel.ForEach loop");
            Parallel.ForEach(numbers, currentString =>
                {
                    // Each iteration creates its own MD5 instance; hash objects are
                    // not thread-safe, so one must not be used by several threads at once
                    MD5 md5Hasher = MD5.Create();
                    byte[] data = md5Hasher.ComputeHash(Encoding.UTF8.GetBytes(currentString));
                    StringBuilder sBuilder = new StringBuilder();

                    for (int i = 0; i < data.Length; i++)
                        sBuilder.Append(data[i].ToString("x2"));
                });

            stopwatch.Stop();
            Console.WriteLine("Time elapsed: {0}", stopwatch.Elapsed);
            Console.ReadLine();
        }

Executing this way took an average of 8.7 seconds, cutting the run time by about 32% (roughly a 1.5× speedup). For reference, I'm running a dual-core Intel T9800 at 2.93 GHz; a quad-core system might see an even bigger gain.
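Creating a new MD5 object on every iteration is what keeps the parallel version safe (hash objects aren't thread-safe), but it also means 900,000 separate instances. One alternative, sketched here and not timed, is the ForEach overload with localInit and localFinally delegates: each worker task creates one MD5 up front, reuses it for every item it processes, and disposes it when it finishes. The index-aware body overload is used so each result lands in the same slot as its input. The 10,000-item array and the hashes/hasherCount variables are illustrative, and the top-level statements assume C# 9 or later.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

string[] numbers = new string[10000];
for (int x = 0; x < numbers.Length; x++)
    numbers[x] = x.ToString("000000");

string[] hashes = new string[numbers.Length];
int hasherCount = 0;   // how many MD5 instances the loop created

Parallel.ForEach(
    numbers,
    // localInit: runs once per worker task, not once per item
    () =>
    {
        Interlocked.Increment(ref hasherCount);
        return MD5.Create();
    },
    // body: receives the item, loop state, item index, and this task's MD5
    (currentString, loopState, index, md5Hasher) =>
    {
        byte[] data = md5Hasher.ComputeHash(Encoding.UTF8.GetBytes(currentString));
        hashes[index] = BitConverter.ToString(data).Replace("-", "").ToLowerInvariant();
        return md5Hasher;   // pass the same instance on to this task's next item
    },
    // localFinally: runs once per worker task when it is done
    md5Hasher => md5Hasher.Dispose());

Console.WriteLine($"Hashed {hashes.Length} strings with {hasherCount} MD5 instances");
```

Whether this beats one MD5 per iteration on a given machine is something to measure rather than assume.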

Next I wanted to see what would happen if database hits were involved: would locking actually slow the process down? I created two more console apps ("DatabaseParallel" and "DatabaseNonParallel"), and had each of them get a DataSet from the AdventureWorks database:

        // Requires: using System.Data; using System.Data.SqlClient;
        // _connectionString is a class field holding the connection string (not shown)
        private DataSet GetDataSet()
        {
            string sqlText = "select AddressID, AddressLine1, City, StateProvinceID, PostalCode, ModifiedDate from Person.Address";

            DataSet ds = new DataSet();
            using (SqlConnection myConn = new SqlConnection(_connectionString))
            using (SqlCommand cmd = new SqlCommand(sqlText, myConn))
            using (SqlDataAdapter adapter = new SqlDataAdapter(cmd))
            {
                cmd.CommandType = CommandType.Text;
                myConn.Open();
                adapter.Fill(ds);
            } // disposing closes the connection, even if Fill throws

            return ds;
        }

Then I loop through each DataRow, open a DataReader, and then do an update:

        // Requires: using System; using System.Data; using System.Data.SqlClient;
        //           using System.Diagnostics;
        public DatabaseNonParallel()
        {
            Stopwatch stopwatch = new Stopwatch();
            stopwatch.Start();

            DataSet ds = GetDataSet();

            foreach (DataRow addressRow in ds.Tables[0].Rows)
            {
                string firstName = "";
                int addressId = Convert.ToInt32(addressRow["AddressID"]);

                // Look up the first name of the person at this address
                string sql = @"select P.FirstName
                                from Person.BusinessEntityAddress BEA
                                join Person.BusinessEntity BE on BE.BusinessEntityID = BEA.BusinessEntityID
                                join Person.Person P on P.BusinessEntityID = BE.BusinessEntityID
                                where BEA.AddressID = @addressId";

                using (SqlConnection myConnection = new SqlConnection(_connectionString))
                {
                    myConnection.Open();

                    using (SqlCommand myCommand = new SqlCommand(sql, myConnection))
                    {
                        myCommand.Parameters.Add("@addressId", SqlDbType.Int).Value = addressId;
                        using (SqlDataReader myReader = myCommand.ExecuteReader())
                        {
                            if (myReader.Read())
                                firstName = myReader.GetSqlString(0).ToString();
                        }
                    }

                    // Write AddressLine1 back unchanged, just to generate update traffic.
                    // SqlDbType (not DbType) selects the typed SqlParameter overload,
                    // and ExecuteNonQuery is the right call for an UPDATE.
                    sql = "Update Person.Address set AddressLine1 = @Address where AddressID = @addressId";
                    using (SqlCommand myCommand = new SqlCommand(sql, myConnection))
                    {
                        myCommand.Parameters.Add("@Address", SqlDbType.NVarChar).Value = addressRow["AddressLine1"].ToString();
                        myCommand.Parameters.Add("@addressId", SqlDbType.Int).Value = addressId;
                        myCommand.ExecuteNonQuery();
                    }
                }
            }

            stopwatch.Stop();
            Console.WriteLine("Time elapsed: {0}", stopwatch.Elapsed);
            Console.ReadLine();
        }

Yes, it's not really realistic code, but I just wanted something that would take some time. Executing this took an average of 12.5 seconds.

Then I changed the foreach loop again:

            Parallel.ForEach(ds.Tables[0].AsEnumerable(), addressRow =>
            {
                // ...same loop body as the foreach version...
            });

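Notice this uses ds.Tables[0].AsEnumerable() instead of ds.Tables[0].Rows: Parallel.ForEach expects an IEnumerable<TSource>, and the Rows collection (a DataRowCollection) only implements the non-generic IEnumerable. Other than the loop statement, all other code is identical. In tests, the parallel version took an average of 3.8 seconds, a whopping 70% savings (roughly a 3.3× speedup)! That is more than two cores could give purely CPU-bound work, which suggests most of each iteration is spent waiting on SQL Server, and the parallel iterations overlap that waiting.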
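AsEnumerable() is an extension method from the System.Data.DataSetExtensions assembly; if that reference isn't available, LINQ's Cast<DataRow>() is another way to turn the non-generic Rows collection into an IEnumerable<DataRow>. The sketch below builds a small in-memory table (so no database is needed) and iterates it both ways. It only reads rows inside the loop, since DataTable is documented as safe for multithreaded reads while writes must be synchronized. It also passes a ParallelOptions with MaxDegreeOfParallelism, which could cap how many connections the database version opens at once; the limit of 4 and the table contents are assumptions for illustration, and the top-level statements assume C# 9 or later.

```csharp
using System;
using System.Data;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Stand-in for ds.Tables[0]: an in-memory table with an AddressID column
var table = new DataTable("Address");
table.Columns.Add("AddressID", typeof(int));
for (int id = 1; id <= 500; id++)
    table.Rows.Add(id);

long sumViaAsEnumerable = 0;
long sumViaCast = 0;
int active = 0, maxActive = 0;

var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };

// AsEnumerable() and Field<T>() come from the DataSetExtensions assembly
Parallel.ForEach(table.AsEnumerable(), options, addressRow =>
{
    int now = Interlocked.Increment(ref active);

    // Record the highest number of iterations seen running at once
    int seen;
    while (now > (seen = Volatile.Read(ref maxActive)))
        Interlocked.CompareExchange(ref maxActive, now, seen);

    // Reads only: DataTable tolerates concurrent readers, not concurrent writers
    Interlocked.Add(ref sumViaAsEnumerable, addressRow.Field<int>("AddressID"));

    Interlocked.Decrement(ref active);
});

// Cast<DataRow>() works on the non-generic DataRowCollection as well
Parallel.ForEach(table.Rows.Cast<DataRow>(), addressRow =>
    Interlocked.Add(ref sumViaCast, (int)addressRow["AddressID"]));

Console.WriteLine($"{sumViaAsEnumerable} {sumViaCast} max concurrent: {maxActive}");
```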

I found one interesting thing: if you put a Console.WriteLine inside your parallel loop, it can actually run slower than the non-parallel version. Console output is synchronized, so the parallel threads end up fighting each other for access to the console.
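If you do need per-item output, one common workaround (not benchmarked here) is to collect the messages in a thread-safe collection inside the loop and write them out once the loop has finished, so the iterations never wait on the console. ConcurrentQueue<T> comes from System.Collections.Concurrent; the message text and the 20-item input are made up for the example, and the top-level statements assume C# 9 or later.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

string[] numbers = Enumerable.Range(0, 20).Select(x => x.ToString("000000")).ToArray();
var messages = new ConcurrentQueue<string>();

Parallel.ForEach(numbers, currentString =>
{
    // ... per-item work would go here ...
    messages.Enqueue("Processed " + currentString);   // thread-safe, no console involved
});

// Write everything from a single thread once the loop is done.
// The order is whatever order iterations finished in, not input order.
foreach (string message in messages)
    Console.WriteLine(message);
```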

Posted on Monday, September 26, 2011 12:15 PM | Visual Studio 2010, C#, .Net


Comments on this post: Speeding up ForEach loops with parallel programming - Task Parallel Library

# re: Speeding up ForEach loops with parallel programming - Task Parallel Library
Great article!

One problem for .NET developers, though: can you show the same code using VB.NET?

Getting Parallel.ForEach right in VB.NET is a little tricky. In particular, I'm after iterating through a collection of DataRows in a DataTable.
Left by Nelson on Jul 20, 2017 8:29 PM

# re: Speeding up ForEach loops with parallel programming - Task Parallel Library
Can I update DataRow values within a parallel loop? From what I've read on other technical websites, DataTable is not thread-safe, so either you can't do it or you have to use a lock.

And even if I can, will I get a performance improvement?


Left by Shyam S on Sep 01, 2017 7:58 AM




Copyright © Bill Osuch | Powered by: GeeksWithBlogs.net