Geeks With Blogs

News
Joe Mayo Cloud and Glue

When first working with the Twitter API, I thought that using SinceID would be an effective way to page through timelines. In practice it doesn’t work well for various reasons. To explain why, Twitter published an excellent document that is a must-read for anyone working with timelines:

Twitter Documentation: Working with Timelines

This post shows how to implement the recommended strategies in that document by using LINQ to Twitter. You should read the document in it’s entirety before moving on because my explanation will start at the bottom and work back up to the top in relation to the Twitter document. What follows is an explanation of SinceID, MaxID, and how they come together to help you efficiently work with Twitter timelines.

The Role of SinceID

Specifying SinceID says to Twitter, “Don’t return tweets earlier than this”. What you want to do is store this value after every timeline query set so that it can be reused on the next set of queries.  The next section will explain what I mean by query set, but a quick explanation is that it’s a loop that gets all new tweets. The SinceID is a backstop to avoid retrieving tweets that you already have. Here’s some initialization code that includes a variable named sinceID that will be used to populate the SinceID property in subsequent queries:

            // last tweet processed on previous query set
            ulong sinceID = 210024053698867204;

            ulong maxID;
            const int Count = 10;
            var statusList = new List<status>();

Here, I’ve hard-coded the sinceID variable, but this is where you would initialize sinceID from whatever storage you choose (i.e. a database).

The first time you ever run this code, you won’t have a value from a previous query set. Initially setting it to 1 might sound like a good idea, but what if you’re querying a timeline with lots of tweets? Because of the number of tweets and rate limits, your query set might take a very long time to run. A caveat might be that Twitter won’t return an entire timeline back to Tweet #1, but rather only go back a certain period of time, the limits of which are documented for individual Twitter timeline API resources. So, to initialize SinceID at too low of a number can result in a lot of initial tweets, yet there is a limit to how far you can go back. What you’re trying to accomplish in your application should guide you in how to initially set SinceID. I have more to say about SinceID later in this post.

The other variables initialized above include the declaration for MaxID, Count, and statusList. The statusList variable is a holder for all the timeline tweets collected during this query set. You can set Count to any value you want as the largest number of tweets to retrieve, as defined by individual Twitter timeline API resources. To effectively page results, you’ll use the maxID variable to set the MaxID property in queries, which I’ll discuss next.

Initializing MaxID

On your first query of a query set, MaxID will be whatever the most recent tweet is that you get back. Further, you don’t know what MaxID is until after the initial query. The technique used in this post is to do an initial query and then use the results to figure out what the next MaxID will be.  Here’s the code for the initial query:

            var userStatusResponse =
                (from tweet in twitterCtx.Status
                 where tweet.Type == StatusType.User &&
                       tweet.ScreenName == "JoeMayo" &&
                       tweet.SinceID == sinceID &&
                       tweet.Count == Count
                 select tweet)
                .ToList();

            statusList.AddRange(userStatusResponse);

            // first tweet processed on current query
            maxID = userStatusResponse.Min(
                status => ulong.Parse(status.StatusID)) - 1;

The query above sets both SinceID and Count properties. As explained earlier, Count is the largest number of tweets to return, but the number can be less. A couple reasons why the number of tweets that are returned could be less than Count include the fact that the user, specified by ScreenName, might not have tweeted Count times yet or might not have tweeted at least Count times within the maximum number of tweets that can be returned by the Twitter timeline API resource. Another reason could be because there aren’t Count tweets between now and the tweet ID specified by sinceID. Setting SinceID constrains the results to only those tweets that occurred after the specified Tweet ID, assigned via the sinceID variable in the query above.

The statusList is an accumulator of all tweets receive during this query set. To simplify the code, I left out some logic to check whether there were no tweets returned. If  the query above doesn’t return any tweets, you’ll receive an exception when trying to perform operations on an empty list. Yeah, I cheated again.

Besides querying initial tweets, what’s important about this code is the final line that sets maxID. It retrieves the lowest numbered status ID in the results. Since the lowest numbered status ID is for a tweet we already have, the code decrements the result by one to keep from asking for that tweet again. Remember, SinceID is not inclusive, but MaxID is. The maxID variable is now set to the highest possible tweet ID that can be returned in the next query. The next section explains how to use MaxID to help get the remaining tweets in the query set.

Retrieving Remaining Tweets

Earlier in this post, I defined a term that I called a query set. Essentially, this is a group of requests to Twitter that you perform to get all new tweets. A single query might not be enough to get all new tweets, so you’ll have to start at the top of the list that Twitter returns and keep making requests until you have all new tweets.

The previous section showed the first query of the query set. The code below is a loop that completes the query set:

            do
            {
                // now add sinceID and maxID
                userStatusResponse =
                    (from tweet in twitterCtx.Status
                     where tweet.Type == StatusType.User &&
                           tweet.ScreenName == "JoeMayo" &&
                           tweet.Count == Count &&
                           tweet.SinceID == sinceID &&
                           tweet.MaxID == maxID
                     select tweet)
                    .ToList();

                if (userStatusResponse.Count > 0)
                {
                    // first tweet processed on current query
                    maxID = userStatusResponse.Min(
                        status => ulong.Parse(status.StatusID)) - 1;

                    statusList.AddRange(userStatusResponse); 
                }
            }
            while (userStatusResponse.Count != 0 && statusList.Count < 30);

Here we have another query, but this time it includes the MaxID property. The SinceID property prevents reading tweets that we’ve already read and Count specifies the largest number of tweets to return.

Earlier, I mentioned how it was important to check how many tweets were returned because failing to do so will result in an exception when subsequent code runs on an empty list. The code above protects against this problem by only working with the results if Twitter actually returns tweets. Reasons why there wouldn’t be results include: if the first query got all the new tweets there wouldn’t be more to get and there might not have been any new tweets between the SinceID and MaxID settings of the most recent query.

The code for loading the returned tweets into statusList and getting the maxID are the same as previously explained. The important point here is that MaxID is being reset, not SinceID. As explained in the Twitter documentation, paging occurs from the newest tweets to oldest, so setting MaxID lets us move from the most recent tweets down to the oldest as specified by SinceID.

The two loop conditions cause the loop to continue as long as tweets are being read or a max number of tweets have been read.  Logically, you want to stop reading when you’ve read all the tweets and that’s indicated by the fact that the most recent query did not return results.

I put the check to stop after 30 tweets are reached to keep the demo from running too long – in the console the response scrolls past available buffer and I wanted you to be able to see the complete output. Yet, there’s another point to be made about constraining the number of items you return at one time. The Twitter API has rate limits and making too many queries per minute will result in an error from twitter that LINQ to Twitter raises as an exception. To use the API properly, you’ll have to ensure you don’t exceed this threshold. Looking at the statusList.Count as done above is rather primitive, but you can implement your own logic to properly manage your rate limit. Yeah, I cheated again.

Summary

Now you know how to use LINQ to Twitter to work with Twitter timelines. After reading this post, you have a better idea of the role of SinceID - the oldest tweet already received. You also know that MaxID is the largest tweet ID to retrieve in a query. Together, these settings allow you to page through results via one or more queries. You also understand what factors affect the number of tweets returned and considerations for potential error handling logic.

The full example of the code for this post is included in the downloadable source code for LINQ to Twitter.

 

@JoeMayo

Posted on Sunday, September 2, 2012 8:29 PM Twitter , LINQ , LINQ to Twitter | Back to top


Comments on this post: Working with Timelines with LINQ to Twitter

No comments posted yet.
Your comment:
 (will show your gravatar)
 


Copyright © Joe Mayo | Powered by: GeeksWithBlogs.net | Join free