For one of my projects I need to have Business News headlines displayed on the front page, to give me greater control as to which headlines and from which feeds I use I'm going to have the aggregator call each feed and store the article details in a SQL database.
Schemas
I'm going to use schemas in designing my database. The use of these changed in SQL 2005 so that each one is a distinct namespace, independent of the user who created it. This will allow me to group related tables together, but it also allows you to use the same table name for different items. For example in an accounts system you have Sales accounts and Purchase accounts. You could of course create one table called SalesAccount and another called PurchaseAccount. With schemas we create ourselves two schemas one called SalesLedger and one called PurchaseLedger. Each schema has a table called 'Accounts'.
So first task is to create the schema that will hold our News. The following line of T/SQL will create our Namespace;
CREATE SCHEMA [NewsAggregator] AUTHORIZATION [dbo]
GO
Creating our tables
Now we have our schema the next thing is the tables that will store the feeds we are going to read, and the news headlines. Not a lot special in these ones to be honest. We need two tables; one to store the feeds that we will be aggregating and the second to store the headlines. The following T/SQL will create these tables;
CREATE TABLE [NewsAggregator].[NewsStories](
[Id] [bigint] IDENTITY(1,1) NOT NULL,
[Title] [varchar](127) NOT NULL,
[Link] [varchar](max) NOT NULL,
[Desc] [varchar](max) NULL,
[Supplier] [varchar](20) NOT NULL,
[DateAdded] [datetime] NOT NULL
CONSTRAINT [DF_NewsStories_DateAdded]
DEFAULT (getdate())
) ON [PRIMARY]
CREATE TABLE [NewsAggregator].[NewsFeeds](
[Id] [bigint] IDENTITY(1,1) NOT NULL,
[Link] [varchar](max) NOT NULL,
[Supplier] [varchar](30) NOT NULL,
[Weight] [smallint] NULL
) ON [PRIMARY]
So there is our two tables. I have added a 'weight' field to the table of feeds. How this works will be clear later, however it is here to allow us to control how many articles from each feed we are displaying.
Now when developing I like to keep things separated. For this reason I like to do all my database access via Stored Procedure. The practical upshot of this is that any differences in the dialect of SQL you are using should be hidden behind the stored procedure making it a lot easier to covert the whole thing to another database.
Stored Procedures
Our feeds are retrieved by making a call to RetrieveNewsFeeds. Taking note of the fact that I am using a schema called NewsAggregator the code to create this stored procedure is as follows;
CREATE PROCEDURE [NewsAggregator].[RetrieveNewsFeeds]
AS
BEGIN
SET NOCOUNT ON;
SELECT Id,
Link,
Supplier
FROM NewsAggregator.NewsFeeds
END
This simply returns the results of select statement. I'm not interested in the order so to speed things along no sorting is done. Next having retrieved our news feeds we need to have a means of writing the headlines to our database;
CREATE PROCEDURE [NewsAggregator].[WriteNewsStory]
@Title VarChar(127),
@Link VarChar(MAX),
@Desc VarChar(MAX),
@Supplier VarChar(20)
AS
BEGIN
Declare @StoryCount BigInt
SET NOCOUNT ON;
Set @StoryCount = (Select Count(*) As Counter From NewsAggregator.NewsStories
Where Link = @Link)
If @StoryCount = 0
Begin
INSERT INTO NewsAggregator.NewsStories
(Title, Link, [Desc], Supplier)
VALUES (@Title,@Link,@Desc,@Supplier)
End
END
This is fairly simple as a stored procedure. We first look for the story link in the database to see if it has been read before. If not then the story gets added to the database.
Now we have our feed suppliers, and we can write the stories to our database. The final stored procedure is to recover the files from the database.
First off the procedure opens up a cursor to access the list of feed suppliers. Normally I would not use a cursor which are notoriously slow in SQL Server but it does provide a simple mechanism for progressing through a table.
CREATE PROCEDURE [NewsAggregator].[RetrieveLatestStories]
AS
BEGIN
Declare @Supplier VarChar(30)
Declare @Weight SmallInt
Declare @Result As Table
(
Title VarChar(127),
Link VarChar(Max),
[Desc] VarChar(Max)
)
Declare FeedSuppliers Cursor For
SELECT Supplier,
Weight
FROM NewsAggregator.NewsFeeds
Open FeedSuppliers
Fetch FeedSuppliers Into @Supplier,@Weight
While @@Fetch_Status = 0
Begin
Insert Into @Result (Title,Link,[Desc])
Select Top (@Weight)Title,Link,[Desc]
From NewsAggregator.NewsStories
Where Supplier = @Supplier
Order By DateAdded Desc
Fetch FeedSuppliers Into @Supplier,@Weight
End
Close FeedSuppliers
Deallocate FeedSuppliers
Select * From @Result
END
While we are moving through the table of suppliers we retrieve from the list of stories the latest for each supplier based on the weight. If the weight is 1 then we retrieve 1 story, 2 and 2 stories 3, 3 stories etc. This weight places a bias in the display of stories allowing us to control the stories from each supplier. and the following will store the news headlines