Molnar Tibor blog

Efficient database population for volume test

I had an interesting problem to solve: working on an enterprise solution in financial services, we planned to execute volume tests against a test database on SQL Server 2000 SP3. The database has around 70 tables, and some of them are expected to be filled with millions of records over the next few years.

Setting up the database requires quite complex reference data, on top of which the mass data is inserted into around 10 tables, and in order to simulate the real-life scenario it is important to obey certain restrictions. One example: 10% of the data is inserted by a single customer, 50% by the next 10 customers, and the rest is distributed equally among another 100 customers. There are several other rules like that.
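
To make that kind of rule concrete, here is a minimal TSQL sketch that works out how many rows each customer should own; the group names, the column layout and the 10-million-row total are assumptions made up for the illustration, not the real setup:

    -- Hypothetical sketch: a weight table describing how many rows each
    -- customer group should receive out of an assumed 10,000,000 rows.
    CREATE TABLE #CustomerShare (
        GroupName     varchar(20),
        CustomerCount int,           -- customers in this group
        GroupPercent  decimal(5,2)   -- share of the total volume
    )

    INSERT INTO #CustomerShare VALUES ('TopCustomer',   1, 10.00) -- 10% for one customer
    INSERT INTO #CustomerShare VALUES ('NextTen',      10, 50.00) -- 50% split over the next 10
    INSERT INTO #CustomerShare VALUES ('Remaining',   100, 40.00) -- the rest over another 100

    DECLARE @TotalRows int
    SET @TotalRows = 10000000

    -- Rows that a single customer in each group should own.
    SELECT GroupName,
           RowsPerCustomer = @TotalRows * GroupPercent / 100 / CustomerCount
    FROM   #CustomerShare

    DROP TABLE #CustomerShare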

The problem was to create a maintainable set of code (TSQL and/or C#) that can build the volume test database in a reasonable time while respecting the restrictions mentioned above. These scripts are used in different environments to create the database for volume tests.

My first instinct was to write a nice console application which populates the database; of course, after 10 minutes I realized that the database population would take days or maybe more. The next idea was to put all the population logic in a DB script, but it was cumbersome to implement all the required logic in TSQL, and it was also pretty slow due to the frequent inserts into the DB. The solution I ended up with is the following:

  1. Run the DB creation script (schema and stored procedures) without creating the indexes, primary keys and foreign keys
  2. Run the creation script for the low-volume reference data
  3. Run a DTS package to import other reference data coming from a financial services authority, more than 1 million rows in 5 tables
  4. Run a TSQL script which generates initial data (users, companies, etc.) based on the financial services authority reference data, tens of thousands of rows
  5. Run a console application which creates the comma-delimited text files for the mass bulk import. Here is a trick: because this is the bulk of the data, in order to speed up the import I generate only the primary and foreign keys; the rest of the columns are left empty (a good side effect is that the files are much smaller this way, only a few hundred MB for the biggest ones). In total we are talking about tens of millions of records
  6. Run a bulk insert script, which imports the text files and then updates the imported rows with indexed default values (like Company1, Company2, etc.); a TSQL sketch of this step and the next follows the list
  7. Run a script to create the indexes, primary keys and foreign keys
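
To make steps 6 and 7 more concrete, here is a minimal TSQL sketch; the table name, file path and column names are hypothetical, not the real schema:

    -- Step 6 (sketch): load a comma-delimited file that contains only the
    -- key columns; all other columns were created as NULLable so they can
    -- stay empty during the import.
    BULK INSERT dbo.CustomerOrder
    FROM 'D:\VolumeTest\CustomerOrder.txt'
    WITH (
        FIELDTERMINATOR = ',',
        ROWTERMINATOR   = '\n',
        TABLOCK                  -- reduce locking overhead during the load
    )

    -- Step 7 (sketch): keys and indexes are created only after the data is in,
    -- instead of being maintained during tens of millions of inserts.
    ALTER TABLE dbo.CustomerOrder
        ADD CONSTRAINT PK_CustomerOrder PRIMARY KEY CLUSTERED (OrderId)

    ALTER TABLE dbo.CustomerOrder
        ADD CONSTRAINT FK_CustomerOrder_Customer
            FOREIGN KEY (CustomerId) REFERENCES dbo.Customer (CustomerId)

    CREATE NONCLUSTERED INDEX IX_CustomerOrder_CustomerId
        ON dbo.CustomerOrder (CustomerId)

The point of this ordering is that the clustered primary key and the indexes are built once over the fully loaded table, rather than being updated row by row during the import.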

Again, the interesting part is to create big comma-delimited text files containing ONLY the primary and foreign keys of the rows, import those files using bulk insert, and then update the rows with a single update statement with indexed default values. What was important is the logic behind creating the relations between the tables (primary and foreign keys), because that is what produces the required real-life organisation of the data.
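
As an illustration of that single update, a sketch against a hypothetical Company table (table and column names are assumed for the example) could look like this:

    -- Sketch: fill the non-key columns of the bulk-loaded rows in one pass,
    -- deriving readable default values (Company1, Company2, ...) from the key.
    UPDATE dbo.Company
    SET    Name = 'Company' + CAST(CompanyId AS varchar(10)),
           Code = 'C' + CAST(CompanyId AS varchar(10))
    WHERE  Name IS NULL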

For a smaller test database (30 GB) the whole process runs in under an hour. Maintenance is not a big effort because the schema creation scripts are generated; the biggest effort is keeping the generated text files in sync with the database schema, because the system is still under development.

Posted on Tuesday, August 1, 2006 5:54 AM
