Send Lots of Data to a Stored Procedure

I ran across an issue where I need to send lots of data to a stored procedure in SQL Server for processing. All of this is for a site called www.YourStatManager.com. This is a break down of how to do it. Unfortunately, at the time, .Net Core didn’t support using a DataTable with a stored procedure but there are alternatives.

The problem that needs solving

The scenario is I need to create a program that will allow a baseball coach to upload a flat data file of player swings that contain data points concerning a batter hitting a pitched ball. Sensors create data points that record the angle of the hit, the speed of the pitch, and about a hundred other varying data points. Data is analyzed and shown to the coach to compare players. Initial tests had this import taking many minutes to complete. Minutes in a web upload is forever and could lead to a user thinking there is a problem when there is not.

When do you decide performance is a problem?

One of the things to deal with in creating a program is performance. You need to decide when to add complexity to a program to deal with a performance issue. Is it worth it to make code more complex, potentially bug prone, to counter what could be an acceptable performance issue? I mean, lets face it, everything takes time.

When writing lots of data to a database, a lot of the time can simply be the network latency between your client and the database server. You need to weigh simplicity against performance. Writing a simple insert statement may be fast enough. If it is fast enough, then do an insert or update statement, or merge, and move on. However, if you are writing lots of data in a loop, then each time you execute a statement that data is sent over the network to the database. That can take time, and in a loop, time adds up.

Break-down of what happens in an upload

So what I will explain here is how to send a batch of records to a stored procedure all at once for processing on the database side of things. That means one trip across the network and a reduction of overhead in the communication between client and database. If you are doing lots of work with this data, it could be more than simply one insert statement you are avoiding. In my case, I am sending player data that represents a swing of a baseball bat against a pitch.

For each swing record in the data file I need to:

  • Look for the pitcher record, if not found then create.
  • Look for the batter record, if not found then create.
  • Look for the pitcher’s team record, if not found then create.
  • Look for the batter’s team record, if not found then create.
  • Look for the pitcher/team record, if not found then create.
  • Look for the batter/team record, if not found then create.
  • Look for a batting session record, if not found then create.
  • Create an associated entry to record the swing.

So, 8 trips to the database all related to one record. That means, for the price of sending one record to a stored procedure, I can save 7 trips to the database. Certainly some operations could be checked to see if they are necessary without going to the database.

For example, maybe I’ve already checked to see if a player is created and I have their player id from another iteration of the loop. I can save database lookups that way, but in the end there are actually more things going on than I have mentioned.

Now imagine there are a hundred swings per player and 20 players. It adds up. Now multiply all of that by about 20 batting sessions or games. This all happens when a file is uploaded on a web site, so for usability sake it really needs to be fast. People start to think there is a problem when an upload takes too long. I’ll say the upload time went from many minutes to a matter of seconds in one worst case. Unbearable to ‘Holy cow that was fast!’.

Most of the time we stick to simple, but in this case we needed to address this performance problem and that meant more complex code involving prepping data to send to a stored procedure that receives a user-define table type.

Create the user defined table type

We first need to define a table type to pass as a parameter to the stored procedure. For brevity I will not show all of the fields, just a few key types.

CREATE TYPE [dbo].[SessionEntryImportType] AS TABLE(
 [SwingId] [varchar](38) NOT NULL,
 [EventTime] [datetime2](0) NOT NULL,
 [PaOfInning] [int] NOT NULL,
 [ExitVelocity] [decimal](6, 3) NULL,
 [PitcherSetCd] [tinyint] NULL
 )
 GO

The stored procedure

For brevity in some places I have removed some code pertaining to additional fields. In my case, I used the user-defined table as a lookup in a while loop based on the count of records. There may be other ways to go through the table. I needed to set variables to use in subsequent inserts and queries since the table itself cannot seem to be used as a reference directly (I could be wrong).

CREATE PROCEDURE [dbo].[usp_BulkInsertSessionEntry]
    (@companyId int, @sessionType tinyint, @teamLevel tinyint, @fileMasterId int, @tvpSessionEntry SessionEntryImportType READONLY,@status varchar(100) out)
 AS
 DECLARE @RowsToProcess  int
 DECLARE @CurrentRow     int
 DECLARE @SelectSwingId  varchar(38)
 DECLARE @RowsAdded int
 DECLARE @SportCd tinyint
 SET @RowsAdded = 0;
 SET @status = 'Begin'
 DECLARE @indexTable TABLE (RowID int not null primary key identity(1,1), swingId varchar(38))
 INSERT into @indexTable (swingId) SELECT swingId FROM @tvpSessionEntry
 SET @RowsToProcess=@@ROWCOUNT
 SET @status = CAST(@RowsToProcess AS VARCHAR(6))
 SET @CurrentRow=0
 -- Set SportCd based on teamLevel.  teamLevel > 19 is softball
 IF (@teamLevel < 20) SET @SportCd=1 ELSE SET @SportCd=2
 --Delete any old session with this FileMasterId
 DELETE FROM Session WHERE CompanyId=@companyId AND FileMasterId=@fileMasterId;
 WHILE @CurrentRow<@RowsToProcess
 BEGIN
     SET @CurrentRow=@CurrentRow+1
 DECLARE @sessionId     int
 --we are looping through by rowid, get the swingid for the row
     SELECT
         @SelectSwingId=swingId
         FROM @indexTable
         WHERE RowID=@CurrentRow
     --if the swing is already stored we dont want to do anything else with it here
 SET @status = 'New records found'
 --get the swing results and prepare to store them in SessionEntry
 DECLARE @SwingId varchar(38)
 DECLARE @EventTime datetime2(0)
 DECLARE @EventDate date
 DECLARE @PaOfInning int
 DECLARE @Pitcher varchar(100)
 DECLARE @PitcherIdentifier varchar(38)
 DECLARE @PitcherId int
 DECLARE @PitcherTeam varchar(100)
 DECLARE @PitcherTeamId int
 DECLARE @Batter varchar(100)
 DECLARE @BatterIdentifier varchar(38)
 DECLARE @BatterId int
 DECLARE @BatterTeam varchar(100)
 DECLARE @BatterTeamId int
 DECLARE @ExitVelocity decimal(6, 3)
 DECLARE @PitcherSetCd tinyint
 SELECT
 @SwingId=SwingId ,
 @EventTime=EventTime ,
 @PaOfInning=PaOfInning ,
 @PitchOfPa=PitchOfPa ,
 @Pitcher=Pitcher ,
 @PitcherIdentifier =PitcherIdentifier,
 @PitcherTeam=PitcherTeam ,
 @Batter=Batter ,
 @BatterIdentifier=BatterIdentifier ,
 @BatterTeam =BatterTeam,
 @ExitVelocity=ExitVelocity ,
 @PitcherSetCd=PitcherSetCd
 FROM @tvpSessionEntry WHERE SwingId=@SelectSwingId;
 -- we're considering a 'session' to be date, file id, home team, and away team combination.
 -- entries need to arrive ordered by EventTime so we know when looking at the first entry the pitcher's team is home, hitter's is away.
    SET @EventDate = CONVERT(DATE, @EventTime);
 --Create or get pitcher team
 IF NOT EXISTS(SELECT Id FROM Team WHERE CompanyId=@companyId AND Name=@PitcherTeam)
 BEGIN
 INSERT INTO Team(CompanyId, Name, TeamLevelCd) VALUES (@CompanyId, @PitcherTeam, @teamLevel)
 END
 SELECT @pitcherTeamId=Id FROM Team WHERE CompanyId=@companyId AND Name=@PitcherTeam;
 --Create or get batter team
 IF NOT EXISTS(SELECT Id FROM Team WHERE CompanyId=@companyId AND Name=@BatterTeam)
 BEGIN
 INSERT INTO Team(CompanyId, Name, TeamLevelCd) VALUES (@CompanyId, @BatterTeam, @teamLevel)
 END
 SELECT @batterTeamId=Id FROM Team WHERE CompanyId=@companyId AND Name=@BatterTeam;
 --Create or get sessionId
 IF NOT EXISTS(SELECT Id FROM Session WHERE
 CompanyId=@companyId AND
 SessionTypeCd=@sessionType AND
 SessionDate=@EventDate AND
 HomeTeamId IN (@pitcherTeamId, @batterTeamId) AND
 AwayTeamId IN (@pitcherTeamId, @batterTeamId) AND
 FileMasterId=@fileMasterId)
 BEGIN
    INSERT INTO Session(CompanyId, SessionDate, SessionTypeCd, FileMasterId, TeamLevelCd, HomeTeamId, AwayTeamId) VALUES (@CompanyId, @EventDate , @sessionType, @fileMasterId, @teamLevel, @pitcherTeamId, @batterTeamId)
 END
 SELECT @sessionId=Id FROM Session WHERE CompanyId=@companyId AND SessionTypeCd=@sessionType AND SessionDate=@EventDate AND HomeTeamId IN (@pitcherTeamId, @batterTeamId) AND AwayTeamId IN (@pitcherTeamId, @batterTeamId) AND FileMasterId=@fileMasterId;
 --Create or get batter id
 IF NOT EXISTS(SELECT Id FROM Player WHERE CompanyId=@companyId AND Identifier=@BatterIdentifier)
 BEGIN
 INSERT INTO Player(CompanyId, Identifier, FullName, ItemStatusCd, SportCd) VALUES (@CompanyId, @BatterIdentifier, @Batter, 1, @SportCd)
 END
 SELECT @batterId=Id FROM Player WHERE CompanyId=@companyId AND Identifier=@BatterIdentifier;
 --Add batter to team if necessary
 IF NOT EXISTS(SELECT TeamId FROM TeamPlayer WHERE TeamId=@batterTeamId AND PlayerId=@batterId)
 BEGIN
 INSERT INTO TeamPlayer(TeamId, PlayerId) VALUES (@batterTeamId, @batterId)
 END
 --Create or get pitcher id
 IF NOT EXISTS(SELECT Id FROM Player WHERE CompanyId=@companyId AND Identifier=@PitcherIdentifier)
 BEGIN
 INSERT INTO Player(CompanyId, Identifier, FullName, ItemStatusCd, SportCd) VALUES (@CompanyId, @PitcherIdentifier, @Pitcher, 1, @SportCd)
 END
 SELECT @pitcherId=Id FROM Player WHERE CompanyId=@companyId AND Identifier=@PitcherIdentifier;
 --Add pitcher to team if necessary
 IF NOT EXISTS(SELECT TeamId FROM TeamPlayer WHERE TeamId=@pitcherTeamId AND PlayerId=@pitcherId)
 BEGIN
 INSERT INTO TeamPlayer(TeamId, PlayerId) VALUES (@pitcherTeamId, @pitcherId)
 END
 --At this point we have sessionid, batterid, and pitcherid and we know the record needs to be inserted. For brevity I have removed some fields
 INSERT INTO SessionEntry(
 SessionId, BatterId, PitcherId,
 EventTime, ExitVelocity,PitcherSetCd
 )
 VALUES (
 @sessionId, @batterId, @pitcherId,
 @EventTime, @ExitVelocity, @PitcherSetCd
 );
 SET @RowsAdded = @RowsAdded + 1;
 END
 RETURN @RowsAdded;
 GO

Prepare the c# object as our user-defined table type

Let’s assume we have a list of data in c# and we need to create the data in the right structure to send to a stored procedure. At time of writing, .Net Core could not work with sending a DataTable. The link below will describe how to use a DataTable in .Net Framework. You can see if this way of sending data is available now in .Net Core.

How to pass table value parameters to stored procedure

I needed to come up with a solution that used SqlMetaData and SqlDataRecord. I take a list of type SessionImportData and create an enumerable of type SqlDataRecord that will be sent to the stored procedure that maps to the user-defined table type in SQL Server:

 private static IEnumerable CreateSessionEntrySqlDataRecords(IEnumerable results) {
 var metaData = new SqlMetaData[Enum.GetNames(typeof(SessionImportField)).Length];
 //for brevity I will not show the setup of every field, note index position is defined by an enum.
 metaData[(byte)SessionImportField.SwingId] = new SqlMetaData("SwingId", SqlDbType.VarChar, 38);
 metaData[(byte)SessionImportField.EventTime] = new SqlMetaData("EventTime", SqlDbType.DateTime);
 metaData[(byte)SessionImportField.PaOfInning] = new SqlMetaData("PaOfInning", SqlDbType.Int);
 metaData[(byte)SessionImportField.ExitVelocity] = new SqlMetaData("ExitVelocity", SqlDbType.Decimal, 6, 3);
 metaData[(byte)SessionImportField.PitcherSetCd] = new SqlMetaData("PitcherSetCd", SqlDbType.TinyInt);
 //... other fields
 var record = new SqlDataRecord(metaData);
 var key = new List();
 foreach (var x in results) {
             if (!x.IsValid()) {
                continue;
             }
 //Stored proc needs a unique id for each record because of how it access the table in a loop
 record.AddString((byte)SessionImportField.SwingId, Guid.NewGuid().ToString());
 record.SetDateTime((byte)SessionImportField.EventTime, GetEventTime(x.Date, x.Time));
 record.SetInt32((byte)SessionImportField.PaOfInning, x.PaOfInning);
 record.AddNullableDecimal((byte)SessionImportField.ExitVelocity, x.ExitVelocity);
 record.AddNullableByte((byte)SessionImportField.PitcherSetCd, GetPitcherSetCd(x.PitcherSet));
 //... other fields
 key.Add(x.SwingId);
             yield return record;
          }
       }

C# call to stored procedure

We have code above that takes a list and converts it to a different type that maps to the user-defined table type expected by the database. Let’s create a method to take that enumerable and call the stored procedure. Note that I have an output value that I can use to help debug the stored procedure, how far it gets and any information that might help me figure out what went wrong. Otherwise I have no idea how to step through it.

public async Task BulkInsertSessionEntriesAsync(IEnumerable data, int companyId, SessionType sessionType, TeamLevel teamLevel, int fileMasterId) {
          using (var conn = new SqlConnection(ConnectionString)) {
             conn.Open();
             using (var cmd = conn.CreateCommand()) {
                cmd.CommandType = CommandType.StoredProcedure;
                cmd.CommandText = "usp_BulkInsertSessionEntry";
                cmd.CommandTimeout = 120;
                cmd.Parameters.AddWithValue("@companyId", companyId);
                cmd.Parameters.AddWithValue("@sessionType", (byte)sessionType);
                cmd.Parameters.AddWithValue("@teamLevel", (byte)teamLevel);
                cmd.Parameters.AddWithValue("@fileMasterId", fileMasterId);
                var status = cmd.Parameters.Add("@status", SqlDbType.VarChar, 100);
                status.Direction = ParameterDirection.Output;
                var retVal = cmd.Parameters.Add("@RETURN_VALUE", SqlDbType.Int);
                retVal.Direction = ParameterDirection.ReturnValue;
                var tvpParam = cmd.Parameters.AddWithValue("@tvpSessionEntry", data);
                tvpParam.SqlDbType = SqlDbType.Structured;
                await cmd.ExecuteNonQueryAsync();
                return (int)retVal.Value;
             }
          }
       }

Finally

In the end, the two are put together as follows:

var table = CreateSessionEntrySqlDataRecords(records.OrderBy(x => GetEventTime(x.Date, x.Time)));
 var numInserted = await BulkInsertSessionEntriesAsync(table, companyId, sessionType, teamLevel, fileMasterId.Value);

At least the C# code got shorter, but clearly we moved a lot of code off to the database. That shifts work from the client to the database, a new potential bottle neck and a place where bugs can live. However, the performance difference is night and day and we made the program usable, which is the most important point.


Leave a Reply

Your email address will not be published. Required fields are marked *