SQL Server Backup to SQLite

I was interested in exploring NHTSA Vehicle Data without worrying about the limits placed on their API. They make available a backup of their MS SQL Server database as a .bak file. However, I don’t know SQL Server, and I don’t really want to learn much of it. Also, it’s not necessarily “fun” to run SQL Server locally on Linux.

Here’s a pretty rudimentary solution I came up with. It uses Docker to run a SQL Server container. I was having some issues with folder permissions when attempting to bind mount a local directory, and I honestly didn’t care enough to figure it out, so this ends up being a bit more verbose due to copying files to and from the container.

This MS article is where a lot of the setup came from, and then I just had to do a bunch of experimentation to figure out the rest.

The high-level overview is this:

  • Start a SQL Server 2019 container
  • Restore the backup file to the database
  • List the tables
  • Use bcp to execute a SELECT * on each table
  • Include FOR JSON PATH and INCLUDE_NULL_VALUES in the query
  • Use sqlite-utils to import the JSON to SQLite

SQLite doesn’t offer a large variety of types, and doesn’t strictly enforce those types, which makes it ideal for this scenario. In this case, we’re letting SQL Server do the type conversion by serializing it to JSON, and then letting sqlite-utils convert to appropriate types while deserializing the data into the database.
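To see that flexibility in action, here’s a minimal sketch using Python’s built-in sqlite3 module (the table and column names are made up): a regular, non-STRICT SQLite table happily stores a text value in an INTEGER column.

```python
# Minimal demo of SQLite's loose typing: a value keeps its own storage class
# even when it doesn't match the declared column type (non-STRICT tables).
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Make (Id INTEGER PRIMARY KEY, Wheels INTEGER)")
con.execute("INSERT INTO Make VALUES (1, 4)")
con.execute("INSERT INTO Make VALUES (2, 'unknown')")  # not rejected

print(con.execute("SELECT Wheels, typeof(Wheels) FROM Make ORDER BY Id").fetchall())
# → [(4, 'integer'), ('unknown', 'text')]
```

This is why handing sqlite-utils the serialized JSON and letting it pick column types is low-risk: even when a column’s inferred type doesn’t fit every value, the original values still round-trip intact.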

#!/usr/bin/env bash

# Set a few variables

# container name.
CONTAINER=sql1

# sql server backup file location.
FILENAME=vPICList_lite_2023_03.bak

# sql server password. the intent here is to convert the database to sqlite, so
# who cares what the password is, as long as sql server accepts it. (single
# quotes keep the ! from triggering history expansion in interactive shells.)
PASSWORD='YourStrong!Passw0rd'

# SQLARGS are the common flags that get passed to the different mssql-tools
# utilities to connect to the database.
SQLARGS="-S localhost -U SA -P $PASSWORD"

# bcp() and sqlcmd() could probably be combined, but this worked, and I'm lazy.
# They're essentially the same function to call different utilities in the
# container.
bcp() {
    q="\"$1\""; shift; args="$*"
    bash -c "docker exec -i $CONTAINER /opt/mssql-tools/bin/bcp $q $args $SQLARGS"
}

sqlcmd() {
    q="\"$1\""; shift; args="$*"
    bash -c "docker exec -i $CONTAINER /opt/mssql-tools/bin/sqlcmd -Q $q $args $SQLARGS"
}

# Start the sql server container
docker run \
    -e 'ACCEPT_EULA=Y' \
    -e "MSSQL_SA_PASSWORD=${PASSWORD}" \
    --name $CONTAINER \
    -d \
    mcr.microsoft.com/mssql/server:2019-latest

# Give SQL Server a moment to finish starting before connecting; the restore
# below fails if it runs while the server is still initializing.
sleep 30

# Copy the .bak file into the container
docker exec -i $CONTAINER mkdir -p /var/opt/mssql/backup
docker cp $FILENAME "$CONTAINER:/var/opt/mssql/backup/${FILENAME}"

# List logical file paths in the backup (crude parsing: head/tail trim the
# header and trailer rows, and this breaks if a path contains spaces)
LFILES=$(sqlcmd "RESTORE FILELISTONLY FROM DISK = '/var/opt/mssql/backup/${FILENAME}'" \
    | tr -s ' ' \
    | cut -d ' ' -f 2 \
    | head -n -2 \
    | tail -n +3)

# List logical file names in the backup
LNAMES=$(sqlcmd "RESTORE FILELISTONLY FROM DISK = '/var/opt/mssql/backup/${FILENAME}'" \
    | tr -s ' ' \
    | cut -d ' ' -f 1 \
    | head -n -2 \
    | tail -n +3)

# Create arrays of the logical file names and paths
readarray -t lnames_arr <<< "$LNAMES"
readarray -t lfiles_arr <<< "$LFILES"

# Create the WITH MOVE clause
WITHMOVE=""
for (( i=0; i<${#lnames_arr[@]}; i++ )); do
    lname=$(echo ${lfiles_arr[$i]} | tr '\\' '/' | xargs basename)

    if [ -z "$WITHMOVE" ]; then
        WITHMOVE+="WITH MOVE '${lnames_arr[$i]}' TO '/var/opt/mssql/data/${lname}'"
    else
        WITHMOVE+=", MOVE '${lnames_arr[$i]}' TO '/var/opt/mssql/data/${lname}'"
    fi
done

# Restore the database
RESTORE="RESTORE DATABASE vinlist FROM DISK = '/var/opt/mssql/backup/${FILENAME}' $WITHMOVE"
echo "$RESTORE"
sqlcmd "$RESTORE"

# Verify that the database exists
sqlcmd "SELECT Name FROM sys.Databases"

# List the tables in the database, for manual verification
sqlcmd "SELECT Name FROM vinlist.sys.Tables"

# List the tables in the database to a variable
TABLES=$(sqlcmd "SELECT Name FROM vinlist.sys.Tables" \
    | tr -s ' ' \
    | cut -d ' ' -f 1 \
    | head -n -2 \
    | tail -n +3)

for t in $TABLES; do
    echo $t

    # Dump the table to a JSON file
    bcp "SELECT * FROM vinlist.dbo.$t FOR JSON PATH, INCLUDE_NULL_VALUES" queryout /var/opt/mssql/backup/$t.json -c

    # Copy the JSON file to the host
    docker cp $CONTAINER:/var/opt/mssql/backup/$t.json $t.json > /dev/null

    # Fix the id field. Some tables have both "id" and "Id" fields.
    tr -d '\n' < "$t.json" \
        | sed 's/"id"/"Id"/g' \
        | sqlite-utils insert vinlist.db "$t" --pk=Id -

    # Cleanup
    rm $t.json
    docker exec -i $CONTAINER rm /var/opt/mssql/backup/$t.json
done

# Print the schema to a file. Ideal for sqlc
sqlite3 vinlist.db .schema | tr -d '[]' > schema.sql
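The sed 's/"id"/"Id"/g' step in the loop above works around a real collision: SQLite treats column names case-insensitively, so a table whose JSON rows carry both an "id" and an "Id" key can’t be created as-is. A quick Python sketch of the problem and the fix (the table and field names here are made up):

```python
import json
import sqlite3

# Column names that differ only by case collide in SQLite.
con = sqlite3.connect(":memory:")
try:
    con.execute('CREATE TABLE t ("id" INTEGER, "Id" INTEGER)')
except sqlite3.OperationalError as e:
    print(e)  # e.g. "duplicate column name: Id"

# Normalizing the key first (what the sed step does) makes the duplicate
# keys collapse into one when the JSON is parsed.
raw = '[{"Id": 1, "id": 1, "Name": "Tesla"}]'
rows = json.loads(raw.replace('"id"', '"Id"'))
print(rows)  # [{'Id': 1, 'Name': 'Tesla'}]
```

The same normalization happens in the script as a plain text substitution on the JSON dump before sqlite-utils ever sees it.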