SQLineage.io Build in Public 1

Get introduced to our newest R&D avenue and see the progress so far

By: Tyler Simpson | Date: 06/12/2024



Introduction

At Granum Technologies LLC, our mission is to revolutionize the fintech sector with innovative software solutions that address unmet needs. With multiple possible projects that we introduced in our last blog we are excited to announce we have landed on SQLineage.io. This product aims at providing an affordable, robust solution for tracking data lineage, initially focusing on Azure/Microsoft stack. Please feel free to follow along as we build as our repository is public https://github.com/granum-tech/sqlineage.


The Problem We're Solving

Data lineage is a critical aspect of data governance, providing transparency and accountability in data transformations and movements across various systems. However, existing solutions like Microsoft Purview can be prohibitively expensive for many organizations. SQLineage.io aims to fill this gap by offering a cost-effective, open-source alternative. 


Our Tech Stack

To build SQLineage.io, we've chosen a tech stack that leverages the strengths of several powerful tools and platforms:


Getting Started with MS SQL

For those looking to set up their own test environment, you can choose from several options:

You can find all these options and download SQL Server here.


Our Database Architecture: Medallion Architecture

We are implementing a medallion architecture in our test database, which structures data flow in three layers:

Data moves from Bronze to Silver through stored procedures that handle data cleaning and transformation. From Silver to Gold, we apply aggregations and specific transformations to prepare the data for final use.


Test Database State

Our test database currently includes schemas for each layer, with tables and stored procedures to handle data transformations. Despite its simplicity now, our setup incorporates key SQL features such as:


Exploring the Code

One of the unique capabilities of our codebase is the ability to iterate through all databases in an Azure SQL Server—a feature not natively supported by Azure SQL. We've developed Python scripts that connect to the SQL server, enumerate all databases, and extract metadata for user tables, views, and stored procedures. Our code is not even in it's infancy yet but now that we have completed most of the structural frameworks, chosen our tools, and built out a small test database we can begin building out the code.


Conclusion

At Granum Technologies LLC, we are committed to driving innovation in the fintech sector. SQLineage.io represents our latest effort to democratize access to vital data insights by providing a cost-effective, open-source data lineage solution. As we continue to develop and enhance SQLineage.io, we look forward to sharing our progress and welcoming feedback from the community. Please don't forget to follow along on our public repository https://github.com/granum-tech/sqlineage and stay tuned for more updates as we build out this exciting new tool and continue our journey towards making data governance accessible to all.