Data and Parquet files

All your data, delivered to your data lake.


LoanPro’s data files solution helps you store and analyze historical data from across our platform in your data lake, including loan, line of credit, payment, and customer data. It’s ideal for large-scale reporting and for combining LoanPro data with external sources.

Unlike a traditional database, which is optimized for daily operations, a data lake is optimized for scale. By sending your data in Parquet files (a highly efficient, column-based format), you can store your data more efficiently and retrieve it faster.

You can store your data in its original format, access it in the structure that makes sense for your business, and query it using tools like Amazon Athena or Snowflake. Whether you’re creating dashboards, running audits, or building predictive models, our data files solution gives your team the flexibility to work with your data however you need.
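
For example, once your files land in the lake, any Parquet-aware tool can read them. Below is a minimal sketch using pandas with PyArrow and s3fs; the bucket, path, and column names are hypothetical placeholders for the locations defined during your setup.

    # A minimal sketch: read one LoanPro Parquet table from S3 with pandas.
    # The bucket, path, and column names are hypothetical placeholders for
    # the locations defined during your setup.
    # Requires pandas, pyarrow, and s3fs.
    import pandas as pd

    df = pd.read_parquet(
        "s3://your-data-lake/tenant-123/loan/2025-01-02-08-00-00/",  # hypothetical path
        columns=["id", "displayId", "loanAmount"],  # read only the columns you need
    )
    print(df.head())

Because Parquet is column-based, reading only the columns you need (as above) skips the rest of the file entirely, which is part of what makes it efficient at scale.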

Other data tools

While the data files solution is ideal for large-scale analytics, our other data tools might be a better fit in other situations:

  • UI reports are ideal for clients without the technical personnel needed to structure database information. They’re also useful for ad hoc reports.
  • The API helps connect LoanPro to the rest of your operations. It lets you pull information and make updates in real-time.
  • Relational Database Service (RDS) is a traditional database setup, letting you access all of your data in real time through SQL queries.
  • Data on Demand is LoanPro’s solution for clients who want data from RDS, but may lack the technical resources to configure it themselves. LoanPro team members will configure recurring queries for you, and the data will be delivered to a secure location of your choice.

Together, these tools give clients of every size multiple options for working with their data.


Sign up and configuration

Before receiving your data files, you’ll need to reach out to your LoanPro contact to get started. We’ll work with you to define the right setup based on your environment, data needs, and update frequency. From there, we’ll confirm which datasets will be shared and how they’ll be delivered, and outline everything in your contract to make sure we’re aligned before implementation begins.

Data lake architecture

The files we deliver follow the Medallion Architecture, which organizes data into three layers. Each layer is built to support different use cases, depending on how much processing and transformation you want applied to your data before analysis:

  • Bronze layer (raw data). Stores your data in direct, untouched copies of tables from your source databases. It’s useful when you need a complete, original record for audits, debugging, detailed portfolio performance analysis, or accounting workflows.
  • Silver layer (cleaned data). Data in this layer is joined into more comprehensive, standardized tables. We remove duplicates, align formats, and mask sensitive fields. This layer is better for consistent reporting or when you need to join data across tables for a broader analysis.
  • Gold layer (aggregated data). This layer’s data is grouped, summarized, or calculated for clients with specific business use cases, like portfolio performance, payment behavior, or collections insights. It’s ready for use in dashboards or advanced analysis.

Because Silver and Gold layers involve additional processing and curation, they can add time to your implementation. This layered structure lets you work with your data at the level of detail that fits your workflow.
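
As a rough illustration, each layer typically maps to its own location in your lake, so choosing a layer is just a matter of pointing your query at the right prefix. The prefixes below are hypothetical placeholders, not LoanPro’s actual layout.

    import pandas as pd

    # Hypothetical layer prefixes; your actual paths are defined during setup.
    LAYERS = {
        "bronze": "s3://your-data-lake/bronze/loan/",           # raw copies
        "silver": "s3://your-data-lake/silver/loan/",           # cleaned, standardized
        "gold": "s3://your-data-lake/gold/portfolio_summary/",  # aggregated
    }

    # An audit or accounting workflow reads the untouched bronze copy...
    audit_df = pd.read_parquet(LAYERS["bronze"])

    # ...while a dashboard reads the pre-aggregated gold table directly.
    dashboard_df = pd.read_parquet(LAYERS["gold"])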

Data collection and prep

LoanPro’s data lake is powered by a centralized data pipeline. This pipeline pulls data from your LoanPro environment, prepares it for analysis, and stores it in Amazon S3 in a format that’s optimized for large-scale use. 

Here’s how it works, stage by stage:

Data sources

We pull from multiple systems within LoanPro, including LMS, Connections, and Secure Payments (while maintaining PCI compliance):

  • RDS for structured application data
  • DynamoDB for event-driven data
  • OpenSearch for searchable logs and analytics data

This covers most critical data, but not all of LoanPro’s internal systems. Expanding source coverage is part of our roadmap.

Daily extraction schedule

Data is extracted once per day by default. The extraction window runs between 7:30 and 8:30 a.m. CST, just after our daily system maintenance. We’re exploring options for more frequent or near real-time updates in future versions of the data lake.

Full and delta extraction

When a new table is added, we extract the full dataset to create a baseline. After that, we only pull records that have changed (a delta snapshot), which reduces redundancy and improves performance. As data volumes grow, we continue to evaluate new strategies to maintain performance and minimize extraction time.

Transformation pipeline

Raw data is processed in Apache Iceberg tables, an open table format built for large analytics workloads. These tables are stored as Parquet files, a column-based format that’s efficient to query and store.

Storage structure

Data is organized in S3 using a clear, hierarchical folder structure based on:

  • Your tenant ID
  • The table name
  • The extraction timestamp (formatted as YYYY-MM-DD-HH-mm-ss)

This structure makes it easier to find what you need, track how your data changes over time, and refer back to specific points in time for audit or compliance purposes.
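
For instance, you can browse this hierarchy programmatically. The sketch below uses boto3 to list the extraction timestamps available for one table; the bucket name and tenant ID are hypothetical placeholders.

    import boto3

    s3 = boto3.client("s3")

    # Hypothetical bucket and tenant ID; substitute the values from your setup.
    bucket = "your-data-lake"
    prefix = "tenant-123/loan/"  # tenant ID, then table name

    # With Delimiter="/", each common prefix one level down is a single
    # extraction folder named by its timestamp (YYYY-MM-DD-HH-mm-ss).
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix, Delimiter="/")
    for extraction in resp.get("CommonPrefixes", []):
        print(extraction["Prefix"])  # e.g. tenant-123/loan/2025-01-02-08-00-00/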

Implementation and data sharing details

How your data files are delivered and formatted depends on a few factors, like the platform you’re using and how often you need updates. We’ll help guide you through the setup to make sure it works well with your system.

Here’s a breakdown of key implementation variables:

  • Data platform. If you’re using a platform that supports AWS natively (like Snowflake or Databricks), setup is usually straightforward. If you’re using Azure or GCP, a VPN is required.
  • Data scope. You can request specific tables or columns to limit the amount of data shared and improve performance.
  • Report types. If you need specific or custom reports, let us know up front so we can account for any extra setup.
  • Update frequency. By default, data is extracted once per day, after our daily maintenance window. If you need a different frequency for your use case, we can discuss possible adjustments.

All of these details will be confirmed in your contract, including which data will be shared, how often, and in what format. 

How data sharing works

Data sharing depends on the cloud environment where your data platform is hosted. The setup process is simplest if you're on AWS, but we also support other environments with a few extra steps. Here are two common examples.

Your data platform runs on AWS

If you’re using a platform that runs on AWS (e.g., Snowflake, Databricks), we can share data through a private S3 bucket. In this case, access is granted using IAM policies tied to your AWS account, and no VPN is required. You’ll use your platform’s standard process to connect to external S3 storage. For example, Snowflake allows you to create external tables, and Databricks lets you mount the S3 bucket directly.
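
As a rough sketch of that flow, a script (or your platform’s connector) assumes the IAM role that grants access to the shared bucket and reads from it directly. The role ARN, bucket, and key below are hypothetical placeholders, not real values.

    import boto3

    # Hypothetical role ARN, bucket, and key; the real values come from the
    # IAM setup agreed on with LoanPro. Access is granted through IAM, so no
    # VPN is involved.
    creds = boto3.client("sts").assume_role(
        RoleArn="arn:aws:iam::111122223333:role/loanpro-data-share",
        RoleSessionName="loanpro-data-access",
    )["Credentials"]

    s3 = boto3.client(
        "s3",
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )
    s3.download_file(
        "loanpro-shared-bucket",  # hypothetical bucket name
        "tenant-123/loan/2025-01-02-08-00-00/part-00000.parquet",  # hypothetical key
        "loan.parquet",  # local destination
    )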


Your data platform runs on other clouds

If your platform is hosted outside of AWS, such as in Azure or Google Cloud, a VPN connection is required. Our security policy does not allow public access to S3 buckets, so all data transfers must happen over a private connection. In this setup, you’ll need to create a VPN tunnel between your virtual network and our AWS environment. This allows us to maintain data security and meet compliance requirements, but it does add complexity to setup and may extend implementation timelines. 


Data storage in S3

Your data is stored in Amazon S3 using a consistent folder structure. This structure makes it easy to locate specific tables, track data over time, and manage historical files.

The exact setup depends on your hosting environment (SaaS or VPC).


For both SaaS and VPC environments, initial data extractions will be labeled as “full,” and all following extractions will be “delta” snapshots that include only changed records since the previous pull.

This structure makes it easier to organize large volumes of data, maintain a record of changes, and retrieve historical snapshots when needed.
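
If you want to reconstruct a table’s current state yourself, one approach is to load the full baseline and layer each delta on top, keeping the newest version of every record. Below is a minimal sketch in pandas; the paths and the id and lastUpdated columns are illustrative assumptions, not LoanPro’s actual schema.

    import pandas as pd

    # Hypothetical paths: one "full" baseline extraction plus two later
    # "delta" snapshots. The id and lastUpdated columns are illustrative.
    full = pd.read_parquet("s3://your-data-lake/tenant-123/loan/2025-01-01-08-00-00/")
    deltas = [
        pd.read_parquet("s3://your-data-lake/tenant-123/loan/2025-01-02-08-00-00/"),
        pd.read_parquet("s3://your-data-lake/tenant-123/loan/2025-01-03-08-00-00/"),
    ]

    # Stack the baseline and deltas, then keep only the newest version of
    # each record so later snapshots override earlier ones.
    current = (
        pd.concat([full, *deltas], ignore_index=True)
        .sort_values("lastUpdated")
        .drop_duplicates(subset="id", keep="last")
    )

Query engines like Athena or Snowflake can express the same merge in SQL; the approach is the same either way.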

Customizing your dataset

LoanPro provides a standard set of datasets based on commonly requested tables. If you need additional tables, specific columns, or something more custom, we can work with you to define a dataset that fits your requirements.

To request a custom dataset, reach out to your regular LoanPro contact.
