Hi Fabric Community,
I’m working with OneLake in Microsoft Fabric, where my data is stored in Parquet format. I would like to know:
1) What are the best ways to query Parquet files from the Fabric SQL endpoint, notebooks, or any other method?
2) How can I update Parquet data? Is it better to use Delta Lake for updates, or do I need to overwrite the entire file?
3) Are there any best practices or performance considerations when modifying Parquet files in OneLake?
Hi @bhavya5903
Thank you for reaching out to the Microsoft Fabric community forum.
1. Use Direct Lake mode to query Parquet files stored in OneLake without importing the data into a warehouse. Make sure the Parquet files are registered as tables in your Lakehouse so the SQL analytics endpoint can query them efficiently. In notebooks, use PySpark to read Parquet files directly from OneLake, or pandas.read_parquet() for smaller datasets (see the sketch after this list).
2. Use Delta Lake if frequent updates are needed, as it allows efficient data modifications without rewriting entire files.
3. Partition large Parquet files by relevant columns (e.g., date) to improve query performance. Use Direct Lake mode in Power BI instead of Import mode to avoid unnecessary data duplication. Parquet works best with fewer, larger files (around 256 MB) rather than many small files, which reduces metadata overhead.
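To make the notebook option concrete, here is a minimal sketch of both approaches. It assumes a default Lakehouse is attached to the notebook; the folder and file names are placeholders, and `spark` is the session that Fabric notebooks provide.

```python
# Minimal sketch: reading Parquet from OneLake in a Fabric notebook.
# "Files/sales" is a placeholder relative path in the attached Lakehouse.

# PySpark - suited to larger datasets
df = spark.read.parquet("Files/sales")
df.createOrReplaceTempView("sales")              # expose to Spark SQL
spark.sql("SELECT COUNT(*) AS row_count FROM sales").show()

# pandas - fine for smaller files, via the Lakehouse mount point
import pandas as pd
small_df = pd.read_parquet("/lakehouse/default/Files/sales/part-0000.parquet")
print(small_df.head())
```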
If this solution helps, please consider giving us Kudos and accepting it as the solution so that it may assist other members in the community.
Thank you.
Hello @bhavya5903
Please find answers to your questions below.
1) What are the best ways to query Parquet files from Fabric SQL Endpoint, Notebooks, or any other method?
It depends on the data volume you are processing and on your project requirements. You can query your Parquet files with the SQL endpoint, notebooks, or dataflows. I would recommend Spark notebooks, because they are particularly well suited to working with Parquet files: with PySpark or Spark SQL you can easily read and query the data, as in the sketch below.
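As an illustration, here is a minimal notebook sketch that reads straight from a OneLake path; the workspace, lakehouse, folder, and column names are placeholders, and `spark` is the session that Fabric notebooks provide.

```python
# Minimal sketch: querying Parquet from a Fabric Spark notebook.
# Replace the workspace/lakehouse names and folder with your own.
path = (
    "abfss://MyWorkspace@onelake.dfs.fabric.microsoft.com/"
    "MyLakehouse.Lakehouse/Files/raw/orders"
)

orders = spark.read.parquet(path)
orders.printSchema()

# Spark SQL over the same data via a temporary view
orders.createOrReplaceTempView("orders")
spark.sql("""
    SELECT customer_id, SUM(amount) AS total
    FROM orders
    GROUP BY customer_id
""").show()
```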
2) How can I update Parquet data? Is it better to use Delta Lake for updates, or do I need to overwrite the entire file?
For updating data in Parquet files, I would recommend Delta Lake, because it supports ACID transactions, which means you can update or delete records without replacing the whole file. This is especially helpful for managing large datasets and keeping data accurate. A minimal example follows.
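Here is a small sketch of what that looks like, assuming the data has already been written (or converted) to a Delta table in the Lakehouse; the table and column names are placeholders.

```python
# Minimal sketch of an in-place update with Delta Lake.
# "orders", "order_id", and "status" are placeholder names.
from delta.tables import DeltaTable
from pyspark.sql import functions as F

orders = DeltaTable.forName(spark, "orders")

# Update only the affected rows; Delta rewrites just the touched files
orders.update(
    condition=F.col("order_id") == 1234,
    set={"status": F.lit("shipped")},
)

# Deletes work the same way, without replacing the whole dataset
orders.delete(F.col("status") == "cancelled")
```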
3) Are there any best practices or performance considerations when modifying Parquet files in OneLake?
- Partition your data by columns that are commonly used in filters so queries read only the relevant parts of the data instead of scanning the whole dataset.
- To make reading Parquet files faster, you can enable V-Order. It improves the file layout at write time by applying optimized sorting, row-group distribution, dictionary encoding, and compression (see the sketch after this list).
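A minimal sketch combining both points, assuming a Spark notebook; the paths, table, and column names are placeholders, and the V-Order property shown is the session-level setting documented for Fabric Spark, so check your runtime version in case the property name differs.

```python
# Minimal sketch: writing partitioned, V-Ordered data to a Lakehouse table.
spark.conf.set("spark.sql.parquet.vorder.enabled", "true")  # enable V-Order for this session

(
    spark.read.parquet("Files/raw/sales")   # placeholder source path
    .write
    .mode("overwrite")
    .partitionBy("sale_date")               # partition by a commonly filtered column
    .format("delta")
    .saveAsTable("sales")                   # registered Lakehouse table
)
```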
If you need more information, please go through the Microsoft documentation below.
Delta Lake table optimization and V-Order - Microsoft Fabric | Microsoft Learn
Connect to Parquet files in dataflows - Microsoft Fabric | Microsoft Learn
Hope this helps you!
Thank you!
Did I answer your question? Mark my post as a solution!
Proud to be a Super User!