1/16/2024

PartiQL Redshift

Data is being gathered and created at rates unprecedented in history. Much of this data is intended to drive business outcomes but, according to the Harvard Business Review, “…on average, less than half of an organization’s structured data is actively used in making decisions…”

The root of the problem is that data is typically spread across a combination of relational databases, non-relational data stores, and data lakes. Some data may be highly structured and stored in SQL databases or data warehouses. Other data may be stored in NoSQL engines, including key-value stores, graph databases, ledger databases, or time-series databases. Data may also reside in a data lake, stored in formats such as Parquet or JSON that may lack a schema or may involve nesting or multiple values.

Every type and flavor of data store may suit a particular use case, but each also comes with its own query language. The result is tight coupling between the query language and the format in which data is stored. Consequently, if you want to convert your data to another format, switch the database engine you use to access or process it (not uncommon in a data lake world), or move it to another location, you may also need to change your applications and queries. This is a major obstacle to the agility and flexibility needed to use data lakes effectively.

Today we are happy to announce PartiQL, a SQL-compatible query language that makes it easy to efficiently query data, regardless of where or in what format it is stored. As long as your query engine supports PartiQL, you can process structured data from relational databases (both transactional and analytical), semi-structured and nested data in open data formats (such as an Amazon S3 data lake), and even schema-less data in NoSQL or document databases that allow different attributes for different rows.

We are open sourcing the PartiQL tutorial, specification, and a reference implementation of the language under the Apache 2.0 license, so that everyone can participate, contribute, and use it to drive widespread adoption of this unifying query language. The open-source PartiQL implementation will make it easy for developers to parse PartiQL and embed it in their own applications: it supports parsing PartiQL queries into abstract syntax trees that applications can analyze or process, and it supports interpreting PartiQL queries directly.

PartiQL solves problems we faced within Amazon. It is already used by Amazon S3 Select, Amazon Glacier Select, Amazon Redshift Spectrum, Amazon Quantum Ledger Database (Amazon QLDB), and Amazon internal systems. Amazon EMR also pushes down PartiQL queries to S3 Select. More AWS services will add support in the coming months. Outside of Amazon, Couchbase also looks forward to supporting PartiQL in Couchbase Server. We look forward to the creators of data processing engines diving deep into PartiQL and joining us in solving a problem that affects all users of data, across all industries.

We developed PartiQL in response to Amazon’s own needs to query and transform vast amounts and varieties of data: not just SQL tabular data, but also nested and semi-structured data, found in a variety of formats and storage engines. Amazon’s retail business, led by Chris Suver, was in pursuit of an SQL-like query language. It already had vast sets of semi-structured data, most often in the Amazon Ion format.
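To make the claim about nested and schema-less data more concrete, here is a minimal Python sketch. It is illustrative only: it does not call any PartiQL implementation, and it hand-evaluates two PartiQL-style queries over a small, hypothetical in-memory data set. The record fields (`name`, `title`, `projects`), the collection name `employees`, and the helper functions are all assumptions made for this example. The first query ranges over a nested array in the FROM clause; the second reads an attribute that some rows lack.

```python
# Illustrative sketch only: hand-evaluates two PartiQL-style queries in plain
# Python. No PartiQL library is used, and the data set is hypothetical.

# Hypothetical schema-less records: "projects" is a nested array of tuples,
# one row has an explicit null title, and one row has no "title" at all.
employees = [
    {"name": "Bob Smith", "title": None,
     "projects": [{"name": "AWS Redshift Spectrum querying"},
                  {"name": "AWS Redshift security"}]},
    {"name": "Susan Smith", "title": "Dev Mgr", "projects": []},
    {"name": "Jane Smith", "title": "Software Eng 2",
     "projects": [{"name": "AWS Redshift security"}]},
    {"name": "Joe Doe", "projects": []},
]


def security_projects(rows):
    """Mimics:
    SELECT e.name AS employeeName, p.name AS projectName
    FROM employees AS e, e.projects AS p
    WHERE p.name LIKE '%security%'
    """
    result = []
    for e in rows:                       # e ranges over the outer collection
        for p in e["projects"]:          # p ranges over e's nested array
            if "security" in p["name"]:  # substring test stands in for LIKE
                result.append({"employeeName": e["name"],
                               "projectName": p["name"]})
    return result


def names_and_titles(rows):
    """Mimics: SELECT e.name, e.title FROM employees AS e"""
    result = []
    for e in rows:
        out = {"name": e["name"]}
        # An absent attribute evaluates to MISSING, which PartiQL leaves out
        # of the output tuple; an explicit null is kept as a NULL value.
        if "title" in e:
            out["title"] = e["title"]
        result.append(out)
    return result


if __name__ == "__main__":
    print(security_projects(employees))
    print(names_and_titles(employees))
```

The nested loop in `security_projects` reflects how the second FROM item, `e.projects`, is evaluated once per binding of `e`; an employee with an empty `projects` array, such as Susan Smith here, simply contributes no rows.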