snthpy 2 days ago

Hi,

Having this sort of "table polymorphism" is something we've thought a lot about for PRQL and is definitely something we want to get right. That said it's not straightforward but you can do a lot of it already. You can try the following examples for yourself in the PRQL Playground (https://prql-lang.org/playground/).

First a simple example using functions as they are documented:

```prql

let fullname = func firstname lastname -> f"{firstname} {lastname}"

from customers

select full_name=(fullname first_name last_name)

```

Now the example above isn't quite what you're looking for because you still have to specify the columns as function arguments and there really isn't much gained here. It serves to illustrate the principle though as the `fullname` function could be doing something more complicated.

What you want is:

```prql

let add_full_name = func tbl<relation> -> (

    from tbl

    derive full_name = f"{first_name} {last_name}"

    )
from customers

add_full_name

select full_name

```

Now this requires the `<relation>` type annotation which hasn't been documented because it's still quite experimental. However this works right now and can be applied to different tables or relations, for example you could use the same function in the following:

```prql

from i=invoices

join c=customers (==customer_id)

select {c.first_name, c.last_name, i.total}

sort {-total}

add_full_name

select {full_name, total}

```

I'll add some more examples in child comments.

Disclaimer: I'm a PRQL contributor.

2
snthpy 2 days ago

Here's a function to normalize values relative to the column range:

```prql

let normalize = func x -> ((x - min x)/((max x) - (min x)) | math.round 2)

from tracks

take 5

derive {ms_norm=(normalize milliseconds), bytes_norm=(normalize bytes)}

select {track_id, ms=milliseconds, ms_norm, bytes, bytes_norm}

```

which produces the following SQL:

```sql

WITH table_0 AS (

  SELECT
    track_id,
    bytes,
    milliseconds

  FROM
    tracks

  LIMIT
    5
)

SELECT track_id,

  milliseconds AS ms,

  ROUND(
    (milliseconds - MIN(milliseconds) OVER ()) / (
      MAX(milliseconds) OVER () - MIN(milliseconds) OVER ()
    ),
    2
  ) AS ms_norm,

  bytes,

  ROUND(
    (bytes - MIN(bytes) OVER ()) / (MAX(bytes) OVER () - MIN(bytes) OVER ()),
    2
  ) AS bytes_norm
FROM table_0

-- Generated by PRQL compiler version:0.13.2 (https://prql-lang.org)

```

hobofan 2 days ago

Thanks for the thorough reply! Really glad to see at least one of the solutions working towards it, and I'll definitely check PRQL out again in more depth!