Skip to content

ScalaUDAF — Catalyst Expression Adapter for UserDefinedAggregateFunction

ScalaUDAF is a Catalyst expression adapter to manage the lifecycle of UserDefinedAggregateFunction and hook it to Catalyst execution path.

ScalaUDAF is <> when:

  • UserDefinedAggregateFunction creates a Column for a user-defined aggregate function using all and distinct values (to use the UDAF in Dataset operators)

  • UDFRegistration is requested to UDFRegistration.md#register[register a user-defined aggregate function] (to use the UDAF in SparkSession.md#sql[SQL mode])

ScalaUDAF is a ImperativeAggregate.

[[ImperativeAggregate-methods]] .ScalaUDAF's ImperativeAggregate Methods [width="100%",cols="1,2",options="header"] |=== | Method Name | Behaviour

| <> | Requests <> to initialize

| <> | Requests <> to merge

| <> | Requests <> to update |===

[[eval]] When evaluated, ScalaUDAF...FIXME

ScalaUDAF has Expression.md#NonSQLExpression[no representation in SQL].

[[properties]] .ScalaUDAF's Properties [width="100%",cols="1,2",options="header"] |=== | Name | Description

| aggBufferAttributes | AttributeReferences of <>

| aggBufferSchema | bufferSchema of <>

| dataType | DataType of UserDefinedAggregateFunction

| deterministic | deterministic of <>

| inputAggBufferAttributes | Copy of <>

| inputTypes | Data types from inputSchema of UserDefinedAggregateFunction

| nullable | Always enabled (i.e. true) |===

[[internal-registries]] .ScalaUDAF's Internal Registries and Counters [cols="1,2",options="header",width="100%"] |=== | Name | Description

| [[inputAggregateBuffer]] inputAggregateBuffer | Used when...FIXME

| [[inputProjection]] inputProjection | Used when...FIXME

| [[inputToScalaConverters]] inputToScalaConverters | Used when...FIXME

| [[mutableAggregateBuffer]] mutableAggregateBuffer | Used when...FIXME |===

Creating Instance

ScalaUDAF takes the following to be created:

=== [[initialize]] initialize Method

[source, scala]

initialize(buffer: InternalRow): Unit

initialize sets the given InternalRow as underlyingBuffer of <> and requests the <> to initialize (with the <>).

ScalaUDAF initializes UserDefinedAggregateFunction

initialize is part of the ImperativeAggregate abstraction.

=== [[update]] update Method

[source, scala]

update( mutableAggBuffer: InternalRow, inputRow: InternalRow): Unit


update sets the given InternalRow as underlyingBuffer of <> and requests the <> to update.

NOTE: update uses <> on the input input and converts it using <>.

.ScalaUDAF updates UserDefinedAggregateFunction image::images/spark-sql-ScalaUDAF-update.png[align="center"]

update is part of the ImperativeAggregate abstraction.

=== [[merge]] merge Method

[source, scala]

merge(buffer1: InternalRow, buffer2: InternalRow): Unit

merge first sets:

  • underlyingBuffer of <> to the input buffer1
  • underlyingInputBuffer of <> to the input buffer2

merge then requests the <> to merge (passing in the <> and <>).

ScalaUDAF requests UserDefinedAggregateFunction to merge

merge is part of the ImperativeAggregate abstraction.


Last update: 2020-10-21