ScalaUDAF — Catalyst Expression Adapter for UserDefinedAggregateFunction¶
ScalaUDAF
is a Catalyst expression adapter to manage the lifecycle of UserDefinedAggregateFunction and hook it to Catalyst execution path.
ScalaUDAF
is <
-
UserDefinedAggregateFunction
creates aColumn
for a user-defined aggregate function using all and distinct values (to use the UDAF in Dataset operators) -
UDFRegistration
is requested to UDFRegistration.md#register[register a user-defined aggregate function] (to use the UDAF in SparkSession.md#sql[SQL mode])
ScalaUDAF
is a ImperativeAggregate.
[[ImperativeAggregate-methods]] .ScalaUDAF's ImperativeAggregate Methods [width="100%",cols="1,2",options="header"] |=== | Method Name | Behaviour
| <
| <
| <
[[eval]] When evaluated, ScalaUDAF
...FIXME
ScalaUDAF
has Expression.md#NonSQLExpression[no representation in SQL].
[[properties]] .ScalaUDAF's Properties [width="100%",cols="1,2",options="header"] |=== | Name | Description
| aggBufferAttributes
| AttributeReferences of <
| aggBufferSchema
| bufferSchema of <
| dataType
| DataType of UserDefinedAggregateFunction
| deterministic
| deterministic
of <
| inputAggBufferAttributes
| Copy of <
| inputTypes
| Data types from inputSchema of UserDefinedAggregateFunction
| nullable
| Always enabled (i.e. true
) |===
[[internal-registries]] .ScalaUDAF's Internal Registries and Counters [cols="1,2",options="header",width="100%"] |=== | Name | Description
| [[inputAggregateBuffer]] inputAggregateBuffer
| Used when...FIXME
| [[inputProjection]] inputProjection
| Used when...FIXME
| [[inputToScalaConverters]] inputToScalaConverters
| Used when...FIXME
| [[mutableAggregateBuffer]] mutableAggregateBuffer
| Used when...FIXME |===
Creating Instance¶
ScalaUDAF
takes the following to be created:
- [[children]] Children Catalyst expressions
- [[udaf]] UserDefinedAggregateFunction
- [[mutableAggBufferOffset]]
mutableAggBufferOffset
(starting with0
) - [[inputAggBufferOffset]]
inputAggBufferOffset
(starting with0
)
=== [[initialize]] initialize
Method
[source, scala]¶
initialize(buffer: InternalRow): Unit¶
initialize
sets the given InternalRow as underlyingBuffer
of <
initialize
is part of the ImperativeAggregate abstraction.
=== [[update]] update
Method
[source, scala]¶
update( mutableAggBuffer: InternalRow, inputRow: InternalRow): Unit
update
sets the given InternalRow as underlyingBuffer
of <
NOTE: update
uses <input
and converts it using <
.ScalaUDAF updates UserDefinedAggregateFunction image::images/spark-sql-ScalaUDAF-update.png[align="center"]
update
is part of the ImperativeAggregate abstraction.
=== [[merge]] merge
Method
[source, scala]¶
merge(buffer1: InternalRow, buffer2: InternalRow): Unit¶
merge
first sets:
underlyingBuffer
of <> to the input buffer1
underlyingInputBuffer
of <> to the input buffer2
merge
then requests the <
merge
is part of the ImperativeAggregate abstraction.