Stream Analytics is a service provided by Microsoft on its Azure platform that performs real-time analytic computations on streaming data. It is commonly referred to as ‘ASA’.

Those familiar with ASA will know its scalability and efficiency, but for anyone who has not used it or heard of it before, here is a brief summary. Stream Analytics lets you configure both inputs and outputs for a job, and the bulk of the logic is written in a T-SQL-like query, which makes up the streaming job. The ‘Job Topology’ section in the Azure Portal outlines this.

But how do we verify that the incoming streamed data is correct or valid? For this purpose Microsoft offers a ‘Reference Data’ input type, which in simple terms is a lookup table for cross-checking the incoming data. Reference Data is a finite data set that is usually static or slowly changing. Sometimes, however, Reference Data needs to be dynamic, i.e. it must take the changing incoming data into account and adapt to new values without being manually updated by the user. This was a requirement on a recent project I worked on. To achieve Dynamic Reference Data, a query can be written as follows (this is only an example to demonstrate the idea; there are other ways it can be written):

 

SELECT
    T.Value AS <Alias1>,
    CASE
        WHEN (<Value> LIKE <Value2>) AND (R.Column = <Value3> OR R.Column IS NULL) THEN <Value4>
        WHEN …
        ELSE <Value5>
    END AS <Alias2>
INTO
    <Output>
FROM
    <EventHubInput> T
LEFT JOIN <ReferenceDataInput> R ON R.Column = T.Value

To reiterate, the above is only one example, which I used in my client project for dynamic reference data. The LEFT JOIN keeps every incoming event, leaving R.Column as NULL when there is no matching reference row, and the CASE expression is what lets the output adapt to the incoming data stream. The great thing is that you can have as many WHEN clauses as needed, enough to cover all the scenarios and logic for your bespoke needs.
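To make the behaviour of the query easier to follow, here is a minimal Python sketch of the same idea: a left join against a lookup set, followed by first-match-wins branching. It is not ASA code and it is not the query from my project. The `sensor-` prefix, the labels `new`, `known` and `invalid`, and the helper names `left_join` and `classify` are all assumptions made up for illustration.

```python
from typing import Iterable, List, Optional, Set, Tuple


def left_join(events: Iterable[str], reference: Set[str]) -> List[Tuple[str, Optional[str]]]:
    """Pair each event value with its reference match, or None when there is no match.

    None stands in for the NULL that a LEFT JOIN yields for an unmatched row.
    """
    return [(value, value if value in reference else None) for value in events]


def classify(value: str, ref_column: Optional[str]) -> str:
    """Mimic a CASE expression: branches are checked in order, first match wins."""
    # Hypothetical rule: a well-formed value that is not yet in the reference data.
    if value.startswith("sensor-") and ref_column is None:
        return "new"
    # The value was found in the reference data.
    if ref_column is not None:
        return "known"
    # Equivalent of the ELSE branch.
    return "invalid"


# Hypothetical sample data, standing in for the reference input and the event stream.
reference = {"sensor-001", "sensor-002"}
events = ["sensor-001", "sensor-003", "bogus"]

results = [(value, classify(value, ref)) for value, ref in left_join(events, reference)]
print(results)
```

The point to take away is that unmatched events are not dropped. They arrive at the CASE logic with an empty reference column, so a branch can decide how to treat values the reference data has not seen yet.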

 

The ‘Test’ functionality in ASA lets you check the validity of the query and verify that it produces the correct results over a sample of data, which gives you confidence that the query will run correctly once started. Provided you have the output configured correctly, Azure takes care of the rest; all you have to do is start the ASA job!

 
