# Architecting AI for Industrial Determinism and Reliability

**Podcast:** Software Architektur im Stream
**Published:** 2026-04-20

## Transcript

Hier ein Hinweis in eigener Sache.
Softwarearchitektur im Stream ist live vor Ort.
Wir sind beim TechRider Summit im Juni in Köln dabei.
Mehr Infos dazu und einen speziellen Rabattcode für unsere Community findest du auf unserer Website.
Sei dabei, stell Fragen und komm auch gerne auf uns zu.
Nikita, kannst du zuerst sagen ein paar Worten über dich?
Ja, hi, hallo.
Vielen Dank für die Introduktion.
Vielen Dank für mich.
Mein Name ist Nikita und ich arbeite als AI Portfolio Architekt.
Ich bin Teil von Siemens.
Und in Siemens, unsere Team und ich sind mainly responsible für bringen AI zu der Factory Floor in der Welt der Welt.
train models in the cloud, how to deliver this model to the factory floor, and we just maybe more interested how to make those models run to inference in an edge environment with limited computational resources.
And yeah, that's what I'm mainly doing.
And by the way, I have around 16 years of experience as an architect or maybe in...
whole software development sphere in different roles.
So we are rather interested in this topic.
And thanks again for inviting me.
Yeah, thank you so much for showing up and taking the time.
We had a conversation about this topic at some conference and I thought it was quite interesting to listen to a practitioner who is actually doing that AI stuff for quite some while, while in the domain that I usually work in, which is enterprise IT, it seems that we have yet to all the problems and also the solutions.
And I think that's, I hope, the worth that this episode will bring.
Short announcements.
So Nikita and also me, we will both present at the TechRider Summit in Cologne.
Hürt, actually, one of the suburbs of Cologne.
And we have...
So as a matter of fact, you can actually attend for free.
And the subject here is related to what you're going to present at TechWriters, but it's not the very same content.
So there is a relation here.
And maybe it also lets your appetite to go to the TechWriters Summit.
I think it's going to be a very interesting conference.
It's the first time that I'm there.
And it's the first time that they actually have a day about technical stuff.
They also have a day that is more about business stuff.
So it's, I think, a good mixture of different subjects.
And I'm really looking forward to that.
Yeah, definitely.
Okay.
So let's start with the sort of basics.
So everyone is talking about large language models.
And it seems that AI is sort of a synonym for many for large language models.
So what are large language models actually?
Ja, von meiner Perspektive, es ist ein großes Problem, dass alle, die über AI sprechen, unter den AI-LLMs und generative AI-LLMs.
Aber von meiner Perspektive, LLMs ist, first of all, ein Probabilistik-Engine, ein Probabilistik-Mechanismus.
Das ist einfach nur die Input und die Ausführung mit einer Art von Probabilität.
Und das ist wichtig, dass LLMs nicht ein Computing-Fact ist.
Sie sind nicht ein New-Knowledge-Knowledge-Knowledge-Knowledge-Knowledge-Knowledge-Knowledge-Knowledge-Knowledge-Knowledge-Knowledge-Knowledge-Knowledge-Knowledge Interpolate, as any AI model patterns it saw before.
And yeah, we should be, not even as architect, but as engineers should be rather careful with terminology we are using.
And from my perspective, we even should try to, I don't like this word to educate, but try to increase awareness of, I don't know, stakeholders.
Are the engineers around us about different types of AI and when each of these should be used and can be used?
So Sir El Grey just said they are synthetic text generators that are sometimes useful.
And I was just reminded about an episode that I did that discussed a scientific paper, like really proper science, that discussed how LMS are actually bullshit in the sense that they provide text without any...
ob ob ob ob ob ob ob ob ob ob ob ob ob ob ob ob ob ob ob ob ob ob ob ob ob ob ob ob ob ob ob ob ob ob ob ob ob ob ob ob ob ob and which just immediately answer your question.
And this answer, it's not mandatory, will contain relevant information or useful information to you or maybe even the answer to your question.
And it's also, I think it's a great point that you point out that it's only the written book because the experience that we have goes far beyond that.
And I think that's important to note.
Ja, es ist interessant, von dieser Perspektive, was es geschrieben und die Erfahrung wir haben in unseren Händen.
Ich habe auch ein interessantes Beispiel über COBOL.
Ein paar Monate ago, da gab es eine Messe von Antropik, die gesagt, hey, Leute, wir brauchen diese COBOL-Developers für Immigration, weil wir eine SuperCole COBOL-Coding-Agenten haben und alles sollte sein.
Later, I saw another article, which we're referring to this one, and one of, let's say, old COBOL developers wrote, hey, guys, the problem is that the majority of COBOL code is not publicly available.
This is either in somebody's head or closed repository.
That's why you cannot judge, you cannot promise.
It will work for demo, but for the rest, it's the question.
So Eric Gray on YouTube just said, I like the mirror analogy.
Yeah.
So, but I mean, the reason why I really wanted to do this episode is because of your experience with industrial applications in the industrial context.
So can you explain why LMS fail in industrial applications and what kinds of applications that actually would be?
So first of all, Industrial applications, based on the main assumption that everything should be deterministic.
And the core of, let's say, each industrial automation, industrial application, it's so-called programmable logical controller, which at the end of the day gives you a concrete answer whether you need to stop your line or not.
And it cannot say you probably need to stop.
And the case of LLMs, they cannot...
Es gibt diese Guarantee der Determinismus, denn sie sind probabilistisch von Natur.
Mehr als auch, ihre Results sind nicht reproducible.
Und es ist wohl-known, dass das gleiche Inputs gibt verschiedene Outputs.
Und diese Outputs sind nicht, sagen wir, transparente.
Sie sind nicht understandable und reproducible.
Natürlich gibt es eine Reasoning Mode in LLM, aber es ist nicht reale Reasoning.
I don't know, tracking or logging, what kind of sources LLMs used to came to this decision.
And another interesting fact, it's named as a confidence illusion.
For instance, if a model, say, uses it with a confidence 94% this is a defect or something else, it does not really mean that This answer is correct, because probability is just confidence, it's just a level of probability.
Model believes in this output and model just trying to make a pattern matching, nothing else.
And also what is important in any Use case where you use an AI so-called drift or maybe even silent drift.
For instance, environment, user behavior change, economic conditions change, but model stays the same.
Training data becomes outdated.
And in this case, your model will still provide some kind of outputs.
You will not see no exceptions, no alerts, no messages, notification to the human being that something was breaking.
And which from the first perspective looks good, but at the end of the day, it's totally corrupt, totally unworkable scheme under the hood.
From my perspective, let's say, of course, there are a lot of different problems why we cannot use LLMs.
in industrial use cases, for instance, because of their performance, because of their science.
But it's a technical limitation, but I'm mainly talking about, let's say, ideological or conceptual limitation, why it's not a good place.
So first of all, Grace said they don't even give you real probabilities as opposed to regular ML machine learning over distribution.
So that's, I think, what you were talking about concerning confidence.
What I'm wondering is, I mean, I can totally see, so I'm a layman concerning industrial applications, obviously.
So I can totally see how you can have some kind of, what is it, like a system that says, okay, I take a picture of that screw and I figure out whether that screw is a good one or a bad one.
So I see an application for these kinds of, you could call them AI systems.
However, LLMs are, as we said, text-based.
So can you give an example where you actually have tried to apply them and where you failed?
And the other question is, what is some potential theoretical application if you didn't really try that in practice?
Here I would propose to have a split, again, between LLMs and Generative AI in general, because, for instance, Generative AI, it's a big family of models which provide the new content.
It can be images, it can be text.
LLMs, it's a subset of this Generative AI.
Considering LLMs, one example, for instance, we have a system with a combination of edge computing and the cloud computing, for instance, edge part of this system is responsible for defect detection.
It's a computer vision model with a computer vision AI model under the hood, which classify a type of defects on the surface of an end product, where these defects are placed, situated just as a simple, not simple, but to some extent classical machine learning AI system, which at the end of the day provides machine-readable output with the type of defect, with the position of defect, and this information goes to the cloud, where, for instance, RAC together with LLM sits, and based on this information and based on documents we have inside RAC database, this LLM provides some kind of reasoning or root cause analysis why one or another defect happens.
It's one example.
Or another example, considering the general application of generative AI, since all factories and factory automation, the main idea of factory automation on industrial automation is to prevent any kind of failures.
Sometimes it's rather difficult to have enough examples of failure modes.
That's why generative AI definitely can be used to generate synthetic data.
representing these potential failure modes, not, let's say, not affecting the real industry, the real, I don't know, aggregates.
It's another one of examples how generative AI can be used.
Okay, so with regards to the root cause analysis, it's actually what it sounds to me as if you're trying to build a system that actually sort of thinks about or tries to figure out the root cause.
Did you actually try that and fail, or is it rather that you said, well, this is not text generation, this is something different?
Because at the end of the day, it's about thinking about root causes.
So did you actually try that and fail, or did you rule out that option from the beginning?
It can be definitely an option.
I tried, it works.
And in this case, it's definitely...
Yes, it sounds like a reflection, a root cause analysis, but at the end of the day, it's a text generation because when a downstream system sends to upper one some kind of description of a defect, what RAC system or LLM system in the cloud is doing is just go into the database, get relative documents and try to summarize and try to say, hey, Dear domain user, you have a defect because you need to change the, I don't know, conditions in zone 1, 2, 3, 4, 5.
It's not, again, it's not generating a new knowledge.
It's not reflecting existence.
This is just trying to summarize and try to provide a summary based on documents which already exist.
So it seems like a backup solution.
Yeah, I'm at the moment currently working on, let's say, prototyping this solution.
I don't know, I will share this one in my repository, but from my perspective, it can be a good illustration, not even for industrial automation application, but in general, how to have a clear split between different types of AI and to show that AI is not even LLMs, not only LLMs and not only generative AI, there are different types and how these types can work as a part of architecture and can interact with each other.
Okay, but what you're saying is, So, what you say, does that mean that you should be careful with LLMs and how they will hallucinate and how they are probabilistic, but that doesn't rule out that you could use that.
And, you know, if you have some root cause analysis and the root cause analysis, you have to be aware of the fact that this might actually be wrong, but that doesn't necessarily stop you from actually doing that.
I think that probably sums it up.
Definitely, definitely.
I'm not totally saying that you must not use LLM.
So the main message definitely is as an architect, you need to find a better solution for your system.
But first of all, you should be aware of this hallucination.
And as architect, the main question you need to answer or you have to answer is what your system should do in case your AI part is not...
So you would need to ask the question, what happens if the root cost analysis is wrong?
And then from this perspective, maybe trying to map this stuff to known architecture practices, we can even name Or we can even mention that AI, it's one of good practices.
AI can work in a separate bounded context with respective anti-corruption layer and with all this respective.
We can, of course, deep dive in these details, but with some kind of isolation and some kind of layer which will protect your domain from this probabilism and this, let's say, bad side of AI stuff.
And it reminds me of the discussion that we had, I think, like 14 days ago with your colleague Michael Stahl, Professor Michael Stahl.
And he was also working at Siemens and we were talking about the architecture analysis tool that he built.
And he also mentioned that you have to be aware of that whatever that system spits out concerning the architecture, it analyzed.
that you have to be aware of the hallucinations and that you must sort of double check it and these kinds of things.
So it seems to be the same, sort of the same thing in a different area of application for LLMS.
Definitely, because at the end of the day, AI is just, we can think about all AI stuff just as an additional block with inputs and outputs with some kind of peculiarities which we should consider.
with building contracts and interfaces around this specific block.
Okay.
And you already mentioned Generative AI, GenAI.
So how are LLMs different from GenAI?
And how do you use them in industrial applications?
I already said that LLM is just a subset of Generative AI.
Generative AI is mainly responsible for generating new...
Content, not a new knowledge.
And from an industrial perspective, generative AI can be definitely used for generating new data, synthetic data, LLMs for reasoning, LLMs for helping end users to make a decision to analyze a big amount of data.
But the crucial point, as I already said, that LLMs part and all this, let's say, The most probabilistic part of your system should be isolated from the control group where you're affecting the physical reality around human beings or your production.
Okay.
So what other types of AI are there apart from generative AI?
So we are stepping up the...
Maybe the most important is so-called classical AI models, which can be illustrated with decision trees, random forests.
So the main advantage of these classical models is that they are interpretable.
For instance, in the case of a decision tree, it's at the end of the day.
Of course, it's a simplification, but at the end of the day, it's some kind of tree.
which shows a lot of rules, a lot of conditions, and what will happen in case.
It's a big if-else tree.
And it can be reversed and you can see how your model makes a decision in which case it has this separation.
It's a classical.
The second big pillar is computer vision model, which is also not...
A generative AI, these systems or these models are mainly responsible for analyzing images.
It's object detection, classification, segmentation, all this stuff connected with visual representation.
Rule-based, it's more strict systems.
Maybe they are not even...
We cannot even name them as a part of AI, but AI is much wider than ML.
It has some kind of, I don't know, genetic algorithms and so on, so on, so on.
But yeah, the most popular is classical models and computer vision models.
And computer vision models, I think we already discussed them.
And that was also part of our original discussion at that conference where you were talking about how you would find defects in products.
Like your system would look at it and figure that out.
Can you give an example of a decision tree?
For instance, decision tree, good example, it's a system, bank system, which is responsible for fraud analysis or for giving loans.
So there is a number of rules which are used in a bank based on which Bank or finance organization can make a decision whether it makes sense to give a loan or not.
And the big amount of these rules can be implemented, can be absorbed by decision tree.
And the main advantage, why I'm giving an example of a bank, because, for instance, in the case of bank systems, in the case of financial systems, this affects human beings.
And we have a GDPR.
And according to the...
Artikel, I suppose, 22 of GDPR, each person who is affected by AI should be able to get a clear explanation how AI interprets him and why AI makes one or another decision.
That's why such kind of classical models are crucial and widely used for banks and for this organization, which are under some kind of regulations.
Okay, so that makes a lot of sense.
Now, I would argue that when you apply for a loan, as you gave as an example, and there are some business routes to either give you the loan or not give you the loan, I could argue that this is actually business logic.
And I could just write it down in Java or whatever language I prefer.
So, where is the line between calling this business logic and calling this an application of a decision tree or a rule-based system and considered part of an AI?
Definitely, it's a number of rules you need to absorb, you need to implement as a part of your logic.
When you are using a classical deterministic logic, you need to find the barrier.
So you need to find if this, I don't know, value five, then make this decision.
In case of AI, you have a capability to retrain your model and you don't need to care the concrete value.
For instance, you have an initial data set, which represents, I don't know, some kind of distribution on top of this data set.
You can train decision.
which will, at the end of the day, give you some kind of list of...
Again, it's a simplification, but this tree can give you a list of if-else, which can be implemented if you really need it as a list of your business logic.
But next time, when you have a new distribution or when, for instance, you have some kind of shift, which we saw in COVID, after COVID, we have totally different landscape.
You need to reimplement this logic.
But for ML use cases, you just need to collect enough data and just retrain your model and have the same transparent if-else logic, but which automatically absorbs all variation.
Okay, so what you're saying is that instead of writing the logic, actually I train the system and then the result is logic and I don't need to look at the...
In der Logik itself.
My assumption would be that if there are gray areas, like if it's not really clear cut and I need to make a decision that is more intuitive, I'm not sure whether I can come up with an example.
I guess we all know, we look at some diagram about some architecture and we are like, this is somehow weird, but we can't really figure out why.
So I assume that this would be something that these gray areas and more intuitive things, this is probably something where this approach excels or is it not?
Yes, definitely.
It's another example.
For instance, if you have some kind of example, which in the intersection of your logic, AI approach or these classical machine learning models can help you to say this.
With a high probability, this concrete example, this concrete, I don't know, instance belongs to class A, class B, class C, because and constructs the logic rather transparent and business-oriented because of what.
But this is applicable only in case of classical models like decisions, random force, but in case of LLMs, of course, you can also make this decision process transparent, but There is no guarantee that this decision process, this reasoning process will be logical, because you can sooner or later face a problem that it depends on temperature, I don't know, temperature somewhere because of statistics.
Yeah, so a great just said another use case financial forecasting.
Yeah, definitely, definitely.
It's about absorbing the patterns, because sometimes for human beings it's difficult to find the respective patterns, the respective, let's say, trend in time series data.
And from the perspective of classical machine learning models, it's one of good examples.
It can identify this pattern and try to extrapolate this one to a bigger range.
Yeah.
I'm not sure, because that thought just crossed my mind, and I'm not sure whether you have a good answer for that.
But I understand that if you look at x-ray images, systems are supposed to be better than humans.
But somewhere I heard that in real life, humans are still preferable for one reason or another.
Do you have any idea about this or how this works in practice?
Because I could imagine that, you know...
And maybe that's, and it's also something that applies to your domain, because if I look at something as a human, that's different from what the system does.
So what's your impression there?
Is humans still superior or is the AI system superior also in practice?
Would you even have people that still look at the manufactured items to figure out whether they are defect or not defect, or is it all automated?
From my perspective, definitely we need a human in the loop.
For instance, in these medical use cases, when we talk about X-rays, yes, computer vision models can analyze these X-rays better than a human being.
But again, all AI, they are tightly coupled to patterns.
For instance, AI will see, I don't know, a dark part of this X-ray and they say, probably this is a cancer.
Aber ein Mensch sollte ein Feedback geben.
Ein Mensch oder ein Doktor, der in diesem System ist, sollte andere Anwendungen, andere Tools, um zu identifizieren, ob es ein Problem ist oder nicht.
Deswegen brauchen wir ein Mensch in der Lübe und in der Lübe, um die Feedback zu geben.
And you also do that in industrial applications.
So I assume that, you know, you have mass production and then there is defect detection and the system says, okay, this is something that is defect and then you still have a human look at it, like at everything?
Yeah, definitely.
For instance, a good example is so-called Simplex architecture when we have three blocks.
One block is AI model or AI part.
The second one is some kind of deterministic business logic or old school logic, which...
Based on if-else.
And there is a monitor, which first of all monitors the behavior of the model, the probabilities, the confidence as the output of the model.
And also, for instance, which is this model is monitoring inputs.
And in case monitor will see that distribution of inputs is something new, which model doesn't suspect to see or model behaves in a not confident way.
Then it will switch to a deterministic part or it will switch to a human being to involve, I don't know, reasoning and just, yeah.
So I agree.
I just said, my guess, as someone who actually did research in that area, finding things is easy, creating a diagnosis is harder.
Yeah, for instance, diagnosis, in case of giving a diagnosis, Usually diagnoses are not only given on x-ray, only on one, some kind of research.
You need to consider the whole illness history, the whole, I don't know, even behavior of the patient, which is out of the scope, out of the sense of sensing of concrete model.
Yeah.
And I had to smile when you were talking about how things that are not in the training data are actually a problem.
Because there is this video on YouTube where there's a Tesla driving down the street and, you know, there is blue lights all over the place.
And it's quite apparent that there is police or the fire brigades or whatever.
And the car just won't stop and won't slow down.
And it seems the explanation is that this happens so seldomly that it's not really in the training data.
And then the car would just happily drive on with high speed, and obviously that leads to an accident.
So that's what came to my mind.
For instance, it's the same illustration about these corner cases.
For instance, all these Tesla models and initial versions of these models were trained in a laboratory in some kind of, I don't know, sanity conditions, and it has never even seen these corner cases.
And Generative AI can be used to generate these corner cases, these unusual cases of data, just to challenge your model to, I don't know.
Yeah, I mean, so I have to tell that anecdote.
So when I was at university, there was that one person who reported about a system that was supposed to detect tanks like panzers.
And they said that it was quite successful.
Until they provided the system with real data and it failed big time.
So the system was supposed to find out whether there was a tank on that picture.
And it was very successful until they tried to use it in real life.
And then they figured out what they really had was a system that could figure out whether the weather was sunny or overcast.
Because all the pictures with the tanks were taken on a sunny day.
And the ones without were not taken on a sunny day.
There you go.
Exactly.
It's also a good example, which we usually see in our industrial use cases.
In these computer vision models, when a model was trained in a laboratory with one lighting conditions and in the factory floor conditions are totally different.
And more than ever, these conditions are changing during the day.
And as a part of mitigation, you need, as an architect, provide another abstraction, another mitigation block.
which will be aware of this lighting condition and try to, I don't know, increase lighting, decrease lighting.
And it's about this monitoring block, part of this monitoring block, which can be also responsible for validating, for checking the distribution of inputs and so on.
So I agree that Facebook personal recognition used to identify people by sweatshirts or T-shirts.
So what's the impact of that probabilistic nature that we have in, it seems, all of AI on the integration into the architecture?
Definitely we need to mitigate this probabilistic.
We need to find some, we need to build some kind of bridge between probabilistic and deterministic and a good example, a good, yeah, how we can do it.
First of all, we should use Some kind of gateways, as I already said about this simplex architecture with a model which is responsible for monitoring AI.
For instance, one approach you need to use an AI gateway, or let's name it just a gateway, which will check the confidence score, the probabilities, the parameters of a model, and in case if a model is not confident, throw this to a more deterministic counter.
It's just like we are using the classical API gateways, but instead of HTTP headers and, I don't know, the content of the packet, we're just using the metadata provided by a model to decide to which roads this decision, to which roads this, I don't know, flow should go.
It's the first.
The second...
Usually models are not failing on one case, it's failing on, I don't know, sort of cases.
And in this case, we can use, for instance, again, as a classical example, circuit brake.
So if you see that your model is failing, you need to, I don't know, to switch to another road and from time to time try to check whether it's, again, So from the perspective of Circuit Breaker, it's also part of this maybe gateway.
And of course, governance.
Everything you made, every decision, every outcome, output your model provides should be locked, should be documented, not only from the perspective of what model says, but all metadata, what kind of inputs, what kind of outputs and versions.
It can be also a good evidence for your auditors for this trying to fulfill requirements of UI-AI Act.
Yeah.
So what you're saying is that you need to have a fallback mechanism that won't use AI and to go to that one.
And Circle Breaker, we should mention that probably it's with this.
Es ist ein ziemlich famous Pattern, wahrscheinlich von der Resilienz-Space, und die circuit-breakers, die in einem elektrisch-circuit, das würde den circuiten, so dass die Haus nicht verbringt, weil es ein Schott-Circuit ist, und dann die Lichts gehen, die Power gehen, und dann, aber die Haus nicht verbringt, weil es die Schott-Circuit wird, weil es die Schott-Circuit wird.
And now this is something that you would also use for resilience in systems because you take one part of the system down so that you make it more resilient and it doesn't crash.
Instead, it has some time to recover these kinds of things.
And yes, good news that considering these patterns, we see that We should use the same patterns, the same approaches we already used in our classical software design, but with application to AI use cases.
And this AI use case does not differ so much.
It's just another block.
Of course, it's again a simplification, but it's another block with peculiarities, which can be wrapped with the same pattern, with the same resilience.
with the same routing practices, high load balancing and so on.
Yeah, I think it's quite interesting that you're referring to the very same patterns.
One thing that I'm wondering about is, so if you use these circuit breaker, for example, for resilience, it's quite obvious that the other system failed because, I mean, there is a measure that says, you know, it returns an HTTP 500, so obviously it's failed.
Or it won't respond at all.
So obviously it failed.
However, with AI systems, you are referring to that confidence score.
And I can see how that somehow gets calculated by the model.
However, that's just a probability.
So is that enough in your experience?
So what I'm trying to say is you're trying to figure out whether the system fails.
You're trying to figure out whether the system fails, by asking the system itself.
Now, there might be ways the system fails, and it wouldn't say it fails, it's just highly confident, it's just complete utter nonsense that it tries to do.
You just need to monitor, you just need to check inputs, so you are listening to the system, the AI system itself, what kind of outputs this system gives.
Also, you need to analyze what kind of data is used as an input for your AI system.
For instance, as part of this analysis, you can see that this data is nonsense or this data you know that your AI model has not seen before, that it totally makes sense to switch to another, to deterministic to the human in the loop, just to make a decision.
AI at the end of the day will just give, as I already said, we just give a probability.
Even if this model had not seen this pattern before or this data before, it will try to generalize and provide and say, from what I've seen before, it looks like this one.
But it's, yeah, that's why you need to monitor data you're using as an input also.
Yeah, it's just that I'm still sort of trying to wrap my head around that.
Because, and I have to think about that one example where there was, I think it was an LLM system that was supposed to run a vending machine, or it was even supposed to run a shop, I'm not entirely sure.
And at the end of the day, that machine was, or that LLM was talked into providing stuff for free and would run the whole business into the ground.
Now what you're saying is, or I think that's what you're saying, this is sort of, You shouldn't do that.
You should have some checks and balances in place that says, okay, this is a decision that it shouldn't make.
And you should have some sort of supervision around that to provide these kinds of things.
So then you would need to implement something that says, okay, you're not going to give out things for free, sort of a set of rules to check that.
And I'm just wondering why.
They didn't think about that because it seems sort of obvious.
I suppose they were mainly, again, it's my guess, they were mainly driven with this AI hype.
Let's use LLM.
It will solve all of our problems.
First of all, even from the architecture perspective, it's a crazy idea.
It's a very bad idea to put such kind of generic model as your...
Decision Engine, because at the end of the day you don't really know how this decision mechanism works inside this system.
And even, for instance, from a business perspective, your stakeholders would like to use this crazy idea with LLM as a decision engine.
You should, as an architect, try to mitigate consequences.
For instance, as you give an example, calculate some kind of check sum of behavior of recommendation, or maybe use some kind of classical ruling gene, whether decision makes sense or not.
So first of all, my recommendation will be try to affect their decision not to use LNM as such kind of foundation, genetic model.
Yeah, and that was also the reason why I asked whether you're also considering a human in the loop for failure detection, because it's Or should I put it, if there is a failure that is detected and in fact that part is actually okay.
I think it's not such a huge problem, but if you already do these kinds of checks in these kinds of environments, you should consider doing them at the other example that we gave.
Well, you should really, really consider that.
So Sir Elgray just said the case was from Anthropic.
So they tried to completely free and failed.
So thanks for that information.
Part of their marketing, I suppose.
Yeah, yeah, probably.
And then he also said, yeah, this, since it's basically statistics, ensure that the statistic probabilities assumptions are still valid for classical ML.
Yes, definitely.
So yeah, you can even evaluate whether one or another behavior is statistically valid or it's just...
Es ist in der gleichen Statistik-Range, wie es funktioniert.
Ja, ja.
So, als Architekt, wenn man nicht verändert, wenn man nicht verändert, wenn man nicht verändert, wenn man nicht verändert, wenn man die Stakeholder macht, muss man versuchen, die Verständnis zu ermöglichen.
Man kann sich versuchen, eine Sicherheit zu entwickeln.
Man muss sich, first of all, isolieren das aus dem Rest des System.
And then try to use, I don't know, interfaces and mitigation measures how to.
So talking about it, maybe that's a stupid question, but is there such a thing as a confidence level?
Also, if you use an NLM, because I've never seen that.
I mean, if I use ChatGPT, it says, this is the answer, deal with it.
And it said, well, but my confidence score is, I don't know, 40% or whatever.
There is a matrix, as far as I remember, named perplexity, and this matrix shows...
I can't be wrong, but this matrix shows the probability of...
So there is a list of words that should go next, according to this text, and perplexity is a probability.
Whether LLM will give a definite word as its output, something like this.
But yeah, its name is another, but logic is again about confidence, probability.
It's a probabilistic, not even probability, probabilistic.
Yeah, because I'm wondering if you rely on that metric, then you need to have that metric before you can start building that.
The problem is that, yes, you are right, but this metric can be how I can calculate this matrix for a text.
I can only calculate this matrix, for instance, if I take all existing books and just calculate a probability that one word will go after another and use this probability to measure.
But this...
Probability brings no value for my business, for my concrete business domain.
So Sir L.
Grey just said, nope, they don't really have it.
That's a huge problem, at least from the API surface.
And he also said, I'm not sure what he is actually referring to.
New models have solved exactly this case, by the way.
So I assume that he is still talking about that vending machine on business that is run by an LLM.
So you came up with an extension to ARC42, that famous architecture documentation system standard for AI.
And that is also something that you're going to talk about at the TechWriters event.
So can you say a few words about that, what it actually is, why you would use it, these kinds of things?
Yeah, definitely.
So ARC42 is a...
super cool and agnostic framework to document classical architecture.
But the problem is nowadays we have a lot of systems which has AI in the hood.
And the problem is that Arc 42 does not cover this AI stuff.
So description of how definitely we can share.
I have a separate repository.
We can share this template and full description of this extension.
But my idea to propose an extension to ARC42 zu haben, die gleiche, standard und klassisch, gut-dokumentationen approach, wie AI-Architecture sollte.
Denn heute haben wir natürlich verschiedene Instrumente approaches, wie wir, ich weiß nicht, wie wir documentate Modul Cards und so weiter.
Aber die Idee ist, dass wir nicht nur eine Solidarität haben, sondern eine Solidarität.
how we should document in order to architects to understand each other, in order to be on the same page, in order to have the same approach.
It's the first.
And the second one, this extension is mainly driven by necessity to fulfill requirements of AI-UA Act.
Because as I already said, once you have a model, you need to make this model transparent.
It's a process, how it makes decisions, how you use it.
It's again about making the transparency, bringing the transparency about how models behave, how its models work on inference time.
It's very short.
But I suppose if we can share my repository, listeners can find more details.
I will definitely put it in the show notes.
And also I will put it in the chat right away.
So, can you give me an idea what kind of additional artifacts there are in that Arc 42 extension?
Like, is there another chapter or are there other types of diagrams or what does it contain?
What would these things actually explain or is it a completely new approach where you have completely different chapters?
The main approach is to extend existing chapters.
So as a part of this repository, listeners can find a clear description of new ideas.
So from the perspective of this AI extension, I'm mainly talking about four main views.
The first view describes, it talks about data.
So we need to document from which sources we take data, how we take this data, how this data was transformed.
The second view is about model behavior.
So we need to make model behavior the logic how models reasoning transparent.
The third view is about how model behaves in the runtime, how we deploy, what should we do, how model is retrained, how model is collect feedback, how it's deployed, what kind of CI, CD pipelines it contains to be delivered.
And the third one about risks.
Maybe it's the biggest extension, because we already have risks in Arc42, but this extension about risk explicitly lists new risks which AI brings to your architecture.
It explicitly asks you to mention who is owner of this risk, how you are planning to mitigate this risk, and this definitely will be very good.
help for everybody who is planning or who is have to fulfill these requirements of AI Act, which we have in 2026.
And at the end of the day, these five views are split.
Each of you contains subsections or maybe some kind of subtopics.
And at the end of the day, these five views are distributed as extension to each chapter.
So at the end of the day, it's the same structure, the same.
12 chapters, but they are extended with new tables, with new, I don't know, artifacts.
It's not really new artifacts because artifacts are the same, maybe a new description, new representation or new detailization.
Yeah, so thanks a lot.
That's very helpful.
So you were talking about how you would document the model.
And what you did to the model, how you trained the model.
Now with the most common things that are done these days, the model would be, you know, an LLM that is provided by Anthropic or OpenAI or whatever.
At least that's my assumption.
So how is it useful if I have such an application where I'm using that provided model, that LLM that is already given to me?
Again, the first question will be, as an architect, do you really need to use LLM as your AI model?
If yes, then my recommendation will be try to use a local one, because nowadays we have a rather major infrastructure to deploy your LLMs, open source LLMs on your local content.
For instance, even if you are a fan of wipe coding, you can create a local infrastructure with QIN model, which can help your developers to wipe code locally.
So first, Have a local LLM, try to solve your problem.
And in case if you really need, let's say deep, let's name it reasoning or deep capacities, you can use some kind of road gateway on top of your infrastructure and wrote this request to a bigger model hosted by Entropic.
I don't know.
OpenAI und so on, so on, so on.
Okay.
Yeah, I would say I'm against because of costs, because of, let's say, necessity, because of security.
So if you really believe that LLM is what you need, and even from the perspective of prototyping, you can just deploy it locally, use, I don't know, Lama and all this, I don't know, LangChine.
So we have rather wide and, as I said, mature.
Infrastructure, Environment, all sets to work with these LLMs locally.
Okay.
And you also say that there is an impact on AI on hexagonal architecture.
So what is hexagonal architecture?
What's the impact of AI on it?
Yeah, definitely.
As I already said, the good idea is to isolate and maybe the main requirement is to isolate AI from the rest of the system.
And in this case, we can use Hexagonal architecture and funny fact that you can consider your AI and your domain as separate hexagons interacting with each other through the respective ports.
And in case of AI as a center of hexagonal architecture, you will have ports.
Each port will be responsible for a respective input or output of your, not even input and output of your model, but input and output of Model as an artifact.
For instance, input can be, if we are talking about computer vision model, it can be, a port can be image input and adapters can be different cameras or file storage as images input.
As an output, you can use a port.
which will be responsible for the concrete engine, responsible for inferencing this model.
So you have an agnostic artifact, for instance, ONIX model, which can be inferenced on TensorRT through the respective adapter on the ONIX, and so on, so on, which gives you, at the end of the day, replaceability of your model when you need to retrain your model, when you need to change your model, when, for instance, you work with a...
Classical AI and suddenly decide to replace it is LLM.
You just replace the artifact, but the ports and this specification and contracts stay the same.
So it's try to isolate.
And the same about the classical approach.
So you use your domain with the ports and you have ports to communicate with domain and with a model.
Yeah, it's mainly the idea.
I would say propagating or trying to...
Talk about how hexagonal architecture can be used.
And again, hexagonal architecture is something we already know for, I don't know, for decades.
And just use it for...
Yes, apply the same ideas just for different things.
So that's great.
And it shows how the fundamentals don't actually change.
So here is a question by Sir Gray.
So he asks...
How do you deal with model versions from vendors which basically change monthly?
And I should add that actually in our stream, Ralf did the transcription and he was using some LLM to do that and also the sum up and so on.
And then eventually the model that he used was outdated and wouldn't be available anymore.
So we got a new one and the new one behaved quite differently.
So therefore the The sum-ups and the bullet points that the system concludes from the episodes were different.
I think that's the problem that Sir Gray is referring to.
So eventually you have a new system, a new version of an LLM that behaves very different for the same input and then you have a problem and that is something that we also have in architecture or in software development in general where usually we try to pin down our dependencies to the last dot, to have precisely that one, to have precisely rebuildable builds.
And obviously, if there is some LLM, that won't work.
So I guess that's the question.
Yeah, the answer is just to use hexagonal approach.
So you should instantiate the model artifact from interfaces, from which ports, for instance, considering different versions.
In classical or more or less classical use cases of AI and ML, it can be modeled, deployed or provided by vendor in an ONIX format.
It's some kind of standard format.
It's a file.
You can use different ONIX models from different vendors.
And what you need to do at the end of the day just to replace...
one file with another file, but the rest of the ports and adapters will stay the same.
So you have a port how to run this model and adapter to run this model either on one environment, hardware dependent, on another.
In the case of LLM, it can be the same.
So the general recommendation is to instantiate, to build an abstraction layer on top of your model.
It does not even...
Das heißt, man kann es nur mit den Abstrukturen starten und dann später kommt es für eine Hexagonal-Approach.
Es ist nur eine Abstruction, die einfach nur die Technikimplementation hat und die Kontrakte sollte nicht verändern.
Also, man kann es einfach nicht verändern.
Das ist die Frage.
So, wenn ich eine LLM aus dem Internet benutze, die ich nicht wirklich kontrolliere, will not be available anymore.
I'm basically screwed and there is no way around it.
So I should rather not do that.
Is that your advice?
So is it again an advice to use local LLMs instead?
You just need to use local LLMs.
In general, you need to build an interface layer on top of your LLM usage.
Because LLMs, they have a specific, for instance, not LLMs, any kind of system has a specific requirements for input.
You just need to build an abstraction which abstracts your business language, your business inputs from the inputs from the schema your concrete model needs.
And when you change one model with another one, you just need, you have the same port, but you need to change the adapter.
Okay.
Anything else that you want to mention?
Anything that I forgot to ask you?
Yeah, maybe the main recommendation from my side, as I already said, there is no, let's say, something new, some kind of new approaches or new recommendations.
And industrial AI does not really have specific problems.
It has, let's say, low tolerance for the problems all AI systems already have.
And all these extreme requirements with I don't know, reliability and interpretability.
From my perspective, it's a good engineering practice for any kind of AI system.
But in case of industrial AI, you have to implement this from day one.
But in case of classical, more or less classical web application and enterprise application, you still have a time when it happens.
That's why it's, yeah, approaches are the same.
Aber ja, es kann eine gute Illustration zu benutzen, diese Engineering-Practice aus dem ersten Tag.
Okay, so there is one, I mean, I was so proud to end the stream, however, there's still one question, by Hiral Dave.
And the question is, can we say that dependency injection rule will play a pivotal role going forward for such situations where models change monthly?
Ja, als eine von den Implementationen, eine Dependency Injection kann sein, ja.
Weil die Dependency Injection ist ja, eine von den Implementationen-Pattern, die kann man verwendet werden, die hinter dieser Hexagonal-Architektur.
Ja, definitiv.
So, wo du dann injectierst das Modell oder die Interface-Trad-Modell und dann...
Whatever, use the step model, but use the one that is injected and it won't be looked up by us itself.
Definitely.
Thanks a lot.
Thanks a lot for taking the time.
Talk to you soon at TechRiders.
I think we will meet in person there and have a great weekend.
Yeah, thanks a lot.
Yeah, thank you.
Ciao.
Bye.
Hier ein Hinweis in eigener Sache.
Softwarearchitektur im Stream ist live vor Ort.
Wir sind beim TechRider Summit im Juni in Köln dabei.
Mehr Infos dazu und einen speziellen Rabattcode für unsere Community findest du auf unserer Website.
Sei dabei, stell Fragen und komm auch gerne auf uns zu.