# Monetizing AI Video Generation: Seed Dance V2 and Business Applications

**Podcast:** The Startup Ideas Podcast
**Published:** 2026-04-17

## Transcript

Yeah, so my friend Sirio, one of the world's greatest AI creative minds, just took me through Seed Dance V2 and it blew my mind.
It blew my mind because the people who understand how to use this model, and this is the greatest AI creative model on the planet, are going to be able to create AI influencers, faceless accounts, original movies, literally, ads that convert, ads in any language on the planet.
This is the creative AI model we have all been waiting for.
In this episode, Sirio takes you through a bunch of these use cases and shows you how to do it: the prompts, the tactics, everything you need to know about Seed Dance V2.
And if you stick to the end, you will be a weapon at using Seed Dance V2 and using AI video to make money, create content that gets you followers, and more.
I've got one of my most creative friends on the podcast, Sirio.
Sirio, by the end of this episode, what are people going to learn?
Ooh, a lot of things.
First, why are all these image models, video models, and API providers so important to your business?
If you are starting some sort of AI app, or if you're trying to solve issues in the creative space with AI tools, we're going to talk about all the use cases.
Seed Dance V2 is here.
So we're going to try and explore all the use cases and how we can build on top of Seed Dance to solve particular issues and then productize around those workflows.
I love it.
Yeah, there's tons of tutorials and videos about, okay, Seed Dance V2 is here.
Look how cool this is.
But this is going to be a more practical guide to actually, okay, great.
How can you build a business around these models?
How can you make money from these models?
How can you create creative assets that are going to transform your business?
So that's my hope out of this episode.
And Sirio, if there's anyone who can deliver on this, it's you.
So excited to get into it.
Thank you for having me, Greg.
And I hope I can do my best here.
All right.
Okay.
So Seed Dance V2 officially launched today.
You can access it anywhere in all of your favorite AI tools.
Something very interesting about Seed Dance is that it's probably the first AI model that allows for multi-input generation.
So what does that mean?
If you've used AI tools, you know that you can usually use first or last frames and generate videos based on those two inputs.
But now with Seed Dance V2, we're able to generate videos with multiple inputs.
For example, we can generate videos with up to two images.
We can add up to two videos.
We can add an audio file.
And then what Seed Dance is going to do based on what we are prompting and what we're trying to achieve, it's going to combine all those inputs together and give us a final video.
That's something very interesting here because it allows us way more control.
And to show that, I'm going to go into my demo page.
Give me a second.
So this is what it actually means, right?
In here, we have a video, like a green screen.
And this is completely AI generated with Seed Dance, by the way.
Let's suppose that I'm a production studio, I'm creating this game, and I want to put some sort of demo on my social media or a quick video on my landing page.
And I want to replace these two people.
So I'm going to start with two different characters.
But at the same time, I want to replace the background.
Traditionally, this would take a very long time, but also it would cost a lot.
And what we're doing here is that we're using the multi-input feature inside Seed Dance.
And we're going to have our character one, our character two, and then we're going to have our background image.
And since this is, again, multi-input, we're going to reference all these inputs in the prompt by tagging them.
We're going to hit generate.
And it's going to take about 60 seconds for our video to generate here.
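A rough sketch of what "tagging" inputs in a prompt could look like. The `@name` tag syntax, the `Reference` type, and the file names are all assumptions for illustration; the tool's actual tagging format is not shown in the transcript. The helper only checks that every tag used in the prompt has an attached input.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    tag: str   # label used inside the prompt, e.g. "character1" (hypothetical)
    kind: str  # "image", "video", or "audio"
    path: str  # local file path or URL of the asset (made-up names below)


# The "@name" tag syntax is an assumption, not the tool's documented format.
TAG_PATTERN = re.compile(r"@(\w+)")


def check_prompt_tags(prompt: str, references: list[Reference]) -> list[str]:
    """Return tags used in the prompt that have no matching reference."""
    known = {ref.tag for ref in references}
    used = TAG_PATTERN.findall(prompt)
    return [tag for tag in used if tag not in known]


references = [
    Reference("source", "video", "green_screen.mp4"),
    Reference("character1", "image", "character_one.png"),
    Reference("character2", "image", "character_two.png"),
    Reference("background", "image", "background.png"),
]

prompt = (
    "Replace the two people in @source with @character1 and @character2, "
    "replace the green screen with @background, and keep the motion of "
    "the original video exactly the same."
)

missing = check_prompt_tags(prompt, references)
print(missing)  # an empty list means every tag has an attached input
```

Running a check like this before hitting generate is one way to catch a prompt that mentions an input you forgot to upload.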
And again, the purpose of multi-input, as I said, is to get very creative with our editing process.
Seed Dance V2 is not only a video generator, it is a video editor.
That's how I see it.
It's almost like Nano Banana Pro, whereby the use cases are unlimited.
It's not just producing an image through text, or in this case a video through text or a video from an image. You're combining multiple inputs to produce an output that's way more complex than traditional image-to-video models.
You can do something very similar with Kling 3, but the quality of Seed Dance V2, based on my testing so far, is unmatched.
And we're going to see all the use cases and all the demo videos today.
And I hope that you make the decision on your own.
But this is the video that it generated.
This is pretty crazy.
Let me try and pull up the original video input, the green screen here.
This one over here is what Seed Dance V2 generated.
The motion control is crazy here.
Yeah.
And it's simply from a prompt.
You're literally telling it to control the motion, to keep the motion of the original video exactly the same.
This is all natural language.
First of all, this just exceeded my expectations.
I think this is beautiful.
Two questions for you.
One is, from a prompt perspective, did you just manually create that prompt, or is that something that you used an LLM to optimize?
You can definitely use an LLM to optimize.
I think that Claude does a phenomenal job, and it's the best by far, especially the 4.6 version, Opus 4.6.
I've used GPT before, but I do think that Claude does understand prompt engineering for vision models a bit better, at least in my experience, and I could be biased.
But the thing with Seed Dance is that the more detail you give it, the better it does.
That's different from other models, where you can be simple and to the point. For example, with Kling 3, if you're simple and straightforward and you're not using a lot of tokens or words in your prompt, then it might do a better job.
What I'm figuring out with Seed Dance is that you have to be highly specific if you want to get very high quality output, especially if you're doing something that relates to preserving character identity, or to preserving particular motions or particular transitions throughout the video.
So I think that both work.
I like to start my prompt myself, and then most of the time I will optimize it with Claude 4.6.
Cool.
And before we go into the next use case: you're a stylish guy, you know. You're wearing your hat that says Los Angeles, upside down.
I feel like you always got good style.
Every time I see you, you've got good style.
A part of why this video crushed it was, yeah, Seed Dance V2 did a good job, but also your reference images are really on point.
How were you able to find those reference images and videos, and any tips on how to do that?
Everything starts with a very good idea, a very good source reference, source image.
What is your vision?
You can describe your vision, but the second that these LLMs or these models see a source reference, they're able to understand your taste and they're able to turn that reference image into something more concrete and more tangible for you.
So always focus on having a great source image, source reference, that matches your idea.
It's like in any traditional art. I'm also a painter.
I draw, I sculpt.
And for me, in order to visualize my idea, I have to have something in front of me that I can see so I can say, hey, I'm inspired.
I want to create something similar to this.
And it's the same thing with LLMs.
Think of them as humans, as if they were your assistants or your friends. That's how they understand inputs.
So give it a very good source reference of whatever you're trying to achieve and then follow it with a very specific prompt.
All right.
Should we keep going?
Yes.
Okay.
I'm going to showcase a video that I did myself.
This is a virtual try-on video.
I recorded myself out there in Canada, in Montreal.
It was like minus 30 degrees.
I was wearing shorts and I was like, oh, I wonder whether AI can put me into this outfit.
So now I want AI to actually put this outfit on me and have a bear walk by.
And this would be, you know, helpful if maybe you're doing, like, an ad, maybe.
Yes.
Let's say that you have an actor, you did an e-com shoot, and you want to have the exact same motion of the model and you just want to replace the clothes that they are wearing, because you're creating this very cool transition or just because you want a very clean style throughout your e-commerce assets.
So let's see what it came up with.
It is minus 30.
And I'm wondering whether AI can help me put on this outfit.
Okay.
How about have a bear walk by?
Look at the details.
Look at how the bear walks by and then you have all the footprints.
Can you stop it for a sec?
Yeah.
I cannot tell that your outfit is AI.
Like, honestly.
Not only that, but what I'm very impressed with is that my face is the same.
I'm very familiar with all the AI models, open source, closed source, everything.
And I can tell you which video is what. I can tell you if it's generated with Kling, if it's generated with Seed Dance 1.5, with Wan. But when I saw this video myself, I'm like, it looks like me.
There's no distortion in the face, which is crazy.
And yes, the outfit too, it was able to match it exactly. Look at the boots.
Look at the pattern of the pants over here.
So if you go into our source reference over here, yep.
You see how it has this specific pattern, this cut that's dark.
Yep.
If you go into our video, it's here.
Yep.
It's crazy.
Okay.
How about have a bear walk by?
Look at the footprints.
It's, like, looking at the bear.
It's tracking the bear with the eyes and the head.
So it understands the input very well.
And mind you, the input here was very simple.
I didn't go into details.
I could have been way more specific.
I could have actually described my outfit so that the outfit could have been more accurate.
So it's phenomenal.
And again, it doesn't take more than 60 seconds.
And this tool that we're in, Enhancer, you're the founder of this, right?
Yes, sir.
I'm the founder of this.
So you can use Enhancer not just with Seed Dance V2, right?
Yeah, you can use it with any model.
Another cool use case is translating, for everyone that wants to build a translation app.
That's going to take 30 seconds to translate. And not only that, but also replacing the character in the frame.
Take a look at this one.
So we have this original video in Chinese.
She's showcasing the glasses.
But now your company operates in the United States and you want to showcase the same glasses.
You want to have the same asset.
You want to have her move exactly the same because you're A/B testing the ads and you want everything to look exactly the same, but you want the language to be different and also the model to be different because you're targeting different demographics.
So here's our reference model.
This is a model that we generated previously.
And now we want this model to replace the woman, but also we want her to speak in English.
And this is the prompt that we're using.
You can stop and screenshot this.
Go and use these assets.
We're inside video editor.
We're going to hit generate again.
You can take your time to read the prompt.
And what it's going to do is that it will replace the woman in the first video with our source image.
And it's going to translate everything she's saying from Chinese to English in a matter of seconds.
So let's see how it does.
Yeah, this is really interesting.
Also, just creating ads and creating content in, like, a hundred languages, right?
Yeah.
It's A/B testing at its finest.
Yeah.
And getting higher conversion rates, just getting cheaper ads because of that.
Optimizing, optimizing, optimizing.
Yeah.
All right.
There it is.
So what do you think she said?
She said, I love you, Sirio.
You wear my new glasses.
Let's see.
We translated the original video from Mandarin Chinese.
This one's amazing.
It's flattering and versatile.
Must have.
So you see the wink.
Let me go back into the reference video.
It feels like she's selling the glasses that I'm wearing back to me.
And I want, it's so good.
She's doing such a good job selling it.
I want another pair.
That's how good of a job this is doing.
Look at the wink.
Look at the way that she puts her hand on her glasses.
It's the exact same motion.
This one's amazing.
Look at the blur in the camera.
The focus, the motion.
This one's amazing.
It's flattering and versatile.
Must have.
Right.
Nailed it.
This one is very interesting.
Look at this video.
What we're going to do here: this is an ad.
Now we have a package, right?
And this is just a traditional 3D render.
There's no branding in the package.
This is meant for like evergreen.
Okay.
Like a template.
You can buy these templates.
What if we actually replace that package with this image?
So what we're doing right now, again, here's a prompt.
You can screenshot this.
We're replacing only the package and keeping everything the same.
Generate. You can find any 3D asset out there.
You can start applying texture to all these 3D assets by combining the source reference with image references and just literally telling it: make sure that you put the texture from image number one into the 3D render in video number two.
We could do this with Nano Banana in images.
And now we're doing this with Seed Dance V2, with 3D renders in videos, which is quite insane.
So that template, was that a, was that found on some, like a stock video website?
That was entirely generated, but you can go into Freepik.
For example, I think that they have a bunch of templates like that.
Not quite sure if it's a video template that they have, but you can take an image template.
You can turn it into a video and then you can put everything together and then you can create this templated video.
Yeah.
Let's take a look at this.
There you go.
The logo is completely consistent, and it understood that the background had to be yellow. It kept everything the same.
Is Seed Dance V2 the best video model to ever exist, for now?
Yes.
Yeah.
Like for, uh, no VO.
Yeah.
It's the view of four.
But by far, it is the best out there in terms of realism, in terms of motion, in terms of quality.
It's only up to 720p for now.
And when they release their 1080p version, it's going to be a game changer for anyone that's creating digital assets.
Cool.
Do we have time for a couple more?
Yeah, we do.
So I want to show you two other use cases that are very interesting that everyone would love.
The first one is extending videos.
You have a three-second video, you have a 10-second video, and you want to extend it to 15 more seconds while keeping everything the same.
We could not do this before.
Google Veo 3.1 kind of tried, but look at this.
So we have our three-second video here, and we don't know what's happening next.
We can recreate this entire scene.
Here is the prompt.
You can screenshot it.
You can take a look at it.
And then, again, we're using our video extender feature.
Hit generate.
And what it's going to do is that it's going to continue the actual storyline based on what we said in the prompt while keeping everything consistent.
This is use case number one of video extension, and there's a different use case for a video extension that would actually fill in the middle of the video.
This is extending the last bit of the video, and there's a use case that I'm going to show you after we explore this one where it's going to fill in the gaps.
So we have two videos, and it's going to figure out what goes in the middle, which is insane to me.
Yeah, I mean, if this could do this, this is big.
Because this has been a pain point for me personally with playing with some of these models.
Like ads, yeah.
Yeah, exactly, with ads, literally ads.
Or just traditional filmmaking, it doesn't have to be ads.
Like there's something where you just want to have at least three more seconds of that video.
You cannot do that.
Let's take a look at this.
So it extended the video from the point where the video cut, right?
You have a different scene.
And it's the exact same last frame.
This is use case number one of extending your videos.
There's a bunch of others, but there's one last thing I want to show, which is AI influencers and lip syncing.
This is the best model for you to generate AI influencers.
And they can do anything you want them to do.
Again, as I said.
Screenshot this prompt over here.
The prompt is highly specific.
This is a source image generated with Nano Banana Pro.
You're going to go and use the asset.
And the way that the influencers or avatars lip sync is simply that you prompt it to say particular things.
In the prompt, you go and say, hey, she is saying.
And then you go and say, "Give me a second."
Or she's saying, "This is what I mean."
And.
What do you call these?
Quotation.
In quotation.
Yeah.
In quotation marks.
So everything inside quotation marks is what the model or the avatar will say.
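The quotation-mark convention can be sketched as a small helper. This is only an illustration: the function names, the "She is saying:" wording, and the scene text are assumptions, not the prompt shown on screen. It builds a prompt with each spoken line in quotation marks, then pulls the quoted spans back out so you can check what the avatar is expected to say.

```python
import re

# Matches text inside straight double quotation marks.
QUOTED = re.compile(r'"([^"]+)"')


def build_dialogue_prompt(scene: str, lines: list[str]) -> str:
    """Append each spoken line to the scene description in quotation marks."""
    spoken = " ".join(f'She is saying: "{line}"' for line in lines)
    return f"{scene} {spoken}"


def spoken_lines(prompt: str) -> list[str]:
    """Return every quoted span, i.e. what the avatar is expected to say."""
    return QUOTED.findall(prompt)


# The scene description here is a made-up placeholder.
prompt = build_dialogue_prompt(
    "A woman talks to the camera right after moving.",
    ["Give me a second.", "This is what I mean."],
)
print(prompt)
print(spoken_lines(prompt))
```

Keeping descriptions outside the quotes and dialogue inside them makes it easy to review a long prompt for stray text that would otherwise be spoken aloud.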
Very simple, natural language.
You just tell it what you want it to do.
And it's going to understand exactly.
Now, of course, there are ways to prompt things so they look and feel more realistic, especially emotions: how to control emotions, how to prompt emotions.
You do not prompt emotions by saying, hey, the character is sad or the character is happy.
You have to describe the muscle movements.
Right.
Because with just saying the character is sad, OK, there's not a lot of control.
Sad how? There are thousands of ways for someone to be sad.
But by describing the muscle movements, by describing the transition in emotion, transition in tone, in body language, it's able to achieve more realistic results.
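To make the contrast concrete, here is a minimal sketch of composing an emotion as visible movement rather than as a label. The `EmotionBeat` structure and every physical description in it are invented for illustration; they are not taken from the prompt used in the episode.

```python
from dataclasses import dataclass


@dataclass
class EmotionBeat:
    """One step of an emotional transition, written as visible movement."""
    face: str   # what the facial muscles do
    body: str   # posture or gesture
    voice: str  # tone or pacing of speech


def describe(subject: str, beats: list[EmotionBeat]) -> str:
    """Join the beats into one prompt passage, in order."""
    parts = []
    for i, beat in enumerate(beats):
        lead = f"{subject}:" if i == 0 else "Then:"
        parts.append(f"{lead} {beat.face}; {beat.body}; {beat.voice}.")
    return " ".join(parts)


vague = "The character is sad."  # names the emotion, gives little control

# Hypothetical wording; swap in whatever movements fit your scene.
specific = describe(
    "The character",
    [
        EmotionBeat(
            face="inner eyebrows pull up and together, lips press shut",
            body="shoulders drop and the gaze falls to the floor",
            voice="speech slows and gets quieter",
        ),
        EmotionBeat(
            face="jaw tightens and the chin trembles slightly",
            body="one hand comes up to cover the mouth",
            voice="a short, shaky breath before the next word",
        ),
    ],
)
print(specific)
```

Splitting the description into ordered beats also covers the transition in emotion, tone, and body language, not just a single static expression.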
So this is what we're going to see here.
That's why this is a very long prompt, because I'm being very specific with what they're saying, because the aim for this video is that it doesn't look AI.
OK.
Yeah.
Give me a second.
This is what I mean.
The way I breathe, the way I talk right after moving.
It's all generated inside Enhancer.
It's crazy.
Let me show you.
I have goosebumps on that one.
Like, that looked real.
No, let me show you another one right here.
So this is our source reference. Let's say that we want to generate ads.
Right.
And we have our product and our product has some sort of text.
And again, one of the main flaws of other video generators is that the text was usually warping or changing as the video was generating.
Right.
Then we have our prompt over here.
Again, we're very specific with our prompt: what they're saying, how they're saying it, how we're structuring things.
So we're going to change the prompt, we're going to hit generate, and then we're going to have her talk about this product that she never tried, because this AI model does not have thoughts or does not have taste.
She doesn't exist.
And the product was never sent to her.
So that's the beauty of AI models, because you can create a version of yourself if you want, or you can create a completely different IP, and the brand does not have to send you the actual clothes, which would cost them a few bucks to actually ship to you.
Right.
And now multiply that by, like, thousands of influencers, and it becomes very costly.
Now the brand can just be like, hey, can you just place this inside your image with Nano Banana Pro?
It's going to keep it very consistent.
Can you just generate it with Seed Dance V2?
Or maybe we can do it for you if you want.
And there you go.
Like unlimited content.
Very cheap.
There he goes.
OK, quick taste test, huh?
Wait, that's actually nice.
It's not super sweet.
It's really clean.
I wasn't expecting that.
Yeah, I drink this.
Look at the text, the text is quite spot on.
It's not changing.
Insane.
All right, Sirio, this Seed Dance V2 feels like the best creative model to ever exist.
With it being so good, just walk us through quickly how to think about, you know, why would we use any other model?
Are there any benefits to using another model?
Or, as of recording this, should we just make this the default?
I think that at some point what will happen is the same thing that happened with Nano Banana, where it became the best AI image editing model.
Seed Dance to me seems like it's the best by far.
And the reason why it's the best by far is because you can animate UX/UI, you can animate logos, you can place logos within a video.
You can do so many things that other models are not capable of, and you can generate very good lip syncing.
Now, of course, some others are very good at other things, maybe emotion control. Kling 3 does a very good job at that.
And there are other models that are fine-tuned so that the images look way more low fidelity, like more realistic, so they don't look like they have the cinematic feel.
Kling 3 has a cinematic feel.
When you look at the video, you know that it's very good at producing cinematic videos. I'll show you an example of another video model that we fine-tuned inside Enhancer. It's called Enhancer V4.
And what this model does, again, it's not the highest fidelity video model, it does not produce these crazy transitions, it might not keep the character extremely consistent, it might not have multi-input references, but it produces...
Hi, guys.
It's crazy, like, I am not even real.
Other models probably cannot do the same thing, because this one has a different type of color scheme that it produces, a different depth, different ways that it treats the background, different ways that it treats the subjects.
And this is a different video model that is fine-tuned particularly for this exact use case, for a talking-head video generator.
So I would not say that Seed Dance V2 will replace everything that's out there.
Because it really depends on what you're trying to achieve with the model and how you are using the model.
But I think it's going to be, for now, the default model to generate and edit videos, especially editing videos. Maybe not the best video generator, but it's phenomenal, and it will be the state-of-the-art video editor out there for any use case, really.
What's the best video generator?
Well, I mean, for now, seems to be Seed Dance, right?
Yeah, seems to be Seed Dance.
But it sounds like what you're saying is that the daily driver's going to be Seed Dance, for generation and editing.
However, there are some use cases where certain models have a different visual look, like you were saying with Enhancer V4, where it's like, yeah, if I'm trying to go for that look, I might just use that for a specific use case.
And also it depends on what the, again, what the user is used to.
There's things that the user really likes or the creative in this case really likes a particular model and they just want to stick with it because it's good enough for what they're doing.
And maybe it's cheap, right?
Maybe it's faster.
Maybe they're just producing low fidelity video for social media and they don't want to spend like $3 on a five second video like with Google Veo 3.
Or at least back in the days when it was about $3 per video for, what is it, a five or eight second clip.
And again, people that are using generative tools are not using them just for fun.
They are normally using them or monetizing through them.
Like they either have some sort of a business, maybe they're a service provider, maybe they're creative, or maybe they're building their own app.
It's not that they're coming to these platforms and spending all the money without making something back.
So, of course, price matters.
And depending on the use case, then some models become more relevant than others.
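As a back-of-the-envelope check on why price matters, the roughly $3 per clip figure mentioned above works out differently depending on whether the clip was five or eight seconds. The helper names and the 200-clips-per-month volume below are assumptions for illustration only, not real pricing data.

```python
def cost_per_second(price_per_clip: float, clip_seconds: float) -> float:
    """Price of one generated second of video."""
    if clip_seconds <= 0:
        raise ValueError("clip_seconds must be positive")
    return price_per_clip / clip_seconds


def monthly_spend(price_per_clip: float, clips_per_month: int) -> float:
    """Total generation spend for a month at a fixed per-clip price."""
    return price_per_clip * clips_per_month


# $3 per clip is the rough figure quoted above; the clip was 5 or 8 seconds.
for seconds in (5, 8):
    print(f"{seconds}s clip: ${cost_per_second(3.0, seconds):.3f} per second")

# 200 clips per month is a made-up volume for illustration.
print(f"200 clips: ${monthly_spend(3.0, 200):.2f}")
```

At that price, someone producing a couple hundred short clips a month is spending hundreds of dollars on generation alone, which is why a cheaper or faster model can be the better choice for low-fidelity social content.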
Last question before we head out.
Adobe is a $106 billion company.
Not asking for financial advice, but what do you think happens to Adobe?
I mean, they were the leaders in the creative suite, right?
They were the go-to for everything.
For 20 years, more.
You know, is Adobe, what happens to Adobe over the next five years?
Your best guess.
Probably Adobe might acquire these AI generative tools.
I think it would be a smart move if they do.
All Enhancer competitors, not going to mention them.
Maybe they acquire Enhancer, I don't know.
But I believe, I still believe that Adobe is relevant.
Especially for creative professionals.
Who want way more control.
Who actually want to edit or cut that frame.
Who actually want to produce high fidelity videos that are 8K.
And, again, there's more to them than simply a creative director that's just starting out with AI tools.
I think that every time I produce something with AI, I still have to edit it.
It will not produce the perfect output for me.
There's still this post-production phase that happens.
And I believe it will always exist.
And there's always going to be a need for this.
Like, there's a need for digital photography, right?
Because I believe that in the future, most photography is going to be prompted unless there's an event.
And there's also a need for Polaroids.
And there's also a need for film photography.
Because those things are way more technical.
And while I think that AI or technology advances, still, creatives need to have full control of their outputs.
And I think that Adobe, what they do best is...
Yeah, I mean, basically what you're saying is, like, Adobe is, like, the place for creative professionals.
Yeah.
And these tools, Veo 3, Seed Dance V2, things like...
You know, these models.
That's just the first step, right?
Yeah.
It's simply the first step.
I believe that there's always going to be a need for post-editing.
And Adobe is the app where you do the post-editing.
It's not the app where you initially create.
Because we create videos every single day with our phones.
And then we go to Adobe.
So we could generate videos every single day.
We could create videos with our laptops and phones.
And then we go to Adobe.
Even though Adobe now is trying to be the place where you create and also edit and also post-produce.
Which is a smart move.
However, it would be ideal for me if they focused on things that actual pro creatives really need and want.
Which is not necessarily generating the content in Adobe.
But how to use Adobe tools to have an agent-like feature where it just edits for you, focusing more on the post-production than the actual production.
Totally.
Makes sense to me, Sirio.
Thank you so much for coming on the show.
I'm going to include links where you can follow Sirio on social.
Links to Enhancer.
And dude, thank you so much for showing us all these different use cases.
This is really cool.
Thank you, Greg.
Appreciate it.
I appreciate you.
