
[Wan] Optimize time & memory#12780

Open
Fabrice-TIERCELIN wants to merge 8 commits into huggingface:main from Fabrice-TIERCELIN:wan_optimization

Conversation


@Fabrice-TIERCELIN Fabrice-TIERCELIN commented Dec 3, 2025

What does this PR do?

This PR reduces the time and memory used when running Wan. I have successfully tested the performance improvement, and I have done a crash test (putting an error in place of my code to confirm the code path is actually executed). The output remains the same.

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@JoeGaffney

Hey, it would be interesting to know the rough before and after metrics.

It would be great if this does reduce memory, as Wan's memory usage really shoots up as resolution and duration increase.

@Fabrice-TIERCELIN
Author

I added this code to benchmark the change:

import time
...
                start = time.time()
                x1 = hidden_states[..., 0::2]
                x2 = hidden_states[..., 1::2]
                end = time.time()
                print("!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! BENCHMARK !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!")
                print(end - start)

This runs 160 times during my startup. The columns below are the timing before the change, after the change, and the difference (in seconds):

Before After Diff
0.0421395301818847 0.0186080932617187 -0.0235314369201660
0.0091154575347900 0.0033469200134277 -0.0057685375213623
0.0087890625000000 0.0034732818603516 -0.0053157806396484
0.0093810558319092 0.0033228397369385 -0.0060582160949707
0.0087580680847168 0.0031294822692871 -0.0056285858154297
0.0088031291961670 0.0031223297119141 -0.0056807994842529
0.0055754184722900 0.0031778812408447 -0.0023975372314453
0.0071923732757568 0.0030803680419922 -0.0041120052337647
0.0078923702239990 0.0031263828277588 -0.0047659873962402
0.0077276229858398 0.0031015872955322 -0.0046260356903076
0.0077912807464600 0.0031595230102539 -0.0046317577362061
0.0087156295776367 0.0030493736267090 -0.0056662559509277
0.0075991153717041 0.0031275749206543 -0.0044715404510498
0.0085408687591553 0.0030329227447510 -0.0055079460144043
0.0046560764312744 0.0031311511993408 -0.0015249252319336
0.0051045417785645 0.0030672550201416 -0.0020372867584229
0.0057218074798584 0.0032899379730225 -0.0024318695068359
0.0074028968811035 0.0030341148376465 -0.0043687820434570
0.0047314167022705 0.0031266212463379 -0.0016047954559326
0.0048007965087891 0.0029926300048828 -0.0018081665039063
0.0061461925506592 0.0031211376190186 -0.0030250549316406
0.0068418979644775 0.0030431747436523 -0.0037987232208252
0.0066962242126465 0.0031459331512451 -0.0035502910614014
0.0049102306365967 0.0029878616333008 -0.0019223690032959
0.0093643665313721 0.0030882358551025 -0.0062761306762695
0.0074284076690674 0.0029993057250977 -0.0044291019439697
0.0057675838470459 0.0031495094299316 -0.0026180744171143
0.0071196556091309 0.0030658245086670 -0.0040538311004639
0.0077340602874756 0.0037443637847900 -0.0039896965026856
0.0077130794525146 0.0030124187469482 -0.0047006607055664
0.0061564445495605 0.0031197071075439 -0.0030367374420166
0.0079400539398193 0.0030746459960938 -0.0048654079437256
0.0058634281158447 0.0031714439392090 -0.0026919841766357
0.0081622600555420 0.0029609203338623 -0.0052013397216797
0.0069501399993896 0.0031454563140869 -0.0038046836853027
0.0078878402709961 0.0030357837677002 -0.0048520565032959
0.0080575942993164 0.0031151771545410 -0.0049424171447754
0.0050995349884033 0.0030565261840820 -0.0020430088043213
0.0080208778381348 0.0031700134277344 -0.0048508644104004
0.0065584182739258 0.0030229091644287 -0.0035355091094971
0.0053703784942627 0.0031318664550781 -0.0022385120391846
0.0052416324615479 0.0030868053436279 -0.0021548271179199
0.0054197311401367 0.0030879974365234 -0.0023317337036133
0.0049960613250732 0.0030584335327148 -0.0019376277923584
0.0074501037597656 0.0031375885009766 -0.0043125152587891
0.0073106288909912 0.0029852390289307 -0.0043253898620606
0.0046367645263672 0.0031688213348389 -0.0014679431915283
0.0049192905426025 0.0030219554901123 -0.0018973350524902
0.0060331821441650 0.0031657218933105 -0.0028674602508545
0.0115311145782470 0.0030133724212646 -0.0085177421569824
0.0118765830993652 0.0032989978790283 -0.0085775852203369
0.0052525997161865 0.0031018257141113 -0.0021507740020752
0.0048851966857910 0.0034267902374268 -0.0014584064483643
0.0111300945281982 0.0031645298004150 -0.0079655647277832
0.0047070980072021 0.0031282901763916 -0.0015788078308105
0.0045855045318604 0.0032043457031250 -0.0013811588287354
0.0094137191772461 0.0030689239501953 -0.0063447952270508
0.0093262195587158 0.0030744075775146 -0.0062518119812012
0.0091929435729980 0.0032446384429932 -0.0059483051300049
0.0071072578430176 0.0030021667480469 -0.0041050910949707
0.0094301700592041 0.0033464431762695 -0.0060837268829346
0.0092351436614990 0.0032732486724854 -0.0059618949890137
0.0054991245269775 0.0033721923828125 -0.0021269321441650
0.0046093463897705 0.0031516551971436 -0.0014576911926270
0.0101990699768066 0.0039906501770020 -0.0062084197998047
0.0113568305969238 0.0030558109283447 -0.0083010196685791
0.0070419311523438 0.0031654834747314 -0.0038764476776123
0.0086443424224854 0.0030453205108643 -0.0055990219116211
0.0099291801452637 0.0031201839447021 -0.0068089962005615
0.0091631412506104 0.0031297206878662 -0.0060334205627441
0.0095853805541992 0.0033917427062988 -0.0061936378479004
0.0111463069915771 0.0034832954406738 -0.0076630115509033
0.0105581283569335 0.0034265518188477 -0.0071315765380859
0.0102081298828125 0.0030958652496338 -0.0071122646331787
0.0094234943389893 0.0032963752746582 -0.0061271190643311
0.0081713199615479 0.0032901763916016 -0.0048811435699463
0.0074520111083984 0.0034911632537842 -0.0039608478546143
0.0089154243469238 0.0031881332397461 -0.0057272911071777
0.0088458061218262 0.0033829212188721 -0.0054628849029541
0.0096502304077148 0.0033123493194580 -0.0063378810882568
0.0244009494781494 0.0078990459442139 -0.0165019035339355
0.0057473182678223 0.0030648708343506 -0.0026824474334717
0.0046644210815430 0.0031671524047852 -0.0014972686767578
0.0047345161437988 0.0029947757720947 -0.0017397403717041
0.0048987865447998 0.0030910968780518 -0.0018076896667481
0.0067203044891357 0.0029950141906738 -0.0037252902984619
0.0048406124114990 0.0030899047851563 -0.0017507076263428
0.0047273635864258 0.0029575824737549 -0.0017697811126709
0.0061659812927246 0.0030968189239502 -0.0030691623687744
0.0046279430389404 0.0030541419982910 -0.0015738010406494
0.0047345161437988 0.0031464099884033 -0.0015881061553955
0.0069465637207031 0.0029978752136230 -0.0039486885070801
0.0075678825378418 0.0031361579895020 -0.0044317245483398
0.0048172473907471 0.0030024051666260 -0.0018148422241211
0.0048902034759521 0.0030891895294189 -0.0018010139465332
0.0048506259918213 0.0029928684234619 -0.0018577575683594
0.0049655437469482 0.0031177997589111 -0.0018477439880371
0.0050978660583496 0.0030713081359863 -0.0020265579223633
0.0050847530364990 0.0031337738037109 -0.0019509792327881
0.0047094821929932 0.0030446052551270 -0.0016648769378662
0.0046484470367432 0.0032041072845459 -0.0014443397521973
0.0062952041625977 0.0029749870300293 -0.0033202171325684
0.0047221183776855 0.0030632019042969 -0.0016589164733887
0.0046348571777344 0.0029704570770264 -0.0016644001007080
0.0047054290771484 0.0030844211578369 -0.0016210079193115
0.0045263767242432 0.0030784606933594 -0.0014479160308838
0.0047385692596436 0.0031981468200684 -0.0015404224395752
0.0048210620880127 0.0030486583709717 -0.0017724037170410
0.0045921802520752 0.0031194686889648 -0.0014727115631104
0.0047745704650879 0.0030097961425781 -0.0017647743225098
0.0049242973327637 0.0031056404113770 -0.0018186569213867
0.0046339035034180 0.0029680728912354 -0.0016658306121826
0.0048007965087891 0.0063762664794922 0.0015754699707031
0.0047740936279297 0.0031573772430420 -0.0016167163848877
0.0047769546508789 0.0030333995819092 -0.0017435550689697
0.0073404312133789 0.0030534267425537 -0.0042870044708252
0.0077805519104004 0.0041611194610596 -0.0036194324493408
0.0048308372497559 0.0030725002288818 -0.0017583370208740
0.0047106742858887 0.0032036304473877 -0.0015070438385010
0.0047028064727783 0.0045213699340820 -0.0001814365386963
0.0046601295471191 0.0031633377075195 -0.0014967918395996
0.0045568943023682 0.0031034946441650 -0.0014533996582031
0.0048530101776123 0.0035943984985352 -0.0012586116790772
0.0046441555023193 0.0034363269805908 -0.0012078285217285
0.0048103332519531 0.0034394264221191 -0.0013709068298340
0.0047457218170166 0.0033185482025146 -0.0014271736145020
0.0047028064727783 0.0033512115478516 -0.0013515949249268
0.0046455860137939 0.0030872821807861 -0.0015583038330078
0.0047202110290527 0.0031988620758057 -0.0015213489532471
0.0046472549438477 0.0030579566955566 -0.0015892982482910
0.0044870376586914 0.0032095909118652 -0.0012774467468262
0.0066292285919189 0.0030336380004883 -0.0035955905914307
0.0045423507690430 0.0031654834747314 -0.0013768672943115
0.0057387351989746 0.0030357837677002 -0.0027029514312744
0.0047850608825684 0.0033471584320068 -0.0014379024505615
0.0044982433319092 0.0030486583709717 -0.0014495849609375
0.0045578479766846 0.0031671524047852 -0.0013906955718994
0.0064446926116943 0.0030281543731689 -0.0034165382385254
0.0066962242126465 0.0031011104583740 -0.0035951137542725
0.0046284198760986 0.0030128955841064 -0.0016155242919922
0.0046155452728271 0.0030755996704102 -0.0015399456024170
0.0045783519744873 0.0029969215393066 -0.0015814304351807
0.0047678947448730 0.0030941963195801 -0.0016736984252930
0.0046646595001221 0.0030879974365234 -0.0015766620635986
0.0045282840728760 0.0031120777130127 -0.0014162063598633
0.0047643184661865 0.0030353069305420 -0.0017290115356445
0.0049471855163574 0.0030617713928223 -0.0018854141235352
0.0045933723449707 0.0030624866485596 -0.0015308856964111
0.0044887065887451 0.0031399726867676 -0.0013487339019775
0.0046794414520264 0.0030252933502197 -0.0016541481018066
0.0047705173492432 0.0030777454376221 -0.0016927719116211
0.0045073032379150 0.0029671192169189 -0.0015401840209961
0.0047283172607422 0.0030846595764160 -0.0016436576843262
0.0045824050903320 0.0030794143676758 -0.0015029907226563
0.0047013759613037 0.0031030178070068 -0.0015983581542969
0.0068626403808594 0.0030944347381592 -0.0037682056427002
0.0079987049102783 0.0031914710998535 -0.0048072338104248
0.0101602077484130 0.0030560493469238 -0.0071041584014892
0.0124363899230957 0.0031447410583496 -0.0092916488647461
0.0094563961029053 0.0031492710113525 -0.0063071250915527

So, summed over the 160 executions, the time is reduced by 0.555660486221313 seconds.
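As an aside, `time.time()` can be coarse and single-shot timings of operations this small are noisy. A more robust micro-benchmark pattern is sketched below, using `time.perf_counter` with warm-up and repeats over a NumPy stand-in array (the shape is made up for illustration; for real GPU tensors, a `torch.cuda.synchronize()` before each reading would also be needed):

```python
import time
import numpy as np

# Hypothetical stand-in for `hidden_states`; shape chosen for illustration only.
hidden_states = np.random.rand(1, 77, 3072).astype(np.float32)

def bench(fn, repeats=200):
    # Warm up once, then time many repeats and report the best run,
    # which filters out scheduler and allocator noise.
    fn()
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)

# Reshape-into-pairs variant (mimics unflatten + unbind) vs. strided slicing.
t_unbind = bench(
    lambda: np.split(hidden_states.reshape(*hidden_states.shape[:-1], -1, 2), 2, axis=-1)
)
t_slice = bench(lambda: (hidden_states[..., 0::2], hidden_states[..., 1::2]))
print(f"pair-reshape variant: {t_unbind:.2e}s  strided slicing: {t_slice:.2e}s")
```

This reports the best of many runs rather than a single noisy sample, which makes small per-call differences easier to trust.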

@Fabrice-TIERCELIN
Author

Are you waiting for me to do something?

@github-actions
Contributor

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions Bot added the stale Issues that haven't received updates label Jan 22, 2026
@Fabrice-TIERCELIN
Author

Please approve my PR 🥺

The stale bot is threatening to close it.

@github-actions github-actions Bot removed the stale Issues that haven't received updates label Jan 23, 2026
@Fabrice-TIERCELIN
Author

@sayakpaul or @delmalih, please approve 🥺

Member

@sayakpaul sayakpaul left a comment


Thanks for this!

Can you provide a script that measures the latency and memory consumption with and without this PR?

@Fabrice-TIERCELIN
Author

Here are the two modifications to compare:

Without improvement

  1. In src/diffusers/models/transformers/transformer_wan.py, import time
  2. In src/diffusers/models/transformers/transformer_wan.py, at line 109, replace this:
                x1, x2 = hidden_states.unflatten(-1, (-1, 2)).unbind(-1)

... with this:

                start = time.time()
                x1, x2 = hidden_states.unflatten(-1, (-1, 2)).unbind(-1)
                end = time.time()
                print("!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! BENCHMARK !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!")
                print(end - start)

With improvement

  1. In src/diffusers/models/transformers/transformer_wan.py, import time
  2. In src/diffusers/models/transformers/transformer_wan.py, at line 109, replace this:
                x1, x2 = hidden_states.unflatten(-1, (-1, 2)).unbind(-1)

... with this:

                start = time.time()
                x1 = hidden_states[..., 0::2]
                x2 = hidden_states[..., 1::2]
                end = time.time()
                print("!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! BENCHMARK !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!")
                print(end - start)

The duration will be logged in both cases. The results are shown above.
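For reference, the two variants are elementwise equivalent: unflattening the last dimension into pairs and unbinding yields the even- and odd-indexed elements, which is exactly what the strided slices select. A minimal sketch of that equivalence (using NumPy as a stand-in for the PyTorch tensors so it runs without a GPU; the shape is made up for illustration):

```python
import numpy as np

# Stand-in for `hidden_states`; any shape with an even last dimension works.
hidden_states = np.arange(2 * 3 * 8, dtype=np.float32).reshape(2, 3, 8)

# Original variant: unflatten the last dim into (..., D/2, 2), then split it apart
# (the NumPy analogue of `unflatten(-1, (-1, 2)).unbind(-1)`).
pairs = hidden_states.reshape(*hidden_states.shape[:-1], -1, 2)
x1_old, x2_old = pairs[..., 0], pairs[..., 1]

# Proposed variant: strided slicing over the last dim.
x1_new = hidden_states[..., 0::2]
x2_new = hidden_states[..., 1::2]

assert np.array_equal(x1_old, x1_new)
assert np.array_equal(x2_old, x2_new)
print("both variants produce identical tensors")
```

Both variants produce views of the even- and odd-position elements, so the numerical output of the model is unchanged; only the cost of building those views differs.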

@sayakpaul
Member

Those results are not conclusive enough, IMO. We should benchmark the end-to-end latency involved in generating a reasonable video clip.

@Fabrice-TIERCELIN
Author

All this code runs at launch time, not generation time; it optimizes startup. So the video length does not matter.

@Fabrice-TIERCELIN
Author

@sayakpaul, here is a video without the change:

without.mp4

and one with it:

with.mp4

@github-actions github-actions Bot added models size/S PR with diff < 50 LOC labels Apr 25, 2026
@Fabrice-TIERCELIN
Author

@DN6 or @DefTruth, please approve 🥺

@github-actions github-actions Bot added size/S PR with diff < 50 LOC and removed size/S PR with diff < 50 LOC labels May 3, 2026

Labels

models size/S PR with diff < 50 LOC


3 participants