• Alphane Moon@lemmy.worldOPM · 2 days ago

    Lenovo claimed it won 24 percent of the PC market and saw AI PCs account for over ten percent of notebook sales. Group president Luca Rossi said PC sales should improve as buyers have two good reasons to upgrade: end of life for devices bought to run Windows 10 which are now ripe for replacement, and a desire to adopt AI PCs and offer users new experiences.

    Windows 10 EOL is a fair argument. I have yet to see any research showing that consumers are looking to buy “AI PCs” specifically. I suspect most people just get a new laptop and it happens to be an “AI PC”.

    • Brkdncr@lemmy.world · 2 days ago

      Enterprise is different. Lots of business decision makers are prepping their workforce for AI, and don’t want to put their data on someone’s cloud. Local AI will be a big deal.

      • Alphane Moon@lemmy.worldOPM · 2 days ago

        Is the current crop of NPUs really suitable for this?

        I play around with video upscaling and local LLMs. I have a 3080, which is supposed to be around 238 TOPS. It takes about 25 minutes to upscale a ~5 min SD video to HD (sometimes longer, depending on the source content). The “AI PC” NPUs are rated at around ~50 TOPS, so that would be a massive increase in upscale time (closer to 2 hours for a ~5 min SD source).
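
        Rough back-of-the-envelope for that estimate (just a sketch that assumes upscale time scales inversely with TOPS and ignores memory bandwidth and software support, so it's a bound rather than a benchmark):

        ```python
        # Sketch: assume upscale time scales inversely with TOPS.
        # Ignores memory bandwidth and software support, so only a rough bound.
        gpu_tops = 238      # RTX 3080 figure quoted above
        npu_tops = 50       # typical "AI PC" NPU rating
        gpu_minutes = 25    # observed ~5 min SD -> HD upscale on the 3080

        npu_minutes = gpu_minutes * (gpu_tops / npu_tops)
        print(f"Estimated NPU time: {npu_minutes:.0f} min (~{npu_minutes / 60:.1f} h)")
        # -> ~119 min, i.e. roughly the "closer to 2 hours" above
        ```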

        I also have a local LLM that I’ve been comparing against ChatGPT. For my limited use case (elaborate spelling/typo/style checking), the local LLM (Llama) performs comparably to ChatGPT, but I run it on a 3080. Is the same true for local LLMs that run on NPUs? I would speculate that for more complex use cases (programming support?) you would need even more throughput from your NPU.

        I have much more experience with upscaling, though; my experimentation with local LLMs is somewhat limited compared to my ChatGPT usage.

        • brucethemoose@lemmy.world · 9 hours ago

          NPUs are basically useless for LLMs because no software supports them. They also can’t allocate much memory, and they don’t support the exotic quantization schemes modern runtimes use very well.

          And speed-wise, they are rather limited by the slow memory buses they’re attached to.

          Even on Apple, where there is a little support for running LLMs on NPUs, everyone just does the compute on the GPU anyway because it’s so much faster and more flexible.

          This MIGHT change if BitNet LLMs take off, or if Intel/AMD start regularly shipping quad-channel designs.

          • Alphane Moon@lemmy.worldOPM · 9 hours ago

            Yes, I was reading through the documentation of some of the tools I use and I noticed minimal info about NPU support.

            Will take a look at BitNet (if my tools support it); curious how it would compare to Llama, which seems decent for my use cases.

            • brucethemoose@lemmy.world · 8 hours ago

              BitNet is still theoretical for now, and unsupported by NPUs anyway.

              Basically they are useless for large models :P

              The iGPUs on the newest AMD/Intel chips are OK for hosting models up to like 14B, though. Maybe 32B with the right BIOS settings, if you don’t mind very slow output.

              If I were you, on a 3080, and if you keep desktop VRAM usage VERY minimal, I would run TabbyAPI with a 4bpw exl2 quantization of Qwen 2.5 14B: coder, instruct, or an RP finetune… pick your flavor. I’d recommend this one in particular.

              https://huggingface.co/bartowski/SuperNova-Medius-exl2/tree/4_25

              Run it with Q6 cache and set the context to like 16K, or whatever you can fit in your VRAM.
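
              Once it’s loaded, TabbyAPI serves an OpenAI-compatible endpoint, so a quick sanity check from Python could look something like this (the port, API key, and model name below are placeholders/assumptions; use whatever is in your own config):

              ```python
              # Minimal sketch for talking to a local TabbyAPI instance through its
              # OpenAI-compatible endpoint. Port, API key and model name are
              # assumptions; take the real values from your TabbyAPI config.
              from openai import OpenAI

              client = OpenAI(
                  base_url="http://127.0.0.1:5000/v1",  # assumed default local port
                  api_key="your-tabbyapi-key",          # placeholder
              )

              resp = client.chat.completions.create(
                  model="SuperNova-Medius-exl2",        # whatever model you loaded
                  messages=[{"role": "user", "content": "Proofread this for typos: ..."}],
                  max_tokens=256,
              )
              print(resp.choices[0].message.content)
              ```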

              I guarantee this will blow away whatever Llama (8B) setup you have.

                • brucethemoose@lemmy.world · 8 hours ago

                  Pick up a 3090 if you can!

                  Then you can combine it with your 3080 and squeeze Qwen 72B in, and straight up beat GPT-4 in some use cases.
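
                  Rough VRAM math for that (assuming a 10 GB 3080 and a 24 GB 3090, and ignoring KV cache and overhead, so it’s an optimistic sketch):

                  ```python
                  # Optimistic sketch: weights-only VRAM for a ~72B model split across
                  # two GPUs. KV cache and overhead are ignored, so real headroom is smaller.
                  params_b = 72        # billions of parameters
                  bpw = 3.0            # example exl2 bits-per-weight (assumption)
                  vram_gb = 24 + 10    # 3090 (24 GB) + 3080 (10 GB)

                  weights_gb = params_b * bpw / 8   # billions of params * bits / 8 -> GB
                  print(f"Weights: ~{weights_gb:.0f} GB of {vram_gb} GB total VRAM")
                  # -> ~27 GB of 34 GB, leaving a few GB for cache and context
                  ```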

                  Also, TabbyAPI can be tricky to set up, ping me if you need help.

                  • Alphane Moon@lemmy.worldOPM · 8 hours ago

                    Not planning on getting a new/additional GPU at this point. My local LLM project is more of a curiosity; I’m more knowledgeable on the AI upscaling side. :)

                    Thanks for the offer, will consider it!