Tag: Hardware

  • Surface Laptop Ultra makes Microsoft’s MacBook Pro fight about local AI

    Surface Laptop Ultra is being framed as Microsoft’s answer to the MacBook Pro. That comparison is useful, but only up to a point. The more interesting question is whether Microsoft and NVIDIA can make a Windows laptop feel credible for local AI work instead of stopping at spec-sheet bragging.

    The short version

    • Windows Latest reports that Microsoft has introduced Surface Laptop Ultra, a high-end Windows on Arm laptop built around NVIDIA’s RTX Spark platform.
    • The headline specs are aggressive: a 20-core NVIDIA Grace CPU, Blackwell RTX graphics, up to 128GB of unified memory, CUDA support, and claims that it can run models of around 120 billion parameters locally.
    • The hard part is not raw GPU marketing. Microsoft has to prove battery life, heat, x86 compatibility, creative-app support, and Windows on Arm developer tooling in daily use.
    • Hacker News readers mostly argued about price, fan noise, and whether large local AI workloads belong on a laptop at all.

    What happened with Surface Laptop Ultra

    Windows Latest says Microsoft used Computex 2026 to show Surface Laptop Ultra, a new top-end Surface laptop built with NVIDIA. The reported platform combines a 20-core NVIDIA Grace CPU, a Blackwell RTX GPU, fifth-generation Tensor Cores with FP4 support, NVLink-C2C between CPU and GPU, and up to 128GB of unified memory.

    The article also says Microsoft tuned Windows 11 on Arm for the platform. That includes scheduler work across 20 cores, power and thermal management, higher GPU-accessible memory limits, shared-memory page handling, Prism emulation changes for older x86 apps, and containment primitives for local AI agents.

    Those details matter more than the MacBook Pro comparison. Apple’s current advantage is not one chip or one benchmark. It is the boring, valuable mix of performance, battery life, unified memory, silence, app support, and predictable hardware behavior. Surface Laptop Ultra has to compete with that whole package.

    Why this is worth watching

    Surface Laptop Ultra could become a useful test case for the next phase of AI PCs. A lot of AI laptop talk has been stuck on NPU TOPS. This machine points at a different lane: local inference, CUDA-backed experimentation, video work, 3D rendering, and agent workflows that need a bigger shared memory pool.

    If the 128GB unified-memory configuration works as described, the appeal is obvious for developers who want to prototype with local models before moving serious jobs to the cloud. It could also matter for creators who already live inside Adobe, game engines, 3D tools, and GPU-heavy production software.
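    A rough way to sanity-check the 120-billion-parameter claim against 128GB of unified memory is to multiply parameter count by bytes per weight. The sketch below is back-of-envelope arithmetic only. The precision table and the function name are illustrative assumptions, not anything Microsoft or NVIDIA published, and the estimate ignores KV cache, activations, quantization overhead, and memory the operating system keeps for itself.

```python
# Back-of-envelope weight footprint for a dense model at different precisions.
# Ignores KV cache, activations, quantization scale overhead, and OS memory.

# Nominal storage cost per weight (illustrative assumption).
BYTES_PER_WEIGHT = {"fp16": 2.0, "fp8": 1.0, "fp4": 0.5}


def weight_footprint_gb(params: float, precision: str) -> float:
    """Return the approximate weight size in decimal gigabytes."""
    return params * BYTES_PER_WEIGHT[precision] / 1e9


if __name__ == "__main__":
    params = 120e9            # the reported 120-billion-parameter figure
    unified_memory_gb = 128   # the reported top memory configuration
    for precision in BYTES_PER_WEIGHT:
        size = weight_footprint_gb(params, precision)
        verdict = "fits" if size < unified_memory_gb else "does not fit"
        print(f"{precision}: ~{size:.0f} GB of weights, {verdict} in {unified_memory_gb} GB")
```

    On these nominal numbers, only the FP4 case leaves real headroom for context and other software, which may be why FP4 support sits so prominently in the reported spec list.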

    The catch is that Windows on Arm still has to earn trust. Native apps are better than they were, and Prism emulation has improved, but professional buyers do not want a science project. They want Premiere, Photoshop, anti-cheat-protected games, IDEs, drivers, plugins, and weird old utilities to behave without becoming the day’s main problem.

    That is why this story fits the broader IT & AI archive: the hardware is interesting, but the platform question is the real story. Microsoft needs the laptop, the operating system, and the developer ecosystem to land at the same time.

    What Hacker News readers are arguing about

    The Hacker News thread was less impressed by the launch language than by the practical tradeoffs. Price came up first. Several commenters guessed that a 64GB or 128GB RTX Spark laptop would land somewhere around premium workstation pricing, with DGX Spark comparisons making a sub-$3,000 product sound unlikely.

    Fan noise became another sticking point. Some readers thought Microsoft’s promo emphasis on cooling was a strange way to chase MacBook Pro buyers, because one of Apple Silicon’s strongest selling points is how quiet it feels during normal work. Others pushed back: if you are running large local models or GPU-heavy creative jobs, fans are part of the deal.

    The most useful split was about local AI itself. One camp asked why anyone would run large models on a Windows laptop instead of using a server. The other camp wanted exactly that portability: a machine you can take to a coffee shop, run a coding model without depending on cloud access, and keep working when Wi-Fi is bad or locked down.

    There was also familiar Windows skepticism. Some readers treated “built on Windows” as a warning label. Others brought up older Surface devices they still like, especially for unusual form factors, pens, keyboards, and portable creative work. The thread did not settle the question. It did make the buyer profile clearer: this only makes sense if local GPU work matters enough to pay for weight, heat, and price.

    The practical read

    Treat Surface Laptop Ultra as a platform bet, not a simple MacBook Pro clone. The spec list is strong enough to make Windows hardware interesting again for local AI, but the first reviews need to answer five plain questions.

    Can it stay quiet and fast under long AI or rendering jobs? Does battery life hold up when the GPU is actually doing work? Do x86 apps, anti-cheat systems, Adobe tools, drivers, and dev utilities behave on Windows on Arm? Is CUDA support easy to use on the laptop, or does it feel like a demo path? And does the price make sense against a MacBook Pro, a desktop workstation, or rented cloud GPU time?

    If Microsoft gets those answers right, Surface Laptop Ultra could give Windows developers and creators a serious local AI machine. If not, it will be another impressive Surface idea that people admire from a distance.

  • CPU LLM inference: Gemma runs on a 2016 Xeon

    CPU LLM inference usually sounds like a compromise you make when a GPU is unavailable. Christina Sorensen’s test makes the compromise more interesting: Gemma 4 26B-A4B ran at roughly reading speed on a 2016 Intel Xeon E5-2620 v4 server with no GPU, 128GB of DDR3 memory, and a long list of ik_llama.cpp flags. The useful lesson is not that old Xeons are suddenly better than GPUs. It is that memory bandwidth, KV cache size, speculative decoding, and engine control matter more than a simple hardware checklist.

    The short version

    • The test used one Intel Xeon E5-2620 v4, 8 physical cores, 16 threads, 128GB of DDR3 RAM, and no GPU.
    • Gemma 4 26B-A4B is described as a roughly 25.2B-parameter Mixture-of-Experts model with about 3.8B active parameters per token.
    • The run needed about 82GB of memory at the full 262K context, with roughly 25GB for weights and 56GB for KV cache.
    • The practical win came from engine-level tuning: MTP speculative decoding, CPU-aware MoE routing, runtime repacking, Flash Attention, and explicit KV-cache handling.
    • For builders, the test is a reminder that local AI can make sense for privacy or batch jobs, but power draw, noise, and setup time still count.

    What happened

    Sorensen published a detailed run of Gemma 4 26B-A4B on a recycled server that looks weak by current AI standards. The CPU is a single Xeon E5-2620 v4 from 2016. It has AVX2, but no AVX-512, no AVX-VNNI, no BF16, and no integrated GPU. The memory is the saving grace and the bottleneck at the same time: 128GB is enough capacity, but DDR3 is slow compared with modern laptop memory.

    The run did not use a simple wrapper. The command line included --spec-type mtp, --draft-max 3, --cpu-moe, --merge-up-gate-experts, --run-time-repack, --flash-attn on, --mla-use 3, --mlock, and --no-kv-offload. Some of those flags are about speed. Some are about avoiding wasted work. Some are there because the engine has to be told, explicitly, that there is no GPU to lean on.

    The memory accounting is the part that should make people pause. At the full 262K context, the run needed 82,355 MiB for model tensors plus cache. The KV cache was larger than the model weights. That is a good mental reset for CPU LLM inference: once the context gets large, the short-term memory of the conversation can become the thing that dominates RAM.
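    To see why the cache can outgrow the weights, it helps to write down the usual size estimate for a standard attention cache: two tensors (keys and values) per layer, one vector per KV head per token. The sketch below is a generic formula, not the accounting used in Sorensen's run. Every model dimension in it is a hypothetical placeholder rather than the real configuration of the model, and engines that compress or restructure the cache will land on different numbers.

```python
# Generic KV-cache size estimate for a standard attention transformer.
# All model dimensions used below are HYPOTHETICAL placeholders, not the
# configuration of the model discussed in the article.


def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_elem: float) -> float:
    """Keys and values: 2 tensors per layer, one vector per KV head per token."""
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_elem


def to_gib(num_bytes: float) -> float:
    """Convert bytes to binary gibibytes."""
    return num_bytes / 2**30


if __name__ == "__main__":
    for context in (8_192, 65_536, 262_144):
        size = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128,
                              context_tokens=context, bytes_per_elem=2)
        print(f"{context:>7} tokens -> {to_gib(size):.1f} GiB of KV cache")
```

    With these placeholder dimensions the cache costs 128 KiB per token, so it grows linearly from 1 GiB at 8,192 tokens to 32 GiB at 262,144 tokens. The weights stay the same size the whole time, which is the point of the paragraph above.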

    CPU LLM inference in plain terms

    The decode phase of LLM inference is often memory-bound. Each new token requires the system to stream the active model weights, plus the KV cache, out of main memory. On a GPU server, high-bandwidth memory hides a lot of that pain. On an old CPU box, the memory wall is right in your face.
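    A simple ceiling makes the memory wall concrete: if each generated token has to read the active weights from memory once, tokens per second cannot exceed memory bandwidth divided by bytes read per token. The sketch below uses the article's figures for active parameters and weight size. The two bandwidth values are illustrative assumptions, not measurements of the Xeon server or of any specific GPU, and the function name is made up for this example.

```python
# Upper bound on decode speed when every token must stream the active
# weights from memory once. Bandwidth figures are ILLUSTRATIVE assumptions,
# not measurements of the server in the article.


def max_tokens_per_second(bandwidth_gb_s: float, active_params: float,
                          bytes_per_weight: float) -> float:
    """Bandwidth-limited ceiling on generated tokens per second."""
    bytes_per_token = active_params * bytes_per_weight
    return bandwidth_gb_s * 1e9 / bytes_per_token


if __name__ == "__main__":
    active_params = 3.8e9              # active parameters per token, per the article
    bytes_per_weight = 25e9 / 25.2e9   # ~25GB of weights over ~25.2B parameters
    for label, bandwidth in (("assumed older server memory", 40.0),
                             ("assumed GPU-class memory", 1000.0)):
        bound = max_tokens_per_second(bandwidth, active_params, bytes_per_weight)
        print(f"{label}: at most ~{bound:.0f} tokens/s")
```

    The ceiling is optimistic. It leaves out KV-cache reads, which grow with context length, and it assumes the engine reaches peak bandwidth, which real runs rarely do.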

    That is why the details in this post matter. Speculative decoding tries to get more useful tokens out of each expensive verifier pass by pairing the main model with a smaller drafter. CPU-aware MoE routing tries to keep expert weights from thrashing the cache. Runtime repacking reshapes weight matrices so the CPU can read them more efficiently. Flash Attention and MLA reduce the amount of attention and KV-cache data that has to be materialized in memory.
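    The payoff from speculative decoding can be estimated with a standard simplification: if each drafted token is accepted independently with probability a, and the drafter proposes k tokens per round, one verifier pass yields (1 - a^(k+1)) / (1 - a) tokens on average. The sketch below applies that formula with k = 3, matching the --draft-max 3 flag. The acceptance rates are hypothetical, since the post's real acceptance rate is not given here, and the independence assumption is a modeling convenience rather than a description of how the engine behaves.

```python
# Expected tokens produced per verifier pass under speculative decoding,
# assuming each drafted token is accepted independently with the same
# probability. Acceptance rates below are HYPOTHETICAL.


def expected_tokens_per_pass(accept_prob: float, draft_len: int) -> float:
    """Return (1 - a^(k+1)) / (1 - a), the mean tokens per verifier pass."""
    if not 0.0 <= accept_prob <= 1.0:
        raise ValueError("accept_prob must be between 0 and 1")
    if accept_prob == 1.0:
        # Every draft accepted, plus the token the verifier adds itself.
        return float(draft_len + 1)
    return (1 - accept_prob ** (draft_len + 1)) / (1 - accept_prob)


if __name__ == "__main__":
    draft_len = 3  # matches --draft-max 3 from the command line above
    for accept_prob in (0.5, 0.7, 0.9):
        tokens = expected_tokens_per_pass(accept_prob, draft_len)
        print(f"acceptance {accept_prob:.1f}: ~{tokens:.2f} tokens per verifier pass")
```

    This counts tokens per pass, not wall-clock speedup. Drafting takes time too, so the real gain is smaller than the ratio suggests.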

    None of this makes the setup friendly. It actually proves the opposite. If the only way to make CPU LLM inference usable is a 25-flag command, missing documentation, and logs that quietly downgrade unsupported settings, then the open-model stack still has a usability problem. The model may be open. The working recipe is harder to get.

    Why this is worth watching

    The interesting part is not nostalgia for old servers. It is the gap between “can run” and “can run well.” Local AI is full of that gap right now. A consumer tool may hide all the knobs, which is fine until the defaults waste RAM, miss a CPU optimization, or let a model swap to disk.

    This matters for teams that want local inference for internal documents, private workflows, or overnight automation. A slow local model can still be useful if the job is summarizing PDFs, drafting code comments, classifying logs, or running background research. For more stories like this, the IT & AI archive tracks practical AI tooling rather than launch-day hype.

    The catch is cost. A repurposed server is not free if it burns power, runs loud, and takes hours to tune. The right comparison is not “old Xeon versus H100.” It is “owned hardware for patient workloads versus hosted inference for fast ones.” CPU LLM inference belongs in that second comparison, not in a slogan about replacing GPUs.

    What Hacker News readers are arguing about

    The Hacker News thread is mostly useful because it pushes back on the romance of the homelab. Several readers liked the privacy and offline angle, especially for data that should not leave a home or company network. Others pointed out that rack-era Xeon machines can be noisy, hot, and inefficient. One commenter compared old Xeon boxes with newer small Intel systems and argued that the modern machine is often faster while using far less power.

    A second thread of discussion focused on measurement. Readers questioned whether a tiny prompt such as “Why is the sky blue?” tells us enough about real workloads. Coding, log analysis, and document tasks often start with thousands of input tokens, so prompt evaluation, prefix caching, and long-context behavior matter as much as output speed. That skepticism is fair. Reading-speed generation is useful, but it is not a full benchmark.

    There was also a more technical argument about cache and CPU choice. Some readers noted that older Xeons vary a lot, and modern consumer CPUs can have comparable or better cache behavior. Others brought up AMD 3D V-Cache and high-memory consumer systems as a better direction than keeping loud server hardware alive. The strongest practical takeaway from the thread: local inference is attractive when privacy or control matters, but hosted models may still be cheaper for casual batch jobs once electricity and time are included.

    The practical read

    If you are building with local models, treat this as a checklist, not a buying guide. Start with the workload. If the job is interactive chat, an old CPU box will probably frustrate users. If the job runs in the background and handles sensitive data, a slower local model can be fine.

    Then check memory before you check FLOPS. Model weights are only part of the footprint. Long context can make the KV cache bigger than the model itself, and swapping will destroy performance. After that, look at the engine. A wrapper that is easy to install may be the wrong tool if it hides the settings needed for your hardware.

    For app builders, the app store optimization (ASO) angle is simple: local AI features should be marketed around privacy, offline use, and patient background work, not raw speed. CPU LLM inference is credible when the product promise matches the hardware reality.

  • nice nano wireless keyboard: a dorm-room board that found a real market

    The nice!nano wireless keyboard controller is a good reminder that a hardware product does not need a huge category to become meaningful. Nick Winans built the Pro Micro-compatible wireless board as a college freshman, sold the first 1,000 units in seven hours, and later saw more than 50,000 boards move through the custom keyboard world.

    The short version

    • The product worked because it fit the Pro Micro footprint that many DIY keyboard designs already expected.
    • The first group buy sold 1,000 units in seven hours, but the experience convinced Winans not to keep using preorder funding.
    • ZMK firmware turned the board from a clever part into a more complete wireless keyboard platform.
    • The bigger lesson is distribution: Reddit, Discord, vendors, and Typeractive made a tiny niche easier to buy into.

    What happened

    Winans started with a failed wireless keyboard project called the Dissatisfaction65. It looked good, but the typing latency was poor and it ran for only a few days per charge, even with a large battery. That pushed him toward Nordic chips, the Pro Micro form factor, and the gap between commercial wireless keyboards and the DIY keyboard scene.

    The nice!nano came out of that search. Winans designed the first version over a weekend using KiCad, Nordic documentation, the nRFMicro wiki, and Adafruit’s nRF52840 Feather schematic. The result was a thin nRF52840 board that could drop into many keyboard builds designed around a Pro Micro.

    The early proof was practical. He built a Lily58 with the boards and saw weeks of battery life from a 110mAh battery. A Reddit post drew interest, the Discord community grew, and the first group buy sold out its 1,000-unit cap within seven hours. The order still created stress: customer money arrived before the physical product did, PayPal held funds, and fulfillment became a family operation.
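    The battery claim is easier to appreciate as arithmetic: runtime is roughly capacity divided by average current, so weeks of use from 110mAh implies an average draw well under one milliamp. The sketch below works that out for a few runtimes. The two-week and four-week figures are illustrative readings of "weeks", not numbers from Winans, and the idealized formula ignores self-discharge and the fact that not all rated capacity is usable.

```python
# Idealized battery arithmetic: runtime = capacity / average current.
# The runtimes below are ILLUSTRATIVE readings of "weeks", not measured data.


def average_current_ma(capacity_mah: float, runtime_hours: float) -> float:
    """Average current draw (mA) that would drain the battery in the given time."""
    return capacity_mah / runtime_hours


def runtime_days(capacity_mah: float, avg_current_ma: float) -> float:
    """Days of runtime at a constant average current."""
    return capacity_mah / avg_current_ma / 24


if __name__ == "__main__":
    capacity_mah = 110  # battery size reported for the Lily58 build
    for weeks in (2, 4):
        hours = weeks * 7 * 24
        draw = average_current_ma(capacity_mah, hours)
        print(f"{weeks} weeks on {capacity_mah} mAh -> average draw of ~{draw:.2f} mA")
```

    Read in reverse, the same formula shows why the earlier prototype's few days on a large battery pointed to a power problem rather than a battery problem.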

    From there, the project became a small ecosystem. ZMK gave wireless keyboard builders a stronger firmware path. Vendors started carrying the board. In 2022, Winans and his family launched Typeractive, a store built around wireless split keyboard kits and a 3D configuration tool that helped buyers choose the right parts.

    Why this is worth watching

    The useful part of this story is not the dorm-room mythology. It is the constraint. Winans did not ask buyers to adopt a new keyboard architecture. He made the wireless part fit where the community already expected a controller to fit.

    That is a product lesson software teams often forget. A niche can be small and still be serious if the pain is specific, the buyers talk to each other, and the product slips into an existing workflow. For more technology briefs like this, the IT & AI archive keeps a running set of builder-focused stories.

    The nice!nano story also shows the tradeoff in open hardware. Public schematics and community firmware helped the board spread, but they also made cloning easier. Winans says clones appeared on Taobao and AliExpress, including products advertised as nice!nanos and shipped with the same firmware identity. That is not a clean win or loss. It is the usual bargain: openness can create trust and distribution, then force a founder to compete on quality, brand, support, and buying experience.

    The nice!nano lesson

    The repeatable pattern is compatibility first, then community, then purchasing help. That order made the board easier to try and easier to recommend.

    What Hacker News readers are arguing about

    The Hacker News discussion is less about the circuit board and more about whether niche products can still make good businesses. Several commenters liked the simple framing: make something a small group badly wants, then reach that group directly. Others pushed back on the idea that any niche works. Their point was that 50,000 reachable, solvent, motivated buyers is rare, and finding them is often the hard part.

    Winans joined the thread and gave the most useful detail. He credited timing, a Reddit post during the early Covid period, fast community work in Discord, frequent updates, and a quick move into vendor storefronts. In other words, luck helped, but he also converted attention into a channel before it faded.

    The skeptical thread was about compliance and clones. Some readers asked about FCC obligations for an intentional radiator. Others argued that small hardware makers face a harsh choice between regulatory cost and shipping a product before the market is proven. The clone discussion split in a similar way: trademark enforcement may be possible in some channels, but cross-border hardware copying is rarely a neat problem.

    The most practical comments came from keyboard users. A few said they owned multiple boards, which explains why a part that sounds impossibly narrow can still sell in volume. A wireless split keyboard often needs two controllers, and hobbyists rarely stop at one build.

    The practical read

    If you are building a niche hardware product, the nice!nano case points to three tests. Does it fit an existing standard? Can buyers explain the pain to each other without your sales deck? Can a first-time buyer get from interest to a working build without getting lost?

    The board passed those tests better than most hobby projects. The Pro Micro footprint lowered adoption friction. ZMK made the firmware story credible. Typeractive reduced the shopping problem. None of that removes the ugly parts of hardware: cash timing, fulfillment, certification, clones, support, and inventory. It does explain why a small board could become a real business.

    For app and product builders, the discovery angle is similar. The 3D kit configurator was not decoration; it helped people assemble the right purchase. In a niche market, the buying path can be as important as the product spec.
