DreamBig Semiconductor Announces Partnership with Samsung Foundry to Launch Chiplets for World Leading MARS Chiplet Platform on 4nm FinFET Process Technology Featuring 3D HBM Integration
DreamBig’s open MARS Chiplet Platform with world leading Chiplet Hub™ for scale-up and Networking IO Chiplets for scale-out enables customers to compose the most advanced AI solutions with UCIe/BoW compliant chiplets leveraging Samsung Foundry SF4X process technology
DreamBig empowers customers to forge the future of AI, datacenter, edge, storage, and automotive solutions by developing application specific processor/accelerator chiplets and composing products by adding the chiplets to the MARS Chiplet Platform with world leading Chiplet Hub™ for scale-up and Networking IO Chiplets for scale-out.
DreamBig is the original pioneer of 3D HBM stacking on the Chiplet Hub™, improving performance and efficiency for systems that are memory dominated, such as AI inference and training. 3D integrated HBM stacked on the Chiplet Hub™ is part of a composable memory hierarchy with SRAM, HBM, chiplet connected DDR, chiplet connected CXL memory expansion, and chiplet connected PCIe SSD storage. The Chiplet Hub™ memory hierarchy is accelerated by integration of a highly differentiated and fully associative hardware managed Final Level Cache (from FLC Technology Group). DreamBig is partnering with Samsung Foundry to bring the innovative Chiplet Hub™ and Networking IO Chiplets to market for the MARS Chiplet Platform. This collaboration leverages Samsung Foundry’s proven broad expertise and combined technology leadership, including the state-of-the-art SF4X FinFET process, a robust ecosystem of partners for 3D chip-on-wafer stacking and advanced packaging, and HBM memory synergy.
“DreamBig is disrupting the industry with the leading open chiplet solution for AI. The differentiated technology integrated in the Chiplet Hub™ serves the most demanding AI, datacenter, edge, storage, and automotive use cases. The Chiplet Hub™ base die implemented with advanced capabilities of Samsung Foundry SF4X process and 3D manufacturing provides the best performance and latency while achieving low power required for 3D integration of HBM that has eluded the industry,” stated Sohail Syed, Co-founder and CEO of DreamBig. “DreamBig MARS Chiplet Platform combining Chiplet Hub™ with 3D HBM, Networking IO Chiplets, customer AI processor/accelerator chiplets, and leveraging Silicon Box Advanced Panel Level Packaging enables unparalleled scale-up and scale-out solutions so customers can achieve the highest levels of performance and energy efficiency at lowest cost and fastest time-to-market.”
“Next-generation AI applications and chiplet-based system designs are converging, and DreamBig’s MARS Chiplet Platform is well-positioned to deliver chiplet-based AI solutions with 3D HBM integration,” said Mijung Noh, vice president and the head of Foundry Business Development 1 Team at Samsung Electronics. “We are thrilled to partner with DreamBig on our SF4X process technology with a full array of design enablement for chiplet-based architectures. This successful collaboration leverages Samsung’s solution with optimized silicon technology, memory, and advanced packaging.”
About DreamBig:
Founded in 2019, DreamBig is developing a cutting-edge chiplet platform that drives the next wave of affordable, scalable, and modular semiconductor solutions for the AI era and beyond. The company’s specialties include applications in Large Language Models (LLMs), Generative AI, Data Centers, Edge computing, and Automotive sectors. DreamBig is renowned for providing the most advanced Chiplet Hub™, facilitating the scaling of processor, accelerator, and networking chiplets.
DreamBig closes $75M Series B Funding Round
DreamBig Semiconductor Inc., a pioneer in high-performance accelerator platforms utilizing its industry-leading Chiplet Hub™ with 3D HBM, raised $75M in a Series B equity funding round.
This round was co-led by the Samsung Catalyst Fund and the Sutardja Family. New investors include Samsung, Hanwha, Event Horizon, and Raptor, alongside continuing contributions from existing stakeholders including the Sutardja Family, UMC Capital, BRV, Ignite Innovation Fund, and Grandfull Fund, among others. These funds will bolster the development and commercialization of products built on DreamBig’s Chiplet Hub™ and Platform Chiplets.
DreamBig Announces world leading 800G AI-SuperNIC chip (Mercury) with Fully HW Offloaded RoCE v2 + UEC RDMA Engine
Mercury delivers 800Gbps bandwidth connectivity for any AI chipset with industry leading throughput of 800Mpps while minimizing power consumption, latency, and area. It features a hardened RDMA engine with programmable congestion control for RoCE v2 and UEC standards, making it an ideal solution for AI and HPC applications. When integrated as a chiplet-compatible component with the previously announced DreamBig MARS Chiplet Platform, the solution can scale to 12.8Tbps RDMA throughput.
San Jose, Calif., January 6, 2025 /Newswire/ — DreamBig Semiconductor, Inc. today proudly unveils the Mercury AI-SuperNIC, which enables scale-out of next-generation AI platforms by seamlessly connecting GPUs with unparalleled efficiency.
The DreamBig Mercury chip features a hardware accelerated RDMA engine that supports existing RoCE (RDMA over Converged Ethernet) v2 and new UEC (Ultra Ethernet Consortium) standards, delivering best-in-class bandwidth (800Gbps) and throughput (800Mpps) with the lowest power, ultra-low latency, and smallest area.
Mercury is designed with fully programmable Congestion Control to adapt to any data center and provides the following critical functions for AI applications:
Multi-pathing and packet spraying
Out-of-order packet placement with in-order message delivery
Programmable congestion control for RoCE v2 and UEC algorithms
Advanced packet trimming and telemetry congestion notifications
Support for selective retransmission
Mercury provides UEC-compliant software drivers enabling GPU-to-GPU communication with exceptional throughput and minimal latency.
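To illustrate the transport behavior listed above, the following is a toy model of out-of-order packet placement with in-order message delivery. This is an illustrative sketch only, not DreamBig's implementation; all class and method names here are hypothetical, and the model assumes packets never duplicate or overlap.

```python
# Toy model: payloads are placed directly into each message buffer at their
# byte offset as packets arrive (no reorder buffer), while completed messages
# are handed up strictly in message-sequence order.

class Message:
    def __init__(self, length):
        self.buf = bytearray(length)   # data lands directly at its offset
        self.received = 0              # bytes received (assumes no duplicates)
        self.length = length

    def place(self, offset, data):
        """Place a packet's payload at its offset, regardless of arrival order."""
        self.buf[offset:offset + len(data)] = data
        self.received += len(data)

    def complete(self):
        return self.received >= self.length


class Receiver:
    """Delivers completed messages strictly in message-sequence order."""
    def __init__(self):
        self.messages = {}             # msg_seq -> Message
        self.next_to_deliver = 0
        self.delivered = []

    def on_packet(self, msg_seq, msg_len, offset, data):
        msg = self.messages.setdefault(msg_seq, Message(msg_len))
        msg.place(offset, data)        # out-of-order placement, no buffering
        # In-order delivery: only hand up the next expected message once complete.
        while (self.next_to_deliver in self.messages
               and self.messages[self.next_to_deliver].complete()):
            done = self.messages.pop(self.next_to_deliver)
            self.delivered.append(bytes(done.buf))
            self.next_to_deliver += 1


# Packets arrive sprayed across paths, out of order, across two messages.
rx = Receiver()
rx.on_packet(1, 4, 0, b"wait")         # message 1 arrives before message 0
rx.on_packet(0, 5, 2, b"llo")          # tail of message 0 before its head
rx.on_packet(0, 5, 0, b"he")
print(rx.delivered)                    # [b'hello', b'wait'] -- in message order
```

The point of the sketch is that placement and delivery are decoupled: data never waits in a reorder queue, which is what makes multi-pathing and packet spraying cheap on the receive side.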
“This year marks 25 years that the team we have assembled at DreamBig has been developing RDMA together, from InfiniBand to iWARP to RoCE, and now UEC. All that experience has gone into re-architecting RDMA from the ground up for the AI era,” said Steve Majors, SVP of Engineering at DreamBig Semiconductor. “With the efficiency of full HW RDMA offload, innovative layering to adapt to evolving Ethernet transport modes and congestion control methods, and unparalleled performance scaling from 800Gbps to 12.8Tbps, DreamBig is setting the bar for the next several generations of AI networking.”
As a groundbreaking monolithic chip, Mercury redefines 800Gbps discrete networking for TPUs/GPUs, combining unmatched performance, power efficiency, and cost-effectiveness to meet the demands of next-generation AI platforms. As a chiplet-compatible component, Mercury provides unparalleled scaling of up to 12.8Tbps integrated networking for AI SuperChips.
To learn more, come visit the DreamBig technology solution demo January 7-10 at CES 2025 – The Most Powerful Tech Event in the World – in the Venetian Bellini Suite #XX, Las Vegas, Nevada.
About DreamBig Semiconductor:
DreamBig, founded in 2019, is developing a disruptive and world leading chiplet platform that enables customers to develop the next generation of responsible, affordable, scalable, and composable semiconductor chiplet solutions for the AI revolution and digital world. The company specializes in high performance applications used for Large Language Model (LLM), Generative AI, Datacenter, Edge, and Automotive markets.
DreamBig provides the industry’s most advanced Chiplet Hub to scale up processor, accelerator, and networking Chiplets.
DreamBig Virtual Prototyping Platform
DreamBig Semiconductor (DBS) stands out in the semiconductor sector, established by a group of visionaries with a solid track record in the industry.
DreamBig has transformed the semiconductor-based data processing and networking industry by offering low-cost, low-latency, high-throughput solutions using cutting-edge chiplet technology by way of its MARS Chiplet Platform. Beyond the bleeding-edge chiplet design philosophy, DreamBig is also redefining how the semiconductor industry validates large silicon design projects with its Virtual Prototyping Platform (VPP). DreamBig’s VPP, which enables different ecosystem partners to develop and test customized chiplet solutions together in a single environment, allows for early architectural exploration, design validation, and chiplet interoperability testing, all of which support the goal of earlier time-to-market.
Next-generation Chiplet-based DPU Architecture
Introduction: The Emergence of Chiplet-based Data Processing Unit (DPU) Architectures
Cloud Service Providers (CSPs) are in the process of revamping their data center architectures, driven by the emergence of optimized processors for domain-specific workloads, especially Data Processing Units (DPUs). The DPU has emerged as an autonomous coprocessor and, as a result, has taken over many of the compute and infrastructure processing functions, such as server virtualization, networking, storage, and security, that are traditionally performed by the CPU. One of the major goals behind deployment of DPUs is to have server CPUs wholly dedicated to running application workloads. As a result, DPUs could potentially enable optimum performance for various cloud data center use-cases, including hyperscale, telco, enterprise, and hybrid clouds, with improved security and reliability and improved return on investment (ROI).
DPUs have traditionally been implemented using a monolithic system-on-chip (SoC) architecture. While the shift to smaller silicon geometries has enabled larger SoC devices with big caches and many cores, each new silicon technology generation requires a huge investment and results in significantly higher development and die costs. Consequently, developing SoC-based DPUs supporting the range of required use cases is becoming commercially unappealing, especially in light of several important industry trends.
Moore’s Law provided consistent advances in transistor technology over decades that enabled chip vendors to meet the demands of monolithic SoC architectures, but lately Moore’s Law has witnessed a historic slowdown. As a result, transitioning to the next node has become much more costly while offering minimal advantage. Relatedly, due to the significant increase in the cost of architecting SoCs, the industry is seeing the emergence of chip designs using several small dies, or chiplets, rather than one monolithic die.
The use of chiplets for DPU architectures can enable the support of power and area goals while providing the flexibility and product modularity required to support different use cases in a single DPU package. A chiplet-based architecture can support an integrated DPU device which uses components developed using different silicon technologies while also providing innovative solutions that utilize best-of-breed components. As a result, the use of chiplets offers a cost-effective approach for providing DPUs that support a wide range of use cases.
SoC to Chiplet Transition Enables Moore’s Law Extension
The benefits provided by Moore’s Law through enabling steady advances in the development of complex SoCs such as DPUs are starting to diminish. Doubling transistor density now takes three or four years instead of two. Each boost in density comes with a dramatic increase in wafer expense, generating modest or no decrease in cost per transistor, a key principle of Moore’s Law. Speed and power gains have also lessened with each new evolution in transistor density, or node. In short, transitioning to the next node has become much more costly while offering minimal advantage.
Due to the exponential increase in the cost of developing leading-edge SoCs, only the very biggest suppliers have been able to develop monolithic chip designs for CPUs and DPUs. To compensate for soaring design costs and boost manufacturing yields, leading vendors have instead begun to adopt chiplet-based designs. Intel and Marvell have introduced chiplet-based products, while AMD and Amazon Web Services (AWS) employ chiplets in new EPYC and Graviton server processors, respectively. However, most of these early chiplet designs have been developed exclusively in-house.
Further cost savings can come from creating different (heterogeneous) chiplets using different manufacturing nodes, which is impossible in a monolithic design. For example, DPU chiplet designs could segregate I/O functions into a separate die manufactured in an older node. Some logic circuitry, such as accelerators, may not need to run at the same maximum clock rate as the main processor and thus can be fabricated in an intermediate node. Using older process technology can reduce the manufacturing cost of those chiplets as well as optimize the aggregate power consumption of the chiplet-based DPU design.
Die-to-Die Interconnect Standardization Enables Heterogeneous Chiplet-based Designs
Heterogeneous chiplet-based designs require standardizing die-to-die (D2D) interconnects so that chiplets from multiple vendors may be seamlessly integrated. Otherwise, each chiplet remains vendor-specific, which reduces the economic advantage of disaggregating the design.
Over the past few years, a broad range of cloud computing and semiconductor industry stalwarts have introduced open-source designs for die-to-die interconnects between chiplets, thus reducing costs and fostering a broader ecosystem of validated chiplets. In 2019 the Open Domain Specific Architecture (ODSA) subgroup within the Open Compute Project introduced the Bunch of Wires (BoW) die-to-die interconnect, which provides a standardized connection, operating like an on-die connection, between chiplets such as DPU processors and cores, memory, and I/O.
Similarly, major semiconductor companies, including Intel, AMD, and Arm, have introduced the UCIe chiplet interconnect standard based on existing PCIe and CXL protocols which will also support latency and bandwidth requirements for rack-scale designs.
Optimum Support of Data Center Use Cases Requires Disaggregated Chiplet-based DPUs
Cloud operators can potentially reap tremendous benefits from DPU support for bare metal virtualization, while controlling and provisioning servers with isolation and security from their tenants. DPUs can also possibly enable bare-metal data centers where the entire server hypervisor is offloaded to the DPU.
By accelerating network services such as Open Virtual Switch (OVS) and virtual router (vRouter) functions, the DPU could allow data centers to also support Network Function Virtualization (NFV) and a range of additional security, filtering, and analytics use-cases.
The DPU could also facilitate efficient pooling by permitting significantly higher utilization while offloading various networking, storage, and security functions. For example, enabling pools of processors, pools of AI/GPU clusters, and pools of storage allows cloud operators to dynamically assign resources based on specific AI application needs, such as inference or training.
DPUs with embedded hardware-based security processing may provide East-West firewalls to every server in the data center to meet the zero-trust imperative. DPUs may also be beneficial for offloading of inline or lookaside encryption security using IPsec and SSL/TLS for encryption/decryption of data-in-motion and data-at-rest.
While monolithic SoC-based DPUs may address the data center use cases discussed above, the time and cost required for designing such devices (which are only applicable to smaller market segments as compared to general purpose CPUs) can be very demanding. It requires deep expertise in networking, storage, and security protocol domains. In addition, the investment required to fabricate traditional monolithic architecture based DPUs using more sophisticated process nodes can be astonishing.
Developing DPUs using chiplets with smaller die sizes can drastically cut semiconductor development times while lowering manufacturing costs. DPUs built using chiplets also provide additional major benefits. Chiplet-based DPUs can utilize best-of-breed chips that incorporate components from multiple vendors, each with domain expertise in specific use case areas (e.g., processors, networking, security, and storage). As a result, cost and time-to-market for DPUs targeting specific use cases can be optimized by using smaller chiplet dies and, where appropriate, different process nodes for specific chiplets, with a strong emphasis on core competency.
DreamBig Semiconductor: Next-Generation Chiplet-based Disruptive DPU Architecture
The data center market is facing a strategic challenge as it attempts to meet the requirements for democratizing development and manufacturing of chiplet-based DPU silicon. Driven by standardized connectivity between chiplets, there is a critical need for an open ecosystem in which chiplets remain interoperable across different vendors while enabling support for complete end-to-end specifications to meet aggressive customer DPU use-cases. This is exactly where DreamBig Semiconductor, with its proven architecture and validated chiplet ecosystem, brings significant value.
DreamBig Semiconductor (DBS) is a pioneer in utilizing the chiplet approach to dramatically reduce the cost and time required for developing disaggregated DPUs for the full range of data-intensive use cases. DBS accomplishes this by offering chiplets for differentiated DPU functions, such as compute, networking, storage, and security functions, based on leading-edge process technology, and by leveraging an established partner ecosystem of chiplet vendors for standard, common functions. As a result, customers can rapidly and cost-effectively develop chiplet-based DPUs which can optimally meet the needs of their specific data-intensive use cases.
The DBS chiplet-based DPU architecture will provide end-users with greater flexibility, open new frontiers for component reuse and enable innovation on price, performance, and power consumption across the full continuum of DPU use-cases. As a result, the DBS DPU architecture will allow users to bring together design IP and process technologies from an established ecosystem of vendors enabled not only by standardization of Die-to-Die (D2D) interfaces, but also through interoperability across different vendors and foundries while at the same time supporting multiple process nodes (both leading-edge and established) and packaging technologies.
Towards that goal, DBS is also focused on enabling a best-of-breed partner ecosystem of chiplet manufacturers to offer DPU users standardized, interoperable hardware as a critical priority. This ecosystem supports complete end-to-end specifications including protocols, packaging, testing, and manufacturing, to meet aggressive and diverse customer time-to-market, cost, and use-case requirements.
Author
Saqib Jang
Investigating content streaming using accelerated QUIC Stream frame packing
This paper investigates QUIC fast path optimizations based on short header packet and STREAM frame generation mechanisms.
This is a first step towards moving parts of QUIC user space implementations to a fast path infrastructure. Details in this paper will serve as guidelines for SmartNIC OEMs who are interested in supporting QUIC hardware acceleration. The investigation provides details of each field of the QUIC short packet header and QUIC STREAM frame, along with the frequency of state synchronization required for the QUIC fast path (QFP) to generate valid QUIC packets for an ongoing, established QUIC stream. The paper also covers complexities and tradeoffs inherent in this approach due to QUIC's advanced mechanisms. Vendors implementing QUIC fast path can use the most suitable approach based on their implementation and use case. DreamBig Semiconductor is working on a QFP prototype by creating a framework which will enable QUIC fast path offload support in the Linux kernel, covering the various scenarios described in section 2. DreamBig Semiconductor is open to partnering and engaging with others who may be interested in the development of QUIC fast path.
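As background for the fields discussed above, the following sketch builds a QUIC short-header packet per RFC 9000 (Section 17.3) with a single STREAM frame. It is illustrative only: packet protection (AEAD encryption and header protection) is omitted, the varint encoding is simplified to the single-byte case, and the function names are our own, not from the paper.

```python
# Build the plaintext form of a QUIC short-header packet carrying one STREAM
# frame. A fast-path engine must fill exactly these fields from synced state
# (DCID, next packet number, stream id, stream offset).

def encode_short_header(dcid: bytes, packet_number: int,
                        spin: int = 0, key_phase: int = 0) -> bytes:
    # Choose the smallest packet-number encoding (1-4 bytes).
    pn_len = max(1, (packet_number.bit_length() + 7) // 8)
    assert pn_len <= 4
    first = 0x40                     # header form = 0 (short), fixed bit = 1
    first |= (spin & 1) << 5         # latency spin bit
    first |= (key_phase & 1) << 2    # key phase bit
    first |= (pn_len - 1)            # packet number length, encoded as len - 1
    # Short headers carry the DCID with no length field; the peer knows it.
    return bytes([first]) + dcid + packet_number.to_bytes(pn_len, "big")

def encode_stream_frame(stream_id: int, offset: int, data: bytes) -> bytes:
    # STREAM frame (types 0x08-0x0f); setting the OFF and LEN bits gives 0x0e.
    # Varints simplified: values must fit in one byte (< 64, top bits 00).
    assert stream_id < 64 and offset < 64 and len(data) < 64
    return bytes([0x0E, stream_id, offset, len(data)]) + data

pkt = encode_short_header(b"\x01\x02\x03\x04", packet_number=7)
pkt += encode_stream_frame(stream_id=0, offset=0, data=b"hello")
print(pkt.hex())   # 4001020304070e00000568656c6c6f
```

The packet number and stream offset are the fields that change on every packet, which is why the paper's state-synchronization frequency analysis centers on them.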
RDMA Performance Report
Earlier this year, DreamBig announced the Chiplet Hub with industry leading 3D HBM integration and a Memory-First architecture, solving some of the most difficult challenges with scaling silicon infrastructure for AI.
Now we are filling an equally challenging void by unlocking the highest performance HW offloaded RDMA solutions for open standard integration into any AI or computing platform. You now have access to unparalleled RDMA performance and can integrate it as best suits your needs.
RDMA Validation Report
The DreamBig team has done it again. As this RDMA Validation Report shows, no other RDMA IP available in the market can support all application use cases, from integrated AI Super Chip to discrete AI Performance NIC to discrete cloud DPU/IPU, while delivering industry-leading performance. It also comes with the only verification environment you can drop into your datacenter to co-develop and verify system-level applications as part of product development.
Imagine how much easier this will make the transition to UEC/UAL. Read the RDMA Validation Report and you decide how RDMA works for you, and not the way traditional vendors decided to make it work for you.
DreamBig World Leading “MARS” Open Chiplet Platform Unveiled at CES 2024
DreamBig World Leading “MARS” Open Chiplet Platform Enables Scaling of Next Generation Large Language Model (LLM), Generative AI, and Automotive Semiconductor Solutions
DreamBig Semiconductor, Inc. today unveiled “MARS”, a world leading platform to enable a new generation of semiconductor solutions using open standard Chiplets for the mass market. This disruptive platform will democratize silicon by enabling startups or any size company to scale-up and scale-out LLM, Generative AI, Automotive, Datacenter, and Edge solutions with optimized performance and energy efficiency.
DreamBig “MARS” Chiplet Platform allows customers to focus investment on the areas of silicon where they can differentiate to have competitive advantage and bring a solution to market faster at lower cost by leveraging the rest of the open standard chiplets available in the platform. This is particularly critical for the fast moving AI training and inference market where the best performance and energy efficiency can be achieved when the solution is application specific.
“DreamBig is disrupting the industry by providing the most advanced open chiplet platform for customers to innovate never before possible solutions combining their specialized hardware chiplets with infrastructure that scales up and out maintaining affordable and efficient modular product development,” said Sohail Syed, CEO of DreamBig Semiconductor.
DreamBig “MARS” Chiplet Platform solves the two biggest technical challenges facing HW developers of AI servers and accelerators – scaling up compute and scaling out networking. The Chiplet Hub is the most advanced 3D memory first architecture in the industry with direct access to both SRAM and DRAM tiers by all compute, accelerator, and networking chiplets for data movement, data caching, or data processing. Chiplet Hubs can be tiled in a package to scale-up at highest performance and energy efficiency. RDMA Ethernet Networking Chiplets provide unparalleled scale-out performance and energy efficiency between devices and systems with independent selection of data path BW and control path packet processing rate.
“Customers can now focus on designing the most innovative AI compute and accelerator technology chiplets optimized for their applications and use the most advanced DreamBig Chiplet Platform to scale-up and scale-out to achieve maximum performance and energy efficiency,” said Steve Majors, SVP of Engineering at DreamBig Semiconductor. “By establishing leadership with 3D HBM backed by multiple memory tiers under HW control in Silicon Box advanced packaging that provides highest performance at lowest cost without the yield and availability issues plaguing the industry, the barriers to scale are eliminated.”
The Platform Chiplet Hub and Networking Chiplets offer the following differentiated features:
Open standard interfaces and architecture agnostic support for CPU, AI, Accelerator, IO, and Memory Chiplets that customers can compose in a package
Secure boot and management of chiplets as a unified system-in-package similar to a platform motherboard of chips
Memory First Architecture with direct access from all chiplets to cache/memory tiers including low-latency SRAM/3D HBM stacked on Chiplet Hubs and high-capacity DDR/CXL/SSD on chiplets
FLC Technology Group fully associative HW acceleration for cache/memory tiers
HW DMA and RDMA for direct placement of data to any memory tier from any local or remote source
Algorithmic TCAM HW acceleration for Match/Action when scaled-out to cloud
Virtual PCIe/CXL switch for flexible root port or endpoint resource allocation
Optimized for Silicon Box advanced Panel Level Packaging to achieve the best performance/power/cost – an alternative to CoWoS for the AI mass market
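To make the fully associative cache tier concrete, here is a toy model of a fully associative cache front-ending a slower memory tier, with LRU replacement. This is an illustrative sketch only; FLC Technology Group's actual hardware design is not public and is not modeled here, and the class and parameter names are our own.

```python
# In a fully associative cache, any address may occupy any cache line, so no
# address ever misses merely because its designated set is full -- at the cost
# of comparing against every line, which is why hardware management matters.

from collections import OrderedDict

class FullyAssociativeCache:
    """Fully associative cache with LRU eviction over a slower backing tier."""
    def __init__(self, num_lines, backing):
        self.num_lines = num_lines
        self.lines = OrderedDict()        # address -> data, ordered by recency
        self.backing = backing            # slower tier (e.g. DDR behind 3D HBM)
        self.hits = self.misses = 0

    def read(self, addr):
        if addr in self.lines:
            self.hits += 1
            self.lines.move_to_end(addr)  # mark line most recently used
            return self.lines[addr]
        self.misses += 1
        data = self.backing[addr]         # fill from the slower tier
        if len(self.lines) >= self.num_lines:
            self.lines.popitem(last=False)   # evict least recently used line
        self.lines[addr] = data
        return data

backing = {a: a * 10 for a in range(8)}
cache = FullyAssociativeCache(num_lines=2, backing=backing)
for addr in [0, 1, 0, 2, 0]:              # address 1 is evicted when 2 fills
    cache.read(addr)
print(cache.hits, cache.misses)           # 2 3
```

In the access pattern above, the hot address 0 stays resident across evictions of colder addresses, which is the behavior a hardware-managed cache tier aims to provide transparently across the memory hierarchy.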
Customers are currently working with DreamBig on next generation devices for the following use cases:
AI Servers and Accelerators
High-end Datacenter and Low-end Edge Servers
Petabyte Storage Servers
DPUs and DPU Smart Switches
Automotive ADAS, Infotainment, Zonal Processors
“We are very proud of what DreamBig has achieved establishing leadership in driving a key pillar of the market for high performance, energy conscious, and highly scalable AI solutions to serve the world,” stated Sehat Sutardja and Weili Dai, Co-founders and Chairman/Chairwoman of DreamBig. “The company has raised the technology bar to lead the semiconductor industry by delivering the next generation of open chiplet solutions for Large Language Model (LLM), Generative AI, Datacenter, and Automotive applications for the global mass market.”
To learn more, come see DreamBig technology solution demo January 9-12 at CES 2024 – The Most Powerful Tech Event in the World in The Venetian Expo, Bellini 2003 Meeting Room.
About DreamBig Semiconductor
DreamBig, founded in 2019, is developing a disruptive and world leading Chiplet Platform that enables customers to bring to market the next generation of high performance, energy conscious, affordable, scalable, and composable semiconductor chiplet solutions for the AI revolution and digital world. The company specializes in high performance applications used for Large Language Model (LLM), Generative AI, Datacenter, Edge, and Automotive markets.
DreamBig provides the industry’s most advanced Chiplet Hub to scale-up compute/accelerator Chiplets, and the most advanced Networking Chiplets to scale-out.
Author
DreamBig Semiconductor Inc. | PR Newswire
DreamBig Semiconductor Participated in SmartNICs SUMMIT 2023
We are pleased to announce that we participated in SmartNICs Summit 2023 and presented our MARS SmartNIC LAN & RDMA Device Model. We demonstrated IPsec, Match/Action, MAC Filtering, Checksum Verification, and RSS offloads for the MARS LAN Device Model.
Our Deimos Chiplet Hub leverages Arm Flexible Access for Startups
Our CEO and President was quoted by Arm:
“Arm Flexible Access for Startups was a game-changer for us. It enabled us to innovate on top of Arm’s world-class IP, access its broad ecosystem in a cost-efficient way and prototype our industry-leading Deimos Chiplet Hub for next-generation data center solutions.”