Databricks
Canvas Category: Software : Information Technology : Data & AI
With origins in academia and the open source community, Databricks was founded in 2013 by the original creators of Apache Spark™, Delta Lake and MLflow. As the world’s first and only lakehouse platform in the cloud, Databricks combines the best of data warehouses and data lakes to offer an open and unified platform for data and AI.
Assembly Line
Databricks is Raising $10B Series J Investment at $62B Valuation
Databricks, the Data and AI company, announced its Series J funding. The company is raising $10 billion of expected non-dilutive financing and has completed $8.6 billion to date. This funding values Databricks at $62 billion and is led by Thrive Capital. Along with Thrive, the round is co-led by Andreessen Horowitz, DST Global, GIC, Insight Partners and WCM Investment Management. Other significant participants include existing investor Ontario Teachers’ Pension Plan and new investors ICONIQ Growth, MGX, Sands Capital and Wellington Management.
The company has seen increased momentum and accelerated growth (over 60% year-over-year) in recent quarters, largely due to the unprecedented interest in artificial intelligence. To satisfy customer demand, Databricks intends to invest this capital in new AI products, acquisitions, and significant expansion of its international go-to-market operations. In addition to fueling growth, this capital is expected to be used to provide liquidity for current and former employees, as well as to pay related taxes. Finally, this quarter marks the first time the company is expected to achieve positive free cash flow.
Kongsberg Digital Announces Availability of Kognitwin on the Databricks Data Intelligence Platform
Kongsberg Digital’s digital twin, Kognitwin, is now available on the Databricks Data Intelligence Platform. This partnership combines Databricks’ data intelligence with Kognitwin’s interface, allowing businesses to integrate real-time data, reduce costs, and optimize operations for better decision-making.
Customers can now also fully utilize their data-driven digital twin with Databricks Delta Sharing. Delta Sharing simplifies the process by eliminating the need to replicate data, making data analysis and secure access smoother. This integration ensures that users always work with the most current data and simulations.
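To illustrate how little plumbing Delta Sharing requires, here is a minimal sketch of reading a shared table from Python. The profile path and the share, schema and table names are hypothetical placeholders, not the actual Kognitwin share.

```python
import delta_sharing

# Profile file issued by the data provider; the path and the
# share/schema/table names below are illustrative placeholders.
profile = "/dbfs/FileStore/config.share"
table_url = f"{profile}#kognitwin_share.twin_data.live_readings"

# Load the provider's table without replicating it anywhere.
pdf = delta_sharing.load_as_pandas(table_url)

# Inside Databricks, the same table can be read as a Spark DataFrame.
sdf = delta_sharing.load_as_spark(table_url)
```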
Unilever transforms operations with GenAI using the Databricks Data Intelligence Platform
Optimizing global journeys through secure data connectivity
Operating in over 190 countries, Amadeus provides cutting-edge solutions to the travel industry, including services for more than 400 airlines and over 1 million hotels. To efficiently handle the high volumes of data from various platforms and optimize operational costs, Amadeus recognized the need to migrate to the cloud and enhance their data management capabilities. By adopting the Databricks Data Intelligence Platform on Azure, Amadeus has embraced a future-focused approach to data democratization and secure sharing. This transition supports faster innovation and enables both internal teams and customers to benefit from more personalized travel experiences. By leveraging the Databricks Platform to create a Data Mesh, Amadeus can seamlessly integrate diverse data sources to predict, plan and deliver exceptional and tailored travel experiences at scale.
Solving global trade challenges across the supply chain
Altana, the world’s first value chain management system, is on a mission to build a more transparent and reliable global supply chain. Altana needs to create many disparate models and manage an enormous amount of data for various customer use cases. Databricks has helped the company manage and optimize its data and machine learning (ML) workflows so it can rapidly iterate and fine-tune models for diverse customer needs. Prior to Databricks, the company had to create boilerplate tools for launching GenAI applications, diverting focus from cutting-edge product functionality. After adopting Databricks Mosaic AI for its generative AI product development, Altana was able to streamline its ML lifecycle, enhance collaboration across teams and improve operational efficiency. This transition allowed the company to deploy models 20 times faster and achieve up to 50% better performance, enabling customer-centric innovations that were previously unattainable.
Altana uses generative AI in several innovative ways. One use case involves streamlining cross-border trading needs by generating accurate tax and tariff classifications for goods, as well as producing comprehensive legal write-ups. “These models generate what is the right tax and tariff classification of this good and then auto-generates an entire legal write-up that a customer can use for justification purposes,” Cadieu described. Another use case focuses on enhancing their global supply chain knowledge graph by integrating customer data in a privacy-preserving manner, thereby enriching the understanding of supply chain dependencies and enabling strategic decision-making.
Intelligent Data-Driven Energy Management: Data usage in Michelin's energy project
Centralized monitoring and follow-up of energy consumption across Michelin's 65 factories, down to the level of individual equipment. Data cleansing for the use case is performed in Databricks.
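A cleansing pass of this kind typically deduplicates meter readings and drops implausible values before any aggregation. A minimal PySpark sketch, with hypothetical table and column names:

```python
from pyspark.sql import functions as F

# Hypothetical raw energy-meter table: (factory, equipment_id, ts, kwh).
raw = spark.table("energy.raw_meter_readings")

clean = (
    raw.dropDuplicates(["factory", "equipment_id", "ts"])       # drop re-transmissions
       .withColumn("ts", F.to_timestamp("ts"))
       .filter(F.col("kwh").isNotNull() & (F.col("kwh") >= 0))  # remove missing/negative readings
)

clean.write.mode("overwrite").saveAsTable("energy.clean_meter_readings")
```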
ServiceNow partners with Databricks to deliver Zero Copy integration that supercharges machine learning and AI capabilities
ServiceNow (NYSE: NOW), the AI platform for business transformation, and Databricks, the Data and AI company, announced a Zero Copy partnership that supercharges machine learning and AI capabilities. Databricks’ Delta Sharing will enable ServiceNow to offer Zero Copy, high‑bandwidth, bi‑directional, and secure integration with the Databricks Data Intelligence Platform so customers can turn data and insights into instant, AI‑powered action.
Bi‑directional data exchange with the Zero Copy integration will allow Databricks customers to access data within the ServiceNow platform via ServiceNow’s RaptorDB high‑performance database to analyze, enhance, and combine different sets of company data. Additionally, ServiceNow customers will be able to access rich data and insights from Databricks to trigger workflows in ServiceNow.
Databricks will also enable ServiceNow customers to build, test, and deploy custom GenAI applications tailored to their unique business needs. The partnership also amplifies ServiceNow’s agentic AI workflow capabilities. Customers will be able to design and deploy AI‑driven workflows on the ServiceNow platform that combine Databricks’ AI capabilities with ServiceNow’s workflow automation, creating predictive and self‑optimizing business processes. This includes training custom models from Databricks and integrating them into the ServiceNow platform.
Virgin Atlantic's VP of Data and AI on chatbots, transformation and enterprise astrophysics
As VP of Data and AI, Masters’ remit covers all enterprise data at Virgin Atlantic, ranging from the information generated by its commercial activity all the way to the data from engineering or maintenance teams. He’s responsible for forging the links between data and AI, focusing on “how we build the engines and algorithms”.
The airline is now on a GenAI journey which began with relatively basic use cases such as summarisation and categorisation of messages from email, web forms and other sources. These are then processed in combination with its knowledge base of policies and procedures, as well as tone-of-voice guidelines and other brand-related information, to make the responses of an AI model more consistent, convincing and human in tone. It was the first airline to adopt fully automated generative AI pricing technology, which is now live and being used to price selected routes. Customer service chatbots demonstrate the power of GenAI, Masters says, offering transformative benefits for both passengers and airline staff.
APA Group delivers new ERP and cloud-based data strategy
According to Butler, the adoption of Delta Live Tables assisted APA when the company needed to develop integrations for the same source system for two separate projects, one of which was its ERP.
Responding to a question from iTnews, Butler elaborated: “We had ERP and we had another large-scale, similar project somewhere else in the business and they both had their own reporting requirements.
“We didn’t want two different methods for doing data warehousing and reporting. So, we created the third project, which was to stand up Databricks, to facilitate those two [projects].
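With Delta Live Tables, both projects can consume one declarative pipeline definition. A minimal sketch, using illustrative table names rather than APA's actual schema:

```python
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Raw ERP postings, as delivered by the source system")
def erp_bronze():
    return spark.read.table("landing.erp_postings")

@dlt.table(comment="Cleansed postings consumed by both reporting projects")
@dlt.expect_or_drop("valid_amount", "amount IS NOT NULL")  # quality gate
def erp_silver():
    return dlt.read("erp_bronze").withColumn("posted_at", F.to_timestamp("posted_at"))
```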
How NOV solved critical oilfield operations using Databricks Data Intelligence Platform
NOV is a global oilfield services corporation. To achieve scale and enable data-driven decisions across its operations, NOV set out to create a data command center: a comprehensive approach to ingesting and seamlessly integrating real-time streaming data with historical batch data to provide a holistic overview of well site activities. The company has implemented advanced data analytics and visualization tools to ensure that operators can promptly identify anomalies, potential issues and trends in the well data.
Rigs generate over 1 billion rows of data per day from 30,000+ sensors streaming at 1 Hz or higher. This sensor data streams seamlessly into central repositories such as Aveva PI (formerly OSIsoft), yet corporate data and equipment details remain siloed, necessitating duplicated effort due to limited data access. A robust model deployment pipeline significantly improves data-driven optimization, and modernizing to a unified, scalable data environment is critical for unlocking data-driven insights.
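A unified environment typically starts by landing the high-frequency stream in Delta next to the historical batch data. A Structured Streaming sketch, assuming the rig sensors are forwarded to a Kafka topic; the broker, topic and schema are placeholders:

```python
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

# Assumed: rig sensor readings forwarded to a Kafka topic as JSON records.
schema = StructType([
    StructField("rig_id", StringType()),
    StructField("sensor", StringType()),
    StructField("ts", TimestampType()),
    StructField("value", DoubleType()),
])

stream = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
    .option("subscribe", "rig-sensors")                # placeholder topic
    .load()
    .select(F.from_json(F.col("value").cast("string"), schema).alias("r"))
    .select("r.*")
)

# Land the 1 Hz stream in a Delta table next to historical batch data.
(stream.writeStream
    .option("checkpointLocation", "/chk/rig_sensors")
    .toTable("ops.rig_sensor_bronze"))
```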
Implementing Databricks on AWS has proven instrumental in advancing condition-based monitoring (CBM) strategies for drilling equipment, leading to substantial improvements in the supply chain management of these crucial assets. By harnessing the power of data analytics and ML on the Databricks Data Intelligence Platform, organizations can efficiently monitor the health and performance of drilling equipment in real time.
Manufacturing and Transportation Industry Forum | DAIS 2024
Innovating for Sustainable Future: Williams' Journey with NextGen Gas
Enhancing Audit Efficiency at Hapag-Lloyd with Generative AI
Parker Aerospace writes innovative safety algorithm on Azure Databricks
Parker Aerospace partnered with its customer Republic Airways to better understand how its parts were performing in flight, but it needed to overcome infrastructure challenges that limited its data processing capacity. The company adopted Azure Databricks, reducing processing time from 14 hours to two. This enhanced operational capacity has paved the way for Parker Aerospace to further improve its ability to support aviation operations.
The company is using Azure Databricks to completely transform its analytics capabilities. Republic Airways sends each day’s aircraft operational data to Parker Aerospace for analysis—a massive and constant influx of raw data. “Now, we can process large new data sets nightly,” explains Lim. By leveraging advanced analytics algorithms, Azure Databricks provides more accurate insights into aircraft performance and maintenance needs, which helps the airline optimize its operations and improve service delivery. Parker Aerospace has also integrated Power BI with Azure Databricks, which enables Parker Aerospace to visualize data in new ways and create report dashboards for Republic Airways.
The partnership between Parker Aerospace and Republic Airways is mutually beneficial. In addition to supplying Parker Aerospace with insights, the Leak Detection Algorithm integrates seamlessly into an interactive predictive maintenance dashboard in Power BI powered by Azure Databricks, providing the airline’s maintenance team with insights into the performance of their fleet. This dashboard allows maintenance personnel to visualize and analyze data trends, enabling them to make informed decisions about maintenance priorities and resource allocation. The dashboard features predictive analytics capabilities through Azure Databricks, which can forecast potential maintenance issues based on historical data and enhance maintenance planning and efficiency. “Microsoft had all the necessary resources we needed to make the collaboration with Republic Airways possible,” says Austin Major, VP of Sales and Business Development at Parker Aerospace.
Introducing HORIZON: Pioneering Critical Metals Exploration with Deep Learning and Databricks
Durendal Resources has developed HORIZON, an advanced Deep Neural Network designed for mineral exploration. Named the High-Resolution Ore Investigation Network, HORIZON can accurately classify mineral occurrences without relying on surface geochemical or direct detection methods, making it a valuable tool for exploring areas under cover. Utilizing deep learning, HORIZON analyzes extensive geological and exploration data to reveal hidden patterns (Deng et al., 2021).
HORIZON represents the first step in a revolutionary suite of models designed to transform mineral exploration worldwide. While significant progress has been made, further development is needed to create models that can make highly accurate predictions with minimal geoscience data. This ongoing work promises to enhance the efficiency and effectiveness of mineral exploration, uncovering hidden resources and driving the industry forward.
Databricks and ARM Hub sign partnership to boost manufacturing
ARM Hub, Australia’s leading artificial intelligence (AI), robotics and design-for-manufacture industry hub, is partnering with world-leading data and AI platform Databricks. This international alliance is set to transform Australia’s manufacturing landscape by enabling smarter decision-making through affordable, safe, relevant and up-to-date data and AI solutions.
Unveiling Databricks power in analyzing electrical grid assets using computer vision
Data is ingested from an EPRI dataset consisting of images of distribution assets along with labels for each object. These are ingested into Delta tables and transformed through the medallion architecture in order to produce a dataset that is ready for model training.
After data loading has been completed, training can begin. In the age of GenAI there is a scarcity of large GPUs, leaving only smaller ones, which can significantly lengthen training and experimentation times. To combat this, Databricks lets you run distributed GPU training using features like TorchDistributor. This accelerator takes advantage of that capability to train the model on a cluster of commodity GPUs, bringing training time down almost linearly.
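TorchDistributor wraps a standard PyTorch DDP training function and fans it out across the cluster. A minimal sketch with four GPU workers; the one-layer model is a stand-in for the actual vision network:

```python
from pyspark.ml.torch.distributor import TorchDistributor

def train(lr):
    # Standard PyTorch DDP loop; TorchDistributor sets the environment
    # variables (RANK, LOCAL_RANK, ...) that torch.distributed expects.
    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    dist.init_process_group("nccl")
    device = int(os.environ["LOCAL_RANK"])
    model = torch.nn.Linear(10, 2).to(device)  # stand-in for the vision model
    model = DDP(model, device_ids=[device])
    # ... optimizer, DataLoader with DistributedSampler, training epochs ...
    dist.destroy_process_group()

# Fan the training function out over 4 GPU workers on the cluster.
distributor = TorchDistributor(num_processes=4, local_mode=False, use_gpu=True)
distributor.run(train, 1e-3)
```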
AVEVA and Databricks Forge Strategic Collaboration to Accelerate Industrial AI Outcomes and Enable a Connected Industrial Ecosystem
AVEVA, a global leader in industrial software, and Databricks, the Data and AI company and inventor of the data lakehouse, proudly announce a strategic partnership aimed at reshaping the future of industrial software through an open, secure, and governed approach to data and AI. This collaboration will drive a closer integration between AVEVA Connect, AVEVA’s industrial intelligence platform, and the Databricks Data Intelligence Platform.
The partnership will act as a catalyst for AVEVA’s vision of a connected industrial ecosystem by enabling seamless and secure sharing of data amongst different industry applications and community stakeholders, orchestrated by AVEVA Connect. An open approach for industry collaboration and secure, cross-platform data sharing, powered by Databricks’ Delta Sharing, will help mutual customers in manufacturing, energy, consumer packaged goods and pharmaceutical sectors to identify new sources of value and gain flexibility in industrial processes, without compromising on security and governance.
Industrial stream data management in Databricks
Italgas: from gas pipelines to data pipelines — Fueling our reporting with the latest innovations.
Utilities is a data-intensive sector, first and foremost because its core process consists of metering and billing the usage of gas (or water, or electricity) for millions of customers, an intrinsically data-rich activity even when it was done entirely manually. Today the Italgas Nimbus smart meter remotely manages the metering of multiple types of gas for the same customer.
Exploring the latest advancements in data technologies, we experimented with two simplifications. For the second of these, the new native Databricks connectors for Power BI allowed seamless integration of the reports directly with the Lakehouse, and Databricks SQL Warehouses proved about 50% faster thanks to optimizations such as the Photon engine, optimized caching and query routing.
Manufacturing Insights: Calculating Streaming Integrals on Low-Latency Sensor Data
Calculating integrals is an important tool in the toolbelt of modern data engineers working on real-world sensor data. While storing this data is cheap, transmitting it may not be, and many IoT production systems today have methods to distill the deluge. Many sensors, or their intermediate systems, are set up to transmit a reading only when something “interesting” happens, such as a change from one binary state to another or a measurement that differs by 5% from the last. For the data engineer, therefore, the absence of new readings can be significant in itself (nothing has changed in the system), or it might represent late-arriving data due to a network outage in the field.
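For readings like these, one common approach is trapezoidal integration between consecutive points, which implicitly covers the quiet gaps where nothing was transmitted. A PySpark sketch with hypothetical table and column names:

```python
from pyspark.sql import functions as F, Window

# Hypothetical sparse readings table: (sensor_id, ts, value), where a new
# row arrives only when something "interesting" happens.
w = Window.partitionBy("sensor_id").orderBy("ts")

integrated = (
    spark.table("iot.sensor_readings")
    .withColumn("prev_ts", F.lag("ts").over(w))
    .withColumn("prev_value", F.lag("value").over(w))
    # Trapezoid between consecutive readings: (v_prev + v) / 2 * dt_seconds.
    .withColumn(
        "area",
        (F.col("prev_value") + F.col("value")) / 2
        * (F.col("ts").cast("long") - F.col("prev_ts").cast("long")),
    )
    .groupBy("sensor_id")
    .agg(F.sum("area").alias("integral"))
)
```

If a sensor transmits only on change, a last-value-held rectangle (prev_value * dt) may match the physics better than the trapezoid; the window logic stays the same.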
Harnessing Machine Learning for Anomaly Detection in the Building Products Industry with Databricks
One of the biggest data-driven use cases at LP was monitoring process anomalies with time-series data from thousands of sensors. With Apache Spark on Databricks, large amounts of data can be ingested and prepared at scale to assist mill decision-makers in improving quality and process metrics. To prepare these data for mill data analytics, data science, and advanced predictive analytics, it is necessary for companies like LP to process sensor information faster and more reliably than on-premises data warehousing solutions alone.
Solution Accelerator: LLMs for Manufacturing
In this solution accelerator, we focus on item (3) above: augmenting field service engineers with a knowledge base in the form of an interactive, context-aware Q/A session. The challenge manufacturers face is how to build and incorporate data from proprietary documents into LLMs. Training LLMs from scratch is a very costly exercise, costing hundreds of thousands if not millions of dollars.
Instead, enterprises can tap into pre-trained foundational LLM models (like MPT-7B and MPT-30B from MosaicML) and augment and fine-tune these models with their proprietary data. This brings down the costs to tens, if not hundreds of dollars, effectively a 10000x cost saving.
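The augmentation step is essentially retrieval plus prompt assembly. A toy sketch of the retrieval side with an open-source embedding model; the documents, model choice and prompt format are illustrative, not what the accelerator ships:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Toy knowledge base of maintenance-manual snippets (illustrative content).
docs = [
    "To reset the conveyor PLC, hold the service button for 5 seconds.",
    "Hydraulic pressure above 210 bar triggers the E-17 fault code.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = model.encode(docs, normalize_embeddings=True)

def retrieve(question, k=1):
    q = model.encode([question], normalize_embeddings=True)[0]
    scores = doc_vecs @ q  # cosine similarity, since vectors are normalized
    return [docs[i] for i in np.argsort(-scores)[:k]]

question = "What does fault code E-17 mean?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
# `prompt` is then sent to the fine-tuned or foundation LLM endpoint.
```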
Chevron Phillips Chemical tackles generative AI with Databricks
Databricks Raises Series I Investment at $43B Valuation
Databricks, the Data and AI company, today announced its Series I funding, raising over $500 million. This funding values the company at $43 billion and establishes the price per share at $73.50. The series is led by funds and accounts advised by T. Rowe Price Associates, Inc., which is joined by other existing investors, including Andreessen Horowitz, Baillie Gifford, ClearBridge Investments, funds and accounts managed by Counterpoint Global (Morgan Stanley), Fidelity Management & Research Company, Franklin Templeton, GIC, Octahedron Capital and Tiger Global along with new investors Capital One Ventures, Ontario Teachers’ Pension Plan and NVIDIA.
The Databricks Lakehouse unifies data, analytics and AI on a single platform so that customers can govern, manage and derive insights from enterprise data and build their own generative AI solutions faster. “Enterprise data is a goldmine for generative AI,” said Jensen Huang, founder and CEO of NVIDIA. “Databricks is doing incredible work with NVIDIA technology to accelerate data processing and generative AI models.”
Bringing Scalable AI to the Edge with Databricks and Azure DevOps
The ML-optimized runtime in Databricks contains popular ML frameworks such as PyTorch, TensorFlow, and scikit-learn. In this solution accelerator, we will build a basic Random Forest ML model in Databricks that will later be deployed to edge devices to execute inferences directly on the manufacturing shop floor. The focus will essentially be the deployment of ML Model built on Databricks to edge devices.
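As a sketch of that first step, the snippet below trains a Random Forest and logs it to MLflow so a release pipeline (e.g. in Azure DevOps) can pull a versioned artifact for edge packaging; the synthetic data and registry name are illustrative:

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Stand-in for shop-floor sensor features and pass/fail labels.
X, y = make_classification(n_samples=1_000, n_features=8, random_state=42)

with mlflow.start_run():
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X, y)
    # Registering the model gives CI/CD a stable, versioned artifact
    # to package for the edge devices.
    mlflow.sklearn.log_model(model, "model", registered_model_name="edge_rf")
```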
📊 Simplify and Accelerate IoT Data-Driven Innovation
Databricks is thrilled to announce strategic partnerships to deliver specialized expertise and unparalleled value to the industry. These partnerships allow companies to simplify access to complex datasets, generate actionable insights and accelerate the time to value with the Lakehouse platform.
Seeq, a global leader in advanced analytics for the process manufacturing industries, delivers self-service, enterprise SaaS solutions to accelerate critical insights and action from historically unused data. Sight Machine enables real-time data-driven operations for manufacturers to achieve breakthrough performance by continuously improving profitability, productivity, and sustainability. Kobai delivers unparalleled semantic capabilities to unify operational and enterprise data and empowers all users to make better decisions and drive operational excellence. Companies across the Fortune 500 leverage Plotly’s powerful interactive analytics and visualization tools to build and scale production-grade data apps quickly and easily.
Wipro and Databricks Partner to Build Williams Energy's First-Certified Gas Marketplace
Williams sought to disrupt the energy industry by creating the first certified gas marketplace, with a gas certification process encompassing the end-to-end supply chain: from production through transmission and delivery. Participants share emissions data throughout the value chain with the certification agency, Context Labs.
The Decarbonization-as-a-Service™ platform will enable Context Labs’ partners in upstream, midstream and downstream firms to track and quantify emissions associated with natural gas procurement, transmission, and delivery down to the individual molecule.
Achieving this ambitious goal involves ingesting and analyzing vast amounts of complex time series data from proprietary systems like FlowCal, Maximo and PI, which house critical transmission and emissions data.
Wipro successfully tackled complex data engineering challenges by developing a PI Asset Framework (AF) for the NextGen use case, enabling seamless querying of data through PI AF using a linked server. Wipro leveraged Databricks Delta Sharing to deliver data to Context Labs, while also provisioning the necessary access tokens for the third party, all in less than 48 hours, which was a significant achievement. From complex data engineering to pioneering Delta Sharing and AI-driven tag consistency, Wipro’s solutions paved the way for NextGen Gas.
📊 Accelerating Innovation at JetBlue Using Databricks
The role of data, and in particular of analytics, AI and ML, is key for airlines to provide a seamless customer experience while maintaining efficient operations. For a single flight, for example from New York to London, hundreds of decisions have to be made based on factors encompassing customers, flight crews, aircraft sensors, live weather and live air traffic control (ATC) data. A large disruption such as a brutal winter storm can impact thousands of flights across the U.S. It is therefore vital for airlines to rely on real-time data and on AI and ML to make proactive, real-time decisions.
JetBlue has accelerated AI and ML deployments across a wide range of use cases spanning four lines of business, each with its own AI and ML team. The fundamental functions of the business lines are:
- Commercial Data Science (CDS) - Revenue growth
- Operations Data Science (ODS) - Cost reduction
- AI & ML engineering – Go-to-market product deployment optimization
- Business Intelligence – Reporting, enterprise scaling and support
Each business line supports multiple strategic products that are prioritized regularly by JetBlue leadership to establish KPIs that lead to effective strategic outcomes.
A Data Architecture to assist Geologists in Real-Time Operations
Data plays a crucial role in making Eni’s exploration and drilling operations a success all over the world. Our geologists use real-time well data, collected by sensors installed on drilling pipes, to keep track of key properties and build predictive models of them during the drilling process.
Data is delivered by a custom dispatcher component designed to connect to a WITSML server on each oil rig and send time-indexed and/or depth-indexed data to any supported application. In our case, data is delivered to Azure ADLS Gen2 in the form of WITSML files, each accompanied by a JSON file for additional custom metadata.
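On Databricks, file drops like these are commonly picked up incrementally with Auto Loader; a sketch under that assumption, with placeholder paths and checkpoint locations:

```python
# Incrementally ingest new WITSML drops from ADLS Gen2 with Auto Loader.
witsml = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "binaryFile")  # raw WITSML payloads
    .load("abfss://rigs@account.dfs.core.windows.net/witsml/*.xml")
)

metadata = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")        # sidecar custom metadata
    .option("cloudFiles.schemaLocation", "/chk/witsml_meta_schema")
    .load("abfss://rigs@account.dfs.core.windows.net/witsml/*.json")
)

(witsml.writeStream
    .option("checkpointLocation", "/chk/witsml_bronze")
    .toTable("bronze.witsml_files"))
```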
The visualizations generated from this data platform are used both on the oil rigs and in HQ, with operators exploring the curves enriched by the ML models as soon as they’re generated on a web application made in-house, which shows in real time how the drilling is progressing. Additionally, it is possible to explore historic data via the same application.
Databricks Announces Lakehouse for Manufacturing, Empowering the World's Leading Manufacturers to Realize the Full Value of Their Data
Databricks, the lakehouse company, today announced the Databricks Lakehouse for Manufacturing, the first open, enterprise-scale lakehouse platform tailored to manufacturers that unifies data and AI and delivers record-breaking performance for any analytics use case. The sheer volume of tools, systems and architectures required to run a modern manufacturing environment makes secure data sharing and collaboration a challenge at scale, with over 70 percent of data projects stalling at the proof of concept (PoC) stage. Available today, Databricks’ Lakehouse for Manufacturing breaks down these silos and is uniquely designed for manufacturers to access all of their data and make decisions in real-time. Databricks’ Lakehouse for Manufacturing has been adopted by industry-leading organizations like DuPont, Honeywell, Rolls-Royce, Shell and Tata Steel.
The Lakehouse for Manufacturing includes access to packaged use case accelerators that are designed to jumpstart the analytics process and offer a blueprint to help organizations tackle critical, high-value industry challenges.
A Deeper Look Into How SAP Datasphere Enables a Business Data Fabric
SAP announced the SAP Datasphere solution, the next generation of its data management portfolio, which gives customers easy access to business-ready data across the data landscape. SAP also introduced strategic partnerships with industry-leading data and AI companies – Collibra NV, Confluent Inc., Databricks Inc. and DataRobot Inc. – to enrich SAP Datasphere and allow organizations to create a unified data architecture that securely combines SAP software data and non-SAP data.
SAP Datasphere, and its open data ecosystem, is the technology foundation that enables a business data fabric. This is a data management architecture that simplifies the delivery of an integrated, semantically rich data layer over underlying data landscapes to provide seamless and scalable access to data without duplication. It is not a rip-and-replace model; it is intended to connect, rather than solely move, data using data and metadata. A business data fabric equips any organization to deliver meaningful data to every data consumer, with business context and logic intact. As organizations require accurate data that is quickly available and described in business-friendly terms, this approach enables data professionals to bring the clarity that business semantics provide to every use case.
Rolls-Royce Civil Aerospace keeps its Engines Running on Databricks Lakehouse
Rio Tinto to consolidate HR data to a single platform
The ‘people data lakehouse’ is to run on a Databricks platform, and “will allow Rio Tinto to move from a fragmented, siloed, complicated data web to a single accessible source”, the advertisement stated. “Through the use of a Databricks platform, we will enable our People function to present and analyse data from a variety of different sources in a unified and controlled way - all in one place, simplified and historically accurate.” Rio Tinto’s People function operates out of the company’s Perth, Brisbane and Montreal business hubs and is responsible for managing roughly 50,000 employees in 35 countries.
How Corning Built End-to-end ML on Databricks Lakehouse Platform
Specifically for quality inspection, we take high-resolution images to look for irregularities in the cells, which can be predictive of leaks and defective parts. The challenge, however, is the prevalence of false positives due to the debris in the manufacturing environment showing up in pictures.
To address this, we manually brush and blow the filters before imaging. We discovered that by notifying operators of which specific parts to clean, we could significantly reduce the total time required for the process, and machine learning came in handy. We used ML to predict whether a filter is clean or dirty based on low-resolution images taken while the operator is setting up the filter inside the imaging device. Based on the prediction, the operator would get the signal to clean the part or not, thus reducing false positives on the final high-res images, helping us move faster through the production process and providing high-quality filters.
Maersk embraces edge computing to revolutionize supply chain
Gavin Laybourne, global CIO of Maersk’s APM Terminals business, is embracing cutting-edge technologies to accelerate and fortify the global supply chain, working with technology giants to implement edge computing, private 5G networks, and thousands of IoT devices at its terminals to elevate the efficiency, quality, and visibility of the container ships Maersk uses to transport cargo across the oceans.
“Two to three years ago, we put everything on the cloud, but what we’re doing now is different,” Laybourne says. “The cloud, for me, is not the North Star. We must have the edge. We need real-time instruction sets for machines [container handling equipment at container terminals in ports] and then we’ll use cloud technologies where the data is not time-sensitive.”
Laybourne’s IT team is working with Microsoft to move cloud data to the edge, where containers are removed from ships by automated cranes and transferred to predefined locations in the port. To date, Laybourne and his team have migrated about 40% of APM Terminals’ cloud data to the edge, with a target to hit 80% by the end of 2023 at all operated terminals. Maersk has also been working with AI pioneer Databricks to develop algorithms to make its IoT devices and automated processes smarter. The company’s data scientists have built machine learning models in-house to improve safety and identify cargo. Data scientists will some day up the ante with advanced models to make all processes autonomous.
Solution Accelerator: Multi-factory Overall Equipment Effectiveness (OEE) and KPI Monitoring
The Databricks Lakehouse provides an end-to-end data engineering, serving, ETL and machine learning platform that enables organizations to accelerate their analytics workloads by automating the complexity of building and maintaining analytics pipelines through open architecture and formats. It facilitates connecting to high-velocity industrial IoT data using standard protocols like MQTT, Kafka, Event Hubs or Kinesis, as well as to external datasets such as ERP systems, allowing manufacturers to converge their IT/OT data infrastructure for advanced analytics.
Using a Delta Live Tables pipeline, we leverage the medallion architecture to ingest data from multiple sensors in a semi-structured format (JSON) into our bronze layer, where data is replicated in its natural format. The silver layer transformations include parsing key fields from the sensor data that need to be extracted and structured for subsequent analysis, and ingesting preprocessed workforce data from ERP systems needed to complete the analysis. Finally, the gold layer aggregates sensor data using Structured Streaming stateful aggregations, calculates OT metrics such as OEE and TA (technical availability), and combines the aggregated metrics with workforce data based on shifts, allowing for IT-OT convergence.
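The gold-layer OEE computation itself reduces to three ratios multiplied together: availability × performance × quality. An illustrative aggregation over hypothetical shift-level columns:

```python
from pyspark.sql import functions as F

# Hypothetical gold-layer input: one row per (factory, shift) with
# runtime/planned minutes, actual/ideal output and good/total counts.
oee = (
    spark.table("gold.shift_stats")
    .withColumn("availability", F.col("runtime_min") / F.col("planned_min"))
    .withColumn("performance", F.col("actual_output") / F.col("ideal_output"))
    .withColumn("quality", F.col("good_count") / F.col("total_count"))
    .withColumn("oee", F.col("availability") * F.col("performance") * F.col("quality"))
)
```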
Carhartt turns to data under new CIO
Like many CIOs, Carhartt’s top digital leader is aware that data is the key to making advanced technologies work. Carhartt opted to build its own enterprise data warehouse even as it built a data lake with Microsoft and Databricks to ensure that its handful of data scientists have both engines with which to manipulate structured and unstructured data sets.
At first, during the pandemic, many essential workers needed to be equipped with Carhartt work gear for extra protection. As a result, the company’s revenue stream grew in the double digits, even when certain business segments were curtailed due to widespread work stoppages.
Once work stoppages started taking hold, Carhartt gained a rare glimpse into its supply chain, enabling its data analysts to view the steps of the supply chain in exquisite detail, like the individual frames in a film.
Part Level Demand Forecasting at Scale
The challenges of demand forecasting include ensuring the right granularity, timeliness, and fidelity of forecasts. Due to limitations in computing capability and the lack of know-how, forecasting is often performed at an aggregated level, reducing fidelity.
In this blog, we demonstrate how our Solution Accelerator for Part Level Demand Forecasting helps your organization forecast at the part level, rather than at the aggregate level, using the Databricks Lakehouse Platform. Part-level demand forecasting is especially important in discrete manufacturing, where manufacturers are at the mercy of their supply chain. This is because the constituent parts of a discretely manufactured product (e.g. cars) depend on components provided by third-party original equipment manufacturers (OEMs). The goal is to map the forecasted demand values for each SKU to the quantities of raw materials (the input of the production line) needed to produce the associated finished product (the output of the production line).
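The scaling pattern behind part-level forecasting is to fit one lightweight model per SKU in parallel across the cluster. A sketch using applyInPandas with exponential smoothing; the table layout, model choice and 12-period horizon are illustrative, not the accelerator's exact code:

```python
import pandas as pd

def forecast_sku(pdf: pd.DataFrame) -> pd.DataFrame:
    # Fit a small model on one SKU's history (columns: sku, ds, y).
    from statsmodels.tsa.holtwinters import ExponentialSmoothing
    fit = ExponentialSmoothing(pdf.sort_values("ds")["y"]).fit()
    fc = fit.forecast(12)
    return pd.DataFrame(
        {"sku": pdf["sku"].iloc[0], "step": range(1, 13), "yhat": fc.values}
    )

demand = spark.table("sales.part_demand")
# Each SKU group is forecast independently on whichever executor it lands on.
forecasts = demand.groupBy("sku").applyInPandas(
    forecast_sku, schema="sku string, step int, yhat double"
)
```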
How to pull data into Databricks from AVEVA Data Hub
Using MLflow to deploy Graph Neural Networks for Monitoring Supply Chain Risk
We live in an ever more interconnected world, and nowhere is this more evident than in modern supply chains. Driven by the global macroeconomic environment and globalisation, modern supply chains have become intricately linked and woven together. Companies worldwide rely on one another to keep their production lines flowing and to act ethically (e.g., complying with laws such as the Modern Slavery Act). From a modelling perspective, the procurement relationships between firms form an intricate, dynamic and complex network spanning the globe.
Lastly, it was mentioned earlier that GNNs are a framework for defining deep learning algorithms over graph structured data. For this blog, we will utilise a specific architecture of GNNs called GraphSAGE. This algorithm does not require all nodes to be present during training, is able to generalise to new nodes efficiently, and can scale to billions of nodes. Earlier methods in the literature were transductive, meaning that the algorithms learned embeddings for nodes. This was useful for static graphs, but the algorithms had to be re-run after graph updates such as new nodes. Unlike those methods, GraphSAGE is an inductive framework which learns how to aggregate information from neighborhood nodes; i.e., it learns functions for generating embeddings, rather than learning embeddings directly. Therefore GraphSAGE ensures that we can seamlessly integrate new supply chain relationships retrieved from upstream processes without triggering costly retraining routines.
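A minimal sketch of such an inductive encoder in PyTorch Geometric, logged with MLflow so it can be versioned and served like any other model; the dimensions are arbitrary and the firm-graph features are assumed to be produced upstream:

```python
import torch
import torch.nn.functional as F
import mlflow
from torch_geometric.nn import SAGEConv

class SupplierSAGE(torch.nn.Module):
    """Two-layer GraphSAGE encoder over the firm-to-firm procurement graph."""
    def __init__(self, in_dim, hid_dim, out_dim):
        super().__init__()
        self.conv1 = SAGEConv(in_dim, hid_dim)
        self.conv2 = SAGEConv(hid_dim, out_dim)

    def forward(self, x, edge_index):
        h = F.relu(self.conv1(x, edge_index))  # aggregate neighbour features
        return self.conv2(h, edge_index)       # inductive node embeddings

model = SupplierSAGE(in_dim=32, hid_dim=64, out_dim=16)
with mlflow.start_run():
    mlflow.pytorch.log_model(model, "graphsage")
```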
Optimizing Order Picking to Increase Omnichannel Profitability with Databricks
The core challenge most retailers are facing today is not how to deliver goods to customers in a timely manner, but how to do so while retaining profitability. It is estimated that margins are reduced by 3 to 8 percentage points on each order placed online for rapid fulfillment. The cost of sending a worker to store shelves to pick the items for each order is the primary culprit, and with the cost of labor only rising (and customers expressing little interest in paying a premium for what are increasingly seen as baseline services), retailers are feeling squeezed.
But by parallelizing the work, the days or even weeks often spent evaluating an approach can be reduced to hours or even minutes. The key is to identify discrete, independent units of work within the larger evaluation set and then leverage technology to distribute these across a large computational infrastructure. In the picking optimization explored above, each order represents such a unit of work, since the sequencing of the items in one order has no impact on the sequencing of any other. At the extreme end, we might execute optimizations on all 3.3 million orders simultaneously to perform our work incredibly quickly.
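On Databricks, one natural way to express this is a groupBy over order IDs with applyInPandas, so each order's pick sequence is optimized independently on whichever executor it lands on. A sketch with a placeholder heuristic standing in for the real optimizer:

```python
import pandas as pd

def optimize_order(pdf: pd.DataFrame) -> pd.DataFrame:
    # Placeholder heuristic: walk the store aisle by aisle, shelf by shelf.
    # A real pick-path optimizer would slot in here.
    out = pdf.sort_values(["aisle", "shelf"]).reset_index(drop=True)
    out["pick_seq"] = range(1, len(out) + 1)
    return out

orders = spark.table("fulfillment.order_lines")  # (order_id, item, aisle, shelf)
optimized = orders.groupBy("order_id").applyInPandas(
    optimize_order,
    schema="order_id string, item string, aisle int, shelf int, pick_seq int",
)
```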
Virtualitics’ integration with Databricks sorts out what’s under the surface of your data lake
Databricks users can benefit from Virtualitics’ multi-user interface because it can enable hundreds more people across the business to get value from complex datasets, instead of a small team of expert data scientists. Analysts and citizen data scientists can do self-serve data exploration by querying large datasets with the ease of typing in a question, guided by AI-driven exploration instead of writing lines of code. Business decision makers get their hands on AI-generated insights that can help them take smart, predictive actions.
Albemarle's Digital Transformation
Lithium Data Science is a pure Azure deployment. Data sets are moved from SAP HANA to Azure Databricks for analysis, and Power BI is used with no restrictions on use.
How to Build Scalable Data and AI Industrial IoT Solutions in Manufacturing
Unlike traditional data architectures, which are IT-based, manufacturing involves an intersection between hardware and software that requires an OT (operational technology) architecture. OT has to contend with processes and physical machinery. Each component and aspect of this architecture is designed to address a specific need or challenge when dealing with industrial operations.
The Databricks Lakehouse Platform is ideally suited to managing large amounts of streaming data. Built on the foundation of Delta Lake, it can handle the large quantities of data delivered in small chunks from these multiple sensors and devices, providing ACID compliance and eliminating the job failures seen with traditional warehouse architectures. The Lakehouse Platform is designed to scale with large data volumes. Manufacturing produces multiple data types, semi-structured (JSON, XML, MQTT, etc.) and unstructured (video, audio, PDF, etc.), all of which the platform fully supports. By merging all these data types onto one platform, only one version of the truth exists, leading to more accurate outcomes.