Data Architecture

Assembly Line

Aftermarket Sensors Boost Yield In Wafer Fabs

📅 Date:

✍️ Author: Anne Meixner

🔖 Topics: Data Architecture, IT OT Convergence

🏭 Vertical: Semiconductor

🏢 Organizations: Nordson, Onto Innovation, Tignis


Third-party sensors are being added to fab equipment to help boost yield and extend the life of expensive tools, supplementing the sensors that come installed with the equipment.

Sensors for temperature, pressure, humidity, gas concentrations, vibration, and current can be found throughout process tools and their associated sub-systems. Third-party sensors are often added after installation and initial process development. Those sensors can also be applied to sub-fab intake and outtake pipes to monitor parameters that illuminate issues affecting process variability and defectivity. In addition, equipment teams can use instrumented wafers as a source for equipment set-up, post-maintenance qualification, and predictive maintenance algorithms.

The good news is that integrating third-party sensor data into factory data management systems is straightforward, especially when sensors use established SEMI standards like SECS/GEM. Even if they don’t adhere to those standards, the barrier to connecting them to a factory’s data infrastructure is low. But to maximize their usefulness, several industry experts emphasized the importance of traceability and the ability to correlate data from different sources in context.
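
To make the traceability point concrete, here is a minimal sketch (not from the article) of correlating aftermarket sensor readings with tool context by timestamp using pandas; the file names, columns, and control limits are illustrative assumptions.

```python
# Minimal sketch: correlate aftermarket sensor readings with tool context.
# File names, column names, and the 30-second tolerance are illustrative
# assumptions, not taken from the article.
import pandas as pd

# Third-party vibration sensor stream (timestamped readings).
sensors = pd.read_csv("aftermarket_vibration.csv", parse_dates=["timestamp"])

# Tool event log exported from the fab data system (lot, recipe, chamber).
context = pd.read_csv("tool_event_log.csv", parse_dates=["timestamp"])

# As-of join: attach the most recent lot/recipe context to each reading,
# so excursions can be traced back to a specific lot and process step.
sensors = sensors.sort_values("timestamp")
context = context.sort_values("timestamp")
traceable = pd.merge_asof(
    sensors, context, on="timestamp",
    direction="backward", tolerance=pd.Timedelta("30s"),
)

# Flag readings that exceed a per-recipe control limit (illustrative values).
limits = {"RECIPE_A": 2.5, "RECIPE_B": 3.0}  # g RMS
traceable["excursion"] = traceable.apply(
    lambda r: r["vibration_rms"] > limits.get(r["recipe"], float("inf")), axis=1
)
print(traceable.loc[traceable["excursion"], ["timestamp", "lot_id", "recipe"]])
```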

Read more at Semiconductor Engineering

Industrial Data Platform Capability Map (v1)

📅 Date:

🔖 Topics: Data Architecture


This article will help you identify the capabilities needed to build a modern industrial data system. As with any capability map, it will most likely not be complete (feel free to leave your thoughts in the comment section!). You can use this list of capabilities to start your request for information (RFI) or request for proposal (RFP) process.

Read more at The IT/OT Insider

HENN: Improved production quality helps protect automotive brands

📅 Date:

🔖 Topics: Data Architecture

🏭 Vertical: Automotive

🏢 Organizations: AVEVA, HENN


Automotive parts manufacturer HENN needed an improved data architecture that would allow it to gain real-time insight into its assembly line process. The primary goal was to improve the quality of its charge-air connector, a critical car component used by most major automotive brands. If it were to fail, significant damage to the car manufacturer’s brand could result. HENN sought to meet this challenge in a unique way by using CONNECT and Edge Data Store.

Prior to deploying CONNECT, it took HENN’s team two days to validate the production line data. Now, analysts can retrieve data from the cloud two minutes after a connector leaves the machine, and they can run inquiries against large data sets without impacting operations. This improved data processing speed helped increase the efficiency of HENN’s operations and the quality control of its product, thereby improving the reliability of the automotive brands that purchase HENN’s products.

Read more at AVEVA

Building a UNS with Tulip Ecosystem Lead Mark Freedman

Industrial-grade AI: Transforming Data into Insights and Outcomes

📅 Date:

✍️ Author: Colin Masson

🔖 Topics: Data Architecture

🏢 Organizations: SAS, ARC Advisory Group


Data fabrics can simplify the AI and Analytics lifecycle for enterprises by weaving together a unified layer for data management and integration across some of the endpoints within an industrial environment. However, existing enterprise data fabrics may not be “industrial grade” enough for many Industrial AI use cases. They often require a “big bang” approach of migrating and standardizing data in cloud-based data lakes and may not handle the complex data types encountered on the industrial edge—data that is often unstructured, time-sensitive, and critical for real-time decision making in industrial AI use cases.

Breakthroughs in Gen AI have expanded the Industrial AI toolset - especially for use cases that address the sector’s skills gaps by enhancing knowledge retention and transfer and augmenting the workforce with “Assistants” and “Copilots” - and promise to have a sweeping impact on the way users across every industry interact with complex technology. Although Gen AI is itself often perceived as an expensive AI solution, the wave of investments it has triggered is driving innovation and lowering costs across the broader AI landscape, offering new opportunities for scale deployment of tried-and-tested AI modeling techniques trained on each organization’s own industrial datasets.

Read more at SAS Whitepapers

How AlloyDB transformed Bayer’s data operations

📅 Date:

🔖 Topics: Data Architecture

🏢 Organizations: Bayer, AlloyDB, Google


Migrating to AlloyDB has been transformative for our business. In our previous PostgreSQL setup, the primary writer was responsible for both write operations and replicating those changes to reader nodes. The anticipated increase in write traffic and reader count would have overwhelmed this node, leading to potential bottlenecks and increased replication lag. AlloyDB’s architecture, which utilizes a single source of truth for all nodes, significantly reduced the impact of scaling read traffic. After migrating, we saw a dramatic improvement in performance, ensuring our ability to meet growing demands and maintain consistently low replication delay. In parallel load tests, a smaller AlloyDB instance reduced response times by over 50% on average and increased throughput by 5x compared to our previous PostgreSQL solution.

By migrating to AlloyDB, we’ve ensured that our business growth won’t be hindered by database limitations, allowing us to focus on innovation. The true test of our migration came during our first peak harvest season, a time when performance is critical for product decision timelines. Due to agriculture’s seasonal nature, a delay of just a few days can postpone a product launch by an entire year. Our customers were understandably nervous, but thanks to Google Cloud and AlloyDB, the harvest season went as smoothly as we could have hoped for.

To support our data strategy, we have adopted a consistent architecture across our Google Cloud projects. For a typical project, the stack consists of Google Kubernetes Engine (GKE) hosted pods and pipelines for publishing events and analytics data. While Bayer uses Apache Kafka across teams and cloud providers for data streaming, individual teams regularly use Pub/Sub internally for messaging and event-driven architectures. Data for analytics and reporting is generally stored in BigQuery, with custom processes for materialization once it lands. By using cross-project BigQuery datasets, we are able to work with a larger, real-time user group and enhance our operational capabilities.
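
A minimal sketch of the two patterns described above, publishing an event to Pub/Sub and querying a cross-project BigQuery dataset; the project, topic, and table names are hypothetical, not Bayer's actual code.

```python
# Minimal sketch: Pub/Sub messaging within a team's project and analytics
# against a BigQuery dataset owned by another project. All names are
# hypothetical placeholders.
import json
from google.cloud import pubsub_v1, bigquery

# Event-driven messaging within a team's project.
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("bayer-field-trials", "harvest-observations")
event = {"plot_id": "A-1042", "trait": "yield_bu_ac", "value": 212.4}
future = publisher.publish(topic_path, json.dumps(event).encode("utf-8"))
print("published message", future.result())

# Cross-project analytics: query a dataset that lives in another project.
bq = bigquery.Client(project="bayer-analytics")
query = """
    SELECT plot_id, AVG(value) AS avg_yield
    FROM `bayer-field-trials.harvest.observations`
    WHERE trait = 'yield_bu_ac'
    GROUP BY plot_id
"""
for row in bq.query(query).result():
    print(row.plot_id, row.avg_yield)
```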

Read more at Google Cloud Blog

Parker Aerospace writes innovative safety algorithm on Azure Databricks

📅 Date:

🔖 Topics: Data Architecture

🏢 Organizations: Parker Hannifin, Republic Airways, Databricks, Microsoft


Parker Aerospace partnered with its customer Republic Airways to better understand how its parts were performing in flight, but it needed to overcome infrastructure challenges that limited its data processing capacity. The company adopted Azure Databricks, reducing processing time from 14 hours to two. This enhanced operational capacity has paved the way for Parker Aerospace to further improve its ability to support aviation operations.

The company is using Azure Databricks to completely transform its analytics capabilities. Republic Airways sends each day’s aircraft operational data to Parker Aerospace for analysis—a massive and constant influx of raw data. “Now, we can process large new data sets nightly,” explains Lim. By leveraging advanced analytics algorithms, Azure Databricks provides more accurate insights into aircraft performance and maintenance needs, which helps the airline optimize its operations and improve service delivery. Parker Aerospace has also integrated Power BI with Azure Databricks, which enables Parker Aerospace to visualize data in new ways and create report dashboards for Republic Airways.
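
As a rough illustration of that nightly batch pattern, here is a minimal PySpark sketch; the table and column names are hypothetical and not from the case study.

```python
# Minimal sketch of a nightly Azure Databricks job over the day's operational
# data. Table and column names are hypothetical, not from the case study.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

raw = (
    spark.read.table("republic_airways.raw_flight_parameters")
    .where(F.col("ingest_date") == F.current_date())
)

# Per-tail-number daily statistics used for downstream leak-detection scoring.
daily = (
    raw.groupBy("tail_number", "system_id")
    .agg(
        F.avg("hydraulic_pressure_psi").alias("avg_pressure"),
        F.stddev("hydraulic_pressure_psi").alias("pressure_stddev"),
        F.max("fluid_temp_c").alias("max_fluid_temp"),
    )
)

daily.write.mode("overwrite").saveAsTable("parker.daily_system_health")
```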

The partnership between Parker Aerospace and Republic Airways is mutually beneficial. In addition to supplying Parker Aerospace with insights, the Leak Detection Algorithm integrates seamlessly into an interactive predictive maintenance dashboard in Power BI powered by Azure Databricks, providing the airline’s maintenance team with insights into the performance of their fleet. This dashboard allows maintenance personnel to visualize and analyze data trends, enabling them to make informed decisions about maintenance priorities and resource allocation. The dashboard features predictive analytics capabilities through Azure Databricks, which can forecast potential maintenance issues based on historical data and enhance maintenance planning and efficiency. “Microsoft had all the necessary resources we needed to make the collaboration with Republic Airways possible,” says Austin Major, VP of Sales and Business Development at Parker Aerospace.

Read more at Microsoft

SCADA Is Changing The Game

📅 Date:

🔖 Topics: SCADA, MQTT, Unified Namespace, Data Architecture


Today, it’s all about democratization of data. SCADA has always been a big data consumer. However, today organizations are thinking about data differently. Instead of thinking about what data my SCADA system requires, it is important to think about what data my business requires. Then you can extend that thought into standardizing what you want your data to look like across the entire organization.

Read more at Efficient Plant

Empower Your Industrial Data Strategy: Integrate Litmus Edge with AVEVA PI

📅 Date:

🔖 Topics: Data Architecture

🏢 Organizations: Litmus, AVEVA


Litmus Edge provides several integration solutions to unlock your data from the AVEVA PI System, moving past the limitations of the Asset Framework SDK and PI Web API to enhance accessibility:

  • DH Historian Agent: This Windows-based agent harnesses the Asset Framework SDK to collect data from the PI historian and securely transmit it to Litmus Edge for further processing.
  • Northbound Integration: Using the PI Web API and OMF (the OSIsoft Message Format), this integration enables Litmus Edge to send processed data back to the PI server, simplifying connectivity with proprietary OEM machines (see the sketch after this list).
  • Cloud Integration: OSIsoft’s acquisition by AVEVA and the introduction of the AVEVA™ Data Hub allow Litmus Edge to send data directly to cloud-based services, enabling remote data management and analysis.
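
As a rough illustration of the northbound path, here is a minimal sketch of posting OMF messages to a PI Web API OMF endpoint; the URL, authentication, type, and container IDs are illustrative assumptions, not Litmus Edge internals.

```python
# Minimal sketch of sending northbound data to a PI server over the PI Web API
# OMF endpoint, roughly as the integration above does. URL, credentials, type
# and container IDs are illustrative assumptions, not Litmus-specific code.
import json
import requests

OMF_URL = "https://pi-server.example.com/piwebapi/omf"
AUTH = ("svc-litmus", "********")  # or Kerberos/bearer auth, per deployment

def send_omf(message_type: str, body: list) -> None:
    headers = {
        "messagetype": message_type,      # "type", "container", or "data"
        "omfversion": "1.1",
        "action": "create",
        "messageformat": "JSON",
        "Content-Type": "application/json",
    }
    resp = requests.post(OMF_URL, headers=headers, data=json.dumps(body),
                         auth=AUTH, verify=True, timeout=10)
    resp.raise_for_status()

# 1) Declare a dynamic type, 2) bind a container (stream) to it, 3) send values.
send_omf("type", [{
    "id": "MachineTelemetry", "type": "object", "classification": "dynamic",
    "properties": {
        "timestamp": {"type": "string", "format": "date-time", "isindex": True},
        "spindle_temp_c": {"type": "number"},
    },
}])
send_omf("container", [{"id": "cnc-07.telemetry", "typeid": "MachineTelemetry"}])
send_omf("data", [{
    "containerid": "cnc-07.telemetry",
    "values": [{"timestamp": "2024-05-01T08:30:00Z", "spindle_temp_c": 41.7}],
}])
```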

Read more at Litmus Blog

Accelerating industrialization of Machine Learning at BMW Group using the Machine Learning Operations (MLOps) solution

📅 Date:

✍️ Authors: Marc Neumann, Aubrey Oosthuizen

🔖 Topics: MLOps, Data Architecture

🏢 Organizations: BMW, AWS


The BMW Group’s Cloud Data Hub (CDH) manages company-wide data and data solutions on AWS. The CDH provides BMW Analysts and Data Scientists with access to data that helps drive business value through Data Analytics and Machine Learning (ML). The BMW Group’s MLOps solution includes (1) Reference architecture, (2) Reusable Infrastructure as Code (IaC) modules that use Amazon SageMaker and Analytics services, (3) ML workflows using AWS Step Functions, and (4) Deployable MLOps template that covers the ML lifecycle from data ingestion to inference.
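
A minimal sketch of kicking off one such Step Functions-based ML workflow with boto3; the state machine ARN and input payload are hypothetical.

```python
# Minimal sketch of triggering a Step Functions-based ML workflow like those
# described above. The state machine ARN and input payload are hypothetical.
import json
import boto3

sfn = boto3.client("stepfunctions", region_name="eu-central-1")

execution = sfn.start_execution(
    stateMachineArn=(
        "arn:aws:states:eu-central-1:123456789012:"
        "stateMachine:cdh-ml-training-pipeline"
    ),
    name="nightly-retrain-2024-05-01",
    input=json.dumps({
        "dataset_s3_uri": "s3://cdh-curated/telemetry/2024-05-01/",
        "model_package_group": "vehicle-quality-classifier",
    }),
)
print("started:", execution["executionArn"])

# Poll the workflow status (training -> evaluation -> model registration).
status = sfn.describe_execution(executionArn=execution["executionArn"])["status"]
print("current status:", status)
```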

Read more at AWS Blog

TsFile: A Standard Format for IoT Time Series Data

📅 Date:

✍️ Author: Susan Hall

🔖 Topics: Data Architecture, IIoT, Open Source

🏢 Organizations: Apache Software Foundation, Tsinghua University


TsFile is a columnar storage file format designed for time series data, featuring advanced compression to minimize storage, high read and write throughput, and deep integration with processing and analysis tools such as the Apache projects Spark and Flink. TsFile is designed to support a “high ingestion rate up to tens of million data points per second and rare updates only for the correction of low-quality data; compact data packaging and deep compression for long-live historical data; traditional sequential and conditional query, complex exploratory query, signal processing, data mining and machine learning.”

TsFile is the underlying storage file format for the Apache IoTDB time-series database. IoTDB represents more than a decade of work at China’s Tsinghua University School of Software. It became a top-level project with the Apache Software Foundation in 2020.

Read more at The New Stack

Data-driven Maintenance Work Order Management with Crosser and AVEVA

📅 Date:

🔖 Topics: Data Architecture, OPC-UA

🏢 Organizations: Crosser, AVEVA, SAP


The customer faced a significant challenge with its existing automated work order management system. This system relied on monitoring maintenance metrics using PLCs alongside predefined trigger points. AVEVA System Platform was responsible for initiating SAP to trigger specific work orders aligned with predefined work plans. However, this approach demanded manual adjustments to PLCs and the AVEVA System Platform each time a new device was introduced or new parameters were required.

Moreover, when an SAP work order was completed and counters needed to be reset to zero, manual connections to the PLCs were necessary. This approach not only introduced operational risk with each change but also imposed significant manual effort for every new device and reset process.
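
For contrast, here is a minimal sketch of the automated pattern such a flow implements, assuming a hypothetical OPC UA counter node and a hypothetical REST endpoint that creates the SAP PM order; it is not Crosser's actual flow logic.

```python
# Minimal sketch: read a maintenance counter over OPC UA, trigger a work order
# when a threshold is crossed, and reset the counter once the order is created.
# Node IDs, threshold, and the SAP integration endpoint are hypothetical.
import asyncio
import requests
from asyncua import Client

OPC_URL = "opc.tcp://plc-line4.example.com:4840"
COUNTER_NODE = "ns=2;s=Press01.CycleCounter"
THRESHOLD = 50_000

async def main() -> None:
    async with Client(OPC_URL) as client:
        node = client.get_node(COUNTER_NODE)
        cycles = await node.read_value()

        if cycles >= THRESHOLD:
            # Hypothetical integration endpoint that creates the SAP PM order.
            resp = requests.post(
                "https://integration.example.com/sap/pm/work-orders",
                json={"equipment": "PRESS-01", "plan": "LUBRICATION",
                      "cycles": cycles},
                timeout=10,
            )
            resp.raise_for_status()
            # Reset the counter once the order is confirmed (value type must
            # match the node's data type on a real server).
            await node.write_value(0)

asyncio.run(main())
```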

Read more at Crosser Blog

How to Implement a Unified Namespace with Losant

📅 Date:

🔖 Topics: IIoT, Data Architecture, Unified Namespace

🏢 Organizations: Losant


Losant accelerates the deployment of a UNS by providing critical components like an MQTT broker and edge data processing capabilities all in one platform. Beyond these essentials, it enriches the UNS experience with additional tools that add value, enhancing functionality and operational insight.

Losant simplifies the implementation of a UNS and amplifies its potential through a scalable MQTT broker, advanced edge computing capabilities, and comprehensive data management tools. Whether it’s through creating digital twins, leveraging Jupyter Notebooks for insightful analytics, or building dynamic dashboards for real-time data visualization, Losant stands out as an excellent platform for organizations aiming to harness the full power of their data.

Read more at Losant Blog

Ingest and analyze equipment data in the cloud

📅 Date:

✍️ Authors: Suresh Kanniappan, Gurumoorthy Krishnasamy

🔖 Topics: Data Architecture, IIoT

🏢 Organizations: AWS


A sugar manufacturer with multiple plants across India uses molasses as the key raw material to produce Extra Neutral Alcohol (ENA) through a 4-step process: 1/ Fermentation, 2/ Distillation, 3/ Evaporation, and 4/ Purification. This company needed better visibility into its production data to make better decisions, and ultimately improve overall equipment effectiveness (OEE).

AWS worked closely with the customer to build a solution that supported their Smart Manufacturing vision by providing: 1/ a mechanism to ingest data from PLC and DCS systems, 2/ support to securely ingest the data into the AWS Cloud, 3/ ability to analyze the OT data, and 4/ a dashboard for centralized real-time visibility into their production operations to aid in decision making.
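
For reference, a minimal sketch of the standard OEE calculation (availability × performance × quality) that such a dashboard reports; the shift figures are illustrative.

```python
# Minimal sketch of the standard OEE calculation behind the dashboard.
# The shift numbers below are illustrative, not the customer's data.
def oee(planned_time_min, downtime_min, ideal_cycle_min, total_count, good_count):
    run_time = planned_time_min - downtime_min
    availability = run_time / planned_time_min
    performance = (ideal_cycle_min * total_count) / run_time
    quality = good_count / total_count
    return availability * performance * quality, availability, performance, quality

score, a, p, q = oee(
    planned_time_min=480,   # one 8-hour shift
    downtime_min=45,
    ideal_cycle_min=0.9,    # ideal minutes per batch unit
    total_count=430,
    good_count=412,
)
print(f"OEE={score:.1%} (A={a:.1%}, P={p:.1%}, Q={q:.1%})")
```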

Read more at AWS Blog

Improve tire manufacturing effectiveness with a process digital twin

📅 Date:

✍️ Authors: Sundar Ram, Anindya Bhattacharya

🔖 Topics: Data Architecture, Digital Twin

🏢 Organizations: AWS


In the rubber-mixing stage, the recipe of various raw material constituents like rubber, chemicals, carbon, oil, and other additives plays a vital role in the control of process standards and final product quality. In the current scheme of things, parameters like Mooney viscosity, specific gravity, and Rheo (the level of curing that can be achieved over the compound) are measured in a fairly manual, offline way. In addition, the correlation of these parameters is conducted either in a standard spreadsheet solver or a statistical package. Because of the delay in such correlation and interdependency, the extent of control a process engineer has over deviations (such as drop temperature, mixing time, ram pressure, injection time, and so on) is limited.

There are four steps to operationalize, the first being data acquisition and noise removal—a process of 3–6 weeks with the built-in and external connectors. Next is model tuning and ascertaining what is fit for our purpose. Since we are considering a list of defect types, we are talking about another four weeks for training, validating, creating test sets, and delivering a simulation environment with minimum error. The third step is delivering the set points and boundary conditions for each grade of compound.

For example, the process digital twin cockpit has three desirable sub-environments:

  • Carcass level—machine ID, drum width, drum diameter, module number, average weight, actual weight, and deviation results
  • Tread roll level—machine number, average weight, actual weight, deviation, and SKU number
  • Curing level—curing ID, handling time, estimated curing time, curing schedule, and associated deviations in curing time

The final step is ascertaining the model outcome and computing the simulation result (bias, Sum of Squares Error (SSE), deviation, and so on) with respect to the business outcome like defect percentage, speed of work, overall accuracy, and so on.
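
A minimal sketch of that final comparison step, computing bias, SSE, and deviation between digital-twin predictions and measured values; the Mooney viscosity numbers are illustrative.

```python
# Minimal sketch of the final step: comparing simulated set points against
# measured outcomes using bias and Sum of Squares Error (SSE). The Mooney
# viscosity values below are illustrative.
import numpy as np

measured = np.array([52.1, 53.4, 51.8, 54.0, 52.6])   # measured Mooney viscosity
predicted = np.array([51.7, 53.9, 52.0, 53.2, 52.9])  # digital-twin prediction

residuals = predicted - measured
bias = residuals.mean()                 # systematic over/under prediction
sse = float((residuals ** 2).sum())     # total squared deviation
deviation = residuals.std(ddof=1)       # spread of the errors

print(f"bias={bias:+.3f}, SSE={sse:.3f}, deviation={deviation:.3f}")
```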

Read more at AWS for Industries

Industrial stream data management in Databricks

Neo4j Keeps the Army Running by Tracking Equipment Maintenance

📅 Date:

🔖 Topics: Data Architecture

🏢 Organizations: Neo4j, US Army


The Army recognized the need to modernize its core tracking system. The scale of the information Neo4j handles is vast, including a 3TB database with over 5.2 billion nodes and 14.1 billion relationships. Working with CALIBRE, an employee-owned management consulting and IT solutions company that delivers enduring solutions to defense, federal, and commercial clients, the U.S. Army is now employing Neo4j as a major part of its solution for providing greater visibility into the total costs of owning a system.
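
A minimal sketch of the kind of cost roll-up query such a graph supports, using the official Neo4j Python driver; the connection details and graph schema are hypothetical, not the Army's actual model.

```python
# Minimal sketch of querying a maintenance graph with the Neo4j Python driver.
# Connection details and schema (Equipment, WorkOrder, INCURRED) are
# hypothetical illustrations only.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("neo4j://graph.example.mil:7687",
                              auth=("reader", "********"))

CYPHER = """
MATCH (e:Equipment {fleet: $fleet})-[:INCURRED]->(w:WorkOrder)
WHERE w.completed_on >= date($since)
RETURN e.serial_number AS serial, sum(w.total_cost) AS cost
ORDER BY cost DESC LIMIT 10
"""

with driver.session() as session:
    for record in session.run(CYPHER, fleet="M1A2", since="2023-10-01"):
        print(record["serial"], record["cost"])

driver.close()
```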

Read more at Neo4j Cases

Italgas: from gas pipelines to data pipelines — Fueling our reporting with the latest innovations.

📅 Date:

✍️ Author: Serena Delli

🔖 Topics: Data Architecture

🏢 Organizations: Italgas, Databricks


Utilities is a data-intensive sector, chiefly because its core process consists of metering and billing the usage of gas (or water or electricity) for millions of customers, an intrinsically data-rich activity even when it was done entirely manually. Today the Italgas Nimbus smart meter remotely manages the metering of multiple types of gas for the same customer.

Exploring the latest advancements in data technologies, we experimented with two simplifications. For the second simplification, the new Databricks native connectors to Power BI allowed seamless integration of the reports directly with the Lakehouse. Databricks SQL Warehouses were about 50% faster thanks to optimizations like the Photon engine, optimized caching, and query routing.

Read more at Medium

Snowflake technology solution from industrial edge to the cloud

📅 Date:

🔖 Topics: Data Architecture, Manufacturing Analytics

🏢 Organizations: Opto 22, Snowflake


This is where the combined power of Snowflake’s data warehousing and Opto 22’s automation solutions comes into play. Data travels securely from groov products (edge hardware on the plant floor) up to Snowflake (data storage in the cloud). This combination gives you the tools needed to both collect and harness the power of big data, leveraging advanced analytics and machine learning to optimize plant floor operations and drive innovation.

With AI, ML, and anomaly detection (AD)–plus the integration of large language models (LLMs)–Snowflake helps you unearth patterns and insights from your data. Given the scale of data storage available in the cloud, a single human would be hard pressed to make sense of it all. But think of using simple language prompts like, “When was my peak energy consumption last quarter?” or “How many widgets did I produce between 11AM and 3PM on November 8, 2023?” This. Is. Powerful.
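
To show what sits behind such a prompt, here is a minimal sketch of the peak-energy question expressed as SQL through the Snowflake Python connector; account, warehouse, and table names are hypothetical.

```python
# Minimal sketch: "When was my peak energy consumption last quarter?" expressed
# as SQL against plant-floor data landed in Snowflake. Account, warehouse, and
# table names are hypothetical.
import snowflake.connector

conn = snowflake.connector.connect(
    account="myorg-myaccount",
    user="PLANT_ANALYST",
    password="********",
    warehouse="ANALYTICS_WH",
    database="PLANT_FLOOR",
    schema="TELEMETRY",
)

sql = """
SELECT DATE_TRUNC('hour', reading_ts) AS hour, SUM(kwh) AS total_kwh
FROM energy_meter_readings
WHERE reading_ts >= DATE_TRUNC('quarter', DATEADD('quarter', -1, CURRENT_DATE()))
  AND reading_ts <  DATE_TRUNC('quarter', CURRENT_DATE())
GROUP BY 1
ORDER BY total_kwh DESC
LIMIT 1
"""
cur = conn.cursor()
try:
    hour, kwh = cur.execute(sql).fetchone()
    print(f"Peak hour: {hour} ({kwh:.1f} kWh)")
finally:
    cur.close()
    conn.close()
```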

Read more at Industrial Ethernet Media

Unlocking the Full Potential of Manufacturing Capabilities Through Digital Twins on AWS

📅 Date:

✍️ Authors: Harjot Kalra, Ravi Avula, Sylvia Feng, Paul Park

🔖 Topics: Digital Twin, Metaverse, IT OT Convergence, Data Architecture, MQTT

🏢 Organizations: AWS, Matterport, Belden


In this post, we will explore the collaboration between Amazon Web Services (AWS) and Matterport to create a digital twin proof of concept (POC) for Belden Inc. at one of its major manufacturing facilities in Richmond, Indiana. The purpose of this digital twin POC was to gain insights and optimize operations in employee training, asset performance monitoring, and remote asset inspection at one of its assembly lines.

The onsite capture process required no more than an hour to capture a significant portion of the plant operation. Using the industry-leading Matterport Pro3 3D capture camera system, we captured high-resolution imagery with high-fidelity measurement information to digitally recreate the entire plant environment.

The use of MQTT protocol to natively connect and send equipment data to AWS IoT Core further streamlined the process. MQTT, an efficient and lightweight messaging protocol designed for Internet of Things (IoT) applications, ensured seamless communication with minimal latency. This integration allowed for quick access to critical equipment data, facilitating informed decision making and enabling proactive maintenance measures.
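
A minimal sketch of publishing equipment telemetry to AWS IoT Core over MQTT with mutual TLS, as described above; the endpoint, certificates, topic, and payload fields are placeholders for what a real device would be provisioned with.

```python
# Minimal sketch of publishing equipment data to AWS IoT Core over MQTT with
# mutual TLS. Endpoint, certificate paths, topic, and payload are placeholders.
import json
import ssl
import time
import paho.mqtt.client as mqtt

ENDPOINT = "xxxxxxxxxxxx-ats.iot.us-east-1.amazonaws.com"
TOPIC = "belden/richmond/line1/extruder01/telemetry"

# paho-mqtt 1.x constructor; 2.x additionally takes a CallbackAPIVersion.
client = mqtt.Client(client_id="extruder01-gateway")
client.tls_set(
    ca_certs="AmazonRootCA1.pem",
    certfile="device.pem.crt",
    keyfile="private.pem.key",
    tls_version=ssl.PROTOCOL_TLSv1_2,
)
client.connect(ENDPOINT, port=8883)
client.loop_start()

payload = {"ts": int(time.time()), "vibration_mm_s": 3.1, "temp_c": 68.4,
           "current_a": 12.7}
info = client.publish(TOPIC, json.dumps(payload), qos=1)
info.wait_for_publish()  # block until the broker acknowledges the QoS 1 message

client.loop_stop()
client.disconnect()
```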

Throughout the plant, sensors were strategically deployed to collect essential operational data that was previously missing. These sensors were responsible for monitoring various aspects of machine performance, availability, and health status, including indicators such as vibration, temperature, current, and power. Subsequently, the gathered operational data was transmitted through Belden’s zero-trust operational technology network to Belden Horizon Data Operations (BHDO).

Read more at AWS Partner Network (APN) Blog

IDEMIA: How a global leader in identity leverages AWS to improve productivity in Manufacturing

📅 Date:

✍️ Authors: Anthony Barré, Christophe Didier, Weibo Gu

🔖 Topics: IIoT, Cobot, Data Architecture

🏢 Organizations: AWS, IDEMIA


At IDEMIA, the flywheel started by prioritizing and grouping high-value and low-hanging use cases that could be implemented quickly and easily. The Cobot use cases were selected because they provided a clear business impact and had low technical complexity. Deploying these use cases in production generated a positive ROI in a short period of time for IDEMIA. It not only increased the profitability and efficiency of the industrial sites but also created a positive feedback loop that fostered further adoptions and investments. With the benefits generated from this initial use case, IDEMIA had the opportunity to reinvest in the IoT platform, making it more robust and scalable. This mitigated risks, lowered costs for the next use cases, and improved the performance and reliability of the existing ones. Demonstrating tangible benefits of Industrial Internet of Things (IIoT) solutions expanded adoption and engagement across IDEMIA’s organization, fostering a culture of continuous improvement and learning.

Read more at AWS for Industries

Accelerating Industrial Transformation with Azure IoT Operations

📅 Date:

✍️ Author: Kam VedBrat

🔖 Topics: Data Architecture, IIoT

🏢 Organizations: Microsoft


Announcing the public preview of Azure IoT Operations, enabled by Azure Arc. Azure IoT Operations expands on our Azure IoT portfolio with a composable set of Arc-enabled services that help organizations onboard assets, capture insights, and take actions to scale the digital transformation of their physical operations.

Azure IoT Operations empowers our customers with a unified, enterprise-wide technology architecture and data plane that supports repeatable solution deployment and comprehensive AI-enhanced decision making. It enables a cloud to edge data plane with local data processing and analytics to transfer clean, useful data to hyperscale cloud services such as Microsoft Fabric for unified data governance and analytics, Azure Event Grid for bi-directional messaging, and Azure Digital Twins for live data contextualization. This common data foundation is essential to democratize data, enable cross-team collaboration and accelerate decision-making.

Read more at Microsoft Tech Community

IFS Cloud for Manufacturing: Unlocking the Power of AI for Intelligent Automation

📅 Date:

✍️ Author: Moritz Roedel

🔖 Topics: Data Architecture, Manufacturing Execution System

🏢 Organizations: IFS, Crosser


The IFS Cloud for Manufacturing uses AI technologies to drive Manufacturing Execution Systems (MES) and Manufacturing Scheduling & Optimization, ultimately enhancing the efficiency and agility of manufacturing operations.

Read more at IFS Blog

Embracing the Unified Namespace Architecture with Litmus Edge

📅 Date:

✍️ Author: Dave McMorran

🔖 Topics: Unified Namespace, Data Architecture, IT OT Convergence

🏢 Organizations: Litmus


What exactly is a Unified Namespace (UNS)? The UNS offers a structured approach to organizing and connecting data across all layers of a business. It is particularly noteworthy because of its values-driven nature, which is a powerful influence behind its growing popularity.

Several companies, including Starbucks (food and beverages), Richemont (luxury goods), and Stada (life sciences), are already using the UNS architecture to improve their operations. So, if you’re here because you’re considering the UNS for your business too, you’re in good company. We wrote this article to help on your path.

Often mistaken for a technology, the UNS in fact embodies the principles of an Event-driven Architecture (EDA). In EDA, applications interact by exchanging events without being directly connected to each other. They rely on an intermediary called an event broker, which acts like a modern-day messenger.
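
To make the idea concrete, here is a minimal sketch of the ISA-95-style naming a UNS typically standardizes on; the hierarchy levels and payload fields are illustrative assumptions.

```python
# Minimal sketch of the ISA-95-style naming a UNS standardizes on: every
# producer publishes its state to a predictable place in the hierarchy, and
# consumers subscribe by pattern instead of point-to-point links.
from datetime import datetime, timezone
import json

def uns_topic(enterprise, site, area, line, cell, channel):
    """Build a namespace path: enterprise/site/area/line/cell/channel."""
    return "/".join([enterprise, site, area, line, cell, channel])

topic = uns_topic("acme", "dallas", "packaging", "line-3", "capper-1", "state")
event = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "status": "RUNNING",
    "speed_bpm": 412,
    "reject_count": 3,
}

# Any event broker (MQTT, Kafka, etc.) carries this as topic + JSON payload;
# a SCADA client, a historian, and an analytics job can all subscribe to
# "acme/dallas/packaging/#" without knowing about each other.
print(topic)
print(json.dumps(event, indent=2))
```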

The UNS stands out for four key reasons:

  1. It serves as the single source of truth (SST) for all data and information in your business.
  2. It structures and continually updates data across the entire business.
  3. It acts as the central hub where all data-connected smart components communicate.
  4. It lays the foundation for a digital future.

Read more at Litmus Blog

Exploring Manufacturing Databases with James Sewell

Automate plant maintenance using MDE with ABAP SDK for Google Cloud

📅 Date:

✍️ Authors: Manas Srivastava, Devesh Singh

🔖 Topics: Manufacturing Analytics, Cloud Computing, Data Architecture

🏢 Organizations: Google, SAP, Litmus


Analyzing production data at scale for huge datasets is always a challenge, especially when there’s data from multiple production facilities involved with thousands of assets in production pipelines. To help solve this challenge, our Manufacturing Data Engine is designed to help manufacturers manage end-to-end shop floor business processes.

Manufacturing Data Engine (MDE) is a scalable solution that accelerates, simplifies, and enhances the ingestion, processing, contextualization, storage, and usage of manufacturing data for monitoring, analytical, and machine learning use cases. This suite of components can help manufacturers accelerate their transformation with Google Cloud’s analytics and AI capabilities.

Read more at Google Cloud Blog

☁️🧠 Automated Cloud-to-Edge Deployment of Industrial AI Models with Siemens Industrial Edge

📅 Date:

✍️ Authors: Johann Bruckner, Johannes Kupser, Yvonne Quacken, Bruno Quintas, Helge Aufderheide

🔖 Topics: Cloud-to-Edge Deployment, Data Architecture, Edge Computing, Machine Learning, MQTT

🏢 Organizations: Siemens, AWS


Due to the sensitive nature of OT systems, a cloud-to-edge deployment can become a challenge. Specialized hardware devices are required, strict network protection is applied, and security policies are in place. Data can only be pulled by an intermediate factory IT system from where it can be deployed to the OT systems through highly controlled processes.

The following solution describes the “pull” deployment mechanism by using AWS services and the Siemens Industrial AI software portfolio. The deployment process is enabled by three main components, the first of which is the Siemens AI Software Development Kit (AI SDK). After a model is created by a data scientist on Amazon SageMaker and stored in the SageMaker model registry, this SDK allows users to package a model in a format suitable for edge deployment using Siemens Industrial Edge. The second component, and the central connection between cloud and edge, is the Siemens AI Model Manager (AI MM). The third component is the Siemens AI Inference Server (AIIS), a specialized and hardened AI runtime environment running as a container on Siemens IEDs deployed on the shopfloor. The AIIS receives the packaged model from AI MM and is responsible for loading, executing, and monitoring ML models close to the production lines.
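
A minimal sketch of the pull-oriented hand-off on the cloud side: looking up the latest approved model package in the SageMaker model registry with boto3. The model package group name is hypothetical, and the Siemens AI SDK / AI Model Manager steps are not shown.

```python
# Minimal sketch of the "pull" side described above: an intermediate system
# looks up the latest approved model package in the SageMaker model registry
# before handing it to the edge deployment tooling. The model package group
# name is hypothetical.
import boto3

sm = boto3.client("sagemaker", region_name="eu-central-1")

packages = sm.list_model_packages(
    ModelPackageGroupName="surface-defect-detector",
    ModelApprovalStatus="Approved",
    SortBy="CreationTime",
    SortOrder="Descending",
    MaxResults=1,
)["ModelPackageSummaryList"]

latest = sm.describe_model_package(
    ModelPackageName=packages[0]["ModelPackageArn"]
)

# Location of the trained artifact that gets packaged for the edge runtime.
artifact = latest["InferenceSpecification"]["Containers"][0]["ModelDataUrl"]
print("latest approved model artifact:", artifact)
```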

Read more at AWS Blogs

Transforming Semiconductor Yield Management with AWS and Deloitte

📅 Date:

🔖 Topics: Cloud Computing, Manufacturing Analytics, Data Architecture

🏭 Vertical: Semiconductor

🏢 Organizations: AWS, Deloitte


Together, AWS and Deloitte have developed a reference architecture to enable the aforementioned yield management capabilities. The architecture, shown in Figure 1, depicts how to collect, store, analyze and act on the yield related data throughout the supply chain. The following describes how the modernized yield management architecture enables the six capabilities discussed earlier.

Read more at AWS Blogs

📊 Accelerating Innovation at JetBlue Using Databricks

📅 Date:

✍️ Authors: Sai Ravuru, Yared Gudeta

🔖 Topics: Data Architecture

🏭 Vertical: Aerospace

🏢 Organizations: JetBlue, Databricks, Microsoft


The role of data, and in particular analytics, AI, and ML, is key for airlines to provide a seamless experience for customers while maintaining efficient operations for optimum business goals. For a single flight, for example, from New York to London, hundreds of decisions have to be made based on factors encompassing customers, flight crews, aircraft sensors, live weather, and live air traffic control (ATC) data. A large disruption such as a brutal winter storm can impact thousands of flights across the U.S. It is therefore vital for airlines to depend on real-time data and AI and ML to make proactive, real-time decisions.

JetBlue has accelerated AI and ML deployments across a wide range of use cases spanning four lines of business, each with its own AI and ML team. The following are the fundamental functions of the business lines:

  • Commercial Data Science (CDS) - Revenue growth
  • Operations Data Science (ODS) - Cost reduction
  • AI & ML engineering – Go-to-market product deployment optimization
  • Business Intelligence – Reporting enterprise scaling and support

Each business line supports multiple strategic products that are prioritized regularly by JetBlue leadership to establish KPIs that lead to effective strategic outcomes.

Read more at Databricks Blog

Why is machine data special and what can you do with it?

📅 Date:

🔖 Topics: Data Architecture

🏢 Organizations: Arch Systems


Production data can unlock opportunities for electronics manufacturing service (EMS) providers to improve operations. Evolving systems for collection and analysis of machine data is vital to those efforts. Though factories produce many different types of usable data, machine data is special because it can be collected without operational burden, creating actionable production insights in real time and automating responses to them.

As more manufacturers develop and deploy machine data collection systems, industry best practices are surfacing, and systems often adopt similar structures in response to common needs in the factory. Most architectures include these key features:

  • There is usually some type of streaming event broker (often called a pub/sub architecture) that receives complex files and reports from production equipment to enable advanced analytics, holistic dashboards and visualization, automated action management, and system monitoring (see the sketch after this list).
  • Systems should be able to integrate data from both advanced machines and legacy equipment, such as PLCs.
  • They use specialized databases and data lakes for storage.
  • Dedicated telemetry and monitoring are deployed to ensure data quality.
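
A minimal sketch of the first feature in the list above: machine events flowing through a streaming pub/sub broker (Kafka in this example); the broker address, topic, and event fields are illustrative.

```python
# Minimal sketch of machine events flowing through a streaming pub/sub broker
# so that dashboards, analytics, and automated actions can all consume the
# same stream. Broker address, topic, and event fields are illustrative.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["broker.factory.local:9092"],
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

event = {
    "machine_id": "SMT-PNP-04",
    "event_type": "component_placement",
    "feeder_slot": 17,
    "placements": 1200,
    "rejects": 2,
}
producer.send("factory.machine-events", value=event)
producer.flush()
```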

Read more at Arch Systems Blog

A Data Architecture to assist Geologists in Real-Time Operations

📅 Date:

✍️ Author: Nicola Lamonaca

🔖 Topics: Data Architecture

🏭 Vertical: Petroleum and Coal

🏢 Organizations: Eni, Databricks


Data plays a crucial role in making exploration and drilling operations a success for Eni all over the world. Our geologists use real-time well data collected by sensors installed on drilling pipes to track key properties and build predictive models of them during the drilling process.

Data is delivered by a custom dispatcher component designed to connect to a WITSML server on all oil rigs and send time-indexed and/or depth-indexed data to any supported application. In our case, data is delivered to Azure ADLS Gen2 in the format of WITSML files, each accompanied by a JSON file for additional custom metadata.

The visualizations generated from this data platform are used both on the oil rigs and in HQ, with operators exploring the curves enriched by the ML models as soon as they’re generated on a web application made in-house, which shows in real time how the drilling is progressing. Additionally, it is possible to explore historic data via the same application.

Read more at Medium

📊 Data pools as the foundation for the smart buildings of the future

📅 Date:

✍️ Authors: Frederik De Meyer, Christian Metzger

🔖 Topics: Building information modeling, Data Architecture

🏢 Organizations: Siemens


Today’s digital building technology generates a huge amount of data. So far, however, this data has only been used to a limited extent, primarily within hierarchical automation systems. Yet data is key to the new generation of modern buildings, making them climate-neutral, energy- and resource-efficient, and at some point autonomous and self-maintaining.

More straightforward is the use of digital solutions for building management by planners, developers, owners, and operators of new buildings. The creation of a building twin must be defined and implemented as a BIM goal. At the heart of it is a Common Data Environment (CDE), a central digital repository where all relevant information about a building can be stored and shared already in the project phase. CDE is a part of the BIM process and enables collaboration and information exchange between the different stakeholders of the construction project.

Beyond the design and construction phases, a CDE can also help make building maintenance more effective in the operation phase by providing easy access to essential information about the building and its technical systems. If information about equipment, sensors, their location in the building, and all other relevant components is collected in a machine-readable form from the beginning of the lifecycle and updated continuously, building management tools can access this data directly during the operations phase, thus avoiding additional effort. The goal is precisely to collect data without additional effort. To achieve this, engineering and commissioning tools must in the future automatically store their results in the common twin, making re-engineering obsolete.

Read more at Siemens Blog

🧠 How a Data Fabric Gets Snow Tires to a Store When You Need Them

📅 Date:

✍️ Author: Susan Hall

🔖 Topics: Supply Chain Control Tower, Data Architecture

🏢 Organizations: American Tire Distributors, Promethium


“We were losing sales because the store owners were unable to answer the customers’ questions as to when exactly they would have the product in stock,” said Ehrar Jameel, director of data and analytics at ATD. The company didn’t want frustrated customers looking elsewhere. So he wanted to create what he called a “supply chain control tower” for data just like the ones at the airport.

“I wanted to give a single vision, a single pane of glass for the business, to just put in a SKU number and be able to see where that product is in the whole supply chain — not just the supply chain, but in the whole value chain of the company.” ATD turned to Promethium, which provides a virtual data platform automating data management and governance across a distributed architecture with a combination of data fabric and self-service analytics capabilities.

It’s built on top of the open source SQL query engine Presto, which allows users to query data wherever it resides. It normalizes the data for query into an ANSI-compliant standard syntax, whether it comes from Oracle, Google BigQuery, Snowflake or wherever. It integrates with other business intelligence tools such as Tableau and can be used to create data pipelines. It uses natural language processing and artificial intelligence plus something it calls a “reasoner” to figure out, based on what you asked, what you’re really trying to do and the best data to answer that question.
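
As a rough illustration of that federated-query idea, here is a minimal sketch using the Presto Python client to run one ANSI SQL statement across two catalogs; host, catalog, schema, and table names are hypothetical and this is not Promethium's API.

```python
# Minimal sketch of the federated-query idea: one ANSI SQL statement issued
# through Presto, regardless of where the inventory data physically lives.
# Host, catalogs, schemas, and tables are hypothetical.
import prestodb

conn = prestodb.dbapi.connect(
    host="presto.example.com",
    port=8080,
    user="analyst",
    catalog="snowflake",
    schema="supply_chain",
)

cur = conn.cursor()
cur.execute("""
    SELECT dc.region, SUM(i.on_hand) AS units
    FROM inventory i
    JOIN postgresql.logistics.distribution_centers dc
      ON i.dc_id = dc.dc_id
    WHERE i.sku = '225-65R17-WINTER'
    GROUP BY dc.region
""")
for region, units in cur.fetchall():
    print(region, units)
```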

Read more at The New Stack

A Deeper Look Into How SAP Datasphere Enables a Business Data Fabric

📅 Date:

✍️ Author: Juergen Mueller

🔖 Topics: Partnership, Data Architecture

🏢 Organizations: SAP, Databricks, Collibra, Confluent, DataRobot


SAP announced the SAP Datasphere solution, the next generation of its data management portfolio, which gives customers easy access to business-ready data across the data landscape. SAP also introduced strategic partnerships with industry-leading data and AI companies – Collibra NV, Confluent Inc., Databricks Inc. and DataRobot Inc. – to enrich SAP Datasphere and allow organizations to create a unified data architecture that securely combines SAP software data and non-SAP data.

SAP Datasphere, and its open data ecosystem, is the technology foundation that enables a business data fabric. This is a data management architecture that simplifies the delivery of an integrated, semantically rich data layer over underlying data landscapes to provide seamless and scalable access to data without duplication. It’s not a rip-and-replace model, but is intended to connect, rather than solely move, data using data and metadata. A business data fabric equips any organization to deliver meaningful data to every data consumer — with business context and logic intact. As organizations require accurate data that is quickly available and described with business-friendly terms, this approach enables data professionals to permeate the clarity that business semantics provide throughout every use case.

Read more at SAP News

Rolls-Royce Civil Aerospace keeps its Engines Running on Databricks Lakehouse

Our connected future: How industrial data sharing can unite a fragmented world

📅 Date:

✍️ Author: Peter Herweck

🔖 Topics: Manufacturing Analytics, Data Architecture

🏢 Organizations: AVEVA


The rapid and effective development of the coronavirus vaccines has set a new benchmark for today’s industries–but it is not the only one. Increasingly, savvy enterprises are starting to share industrial data strategically and securely beyond their own four walls, to collaborate with partners, suppliers and even customers.

Worldwide, almost nine out of 10 (87%) business executives at larger industrial companies cite a need for the type of connected data that delivers unique insights to address challenges such as economic uncertainty, unstable geopolitical environments, historic labor shortages, and disrupted supply chains. In fact, executives report in a global study that the most common benefits of having an open and agnostic information-sharing ecosystem are greater efficiency and innovation (48%), increasing employee satisfaction (45%), and staying competitive with other companies (44%).

Read more at AVEVA Perspectives

How Corning Built End-to-end ML on Databricks Lakehouse Platform

📅 Date:

✍️ Author: Denis Kamotsky

🔖 Topics: MLOps, Quality Assurance, Data Architecture, Cloud-to-Edge Deployment

🏢 Organizations: Corning, Databricks, AWS


Specifically for quality inspection, we take high-resolution images to look for irregularities in the cells, which can be predictive of leaks and defective parts. The challenge, however, is the prevalence of false positives due to the debris in the manufacturing environment showing up in pictures.

To address this, we manually brush and blow the filters before imaging. We discovered that by notifying operators of which specific parts to clean, we could significantly reduce the total time required for the process, and machine learning came in handy. We used ML to predict whether a filter is clean or dirty based on low-resolution images taken while the operator is setting up the filter inside the imaging device. Based on the prediction, the operator would get the signal to clean the part or not, thus reducing false positives on the final high-res images, helping us move faster through the production process and providing high-quality filters.

Read more at Databricks Blog

How to pull data into Databricks from AVEVA Data Hub

Common Data Models for Manufacturing

📅 Date:

🔖 Topics: Data Architecture

🏢 Organizations: Sight Machine


  • Sight Machine analyzes production by streaming plant floor data from all systems and sources in real time
  • Data is streamed into 4 Common Data Models. These models represent automated production in all industries
  • Streams continuously generate Data Foundation. Data Foundation is then graphed into representations of machines, lines and plants, which are further visualized and analyzed in KPIs, analytics and AI/ML
  • Sight Machine’s architecture is modular, transparent, and configurable at each level. Clients and partners can access and modify: (a) raw data, (b) configuration, and (c) Transformed Data via API and SDK layers
  • This presentation reviews Common Data Models and graphing methods, and highlights a few out of hundreds of analytics currently generated through web services. After several introductory slides, it is mostly pictures
  • Connectivity and pre-processing, and streaming and transformation are addressed elsewhere

Read more at Sight Machine Resources

Boeing transforms into data-driven business; data powers innovation and integration

📅 Date:

✍️ Author: Courtney Howard

🔖 Topics: Data Architecture

🏢 Organizations: Boeing, Teradata


The Boeing Company’s executives and engineers are tapping into the value of data, using data-management and -analytics hardware and software, to drive product development and integration, as well as strengthen their competitive edge with enhanced, fact-based decision-making.

Boeing officials and engineers opted to organize and analyze the company’s data, applying data management and analytics tools from Teradata, and to take actions based on insights gleaned from that data that would help them achieve their strategic vision. Boeing transformed into, and now operates as, a fact-based, data-driven culture.

One goal was to provide self-service business intelligence (BI) to 20,000 internal and external users in company-wide divisions (human resources, finance, etc.) and the global supply chain through an integrated data warehouse. In doing so, Boeing information technology (IT) and business specialists had to find common definitions across business units and transform the systems infrastructure, which included consolidating hundreds of data marts. (A data mart is the access layer of the data warehouse environment used to provide data to the user; data marts are connected to and subsets of the data warehouse, a central repository.)

Using data from sources as diverse as radio-frequency identification (RFID) tags and airplane “black boxes” to drive timely decisions at a massive scale demands new approaches. Boeing officials, including subject-matter experts, business personnel, and data scientists/specialists, partnered with Teradata to devise and institute that innovative approach.

Boeing’s sensor data pipeline supports high-value analytics with the use of parallel databases, Hadoop, and Teradata QueryGrid, which connects Teradata and Hadoop systems, enabling seamless multi-system analytics at massive scale. Temporal SQL solves time-alignment, latency, and scale challenges, enabling interactive analytics that were previously impossible, officials say.

Read more at Military Aerospace