
Data orchestration, transformation and processing


Not only can you stream data from one place to another using Ylem, but you can also fully orchestrate the process and handle the entire data transformation and processing.

Everything that happens in a pipeline between retrieving data and sending it outside of Ylem can be orchestrated with the following set of pipeline tasks (a short code sketch follows the list):

  • "" is designed to transform an input dataset into one single variable. To do that it uses like SUM(), AVG() and other aggregating functions.

  • "" is a task that allows you to use Python and Pandas framework for complex logic.

  • "" allows you to branch your pipelines and dynamically execute one part of it if the condition result is true and another one if it is false.

  • "" is responsible for the basic filtering of datasets by a certain value of a certain key.

  • "" allows running the next task for each of input items separately.

  • "" merges data coming to the pipeline from two or more data sources.

  • "" is a task that allows you to transform, filter, and map data using the functionality of the JQ library.

  • "" converts data in different formats or extracts values.

These tasks make pipelines highly flexible and give you many ways to orchestrate your data. How you combine them is limited only by your creativity.
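
As a rough illustration, here is a minimal sketch of what the body of a "Code" task could look like, assuming the input dataset arrives as a JSON array of records and the task returns the transformed dataset in the same form (the column names are purely hypothetical):

    # A hypothetical "Code" task body: the input is assumed to arrive as a
    # JSON array of records, and the task returns the transformed dataset
    # in the same shape. Column names ("status", "price", "quantity") are
    # illustrative only.
    import json

    import pandas as pd

    def transform(input_json: str) -> str:
        df = pd.DataFrame(json.loads(input_json))

        # Keep only paid orders and add a per-row total.
        df = df[df["status"] == "paid"]
        df["total"] = df["price"] * df["quantity"]

        return df.to_json(orient="records")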

Tasks and human language

Even more interestingly, our pipeline tasks are designed so that you can easily describe the entire functionality of a pipeline in a single sentence.

For example,

Task

Merge data from two databases, check whether the resulting dataset meets the conditions, filter it if it does, and process each of the items in the resulting dataset.

Implementation
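
One possible mapping of that sentence onto the tasks described above (a conceptual sketch rather than a prescribed layout): two "Query" tasks, one per database, feed a "Merge" task; its output goes into a "Condition"; when the condition is met, a "Filter" narrows the dataset down, and a "For Each" task processes every remaining item separately. In short: Query + Query → Merge → Condition → Filter → For Each.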

Such an intuitive conversion from human language to pipelines significantly simplifies communication between stakeholders and data engineers, and also:

  • Establishes a common vocabulary

  • Reduces time on explanations

  • Accelerates implementation

  • Minimizes errors
