Ylem documentation
  • 🗒️General information
    • Introduction to Ylem
    • Quick start guide
    • Release notes
  • 🔬Open-source edition
    • Installation
    • Usage of Apache Kafka
    • Task processing architecture
    • Configuring integrations with .env variables
  • 💡Integrations
    • Connecting an integration
    • Library of integrations
      • Amazon Redshift
      • Apache Kafka
      • APIs
      • Atlassian Jira
      • AWS Lambda
      • AWS RDS
      • AWS S3
      • ClickHouse
      • ElasticSearch
      • E-mail
      • Google Big Query
      • Google Cloud SQL
      • Google Pub/Sub
      • Google Sheets
      • Immuta
      • Incident.io
      • Jenkins
      • Hubspot
      • Microsoft Azure SQL
      • MySQL
      • OpenAI ChatGPT
      • Opsgenie
      • PostgreSQL
      • PlanetScale
      • RabbitMQ
      • Salesforce
      • Slack
      • Snowflake
      • Tableau
      • Twilio. SMS
      • WhatsApp (through Twilio)
    • Initial demo data source
  • 🚡Pipelines
    • Pipeline management
    • Tasks
      • Aggregator
      • API Call
      • Code
      • Condition
      • External trigger
      • Filter
      • For each
      • GPT
      • Merge
      • Notification
      • Query
      • Pipeline runner
      • Processor
      • Transformer
    • Running and scheduling pipelines
    • Library of templates
    • Environment variables
    • Mathematical functions and operations
    • Formatting of messages
  • 📈Statistics and profiling
    • Statistics of runs
    • Slow tasks
  • 📊Metrics
    • Metric management
    • Using previous values of a metric
  • 💼Use cases, patterns, templates, examples
    • Use cases
    • Messaging patterns
      • Datatype Channel
      • Message Dispatcher
      • Messaging Bridge
      • Message Bus
      • Message Filter
      • Message Router
      • Point-to-Point Channel
      • Publish-Subscribe Channel
      • Pull-Push
    • Functional use cases
      • Streaming from Apache Kafka and messaging queues
      • Streaming from APIs
      • Streaming from databases
      • Data orchestration, transformation and processing
      • Usage of Python and Pandas
      • KPI Monitoring
      • OKRs and custom metrics
      • Data Issues & Incidents
      • Reporting
      • Other functional use cases
    • Industry-specific use cases
      • Finance and Payments
      • E-commerce & Logistics
      • Customer Success
      • Security, Risk, and Anti-Fraud
      • Anti-Money Laundering (AML)
  • 🔌API
    • OAuth clients
    • API Reference
  • 👁️‍🗨️Other resources
    • FAQ
    • Our blog on Medium
Powered by GitBook
On this page

Was this helpful?

Edit on GitHub
  1. Pipelines
  2. Tasks

Merge

PreviousGPTNextNotification

Last updated 1 year ago

Was this helpful?

"Merge" is a task that is used when you have two or more datasets coming from different sources and want to merge them into one dataset.

These are the possible resulting options for the merge:

  • If two rows have the same values in all fields enumerated in this parameter, they will be merged.

    • If you add a field "id" — the block will merge rows with the same IDs,

    • If you add fields "id" and "name" — rows with the same ID and name will be merged.

  • If you leave key field names empty, all rows from all datasets will be merged.

In the following examples, items with the same id and type will be merged into one:

🚡