- Working @ Ververica
- I'm focusing on Apache Flink and Apache Paimon.
- I write articles on Medium and, from time to time, @ Rock The JVM Blog
- Ask me about Apache Flink, Apache Paimon and Streaming Data Systems.
- How to reach me: https://www.linkedin.com/in/polyzos/
advent-of-code-flink-paimon's Issues
failed to create file:/opt/flink/temp/paimon/default.db/measurements/bucket-0
Hello,
I used your YAML from the Flink book, added the Paimon jar to its jars folder and to the Dockerfile, and created the YAML below (also attached). Then I followed this repository's Readme.md and guide.md: I created the "measurements" table and ran the other commands, up until I ran and created this job:
SET 'pipeline.name' = 'Sensor Info Ingestion';
INSERT INTO sensor_info
SELECT * FROM `default_catalog`.`default_database`.sensor_info;
In the Flink UI, under Running Jobs, I received this "Root Exception":
2024-04-23 18:14:17
java.io.IOException: java.io.UncheckedIOException: java.io.IOException: Mkdirs failed to create file:/opt/flink/temp/paimon/default.db/measurements/bucket-0
at org.apache.paimon.flink.sink.RowDataStoreWriteOperator.processElement(RowDataStoreWriteOperator.java:125)
at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:237)
at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.processElement(AbstractStreamTaskNetworkInput.java:146)
at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:110)
at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:550)
at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:231)
at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:839)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:788)
at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:952)
at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:931)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:745)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.io.UncheckedIOException: java.io.IOException: Mkdirs failed to create file:/opt/flink/temp/paimon/default.db/measurements/bucket-0
at org.apache.paimon.io.SingleFileWriter.<init>(SingleFileWriter.java:77)
at org.apache.paimon.io.StatsCollectingSingleFileWriter.<init>(StatsCollectingSingleFileWriter.java:58)
at org.apache.paimon.io.RowDataFileWriter.<init>(RowDataFileWriter.java:58)
at org.apache.paimon.io.RowDataRollingFileWriter.lambda$new$0(RowDataRollingFileWriter.java:51)
at org.apache.paimon.io.RollingFileWriter.openCurrentWriter(RollingFileWriter.java:100)
at org.apache.paimon.io.RollingFileWriter.write(RollingFileWriter.java:79)
at org.apache.paimon.append.AppendOnlyWriter.write(AppendOnlyWriter.java:113)
at org.apache.paimon.append.AppendOnlyWriter.write(AppendOnlyWriter.java:51)
at org.apache.paimon.operation.AbstractFileStoreWrite.write(AbstractFileStoreWrite.java:114)
at org.apache.paimon.table.sink.TableWriteImpl.writeAndReturn(TableWriteImpl.java:114)
at org.apache.paimon.flink.sink.StoreSinkWriteImpl.write(StoreSinkWriteImpl.java:159)
at org.apache.paimon.flink.sink.RowDataStoreWriteOperator.processElement(RowDataStoreWriteOperator.java:123)
... 13 more
Caused by: java.io.IOException: Mkdirs failed to create file:/opt/flink/temp/paimon/default.db/measurements/bucket-0
at org.apache.paimon.fs.local.LocalFileIO.newOutputStream(LocalFileIO.java:80)
at org.apache.paimon.io.SingleFileWriter.<init>(SingleFileWriter.java:68)
... 24 more
I think this is because Paimon cannot create the needed files or folders at that location.
I also checked the path: /opt/flink/temp/paimon/default.db/measurements/ exists, and it contains a file named schema-0, but no bucket-0 directory and no parquet files.
What did I do wrong?
The content of the schema-0 file looks like the table's metadata:
{
"id" : 0,
"fields" : [ {
"id" : 0,
"name" : "sensor_id",
"type" : "BIGINT"
}, {
"id" : 1,
"name" : "reading",
"type" : "DECIMAL(5, 1)"
} ],
"highestFieldId" : 1,
"partitionKeys" : [ ],
"primaryKeys" : [ ],
"options" : {
"bucket" : "1",
"schema.2.expr" : "PROCTIME()",
"schema.2.data-type" : "TIMESTAMP(3) WITH LOCAL TIME ZONE NOT NULL",
"schema.2.name" : "event_time",
"bucket-key" : "sensor_id",
"file.format" : "parquet"
},
"timeMillis" : 1713882941002
}
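To illustrate the failure mode (a hypothetical reproduction, not Paimon code): "Mkdirs failed" is what Paimon's LocalFileIO surfaces when the underlying directory creation fails, for example when a path component already exists as a plain file, or when the parent directory is not writable by the Flink process. A minimal sketch of the first case, with made-up path names:

```python
import os
import tempfile

# Hypothetical reproduction of the "Mkdirs failed" failure mode:
# directory creation fails when a path component already exists as a
# plain file (or when the parent directory is not writable).
base = tempfile.mkdtemp()
table_dir = os.path.join(base, "measurements")
os.makedirs(table_dir)

# A stray regular file sitting where the bucket directory is expected.
bucket = os.path.join(table_dir, "bucket-0")
open(bucket, "w").close()

result = None
try:
    os.makedirs(os.path.join(bucket, "data-file"))
    result = "created"
except (FileExistsError, NotADirectoryError):
    result = "mkdirs-failed"
print(result)
```

In the reported case the bucket-0 path does not exist at all, which points at the second variant: the process cannot write into the parent directory.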
version: "3.7"
networks:
redpanda_network:
driver: bridge
volumes:
redpanda: null
services:
redpanda:
command:
- redpanda
- start
- --kafka-addr internal://0.0.0.0:9092,external://0.0.0.0:19092
- --advertise-kafka-addr internal://redpanda:9092,external://localhost:19092
- --pandaproxy-addr internal://0.0.0.0:8082,external://0.0.0.0:18082
- --advertise-pandaproxy-addr internal://redpanda:8082,external://localhost:18082
- --schema-registry-addr internal://0.0.0.0:8081,external://0.0.0.0:18081
- --rpc-addr redpanda:33145
- --advertise-rpc-addr redpanda:33145
- --smp 1
- --memory 1G
- --mode dev-container
- --default-log-level=debug
image: docker.redpanda.com/redpandadata/redpanda:v23.2.3
container_name: redpanda
volumes:
- redpanda:/var/lib/redpanda/data
networks:
- redpanda_network
ports:
- "18081:18081"
- "18082:18082"
- "19092:19092"
- "19644:9644"
console:
container_name: redpanda-console
image: docker.redpanda.com/vectorized/console:v2.3.0
networks:
- redpanda_network
entrypoint: /bin/sh
command: -c 'echo "$$CONSOLE_CONFIG_FILE" > /tmp/config.yml; /app/console'
environment:
CONFIG_FILEPATH: /tmp/config.yml
CONSOLE_CONFIG_FILE: |
kafka:
brokers: ["redpanda:9092"]
schemaRegistry:
enabled: true
urls: ["http://redpanda:8081"]
redpanda:
adminApi:
enabled: true
urls: ["http://redpanda:9644"]
ports:
- "8080:8080"
depends_on:
- redpanda
jobmanager:
build: .
container_name: jobmanager
ports:
- "8081:8081"
- "9249:9249"
command: jobmanager
networks:
- redpanda_network
volumes:
- ./jars/:/opt/flink/jars
- ./logs/flink/:/opt/flink/temp
environment:
- |
FLINK_PROPERTIES=
jobmanager.rpc.address: jobmanager
jobmanager.scheduler: Adaptive
metrics.reporters: prom
metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
metrics.reporter.prom.port: 9249
state.backend.rocksdb.metrics.actual-delayed-write-rate: true
state.backend.rocksdb.metrics.background-errors: true
state.backend.rocksdb.metrics.block-cache-capacity: true
state.backend.rocksdb.metrics.estimate-num-keys: true
state.backend.rocksdb.metrics.estimate-live-data-size: true
state.backend.rocksdb.metrics.estimate-pending-compaction-bytes: true
state.backend.rocksdb.metrics.num-running-compactions: true
state.backend.rocksdb.metrics.compaction-pending: true
state.backend.rocksdb.metrics.is-write-stopped: true
state.backend.rocksdb.metrics.num-running-flushes: true
state.backend.rocksdb.metrics.mem-table-flush-pending: true
state.backend.rocksdb.metrics.block-cache-usage: true
state.backend.rocksdb.metrics.size-all-mem-tables: true
state.backend.rocksdb.metrics.num-live-versions: true
state.backend.rocksdb.metrics.block-cache-pinned-usage: true
state.backend.rocksdb.metrics.estimate-table-readers-mem: true
state.backend.rocksdb.metrics.num-snapshots: true
state.backend.rocksdb.metrics.num-entries-active-mem-table: true
state.backend.rocksdb.metrics.num-deletes-imm-mem-tables: true
taskmanager1:
build: .
container_name: taskmanager1
depends_on:
- jobmanager
command: taskmanager
networks:
- redpanda_network
ports:
- "9250:9249"
volumes:
- ./logs/flink/:/opt/flink/temp
environment:
- |
FLINK_PROPERTIES=
jobmanager.rpc.address: jobmanager
taskmanager.numberOfTaskSlots: 5
metrics.reporters: prom
metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
metrics.reporter.prom.port: 9249
state.backend.rocksdb.metrics.actual-delayed-write-rate: true
state.backend.rocksdb.metrics.background-errors: true
state.backend.rocksdb.metrics.block-cache-capacity: true
state.backend.rocksdb.metrics.estimate-num-keys: true
state.backend.rocksdb.metrics.estimate-live-data-size: true
state.backend.rocksdb.metrics.estimate-pending-compaction-bytes: true
state.backend.rocksdb.metrics.num-running-compactions: true
state.backend.rocksdb.metrics.compaction-pending: true
state.backend.rocksdb.metrics.is-write-stopped: true
state.backend.rocksdb.metrics.num-running-flushes: true
state.backend.rocksdb.metrics.mem-table-flush-pending: true
state.backend.rocksdb.metrics.block-cache-usage: true
state.backend.rocksdb.metrics.size-all-mem-tables: true
state.backend.rocksdb.metrics.num-live-versions: true
state.backend.rocksdb.metrics.block-cache-pinned-usage: true
state.backend.rocksdb.metrics.estimate-table-readers-mem: true
state.backend.rocksdb.metrics.num-snapshots: true
state.backend.rocksdb.metrics.num-entries-active-mem-table: true
state.backend.rocksdb.metrics.num-deletes-imm-mem-tables: true
taskmanager2:
build: .
container_name: taskmanager2
depends_on:
- jobmanager
command: taskmanager
networks:
- redpanda_network
ports:
- "9251:9249"
volumes:
- ./logs/flink/:/opt/flink/temp
environment:
- |
FLINK_PROPERTIES=
jobmanager.rpc.address: jobmanager
jobmanager.scheduler: Adaptive
taskmanager.numberOfTaskSlots: 5
metrics.reporters: prom
metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
metrics.reporter.prom.port: 9249
state.backend.rocksdb.metrics.actual-delayed-write-rate: true
state.backend.rocksdb.metrics.background-errors: true
state.backend.rocksdb.metrics.block-cache-capacity: true
state.backend.rocksdb.metrics.estimate-num-keys: true
state.backend.rocksdb.metrics.estimate-live-data-size: true
state.backend.rocksdb.metrics.estimate-pending-compaction-bytes: true
state.backend.rocksdb.metrics.num-running-compactions: true
state.backend.rocksdb.metrics.compaction-pending: true
state.backend.rocksdb.metrics.is-write-stopped: true
state.backend.rocksdb.metrics.num-running-flushes: true
state.backend.rocksdb.metrics.mem-table-flush-pending: true
state.backend.rocksdb.metrics.block-cache-usage: true
state.backend.rocksdb.metrics.size-all-mem-tables: true
state.backend.rocksdb.metrics.num-live-versions: true
state.backend.rocksdb.metrics.block-cache-pinned-usage: true
state.backend.rocksdb.metrics.estimate-table-readers-mem: true
state.backend.rocksdb.metrics.num-snapshots: true
state.backend.rocksdb.metrics.num-entries-active-mem-table: true
state.backend.rocksdb.metrics.num-deletes-imm-mem-tables: true
postgres:
image: postgres:latest
networks:
- redpanda_network
container_name: postgres
restart: always
environment:
- POSTGRES_USER=postgres
- POSTGRES_PASSWORD=postgres
ports:
- '5432:5432'
prometheus:
image: prom/prometheus:latest
networks:
- redpanda_network
container_name: prometheus
command:
- '--config.file=/etc/prometheus/config.yaml'
ports:
- "9090:9090"
volumes:
- ./prometheus:/etc/prometheus
# - ./prom_data:/prometheus
grafana:
image: grafana/grafana
networks:
- redpanda_network
container_name: grafana
ports:
- "3000:3000"
restart: unless-stopped
environment:
- GF_SECURITY_ADMIN_USER=grafana
- GF_SECURITY_ADMIN_PASSWORD=grafana
volumes:
- ./grafana/provisioning:/etc/grafana/provisioning
- ./grafana/dashboards:/var/lib/grafana/dashboards
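The bind mounts above are the usual suspect for this error: both the jobmanager and the taskmanagers mount ./logs/flink at /opt/flink/temp, and if that host directory is missing or not writable, Paimon cannot create its bucket directories there. A hypothetical pre-flight check (the probe-file approach and names are my own) that can be run before docker compose up:

```python
import os
import sys

# Hypothetical pre-flight check for the bind mount used by the compose file.
# Both jobmanager and taskmanagers mount ./logs/flink at /opt/flink/temp;
# if this host directory is not writable, Paimon's writes fail with
# "Mkdirs failed to create file:/opt/flink/temp/...".
mount = "logs/flink"
os.makedirs(mount, exist_ok=True)

probe = os.path.join(mount, ".write-probe")
try:
    with open(probe, "w") as fh:
        fh.write("ok")
    os.remove(probe)
    status = "writable"
except OSError:
    status = "not writable"
print(mount, status)
```

Note this only checks writability from the host side; inside the container Flink runs as a non-root user, so even a host-writable directory can be unwritable for the container process. Loosening permissions on the host directory (or chowning it to the container user) is a common workaround.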