digoal / blog Goto Github PK

View Code? Open in Web Editor NEW

8.0K 663.0 1.9K 1.69 GB

Opensource,Database,AI,Business,Minds. git clone --depth 1 https://github.com/digoal/blog

Home Page: https://github.com/digoal/blog/blob/master/README.md

License: GNU General Public License v2.0

Shell 7.17% PLpgSQL 70.12% HTML 22.72%

postgresql greenplum enterprisedb database gpdb hawq mysql mongodb pgsql postgres

blog's People

Contributors

Stargazers

Watchers

Forkers

hooqee michael-ancestor scottsiu-cn giserh franktom leitianqi vincentchen whatcat moxnet zhuqling hexiaoting pangpangcc lamchan2844 terminatorheart zhangconan uxlsl uranusmars xuegang xdcs100 hengyunabc arden2600 yaowenqiang yanggongwang printff jin-yang shyxingang zewailyan luyuncheng polokobe vviles changmen sjtuhjh privatecollections weifansym hzyzhdmz manyuanbin u20024804 oolongd swrd risent waylandfield kangzhenkang yangphere quenlang farck arcticjian awesomeleo serennity liuzhliang bickshaw lb1980 zhangshunrao sanwan figmar javacspring freeidea fuzw123 windy0911 lhwsysu kevin995 wangyipengpeter lsxredrain makai-china jiahhu songdi676 jsycllh2007 daweng jennie2017 yutellite jfhyn littleji lemonhall shicrom qzren1982 luoshulin wengda xialinlin xinyuan6009 coolwxb chinahecom yinrcode longzmkm linmingming joejiafeng xiayangy pning scorpiusjin bingoko myhover xiehongan123 krystaljingjing syqh826 sunzy07 tonghaibo eagle518 linux1689 dairymix ryanyntc2013 days72115 yvlf

blog's Issues

关于并行的问题

我通过partition分表了五个表，然后开启并行进行查询数据，在查询的过程中我发现好像开启并行速度更慢了。这是怎么一回事

请问一下如何在高并发多事务环境下增加字段

大神，请教一下，如何在高并发多事务环境下给数据表增加字段，目前我们尝试增加字段时，执行语句一直都是 waiting 状态，还会导致死锁问题。

PG对于JSONB的支持问题

如果某个字段是Jsonb的，存储的json数据，能否对json数据中的某个字段进行聚合操作或者关联操作呢

[root@Centos7_1 rum-master]# make USE_PGXS=1
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -We
ndif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -f
wrapv -fexcess-precision=standard -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fe
xceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switch
es -m64 -mtune=generic -fPIC -I. -I./ -I/usr/pgsql-10/include/server -I/usr/pgsq
l-10/include/internal -D_GNU_SOURCE -I/usr/include/libxml2 -I/usr/include -c
-o src/rumsort.o src/rumsort.c
In file included from src/rumsort.c:138:0:
/usr/pgsql-10/include/server/utils/probes.h:10:21: fatal error: sys/sdt.h: No su
ch file or directory
#include <sys/sdt.h>

请问需要安装哪个组件吗?

安装smlar插件失败

安装smlar插件失败，提示是缺少了文件。请问一下这个文件怎么获取？

我是参照这里的步骤搞的：http://blog.databasepatterns.com/2014/07/postgresql-install-smlar-extension.html

以下为报错代码：
[root@localhost smlar]# USE_PGXS=1
[root@localhost smlar]# make
gcc -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -DLINUX_OOM_SCORE_ADJ=0 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -fPIC -I. -I. -I/usr/include/pgsql/server -I/usr/include/pgsql/internal -D_GNU_SOURCE -I/usr/include/libxml2 -c -o smlar.o smlar.c
smlar.c:6:33: fatal error: access/htup_details.h: No such file or directory
#include "access/htup_details.h"
^
感谢德哥支持。谢谢

使用smlar插件, 相似度计算公式采用tfidf，效率和你列出的差别特别大

我参照网页上你的步骤1步步做的，
数据是从https://www.mockaroo.com/c5418bd0，通过命令curl "https://api.mockaroo.com/api/c5418bd0?count=100000&key=b21830e0" > "tfidf.csv"下载的，一次只能下载5000，所以我下载了20次，得到10万数据，

然后我的执行计划：

postgres=# explain (analyze,verbose,timing,costs,buffers) select
postgres-# *,
postgres-# smlar( tokenize(body), '{Etiam,pretium,iaculis}'::text[] )
postgres-# from
postgres-# documents
postgres-# where
postgres-# tokenize(body) %% '{Etiam,pretium,iaculis}'::text[] -- where TFIDF similarity >= smlar.threshold
postgres-# order by
postgres-# smlar( tokenize(body), '{Etiam,pretium,iaculis}'::text[] ) desc
postgres-# limit 10;
QUERY PLAN

Limit (cost=368.05..368.07 rows=10 width=258) (actual time=4957.830..4957.832 rows=10 loops=1)
Output: document_id, body, (smlar(regexp_split_to_array(lower(body), '[^[:alnum:]]'::text), '{Etiam,pretium
,iaculis}'::text[]))
Buffers: shared hit=4870
-> Sort (cost=368.05..368.30 rows=100 width=258) (actual time=4957.829..4957.829 rows=10 loops=1)
Output: document_id, body, (smlar(regexp_split_to_array(lower(body), '[^[:alnum:]]'::text), '{Etiam,p
retium,iaculis}'::text[]))
Sort Key: (smlar(regexp_split_to_array(lower(documents.body), '[^[:alnum:]]'::text), '{Etiam,pretium,
iaculis}'::text[])) DESC
Sort Method: top-N heapsort Memory: 26kB
Buffers: shared hit=4870
-> Bitmap Heap Scan on public.documents (cost=17.05..365.89 rows=100 width=258) (actual time=111.75
7..4957.313 rows=666 loops=1)
Output: document_id, body, smlar(regexp_split_to_array(lower(body), '[^[:alnum:]]'::text), '{Et
iam,pretium,iaculis}'::text[])
Recheck Cond: (regexp_split_to_array(lower(documents.body), '[^[:alnum:]]'::text) %% '{Etiam,pr
etium,iaculis}'::text[])
Rows Removed by Index Recheck: 17737
Heap Blocks: exact=3525
Buffers: shared hit=4870
-> Bitmap Index Scan on documents_tokenize_idx (cost=0.00..17.03 rows=100 width=0) (actual ti
me=95.413..95.413 rows=18403 loops=1)
Index Cond: (regexp_split_to_array(lower(documents.body), '[^[:alnum:]]'::text) %% '{Etia
m,pretium,iaculis}'::text[])
Buffers: shared hit=1345
Planning time: 0.456 ms
Execution time: 4957.952 ms
(19 rows)

而你的类似的查询，才7毫秒左右，我shared hit的特别多，我自己中文的文档，10万文档，然类似查询的话，时间也在几千毫秒左右，这是哪边配置不对吗，
set smlar.type = tfidf;
set smlar.stattable = 'documents_body_stats'
set smlar.persistent_cache=true;
set smlar.threshold =0.4;
set smlar.tf_method = 'n';

pg_awr的snap_database函数问题咨询

当入参erase_stats不填或为true时，会执行pg_stat_reset清理统计信息，这个会影响系统vacuum吧，原意是不是要清理pg_stat_statements中的数据

关于分区表中使用类似on conflict 语句限制

我看了德哥的博客说pg 11支持这类语句，但是现在还是10.3版本，如何下载11版本测试呢，另外有没有什么方式也能实现 on conflict update 这种功能呢。我试了触发器，好像实现不了。

下面这几项，是关键。什么时间能写出一点呢？感谢。

5 高可用
9 读写分离
10 水平分库

OpenStreetMap import will break if links aren't updated before May 7

http://planet.openstreetmap.org is available over https, and will start redirecting to https on 2018-05-07.

Your project seems to be using osmosis, which will not follow redirects by default.

Would it please be possible to update all files from http://planet.openstreetmap.org to httpS://planet.openstreetmap.org before then?

Ideally, would it be possible to make all requests to openstreetmap.org over https? All services respond over https and will start redirecting soon.

postgresql拆分表后是怎么查询合并的

https://github.com/digoal/blog/blob/master/201612/20161225_01.md?spm=5176.100239.blogcont96791.37.plfwnu&file=20161225_01.md
请问你这篇文章里面把数据拆分N张表后，查询是怎么查询的，没看懂，数据是怎么合并的呢，求大神指点一下

关于阿里云RDS金融数据库的问题

德哥你好，
请教下在阿里云RDS金融数据库(三节点版) - 案例篇中，你提到了主从服务器分别放置在不同机房以异地容灾。这种情况下raft的性能会不会出现较大的下滑？在你们的生产环境中，异地机房和同城机房相比性能大概相差多少？

我们曾经在自己的生产环境下试验过跨机房raft的性能。在跨机房latency 40ms的情况下，raft的性能出现了很明显的下降，所以raft只用在同机房，跨机房还是只用async。想请教你们是否对跨机房有特别的优化。

请教一个Bitmap Heap Scan的问题（PostgreSQL 9.5版本）

explain ANALYSE select user_nickname, rating from plat_user where user_nickname like '%小宝%' order by rating limit 10;

表中有建过索引：CREATE INDEX "plat_user$nickname" ON plat_user USING gin (user_nickname gin_bigm_ops, rating)

有看过一些资料，似乎说gin的索引，实现有个小瑕疵，他会先做Bitmap Index Scan，再进行Bitmap Heap Scan。我的实际运行结果如下（用了17秒，总共表中数据大概2亿行左右）：

Limit (cost=5480.08..5480.11 rows=10 width=31) (actual time=17172.258..17172.260 rows=10 loops=1)
-> Sort (cost=5480.08..5482.22 rows=857 width=31) (actual time=17172.256..17172.256 rows=10 loops=1)
Sort Key: rating
Sort Method: top-N heapsort Memory: 26kB
-> Bitmap Heap Scan on plat_user (cost=2094.64..5461.56 rows=857 width=31) (actual time=12.365..17162.804 rows=13200 loops=1)
Recheck Cond: (user_nickname ~~ '%小宝%'::text)
Heap Blocks: exact=13066
-> Bitmap Index Scan on "plat_user$nickname" (cost=0.00..2094.43 rows=857 width=0) (actual time=9.651..9.651 rows=13687 loops=1)
Index Cond: (user_nickname ~~ '%小宝%'::text)
Planning time: 0.135 ms
Execution time: 17172.290 ms

问题如下：
1.最新版的PostgreSQL是否对gin有所改进，是否能直接做Bitmap Index Scan，而不用再次Bitmap Heap Scan（这个感觉速度超慢）？

2.因为是新手，接手别人的项目，是否建的索引有问题？（公司的要求比较简单，就是**like %昵称片段% + rating** 排序。）

谢谢各位大哥，帮我看看！

关于测试环境

请问您测试的存储介质是磁盘还是SSD?SSD是什么接口？

建议避免在标题中使用 "一天学会" 这样的字眼

没有什么东西是不靠长期经验积累就学会的吧. 否则会让初学者有些浮躁.

"快速入门" 也比 "一天学会" 好一些.

test01

test

WAL日志的创建时间或恢复时间可否查询？可以的话该如何查询

徳哥，感谢你的博客分享，我想请教下WAL日志的创建时间或恢复时间可否查询？可以的话该如何查询。我知道备份后backup_label里面有个START TIME，pg_xlogdump日志也没有关于时间的描述。

PostgreSql 商业支持

请问德哥，目前市面上；对postgres有商业支持的公司？最近公司在做本地化部署的时候，客户运维人员不知道使用postgres，要求替换成mysql。原因是postgres没有商业支持。

jdbc driver connect segment

您好，德哥，请教个问题，我想通过jdbc连接segment，通过运维模式，但是，出现授权连接卡住现象，以上卡住是使用gp驱动；然后我反转到pg驱动，跟踪代码，pg驱动是卡在了ssl阶段，于是关闭ssl，关闭ssl后出现闪断，不知道前辈遇到过没，谢谢指教

备注：Segment本地psql可以连接 PGOPTIONS='-c gp_session_role=utility' psql -h 127.0.0.1 -p 6000 -U gpadmin -w -d test，远程PSQL也不可以，也是卡住

postgresql in memory

PostgreSQL varbitx 插件能安装吗

你好，PostgreSQL varbitx 插件能安装吗？现在可供下载install吗？

请教postgreSQL商业化应用问题

Digoal, 您好!

我想请教一个关于postgreSQL商业化的问题. 山东瀚高好像是目前国内唯一提供企业级PostgreSQL部署和维护的公司, 他们也推出了自己的一个商业化版本. 我想请教一下, 对于中等规模企业来说, 如果希望通过PostgreSQL完成Oracle替代, 选择一家商业化服务公司是否靠谱, 以及您对山东瀚高有什么看法, 是否有其他更好的推荐?

感谢!

请教，关于使用convert函数生成字节流的问题

0看完文章，很感谢！现在遇到一个问题，请教下
1查询“金”在GBK中的编码是BDF0，现在在数据库中使用convert函数(转换成GBK)，转换后的结果是"\275\360"(可见下方代码，这三条数据得到的是一样的结果)，这个结果感觉和GBK中的编码值对不上，请帮忙看下是哪里出了问题？
2个人猜想以下方面，但是没有经验，能否帮忙看下
(1)第三个字段的数据类型定义的不对
(2)函数用的不对
(3)该显示受到客户端编码类型的影响
(4)字节流，和真实的值要经过转换

代码：
create table tbl_h
(
id int primary key,
wenzi varchar(30),
code bytea
)

insert into tbl_h values(1,'金',convert('金'::bytea,'UTF8','GBK'));
insert into tbl_h values(2,'金',convert('金','UTF8','GBK'));
insert into tbl_h values(3,'金',convert_to('金','GBK'));

结果
id wenzi code
1 金 \275\360
2 金 \275\360
3 金 \275\360

关于"实时数据交换平台 - BottledWater-pg with confluent"博客中的架构问题

德哥
想请教一下,图中所述的为相关数据先经过了PG数据库,再通过插件导入Kafka,最终消费.
为何不是直接导入到Kafka再进入PG呢?这样从可用的角度上是不是更好?还是博客中提到的另有优势呢?
还是说是重点在于使用bottledwater-pg 这个插件的使用情景限制住了?
谢谢!

关于《使用PostgreSQL逻辑订阅实现multi-master》另一种思路

个人感觉好像可以这么干。如 A ，B 库。对 customer_table 表实现双向写入共享数据。

将 customer_table 改成视图，同时新建表 customer_self ， customer_remote，两个表结构完全一致，customer_table 视图 union 这两张表
给 customer_table 视图添加触发器，当捕捉到 insert 操作时， insert到 customer_self 表
把 B 库的 customer_self 逻辑复制到 A 库的 customer_remote ；把 A 库的 customer_self 逻辑复制到 B 库的 customer_remote
上层业务逻辑，继续操作 customer_table ，而它其实已经是个视图
大功告成

int128与pghashlib存在冲突

德哥，打开int128支持后会与一些其它支持int128的extension冲突，比如pghashlib。我记得是因为int128的一个宏定义。

请教全字段模糊搜索中创建record_to_text报错

德哥的文章已经写得非常详细了，但我在学习过程中出现了一点小小的问题，而无法创建record_to_text函数

search_text=# create table t(phonenum text, info text, c1 int, c2 text, c3 text, c4 timestamp); 
Command OK

search_text=# insert into t values ('13888888888','i am digoal, a postgresqler',123,'china','中华人民共和国，阿里巴巴，阿',now());    
Command OK - 1 row affected

search_text=# select * from t;    
+-------------+-----------------------------+-----+-------+------------------------------+-------------------------+
| phonenum    | info                        | c1  | c2    | c3                           | c4                      |
+-------------+-----------------------------+-----+-------+------------------------------+-------------------------+
| 13888888888 | i am digoal, a postgresqler | 123 | china | 中华人民共和国，阿里巴巴，阿 | 2017-06-23 14:34:09.732 |
+-------------+-----------------------------+-----+-------+------------------------------+-------------------------+
1 row in set

search_text=# create extension pg_trgm;  
Command OK

search_text=# create or replace function record_to_text(anyelement) returns text as $$  
  select $1::text;                        
$$ language sql strict immutable;  
ERROR:  unterminated dollar-quoted string at or near "$$
  select $1::text;"
LINE 1: ...ace function record_to_text(anyelement) returns text as $$
                                                                   ^
ERROR:  unterminated dollar-quoted string at or near "$$ language sql strict immutable;"
LINE 1: $$ language sql strict immutable;
        ^
search_text=#

我不清楚自己执行中发生了什么问题，我既希望在后续的学习中理解和处理好自己的问题，也渴望得到些许帮助。

致谢！

另：我所参考学习的文章
https://github.com/digoal/blog/blob/master/201701/20170106_04.md

postgres集群HA自动切换

你好，请问如何不使用第三方工具实现postgres集群HA自动切换呢？手册里面有提到。可以使用一种简单的超时功能，特别是和主服务器上已知的archive_timeout设置一起（26.3 故障转移）。但是没有给出具体方法。请大神指点。谢谢。

http://pan.baidu.com/s/1pKVCgHX 連結失效

如題：

百度盤的連結顯示404失效

（视频音源修复）德哥 postgresql-PostgreSQL 数据库管理与优化培训视频 5天-PostgreSQL 9.3 DBA 培训 5天

链接:http://pan.baidu.com/s/1nv5NSqH 密码:7194

Enterprisedb oci-dblink的问题

1、我最近在使用Enterprisedb 10.4，然后参照下面的文章：https://www.enterprisedb.com/docs/en/10.0/Ora_Reference_Guide_v10/Database_Compatibility_for_Oracle_Developers_Reference_Guide.1.035.html

在pg admin里面输入了下面这些代码
CREATE DATABASE LINK chicago
CONNECT TO sue IDENTIFIED BY 'mypassword'
USING oci '//127.0.0.1/acctg';

但是却报了下面这个错，很疑惑，语法都是按照文档上来的，为什么会有这个问题呢
ERROR: syntax error at or near "TO"
SQL state: 42601
Character: 40

palaemon有什么资料吗？

pg 9.6 事务锁问题

德哥你好：
pg9.6 生产环境遇到大量 wait_event_type = Lock wait_event_name = transactionid的锁信息。

一般出现这种锁是因为同一张表执行一个事务后未提交(例：update操作)，此时另一个事务开始执行update操作。此时会出现wait_event_type = Lock wait_event_name = transactionid 这么一条记录。

反复查阅代码后，并未发现有开启事务后未提交的 bad code。
请问下德哥，还有其他原因会导致上述现象吗？

Greenplum执行了vacuum freeze，从节点的age都降下来了，主节点却一点儿不降

请教一下，greenplum上按节点执行了vacuum freeze，从节点的age都降了，而主节点依旧不降。执行vacuum freeze日志显示执行成功，但实际没有效果。
已检查没有临时表和schema。
请教大神指点。

请教：关于pg_trgm中文支持问题

参考中文模糊查询性能优化 by PostgreSQL trgm 及网上其他文章，都说pg_trgm已经支持multi-byte character，只要collate和ctype不是C就行，但是我自己实验还是不行

我在本地安装的10.1版本，或者是在Aliyun创建的9.4版本，结果都是一样
请问可能是什么地方出了问题？

pg_awr中函数snap_delete(i_reserved int)问题

文件：blog/201611/20161123_01_sql_001.sql

函数：nap_delete(i_reserved int)

具体说明：入参i_reserved没有使用到，具体代码

where id < (select id from snap_list order by id desc limit 1 offset 2)

范围查询和in查询优化

创建个新表create table a(id int,t_date timestamp,name varchar);
添加索引create index idx_date on a using btree(t_date desc);
create index idx_date_name on a using btree(t_date desc,name);
查询语句如：select * from a where t_date >='time1' and t_date <'time2' and name in ('a','b','c') order by t_date desc limit 10;
这样的查询执行计划会走单时间索引，选择出时间范围的数据在过滤name 数组，而不会像Oracle一样直接走date和name的联合索引直接查询？
希望得到德哥的解释，谢谢。

Can greenplum install pgAgent?

有关pgpool的问题请教

@digoal 德哥，有没有pgpool相关的学习视频我看看呀，最近弄pgpool很痛苦，一些思路就是想不明白。求指教。

PostgreSQL 递归查询的效率问题

德哥，Postgre查询树形结构的数量，
WITH RECURSIVE sub_member AS (
SELECT a., 0 AS level FROM member a WHERE id = 5
UNION ALL
SELECT b., level+1 FROM member b
INNER JOIN sub_member c ON b.parent_id = c.id
) SELECT count(*) FROM sub_member WHERE id <> 5;
上百万的记录，这样查询出来要十几秒，该怎样去优化？

imgsmlr插件安装失败，博主能提供帮助么

错误: 无法访问文件 "$libdir/imgsmlr": 没有那个文件或目录

请教关于Greenplum roaring bitmap的问题

hi,您好！
我在尝试使用"Greenplum roaring bitmap与业务场景 (类阿里云RDS PG varbitx, 应用于海量用户实时画像和圈选、透视)"文章提到的方法优化查询时，Master 执行PREFUNC时，遇到了一个段错误,定位到代码错误位置在如下代码:

// Is the second argument non-null?
if (!PG_ARGISNULL(1)) {

    **r2 = (roaring_bitmap_t *) PG_GETARG_POINTER(1);**  

    if (PG_ARGISNULL(0)) {  
        r1 = roaring_bitmap_copy(r2);  
    } else {  
        roaring_bitmap_and_inplace(r1, r2);  
    }  
    roaring_bitmap_free(r2);  
}

其中获取的变量 r2 在使用时,会引起段错误，猜想是内存访问越界或者内存数据结构错误导致的，我使用的 Greenplum 版本是 5.11.1 PostgreSQL 8.3 。
您遇到过这样的问题么？

复合查询SQL优化问题 - ( 范围检索 , IN检索加时间倒序）

如德哥写的**PostgreSQL 一复合查询SQL优化例子 - (多个exists , 范围检索 , IN检索 , 模糊检索组合)**这篇文章如果条件改成create table test(id int, c1 text, c2 date, c3 text);
select * from test
where
c1 in ('1','2','3')
and c2 between current_date-1 and current_date
oder by c2 desc limit 10;
这样的查询语句创建btree联合索引create index idx_c2_c1_test on test(c2,c1)；
使用这个索引只会读取时间范围，而c1的过滤却要回表过滤，而Oracle是会在索引上全部过滤掉的，索引性能pg不如Oracle快。
使用gin索引好像速度也不是很快。
感谢能给出优化方法或思路，万分感谢。