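A Comparison of Chinese Word Segmentation Tools

A comparison of five Chinese word segmentation tools. I tried jieba, SnowNLP, thulac (from the Tsinghua University Natural Language Processing and Computational Social Science Lab), StanfordCoreNLP, and pyltp (HIT Language Technology Cloud). The environment is Win10 with Anaconda (Python 3.7).

  1. Installation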
Jieba:
pip install jieba

SnowNLP:
pip install snownlp

thulac:
pip install thulac

StanfordCoreNLP:
pip install stanfordcorenlp
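Download CoreNLP and extract it, then download the Chinese package and extract it into the CoreNLP folder.

pyltp:
pip install pyltp
The installation failed with the message "c++14 missing". Building it manually failed too, and switching to CentOS still failed, so in the end I gave up because the installation was too troublesome.
  2. Running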
a = 'Jimmy你怎么看'

import jieba.posseg as pseg
ws = pseg.cut(a)
for i in ws:
    print(i)

import thulac
thu1 = thulac.thulac()
text = thu1.cut(a)
print(text)

from stanfordcorenlp import StanfordCoreNLP
nlp = StanfordCoreNLP(r'./stanford-corenlp-full-2018-10-05/', lang='zh')
print(nlp.pos_tag(a))
nlp.close()  # shut down the backend Java server, which otherwise keeps using memory

from snownlp import SnowNLP
s = SnowNLP(a)
t = s.tags
for i in t:
    print(i)
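  3. Results

Only thulac's result stands out: it fails to separate 'Jimmy' from the Chinese text. StanfordCoreNLP uses a lot of memory and CPU while running. With another sentence, '这本书很不错', jieba could not split off '本', while the other tools all segmented it completely, although StanfordCoreNLP still used a lot of memory and CPU.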

Jieba:
Jimmy/eng
你/r
怎么/r
看/v

Thulac:
Model loaded succeed
[['Jimmy你怎', 'x'], ['么', 'u'], ['看', 'v']]

StanfordCoreNLP:
[('Jimmy', 'NR'), ('你', 'PN'), ('怎么', 'AD'), ('看', 'VV')]

SnowNLP:
('Jimmy', 'p')
('你', 'r')
('怎么', 'r')
('看', 'v')
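
To make the comparison concrete, here is a small sketch that groups the tools by the token sequence each one produced. The token lists are copied from the results above with the POS tags dropped; the helper names group_by_segmentation and boundaries are my own and not part of any of these libraries.

```python
from collections import defaultdict

# Token sequences copied from the results listed above (POS tags dropped).
RESULTS = {
    "jieba": ["Jimmy", "你", "怎么", "看"],
    "thulac": ["Jimmy你怎", "么", "看"],
    "SnowNLP": ["Jimmy", "你", "怎么", "看"],
    "StanfordCoreNLP": ["Jimmy", "你", "怎么", "看"],
}


def group_by_segmentation(results):
    """Group tool names by the exact token sequence they produced."""
    groups = defaultdict(list)
    for tool, tokens in results.items():
        groups[tuple(tokens)].append(tool)
    return dict(groups)


def boundaries(tokens):
    """Return the set of character offsets at which a token ends."""
    offsets = set()
    position = 0
    for token in tokens:
        position += len(token)
        offsets.add(position)
    return offsets


if __name__ == "__main__":
    for tokens, tools in group_by_segmentation(RESULTS).items():
        print("/".join(tokens), "<-", ", ".join(tools))
```

Comparing boundary sets rather than token lists makes it easy to see exactly where one tool cuts differently from the others.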

Screenshots are available at: https://yq.aliyun.com/articles/649888?spm=a2c4e.11155435.0.0.28093312W2Bo18

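A Review of BIM Software in Building Engineering in China

Original article: https://www.sciencedirect.com/science/article/pii/S0959652618323709

The article reviews the BIM software for the engineering field that has appeared in print in recent years. Autodesk's Revit and Navisworks hold the lead in journals, papers, and news reports, while the domestic software Luban has risen sharply.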

Building information modeling (BIM) is considered as a vital technology to achieve building sustainability in China. Many local companies have developed BIM assisting tools to accelerate the transformation of the Architecture, Engineering, Construction (AEC) industry. The industry believed the use of BIM tools would increase the informatization. However, no investigations have been made to evaluate the status of the tools providing BIM service in China. This study reviewed the typical features of BIM tools in China and illustrated the workflows of some representative tools. The common tools were counted. The different building projects where the tools were used and common functions of these tools were summarized. An assessment of their performances based on eleven criteria was conducted after the illustrations. The review of the tools finds that BIM software can be classified into three types, integrated platforms, instant service services, and supplementary works based on the workflow illustrations. The collaboration of these three types would contribute to cleaner production. The assessment revealed that the aspects where these tools played a role in improving engineering efficiency and what areas were needed to be improved. Three suggestions and three potential schemes of BIM tools in construction projects are discussed at the end. Developers need to further explore the cloud data usages, and thus, collaborations of different tools can be promoted.

CentOS: Block IPs After Repeated Failed Logins

From
https://yq.aliyun.com/articles/624167?spm=a2c4e.11155435.0.0.49c63312Ds2gU9
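vi /usr/local/bin/secure_ssh.sh

#!/bin/bash
# Count failed SSH logins per IP and save them as IP=count
cat /var/log/secure | awk '/Failed/{print $(NF-3)}' | sort | uniq -c | awk '{print $2"="$1;}' > /usr/local/bin/black.list
for i in `cat /usr/local/bin/black.list`
do
  IP=`echo $i | awk -F= '{print $1}'`
  NUM=`echo $i | awk -F= '{print $2}'`
  # ${#NUM} is the number of digits in the count, so this matches 10 or more failures
  if [ ${#NUM} -gt 1 ]; then
    # Only add the IP if it is not already listed in hosts.deny
    grep $IP /etc/hosts.deny > /dev/null
    if [ $? -gt 0 ]; then
      echo "sshd:$IP:deny" >> /etc/hosts.deny
    fi
  fi
done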

Add the secure_ssh.sh script to cron as a scheduled task that runs every minute.
vi /var/spool/cron/root

*/1 * * * * sh /usr/local/bin/secure_ssh.sh

Check the blacklist file on the server:
cat /usr/local/bin/black.list

Then check hosts.deny on the server:
cat /etc/hosts.deny
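
The counting logic of the script can also be sketched in Python, which makes the threshold explicit. This is only an illustration under my own assumptions: the function names, the min_failures parameter, and the sample log lines are hypothetical, and the IP is taken as the fourth field from the end of each line, mirroring the $(NF-3) in the awk command.

```python
from collections import Counter


def failed_login_counts(lines):
    """Count lines containing 'Failed' per IP (4th field from the end, like awk's $(NF-3))."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if "Failed" in line and len(fields) >= 4:
            counts[fields[-4]] += 1
    return counts


def ips_to_block(lines, min_failures=10):
    """Return the IPs with at least min_failures failed logins, sorted."""
    counts = failed_login_counts(lines)
    return sorted(ip for ip, n in counts.items() if n >= min_failures)


# Hypothetical sample lines in the usual sshd log layout.
SAMPLE = (
    ["Sep 1 10:00:00 host sshd[100]: Failed password for root from 203.0.113.7 port 4242 ssh2"] * 12
    + ["Sep 1 10:05:00 host sshd[101]: Failed password for invalid user admin from 198.51.100.9 port 5151 ssh2"] * 3
    + ["Sep 1 10:06:00 host sshd[102]: Accepted password for deploy from 192.0.2.1 port 6161 ssh2"]
)

if __name__ == "__main__":
    print(ips_to_block(SAMPLE))
```

The default of 10 matches the shell test above, which blocks an IP once its failure count has two or more digits.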

More references:

https://blog.csdn.net/ausboyue/article/details/53691953

http://huikon.cn/post-330.html

https://www.cnblogs.com/panblack/p/secure_ssh_auto_block.html

Blocking the Names of Celebrities or Stars

From
https://yq.aliyun.com/articles/627989?spm=a2c4e.11155435.0.0.49c63312Ds2gU9

About ten years ago, Baidu was the home page of my browser. It was very handy. However, it has become difficult to find information among all the useless junk, not to mention the recommendations on the right side.

Somebody got cheated on, married, divorced, or whatever. I really do not care.

The simplest and most direct method is to add these filters to Adblock on Chrome.

baidu.com###content_right

baidu.com##.cr-offset

To avoid seeing their names, I wrote a few lines of JavaScript. As Names grows, onload takes longer, so for now I am keeping the array short.

var Names = ["范冰冰", "王思聪", "孙杨", "李晨", "迪丽热巴", "宁泽涛", "傅园慧", "鄢军", "周立波", "贾乃亮", "火箭少女", "吴亦凡", "鹿晗", "关晓彤", "逐梦演艺圈", "科比", "李易峰", "杨洋"];

// Replace every occurrence of f (used as a regular expression) with e.
String.prototype.myReplace = function (f, e) {
  var reg = new RegExp(f, "g");
  return this.replace(reg, e);
};

// After the page loads, replace each name in the page body with "Somebody".
window.onload = function () {
  for (var i = 0; i < Names.length; i++) {
    document.body.innerHTML = document.body.innerHTML.myReplace(Names[i], "Somebody");
  }
};
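
One likely reason the load time grows with the list is that the page text is rewritten once per name. A possible alternative, sketched here in Python rather than the post's JavaScript, is to join all names into one regular expression and replace them in a single pass. The function name mask_names and the shortened name list are my own illustration, not code from the original page.

```python
import re

NAMES = ["范冰冰", "王思聪", "孙杨"]


def mask_names(text, names, placeholder="Somebody"):
    """Replace every listed name in text with placeholder in one regex pass."""
    if not names:
        return text
    # Longest names first, so a name that contains a shorter one is matched whole.
    ordered = sorted(names, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in ordered))
    return pattern.sub(placeholder, text)


if __name__ == "__main__":
    print(mask_names("范冰冰和孙杨", NAMES))
```

The same idea carries over to JavaScript by building one RegExp from the joined names.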

MongoDB

yum -y update

Edit the MongoDB yum repository file

vim /etc/yum.repos.d/mongodb-org-3.6.repo

with the following content:

[mongodb-org-3.6]
name=MongoDB Repository
baseurl=https://repo.mongodb.org/yum/redhat/$releasever/mongodb-org/3.6/x86_64/
gpgcheck=1
enabled=1
gpgkey=https://www.mongodb.org/static/pgp/server-3.6.asc
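
A typo in the repository file only shows up when yum runs, so a quick sanity check can help. The sketch below parses the content above with Python's configparser and reports missing keys. The list of required keys is simply the set used in this file, which is my own choice of what to check and not a rule imposed by yum.

```python
import configparser

REPO_TEXT = """\
[mongodb-org-3.6]
name=MongoDB Repository
baseurl=https://repo.mongodb.org/yum/redhat/$releasever/mongodb-org/3.6/x86_64/
gpgcheck=1
enabled=1
gpgkey=https://www.mongodb.org/static/pgp/server-3.6.asc
"""

# Assumption: check for the same keys the file above uses.
REQUIRED_KEYS = ("name", "baseurl", "gpgcheck", "enabled", "gpgkey")


def check_repo(text):
    """Return a list of problems found in a .repo file's text (empty if none)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    problems = []
    if not parser.sections():
        problems.append("no repository section found")
    for section in parser.sections():
        for key in REQUIRED_KEYS:
            if not parser.has_option(section, key):
                problems.append(f"[{section}] is missing {key}")
    return problems


if __name__ == "__main__":
    print(check_repo(REPO_TEXT) or "repo file looks complete")
```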

Install

yum install -y mongodb-org

Edit the MongoDB configuration file

vim /etc/mongod.conf

systemctl start mongod.service    # start the service
systemctl stop mongod.service     # stop the service
systemctl restart mongod.service  # restart the service

Remove the installed packages

yum erase $(rpm -qa | grep mongodb-org)

Remove the data and logs

rm -r /var/log/mongodb
rm -r /var/lib/mongo

More references:

https://www.jianshu.com/p/d09506c64fcd

https://www.cnblogs.com/hackyo/p/7967170.html

Creating an Ethereum Private Chain on CentOS

The first step is of course to update the system, and then to install the Go language with yum.

yum -y update
yum install golang
git clone https://github.com/ethereum/go-ethereum.git
cd go-ethereum
make all

A compile error appeared here: the installed Go version was 1.8.3, while the build needs at least 1.9. Go to https://www.golangtc.com/download (Golang China) and download go1.10.1.linux-amd64.tar.gz. Before installing it, remove the existing golang:

yum -y remove golang
tar -C /usr/local -xzf go1.10.1.linux-amd64.tar.gz
export PATH=$PATH:/usr/local/go/bin

Run make again. Running ll in go-ethereum/build/bin lists the files in that folder. In go-ethereum/build/bin, create a text file named init.json with the following content:

{
  "config": {
    "chainId": 14,
    "homesteadBlock": 0,
    "eip155Block": 0,
    "eip158Block": 0
  },
  "alloc"      : {},
  "coinbase"   : "0x0000000000000000000000000000000000000000",
  "difficulty" : "0x05000",
  "extraData"  : "",
  "gasLimit"   : "0x2fefd8",
  "nonce"      : "0x0000000000000042",
  "mixhash"    : "0x0000000000000000000000000000000000000000000000000000000000000000",
  "parentHash" : "0x0000000000000000000000000000000000000000000000000000000000000000",
  "timestamp"  : "0x00"
}
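
Several genesis fields are hex strings, which makes them hard to read at a glance. As a small aid, the sketch below decodes a few of them to decimal using only the values from init.json above. The function name decode_hex_fields and the choice of which fields to decode are my own.

```python
import json

GENESIS = json.loads("""
{
  "config": {"chainId": 14, "homesteadBlock": 0, "eip155Block": 0, "eip158Block": 0},
  "alloc": {},
  "coinbase": "0x0000000000000000000000000000000000000000",
  "difficulty": "0x05000",
  "extraData": "",
  "gasLimit": "0x2fefd8",
  "nonce": "0x0000000000000042",
  "mixhash": "0x0000000000000000000000000000000000000000000000000000000000000000",
  "parentHash": "0x0000000000000000000000000000000000000000000000000000000000000000",
  "timestamp": "0x00"
}
""")


def decode_hex_fields(genesis, fields=("difficulty", "gasLimit", "nonce", "timestamp")):
    """Convert the named hex-string fields of a genesis dict to integers."""
    return {name: int(genesis[name], 16) for name in fields}


if __name__ == "__main__":
    for name, value in decode_hex_fields(GENESIS).items():
        print(f"{name}: {value}")
```

Parsing the file with json.loads also catches the curly-quote problem that creeps in when the JSON is copied from a web page.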

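Save and exit the file. In the same directory, create the genesis block:

./geth --datadir "/app/chain" init init.json

Then, still in the same directory, start the node. This is the command to use for most subsequent starts:

./geth --rpc --rpccorsdomain "*" --datadir "/app/chain" --port "30303" --rpcapi "db,eth,net,web3" --networkid 100000 console

This opens the geth console, where you can create accounts, transfer funds, check balances, and so on.

// Check the coinbase account balance
baseAccount = eth.accounts[0]
num = eth.getBalance(baseAccount)
// Convert the unit to ether
web3.fromWei(num)
// Create a new account
personal.newAccount("account")
// Check the new account's balance
account1 = eth.accounts[1]
eth.getBalance(account1)
// Transfer 1 ether from the coinbase account to account1
personal.unlockAccount(baseAccount, "coinbase")
eth.sendTransaction({from: baseAccount, to: account1, value: web3.toWei(1, "ether")})
// Watch the mining shell's output, wait for the transaction to be mined, then query account1's balance
eth.getBalance(account1)

At this point the balance is zero, because the transaction has not yet been written into a block. Transactions are written into blocks by mining, so start mining again, stop once a block has been mined, and check the balance again. That completes creating the private chain, creating accounts, mining, and transferring funds. A private chain is for debugging; mining on it has no real value.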
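
The balances returned by eth.getBalance are in wei, and 1 ether equals 10^18 wei. The sketch below shows the arithmetic behind web3.fromWei and web3.toWei for the ether unit only. The Python function names are my own and are not the web3 API.

```python
from decimal import Decimal

WEI_PER_ETHER = 10 ** 18


def to_wei(ether):
    """Convert an ether amount (int, str, or Decimal) to an integer number of wei."""
    return int(Decimal(str(ether)) * WEI_PER_ETHER)


def from_wei(wei):
    """Convert an integer number of wei to ether as a Decimal."""
    return Decimal(wei) / WEI_PER_ETHER


if __name__ == "__main__":
    print(to_wei(1))
    print(from_wei(5 * 10 ** 17))
```

Decimal is used instead of float so that fractional ether amounts convert without binary rounding errors.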

 

References:

https://www.cnblogs.com/beyang/p/8469227.html

https://blog.csdn.net/wo541075754/article/details/78926177

https://blog.csdn.net/koastal/article/details/78737543

 

 

 

 

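Connecting a CentOS Server to the Ethereum Spark Node Program

Reposted from: https://ethfans.org/wikis/%E6%98%9F%E7%81%AB%E8%8A%82%E7%82%B9%E8%AE%A1%E5%88%92-CentOS-%E6%8E%A5%E5%85%A5%E6%96%87%E6%A1%A3

https://ethfans.org/wikis/%E6%98%9F%E7%81%AB%E8%8A%82%E7%82%B9%E8%AE%A1%E5%88%92-Ubuntu-%E6%8E%A5%E5%85%A5%E6%96%87%E6%A1%A3

Installation and deployment on CentOS differ from Ubuntu in only four main points. Everything else can follow the Ubuntu guide directly, so only these four points are explained here.

  • Create the deploy user
  • Install the system dependencies
  • Install geth
  • Install NodeJS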

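Create the deploy user

# After logging in as root for the first time, install sudo
yum -y install sudo

# Create the deploy user
adduser deploy

# Set the deploy user's password
passwd deploy

# Grant the deploy user sudo privileges
echo "deploy    ALL=(ALL) ALL" >> /etc/sudoers

# Switch to the deploy user and complete the rest of the installation
su - deploy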

Install the system dependencies

# Install the build tools and libraries
sudo yum -y groupinstall 'Development Tools'

# Install and set up the ntp time synchronization service
sudo yum -y install ntp
sudo systemctl enable ntpd.service
sudo systemctl start ntpd.service

Install geth

There is no official RPM package or Yum repository for geth, but cross-compiled geth binaries are available for download and installation.

# Download and extract geth: http://ethfans.org/wikis/Ethereum-Geth-国内镜像下载
tar zxvf geth-linux-amd64-{version}.tar.gz
cd geth-linux-amd64-{version}

# Move geth into the /usr/bin directory
sudo mv geth /usr/bin/

Install NodeJS

yum install -y nodejs

Install the ethstats-client project

The ethstats-client project collects data from the local geth node in real time and submits it over WebSocket to the node status statistics website.

Clone the project into ~/ethstats-client, then install its NPM dependencies.

cd ~
git clone https://github.com/cubedro/eth-net-intelligence-api ethstats-client
cd ethstats-client
npm install

Install a background process manager

An Ethereum full node server needs geth and ethstats-client to keep running in the background, so a background process manager is required. Here we use the officially recommended PM2.

npm install -g pm2

Download the pm2 configuration file processes.json

cd ~
curl -O https://gist.githubusercontent.com/lgn21st/530faf0f9f31febc6ec5c4e3f0301dca/raw/92558a5bc42d1b4fab1b12690f4184ce480f01f4/processes.json

Edit processes.json and change INSTANCE_NAME and CONTACT_DETAILS to your own node name and contact details, for example:

[
  {
    "name"              : "geth",
    "cwd"               : "/usr/bin/",
    "script"            : "geth",
    "args"              : "--rpc --fast --maxpeers 100 --cache 512",
    "log_date_format"   : "YYYY-MM-DD HH:mm Z",
    "merge_logs"        : false,
    "watch"             : false,
    "max_restarts"      : 10,
    "exec_interpreter"  : "none",
    "exec_mode"         : "fork_mode"
  },
  {
    "name"              : "ethstats-client",
    "cwd"               : "/home/deploy/ethstats-client/",
    "script"            : "app.js",
    "log_date_format"   : "YYYY-MM-DD HH:mm Z",
    "log_file"          : "/home/deploy/ethstats-client/logs/node-app-log.log",
    "out_file"          : "/home/deploy/ethstats-client/logs/node-app-out.log",
    "error_file"        : "/home/deploy/ethstats-client/logs/node-app-err.log",
    "merge_logs"        : true,
    "watch"             : false,
    "max_restarts"      : 10,
    "exec_interpreter"  : "node",
    "exec_mode"         : "fork_mode",
    "env":
    {
      "NODE_ENV"        : "production",
      "RPC_HOST"        : "localhost",
      "RPC_PORT"        : "8545",
      "LISTENING_PORT"  : "30303",
      "INSTANCE_NAME"   : "", // <- Put your node name inside the double quotes. For a company-run node, the suggested form is [your company website] + company name; for a personal node, [Ethfans] + your name.
      "CONTACT_DETAILS" : "", // <- Put your contact information inside the double quotes, such as a website or an email address.
      "WS_SERVER"       : "wss://stats.ethfans.org",
      "WS_SECRET"       : "ethfans4you",
      "VERBOSITY"       : 2
    }
  }
]
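
If you prefer not to edit the file by hand, the two fields can be filled in programmatically. The sketch below works on an already parsed process list and assumes the downloaded processes.json is plain JSON without the // annotations shown above. The function name set_node_identity, the shortened EXAMPLE list, and the sample name and email address are hypothetical.

```python
import json


def set_node_identity(processes, instance_name, contact_details):
    """Return a copy of the pm2 process list with the ethstats-client identity filled in."""
    updated = json.loads(json.dumps(processes))  # deep copy via a JSON round trip
    for proc in updated:
        if proc.get("name") == "ethstats-client":
            env = proc.setdefault("env", {})
            env["INSTANCE_NAME"] = instance_name
            env["CONTACT_DETAILS"] = contact_details
    return updated


# A shortened stand-in for the parsed contents of processes.json.
EXAMPLE = [
    {"name": "geth", "script": "geth"},
    {"name": "ethstats-client", "script": "app.js",
     "env": {"INSTANCE_NAME": "", "CONTACT_DETAILS": ""}},
]

if __name__ == "__main__":
    result = set_node_identity(EXAMPLE, "[Ethfans] Alice", "alice@example.com")
    print(json.dumps(result, indent=2, ensure_ascii=False))
```

Returning a copy leaves the original list untouched, so the change can be reviewed before it is written back to disk.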

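Submit your node information to the Spark Node Program

For how to submit, see the Spark Node Program super node list.

3. Routine operations

Common commands of the pm2 process manager

# Load the configuration file and start the background processes
pm2 start processes.json
# Stop the background processes
pm2 kill
# Show the status of the application processes
pm2 status
# Stream the logs in real time
pm2 logs geth
pm2 logs ethstats-client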

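Upgrading the Ethereum client geth

# When a new geth version is released, it can be upgraded directly through apt; restart the geth background process after the upgrade
# (apt is the Ubuntu route; with the binary install described above for CentOS, presumably replace /usr/bin/geth with the newer download instead)
sudo apt-get update
sudo apt-get upgrade

About incomplete node information

We found that with geth versions after 1.4, the node status statistics website may show incomplete information: hovering the mouse over the node name does not bring up the node information popup. This is caused by a JavaScript error that occurs when the node has no accounts. It may be fixed officially in a later release, but the temporary workaround is simple: just create an empty account.

# You will be asked to enter a password twice; any password will do
geth account new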

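About speeding up synchronization

After the node is set up, the initial synchronization takes a very long time. To sync blockchain data as quickly as possible, the Ethereum enthusiast community started a long-term node program early on: download the static node configuration file static-nodes.json, place it in the ~/.ethereum directory, and restart geth. The Spark Node Program maintainers update the static node file from time to time.

# Download the static node configuration file static-nodes.json
curl -O https://gist.githubusercontent.com/lgn21st/9e7ef6b9dc9a9b45b700e72a6ce49b91/raw/59e343d9f32d1313edd369b3113d9ea677d2ed0a/static-nodes.json
# Move static-nodes.json to the target directory
mv static-nodes.json ~/.ethereum
# Restart the geth background process
pm2 restart geth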

 

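Mining Monero on CentOS

When I set up my websites I rented a server running CentOS. It hosts only two websites. CPU usage is basically zero and reaches only 40% when I occasionally work on it, so leaving it idle is a huge waste. I read that Monero can be mined with a CPU, so I decided to mine Monero too.
First you need a wallet address, from either a local wallet or an online wallet. I chose the latter, so I will not cover the former. There are many online platforms; people online say these are decent: https://mymonero.com/ and https://hitbtc.com/. After registering an account, find the wallet address under account. It will be needed below.
Next, install the mining tool on the server by copying and pasting the commands below. After installing the programs it needs, get the source code from fireice-uk, then compile and install it. The donation level can be changed at this point; the default is 2%.
sudo yum -y install centos-release-scl epel-release
sudo yum -y install cmake3 devtoolset-7-gcc* hwloc-devel libmicrohttpd-devel openssl-devel make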
sudo scl enable devtoolset-7 bash
git clone https://github.com/fireice-uk/xmr-stak.git
mkdir xmr-stak/build
cd xmr-stak/build
cmake3 -DCMAKE_LINK_STATIC=ON -DXMR-STAK_COMPILE=generic -DCUDA_ENABLE=OFF -DOpenCL_ENABLE=OFF ..
make install
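However, the build always failed at 42%. I found the following workaround online: clean up and compile again.
mkdir xmr-stak/build
cd xmr-stak/build
export CFLAGS="-O2 -march=native -msse3 -fomit-frame-pointer -pipe"
export CHOST="x86_64-pc-linux-gnu"
export CXXFLAGS="${CFLAGS}"
export LDFLAGS="-Wl,-O1"
cmake3 .. -DCUDA_ENABLE=OFF -DOpenCL_ENABLE=OFF
make -j 8

After a successful install, run /usr/local/bin/cmake --version (the output varies slightly by version). Then edit config.txt and fill in the pool address, wallet address, and related details at lines 109-111. Port 3333 has a lower difficulty that even a laptop can handle, while 5555, 7777, and 9999 are all more demanding. The file contains examples you can follow. Remove the asterisks in front of cpu_threads_conf and the bracketed block after it, at around line 26.
Enter the following commands to start mining.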
cd /root/xmr-stak/bin/
./xmr-stak-cpu
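Sometimes, however, you will hit the error MEMORY ALLOC FAILED: mmap failed. Stop the miner, open /etc/security/limits.conf, and append the following at the end of the file:
* soft memlock 262144
* hard memlock 262144
Save and exit, run the command below, and then reboot. This usually fixes the problem.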
sysctl -w vm.nr_hugepages=128
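
The two numbers above are consistent with each other, which a quick calculation shows. The memlock limit in limits.conf is given in KB, and the sketch assumes the common x86_64 huge page size of 2048 kB (check Hugepagesize in /proc/meminfo on your own machine). The function names are my own.

```python
KIB_PER_MIB = 1024


def memlock_kib_to_mib(memlock_kib):
    """Convert a limits.conf memlock value (in KB) to MiB."""
    return memlock_kib / KIB_PER_MIB


def hugepages_total_kib(nr_hugepages, hugepage_size_kib=2048):
    """Total memory reserved by nr_hugepages pages, in kB (2048 kB per page is an assumption)."""
    return nr_hugepages * hugepage_size_kib


if __name__ == "__main__":
    print(memlock_kib_to_mib(262144), "MiB may be locked per process")
    print(hugepages_total_kib(128), "kB reserved as huge pages")
```

Under that assumption, 128 huge pages reserve exactly the 262144 kB that the memlock limit allows a process to lock.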
References:
https://maijiaoben.com/centos-monero.html

Mining Monero with an Idle Server's CPU

The following is just for practicing English.
An ECS instance running CentOS was rented when I set up my websites. Only two websites run on it. CPU usage is usually 0% and at most 40% at times. Leaving it unused is a waste. It is said that Monero can be mined with a CPU, so let's do it.
The first step is to get a wallet address. There are two ways to get one: local registration and online registration. I chose the latter, so the former is skipped here. There are many platforms, such as https://mymonero.com/ and https://hitbtc.com/. After registering, copy the address, which will be used later.
Copy and paste these commands to install the mining tool on the server:
yum install centos-release-scl cmake3 hwloc-devel libmicrohttpd-devel openssl-devel
yum install devtoolset-4-gcc*
scl enable devtoolset-4 bash
git clone https://github.com/fireice-uk/xmr-stak-cpu xmr-stak
cd xmr-stak
cmake3 .
make install
After installation, check the version by running /usr/local/bin/cmake --version; the output may vary depending on the version. Then edit config.txt: following the examples at lines 109-111, enter your pool address, wallet address, and password, and delete the "*" characters at approximately lines 25-29. Port 3333 is the easiest and suits low and mid-range CPUs.
After that, you can start mining by entering the following commands:
cd /root/xmr-stak/bin/
./xmr-stak-cpu
However, I ran into the error MEMORY ALLOC FAILED: mmap failed. The solution is to stop mining, open /etc/security/limits.conf, and add the following lines at the end of the file:
* soft memlock 262144
* hard memlock 262144
Save and exit the file, run "sysctl -w vm.nr_hugepages=128", and then reboot.

Multiple Sites and the Permalink 404 Issue

Initially, I was going to write something about using Modelica for simulations of HVAC systems, but I have been assigned to other tasks. So here I will just share some experience with serving multiple domains from one IP.

My server was already running one website when I set it up for multiple sites, and I reorganized the original files to keep things tidy. I first tried the method for adding a site to Nginx, but it did not work. While searching for solutions, I realized that my server mainly uses Apache, so the easier way was to set up virtual hosts. The lines added to /etc/httpd/conf/httpd.conf are as follows:

<VirtualHost *:80>
  ServerAdmin example@example.com
  DocumentRoot /var/www/html/a
  ServerName www.a.com
  ErrorLog /var/log/httpd/a/error_log
  CustomLog /var/log/httpd/a/access_log common
</VirtualHost>
<VirtualHost *:80>
  ServerAdmin example@example.com
  DocumentRoot /var/www/html/b
  ServerName www.b.com
  ErrorLog /var/log/httpd/b/error_log
  CustomLog /var/log/httpd/b/access_log common
</VirtualHost>
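
Since the two blocks differ only in the site name, they can be generated from a template when more sites are added. The sketch below does that and also lists the per-site log directories, which must exist before Apache starts. The template mirrors the configuration above; the function names render_vhosts and log_dirs are hypothetical.

```python
VHOST_TEMPLATE = """\
<VirtualHost *:80>
  ServerAdmin {admin}
  DocumentRoot /var/www/html/{name}
  ServerName www.{name}.com
  ErrorLog /var/log/httpd/{name}/error_log
  CustomLog /var/log/httpd/{name}/access_log common
</VirtualHost>
"""


def render_vhosts(names, admin="example@example.com"):
    """Render one VirtualHost block per site name."""
    return "".join(VHOST_TEMPLATE.format(admin=admin, name=name) for name in names)


def log_dirs(names):
    """Return the log directories the rendered blocks refer to."""
    return [f"/var/log/httpd/{name}" for name in names]


if __name__ == "__main__":
    print(render_vhosts(["a", "b"]))
    print("create these directories first:", ", ".join(log_dirs(["a", "b"])))
```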

Then, I created another database for the other site. Unfortunately, my own site then failed when opening blog posts; they only worked with the default permalink setting. Thanks to Don, the problem was solved by enabling "mod_rewrite" for the folders that contain the WordPress files.

Add the following to /etc/httpd/conf/httpd.conf:

<Directory "/var/www/html/***">
  Order allow,deny
  Allow from all
  AllowOverride All
</Directory>

Updated on 7.26.2018

Also, the folders for error logs need to be created.

Bibliography:

https://yq.aliyun.com/ziliao/48568

https://community.rackspace.com/products/f/48/t/3180

Set Up a Desktop on a Server

Recently, I tried to download the recovery image for my Surface Pro 3 from the official website. The download link lost its connection after a few minutes, and the download speed stayed below 10 KB/s.

Since the server I rent from Aliyun is overseas, I considered downloading the image through the server. I looked at two options: using a text-based web browser or installing a desktop.

First, I tried w3m and elinks. Both returned 'JavaScript is disabled', because text-based browsers do not support JavaScript. There is a w3m-js extension, but I did not find a working download link.

 

yum install -y w3m w3m-img    # to uninstall: rpm -e w3m w3m-img

yum install links

yum install elinks            # to uninstall: rpm -e links elinks

So I had to install a desktop: the X Window System and the MATE desktop. After the work was done, I wanted it uninstalled again.

# Install the desktop and boot into graphical mode
yum -y groupinstall "X Window System" "MATE Desktop"
systemctl set-default graphical.target
reboot

# Uninstall the desktop and return to text mode
yum -y groupremove "X Window System" "MATE Desktop"
systemctl set-default multi-user.target
reboot