In the FAQ we answer the following questions:
- The IDs above are local; how to map them to global IDs is described here.
- How to use the camera pose to render 3D labels is answered here.
- And last but not least: how to get 2D bounding boxes.

These should answer the questions; closing this thread.
from 3rscan.
Hi @bayraktare,
we used an over-segmentation (Efficient Graph-Based Image Segmentation) of the scans when annotating our 3D models; an instance consists of multiple segments. This over-segmentation is also used in ScanNet (see here). If you want to read it, you will also need `mesh.refined.0.010000.segs.json` (which corresponds to `<scanId>_vh_clean_2.0.010000.segs.json` in ScanNet). If you simply want to read the instance segmentation, I recommend only reading the `label` and `objectId` of the `segGroups` in `*semseg.json`; it maps to the `objectId` in `labels.instances.annotated.ply`.
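To make the segment-to-instance mapping concrete, here is a minimal sketch (not official 3RScan tooling; file structure as described in this thread) that collects the vertex indices of each instance from the over-segmentation:

```python
import json

def vertices_per_object(segs, semseg):
    """Map each objectId to (label, list of vertex indices).

    `segs` is the parsed mesh.refined.0.010000.segs.json (one segment id per
    vertex in `segIndices`); `semseg` is the parsed *semseg.json (each entry
    of `segGroups` lists the segments belonging to one instance).
    """
    # invert segIndices once: segment id -> vertex indices
    seg2verts = {}
    for vertex, seg in enumerate(segs['segIndices']):
        seg2verts.setdefault(seg, []).append(vertex)
    objects = {}
    for group in semseg['segGroups']:
        verts = []
        for seg in group['segments']:
            verts += seg2verts.get(seg, [])
        objects[group['objectId']] = (group['label'], sorted(verts))
    return objects

# usage with the files from this thread:
# segs = json.load(open('mesh.refined.0.010000.segs.json'))
# semseg = json.load(open('semseg.json'))
# objects = vertices_per_object(segs, semseg)
```

Inverting `segIndices` once avoids rescanning the whole vertex list for every segment of every object.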
Thanks for replying @WaldJohannaU!
Just a few quick questions on your answer:
- Could you also name the corresponding files? Is `<scanId>_vh_clean.aggregation.json` the counterpart of `*semseg.json`?
- What should I do to obtain `labels.instances.annotated.xyz` from `labels.instances.annotated.ply`?

In summary, I am trying to obtain ground truths from your dataset.
The workflow of my code is as follows:
1. For the scene, read `mesh.refined.0.010000.segs.json`, `labels.instances.annotated.xyz` and `*semseg.json`.
2. From `*semseg.json`, get `objectId` and the labels according to the segments and append them.
3. Read the pose per frame and take its inverse. Find the boolean array which is true for the points behind the camera, and normalize the homogeneous points: [x y 1].
4. Get the points related to the objects from the indices and obtain the bounding boxes.
5. Eliminate a bounding box if it is outside the image or if the object is small in the image: (`if x1<0 or y1<0 or wi<x2 or hi<y2 or int(x2-x1)<5 or int(y2-y1)<5: continue`).

Up to the 5th step I get many outputs, but when I apply step 5 most of them are removed, so no output is generated for most of the scenes. Even when results are generated, there are only a few lines for the whole sequence. When I check the values before the 5th step, I see negative values or very large values for the bounding boxes. Can you see where the error is? Or do you have a better idea for retrieving the ground truths for object IDs, labels and bounding boxes?
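To be concrete, steps 3 and 4 boil down to something like this (a minimal sketch, assuming a 4x4 intrinsic matrix `K` and a camera-to-world pose as loaded from the `*.pose.txt` files; `project_object` is just an illustrative name):

```python
import numpy as np

def project_object(pc3d, obj_indices, cam2world, K):
    """Project the points of one object into the image; return (x1, y1, x2, y2),
    or None if any object point lies behind the camera."""
    world2cam = np.linalg.inv(cam2world)         # pose file stores camera-to-world
    n = pc3d.shape[0]
    homo = np.vstack((pc3d.T, np.ones((1, n))))  # 4 x N homogeneous points
    camref = world2cam @ homo                    # points in camera coordinates
    if (camref[2, obj_indices] <= 0).any():      # any object point behind the camera
        return None
    pc2d = K @ camref                            # 4 x N (K is 4x4 here)
    pc2d = pc2d / pc2d[2]                        # normalize the homogeneous points
    xs, ys = pc2d[0, obj_indices], pc2d[1, obj_indices]
    return (xs.min(), ys.min(), xs.max(), ys.max())
```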
Thanks for your time and this great work.
Yes, the corresponding file to `semseg.json` is `<scanId>_vh_clean.aggregation.json` in ScanNet.
`_vh_clean_2.labels.ply` and `labels.instances.annotated.ply` store slightly different data; to get the semantic labels I recommend you first read `labels.instances.annotated.ply`. You could easily do this in Python, for example:

```python
from plyfile import PlyData

file = open('labels.instances.annotated.ply', 'rb')
plydata = PlyData.read(file)
labels = plydata['vertex']['objectId']
```
`objectId` gives you an instance ID per vertex (usually a low number, e.g. 34 or 42); the ID is scene-specific (so 1 could be a chair in one scene but a table in another).
The ID corresponds to `objectId` in `semseg.json`; there you also have the class label mapping. That means you can map `objectId` 42 to the class label "box" in this particular scene:
"segGroups": [
{
"id": 42,
"objectId": 42,
"label": "box",
...
{
"id": 34,
"objectId": 34,
"label": "chair",
```python
import json

with open('semseg.json', 'r') as read_file:
    data = json.load(read_file)
for segGroups in data['segGroups']:
    print(segGroups["objectId"], segGroups["label"])
```
Since we have 534 unique class labels, we released a class mapping to NYU40 / Eigen (chair is class 5, same as armchair and dining chair):
https://github.com/WaldJohannaU/3RScan/blob/master/data/mapping.txt
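A mapping file like this can be read with `csv.DictReader`; here is a minimal sketch, assuming a tab-separated file with a header row (the column names `Label` and `NYU40` below are illustrative assumptions; check the actual header in `mapping.txt`):

```python
import csv, io

def read_label_mapping(text, from_col='Label', to_col='NYU40'):
    """Parse a tab-separated mapping into a dict {from_col value: to_col value}.
    Column names are assumptions; adjust them to the real header."""
    reader = csv.DictReader(io.StringIO(text), delimiter='\t')
    return {row[from_col]: row[to_col] for row in reader}

# usage (again, column names are assumptions):
# mapping = read_label_mapping(open('mapping.txt').read())
```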
I'm not sure what exactly you are trying to do, but if you want to get 2D bounding boxes you could render the `objectId` using OpenGL (which would replace the second half of your step 3) and do the above mapping in 2D.
Please note that you don't need to read `mesh.refined.0.010000.segs.json`.
I hope that helps.
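For the 2D part: given a rendered instance-ID image (a 2D array with one `objectId` per pixel, however you obtain it), the boxes fall out directly. A minimal sketch (the function name and background convention are illustrative):

```python
import numpy as np

def boxes_from_id_image(id_image, background=0):
    """Return {objectId: (x1, y1, x2, y2)} for every instance visible in the image."""
    boxes = {}
    for oid in np.unique(id_image):
        if oid == background:
            continue
        ys, xs = np.nonzero(id_image == oid)  # pixel coordinates of this instance
        boxes[int(oid)] = (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))
    return boxes
```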
Hi @WaldJohannaU, thank you very much for your detailed explanation.
Yes, I am trying to obtain ground truths for the performance evaluation of my algorithm. I have managed this for ScanNet, but unfortunately not for your dataset yet.
For example, a line of the file I am trying to generate should look like this:
`/path/... objectID ObjectClass Occlusion x1 y1 x2 y2`
I am including my whole code here, and if you can find some time to tell me where the mistakes are, I would appreciate it. Then maybe we could also put it in your repo to show others how to generate the ground truths.
```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Tue Jan 21 18:11:45 2020

@author: bayraktare
"""
import json, glob, csv, sys, os, argparse, meshio
import numpy as np


def get_intrinsic_color(fn):
    k = open(fn, 'r')
    kk = k.readlines()
    K = []
    for i in range(len(kk)):
        K.append(kk[i].split(' '))
    mt = np.asarray(K[7][2:-1], dtype='float')
    mat = np.reshape(mt, (4, 4))
    wi = int(K[2][2][:-1])  # color width
    hi = int(K[3][2][:-1])  # color height
    return (mat, wi, hi)


def get_pose(fn):
    return np.loadtxt(open(fn, "rb"), delimiter=" ")


def get_full_pc(fn):
    return np.genfromtxt(open(fn, "rb"), delimiter=" ")


def frame_num_from_name(filename):  # name looks like .../frame-000000.pose.txt
    return int(filename.split('/')[-1].split('-')[1].split('.')[0])


def getOcclusion(camref, behind, full_pc2d, pc3di, bbx):
    (oid, l, x1, y1, x2, y2) = bbx
    inter = 5  # resolution of the grid
    d = 0.1    # distance beyond which we consider that a point no longer belongs to an object
    xs = range(int(x1), int(x2), int((x2 - x1) / inter))
    ys = range(int(y1), int(y2), int((y2 - y1) / inter))
    obj_idx = np.zeros((1, len(behind)), dtype=bool)
    obj_idx[[0], [pc3di]] = True
    oclus = 0
    outsideobject = 0
    for i in range(len(xs) - 1):
        for j in range(len(ys) - 1):
            # collect the points inside this grid cell that are not behind the camera
            all_idx = np.logical_and(xs[i] < full_pc2d[0], full_pc2d[0] < xs[i + 1])
            all_idx = np.logical_and(all_idx, ys[j] < full_pc2d[1])
            all_idx = np.logical_and(all_idx, full_pc2d[1] < ys[j + 1])
            all_idx = np.logical_and(all_idx, np.logical_not(behind))
            # select the 2D points that are inside the cell and belong to the object
            o_idx = np.logical_and(all_idx, obj_idx)
            # remove the points linked to the object
            all_idx = np.logical_and(all_idx, np.logical_not(obj_idx))
            if not o_idx.any() or not all_idx.any():
                if not o_idx.any():
                    # this part of the bounding box contains no object points, so we don't count it
                    outsideobject += 1
                continue  # no object points for this cell, or no points that do not belong to the object
            # minimum depth of the object for this cell
            o_depth = np.min(np.linalg.norm(camref[0:3, o_idx[0, :]], axis=0)) / 3
            # if any point not from the object is in front of it (i.e. has a lower depth) ...
            if sum((o_depth - np.linalg.norm(camref[0:3, all_idx[0, :]], axis=0) / 3) > d) > 0:
                oclus += 1  # ... we count this cell as occluded
    if ((len(xs) - 1) * (len(ys) - 1) - outsideobject) == 0:
        print('something is wrong: no points projected on the bbx')
        return 1
    return float(oclus) / ((len(xs) - 1) * (len(ys) - 1) - outsideobject)


# main code
input_sequences = glob.glob('/home/.../3rscan/sequence/*')
scene_list = [i.split('/')[-1] for i in input_sequences]
datadir = '/home/.../3rscan/sequence'
outdir = '/home/.../3rscan/2dgtwithbboxes'

# CONVERT PLY TO XYZ
input_ply = glob.glob(datadir + '/*/labels.instances.annotated.ply')  # was undefined in the original
for ply in input_ply:
    if os.path.isfile(ply.split('ply')[0] + 'xyz'):
        print('file exists, skipping', ply.split('ply')[0] + 'xyz')
        continue
    d = meshio.read(ply)
    np.savetxt(ply.split('ply')[0] + 'xyz', d.points, fmt='%1.6f')

c = 0
for scene in scene_list:
    if os.path.isfile(outdir + '/' + scene + '.2dgt'):
        print('file exists, skipping', scene)
        continue
    # read *semseg.json
    ag_f = datadir + '/' + scene + '/semseg.json'
    if not os.path.isfile(ag_f):
        print('no *semseg.json file found for the scene', ag_f)
        continue
    fp = open(ag_f)
    aggreg = json.load(fp)
    fp.close()
    objs = {}  # will contain the objects for this scene
    seg_f = datadir + '/' + scene + '/mesh.refined.0.010000.segs.json'
    if not os.path.isfile(seg_f):
        print('no mesh.refined.0.010000.segs.json file found for the scene', seg_f)
        continue
    f = open(seg_f)
    segs = json.load(f)
    f.close()  # was f.close (missing parentheses)
    # read *.xyz and append it to the 3D point cloud
    xyz_f = datadir + '/' + scene + '/labels.instances.annotated.xyz'
    if not os.path.isfile(xyz_f):
        print('no XYZ file found for the scene', xyz_f)
        continue
    pc3d = get_full_pc(xyz_f)  # the full 3D point cloud
    for po in aggreg['segGroups']:  # for each object in the json file
        objs[po['objectId']] = [po['label']]  # e.g. {0: ['window']}
        pc3di = []
        for segid in po['segments']:  # for each segment of the object
            # collect the 3D points associated with the segment segid
            pc3di += [x for (x, y) in enumerate(segs['segIndices']) if y == segid]
        objs[po['objectId']].append(pc3di)
    print('Loading 3D point cloud done, number of objects:', len(objs.keys()))
    # intrinsic parameters
    (m_calibrationColorIntrinsic, wi, hi) = get_intrinsic_color(datadir + '/' + scene + '/sequence/_info.txt')
    obj_by_img = {}  # maps each image to the bounding boxes of the objects appearing in it
    for pose in glob.glob(datadir + '/' + scene + '/sequence/*pose.txt'):
        # camera-to-world transform from frame-0XXXXX.pose.txt
        cam2world = get_pose(pose)
        if np.logical_not(np.isfinite(cam2world)).any() or np.isnan(cam2world).any() or cam2world.shape[0] == 0:
            print('erroneous camera value, skipping', cam2world)
            continue  # the values of the camera pose are wrong, so we skip
        world2cam = np.linalg.inv(cam2world)  # the actual extrinsic parameters
        # each entry holds the name of the frame and the list of object bbxs
        obj_by_img[frame_num_from_name(pose)] = [pose.split('/')[-1].split('.')[0], []]
        camref = np.dot(world2cam, np.vstack((pc3d.transpose(), np.ones((1, pc3d.shape[0])))))
        behind = camref[2] <= 0  # boolean array which is true for the points behind the camera
        full_pc2d = np.dot(m_calibrationColorIntrinsic, camref)
        full_pc2d = np.divide(full_pc2d, np.tile(full_pc2d[2], (4, 1)))  # normalising the homogeneous points: [x y 1]
        for oid, (l, pc3di) in objs.items():
            if behind[pc3di].any():  # skip if any of the object points is behind the camera
                continue
            rows = np.array([len(pc3di) * [0], len(pc3di) * [1]])
            cols = np.array([pc3di, pc3di])
            pc2d = full_pc2d[rows, cols]  # the 2D points related to the object, from the indices
            (x1, y1, x2, y2) = (min(pc2d[0]), min(pc2d[1]), max(pc2d[0]), max(pc2d[1]))  # bounding box coordinates
            # drop the box if it is outside the image or if the object is small in the image
            if x1 < 0 or y1 < 0 or wi < x2 or hi < y2 or int(x2 - x1) < 5 or int(y2 - y1) < 5:
                continue
            o = getOcclusion(camref, behind, full_pc2d, pc3di, (oid, l, x1, y1, x2, y2))
            # only the bounding box coordinates are saved; you might want to add the full 2D point cloud
            obj_by_img[frame_num_from_name(pose)][1].append([oid, l, o, x1, y1, x2, y2])
    print('Sampiyon Besiktas', c)
    c += 1
    # write the results
    fw = open(outdir + '/' + scene + '.2dgt', 'w')
    for (fnum, v) in obj_by_img.items():
        if len(v) == 1:
            continue
        fn = v[0]
        for detection in v[1]:
            print('%s %d %s %1.3f %1.6f %1.6f %1.6f %1.6f' % (
                'frame-' + '%06d' % fnum, int(detection[0]), str(detection[1]), float(detection[2]),
                float(detection[3]), float(detection[4]), float(detection[5]),
                float(detection[6])), file=fw)
    fw.close()  # was never closed in the original
```