Re: [geomesa-users] GeoMesa range query performance

Hi Emilio,

Thanks for your enthusiasm. I did not use the GeoTools API programmatically. Instead, I use the GeoMesa Accumulo command-line tools to submit queries. A typical query looks like this:

geomesa-accumulo export -u root -p *password* -c *dataset* -f *data_model* -q bbox(geom,x1,y1,x2,y2) -F csv

How can I check that my data is distributed across the cluster? I store it in Accumulo, with HDFS as the file system.

Thanks,

Tin

On Mon, May 20, 2019 at 6:48 AM Emilio Lahr-Vivaz <elahrvivaz@xxxxxxxx> wrote:
Are you using the GeoTools API programmatically then? There are a lot of things that can affect query performance. A few things I would look at:

* Check whether your data is distributed across the cluster. By default, GeoMesa will create 4 splits on ingestion. If your data doesn't reach the split threshold, then you will only be querying 4 regions on at most 4 servers.
* Check that your client can handle the number of threads being used. GeoMesa spawns multiple client threads per query (based on the data store configuration), so by default you'd be running 8 threads per query.
* Try to determine the bottleneck - you may be saturating your network, or your client may not be reading results as fast as they are being delivered.
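
One rough way to look for a client-side bottleneck is to time the same set of queries at different concurrency levels and compare per-query latency. The sketch below is only an illustration: `run_query` is a hypothetical placeholder for whatever actually submits a query (it is not a GeoMesa API), and the default of 8 threads per query is the figure quoted above, not something the code reads from a data store configuration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def time_queries(run_query, queries, concurrency):
    """Run `queries` using `concurrency` worker threads.

    `run_query` is a hypothetical callable that submits one query and
    blocks until all of its results have been read.
    Returns (wall_clock_seconds, list_of_per_query_seconds).
    """
    def timed(query):
        start = time.perf_counter()
        run_query(query)
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed, queries))
    return time.perf_counter() - wall_start, latencies


def client_threads(concurrent_queries, threads_per_query=8):
    """Estimate total client threads; 8 per query is the default mentioned above."""
    return concurrent_queries * threads_per_query
```

If per-query latency climbs sharply as `concurrency` goes up while the servers stay mostly idle, the client or the network is a likely suspect. For scale, `client_threads(100)` gives 800 threads for 100 simultaneous queries.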

I'm not familiar with how SpatialHadoop works, so those things may or may not be affecting it as well.

At any rate, I don't think anyone has compared the two before. I'd be interested to see some more detailed results (code samples, timings, etc), if you'd share them.

Thanks,

Emilio

On 5/20/19 9:10 AM, Tin Vu wrote:
I used concurrent threads. 1 thread for 1 query.

On Mon, May 20, 2019, 6:00 AM Emilio Lahr-Vivaz <elahrvivaz@xxxxxxxx> wrote:
Hello,

How are you submitting queries to GeoMesa?

Thanks,

Emilio

On 5/19/19 3:25 PM, Tin Vu wrote:
Hi Emilio,

Thanks for your response. I executed my experiments as follows:
1. Cluster: 1 master node, 12 slave nodes, 64 GB of memory in each node.
2. Dataset: OpenStreetMap All Nodes (size 96 GB, 2.7 billion records).
3. Queries: I created 10 batches of queries of different sizes (for example, query area / whole space area = 10^-12, 10^-11, ..., 10^-2). Each batch contains 100 square queries of the same size. The queries are randomly distributed over the whole space.
4. I submit those batches of queries to SpatialHadoop and GeoMesa, wait until they finish, then measure the running time.
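
The query generation in step 3 could be sketched roughly as follows. This is an illustration under assumptions, not the script used in the experiment: the extent is assumed to be the whole world in lon/lat degrees (the actual bounds of the dataset space are not stated here), the exponent range is a parameter, and all function names are made up for the example.

```python
import random

# Assumed extent (min_x, min_y, max_x, max_y): whole world in lon/lat degrees.
WORLD = (-180.0, -90.0, 180.0, 90.0)


def random_square_query(selectivity, extent=WORLD, rng=random):
    """Return (x1, y1, x2, y2) for a square whose area is
    selectivity * (area of extent), placed uniformly at random inside extent."""
    min_x, min_y, max_x, max_y = extent
    total_area = (max_x - min_x) * (max_y - min_y)
    side = (selectivity * total_area) ** 0.5
    x1 = rng.uniform(min_x, max_x - side)
    y1 = rng.uniform(min_y, max_y - side)
    return (x1, y1, x1 + side, y1 + side)


def make_batches(exponents=range(-12, -1), per_batch=100, seed=0):
    """Build one batch of `per_batch` queries for each selectivity 10^e."""
    rng = random.Random(seed)  # fixed seed so both systems see the same queries
    return {
        e: [random_square_query(10.0 ** e, rng=rng) for _ in range(per_batch)]
        for e in exponents
    }


def to_bbox_filter(box, geom="geom"):
    """Format a box as a bbox(...) filter string like the one passed to -q."""
    return "bbox({}, {}, {}, {}, {})".format(geom, *box)
```

Fixing the seed means the identical boxes can be replayed against both systems. Note that a "square" here is square in degrees, not in ground distance.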

Thanks,

Tin


On Thu, May 16, 2019 at 2:16 PM Emilio Lahr-Vivaz <elahrvivaz@xxxxxxxx> wrote:
Hello,

Could you say more about how you're querying? SpatialHadoop uses map/reduce jobs, if I understand - it seems like there would be a lot of overhead to spin up the job. How long are your queries taking? How big is your cluster?

Thanks,

Emilio

On 5/16/19 3:20 PM, Tin Vu wrote:
Hi all,

I just wanted to ask you a question about the performance of GeoMesa range queries. This is my experimental setup:
1. Systems: GeoMesa on Accumulo, SpatialHadoop (http://spatialhadoop.cs.umn.edu/)
2. Dataset: All Nodes dataset from http://spatialhadoop.cs.umn.edu/datasets.html, with 96 GB and 2.7 billion points.
3. Query: range queries with different selectivities: 10^-12, 10^-11, 10^-10, where selectivity is the ratio of the query range area to the total area of the dataset space.

I saw that GeoMesa does not perform better than SpatialHadoop, which I did not expect, since I think that GeoMesa (which organizes data at the record level) should be better than SpatialHadoop (which organizes data at the block level) for highly selective queries. Could you suggest how to tune GeoMesa so that it performs better?

Thanks,

Tin

_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://dev.locationtech.org/mailman/listinfo/geomesa-users
