pyDrill-dsl¶
Pythonic DSL for Apache Drill.
Schema-free SQL Query Engine for Hadoop, NoSQL and Cloud Storage
- Free software: MIT license
- Documentation: https://pydrill_dsl.readthedocs.org.
Features¶
- Uses Peewee syntax. examples for selecting data are in peewee docs.
- Support for all storage plugins
- Support for drivers PyODBC and pyDrill
Installation¶
Version from https://pypi.python.org/pypi/pydrill_dsl:
$ pip install pydrill_dsl
Latest version from git:
$ pip install git+git://github.com/PythonicNinja/pydrill_dsl.git
Sample usage¶
from pydrill_dsl.resource import Resource
class Employee(Resource):
first_name = Field()
salary = Field()
position_id = Field()
department_id = Field()
class Meta:
storage_plugin = 'cp'
path = 'employee.json'
# by default it uses pydrill
# example of using pydobc
# database = Drill({'dsn': 'Driver=/opt/mapr/drillodbc/lib/universal/libmaprdrillodbc.dylib;ConnectionType=Direct;Host=127.0.0.1;Port=31010;Catalog=DRILL;AuthenticationType=Basic Authentication;AdvancedProperties=CastAnyToVarchar=true;HandshakeTimeout=5;QueryTimeout=180;TimestampTZDisplayTimezone=utc;ExcludedSchemas=sys,INFORMATION_SCHEMA;NumberOfPrefetchBuffers=5;UID=[USERNAME];PWD=[PASSWORD]'})
Employee.select().filter(salary__gte=17000)
Employee.select().paginate(page=1, paginate_by=5)
salary_gte_17K = (Employee.salary >= 17000)
salary_lte_25K = (Employee.salary <= 25000)
Employee.select().where(salary_gte_17K & salary_lte_25K)
Employee.select(
fn.Min(Employee.salary).alias('salary_min'),
fn.Max(Employee.salary).alias('salary_max')
).scalar(as_tuple=True)
# creation of resource can be done without creation of class:
employee = Resource(storage_plugin='cp', path='employee.json',
fields=('first_name', 'salary', 'position_id', 'department_id'))