Metadata-Version: 1.1
Name: PyUniProt
Version: 0.0.10
Summary: Importing and querying UniProt
Home-page: https://github.com/cebel/pyuniprot
Author: Christian Ebeling
Author-email: christian.ebeling@scai.fraunhofer.de
License: Apache 2.0 License
Description: PyUniProt |stable_build|
        ========================
        
        |project_logo_large|
        
        |stable_documentation| |pypi_license|
        
        `PyUniProt <http://pyUniProt.readthedocs.io>`_ is a Python package
        to access and query UniProt data provided by the European Bioinformatics Institute (EMBL-EBI),
        the SIB Swiss Institute of Bioinformatics and the Protein Information Resource (PIR).
        
        Data are installed in a (local or remote) RDBMS, which gives bioinformatic algorithms very fast response times
        to sophisticated queries and high flexibility through the SQLAlchemy database layer.
        PyUniProt is developed by the
        `Department of Bioinformatics <https://www.scai.fraunhofer.de/en/business-research-areas/bioinformatics.html>`_
        at the Fraunhofer Institute for Algorithms and Scientific Computing
        `SCAI <https://www.scai.fraunhofer.de/en.html>`_.
        For more information about PyUniProt go to
        `the documentation <http://pyUniProt.readthedocs.io>`_.
        
        |er_model|
        
        This development is supported by the following `IMI <https://www.imi.europa.eu/>`_ projects:
        
        - `AETIONOMY <http://www.aetionomy.eu/>`_ and
        - `PHAGO <http://www.phago.eu/>`_.
        
        |imi_logo| |aetionomy_logo| |phago_logo| |scai_logo|
        
        Supported databases
        -------------------
        
        `PyUniProt` uses `SQLAlchemy <http://sqlalchemy.readthedocs.io>`_ to cover a wide spectrum of RDBMSs
        (relational database management systems). For best performance MySQL or MariaDB is recommended. If you
        cannot install software on your system, SQLite also works and needs no further installation.
        The following RDBMSs are supported (by SQLAlchemy):
        
        1. Firebird
        2. Microsoft SQL Server
        3. MySQL / `MariaDB <https://mariadb.org/>`_
        4. Oracle
        5. PostgreSQL
        6. SQLite
        7. Sybase
        
        Getting Started
        ---------------
        This is a quick start tutorial for the impatient.
        
        Installation |pypi_version| |python_versions|
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        PyUniProt can be installed with `pip <https://pip.pypa.io/en/stable/>`_.
        
        .. code-block:: bash
        
            pip install pyuniprot
        
        If the installation fails because you lack the rights to install packages, run the command as superuser
        (prefix it with sudo on Linux) or ...
        
        .. code-block:: bash
        
            pip install --user pyuniprot
        
        If you want to make sure you are installing this under python3 use ...
        
        .. code-block:: bash
        
            python3 -m pip install pyuniprot
        
        SQLite
        ~~~~~~
        .. note:: If you want to use SQLite as your database system, because you ...
        
            - have no possibility to use RDBMSs like MySQL/MariaDB
            - just want to test PyUniProt, but don't want to spend time setting up a database
        
            skip the next *MySQL/MariaDB setup* section. But in general we strongly recommend MySQL or MariaDB as your
            relational database management system.
        
        If you don't know what all that means skip the section *MySQL/MariaDB setup*.
        
        Don't worry! You can always change the configuration later. For more information about
        changing the database system later, see the section
        `Changing database configuration <http://pyuniprot.readthedocs.io/en/latest/installation.html>`_
        in the documentation on Read the Docs.
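        
        If you are unsure whether SQLite is usable on your machine, a minimal check with Python's standard-library
        `sqlite3` module is sketched below. It only asks the bundled SQLite engine for its version and does not
        involve PyUniProt itself.
        
        .. code-block:: python
        
            import sqlite3
        
            # Open a throwaway in-memory database; nothing is written to disk.
            conn = sqlite3.connect(":memory:")
            sqlite_version = conn.execute("SELECT sqlite_version()").fetchone()[0]
            conn.close()
        
            print("SQLite engine version:", sqlite_version)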
        
        MySQL/MariaDB setup
        ~~~~~~~~~~~~~~~~~~~
        Log in to MySQL as root user, create a new database, create a user, assign the rights and flush privileges.
        
        .. code-block:: mysql
        
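            -- Create the database, then the user, then grant the user all rights on that database.
            CREATE DATABASE pyuniprot CHARACTER SET utf8 COLLATE utf8_general_ci;
            CREATE USER 'pyuniprot_user'@'%' IDENTIFIED BY 'pyuniprot_passwd';
            GRANT ALL PRIVILEGES ON pyuniprot.* TO 'pyuniprot_user'@'%';
            FLUSH PRIVILEGES;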
        
        There are two options to set the MySQL/MariaDB connection.
        
        1. The simplest is to start the command line tool
        
        .. code-block:: sh
        
            pyuniprot mysql
        
        You will be guided by input prompts. Accept the default value in square brackets with RETURN. You will see
        something like this:
        
        .. code-block:: sh
        
            server name/ IP address database is hosted [localhost]:
            MySQL/MariaDB user [pyuniprot_user]:
            MySQL/MariaDB password [pyuniprot_passwd]:
            database name [pyuniprot]:
            character set [utf8]:
        
        The connection will be tested and, in case of success, `Connection was successful` is returned.
        Otherwise you will see the following hint:
        
        .. code-block:: sh
        
            Test was NOT successful
        
            Please use one of the following connection schemas
            MySQL/MariaDB (strongly recommended):
                    mysql+pymysql://user:passwd@localhost/database?charset=utf8
        
            PostgreSQL:
                    postgresql://user:passwd@localhost/database
        
            MsSQL (pyodbc needed):
                    mssql+pyodbc://user:passwd@database
        
            SQLite (always works):
        
            - Linux:
                    sqlite:////absolute/path/to/database.db
        
            - Windows:
                    sqlite:///C:\absolute\path\to\database.db
        
            Oracle:
                    oracle://user:passwd@localhost:1521/database
        
        2. The second option is to start a python shell and set the MySQL configuration.
        If you have not changed anything in the SQL statements above ...
        
        .. code-block:: python
        
            import pyuniprot
            pyuniprot.set_mysql_connection()
        
        If you have used your own settings, please adapt the following command to your requirements.
        
        .. code-block:: python
        
            import pyuniprot
            pyuniprot.set_mysql_connection(host='localhost', user='pyuniprot_user', passwd='pyuniprot_passwd', db='pyuniprot')
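        
        The connection schemas listed in the hint above are plain strings, so they can also be assembled in Python.
        The following is a minimal sketch using only the standard library. `build_mysql_url` and `build_sqlite_url`
        are hypothetical helper names chosen for illustration and are not part of PyUniProt.
        
        .. code-block:: python
        
            from urllib.parse import quote_plus
        
        
            def build_mysql_url(user, passwd, host="localhost", db="pyuniprot", charset="utf8"):
                """Hypothetical helper: assemble a mysql+pymysql connection string."""
                # Percent-encode the credentials so characters like '@' or '/' stay unambiguous.
                return "mysql+pymysql://{}:{}@{}/{}?charset={}".format(
                    quote_plus(user), quote_plus(passwd), host, db, charset
                )
        
        
            def build_sqlite_url(path):
                """Hypothetical helper: assemble an SQLite connection string for an absolute Linux path."""
                # 'sqlite:///' followed by an absolute path yields four slashes in total.
                return "sqlite:///" + path
        
        
            print(build_mysql_url("pyuniprot_user", "pyuniprot_passwd"))
            print(build_sqlite_url("/absolute/path/to/database.db"))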
        
        Updating
        ~~~~~~~~
        The updating process will download the *uniprot_sprot.xml.gz* file provided by the UniProt team on their FTP server
        `download page <ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/>`_.
        
        .. warning::
        
            Please note that the UniProt download file needs ~700 MB of disk space and the update takes ~2h for
            human, mouse and rat alone (depending on your computer).
        
        It is strongly recommended to restrict the import to entries linked to the specific organisms you are interested in
        by passing a list of NCBI Taxonomy IDs to the parameter `taxids`. To identify the correct NCBI Taxonomy IDs please
        go to the `NCBI Taxonomy web form <https://www.ncbi.nlm.nih.gov/taxonomy/>`_. In the following example we use 9606
        as identifier for Homo sapiens, 10090 for Mus musculus and 10116 for Rattus norvegicus.
        
        There are two options to import the data:
        
        1. Command line import
        
            .. code-block:: sh
        
                pyuniprot update --taxids 9606,10090,10116
        
        2. Python
        
            .. code-block:: python
        
                import pyuniprot
                pyuniprot.update(taxids=[9606, 10090, 10116])
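        
        Both options receive the same identifiers: the command line takes one comma-separated string, the Python
        function a list of integers. The sketch below shows one way to convert between the two forms.
        `parse_taxids` is a hypothetical helper for illustration and is not part of PyUniProt.
        
        .. code-block:: python
        
            def parse_taxids(text):
                """Hypothetical helper: turn '9606,10090,10116' into [9606, 10090, 10116]."""
                # Skip empty fragments so trailing commas or stray spaces do no harm.
                return [int(part) for part in text.split(",") if part.strip()]
        
        
            taxids = parse_taxids("9606,10090,10116")
            print(taxids)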
        
        We recommend importing the whole UniProt dataset only if you don't want to restrict your search. An import without
        restrictions will take several hours and a lot of disk space.
        
        If you want to load all UniProt entries in the database:
        
        .. code-block:: python
        
            import pyuniprot
            pyuniprot.update() # not recommended, please read the notes above
        
        The update uses the downloaded file if it still exists on your system (~/.pyuniprot/data/uniprot_sprot.xml.gz).
        If you use the parameter `force_download`, the current file from UniProt will be downloaded.
        
        .. code-block:: python
        
            import pyuniprot
            pyuniprot.update(force_download=True, taxids=[9606, 10090, 10116])
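        
        To see whether a previously downloaded file is present before deciding on `force_download`, a check along
        the following lines may help. It is a sketch that assumes the default location mentioned above.
        `cached_download_exists` is a hypothetical helper and is not part of PyUniProt.
        
        .. code-block:: python
        
            import os
        
            # Default location named above; adjust it if your setup differs.
            CACHE_FILE = os.path.join("~", ".pyuniprot", "data", "uniprot_sprot.xml.gz")
        
        
            def cached_download_exists(path=CACHE_FILE):
                """Hypothetical helper: report whether a non-empty download file is present."""
                full_path = os.path.expanduser(path)
                return os.path.isfile(full_path) and os.path.getsize(full_path) > 0
        
        
            if cached_download_exists():
                print("Cached download found.")
            else:
                print("No cached download found.")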
        
        Quick start with query functions
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        Initialize the query object
        
        .. code-block:: python
        
            import pyuniprot
            query = pyuniprot.query()
        
        Get all entries
        
        .. code-block:: python
        
            all_entries = query.entry()
        
        
        Use parameters like gene_name to find specific entries
        
        .. code-block:: python
        
            >>> entry = query.entry(gene_name='YWHAE', taxid=9606, recommended_short_name='14-3-3E', name='1433E_HUMAN')[0]
            >>> entry
            14-3-3 protein epsilon
        
        Entry is the root element in the database. From here you can reach all other data
        
        .. code-block:: python
        
            >>> entry.accessions
            [P62258, B3KY71, D3DTH5, P29360, P42655, Q4VJB6, Q53XZ5, Q63631, Q7M4R4]
            >>> entry.functions
            ["Adapter protein implicated in the regulation of a large spectrum of both ..."]
        
        If a parameter name ends with an **s**, you can search with a single value or a tuple of values
        
        .. code-block:: python
        
            >>> alcohol_dehydrogenases = query.entry(ec_numbers='1.1.1.1')
            >>> [x.name for x in query.entry(ec_numbers='1.1.1.1')]
            ['ADHX_RAT', 'ADH1_RAT', 'ADHX_HUMAN', 'ADHX_MOUSE']
            >>> query.entry(ec_numbers=('1.1.1.1', '1.1.1.2'))
            ['Adh5', 'Adh1', 'ADH5', 'Adh5', 'Adh6', 'ADH7', 'Adh7', 'Adh7', 'Adh1']
        
        As a dataframe with a limit of 3 and accession numbers starting with Q9 (% used as wildcard)
        
        .. code-block:: python
        
            >>> query.accession(as_df=True, limit=3, accession='Q9%')
               id accession  entry_id
            0   1    Q9CQV8         1
            1  32    Q9GIK8         6
            2  33    Q9TQB4         6
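        
        The `%` wildcard behaves like the one in an SQL `LIKE` pattern. The sketch below illustrates that matching
        rule with Python's standard-library `sqlite3` module and a toy table. The table layout is an assumption made
        for this example only and is not PyUniProt's actual schema.
        
        .. code-block:: python
        
            import sqlite3
        
            # Toy table with a handful of accession numbers taken from the examples above.
            conn = sqlite3.connect(":memory:")
            conn.execute("CREATE TABLE accession (accession TEXT)")
            conn.executemany(
                "INSERT INTO accession (accession) VALUES (?)",
                [("Q9CQV8",), ("P62258",), ("Q9GIK8",), ("B3KY71",), ("Q9TQB4",)],
            )
        
            # '%' matches any run of characters, so 'Q9%' means "starts with Q9".
            rows = conn.execute(
                "SELECT accession FROM accession WHERE accession LIKE ? ORDER BY accession LIMIT 3",
                ("Q9%",),
            ).fetchall()
            matches = [row[0] for row in rows]
            conn.close()
        
            print(matches)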
        
        You will find the full documentation of the query functions
        `here <http://pyuniprot.readthedocs.io/en/latest/query_functions.html>`_.
        
        More information
        ----------------
        See the `installation documentation <http://pyuniprot.readthedocs.io/en/latest/installation.html>`_ for more advanced
        instructions. Also, check the change log at :code:`CHANGELOG.rst`.
        
        UniProt tools and licence (use of data)
        ---------------------------------------
        UniProt also provides many online `query interfaces <http://www.uniprot.org>`_ on their website.
        
        Please be aware of the `UniProt licence <http://www.uniprot.org/help/license>`_.
        
        Links
        -----
        Universal Protein Resource (UniProt)
        
        - `UniProt website <http://www.uniprot.org/>`_
        - `About UniProt <http://www.uniprot.org/help/about>`_
        
        PyUniProt
        
        - Documented on `Read the Docs <http://pyuniprot.readthedocs.io/>`_
        - Versioned on `GitHub <https://github.com/cebel/pyuniprot>`_
        - Tested on `Travis CI <https://travis-ci.org/cebel/pyuniprot>`_
        - Distributed by `PyPI <https://pypi.python.org/pypi/pyuniprot>`_
        - Chat on `Gitter <https://gitter.im/pyuniprot/Lobby>`_
        
        .. |stable_build| image:: https://api.travis-ci.org/cebel/pyuniprot.svg?branch=master
            :target: https://travis-ci.org/cebel/pyuniprot
            :alt: Stable Build Status
        
        .. |stable_documentation| image:: https://readthedocs.org/projects/pyuniprot/badge/?version=latest
            :target: http://pyuniprot.readthedocs.io/en/latest/
            :alt: Development Documentation Status
        
        .. |pypi_license| image:: https://img.shields.io/pypi/l/pyuniprot.svg
            :alt: Apache 2.0 License
        
        .. |python_versions| image:: https://img.shields.io/pypi/pyversions/pyuniprot.svg
            :alt: Stable Supported Python Versions
        
        .. |pypi_version| image:: https://img.shields.io/pypi/v/pyuniprot.svg
            :alt: Current version on PyPI
        
        .. |phago_logo| image:: https://raw.githubusercontent.com/cebel/pyuniprot/master/docs/source/_static/logos/phago_logo.jpeg
            :target: https://www.imi.europa.eu/content/phago
            :alt: PHAGO project logo
        
        .. |aetionomy_logo| image:: https://raw.githubusercontent.com/cebel/pyuniprot/master/docs/source/_static/logos/aetionomy_logo.png
            :target: http://www.aetionomy.eu/en/vision.html
            :alt: AETIONOMY project logo
        
        .. |imi_logo| image:: https://raw.githubusercontent.com/cebel/pyuniprot/master/docs/source/_static/logos/imi_logo.png
            :target: https://www.imi.europa.eu/
            :alt: IMI project logo
        
        .. |scai_logo| image:: https://raw.githubusercontent.com/cebel/pyuniprot/master/docs/source/_static/logos/scai_logo.png
            :target: https://www.scai.fraunhofer.de/en/business-research-areas/bioinformatics.html
            :alt: SCAI project logo
        
        .. |er_model| image:: https://owncloud.scai.fraunhofer.de/index.php/apps/files_sharing/ajax/publicpreview.php?x=1921&y=445&a=true&file=all.png&t=6d3ZxWeBX6Jndgu&scalingup=0
            :target: http://pyuniprot.readthedocs.io/en/latest/
            :alt: Entity relationship model
        
        .. |project_logo_large| image:: https://raw.githubusercontent.com/cebel/pyuniprot/master/docs/source/_static/logos/project_logo_large.png
            :alt: Project logo
Keywords: UniProt,Database,Protein
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Operating System :: OS Independent
Classifier: Environment :: Console
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Topic :: Database
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
