Limit the text to 78 columns for each line
augusto-herrmann committed Feb 20, 2021
1 parent 89c9120 commit 3f91d63
Showing 2 changed files with 93 additions and 31 deletions.
94 changes: 71 additions & 23 deletions README.en.md
@@ -4,85 +4,129 @@

![pytest@docker](https://github.com/turicas/covid19-br/workflows/pytest@docker/badge.svg) ![goodtables](https://github.com/turicas/covid19-br/workflows/goodtables/badge.svg)

This repository unifies links and data from the State Health Secretariats' (Secretarias Estaduais de Saúde - SES) reports on the number of COVID-19 cases in Brazil (per city, per day), along with other data relevant for analysis, such as the death tolls recorded by the notary service (per state, per day).
This repository unifies links and data from the State Health
Secretariats' (Secretarias Estaduais de Saúde - SES) reports on the
number of COVID-19 cases in Brazil (per city, per day), along with
other data relevant for analysis, such as the death tolls recorded by
the notary service (per state, per day).

## License and Citations

The code's license is [LGPL3](https://www.gnu.org/licenses/lgpl-3.0.en.html) and the converted data is [Creative Commons Attribution ShareAlike](https://creativecommons.org/licenses/by-sa/4.0/). If you use the data, **cite the original data source and who processed the data**, and if you share the data, **use the same license**.
Examples of how the data can be cited:
- **Source: Secretarias de Saúde das Unidades Federativas, data processed by Álvaro Justen and a team of volunteers at [Brasil.IO](https://brasil.io/)**
- **Brasil.IO: daily epidemiological bulletins of COVID-19 by city, available at: https://brasil.io/dataset/covid19/ (last updated: XX of XX of XXXX, accessed on: XX of XX of XXXX).**
The code's license is
[LGPL3](https://www.gnu.org/licenses/lgpl-3.0.en.html) and the converted
data is [Creative Commons Attribution
ShareAlike](https://creativecommons.org/licenses/by-sa/4.0/). If you use
the data, **cite the original data source and who processed the data**,
and if you share the data, **use the same license**. Examples of how
the data can be cited:
- **Source: Secretarias de Saúde das Unidades Federativas, data processed
by Álvaro Justen and a team of volunteers at
[Brasil.IO](https://brasil.io/)**
- **Brasil.IO: daily epidemiological bulletins of COVID-19 by city,
available at: https://brasil.io/dataset/covid19/ (last updated: XX of
XX of XXXX, accessed on: XX of XX of XXXX).**


## Data

After being collected and checked, the data is made available in 3 ways on [Brasil.IO](https://brasil.io/):

- [Web Interface](https://brasil.io/dataset/covid19) (made for humans)
- [API](https://brasil.io/api/dataset/covid19) (made for humans who develop programs) - [see the API documentation](api.md)
- [API](https://brasil.io/api/dataset/covid19) (made for humans who
develop programs) - [see the API documentation](api.md)
- [Full dataset download](https://data.brasil.io/dataset/covid19/_meta/list.html)
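
If you just want a quick look at the full dataset from Python, something
like the sketch below works. It is only an illustration: the file name
(`caso.csv.gz`) and the assumption that the files are gzipped CSVs
should be checked against the download listing above.

```python
# Illustrative sketch only: the file name caso.csv.gz is an assumption;
# see the full dataset listing above for the files that actually exist.
import csv
import gzip
import io
import urllib.request

URL = "https://data.brasil.io/dataset/covid19/caso.csv.gz"  # assumed name

raw = urllib.request.urlopen(URL).read()
with gzip.open(io.BytesIO(raw), mode="rt", encoding="utf-8") as fobj:
    reader = csv.DictReader(fobj)
    print(reader.fieldnames)  # inspect which columns are available
    print(next(reader))       # first data row
```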

If you want to access the data before it is published (ATTENTION: it may not have been checked yet), you can [directly access the spreadsheets we are working on](https://drive.google.com/open?id=1l3tiwrGEcJEV3gxX0yP-VMRNaE1MLfS2).
If you want to access the data before it is published (ATTENTION: it
may not have been checked yet), you can [directly access the
spreadsheets we are working
on](https://drive.google.com/open?id=1l3tiwrGEcJEV3gxX0yP-VMRNaE1MLfS2).

If this program and/or the resulting data are useful to you or your company, **consider [donating to the project Brasil.IO](https://brasil.io/doe)**, which is maintained voluntarily.
If this program and/or the resulting data are useful to you or your
company, **consider [donating to the project
Brasil.IO](https://brasil.io/doe)**, which is maintained voluntarily.


### FAQ ABOUT THE DATA

**Before contacting us to ask questions about the data (we're quite busy), [CHECK OUR FAQ](faq.md)** (still in Portuguese).
**Before contacting us to ask questions about the data (we're quite
busy), [CHECK OUR FAQ](faq.md)** (still in Portuguese).

For more information [see the data collection methodology](https://drive.google.com/open?id=1escumcbjS8inzAKvuXOQocMcQ8ZCqbyHU5X5hFrPpn4).
For more information [see the data collection
methodology](https://drive.google.com/open?id=1escumcbjS8inzAKvuXOQocMcQ8ZCqbyHU5X5hFrPpn4).

### Analyzing the data

If you want to analyze our data using SQL, check the script [`analysis.sh`](analysis.sh) (it downloads and converts the CSVs into an SQLite database and creates indexes and views that make the job easier) and the files in the [`sql/`](sql/) folder.
If you want to analyze our data using SQL, check the script
[`analysis.sh`](analysis.sh) (it downloads and converts the CSVs into an
SQLite database and creates indexes and views that make the job easier)
and the files in the [`sql/`](sql/) folder.

By default, the script reuses the same files if they have already been
downloaded; in order to always download the most up-to-date version of
the data, run `./analysis.sh --clean`.
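
As an illustration, the resulting database can also be queried from
Python; in the sketch below the database path `data/covid19.sqlite` is
an assumption — check `analysis.sh` for where it actually writes the
file.

```python
# Minimal sketch, assuming analysis.sh writes its SQLite database to
# data/covid19.sqlite (check the script for the actual path).
import sqlite3

conn = sqlite3.connect("data/covid19.sqlite")

# List the tables and views the script created; the ready-made queries
# in the sql/ folder can then be run against them with conn.execute().
for kind, name in conn.execute(
    "SELECT type, name FROM sqlite_master WHERE type IN ('table', 'view')"
):
    print(kind, name)

conn.close()
```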

### Validating the data

The metadata are described following the *Data Package* and
*[Table Schema](https://specs.frictionlessdata.io/table-schema/#language)* standards of
*[Frictionless Data](https://frictionlessdata.io/)*. This means the data can be automatically validated to detect, for example, whether the values of a field conform to the defined type, whether a date is valid, whether columns are missing or whether there are duplicate rows.
The metadata are described following the *Data Package* and *[Table
Schema](https://specs.frictionlessdata.io/table-schema/#language)*
standards of *[Frictionless Data](https://frictionlessdata.io/)*. This
means the data can be automatically validated to detect, for example,
whether the values of a field conform to the defined type, whether a
date is valid, whether columns are missing or whether there are
duplicate rows.

To validate, activate the Python virtual environment and then run:

```shell
goodtables data/datapackage.json
```

The report from the tool *[Good Tables](https://github.com/frictionlessdata/goodtables-py)* will indicate if there are any inconsistencies. The validation can also be done online through [Goodtables.io](http://goodtables.io/).
The report from the tool *[Good
Tables](https://github.com/frictionlessdata/goodtables-py)* will
indicate if there are any inconsistencies. The validation can also be
done online through [Goodtables.io](http://goodtables.io/).
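
Besides the command line, *goodtables-py* also exposes a Python API.
The sketch below follows its documentation; the `preset` option and the
report keys shown are assumptions to verify against the library's docs,
and the CLI call above remains the path used by this repository.

```python
# Sketch based on the goodtables-py documentation; the "datapackage"
# preset and the report keys are assumptions — verify against the docs.
from goodtables import validate

report = validate("data/datapackage.json", preset="datapackage")
print(report["valid"])        # True when no inconsistencies were found
print(report["error-count"])  # number of detected problems
```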

## Contributing

You can contribute in many ways:

- Building programs (crawlers/scrapers/spiders) to extract data automatically ([READ THIS BEFORE](#creating-scrapers));
- Building programs (crawlers/scrapers/spiders) to extract data
automatically ([READ THIS BEFORE](#creating-scrapers));
- Collecting links for your state reports;
- Collecting data about cases by city daily;
- Contacting the State Health Secretariat of your state, suggesting the [recommendations for data release](recomendacoes.md);
- Contacting the State Health Secretariat of your state, suggesting the
[recommendations for data release](recomendacoes.md);
- Avoiding physical contact with humans;
- Washing your hands several times a day;
- Showing solidarity with the most vulnerable;

In order to volunteer, [follow these steps](CONTRIBUTING.md).

Look for your state [in this repository's issues](https://github.com/turicas/covid19-br/issues) and let's talk there.
Look for your state [in this repository's
issues](https://github.com/turicas/covid19-br/issues) and let's talk
there.

### Creating Scrapers

We're changing the way we upload the data to make the job easier for volunteers and to make the process more robust and reliable; with that, it will also be easier for bots to upload the data, so scrapers will help *a lot* in this process. However, when creating a scraper it is important that you follow a few rules:
We're changing the way we upload the data to make the job easier for
volunteers and to make the process more robust and reliable; with that,
it will also be easier for bots to upload the data, so scrapers will
help *a lot* in this process. However, when creating a scraper it is
important that you follow a few rules:

- It's **required** that you create it using `scrapy`;
- **Do not** use `pandas`, `BeautifulSoup`, `requests` or other unnecessary libraries (the Python standard library already has lots of useful modules, `scrapy` with XPath is already capable of handling most of the scraping and `rows` is already a dependency of this repository);
- **Do not** use `pandas`, `BeautifulSoup`, `requests` or other
unnecessary libraries (the Python standard library already has lots of
useful modules, `scrapy` with XPath is already capable of handling most
of the scraping and `rows` is already a dependency of this
repository);
- Create a file named `web/spiders/spider_xx.py`, where `xx` is the state
acronym, in lower case. Create a new class that inherits from the
`BaseCovid19Spider` class, from `base.py`. The state acronym, in two
upper-case characters, must be a class attribute, accessed via
`self.state`. See the examples that have already been implemented and
the sketch after this list;
- There must be an easy way to make the scraper collect reports and cases for a specific date (but it should be able to identify which dates the data is available for and to capture several dates too);
- There must be an easy way to make the scraper collect reports and
cases for a specific date (but it should be able to identify which
dates the data is available for and to capture several dates too);
- The data can be read from the tallies by municipality or from individual case
microdata. In the latter case, the scraper itself must tally up the
municipal numbers;
@@ -101,7 +145,9 @@ We're changing the way we upload the data to make the job easier for volunteers
omission of the `city` parameter;
- When possible, use automated tests;
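
A minimal sketch of what such a spider might look like is shown below.
It only illustrates the naming and inheritance rules above: the
`add_city_case()` helper, the URL and the XPath expressions are
assumptions — check `base.py` and the spiders that already exist for
the real interface.

```python
# web/spiders/spider_xx.py — hypothetical sketch for a state "XX".
# BaseCovid19Spider comes from base.py; add_city_case(), the URL and
# the XPath expressions below are assumptions, not the real interface.
from .base import BaseCovid19Spider


class SpiderXX(BaseCovid19Spider):
    name = "xx"   # spider name: the state acronym, in lower case
    state = "XX"  # two upper-case characters, read via self.state

    start_urls = ["https://saude.xx.gov.br/boletim-covid19"]  # assumed URL

    def parse(self, response):
        # Use scrapy's XPath support (no BeautifulSoup/requests/pandas).
        # Real pages may need extra cleaning of the numeric values.
        for row in response.xpath("//table//tr[td]"):
            city = row.xpath("./td[1]/text()").get()
            confirmed = int(row.xpath("./td[2]/text()").get())
            deaths = int(row.xpath("./td[3]/text()").get())
            self.add_city_case(city=city, confirmed=confirmed, deaths=deaths)
```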

Right now we don't have much time available for reviews, so **please**, only create a pull request with the code for a new scraper if you can fulfill the requirements above.
Right now we don't have much time available for reviews, so **please**,
only create a pull request with the code for a new scraper if you can
fulfill the requirements above.

## Installing

@@ -156,7 +202,9 @@ Run the script:
`./deploy.sh`

It will collect the data from the spreadsheets (linked in
`data/boletim_url.csv` and `data/caso_url.csv`), add the data to the repository, compress it, send it to the server, and run the dataset update command.
`data/boletim_url.csv` and `data/caso_url.csv`), add the data to the
repository, compress it, send it to the server, and run the dataset
update command.

> Note: the script that automatically downloads and converts data must
> be executed separately, with the command `./run-spiders.sh`.
30 changes: 22 additions & 8 deletions README.md
@@ -17,8 +17,12 @@ ShareAlike](https://creativecommons.org/licenses/by-sa/4.0/). If you use the
data, **cite the original data source and who processed the data**, and if
you share the data, **use the same license**.
Examples of how the data can be cited:
- **Source: Secretarias de Saúde das Unidades Federativas, data processed by Álvaro Justen and a team of volunteers at [Brasil.IO](https://brasil.io/)**
- **Brasil.IO: daily epidemiological bulletins of COVID-19 by municipality, available at: https://brasil.io/dataset/covid19/ (last updated: XX of XX of XXXX, accessed on XX of XX of XXXX).**
- **Source: Secretarias de Saúde das Unidades Federativas, data
processed by Álvaro Justen and a team of volunteers at
[Brasil.IO](https://brasil.io/)**
- **Brasil.IO: daily epidemiological bulletins of COVID-19 by
municipality, available at: https://brasil.io/dataset/covid19/ (last
updated: XX of XX of XXXX, accessed on XX of XX of XXXX).**


## Data
@@ -27,7 +31,8 @@ After being collected and checked, the data is made available in 3 ways on
[Brasil.IO](https://brasil.io/):

- [Web Interface](https://brasil.io/dataset/covid19) (made for humans)
- [API](https://brasil.io/api/dataset/covid19) (made for humans who develop programs) - [see the API documentation](api.md)
- [API](https://brasil.io/api/dataset/covid19) (made for humans who
develop programs) - [see the API documentation](api.md)
- [Full dataset download](https://data.brasil.io/dataset/covid19/_meta/list.html)

If you want to access the data before it is published (ATTENTION: it may
@@ -81,7 +86,8 @@ through the [Goodtables.io](http://goodtables.io/) website.

You can contribute in many ways:

- Building programs (crawlers/scrapers/spiders) to extract the data automatically ([READ THIS BEFORE](#criando-scrapers));
- Building programs (crawlers/scrapers/spiders) to extract the data
automatically ([READ THIS BEFORE](#criando-scrapers));
- Collecting links to the bulletins of your state;
- Collecting data on the cases by municipality per day;
- Contacting the State Health Secretariat of your state, suggesting the
@@ -98,7 +104,11 @@ there.

### Creating Scrapers

We're changing the way we upload the data to make the job easier for volunteers and to make the process more robust and reliable; with that, it will also be easier for bots to upload the data, so scrapers will help *a lot* in the process. However, when creating a scraper it is important that you follow a few rules:
We're changing the way we upload the data to make the job easier for
volunteers and to make the process more robust and reliable; with that,
it will also be easier for bots to upload the data, so scrapers will
help *a lot* in the process. However, when creating a scraper it is
important that you follow a few rules:

- It's **required** to build the scraper using `scrapy`;
- **Do not use** `pandas`, `BeautifulSoup`, `requests` or other libraries
Expand Down Expand Up @@ -131,7 +141,9 @@ Estamos mudando a forma de subida dos dados para facilitar o trabalho dos volunt
municipality, except for the omission of the `city` parameter;
- When possible, use automated tests.

Right now we don't have much time available for reviews, so **please**, only create a *pull request* with the code for a new scraper if you can fulfill the requirements above.
Right now we don't have much time available for reviews, so **please**,
only create a *pull request* with the code for a new scraper if you can
fulfill the requirements above.

## Installing

@@ -145,13 +157,15 @@ Requires Python 3 (tested on 3.8.2). To set up your environment:
- Run the collection script: `./run-spiders.sh`
- Run the consolidation script: `./run.sh`
- Run the script that starts the scraping service: `./web.sh`
- The scrapers will be available through a web interface at the address http://localhost:5000
- The scrapers will be available through a web interface at the
address http://localhost:5000

Check the result in `data/output`.

### Docker

If you prefer to use Docker to run it, just use the following commands:
If you prefer to use Docker to run it, just use the following
commands:

```shell
make docker-build # to build the image
```
